Building Resilient Log Aggregation Pipeline Using Elasticsearch and Kafka
Rafał Kuć @ Sematext Group, Inc.

Sematext & I
Logsene: logs
SPM: metrics

Next 30 minutes…
Log shipping: buffers, protocols, parsing
Central buffering: Kafka, Redis
Storage & analysis: Elasticsearch, Kibana, Grafana

Log shipping architecture
[Diagram: three File → Shipper pairs send data to a Centralized Buffer, which feeds a cluster of nine Elasticsearch (ES) nodes.]

Focus: Elasticsearch
[Same log shipping diagram, with the Elasticsearch cluster highlighted.]

Elasticsearch cluster architecture
[Diagram: 3 client nodes, 6 data nodes, 3 master nodes, 3 ingest nodes.]

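Each role in the diagram corresponds to per-node role flags. The following is a minimal sketch, assuming the Elasticsearch 5.x-era `node.master`, `node.data` and `node.ingest` settings in elasticsearch.yml (newer releases replace them with `node.roles`). The helper name `render_yml` is hypothetical:

```python
# Role flags per node type, assuming Elasticsearch 5.x-style settings.
NODE_ROLES = {
    # master-eligible only: manages the cluster state, holds no shards
    "master": {"node.master": True, "node.data": False, "node.ingest": False},
    # data only: holds shards and does the indexing and search work
    "data": {"node.master": False, "node.data": True, "node.ingest": False},
    # ingest only: runs ingest pipelines before documents are indexed
    "ingest": {"node.master": False, "node.data": False, "node.ingest": True},
    # client (coordinating only): every role off, it just routes requests
    # and merges the partial results returned by the data nodes
    "client": {"node.master": False, "node.data": False, "node.ingest": False},
}


def render_yml(role: str) -> str:
    """Render the role flags of one node type as elasticsearch.yml lines."""
    return "\n".join(
        f"{key}: {str(value).lower()}" for key, value in NODE_ROLES[role].items()
    )


if __name__ == "__main__":
    for role in NODE_ROLES:
        print(f"# {role} node")
        print(render_yml(role))
        print()
```

Keeping each role on its own set of nodes is what lets client, ingest, master and data nodes be sized independently.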
Dedicated masters, please
[Same cluster diagram, with the three master nodes highlighted.]
Set discovery.zen.minimum_master_nodes to N/2 + 1 (integer division), where N is the number of master-eligible nodes

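The N/2 + 1 rule is a majority quorum. A small worked sketch (function names are illustrative) shows the resulting setting for a few cluster sizes, how many master-eligible nodes can fail, and why two halves of a partitioned cluster can never both reach quorum:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for discovery.zen.minimum_master_nodes: N/2 + 1 with integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a quorum can still elect a master."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in range(1, 6):
        # 2 * quorum > n, so no split into two groups gives both sides a quorum
        print(n, minimum_master_nodes(n), tolerated_master_failures(n))
```

With two master-eligible nodes the quorum is also 2, so losing either one blocks master election; three dedicated masters, as in the diagram, tolerate one failure.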
One big index is a no-go
Not scalable enough for time-based data
Indexing slows down with time
Expensive merges
Delete by query needed for data retention

Daily indices are a good start
2016.11.18 2016.11.19 2016.11.22 2016.11.23 ...
[Diagram: indexing goes to the newest index; most searches hit the most recent indices.]
Indexing is faster for smaller indices
Deletes are cheap: we delete whole indices
Search can be performed only on the indices that are needed
Static indices are cache friendly

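Because retention works by dropping whole indices, the cleanup job only has to pick names. A minimal sketch, assuming a hypothetical `logs_YYYY.MM.DD` naming scheme (matching the index names on the hot-cold slides) and a "keep the last N days, today included" policy; the helper names are illustrative:

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List

PREFIX = "logs_"  # hypothetical naming scheme, e.g. logs_2016.11.22


def daily_index_name(day: date, prefix: str = PREFIX) -> str:
    """Name of the daily index for a given day."""
    return f"{prefix}{day:%Y.%m.%d}"


def indices_to_delete(existing: Iterable[str], today: date, keep_days: int,
                      prefix: str = PREFIX) -> List[str]:
    """Daily indices older than the last `keep_days` days (today included)."""
    if keep_days < 1:
        raise ValueError("keep_days must be at least 1")
    oldest_kept = today - timedelta(days=keep_days - 1)
    expired = []
    for name in existing:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # matching prefix but no date suffix
        if day < oldest_kept:
            expired.append(name)
    return sorted(expired)


if __name__ == "__main__":
    existing = [daily_index_name(date(2016, 11, d)) for d in (18, 19, 22, 23)]
    for name in indices_to_delete(existing, today=date(2016, 11, 23), keep_days=3):
        # each expired index goes away with one cheap request: DELETE /<index>
        print(f"DELETE /{name}")
```

Compared with delete by query on one big index, each removal here is a single metadata operation rather than a scan and rewrite of live segments.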
Daily indices are sub-optimal
[Chart: the load is not even, with a spike on Black Friday and low volume on Saturday and Sunday.]

Size-based indices are optimal
Set a size limit for indices: once logs_01 reaches it, indexing moves to logs_02, and so on up to logs_N
Around 5-10 GB per shard on AWS

Slice using size
Predictable searching and indexing performance
Better index balancing
Fewer shards
Easier handling of spiky loads
Lower costs because of better hardware utilization

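The size-based switch boils down to one comparison. A minimal sketch of that decision, assuming the primary store size of the current index is read from somewhere like the `pri.store.size` column of `_cat/indices`, and using the upper end of the 5-10 GB per shard range as a default; all function names are hypothetical. Elasticsearch also offers a Rollover API that can automate the switch server-side.

```python
GIB = 1024 ** 3


def index_name(number: int, prefix: str = "logs_") -> str:
    """Sequential size-based index names: logs_01, logs_02, ..."""
    return f"{prefix}{number:02d}"


def per_shard_gib(primary_store_bytes: int, primary_shards: int) -> float:
    """Average primary shard size in GiB (replicas are not counted)."""
    return primary_store_bytes / primary_shards / GIB


def next_write_index(current: int, primary_store_bytes: int, primary_shards: int,
                     max_shard_gib: float = 10.0) -> int:
    """Keep writing to the current index until its shards reach the size limit."""
    if per_shard_gib(primary_store_bytes, primary_shards) >= max_shard_gib:
        return current + 1  # create logs_<current + 1> and send new writes there
    return current


if __name__ == "__main__":
    # hypothetical sizes for an index with 3 primary shards
    for size_gib in (12, 25, 31):
        nxt = next_write_index(1, size_gib * GIB, primary_shards=3)
        print(size_gib, "GiB ->", index_name(nxt))
```

Because every index closes at roughly the same size, a Black Friday spike produces more indices rather than one oversized daily index.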
Proper Elasticsearch configuration
Keep index.refresh_interval at the maximum possible value (relative indexing throughput: 1 sec -> 100%, 5 sec -> 125%, 30 sec -> 175%)
You can loosen up merges (possible because of heavy aggregation use):
- segments_per_tier -> higher
- max_merge_at_once -> higher
- max_merged_segment -> lower
All prefixed with index.merge.policy
Both changes give higher indexing throughput

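These recommendations translate into index settings. The sketch below builds a settings body in the directions the slide suggests; the concrete values (30s, 20, 20, 2gb) are illustrative assumptions, not numbers from the talk, and the defaults listed are the standard Elasticsearch tiered merge policy defaults. The Elasticsearch documentation notes that `segments_per_tier` should not be lower than `max_merge_at_once`, so the helper checks that.

```python
import json

# Elasticsearch defaults, for comparison.
DEFAULTS = {
    "index.refresh_interval": "1s",
    "index.merge.policy.segments_per_tier": 10,
    "index.merge.policy.max_merge_at_once": 10,
    "index.merge.policy.max_merged_segment": "5gb",
}


def throughput_settings(refresh_interval: str = "30s", segments_per_tier: int = 20,
                        max_merge_at_once: int = 20,
                        max_merged_segment: str = "2gb") -> dict:
    """Index settings loosened for indexing throughput; values are illustrative."""
    if segments_per_tier < max_merge_at_once:
        # a lower segments_per_tier would force too many merges
        raise ValueError("segments_per_tier should be >= max_merge_at_once")
    return {
        "index.refresh_interval": refresh_interval,                   # higher than 1s
        "index.merge.policy.segments_per_tier": segments_per_tier,    # higher
        "index.merge.policy.max_merge_at_once": max_merge_at_once,    # higher
        "index.merge.policy.max_merged_segment": max_merged_segment,  # lower
    }


if __name__ == "__main__":
    # body for creating a new index, e.g. PUT /logs_01
    print(json.dumps({"settings": throughput_settings()}, indent=2))
```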
Proper Elasticsearch configuration
Index only the needed fields
Use doc values
Do not store _source
Do not index _all

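Put together, those four rules shape the index mapping. A minimal sketch in Elasticsearch 5.x-era syntax (with a mapping type, and `_all` still present); the `log` type and every field name are hypothetical. Disabling `_source` is a real trade-off: the original documents can no longer be returned, reindexed, updated or highlighted from source.

```python
import json

# ES 5.x-era mapping; the "log" type and all field names are hypothetical.
MAPPING = {
    "mappings": {
        "log": {
            "_source": {"enabled": False},  # original JSON is not stored
            "_all": {"enabled": False},     # no catch-all field is indexed
            "properties": {
                "timestamp": {"type": "date"},
                "host": {"type": "keyword", "doc_values": True},
                "level": {"type": "keyword", "doc_values": True},
                "message": {"type": "text"},
                # aggregated on but never searched: keep doc values, skip the inverted index
                "response_time_ms": {"type": "integer", "index": False, "doc_values": True},
            },
        }
    }
}


def searchable_fields(mapping: dict, doc_type: str = "log") -> list:
    """Fields that get an inverted index (the 'index' option defaults to true)."""
    props = mapping["mappings"][doc_type]["properties"]
    return sorted(name for name, spec in props.items() if spec.get("index", True))


if __name__ == "__main__":
    print(json.dumps(MAPPING, indent=2))
    print(searchable_fields(MAPPING))
```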
Optimization time
We can optimize data nodes for time-based data
[Same cluster diagram, with the data nodes highlighted.]

Hot-cold architecture
One hot ES node, started with -Enode.attr.tag=hot
Two cold ES nodes, each started with -Enode.attr.tag=cold

Hot-cold architecture
New indices, such as logs_2016.11.22, are created on the hot node:
curl -XPUT 'localhost:9200/logs_2016.11.22' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.exclude.tag": "cold",
    "index.routing.allocation.include.tag": "hot"
  }
}'

Hot-cold architecture
Indexing goes to logs_2016.11.22 on the hot node. When the next day starts, logs_2016.11.23 is created on the hot node and the previous index is moved to the cold nodes after its day ends:
curl -XPUT 'localhost:9200/logs_2016.11.22/_settings' -H 'Content-Type: application/json' -d '{
  "index.routing.allocation.exclude.tag": "hot",
  "index.routing.allocation.include.tag": "cold"
}'

After the move, indexing continues into logs_2016.11.23 on the hot node while logs_2016.11.22 lives on the cold nodes. The cycle then repeats: logs_2016.11.24 is created on the hot node, logs_2016.11.23 is moved to the cold tier after its day ends, and the hot node holds only the index currently being written to.

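The daily hand-over on the hot-cold slides is two requests. A minimal sketch that produces them as (method, path, body) tuples mirroring the curl commands, assuming a scheduler (for example cron) calls it once at the start of each day; the scheduler and all function names are hypothetical:

```python
import json
from datetime import date, timedelta
from typing import List, Tuple


def daily_index_name(day: date, prefix: str = "logs_") -> str:
    return f"{prefix}{day:%Y.%m.%d}"


def tier_settings(tier: str) -> dict:
    """Allocation filters pinning an index to nodes started with node.attr.tag=<tier>."""
    if tier not in ("hot", "cold"):
        raise ValueError(f"unknown tier: {tier}")
    other = "cold" if tier == "hot" else "hot"
    return {
        "index.routing.allocation.exclude.tag": other,
        "index.routing.allocation.include.tag": tier,
    }


def start_of_day_requests(today: date) -> List[Tuple[str, str, dict]]:
    """(method, path, body) for creating today's index and moving yesterday's."""
    yesterday = today - timedelta(days=1)
    return [
        # create today's index on the hot node
        ("PUT", f"/{daily_index_name(today)}", {"settings": tier_settings("hot")}),
        # move yesterday's index, which no longer receives writes, to the cold nodes
        ("PUT", f"/{daily_index_name(yesterday)}/_settings", tier_settings("cold")),
    ]


if __name__ == "__main__":
    for method, path, body in start_of_day_requests(date(2016, 11, 23)):
        print(method, path, json.dumps(body))
```

Changing the allocation filters is enough: Elasticsearch relocates the shards to matching nodes in the background.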
Hot-cold architecture
Hot ES tier: good CPU, lots of I/O
Cold ES tier: memory bound, decent I/O

Hot-cold architecture summary
Optimize costs: different hardware for different tiers
Performance: use-case-optimized hardware
Isolation: long-running searches don't affect indexing

Elasticsearch client node needs
[Same cluster diagram, with the client nodes highlighted.]
No data = no IOPS
Large query throughput = high CPU usage
Lots of results = high memory usage
Lots of concurrent queries = higher resource utilization

Elasticsearch ingest node needs
[Same cluster diagram, with the ingest nodes highlighted.]
No data = no IOPS
Large index throughput = high CPU & memory usage
Complicated rules = high CPU usage
Larger documents = more resource utilization

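The "rules" an ingest node runs are ingest pipelines, and each processor adds CPU work per document. A minimal sketch of a pipeline definition, assuming Elasticsearch 5.x ingest processors (`grok`, `date`, `remove`) and Apache access logs; the pipeline id and field names are hypothetical:

```python
import json

# Hypothetical pipeline id and fields; processors as in Elasticsearch 5.x ingest.
PIPELINE_ID = "apache_access"

PIPELINE = {
    "description": "Parse Apache access log lines on ingest nodes",
    "processors": [
        # grok is regex-based and usually the most CPU-hungry step
        {"grok": {"field": "message", "patterns": ["%{COMBINEDAPACHELOG}"]}},
        # parse the extracted timestamp string into @timestamp (the default target)
        {"date": {"field": "timestamp", "formats": ["dd/MMM/yyyy:HH:mm:ss Z"]}},
        # drop the raw line once parsed, keeping stored documents smaller
        {"remove": {"field": "message"}},
    ],
}

if __name__ == "__main__":
    print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
    print(json.dumps(PIPELINE, indent=2))
    # documents opt in per request, e.g. PUT logs_01/log/1?pipeline=apache_access
```

Every extra processor, and every more complex grok pattern, is paid for on each indexed document, which is why ingest nodes are CPU-bound.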
Elasticsearch master node needs
[Same cluster diagram, with the master nodes highlighted.]
No data = no IOPS
Large number of indices = high CPU & memory usage
Complicated mappings = high memory usage
Daily indices = spikes in resource utilization

Focus: Centralized Buffer
[Same log shipping diagram, with the centralized buffer highlighted.]

Why Apache Kafka?
Fast & easy to use
Easy to scale
Fault tolerant and highly available
Supports streaming
Works in publish/subscribe mode

Kafka architecture
[Diagram: a three-node ZooKeeper ensemble and four Kafka brokers.]

Kafka & topics
Kafka stores data in topics (e.g. security_logs, access_logs, app1_logs, app2_logs), written on disk

Kafka & topics & partitions & replicas
[Diagram: the logs topic split into partitions 1-4, each with a replica partition.]

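Two ideas from the diagram, keyed partitioning and replicas on other brokers, can be modelled in a few lines. This is a simplified sketch, not Kafka's code: it uses CRC32 as a stand-in hash (Kafka's Java producer uses murmur2, so real partition numbers differ), and a plain round-robin replica placement in which the first broker listed acts as the preferred leader. All function names are hypothetical.

```python
import zlib
from typing import Dict, List


def partition_for(key: bytes, num_partitions: int) -> int:
    """Pick a partition for a keyed message (stand-in hash, not Kafka's murmur2).

    The property shown still holds: the same key always lands on the same
    partition, so per-key ordering is preserved.
    """
    return zlib.crc32(key) % num_partitions


def place_replicas(num_partitions: int, num_brokers: int,
                   replication_factor: int) -> Dict[int, List[int]]:
    """Simplified round-robin placement: every copy of a partition on a different broker."""
    if replication_factor > num_brokers:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [(p + r) % num_brokers for r in range(replication_factor)]
        for p in range(num_partitions)
    }


if __name__ == "__main__":
    for host in (b"web-1", b"web-2", b"db-1", b"web-1"):
        print(host.decode(), "-> partition", partition_for(host, 4))
    # 4 partitions of the logs topic, leader + 1 replica each, on 4 brokers
    print(place_replicas(num_partitions=4, num_brokers=4, replication_factor=2))
```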
Scaling Kafka
[Diagram: the logs topic grows from 1 partition to 4 and then to 16 partitions.]

Things to remember when using Kafka
Scales by adding more partitions, not threads
The more IOPS, the better
Keep the number of consumers equal to the number of partitions
Replicas are used for high availability (HA) and fault tolerance (FT) only
Offsets are stored per consumer group: multiple destinations are easily possible

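Why consumers beyond the partition count do not help can be seen from how partitions are divided within a consumer group. A simplified model of Kafka's range assignor for a single topic (Kafka's built-in assignors differ in details; the function names are hypothetical):

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Simplified model of Kafka's range assignor for a single topic.

    Consumers are sorted; each gets a contiguous range of partitions and the
    first (num_partitions % len(consumers)) consumers get one extra partition.
    """
    if not consumers:
        raise ValueError("need at least one consumer")
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


def idle_consumers(assignment: Dict[str, List[int]]) -> List[str]:
    """Consumers that received no partitions and therefore do no work."""
    return [member for member, parts in assignment.items() if not parts]


if __name__ == "__main__":
    six = [f"consumer-{i}" for i in range(1, 7)]
    print(range_assign(4, six))       # two consumers stay idle
    print(range_assign(16, six[:4]))  # 4 partitions per consumer
```

Each consumer group gets its own assignment and its own committed offsets, which is why a second destination can read the same topic simply by using a separate group.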
Focus: Shipper
[Same log shipping diagram, with the file shippers highlighted.]

What about the shipper?
[Diagram: logs → shipper → Centralized Buffer]
Which shipper to use?
Which protocol should be used?
What about the buffering?
Log to JSON or parse, and how?

Buffers
Performance: batches & threads
Availability: when the central buffer is gone

Buffer types
Disk || memory || combined hybrid approach
On source || centralized
On source: App → Buffer (file or local log shipper) → ES; easy scaling, fewer moving parts, often with the use of a lightweight shipper
Centralized: App → Kafka / Redis / Logstash / etc… → ES; one place for all changes, extra features made easy (like TTL)

Buffers summary
[Diagram: Simple = each App writes to its own local Buffer, which ships to ES; Reliable = Apps ship to ES through a centralized buffer.]

Protocols
UDP: fast, cool for the application, not reliable
TCP: reliable (almost); the application gets an ACK when data is written to the buffer
Application-level ACKs may be needed:
HTTP: Logstash, rsyslog, Fluentd
RELP: Logstash, rsyslog
Beats: Logstash, Filebeat
Kafka: Logstash, rsyslog, Filebeat, Fluentd

Choosing the shipper
application → socket → rsyslog (memory & disk assisted queues) → http → Elasticsearch

Or: application → file → rsyslog / Filebeat → consumer

What about the OS?
Say NO to swap
Set the right disk scheduler: CFQ for spinning disks, deadline for SSD
Use proper mount options for ext4: noatime, nodiratime, data=writeback, nobarrier
For bare metal: check the CPU governor, disable transparent huge pages
/proc/sys/vm/nr_hugepages=0

We are engineers!
We develop DevOps tools!
We are DevOps people!
We do fun stuff ;) http://sematext.com/jobs

Thank you for listening! Get in touch!
Rafał Kuć, @kucrafal
http://sematext.com, @sematext, http://sematext.com/jobs
Come talk to us at the booth