Distributed Logging Architecture in the Container Era

LinuxCon Japan 2016, June 13, 2016

Satoshi "Moris" Tagomori (@tagomoris)

Fluentd, MessagePack-Ruby, Norikra, ...

Treasure Data, Inc.

http://www.linuxfoundation.org/news-media/announcements/2016/06/chaosuan-crunchy-data-qbox-storageos-and-treasure-data-join-cloud

Topics

• Microservices and logging in various industries

• Difficulties of logging with containers

• Distributed logging architecture

• Patterns of distributed logging architecture

• Case Study: Docker and Fluentd

Logging

Logging in Various Industries

• Web access logs
  • Views/visitors on media
  • Views/clicks on ads

• Commercial transactions (EC, games, ...)

• Data from devices
  • Operation logs from phone apps
  • Various sensor data

Microservices and Logging

• Monolithic service
  • One service produces all data about a user's behavior

• Microservices
  • Many services produce data about a user's access
  • Logs must be collected from many services to know what is happening

[Diagram labels: Users, Service (Application), Logs]

Logging and Containers

Containers: "a must" for microservices

• Dividing a service into services
  • Each service requires fewer computing resources (VM -> containers)

• Making services independent from each other
  • but it is very difficult :(
  • some dependencies must be resolved even in the development environment (containers on desktop)

Redesign Logging: Why?

• No permanent storage

• No fixed physical/network address

• No fixed mapping between servers and roles

• We should parse/label logs at the source and ship them by pushing to the destination ASAP

Containers: immutable & disposable

• No permanent storage

• Where to write logs?
  • Files in the container → gone w/ the container instance 😞
  • Directories shared from hosts → hosts are shared by many containers/services ☹

• TODO: ship logs from the container to anywhere ASAP

Containers: unfixed addresses

• No fixed physical / network address

• Where should we go to fetch logs?
  • Service discovery (e.g., consul) → one more component 😞
  • rsync? ssh+tail? or ..? Is it installed in containers? → one more tool to depend on ☹

• TODO: push logs to anywhere from containers
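As a rough illustration of "push, don't fetch", the sketch below has the sender write each record to an already-connected socket as one line of JSON. The wire format (`tag`, `time`, `record`) and the function names are assumptions made for this example; this is not the Fluentd forward protocol.

```python
import json
import socket
import time


def push_record(sock, tag, record, timestamp=None):
    """Send one record as a newline-delimited JSON message (hypothetical format)."""
    message = {
        "tag": tag,
        "time": timestamp if timestamp is not None else time.time(),
        "record": record,
    }
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


def read_records(sock_file):
    """Receiving side: yield one decoded message per line until EOF."""
    for line in sock_file:
        yield json.loads(line)
```

The container only needs to know one endpoint to push to; nothing has to discover the container's address or log in to it.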

Containers: instances per role

• No fixed mapping between servers and roles

• How can we parse / store these logs?
  • Central repository of log syntax → very hard to maintain 😞
  • Label logs by source address → many containers/roles on a host ☹

• TODO: label & parse logs at the source of the logs
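A minimal sketch of labeling and parsing at the source, assuming a made-up log line format of `<ip> <method> <path> <status>`. The field names `service` and `container_id` are illustrative labels, not something the slides prescribe.

```python
import re

# Assumed line format for illustration: "<ip> <method> <path> <status>"
LINE_PATTERN = re.compile(
    r"(?P<ip>\S+) (?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$"
)


def parse_and_label(line, service, container_id):
    """Turn a raw line into a structured record and attach source labels."""
    text = line.strip()
    match = LINE_PATTERN.match(text)
    if match is None:
        # Keep unparsable lines as-is rather than dropping them.
        record = {"message": text}
    else:
        record = match.groupdict()
        record["status"] = int(record["status"])
    # Labels are attached where the role is known: next to the container.
    record["service"] = service
    record["container_id"] = container_id
    return record
```

Because the labels travel with each record, later stages do not need to infer the role from a host address.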

Distributed Logging Architecture

Core Architecture

• Collector nodes

• Aggregator nodes

• Destinations

[Diagram: Collector nodes (Docker containers + agent), Aggregator nodes, Destinations (Storage, Database, ...)]

Collecting and Storing Data

• Parse/Label (collector)
  • Raw logs are not good for processing
  • Convert logs to structured data (key-value pairs)

• Split/Sort (aggregator)
  • Mixed logs are not good for searching
  • Split the whole data stream into per-service streams

• Store (destination)
  • Format logs (records) as the destination expects
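The Split/Sort and Store steps could be sketched as follows, assuming records are already structured and carry a `service` key (an assumed label name), and assuming a destination that expects JSON lines.

```python
import json
from collections import defaultdict


def split_by_service(records):
    """Split one mixed stream into per-service streams (aggregator side)."""
    streams = defaultdict(list)
    for record in records:
        streams[record.get("service", "unknown")].append(record)
    return dict(streams)


def format_json_lines(records):
    """Format records for a destination that expects one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
```

For example, `format_json_lines(split_by_service(records)["web"])` yields only the records of the `web` service, ready to be written in one request.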

Scaling Logging

• Network traffic

• CPU load to parse / format
  • Parse logs on each collector (distributed)
  • Format logs on aggregators (to be distributed)

• Capability
  • Make aggregators redundant

• Controlling delay
  • to be sure of when we can know what's happening in our systems

Patterns

Aggregation Patterns

[Matrix: source aggregation (NO / YES) × destination aggregation (NO / YES)]

Source Side Aggregation Patterns

[Diagram, two panels: "w/o source aggregation" (collector, aggregator / destination) and "w/ source aggregation" (collector, aggregate container, aggregator / destination)]

Without Source Aggregation

• Pros:
  • Simple configuration

• Cons:
  • Fixed aggregator (endpoint) address
  • Many network connections
  • High load on the aggregator

[Diagram labels: collector, aggregator]

With Source Aggregation

• Pros:
  • Fewer connections
  • Lower load on the aggregator
  • Less configuration in containers (by specifying localhost)
  • Highly flexible configuration (only the aggregate containers need to be redeployed)

• Cons:
  • Slightly more resources (+1 container per host)

[Diagram labels: aggregate container, aggregator]
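To make the "fewer connections" point concrete, here is a back-of-the-envelope sketch. It assumes each sender keeps exactly one connection to the aggregator, which is a simplification; the numbers in the usage note are made up.

```python
def aggregator_connections(hosts, containers_per_host, source_aggregation):
    """Estimate connections arriving at the aggregator (one per sender assumed)."""
    if source_aggregation:
        # Only the aggregate container on each host connects upstream.
        return hosts
    # Every application container connects to the aggregator itself.
    return hosts * containers_per_host
```

With 100 hosts running 20 containers each, the aggregator would see 2000 connections without source aggregation and 100 with it.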

Destination Side Aggregation Patterns

[Diagram, two panels: "w/o destination aggregation" (collector, destination) and "w/ destination aggregation" (collector, aggregator, destination)]

Without Destination Aggregation

• Pros:
  • Fewer nodes
  • Simpler configuration

• Cons:
  • Storage side changes affect the collector side
  • Worse performance: many small write requests on storage

With Destination Aggregation

• Pros:
  • Collector side configuration is free from storage side changes
  • Better performance with fine tuning on the destination side aggregator

• Cons:
  • More nodes
  • Slightly more complex configuration

[Diagram label: aggregator]
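One way a destination side aggregator can avoid "many small write requests" is to buffer records and write them in batches. The sketch below is a simplified, size-based buffer; `write_batch` stands in for whatever bulk write the storage offers, and the class and parameter names are hypothetical.

```python
class BatchingWriter:
    """Buffer records and hand them to the storage in batches."""

    def __init__(self, write_batch, max_records=1000):
        self._write_batch = write_batch  # callable taking a list of records
        self._max_records = max_records
        self._buffer = []

    def add(self, record):
        self._buffer.append(record)
        if len(self._buffer) >= self._max_records:
            self.flush()

    def flush(self):
        """Write whatever is buffered; does nothing when the buffer is empty."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)
```

A real aggregator would also flush on a timer so that records do not wait indefinitely during quiet periods; that part is left out here.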

Scaling Patterns

• Scaling Up Endpoints
  • HTTP/TCP load balancer
  • Huge queue + workers

• Scaling Out Endpoints
  • Round-robin clients

[Diagram labels: Load balancer, Backend nodes, Collector nodes, Aggregator nodes]

Scaling Up Endpoints

• Pros:
  • Simple configuration in collector nodes

• Cons:
  • Limits on scaling up

[Diagram labels: Load balancer, Backend nodes]

Scaling Out Endpoints

• Pros:
  • Unlimited scaling by adding aggregator nodes

• Cons:
  • Complex configuration
  • Clients need round-robin features
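The client-side round-robin feature mentioned above might look like the sketch below. The `send` callable and the endpoint strings are placeholders for illustration; skipping an endpoint that raises `OSError` is an added assumption about how failover could work, not something the slides specify.

```python
import itertools


class RoundRobinSender:
    """Rotate over aggregator endpoints, skipping ones that fail."""

    def __init__(self, endpoints, send):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._cycle = itertools.cycle(self._endpoints)
        self._send = send  # callable(endpoint, record); raises OSError on failure

    def emit(self, record):
        """Send to the next endpoint in rotation; try each endpoint at most once."""
        last_error = None
        for _ in range(len(self._endpoints)):
            endpoint = next(self._cycle)
            try:
                self._send(endpoint, record)
                return endpoint
            except OSError as error:
                last_error = error
        raise last_error
```

This also shows where the "complex configuration" cost comes from: every client has to carry the full endpoint list and keep it up to date.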

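[Matrix: scaling pattern × destination aggregation]

• Scaling up endpoints, without destination aggregation: systems in early stages

• Scaling up endpoints, with destination aggregation: collecting logs over the Internet, or using queues

• Scaling out endpoints, without destination aggregation: impossible :( Collector nodes must know all endpoints → uncontrollable

• Scaling out endpoints, with destination aggregation: collecting logs in a datacenter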

Case Studies

Case Study: Docker+Fluentd

• Destination aggregation + scaling up
  • Fluent logger + Fluentd

• Source aggregation + scaling up
  • Docker json logger + Fluentd + Elasticsearch
  • Docker fluentd logger + Fluentd + Kafka

• Source/Destination aggregation + scaling out
  • Docker fluentd logger + Fluentd

Why Fluentd?

• Docker Fluentd logging driver
  • Docker containers can send logs to Fluentd directly - less overhead

• Pluggable architecture
  • Various destination systems

• Small memory footprint
  • Source aggregation requires +1 container per host
  • Less additional resource usage ( < 100MB )

Destination aggregation + scaling up

• Sending logs directly over TCP using the Fluentd logger library in application code

• The same pattern as New Relic

• Easy to implement - good for startups

[Diagram label: Application code]

Source aggregation + scaling up

• Kubernetes: JSON logger + Fluentd + Elasticsearch

• Applications write logs to STDOUT

• Docker writes logs as JSON in files

• Fluentd
  • reads logs from files
  • parses JSON objects
  • writes logs to Elasticsearch

• EFK stack (like ELK stack)

http://kubernetes.io/docs/getting-started-guides/logging-elasticsearch/

[Diagram labels: Application code, Files (JSON), Elasticsearch]
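To illustrate the "parse JSON objects" step: Docker's JSON file logger writes one JSON object per line with `log`, `stream`, and `time` keys. The sketch below decodes one such line in plain Python; the output field names are an arbitrary choice for this example, and in the setup described above Fluentd does this work, not application code.

```python
import json


def parse_docker_json_line(line):
    """Decode one line written by Docker's JSON file logger."""
    entry = json.loads(line)
    return {
        # Docker keeps the trailing newline of the original output line.
        "message": entry["log"].rstrip("\n"),
        "stream": entry["stream"],  # "stdout" or "stderr"
        "time": entry["time"],
    }
```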

Source aggregation + scaling up/out

• Docker fluentd logging driver + Fluentd + Kafka

• Docker sends logs to the Fluentd on localhost

• Fluentd
  • gets logs over TCP
  • pushes logs into Kafka

• Highly scalable & less overhead - very good for huge deployments

[Diagram labels: Application code (×2), Kafka]

Source/Destination aggregation + scaling out

• Docker fluentd logging driver + Fluentd

• Docker sends logs to the Fluentd on localhost

• Fluentd
  • gets logs over TCP
  • sends logs to aggregator Fluentd nodes w/ round-robin load balancing

• Highly flexible - good for complex data processing requirements

[Diagram label: Any other storages]

What's the Best?

• Writing logs from containers: there are several ways to do it
  • Docker logging driver
  • Write logs to files + read/parse them
  • Send logs from apps directly

• Make the platform scalable!
  • Source aggregation: Fluentd on localhost
  • Scalable storage: (Kafka, external services, ...)
    • No destination aggregation + Scaling up
  • Non-scalable storage: (Filesystems, RDBMSs, ...)
    • Destination aggregation + Scaling out

Why OSS Are Important For Logging?

Why OSS?

• The logging layer is an interface
  • transparency
  • interoperability

• Keep the platform scalable
  • number of nodes
  • number of types of source/destination

Use OSS, Make Logging Scalable

Thank you!
