the patterns of distributed logging and containers

THE PATTERNS OF DISTRIBUTED LOGGING AND CONTAINERS CloudNativeCon Europe 2017 March 30, 2017 Satoshi Tagomori (@tagomoris) Treasure Data, Inc.

Upload: satoshi-tagomori

Post on 13-Apr-2017

TRANSCRIPT

Page 1: The Patterns of Distributed Logging and Containers

THE PATTERNS OF DISTRIBUTED LOGGING AND CONTAINERS

CloudNativeCon Europe 2017 March 30, 2017

Satoshi Tagomori (@tagomoris) Treasure Data, Inc.

Page 2: The Patterns of Distributed Logging and Containers

SATOSHI TAGOMORI(@tagomoris)

Fluentd, MessagePack-Ruby, Norikra, ...

Treasure Data, Inc.

Page 3: The Patterns of Distributed Logging and Containers
Page 4: The Patterns of Distributed Logging and Containers
Page 5: The Patterns of Distributed Logging and Containers

1. Microservices, Containers and Logging
2. Scaling Logging Platform
3. Patterns: Source/Destination-side Aggregation
4. Patterns: Scaling Up/Out Destination
5. Practices

Page 6: The Patterns of Distributed Logging and Containers

MICROSERVICES, CONTAINERS AND LOGGING

Page 7: The Patterns of Distributed Logging and Containers

Logging in Industries

• Service Logs

• Web access logs

• Ad logs

• Commercial transaction logs for analytics (EC, Game, ...)

• System Logs

• Syslog and other OS logs

• Audit logs

• Performance metrics

Logs for Business Growth

Logs for Service Stability

Page 8: The Patterns of Distributed Logging and Containers

Microservices and Logging

Monolithic service: Users → LAMP/Rails/MEAN/... Apps → Logs

Microservices: Users → Search, Recommendation, Shopping cart, Reviews, Ads, ... → Logs

Page 9: The Patterns of Distributed Logging and Containers

Microservices and Containers

• Microservices

• Isolated dependencies

• Agile deployment

• Containers

• Isolated environments & resources

• Simple pull&restart deployment

• Less overhead, high density

Page 10: The Patterns of Distributed Logging and Containers

Logging Challenges with Microservices/Containers

• Containerization changes everything:

• No permanent storage

• No fixed physical/network addresses

• No fixed mapping between servers and roles

Page 11: The Patterns of Distributed Logging and Containers

Logging Challenges with Microservices/Containers

• Containerization changes everything:

• No permanent storage

• No fixed physical/network addresses

• No fixed mapping between servers and roles

Transfer Logs to Anywhere ASAP

Page 12: The Patterns of Distributed Logging and Containers

Logging Challenges with Microservices/Containers

• Containerization changes everything:

• No permanent storage

• No fixed physical/network addresses

• No fixed mapping between servers and roles

Push Logs From Containers

Page 13: The Patterns of Distributed Logging and Containers

Logging Challenges with Microservices/Containers

• Containerization changes everything:

• No permanent storage

• No fixed physical/network addresses

• No fixed mapping between servers and roles

Label Logs With Service Names/Tags

Page 14: The Patterns of Distributed Logging and Containers

Logging Challenges with Microservices/Containers

• Containerization changes everything:

• No permanent storage

• No fixed physical/network addresses

• No fixed mapping between servers and roles

Label Logs With Service Names/Tags

Parse Logs & Label Values At Source

Structured Logs

Page 15: The Patterns of Distributed Logging and Containers

Structured Logs: tag, time, key-value pairs

Original log:

the customer put an item to cart: item_id=101, items=10, client=web

Structured log:

tag: ec_service.shopping_cart

timestamp: 2017-03-30 16:35:37 +0100

record: {"container_id": "bfdd5b9....", "container_name": "/infallible_mayer", "source": "stdout", "event": "put an item to cart", "item_id": 101, "items": 10, "client": "web"}
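To make the tag / timestamp / record split concrete, here is a minimal Python sketch that turns the original log line above into a structured event. The function name `parse_cart_log` is hypothetical, and the parsing rule ("message: key=value, key=value") is an assumption based only on this one example line. Unlike the slide, the sketch keeps the full message text as the `event` value and does not add container metadata.

```python
from datetime import datetime, timezone

def parse_cart_log(line, tag, now=None):
    """Parse 'message: k=v, k=v' into a (tag, time, record) triple."""
    message, _, pairs = line.partition(": ")
    record = {"event": message}
    for pair in pairs.split(", "):
        if "=" not in pair:
            continue  # skip anything that is not a key=value pair
        key, _, value = pair.partition("=")
        # Label values at the source: numeric strings become integers.
        record[key] = int(value) if value.isdigit() else value
    time = now or datetime.now(timezone.utc)
    return tag, time, record

tag, time, record = parse_cart_log(
    "the customer put an item to cart: item_id=101, items=10, client=web",
    "ec_service.shopping_cart",
)
```

Parsing at the source means every later stage can route on `tag` and query on typed keys such as `item_id` without re-reading plain text.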

Page 16: The Patterns of Distributed Logging and Containers

How to Ship Logs from Docker Containers

Using mounted volume: nginx, mysql, .... → log files → agents (read files, parse plain texts). Cost: + mount points, + disk I/O penalty

Using container json logs: apps, middleware → json log files → agents (read files, parse json lines). Cost: + disk I/O penalty

Sending logs to agents directly: applications → agents (just receive transferred logs). Cost: + logger code

Using logging drivers: apps, middleware → agents (just receive transferred logs). Cost: + agent config 😃

Page 17: The Patterns of Distributed Logging and Containers

SCALING LOGGING PLATFORM

Page 18: The Patterns of Distributed Logging and Containers

Core Architecture: Distributed Logging

Source (Container + Agent)

Transferring/Aggregation layer

Destination (Storage, Database, Service)

Page 19: The Patterns of Distributed Logging and Containers

Distributed Logging Workflow

Collector

• Retrieve raw logs: file system / network

• Parse log content

Aggregator

• Get data from multiple sources

• Split/merge incoming data into streams

Destination

• Retrieve structured logs from Aggregator

• Store formatted logs
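The three roles can be sketched as three small Python functions. This is an illustration only, not how Fluentd or any other agent is implemented. The function names, the in-memory `destination` dict, and the `ec_service.search` tag are all assumptions made for the example.

```python
import json
from collections import defaultdict

def collect(raw_lines, tag):
    """Collector: retrieve raw logs and parse their content."""
    for line in raw_lines:
        yield tag, json.loads(line)

def aggregate(*sources):
    """Aggregator: merge several sources, then split them into per-tag streams."""
    streams = defaultdict(list)
    for source in sources:
        for tag, record in source:
            streams[tag].append(record)
    return streams

def store(streams, destination):
    """Destination: store formatted logs, one JSON line per record."""
    for tag, records in streams.items():
        lines = [json.dumps(r, sort_keys=True) for r in records]
        destination.setdefault(tag, []).extend(lines)

cart = collect(['{"item_id": 101}', '{"item_id": 7}'], "ec_service.shopping_cart")
search = collect(['{"query": "shoes"}'], "ec_service.search")
destination = {}  # stands in for a real storage, database, or service
store(aggregate(cart, search), destination)
```

Each stage only depends on the shape of the data passed to it, which is what lets the stages be scaled or replaced independently.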

Page 20: The Patterns of Distributed Logging and Containers

Scaling Logging

• Network Traffic

• Split heavy log traffic across nodes

• CPU Load

• Distribute log parsing/formatting work across nodes

• High Availability

• Switch traffic from one node to another on failure

• Agility

• Reconfigure the whole logging layer to change destinations

Page 21: The Patterns of Distributed Logging and Containers

PATTERNS: SOURCE/DESTINATION-SIDE AGGREGATION

Page 22: The Patterns of Distributed Logging and Containers

[Figure: 2×2 matrix of Source Side Aggregation (NO / YES) against Destination Side Aggregation (NO / YES)]

Page 23: The Patterns of Distributed Logging and Containers

Now I'm Talking About:

Source

Transferring Aggregation

Destination

Source Side

Destination Side

Page 24: The Patterns of Distributed Logging and Containers

Source-side Aggregation Patterns

Without Source-side Aggregation

With Source-side Aggregation

Collector

Destination-side

Aggregate Container

Page 25: The Patterns of Distributed Logging and Containers

Aggregation Pattern without Source-side Aggregation

• Pros:

• Simple configuration

• Cons:

• Fixed aggregator (destination endpoint) address configured in containers

• Many network connections

• High load in aggregator / destination

Page 26: The Patterns of Distributed Logging and Containers

Aggregation Pattern with Source-side Aggregation

• Pros:

• Fewer connections

• Lower load in aggregator / destination

• Less configuration in containers

• More agility (aggregate containers can be reconfigured)

• Cons:

• Need more resources (+1 container per host)

Aggregate Container
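The connection-count argument can be checked with a little arithmetic. The sketch below assumes each sender keeps exactly one connection to the aggregator or destination. The function name and the host and container counts are made up for illustration.

```python
def connections_to_aggregator(hosts, containers_per_host, source_side_aggregation):
    """Count inbound connections at the aggregator / destination."""
    if source_side_aggregation:
        # One aggregate container per host forwards on behalf of its neighbours.
        return hosts
    # Every application container connects on its own.
    return hosts * containers_per_host

without_agg = connections_to_aggregator(50, 20, source_side_aggregation=False)
with_agg = connections_to_aggregator(50, 20, source_side_aggregation=True)
```

Under these assumptions the aggregator sees 1000 connections without source-side aggregation and 50 with it, at the price of one extra container on each of the 50 hosts.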

Page 27: The Patterns of Distributed Logging and Containers

Destination-side Aggregation Patterns

Without Destination-side Aggregation

With Destination-side Aggregation

Aggregator Node

Source-side

Destination

Page 28: The Patterns of Distributed Logging and Containers

Aggregation Pattern without Destination-side Aggregation

• Pros:

• Fewer nodes

• Simpler configuration

• Cons:

• Destination changes affect all source nodes

• Worse performance: many small write requests on destination (storage)

Page 29: The Patterns of Distributed Logging and Containers

Aggregation Pattern with Destination-side Aggregation

• Pros:

• Destination changes do NOT affect source nodes

• Better performance: destination aggregator can merge write operations

• Cons:

• More nodes

• More complex configuration

Aggregator Node
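"Merge write operations" can be illustrated with a small buffering class in Python. This is a sketch under simple assumptions: the class name `BatchingAggregator` and the `max_records` threshold are hypothetical, and a real aggregator would typically also flush on a timer and persist its buffer.

```python
class BatchingAggregator:
    """Buffer incoming records and write them to the destination in batches."""

    def __init__(self, write, max_records=1000):
        self.write = write              # callable that performs one write request
        self.max_records = max_records  # hypothetical flush threshold
        self.buffer = []

    def emit(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()

    def flush(self):
        if self.buffer:
            self.write(self.buffer)
            self.buffer = []

writes = []  # stands in for the destination; each element is one write request
agg = BatchingAggregator(writes.append, max_records=3)
for i in range(7):
    agg.emit({"n": i})
agg.flush()  # push out the final partial batch
```

Seven records reach the destination as three write requests instead of seven, which is the effect the Pros list describes.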

Page 30: The Patterns of Distributed Logging and Containers

PATTERNS: SCALING UP/OUT DESTINATION

Page 31: The Patterns of Distributed Logging and Containers

HOW TO SCALE HERE

Source

Transferring Aggregation

Destination

Now I'm Talking About:

Page 32: The Patterns of Distributed Logging and Containers

Scaling Destination Patterns

Scaling Up Aggregator/Destination Endpoints: Using HTTP Load Balancer or Huge Queues

Collector nodes → Load balancer or Huge queue → Backend nodes

Scaling Out Aggregator/Destination Endpoints: Using Round Robin Clients

Collector nodes → Destination-side Aggregator or Destination

Page 33: The Patterns of Distributed Logging and Containers

Scaling Up Destination

• Pros:

• Simple configuration: collector nodes only need to know the load balancer

• Cons:

• Scaling is capped by the upper limit of the load balancer (or queue)

Collector nodes → Load balancer or Huge queue → Backend nodes

Page 34: The Patterns of Distributed Logging and Containers

Scaling Out Destination

• Pros:

• Unlimited scaling by adding nodes

• Cons:

• Complex configuration in collector nodes

• Client feature required for round-robin

• Unavailable for traffic over Internet
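The "client feature required for round-robin" can be sketched in Python as follows. The class `RoundRobinClient`, the node names, and the `fake_send` transport are assumptions for illustration. A real client would open network connections and usually track node health over time rather than retrying on every record.

```python
from itertools import cycle

class RoundRobinClient:
    """Rotate over destination nodes, skipping a node whose send fails."""

    def __init__(self, nodes, send):
        self.nodes = list(nodes)  # every collector must know all destination nodes
        self._ring = cycle(self.nodes)
        self.send = send          # callable(node, record); raises ConnectionError on failure

    def emit(self, record):
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            try:
                self.send(node, record)
                return node
            except ConnectionError:
                continue  # try the next node in the ring
        raise ConnectionError("no destination node accepted the record")

delivered = []

def fake_send(node, record):
    # Simulated transport: the hypothetical node "agg-2" is down.
    if node == "agg-2":
        raise ConnectionError(node)
    delivered.append((node, record))

client = RoundRobinClient(["agg-1", "agg-2", "agg-3"], fake_send)
chosen = [client.emit({"n": i}) for i in range(4)]
```

The node list held by every client is the "complex configuration in collector nodes" cost: adding a destination node means updating every collector.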

Page 35: The Patterns of Distributed Logging and Containers

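Destination-side Aggregation and Destination Scaling

• Scaling Up Destination Endpoints, Destination Side Aggregation NO: Early Stage Systems

• Scaling Up Destination Endpoints, Destination Side Aggregation YES: Collect Logs over Internet or Using Queues

• Scaling Out Destination Endpoints, Destination Side Aggregation YES: Collect Logs in Data Center

• Scaling Out Destination Endpoints, Destination Side Aggregation NO: All Collector Nodes Must Know All Destination Nodes → Uncontrollable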

Page 36: The Patterns of Distributed Logging and Containers

PRACTICES

Page 37: The Patterns of Distributed Logging and Containers

Practices: Docker + Fluentd

• Docker Fluentd Logging Driver

• Docker containers can send their logs to Fluentd directly, with less overhead

• Fluentd's Pluggable Architecture

• Various destination systems (storage/database/service) are available by changing configuration

• Small Memory Footprint

• Source aggregation requires +1 container per host: the less extra resource usage, the better!

Page 38: The Patterns of Distributed Logging and Containers

Practice 1: Source-side Aggregation + Scaling Up

• Kubernetes: Fluentd + Elasticsearch

• a.k.a. EFK stack (inspired by ELK stack)

• Elasticsearch - Fluentd - Kibana

https://kubernetes.io/docs/tasks/debug-application-cluster/logging-elasticsearch-kibana/

apps, middleware

json log files

Page 39: The Patterns of Distributed Logging and Containers

Practice 2: Source-side Aggregation + Scaling Up

• Containerized Applications

• w/ Google Stackdriver for Monitoring

• w/ Treasure Data for Analytics

Google Stackdriver Logging

apps, middleware

Page 40: The Patterns of Distributed Logging and Containers

Practice 3: Source/Destination-side Aggregation + Scaling Out

• Containerized Application

• w/ Log processing on Hadoop

• writing files on HDFS via WebHDFS

• HDFS prefers large files:

• Destination-side aggregation works well

apps, middleware

Page 41: The Patterns of Distributed Logging and Containers

Practice 4: Source/Destination-side Aggregation + Scaling Out

• Containerized Application

• w/ Log processing on Google BigQuery

• putting logs via HTTPS

• BigQuery has quotas on write requests:

• Destination-side aggregation works well

apps, middleware

Page 42: The Patterns of Distributed Logging and Containers

Best practices?

• Source aggregation: do it

• it makes app containers free from logging problems (buffering, HA, ...)

• Destination aggregation: it depends

• not needed for cloud logging services/storage

• may be needed for self-hosted distributed filesystems/databases

• may be needed for cloud services that charge per request

• Destination scaling: it depends on destinations

Page 43: The Patterns of Distributed Logging and Containers

Make Logging Scalable, Service Stable & Business Growing.

Happy Logging! @tagomoris