distributed stream processing on fluentd / #fluentd


DESCRIPTION

at Fluentd meetup in Japan 2012/02/04

TRANSCRIPT

Distributed message stream processing on Fluentd

2012/02/04 Fluentd meetup in Japan

NHN Japan Corp. Web Service Business Division, TAGOMORI Satoshi (@tagomoris)

Saturday, February 4, 2012


Working at NHN Japan. We are hiring!


What we are doing about logs with fluentd

data mining

reporting: page views, unique users,

traffic amount per page,

...


super large scale

'sed | grep | wc'-like processes

What we are doing about logs with fluentd


Why fluentd? (not Storm, Kafka or Flume?)

Ruby, Ruby, Ruby! (NOT Java!) We are working in a lightweight-language culture

easy to try, easy to patch

Plugin model architecture

Builtin TimeSlicedOutput mechanism


What I'll talk about today

What we are trying with fluentd

How we did, and how we are doing now

What are distributed stream processing topologies like?

What is important about stream processing

Implementation details

(appendix)


Architecture in last week's presentation

[Diagram] servers → deliver (scribed) → sends data to both archive servers (scribed, large-volume RAID) and the Fluentd cluster (as stream); the Fluentd cluster converts logs into structured data and writes to HDFS (as stream); past logs are imported and converted on demand (as batch); aggregation queries run on demand via Hadoop Hive, with Shib as web client.

Now

[Diagram] The same topology, with the deliver server now running Fluentd: servers → deliver (Fluentd) → archive servers (scribed, large-volume RAID) and the Fluentd cluster → Hadoop cluster (Hive, with Shib as web client); plus a new Fluentd Watcher.

Fluentd in production service

10 days


from 127 Web Servers

146 log streams

Scale of Fluentd processes


Scale of Fluentd processes

70,000 messages/sec

120 Mbps

(at peak time)


650 GB/day (non-blog: 100 GB)

Scale of Fluentd processes


89 fluentd instances

on

12 nodes (4-core HT)

Scale of Fluentd processes


We can't go back.

crouton by kbysmnr

log conversion

from: raw log

(Apache combined-like format)

to: structured and query-friendly log

(TAB-separated, some fields masked, many flags added)

What we are trying with fluentd


log conversion

What we are trying with fluentd

99.999.999.99 - - [03/Feb/2012:10:59:48 +0900] "GET /article/detail/6246245/ HTTP/1.1" 200 17509 "http://news.livedoor.com/topics/detail/6246245/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR

3.0.30729; Media Center PC 6.0; InfoPath.1; .NET4.0C)" "news.livedoor.com" "xxxxxxx.xx.xxxxxxx.xxx" "-" 163266

152930 news.livedoor.com /topics/detail/6242972/ GET 302 210 226 - 99.999.999.99 TQmljv9QtXkpNtCSuWVGGg Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X)

AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A406 Safari/7534.48.3 TRUE TRUE FALSE FALSE FALSE FALSE FALSE

hhmmss vhost path method status bytes duration referer rhost userlabel agent FLAG [FLAGS]

FLAGS: status_redirection status_errors rhost_internal suffix_miscfile suffix_imagefile agent_bot

FLAG: logical OR of FLAGS

userlabel: hash of (tracking cookie / terminal id (mobile phone) / rhost+agent)


TimeSlicedOutput of fluentd

Traditional 'log rotation' is important, but troublesome

We want:

2/3 23:59:59 log in access.0203_23.log

2/4 00:00:00 log in access.0204_00.log

What we are trying with fluentd
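As a rough illustration of the TimeSlicedOutput mechanism, the built-in out_file plugin names its buffer chunks by a time slice, so each record lands in the file for its own timestamp rather than the file that happened to be open. A minimal sketch, assuming the v0.10-era config syntax of this talk (the tag and path are hypothetical):

```
<match apache.access>
  type file
  path /var/log/fluent/access
  # one chunk (and one file) per hour; a 23:59:59 record goes into the
  # _23 slice even if it arrives after midnight
  time_slice_format %Y%m%d_%H
  # wait this long for late-arriving records before closing a slice
  time_slice_wait 10m
</match>
```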


How we did, and how we are doing now

collect

archive

convert

aggregate

show


How we did in past (2011)

collect (scribed)

archive (scribed)

convert (Hadoop Streaming)

aggregate (Hive)

show

stream, stream

hourly/daily

on demand

on demand

HIGH LATENCY: time to flush + hourly invocation + running time, 20-25 mins

store to HDFS


How we are doing now

collect (Fluentd)

archive (scribed)

convert (Fluentd)

aggregate (Hive)

show

stream, stream

on demand

on demand

stream convert: VERY LOW LATENCY, 2-3 minutes (only the time to wait for flush)

stream

store to HDFS (over Cloudera's Hoop)


break. (crouton by kbysmnr)

reasonable efficiency (compared with batch throughput)

easy to re-run the same conversion as a batch

no SPOF

easy to add/remove nodes

What is important about stream processing


How do we re-run conversion as a batch when we run into trouble?

We want to use 'just one' converter program for both stream processes

and batch processes!

Stream processing and batch


out_exec_filter (fluentd built-in plugin)

1. fork and exec the 'command' program

2. write data to the child process's stdin as TAB-separated fields specified by 'in_keys' (for the tag, 'remove_prefix' is available)

3. read data from the child process's stdout as TAB-separated fields named by 'out_keys' (for the tag, 'add_prefix' is available)

4. set the message's timestamp from the 'time_key' value in the parsed data, in the format specified by 'time_format'
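A configuration sketch of these four steps, assuming v0.10-era syntax; the command path and the out_keys field list here are placeholders, not the production values:

```
<match scribe.*>
  type exec_filter
  command /usr/local/bin/convert.sh
  # step 2: written to the child's stdin, tag prefix stripped
  in_keys tag,message
  remove_prefix scribe
  # step 3: parsed from the child's stdout, tag prefix added
  out_keys tag,timefield,vhost,path,method,status
  add_prefix converted
  # step 4: timestamp taken from 'timefield'
  time_key timefield
  time_format %Y%m%d%H%M%S
</match>
```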


read from stdin / write to stdout

TAB separated values as input/output

WOW!!!!!!!

difference: a 'tag' may be needed with out_exec_filter

simple solution: if it does not exist, ignore it.

'out_exec_filter' and 'Hadoop Streaming'


reasonable efficiency (compared with batch throughput)

easy to re-run the same conversion as a batch

no SPOF

easy to add/remove nodes

What is important about stream processing


What are distributed stream processing topologies like?

[Diagram] servers → deliver nodes → workers → serializers → HDFS (Hoop Server); deliver nodes also feed an archiver (with backup).

Redundancy and load balancing MUST be guaranteed anywhere.

Deliver nodes

[Diagram as above, deliver nodes highlighted]

Accept connections from web servers, copy messages and send to:

1. archiver (and its backup)
2. convert workers (w/ load balancing)
3. and ...

useful for casual worker addition/removal
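The deliver node's copy step can be sketched like this, with hypothetical hostnames, v0.10-era syntax, and out_forward's 'standby' marker approximating the slide's 'secondary server':

```
<match scribe.*>
  type copy
  <store>
    # 1. archiver (out_scribe, from fluent-plugin-scribe)
    type scribe
    host archive.server.local
    remove_prefix scribe
  </store>
  <store>
    # 2. convert workers, load-balanced
    type roundrobin
    <store>
      type forward
      <server>
        host worker01.local
        port 24211
      </server>
      <server>
        host worker02.local
        port 24211
        standby
      </server>
    </store>
    # ... one <store> per worker instance (56 in this deck)
  </store>
</match>
```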

Worker nodes

[Diagram as above, worker nodes highlighted]

Under load balancing, as many workers as you want

Serializer nodes

[Diagram as above, serializer nodes highlighted]

Receive the converted data stream from workers, aggregate by service, and:

1. write to storage (HDFS/Hoop)
2. and...

useful to reduce storage overhead from many concurrent write operations

Watcher nodes

[Diagram as above, with watcher nodes added]

Watching data for real-time workload reporting and trouble notifications:

1. for raw data from delivers
2. for structured data from serializers

break. (crouton by kbysmnr)

Implementation details

log agents on servers (scribeline)

deliver (copy, in_scribe, out_scribe, out_forward)

worker (in/out_forward, out_exec_filter)

serializer/hooper (in/out_forward, out_hoop)

watcher (in_forward, out_flowcounter, out_growthforecast)


log agent: scribeline

log delivery agent tool, python 2.4, scribe/thrift

easy to set up and start/stop

works with any httpd configuration updates

works with logrotate-ed log files

automatic delivery target failover/takeback

(NEW) cluster support (random selection from the server list)

https://github.com/tagomoris/scribe_line


From scribeline To deliver

[Diagram] servers (scribeline) → scribe protocol → deliver server (primary) / deliver server (secondary), each running fluentd with in_scribe; category: blog, message: RAW LOG (Apache combined + α)

From scribeline To deliver

[Diagram] xNN servers → deliver 01 (primary), deliver 02 (secondary), deliver 03 (primary for high-throughput nodes); x8 fluentd per node

From scribeline To deliver

[Diagram repeated from the earlier 'From scribeline To deliver' slide]

deliver node internal routing

deliver server (primary), x8 fluentd instances — deliver fluentd:

  in_scribe
    add_prefix scribe
    remove_newline true
  → time: received_at, tag: scribe.blog, message: RAW LOG

  copy scribe.*
    out_scribe
      host archive.server.local
      remove_prefix scribe
      add_newline true
    → category: blog, message: RAW LOG
    out_flowcounter (see later..)
    roundrobin (see next)
      out_forward (see later with out_flowcounter..)

deliver node: roundrobin strategy to workers

roundrobin x56 substore configurations (7 workers x 8 instances)

out_forward server: worker01 port 24211, secondary server: worker02 port 24211
out_forward server: worker01 port 24212, secondary server: worker03 port 24212
out_forward server: worker01 port 24213, secondary server: worker04 port 24213
out_forward server: worker01 port 24214, secondary server: worker05 port 24214

message: time: received_at, tag: scribe.blog, message: RAW LOG

From deliver To worker

[Diagram] deliver server (deliver fluentd): copy scribe.* → roundrobin → out_forward → worker server X (worker fluentd Xn1, in_forward) and worker server Y (worker fluentd Yn2, in_forward); message: time: received_at, tag: scribe.blog, message: RAW LOG

worker node internal routing

worker server — x8 worker instances, x1 serializer instance

worker fluentd:
  in_forward
  out_exec_filter scribe.*
    command: convert.sh
    in_keys: tag,message
    remove_prefix: scribe
    out_keys: .......
    add_prefix: converted
    time_key: timefield
    time_format: %Y%m%d%H%M%S
  out_forward converted.*
  (input: time: received_at, tag: scribe.blog, message: RAW LOG;
   output: time: written_time, tag: converted.blog, [many data fields])

serializer fluentd:
  in_forward
  out_hoop converted.blog
    hoop_server servername.local
    username
    path /on_hdfs/%Y%m%d/blog-%H.log
  out_hoop converted.news
    path /on_hdfs/%Y%m%d/news-%H.log
  (writes TAB-separated text data)

out_exec_filter (review.)

1. fork and exec the 'command' program

2. write data to the child process's stdin as TAB-separated fields specified by 'in_keys' (for the tag, 'remove_prefix' is available)

3. read data from the child process's stdout as TAB-separated fields named by 'out_keys' (for the tag, 'add_prefix' is available)

4. set the message's timestamp from the 'time_key' value in the parsed data, in the format specified by 'time_format'


out_exec_filter behavior details

worker fluentd:
  out_exec_filter scribe.*
    command: convert.sh
    in_keys: tag,message
    remove_prefix: scribe
    out_keys: .......
    add_prefix: converted
    time_key: timefield
    time_format: %Y%m%d%H%M%S

input message — time: received_at, tag: scribe.blog, message: RAW LOG

forked process (convert.sh -> perl convert.pl):
  stdin:  blog RAWLOG
  stdout: blog 20120204175035 field1 field2.....

output message — time: 2012/02/04 17:50:35, tag: converted.blog, path:..., agent:..., referer:..., flag1:TRUE
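The child-process protocol is plain TSV on stdin/stdout. The production converter is convert.sh wrapping perl convert.pl; the following is a hypothetical Ruby sketch of the same contract, where the field extraction is a placeholder for the real Apache-combined parsing:

```ruby
# Reads TAB-separated (tag, message) lines -- in_keys: tag,message --
# and emits TAB-separated output fields for out_keys.
def convert_line(line, now = Time.now)
  tag, message = line.chomp.split("\t", 2)
  return nil if message.nil? || message.empty?
  timefield = now.strftime("%Y%m%d%H%M%S")  # matches time_format %Y%m%d%H%M%S
  # placeholder conversion: the real converter parses the Apache
  # combined-like format into vhost, path, status, flags, ...
  [tag, timefield, message].join("\t")
end

# Driver, when run as the exec_filter child:
#   $stdin.each_line { |l| (out = convert_line(l)) && puts(out) }
```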


From serializer To HDFS (Hoop)

worker server — serializer fluentd:
  in_forward
  (input: time: written_time, tag: converted.blog, [many data fields])
  out_hoop converted.blog
    hoop_server servername.local
    username
    path /on_hdfs/%Y%m%d/blog-%H.log
  out_hoop converted.news
    path /on_hdfs/%Y%m%d/news-%H.log

→ TAB-separated text data over HTTP → Hoop Server → Hadoop NameNode / HDFS

Overview

[Diagram] servers → deliver node cluster → worker node cluster → serializers → HDFS (Hoop Server); archiver (with backup) fed from the deliver nodes

Paw pads. (crouton by kbysmnr)

Traffic: bytes/sec (on deliver, 2/3-4) [graph]

Traffic: messages/sec (on deliver, 2/3-4) [graph]

Traffic/CPU/Load/Memory: deliver nodes (2/3-4) [graphs]

Traffic: workers' total network traffic [graph]

Traffic/CPU/Load/Memory: a worker (2/3-4) [graphs]

Fluentd stream processing

Finally, it works fine now.

Log conversion latency dramatically reduced.

Many useful plugins for monitoring are waiting to be shipped.

Hundreds of cool features to implement are also waiting for us!

crouton by kbysmnr

Thank you!


Appendix (crouton by kbysmnr)

input traffic: by fluent-plugin-flowcounter

[Slide repeated: 'deliver node internal routing' configuration, highlighting out_flowcounter]

bytes/messages counting on fluentd

1. 'out_flowcounter' counts input messages, their size (for specified fields), and their rate (/sec)

2. counting results are emitted per minute/hour/day

3. worker fluentd sends results to the 'Watcher' node over out_forward

4. Watcher receives the counting results and passes them to 'out_growthforecast'

'GrowthForecast' is a graph-drawing tool with a REST API for data registration, by kazeburo
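A configuration sketch of this counting pipeline, with hypothetical hostnames and URLs; the parameter names follow the two plugins' READMEs as best I recall, so treat them as an approximation rather than the production config:

```
# on the counting node: emit per-minute counts, forward them to the Watcher
<match scribe.*>
  type flowcounter
  count_keys message
  unit minute
  tag flowcount
</match>
<match flowcount>
  type forward
  <server>
    host watcher01.local
  </server>
</match>

# on the Watcher: register counts as GrowthForecast graphs over its REST API
<match flowcount>
  type growthforecast
  gfapi_url http://gf.local/api/
  service fluentd
  section traffic
  name_keys count,bytes
</match>
```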


out_forward roundrobin is per buffer flushing!

(per buffer size, or flush_interval)

For a high-throughput stream, this unit is too large.

We need roundrobin per 'emit'.

Why not out_forward roundrobin in deliver?

deliver node: roundrobin strategy to workers

roundrobin x56 substore configurations (7 workers x 8 instances)

out_forward server: worker01 port 24211, secondary server: worker02 port 24211
out_forward server: worker01 port 24212, secondary server: worker03 port 24212
out_forward server: worker01 port 24213, secondary server: worker04 port 24213
out_forward server: worker01 port 24214, secondary server: worker05 port 24214

message: time: received_at, tag: scribe.blog, message: RAW LOG

out_forward roundrobin is per buffer flushing!

(per buffer size, or flush_interval)

For a high-throughput stream, this unit is too large.

We need roundrobin per 'emit'.

Why not out_forward roundrobin in deliver?

From worker To serializer: details

worker server — x8 worker instances, x1 serializer instance

worker fluentd:
  out_forward converted.*
    server: localhost
    secondary: worker1, worker2, worker3, worker4, worker5, worker6, worker7

serializer fluentd:
  in_forward

normally send to localhost; in trouble, balance all traffic to all other workers' serializers

Software list:

scribed: github.com/facebook/scribe/

scribeline: github.com/tagomoris/scribe_line

fluent-plugin-scribe: github.com/fluent/fluent-plugin-scribe

Hoop: http://cloudera.github.com/hoop/docs/latest/ServerSetup.html

fluent-plugin-hoop: github.com/fluent/fluent-plugin-hoop

GrowthForecast: github.com/kazeburo/growthforecast

fluent-plugin-growthforecast: github.com/tagomoris/fluent-plugin-growthforecast

fluent-plugin-flowcounter: github.com/tagomoris/fluent-plugin-flowcounter

