distributed stream processing on fluentd / #fluentd


DESCRIPTION

at Fluentd meetup in Japan 2012/02/04

TRANSCRIPT

Distributed message stream processing on Fluentd

2012/02/04 Fluentd meetup in Japan

NHN Japan Corp. Web Service Business Division, TAGOMORI Satoshi (@tagomoris)

Saturday, February 4, 2012


Working at NHN Japan. We are hiring!


What we are doing about logs with fluentd

data mining

reporting: page views, unique users,

traffic amount per page,

...


super large scale

'sed | grep | wc'-like processes

What we are doing about logs with fluentd


Why fluentd? (not Storm, Kafka or Flume?)

Ruby, Ruby, Ruby! (NOT Java!) We are working in a lightweight-language culture

easy to try, easy to patch

Plugin model architecture

Builtin TimeSlicedOutput mechanism


What I'll talk about today

What we are trying with fluentd

How we did, and how we are doing now

What are distributed stream processing topologies like?

What is important about stream processing

Implementation details

(appendix)


Architecture in last week's presentation

[Diagram] servers → deliver (scribed) → sends data to both archive servers (scribed, large-volume RAID) and the Fluentd cluster (as stream); the Fluentd cluster converts logs into structured data and writes to HDFS (as stream); past logs are imported and converted on demand (as batch); aggregation queries run on demand via Hadoop Hive, with Shib as web client.

Now

[Diagram] The same topology, with the deliver server now running Fluentd: servers → deliver (Fluentd) → archive servers (scribed, large-volume RAID) and the Fluentd cluster → Hadoop cluster (Hive, with Shib as web client); plus a new Fluentd Watcher.

Fluentd in production service

10 days


from 127 Web Servers

146 log streams

Scale of Fluentd processes


Scale of Fluentd processes

70,000 messages/sec

120 Mbps

(at peak time)


650 GB/day (non-blog: 100 GB)

Scale of Fluentd processes


89 fluentd instances

on

12 nodes (4-core HT)

Scale of Fluentd processes


We can't go back.

crouton by kbysmnr

log conversion

from: raw log

(Apache combined-like format)

to: structured and query-friendly log

(TAB-separated, some fields masked, many flags added)

What we are trying with fluentd


log conversion

What we are trying with fluentd

99.999.999.99 - - [03/Feb/2012:10:59:48 +0900] "GET /article/detail/6246245/ HTTP/1.1" 200 17509 "http://news.livedoor.com/topics/detail/6246245/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR

3.0.30729; Media Center PC 6.0; InfoPath.1; .NET4.0C)" "news.livedoor.com" "xxxxxxx.xx.xxxxxxx.xxx" "-" 163266

152930 news.livedoor.com /topics/detail/6242972/ GET 302 210 226 - 99.999.999.99 TQmljv9QtXkpNtCSuWVGGg Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X)

AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A406 Safari/7534.48.3 TRUE TRUE FALSE FALSE FALSE FALSE FALSE

hhmmss vhost path method status bytes duration referer rhost userlabel agent FLAG [FLAGS]

FLAGS: status_redirection status_errors rhost_internal suffix_miscfile suffix_imagefile agent_bot

FLAG: logical OR of FLAGS

userlabel: hash of (tracking cookie / terminal id (mobile phone) / rhost+agent)


TimeSlicedOutput of fluentd

Traditional 'log rotation' is important, but troublesome

We want:

2/3 23:59:59 log in access.0203_23.log

2/4 00:00:00 log in access.0204_00.log

What we are trying with fluentd
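As a rough illustration of the TimeSlicedOutput mechanism, the built-in out_file plugin names its buffer chunks by a time slice, so each record lands in the file for its own timestamp rather than the file that happened to be open. A minimal sketch, assuming the v0.10-era config syntax of this talk (the tag and path are hypothetical):

```
<match apache.access>
  type file
  path /var/log/fluent/access
  # one chunk (and one file) per hour; a 23:59:59 record goes into the
  # _23 slice even if it arrives after midnight
  time_slice_format %Y%m%d_%H
  # wait this long for late-arriving records before closing a slice
  time_slice_wait 10m
</match>
```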


How we did, and how we are doing now

collect

archive

convert

aggregate

show


How we did in past (2011)

collect (scribed)

archive (scribed)

convert (Hadoop Streaming)

aggregate (Hive)

show

stream, stream

hourly/daily

on demand

on demand

HIGH LATENCY: time to flush + hourly invocation + running time, 20-25 mins

store to HDFS


How we are doing now

collect (Fluentd)

archive (scribed)

convert (Fluentd)

aggregate (Hive)

show

stream, stream

on demand

on demand

stream convert: VERY LOW LATENCY, 2-3 minutes (only the time to wait for flush)

stream

store to HDFS (over Cloudera's Hoop)


break. (crouton by kbysmnr)

reasonable efficiency (compared with batch throughput)

easy to re-run the same conversion as a batch

no SPOF

easy to add/remove nodes

What is important about stream processing


How do we re-run conversion as a batch when we run into trouble?

We want to use 'just one' converter program for both stream processes

and batch processes!

Stream processing and batch


out_exec_filter (fluentd built-in plugin)

1. fork and exec the 'command' program

2. write data to the child process's stdin as TAB-separated fields specified by 'in_keys' (for the tag, 'remove_prefix' is available)

3. read data from the child process's stdout as TAB-separated fields named by 'out_keys' (for the tag, 'add_prefix' is available)

4. set the message's timestamp from the 'time_key' value in the parsed data, in the format specified by 'time_format'
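A configuration sketch of these four steps, assuming v0.10-era syntax; the command path and the out_keys field list here are placeholders, not the production values:

```
<match scribe.*>
  type exec_filter
  command /usr/local/bin/convert.sh
  # step 2: written to the child's stdin, tag prefix stripped
  in_keys tag,message
  remove_prefix scribe
  # step 3: parsed from the child's stdout, tag prefix added
  out_keys tag,timefield,vhost,path,method,status
  add_prefix converted
  # step 4: timestamp taken from 'timefield'
  time_key timefield
  time_format %Y%m%d%H%M%S
</match>
```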


read from stdin / write to stdout

TAB separated values as input/output

WOW!!!!!!!

difference: a 'tag' may be needed with out_exec_filter

simple solution: if it does not exist, ignore it.

'out_exec_filter' and 'Hadoop Streaming'


reasonable efficiency (compared with batch throughput)

easy to re-run the same conversion as a batch

no SPOF

easy to add/remove nodes

What is important about stream processing


What are distributed stream processing topologies like?

[Diagram] servers → deliver nodes → workers → serializers → HDFS (Hoop Server); deliver nodes also feed an archiver (with backup).

Redundancy and load balancing MUST be guaranteed anywhere.

Deliver nodes

[Diagram as above, deliver nodes highlighted]

Accept connections from web servers, copy messages and send to:

1. archiver (and its backup)
2. convert workers (w/ load balancing)
3. and ...

useful for casual worker addition/removal
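The deliver node's copy step can be sketched like this, with hypothetical hostnames, v0.10-era syntax, and out_forward's 'standby' marker approximating the slide's 'secondary server':

```
<match scribe.*>
  type copy
  <store>
    # 1. archiver (out_scribe, from fluent-plugin-scribe)
    type scribe
    host archive.server.local
    remove_prefix scribe
  </store>
  <store>
    # 2. convert workers, load-balanced
    type roundrobin
    <store>
      type forward
      <server>
        host worker01.local
        port 24211
      </server>
      <server>
        host worker02.local
        port 24211
        standby
      </server>
    </store>
    # ... one <store> per worker instance (56 in this deck)
  </store>
</match>
```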

Worker nodes

[Diagram as above, worker nodes highlighted]

Under load balancing, as many workers as you want

Serializer nodes

[Diagram as above, serializer nodes highlighted]

Receive the converted data stream from workers, aggregate by service, and:

1. write to storage (HDFS/Hoop)
2. and...

useful to reduce storage overhead from many concurrent write operations

Watcher nodes

[Diagram as above, with watcher nodes added]

Watching data for real-time workload reporting and trouble notifications:

1. for raw data from delivers
2. for structured data from serializers

break. (crouton by kbysmnr)

Implementation details

log agents on servers (scribeline)

deliver (copy, in_scribe, out_scribe, out_forward)

worker (in/out_forward, out_exec_filter)

serializer/hooper (in/out_forward, out_hoop)

watcher (in_forward, out_flowcounter, out_growthforecast)


log agent: scribeline

log delivery agent tool, python 2.4, scribe/thrift

easy to set up and start/stop

works with any httpd configuration updates

works with logrotate-ed log files

automatic delivery target failover/takeback

(NEW) cluster support (random selection from the server list)

https://github.com/tagomoris/scribe_line


From scribeline To deliver

[Diagram] servers (scribeline) → scribe protocol → deliver server (primary) / deliver server (secondary), each running fluentd with in_scribe; category: blog, message: RAW LOG (Apache combined + α)

From scribeline To deliver

[Diagram] xNN servers → deliver 01 (primary), deliver 02 (secondary), deliver 03 (primary for high-throughput nodes); x8 fluentd per node

From scribeline To deliver

[Diagram repeated from the earlier 'From scribeline To deliver' slide]

deliver node internal routing

deliver server (primary), x8 fluentd instances — deliver fluentd:

  in_scribe
    add_prefix scribe
    remove_newline true
  → time: received_at, tag: scribe.blog, message: RAW LOG

  copy scribe.*
    out_scribe
      host archive.server.local
      remove_prefix scribe
      add_newline true
    → category: blog, message: RAW LOG
    out_flowcounter (see later..)
    roundrobin (see next)
      out_forward (see later with out_flowcounter..)

deliver node: roundrobin strategy to workers

roundrobin x56 substore configurations (7 workers x 8 instances)

out_forward server: worker01 port 24211, secondary server: worker02 port 24211
out_forward server: worker01 port 24212, secondary server: worker03 port 24212
out_forward server: worker01 port 24213, secondary server: worker04 port 24213
out_forward server: worker01 port 24214, secondary server: worker05 port 24214

message: time: received_at, tag: scribe.blog, message: RAW LOG

From deliver To worker

[Diagram] deliver server (deliver fluentd): copy scribe.* → roundrobin → out_forward → worker server X (worker fluentd Xn1, in_forward) and worker server Y (worker fluentd Yn2, in_forward); message: time: received_at, tag: scribe.blog, message: RAW LOG

worker node internal routing

worker server — x8 worker instances, x1 serializer instance

worker fluentd:
  in_forward
  out_exec_filter scribe.*
    command: convert.sh
    in_keys: tag,message
    remove_prefix: scribe
    out_keys: .......
    add_prefix: converted
    time_key: timefield
    time_format: %Y%m%d%H%M%S
  out_forward converted.*
  (input: time: received_at, tag: scribe.blog, message: RAW LOG;
   output: time: written_time, tag: converted.blog, [many data fields])

serializer fluentd:
  in_forward
  out_hoop converted.blog
    hoop_server servername.local
    username
    path /on_hdfs/%Y%m%d/blog-%H.log
  out_hoop converted.news
    path /on_hdfs/%Y%m%d/news-%H.log
  (writes TAB-separated text data)

out_exec_filter (review.)

1. fork and exec the 'command' program

2. write data to the child process's stdin as TAB-separated fields specified by 'in_keys' (for the tag, 'remove_prefix' is available)

3. read data from the child process's stdout as TAB-separated fields named by 'out_keys' (for the tag, 'add_prefix' is available)

4. set the message's timestamp from the 'time_key' value in the parsed data, in the format specified by 'time_format'


out_exec_filter behavior details

worker fluentd:
  out_exec_filter scribe.*
    command: convert.sh
    in_keys: tag,message
    remove_prefix: scribe
    out_keys: .......
    add_prefix: converted
    time_key: timefield
    time_format: %Y%m%d%H%M%S

input message — time: received_at, tag: scribe.blog, message: RAW LOG

forked process (convert.sh -> perl convert.pl):
  stdin:  blog RAWLOG
  stdout: blog 20120204175035 field1 field2.....

output message — time: 2012/02/04 17:50:35, tag: converted.blog, path:..., agent:..., referer:..., flag1:TRUE
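The child-process protocol is plain TSV on stdin/stdout. The production converter is convert.sh wrapping perl convert.pl; the following is a hypothetical Ruby sketch of the same contract, where the field extraction is a placeholder for the real Apache-combined parsing:

```ruby
# Reads TAB-separated (tag, message) lines -- in_keys: tag,message --
# and emits TAB-separated output fields for out_keys.
def convert_line(line, now = Time.now)
  tag, message = line.chomp.split("\t", 2)
  return nil if message.nil? || message.empty?
  timefield = now.strftime("%Y%m%d%H%M%S")  # matches time_format %Y%m%d%H%M%S
  # placeholder conversion: the real converter parses the Apache
  # combined-like format into vhost, path, status, flags, ...
  [tag, timefield, message].join("\t")
end

# Driver, when run as the exec_filter child:
#   $stdin.each_line { |l| (out = convert_line(l)) && puts(out) }
```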


From serializer To HDFS (Hoop)

worker server — serializer fluentd:
  in_forward
  (input: time: written_time, tag: converted.blog, [many data fields])
  out_hoop converted.blog
    hoop_server servername.local
    username
    path /on_hdfs/%Y%m%d/blog-%H.log
  out_hoop converted.news
    path /on_hdfs/%Y%m%d/news-%H.log

→ TAB-separated text data over HTTP → Hoop Server → Hadoop NameNode / HDFS

Overview

[Diagram] servers → deliver node cluster → worker node cluster → serializers → HDFS (Hoop Server); archiver (with backup) fed from the deliver nodes

Paw pads. (crouton by kbysmnr)

Traffic: bytes/sec (on deliver, 2/3-4) [graph]

Traffic: messages/sec (on deliver, 2/3-4) [graph]

Traffic/CPU/Load/Memory: deliver nodes (2/3-4) [graphs]

Traffic: workers' total network traffic [graph]

Traffic/CPU/Load/Memory: a worker (2/3-4) [graphs]

Fluentd stream processing

Finally, it works fine now.

Log conversion latency dramatically reduced.

Many useful plugins for monitoring are waiting to be shipped.

Hundreds of cool features to implement are also waiting for us!

crouton by kbysmnr

Thank you!


Appendix (crouton by kbysmnr)

input traffic: by fluent-plugin-flowcounter

[Slide repeated: 'deliver node internal routing' configuration, highlighting out_flowcounter]

bytes/messages counting on fluentd

1. 'out_flowcounter' counts input messages, their size (for specified fields), and their rate (/sec)

2. counting results are emitted per minute/hour/day

3. worker fluentd sends results to the 'Watcher' node over out_forward

4. Watcher receives the counting results and passes them to 'out_growthforecast'

'GrowthForecast' is a graph-drawing tool with a REST API for data registration, by kazeburo
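A configuration sketch of this counting pipeline, with hypothetical hostnames and URLs; the parameter names follow the two plugins' READMEs as best I recall, so treat them as an approximation rather than the production config:

```
# on the counting node: emit per-minute counts, forward them to the Watcher
<match scribe.*>
  type flowcounter
  count_keys message
  unit minute
  tag flowcount
</match>
<match flowcount>
  type forward
  <server>
    host watcher01.local
  </server>
</match>

# on the Watcher: register counts as GrowthForecast graphs over its REST API
<match flowcount>
  type growthforecast
  gfapi_url http://gf.local/api/
  service fluentd
  section traffic
  name_keys count,bytes
</match>
```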


out_forward roundrobin is per buffer flushing!

(per buffer size, or flush_interval)

For a high-throughput stream, this unit is too large.

We need roundrobin per 'emit'.

Why not out_forward roundrobin in deliver?

deliver node: roundrobin strategy to workers

roundrobin x56 substore configurations (7 workers x 8 instances)

out_forward server: worker01 port 24211, secondary server: worker02 port 24211
out_forward server: worker01 port 24212, secondary server: worker03 port 24212
out_forward server: worker01 port 24213, secondary server: worker04 port 24213
out_forward server: worker01 port 24214, secondary server: worker05 port 24214

message: time: received_at, tag: scribe.blog, message: RAW LOG

out_forward roundrobin is per buffer flushing!

(per buffer size, or flush_interval)

For a high-throughput stream, this unit is too large.

We need roundrobin per 'emit'.

Why not out_forward roundrobin in deliver?

From worker To serializer: details

worker server — x8 worker instances, x1 serializer instance

worker fluentd:
  out_forward converted.*
    server: localhost
    secondary: worker1, worker2, worker3, worker4, worker5, worker6, worker7

serializer fluentd:
  in_forward

normally send to localhost; in trouble, balance all traffic to all other workers' serializers

Software list:

scribed: github.com/facebook/scribe/

scribeline: github.com/tagomoris/scribe_line

fluent-plugin-scribe: github.com/fluent/fluent-plugin-scribe

Hoop: http://cloudera.github.com/hoop/docs/latest/ServerSetup.html

fluent-plugin-hoop: github.com/fluent/fluent-plugin-hoop

GrowthForecast: github.com/kazeburo/growthforecast

fluent-plugin-growthforecast: github.com/tagomoris/fluent-plugin-growthforecast

fluent-plugin-flowcounter: github.com/tagomoris/fluent-plugin-flowcounter

