Papers We Love:

Realtime Data Processing at Facebook

Gwen Shapira, Confluent Inc.

Published in 2016 (!)

What kind of paper is this?

This is NOT

The one true architecture

Please don’t cargo-cult this paper

A few real-time systems at Facebook

• Chorus – aggregate trends

• Realtime feedback for mobile app developers

• Page analytics – likes, engagement…

• Offload CPU-intensive dashboard queries

Looking for trending topics in 5-minute windows
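A minimal sketch of what "trending topics in 5-minute windows" can mean: count topics per tumbling window. The event shape `(epoch_seconds, topic)` and the function names are assumptions for illustration, not anything from the paper.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 5 * 60  # 5-minute tumbling windows


def window_start(ts: int) -> int:
    """Round a timestamp down to the start of its window."""
    return ts - (ts % WINDOW_SECONDS)


def count_by_window(events):
    """events: iterable of (epoch_seconds, topic) pairs (assumed shape)."""
    counts = defaultdict(Counter)
    for ts, topic in events:
        counts[window_start(ts)][topic] += 1
    return counts


def top_topics(counts, window, k=1):
    """Most frequent topics inside one window."""
    return counts[window].most_common(k)


events = [(0, "cats"), (10, "dogs"), (299, "cats"), (300, "dogs"), (301, "dogs")]
counts = count_by_window(events)
```

Events at seconds 0 to 299 land in the first window, and second 300 opens the next one.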

The Tofu & Potatoes of the paper:

Design Decisions

Decision #1 – Language Paradigm

• Declarative (SQL) – easy & limited

• Functional

• Procedural (C++, Java, Python) – most flexibility, control, performance. Longer dev cycle.

Decision #2 – Data Transfer

• RPC (MillWheel, Flink, Spark Streaming)

• All about speed

• Message-forwarding broker (Heron)

• Applies back-pressure, multiplex

• Persistent stream storage (Samza, Kafka’s Stream API)

• Most reliable

• Decouples processors

Decision #2 – Data Transfer

Love Song to Scribe

Independent stream processing nodes

And storing inputs / outputs

Made everything great
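Why a persistent stream decouples processors, as a toy sketch: each processor keeps its own read offset into a shared log, so a slow or restarted node never blocks the others and can replay its input. `StreamLog` and `Processor` are hypothetical names; this is an in-memory stand-in, not Scribe's API.

```python
class StreamLog:
    """In-memory stand-in for a persistent stream (hypothetical, not Scribe)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def read(self, offset, max_records=100):
        return self._records[offset:offset + max_records]


class Processor:
    """Reads from the log at its own pace and remembers its own offset."""

    def __init__(self, log, fn):
        self.log = log
        self.fn = fn
        self.offset = 0
        self.out = []

    def poll(self):
        batch = self.log.read(self.offset)
        self.out.extend(self.fn(r) for r in batch)
        self.offset += len(batch)


log = StreamLog()
fast = Processor(log, str.upper)
slow = Processor(log, len)

log.append("like")
fast.poll()            # fast keeps up with the stream
log.append("comment")
fast.poll()
slow.poll()            # slow catches up later, without blocking fast

slow.offset = 0        # replay for debugging: rewind and read the input again
slow.out = []
slow.poll()
```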

Decision #3 – Processing Semantics

Facebook Verdict: It depends on requirements

• Ranker writes to an idempotent system – at least once

• Scuba can tolerate lost data, but cannot handle duplicates – at most once

• …. Exactly once is REALLY HARD and requires transactions
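One common way to see the difference between the first two options is the order of "write output" and "commit offset". The sketch below simulates a crash between the two steps; the function and its parameters are made up for illustration and are not how Puma or Stylus are implemented.

```python
def process(records, start, sink, commit_first, crash_at=None):
    """Process records from `start` and return the last committed offset.

    commit_first=True  -> commit, then write (at most once)
    commit_first=False -> write, then commit (at least once)
    crash_at simulates a crash between the two steps for that offset.
    """
    committed = start
    order = ("commit", "write") if commit_first else ("write", "commit")
    for offset in range(start, len(records)):
        for i, step in enumerate(order):
            if step == "write":
                sink.append(records[offset])
            else:
                committed = offset + 1
            if i == 0 and crash_at == offset:
                return committed  # crashed after the first step only
    return committed


records = ["a", "b", "c"]

# At least once: the write for "b" happened, its commit did not, so "b" repeats.
at_least = []
resume = process(records, 0, at_least, commit_first=False, crash_at=1)
process(records, resume, at_least, commit_first=False)

# At most once: the commit for "b" happened, its write did not, so "b" is lost.
at_most = []
resume = process(records, 0, at_most, commit_first=True, crash_at=1)
process(records, resume, at_most, commit_first=True)
```

A sink that deduplicates or overwrites by key makes the repeated "b" harmless, which is why at least once is enough for an idempotent target.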

Don’t miss the side-note on side-effects

• Exactly once means writing output + offsets to a transactional system

• This takes time

• Why just wait when you can deserialize? And maybe do other stateless stuff?
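A small sketch of "output + offsets in one transaction", using SQLite from the Python standard library as a stand-in for the transactional system. The table names and the `fail_before_commit` switch are assumptions made up for the demo.

```python
import sqlite3


def open_sink():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE output (value TEXT)")
    db.execute("CREATE TABLE progress (next_offset INTEGER)")
    db.execute("INSERT INTO progress VALUES (0)")
    db.commit()
    return db


def next_offset(db):
    return db.execute("SELECT next_offset FROM progress").fetchone()[0]


def process_batch(db, records, batch_size, fail_before_commit=False):
    """Write one batch of output and advance the offset atomically."""
    start = next_offset(db)
    batch = records[start:start + batch_size]
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.executemany("INSERT INTO output VALUES (?)",
                           [(r.upper(),) for r in batch])
            db.execute("UPDATE progress SET next_offset = ?",
                       (start + len(batch),))
            if fail_before_commit:
                raise RuntimeError("simulated crash before commit")
    except RuntimeError:
        pass  # output and offset were rolled back together
    return next_offset(db)


db = open_sink()
records = ["a", "b", "c", "d"]
after_crash = process_batch(db, records, 2, fail_before_commit=True)
rows_after_crash = db.execute("SELECT COUNT(*) FROM output").fetchone()[0]
process_batch(db, records, 2)
process_batch(db, records, 2)
values = [row[0] for row in db.execute("SELECT value FROM output ORDER BY rowid")]
```

The commit is the slow part. The side-note's point is that a processor can use that waiting time to deserialize the next batch and do other side-effect-free work.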

Decision #4 – State Saving

• In-memory state with replication (Old VoltDB)

• Requires lots of hardware and network

• Local database (Samza, Kafka Streams API)

• Remote database (MillWheel)

• Upstream (i.e. replay everything on failure)

• Global consistent snapshot (Flink)

Decision #4 – State Saving

Facebook Verdict: It depends

Rhode Island Alaska

Best Part of the Paper – by far

How to efficiently work with state in remote DB?
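As I read the paper, the trick is that when updates are associative (counts, sums, sets), a processor can aggregate partial state locally and send it to the database as a merge, instead of doing a read-modify-write round trip per event. The sketch below shows the idea with counters; `RemoteStore` and `LocalAggregator` are hypothetical stand-ins, not a real database client.

```python
from collections import Counter


class RemoteStore:
    """Stand-in for a remote database with a merge operation (hypothetical)."""

    def __init__(self):
        self.data = Counter()
        self.round_trips = 0

    def merge(self, partial):
        # One network round trip; the store adds the partial counts itself,
        # so the processor never has to read the current value first.
        self.round_trips += 1
        self.data.update(partial)


class LocalAggregator:
    """Buffers associative updates locally and flushes them as one merge."""

    def __init__(self, store, flush_every):
        self.store = store
        self.flush_every = flush_every
        self.partial = Counter()
        self.pending = 0

    def add(self, key, amount=1):
        self.partial[key] += amount
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        if self.partial:
            self.store.merge(self.partial)
            self.partial = Counter()
            self.pending = 0


store = RemoteStore()
agg = LocalAggregator(store, flush_every=3)
for key in ["a", "b", "a", "a", "b"]:
    agg.add(key)
agg.flush()
```

Five events cost two round trips here instead of five reads plus five writes.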

Decision #5 – Reprocessing

• Stream only – requires long retention in the stream store

• Maintain both batch and stream systems

• Develop systems that can run in streams and batch (Flink, Spark)

Facebook Verdict:

SQL runs everywhere

And binary generation FTW
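The general idea of one piece of logic running in both modes can be sketched as a shared function with a batch driver and a stream driver. This is an illustration only: the record format and function names are made up, and it is not how Facebook's SQL or binary generation works.

```python
from collections import Counter


def parse(line):
    """Shared business logic: extract the action from 'user,action'."""
    _user, action = line.split(",")
    return action


def run_batch(lines):
    """Batch mode: the whole input is available, produce one final result."""
    return Counter(parse(line) for line in lines)


def run_stream(lines):
    """Stream mode: emit the running result after every record."""
    running = Counter()
    for line in lines:
        running[parse(line)] += 1
        yield running.copy()


history = ["u1,like", "u2,share", "u3,like"]
batch_result = run_batch(history)
stream_results = list(run_stream(iter(history)))
```

Reprocessing old data through the batch driver then agrees with what the stream driver produced live.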

Applications – Or a whirlwind tour of good patterns

One example:

Lessons Learned!

The biggest win is pipelines composed of independent processors

• Mixing multiple systems let us move fast

• High level abstractions let us improve implementation

• Ease of debugging – Independent nodes and ability to replay

• Ease of deployment – Puma as-a-service

• Ease of monitoring – Lag is the most important metric. Everything is instrumented out of the box.

• In the future – auto-scale based on lag
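What "lag" and "auto-scale based on lag" could look like, as a sketch: lag is how far the processor's committed offset trails the end of the stream, and a simple policy sizes the worker pool to drain it within a target time. The policy and all parameter names are assumptions, not something the paper specifies.

```python
import math


def lag(latest_offset, committed_offset):
    """Records written to the stream but not yet processed."""
    return max(0, latest_offset - committed_offset)


def desired_workers(lag_records, per_worker_rate, target_seconds, max_workers):
    """Hypothetical policy: enough workers to drain the lag in target_seconds.

    per_worker_rate is records per second handled by one worker (assumed known).
    """
    needed = math.ceil(lag_records / (per_worker_rate * target_seconds))
    return min(max_workers, max(1, needed))


current_lag = lag(latest_offset=1000, committed_offset=400)
workers = desired_workers(current_lag, per_worker_rate=10,
                          target_seconds=30, max_workers=8)
```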

Thank You!
