![Page 1: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/1.jpg)
STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA
Processing billions of events every day
![Page 2: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/2.jpg)
Neha Narkhede
- Co-founder and Head of Engineering @ Stealth Startup
- Prior to this:
  - Lead, Streams Infrastructure @ LinkedIn (Kafka & Samza)
  - One of the initial authors of Apache Kafka, committer and PMC member
- Reach out at @nehanarkhede
![Page 3: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/3.jpg)
Agenda
- Real-time Data Integration
- Introduction to Logs & Apache Kafka
- Logs & Stream processing
- Apache Samza
- Stateful stream processing
![Page 4: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/4.jpg)
The Data Needs Pyramid
[Figure: two pyramids, levels listed from base to top]
- Maslow's hierarchy of needs: Physiological, Safety, Love/Belonging, Esteem, Self-actualization
- Data needs: Data collection, Data processing, Understanding, Automation
![Page 5: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/5.jpg)
Agenda
- Real-time Data Integration
- Introduction to Logs & Apache Kafka
- Logs & Stream processing
- Apache Samza
- Stateful stream processing
![Page 6: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/6.jpg)
Increase in diversity of data
[Timeline figure with marks at 1980+, 2000+ and 2010+; label: siloed data feeds]
- Database data (users, products, orders, etc.)
- IoT sensors
- Events (clicks, impressions, pageviews)
- Application logs (errors, service calls)
- Application metrics (CPU usage, requests/sec)
![Page 7: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/7.jpg)
Explosion in diversity of systems
- Live Systems
  - Voldemort
  - Espresso
  - GraphDB
  - Search
  - Samza
- Batch
  - Hadoop
  - Teradata
![Page 8: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/8.jpg)
Data integration disaster
[Diagram: many systems wired directly to one another. Systems shown: Oracle, User Tracking, Hadoop, Log Search, Monitoring, Data Warehouse, Social Graph, Rec. Engine, Search, Email, Voldemort, Espresso, Logs, Operational Metrics, Production Services, Security, ...]
![Page 9: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/9.jpg)
Centralized service
[Diagram: the same systems connected through a central Data Pipeline. Systems shown: Oracle, User Tracking, Hadoop, Log Search, Monitoring, Data Warehouse, Social Graph, Rec Engine & Life, Search, Email, Voldemort, Espresso, Logs, Operational Metrics, Production Services, Security, ...]
![Page 10: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/10.jpg)
Agenda
- Real-time Data Integration
- Introduction to Logs & Apache Kafka
- Logs & Stream processing
- Apache Samza
- Stateful stream processing
![Page 11: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/11.jpg)
Kafka at 10,000 ft
- Distributed from the ground up
- Persistent
- Multi-subscriber

[Diagram: producers write to a cluster of brokers and consumers read from it]
![Page 12: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/12.jpg)
Key design principles
- Scalability of a file system
  - Hundreds of MB/sec/server throughput
  - Many TBs per server
- Guarantees of a database
  - Messages strictly ordered
  - All data persistent
- Distributed by default
  - Replication model
  - Partitioning model
![Page 13: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/13.jpg)
Kafka adoption
![Page 14: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/14.jpg)
Apache Kafka @ LinkedIn
- 175 TB of in-flight log data per colo
- Low-latency: ~1.5 ms
- Replicated to each datacenter
- Tens of thousands of data producers
- Thousands of consumers
- 7 million messages written/sec
- 35 million messages read/sec
- Hadoop integration
![Page 15: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/15.jpg)
The data structure every systems engineer should know
Logs
![Page 16: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/16.jpg)
The Log
- Ordered
- Append only
- Immutable

[Diagram: records numbered 0 to 12; the 1st record is record 0 and the next record is written after record 12]
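The three properties on this slide can be sketched in a few lines of Java. This is a minimal in-memory illustration, not how Kafka stores data; the class name `AppendOnlyLog` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory log: ordered, append only, records never modified. */
class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    /** Appends a record at the end and returns its position (offset) in the log. */
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Reads the record at an offset; there is deliberately no update or delete. */
    String read(long offset) {
        return records.get((int) offset);
    }

    /** Offset that the next appended record will receive. */
    long nextOffset() {
        return records.size();
    }

    public static void main(String[] args) {
        AppendOnlyLog log = new AppendOnlyLog();
        long first = log.append("user 567 updated profile");
        long second = log.append("user 989 viewed page");
        System.out.println(first + " -> " + log.read(first));
        System.out.println(second + " -> " + log.read(second));
        System.out.println("next offset: " + log.nextOffset());
    }
}
```

Because the only write operation is `append`, the order of records is fixed at write time and every reader sees the same sequence.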
![Page 17: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/17.jpg)
The Log: Partitioning
[Diagram: the log split into Partition 0, Partition 1 and Partition 2; each partition numbers its records independently from 0, and the partitions have different lengths]
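The slide does not say how a record is routed to a partition. A common scheme, assumed here, is to hash a record key so that all records with the same key land in the same partition and keep their relative order. The class `PartitionedLog` and its methods are hypothetical and do not reproduce Kafka's own partitioner.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical partitioned log: each partition is its own append-only list. */
class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Assumed routing rule: hash of the key modulo the partition count. */
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Appends to the key's partition and returns {partition, offset within it}. */
    long[] append(String key, String value) {
        int p = partitionFor(key, partitions.size());
        List<String> log = partitions.get(p);
        log.add(value);
        return new long[] {p, log.size() - 1};
    }

    public static void main(String[] args) {
        PartitionedLog log = new PartitionedLog(3);
        for (String key : new String[] {"user-567", "user-989", "user-567"}) {
            long[] where = log.append(key, key + " did something");
            System.out.println(key + " -> partition " + where[0] + ", offset " + where[1]);
        }
    }
}
```

Ordering is guaranteed only within a partition, which is why each partition in the diagram has its own numbering.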
![Page 18: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/18.jpg)
Logs: pub/sub done right
[Diagram: a data source writes to the end of a log holding records 0 to 12; destination system A reads at position 7 (time = 7) and destination system B reads at position 11 (time = 11)]
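The point of this slide is that each subscriber keeps its own position in a shared log, so readers do not interfere with one another. A minimal sketch, assuming an in-memory list as the log and a hypothetical `LogReader` class:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader that tracks its own position in a shared log. */
class LogReader {
    private final List<String> log;   // shared log, only ever appended to
    private int position = 0;         // next record this reader will consume

    LogReader(List<String> log) {
        this.log = log;
    }

    /** Returns up to max unread records and advances only this reader's position. */
    List<String> poll(int max) {
        int end = Math.min(log.size(), position + max);
        List<String> batch = new ArrayList<>(log.subList(position, end));
        position = end;
        return batch;
    }

    int position() {
        return position;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i <= 12; i++) {
            log.add("record-" + i);
        }
        LogReader systemA = new LogReader(log);
        LogReader systemB = new LogReader(log);
        systemA.poll(7);    // A has consumed records 0..6
        systemB.poll(11);   // B has consumed records 0..10
        System.out.println("A at " + systemA.position() + ", B at " + systemB.position());
    }
}
```

A slow reader simply has a smaller position; nothing is removed from the log when a record is read.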
![Page 19: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/19.jpg)
Logs for data integration
[Diagram: the event "User updates profile with new job", KAFKA, and the systems connected to it: Newsfeed, Search, Hadoop, Standardization engine]
![Page 20: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/20.jpg)
Agenda
- Real-time Data Integration
- Introduction to Logs & Apache Kafka
- Logs & Stream processing
- Apache Samza
- Stateful stream processing
![Page 21: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/21.jpg)
Stream processing = f(log)
Log A -> Job 1 -> Log B
![Page 22: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/22.jpg)
Stream processing = f(log)
[Diagram: Job 1 and Job 2 wired together through Log A, Log B, Log C, Log D and Log E, each job reading logs and writing logs]
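"Stream processing = f(log)" can be read literally: a job is a function applied to each record of an input log, and its results form a new log that another job can read. The sketch below runs over a finite list for illustration, whereas a real job runs continuously; `LogJob.run` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper: a job is a function from an input log to an output log. */
class LogJob {
    /** Applies f to every record of the input log, in order, producing a new log. */
    static <A, B> List<B> run(List<A> inputLog, Function<A, B> f) {
        List<B> outputLog = new ArrayList<>();
        for (A record : inputLog) {
            outputLog.add(f.apply(record));
        }
        return outputLog;
    }

    public static void main(String[] args) {
        List<String> logA = Arrays.asList("click:home", "click:jobs");
        // Job 1 reads Log A and writes Log B.
        List<String> logB = run(logA, s -> s.toUpperCase());
        // Job 2 reads Log B and writes Log C, so jobs chain through logs.
        List<Integer> logC = run(logB, s -> s.length());
        System.out.println(logB);
        System.out.println(logC);
    }
}
```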
![Page 23: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/23.jpg)
Apache Samza at LinkedIn
[Diagram: the event "User updates profile with new job", KAFKA, and the systems connected to it: Newsfeed, Search, Hadoop, Standardization engine]
![Page 24: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/24.jpg)
Latency spectrum of data systems
[Figure: latency axis, from lowest to highest]
- Synchronous (milliseconds): RPC
- Asynchronous processing (seconds to minutes)
- Batch (hours)
![Page 25: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/25.jpg)
Agenda
- Real-time Data Integration
- Introduction to Logs & Apache Kafka
- Logs & Stream processing
- Apache Samza
- Stateful stream processing
![Page 26: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/26.jpg)
Samza API
    public interface StreamTask {
        void process(IncomingMessageEnvelope envelope,
                     MessageCollector collector,
                     TaskCoordinator coordinator);
    }

Callouts on the slide, one per parameter:
- envelope: getKey(), getMsg()
- collector: sendMsg(topic, key, value)
- coordinator: commit(), shutdown()
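To show how a task might use these three parameters, here is a small filter task. The interfaces are stand-ins that mirror the simplified method names on the slide, so the example compiles on its own; they are not the real `org.apache.samza` classes, whose method names and signatures differ. The `errors` topic and the `ERROR` prefix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a filter task written against stand-in versions of the slide's API. */
class SamzaApiSketch {
    // Stand-in types using the slide's simplified names (assumptions, not Samza's API).
    interface IncomingMessageEnvelope { Object getKey(); Object getMsg(); }
    interface MessageCollector { void sendMsg(String topic, Object key, Object value); }
    interface TaskCoordinator { void commit(); void shutdown(); }
    interface StreamTask {
        void process(IncomingMessageEnvelope envelope,
                     MessageCollector collector,
                     TaskCoordinator coordinator);
    }

    /** Forwards only messages starting with "ERROR" to a hypothetical "errors" topic. */
    static class ErrorFilterTask implements StreamTask {
        @Override
        public void process(IncomingMessageEnvelope envelope,
                            MessageCollector collector,
                            TaskCoordinator coordinator) {
            String msg = String.valueOf(envelope.getMsg());
            if (msg.startsWith("ERROR")) {
                collector.sendMsg("errors", envelope.getKey(), msg);
            }
        }
    }

    /** Feeds messages through the task and records everything it sends. */
    static List<String> runOn(String... messages) {
        List<String> sent = new ArrayList<>();
        MessageCollector collector = (topic, key, value) -> sent.add(topic + ":" + value);
        TaskCoordinator coordinator = new TaskCoordinator() {
            public void commit() { }
            public void shutdown() { }
        };
        StreamTask task = new ErrorFilterTask();
        for (String m : messages) {
            IncomingMessageEnvelope envelope = new IncomingMessageEnvelope() {
                public Object getKey() { return "host-1"; }
                public Object getMsg() { return m; }
            };
            task.process(envelope, collector, coordinator);
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(runOn("INFO started", "ERROR disk full", "ERROR timeout"));
    }
}
```

The task itself holds no threads or I/O code: the framework calls `process` once per message, and the task reacts by sending zero or more messages.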
![Page 27: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/27.jpg)
Samza Architecture (Logical view)
[Diagram: Task 1, Task 2 and Task 3 reading from Log A and Log B; one log has partitions 0 to 2, the other has partitions 0 and 1]
![Page 28: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/28.jpg)
Samza Architecture (Logical view)
[Diagram: Task 1, Task 2 and Task 3 reading from Log A and Log B; one log has partitions 0 to 2, the other has partitions 0 and 1; the tasks are grouped into Samza container 1 and Samza container 2]
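One way to read the logical view is that each task owns one partition number across all input logs, and tasks are then packed into containers. The sketch below assumes exactly that, plus a round-robin placement of tasks onto containers; both rules are assumptions for illustration, and the numbering is zero-based whereas the slide counts from 1. The class `TaskAssignment` is hypothetical.

```java
/** Hypothetical assignment of partitions to tasks and tasks to containers. */
class TaskAssignment {
    /** One task per partition number, so the widest input log decides the task count. */
    static int taskCount(int... partitionCounts) {
        int max = 0;
        for (int count : partitionCounts) {
            max = Math.max(max, count);
        }
        return max;
    }

    /** Assumed placement policy: tasks are spread over containers round-robin. */
    static int containerFor(int task, int numContainers) {
        return task % numContainers;
    }

    public static void main(String[] args) {
        int tasks = taskCount(3, 2);   // one log with 3 partitions, one with 2
        for (int task = 0; task < tasks; task++) {
            System.out.println("task " + task + " -> container " + containerFor(task, 2));
        }
    }
}
```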
![Page 29: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/29.jpg)
Samza Architecture (Physical view)
[Diagram: Samza container 1 on Host 1 and Samza container 2 on Host 2]
![Page 30: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/30.jpg)
Samza Architecture (Physical view)
[Diagram: Host 1 and Host 2 each run a node manager and a Samza container (1 and 2); a Samza YARN AM is also shown]
![Page 31: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/31.jpg)
Samza Architecture (Physical view)
[Diagram: Host 1 and Host 2 each run a node manager, a Samza container (1 and 2) and Kafka; a Samza YARN AM is also shown]
![Page 32: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/32.jpg)
Samza Architecture: Equivalence to Map Reduce
[Diagram: the same layout for Map Reduce: Host 1 and Host 2 each run a node manager, Map Reduce and HDFS; a YARN AM is also shown]
![Page 33: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/33.jpg)
M/R Operation Primitives
- Filter: records matching some condition
- Map: record = f(record)
- Join: two or more datasets by key
- Group: records with same key
- Aggregate: f(records within the same group)
- Pipe: job 1's output => job 2's input
![Page 34: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/34.jpg)
M/R Operation Primitives on streams
- Filter: records matching some condition
- Map: record = f(record)
- Join: two or more datasets by key
- Group: records with same key
- Aggregate: f(records within the same group)
- Pipe: job 1's output => job 2's input

Requires state maintenance
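Filter and map can handle each record in isolation, but an operation such as group-and-aggregate has to remember what it has already seen, because a stream never ends. A minimal sketch of a running count per key, with a hypothetical `RunningCount` class holding the state in memory:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stateful operator: counts records per key as they arrive. */
class RunningCount {
    // The state that must survive between records.
    private final Map<String, Long> counts = new HashMap<>();

    /** Processes one record's key and returns the updated count for that key. */
    long process(String key) {
        return counts.merge(key, 1L, Long::sum);
    }

    public static void main(String[] args) {
        RunningCount pageViews = new RunningCount();
        for (String page : new String[] {"home", "jobs", "home"}) {
            System.out.println(page + " -> " + pageViews.process(page));
        }
    }
}
```

If the process holding `counts` dies, the totals are lost, which is the problem the stateful stream processing section addresses.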
![Page 35: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/35.jpg)
Agenda
- Real-time Data Integration
- Introduction to Logs & Apache Kafka
- Logs & Stream processing
- Apache Samza
- Stateful stream processing
![Page 36: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/36.jpg)
Example: Newsfeed
- Status update log: User 567 posted "Hello World"; User 989 posted "Blah Blah"; User ... posted "..."
- External connection DB: 567 -> [123, 679, 789, ...]; 999 -> [156, 343, ...]
- Job: fan out messages to followers
- Push notification log: Refresh user 123's newsfeed; Refresh user 679's newsfeed; Refresh user ...'s newsfeed
![Page 37: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/37.jpg)
Local state vs Remote state: Remote
[Diagram: Samza task partition 0 and Samza task partition 1, each handling 100-500K msg/sec/node, query a remote state store on disk (ex: Cassandra, MongoDB, etc.) that serves 1-5K queries/sec ??]
- ❌ Performance
- ❌ Isolation
- ❌ Limited APIs
![Page 38: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/38.jpg)
Local state: Bring data closer to computation
[Diagram: Samza task partition 0 and Samza task partition 1 each have their own local LevelDB/RocksDB store]
![Page 39: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/39.jpg)
Local state: Bring data closer to computation
[Diagram: Samza task partition 0 and Samza task partition 1 each have their own local LevelDB/RocksDB store, shown with disk and a change log stream]
![Page 40: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/40.jpg)
Example Revisited: Newsfeed
- Status update log: User 567 posted "Hello World"; User 989 posted "Blah Blah"; User ... posted "..."
- New connection log: User 123 followed 567; User 890 followed 234; User ... followed ...
- Job: fan out messages to followers
- Follower lists: 567 -> [123, 679, 789, ...]; 999 -> [156, 343, ...]
- Push notification log: Refresh user 123's newsfeed; Refresh user 679's newsfeed; Refresh user ...'s newsfeed
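In the revisited example the follower lists are built from the new connection log instead of being looked up in an external database. A minimal sketch of that idea, assuming a plain in-memory map in place of a local LevelDB/RocksDB store and a hypothetical `NewsfeedFanOut` class:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical fan-out job that keeps follower lists as local state. */
class NewsfeedFanOut {
    // Local state: author id -> ids of users following that author.
    private final Map<Integer, List<Integer>> followers = new HashMap<>();

    /** Handles a record from the new connection log. */
    void onFollow(int follower, int followee) {
        followers.computeIfAbsent(followee, k -> new ArrayList<>()).add(follower);
    }

    /** Handles a record from the status update log; returns push notification records. */
    List<String> onPost(int author) {
        List<String> notifications = new ArrayList<>();
        for (int follower : followers.getOrDefault(author, Collections.emptyList())) {
            notifications.add("Refresh user " + follower + "'s newsfeed");
        }
        return notifications;
    }

    public static void main(String[] args) {
        NewsfeedFanOut job = new NewsfeedFanOut();
        job.onFollow(123, 567);
        job.onFollow(679, 567);
        for (String notification : job.onPost(567)) {
            System.out.println(notification);
        }
    }
}
```

For this to work across several tasks, both input logs would presumably need to be partitioned by the author's id, so that the task receiving a post also holds that author's follower list.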
![Page 41: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/41.jpg)
Fault tolerance?
[Diagram: the physical view again: Host 1 and Host 2 each run a node manager, a Samza container (1 and 2) and Kafka; a Samza YARN AM is also shown]
![Page 42: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/42.jpg)
Fault tolerance in Samza
[Diagram: Samza task partition 0 and Samza task partition 1 each have a local LevelDB/RocksDB store, backed by a durable change log]
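The durable change log makes local state recoverable: every write to the local store is also appended to a log, and a restarted task rebuilds its store by replaying that log. The sketch below assumes an in-memory list standing in for the change log stream and a hypothetical `ChangelogStore` class; it is not Samza's storage implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical key-value store whose writes are mirrored to a change log. */
class ChangelogStore {
    private final Map<String, String> local = new HashMap<>();
    private final List<String[]> changelog;   // stands in for a durable log

    /** Creates a store and restores its contents by replaying the change log. */
    ChangelogStore(List<String[]> changelog) {
        this.changelog = changelog;
        for (String[] entry : changelog) {
            local.put(entry[0], entry[1]);   // later entries overwrite earlier ones
        }
    }

    /** Writes locally and appends the same change to the log. */
    void put(String key, String value) {
        local.put(key, value);
        changelog.add(new String[] {key, value});
    }

    String get(String key) {
        return local.get(key);
    }

    public static void main(String[] args) {
        List<String[]> durableLog = new ArrayList<>();
        ChangelogStore beforeCrash = new ChangelogStore(durableLog);
        beforeCrash.put("567", "3 followers");
        beforeCrash.put("567", "4 followers");

        // Simulated restart on another host: only the change log survives.
        ChangelogStore afterRestart = new ChangelogStore(durableLog);
        System.out.println(afterRestart.get("567"));
    }
}
```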
![Page 43: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/43.jpg)
Slow jobs
[Diagram: Job 1 and Job 2 wired together through Log A, Log B, Log C, Log D and Log E]
- ❌ Drop data
- ❌ Backpressure
- ❌ Queue
- ❌ In memory
- ✅ On disk (KAFKA)
![Page 44: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/44.jpg)
Summary
- Real-time data integration is crucial for the success and adoption of stream processing
- Logs form the basis for real-time data integration
- Stream processing = f(logs)
- Samza is designed from the ground up for scalability and provides fault-tolerant, persistent state
![Page 45: STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE … SF... · STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & ... One of the initial authors of Apache Kafka, ... Introduction to](https://reader030.vdocuments.mx/reader030/viewer/2022021500/5aed5cf87f8b9a3b2e9073e6/html5/thumbnails/45.jpg)
Thank you!
- The Log: http://bit.ly/the_log
- Apache Kafka: http://kafka.apache.org
- Apache Samza: http://samza.incubator.apache.org
- Me: @nehanarkhede, http://www.linkedin.com/in/nehanarkhede