Streaming in Practice: Putting Apache Kafka in Production
Roger Hoover, Engineer, Confluent

Apache Kafka: Online Talk Series
Part 1 (September 27): Introduction To Streaming Data and Stream Processing with Apache Kafka
Part 2 (October 6): Deep Dive into Apache Kafka
Part 3 (October 27): Demystifying Stream Processing with Apache Kafka
Part 4 (November 17): Data Integration with Apache Kafka
Part 5 (December 1): A Practical Guide to Selecting a Stream Processing Technology
Part 6 (December 15): Streaming in Practice: Putting Apache Kafka in Production
https://www.confluent.io/apache-kafka-talk-series/

Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters

Kafka Basics

Architecture
[Figure: producers and consumers connect to a Kafka cluster (broker 1 … broker n, each hosting topic partitions), alongside a ZooKeeper cluster (server 1, server 2, server 3).]

Operations
• Simple Deployment
• Rolling Upgrades
• Good metrics for component monitoring

Tuning Kafka For Your Application

Two Example Apps
• User activity tracking
  • Collect page view events while users are browsing our web and mobile storefronts
  • Persist the data to HDFS for subsequent use in recommendation engine
• Inventory adjustments
  • Track sales, maintain inventory, and re-order on-demand

Application Priorities
• User activity tracking
  • High throughput (100x the sales stream)
  • Availability is most important
  • Low retention required - 3 days
• Inventory adjustments
  • Relatively low throughput
  • Durability is most important
  • Long retention required - 6 months

Knobs
- Partition count
- Replication factor
- Retention
- Batching + compression
- Producer send acknowledgements
- Minimum ISRs
- Unclean Leader Election

Partition Count
- Partitions are the unit of consumer parallelism
- Over-partition your topics (especially keyed topics)
- Easy to add consumers but hard to add partitions for keyed topics
- Kafka can support tens of thousands of partitions
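
Why adding partitions to a keyed topic is hard can be seen with a small sketch. It assumes keys are mapped to partitions by hashing the key modulo the partition count; `zlib.crc32` stands in for the hash used by a real client, and the key names and partition counts are made up for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition by hashing (crc32 is an illustrative stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical keys: one per user.
keys = [f"user-{i}" for i in range(1000)]

# Count keys whose partition changes when the topic grows from 8 to 12 partitions.
moved = sum(1 for k in keys if partition_for(k, 8) != partition_for(k, 12))
print(f"{moved} of {len(keys)} keys map to a different partition after the change")
```

Any key that moves lands in a different partition than its earlier messages, so per-key ordering across the change is no longer guaranteed. That is the argument for over-partitioning up front.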

Partition Count
- High Throughput (User activity tracking)
  - Large number of partitions (~100)
- Fewer Resources (Inventory adjustments)
  - Smaller number of partitions (< 50)

Replication Factor
- More replicas require more storage, disk I/O, and network bandwidth
- More replicas can tolerate more failures
[Figure: brokers 1 to 4, each with logs holding replicas of topic1-part1, topic1-part2, topic2-part1, and topic2-part2; each partition appears on three of the four brokers.]

Replication Factor
- Lower cost (User activity tracking)
  - replication.factor = 2
- High Fault Tolerance (Inventory adjustments)
  - replication.factor = 3
- Defaults to 1

Retention
- Retention time can be set per topic
- Longer retention times require more storage (imagine that!)
- Longer retention allows consumers to rewind further back in time
  - Part of the consumer’s SLA!

Retention
- Less Storage (User activity tracking)
  - log.retention.hours=72 (3 days)
- Longer Time Travel (Inventory adjustments)
  - log.retention.hours=4380 (6 months)
- Default is 7 days
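
The retention values above are plain unit conversions, sketched below. `log.retention.hours` is the broker-wide setting, while a per-topic override is expressed in milliseconds through the `retention.ms` topic config. The helper name is hypothetical, and "6 months" is taken as half of a 365-day year, which matches the 4380 hours on the slide.

```python
HOURS_PER_DAY = 24
MS_PER_HOUR = 60 * 60 * 1000


def retention_settings(days: float) -> dict:
    """Return the retention expressed in hours and in milliseconds."""
    hours = int(days * HOURS_PER_DAY)
    return {"log.retention.hours": hours, "retention.ms": hours * MS_PER_HOUR}


activity = retention_settings(3)         # user activity tracking: 3 days
inventory = retention_settings(365 / 2)  # inventory adjustments: about 6 months
print(activity)
print(inventory)
```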

Side-note: Time Travel
- Kafka 0.10.1 supports rewinding by time
  - E.g. “Rewind to 10 minutes ago”
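
A time-based rewind boils down to finding the earliest offset whose timestamp is at or after the requested time. The sketch below models a partition as a list of message timestamps indexed by offset and assumes the timestamps are non-decreasing. The function name and sample data are illustrative, not the Kafka client API.

```python
from bisect import bisect_left
from typing import List, Optional


def offset_for_time(timestamps_ms: List[int], target_ms: int) -> Optional[int]:
    """Return the earliest offset with timestamp >= target_ms, or None if none exists."""
    idx = bisect_left(timestamps_ms, target_ms)
    return idx if idx < len(timestamps_ms) else None


# Hypothetical partition: offset i holds a message produced at log[i] (ms since epoch).
log = [1_000, 2_000, 3_000, 4_000]
now_ms = 4_500
rewind_to = offset_for_time(log, now_ms - 2_000)  # "rewind to 2 seconds ago"
print(rewind_to)
```

A consumer would then seek to the returned offset and resume reading from there.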

Batching & Compression
- Producer: batch.size, linger.ms, compression.type
- Consumer: fetch.min.bytes, fetch.max.wait.ms (fetch.wait.max.ms in the old Scala consumer)
[Figure: the producer's send() calls accumulate into compressed batches 1 to 3, which are flushed asynchronously to the broker; the broker stores the compressed batches, and the consumer fetches them with poll().]
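
Batching helps compression because similar messages share structure that a compressor can exploit only when it sees them together. The sketch below compares compressing messages one at a time against compressing them as a single batch. `zlib` from the standard library stands in for the codecs Kafka supports, and the page view events are made up.

```python
import json
import zlib

# Hypothetical page view events with similar structure.
events = [
    json.dumps({"user_id": i, "page": "/products/42", "action": "view"}).encode("utf-8")
    for i in range(200)
]

# Compress each message on its own (no batching).
individual_bytes = sum(len(zlib.compress(e)) for e in events)

# Compress all messages together as one batch.
batched_bytes = len(zlib.compress(b"\n".join(events)))

print(f"individually: {individual_bytes} bytes, as one batch: {batched_bytes} bytes")
```

Larger batches trade a little latency (the time spent waiting for the batch to fill) for fewer requests and a better compression ratio.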

Batching & Compression
- High throughput (User activity tracking)
  - Producer: compression.type=lz4, batch.size (256KB), linger.ms (~10ms) or flush manually
  - Consumer: fetch.min.bytes (256KB), fetch.max.wait.ms (~10ms)
- Low latency (Inventory adjustments)
  - Producer: linger.ms=0
  - Consumer: fetch.min.bytes=1
- Defaults
  - compression.type = none
  - linger.ms = 0 (i.e. send immediately)
  - fetch.min.bytes = 1 (i.e. receive immediately)

Producer Acknowledgements on Send
[Figure: a producer writes to the leader replica of topic1-part1 on broker 1, with follower replicas on brokers 2 and 3 and a consumer reading the partition. Numbered steps: 1 (producer to leader), 2 (followers replicate), 3 (commit), 4 (ack).]
When producer receives ack      Latency                Durability on failures
acks=0 (no ack)                 no network delay       some data loss
acks=1 (wait for leader)        1 network roundtrip    a little data loss
acks=all (wait for committed)   2 network roundtrips   no data loss

Producer Acknowledgements on Send
- Throughput++ (User activity tracking)
  - acks = 1
- Durability++ (Inventory adjustments)
  - acks = all
- Default
  - acks = 1

In-Sync Replicas (ISRs)
[Figure: the producer writes m1 and m2 to the leader of topic1-part1 on broker 1; the followers on brokers 2 and 3 hold both messages as well. Labels mark the ISR and m2 as the last committed message.]
In-sync: replica reads from leader’s log end within replica.lag.time.max.ms

Minimum In-Sync Replicas
- Topic config to tell Kafka how to handle writes during severe outages (rare)
- Leader will reject writes if the ISR count is too small
topic1: min.insync.replicas=2
[Figure: replicas of topic1-part1 on broker 1 (leader) and brokers 2 and 3 (followers). Messages m1 and m2 appear on all three replicas; m3, m4, and m5 appear once each. Labels mark the ISR and the last committed message.]
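
The rejection rule can be sketched as a small predicate. It assumes the check applies to producers using acks=all, which is how `min.insync.replicas` is documented to behave. The function name and arguments are hypothetical.

```python
def leader_accepts_write(isr_size: int, min_insync_replicas: int, acks: str) -> bool:
    """Return True if the leader would accept a produce request."""
    # The minimum-ISR check only applies to producers that wait for all in-sync replicas.
    if acks == "all" and isr_size < min_insync_replicas:
        return False
    return True


# topic1 with min.insync.replicas=2:
print(leader_accepts_write(isr_size=3, min_insync_replicas=2, acks="all"))  # healthy ISR
print(leader_accepts_write(isr_size=1, min_insync_replicas=2, acks="all"))  # only the leader left
print(leader_accepts_write(isr_size=1, min_insync_replicas=2, acks="1"))    # acks=1 skips the check
```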

Minimum In-Sync Replicas
- Availability++ (User activity tracking)
  - min.insync.replicas = 1
- Durability++ (Inventory adjustments)
  - min.insync.replicas = 2
- Defaults to 1

Unclean Leader Election
- Topic config to tell Kafka how to handle topic leadership during severe outages (rare)
- Allows automatic recovery in exchange for losing data
[Figure: replicas of topic1-part1 on brokers 1, 2, and 3 holding messages m1 to m5 in differing amounts; broker 1 is marked "leader ???", broker 2 "leader", and broker 3 "follower". Labels mark the ISR and the last committed message.]

Unclean Leader Election
- Availability++ (User activity tracking)
  - unclean.leader.election.enable = true
- Durability++ (Inventory adjustments)
  - unclean.leader.election.enable = false
- Defaults to true (in the Kafka 0.10 line; the default changed to false in 0.11.0)
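
The trade-off can be sketched as a leader-selection function. This is a simplified model, not Kafka's controller code: it assumes the new leader is taken from the live members of the ISR when possible, and that an out-of-sync replica is considered only when unclean election is enabled. Broker IDs and names are hypothetical.

```python
from typing import List, Optional, Tuple


def elect_leader(
    replicas: List[int], isr: List[int], live: List[int], unclean_enabled: bool
) -> Tuple[Optional[int], bool]:
    """Return (new_leader, may_lose_data); the leader is None if the partition stays offline."""
    # Clean election: prefer a live replica that is still in sync.
    for broker in replicas:
        if broker in isr and broker in live:
            return broker, False
    # Unclean election: fall back to any live replica, accepting possible data loss.
    if unclean_enabled:
        for broker in replicas:
            if broker in live:
                return broker, True
    return None, False


# Only broker 3 is alive, and it had fallen out of the ISR before the outage.
print(elect_leader([1, 2, 3], isr=[1, 2], live=[3], unclean_enabled=True))
print(elect_leader([1, 2, 3], isr=[1, 2], live=[3], unclean_enabled=False))
```

With unclean election disabled the partition stays unavailable until an in-sync replica returns, which is the durability-first choice for the inventory topic.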

Mission Critical Data
- Producer acknowledgments
  - acks=all
- Replication factor
  - replication.factor = 3
- Minimum ISRs
  - min.insync.replicas = 2
- Unclean Leader Election
  - unclean.leader.election.enable = false
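
Collected in one place, the settings above look roughly like the sketch below. The dictionaries are plain Python data, not a client API; `replication.factor` is written as on the slide, and the helper that derives the tolerated number of broker failures is a hypothetical illustration of the arithmetic.

```python
# Producer-side setting for mission critical data.
producer_config = {"acks": "all"}

# Topic-side settings for mission critical data.
topic_config = {
    "replication.factor": 3,
    "min.insync.replicas": 2,
    "unclean.leader.election.enable": False,
}


def failures_tolerated_for_writes(topic: dict) -> int:
    """Brokers that can fail while acks=all writes are still accepted."""
    return topic["replication.factor"] - topic["min.insync.replicas"]


print(failures_tolerated_for_writes(topic_config))
```

With three replicas and a minimum of two in sync, one broker can be down while writes continue. Losing a second makes the partition reject acks=all writes rather than risk losing data.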

Data Balancing

Replica Placement
• Partitions are replicated
• Replicas are spread evenly across the cluster
• Only when the topic is created or modified
[Figure: brokers 1 to 4, each with logs holding replicas of topic1-part1, topic1-part2, topic2-part1, and topic2-part2; each partition appears on three of the four brokers.]

Replica Placement
• Over time broker load and storage become unbalanced
• Initial replica placement does not account for topic throughput or retention
• Adding or removing brokers
[Figure: brokers 1 to 4 hold the replicas of topic1-part1, topic1-part2, topic2-part1, and topic2-part2, shown alongside a fifth broker (broker 5).]

Replica Reassignment
• Create plan to rebalance replicas
• Upload new assignment to the cluster
• Kafka migrates replicas without disruption
[Figure: "Before" and "After" views of brokers 1 to 5, showing the replicas of topic1-part1, topic1-part2, topic2-part1, and topic2-part2 redistributed across the five brokers.]

Data Balancing: Tricky Parts
• Creating a good plan
  • Balance broker disk space
  • Balance broker load
  • Minimize data movement
  • Preserve rack placement
• Movement of replicas can overload I/O and bandwidth resources
  • Use replication quota feature in 0.10.1
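
One of the plan criteria, minimizing data movement, can be quantified by comparing assignments. The sketch below counts how many partition replicas would have to be copied to a broker that did not hold them before. The assignment maps, topic names, and broker IDs are hypothetical.

```python
from typing import Dict, List

Assignment = Dict[str, List[int]]  # "topic-partition" -> broker IDs holding a replica


def replicas_to_move(before: Assignment, after: Assignment) -> int:
    """Count replicas that land on a broker that did not hold them before."""
    return sum(
        len(set(after[tp]) - set(before.get(tp, [])))
        for tp in after
    )


before = {"topic1-0": [1, 2, 3], "topic1-1": [2, 3, 4], "topic2-0": [3, 4, 1]}
# Candidate plan after adding broker 5: shift one replica of two partitions onto it.
after = {"topic1-0": [1, 2, 5], "topic1-1": [2, 3, 4], "topic2-0": [5, 4, 1]}

print(replicas_to_move(before, after))
```

A planner can score several candidate assignments this way and prefer the one that balances the cluster with the fewest moves.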

Data Balancing: Solutions
• DIY
  • kafka-reassign-partitions.sh script in Apache Kafka
• Confluent Enterprise Auto Data Balancing
  • Optimizes storage utilization
  • Rack awareness and minimal data movement
  • Leverages replication quotas during rebalance
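
For the DIY route, the reassignment tool takes the plan as a JSON file. The sketch below builds such a file in the format the tool expects (a version field plus a list of topic, partition, and replica entries). The topic name, partition numbers, and broker IDs are hypothetical.

```python
import json


def build_reassignment(plan: dict) -> str:
    """Render {(topic, partition): [broker IDs]} as reassignment JSON."""
    partitions = [
        {"topic": topic, "partition": partition, "replicas": replicas}
        for (topic, partition), replicas in sorted(plan.items())
    ]
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


# Hypothetical plan that moves one replica of each partition onto broker 5.
plan = {("topic1", 0): [1, 2, 5], ("topic1", 1): [5, 3, 4]}
reassignment_json = build_reassignment(plan)
print(reassignment_json)
```

The output would be saved to a file and passed to kafka-reassign-partitions.sh with `--reassignment-json-file` and `--execute`.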

Spanning Multiple Datacenters

Use cases
• Disaster Recovery
• Replicate data out to geo-localized data centers
• Aggregate data from other data centers for analysis
• Part of hybrid cloud or cloud migration strategy

Multi-DC: Two Approaches
• Stretched cluster
• Mirroring across clusters

Stretched Cluster
• Low-latency links between 3 DCs. Typically AZs in a single AWS region.
• Applications in all 3 DCs share the same cluster and handle failures automatically.
• Relies on intra-cluster replication to copy data across DCs (replication.factor >= 3)
  • Use rack awareness in Kafka 0.10; manual partition placement otherwise
[Figure: a single Kafka cluster spanning AZ 1, AZ 2, and AZ 3 within one AWS Region, with producers and consumers in each AZ.]
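
What rack awareness buys in a stretched cluster is that each partition's replicas land in different zones. The sketch below is a simplified placement, not Kafka's actual assignment algorithm: it assumes one broker list per availability zone (on real brokers the zone is declared through `broker.rack`) and picks one broker from each zone for every partition. Broker IDs and zone names are hypothetical.

```python
from typing import Dict, List


def place_replicas(
    brokers_by_zone: Dict[str, List[int]], num_partitions: int
) -> Dict[int, List[int]]:
    """Assign one replica per zone to each partition, rotating brokers and leaders."""
    zones = sorted(brokers_by_zone)
    assignment = {}
    for p in range(num_partitions):
        # Rotate the zone order so the first replica (preferred leader) is spread across zones.
        shift = p % len(zones)
        order = zones[shift:] + zones[:shift]
        assignment[p] = [
            brokers_by_zone[z][p % len(brokers_by_zone[z])] for z in order
        ]
    return assignment


brokers_by_zone = {"az1": [1, 4], "az2": [2, 5], "az3": [3, 6]}
for partition, replicas in place_replicas(brokers_by_zone, 4).items():
    print(partition, replicas)
```

Because every partition has a replica in each zone, losing a whole zone still leaves two replicas of every partition available.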

Mirroring Across Clusters
• Separate Kafka clusters in each DC. Mirroring process copies data between them.
• Several variations of this pattern. Some require manual intervention on failover and recovery.

How to Mirror Across Clusters
• MirrorMaker tool in Apache Kafka
  • Manual topic creation
  • Manual sync of topic configuration
• Confluent Enterprise Multi-DC
  • Dynamic topic creation at the destination
  • Automatic sync for topic configurations (including access controls)
  • Can be configured and managed from the Control Center UI
  • Leverages Connect API

More Information: Tuning Tradeoffs
• Apache Kafka and Confluent Documentation
• When it Absolutely, Positively, Has to be There: Reliability Guarantees in Kafka
  • Gwen Shapira and Jeff Holoman - https://www.confluent.io/kafka-summit-2016-ops-when-it-absolutely-positively-has-to-be-there/
• Chapter 6: Reliability Guarantees
  • Neha Narkhede, Gwen Shapira, Todd Palino - Kafka: The Definitive Guide
• Confluent Operations Training

More Information: Multi-DC
• Building Large Scale Stream Infrastructures Across Multiple Data Centers with Apache Kafka - Jun Rao
  • Video: https://www.youtube.com/watch?v=XcvHmqmh16g
  • Slides: http://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka
• Confluent Enterprise Multi-DC - https://www.confluent.io/product/multi-datacenter/

More Information: Metadata Management
• Yes, Virginia, You Really Do Need a Schema Registry
  • Gwen Shapira - https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/

Thank you!
www.kafka-summit.org
May 8, 2017: New York City, Hilton Midtown
August 28, 2017: San Francisco, Hilton Union Square