Multi-Datacenter Kafka - Strata San Jose 2017
TRANSCRIPT
![Page 1: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/1.jpg)
When One Data Center Is Not Enough
Building Large-scale Stream Infrastructures Across Multiple Data Centers with Apache Kafka
Gwen Shapira
![Page 2: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/2.jpg)
There’s a book on that!
Actually… a chapter
![Page 3: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/3.jpg)
Outline
Kafka overview
Common multi data center patterns
Future stuff
![Page 4: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/4.jpg)
What is Kafka?
▪ It’s like a message queue, right?
- Actually, it’s a “distributed commit log”
- Or “streaming data platform”
[Figure: a log with offsets 0 to 8; a Data Source writes to it, and Data Consumer A and Data Consumer B read from it]
![Page 5: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/5.jpg)
Topics and Partitions
▪ Messages are organized into topics, and each topic is split into partitions.
- Each partition is an immutable, time-sequenced log of messages on disk.
- Note that time ordering is guaranteed within, but not across, partitions.
[Figure: a Data Source writes to a Topic made up of Partition 0, Partition 1 and Partition 2, each a log with offsets 0 to 8]
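A minimal sketch of why ordering holds within a partition but not across partitions. It models a topic as a list of in-memory logs and picks the partition by hashing the key. `partition_for` and `append` are hypothetical helpers, and `zlib.crc32` is only a stand-in hash, not Kafka's own partitioner.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition (crc32 is a stand-in hash, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append(topic, key, value):
    """Append to the partition chosen for the key; return (partition, offset)."""
    p = partition_for(key, len(topic))
    topic[p].append(value)
    return p, len(topic[p]) - 1


# A topic with 3 partitions, each an append-only log.
topic = [[] for _ in range(3)]
placements = [
    append(topic, key, f"{key}-event-{i}")
    for i in range(3)
    for key in ("alice", "bob")
]

# Every record for one key lands in the same partition, with increasing offsets.
alice = placements[0::2]
print(alice)
```

Records with the same key share one log, so their relative order is kept. Records with different keys may sit in different logs, and nothing orders those logs against each other.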
![Page 6: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/6.jpg)
Scalable consumption model
[Figure: Topic T1 with Partition 0 to Partition 3, read by Consumer Group 1. First the group has a single consumer (Consumer 1); then the group has four consumers (Consumer 1 to Consumer 4)]
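The two pictures on this slide can be sketched as a partition assignment within one consumer group. This is a simplified round-robin illustration, not Kafka's actual assignor. The `assign` function and the consumer names are hypothetical.

```python
def assign(partitions, consumers):
    """Spread the partitions of a topic over the consumers of one group."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = ["T1-0", "T1-1", "T1-2", "T1-3"]

# One consumer in the group: it reads all four partitions.
one = assign(partitions, ["consumer-1"])

# Four consumers in the group: each reads exactly one partition.
four = assign(partitions, ["consumer-1", "consumer-2", "consumer-3", "consumer-4"])

print(one)
print(four)
```

Adding consumers to the group spreads the same partitions over more readers, which is what makes consumption scale, up to one consumer per partition.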
![Page 7: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/7.jpg)
Kafka usage
![Page 8: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/8.jpg)
Common use case
Large scale real time data integration
![Page 9: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/9.jpg)
Other use cases
Scaling databases
Messaging
Stream processing
…
![Page 10: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/10.jpg)
Important things to remember:
1. Consumers commit offsets
2. Within a cluster – each partition has replicas
3. Inter-cluster replication, producer and consumer defaults – all tuned for LAN
![Page 11: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/11.jpg)
Why multiple data centers (DC)
Offload work from main cluster
Disaster recovery
Geo-localization
• Saving cross-DC bandwidth
• Better performance by being closer to users
• Some activity is just local
• Security / regulations
Cloud
Special case: Producers with network issues
![Page 12: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/12.jpg)
Why is this difficult?
1. It isn’t, really – you consume data from one cluster and produce to another
2. Network between two data centers can get tricky
3. Consumers have state (offsets) – syncing this between clusters gets tough
• And leads to some counterintuitive results
![Page 13: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/13.jpg)
Pattern #1: stretched cluster
Typically done on AWS in a single region
• Deploy Zookeeper and brokers across 3 availability zones
Rely on intra-cluster replication to replicate data across DCs
[Figure: one Kafka cluster stretched across DC 1, DC 2 and DC 3, with producers and consumers in each DC]
![Page 14: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/14.jpg)
On DC failure
Producer/consumer fail over to new DCs
• Existing data preserved by intra-cluster replication
• Consumer resumes from last committed offsets and will see same data
[Figure: the stretched Kafka cluster across DC 1, DC 2 and DC 3, now with only two sets of producers and consumers]
![Page 15: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/15.jpg)
When DC comes back
Intra-cluster replication automatically re-replicates all missing data
When re-replication completes, switch producer/consumer back
[Figure: one Kafka cluster stretched across DC 1, DC 2 and DC 3, with producers and consumers in each DC]
![Page 16: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/16.jpg)
Be careful with replica assignment
Don’t want all replicas in same AZ
Rack-aware support in 0.10.0
• Configure brokers in same AZ with same broker.rack
Manual assignment pre 0.10.0
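To illustrate the goal of rack awareness, here is a small sketch that places each replica of a partition in a different AZ. It is a simplified stand-in, not Kafka's real assignment algorithm. The AZ names, broker ids and `assign_replicas` function are assumptions made for the example; only `broker.rack` comes from the slide.

```python
def assign_replicas(brokers_by_rack, num_partitions, replication_factor):
    """Pick one broker per rack for each partition, rotating the starting rack."""
    racks = sorted(brokers_by_rack)
    if replication_factor > len(racks):
        raise ValueError("this sketch needs at least one rack per replica")
    assignment = {}
    for p in range(num_partitions):
        replicas = []
        for r in range(replication_factor):
            rack = racks[(p + r) % len(racks)]
            brokers = brokers_by_rack[rack]
            # Rotate through the brokers of a rack as partitions wrap around.
            replicas.append(brokers[(p // len(racks)) % len(brokers)])
        assignment[p] = replicas
    return assignment


# Hypothetical layout: the value each broker would set as broker.rack, mapped to broker ids.
brokers_by_rack = {"us-east-1a": [0, 3], "us-east-1b": [1, 4], "us-east-1c": [2, 5]}
assignment = assign_replicas(brokers_by_rack, num_partitions=6, replication_factor=3)
print(assignment)
```

With this layout, losing any single AZ still leaves two replicas of every partition.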
![Page 17: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/17.jpg)
Stretched cluster NOT recommended across regions
Asymmetric network partitioning
Longer network latency => longer produce/consume time
Cross region bandwidth: no read affinity in Kafka
[Figure: region 1, region 2 and region 3, each with Kafka and ZK]
![Page 18: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/18.jpg)
Pattern #2: active/passive
Producers in active DC
Consumers in either active or passive DC
[Figure: Kafka in DC 1 with producers and consumers; Replication to Kafka in DC 2 with consumers. Labels: Critical Apps, Nice Reports]
![Page 19: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/19.jpg)
Cross Datacenter Replication
Consumer & Producer: read from a source cluster and write to a target cluster
Per-key ordering preserved
Asynchronous: target always slightly behind
Offsets not preserved
• Source and target may not have same # partitions
• Retries for failed writes
Options:
• Confluent Multi-Datacenter Replication
• MirrorMaker
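The properties listed on this slide can be sketched with an in-memory model: a replicator is a consumer on the source plus a producer on the target. The sketch below uses integer keys and `key % num_partitions` as a stand-in partitioner. All names are hypothetical, and it is not how MirrorMaker is implemented.

```python
def partition_for(key, num_partitions):
    """Stand-in partitioner for integer keys (an assumption for this sketch)."""
    return key % num_partitions


def produce(cluster, key, value):
    """Append a record to the key's partition; return (partition, offset)."""
    p = partition_for(key, len(cluster))
    cluster[p].append((key, value))
    return p, len(cluster[p]) - 1


def mirror(source, target):
    """Consume each source partition in order and re-produce to the target."""
    for partition in source:
        for key, value in partition:
            produce(target, key, value)


def values_for(cluster, key):
    """All values for one key, in log order."""
    return [v for partition in cluster for k, v in partition if k == key]


source = [[] for _ in range(2)]  # source topic: 2 partitions
target = [[] for _ in range(3)]  # target topic: 3 partitions

events = [(i % 4, f"event-{i}") for i in range(12)]
source_positions = {value: produce(source, key, value) for key, value in events}
mirror(source, target)
target_positions = {
    value: (p, offset)
    for p, partition in enumerate(target)
    for offset, (_, value) in enumerate(partition)
}

print(values_for(source, 2), values_for(target, 2))
print(source_positions["event-2"], target_positions["event-2"])
```

The records for key 2 arrive in the same order on both sides, but `event-2` sits at a different (partition, offset) in each cluster. A committed offset from the source therefore means nothing on the target.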
![Page 20: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/20.jpg)
On active DC failure
Fail over producers/consumers to passive cluster
Challenge: which offset to resume consumption
• Offsets not identical across clusters
[Figure: Kafka in DC 1 with producers and consumers; Replication to Kafka in DC 2]
![Page 21: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/21.jpg)
Solutions for switching consumers
Resume from smallest offset
• Duplicates
Resume from largest offset
• May miss some messages (likely acceptable for real time consumers)
Replicate offsets topic
• May miss some messages, may get duplicates
Set offset based on timestamp
• Old API hard to use and not precise
• Better and more precise API in Apache Kafka 0.10.1 (Confluent 3.1)
• Nice tool coming up!
Preserve offsets during replication
• Harder to do
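A rough sketch of the timestamp-based option: on the passive cluster, find the earliest offset whose timestamp is at or after a chosen point in time and resume there. The Java consumer exposes this kind of lookup as `offsetsForTimes`. The code below only imitates that behaviour on an in-memory list, assuming timestamps within a partition never decrease. The timestamps and safety margin are made-up values.

```python
from bisect import bisect_left


def offset_for_time(partition_timestamps, target_ts):
    """Return the earliest offset with timestamp >= target_ts, or None if none exists."""
    offset = bisect_left(partition_timestamps, target_ts)
    return offset if offset < len(partition_timestamps) else None


# Record timestamps (ms) of one partition on the passive cluster, indexed by offset.
passive_partition = [1000, 1010, 1020, 1020, 1035, 1050]

# Timestamp of the last record the consumer processed before the failure.
last_processed_ts = 1020

# Rewind a little to cover replication lag; this trades misses for duplicates.
safety_margin_ms = 15
resume_at = offset_for_time(passive_partition, last_processed_ts - safety_margin_ms)
print(resume_at)
```

Rewinding by a margin means some records are processed twice, so this approach suits consumers that can tolerate or deduplicate repeats.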
![Page 22: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/22.jpg)
When DC comes back
Need to reverse replication• Same challenge: determining the offsets
[Figure: Kafka in DC 1 with producers and consumers; Replication between DC 1 and Kafka in DC 2]
![Page 23: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/23.jpg)
Limitations
Reconfiguration of replication after failover
Resources in passive DC under-utilized
![Page 24: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/24.jpg)
Pattern #3: active/active
Local-to-aggregate replication to avoid cycles
Producers/consumers in both DCs
• Producers only write to local clusters
[Figure: DC 1 and DC 2 each have a Kafka local cluster and a Kafka aggregate cluster, with producers and consumers in each DC; Replication connects the local clusters to the aggregate clusters]
![Page 25: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/25.jpg)
On DC failure
Same challenge on moving consumers on aggregate cluster
• Offsets in the 2 aggregate clusters not identical
• Unless the consumers are continuously running in both clusters
[Figure: DC 1 and DC 2, each with a Kafka local cluster, a Kafka aggregate cluster, producers and consumers; Replication between the DCs]
![Page 26: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/26.jpg)
[Figure: SF Kafka Cluster with all apps, serving West Coast users; Houston Kafka Cluster with all apps, serving South Central users]
![Page 27: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/27.jpg)
When DC comes back
No need to reconfigure replication
[Figure: DC 1 and DC 2, each with a Kafka local cluster, a Kafka aggregate cluster, producers and consumers; Replication between the DCs]
![Page 28: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/28.jpg)
Alternative: avoid aggregate clusters
Prefix topic names with DC tag
Configure replication to replicate remote topics only
Consumers need to subscribe to topics with both DC tags
[Figure: Kafka in DC 1 and Kafka in DC 2, each with producers and consumers; Replication between the two]
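A small sketch of the naming scheme on this slide, using regular expressions. The `dc1.` / `dc2.` prefix format, the topic names and both helper functions are assumptions made for illustration; the slide does not specify a format.

```python
import re


def replication_whitelist(source_dc):
    """Regex for the topics a replicator should copy out of source_dc's cluster.

    Only topics that originated in source_dc match, so topics that were
    themselves replicated into that cluster are never copied back (no cycles).
    """
    return re.compile(rf"^{re.escape(source_dc)}\..+$")


def subscription_pattern(name, dcs):
    """Regex a consumer can subscribe with to read one logical topic from every DC."""
    tags = "|".join(re.escape(dc) for dc in dcs)
    return re.compile(rf"^({tags})\.{re.escape(name)}$")


# Topics present in DC 2's cluster: two local ones and one replicated from DC 1.
topics_in_dc2 = ["dc2.clicks", "dc1.clicks", "dc2.orders"]

# The replicator feeding DC 1 copies only topics that are remote from DC 1's view.
to_copy = [t for t in topics_in_dc2 if replication_whitelist("dc2").match(t)]

clicks = subscription_pattern("clicks", ["dc1", "dc2"])
print(to_copy)
print([t for t in ["dc1.clicks", "dc2.clicks", "dc1.orders"] if clicks.match(t)])
```

Each cluster ends up holding both the local and the remote copy of a logical topic under different names, which is why consumers have to subscribe to both.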
![Page 29: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/29.jpg)
![Page 30: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/30.jpg)
Beyond 2 DCs
More DCs: better resource utilization
• With 2 DCs, each DC needs to provision 100% traffic
• With 3 DCs, each DC only needs to provision 50% traffic
Setting up replication with many DCs can be daunting
• Only set up aggregate clusters in 2-3 DCs
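The 100% and 50% figures follow from a simple calculation. It assumes the deployment must survive the loss of one DC and that the surviving DCs share the total traffic evenly. `capacity_per_dc` is a hypothetical helper written for this example.

```python
from fractions import Fraction


def capacity_per_dc(num_dcs, failures_tolerated=1):
    """Fraction of total traffic each DC must be able to carry on its own."""
    survivors = num_dcs - failures_tolerated
    if survivors < 1:
        raise ValueError("need at least one surviving DC")
    return Fraction(1, survivors)


for n in (2, 3, 4):
    share = capacity_per_dc(n)
    print(f"{n} DCs: each provisions {float(share):.0%} of total traffic")
```

Under the same assumptions, a fourth DC lowers the per-DC requirement to about a third of total traffic.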
![Page 31: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/31.jpg)
Comparison
| | Pros | Cons |
|---|---|---|
| Stretched | Better utilization of resources; easy failover for consumers | Still need cross region story |
| Active/passive | Needed for global ordering | Harder failover for consumers; reconfiguration during failover; resource under-utilization |
| Active/active | Better utilization of resources; can be used to avoid consumer failover | Can be challenging to manage; more replication bandwidth |
![Page 32: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/32.jpg)
Multi-DC beyond Kafka
Kafka often used together with other data stores
Need to make sure multi-DC strategy is consistent
![Page 33: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/33.jpg)
Example application
Consumer reads from Kafka and computes 1-min count
Counts need to be stored in DB and available in every DC
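A minimal sketch of the example application's core logic: counting events per key in one-minute windows. The event shape (timestamp in seconds, key) and the `one_minute_counts` function are assumptions. The Kafka consumer and the database write are left out.

```python
from collections import Counter


def one_minute_counts(events):
    """Count events per (window_start_seconds, key) in 1-minute tumbling windows."""
    counts = Counter()
    for timestamp, key in events:
        window_start = timestamp - timestamp % 60
        counts[(window_start, key)] += 1
    return counts


events = [
    (0, "page_view"),
    (30, "page_view"),
    (59, "click"),
    (60, "page_view"),
    (125, "click"),
]
counts = one_minute_counts(events)
for (window_start, key), n in sorted(counts.items()):
    print(window_start, key, n)
```

The counts depend only on the input events. Running the same consumer in two DCs should therefore yield the same counts, provided both aggregate clusters receive the same events.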
![Page 34: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/34.jpg)
Independent database per DC
Run same consumer concurrently in both DCs• No consumer failover needed
[Figure: DC 1 and DC 2, each with a Kafka local cluster, a Kafka aggregate cluster, producers, a consumer and its own DB; Replication between the DCs]
![Page 35: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/35.jpg)
Stretched database across DCs
Only run one consumer per DC at any given point of time
[Figure: DC 1 and DC 2, each with a Kafka local cluster, a Kafka aggregate cluster, producers and a consumer; a DB stretched across both DCs; one consumer is labelled "on failover"]
![Page 36: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/36.jpg)
Practical tips
• Consume remote, produce local
- Unless you need encrypted data on the wire
• Monitor!
- Burrow for replication lag
- Confluent Control Center for end-to-end
- JMX metrics for rates and “busy-ness”
• Tune!
- Producer / Consumer tuning
- Number of consumers, producers
- TCP tuning for WAN
• Don’t forget to replicate configuration
• Separate critical topics from nice-to-have topics
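One common starting point for "TCP tuning for WAN" is the bandwidth-delay product: the number of bytes that must be in flight to keep a link full. The sketch below computes it. The link speeds and round-trip times are made-up numbers, and the result is only a rough guide when sizing socket buffers, such as the client settings `send.buffer.bytes` and `receive.buffer.bytes`, alongside the operating system's own limits.

```python
def bandwidth_delay_product_bytes(bandwidth_mbps, rtt_ms):
    """Bytes in flight needed to fill a link of the given speed and round-trip time."""
    bits_in_flight = bandwidth_mbps * 1_000_000 * rtt_ms // 1000
    return bits_in_flight // 8


# Hypothetical LAN link: 1 Gbps, 1 ms round trip.
lan = bandwidth_delay_product_bytes(1000, 1)

# Hypothetical cross-DC link: 1 Gbps, 80 ms round trip.
wan = bandwidth_delay_product_bytes(1000, 80)

print(f"LAN: {lan:,} bytes, WAN: {wan:,} bytes")
```

A buffer sized for the LAN case covers only a small part of what the WAN link in this example could carry, which is one reason defaults tuned for LAN can underperform across data centers.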
![Page 37: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/37.jpg)
Future work
Offset reset tool
Offset preservation
“Remote Replicas”
2-DC stretch cluster
Other cool Kafka future:
• Exactly Once
• Transactions
• Headers
![Page 38: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/38.jpg)
THANK YOU!
Gwen Shapira | [email protected] | @gwenshap
Kafka Training with Confluent University
• Kafka Developer and Operations Courses
• Visit www.confluent.io/training
Want more Kafka?
• Download Confluent Platform Enterprise at http://www.confluent.io/product
• Apache Kafka 0.10.2 upgrade documentation at http://docs.confluent.io/3.2.0/upgrade.html
• Kafka Summit recordings now available at http://kafka-summit.org/schedule/
![Page 39: Multi-Datacenter Kafka - Strata San Jose 2017](https://reader036.vdocuments.mx/reader036/viewer/2022070600/58e4a1431a28aba3458b6063/html5/thumbnails/39.jpg)
Discount code: kafstrata
Special Strata Attendee discount code = 25% off www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28