kafka
TRANSCRIPT
Distributed Message Broker
Presented by Majid Hajibaba
12 April 2015
What? A messaging system (integration between producers and consumers)
Distributed Peer-to-Peer
High-throughput
Fault Tolerant
Replicated
Developed at LinkedIn
Why? Log aggregation
Stream processing
real-time processing
as the output sink
act as a buffer or feeder for messages
What is Kafka?
Topics: categories in which feeds of messages are maintained
Producers: processes that publish messages to a Kafka topic
Consumers: processes that subscribe to topics and process the feed of published messages
Brokers: servers that form a Kafka cluster and act as a data transport channel between producers and consumers
Terminology
The Kafka architecture
A Kafka cluster
Stateless brokers (consumers keep track of their own offsets)
Topic is a queue
Have multiple partitions (scaling, parallelism)
Consumed by multiple consumers
Reads and writes can happen to each partition in parallel
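A minimal sketch of how a topic with several partitions spreads writes, assuming messages carry a key. The names `choose_partition` and `publish` and the CRC32 hashing scheme are hypothetical and chosen for illustration; they are not Kafka's client API or its actual partitioner.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index (illustrative hashing scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(topic: list, key: str, value: str) -> int:
    """Append value to the partition selected by key and return its index."""
    index = choose_partition(key, len(topic))
    topic[index].append(value)
    return index


# A toy topic with 4 partitions, each modeled as an independent list.
topic = [[] for _ in range(4)]
for i in range(8):
    publish(topic, f"user-{i % 3}", f"event-{i}")
```

Because each partition is a separate list, appends and reads on different partitions do not touch shared state, which is what allows them to proceed in parallel.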
Topic
partitions ≈ directories
Should consumers pull data from brokers?
Should brokers push data to the consumers?
Pull vs. Push
push
pull
Reads are done by giving the 64-bit logical offset of a message and an S-byte max chunk size
The write allows serial appends which always go to the last file.
the maximum possible rate
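The read and write rules above can be sketched with a toy in-memory log. This is an illustration under simplifying assumptions, not Kafka's storage code: `PartitionLog` is a hypothetical name, offsets are plain message indexes, and the chunk limit counts payload bytes only.

```python
class PartitionLog:
    """Toy append-only log: offsets are message indexes starting at 0."""

    def __init__(self):
        self._messages = []

    def append(self, payload: bytes) -> int:
        """Serial append to the end of the log; returns the new offset."""
        self._messages.append(payload)
        return len(self._messages) - 1

    def read(self, offset: int, max_bytes: int) -> list:
        """Return consecutive messages from offset, up to max_bytes in total."""
        chunk, used = [], 0
        for payload in self._messages[offset:]:
            if used + len(payload) > max_bytes:
                break
            chunk.append(payload)
            used += len(payload)
        return chunk


log = PartitionLog()
for i in range(5):
    log.append(f"msg-{i}".encode())  # each payload is 5 bytes

# Pull loop: the consumer asks for the next chunk at its own pace.
offset, consumed = 0, []
while True:
    chunk = log.read(offset, max_bytes=12)  # room for two 5-byte messages
    if not chunk:
        break
    consumed.extend(chunk)
    offset += len(chunk)
```

In this sketch a message larger than `max_bytes` would yield an empty chunk and end the loop, so a real consumer would need a limit at least as large as the biggest message.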
Synchronous send
Producers get an acknowledgment back when they publish a message
Asynchronous send
does not guarantee message delivery
Batching
The producer attempts to accumulate data in memory and send larger batches in a single request
Load balancing
The client controls which partition it publishes to
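A minimal sketch of producer-side batching, assuming a fixed batch size counted in messages. `BatchingProducer` and its `send_request` callback are hypothetical names for illustration and do not correspond to Kafka's producer API.

```python
class BatchingProducer:
    """Toy producer that buffers messages and sends them in batches."""

    def __init__(self, send_request, batch_size: int):
        self._send_request = send_request  # callable that receives one batch
        self._batch_size = batch_size
        self._buffer = []

    def send(self, message) -> None:
        """Buffer a message; issue one request once the batch is full."""
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as a single request."""
        if self._buffer:
            self._send_request(list(self._buffer))
            self._buffer.clear()


requests = []  # stands in for the network: each entry is one request
producer = BatchingProducer(requests.append, batch_size=3)
for i in range(7):
    producer.send(f"m{i}")
producer.flush()  # push out the final partial batch
```

Seven messages go out as three requests instead of seven. Anything still sitting in the buffer is lost if the process dies before `flush`, which is one way to see why asynchronous sending does not guarantee delivery.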
Producer
Consumer
partition p
partition q
partition p , q
Kafka Storage Architecture
Each log file is named with the offset of the first message it contains
file is rolled over to a fresh file when it reaches a configurable size
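The naming and rollover rule can be sketched as follows. This is a toy model under stated assumptions: `SegmentedLog` is a hypothetical name, the rollover threshold counts messages rather than bytes, and the zero-padded `.log` file name format is illustrative.

```python
import bisect


class SegmentedLog:
    """Toy log split into segments keyed by the offset of their first message."""

    def __init__(self, max_segment_messages: int):
        self._max = max_segment_messages
        self.segments = {}      # base offset -> list of messages
        self._active = None     # base offset of the segment being written
        self._next_offset = 0

    def append(self, message) -> int:
        """Append to the active segment, rolling to a fresh one when full."""
        if self._active is None or len(self.segments[self._active]) >= self._max:
            self._active = self._next_offset
            self.segments[self._active] = []
        self.segments[self._active].append(message)
        offset = self._next_offset
        self._next_offset += 1
        return offset

    def segment_names(self) -> list:
        """File-style names derived from each segment's first offset."""
        return [f"{base:020d}.log" for base in sorted(self.segments)]

    def find_segment(self, offset: int) -> int:
        """Base offset of the segment that holds the given offset."""
        bases = sorted(self.segments)
        return bases[bisect.bisect_right(bases, offset) - 1]


seg_log = SegmentedLog(max_segment_messages=3)
for i in range(7):
    seg_log.append(f"m{i}")
```

Naming each segment after its first offset means a lookup only needs a binary search over the sorted base offsets, as `find_segment` does.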
n replicas can afford n-1 failures
one replica acts as the lead replica
lead replica maintains the list of all in-sync follower replicas
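A minimal sketch of why n replicas can survive n-1 failures when a lead replica tracks the in-sync set. `ReplicatedPartition` is a hypothetical name, and promoting the first remaining in-sync replica is an assumption made for illustration, not a description of Kafka's election procedure.

```python
class ReplicatedPartition:
    """Toy partition with one lead replica and a list of in-sync replicas."""

    def __init__(self, replica_ids):
        self.leader = replica_ids[0]
        self.in_sync = list(replica_ids)  # the leader plus in-sync followers

    def fail(self, replica_id) -> None:
        """Drop a failed replica; promote an in-sync follower if it led."""
        self.in_sync.remove(replica_id)
        if replica_id == self.leader:
            self.leader = self.in_sync[0] if self.in_sync else None

    def available(self) -> bool:
        """The partition stays usable while any in-sync replica remains."""
        return self.leader is not None


partition = ReplicatedPartition([1, 2, 3])
partition.fail(1)  # the leader fails, so replica 2 takes over
partition.fail(3)  # a second failure; replica 2 is the only one left
```

With three replicas the partition is still available after two failures. A third failure leaves no in-sync replica to promote.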
Replication
bin/kafka-topics.sh --create --zookeeper 192.168.11.185:2181 --replication-factor 3 --partitions 4 --topic test
At most once: Messages may be lost but are never redelivered
At least once: Messages are never lost but may be redelivered
Exactly once: this is what people actually want, each message is delivered once and only once
When publishing a message:
At most once: without any acknowledgment
Exactly once: can be achieved by producer and acknowledgment
When consuming a message:
At most once: by sending ack after taking the message
At least once: by sending ack after processing message
Exactly once: requires co-operation with the destination storage system
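The consumer-side cases can be sketched with a small simulation. It assumes a consumer that crashes once, between its two steps, and restarts from the last acknowledged offset. The function `consume` and its parameters are hypothetical names for illustration.

```python
def consume(messages, ack_first: bool, crash_at: int) -> list:
    """Simulate a consumer that crashes once while handling messages[crash_at].

    ack_first=True  acknowledges before processing (at most once).
    ack_first=False processes before acknowledging (at least once).
    The crash happens between the two steps; the consumer then restarts
    from the last acknowledged position.
    """
    committed = 0      # next offset to read after a restart
    processed = []
    crashed = False
    while committed < len(messages):
        offset = committed
        steps = ["ack", "process"] if ack_first else ["process", "ack"]
        for i, step in enumerate(steps):
            if step == "ack":
                committed = offset + 1
            else:
                processed.append(messages[offset])
            if i == 0 and offset == crash_at and not crashed:
                crashed = True
                break  # crash after the first step; restart from committed
    return processed


at_most_once = consume(["a", "b", "c"], ack_first=True, crash_at=1)
at_least_once = consume(["a", "b", "c"], ack_first=False, crash_at=1)
```

Here `at_most_once` is `['a', 'c']` because message `b` was acknowledged but never processed, while `at_least_once` is `['a', 'b', 'b', 'c']` because `b` was processed again after the restart.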
Message Delivery Semantics
Adding new server
Just assign a unique broker id and start up Kafka on it
Will connect to others through zookeeper
Will not automatically be assigned any data partitions
Won't be doing any work until new topics are created
Should migrate some existing data to these machines
Data migration is manually initiated but fully automated
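A minimal sketch of what migrating partitions to a newly added broker amounts to. The `rebalance` function and its "move from the busiest to the least loaded broker" rule are assumptions for illustration; this is not Kafka's reassignment tool or its algorithm.

```python
def rebalance(assignment: dict, brokers: list) -> dict:
    """Return a new partition -> broker map with loads differing by at most 1."""
    assignment = dict(assignment)
    load = {broker: [] for broker in brokers}
    for partition, broker in sorted(assignment.items()):
        load[broker].append(partition)
    while True:
        busiest = max(brokers, key=lambda b: len(load[b]))
        idlest = min(brokers, key=lambda b: len(load[b]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            return assignment
        moved = load[busiest].pop()  # move one partition per iteration
        load[idlest].append(moved)
        assignment[moved] = idlest


# Brokers 1 and 2 hold all four partitions; broker 3 has just joined.
before = {0: 1, 1: 2, 2: 1, 3: 2}
after = rebalance(before, brokers=[1, 2, 3])
```

In `before`, broker 3 holds nothing, matching the point that a new server is not automatically assigned partitions. In `after`, partition 2 has moved to broker 3.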
Scaling
no replication, no partition
Linux virtual machine, 2.6 GHz Intel Xeon, 2 GB memory
publish a total of 1 million messages each of 300 bytes (300 MB)
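The arithmetic behind these figures, as a short check. It assumes decimal megabytes (1 MB = 1,000,000 bytes); the 40000 messages/second value is the throughput reported on the performance chart in this deck.

```python
MESSAGE_BYTES = 300
TOTAL_MESSAGES = 1_000_000
BYTES_PER_MB = 1_000_000  # decimal megabytes (assumption)

# Total volume published in the test.
total_mb = MESSAGE_BYTES * TOTAL_MESSAGES / BYTES_PER_MB


def throughput_mb_per_sec(messages_per_sec: int,
                          message_bytes: int = MESSAGE_BYTES) -> float:
    """Convert a message rate into a data rate in MB/second."""
    return messages_per_sec * message_bytes / BYTES_PER_MB


reported_rate = throughput_mb_per_sec(40_000)
```

`total_mb` comes out to 300.0 and `reported_rate` to 12.0, matching the 300 MB and 12 MB/second stated on the slides.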
Performance Test
[Chart: throughput in messages/sec (y-axis, 7000 to 42000) against accumulated messages in MB (x-axis, 300 to 1200). Series: 1, 2, 4, 8, and 16 producers, plus 16 producers with 2 CPUs for the broker. Annotation: 40000 messages/second, 12 MB/second.]
no replication, 2 partitions
Scalability test
[Chart: throughput in messages/sec (y-axis, 0 to 50000) against number of producers (1, 2, 4, 8, 16). Series: 1 broker, 2 brokers.]
Single node – multiple broker
Multiple node – multiple broker
Kafka Usage at LinkedIn
List Topics
Create Topics
Delete Topic
Commands
bin/kafka-topics.sh --list --zookeeper 192.168.11.185:2181
bin/kafka-topics.sh --create --zookeeper 192.168.11.185:2181 --replication-factor 1 --partitions 1 --topic test
bin/kafka-topics.sh --delete --zookeeper 192.168.11.185:2181 --topic test
END
Any questions?