kafka

Distributed Message Broker Presented by Majid Hajibaba 12 April 2015 Majid Hajibaba 1

Upload: majid-hajibaba

Post on 18-Jul-2015

Category:

Software


TRANSCRIPT

Page 1: Kafka

Distributed Message Broker

Presented by Majid Hajibaba

12 April 2015

Page 2: Kafka

What? A messaging system (integration between producers and consumers)

Distributed Peer-to-Peer

High-throughput

Fault Tolerant

Replicated

Developed at LinkedIn

Why? Log aggregation

Stream processing

real-time processing

as the output sink

act as a buffer or feeder for messages

What is Kafka?

Page 3: Kafka

Topics: categories in which a message feed is maintained

Producers: processes that publish messages to a Kafka topic

Consumers: processes that subscribe to topics and process the feed of published messages

Brokers: servers that form a Kafka cluster and act as a data transport channel between producers and consumers

Terminology

Page 4: Kafka

The Kafka architecture

A Kafka cluster

Stateless brokers

Page 5: Kafka

A topic is a queue

Has multiple partitions (scaling, parallelism)

Can be consumed by multiple consumers

Reads and writes can happen to each partition in parallel

Topic

partitions ≈ directories
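A minimal sketch of how a topic split into partitions lets writes proceed independently per partition. This is a toy in-memory model, not Kafka's code: the `ToyTopic` class, its method names, and the CRC32-based key hashing are all assumptions made for illustration.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic split into partitions (illustrative only)."""

    def __init__(self, num_partitions):
        # One append-only list per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stable hash, so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append(value)
        # Return the partition and the message's position within it.
        return p, len(self.partitions[p]) - 1


topic = ToyTopic(num_partitions=4)
print(topic.publish("user-1", "login"))
print(topic.publish("user-1", "logout"))
```

Because each partition is its own append-only sequence, messages with the same key stay ordered relative to each other, while different partitions can be written and read in parallel.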

Page 6: Kafka

Should consumers pull data from brokers?

Or should brokers push data to consumers?

Pull vs. Push

push

pull

Reads are done by giving the 64-bit logical offset of a message and an S-byte max chunk size

Writes are serial appends that always go to the last file.

the maximum possible rate
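The pull model described on this slide can be sketched as a log that supports serial appends and reads addressed by a logical offset plus a maximum chunk size. This is a simplified in-memory model; the `ToyLog` class and its methods are hypothetical names, and real brokers read from files on disk rather than a Python list.

```python
class ToyLog:
    """Append-only log; a message's list index serves as its logical offset."""

    def __init__(self):
        self.messages = []

    def append(self, payload: bytes) -> int:
        # Serial append: new messages always go to the end.
        self.messages.append(payload)
        return len(self.messages) - 1

    def read(self, offset: int, max_bytes: int):
        # Return (offset, payload) pairs starting at `offset`,
        # stopping before the chunk would exceed `max_bytes`.
        chunk, size = [], 0
        for off in range(offset, len(self.messages)):
            payload = self.messages[off]
            if size + len(payload) > max_bytes:
                break
            chunk.append((off, payload))
            size += len(payload)
        return chunk


log = ToyLog()
for payload in (b"aaa", b"bbbb", b"cc"):
    log.append(payload)

# The consumer tracks its own position and pulls at its own pace.
next_offset = 0
while True:
    chunk = log.read(next_offset, max_bytes=7)
    if not chunk:
        break
    next_offset = chunk[-1][0] + 1
```

In this sketch the consumer, not the broker, decides when to fetch and where to resume, which is the essence of the pull side of the comparison.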

Page 7: Kafka

Synchronous send

Producers get an acknowledgment back when they publish a message

Asynchronous send

does not guarantee message delivery

Batching

The producer attempts to accumulate data in memory and send out larger batches in a single request

Load balancing

The client controls which partition it publishes messages to

Producer
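The batching behaviour can be sketched as a producer that buffers messages in memory and hands them to the transport one batch at a time. The `BatchingProducer` class, its `send` callback, and the `batch_size` parameter are hypothetical names chosen for this illustration; they are not the Kafka client API.

```python
class BatchingProducer:
    """Buffers messages and sends them in batches (illustrative only)."""

    def __init__(self, send, batch_size):
        self.send = send              # callable that takes a list of messages
        self.batch_size = batch_size
        self.buffer = []

    def publish(self, message):
        self.buffer.append(message)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Send whatever has accumulated as a single request.
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()


requests = []                          # stands in for the network
producer = BatchingProducer(requests.append, batch_size=3)
for i in range(7):
    producer.publish(i)
producer.flush()                       # send the final partial batch
print(requests)
```

Anything still sitting in the buffer is lost if the process dies before `flush` runs, which is one way to see why an asynchronous send does not guarantee delivery.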

Page 8: Kafka

Consumer

[Figure labels: partition p; partition q; partitions p, q]

Page 9: Kafka

Kafka Storage Architecture

Each log file is named with the offset of the first message it contains

A log file is rolled over to a fresh file when it reaches a configurable size
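The naming and roll-over rules above can be sketched as follows. The 20-digit zero-padded file name mirrors the segment names Kafka writes on disk; everything else (the `ToySegmentedLog` class, keeping segments in a dictionary, rolling just before the size limit would be exceeded) is a simplifying assumption for illustration.

```python
def segment_name(base_offset: int) -> str:
    # Name a segment after the offset of the first message it contains.
    return f"{base_offset:020d}.log"


class ToySegmentedLog:
    """Append-only log split into size-limited segments (illustrative only)."""

    def __init__(self, max_segment_bytes):
        self.max_segment_bytes = max_segment_bytes
        self.segments = {}        # segment name -> list of payloads
        self.active = None        # name of the segment currently written to
        self.active_bytes = 0
        self.next_offset = 0

    def append(self, payload: bytes) -> int:
        full = self.active_bytes + len(payload) > self.max_segment_bytes
        if self.active is None or full:
            # Roll over: start a fresh segment named after the next offset.
            self.active = segment_name(self.next_offset)
            self.segments[self.active] = []
            self.active_bytes = 0
        self.segments[self.active].append(payload)
        self.active_bytes += len(payload)
        offset = self.next_offset
        self.next_offset += 1
        return offset


log = ToySegmentedLog(max_segment_bytes=10)
for _ in range(5):
    log.append(b"abcd")
print(list(log.segments))
```

Naming each file after its first offset means the segment holding a given offset can be found by comparing names alone, without opening any file.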

Page 10: Kafka

n replicas can tolerate n-1 failures

One replica acts as the lead replica

The lead replica maintains the list of all in-sync follower replicas

Replication
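A rough sketch of the lead replica's bookkeeping: it tracks how far each follower has replicated and treats a message as committed once every in-sync follower has it. The `ToyLeader` class and its method names are hypothetical, and the model leaves out leader election, timeouts, and followers rejoining.

```python
class ToyLeader:
    """Lead replica tracking its in-sync followers (illustrative only)."""

    def __init__(self, followers):
        self.log = []
        self.acked = {f: 0 for f in followers}   # follower -> messages replicated
        self.in_sync = set(followers)            # the in-sync follower list

    def append(self, message):
        self.log.append(message)

    def follower_ack(self, follower, upto):
        # The follower reports it has replicated the first `upto` messages.
        self.acked[follower] = upto

    def drop_from_isr(self, follower):
        # A follower that fails or falls too far behind leaves the list.
        self.in_sync.discard(follower)

    def committed(self):
        # Number of messages held by the leader and every in-sync follower.
        if not self.in_sync:
            return len(self.log)
        return min(len(self.log), min(self.acked[f] for f in self.in_sync))


leader = ToyLeader(["broker-2", "broker-3"])
leader.append("m0")
leader.append("m1")
leader.follower_ack("broker-2", 2)
leader.follower_ack("broker-3", 1)
print(leader.committed())        # limited by the slowest in-sync follower
leader.drop_from_isr("broker-3")
print(leader.committed())
```

With three replicas in this model, two can be lost and the committed messages still survive on the remaining one, which matches the n-1 failure figure on the slide.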

Page 11: Kafka

Replication

bin/kafka-topics.sh --create --zookeeper 192.168.11.185:2181 --replication-factor 3 --partitions 4 --topic test

Page 12: Kafka

At most once: Messages may be lost but are never redelivered

At least once: Messages are never lost but may be redelivered

Exactly once: this is what people actually want; each message is delivered once and only once

When publishing a message?

At most once: without any acknowledgment

Exactly once: can be achieved by producer and acknowledgment

When consuming a message?

At most once: by sending ack after taking the message

At least once: by sending ack after processing message

Exactly once: requires co-operation with the destination storage system

Message Delivery Semantics
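The consumer-side difference between at-most-once and at-least-once comes down to whether the acknowledgment is sent before or after processing. The simulation below is a sketch under that assumption; the `consume` function, its `mode` strings, and the `crash_at` parameter are invented for illustration.

```python
def consume(messages, committed, mode, crash_at=None):
    """Process messages from position `committed`; optionally crash at one offset.

    Returns (offsets processed in this run, committed position afterwards).
    """
    processed = []
    for offset in range(committed, len(messages)):
        if mode == "at_most_once":
            committed = offset + 1            # ack right after taking the message
            if offset == crash_at:
                return processed, committed   # crash before processing: message lost
            processed.append(offset)
        else:                                 # "at_least_once"
            processed.append(offset)          # process first
            if offset == crash_at:
                return processed, committed   # crash before ack: message redelivered
            committed = offset + 1
    return processed, committed


msgs = ["a", "b", "c"]
for mode in ("at_most_once", "at_least_once"):
    first, pos = consume(msgs, 0, mode, crash_at=1)
    second, pos = consume(msgs, pos, mode)    # restart from the committed position
    print(mode, first + second)
```

In the at-most-once run offset 1 is never processed, while in the at-least-once run it is processed twice. Avoiding both outcomes needs the destination system to store the position together with the output, as the slide notes.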

Page 13: Kafka

Adding new server

Just assign a unique broker id and start up Kafka on it

Will connect to the other brokers through ZooKeeper

Will not automatically be assigned any data partitions

Won't be doing any work until new topics are created

Should migrate some existing data to these machines

Data migration is manually initiated but fully automated

Scaling

Page 14: Kafka

no replication, no partition

Linux virtual machine, 2.6 GHz Intel Xeon, 2 GB memory

Publish a total of 1 million messages of 300 bytes each (300 MB)

Performance Test

[Figure: throughput in messages/sec (axis 7,000 to 42,000) versus accumulated messages in MB (300, 600, 900, 1200). Series: 1, 2, 4, 8, and 16 producers, and 16 producers with 2 CPUs for the broker.]

Peak: 40,000 messages/second (12 MB/second)
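The figures on this slide can be checked with a few lines of arithmetic, assuming decimal megabytes (1 MB = 1,000,000 bytes). The variable names are illustrative, and the time estimate assumes the peak rate is sustained for the whole run, which the slide does not claim.

```python
MESSAGE_BYTES = 300
TOTAL_MESSAGES = 1_000_000
PEAK_MESSAGES_PER_SEC = 40_000

total_mb = MESSAGE_BYTES * TOTAL_MESSAGES / 1_000_000
peak_mb_per_sec = MESSAGE_BYTES * PEAK_MESSAGES_PER_SEC / 1_000_000
seconds_at_peak = TOTAL_MESSAGES / PEAK_MESSAGES_PER_SEC

print(total_mb)          # 300.0 MB published in total
print(peak_mb_per_sec)   # 12.0 MB/second at the peak rate
print(seconds_at_peak)   # 25.0 seconds if the peak rate were sustained
```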

Page 15: Kafka

No replication, 2 partitions

Scalability test

[Figure: throughput in messages/sec (axis 0 to 50,000) versus number of producers (1, 2, 4, 8, 16). Series: 1 broker, 2 brokers.]

Page 16: Kafka

Single node – multiple broker

Page 17: Kafka

Multiple node – multiple broker

Page 18: Kafka

Kafka Usage at LinkedIn

Page 19: Kafka

Commands

List topics:

bin/kafka-topics.sh --list --zookeeper 192.168.11.185:2181

Create a topic:

bin/kafka-topics.sh --create --zookeeper 192.168.11.185:2181 --replication-factor 1 --partitions 1 --topic test

Delete a topic:

bin/kafka-topics.sh --delete --zookeeper 192.168.11.185:2181 --topic test

Page 20: Kafka

END

Any questions?