apache kafka - 'a system optimized for writing' · 2019-12-21 · whoarewe...


Post on 26-Apr-2020


Apache Kafka... “a system optimized for writing”

Bernhard Hopfenmüller

23 October 2018

whoami

Bernhard Hopfenmüller
IT Consultant @ ATIX AG

IRC: Fobhep
github.com/Fobhep

#atix #ossummit


whoarewe

The Linux & Open Source Company
Unterschleißheim @ München

over 15 years
datacenter automation, Linux

Consulting, Engineering, Support, Training


Kafka

Quora.com

What is the relation between Kafka, the writer, and Apache Kafka, the distributed messaging system?

Jay Kreps: I thought that since Kafka was a system optimized for writing, using a writer's name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project.


- developed by LinkedIn, open source since 2011
- 2014: foundation of Confluent


Messaging Systems

Why do we need a messaging system?

- Challenge 1: Sender not available
- Challenge 2: Sending too much (DoS)
- Challenge 3: Receiver crash upon processing


Queues vs Topics

Supermarket vs Television (Source: [1])

- Supermarket (queue): wait until it's your turn
- Television (topic): choose what you want to receive
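The supermarket/television analogy can be made concrete with a toy in-memory model. This is a minimal sketch of the two delivery styles only, not how Kafka stores or delivers data; the class names ToyQueue and ToyTopic are hypothetical.

```python
from collections import deque


class ToyQueue:
    """Supermarket: every message is handed to exactly one consumer."""

    def __init__(self):
        self._items = deque()

    def put(self, msg):
        self._items.append(msg)

    def get(self):
        # Taking a message removes it, so no other consumer will see it.
        return self._items.popleft() if self._items else None


class ToyTopic:
    """Television: every subscriber receives every message."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = []

    def publish(self, msg):
        for inbox in self._inboxes.values():
            inbox.append(msg)

    def received(self, name):
        return list(self._inboxes[name])


queue = ToyQueue()
queue.put("a")
queue.put("b")
first, second = queue.get(), queue.get()  # two consumers take turns

topic = ToyTopic()
topic.subscribe("alice")
topic.subscribe("bob")
topic.publish("news")  # both subscribers get their own copy
```

In Kafka itself nothing is pushed into inboxes: consumers pull from a stored, partitioned log.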


Kafka - Basic structure


Use Cases

- Messaging (as an alternative to ActiveMQ or RabbitMQ)
- Website Activity Tracking
- Metrics
- Log Aggregation
- Stream Processing
  - e.g. together with Apache Storm or Apache Samza
- Commit Log


Topics I

- core component of Kafka
- is filled by producers
- consists of one or more partitions


Topics II

- the producer can choose the partition (e.g. via the message key)
- each partition has its own running offset
- a message is identified by its partition and offset
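Partition choice and per-partition offsets can be sketched as a toy Python model. Assumption: Kafka's default partitioner hashes the key with murmur2; here zlib.crc32 stands in for it, so the concrete partition numbers differ from a real cluster. The class name is hypothetical.

```python
import zlib


class ToyPartitionedTopic:
    """Toy topic: a list of partitions, each an append-only list of (key, value)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def choose_partition(self, key, partition=None):
        # The producer may name a partition explicitly ...
        if partition is not None:
            return partition
        # ... otherwise the key is hashed (crc32 here, NOT Kafka's murmur2),
        # so equal keys always land in the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value, partition=None):
        p = self.choose_partition(key, partition)
        self.partitions[p].append((key, value))
        offset = len(self.partitions[p]) - 1  # running offset, counted per partition
        return p, offset


topic = ToyPartitionedTopic(num_partitions=3)
p1, o1 = topic.append("user-42", "login")
p2, o2 = topic.append("user-42", "logout")
p3, o3 = topic.append("user-7", "login", partition=0)
```

The pair (partition, offset) is enough to address one message, and ordering only holds inside a single partition.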


Topics III

- messages are stored physically (persisted to disk)!
- key-value principle: every message consists of a key and a value


Topics V

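- clean-up policies:
  - default: retention time (delete old data after x days; retention.ms, 7 days by default)
  - retention size (delete old data once a partition holds more than x bytes; retention.bytes)
  - log compaction (replace a key's old value with the new one; cleanup.policy=compact)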
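The three clean-up policies, modeled on an in-memory list of (timestamp, key, value) records. This is a simplification: real Kafka deletes or compacts whole log segments in the background rather than single records, and the function names are hypothetical.

```python
def apply_retention(log, now, retention_seconds=None, retention_bytes=None):
    """Delete policy: drop the oldest records by age and/or by total size.

    log is a list of (timestamp, key, value) tuples, oldest first; value is bytes.
    """
    kept = list(log)
    if retention_seconds is not None:
        # retention time: forget everything older than the limit
        kept = [r for r in kept if now - r[0] <= retention_seconds]
    if retention_bytes is not None:
        # retention size: remove from the old end until the data fits
        while kept and sum(len(r[2]) for r in kept) > retention_bytes:
            kept.pop(0)
    return kept


def compact(log):
    """Compaction: keep only the newest record per key, preserving log order."""
    newest = {}
    for index, (_, key, _) in enumerate(log):
        newest[key] = index  # a later record for the same key wins
    return [r for i, r in enumerate(log) if newest[r[1]] == i]


log = [(0, "a", b"1"), (10, "b", b"22"), (20, "a", b"333")]
```

Compaction never loses the current value of a key, which is why a compacted topic can serve as a key-value store.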


Topic consumption

- topics are pulled by consumers, not pushed! (no DoS)
- any data that still exists can be pulled, including older data
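A toy pull-based consumer. The method names seek and poll loosely echo the real Kafka consumer API (and max_records its max.poll.records setting), but this class is a hypothetical in-memory sketch, not a client.

```python
class ToyPullConsumer:
    """Reads one partition by pulling batches; the consumer sets the pace."""

    def __init__(self, partition_log):
        self.log = partition_log
        self.position = 0  # offset of the next record to read

    def seek(self, offset):
        # Jump anywhere in the retained data, e.g. rewind to re-read it.
        self.position = offset

    def poll(self, max_records=2):
        # Never returns more than asked for, so a fast producer cannot flood us.
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch


consumer = ToyPullConsumer(["m0", "m1", "m2", "m3", "m4"])
first = consumer.poll()
second = consumer.poll()
consumer.seek(1)
again = consumer.poll(max_records=3)
```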


Consumer Groups

- parallelism allows high throughput
- never more consumers than partitions (surplus consumers stay idle)
- Kafka supports exactly-once semantics (since version 0.11)!
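How partitions could be spread over the members of a consumer group, in the spirit of Kafka's range assignor (contiguous chunks, earlier members get the remainder). A simplified single-topic sketch; the function name is hypothetical. It also shows why more consumers than partitions makes no sense: the extra members get nothing.

```python
def assign_partitions(partitions, consumers):
    """Give each consumer a contiguous chunk of partitions; each partition gets one owner."""
    members = sorted(consumers)
    per_member, remainder = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < remainder else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


two_members = assign_partitions([0, 1, 2, 3, 4], ["c2", "c1"])
too_many = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
```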


Wait, but who knows what has been read?

- Consumers commit their offsets
- Upon failure, re-processing is possible
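What committing offsets buys, as a toy model: the committed offset is the next one to read, so a consumer that crashes after processing a record but before committing re-reads that record after a restart (at-least-once). OffsetStore stands in for wherever Kafka keeps committed offsets; all names are hypothetical.

```python
class OffsetStore:
    """Stands in for where committed offsets live, keyed by (group, partition)."""

    def __init__(self):
        self.committed = {}

    def commit(self, group, partition, next_offset):
        self.committed[(group, partition)] = next_offset

    def fetch(self, group, partition):
        return self.committed.get((group, partition), 0)  # nothing committed: start at 0


def consume(log, store, group, partition, crash_after=None):
    """Process records from the committed offset on, committing after each one.

    crash_after=n simulates a crash right after processing the n-th record,
    before its offset is committed.
    """
    processed = []
    offset = store.fetch(group, partition)
    while offset < len(log):
        processed.append(log[offset])  # "process" the record
        if crash_after is not None and len(processed) == crash_after:
            return processed  # crash: this record's offset is never committed
        offset += 1
        store.commit(group, partition, offset)  # commit the NEXT offset to read
    return processed


log = ["a", "b", "c"]
store = OffsetStore()
before_crash = consume(log, store, "billing", 0, crash_after=2)
after_restart = consume(log, store, "billing", 0)
```

Record "b" is processed twice; committing before processing would instead risk skipping it (at-most-once).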


Replication

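implemented at partition level: each partition has a leader replica and, depending on the replication factor, follower replicas on other brokers

Source: [3]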


In-Sync and Out-of-Sync Replicas
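A toy rule for in-sync membership, loosely following Kafka's replica.lag.time.max.ms idea: a follower stays in the ISR while it has fully caught up with the leader recently enough. The function name and the 10-second example value are assumptions for illustration.

```python
def in_sync_replicas(leader, last_caught_up_ms, now_ms, lag_time_max_ms):
    """Return the ISR: the leader plus every follower that caught up recently enough.

    last_caught_up_ms maps follower id -> time (ms) it last matched the leader's log end.
    """
    isr = {leader}  # the leader is always part of its own ISR
    for follower, caught_up_at in last_caught_up_ms.items():
        if now_ms - caught_up_at <= lag_time_max_ms:
            isr.add(follower)
    return isr


isr = in_sync_replicas(1, {2: 9_000, 3: 1_000}, now_ms=12_000, lag_time_max_ms=10_000)
```

Replica 3 has lagged for 11 s and drops out of sync; it rejoins once it catches up again.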


Did somebody hear my message?

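The producer decides when a message counts as successfully sent. Configuration possibilities (producer setting acks):

- as soon as it is sent (acks=0)
- as soon as the partition leader has received it (acks=1)
- as soon as all in-sync replicas have it (acks=all; together with min.insync.replicas this enforces the desired number of replicas)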
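When a send counts as successful for each acks setting, as a toy decision function. acks and min.insync.replicas are real Kafka settings; the function and its parameters are a hypothetical simplification (no timeouts, retries or leader changes).

```python
def send_succeeded(acks, leader_written, isr_acked, isr_size, min_insync_replicas=1):
    """Toy rule for when a producer treats one send as successful.

    leader_written: the partition leader appended the record to its log
    isr_acked: how many in-sync replicas (leader included) have the record
    isr_size: current number of in-sync replicas
    """
    if acks == "0":
        return True  # fire and forget: no broker answer is awaited
    if acks == "1":
        return leader_written  # the leader alone is enough
    if acks == "all":
        # every current in-sync replica has it, and the ISR is big enough
        return (leader_written and isr_acked == isr_size
                and isr_size >= min_insync_replicas)
    raise ValueError(f"unknown acks setting: {acks!r}")
```

If the ISR shrinks to the leader alone, "all" would be satisfied by one copy; min.insync.replicas sets the floor that prevents this.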


ZooKeeper

- distributed, hierarchical file system
- management of znodes (the nodes of this tree)
- HA via ensemble (= ZooKeeper cluster)

Source: [4]


Broker and ZooKeeper

- Brokers are stateless (they keep no cluster state of their own)!
- Which broker is alive?
- How do brokers coordinate?
- → ZooKeeper!
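How ZooKeeper can answer "which broker is alive?": each Kafka broker registers an ephemeral znode under /brokers/ids, which disappears when the broker's session ends. The toy class below mimics only that behavior; it is a hypothetical sketch, not the ZooKeeper client API.

```python
class ToyZooKeeper:
    """Tiny stand-in for a znode tree with ephemeral nodes tied to sessions."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owning session or None for persistent nodes)

    def create(self, path, data=b"", ephemeral_owner=None):
        parent = path.rsplit("/", 1)[0]
        if parent and parent not in self.nodes:
            raise KeyError(f"parent znode {parent} does not exist")
        self.nodes[path] = (data, ephemeral_owner)

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def close_session(self, session):
        # Ephemeral znodes vanish when the session that created them ends.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


zk = ToyZooKeeper()
zk.create("/brokers")
zk.create("/brokers/ids")
zk.create("/brokers/ids/1", b"host1:9092", ephemeral_owner="s1")
zk.create("/brokers/ids/2", b"host2:9092", ephemeral_owner="s2")
alive_before = zk.children("/brokers/ids")
zk.close_session("s2")  # broker 2 crashes, its session expires
alive_after = zk.children("/brokers/ids")
```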


Talk to Kafka - Kafka Connect

Source: [7]

- I/O for Kafka
- connect Kafka with external systems
- open source, part of Apache Kafka (developed by Confluent)


Talk to Kafka - Schema Registry

- define schemas (standards for the data format)
- version and store them
- open source, by Confluent

Source: Confluent
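The idea behind "version and store them", sketched as an in-memory registry: schemas are kept per subject (e.g. orders-value, following the default topic-based subject naming), versions start at 1, and re-registering a known schema returns its existing version in this sketch. The real Schema Registry also normalizes schemas and checks compatibility; this hypothetical class only compares raw strings.

```python
class ToySchemaRegistry:
    """Keeps every registered schema per subject; versions are numbered from 1."""

    def __init__(self):
        self._subjects = {}

    def register(self, subject, schema):
        versions = self._subjects.setdefault(subject, [])
        if schema in versions:
            return versions.index(schema) + 1  # already known: reuse its version
        versions.append(schema)
        return len(versions)

    def get(self, subject, version="latest"):
        versions = self._subjects[subject]
        return versions[-1] if version == "latest" else versions[version - 1]


order_v1 = '{"type": "record", "name": "Order", "fields": [{"name": "id", "type": "string"}]}'
order_v2 = ('{"type": "record", "name": "Order", "fields": [{"name": "id", "type": "string"}, '
            '{"name": "amount", "type": "double", "default": 0.0}]}')

registry = ToySchemaRegistry()
v1 = registry.register("orders-value", order_v1)
v2 = registry.register("orders-value", order_v2)
again = registry.register("orders-value", order_v1)
```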


TV or Netflix?

Source: Confluent

- live filtering of topics
- KSQL!
- open source, by Confluent
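Conceptually, a KSQL filter of the form CREATE STREAM ... AS SELECT ... WHERE ... keeps running over incoming records. The Python generator below is a toy stand-in for that idea, not KSQL: it lazily passes on only the matching events, even from an endless stream. The function name is hypothetical.

```python
from itertools import count, islice


def live_filter(events, predicate):
    """Lazily yield only the events matching predicate; works on endless streams."""
    for event in events:
        if predicate(event):
            yield event


page_views = [{"page": "home"}, {"page": "checkout"}, {"page": "home"}]
checkouts = list(live_filter(page_views, lambda e: e["page"] == "checkout"))

# An unbounded source: nothing is materialized, results arrive as they match.
endless = ({"n": n} for n in count())
first_even = list(islice(live_filter(endless, lambda e: e["n"] % 2 == 0), 3))
```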


Who likes Kafka?

- Zalando - microservices
- Cisco Systems - security
- Airbnb - event pipeline
- Netflix (monitoring!)
- The New York Times (Kafka as data storage! Super awesome blog post) [5][6]
- Audi - IoT
- Spotify
- Twitter
- Uber (Kafka = backbone!!!)
- https://kafka.apache.org/powered-by


Sources

[1] https://www.informatik-aktuell.de/betrieb/verfuegbarkeit/apache-kafka-eine-schluesselplattform-fuer-hochskalierbare-systeme.html
[2] https://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-1/ and https://thecattlecrew.net/2017/09/28/apache-kafka-im-detail-teil-2/
[3] https://www.confluent.io/blog/hands-free-kafka-replication-a-lesson-in-operational-simplicity/
[4] https://www.infoq.com/articles/apache-kafka
[5] https://www.confluent.io/blog/okay-store-data-apache-kafka/
[6] https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
[7] https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/


Install Kafka with Docker/Ansible

- run containers as services
- no SSL/SASL yet!
- have a look at the playbooks and docker-compose files:
  - https://github.com/confluentinc/cp-ansible
  - https://docs.confluent.io/current/installation/docker/docs/installation/index.html
  - Wurstmeister: https://github.com/wurstmeister/kafka-docker


Single Components

```yaml
---
- name: Start zookeeper
  docker_container:
    name: zookeeper
    image: "{{ images.zookeeper }}:{{ versions.kafka }}"
    state: started
    restart_policy: unless-stopped
    ports:
      - "{{ ports.zookeeper.client }}:2181"   # client connections
      - "{{ ports.zookeeper.peer }}:2888"     # follower-to-leader traffic
      - "{{ ports.zookeeper.leader }}:3888"   # leader election
    volumes:
      - "/zookeeper/data:/var/lib/zookeeper/data"
      - "/zookeeper/log:/var/lib/zookeeper/log"
    env:
      ZOOKEEPER_SERVER_ID: "{{ zookeeper_server_id }}"
      ZOOKEEPER_CLIENT_PORT: "2181"
      ZOOKEEPER_SERVERS: "{{ lookup('template', 'sort_zookeeper.j2') }}"
      ZOOKEEPER_DATA_DIR: "/var/lib/zookeeper/data"
      ZOOKEEPER_LOG_DIR: "/var/lib/zookeeper/log"
      # ...
...
```


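```jinja
{#- sort_zookeeper.j2: renders "host:2888:3888;host:2888:3888;..." on a single line -#}
{%- for host in groups['zookeeper'] -%}
  {#- the local node binds to all interfaces, the others are reached by IP -#}
  {%- if inventory_hostname == hostvars[host]['inventory_hostname'] -%}
    0.0.0.0
  {%- else -%}
    {{ hostvars[host]['ansible_default_ipv4']['address'] }}
  {%- endif -%}
  :2888:3888
  {%- if not loop.last %};{% endif -%}
{%- endfor %}
```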
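To check what the template should produce, the same logic in plain Python. Assumption: the Confluent cp-zookeeper image expects ZOOKEEPER_SERVERS as host:peerPort:electionPort entries separated by semicolons (2888 and 3888 are ZooKeeper's usual ports). The function name and sample hosts are hypothetical.

```python
def zookeeper_servers(hosts, this_host, peer_port=2888, election_port=3888):
    """Build a ZOOKEEPER_SERVERS value: one host:peer:election entry per ensemble member.

    hosts is a list of (inventory_hostname, ip); the local node is written as 0.0.0.0,
    presumably so its container listens on all interfaces, as in sort_zookeeper.j2.
    """
    entries = []
    for name, ip in hosts:
        address = "0.0.0.0" if name == this_host else ip
        entries.append(f"{address}:{peer_port}:{election_port}")
    return ";".join(entries)


ensemble = [("zk1", "10.0.0.1"), ("zk2", "10.0.0.2"), ("zk3", "10.0.0.3")]
value_on_zk2 = zookeeper_servers(ensemble, "zk2")
```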


Check system health

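```yaml
---
- name: "Check Zookeeper Health"
  # cub (Confluent utility belt) ships in the cp-* images; Ansible runs this without a TTY
  command: >-
    docker run --rm confluentinc/cp-zookeeper
    cub zk-ready "{{ ansible_default_ipv4.address ~ ':2181' }}" 5
  register: output
  until: output is success
  retries: 3
...
```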


Configure via REST (Ansible uri module)

```yaml
---
- name: create new topic
  command: "{{ 'sudo docker run --rm confluentinc/cp-kafka kafka-topics --create' ... }}"

- name: get information of current topic
  uri:
    url: "{{ restproxy_url ~ '/topics/' ~ topic.name }}"
  register: result
...
```
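Outside Ansible, the same topic lookup against the REST Proxy could look like this sketch. Assumptions: a Confluent REST Proxy using its v2 API (Accept header application/vnd.kafka.v2+json) at a hypothetical restproxy_url such as http://localhost:8082; get_topic needs a running proxy, topic_url only builds the string. Both function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def topic_url(restproxy_url, topic_name):
    """Build the REST Proxy URL for one topic, escaping the name safely."""
    return restproxy_url.rstrip("/") + "/topics/" + urllib.parse.quote(topic_name, safe="")


def get_topic(restproxy_url, topic_name, timeout=5):
    """GET topic metadata from a REST Proxy and decode the JSON body."""
    request = urllib.request.Request(
        topic_url(restproxy_url, topic_name),
        headers={"Accept": "application/vnd.kafka.v2+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```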


whoami

Bernhard Hopfenmüller

IRC: Fobhep
github.com/Fobhep
twitter.com/fobhep


Kafka vs MQ

- Kafka has no P2P (point-to-point) model!
- Messages are persistent!
- Topic partitioning!
- Message sequencing: guaranteed within one partition (send order = receive order)
- Message reading: choose where to read, rewind; no FIFO!
- Load balancing: automatic distribution is easier thanks to metadata
- HA and failover are easy to implement
