MyHeritage Kafka use cases - Feb 2014 Meetup
MyHeritage and Kafka
Author: Ran Levy
Feb 2014

Agenda
• MyHeritage use cases
• Possible solutions
• Kafka overview
• Actual implementation @MyHeritage
• Summary

Use cases
• Two major use cases:
– Indexing to SuperSearch and Record Matching.
– Stats reporting to BI.

Use case 1
• Indexing to SuperSearch and Record Matching

Use case 1 – cont'd
• The existing solution was custom and non-scalable; it involved processing changes and updating SuperSearch (SOLR over Lucene).
• Required solution should support:
– Continuous mode.
– High throughput.
– Scaling up.
– Repeating the process from some point.
– Guaranteed order of processed items.
– Reliability.
– Multiple consumers.

Use case 2
• Statistics reporting to BI system

Use case 2 – cont'd
• Required solution should support:
– High scale (~500 GB of data per day).
– Scale up – a few hundred million per day.
– Repeating the process from some point.
– Multiple consumers.
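To get a feel for what these daily volumes mean per second, here is a rough back-of-the-envelope calculation. It assumes decimal gigabytes, a load spread evenly over 24 hours, and 300 million messages per day as a stand-in for "a few hundred million"; none of these assumptions come from the slides, and real traffic is likely to have peaks well above the average.

```python
# Back-of-the-envelope load estimate for the BI use case.
# Assumptions (not from the slides): decimal gigabytes, an even spread over
# 24 hours, and 300 million messages/day for "a few hundred million".
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(bytes_per_day: float, messages_per_day: float) -> dict:
    """Return average bytes/s, messages/s and bytes/message."""
    return {
        "bytes_per_second": bytes_per_day / SECONDS_PER_DAY,
        "messages_per_second": messages_per_day / SECONDS_PER_DAY,
        "bytes_per_message": bytes_per_day / messages_per_day,
    }


rates = average_rates(bytes_per_day=500e9, messages_per_day=300e6)
print(f"{rates['bytes_per_second'] / 1e6:.1f} MB/s")       # 5.8 MB/s
print(f"{rates['messages_per_second']:.0f} messages/s")    # 3472 messages/s
print(f"{rates['bytes_per_message']:.0f} bytes/message")   # 1667 bytes/message
```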

Agenda
MyHeritage use cases
• Possible solutions
• Kafka overview
• Actual implementation @MyHeritage
• Summary

Possible Solutions
• What we considered:
– DB
– Queues

Possible Solutions
• Key points about queues:
– Messages are deleted after they are consumed.
– Messages are duplicated to support multiple readers.
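The two points about queues can be illustrated with a toy in-memory model. This is only a sketch of the general pattern, not the API of any particular queueing product; the class name `FanOutQueue` and the reader names are hypothetical.

```python
from collections import deque


class FanOutQueue:
    """Toy model of a classic queue: one private copy of each message per reader."""

    def __init__(self, readers):
        self.queues = {name: deque() for name in readers}

    def publish(self, message):
        # The message is duplicated once per reader.
        for queue in self.queues.values():
            queue.append(message)

    def consume(self, reader):
        # Consuming removes the message, so it cannot be read again.
        queue = self.queues[reader]
        return queue.popleft() if queue else None

    def stored_copies(self):
        return sum(len(queue) for queue in self.queues.values())


broker = FanOutQueue(["indexing", "record_matching"])
broker.publish("tree-change-1")
copies_before = broker.stored_copies()  # one copy per reader
first = broker.consume("indexing")      # the message
again = broker.consume("indexing")      # None: already deleted for this reader
```

With this model, replaying from an earlier point is impossible once a message has been consumed, and storage grows with the number of readers.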

Agenda
MyHeritage use cases
Possible solutions
• Kafka overview
• Actual implementation @MyHeritage
• Summary

Kafka Overview
• A high throughput distributed messaging system
– Fast
– Scalable
– Durable
– Distributed by design
– Simplicity (over functionality)

Kafka Overview
• Fast (very fast) – both for producers and consumers
Reference: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf

Kafka Overview
• Main entities
– Producer – pushes data.
– Consumer – pulls data.
– Broker – load-balances producers by partition.
– Topic – a feed of messages that belong to the same logical category.
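One common way for a producer to spread messages across the partitions of a topic is to hash a message key, so that all messages with the same key land in the same partition and keep their relative order. The sketch below uses CRC32 purely for illustration; the function name `choose_partition` and the key format are assumptions, and Kafka clients may use a different partitioning scheme.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition using a stable hash (CRC32 here)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 32  # hypothetical partition count for the example

first_choice = choose_partition("tree-42", NUM_PARTITIONS)
second_choice = choose_partition("tree-42", NUM_PARTITIONS)
# The same key always maps to the same partition.
same_partition = first_choice == second_choice
```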

Kafka Overview – some internals
• Communication between the clients and the servers is done with a simple, high-performance TCP protocol.
• For each topic, the Kafka cluster maintains a partitioned log, which is a commit log (append-only).
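The partitioned, append-only log can be modelled in a few lines. This is a toy in-memory sketch, not Kafka's implementation; the class `PartitionedLog` and its methods are hypothetical. It shows why such a log supports multiple consumers and repeating the process from some point: reading does not remove anything, and each reader simply chooses its own offset.

```python
class PartitionedLog:
    """Toy in-memory model of a topic as a set of append-only partitions."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> int:
        """Append a message and return its offset within the partition."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, offset: int, max_messages: int = 10) -> list:
        """Read without removing: any consumer may re-read from any offset."""
        return self.partitions[partition][offset:offset + max_messages]


topic = PartitionedLog(num_partitions=2)
for i in range(3):
    topic.append(0, f"event-{i}")

indexing_view = topic.read(0, offset=0)  # first consumer reads everything
matching_view = topic.read(0, offset=0)  # second consumer sees the same data
replay = topic.read(0, offset=1)         # repeat processing from offset 1
```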

Kafka Overview – some internals
• Messages stay on disk after they are consumed and are deleted only after a defined TTL.
• The partitions of the log are distributed over the servers in the Kafka cluster, with each server handling data and requests for a share of the partitions.
• Each partition is replicated across a configurable number of servers for fault tolerance.
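The retention and replication ideas can be sketched as two small functions. Both are simplified illustrations under stated assumptions: `expire` models time-based retention on a list of `(timestamp, message)` pairs, and `assign_replicas` uses plain round-robin placement, which is simpler than what a real Kafka cluster does. All names are hypothetical.

```python
def expire(log, now, ttl_seconds):
    """Keep only entries younger than the TTL, whether or not they were consumed."""
    return [(ts, msg) for ts, msg in log if now - ts < ttl_seconds]


def assign_replicas(num_partitions, brokers, replication_factor):
    """Round-robin placement: partition p is stored on brokers p, p+1, ... (mod n)."""
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


kept = expire([(0, "old"), (90, "new")], now=100, ttl_seconds=60)
placement = assign_replicas(4, ["b1", "b2", "b3"], replication_factor=2)
```

In this placement every partition lives on two different brokers, so losing any single broker still leaves one copy of each partition.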

Agenda
MyHeritage use cases
Possible solutions
Kafka overview
• Actual implementation @MyHeritage
• Summary

High Level Overview
Diagram:
• Producers: Web, Daemons, Face recog.
• Broker 1:
– Family Tree changes Topic: part 1, part 2, … part 32
– Activity Topic: part 1, part 2, … part 32
– DRBD replica of Broker 2
• Broker 2:
– Family Tree changes Topic: part 1, part 2, … part 32
– Activity Topic: part 1, part 2, … part 32
– DRBD replica of Broker 1
• Consumers: Indexing, RecordMatching, Logstash reader

Kafka @MyHeritage - producers
Diagram:
• App Modules dispatch events to the Events System.
• The Events System notifies its subscribers: Subscriber, Subscriber, EventLoggerSubscriber.
• Also shown: ILogWrite, and ActivityManager (dispatch event).

Kafka @MyHeritage - producers
Diagram:
• KafkaWriter
– Topic
– BrokersConfig
– ISelector
– ISerializer
– ILogger
– IStats

Kafka @MyHeritage - producers
Diagram:
• App Modules dispatch events to the Events System.
• The Events System notifies its subscribers: Subscriber, Subscriber, EventLoggerSubscriber.
• KafkaWriter writes to the brokers: attempt the 1st broker; if that fails, attempt the 2nd broker.
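The "attempt the 1st broker, if failed attempt the 2nd broker" behaviour can be sketched as follows. This is not MyHeritage's KafkaWriter; the class `FailoverWriter`, the exception `BrokerUnavailable`, and the plain callables standing in for broker connections and for the serializer are all assumptions made for the example.

```python
import json
from typing import Callable, Sequence


class BrokerUnavailable(Exception):
    """Raised by a send function when a broker cannot accept the message."""


class FailoverWriter:
    """Hypothetical writer that tries brokers in order until one accepts."""

    def __init__(self, senders: Sequence[Callable[[bytes], None]],
                 serialize: Callable[[object], bytes]):
        self.senders = list(senders)
        self.serialize = serialize

    def write(self, event) -> int:
        """Send to the first broker that accepts; return that broker's index."""
        payload = self.serialize(event)
        last_error = None
        for index, send in enumerate(self.senders):
            try:
                send(payload)
                return index
            except BrokerUnavailable as error:
                last_error = error  # remember the failure, try the next broker
        raise BrokerUnavailable("all brokers failed") from last_error


received = []


def broker1(payload: bytes) -> None:
    raise BrokerUnavailable("broker 1 is down")  # simulated failure


def broker2(payload: bytes) -> None:
    received.append(payload)


writer = FailoverWriter(
    [broker1, broker2],
    serialize=lambda event: json.dumps(event).encode("utf-8"),
)
used_broker = writer.write({"type": "tree_change", "id": 1})  # index of broker2
```

If every broker fails, the writer raises instead of silently dropping the event, which leaves the decision about retries or logging to the caller.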

Kafka @MyHeritage – Consumers (Indexing)
Diagram:
• EventProcessor: 1 per consumer type, reader per partition.
• EventProcessors fetch events from Broker 1 and Broker 2: "Fetch event from part<x>, offset <z>".
• KafkaWatermark: get/update watermark.
• EventProcessor adds each event to the IndexingQueue.
• IndexingWorkers fetch work from the IndexingQueue and update the item in SOLR.
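The consumer flow (read the watermark, fetch from that offset, enqueue, advance the watermark) might look roughly like the sketch below. It is an in-memory illustration only: the `Watermark` class, `process_partition`, the batch size, and the choice to advance the watermark right after enqueueing are assumptions, since the slides do not say when the watermark is updated or where it is stored.

```python
from collections import deque


class Watermark:
    """Stores the next offset to read for each partition (in memory here)."""

    def __init__(self):
        self._offsets = {}

    def get(self, partition: int) -> int:
        return self._offsets.get(partition, 0)

    def update(self, partition: int, offset: int) -> None:
        self._offsets[partition] = offset


def process_partition(log, partition, watermark, queue, batch_size=2):
    """Fetch one batch starting at the watermark, enqueue it, advance the watermark."""
    start = watermark.get(partition)
    batch = log[start:start + batch_size]
    for event in batch:
        queue.append(event)  # workers would later pull from this queue
    watermark.update(partition, start + len(batch))
    return len(batch)


partition_log = ["e0", "e1", "e2"]  # stand-in for one partition on a broker
watermark = Watermark()
indexing_queue = deque()

fetched = [
    process_partition(partition_log, 0, watermark, indexing_queue)
    for _ in range(3)
]
```

Because the watermark is kept outside the broker, resetting it to an earlier offset is enough to repeat processing from that point.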

Agenda
MyHeritage use cases
Possible solutions
Kafka overview
Actual implementation @MyHeritage
• Summary

Summary
Kafka is a very fast and scalable system that is used extensively at MyHeritage; consider it for the high-scale systems you run.