Kafka, the “Dial Tone” for Data: Building a Self-Service, Scalable, Streaming Analytics...
© Copyright 2016 HomeAway, Inc.
Kafka: The “Dial Tone” for Data
HomeAway: the world leader in vacation rentals
More than 1 million listings (and growing!)
Agenda
• Overview
• The Problem
• The Experiment
• Results: Use Cases
• Lessons Learned
• Next Steps
Overview
Difference between Dinosaurs and Unicorns
In the old days: “Dial Tone” looked like this
ATDT (the Hayes modem command: Attention, Dial, Tone)
Today: Kafka is the modern “Dial Tone” for Data
Diagram labels: Producer, Consumer
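The “dial tone” idea rests on Kafka’s log: producers append records, and each consumer tracks its own read position (offset), so neither side waits on the other. The following is a toy, in-memory model of a single partition that shows this decoupling; it is not the Kafka client API, and `ToyLog`, `produce`, and `poll` are hypothetical names.

```python
from collections import defaultdict


class ToyLog:
    """Toy stand-in for one Kafka partition (hypothetical; not the Kafka client API)."""

    def __init__(self):
        self._records = []                # append-only; a record's offset is its list index
        self._offsets = defaultdict(int)  # next offset to read, tracked per consumer group

    def produce(self, value):
        """Append a record and return its offset; producers never wait on consumers."""
        self._records.append(value)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return unread records for `group` and advance only that group's offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = ToyLog()
for event in ["search", "click", "book"]:
    log.produce(event)

print(log.poll("analytics"))             # ['search', 'click', 'book']
print(log.poll("fraud", max_records=2))  # ['search', 'click']
print(log.poll("analytics"))             # []
```

Because each group keeps its own offset, adding a new consumer (say, fraud detection) requires no change to the producer.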
The Problem
Our original problem/motivation
Diagram labels: search head; indexer (×2); app server + forwarder (×2)
1 TB/day ingress and growing!
40,000 calls/sec
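To put these numbers in per-second terms, here is the back-of-the-envelope arithmetic, assuming a decimal terabyte (10^12 bytes) and that both figures describe the same traffic. These are daily averages; peak load would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
daily_bytes = 1e12               # 1 TB/day, assuming a decimal terabyte
calls_per_sec = 40_000

avg_bytes_per_sec = daily_bytes / SECONDS_PER_DAY
avg_bytes_per_call = avg_bytes_per_sec / calls_per_sec  # assumes both figures cover the same traffic

print(f"{avg_bytes_per_sec / 1e6:.1f} MB/s average")  # 11.6 MB/s average
print(f"{avg_bytes_per_call:.0f} bytes per call")     # 289 bytes per call
```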
Also… the historical analytics pipeline was slow and expensive
Diagram labels: app server, OLTP, ETL, OLAP, analytics
Fill the Lake! Alternatives?
Problem: Fill Hadoop!
Diagram labels: Problem, Data Lake
What we wanted… the Big Idea
If you can log it… you can analyze it!
How to build self-service?
Hypothesis: Use Kafka!
2 ms median latency
“The Log”: http://bit.ly/jay_on_logs
2 Million Events / Sec! (3 cheap machines)
“Benchmarking Apache Kafka”: http://goo.gl/pv5GoL
The Experiment
HACommonsLogging
• KafkaAppender (schema-on-read)
• KafkaAvroLogger (schema-on-write)
Experiment: Schema-on-Read, Schema-on-Write
Data Lake
Schema Registry
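The two loggers differ in when structure is enforced. A minimal sketch, assuming JSON and a hand-rolled field/type schema in place of Avro so it stays self-contained; `write_raw`, `read_raw`, `write_validated`, and `SEARCH_EVENT_SCHEMA` are hypothetical and do not reflect the actual HACommonsLogging API.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
SEARCH_EVENT_SCHEMA = {"listing_id": int, "query": str}


def write_raw(line: str) -> str:
    """Schema-on-read (KafkaAppender on the slide): publish the log line as-is."""
    return line


def read_raw(line: str) -> dict:
    """Schema-on-read: each consumer parses 'key=value' text itself; types are lost."""
    return dict(token.split("=", 1) for token in line.split())


def write_validated(record: dict, schema: dict) -> str:
    """Schema-on-write (KafkaAvroLogger on the slide): reject bad records before publishing."""
    for field, ftype in schema.items():
        if not isinstance(record.get(field), ftype):
            raise ValueError(f"field {field!r} missing or not {ftype.__name__}")
    return json.dumps(record)


print(read_raw(write_raw("listing_id=42 query=beach")))
# {'listing_id': '42', 'query': 'beach'}   <- '42' is now a string

ok = write_validated({"listing_id": 42, "query": "beach"}, SEARCH_EVENT_SCHEMA)
try:
    write_validated({"listing_id": "42"}, SEARCH_EVENT_SCHEMA)
except ValueError as err:
    print("rejected:", err)  # rejected: field 'listing_id' missing or not int
```

The trade-off: schema-on-read accepts anything but pushes parsing (and type loss) onto every consumer, while schema-on-write catches bad records at the producer.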
Architecture: Kafka + Camus = Big Data Ingress
Camus
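Camus (LinkedIn’s MapReduce-based Kafka-to-HDFS loader) runs as a periodic batch job that pulls new messages from each topic and writes them into time-partitioned HDFS directories. The sketch below models only the time-bucketing step; the `<topic>/hourly/YYYY/MM/DD/HH` layout and the function names are assumptions for illustration, not Camus’s exact configuration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_path(base: str, topic: str, epoch_ms: int) -> str:
    """Hypothetical HDFS layout: <base>/<topic>/hourly/YYYY/MM/DD/HH (UTC)."""
    ts = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return f"{base}/{topic}/hourly/{ts:%Y/%m/%d/%H}"


def bucket_batch(base: str, topic: str, records):
    """Group (epoch_ms, payload) records pulled from one topic into hourly directories."""
    buckets = defaultdict(list)
    for epoch_ms, payload in records:
        buckets[hourly_path(base, topic, epoch_ms)].append(payload)
    return dict(buckets)


# 1451606400000 ms is 2016-01-01T00:00:00Z; the next two records are +30 min and +60 min.
batch = [(1451606400000, "a"), (1451608200000, "b"), (1451610000000, "c")]
for path, payloads in bucket_batch("/data/lake", "search_events", batch).items():
    print(path, payloads)
# /data/lake/search_events/hourly/2016/01/01/00 ['a', 'b']
# /data/lake/search_events/hourly/2016/01/01/01 ['c']
```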
The Results
Use Cases: ITOA / SLA Reporting
Use Cases: Fraud
Use Cases: Search + ClickStream
Diagram labels: User Behavior, Search Requests, A/B Test Readouts, Proctor, EDAP
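One plausible shape for the A/B test readouts in this diagram: join click events back to the search request that produced them, then compute click-through rate per experiment bucket (Proctor is an open-source A/B testing framework from Indeed). A minimal sketch; the event fields, bucket names, and the join itself are hypothetical, not HomeAway’s schema or pipeline.

```python
from collections import Counter

# Hypothetical event shapes; field names are illustrative only.
searches = [
    {"request_id": "r1", "bucket": "control"},
    {"request_id": "r2", "bucket": "control"},
    {"request_id": "r3", "bucket": "new_ranker"},
    {"request_id": "r4", "bucket": "new_ranker"},
]
clicks = [{"request_id": "r1"}, {"request_id": "r3"}, {"request_id": "r4"}]


def click_through_by_bucket(searches, clicks):
    """Join clicks to their search request, then compute CTR per A/B bucket.

    A request counts as clicked at most once, even if it received several clicks.
    """
    clicked = {c["request_id"] for c in clicks}
    shown = Counter(s["bucket"] for s in searches)
    hits = Counter(s["bucket"] for s in searches if s["request_id"] in clicked)
    return {bucket: hits[bucket] / shown[bucket] for bucket in shown}


print(click_through_by_bucket(searches, clicks))  # {'control': 0.5, 'new_ranker': 1.0}
```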
Use Cases: Search + ClickStream
Use Cases: Traveler Segmentation
EDAP
Data Model
Lessons Learned
Lesson #1: The Schema [registry] is Everything!
Data Lake
Schema Registry
• Decouples producers from consumers
• Enforces backwards compatibility
• Enables self-service / democratization
• Source of truth (SOT) for schemas in the pipeline
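“Enforces backwards compatibility” means a consumer on the newest schema can still read records written with an older one. A simplified sketch of that check, loosely following Avro’s resolution rules (added fields need defaults, removed fields are ignored, type changes are rejected); it is not the Schema Registry’s implementation, and the dict-based schema format is an assumption.

```python
def backward_compat_problems(old_fields: dict, new_fields: dict) -> list:
    """List reasons a reader using `new_fields` could not read data written with `old_fields`.

    Simplified, Avro-flavored BACKWARD rules; an empty list means compatible.
    Schemas are hypothetical dicts: field name -> {"type": ..., optional "default": ...}.
    """
    problems = []
    for name, spec in new_fields.items():
        if name not in old_fields:
            # Old records lack this field, so the new reader must fill in a default.
            if "default" not in spec:
                problems.append(f"added field {name!r} has no default")
        elif old_fields[name]["type"] != spec["type"]:
            # Real Avro permits some promotions (e.g. int -> long); this sketch does not.
            problems.append(f"field {name!r} changed type")
    # Fields dropped from the new schema are simply skipped when reading old records.
    return problems


v1 = {"listing_id": {"type": "long"}, "query": {"type": "string"}}
v2_ok = {**v1, "locale": {"type": "string", "default": "en_US"}}
v2_bad = {**v1, "locale": {"type": "string"}}

print(backward_compat_problems(v1, v2_ok))   # []
print(backward_compat_problems(v1, v2_bad))  # ["added field 'locale' has no default"]
```

Rejecting `v2_bad` at registration time is what lets producers evolve schemas without coordinating with every consumer.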
Lesson #2: A Kafka/Schema Registry (SR) governance module is helpful
Data Lake
• TURN OFF Auto Topic Creation!
• Need a place for developers to request topics:
  • Retention Policy
  • Expected Load
  • Compaction
  • Partition Size / Partition Key
  • Owner
  • LTS Date
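A governance module can turn the request checklist into concrete topic settings. A minimal sketch: `auto.create.topics.enable`, `retention.ms`, and `cleanup.policy` are real Kafka configuration names, but `TopicRequest`, `to_topic_config`, and the 5 MB/s-per-partition sizing rule are hypothetical.

```python
import math
from dataclasses import dataclass

# Broker setting (server.properties) behind "TURN OFF Auto Topic Creation!".
BROKER_OVERRIDES = {"auto.create.topics.enable": "false"}

# Hypothetical sizing target; tune for your own hardware and message sizes.
MB_PER_SEC_PER_PARTITION = 5.0


@dataclass
class TopicRequest:
    """Hypothetical request form mirroring the slide's checklist."""
    name: str
    owner: str
    retention_days: int
    expected_mb_per_sec: float
    compacted: bool
    partition_key: str
    lts_date: str = ""  # the slide does not define "LTS Date"; kept as free text


def to_topic_config(req: TopicRequest) -> dict:
    """Validate a request and map it to a partition count plus Kafka topic configs."""
    if not req.owner:
        raise ValueError("every topic needs an owner")
    partitions = max(1, math.ceil(req.expected_mb_per_sec / MB_PER_SEC_PER_PARTITION))
    return {
        "name": req.name,
        "partitions": partitions,
        "configs": {
            "retention.ms": str(req.retention_days * 24 * 60 * 60 * 1000),
            "cleanup.policy": "compact" if req.compacted else "delete",
        },
    }


req = TopicRequest("search_events", "search-team", 7, 12.0, False, "session_id")
print(to_topic_config(req))
# {'name': 'search_events', 'partitions': 3, 'configs': {'retention.ms': '604800000', 'cleanup.policy': 'delete'}}
```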
Lesson #3: Make it easy to do stream processing
Schema Registry
• samza-archetype
• samza-job-deployer
• Will evaluate Kafka Streams (k-streams)!
http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
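The core of many stream jobs is a keyed, windowed aggregation. The following is a framework-free sketch of a tumbling-window count, assuming events arrive as `(timestamp_ms, key)` pairs; it does not use the Samza or Kafka Streams APIs, and `tumbling_counts` is a hypothetical name.

```python
from collections import defaultdict


def tumbling_counts(events, window_ms=60_000):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    A framework such as Samza or Kafka Streams would keep this state in a local
    store backed by a Kafka changelog topic; this in-memory sketch does not.
    """
    counts = defaultdict(int)
    for ts_ms, key in events:
        window_start = ts_ms - (ts_ms % window_ms)  # align to the window boundary
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(1_000, "search"), (30_000, "search"), (61_000, "search"), (62_000, "book")]
print(tumbling_counts(events))
# {(0, 'search'): 2, (60000, 'search'): 1, (60000, 'book'): 1}
```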
Next Steps
Consistency: 3 Types of Data
Event
Document
Transactional
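The slide only names the three data types. One common reading, offered here as an assumption: event data is kept as an append-only history, while document data (latest state per key) fits a compacted topic (`cleanup.policy=compact`). A toy sketch of compaction semantics; `compact` is hypothetical, and transactional data is not covered.

```python
def compact(records):
    """Keep only the latest value per key, in the style of Kafka log compaction.

    `records` is a list of (key, value) pairs whose list index is the offset.
    A value of None is a tombstone (delete marker). Simplifications: tombstones
    are dropped immediately (Kafka keeps them for delete.retention.ms), and
    segment boundaries and compaction lag are ignored.
    """
    latest = {}
    for offset, (key, value) in enumerate(records):
        latest[key] = (offset, value)
    # Offsets are unique, so sorting the tuples orders survivors by offset.
    survivors = sorted((offset, key, value) for key, (offset, value) in latest.items())
    return [rec for rec in survivors if rec[2] is not None]


changes = [("listing-1", "v1"), ("listing-2", "v1"), ("listing-1", "v2"), ("listing-2", None)]
print(compact(changes))  # [(2, 'listing-1', 'v2')]
```

An event consumer would read all four records; a document consumer only needs the compacted view.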
Kafka Producer Spooling
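“Producer spooling” is not spelled out on the slide. One common design, assumed here: when the broker is unreachable, the producer writes records to a local spool file instead of blocking or dropping them, then replays the spool once Kafka is back. `SpoolingProducer` is hypothetical, and `ConnectionError` stands in for whatever error a real Kafka client raises.

```python
import json
import os
import tempfile


class SpoolingProducer:
    """Hypothetical wrapper: spool records to local disk when the broker is down,
    then replay them later, so the app neither blocks nor drops data."""

    def __init__(self, send, spool_path):
        self._send = send              # callable(topic, value); raises ConnectionError on failure
        self._spool_path = spool_path

    def produce(self, topic, value):
        try:
            self._send(topic, value)
        except ConnectionError:
            # Persist to disk (one JSON line per record) rather than holding it in memory.
            with open(self._spool_path, "a", encoding="utf-8") as spool:
                spool.write(json.dumps({"topic": topic, "value": value}) + "\n")

    def drain(self):
        """Replay spooled records in order, keep any that still fail; return count sent."""
        if not os.path.exists(self._spool_path):
            return 0
        with open(self._spool_path, encoding="utf-8") as spool:
            pending = [json.loads(line) for line in spool]
        sent = 0
        remaining = []
        for i, rec in enumerate(pending):
            try:
                self._send(rec["topic"], rec["value"])
                sent += 1
            except ConnectionError:
                remaining = pending[i:]  # stop at the first failure to preserve order
                break
        with open(self._spool_path, "w", encoding="utf-8") as spool:
            spool.writelines(json.dumps(rec) + "\n" for rec in remaining)
        return sent


# Usage with a fake broker that is down, then comes back.
delivered, broker_up = [], False


def fake_send(topic, value):
    if not broker_up:
        raise ConnectionError("broker unavailable")
    delivered.append((topic, value))


spool_file = os.path.join(tempfile.mkdtemp(), "kafka.spool")
producer = SpoolingProducer(fake_send, spool_file)
producer.produce("app_logs", "first")
producer.produce("app_logs", "second")
broker_up = True
print(producer.drain(), delivered)  # 2 [('app_logs', 'first'), ('app_logs', 'second')]
```

A production version would also need to drain the spool before sending new records (this sketch can reorder them), cap the spool size, and fsync writes.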
Conclusion
Yesterday
Systems of Record
Today
Systems of Engagement
Tomorrow
Systems of Intelligence
Don’t be a dinosaur…
ATDT
Thank you