A Gentle Introduction to Storm and Kafka
TRANSCRIPT
![Page 1: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/1.jpg)
The Leader in Big Data Consulting
![Page 2: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/2.jpg)
www.mammothdata.com | @mammothdataco
A Gentle Introduction to Kafka and Storm
{Percona University | Raleigh}
![Page 3: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/3.jpg)
Open Software Integrators
Open Software Integrators is a Big Data consulting and services company specializing in Hadoop, Cassandra, MongoDB and other NoSQL technologies. OSI focuses on executive strategy, initial install, design and implementation.
Founded January 2008 by Andrew C. Oliver
Based in downtown Durham, NC
Partnered with Hortonworks, MongoDB, DataStax, Cloudera, Couchbase, Cloudbees & Neo Technology
![Page 4: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/4.jpg)
A Gentle Introduction
● What Kafka and Storm are
● What they can be used for
● What they excel at
![Page 5: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/5.jpg)
Kafka
Kafka and Storm
![Page 6: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/6.jpg)
What is Apache Kafka?
Kafka is a distributed, partitioned, replicated commit log service.
![Page 7: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/7.jpg)
The Commit Log
An append-only, immutable sequence of records ordered by time.
Figure: the log, running from the first record to the next record written.
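A minimal sketch of the commit log idea, assuming a single in-memory list. The `CommitLog` class and its method names are hypothetical and are not Kafka's API; the point is only that records are appended at the end, never changed, and read back by position (offset).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CommitLog:
    """Toy in-memory append-only log (illustrative; not how Kafka stores data)."""

    _records: List[bytes] = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Add a record at the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Return up to max_records (offset, record) pairs starting at offset."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        chunk = self._records[offset:offset + max_records]
        return [(offset + i, rec) for i, rec in enumerate(chunk)]


log = CommitLog()
first = log.append(b"first record")
second = log.append(b"next written record")
```

There is deliberately no update or delete method: a reader that remembers its last offset can always resume from where it stopped.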
![Page 8: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/8.jpg)
Kafka is:
● fast
● durable
● distributed
● scalable
![Page 9: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/9.jpg)
Kafka Abstractions
● Topic: feeds of messages in categories
● Broker: a host running Kafka
● Producer: a process that publishes messages
● Consumer: a process that pulls messages
● Partition: portion of a topic’s stream of messages
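The five abstractions above can be sketched as a toy in-memory model. Every class and method name here is hypothetical and none of it is the Kafka client API. Choosing a partition by `zlib.crc32` of the key is an assumption made so the example is deterministic; a real client may hash differently.

```python
import zlib
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (key, value)


class Topic:
    """A named feed of messages, split into partitions (each an append-only list)."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Message]] = [[] for _ in range(num_partitions)]


class Broker:
    """Stands in for a host running Kafka: it simply holds the topics."""

    def __init__(self) -> None:
        self.topics: Dict[str, Topic] = {}

    def create_topic(self, name: str, num_partitions: int) -> Topic:
        topic = Topic(name, num_partitions)
        self.topics[name] = topic
        return topic


class Producer:
    """Publishes messages; the same key always lands in the same partition."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker

    def send(self, topic_name: str, key: str, value: str) -> Tuple[int, int]:
        topic = self.broker.topics[topic_name]
        partition = zlib.crc32(key.encode("utf-8")) % len(topic.partitions)
        topic.partitions[partition].append((key, value))
        return partition, len(topic.partitions[partition]) - 1


class Consumer:
    """Pulls messages and tracks its own read offset for each partition."""

    def __init__(self, broker: Broker, topic_name: str) -> None:
        self.topic = broker.topics[topic_name]
        self.offsets = [0] * len(self.topic.partitions)

    def poll(self, partition: int, max_messages: int = 10) -> List[Message]:
        start = self.offsets[partition]
        batch = self.topic.partitions[partition][start:start + max_messages]
        self.offsets[partition] += len(batch)
        return batch


broker = Broker()
broker.create_topic("events", num_partitions=3)
producer = Producer(broker)
p1, o1 = producer.send("events", "host-a", "login")
p2, o2 = producer.send("events", "host-a", "logout")
consumer = Consumer(broker, "events")
batch = consumer.poll(p1)
```

Note that the consumer pulls and advances its own offset; the broker in this sketch never pushes anything.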
![Page 10: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/10.jpg)
What Kafka is used for:
Enterprise-grade event streaming
![Page 11: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/11.jpg)
What Kafka is not good at:
Doing anything other than being a commit log.
![Page 12: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/12.jpg)
Storm
Kafka and Storm
![Page 13: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/13.jpg)
What is Apache Storm?
Storm is a distributed, real-time computation system.
![Page 14: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/14.jpg)
Stream Processing
Several processes fall into the domain of stream processing:
● Event Sourcing
● Command and Query Responsibility Segregation
● Complex Event Processing
● etc.
![Page 15: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/15.jpg)
What Storm Does
● Simple API
● Guaranteed data processing
● Fault tolerant
● Scalable
● Usable with any language
![Page 16: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/16.jpg)
Storm Abstractions
Three abstractions:
● Spouts
● Bolts
● Topology
Figure: an example topology with two spouts and four bolts.
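A rough sketch of how the three abstractions fit together, in plain Python rather than the Storm API. All names are hypothetical. It assumes a straight-line topology over a finite input; a real topology is a graph of spouts and bolts that runs continuously.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List, Tuple


def sentence_spout() -> Iterator[str]:
    """Spout: a source that emits items into the topology."""
    yield from ["storm reads streams", "kafka stores streams"]


def split_bolt(sentence: str) -> Iterable[str]:
    """Bolt: consumes one item and emits zero or more new items."""
    return sentence.split()


class CountBolt:
    """Bolt with state: emits (word, running count) for every word it sees."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def __call__(self, word: str) -> List[Tuple[str, int]]:
        self.counts[word] += 1
        return [(word, self.counts[word])]


def run_topology(spout: Iterable, bolts: List[Callable]) -> list:
    """Topology: the wiring; here each bolt feeds the next one in order."""
    stream = list(spout)
    for bolt in bolts:
        stream = [out for item in stream for out in bolt(item)]
    return stream


counter = CountBolt()
result = run_topology(sentence_spout(), [split_bolt, counter])
```

After the run, `counter.counts` holds the total for each word, and `result` holds every intermediate count the last bolt emitted.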
![Page 17: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/17.jpg)
Storm Processes
Processes:
● UI
● Nimbus
● Supervisor
● Worker
Figure: the Web UI and Nimbus, Zookeeper, and two Supervisors, each with two Workers.
![Page 18: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/18.jpg)
Storm Parallelism Model
● Worker process
● Executors
● Tasks
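One way to picture the nesting of the three levels: a worker process runs executors (threads), and each executor runs one or more tasks of a component. The sketch below only illustrates that containment. The function names are hypothetical, and the round-robin split is an assumption for illustration, not Storm's scheduler.

```python
from typing import List


def spread(items: list, buckets: int) -> List[list]:
    """Deal items round-robin into the given number of buckets."""
    if buckets < 1:
        raise ValueError("need at least one bucket")
    out: List[list] = [[] for _ in range(buckets)]
    for i, item in enumerate(items):
        out[i % buckets].append(item)
    return out


def layout(component: str, num_tasks: int, num_executors: int,
           num_workers: int) -> List[List[List[str]]]:
    """Return workers -> executors -> task names for a single component."""
    if num_executors > num_tasks:
        raise ValueError("an executor needs at least one task")
    tasks = [f"{component}-task-{i}" for i in range(num_tasks)]
    executors = spread(tasks, num_executors)  # each executor runs its share of tasks
    return spread(executors, num_workers)     # each worker runs its share of executors


plan = layout("bolt", num_tasks=4, num_executors=2, num_workers=2)
```

Here `plan[0]` is the first worker, `plan[0][0]` is its first executor, and the strings inside are the tasks that executor runs.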
![Page 19: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/19.jpg)
Use Case: Security
Kafka and Storm
![Page 20: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/20.jpg)
Use Case: Security
Security customer analytics platform:
● Pulled data from customer sites
● Placed data in a SQL database
● Performed analysis to spot anomalous traffic
● Pushed results back to the client to block traffic sources
![Page 21: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/21.jpg)
Use Case: Security
Original system, mean turnaround time: 4.5 hours
Storm / Kafka solution, maximum processing time: 2.6 seconds
![Page 22: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/22.jpg)
Thank You
Kafka and Storm
![Page 23: A Gentle Introduction To Storm And Kafka](https://reader034.vdocuments.mx/reader034/viewer/2022052308/58edbcf11a28abbe5e8b45f5/html5/thumbnails/23.jpg)
Links
Kafka: http://kafka.apache.org/
Storm: http://storm.apache.org/