Real-time analytics with Kafka and Spark Streaming
TRANSCRIPT
![Page 1: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/1.jpg)
© Cloudera, Inc. All rights reserved.
Real-time analytics with Kafka and Spark Streaming
Ashish Singh | Software Engineer, Cloudera
![Page 2: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/2.jpg)
It’s Real-Time time
Why now? Complex Event Processing (CEP) is not a new concept.
![Page 3: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/3.jpg)
It’s Real-Time time
Emergence of Real-Time Stream Processing
• Exponential growth in continuous data streams
• Open Source tools for reliable, high-throughput, low-latency event queuing and processing
• Tools run on “Commodity” hardware
Why now? Complex Event Processing (CEP) is not a new concept.
![Page 4: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/4.jpg)
It’s happening! …Across Industries
Credit Card & Monetary Transactions
• Identify fraudulent transactions as soon as they occur.
Transportation & Logistics
• Real-time traffic conditions
• Tracking fleet and cargo locations and dynamic re-routing to meet SLAs
Retail
• Real-time in-store offers and recommendations.
• Email and marketing campaigns based on real-time social trends
Consumer Internet, Mobile & E-Commerce
• Optimize user engagement based on user’s current behavior. Deliver recommendations relevant “in the moment”
Healthcare
• Continuously monitor patient vital stats and proactively identify at-risk patients.
Manufacturing
• Identify equipment failures and react instantly
• Perform proactive maintenance.
• Identify product quality defects immediately to prevent resource wastage.
Security & Surveillance
• Identify threats and intrusions, both digital and physical, in real-time.
Digital Advertising & Marketing
• Optimize and personalize digital ads based on real-time information.
![Page 5: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/5.jpg)
Canonical Stream Processing Architecture
Data Sources
![Page 6: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/6.jpg)
Canonical Stream Processing Architecture
Data Sources
Kafka Flume
![Page 7: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/7.jpg)
Canonical Stream Processing Architecture
Data Sources
Kafka Flume
• Filter
• Enrich
• Transform
• Stats on Sliding Windows
• Stream Joins
• Feature Engineering
• Predictive Analytics
• Active Model Training
• ...
And combinations of the above
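The first few processing steps on this slide can be sketched without any streaming framework. The block below is a minimal plain-Python illustration, not Spark Streaming code and not taken from the talk: it chains a filter, an enrichment lookup, and a mean over a time-based sliding window. The `Event` fields, the `USER_REGION` table, and all function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, Iterator, Tuple


@dataclass
class Event:
    ts: float      # event time in seconds (hypothetical field)
    user: str      # hypothetical key used by the enrich step
    amount: float  # hypothetical numeric payload


# Hypothetical reference data used by the enrich step.
USER_REGION: Dict[str, str] = {"alice": "EU", "bob": "US"}


def keep_valid(events: Iterable[Event]) -> Iterator[Event]:
    """Filter: drop events whose amount is not positive."""
    return (e for e in events if e.amount > 0)


def enrich(events: Iterable[Event]) -> Iterator[Tuple[Event, str]]:
    """Enrich: attach a region looked up from reference data."""
    for e in events:
        yield e, USER_REGION.get(e.user, "unknown")


def sliding_mean(
    enriched: Iterable[Tuple[Event, str]], window_s: float
) -> Iterator[Tuple[float, str, float]]:
    """Stats on a sliding window: mean amount over the last window_s seconds."""
    window: Deque[Tuple[float, float]] = deque()  # (ts, amount) pairs in the window
    total = 0.0
    for e, region in enriched:
        window.append((e.ts, e.amount))
        total += e.amount
        # Evict entries that are window_s or more older than the newest event.
        while window and window[0][0] <= e.ts - window_s:
            _, old_amount = window.popleft()
            total -= old_amount
        yield e.ts, region, total / len(window)


# Usage: the steps compose lazily, one event at a time.
events = [
    Event(0, "alice", 10.0),
    Event(1, "bob", -5.0),    # removed by the filter
    Event(2, "bob", 30.0),
    Event(12, "carol", 20.0),
]
results = list(sliding_mean(enrich(keep_valid(events)), window_s=10))
```

Generators keep each stage independent, which mirrors the slide's point that these operations are typically combined. A real deployment would read from a queue rather than a list and would have to handle out-of-order events, which this sketch ignores.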
![Page 8: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/8.jpg)
Canonical Stream Processing Architecture
Data Sources
Kafka Flume
HDFS
![Page 9: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/9.jpg)
Canonical Stream Processing Architecture
Data Sources
Kafka Flume
NoSQL
HDFS
![Page 10: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/10.jpg)
![Page 11: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/11.jpg)
![Page 12: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/12.jpg)
![Page 13: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/13.jpg)
Canonical Stream Processing Architecture
Data Sources
Kafka Flume
NoSQL
HDFS
Kafka
![Page 14: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/14.jpg)
Canonical Stream Processing Architecture
Data Sources
Kafka Flume
NoSQL
HDFS
Kafka ...
![Page 15: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/15.jpg)
Too much?
![Page 16: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/16.jpg)
Pankh
Example application demonstrating how real-time analytics can be done using Kafka and Spark Streaming
https://github.com/SinghAsDev/pankh
![Page 17: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/17.jpg)
Pankh – Building Pieces
Data Sources
![Page 18: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/18.jpg)
Pankh – Building Pieces
Data Sources
Kafka
![Page 19: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/19.jpg)
![Page 20: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/20.jpg)
Pankh – Building Pieces
Data Sources
Kafka
NoSQL
![Page 21: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/21.jpg)
![Page 22: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/22.jpg)
Demo Time
![Page 23: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/23.jpg)
Kappa Architecture
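The slide gives only the name, so the following is an illustrative sketch rather than anything shown in the talk. In the Kappa architecture as commonly described, all data flows through a single stream-processing path, and reprocessing is done by replaying the retained log through a new version of the job instead of maintaining a separate batch layer. Here a plain Python list stands in for a retained Kafka topic; `build_view`, `sum_job`, and `max_job` are hypothetical names, and no Kafka API is used.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (key, value); a hypothetical record shape


def build_view(log: Iterable[Record], job: Callable[[int, int], int]) -> Dict[str, int]:
    """Replay the whole log through a job to materialize a fresh view."""
    view: Dict[str, int] = {}
    for key, value in log:
        view[key] = job(view.get(key, 0), value)
    return view


def sum_job(acc: int, value: int) -> int:
    """Version 1 of the processing logic: running total per key."""
    return acc + value


def max_job(acc: int, value: int) -> int:
    """Version 2 of the processing logic: running maximum per key."""
    return max(acc, value)


# The append-only log stands in for a topic with retention enabled.
log: List[Record] = [("a", 1), ("b", 5), ("a", 3)]
view_v1 = build_view(log, sum_job)

# New data keeps arriving on the same log.
log.append(("b", 2))

# Changing the logic means replaying the same log into a new view,
# then switching readers over to it; there is no separate batch code path.
view_v2 = build_view(log, max_job)
```

The design choice being illustrated is that the log is the source of truth and every view is derived from it, so a code change is handled by recomputation rather than by migrating stored results.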
![Page 24: Real time analytics with Kafka and SparkStreaming](https://reader035.vdocuments.mx/reader035/viewer/2022081506/55b6d3dbbb61eb93418b48d8/html5/thumbnails/24.jpg)
Demo Time