real time analytics with kafka and sparkstreaming
TRANSCRIPT
1© Cloudera, Inc. All rights reserved.
Real-time analytics with Kafka and Spark StreamingAshish Singh | Software Engineer, Cloudera
2© Cloudera, Inc. All rights reserved.
It’s Real-Time timeWhy now? Complex Event Processing (CEP) is not a new concept.
3© Cloudera, Inc. All rights reserved.
It’s Real-Time time
Emergence of Real-Time
Stream Processing
Exponential growth in
continuous data streams
Open Source tools for
reliable high-throughput low latency event
queuing and processing
Tools run on “Commodity”
Hardware
Why now? Complex Event Processing (CEP) is not a new concept.
4© Cloudera, Inc. All rights reserved.
It’s happening! …Across IndustriesCredit Card& MonetaryTransactionsIdentifyfraudulent transactions as soon as they occur.
Transportation& Logistics• Real-timetrafficconditions• Trackingfleet and cargo locations and dynamic re-routing to meet SLAs
Retail• Real-timein-storeOffers and Recommendations.• Email andmarketing campaigns based on real-time social trends
Consumer Internet, Mobile & E-Commerce
Optimize userengagement basedon user’s currentbehavior. Deliver recommendations relevant “in the moment”
HealthcareContinuouslymonitor patientvital stats and proactively identifyat-risk patients.
Manufacturing• Identifyequipmentfailures and react instantly• Perform proactivemaintenance.• Identify product quality defects immediately to prevent resource wastage.
Security & Surveillance
Identifythreatsand intrusions, both digital and physical, in real-time.
Digital Advertising& MarketingOptimize and personalize digital ads based on real-time information.
5© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
6© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
7© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
FilterEnrich
TransformStats on Sliding Windows
Stream JoinsFeature Engineering Predictive Analytics
Active Model Training....
And combinations of the above
8© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
HDFS
9© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
NoSql
HDFS
10© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
NoSql
HDFS
11© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
NoSql
HDFS
12© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
NoSql
HDFS
13© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
NoSql
HDFS
Kafka
14© Cloudera, Inc. All rights reserved.
Canonical Stream Processing Architecture
DataSources
Kafka Flume
NoSql
HDFS
Kafka ...
15© Cloudera, Inc. All rights reserved.
Too much?
15© Cloudera, Inc. All rights reserved.
16© Cloudera, Inc. All rights reserved.
Example application to demonstrate how real time analytics can be done using Kafka and Spark Streaming
Pankh
https://github.com/SinghAsDev/pankh
17© Cloudera, Inc. All rights reserved.
Pankh – Building Pieces
DataSources
18© Cloudera, Inc. All rights reserved.
Pankh – Building Pieces
DataSources
Kafka
19© Cloudera, Inc. All rights reserved.
Pankh – Building Pieces
DataSources
Kafka
20© Cloudera, Inc. All rights reserved.
Pankh – Building Pieces
DataSources
Kafka
NoSql
21© Cloudera, Inc. All rights reserved.
Pankh – Building Pieces
DataSources
Kafka
NoSql
22© Cloudera, Inc. All rights reserved.
Demo Time
22© Cloudera, Inc. All rights reserved.
25© Cloudera, Inc. All rights reserved.
Kappa Architecture
26© Cloudera, Inc. All rights reserved.
Demo Time
26© Cloudera, Inc. All rights reserved.
27© Cloudera, Inc. All rights reserved.
Thank youAshish [email protected]@singhasdev