Streaming Data and Stream Processing with Apache Kafka
TRANSCRIPT
![Page 1: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/1.jpg)
Streaming Data and Stream Processing with Apache Kafka™
David Tucker, Director of Partner Engineering, Confluent
Sid Goel, Partner and Solution Architect, KPI Partners
![Page 2: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/2.jpg)
The opportunity: The shift to streams & digital transformation
By 2020, 70% of organizations will adopt data streaming to enable real-time analytics.
- Gartner | Nov 2016
Streaming ingestion and analytics will become a must-have for digital winners.
- Forrester | Nov. 2015
![Page 3: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/3.jpg)
More Facts & Figures
90% of CEOs believe the digital economy will have a major impact on their industry.
- MIT Sloan / Capgemini (2013)
#1 most important capability executives hope to improve via digital transformation: Ability to support real-time transactions.
- The Economist (2015)
Digital disruptors will displace 40% of incumbent companies over the next 5 years.
- Center for Digital Transformation (2015)
![Page 4: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/4.jpg)
Vision of a Streaming Enterprise
[Diagram labels: Streaming Platform; Search; NewSQL / NoSQL; RDBMS; Monitoring; Document Store; Real-time Analytics; Data Warehouse; Mobile Apps; Legacy Apps; Hadoop]
![Page 5: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/5.jpg)
What Can You Do with a Streaming Platform?
• Publish and subscribe to streams of data
– Analogous to traditional messaging systems
• Store streams of data
– Consumers can look back in time
• Process streams of data
– Analyze and correlate events in real time
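The three capabilities above can be illustrated with a toy model. The sketch below is not Kafka and does not use any Kafka client; `ToyLog` and its methods are hypothetical names for an in-memory, single-partition stand-in, assuming each consumer group simply tracks the next offset it will read.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for a single-partition topic (illustration only)."""

    def __init__(self):
        self._records = []                 # append-only storage
        self._offsets = defaultdict(int)   # next offset to read, per consumer group

    def publish(self, value):
        """Append a record and return its offset."""
        self._records.append(value)
        return len(self._records) - 1

    def poll(self, group):
        """Return records the group has not read yet and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:]
        self._offsets[group] = len(self._records)
        return batch

    def seek(self, group, offset):
        """Move a group's position so it can re-read stored records."""
        self._offsets[group] = offset


log = ToyLog()
for event in ["login", "view", "purchase"]:
    log.publish(event)                      # publish

first_read = log.poll("analytics")          # subscribe: read everything so far
log.seek("analytics", 0)                    # store: look back in time
replayed = log.poll("analytics")            # the same records are still there
late_reader = log.poll("audit")             # an independent group starts at offset 0
purchases = sum(1 for e in replayed if e == "purchase")  # process: a simple count
```

Because records are stored rather than deleted on delivery, a second group or a rewound group sees the same history, which is the main difference from a queue that discards messages once consumed.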
![Page 6: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/6.jpg)
The typical architecture
[Diagram labels: Search; Security; Fraud Detection Application; User Tracking; Operational Logs; Operational Metrics; Data Warehouse; Monitoring; App; Databases; Storage; Interfaces]
![Page 7: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/7.jpg)
Challenges abound
[Diagram labels: Search; Security; Fraud Detection Application; User Tracking; Operational Logs; Operational Metrics; Hadoop; Data Warehouse; Monitoring; App; Databases; Storage; Interfaces]
• Diverse data sets, arriving at an increasing rate
• Many complex data pipelines
• Require a separate cluster for real-time
• Difficult & time consuming to change
• Require mission critical availability into most recent/relevant data
• Difficult to handle massive amounts of data
![Page 8: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/8.jpg)
Modernized architecture using Apache Kafka
[Diagram labels: Search; Security; Fraud Detection Application; Streams API; App; Monitoring; Data Warehouse; User Tracking; Operational Logs; Operational Metrics]
![Page 9: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/9.jpg)
Modernized architecture using Apache Kafka
[Diagram labels: Search; Security; Fraud Detection Application; Streams API; App; Monitoring; Data Warehouse; User Tracking; Operational Logs; Operational Metrics]
• Pub/sub to data streams, alleviate back pressure
• Lightweight, easy to modify with minimal disruption
• Decoupled from upstream apps creating agility
• Real-time, context specific data in the moment
• Handle any volume of data with ease
• Scale to meet demands of diverse streams
![Page 10: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/10.jpg)
Our vision: from big data to stream data
Big Data was: The More the Better
[Chart axes: Value of Data vs. Volume of Data]
Stream Data is: The Faster the Better
[Chart axes: Value of Data vs. Age of Data]
Stream Data can be: Big or Fast (Lambda)
Stream Data will be: Big AND Fast (Kappa)
[Diagram labels: Job 1, Job 2; Streams; Table 1, Table 2; DB]
[Diagram labels: Speed Table, Batch Table; DB; Streams; Hadoop]
Apache Kafka is the Enabling Technology of this Transition
![Page 11: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/11.jpg)
Kafka Adoption in Large Enterprises Growing Rapidly
Travel: 6 of top 10
Global Banks: 7 of top 10
Insurance: 8 of top 10
Telecom: 9 of top 10
Over 35% of the Fortune 500 are using Apache Kafka™
![Page 12: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/12.jpg)
Industries & Use Cases
Universal Use Cases: IoT, Data Pipelines, Microservices, Monitoring
Industry Use Cases
Financial Services: Fraud Detection, Trade Data Capture, Customer 360
Retail: Inventory Management, Product Catalog, A/B Testing, Proactive Alerts
Automotive: Connected Car, Manufacturing Data Processing
Enterprise Tech: Analytics, Security Operations, Collect Performance Data
Telecom: Personalized Ad Placement, Customer 360, Network Integrity Systems
Entertainment/Media: Log Delivery, Increase Ad Delivery Operations, Cross-Device Insights
Travel/Leisure: Visitor Segmentation, Fraud Detection
Consumer Tech: Streaming Video, Personalized Customer Experience, Device Telemetry and Analytics
Healthcare: Patient Monitoring, Pharma Substance Control, Patient Relapse, Lab Results Alerts
![Page 13: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/13.jpg)
Kafka Adoption Across Key Companies
[Company logos grouped by industry: Financial Services; Enterprise Tech; Consumer Tech; Entertainment & Media; Telecom; Retail; Travel & Leisure]
![Page 14: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/14.jpg)
Confluent Enterprise
The only enterprise streaming platform based entirely on Apache Kafka™
![Page 15: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/15.jpg)
Confluent Platform: Enterprise Streaming based on Apache Kafka™
[Diagram labels]
Database Changes, Log Events, IoT Data, Web Events, …
CRM, Data Warehouse, Database, Hadoop, Data Integration, …
Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
Apache Open Source | Confluent Open Source | Confluent Enterprise
Confluent Platform: Apache Kafka™, Data Compatibility, Monitoring & Administration, Operations, Clients, Connectors
Complete, Open, Trusted, Enterprise Grade
![Page 16: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/16.jpg)
Confluent Completes Kafka
Columns: Feature | Benefit | Apache Kafka | Confluent Open Source | Confluent Enterprise
• Apache Kafka: High throughput, low latency, high availability, secure distributed streaming platform
• Kafka Connect API: Advanced API for connecting external sources/destinations into Kafka
• Kafka Streams API: Simple library that enables streaming application development within the Kafka framework
• Additional Clients: Supports non-Java clients; C, C++, Python, etc.
• REST Proxy: Provides universal access to Kafka from any network connected device via HTTP
• Schema Registry: Central registry for the format of Kafka data – guarantees all data is always consumable
• Pre-Built Connectors: HDFS, JDBC, Elasticsearch and other connectors fully certified and fully supported by Confluent
• Confluent Control Center: Enables easy connector management and stream monitoring
• Auto Data Balancing: Rebalancing data across cluster to remove bottlenecks
• Replication: Multi-datacenter replication simplifies and automates MDC Kafka clusters
• Support: Enterprise class support to keep your Kafka environment running at top performance (Apache Kafka: Community; Confluent Open Source: Community; Confluent Enterprise: 24x7x365)
![Page 17: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/17.jpg)
How do I get streams of data into and out of my apps?
Connect, Clients, REST
![Page 18: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/18.jpg)
Apache Kafka™ Connect – Streaming Data Capture
[Diagram labels: JDBC; IRC / Twitter; CDC; Elastic; NoSQL; HDFS; Kafka Connect API; Kafka Pipeline; Connector; Sources; Sinks]
Fault tolerant
Manage hundreds of data sources and sinks
Preserves data schema
Part of Apache Kafka project
Integrated within Confluent Platform’s Control Center
![Page 19: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/19.jpg)
Kafka Connect API, Part of the Apache Kafka™ Project
Connect any source to any target system with Apache Kafka
Integrated
• 100% compatible with Kafka v0.9 and higher
• Integrated with Confluent’s Schema Registry
• Easy to manage with Confluent Control Center
Flexible
• 40+ open source connectors available
• Easy to develop additional connectors
• Flexible support for data types and formats
Compatible
• Maintains critical metadata
• Preserves schema information
• Supports schema evolution
Reliable
• Automated failover
• At-least-once guaranteed
• Balances workload between nodes
![Page 20: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/20.jpg)
Kafka Connect API Library of Connectors
* Denotes Connectors developed at Confluent and distributed by Confluent. Extensive validation and testing have been performed.
[Connector logos grouped by: Databases; Analytics; Applications / Other; Datastore/File Store]
![Page 21: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/21.jpg)
New in Kafka 0.10.2: Single Message Transforms for Kafka Connect
Modify events before storing in Kafka:
• Mask sensitive information
• Add identifiers
• Tag events
• Store lineage
• Remove unnecessary columns
Modify events going out of Kafka:
• Route high priority events to faster data stores
• Direct events to different Elasticsearch indexes
• Cast data types to match destination
• Remove unnecessary columns
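The idea of a per-message transform chain can be sketched in plain Python. This is a toy model, not the Kafka Connect transformation API: the helper names (`mask_field`, `insert_field`, `drop_fields`, `apply_chain`) and the sample record are hypothetical, and records are assumed to be flat dictionaries.

```python
def mask_field(field):
    """Return a transform that replaces one field's value with a mask."""
    def transform(record):
        out = dict(record)          # copy so the input record is not mutated
        if field in out:
            out[field] = "****"
        return out
    return transform


def insert_field(field, value):
    """Return a transform that adds a static identifier or tag."""
    def transform(record):
        out = dict(record)
        out[field] = value
        return out
    return transform


def drop_fields(*fields):
    """Return a transform that removes unnecessary columns."""
    def transform(record):
        return {k: v for k, v in record.items() if k not in fields}
    return transform


def apply_chain(record, transforms):
    """Apply each transform in order, one message at a time."""
    for transform in transforms:
        record = transform(record)
    return record


# Hypothetical chain for events on their way into Kafka.
source_chain = [
    mask_field("card_number"),          # mask sensitive information
    insert_field("source", "orders-db"),  # store lineage
    drop_fields("internal_notes"),      # remove unnecessary columns
]

event = {
    "order_id": 17,
    "card_number": "4111111111111111",
    "internal_notes": "call back",
    "amount": 42.5,
}
result = apply_chain(event, source_chain)
```

Each transform sees exactly one message and returns one message, which is what keeps this style of processing lightweight; anything that needs to join or aggregate across messages belongs in a stream processing application instead.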
![Page 22: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/22.jpg)
Kafka Clients
Ruby Proxy http/REST
Stdin/stdout
Apache Kafka Native Clients
Confluent Native Clients
Community Supported Clients
![Page 23: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/23.jpg)
REST Proxy: Talking to Non-native Kafka Apps and Outside the Firewall
REST Proxy
Non-Java Applications
Native Kafka Java
Applications
Schema Registry
REST / HTTP
Simplifies administrative actions
Simplifies message creation and consumption
Provides a RESTful interface to a Kafka cluster
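As a rough sketch of what producing over HTTP can look like, the snippet below builds (but does not send) a produce request using only the Python standard library. The host and port `http://localhost:8082`, the topic name `page_views`, and the helper `build_produce_request` are assumptions for illustration; the content types follow the REST Proxy v2 JSON format as commonly documented, so check the reference for the version you run.

```python
import json
import urllib.request


def build_produce_request(base_url, topic, values):
    """Build an HTTP POST that asks the REST Proxy to produce JSON records."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            # Assumed v2 content types; verify against your REST Proxy version.
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


request = build_produce_request(
    "http://localhost:8082",            # hypothetical REST Proxy address
    "page_views",                       # hypothetical topic
    [{"user": "alice", "page": "/home"}],
)

# Sending requires a running REST Proxy, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Any language with an HTTP client can issue the same request, which is the point of the proxy for non-Java applications and clients outside the firewall.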
![Page 24: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/24.jpg)
How do I maintain my data formats and ensure compatibility?
![Page 25: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/25.jpg)
The Challenge of Data Compatibility at Scale
App 1
App 2
App 3
Many sources without a policy cause mayhem in a centralized data pipeline
Ensuring downstream systems can use the data is key to an operational stream pipeline
Example: Date formats
Even within a single application, different formats can be presented
Incompatibly formatted message
![Page 26: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/26.jpg)
Schema Registry
[Diagram labels: App 1 and App 2, each with a Serializer; Schema Registry; Kafka Topic; Example Consumers: Elastic, Cassandra, HDFS]
Define the expected fields for each Kafka topic
Automatically handle schema changes (e.g. new fields)
Prevent backwards incompatible changes
Supports multi-datacenter environments
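To make "prevent backwards incompatible changes" concrete, here is a deliberately simplified check in plain Python. It is not the Schema Registry API and not the full Avro resolution rules (for instance, it ignores type promotion); the schema representation and the function `is_backward_compatible` are hypothetical, assuming a schema is a mapping from field name to a spec with a `type` and an optional `default`.

```python
def is_backward_compatible(old_schema, new_schema):
    """Return True if a reader using new_schema can decode data written with old_schema.

    Simplified rule: fields present in both schemas must keep their type, and
    any field that exists only in the new schema must declare a default.
    """
    for name, spec in new_schema.items():
        if name in old_schema:
            if old_schema[name]["type"] != spec["type"]:
                return False        # type changed: old data cannot be read
        elif "default" not in spec:
            return False            # new field with no default: old data lacks it
    return True


v1 = {"user_id": {"type": "string"}, "ts": {"type": "long"}}

# New field with a default: old records can still be read.
v2 = {**v1, "country": {"type": "string", "default": "unknown"}}

# New field without a default: old records have nothing to fill it with.
v3 = {**v1, "country": {"type": "string"}}

# Changed type on an existing field.
v4 = {"user_id": {"type": "string"}, "ts": {"type": "string"}}

checks = {
    "v2": is_backward_compatible(v1, v2),
    "v3": is_backward_compatible(v1, v3),
    "v4": is_backward_compatible(v1, v4),
}
```

A registry that runs a check like this before accepting a new schema version can reject `v3` and `v4` at registration time, before any incompatible message reaches a topic.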
![Page 27: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/27.jpg)
How do I build stream processing apps?
![Page 28: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/28.jpg)
Kafka Streams API: the Easiest Way to Process Data in Apache Kafka™
Example Use Cases
• Microservices
• Large-scale continuous queries and transformations
• Event-triggered processes
• Reactive applications
• Customer 360-degree view, fraud detection, location-based marketing, smart electrical grids, fleet management, …
Key Benefits of Apache Kafka’s Streams API
• Build Apps, Not Clusters: no additional cluster required
• Elastic, highly-performant, distributed, fault-tolerant, secure
• Equally viable for small, medium, and large-scale use cases
• “Run Everywhere”: integrates with your existing deployment strategies such as containers, automation, cloud
[Diagram labels: Your App; Kafka Streams API]
![Page 29: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/29.jpg)
Architecture Example
Before: Complexity for development and operations, heavy footprint
1. Capture business events in Kafka
2. Must process events with separate, special-purpose clusters
3. Write results back to Kafka
[Diagram label: Your Processing Job]
![Page 30: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/30.jpg)
Architecture Example
With Kafka Streams: App-centric architecture that blends well into your existing infrastructure
1. Capture business events in Kafka
2. Process events fast, reliably, securely with standard Java applications
3a. Write results back to Kafka
3b. Query latest results directly from external apps
[Diagram labels: App; App; Your App; Kafka Streams API]
![Page 31: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/31.jpg)
New in Kafka 0.10.2: Session windows in Kafka Streams API
Group events in a stream based on session windows
• Sessions are periods of activity terminated by a gap of inactivity
• Purely time-based windows are incorrect for session-based data analysis
Input data: colors represent different users' events
Results: user sessions, grouped by event-time session windows
[Diagram labels: processing-time; event-time; session windowing; Alice; Bob; Dave]
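Session windowing can be sketched without any streaming library. The function below is a plain-Python illustration, not the Kafka Streams API: `sessionize`, the integer timestamps, and the gap of 30 are hypothetical, and events are assumed to be `(user, event_time)` pairs that may arrive out of order.

```python
from collections import defaultdict


def sessionize(events, gap):
    """Group (user, event_time) pairs into sessions closed by `gap` of inactivity.

    Returns {user: [(start, end, event_count), ...]} using event time,
    regardless of the order in which events arrived.
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()                       # order by event time, not arrival
        user_sessions = []
        start = end = stamps[0]
        count = 1
        for ts in stamps[1:]:
            if ts - end > gap:              # inactivity gap exceeded: close session
                user_sessions.append((start, end, count))
                start, count = ts, 0
            end = ts
            count += 1
        user_sessions.append((start, end, count))
        sessions[user] = user_sessions
    return sessions


# Arrival (processing-time) order differs from event-time order for alice.
events = [
    ("alice", 0), ("bob", 5), ("alice", 20), ("alice", 100),
    ("bob", 8), ("dave", 50), ("alice", 10),
]
result = sessionize(events, gap=30)
```

Here alice ends up with two sessions, `(0, 20, 3)` and `(100, 100, 1)`, even though her event at time 10 arrived last. A fixed time-based window could not express this, because session boundaries depend on each user's own activity rather than on the clock.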
![Page 32: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/32.jpg)
How do I synchronize and migrate data to and from the cloud?
![Page 33: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/33.jpg)
Before: Hybrid Cloud Environments Today
[Diagram labels: DC1; AWS; DB1; DB2; DB3; KV; KV2; KV3; DWH; App1; App1-v2; App2; App2-v2; App3; App4; App5; App7; App8]
Challenges
• Each team/department must execute their own cloud migration
• May be moving the same data multiple times
• Each box represented here requires development, testing, deployment, monitoring and maintenance
![Page 34: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/34.jpg)
After: Cloud Synchronization and Migrations with Confluent Platform
[Diagram labels: DC1; AWS; Kafka; Kafka; DB1; DB2; DB3; KV; KV2; KV3; DWH; App1; App1-v2; App2; App2-v2; App3; App4; App5; App7; App8]
Benefits
• Continuous low-latency synchronization
• Centralized manageability and monitoring
– Track at event level data produced in all data centers
• Security and governance
– Track and control where data comes from and who is accessing it
• Cost Savings
– Move Data Once
![Page 35: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/35.jpg)
How do I manage and monitor my streaming platform at scale?
![Page 36: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/36.jpg)
What Does End-to-End Mean?
“Clocks and Cables” Monitoring
How fast is the throughput?
How many CPU cycles are we using?
End-to-End Monitoring
Did you leave?
Did you arrive?
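The "did you leave, did you arrive" question amounts to comparing what was produced with what was consumed. The sketch below is a toy illustration in plain Python and says nothing about how Confluent Control Center is implemented; `delivery_report`, the integer timestamps, and the bucket size are hypothetical.

```python
from collections import Counter


def delivery_report(produced, consumed, bucket_size):
    """Compare produced and consumed message timestamps per time bucket."""
    sent = Counter(ts // bucket_size for ts in produced)
    received = Counter(ts // bucket_size for ts in consumed)
    report = {}
    for bucket, sent_count in sorted(sent.items()):
        got = received.get(bucket, 0)
        report[bucket] = {
            "produced": sent_count,          # did you leave?
            "consumed": got,                 # did you arrive?
            "missing": sent_count - got,
        }
    return report


# Timestamps of messages as recorded at the producer and at the consumer.
produced = [1, 3, 7, 12, 14, 18, 25]
consumed = [1, 3, 7, 12, 18, 25]        # the message stamped 14 never arrived

report = delivery_report(produced, consumed, bucket_size=10)
```

Throughput and CPU figures would look healthy in this example; only the per-bucket comparison shows that one message in the second bucket was never consumed.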
![Page 37: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/37.jpg)
Confluent Control Center: Cluster Health & Administration
Cluster health dashboard
• Monitor the health of your Kafka clusters and get alerts if any problems occur
• Measure system load, performance, and operations
• View aggregate statistics or drill down by broker or topic
Cluster administration
• Monitor topic configurations
![Page 38: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/38.jpg)
Confluent Control Center: End-to-end Monitoring
See exactly where your messages are going in your Kafka cluster
![Page 39: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/39.jpg)
Confluent Control Center: Connector Management
![Page 40: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/40.jpg)
Confluent Control Center: Alerting
Alerts
• Configure alerts on incomplete data delivery, high latency, Kafka connector status, and more
• Manage alerts for different users and applications from a web UI
User authentication
• Control access to Confluent Control Center
• Integrates with existing enterprise authentication systems
![Page 41: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/41.jpg)
Auto Data Balancing
Dynamically move partitions to optimize resource utilization and reliability
• Easily add and remove nodes from your Kafka cluster
• Rack aware algorithm rebalances partitions across a cluster
• Traffic from balancer is throttled when data transfer occurs
[Diagram labels: Before; Rebalance; After]
![Page 42: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/42.jpg)
Multi-Datacenter Replication
An easy, reliable way to run Kafka across datacenters
Improve reliability
• Easily configure & maintain cross-cluster replication
Simplify management
• Centralized configuration and monitoring
• Replicate entire cluster or a subset of topics
• Automatic replication of topic configuration
• Use Kafka’s SASL for Kerberos, Active Directory
• SSL encryption between datacenters
![Page 43: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/43.jpg)
Get Started with Apache Kafka Today!
https://www.confluent.io/downloads/
THE place to start with Apache Kafka!
Thoroughly tested and quality assured
More extensible developer experience
Easy upgrade path to Confluent Enterprise
![Page 44: Streaming Data and Stream Processing with Apache Kafka](https://reader031.vdocuments.mx/reader031/viewer/2022020119/5a660d037f8b9a99198b476b/html5/thumbnails/44.jpg)
Thank You