An introduction to Apache Kafka · 2020/01/01
TRANSCRIPT
IBM Event Streams | Apache Kafka
© 2019 IBM Corporation
An introduction to Apache Kafka®
This is event-streaming, not just messaging
Subhajit Maitra
MQ & Event Streams Workshop
87% of companies are transforming to be more customer-centric
Source: A commissioned study conducted by Forrester Consulting on behalf of IBM, September 2016
Typical Event-driven Use Case | Customer Satisfaction
‘Zoom Air’ is a commercial airline
Re-accommodate passengers before they realize their journey has been disrupted
Data-centric to event-centric
Source: Gartner May 2017, “CIO Challenge: Adopt Event-Centric IT for Digital Business Success ”
Components of an event-streaming application
[diagram: apps exchanging events through an Event Backbone]
Building Blocks
1 :: Event Sources
2 :: Stream Processing
3 :: Event Archive
4 :: Notifications
Why is Apache Kafka so popular?
Business Trends
- Decisions driven by data
- Apps that derive insight from large volumes of data
- Apps that react to changing events
Technology Trends
- Runs natively on cloud
- Immutable event stream history
- Event stream replay
- Replication for HA
- Naturally scales horizontally
Kafka arrived at the right time, captured mindshare among developers and exploded in popularity
Two Styles of Messaging
EVENT STREAMING: stream history, immutable data, scalable consumption
MESSAGE QUEUING: request/reply, transient data persistence, targeted reliable delivery
Properties of the Event Backbone
Stream history · Immutable data · Highly available · Scalable · Scalable consumption
Apache Kafka is an open-source, distributed streaming platform:
- Publish and subscribe to streams of events
- Store events in a durable way
- Process streams of events as they occur
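These three capabilities can be sketched with a toy in-memory event log. This is illustrative only: the class and method names are invented for the sketch, and real applications use the KafkaProducer/KafkaConsumer clients against a broker cluster.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyEventLog {
    private final List<String> log = new ArrayList<>();

    // Publish: events are appended, never overwritten (immutable history).
    public int publish(String event) {
        log.add(event);
        return log.size() - 1;        // the event's offset
    }

    // Subscribe/replay: any reader can consume from any offset, repeatedly.
    public List<String> readFrom(int offset) {
        return new ArrayList<>(log.subList(offset, log.size()));
    }

    public static void main(String[] args) {
        ToyEventLog topic = new ToyEventLog();
        topic.publish("flight-delayed");
        topic.publish("gate-changed");
        // Two independent readers replay the same immutable history.
        System.out.println(topic.readFrom(0));  // [flight-delayed, gate-changed]
        System.out.println(topic.readFrom(1));  // [gate-changed]
    }
}
```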
Brokers
Partitions
Replication
Producers
Topics
[diagram: a topic is an append-only sequence of records; each record is identified by its offset, 0 to 5 in the example]
Topics and Keys
[diagram: one topic spread across PARTITION 0, PARTITION 1 and PARTITION 2; each partition numbers its own records from offset 0]
Retention Period and Compaction
Retention is set by time or size:
log.retention.minutes=3000
log.retention.bytes=1048
Compacted topics are evolving data stores. Example, PARTITION 0 before compaction:
offset 0: key=a, val=A1
offset 1: key=b, val=B1
offset 2: key=a, val=A2
offset 3: key=c, val=C1
offset 4: key=c, val=C2
offset 5: key=b, val=B2
PARTITION 0 (rewritten) keeps only the latest record for each key:
offset 2: key=a, val=A2
offset 4: key=c, val=C2
offset 5: key=b, val=B2
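The compaction example can be simulated in a few lines of Java. This is a sketch of the retention rule only, not Kafka's actual log cleaner: replaying records in offset order and keeping the latest value per key yields exactly the rewritten partition shown on the slide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionSketch {
    // Replays records in offset order; a later record for a key replaces
    // the earlier one, mirroring what a compacted topic retains.
    public static Map<String, String> compact(String[][] records) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] r : records) {
            latest.remove(r[0]);  // re-insert so iteration follows last write
            latest.put(r[0], r[1]);
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] partition0 = {
            {"a", "A1"}, {"b", "B1"}, {"a", "A2"},
            {"c", "C1"}, {"c", "C2"}, {"b", "B2"}
        };
        System.out.println(compact(partition0)); // {a=A2, c=C2, b=B2}
    }
}
```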
Producers and Keys
[diagram, built up across several slides: a producer appends records 4, 5, 6 and 7 to a three-partition topic; records with the same key always go to the same partition, and each partition assigns its own sequential offsets]
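The key-to-partition behaviour can be sketched as follows. Note that Kafka's default partitioner actually hashes keys with murmur2; String.hashCode() is a stand-in here so the example is self-contained.

```java
public class PartitionerSketch {
    public static int partitionFor(String key, int numPartitions) {
        // Same key -> same partition, which is what preserves per-key ordering.
        return Math.abs(key.hashCode() % numPartitions);
    }

    public static void main(String[] args) {
        // Every record with key "flight-42" lands on the same partition.
        System.out.println(partitionFor("flight-42", 3));
        System.out.println(partitionFor("flight-42", 3)); // same number again
    }
}
```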
Consumers
Consumers
[diagram: CONSUMER A at offset 2 and CONSUMER B at offset 5, each reading independently from a partition holding offsets 0 to 6]
A consumer can choose how to commit offsets:
- Automatic: commits might go faster than processing
- Manual, asynchronous: fairly safe, but could re-process messages
- Manual, synchronous: safe, but slows down processing
- Exactly-once semantics: can group sending messages and committing offsets into transactions; primarily aimed at stream processing applications
A common pattern is to commit offsets on a timer.
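Why the commit strategy matters can be seen in a small simulation (illustrative only, not the consumer API): a consumer that crashes before committing restarts from the last committed offset and re-processes records, i.e. at-least-once delivery.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetCommitSketch {
    // Processes records starting at the last committed offset; "crashes"
    // once it has processed crashAfter records without committing them.
    public static List<String> runFrom(int committed, List<String> records,
                                       int crashAfter) {
        List<String> processed = new ArrayList<>();
        for (int offset = committed; offset < records.size(); offset++) {
            if (processed.size() == crashAfter) return processed; // crash!
            processed.add(records.get(offset));
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> partition = List.of("r0", "r1", "r2", "r3");
        // First run: committed=0, processes r0 and r1, then crashes
        // before committing past offset 1.
        System.out.println(runFrom(0, partition, 2)); // [r0, r1]
        // Restart from committed offset 1: r1 is processed a second time.
        System.out.println(runFrom(1, partition, 99)); // [r1, r2, r3]
    }
}
```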
Consumer Groups
Consumer Groups
[diagram: a topic with PARTITION 0 (offsets 0 to 7), PARTITION 1 (offsets 0 to 3) and PARTITION 2 (offsets 0 to 5). CONSUMER GROUP A has three consumers, one per partition; CONSUMER GROUP B has two consumers sharing the three partitions. Each group tracks its own committed offsets: p0 offset 7, p1 offset 3, p2 offset 5]
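The balancing rule (each partition is assigned to exactly one consumer within a group) can be sketched like this. Round-robin is used for simplicity; Kafka ships several assignment strategies. With 3 partitions, a group of three consumers gets one partition each, and a group of two splits them 2/1, matching the slide.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupAssignmentSketch {
    public static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < consumers; c++) result.add(new ArrayList<>());
        for (int p = 0; p < partitions; p++) {
            result.get(p % consumers).add(p);   // round-robin over the group
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assign(3, 3)); // [[0], [1], [2]]
        System.out.println(assign(3, 2)); // [[0, 2], [1]]
    }
}
```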
Kafka Streams
Client library for processing and analysing data stored in Kafka
Processing happens in the app
Supports per-record processing – no batching
[diagram: records flow from topic my-input through FILTER and MAP into topic my-output]
Kafka Streams
INPUT TOPIC: foo=red, bar=orange, foo=yellow, bingo=green, bingo=blue, bar=purple

KStream<String, String> source = builder.stream("my-input");
source.filter((key, value) -> key.equals("bingo"))
      .map((key, value) -> KeyValue.pair(key, value.toUpperCase()))
      .to("my-output");

OUTPUT TOPIC: bingo=GREEN, bingo=BLUE
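To see the expected output without running a broker, the same filter-and-map logic can be simulated with plain java.util.stream:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StreamsSketch {
    // Keep only records keyed "bingo" and upper-case their values,
    // mirroring the Kafka Streams topology on the slide.
    public static List<Map.Entry<String, String>> run(
            List<Map.Entry<String, String>> input) {
        return input.stream()
                .filter(e -> e.getKey().equals("bingo"))
                .map(e -> Map.entry(e.getKey(), e.getValue().toUpperCase()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> inputTopic = List.of(
            Map.entry("foo", "red"), Map.entry("bar", "orange"),
            Map.entry("foo", "yellow"), Map.entry("bingo", "green"),
            Map.entry("bingo", "blue"), Map.entry("bar", "purple"));
        System.out.println(run(inputTopic)); // [bingo=GREEN, bingo=BLUE]
    }
}
```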
Kafka Connect – bridge to cloud-native apps
Event Backbone
Kafka Connect
Over 80 connectors: HDFS, Elasticsearch, MySQL, JDBC, IBM MQ, MQTT, CoAP and many others
IBM Event Streams | Apache Kafka
Welcome to IBM Event Streams: Apache Kafka® for the Enterprise
This is event-streaming, not just messaging
Subhajit Maitra
MQ & Event Streams Workshop
Key usage patterns | Drive interactive applications
Customer Loyalty Application
[diagram: Systems of Record plus channels such as location feed events to the Customer Loyalty Application, which sends instant loyalty balance alerts and combines data from multiple channels]
- Using Event Streams to bring the data closer to the customer loyalty application allows the application to provide a more responsive and interactive experience for customers
- Event Streams can bring data together from disparate sources (for example, location and weather) so that the customer loyalty application behaves in a way that is more relevant to the customer
Key usage patterns | Provide data to applications while protecting your backend
[diagram: protected Systems of Record feed IBM Event Streams, which is scalable and high speed, serves multiple readers, and expands to multi-cloud]
- IBM Event Streams allows you to emit streams of data from a backend system, providing a high-speed decoupled buffer that allows many readers to absorb the data
- Allows microservices to be developed, acting against the stream with no impact on critical systems of record
- Expand the pattern to multi-cloud: Event Streams can be used to create a local buffer of data in each cloud environment
- Minimize on-prem to cloud data transfer
- Give the fastest response for cloud applications
Key usage patterns | Take advantage of Machine Learning
Feed data to batch analytics, and progress towards real-time
- IBM Event Streams offers a great way to feed data warehouse/batch analytics
- Data is available whenever analytics processing is ready to process it
- Allows data analytics to be taken offline without losing any data
- Data replay/history is always available
- Whilst insights from static data are great, the speed of Event Streams means that it is possible to supply data to conduct analytics and respond in real time
- Take it further and apply machine learning to take action ahead of the competition
IBM Event Streams is fully supported Apache Kafka® with value-add capabilities
Trusted support
Unlocks core data
Enterprise-ready
Intuitive to use
IBM Event Streams Delivers Differentiated Value
- IBM has years of operational expertise running Apache Kafka for enterprises; this experience has been embedded in the DNA of Event Streams
- Event Streams makes Kafka easy to run, manage and consume, reducing skill requirements and increasing speed of deployment for faster time to value
- Security integration simplifies Kafka access control using roles and policies
- IBM's experience in enterprise-critical software has shaped features like geo-replication for Disaster Recovery and integration with IBM MQ, to give confidence deploying mission-critical workloads
- Support you can trust: IBM has decades of experience supporting the world's toughest environments
IBM offers Event Streams in several form factors:
- IBM Public Cloud: Multi-tenant aaS and Isolated aaS
- Private Cloud: Red Hat OpenShift (X86_64)
- Container-native Software: Linux, Kubernetes included (X86_64 & zLinux)
In 2015 IBM was the first vendor to offer a fully managed Apache Kafka cloud service
Benefit from IBM's Kafka Expertise
- Public Multi-Tenant service
- Dedicated Single-Tenant service
IBM has years of experience running Apache Kafka across the globe
IBM Event Streams | Making Apache Kafka Intuitive and Easy
Easy to deploy:
- Kafka has many distinct components to deploy, configure and coordinate for secure connectivity
- Container placement is critical to ensure production-level availability
- Secured network traffic ingress
- Ensuring consistent and repeatable deployment
IBM Event Streams | Making Apache Kafka Intuitive and Easy
- Visualisation of your topic data
- Tools to boost productivity
IBM Event Streams | Integrates with External Monitoring Tools
External monitoring tools: Datadog, Splunk, etc.
IBM Event Streams | Enterprise Grade Reliability
Integrated geo-replication for Disaster Recovery
IBM Event Streams | Integrates Seamlessly with IBM MQ
IBM MQ connects mission-critical Systems of Record, requiring transactional, once-only delivery (e.g. payment transactions)
IBM Event Streams distributes and processes streams of events in real time to intelligently engage with customers (e.g. alerts on spending patterns)
[diagram: transactions flow through IBM MQ; Event Streams turns them into alerts and sends push notifications]
Unlock Events from Systems where Kafka Connectivity is a Problem
REST API for Inbound Data
[diagram: applications send events over HTTP, directly or via z/OS Connect, to the IBM Event Streams REST API]
{
  "type": "record",
  "name": "Aircraft_schema",
  "namespace": "com.mycompany.schemas.aircraft",
  "fields": [
    { "name": "Aircraft_number", "type": "string" },
    { "name": "GPS_coordinates", "type": "string" },
    { "name": "Departure_time", "type": "string" },
    ...
Avro Schema · Schema Registry
[diagram: the producing app validates and serializes (2) records against the registered schema and sends (3); the consuming app receives (4) and deserializes (6) via the Schema Registry]
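The "validate" step can be illustrated with a toy check of a record against the registered field list. Real Avro serialization encodes binary data against the full schema; this sketch only shows the validation idea, reusing the field names from the example schema.

```java
import java.util.List;
import java.util.Map;

public class SchemaCheckSketch {
    // Field names taken from the Aircraft_schema example.
    static final List<String> AIRCRAFT_FIELDS =
        List.of("Aircraft_number", "GPS_coordinates", "Departure_time");

    // A record is valid if its fields exactly match the registered schema.
    public static boolean validate(Map<String, String> record) {
        return record.keySet().containsAll(AIRCRAFT_FIELDS)
            && AIRCRAFT_FIELDS.containsAll(record.keySet());
    }

    public static void main(String[] args) {
        Map<String, String> good = Map.of(
            "Aircraft_number", "ZA123",
            "GPS_coordinates", "51.47,-0.45",
            "Departure_time", "2019-10-01T09:00Z");
        System.out.println(validate(good));                               // true
        System.out.println(validate(Map.of("Aircraft_number", "ZA123"))); // false
    }
}
```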
It’s easy to connect IBM MQ to Apache Kafka
IBM has created a pair of connectors, available as source code or as part of IBM Event Streams
Source connector
From MQ queue to Kafka topic
https://github.com/ibm-messaging/kafka-connect-mq-source
Sink connector
From Kafka topic to MQ queue
https://github.com/ibm-messaging/kafka-connect-mq-sink
Fully supported by IBM for customers with support entitlement for IBM Event Streams
MQ sink connector
Reads from TOPIC: TO.MQ and writes to QUEUE: FROM.KAFKA.
[diagram: a Kafka record (key, value as byte[]) passes through a Converter to a SinkRecord (schema plus value, which may be complex), then through a MessageBuilder to an MQ message (MQMD, optional MQRFH2, payload)]
MQ source connector
Reads from QUEUE: TO.KAFKA and writes to TOPIC: FROM.MQ.
[diagram: an MQ message (MQMD, optional MQRFH2, payload) passes through a RecordBuilder to a SourceRecord (schema plus value, which may be complex), then through a Converter to a Kafka record (value as byte[], with an optional key taken from MsgId/CorrelId)]
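As an illustration, a minimal properties file for the MQ source connector might look like the following. The connection values (queue manager, channel, host) are placeholders, and the connector's GitHub README linked earlier is the authoritative reference for property names.

```properties
name=mq-source
connector.class=com.ibm.eventstreams.connect.mqsource.MQSourceConnector
tasks.max=1
# Placeholder MQ connection details: queue manager QM1 on localhost(1414)
mq.queue.manager=QM1
mq.connection.name.list=localhost(1414)
mq.channel.name=DEV.APP.SVRCONN
# Read from this MQ queue and produce to this Kafka topic
mq.queue=TO.KAFKA
topic=FROM.MQ
```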
Connecting IBM MQ to Apache Kafka
The connectors are deployed into a Kafka Connect runtime. This runs between IBM MQ and Apache Kafka.
[diagram: Kafka Connect workers connect to IBM MQ as clients; the source connector reads QUEUE: TO.KAFKA and produces to TOPIC: FROM.MQ, while the sink connector consumes TOPIC: TO.MQ and writes to QUEUE: FROM.KAFKA]
Running Kafka Connect on a mainframe
IBM MQ provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services, using bindings connections to MQ.
[diagram: the same source and sink connector topology, with the Kafka Connect workers running in Unix System Services and connecting to IBM MQ Advanced for z/OS VUE over local bindings]
Advantages of running the MQ connector on z/OS
1) Lower workload costs: local bindings are 3x less CPU intensive than client bindings
2) Better performance in bindings mode: removing a network hop reduces latency, which matters for real-time analytics use cases
3) Kafka Connect on z/OS is offloadable: it is pure Java-based workload, and so is eligible for offload to zIIPs
4) Simplified configuration: one less set of channel and TLS configuration is required