
IBM Event Streams | Apache Kafka

An introduction to Apache Kafka®
This is event-streaming, not just messaging

Subhajit Maitra
[email protected]

MQ & Event Streams Workshop


87% of companies are transforming to be more customer-centric

Source: A commissioned study conducted by Forrester Consulting on behalf of IBM, September 2016

Presenter
Presentation Notes
Every company everywhere knows it needs to keep getting faster and to create a more personalized experience for its customers. This means thinking beyond a one-touch, single customer experience. Establishing a personalized experience with the customer is about building a relationship based on the culmination of every interaction you have ever had with that customer. It is understanding them in context. It is knowing what they are doing when they are not interacting with you. Bottom line, it is knowing them so well that you can anticipate their needs and proactively meet them. You become so ingrained in their life that they don’t even know you are there.


Typical Event-driven Use Case | Customer Satisfaction

‘Zoom Air’ is a commercial airline

Re-accommodate passengers before they realize their journey has been disrupted

Presenter
Presentation Notes
‘Zoom Air’ is a commercial airline. New initiative to improve outcomes during unexpected delays by re-accommodating passengers before they realize their journey has been disrupted. Goals: reduce the expense of re-accommodating disrupted customers by 10%; reduce airport staff time spent re-routing disrupted passengers by 50%; increase airline Net Promoter Score.


Data-centric to event-centric

Source: Gartner May 2017, “CIO Challenge: Adopt Event-Centric IT for Digital Business Success ”

Presenter
Presentation Notes
Event-driven solutions require different thinking. Events form the nervous system of the digital business. Application infrastructure needs to provide event stream processing capabilities and support emerging event-driven programming models. This is an event-driven journey and will underpin the next generation of digital customer experiences.


Event-driven in practice


(Diagram: apps connected by an Event Backbone, with numbered building blocks)

Building Blocks
1 :: Event Sources
2 :: Stream Processing
3 :: Event Archive
4 :: Notifications

Components of an event-streaming application

Presenter
Presentation Notes
Components of an event-streaming application:
- Event sources, e.g. sensors, web traffic (think back to abandoned baskets) or data in an existing database
- Ability to process streams of data in real time for analytics or machine learning
- Storing events for future reference
- Triggering notifications, such as the Zoom Air example
All connected by an event backbone -> this is where Apache Kafka fits.


Why is Apache Kafka so popular?

Business Trends
- Decisions driven by data
- Apps that derive insight from large volumes of data
- Apps that react to changing events

Technology Trends
- Runs natively on cloud
- Immutable event stream history
- Event stream replay
- Replication for HA
- Naturally scales horizontally

Kafka arrived at the right time, captured mindshare among developers and exploded in popularity

Presenter
Presentation Notes
Apache Kafka has quickly become popular, due to:
- business trends – what we’ve already talked about
- technology trends – moving to cloud-native applications, and the requirement for apps that can come and go (microservices)
Right place, right time.


Two Styles of Messaging

EVENT STREAMING
- Stream history
- Immutable data
- Scalable consumption

MESSAGE QUEUING
- Request/reply
- Transient data persistence
- Targeted reliable delivery

Presenter
Presentation Notes
We are often asked what the difference is between something like Kafka and well-known messaging systems like IBM MQ.


Properties of the Event Backbone

- Stream history
- Immutable data
- Highly available
- Scalable
- Scalable consumption

Presenter
Presentation Notes
As well as stream history, scalable consumption and immutable data, we also need a system that can scale to handle high throughput and provide high availability.


Apache Kafka


Apache Kafka is an open source, distributed streaming platform

- Publish and subscribe to streams of events
- Store events in a durable way
- Process streams of events as they occur

Presenter
Presentation Notes
An open source, distributed streaming platform, often adopted as the de facto event-streaming technology.


Brokers

Presenter
Presentation Notes
A Kafka cluster consists of a set of brokers. For production use, a cluster should have a minimum of 3 brokers.


Partitions

Presenter
Presentation Notes
Brokers hold topics (more on that later). A topic is made up of one or more partitions. By having multiple partitions distributed across the brokers, a topic can handle a large volume of traffic without overloading one broker.


Replication

Presenter
Presentation Notes
In order to improve availability, each topic can be replicated onto multiple brokers. For each partition, one of the brokers is the leader, and the other brokers are the followers. Replication works by the followers repeatedly fetching messages from the leader. This is done automatically by Kafka. For production we recommend at least 3 replicas: you’ll see why in a minute.


Replication

Presenter
Presentation Notes
Each partition of each topic has a set of replicas; you can set this number, and it is known as the “replication factor”. Generally the leaders are spread across the brokers so the load is spread out.
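As a concrete illustration (not from the deck), a minimal sketch that creates a topic with three partitions and a replication factor of 3 using Kafka’s Java AdminClient; the topic name and bootstrap address are hypothetical.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical bootstrap address; point this at your own cluster
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions spread the load; replication factor 3 tolerates broker failures
            NewTopic topic = new NewTopic("topic-a", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}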


Replication

Presenter
Presentation Notes
Imagine a broker goes down, this means the leader of Topic A, partition 1 is offline


Replication

Presenter
Presentation Notes
This will trigger leader election – a new leader is selected from the other available brokers. This means the topic and partition remain available even though a broker is down – this is how Kafka provides availability. During normal operation you may need to restart brokers to apply config changes etc.; this is why you want at least 3 brokers. If you are restarting broker 1 and broker 2 goes down, there is still a broker left standing.


Producers

Presenter
Presentation Notes
A producer publishes messages to one or more topics.
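To make this concrete, a minimal producer sketch (not from the deck; the topic name and bootstrap address are hypothetical):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("bingo") steers every record with that key to the same partition
            producer.send(new ProducerRecord<>("my-input", "bingo", "green"));
            producer.flush();
        }
    }
}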


Topics

Offsets: 0 1 2 3 4 5

Presenter
Presentation Notes
A topic is an immutable sequence of records. Records are always appended to the end, and the numbers indicate offsets -> the offset number always increases.


Topics and Keys

TOPIC
PARTITION 0: 0 1 2 3
PARTITION 1: 0 1 2 3
PARTITION 2: 0 1 2 3

Presenter
Presentation Notes
An individual record is made up of a key and a value.


Retention Period and Compaction

Retention set in time or size:
log.retention.minutes = 3000
log.retention.bytes = 1048

Compacted topics are evolving data stores.

PARTITION 0
offset 0: key=a, val=A1
offset 1: key=b, val=B1
offset 2: key=a, val=A2
offset 3: key=c, val=C1
offset 4: key=c, val=C2
offset 5: key=b, val=B2

PARTITION 0 (REWRITTEN)
offset 2: key=a, val=A2
offset 4: key=c, val=C2
offset 5: key=b, val=B2

Presenter
Presentation Notes
Records on topics are not kept forever; Kafka will remove old records. This can be set to remove records after a certain amount of time, or when the log reaches a certain size. Compaction is an alternative way to remove records. When compaction is turned on, Kafka will periodically remove all old records for a given key. So in the example here only offsets 2, 4 and 5 are kept, since they represent the last record with each key.
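Compaction is configured per topic rather than per broker. A minimal sketch (not from the deck; the topic name and bootstrap address are hypothetical) using the Java AdminClient:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps only the latest record per key,
            // instead of deleting records by age or log size
            NewTopic topic = new NewTopic("customer-state", 3, (short) 3) // hypothetical name
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}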


Producers and Keys

(Diagram: a topic with PARTITION 0, PARTITION 1 and PARTITION 2; keyless records are appended round-robin across the partitions)

Presenter
Presentation Notes
When no key is set, a producer’s records will be appended to the partitions in a round-robin fashion.


Producers and Keys

(Diagram continued: further keyless records continue to be distributed round-robin across the three partitions)

Presenter
Presentation Notes
An individual record is made up of a key and a value. When no key is set, a producer’s records will be appended to the partitions in a round-robin fashion.


Producers and Keys

(Diagram: a record with a key is appended, at offset 6, to the partition that already holds that key’s records)

Presenter
Presentation Notes
If a key is set, Kafka will always append records with that same key to the same partition


Producers and Keys

(Diagram: a second record with the same key is appended, at offset 7, to the same partition, preserving per-key order)

Presenter
Presentation Notes
Kafka guarantees ordering on a topic within a particular partition. Worth noting this is only true while the number of partitions remains static; if you add more partitions, Kafka might start putting records with the same key on a different partition -> so ordering is no longer guaranteed.
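The per-key routing comes from hashing the key. A sketch (not from the deck) that mirrors the computation Kafka’s default partitioner applies to keyed records, using the public murmur2 helpers from the Kafka client library:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyPartitioning {
    // Records with the same key always map to the same partition,
    // as long as the partition count stays the same
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("bingo", 3)); // same result every run...
        System.out.println(partitionFor("bingo", 3)); // ...so per-key ordering holds
        // Changing the partition count can change the result for the same key,
        // which is why adding partitions breaks the per-key ordering guarantee
        System.out.println(partitionFor("bingo", 4));
    }
}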


Consumers

Presenter
Presentation Notes
A consumer reads messages from one or more topics and processes them.


Consumers

(Diagram: a topic with offsets 0 1 2 3 4 5 6; CONSUMER A is at offset 2, CONSUMER B is at offset 5)

Consumers can choose how to commit offsets:

Automatic
- Commits might go faster than processing
- A common pattern is to commit offsets on a timer

Manual, synchronous
- Safe, but slows down processing

Manual, asynchronous
- Fairly safe, but could re-process messages

Exactly once semantics
- Can group sending messages and committing offsets into transactions
- Primarily aimed at stream processing applications

Presenter
Presentation Notes
Consumers can start reading from any offset on a topic. They store the offset they got to in a special hidden topic in Kafka, in case the consumer goes down and has to find out where it got to. Again, lots of options for configuration; here are a couple that are worth being aware of.
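For illustration (not from the deck; the topic, group id and bootstrap address are hypothetical), a minimal consumer sketch using manual, synchronous offset commits:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group-a"); // the consumer group (next slide)
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // manual commits
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("my-input"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("p%d@%d %s=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
                consumer.commitSync(); // manual, synchronous: safe, but slows down processing
            }
        }
    }
}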


Consumer Groups

Presenter
Presentation Notes
To allow scalability of consumers, consumers are grouped into consumer groups. Consumers declare which group they are in using a group id.


Consumer Groups

(Diagram: a topic with PARTITION 0 at offsets 0-7, PARTITION 1 at offsets 0-3 and PARTITION 2 at offsets 0-5; CONSUMER GROUP A has three consumers and CONSUMER GROUP B has two; each group tracks its own offsets – p0, offset 7; p1, offset 3; p2, offset 5)

Presenter
Presentation Notes
Suppose we have two consumer groups, each wanting to consume from the topic. Kafka will make sure that, between all the consumers in a group, they see every message on the topic.


Consumer Groups

(Diagram repeated, highlighting CONSUMER GROUP A: each of its three consumers is assigned one partition)

Presenter
Presentation Notes
So for consumer group A this means each consumer gets one partition


Consumer Groups

(Diagram repeated, highlighting CONSUMER GROUP B: one of its two consumers is assigned one partition…)

Presenter
Presentation Notes
For consumer group B, one consumer gets one partition…


Consumer Groups

(Diagram repeated, highlighting CONSUMER GROUP B: …its other consumer is assigned the remaining two partitions)

Presenter
Presentation Notes
…and the other gets two partitions.


Kafka Streams

Client library for processing and analysing data stored in Kafka

Processing happens in the app

Supports per-record processing – no batching

(Diagram: my-input -> FILTER -> MAP -> my-output)

Presenter
Presentation Notes
Streams: an unbounded flow of facts – key/value pairs, which can have types.
- Kafka Streams is an open source Java API
- No separate processing cluster required – processing happens in your app
- Supports per-record stream processing – no batching
- Stateless processing, stateful processing, windowing
- Elastic, highly scalable, fault-tolerant
- Fully integrated with Kafka security
- Supports exactly once semantics


Kafka Streams

INPUT TOPIC
foo -> red
bar -> orange
foo -> yellow
bingo -> green
bingo -> blue
bar -> purple

builder.stream("my-input")
       .filter((key, value) -> key.equals("bingo"))
       .map((key, value) -> KeyValue.pair(key, value.toUpperCase()))
       .to("my-output");

OUTPUT TOPIC
bingo -> GREEN
bingo -> BLUE

Presenter
Presentation Notes
Example: a Kafka Streams app can take one or more topics as input, and can output zero or more topics. E.g. take these records, with foo, bar etc. being the keys and red, orange etc. the values. We filter to find all records with a key of bingo, then apply a map to uppercase the values, then push the records onto the output topic. Can you guess what the records on the output topic will be?
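For context, a complete, runnable version of the app on the slide might look like this sketch (the application id and bootstrap address are hypothetical):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class FilterMapApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-map-app"); // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("my-input")
               .filter((key, value) -> key.equals("bingo"))                  // keep key "bingo"
               .map((key, value) -> KeyValue.pair(key, value.toUpperCase())) // uppercase value
               .to("my-output");                                            // write output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}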


Kafka Connect – bridge to cloud-native apps

(Diagram: existing systems connected through Kafka Connect to the Event Backbone)

Presenter
Presentation Notes
Kafka Connect allows you to use Kafka to extract events from existing systems to power the next generation of responsive, cloud-native applications. You can do this with minimal changes to your existing system.


Kafka Connect

Over 80 connectors:
HDFS, Elasticsearch, MySQL, JDBC, IBM MQ, MQTT, CoAP + many others

Presenter
Presentation Notes
Kafka Connect is another open source Java API. It is useful for connecting Kafka to systems where you can’t run a Kafka client. Many connectors are available, often open source.


Presenter
Presentation Notes
This was an introduction to Apache Kafka. It is highly configurable and can be set up to your exact requirements, so we recommend you try it out for yourself.


Presenter
Presentation Notes
IBM Event Streams – a fully supported Apache Kafka with value-add capabilities, e.g. geo-replication to protect against losing your data if your datacenter goes down. Runs on IBM Cloud Private, which is a Kubernetes environment that provides many other IBM products.

IBM Event Streams | Apache Kafka

Welcome to IBM Event Streams: Apache Kafka® for the Enterprise

This is event-streaming, not just messaging

Subhajit Maitra
[email protected]

MQ & Event Streams Workshop


Key usage patterns | Drive interactive applications


Customer Loyalty Application

(Diagram: a Customer Loyalty Application combining data from multiple channels – Systems of Record, location and more – to send instant loyalty balance alerts)

• Using Event Streams to bring the data closer to the customer loyalty application allows the application to provide a more responsive and interactive experience for customers

• Event Streams can bring data together from disparate sources – examples include location and weather – to allow the customer loyalty application to behave in a more relevant way to the customer

Presenter
Presentation Notes
Top: Clients are looking to improve their interactions with their users – one common approach is to use customer loyalty applications that can give a more responsive, interactive experience for their customers. In this pattern they are able to offer instant alerts and balance updates to customers via an app without increasing the load on the backend systems. American Airlines is a reference customer following this pattern. We can progress that scenario by incorporating additional streams of data – perhaps location data allowing personalised offers to be presented to customers as they enter the store. Bringing the data together in a rapidly accessible way enables a new generation of applications. A railway in Europe is taking this approach to combine data from different sources to better serve their customers.


Key usage patterns | Provide data to applications while protecting your backend


(Diagram: protected Systems of Record feeding IBM Event Streams – scalable, high speed, multiple readers – expanding to multi-cloud)

• IBM Event Streams allows you to emit streams of data from a backend system, providing a high-speed decoupled buffer – allowing many readers to absorb the data

• Allows microservices to be developed, acting against the stream with no impact on critical systems of record

• Expand the pattern to multi-cloud – Event Streams can be used to create a local buffer of data in each cloud environment

• Minimize on-prem to cloud data transfer

• Give the fastest response for cloud applications

Presenter
Presentation Notes
Three typical usage patterns/starting points over three charts: the simple case sits at the top of each chart, then moving down takes each forward to the next level. Top: One of the most common usage patterns for Kafka/Event Streams. Multiple innovation/application/microservice teams are demanding data from the systems of record, often hosted on IBM z. The teams running the systems of record need to ensure that they are stable, but this new demand is unpredictable, and often not well defined at the outset. Kafka/Event Streams offers a way to expose the data so that it can be consumed by multiple readers at their own pace. The load on the backend systems is well defined and predictable. Kafka offers the data as a persistent read-only buffer or cache of the data which is continually refreshed and updated. Example customers include JPMC and M&S, who have followed this pattern. Progress that scenario to the next level by pushing that data out to multiple consumers on multiple clouds. Allowing each set of consumers on each cloud to have its own buffer or stream of the data allows them to have a consistent, rapid read of the data, whilst minimising the amount of data transferred. Maersk is taking this approach.


Key usage patterns | Take advantage of Machine Learning


Feed data to batch analytics -> progress towards real-time

• IBM Event Streams offers a great way to feed data warehouse/batch analytics

• Data is available whenever analytics processing is ready to process it

• Allows data analytics to be taken offline without losing any data

• Data replay/history is always available

• Whilst insights from static data are great, the speed of Event Streams means that it’s possible to supply data to conduct analytics and respond in real time

• Take it further and apply machine learning to take action ahead of the competition

Presenter
Presentation Notes
Top: Taking advantage of machine learning: begin by feeding data to batch analytics and data warehousing for after-hours insights. Then progress to near real-time interactions, allowing insights to be acted upon immediately.

IBM Event Streams is fully supported Apache Kafka® with value-add capabilities

Trusted support

Unlocks core data

Enterprise-ready

Intuitive to use


IBM Event Streams Delivers Differentiated Value

IBM offers Event Streams in several form factors:

• IBM has years of operational expertise running Apache Kafka for enterprises

• This experience has been embedded in the DNA of Event Streams

• Event Streams makes Kafka easy to run, manage & consume, reducing skill requirements and increasing speed of deployment for faster time to value

• Security integration simplifies Kafka access control using roles and policies

• IBM’s experience in enterprise-critical software has shaped features like geo-replication for Disaster Recovery & integration with IBM MQ, to give confidence deploying mission-critical workloads

• Support you can trust – IBM has decades of experience supporting the World’s toughest environments


IBM Public Cloud
- Isolated aaS
- Multi-tenant aaS

Private Cloud
- Red Hat OpenShift (X86_64)
- Container-native software, Linux with Kubernetes included (X86_64 & zLinux)


Benefit from IBM’s Kafka Expertise

In 2015 IBM was the first vendor to offer a fully managed Apache Kafka cloud service

- Public multi-tenant service
- Dedicated single-tenant service

IBM has years of experience running Apache Kafka across the globe


IBM Event Streams | Making Apache Kafka Intuitive and Easy

Easy to deploy
• Kafka has many distinct components to deploy, configure and coordinate for secure connectivity
• Container placement critical to ensure production-level availability
• Secured network traffic ingress
• Ensuring consistent and repeatable deployment


IBM Event Streams | Making Apache Kafka Intuitive and Easy

Visualisation of your topic data
Tools to boost productivity

IBM Event Streams | Integrates with External Monitoring Tools

External monitoring tools: Datadog, Splunk, etc.


IBM Event Streams | Enterprise Grade Reliability


Integrated geo-replication for Disaster Recovery


Geo-Replication is Effortless to Configure

(Diagram: three configuration steps)


IBM Event Streams | Integrates Seamlessly with IBM MQ

IBM MQ connects mission-critical Systems of Record, requiring transactional, once-only delivery
E.g. payment transactions

IBM Event Streams distributes and processes streams of events in real time to intelligently engage with customers
E.g. alerts on spending patterns

(Diagram: transactions flow through IBM MQ; alerts flow through IBM Event Streams, which sends push notifications)

Unlock Events from Systems where Kafka Connectivity is a Problem

REST API for inbound data

(Diagram: applications and z/OS Connect send events over HTTP to the IBM Event Streams REST API)


{"type": "record",

"name": "Aircraft_schema","namespace": "com.mycompany.schemas.aircraft",

"fields": [{

"name": "Aircraft_number","type": "string"

},{

"name": "GPS_coordinates","type": "string"

},{

"name": "Departure_time","type": "string"

},

Avro Schema

Schema Registry

App

Schema Registry

App

2. Validate & Serialize 6. Deserialize

3. Send 4. receive

Presenter
Presentation Notes
In Event Streams 2019.2.1 we introduced a Schema Registry. This is an additional component, fully integrated into the Event Streams experience, that provides a place to store schemas so your Kafka applications can access them at run time. This supports two main use cases: 1) sending applications can validate the data they’re about to send against the schema BEFORE publishing it to the topic; 2) you can use a schema to automatically serialize data, making the transmission more efficient, and the receiving applications can use the same schema to deserialize the data automatically as each message is received. Currently the Event Streams Schema Registry supports Apache Avro schemas; however, we did not pick up the OSS schema registry because it was not the right foundation for how we see needs progressing. The two specific needs we see emerging are: 1) support for schema types beyond Avro, things like Protobuf, JSON Schema, etc. – we needed to build a foundation with this future in mind; 2) schemas need to be globally identifiable. We see many patterns where Kafka clusters need to be connected together and messages need to be moved between them. As messages are moved between clusters, the schemas they refer to need to make sense on the new cluster. Therefore we have started from a basis that supports unique identifiers.
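As an illustration of the validate-serialize-deserialize flow (not the Event Streams registry client – just the plain Apache Avro API, with made-up field values):

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroRoundTrip {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Aircraft_schema\"," +
            "\"namespace\":\"com.mycompany.schemas.aircraft\",\"fields\":[" +
            "{\"name\":\"Aircraft_number\",\"type\":\"string\"}," +
            "{\"name\":\"GPS_coordinates\",\"type\":\"string\"}," +
            "{\"name\":\"Departure_time\",\"type\":\"string\"}]}");

        // Build a record; the schema rejects fields and types it does not define
        GenericRecord flight = new GenericData.Record(schema);
        flight.put("Aircraft_number", "ZA123");            // hypothetical values
        flight.put("GPS_coordinates", "51.47,-0.45");
        flight.put("Departure_time", "2019-10-01T09:30Z");

        // Serialize to compact Avro binary (what would go in the Kafka record value)
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(flight, encoder);
        encoder.flush();

        // Deserialize on the receiving side using the same schema
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        System.out.println(decoded);
    }
}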


Connectors


Connector Catalog


It’s easy to connect IBM MQ to Apache Kafka

IBM has created a pair of connectors, available as source code or as part of IBM Event Streams

Source connector

From MQ queue to Kafka topic

https://github.com/ibm-messaging/kafka-connect-mq-source

Sink connector

From Kafka topic to MQ queue

https://github.com/ibm-messaging/kafka-connect-mq-sink

Fully supported by IBM for customers with support entitlement for IBM Event Streams
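To show what deploying one of these connectors involves, here is a sketch of a source-connector configuration (the values are hypothetical; the property names follow the kafka-connect-mq-source README):

name=mq-source
connector.class=com.ibm.eventstreams.connect.mqsource.MQSourceConnector
tasks.max=1
mq.queue.manager=QM1
mq.connection.name.list=mqhost(1414)
mq.channel.name=MYSVRCONN
mq.queue=TO.KAFKA
mq.record.builder=com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder
topic=FROM.MQ

A standalone Kafka Connect worker can then typically be started with bin/connect-standalone.sh, passing a worker properties file followed by this connector properties file.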


MQ sink connector

(Diagram: a Kafka record – key, value (may be complex), schema – is read from TOPIC: TO.MQ as a SinkRecord, passed through a Converter and a MessageBuilder inside the MQ SINK CONNECTOR, and written as an MQ message – MQMD, optional MQRFH2, byte[] payload – to QUEUE: FROM.KAFKA)


MQ source connector

(Diagram: an MQ message – MQMD, optional MQRFH2, byte[] payload – is read from QUEUE: TO.KAFKA by the MQ SOURCE CONNECTOR, passed through a RecordBuilder and a Converter to produce a SourceRecord – optional key (MsgId/CorrelId), value (may be complex), schema – and written to TOPIC: FROM.MQ)


Connecting IBM MQ to Apache Kafka

The connectors are deployed into a Kafka Connect runtime.
This runs between IBM MQ and Apache Kafka.

(Diagram: IBM MQ – QUEUE: TO.KAFKA and QUEUE: FROM.KAFKA – connects over CLIENT connections to Kafka Connect workers running the MQ SOURCE CONNECTOR, producing to TOPIC: FROM.MQ, and the MQ SINK CONNECTOR, consuming from TOPIC: TO.MQ)


Running Kafka Connect on a mainframe

IBM MQ provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services, using bindings connections to MQ.

(Diagram: IBM MQ Advanced for z/OS VUE – QUEUE: TO.KAFKA and QUEUE: FROM.KAFKA – connects over BINDINGS to Kafka Connect workers in Unix System Services running the MQ SOURCE CONNECTOR, producing to TOPIC: FROM.MQ, and the MQ SINK CONNECTOR, consuming from TOPIC: TO.MQ)


Advantages of running the MQ connector on z/OS

1) Lower workload costs
- Local bindings are 3x less CPU intensive than client bindings

2) Better performance in bindings mode
- Use of bindings mode reduces latency, as it removes one network hop – important for real-time analytics use cases

3) Kafka Connect on z/OS is offloadable
- This is pure Java-based workload, and so is eligible for offload to zIIPs

4) Simplified configuration
- One less set of channel and TLS configuration required

Presenter
Presentation Notes
There are great advantages to running the connectors under Unix System Services on the mainframe.


Thank you

IBM Event Streams: ibm.com/cloud/event-streams

https://slack-invite-ibm-cloud-tech.mybluemix.net/

Subhajit Maitra
Consulting IT Specialist