
IBM Event Streams | Apache Kafka

An introduction to Apache Kafka®
This is event-streaming, not just messaging

Subhajit Maitra
[email protected]

MQ & Event Streams Workshop


87% of companies are transforming to be more customer-centric

Source: A commissioned study conducted by Forrester Consulting on behalf of IBM, September 2016

Presenter
Presentation Notes
Every company everywhere knows it needs to keep getting faster and to create a more personalized experience for its customers. This means thinking beyond a one-touch, single customer experience. Establishing a personalized experience with the customer is about building a relationship based on the culmination of every interaction you have ever had with that customer. It is understanding them in context. It is knowing what they are doing when they are not interacting with you. Bottom line, it is knowing them so well that you can anticipate their needs and proactively meet them. You become so ingrained in their life that they don’t even know you are there.


Typical Event-driven Use Case | Customer Satisfaction

‘Zoom Air’ is a commercial airline

Re-accommodate passengers before they realize their journey has been disrupted

Presenter
Presentation Notes
‘Zoom Air’ is a commercial airline. New initiative to improve outcomes during unexpected delays by re-accommodating passengers before they realize their journey has been disrupted. Goals: reduce the expense of re-accommodating disrupted customers by 10%; reduce airport staff time spent re-routing disrupted passengers by 50%; increase airline Net Promoter Score.


Data-centric to event-centric

Source: Gartner May 2017, “CIO Challenge: Adopt Event-Centric IT for Digital Business Success ”

Presenter
Presentation Notes
Event-driven solutions require different thinking. Events form the nervous system of the digital business. Application infrastructure needs to provide event stream processing capabilities and support emerging event-driven programming models. This is an event-driven journey and will underpin the next generation of digital customer experiences.


Event-driven in practice


(Diagram: apps connected by an Event Backbone, with numbered building blocks)

Building Blocks
1 :: Event Sources
2 :: Stream Processing
3 :: Event Archive
4 :: Notifications

Components of an event-streaming application

Presenter
Presentation Notes
Components of an event-streaming application:
- Event sources, e.g. sensors, web traffic (think back to abandoned baskets) or data in an existing database
- Ability to process streams of data in real time for analytics or machine learning
- Storing events for future reference
- Triggering notifications, such as the Zoom Air example
All connected by an event backbone -> this is where Apache Kafka fits.


Why is Apache Kafka so popular?

Business Trends
- Decisions driven by data
- Apps that derive insight from large volumes of data
- Apps that react to changing events

Technology Trends
- Runs natively on cloud
- Immutable event stream history
- Event stream replay
- Replication for HA
- Naturally scales horizontally

Kafka arrived at the right time, captured mindshare among developers and exploded in popularity

Presenter
Presentation Notes
Apache Kafka has quickly become popular, due to:
- business trends – what we’ve already talked about
- technology trends – moving to cloud-native applications, and the requirement for apps that can come and go (microservices)
Right place, right time.


Two Styles of Messaging

EVENT STREAMING
- Stream history
- Immutable data
- Scalable consumption

MESSAGE QUEUING
- Request/reply
- Transient data persistence
- Targeted reliable delivery

Presenter
Presentation Notes
We are often asked what the difference is between something like Kafka and well-known messaging systems like IBM MQ.


Properties of the Event Backbone

- Stream history
- Immutable data
- Highly available
- Scalable
- Scalable consumption

Presenter
Presentation Notes
As well as stream history, scalable consumption and immutable data, we also need a system that can scale to handle high throughput and provide high availability.


Apache Kafka


Apache Kafka is an open source, distributed streaming platform

- Publish and subscribe to streams of events
- Store events in a durable way
- Process streams of events as they occur

Presenter
Presentation Notes
An open source, distributed streaming platform, often adopted as the de facto event-streaming technology.


Brokers

Presenter
Presentation Notes
A Kafka cluster consists of a set of brokers. For production use, a cluster should have a minimum of 3 brokers.


Partitions

Presenter
Presentation Notes
Brokers hold topics (more on that later). A topic is made up of one or more partitions. By having multiple partitions distributed across the brokers, a topic can handle a large volume of traffic without overloading one broker.


Replication

Presenter
Presentation Notes
In order to improve availability, each topic can be replicated onto multiple brokers. For each partition, one of the brokers is the leader, and the other brokers are the followers. Replication works by the followers repeatedly fetching messages from the leader. This is done automatically by Kafka. For production we recommend at least 3 replicas: you’ll see why in a minute.


Replication

Presenter
Presentation Notes
Each partition of each topic has a set of replicas; you can set this number, and it is known as the “replication factor”. Generally the leaders are spread across the brokers so the load is spread out.
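As a concrete illustration (not from the deck), a minimal sketch that creates a topic with three partitions and a replication factor of 3 using Kafka’s Java AdminClient; the topic name and bootstrap address are hypothetical.

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical bootstrap address; point this at your own cluster
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions spread the load; replication factor 3 tolerates broker failures
            NewTopic topic = new NewTopic("topic-a", 3, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}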


Replication

Presenter
Presentation Notes
Imagine a broker goes down, this means the leader of Topic A, partition 1 is offline


Replication

Presenter
Presentation Notes
This will trigger leader election – a new leader is selected from the other available brokers. This means the topic and partition remain available even though a broker is down – this is how Kafka provides availability. During normal operation you may need to restart brokers to apply config changes etc.; this is why you want at least 3 brokers. If you are restarting broker 1 and broker 2 goes down, there is still a broker left standing.


Producers

Presenter
Presentation Notes
A producer publishes messages to one or more topics.
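To make this concrete, a minimal producer sketch (not from the deck; the topic name and bootstrap address are hypothetical):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("bingo") steers every record with that key to the same partition
            producer.send(new ProducerRecord<>("my-input", "bingo", "green"));
            producer.flush();
        }
    }
}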


Topics

Offsets: 0 1 2 3 4 5

Presenter
Presentation Notes
A topic is an immutable sequence of records. Records are always appended to the end, and the numbers indicate offsets -> the offset number always increases.


Topics and Keys

TOPIC
PARTITION 0: 0 1 2 3
PARTITION 1: 0 1 2 3
PARTITION 2: 0 1 2 3

Presenter
Presentation Notes
An individual record is made up of a key and a value.


Retention Period and Compaction

Retention set in time or size:
log.retention.minutes = 3000
log.retention.bytes = 1048

Compacted topics are evolving data stores.

PARTITION 0
offset 0: key=a, val=A1
offset 1: key=b, val=B1
offset 2: key=a, val=A2
offset 3: key=c, val=C1
offset 4: key=c, val=C2
offset 5: key=b, val=B2

PARTITION 0 (REWRITTEN)
offset 2: key=a, val=A2
offset 4: key=c, val=C2
offset 5: key=b, val=B2

Presenter
Presentation Notes
Records on topics are not kept forever; Kafka will remove old records. This can be set to remove records after a certain amount of time, or when the log reaches a certain size. Compaction is an alternative way to remove records. When compaction is turned on, Kafka will periodically remove all old records for a given key. So in the example here only offsets 2, 4 and 5 are kept, since they represent the last record with each key.
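Compaction is configured per topic rather than per broker. A minimal sketch (not from the deck; the topic name and bootstrap address are hypothetical) using the Java AdminClient:

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps only the latest record per key,
            // instead of deleting records by age or log size
            NewTopic topic = new NewTopic("customer-state", 3, (short) 3) // hypothetical name
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}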


Producers and Keys

(Diagram: a topic with PARTITION 0, PARTITION 1 and PARTITION 2; keyless records are appended round-robin across the partitions)

Presenter
Presentation Notes
When no key is set, a producer’s records will be appended to the partitions in a round-robin fashion.


Producers and Keys

(Diagram continued: further keyless records continue to be distributed round-robin across the three partitions)

Presenter
Presentation Notes
An individual record is made up of a key and a value. When no key is set, a producer’s records will be appended to the partitions in a round-robin fashion.


Producers and Keys

(Diagram: a record with a key is appended, at offset 6, to the partition that already holds that key’s records)

Presenter
Presentation Notes
If a key is set, Kafka will always append records with that same key to the same partition


Producers and Keys

(Diagram: a second record with the same key is appended, at offset 7, to the same partition, preserving per-key order)

Presenter
Presentation Notes
Kafka guarantees ordering on a topic within a particular partition. Worth noting this is only true while the number of partitions remains static; if you add more partitions, Kafka might start putting records with the same key on a different partition -> so ordering is no longer guaranteed.
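The per-key routing comes from hashing the key. A sketch (not from the deck) that mirrors the computation Kafka’s default partitioner applies to keyed records, using the public murmur2 helpers from the Kafka client library:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class KeyPartitioning {
    // Records with the same key always map to the same partition,
    // as long as the partition count stays the same
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("bingo", 3)); // same result every run...
        System.out.println(partitionFor("bingo", 3)); // ...so per-key ordering holds
        // Changing the partition count can change the result for the same key,
        // which is why adding partitions breaks the per-key ordering guarantee
        System.out.println(partitionFor("bingo", 4));
    }
}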


Consumers

Presenter
Presentation Notes
A consumer reads messages from one or more topics and processes them.


Consumers

(Diagram: a topic with offsets 0 1 2 3 4 5 6; CONSUMER A is at offset 2, CONSUMER B is at offset 5)

Consumers can choose how to commit offsets:

Automatic
- Commits might go faster than processing
- A common pattern is to commit offsets on a timer

Manual, synchronous
- Safe, but slows down processing

Manual, asynchronous
- Fairly safe, but could re-process messages

Exactly once semantics
- Can group sending messages and committing offsets into transactions
- Primarily aimed at stream processing applications

Presenter
Presentation Notes
Consumers can start reading from any offset on a topic. They store the offset they got to in a special hidden topic in Kafka, in case the consumer goes down and has to find out where it got to. Again, lots of options for configuration; here are a couple that are worth being aware of.
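For illustration (not from the deck; the topic, group id and bootstrap address are hypothetical), a minimal consumer sketch using manual, synchronous offset commits:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "group-a"); // the consumer group (next slide)
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // manual commits
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("my-input"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("p%d@%d %s=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
                consumer.commitSync(); // manual, synchronous: safe, but slows down processing
            }
        }
    }
}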


Consumer Groups

Presenter
Presentation Notes
To allow scalability of consumers, consumers are grouped into consumer groups. Consumers declare which group they are in using a group id.


Consumer Groups

(Diagram: a topic with PARTITION 0 at offsets 0-7, PARTITION 1 at offsets 0-3 and PARTITION 2 at offsets 0-5; CONSUMER GROUP A has three consumers and CONSUMER GROUP B has two; each group tracks its own offsets – p0, offset 7; p1, offset 3; p2, offset 5)

Presenter
Presentation Notes
Suppose we have two consumer groups, each wanting to consume from the topic. Kafka will make sure that, between all the consumers in a group, they see every message on the topic.


Consumer Groups

(Diagram repeated, highlighting CONSUMER GROUP A: each of its three consumers is assigned one partition)

Presenter
Presentation Notes
So for consumer group A this means each consumer gets one partition


Consumer Groups

(Diagram repeated, highlighting CONSUMER GROUP B: one of its two consumers is assigned one partition…)

Presenter
Presentation Notes
For consumer group B, one consumer gets one partition…


Consumer Groups

(Diagram repeated, highlighting CONSUMER GROUP B: …its other consumer is assigned the remaining two partitions)

Presenter
Presentation Notes
…and the other gets two partitions.


Kafka Streams

Client library for processing and analysing data stored in Kafka

Processing happens in the app

Supports per-record processing – no batching

(Diagram: my-input -> FILTER -> MAP -> my-output)

Presenter
Presentation Notes
Streams: an unbounded flow of facts – key/value pairs, which can have types.
- Kafka Streams is an open source Java API
- No separate processing cluster required – processing happens in your app
- Supports per-record stream processing – no batching
- Stateless processing, stateful processing, windowing
- Elastic, highly scalable, fault-tolerant
- Fully integrated with Kafka security
- Supports exactly once semantics


Kafka Streams

INPUT TOPIC
foo -> red
bar -> orange
foo -> yellow
bingo -> green
bingo -> blue
bar -> purple

builder.stream("my-input")
       .filter((key, value) -> key.equals("bingo"))
       .map((key, value) -> KeyValue.pair(key, value.toUpperCase()))
       .to("my-output");

OUTPUT TOPIC
bingo -> GREEN
bingo -> BLUE

Presenter
Presentation Notes
Example: a Kafka Streams app can take one or more topics as input, and can output zero or more topics. E.g. take these records, with foo, bar etc. being the keys and red, orange etc. the values. We filter to find all records with a key of bingo, then apply a map to uppercase the values, then push the records onto the output topic. Can you guess what the records on the output topic will be?
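For context, a complete, runnable version of the app on the slide might look like this sketch (the application id and bootstrap address are hypothetical):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class FilterMapApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-map-app"); // hypothetical
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // hypothetical
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("my-input")
               .filter((key, value) -> key.equals("bingo"))                  // keep key "bingo"
               .map((key, value) -> KeyValue.pair(key, value.toUpperCase())) // uppercase value
               .to("my-output");                                            // write output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}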


Kafka Connect – bridge to cloud-native apps

(Diagram: existing systems connected through Kafka Connect to the Event Backbone)

Presenter
Presentation Notes
Kafka Connect allows you to use Kafka to extract events from existing systems to power the next generation of responsive, cloud-native applications. You can do this with minimal changes to your existing system.


Kafka Connect

Over 80 connectors:
HDFS, Elasticsearch, MySQL, JDBC, IBM MQ, MQTT, CoAP + many others

Presenter
Presentation Notes
Kafka Connect is another open source Java API. It is useful for connecting Kafka to systems where you can’t run a Kafka client. Many connectors are available, often open source.


Presenter
Presentation Notes
This was an introduction to Apache Kafka. It is highly configurable and can be set up to your exact requirements, so we recommend you try it out for yourself.


Presenter
Presentation Notes
IBM Event Streams – a fully supported Apache Kafka with value-add capabilities, e.g. geo-replication to protect against losing your data if your datacenter goes down. Runs on IBM Cloud Private, which is a Kubernetes environment that provides many other IBM products.

IBM Event Streams | Apache Kafka

Welcome to IBM Event Streams: Apache Kafka® for the Enterprise

This is event-streaming, not just messaging

Subhajit Maitra
[email protected]

MQ & Event Streams Workshop


Key usage patterns | Drive interactive applications


Customer Loyalty Application

(Diagram: a Customer Loyalty Application combining data from multiple channels – Systems of Record, location and more – to send instant loyalty balance alerts)

• Using Event Streams to bring the data closer to the customer loyalty application allows the application to provide a more responsive and interactive experience for customers

• Event Streams can bring data together from disparate sources – examples include location and weather – to allow the customer loyalty application to behave in a more relevant way to the customer

Presenter
Presentation Notes
Top: Clients are looking to improve their interactions with their users – one common approach is to use customer loyalty applications that can give a more responsive, interactive experience for their customers. In this pattern they are able to offer instant alerts and balance updates to customers via an app without increasing the load on the backend systems. American Airlines is a reference customer following this pattern. We can progress that scenario by incorporating additional streams of data – perhaps location data allowing personalised offers to be presented to customers as they enter the store. Bringing the data together in a rapidly accessible way enables a new generation of applications. A railway in Europe is taking this approach to combine data from different sources to better serve their customers.


Key usage patterns | Provide data to applications while protecting your backend


(Diagram: protected Systems of Record feeding IBM Event Streams – scalable, high speed, multiple readers – expanding to multi-cloud)

• IBM Event Streams allows you to emit streams of data from a backend system, providing a high-speed decoupled buffer – allowing many readers to absorb the data

• Allows microservices to be developed, acting against the stream with no impact on critical systems of record

• Expand the pattern to multi-cloud – Event Streams can be used to create a local buffer of data in each cloud environment

• Minimize on-prem to cloud data transfer

• Give the fastest response for cloud applications

Presenter
Presentation Notes
Three typical usage patterns/starting points over three charts: the simple case sits at the top of each chart, then moving down takes each forward to the next level. Top: One of the most common usage patterns for Kafka/Event Streams. Multiple innovation/application/microservice teams are demanding data from the systems of record, often hosted on IBM z. The teams running the systems of record need to ensure that they are stable, but this new demand is unpredictable, and often not well defined at the outset. Kafka/Event Streams offers a way to expose the data so that it can be consumed by multiple readers at their own pace. The load on the backend systems is well defined and predictable. Kafka offers the data as a persistent read-only buffer or cache of the data which is continually refreshed and updated. Example customers include JPMC and M&S, who have followed this pattern. Progress that scenario to the next level by pushing that data out to multiple consumers on multiple clouds. Allowing each set of consumers on each cloud to have its own buffer or stream of the data allows them to have a consistent, rapid read of the data, whilst minimising the amount of data transferred. Maersk is taking this approach.


Key usage patterns | Take advantage of Machine Learning


Feed data to batch analytics -> progress towards real-time

• IBM Event Streams offers a great way to feed data warehouse/batch analytics

• Data is available whenever analytics processing is ready to process it

• Allows data analytics to be taken offline without losing any data

• Data replay/history is always available

• Whilst insights from static data are great, the speed of Event Streams means that it’s possible to supply data to conduct analytics and respond in real time

• Take it further and apply machine learning to take action ahead of the competition

Presenter
Presentation Notes
Top: Taking advantage of machine learning: begin by feeding data to batch analytics and data warehousing for after-hours insights. Then progress to near real-time interactions, allowing insights to be acted upon immediately.

IBM Event Streams is fully supported Apache Kafka® with value-add capabilities

Trusted support

Unlocks core data

Enterprise-ready

Intuitive to use


IBM Event Streams Delivers Differentiated Value

IBM offers Event Streams in several form factors:

• IBM has years of operational expertise running Apache Kafka for enterprises

• This experience has been embedded in the DNA of Event Streams

• Event Streams makes Kafka easy to run, manage & consume, reducing skill requirements and increasing speed of deployment for faster time to value

• Security integration simplifies Kafka access control using roles and policies

• IBM’s experience in enterprise-critical software has shaped features like geo-replication for Disaster Recovery & integration with IBM MQ, to give confidence deploying mission-critical workloads

• Support you can trust – IBM has decades of experience supporting the World’s toughest environments


IBM Public Cloud
- Isolated aaS
- Multi-tenant aaS

Private Cloud
- Red Hat OpenShift (X86_64)
- Container-native software, Linux with Kubernetes included (X86_64 & zLinux)


Benefit from IBM’s Kafka Expertise

In 2015 IBM was the first vendor to offer a fully managed Apache Kafka cloud service

- Public multi-tenant service
- Dedicated single-tenant service

IBM has years of experience running Apache Kafka across the globe


IBM Event Streams | Making Apache Kafka Intuitive and Easy

Easy to deploy
• Kafka has many distinct components to deploy, configure and coordinate for secure connectivity
• Container placement critical to ensure production-level availability
• Secured network traffic ingress
• Ensuring consistent and repeatable deployment


IBM Event Streams | Making Apache Kafka Intuitive and Easy

Visualisation of your topic data
Tools to boost productivity

IBM Event Streams | Integrates with External Monitoring Tools

External monitoring tools: Datadog, Splunk, etc.


IBM Event Streams | Enterprise Grade Reliability


Integrated geo-replication for Disaster Recovery


Geo-Replication is Effortless to Configure

(Diagram: three configuration steps)


IBM Event Streams | Integrates Seamlessly with IBM MQ

IBM MQ connects mission-critical Systems of Record, requiring transactional, once-only delivery
E.g. payment transactions

IBM Event Streams distributes and processes streams of events in real time to intelligently engage with customers
E.g. alerts on spending patterns

(Diagram: transactions flow through IBM MQ; alerts flow through IBM Event Streams, which sends push notifications)

Unlock Events from Systems where Kafka Connectivity is a Problem

REST API for inbound data

(Diagram: applications and z/OS Connect send events over HTTP to the IBM Event Streams REST API)


{"type": "record",

"name": "Aircraft_schema","namespace": "com.mycompany.schemas.aircraft",

"fields": [{

"name": "Aircraft_number","type": "string"

},{

"name": "GPS_coordinates","type": "string"

},{

"name": "Departure_time","type": "string"

},

Avro Schema

Schema Registry

App

Schema Registry

App

2. Validate & Serialize 6. Deserialize

3. Send 4. receive

Presenter
Presentation Notes
In Event Streams 2019.2.1 we introduced a Schema Registry. This is an additional component, fully integrated into the Event Streams experience, that provides a place to store schemas so your Kafka applications can access them at run time. This supports two main use cases: 1) sending applications can validate the data they’re about to send against the schema BEFORE publishing it to the topic; 2) you can use a schema to automatically serialize data, making the transmission more efficient, and the receiving applications can use the same schema to deserialize the data automatically as each message is received. Currently the Event Streams Schema Registry supports Apache Avro schemas; however, we did not pick up the OSS schema registry because it was not the right foundation for how we see needs progressing. The two specific needs we see emerging are: 1) support for schema types beyond Avro, things like Protobuf, JSON Schema, etc. – we needed to build a foundation with this future in mind; 2) schemas need to be globally identifiable. We see many patterns where Kafka clusters need to be connected together and messages need to be moved between them. As messages are moved between clusters, the schemas they refer to need to make sense on the new cluster. Therefore we have started from a basis that supports unique identifiers.
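As an illustration of the validate-serialize-deserialize flow (not the Event Streams registry client – just the plain Apache Avro API, with made-up field values):

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class AvroRoundTrip {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Aircraft_schema\"," +
            "\"namespace\":\"com.mycompany.schemas.aircraft\",\"fields\":[" +
            "{\"name\":\"Aircraft_number\",\"type\":\"string\"}," +
            "{\"name\":\"GPS_coordinates\",\"type\":\"string\"}," +
            "{\"name\":\"Departure_time\",\"type\":\"string\"}]}");

        // Build a record; the schema rejects fields and types it does not define
        GenericRecord flight = new GenericData.Record(schema);
        flight.put("Aircraft_number", "ZA123");            // hypothetical values
        flight.put("GPS_coordinates", "51.47,-0.45");
        flight.put("Departure_time", "2019-10-01T09:30Z");

        // Serialize to compact Avro binary (what would go in the Kafka record value)
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(flight, encoder);
        encoder.flush();

        // Deserialize on the receiving side using the same schema
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord decoded = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
        System.out.println(decoded);
    }
}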


Connectors


Connector Catalog


It’s easy to connect IBM MQ to Apache Kafka

IBM has created a pair of connectors, available as source code or as part of IBM Event Streams

Source connector

From MQ queue to Kafka topic

https://github.com/ibm-messaging/kafka-connect-mq-source

Sink connector

From Kafka topic to MQ queue

https://github.com/ibm-messaging/kafka-connect-mq-sink

Fully supported by IBM for customers with support entitlement for IBM Event Streams
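To show what deploying one of these connectors involves, here is a sketch of a source-connector configuration (the values are hypothetical; the property names follow the kafka-connect-mq-source README):

name=mq-source
connector.class=com.ibm.eventstreams.connect.mqsource.MQSourceConnector
tasks.max=1
mq.queue.manager=QM1
mq.connection.name.list=mqhost(1414)
mq.channel.name=MYSVRCONN
mq.queue=TO.KAFKA
mq.record.builder=com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder
topic=FROM.MQ

A standalone Kafka Connect worker can then typically be started with bin/connect-standalone.sh, passing a worker properties file followed by this connector properties file.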


MQ sink connector

(Diagram: a Kafka record – key, value (may be complex), schema – is read from TOPIC: TO.MQ as a SinkRecord, passed through a Converter and a MessageBuilder inside the MQ SINK CONNECTOR, and written as an MQ message – MQMD, optional MQRFH2, byte[] payload – to QUEUE: FROM.KAFKA)


MQ source connector

(Diagram: an MQ message – MQMD, optional MQRFH2, byte[] payload – is read from QUEUE: TO.KAFKA by the MQ SOURCE CONNECTOR, passed through a RecordBuilder and a Converter to produce a SourceRecord – optional key (MsgId/CorrelId), value (may be complex), schema – and written to TOPIC: FROM.MQ)


Connecting IBM MQ to Apache Kafka

The connectors are deployed into a Kafka Connect runtime.
This runs between IBM MQ and Apache Kafka.

(Diagram: IBM MQ – QUEUE: TO.KAFKA and QUEUE: FROM.KAFKA – connects over CLIENT connections to Kafka Connect workers running the MQ SOURCE CONNECTOR, producing to TOPIC: FROM.MQ, and the MQ SINK CONNECTOR, consuming from TOPIC: TO.MQ)


Running Kafka Connect on a mainframe

IBM MQ provides support for the Kafka Connect workers to be deployed onto z/OS Unix System Services, using bindings connections to MQ.

(Diagram: IBM MQ Advanced for z/OS VUE – QUEUE: TO.KAFKA and QUEUE: FROM.KAFKA – connects over BINDINGS to Kafka Connect workers in Unix System Services running the MQ SOURCE CONNECTOR, producing to TOPIC: FROM.MQ, and the MQ SINK CONNECTOR, consuming from TOPIC: TO.MQ)


Advantages of running the MQ connector on z/OS

1) Lower workload costs
- Local bindings are 3x less CPU intensive than client bindings

2) Better performance in bindings mode
- Use of bindings mode reduces latency, as it removes one network hop – important for real-time analytics use cases

3) Kafka Connect on z/OS is offloadable
- This is pure Java-based workload, and so is eligible for offload to zIIPs

4) Simplified configuration
- One less set of channel and TLS configuration required

Presenter
Presentation Notes
There are great advantages to running the connectors under Unix System Services on the mainframe.


Thank you

IBM Event Streams: ibm.com/cloud/event-streams

https://slack-invite-ibm-cloud-tech.mybluemix.net/

Subhajit Maitra
Consulting IT Specialist