the power of snapshots stateful stream processing with apache … · 2020-05-14 · stateful stream...

58
The Power of Snapshots Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1

Upload: others

Post on 28-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

The Power of Snapshots

Stateful Stream Processing

with Apache Flink

Stephan Ewen

QCon San Francisco, 2017

1

Page 2: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

2

Original creators of

Apache Flink®

dA Platform 2

Open Source Apache Flink +

dA Application Manager

Page 3: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

3

Stream Processing

Page 4: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

What changes faster? Data or Query?

4

Data changes slowly

compared to fast

changing queries

ad-hoc queries, data exploration, ML training and

(hyper) parameter tuning

Batch Processing

Use Case

Data changes fast

application logic

is long-lived

continuous applications,data pipelines, standing queries,

anomaly detection, ML evaluation, …

Stream Processing

Use Case

Page 5: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Batch Processing

5

Page 6: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Stream Processing

6

Page 7: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

7

Stateful

Stream Processing

Page 8: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Moving State into the Processors

8

Application

External DBstate

Stateless

Stream Processor

Stateful

Stream Processor

Application

state

Page 9: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

9

Apache Flink

Page 10: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Apache Flink in a Nutshell

10

Queries

Applications

Devices

etc.

Database

Stream

File / Object

Storage

Stateful computations over streams

real-time and historic

fast, scalable, fault tolerant, in-memory,

event time, large state, exactly-once

Historic

Data

Streams

Application

Page 11: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

11

Event Streams State (Event) Time Snapshots

The Core Building Blocks

real-time and

hindsight

complex

business logic

consistency with

out-of-order data

and late data

forking /

versioning /

time-travel

Page 12: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Stateful Event & Stream Processing

12

Source

Transformation

Transformation

Sink

val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…))

val events: DataStream[Event] = lines.map((line) => parse(line))

val stats: DataStream[Statistic] = stream.keyBy("sensor").timeWindow(Time.seconds(5)).sum(new MyAggregationFunction())

stats.addSink(new RollingSink(path))

StreamingDataflow

Source Transform Window

(state read/write)Sink

Page 13: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Stateful Event & Stream Processing

13

Scalable embedded state

Access at memory speed &

scales with parallel operators

Page 14: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Event time and Processing Time

14

Event Producer Message QueueFlink

Data Source

Flink

Window Operator

partition 1

partition 2

EventTime

IngestionTime

ProcessingTime

BrokerTime

Event time, Watermarks, as in the Dataflow model

Page 15: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Powerful Abstractions

15

Process Function (events, state, time)

DataStream API (streams, windows)

Stream SQL / Tables (dynamic tables)

Stream- & Batch

Data Processing

High-level

Analytics API

Stateful Event-

Driven Applications

val stats = stream.keyBy("sensor").timeWindow(Time.seconds(5)).sum((a, b) -> a.add(b))

def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {// work with event and state(event, state.value) match { … }

out.collect(…) // emit eventsstate.update(…) // modify state

// schedule a timer callbackctx.timerService.registerEventTimeTimer(event.timestamp + 500)

}

Layered abstractions to

navigate simple to complex use cases

Page 16: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

16

Distributed Snapshots

Page 17: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Event Sourcing + Memory Image

17

event log

persists events

(temporarily)

event /

command

Process

main memory

update local

variables/structures

periodically snapshot

the memory

Page 18: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Event Sourcing + Memory Image

18

Recovery: Restore snapshot and replay events

since snapshot

event log

persists events

(temporarily)Process

Page 19: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Consistent Distributed Snapshots

19

Scalable embedded state

Access at memory speed &

scales with parallel operators

Page 20: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Checkpoint Barriers

20

Page 21: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Consistent Distributed Snapshots

21

Trigger checkpoint Inject checkpoint barrier

Page 22: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Consistent Distributed Snapshots

22

Take state snapshot Trigger state

copy-on-write

Page 23: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Consistent Distributed Snapshots

23

Persist state snapshots Persist

snapshots

asynchronously

Processing pipeline continues

Page 24: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Consistent Distributed Snapshots

25

Re-load state

Reset positions

in input streams

Rolling back computation

Re-processing

Page 25: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Consistent Distributed Snapshots

26

Restore to different

programs

Page 26: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

27

Checkpoints and Savepoints

in Apache Flink

Page 27: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Speed or Operability?

28

Fast snapshots

Checkpoint

Flexible

Operations on

Snapshots

Savepoint

What to optimize for?

Page 28: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Savepoints: Opt. for Operability

Self contained: No references to other checkpoints

Canonical format: Switch between state structures

Efficiently re-scalable: Indexed by key group

Future: More self-describing serialization format for to

archiving / versioning (like Avro, Thrift, etc.)

29

Page 29: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Checkpoints: Opt. for Efficiency

Often incremental:

• Snapshot only diff from last snapshot

• Reference older snapshots, compaction over time

Format specific to state backend:

• No extra copied or re-encoding

• Not possible to switch to another state backend between checkpoints

Compact serialization: Optimized for speed/space, not long term

archival and evolution

Key goups not indexed: Re-distribution may be more expensive

30

Page 30: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

31

What else are snapshots /

checkpoints good for?

Page 31: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

What users built on checkpoints

Upgrades and Rollbacks

Cross Datacenter Failover

State Archiving

State Bootstrapping

Application Migration

Spot Instance Region Arbitrage

A/B testing

32

Page 32: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

33

Distributed Snapshots

and side effects

Page 33: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Transaction coordination for side fx

34

One snapshot can transactionally move

data between different systems

Snapshots may include side effects

Page 34: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Transaction coordination for side fx

Similar to a distributed 2-phase commit

Coordinated by asynchronous checkpoints, no voting delays

Basic algorithm:

• Between checkpoints: Produce into transaction or Write Ahead Log

• On operator snapshot: Flush local transaction (vote-to-commit)

• On checkpoint complete: Commit transactions (commit)

• On recovery: check and commit any pending transactions

35

Page 35: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

36

Distributed Snapshots

and Application Architectures

(A Philosophical Monologue)

Page 36: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Good old centralized architecture

37

The big mean

central database

$$$

The grumpy

DBA

Application Application Application Application

Page 37: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Stateful Stream Proc. & Applications

38

Application Application Application

Application Application

decentralized infrastructure

DevOps

decentralized responsibilities

still involves

managing databases

Page 38: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Stateless Application Containers

39

State management

is nasty, let's pretend we don't

have to do it

Page 39: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Stateless Application Containers

40

Kudos to Kiki Carter

for the Broccoli

Metaphor

Broccoli (state management)

is nasty, let's pretend we don't

have to eat do it

Page 40: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Stateful Stream Proc. to the rescue

41

Application

Sensor

APIsApplication

Application

Application

very simple: state is just part

of the application

Page 41: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Compute, State, and Storage

42

Classic tiered architecture Streaming architecture

database

layer

compute

layer

application state

+ backup

compute

+

stream storage

and

snapshot storage

(backup)

application state

Page 42: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Performance

43

synchronous reads/writes

across tier boundary

asynchronous writes

of large blobs

all modifications

are local

Classic tiered architecture Streaming architecture

Page 43: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Consistency

44

distributed transactions

at scale typically

at-most / at-least once

exactly once

per state =1 =1

Classic tiered architecture Streaming architecture

Page 44: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Scaling a Service

45

separately provision additional

database capacity

provision compute

and state together

Classic tiered architecture Streaming architecture

provision

compute

Page 45: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Rolling out a new Service

46

provision a new database

(or add capacity to an existing one)simply occupies some

additional backup space

Classic tiered architecture Streaming architecture

provision compute

and state together

Page 46: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Time, Completeness, Out-of-order

47

?

event time clocks

define data completeness

event time timers

handle actions for

out-of-order data

Classic tiered architecture Streaming architecture

Page 47: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Stateful Stream Processing

48

Application

Sensor

APIsApplication

Application

Application

very simple: state is just part

of the application

Page 48: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

The Challenges with that:

Upgrades are stateful, need consistency

• application evolution and bug fixes

Migration of application state

• cluster migration, A/B testing

Re-processing and reinstatement

• fix corrupt results, bootstrap new applications

State evolution (schema evolution)

49

Page 49: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

50

Consistent Distributed

Snapshots

The answer

(my personal and obviously biased take)

Page 50: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

51

Payments Dashboard

Demo Time!

Page 51: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

52

Thank you very much (shameless plug)

Page 52: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

We are hiring!

data-artisans.com/careers

Page 53: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Appendix

54

Page 54: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

55

Details about Snapshots

and Transactional

Side Effects

Page 55: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Exactly-once via Transactions

56

chk-1 chk-2

TXN-1

✔chk-1 ✔chk-2

TXN-2

TXN-3

Side effect

✔ global ✔ global

Page 56: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Transaction fails after local snapshot

57

chk-1 chk-2

TXN-1

✔chk-1

TXN-2

TXN-3

✔ global

Side effect

Page 57: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

Transaction fails before commit…

58

chk-1 chk-2

TXN-1

✔chk-1

TXN-2

TXN-3

✔ global ✔ global

Side effect

Page 58: The Power of Snapshots Stateful Stream Processing with Apache … · 2020-05-14 · Stateful Stream Processing with Apache Flink Stephan Ewen QCon San Francisco, 2017 1. 2 Original

… commit on recovery

59

chk-2

TXN-2 TXN-3

✔ global

recoverTXN handle

chk-3

Side effect