Flink 0.10 @ Bay Area Meetup (October 2015)

Stephan Ewen, Kostas Tzoumas: Flink committers, co-founders @ data Artisans (@StephanEwen, @kostas_tzoumas). Flink 0.10

Upload: stephan-ewen

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Flink 0.10 @ Bay Area Meetup (October 2015)

Stephan Ewen, Kostas Tzoumas

Flink committers, co-founders @ data Artisans

@StephanEwen, @kostas_tzoumas

Flink 0.10

Page 2: Flink 0.10 @ Bay Area Meetup (October 2015)

What is Flink

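[Figure: the Flink stack. Libraries and compatibility layers (Gelly, Table, ML, SAMOA, Hadoop M/R, Dataflow, Dataflow (WiP), MRQL, Table, Cascading (WiP)) sit on top of the DataSet (Java/Scala) and DataStream (Java/Scala) APIs, which run on the streaming dataflow runtime. Deployment: Local, Remote, Yarn, Tez, Embedded.]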

Page 3: Flink 0.10 @ Bay Area Meetup (October 2015)

Flink 0.10 Summary

Focus on operational readiness
• high availability
• monitoring
• integration with other systems

First-class support for event time

Refined DataStream API: easy and powerful

Page 4: Flink 0.10 @ Bay Area Meetup (October 2015)

Improved DataStream API

Stream data analysis differs from batch data analysis by introducing time

Streams are unbounded and produce data over time

As simple as the batch API if you handle time in a simple way

Powerful if you want to handle time in an advanced way (out-of-order records, preliminary results, etc.)

Page 5: Flink 0.10 @ Bay Area Meetup (October 2015)

Improved DataStream API

case class Event(location: Location, numVehicles: Long)

val stream: DataStream[Event] = …

stream
  .filter { evt => isIntersection(evt.location) }

Page 6: Flink 0.10 @ Bay Area Meetup (October 2015)

Improved DataStream API

case class Event(location: Location, numVehicles: Long)

val stream: DataStream[Event] = …

stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .sum("numVehicles")
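The `timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))` call above describes a 15-minute window that slides every 5 minutes, so each event falls into several overlapping windows. A minimal plain-Java sketch of that assignment, assuming windows are aligned to multiples of the slide and time is counted in minutes; the class and method names are hypothetical and are not Flink API:

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindows {
    /** Returns the start times of all sliding windows [start, start + size) that contain the timestamp. */
    public static List<Long> windowStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp (assumes starts are multiples of the slide).
        long lastStart = timestamp - Math.floorMod(timestamp, slide);
        // Step back one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An event at minute 17 with 15-minute windows sliding every 5 minutes.
        System.out.println(windowStarts(17, 15, 5));
    }
}
```

With a size of 15 and a slide of 5, every event belongs to three windows, which is why the sum above is updated in three places per event.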

Page 7: Flink 0.10 @ Bay Area Meetup (October 2015)

Improved DataStream API

case class Event(location: Location, numVehicles: Long)

val stream: DataStream[Event] = …

stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .trigger(new Threshold(200))
  .sum("numVehicles")

Page 8: Flink 0.10 @ Bay Area Meetup (October 2015)

Improved DataStream API

case class Event(location: Location, numVehicles: Long)

val stream: DataStream[Event] = …

stream
  .filter { evt => isIntersection(evt.location) }
  .keyBy("location")
  .timeWindow(Time.of(15, MINUTES), Time.of(5, MINUTES))
  .trigger(new Threshold(200))
  .sum("numVehicles")

  .keyBy( evt => evt.location.grid )
  .mapWithState { (evt, state: Option[Model]) => {
    // Start from a fresh model the first time a key is seen
    val model = state.getOrElse(new Model())
    // Emit the classification and keep the updated model as the new state
    (model.classify(evt), Some(model.update(evt)))
  }}

Page 9: Flink 0.10 @ Bay Area Meetup (October 2015)

IoT / Mobile Applications

Events occur on devices

Events stored in a log (Queue / Log)

Events analyzed in a data streaming system (Stream Analysis)

Page 10: Flink 0.10 @ Bay Area Meetup (October 2015)

IoT / Mobile Applications

Page 11: Flink 0.10 @ Bay Area Meetup (October 2015)

IoT / Mobile Applications

Page 12: Flink 0.10 @ Bay Area Meetup (October 2015)

IoT / Mobile Applications

Page 13: Flink 0.10 @ Bay Area Meetup (October 2015)

IoT / Mobile Applications

Out of order!!!

First burst of events, second burst of events

Page 14: Flink 0.10 @ Bay Area Meetup (October 2015)

IoT / Mobile Applications

Event time windows

Arrival time windows

Instant event-at-a-time

Flink supports out-of-order (event time) windows, arrival time windows (and mixtures), plus low-latency processing.

First burst of events, second burst of events

Page 15: Flink 0.10 @ Bay Area Meetup (October 2015)

We need a global clock that runs on event time instead of processing time.

Page 16: Flink 0.10 @ Bay Area Meetup (October 2015)

[Figure: sources emit timestamped records and watermarks towards a window operator. Labels: "This is a source", "This is our window operator", "This is the current event time", "This is a watermark."]
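One way to read the figure: each source periodically emits a watermark carrying an event-time value, and the window operator advances its own event-time clock to the minimum watermark seen across its inputs. The following plain-Java sketch illustrates that idea under stated assumptions; `EventTimeClock`, `onWatermark` and `canFire` are hypothetical names, not Flink API, and the operator is assumed to have at least one input channel.

```java
import java.util.Arrays;

public class EventTimeClock {
    private final long[] channelWatermarks;
    private long currentEventTime = Long.MIN_VALUE;

    public EventTimeClock(int numInputChannels) {
        channelWatermarks = new long[numInputChannels];
        Arrays.fill(channelWatermarks, Long.MIN_VALUE);
    }

    /** Records a watermark from one input channel and returns the operator's event time. */
    public long onWatermark(int channel, long watermark) {
        // Watermarks never move backwards on a channel.
        channelWatermarks[channel] = Math.max(channelWatermarks[channel], watermark);
        // The operator can only be as far along as its slowest input.
        long min = Arrays.stream(channelWatermarks).min().getAsLong();
        currentEventTime = Math.max(currentEventTime, min);
        return currentEventTime;
    }

    /** A window [start, windowEnd) may fire once event time has reached its last timestamp. */
    public boolean canFire(long windowEnd) {
        return currentEventTime >= windowEnd - 1;
    }
}
```

Taking the minimum means a slow or delayed source holds back window evaluation, which is what lets late, out-of-order records still land in the right window.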

Page 17: Flink 0.10 @ Bay Area Meetup (October 2015)

Processing Time

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

DataStream<Tweet> text = env.addSource(new TwitterSrc());

DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())
    .keyBy("name")
    .timeWindow(Time.of(5, TimeUnit.MINUTES))
    .apply(new HashtagCounter());

Page 18: Flink 0.10 @ Bay Area Meetup (October 2015)

Event Time

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<Tweet> text = env.addSource(new TwitterSrc());
text = text.assignTimestamps(new MyTimestampExtractor());

DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())
    .keyBy("name")
    .timeWindow(Time.of(5, TimeUnit.MINUTES))
    .apply(new HashtagCounter());

Page 19: Flink 0.10 @ Bay Area Meetup (October 2015)

Tumbling Windows of 4 Seconds

[Figure: a stream of out-of-order event timestamps assigned to tumbling 4-second windows 0-3, 4-7, 8-11, 20-23, 24-27 and 32-35.]
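The window labels in this figure (0-3, 4-7, 8-11, ...) follow from rounding each event timestamp down to a multiple of the window size. A minimal plain-Java sketch of that assignment; the class name and the sample timestamps are hypothetical, and this is an illustration rather than Flink's window assigner:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindows {
    /** Start of the tumbling window of the given size that contains the timestamp. */
    public static long windowStart(long timestamp, long size) {
        return timestamp - Math.floorMod(timestamp, size);
    }

    /** Groups timestamps by window start; the result does not depend on arrival order. */
    public static Map<Long, List<Long>> assign(long[] timestamps, long size) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : timestamps) {
            windows.computeIfAbsent(windowStart(ts, size), k -> new ArrayList<>()).add(ts);
        }
        return windows;
    }

    public static void main(String[] args) {
        // Hypothetical out-of-order timestamps in seconds.
        long[] outOfOrder = {1, 2, 3, 4, 1, 2, 9, 5, 20, 22, 21, 23};
        System.out.println(assign(outOfOrder, 4));
    }
}
```

Because the window is derived from the event's own timestamp rather than its arrival time, a late record such as the second `1` still ends up in window 0-3.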

Page 20: Flink 0.10 @ Bay Area Meetup (October 2015)


Page 21: Flink 0.10 @ Bay Area Meetup (October 2015)

High Availability and Consistency

No single point of failure any more

Exactly-once processing semantics across the pipeline

Checkpoints/fault tolerance is decoupled from windows, which allows for highly flexible window implementations

ZooKeeper ensemble

Multiple masters

Failover

Page 22: Flink 0.10 @ Bay Area Meetup (October 2015)

Operator State

Stateless operators

ds.filter(_ != 0)

System state

ds.keyBy("id").timeWindow(Time.of(5, SECONDS)).reduce(…)

User defined state

public class CounterSum extends RichReduceFunction<Long> {
    private OperatorState<Long> counter;

    @Override
    public Long reduce(Long v1, Long v2) throws Exception {
        // Count the number of reduce calls in checkpointed state
        counter.update(counter.value() + 1);
        return v1 + v2;
    }

    @Override
    public void open(Configuration config) throws Exception {
        counter = getRuntimeContext().getOperatorState("counter", 0L, false);
    }
}

Page 23: Flink 0.10 @ Bay Area Meetup (October 2015)

Checkpoints

Consistent snapshots of distributed data stream and operator state


Page 24: Flink 0.10 @ Bay Area Meetup (October 2015)

Barriers

Markers for checkpoints, injected in the data flow

Page 25: Flink 0.10 @ Bay Area Meetup (October 2015)


Alignment for multi-input operators
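A rough plain-Java sketch of what alignment means for an operator with several inputs: once a barrier arrives on one channel, later records from that channel are held back until the barrier has arrived on every channel; only then is the snapshot taken and the held-back records processed. `BarrierAligner` and the `"SNAPSHOT"` marker are hypothetical stand-ins, and the sketch assumes at most one checkpoint in flight.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class BarrierAligner {
    private final boolean[] blocked;
    private final Queue<String> buffered = new ArrayDeque<>();
    private final List<String> output = new ArrayList<>();
    private int barriersSeen = 0;

    public BarrierAligner(int numChannels) {
        blocked = new boolean[numChannels];
    }

    /** Processes a record immediately unless its channel has already delivered the barrier. */
    public void onRecord(int channel, String record) {
        if (blocked[channel]) {
            buffered.add(record); // belongs to the epoch after the checkpoint
        } else {
            output.add(record);
        }
    }

    /** Blocks the channel; when all channels are blocked, snapshots and releases the buffer. */
    public void onBarrier(int channel) {
        if (blocked[channel]) {
            return; // assumption: only one checkpoint in flight
        }
        blocked[channel] = true;
        barriersSeen++;
        if (barriersSeen == blocked.length) {
            output.add("SNAPSHOT");  // stand-in for snapshotting state and forwarding the barrier
            output.addAll(buffered); // replay held-back records in arrival order
            buffered.clear();
            Arrays.fill(blocked, false);
            barriersSeen = 0;
        }
    }

    public List<String> output() {
        return output;
    }
}
```

Holding back the early channel is what keeps the snapshot consistent: the state captured at `"SNAPSHOT"` reflects exactly the records that preceded the barrier on every input.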

Page 26: Flink 0.10 @ Bay Area Meetup (October 2015)

High Availability and Consistency

No single point of failure any more

Exactly-once processing semantics across the pipeline

Checkpoints/fault tolerance is decoupled from windows, which allows for highly flexible window implementations

ZooKeeper ensemble

Multiple masters

Failover

Page 27: Flink 0.10 @ Bay Area Meetup (October 2015)

Without high availability


JobManager

TaskManager

Page 28: Flink 0.10 @ Bay Area Meetup (October 2015)

With high availability

JobManager

TaskManager

Stand-by JobManager

Apache ZooKeeper™

KEEP GOING

Page 29: Flink 0.10 @ Bay Area Meetup (October 2015)

Persisting jobs

JobManager

Client

TaskManagers

Apache ZooKeeper™

Job

1. Submit job

Page 30: Flink 0.10 @ Bay Area Meetup (October 2015)

Persisting jobs

JobManager

Client

TaskManagers

Apache ZooKeeper™

1. Submit job
2. Persist execution graph

Page 31: Flink 0.10 @ Bay Area Meetup (October 2015)

Persisting jobs

JobManager

Client

TaskManagers

Apache ZooKeeper™

1. Submit job
2. Persist execution graph
3. Write handle to ZooKeeper

Page 32: Flink 0.10 @ Bay Area Meetup (October 2015)

Persisting jobs

JobManager

Client

TaskManagers

Apache ZooKeeper™

1. Submit job
2. Persist execution graph
3. Write handle to ZooKeeper
4. Deploy tasks

Page 33: Flink 0.10 @ Bay Area Meetup (October 2015)

Handling checkpoints

JobManager

Client

TaskManagers

Apache ZooKeeper™

1. Take snapshots

Page 34: Flink 0.10 @ Bay Area Meetup (October 2015)

Handling checkpoints

JobManager

Client

TaskManagers

Apache ZooKeeper™

1. Take snapshots
2. Persist snapshots
3. Send handles to JM

Page 35: Flink 0.10 @ Bay Area Meetup (October 2015)

Handling checkpoints

JobManager

Client

TaskManagers

Apache ZooKeeper™

1. Take snapshots
2. Persist snapshots
3. Send handles to JM
4. Create global checkpoint

Page 36: Flink 0.10 @ Bay Area Meetup (October 2015)

Handling checkpoints

JobManager

Client

TaskManagers

Apache ZooKeeper™

1. Take snapshots
2. Persist snapshots
3. Send handles to JM
4. Create global checkpoint
5. Persist global checkpoint

Page 37: Flink 0.10 @ Bay Area Meetup (October 2015)

Handling checkpoints

JobManager

Client

TaskManagers

Apache ZooKeeper™

1. Take snapshots
2. Persist snapshots
3. Send handles to JM
4. Create global checkpoint
5. Persist global checkpoint
6. Write handle to ZooKeeper
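The six steps above can be sketched in plain Java as follows. This is only an illustration of the handle pattern: in-memory maps stand in for durable storage and for ZooKeeper, one checkpoint is assumed to be in flight at a time, and every class, method and path name here is hypothetical rather than Flink's implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckpointCoordinatorSketch {
    // Stand-ins (assumptions): a map for durable storage and a map for ZooKeeper.
    private final Map<String, Object> storage = new HashMap<>();
    private final Map<String, String> zooKeeper = new HashMap<>();
    private final Map<String, String> pendingHandles = new HashMap<>();
    private final int numTasks;

    public CheckpointCoordinatorSketch(int numTasks) {
        this.numTasks = numTasks;
    }

    /** Steps 1-2 (TaskManager side): persist the snapshot and return a small handle to it. */
    public String persistSnapshot(String task, long checkpointId, Object state) {
        String handle = "snapshots/" + checkpointId + "/" + task;
        storage.put(handle, state);
        return handle;
    }

    /** Steps 3-6 (JobManager side): collect handles; complete once every task has reported. */
    public boolean acknowledge(String task, long checkpointId, String handle) {
        pendingHandles.put(task, handle);                          // 3. handle received by the JM
        if (pendingHandles.size() < numTasks) {
            return false;
        }
        String globalHandle = "checkpoints/" + checkpointId;
        storage.put(globalHandle, new HashMap<>(pendingHandles));  // 4-5. create and persist global checkpoint
        zooKeeper.put("latest-checkpoint", globalHandle);          // 6. only the handle goes to ZooKeeper
        pendingHandles.clear();
        return true;
    }

    public String latestCheckpointHandle() {
        return zooKeeper.get("latest-checkpoint");
    }
}
```

The design point the slides make is that ZooKeeper only ever stores small handles, while the bulky snapshot data lives in separate storage; a stand-by JobManager can follow the handle to recover.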

Page 38: Flink 0.10 @ Bay Area Meetup (October 2015)

Performance

Continuous streaming

Latency-bound buffering

Distributed snapshots

High throughput & low latency

With configurable throughput/latency tradeoff
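Latency-bound buffering can be pictured as a network buffer that is shipped either when it is full (good for throughput) or when its oldest record has waited for a configured timeout (bounding latency). The plain-Java sketch below illustrates that tradeoff under these assumptions; `LatencyBoundBuffer` is a hypothetical class with explicit timestamps for determinism, not Flink code. In Flink itself the corresponding knob is `setBufferTimeout` on the `StreamExecutionEnvironment`.

```java
import java.util.ArrayList;
import java.util.List;

public class LatencyBoundBuffer {
    private final int capacity;
    private final long timeoutMillis;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> shipped = new ArrayList<>();
    private long firstRecordTime;

    public LatencyBoundBuffer(int capacity, long timeoutMillis) {
        this.capacity = capacity;
        this.timeoutMillis = timeoutMillis;
    }

    /** Adds a record at the given time; ships the buffer as soon as it is full. */
    public void add(String record, long nowMillis) {
        if (buffer.isEmpty()) {
            firstRecordTime = nowMillis;
        }
        buffer.add(record);
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** Called periodically; ships a partial buffer once its oldest record has waited long enough. */
    public void tick(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - firstRecordTime >= timeoutMillis) {
            flush();
        }
    }

    private void flush() {
        shipped.add(new ArrayList<>(buffer));
        buffer.clear();
    }

    public List<List<String>> shipped() {
        return shipped;
    }
}
```

A larger timeout lets buffers fill up and favours throughput; a smaller one ships partial buffers sooner and favours latency.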

Page 39: Flink 0.10 @ Bay Area Meetup (October 2015)

Batch and Streaming

Batch Word Count in the DataStream API

case class WordCount(word: String, count: Int)

val text: DataStream[String] = …

text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .keyBy("word")
  .window(GlobalWindows.create())
  .trigger(new EOFTrigger())
  .sum("count")

Page 40: Flink 0.10 @ Bay Area Meetup (October 2015)

Batch and Streaming

Batch Word Count in the DataStream API

case class WordCount(word: String, count: Int)

val text: DataStream[String] = …

text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .keyBy("word")
  .window(GlobalWindows.create())
  .trigger(new EOFTrigger())
  .sum("count")

Batch Word Count in the DataSet API

val text: DataSet[String] = …

text
  .flatMap { line => line.split(" ") }
  .map { word => new WordCount(word, 1) }
  .groupBy("word")
  .sum("count")

Page 41: Flink 0.10 @ Bay Area Meetup (October 2015)

Batch and Streaming

[Figure: DataSet and DataStream on top of one Streaming Dataflow Runtime]

Streaming Dataflow Runtime
• Streaming data movement
• Stateful operations
• DAG recovery
• DAG resource management

Batch Parameters (DataSet)
• Relational Optimizer
• Pipelined and blocking operators
• Schedule lazily
• Recompute whole operators
• Fully buffered streams

Streaming Parameters (DataStream)
• Window Optimization
• Pipelined and windowed operators
• Schedule eagerly
• Periodic checkpoints

Page 42: Flink 0.10 @ Bay Area Meetup (October 2015)

Batch and Streaming


A full-fledged batch processor as well

Page 43: Flink 0.10 @ Bay Area Meetup (October 2015)

Batch and Streaming

A full-fledged batch processor as well

More details in Dongwon Kim's talk "A comparative performance evaluation of Flink"

Page 44: Flink 0.10 @ Bay Area Meetup (October 2015)

Integration (picture not complete)

[Figure: logos of integrated systems. Surviving text labels: POSIX, Java/Scala Collections, POSIX.]

Page 45: Flink 0.10 @ Bay Area Meetup (October 2015)

Monitoring

Live system metrics and user-defined accumulators/statistics

Monitoring REST API for custom monitoring tools

GET http://flink-m:8081/jobs/7684be6004e4e955c2a558a9bc463f65/accumulators

{
  "id": "dceafe2df1f57a1206fcb907cb38ad97",
  "user-accumulators": [
    { "name": "avglen", "type": "DoubleCounter", "value": "123.03259440000001" },
    { "name": "genwords", "type": "LongCounter", "value": "75000000" }
  ]
}
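A custom monitoring tool could issue the request shown on this slide with the JDK's built-in HTTP client (Java 11 or later). The sketch below only builds the request; the host `flink-m`, port `8081` and the job ID are the values from the slide, while the class `AccumulatorRequest` and its `build` method are hypothetical helpers.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class AccumulatorRequest {
    /** Builds a GET request for the accumulators of one job, following the URL pattern on the slide. */
    public static HttpRequest build(String host, int port, String jobId) {
        URI uri = URI.create("http://" + host + ":" + port + "/jobs/" + jobId + "/accumulators");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("flink-m", 8081, "7684be6004e4e955c2a558a9bc463f65");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually fetch the JSON, the request can be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running JobManager.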

Page 46: Flink 0.10 @ Bay Area Meetup (October 2015)

Flink 0.10 Summary

Focus on operational readiness
• high availability
• monitoring
• integration with other systems

First-class support for event time

Refined DataStream API: easy and powerful