Apache Flink Training - DataStream API - ProcessFunction
Apache Flink® Training
Flink v1.3 – 14.9.2017
DataStream API
ProcessFunction
ProcessFunction
Combining timers with stateful event processing
Common Pattern
On each incoming element:
• update some state
• register a callback for a moment in the future
When that moment comes:
• check a condition and perform a certain action, e.g. emit an element
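The pattern can be illustrated outside Flink with a few lines of plain Java. This is only a sketch under stated assumptions: the class `CallbackPatternSketch`, its methods, the sensor-reading domain, and the threshold of 100 are all hypothetical, and time is advanced by hand rather than by a real clock or by watermarks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the pattern; this is not Flink code.
class CallbackPatternSketch {
    // "some state": the latest reading per sensor (hypothetical example domain)
    private final Map<String, Integer> latestReading = new HashMap<>();
    // registered callbacks, ordered by the time at which they should fire
    private final TreeMap<Long, List<String>> callbacks = new TreeMap<>();
    // everything that has been emitted so far
    final List<String> emitted = new ArrayList<>();

    // On each incoming element: update some state and register a callback.
    void onElement(String sensor, int reading, long now, long delay) {
        latestReading.put(sensor, reading);
        callbacks.computeIfAbsent(now + delay, t -> new ArrayList<>()).add(sensor);
    }

    // Advancing the (manual) clock fires every callback whose moment has come.
    void advanceTo(long now) {
        while (!callbacks.isEmpty() && callbacks.firstKey() <= now) {
            Map.Entry<Long, List<String>> due = callbacks.pollFirstEntry();
            for (String sensor : due.getValue()) {
                // Check a condition and act: emit if the reading is still high.
                int reading = latestReading.get(sensor);
                if (reading > 100) {
                    emitted.add(sensor + " still high at t=" + due.getKey());
                }
            }
        }
    }

    public static void main(String[] args) {
        CallbackPatternSketch sketch = new CallbackPatternSketch();
        sketch.onElement("s1", 120, 0, 10); // callback at t=10
        sketch.onElement("s1", 80, 5, 10);  // reading drops, callback at t=15
        sketch.onElement("s2", 150, 6, 10); // callback at t=16
        sketch.advanceTo(20);
        System.out.println(sketch.emitted); // [s2 still high at t=16]
    }
}
```

The callback only reads the state as it is when the callback fires, so the drop of `s1` to 80 suppresses both of its callbacks.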
Flink 1.2 added ProcessFunction
Gives access to all basic building blocks:
• Events
• Fault-tolerant, Consistent State
• Timers (event- and processing-time)
ProcessFunction
Simple yet powerful API:
/**
* Process one element from the input stream.
*/
void processElement(I value, Context ctx, Collector<O> out) throws Exception;
/**
* Called when a timer set using {@link TimerService} fires.
*/
void onTimer(long timestamp, OnTimerContext ctx, Collector<O> out) throws Exception;
Collector<O> out: a collector to emit result values
Context ctx (in processElement):
1. Get the timestamp of the element
2. Interact with the TimerService to:
• query the current time
• register timers
OnTimerContext ctx (in onTimer):
1. Do the above
2. Query whether the timer fired in event time or processing time
ProcessFunction: example
Requirements:
• maintain counts per incoming key, and
• emit the key/count pair if no element came for the key in the last 100 ms (in event time)
Implementation sketch:
• Store the count, key and last modification timestamp in a ValueState (scoped by key)
• For each record:
  • update the counter and the last modification timestamp
  • register a timer 100 ms from “now” (in event time)
• When the timer fires:
  • check the callback’s timestamp against the last modification time for the key, and
  • emit the key/count pair if the callback’s timestamp equals the last modification time plus 100 ms
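Why the timestamp comparison is enough to ignore outdated timers can be traced with a small plain-Java model of the sketch above. It is not Flink code: `CountWithTimeoutModel`, its `Entry` class, and the string output format are hypothetical, a `HashMap` stands in for the keyed ValueState, and the caller plays the role of the timer service by invoking `onTimer` directly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java model of the implementation sketch; names are hypothetical
// and only mirror the roles of the two ProcessFunction callbacks.
class CountWithTimeoutModel {
    static final long TIMEOUT_MS = 100;

    // the per-key data: count and last modification timestamp
    static final class Entry {
        long count;
        long lastModified;
    }

    // stand-in for keyed state: one entry per key
    private final Map<String, Entry> state = new HashMap<>();

    // For each record: update the counter and the last modification timestamp.
    // Returns the event time at which a timer would be registered.
    long processElement(String key, long eventTime) {
        Entry entry = state.computeIfAbsent(key, k -> new Entry());
        entry.count++;
        entry.lastModified = eventTime;
        return eventTime + TIMEOUT_MS;
    }

    // When a timer fires: emit key/count only if no newer record has moved
    // lastModified since this timer was registered.
    Optional<String> onTimer(String key, long timerTimestamp) {
        Entry entry = state.get(key);
        if (entry != null && timerTimestamp == entry.lastModified + TIMEOUT_MS) {
            state.remove(key);
            return Optional.of(key + "/" + entry.count);
        }
        return Optional.empty(); // outdated timer: nothing to emit
    }

    public static void main(String[] args) {
        CountWithTimeoutModel model = new CountWithTimeoutModel();
        long firstTimer = model.processElement("a", 0);   // timer for t=100
        long secondTimer = model.processElement("a", 50); // timer for t=150
        System.out.println(model.onTimer("a", firstTimer));  // Optional.empty
        System.out.println(model.onTimer("a", secondTimer)); // Optional[a/2]
    }
}
```

The first timer is outdated because the record at t=50 moved the last modification time, so only the second timer emits the pair.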
// the data type stored in the state
public class CountWithTimestamp {
    public String key;
    public long count;
    public long lastModified;
}

// apply the process function onto a keyed stream
DataStream<Tuple2<String, Long>> result = stream
    .keyBy(0)
    .process(new CountWithTimeoutFunction());
public class CountWithTimeoutFunction extends RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void open(Configuration parameters) throws Exception {
        // register our state with the state backend
    }

    // the input type matches the first type parameter of the class
    @Override
    public void processElement(Tuple2<String, String> value, Context ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // update our state and register a timer
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // check the state for the key and emit a result if needed
    }
}
public class CountWithTimeoutFunction extends RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    // the state maintained by this function, scoped to the current key
    private ValueState<CountWithTimestamp> state;

    @Override
    public void open(Configuration parameters) throws Exception {
        state = getRuntimeContext().getState(
            new ValueStateDescriptor<>("myState", CountWithTimestamp.class));
    }
}
public class CountWithTimeoutFunction extends RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void processElement(Tuple2<String, String> value, Context ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // retrieve the current count, initializing it on the first element for this key
        CountWithTimestamp current = state.value();
        if (current == null) {
            current = new CountWithTimestamp();
            current.key = value.f0;
        }
        // update the count and the last modification timestamp
        current.count++;
        current.lastModified = ctx.timestamp();
        // write the state back
        state.update(current);
        // schedule a timer 100 ms (in event time) after this element
        ctx.timerService().registerEventTimeTimer(current.lastModified + 100);
    }
}
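public class CountWithTimeoutFunction extends RichProcessFunction<Tuple2<String, String>, Tuple2<String, Long>> {

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx,
            Collector<Tuple2<String, Long>> out) throws Exception {
        // get the state for the key that scheduled the timer
        CountWithTimestamp result = state.value();
        // emit only if this is the latest timer; the null check guards against a timer
        // that fires after the state was already cleared (e.g. with out-of-order elements)
        if (result != null && timestamp == result.lastModified + 100) {
            out.collect(new Tuple2<String, Long>(result.key, result.count));
            state.clear();
        }
    }
}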