Aljoscha Krettek (@aljoscha)
Big Data Spain, November 17, 2016
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
What I’d Like to Talk About
Streaming architecture and Flink
IoT and event-time stream processing
Use-case examples
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Intro: The Streaming Architecture
Big Data Architecture
Collect events in HDFS (or similar)
Periodically run (batch) jobs to process them
Problems:
• Huge latency
• Natural boundaries in data don’t match batch boundaries
Rethinking Data Architecture
Real-time reaction to events
Continuous applications
Process both real-time and historical data
What is (Distributed) Streaming
Streaming: Computations on never-ending “streams” of data records (“events”)
Distributed: Computation spread across many machines
[Diagram: several parallel instances of “Your code”]
What is Stateful Streaming
Result depends on history of stream
A stateful stream processor should give you the tools to manage state
• Recover, roll back, version, upgrade, etc.
[Diagram: “Your code” with its state]
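As a rough illustration of what "result depends on history" and "recover, roll back" mean, here is a minimal sketch in plain Python. It is not Flink code: the `CountingOperator` class and its `snapshot`/`restore` methods are hypothetical names made up for this example, and an in-memory copy stands in for a durable checkpoint.

```python
import copy
from collections import defaultdict


class CountingOperator:
    """Hypothetical operator: keeps a running count per key."""

    def __init__(self):
        self.counts = defaultdict(int)

    def process(self, key):
        # The output for an event depends on all earlier events with this key.
        self.counts[key] += 1
        return self.counts[key]

    def snapshot(self):
        # An in-memory deep copy stands in for a durable checkpoint.
        return copy.deepcopy(dict(self.counts))

    def restore(self, snapshot):
        # Roll the state back to an earlier snapshot.
        self.counts = defaultdict(int, snapshot)


op = CountingOperator()
for sensor in ["a", "b", "a"]:
    op.process(sensor)

checkpoint = op.snapshot()       # state at this point: a=2, b=1
op.process("a")                  # state moves on: a=3
op.restore(checkpoint)           # simulated recovery after a failure
result_after_recovery = op.process("a")  # counting resumes from the snapshot
```

After the restore, the event that was processed between the snapshot and the failure is forgotten, so a real system would also need to replay the input from the snapshot position to avoid losing it.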
What is Event-Time Streaming
Events have timestamps
Processing depends on timestamps
An event-time stream processor should give you the tools to reason about time
• Handle streams that are out of order
[Diagram: events t3, t1, t2, t4 arrive out of order at “Your code” (with state) and are grouped as t1-t2 and t3-t4]
[Diagram: an event log, applications with app state, and a query service]
Recap: What is Streaming?
Continuous processing of data that is continuously generated
I.e., pretty much all “big” data
It’s all about state and time
Flink does all of that
IoT and Event-time Stream Processing
The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade.¹
¹ read.bi/1yDOQQ3
Example Event Sources
A Simple Definition
IoT use cases from the system’s perspective:
A large number of (distributed) things continuously generating a large amount of data.
IoT: Some Insights
Data is continuously produced → Stream Processing
Events have a timestamp → Event-time based processing
Data/Events can arrive with huge delays/out-of-order
Most analyses happen on time windows
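To make the last two points concrete, here is a simplified sketch in plain Python of event-time tumbling windows driven by a watermark. It is not Flink's implementation: the function name `event_time_windows`, the parameters `window_size` and `max_delay`, and the sample events are all assumptions made for this illustration. The watermark trails the highest timestamp seen so far by `max_delay`; a window fires once the watermark passes its end, and events for already-fired windows are collected as late.

```python
from collections import defaultdict


def event_time_windows(events, window_size, max_delay):
    """Sum values per tumbling event-time window.

    events: (timestamp, value) pairs in arrival order.
    Returns (results, late): results is a list of (window_start, sum)
    in firing order, late is the list of events that arrived too late.
    """
    open_windows = defaultdict(int)
    results, late = [], []
    watermark = float("-inf")
    for ts, value in events:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            late.append((ts, value))          # its window has already fired
        else:
            open_windows[start] += value
        # The watermark lags the largest timestamp seen by max_delay.
        watermark = max(watermark, ts - max_delay)
        for s in sorted(open_windows):
            if s + window_size <= watermark:  # window is complete: fire it
                results.append((s, open_windows.pop(s)))
    for s in sorted(open_windows):            # end of input: flush the rest
        results.append((s, open_windows.pop(s)))
    return results, late


# Hypothetical out-of-order readings: (event timestamp, value)
readings = [(1, 1), (4, 1), (2, 1), (7, 1), (3, 1), (12, 1), (6, 1), (2, 1)]
results, late = event_time_windows(readings, window_size=5, max_delay=3)
```

With these inputs the moderately delayed readings (timestamps 2, 3 and 6) still land in the right window, while the final reading with timestamp 2 arrives after its window has fired and ends up in `late`. Choosing `max_delay` is a trade-off between latency and completeness.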
What Is Event-Time Processing
Processing Time: 1977, 1980, 1983, 1999, 2002, 2005, 2015
Event Time: Episode IV, Episode V, Episode VI, Episode I, Episode II, Episode III, Episode VII
What Is Event-Time Processing
[Diagram: events in a message queue, each labeled with its event timestamp, arriving out of order along a processing-time axis from 1 to 14]
What’s The Problem?
[Diagram: the same events grouped once into processing-time windows and once into event-time windows, along a processing-time axis from 1 to 14; the two groupings differ]
Mismatch between event time and processing time.
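The effect of this mismatch can be sketched in a few lines of plain Python. The `(arrival_time, event_timestamp)` pairs below are hypothetical values chosen for illustration, not the numbers from the slide, and `tumbling_counts` is a helper name made up for this example.

```python
from collections import defaultdict


def tumbling_counts(times, size):
    """Count how many times fall into each tumbling window of the given size."""
    counts = defaultdict(int)
    for t in times:
        counts[t - t % size] += 1   # key each window by its start time
    return dict(sorted(counts.items()))


# Hypothetical events as (arrival_time, event_timestamp); arrival is out of order.
events = [(1, 1), (2, 6), (3, 2), (4, 7), (5, 11), (6, 3), (7, 8), (8, 12)]

by_processing_time = tumbling_counts([arrival for arrival, _ in events], 5)
by_event_time = tumbling_counts([ts for _, ts in events], 5)
```

Here `by_processing_time` is `{0: 4, 5: 4}` while `by_event_time` is `{0: 3, 5: 3, 10: 2}`: the same eight events produce different per-window counts depending on which notion of time defines the windows.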
Sources of Time Mismatch
Big Mismatch
• Network disconnects
• Slow network
Small Mismatch
• The nature of distributed systems
• Differing system clock time
Small Event-Time Mismatch
Robust Stream Processing with Apache Flink®: A Simple Walkthrough
http://data-artisans.com/robust-stream-processing-flink-walkthrough/
Recap: Event-Time
IoT use cases need event-time processing
Even a small mismatch between event time and processing time will lead to wrong results
Use-Case Examples
30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily
Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
King
Challenges:
• Many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…)
• 300 million monthly unique users
• 30 billion events received every day
Need event-time based statistics
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
Solution: RBEA
Multiplexing of multiple data scientist requests into a single Flink job
Groovy as language for analysis scripts
Event-time windowing
Bouygues Telecom
                         2015         2016
Users*                   ~120         ~300
Flink production apps    5            30
Storage                  750 TB       2 PB
Events/day               4 billion    10 billion
* Users of the information system
http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
Bouygues: Challenges
Low latency & streaming fashion counters
Massive amounts of data + bursty loads
Reliability
Multiple flow correlation
Time management:
• Out of order & late events → our worst enemies
In Summary
If you need to ask: you already have a streaming use case!
IoT requires Proper Time Management
Apache Flink has done that for a long time now*
* Since version 0.10
Thank you!
@aljoscha @ApacheFlink @dataArtisans
One day of hands-on Flink training
One day of conference
Tickets are on sale
Call for Papers is already open
Please visit our website: http://sf.flink-forward.org
Follow us on Twitter: @FlinkForward
We are hiring!
data-artisans.com/careers