![Page 1: Structured Streaming in - BI Consultingbiconsulting.hu/letoltes/2018budapestdata/toth... · The drama of Exactly-once processing (Act I) Spark: got it, thanks!Consider line 11 done](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec9185d5263de629b5d9d49/html5/thumbnails/1.jpg)
Budapest Data Forum, 2018
Structured Streaming in
Spark / Big Data / Cloud Computing Trainings
Building Data Infrastructures for Industry 4.0 & Online
Why Real-time?
Why Spark Streaming?
Why Real-time?
How to choose a streaming tool?
The Apache landscape
streams
Sometimes you just want to keep it simple
+
Remember this from 1 hour ago?
So, our fancy tools
streams
How to choose a fancy streaming tool?
Popularity
See the bigger picture
Throughput
source: https://databricks.com/blog/2017/10/11/benchmarking-structured-streaming-on-databricks-runtime-against-state-of-the-art-streaming-systems.html
*as the Spark folks measured it
Throughput
source: https://data-artisans.com/blog/curious-case-broken-benchmark-revisiting-apache-flink-vs-databricks-runtime
*as the Flink folks measured it
Developers!
Latency
Native Streaming (event-based processing)
vs
Microbatching
streams
trident
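A rough plain-Python sketch of why the two models differ in latency. It is not tied to any of the tools above, and the function names, arrival times and the 500 ms batch interval are all made up for illustration. It only counts the time an event waits before processing starts and ignores the processing time itself.

```python
def native_delays(arrivals_ms):
    """Event-based processing: each event is handled as soon as it arrives."""
    return [0 for _ in arrivals_ms]


def microbatch_delays(arrivals_ms, interval_ms):
    """Micro-batching: an event waits until the next batch boundary."""
    delays = []
    for t in arrivals_ms:
        # Round the arrival time up to the next multiple of the interval.
        next_trigger = -(-t // interval_ms) * interval_ms
        delays.append(next_trigger - t)
    return delays


arrivals = [120, 450, 990, 1000]   # hypothetical arrival times in ms
native = native_delays(arrivals)
batched = microbatch_delays(arrivals, interval_ms=500)
```

In this toy model an event can wait up to one full batch interval, so the interval sets a floor on the worst-case latency of a micro-batching engine.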
https://www.theguardian.com/technology/2014/feb/05/why-google-engineers-designers
Structured Streaming
Pain points to solve
• Interoperability: batch, interactive and real-time analytics
• Event time based processing: event time instead of processing time
• End-to-end guarantees: consistent data throughout the whole pipeline, exactly-once processing
Unbounded Table
image credit: http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
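A minimal plain-Python sketch of the unbounded table idea. This is not Spark code: the class `UnboundedTable` and the word-count query are hypothetical, chosen only for illustration. Each trigger appends new rows to an ever-growing input table, and the result is updated incrementally while staying equal to what a batch query over the whole table would return.

```python
from collections import Counter


class UnboundedTable:
    """Toy model: the stream is a table that only ever grows."""

    def __init__(self):
        self.rows = []            # input table: every record received so far
        self.counts = Counter()   # result table, maintained incrementally

    def append(self, new_rows):
        """One trigger: append the new rows and update the result."""
        self.rows.extend(new_rows)
        self.counts.update(new_rows)   # only the new rows are touched
        return dict(self.counts)


def batch_word_count(rows):
    """The same query, run as a batch job over the full table."""
    return dict(Counter(rows))


table = UnboundedTable()
table.append(["cat", "dog"])
result = table.append(["dog", "owl"])
```

The point of the model is that the streaming result and the batch result over the same rows agree, so the same query logic can serve both.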
Late data
![Page 28: Structured Streaming in - BI Consultingbiconsulting.hu/letoltes/2018budapestdata/toth... · The drama of Exactly-once processing (Act I) Spark: got it, thanks!Consider line 11 done](https://reader034.vdocuments.mx/reader034/viewer/2022042222/5ec9185d5263de629b5d9d49/html5/thumbnails/28.jpg)
Handling late data with Watermarking
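A simplified plain-Python sketch of watermarking on windowed counts. It is a toy model and not the engine's actual implementation: the class `WatermarkedCounter`, the 10 second window and the 5 second allowed delay are assumptions made for illustration. The watermark trails the largest event time seen so far by the allowed delay, and an event is dropped once the watermark has passed the end of its window.

```python
from collections import defaultdict

WINDOW = 10   # window length in seconds (hypothetical value)
DELAY = 5     # how late data may arrive, in seconds (hypothetical value)


class WatermarkedCounter:
    def __init__(self):
        self.max_event_time = 0
        self.watermark = 0
        self.counts = defaultdict(int)   # window start -> number of events
        self.dropped = []

    def process_batch(self, event_times):
        for t in event_times:
            window_start = (t // WINDOW) * WINDOW
            if window_start + WINDOW <= self.watermark:
                self.dropped.append(t)           # window already closed: too late
            else:
                self.counts[window_start] += 1   # on time, or late but tolerated
            self.max_event_time = max(self.max_event_time, t)
        # The watermark moves only after the batch, and never backwards.
        self.watermark = max(self.watermark, self.max_event_time - DELAY)


counter = WatermarkedCounter()
counter.process_batch([1, 4, 12])   # watermark becomes 12 - 5 = 7
counter.process_batch([8, 21])      # 8 is late, but window [0, 10) is still open
counter.process_batch([9, 15])      # watermark is 16 now: 9 is dropped, 15 is kept
```

The watermark also bounds the state to keep: once it passes a window's end, that window can be finalized and its state released.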
The drama of Exactly-once processing (Act I)
Spark: give me data
Kafka: you were at the 10th line, there you go with the 11th.
Spark: got it, thanks! Consider line 11 done.
Spark: Hey Postgres, store the results please
OK!
Spark: give me data
Kafka: you were at the 11th line, there you go with the 12th.
...
The drama of Exactly-once processing (Act II)
Spark: give me data
Kafka: you were at the 12th line, there you go with the 13th.
Spark: got it, thanks! Consider line 13 done.
Spark: Hey Postgres, store the re.....
Claudius: Hey Spark, got thirsty? ;)
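One common way out of this drama, sketched in plain Python under stated assumptions. It is not necessarily how the demo or Spark itself does it, and `Sink`, `run` and the `crash_at` parameter are hypothetical names. The idea: the source can be re-read from any offset, and the sink records the last stored offset together with the data, so a restarted job resumes from what was actually stored and a replayed line is never written twice.

```python
class Sink:
    """Stand-in for Postgres: rows and the last stored offset change together."""

    def __init__(self):
        self.rows = []
        self.last_offset = -1

    def store(self, offset, row):
        if offset <= self.last_offset:
            return   # replayed line: already stored, skip it
        # In a real database these two updates would share one transaction.
        self.rows.append(row)
        self.last_offset = offset


def run(source, sink, crash_at=None):
    """Process lines after the sink's last offset; optionally crash on one."""
    offset = sink.last_offset + 1
    while offset < len(source):
        line = source[offset]   # replayable source: any offset can be re-read
        if offset == crash_at:
            raise RuntimeError("Spark got thirsty")
        sink.store(offset, line.upper())
        offset += 1


source = ["a", "b", "c", "d"]
sink = Sink()
try:
    run(source, sink, crash_at=2)   # dies while handling the third line
except RuntimeError:
    pass
run(source, sink)                   # restart: resumes from what the sink holds
```

Because progress is derived from the sink rather than from an acknowledgement sent before the write, a crash between reading and storing neither loses the line nor stores it twice.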
Demo
Summary
• Only use fancy tools if you need them ;)
• Structured Streaming
  • Great Concept
  • Access to core Spark functionalities
  • Probably takes 1-2 years to make it feature-rich