have your cake and eat it too, further dispelling the myths of the lambda architecture
TRANSCRIPT
![Page 1: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/1.jpg)
Have Your Cake & Eat It TooFurther Dispelling the Myths of the Lambda ArchitectureTyler AkidauStaff Software Engineer
![Page 2: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/2.jpg)
MillWheel Streaming
FlumeCloud Dataflow
- Stream Processing System- High-level API- Data Processing Service
![Page 3: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/3.jpg)
Google Cloud Dataflow
Optimize
Schedule
GCS
GCS
![Page 4: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/4.jpg)
- Slava Chernyak, Josh Haberman, Reuven Lax, Daniel Mills, Paul Nordstrom, Sam McVeety, Sam Whittle, and more...
- Robert Bradshaw, Daniel Mills,and more...
- Robert Bradshaw, Craig Chambers, Reuven Lax, Daniel Mills, Frances Perry, and more...
MillWheel
Streaming Flume
Cloud Dataflow
![Page 5: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/5.jpg)
Cloud Dataflow is unreleased.
Things may change.
![Page 6: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/6.jpg)
Lambda vs Streaming
Strong Consistency
Reasoning About Time
1
2
3
Agenda
![Page 7: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/7.jpg)
Lambda vs Streaming1
![Page 8: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/8.jpg)
http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
![Page 9: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/9.jpg)
The Lambda Architecture
![Page 10: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/10.jpg)
http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html
![Page 11: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/11.jpg)
The Evolution of Streaming
![Page 12: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/12.jpg)
Strong Consistency
Tools for Reasoning About Time
What does it take?
![Page 13: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/13.jpg)
Strong Consistency2
![Page 14: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/14.jpg)
Consistent Storage
Storage
![Page 15: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/15.jpg)
• Mostly correct is not good enough• Required for exactly-once processing• Required for repeatable results• Cannot replace batch without it
Why consistency is important
![Page 16: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/16.jpg)
• Sequencers (e.g. BigTable)• Leases (e.g. Spanner)• Federation of storage silos (e.g. Samza,
Dataflow)• RDDs (e.g. Spark)
How?
![Page 17: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/17.jpg)
http://research.google.com/pubs/pub41378.html
![Page 18: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/18.jpg)
Reasoning About Time 3
![Page 19: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/19.jpg)
Event Time vs Stream TimeBatch vs Streaming
ApproachesDataflow API
![Page 20: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/20.jpg)
Event Time - When Events Happened
Stream Time - When Events Are Processed
![Page 21: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/21.jpg)
Batch vs Streaming
![Page 22: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/22.jpg)
MapReduce
Batch
![Page 23: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/23.jpg)
MapReduce
[10:00 - 11:00)
[10:00 - 11:00)
[11:00 - 12:00)[12:00 -
13:00)[13:00 - 14:00)[14:00 -
15:00)[15:00 - 16:00)[16:00 -
17:00)[18:00 - 19:00)[19:00 -
20:00)[21:00 - 22:00)[22:00 -
23:00)[23:00 - 0:00)
Batch: Fixed Windows
![Page 24: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/24.jpg)
MapReduce
[10:00 - 11:00)
[11:00 - 12:00)
Batch: User Sessions
Joan
Larry
Ingo
Amanda
Cheryl
Arthur
[11:00 - 12:00)
[10:00 - 11:00)
![Page 25: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/25.jpg)
Streaming
11:00
10:00
16:00
15:00
14:00
13:00
12:00
![Page 26: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/26.jpg)
UnorderedUnbounded
Of Varying Event Time Skew
Confounding characteristics of data streams
![Page 27: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/27.jpg)
Event Time Skew
Stre
am
Tim
e
Event Time
Skew
![Page 28: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/28.jpg)
Approaches
![Page 29: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/29.jpg)
1.Time-Agnostic Processing2.Approximation3.Stream Time Windowing4.Event Time Windowing
Approaches to reasoning about time
![Page 30: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/30.jpg)
1.Time-Agnostic Processing - Filters
11:00
10:00
16:00
15:00
14:00
13:00
12:00
Stream Time
Example Input:Example Output:
Pros:
Cons:
Web server traffic logsAll traffic from specific domains
StraightforwardEfficientLimited utility
![Page 31: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/31.jpg)
1.Time-Agnostic Processing - Hash Join
11:00
10:00
16:00
15:00
14:00
13:00
12:00
Stream Time
Example Input:Example Output:
Pros:
Cons:
Query & Click traffic Joined stream of Query + Click pairs
StraightforwardEfficientLimited utility
![Page 32: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/32.jpg)
2.Approximation via Online Algorithms
11:00
10:00
16:00
15:00
14:00
13:00
12:00
Stream Time
Example Input:Example Output:
Pros:Cons:
Twitter hashtagsApproximate top N hashtags per prefix
EfficientInexactComplicated Algorithms
![Page 33: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/33.jpg)
11:00
10:00
16:00
15:00
14:00
13:00
12:00
Stream Time
Web server request trafficPer-minute rate of received requestsStraightforwardResults reflect contents of streamResults don’t reflect events as they happenedIf approximating event time, usefulness varies
Example Input:Example Output:
Pros:
Cons:
3.Windowing by Stream Time
![Page 34: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/34.jpg)
11:00
10:00
16:00
15:00
14:00
13:00
12:00
Event Time
Example Input:Example Output:
Pros:Cons:
Twitter hashtagsTop N hashtags by prefix per hour.Reflects events as they occurredMore complicated bufferingCompleteness issues
11:00
10:00
16:00
15:00
14:00
13:00
12:00
Stream Time
4.Windowing by Event Time - Fixed Windows
![Page 35: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/35.jpg)
11:00
10:00
16:00
15:00
14:00
13:00
12:00
Event Time
Example Input:Example Output:
Pros:Cons:
User activity streamPer-session group of activitiesReflects events as they occurredMore complicated bufferingCompleteness issues
11:00
10:00
16:00
15:00
14:00
13:00
12:00
Stream Time
4.Windowing by Event Time - Sessions
![Page 36: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/36.jpg)
Dataflow API
![Page 37: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/37.jpg)
What are you computing?
Where in event time?
When in stream time?
![Page 38: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/38.jpg)
What = Aggregation API
Where = Windowing API
When = Watermarks + Triggers API
![Page 39: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/39.jpg)
Aggregation API
PCollection<KV<String, Double>> sums = Pipeline .begin() .read(“userRequests”) .apply(new Sum());
![Page 40: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/40.jpg)
Aggregation API
2
4
7
0
1
6
33
8
918
9
16Sum
![Page 41: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/41.jpg)
Streaming Mode
10:02 10:0010:06 10:04 Stream Time
24 3 1
63
3 87
0 24
16
3
38
9
0
4
7
0 33
2
10:02 10:0010:06 10:04 Event Time
1
38
9
046
18
7
0 23
432
74 3
6
3
0 33
2
![Page 42: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/42.jpg)
Windowing API
PCollection<KV<String, Long>> sums = Pipeline .begin() .read(“userRequests”) .apply(Window.into(new FixedWindows(2, MINUTE))); .apply(new Sum());
![Page 43: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/43.jpg)
Windowing API
10:02 10:0010:06 10:04 Stream Time
24 3 1
63
3 87
0
10:02 10:0010:06 10:04 Event Time
10:02 10:0010:06 10:04 Event Time
24
16
3
38
9
0
4
7
0 33
2
FixedWindows
Sum
1
38
9
046
18
7
0 23
432
74 3
6
3
0 33
2
13 12616 1835 415
![Page 44: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/44.jpg)
Watermarks
●f(S) -> E●S = a point in stream time (i.e. now)
●E = the point in event time up to which input data is complete as of S
![Page 45: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/45.jpg)
Event Time Skew
Stre
am
Tim
e
Event Time
![Page 46: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/46.jpg)
FixedWindows
Sum
10:02 10:00
13616 1835 415 12
10:06 10:04 Stream Time
24 3 1
63
3 87
0
10:02 10:0010:06 10:04 Event Time
10:01 10:0010:03 10:02 Event Time
24
16
3
38
9
0
4
7
0 33
2
1
38
9
046
18
7
0 23
432
74 3
6
3
0 33
2
Watermarks
![Page 47: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/47.jpg)
Watermark Caveats
Too slow = more latency
Too fast = late data
![Page 48: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/48.jpg)
Triggers
When in stream time to emit?
![Page 49: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/49.jpg)
Triggers API
PCollection<KV<String, Long>> sums = Pipeline .begin() .read(“userRequests”) .apply(Window.into(new FixedWindows(2, MINUTES)) .trigger(new AtWatermark()); .apply(new Sum());
![Page 50: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/50.jpg)
2
3
1
8
4
8
7
6 3
1313
Event Time10:05 10:0610:0110:00
10:0310:00
10:06
2
12 5
52020
99
10:0110:02
10:0510:04
10:02 10:03 10:04
5Late datum
![Page 51: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/51.jpg)
A Better Strategy
1.Once per stream time minute
2.At watermark3.Once per record for two weeks
![Page 52: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/52.jpg)
1325 5
2
3
1
8
4
8
7
6 3
5
Event Time10:05 10:0610:0110:00
10:0310:00
10:06
2
12 5
52020
99
10:0110:02
10:0510:04
10:02 10:03 10:04
12
20
1320
5
1320
13
1325
2
12
13
9
20 520
25Late datum
91325
![Page 53: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/53.jpg)
Triggers APIPCollection<KV<String, Long>> sums = Pipeline .begin() .read(“userRequests”) .apply(Window.into(new FixedWindows(2, MINUTE)) .trigger(new SequenceOf( new RepeatUntil( new AtPeriod(1, MINUTE), new AtWatermark()), new AtWatermark(), new RepeatUntil( new AfterCount(1), new AfterDelay( 14, DAYS, TimeDomain.EVENT_TIME)))); .apply(new Sum());
![Page 54: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/54.jpg)
Lambda vs Streaming
Low-latency, approximate resultsComplete, correct results as soon as possible
Ability to deal with changes upstream
![Page 55: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/55.jpg)
One Last Thing...
What if I want sessions?
![Page 56: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/56.jpg)
Triggers APIPCollection<KV<String, Long>> sums = Pipeline .begin() .read(“userRequests”) .apply(Window.into(new Sessions(1, MINUTE)) .trigger(new SequenceOf( new RepeatUntil( new AtPeriod(1, MINUTE), new AtWatermark()), new AtWatermark(), new RepeatUntil( new AfterCount(1), new AfterDelay( 14, DAYS, TimeDomain.EVENT_TIME)))); .apply(new Sum());
![Page 57: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/57.jpg)
2
8
4
8
7
6 3
Event Time10:05 10:0610:0110:00
10:0310:00
10:0610:01
10:0210:05
10:04
10:02 10:03 10:04
22
3
1
9
1 minute2 7
1 minute2
2 7
1 minute
9 39 3
39 1 48
89 7 2525
25
5Late datum
25 83333
33
335 3838
6 3938 9
20 5
13
2
12
25
9
![Page 58: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/58.jpg)
Summary
Lambda is greatStreaming by itself is better :-)
Strong Consistency = CorrectnessStreaming = Aggregation + Windowing +
TriggersTools For Reasoning About Time = Power +
Flexibility
![Page 59: Have your cake and eat it too, further dispelling the myths of the lambda architecture](https://reader036.vdocuments.mx/reader036/viewer/2022070601/588b112e1a28abdf3b8b6f51/html5/thumbnails/59.jpg)
Thank you!
Questions?Questions about this talk:
Questions about Cloud Dataflow:
[email protected] (Tyler Akidau)[email protected] (Eric Schmidt)