Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview
DESCRIPTION
This talk, given at Devoxx Paris 2014, gives an overview of the lambda architecture and of possible alternatives for its implementation.
TRANSCRIPT
![Page 1: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/1.jpg)
@fdouetteau #lambdataiku
Lambda Architecture
Florian Douetteau (@fdouetteau), CEO, Dataiku
www.dataiku.com
![Page 2: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/2.jpg)
Topics For Today
• WHAT is a lambda architecture?
  • Examples, principle
  • Motivation, hard points
• HOW do you build a lambda architecture?
  • Component by component
![Page 3: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/3.jpg)
Lambda
[Diagram: EVENTS, PROCESS, STATE, SERVE]
![Page 4: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/4.jpg)
ƛ : SOME USE CASES
• Online Advertising
  • Keep track of the number of displays / clicks per position / campaign
• Recommender Systems
  • Keep track of product displays / views / clicks / buys
• Statistical Time Line
  • Keep track of the number of tweets per hashtag / hour
![Page 5: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/5.jpg)
SQL WAY
[Diagram: EVENTS, PROCESS, STATE, SERVE; labels: RDBMS, SQL]
USER1 ITEM1 VIEW
USER1 ITEM2 BUY
INSERT OR UPDATE VIEWS SET pageviews = pageviews + 1 WHERE user=USER1 …
![Page 6: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/6.jpg)
Functional Programming Append Only
[Diagram: EVENTS, PROCESS, STATE (APPEND ONLY), SERVE]
newstate = Fagg(oldstate, Fstore(events))
result = F(state, lastevents, scope)
![Page 7: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/7.jpg)
E.g. counting twitter hashtags
[Diagram: EVENTS, PROCESS, STATE, SERVE]
Fmap(events) = { (#tag, time) -> count }
Freduce(hashmap, hashmap) = merge the counts of both maps
Fdisplay(hashmap, events) = Freduce(hashmap, Fmap(events))
TWEET COUNTS
(2014-02-31 13, #foo) -> 3
(2014-02-31 13, #foo) -> 3
(2014-02-31 13, #foo) -> 3
(2014-02-31 13, #foo) -> 3
NEW TWEETS TABLE
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
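The three functions on this slide can be sketched in Java as follows. This is only an illustration, not the speaker's code: the class name `HashtagCounts`, the `"hour #tag"` string key, and the assumed tweet layout (`yyyy-MM-dd HH:mm` followed by the text) are choices made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and key format are assumptions, not from the talk.
final class HashtagCounts {

    // Fmap: turn raw tweets ("yyyy-MM-dd HH:mm text...") into {(hour, #tag) -> count}.
    static Map<String, Long> fmap(List<String> tweets) {
        Map<String, Long> counts = new HashMap<>();
        for (String tweet : tweets) {
            String hour = tweet.substring(0, 13);            // "yyyy-MM-dd HH"
            String text = tweet.substring(16).trim();        // everything after the timestamp
            for (String word : text.split("\\s+")) {
                if (word.startsWith("#")) {
                    counts.merge(hour + " " + word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    // Freduce: merge two count maps by summing counts key by key (inputs are not modified).
    static Map<String, Long> freduce(Map<String, Long> left, Map<String, Long> right) {
        Map<String, Long> merged = new HashMap<>(left);
        right.forEach((key, count) -> merged.merge(key, count, Long::sum));
        return merged;
    }

    // Fdisplay: stored counts plus the counts of events not yet folded into the store.
    static Map<String, Long> fdisplay(Map<String, Long> stored, List<String> newTweets) {
        return freduce(stored, fmap(newTweets));
    }
}
```

Because `freduce` is associative and leaves its inputs untouched, the stored map can stay append-only while the displayed result is recomputed on demand.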
![Page 8: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/8.jpg)
E.g. counting twitter hashtags in “SQL”
[Diagram: EVENTS, SERVE]
TWEET COUNTS TABLE
(2014-02-31 13, #foo) -> 8
(2014-02-31 13, #foo2) -> 3
(2014-02-31 13, #foo3) -> 3
(2014-02-31 13, #foo4) -> 1
NEW TWEETS TABLE
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
PARTIAL TWEET COUNT TABLE
(2014-02-31 13, #foo) -> 1
(2014-02-31 14, #foo) -> 3
(2014-02-31 14, #foo) -> 3
(2014-02-31 14, #foo) ->
NEW TWEET COUNT TABLE
(2014-02-31 13, #foo) -> 9
(2014-02-31 13, #foo) -> 3
(2014-02-31 13, #foo) -> 3
(2014-02-31 13, #foo) -> 3
CREATE … AS SELECT time, tag, COUNT(*) GROUP BY time, tag
CREATE AS SELECT time, tag, SUM(counts) FROM (oldtable … UNION partialtable) GROUP BY time, tag
SELECT time, tag, SUM(c) FROM (SELECT time, tag, c FROM oldtable WHERE tag = … UNION SELECT time, tag, c FROM partialtable WHERE tag = …)
INSERT VALUES …
RENAME TABLE …
EXECUTE EVERY 5 MINUTES
EXECUTE EVERY HOUR
![Page 9: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/9.jpg)
ƛ : PRINCIPLE
[Diagram: EVENTS, BATCH PROC, BATCH VIEW, REAL-TIME PROC, REAL-TIME RESULT, FEDERATION]
![Page 10: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/10.jpg)
BackType Story
Capture events and logs from Twitter
25 TB of binary data
100 billion records
400 QPS on average
Scale 1 -> 150 on peak
Took off in 2008 with seed funding and a team of 3 engineers: Christopher Golda, Michael Montano, Nathan Marz
Acquired by Twitter in 2011 (powers Twitter trends …)
Cascalog, Storm, ElephantDB
![Page 11: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/11.jpg)
TWITTER HASHTAGS
[Diagram: BATCH PROC, BATCH VIEW, REAL-TIME PROC, REAL-TIME RESULT, FEDERATION]
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
(2014-02-31 13, #foo) -> 3
(2014-02-31 13, #foo) -> 3
COMPUTE EVERY 5 MINUTES HASHTAG COUNTS FOR THE LAST 5 MINUTES (IN MEMORY)
COMPUTE EVERY HOUR HASHTAG COUNT FOR THE LAST HOUR (ON DISK)
![Page 12: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/12.jpg)
RECOMMENDER SYSTEM
[Diagram: BATCH PROC, BATCH VIEW, REAL-TIME PROC, REAL-TIME RESULT, FEDERATION]
USER1 ITEM1 VIEW
USER1 ITEM2 BUY
USER1 ITEM1 VIEW
USER1 ITEM1 VIEW
ITEM-ITEM SIMILARITY MATRIX
USER -> [ ITEM1, … ITEMn]
RECOMMENDATION
![Page 13: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/13.jpg)
THREE KEY DRIVERS FOR LAMBDA ARCH
![Page 14: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/14.jpg)
DRIVER 1: Support Smooth Evolution
[Diagram: BATCH PROC, BATCH VIEW, REAL-TIME PROC, REAL-TIME RESULT, FEDERATION]
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
(2014-02-31 13:14, #foo) -> 3
(2014-02-31 13:14, #foo) -> 3
(2014-02-31 13, #foo) -> 3
(1) RECOMPUTE THE NEW VERSION ON BATCH WHILE KEEPING THE OLD ONE
(2) THEN UPDATE THE ONLINE VERSION
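One way to picture these two steps is an atomic reference swap: the new batch view is built on the side while readers keep using the old one, and publishing it is a single reference update. The sketch below assumes an in-memory map as the view and a hypothetical `VersionedView` class; a real deployment would more likely swap tables or files in the batch store.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Illustrative sketch: VersionedView is a hypothetical name, not part of any framework.
final class VersionedView {
    private final AtomicReference<Map<String, Long>> current;

    VersionedView(Map<String, Long> initial) {
        current = new AtomicReference<>(Collections.unmodifiableMap(new HashMap<>(initial)));
    }

    // Readers always see a complete version, old or new, never a half-built one.
    long count(String key) {
        return current.get().getOrDefault(key, 0L);
    }

    // (1) Recompute the new version while the old one keeps serving,
    // (2) then publish it with a single atomic swap.
    void recompute(Supplier<Map<String, Long>> batchJob) {
        Map<String, Long> next = Collections.unmodifiableMap(new HashMap<>(batchJob.get()));
        current.set(next);
    }
}
```

The old version stays reachable until the swap, so a failed or slow batch run never leaves readers without a view.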
![Page 15: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/15.jpg)
DRIVER 2: Real-Time System Offline
[Diagram: BATCH PROC, BATCH VIEW, REAL-TIME PROC, REAL-TIME RESULT, FEDERATION]
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
2014-02-31 13:14 #foo bar
(2014-02-31 13, #foo) -> 3
(2014-02-31 13, #foo) -> 3
COMPUTE EVERY HOUR HASHTAG COUNT FOR THE LAST HOUR (ON DISK)
FALL BACK TO PARTIAL RESULT WHEN THE REAL-TIME GRID IS OFFLINE
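The fallback can be sketched as a federation function that adds the real-time counts when they are available and otherwise serves the batch view alone. `FederatedCounts` and the use of `Optional` to signal an offline real-time layer are assumptions made for this example, not something shown in the talk.

```java
import java.util.Map;
import java.util.Optional;

// Illustrative sketch: the class name and the Optional-based signalling are assumptions.
final class FederatedCounts {

    // batchView: counts from the last batch run.
    // realTimeView: counts since that run, or empty when the real-time layer is offline.
    static long count(String key,
                      Map<String, Long> batchView,
                      Optional<Map<String, Long>> realTimeView) {
        long batch = batchView.getOrDefault(key, 0L);
        long recent = realTimeView.map(view -> view.getOrDefault(key, 0L)).orElse(0L);
        return batch + recent;   // partial (batch-only) result when real time is offline
    }
}
```

The result is then stale by at most one batch interval, which is the "partial result" the slide refers to.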
![Page 16: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/16.jpg)
DRIVER 3 : CAN’T RECOMPUTE
[Diagram: BATCH PROC, BATCH VIEW, REAL-TIME PROC, REAL-TIME RESULT, FEDERATION]
USER1 ITEM1 VIEW
USER1 ITEM2 BUY
USER1 ITEM1 VIEW
USER1 ITEM1 VIEW
ITEM-ITEM SIMILARITY MATRIX
USER -> [ ITEM1, … ITEMn]
RECOMMENDATION
![Page 17: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/17.jpg)
PAIN POINTS
![Page 18: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/18.jpg)
PAIN POINT 1 : EXACTLY ONCE
2014-02-31 13:14 #foo bar
2014-02-31 13:15 toto
2014-02-31 13:15 tutu
2014-02-31 13:16 #two
…
…
Retry
![Page 19: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/19.jpg)
PAIN POINT 2 : DYNAMIC SCALE
START AT 100 events per second
HOW TO GROW TO 10k events per second without rebuilding everything?
![Page 20: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/20.jpg)
PAIN POINT 3 : SCHEMA CHANGE
[Diagram: EVENTS V1, EVENTS V2, BATCH PROC, BATCH VIEW, REAL-TIME PROC, REAL-TIME RESULT, FEDERATION]
MIX OF VERSION 1 AND VERSION 2 !!!!
![Page 21: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/21.jpg)
TOOLS AND FRAMEWORK
![Page 22: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/22.jpg)
Lambda Architecture Building Blocks
[Diagram: Message Queue, Batch Pump, Batch State, Batch Processing, Batch Views, Real-Time Processing, Real-Time State, Real-Time Views, Service, Federated View]
![Page 23: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/23.jpg)
Components
[Building blocks diagram, labeled with example tools: RABBITMQ, FLUME, STORM, HDFS, MapRed, HBASE, MEMCACHE, MONGODB, WEBAPP]
![Page 24: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/24.jpg)
Components
[Building blocks diagram]
![Page 25: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/25.jpg)
Message Queues
Kestrel (single node), Kafka (LinkedIn, distributed): micro-batch, state in processor, persistent
RabbitMQ, ActiveMQ: event, state in queue, rich routing
![Page 26: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/26.jpg)
TOPOLOGY : SINGLE PIPE
[Building blocks diagram, labeled: STORM, STORM]
![Page 27: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/27.jpg)
Storm
Developed in 2008-2009 at BackType
First open source release in 2011
[Diagram: SPOUT, BOLT, TUPLE streams]
![Page 28: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/28.jpg)
Topologies
[Diagram: two SPOUTs and four BOLTs]
This one is likely to write in a State
This one too
![Page 29: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/29.jpg)
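import java.util.*;
import backtype.storm.task.*;
import backtype.storm.topology.*;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.*;
// Packages as in Storm 0.9; Storm 1.0+ uses org.apache.storm instead.
public class HashTagParseBolt extends BaseRichBolt {
    private OutputCollector _collector;
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }
    // Emits one (time, hashtag) tuple per hashtag; assumes the "hashtags" field holds a List<String>.
    public void execute(Tuple tweet) {
        for (String hashtag : (List<String>) tweet.getValueByField("hashtags")) {
            _collector.emit(new Values(tweet.getValueByField("time"), hashtag));
        }
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("time", "hashtag"));
    }
}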
Parse Tweet Bolt
![Page 30: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/30.jpg)
Topologies
[Diagram: TweetSpout, ParseTweetBolt, Count HashTags Bolt, Store in Flat File; tuple: Tweet]
![Page 31: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/31.jpg)
BALANCING
[Diagram: CLUSTER, NODE, PROCESS (ONE PER TOPOLOGY), EXECUTOR (PER SPOUT OR BOLT), TASK; REBALANCE]
![Page 32: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/32.jpg)
(Optional) RELIABILITY
• When emitting a tuple from an existing tuple, trace its origin
• “Ack” or “Fail” each tuple
• If a tuple or its dependent tuples are not fully “acked”: REPLAY
![Page 33: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/33.jpg)
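import java.util.*;
import backtype.storm.task.*;
import backtype.storm.topology.*;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.*;
// Packages as in Storm 0.9; Storm 1.0+ uses org.apache.storm instead.
public class HashTagParseBolt extends BaseRichBolt {
    private OutputCollector _collector;
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector;
    }
    public void execute(Tuple tweet) {
        for (String hashtag : (List<String>) tweet.getValueByField("hashtags")) {
            // Anchor each emitted tuple to the input tweet so that failures trigger a replay.
            _collector.emit(tweet, new Values(tweet.getValueByField("time"), hashtag));
        }
        // Acknowledge the input once all derived tuples have been emitted.
        _collector.ack(tweet);
    }
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("time", "hashtag"));
    }
}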
Reliable Parse Tweet
![Page 34: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/34.jpg)
TOPOLOGY 2 : SHARE RT
[Building blocks diagram, labeled: TRIDENT, TRIDENT, TRIDENT]
![Page 35: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/35.jpg)
TRIDENT
• Higher Level Operations
• Use Storm as an RPC Framework
• State “Management”
![Page 36: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/36.jpg)
From Schema To Storm Topology
![Page 37: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/37.jpg)
How is exactly-once implemented?
Batch of events:
{user=paul, item=car, event=imp}
{user=pierre, item=car, event=imp}
{user=1, item=car, event=imp}
Batch of events:
{user=paul, item=car, event=imp}
{user=pierre, item=car, event=imp}
{user=pierre, item=car, event=imp}
…
Each batch carries a transaction id: txid=1, txid=2, txid=3
![Page 38: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/38.jpg)
Exactly-Once in state
State before:
paul -> { car: 2, txid=2 }
pierre -> { car: 5, txid=3 }
Incoming batch, txid=3:
{user=paul, item=car, event=imp}
{user=pierre, item=car, event=imp}
{user=pierre, item=car, event=imp}
State after:
paul -> { car: 3, txid=3 }
pierre -> { car: 5, txid=3 }
Keep track of the last transaction in the state
A transaction is not applied to parts of the state that are already at that txid or newer
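The txid check on this slide can be sketched as follows. This is a simplified illustration and not Trident's actual implementation: `TxCounterState` and its methods are hypothetical names, and the sketch assumes that batches are applied in increasing txid order and that a replayed batch contains the same events as the original.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of txid-guarded updates; not Trident's actual implementation.
final class TxCounterState {

    // One counter per user, stored together with the id of the last batch applied to it.
    private static final class Entry {
        final long count;
        final long txid;
        Entry(long count, long txid) { this.count = count; this.txid = txid; }
    }

    private final Map<String, Entry> state = new HashMap<>();

    void put(String user, long count, long txid) {
        state.put(user, new Entry(count, txid));
    }

    long count(String user) {
        Entry entry = state.get(user);
        return entry == null ? 0L : entry.count;
    }

    // Apply a batch of impression events (one user name per event).
    // Replaying a batch that was already applied leaves the state unchanged.
    void applyBatch(long txid, List<String> users) {
        // First aggregate the batch per user, so each key is updated once per txid.
        Map<String, Long> increments = new HashMap<>();
        for (String user : users) {
            increments.merge(user, 1L, Long::sum);
        }
        increments.forEach((user, delta) -> {
            Entry old = state.getOrDefault(user, new Entry(0L, Long.MIN_VALUE));
            if (old.txid < txid) {   // skip keys that already include this batch
                state.put(user, new Entry(old.count + delta, txid));
            }
        });
    }
}
```

With the slide's data (paul at 2 with txid=2, pierre at 5 with txid=3), applying the txid=3 batch moves paul to 3 and leaves pierre at 5, and replaying the same batch changes nothing.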
![Page 39: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/39.jpg)
TOPOLOGY 1 : SHARE STATE
[Building blocks diagram]
USE A SINGLE NOSQL SERVICE FOR ALL USE CASES
![Page 40: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/40.jpg)
REDIS VARIANT
[Building blocks diagram, labeled: REDIS, REDIS, REDIS, REDIS]
ALSO USE THE NOSQL AS A MESSAGE QUEUE
![Page 41: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/41.jpg)
TOPOLOGY 3 : SHARED PROCESSING
[Building blocks diagram]
![Page 42: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/42.jpg)
SummingBird
A single Scala specification that can run in “Batch” or “Real-Time” mode
[Diagram: Single Scala Code, Run on Storm Topology, Run on Cascading (Batch)]
![Page 43: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/43.jpg)
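// Assumes Summingbird and twitter4j on the classpath.
import com.twitter.summingbird._
import com.twitter.summingbird.batch.Batcher
import twitter4j.Status
object TweetHashTagCount {
  // Event time of a tweet, and hourly batches.
  implicit val timeOf: TimeExtractor[Status] = TimeExtractor(_.getCreatedAt.getTime)
  implicit val batcher = Batcher.ofHours(1)
  ….
  // One job definition; the platform P (Storm or batch) is chosen by the caller.
  def hashTagCount[P <: Platform[P]](
      source: Producer[P, Status],
      store: P#Store[String, Long]) =
    source
      .filter(_.getText != null)
      // twitter4j exposes hashtags as getHashtagEntities (the slide abbreviates this as getHashTags).
      .flatMap { tweet: Status => tweet.getHashtagEntities.map(_.getText -> 1L) }
      .sumByKey(store)
}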
Tweet SummingBird
![Page 44: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/44.jpg)
Putting this together
[Diagram: layered stack with a batch side and a real-time side]
COMMON ABSTRACTION: SUMMING BIRD
SQL Level Abstraction: CASCADING, TRIDENT (STATE, RPC)
Distributed Batch Computation: MAP REDUCE
Distributed RT Computation: STORM
BATCH STORES (HDFS …)
RT STORES (NoSQL, etc.)
![Page 45: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/45.jpg)
WEB-SCALE VARIANT
[Building blocks diagram, labeled: Insert in Mongo, Insert in Mongo, Mongo MapReduce, Mongo Collection, Mongo, Mongo Aggregation]
![Page 46: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/46.jpg)
HADOOPY VARIANT
[Building blocks diagram, labeled: INSERT IN HBASE, HIVE/MAP REDUCE, HBASE, HBASE, HBASE Queries]
![Page 47: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/47.jpg)
Integrated Publish
[Building blocks diagram]
![Page 48: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/48.jpg)
SploutSQL
![Page 49: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/49.jpg)
SPARK VARIANT
[Building blocks diagram, labeled: SPARK STREAMING, HDFS, SPARK, MEMORY]
![Page 50: Lambda Architecture - Storm, Trident, SummingBird ... - Architecture and Overview](https://reader037.vdocuments.mx/reader037/viewer/2022110306/554f641eb4c905c8088b4c3b/html5/thumbnails/50.jpg)
QUESTIONS
[Building blocks diagram, relabeled: QUESTION QUEUE, MY MEMORY, ANSWER, ANSWER TO, AUDIENCE HAPPY]
florian.douetteau@dataiku.com