RTB and Big Data: Where Erlang and Hadoop Meet
Agenda
• What is RTB in the context of online advertising?
• RTB exchange architecture
• Data handling with Hadoop
What is RTB?
Online Advertising
• Free content on the WWW, paid for by advertising
• Upfront agreements between publishers and advertisers
• Placement with the advertiser's banner
What is Real-Time Bidding?
• The buying and selling of impressions in real time, while a page is loading
Exoclick, www.exoclick.com (2014)
Diagram: publishers (better value for inventory) connect through an SSP; advertisers (better audience) connect through a DSP.
Adserving Workflow
Diagram components: SSP, Web Server, AdServer, RTB, DSP 1, DSP 2, DSP 3 … DSP N, Advertiser CDN.
RTB Exchange Architecture
Diagram: AdServer and RTB Exchange, connected via ZMQ.
Diagram: several AdServers connected to one RTB Exchange.
Diagram: several AdServers connected to several RTB Exchanges.
Diagram: AdServer and RTB Exchange. Inside the exchange are ETS tables, a ZMQ endpoint and the auction.
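The deck does not say which auction rule the exchange applies. The sketch below assumes a second-price rule with a floor price, which is a common RTB choice but is not confirmed here. Under that assumption, the auction step over the DSP replies might look like this. `Bid`, `AuctionResult`, `run_auction` and the `floor` parameter are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Bid:
    dsp: str      # hypothetical DSP identifier
    price: float  # bid price, e.g. a CPM

@dataclass(frozen=True)
class AuctionResult:
    winner: str
    clearing_price: float

def run_auction(bids: list[Bid], floor: float) -> Optional[AuctionResult]:
    """Second-price auction with a floor (an assumed rule, not taken from the deck)."""
    # Drop bids below the floor, then rank the rest from highest to lowest.
    # sorted() is stable, so on a tie the earlier bid wins.
    eligible = sorted((b for b in bids if b.price >= floor),
                      key=lambda b: b.price, reverse=True)
    if not eligible:
        return None  # no DSP bid at or above the floor
    # The winner pays the runner-up's price, or the floor if it was the only eligible bid.
    clearing = eligible[1].price if len(eligible) > 1 else floor
    return AuctionResult(eligible[0].dsp, clearing)

print(run_auction([Bid("dsp1", 1.2), Bid("dsp2", 2.5), Bid("dsp3", 0.4)], floor=0.5))
# AuctionResult(winner='dsp2', clearing_price=1.2)
```

With a floor of 0.5, the 0.4 bid never enters the auction. dsp2 wins and pays the runner-up's 1.2.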
Retrieving Campaign Data
Diagram: a Config Service and a MySQL Cluster supplying campaign data as JSON to the RTB nodes (four shown).
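The slide shows only that campaign data reaches the RTB nodes as JSON. The architecture slides show ETS tables inside the exchange, presumably serving as a local lookup cache. Below is a minimal sketch of the loading step under these assumptions:

- The payload is a hypothetical JSON array of campaign objects, each with an `id` field.
- `floor_cpm` is an invented field used only for the example.
- A Python dict stands in for an ETS table of type `set`.

```python
import json
from typing import Optional

def load_campaigns(payload: str) -> dict[int, dict]:
    """Index a hypothetical JSON campaign list by campaign id.

    Like an ETS table of type set, the table holds one entry per key,
    and a later campaign with the same id replaces the earlier one.
    """
    table: dict[int, dict] = {}
    for campaign in json.loads(payload):
        table[campaign["id"]] = campaign
    return table

def lookup(table: dict[int, dict], campaign_id: int) -> Optional[dict]:
    """Average constant-time lookup on the bidding path; None if the id is unknown."""
    return table.get(campaign_id)

campaigns = load_campaigns('[{"id": 7, "floor_cpm": 0.5}, {"id": 9, "floor_cpm": 1.1}]')
print(lookup(campaigns, 9))  # {'id': 9, 'floor_cpm': 1.1}
```

Keeping a local copy means each bid request reads campaign data from memory rather than querying MySQL.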
Diagram: the complete RTB Exchange architecture. Components shown: AdServer; RTB Exchange (ETS tables, ZMQ, auction, DSP processes); DSP 1, DSP 2, DSP 3 … DSP N; Couchbase; erlmc; lhttpc; Web UI; SNMP; folsomite; Creative Scanning.
How Much Does It Scale?
60 billion: the number of bid requests built and sent to DSPs every day
13,000: the number of bid requests built and sent to DSPs every second, per host, at peak
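The two figures can be related with a quick back-of-the-envelope calculation. This is our arithmetic, not a number from the deck.

- Spreading 60 billion requests evenly over a day gives the average request rate.
- Dividing that rate by the 13,000/s per-host peak gives how many hosts would be needed if every host ran at that peak all day.

Real traffic is not flat, so treat the result as a rough floor on host count.

```python
import math

BID_REQUESTS_PER_DAY = 60_000_000_000  # from the slide
PEAK_PER_HOST_PER_SEC = 13_000         # from the slide
SECONDS_PER_DAY = 24 * 60 * 60

# Average rate if the daily volume were spread evenly over 24 hours.
avg_per_sec = BID_REQUESTS_PER_DAY / SECONDS_PER_DAY
# Hosts needed if each one sustained the per-host peak rate.
host_equivalents = math.ceil(avg_per_sec / PEAK_PER_HOST_PER_SEC)

print(f"{avg_per_sec:,.0f} requests/s on average")           # 694,444 requests/s on average
print(f"{host_equivalents} hosts at the per-host peak rate")  # 54 hosts at the per-host peak rate
```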
“Linear” Scaling
A Single Request
Diagram: the Erlang processes involved in one request: thrift_req_handler, auction processes, DSP processes, httpc_client process and zmq_message_handler (further instances elided with "...").
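In the exchange, each of these roles is an Erlang process. The following is a rough Python asyncio analogue of the fan-out step, not the actual Erlang code:

- It starts one task per DSP.
- It waits until a fixed deadline.
- It drops any reply that arrives late.

The DSP call is simulated with `asyncio.sleep`. The function names, latencies and deadline are hypothetical.

```python
import asyncio

async def ask_dsp(name: str, latency: float, price: float) -> tuple[str, float]:
    """Stand-in for one DSP process sending a bid request (HTTP simulated by sleep)."""
    await asyncio.sleep(latency)
    return name, price

async def collect_bids(dsps: list[tuple[str, float, float]], deadline: float) -> dict[str, float]:
    """Send the bid request to every DSP concurrently; keep only replies that beat the deadline."""
    tasks = [asyncio.create_task(ask_dsp(*dsp)) for dsp in dsps]
    done, pending = await asyncio.wait(tasks, timeout=deadline)
    for task in pending:
        task.cancel()  # late DSPs are dropped; the auction does not wait for them
    return dict(task.result() for task in done)

if __name__ == "__main__":
    dsps = [("dsp1", 0.01, 1.2), ("dsp2", 0.02, 2.5), ("dsp3", 0.5, 3.0)]
    print(asyncio.run(collect_bids(dsps, deadline=0.1)))
```

Dropping late replies keeps the latency of each request bounded, which matters because the bidding happens while the page loads. Here dsp3's higher bid is lost because it misses the deadline.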
github.com/aol/erlgraph
Data Handling: Size Matters!
What is Big Data?
• Columnar datastore
• Parallelization
• Linear scaling
• K-safety
• Eventual consistency
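K-safety, in Vertica's terminology, means the cluster can lose up to K nodes without losing data. Below is a minimal sketch of the idea. It assumes a simple hypothetical layout that stores each data segment on K+1 consecutive nodes of a ring; Vertica's actual projection layout is more involved. The checker tries every combination of K failed nodes.

```python
from itertools import combinations

def place_segments(num_segments: int, nodes: list[str], k: int) -> dict[int, list[str]]:
    """Store each segment on k+1 consecutive nodes of a ring (hypothetical layout)."""
    if k + 1 > len(nodes):
        raise ValueError("need at least k+1 nodes")
    return {
        seg: [nodes[(seg + r) % len(nodes)] for r in range(k + 1)]
        for seg in range(num_segments)
    }

def survives(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every segment still has at least one replica on a live node."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())

def is_k_safe(placement: dict[int, list[str]], nodes: list[str], k: int) -> bool:
    """Check every possible set of k simultaneous node failures."""
    return all(survives(placement, set(failed)) for failed in combinations(nodes, k))

nodes = ["node1", "node2", "node3", "node4"]
placement = place_segments(8, nodes, k=1)
print(is_k_safe(placement, nodes, 1), is_k_safe(placement, nodes, 2))  # True False
```

With one extra copy per segment, the cluster survives any single node failure. It does not survive two adjacent failures, because segment 0 lives only on node1 and node2.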
Some Metrics?
2 x 100: count of Hadoop nodes
2 TB/h: data processed
Printed on paper, this would equal 500 million pages, a 50 km tall pile.
20 TB: data stored in the Vertica cluster
As a paper pile, it would reach from Berlin to Stuttgart.
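The two paper comparisons are consistent with each other. Working backwards from the 2 TB/h slide gives about 4 KB per printed page and 0.1 mm per sheet. This is our arithmetic, assuming decimal terabytes. Applying the same ratios to 20 TB gives a pile of about 500 km, roughly the distance from Berlin to Stuttgart.

```python
TB = 10**12  # decimal terabyte (assumption)

# Ratios implied by the 2 TB/h slide: 500 million pages, a 50 km pile.
bytes_per_page = 2 * TB / 500_000_000     # 4,000 bytes of data per printed page
sheet_thickness_m = 50_000 / 500_000_000  # 0.0001 m, i.e. 0.1 mm per sheet

# Apply the same ratios to the 20 TB stored in Vertica.
pages_20tb = 20 * TB / bytes_per_page     # 5 billion pages
pile_km = pages_20tb * sheet_thickness_m / 1000

print(f"{bytes_per_page:.0f} B/page, {sheet_thickness_m * 1000:.1f} mm/sheet, {pile_km:.0f} km")
# 4000 B/page, 0.1 mm/sheet, 500 km
```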
Data Handling: Logging
Diagram: the adserving workflow components (SSP, Web Server, AdServer, RTB, DSP 1 … DSP N, Advertiser CDN) plus a LogCollector, connected over TCP.
LogCollector
• Written in Java
• Based on Netty
• TCP input, file output
• Optimized to maximize TCP throughput
  • Other solutions suffered under finer network control
• Circular ring buffers
• Zero copy
• Gets binary payload + metadata for control flow
Image source: Wikipedia
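The LogCollector itself is Java on Netty. The Python sketch below illustrates only the circular ring buffer idea; zero copy is not shown. The class name is hypothetical. Refusing writes when the buffer is full is an assumed design choice: it lets the producer apply backpressure instead of overwriting unread log data.

```python
from typing import Optional

class RingBuffer:
    """Fixed-capacity circular FIFO of log records (illustration only)."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: list[Optional[bytes]] = [None] * capacity  # allocated once
        self._head = 0  # index of the oldest unread record
        self._size = 0  # number of unread records

    def put(self, record: bytes) -> bool:
        """Append a record; return False when full so the producer can back off."""
        if self._size == len(self._slots):
            return False
        tail = (self._head + self._size) % len(self._slots)  # wraps around the end
        self._slots[tail] = record
        self._size += 1
        return True

    def get(self) -> Optional[bytes]:
        """Remove and return the oldest record, or None if the buffer is empty."""
        if self._size == 0:
            return None
        record = self._slots[self._head]
        self._slots[self._head] = None  # release the slot's reference
        self._head = (self._head + 1) % len(self._slots)
        self._size -= 1
        return record
```

The slot array is never resized; the read and write positions simply wrap around it.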
Avro Log Format
• Binary format
• Structural data support
  • Arrays, trees etc.
• Compression
• Self-descriptive
  • JSON schema header
• Well supported in Hadoop
Schema example (header part):
{
  "name": "DataAvroPacket",
  "fields": [
    {
      "name": "SGSHeader",
      "type": {
        "name": "SGSHeader",
        "fields": [
          { "name": "VersionID", "type": "int" }
        ],
        "type": "record"
      }
    },...
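Part of Avro's compactness comes from how it encodes numbers. Per the Avro specification, `int` and `long` values are written as zig-zag encoded variable-length integers, so small values of either sign take a single byte. Below is a minimal Python sketch of that encoding for `long`. The function names are ours, not an Avro library's API.

```python
def encode_long(n: int) -> bytes:
    """Encode a signed 64-bit integer as Avro writes long: zig-zag, then base-128 varint."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("value out of range for an Avro long")
    z = (n << 1) ^ (n >> 63)  # zig-zag: 0->0, -1->1, 1->2, -2->3, ...
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, high bit = "more bytes follow"
        z >>= 7
    out.append(z)
    return bytes(out)

def decode_long(data: bytes) -> int:
    """Inverse of encode_long: read 7-bit groups until a byte without the high bit."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return (z >> 1) ^ -(z & 1)  # undo zig-zag

for value in (0, -1, 1, -64, 64):
    print(value, encode_long(value).hex())  # 00, 01, 02, 7f, 8001
```

For example, 64 encodes as the two bytes 0x80 0x01, matching the example table in the Avro specification.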
Partners in Crime: Erlang and MapReduce
MapReduce
"MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster" (Wikipedia)
MapReduce Example
Input: Dino Sky Red Bike Bike Red Dino Bike Sky
Splitting: [Dino Sky Red] [Bike Bike Red] [Dino Bike Sky]
Mapping: [Dino, 1 · Sky, 1 · Red, 1] [Bike, 1 · Bike, 1 · Red, 1] [Dino, 1 · Bike, 1 · Sky, 1]
Shuffling: [Sky, 1 · Sky, 1] [Bike, 1 · Bike, 1 · Bike, 1] [Dino, 1 · Dino, 1] [Red, 1 · Red, 1]
Reduce: [Sky, 2] [Bike, 3] [Dino, 2] [Red, 2]
Result: Sky, 2 · Bike, 3 · Dino, 2 · Red, 2
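The word count example can be simulated in a single process to make each stage concrete. This Python sketch mirrors only the data flow (split, map, shuffle, reduce). It has none of Hadoop's distribution or fault tolerance.

```python
from collections import defaultdict

def split_input(text: str, words_per_split: int) -> list[str]:
    """Splitting: cut the input into chunks that could go to different mappers."""
    words = text.split()
    return [" ".join(words[i:i + words_per_split]) for i in range(0, len(words), words_per_split)]

def map_split(split: str) -> list[tuple[str, int]]:
    """Mapping: emit (word, 1) for every word in one split."""
    return [(word, 1) for word in split.split()]

def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffling: group all values emitted for the same key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_groups(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: sum the values of each key."""
    return {key: sum(values) for key, values in groups.items()}

splits = split_input("Dino Sky Red Bike Bike Red Dino Bike Sky", words_per_split=3)
mapped = [pair for split in splits for pair in map_split(split)]
result = reduce_groups(shuffle(mapped))
print(result)  # {'Dino': 2, 'Sky': 2, 'Red': 2, 'Bike': 3}
```

On a cluster, each split is mapped in parallel and each key group is reduced independently. That independence is what lets the job scale out.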
Columnar Datastore
• We use Vertica
• Significant read performance gain compared to a traditional RDBMS
• Each column ends up in its own file
• The trick is stream compression combined with smart search (e.g. binary search)
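Compression and binary search combine well on a sorted column. If the column is run-length encoded (RLE), a query can binary-search the run values and read counts straight from the run lengths, without decompressing anything. Vertica offers several column encodings, RLE among them. This Python sketch shows the general idea, not Vertica's internals, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import groupby

def rle_encode(sorted_column: list[int]) -> tuple[list[int], list[int]]:
    """Run-length encode a sorted column as parallel lists: distinct values and run lengths."""
    values: list[int] = []
    lengths: list[int] = []
    for value, run in groupby(sorted_column):
        values.append(value)
        lengths.append(sum(1 for _ in run))
    return values, lengths

def count_equal(values: list[int], lengths: list[int], target: int) -> int:
    """Rows equal to target: binary search over the run values, no decompression."""
    i = bisect_left(values, target)
    return lengths[i] if i < len(values) and values[i] == target else 0

def count_between(values: list[int], lengths: list[int], lo: int, hi: int) -> int:
    """Rows with lo <= value <= hi: sum the lengths of the matching runs."""
    return sum(lengths[bisect_left(values, lo):bisect_right(values, hi)])

column = [3, 3, 3, 7, 7, 9, 12, 12, 12, 12]  # ten rows
values, lengths = rle_encode(column)          # ([3, 7, 9, 12], [3, 2, 1, 4])
print(count_equal(values, lengths, 12), count_between(values, lengths, 7, 12))  # 4 7
```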
Pig
• Script language
• Creates MapReduce jobs
• Overcomes the need to write native jobs
• Dataflow oriented
Script example:
REGISTER /usr/lib/pig/contrib/piggybank/java/lib/avro-1.5.4.jar
%default INFILE '/var/tmp/example1.avro'
...
rec1 = LOAD '$INFILE' USING org.apache.pig.piggybank.storage.avro.AvroStorage('{}');
-- project the reporting keys and stamp each row with the report date and hour
rec1Data = FOREACH rec1 GENERATE SGSMainPacket.PlacementId, SGSMainPacket.CampaignId,
    SGSMainPacket.BannerNumber, $REP_DATE AS DATE, $REP_HOUR AS HOUR;
recGroup = GROUP rec1Data BY (PlacementId, CampaignId, BannerNumber, DATE, HOUR);
-- one output row per group: version, group keys and the number of records in the group
fullCount = FOREACH recGroup GENERATE
    1, -- VERSION COUNTER
    group.PlacementId, group.CampaignId, group.BannerNumber, group.DATE, group.HOUR,
    COUNT(rec1Data) AS TOTAL;
STORE fullCount INTO '$OUTFILE' USING org.apache.pig.piggybank.storage.avro.AvroStorage('{ "schema": { "name": "SummaryHourly", "type": "record", "fields": [ { "name": "Version", "type": "int" }, { "name": "PlacementId", "type": "int" }, { "name": "CampaignId", "type": "int" }, { "name": "BannerNumber", "type": "int" }, { "name": "DateEntered", "type": "int" }, { "name": "Hour", "type": "int" }, { "name": "COUNT", "type": "long" } ] } }');
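To make the Pig dataflow explicit, here is what the script computes, sketched in plain Python over in-memory records. The record shape, a dict with an `SGSMainPacket` sub-dict, mirrors the field names in the script but is otherwise an assumption. `rep_date` and `rep_hour` stand in for the `$REP_DATE` and `$REP_HOUR` parameters. Those two values are the same for every row, so also grouping on them does not change the counts.

```python
from collections import Counter

VERSION = 1  # the constant emitted as "1, -- VERSION COUNTER" in the script

def summarize_hourly(records: list[dict], rep_date: int, rep_hour: int) -> list[tuple[int, ...]]:
    """Count records per (PlacementId, CampaignId, BannerNumber) for one reporting hour.

    Output rows follow the SummaryHourly field order:
    Version, PlacementId, CampaignId, BannerNumber, DateEntered, Hour, COUNT.
    """
    counts = Counter(
        (p["PlacementId"], p["CampaignId"], p["BannerNumber"])
        for p in (r["SGSMainPacket"] for r in records)  # hypothetical record shape
    )
    return [
        (VERSION, placement, campaign, banner, rep_date, rep_hour, total)
        for (placement, campaign, banner), total in sorted(counts.items())
    ]
```

Pig expresses the same GROUP and COUNT declaratively and compiles it into MapReduce jobs, which is the point of the slide.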
Reporting Architecture
What's Next?
• Moving into AWS
  • Easier scaling
  • Easy test cluster ramp-ups
  • Easy to get additional resources to catch up after errors
• Spark
  • Optimized dataflow
  • Streaming, fewer intermediate files
  • More functionality
  • Written in Scala
Interested? We’re hiring!