![Page 1: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/1.jpg)
Criteo Infrastructure (Platform) Meetup
22nd February 2017
Diarmuid Gill, VP R&D - Platforms
Introduction & welcome note
![Page 2: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/2.jpg)
1. About Criteo
![Page 3: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/3.jpg)
3 | Copyright © 2017 Criteo
Our mission
TARGET THE RIGHT USER
AT THE RIGHT TIME
WITH THE RIGHT MESSAGE
![Page 4: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/4.jpg)
Key Figures
• 18 000 publishers
• 90% retention rate (2)
• 130+ countries
• Listed on the NASDAQ since October 2013
• R&D represents 21% of the workforce
• 2 500 employees
• $21 billion (3)
• 14 000 advertisers
• $1,799 million (1)
• 31 offices
(1) Revenue in 2016
(2) Annual rate 2015
(3) Turnover generated for our clients: post-click turnover worldwide, January to December 2015
![Page 5: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/5.jpg)
2. How does it work?
![Page 6: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/6.jpg)
Retargeting principles
GENERAL CONCEPT
1. Users visit an advertiser’s website
2. Criteo identifies the users (via cookies)
3. Users leave the advertiser’s website and browse publisher sites on the Internet
4. Criteo identifies users on these pages (via cookie)
5. Criteo displays an advertising banner, personalized for each user
6. Users click through directly to the advertiser’s page
![Page 7: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/7.jpg)
3. Underlying infrastructure
![Page 8: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/8.jpg)
Scale @ Criteo
• 3.2B catalog items ingested/day, 6B items stored
• 3.6B cookies/device IDs seen per month
• 3.9B personalized banners/day
• 49 RTBs @ 120B bid requests/day
• 3M QPS at peak
• 90 Gbps bandwidth
• 20K servers
• 27PB of data stored
• 3.6PB of data read daily
• 500B log lines processed/day
• 363TB of RAM in memcached, 37M req/s
• 300K Hadoop jobs/day
![Page 9: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/9.jpg)
Hadoop @ Criteo
Batch processing:
• Hadoop as a Service:
  • 2 clusters: main + a backup one for degraded mode
  • Cloudera CDH5
  • 2300 servers total (1300 + 1000), 76K vcores
  • 50PiB storage capacity
• Own job scheduler for improved data flow and coordination
  • 300k jobs per day
![Page 10: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/10.jpg)
Infrastructure Key Figures
Hosting Global Partners:
• Sunnyvale: 2 PoP, 500 kVA, 2 006 servers
• New York: 2 PoP, 930 kVA, 2 793 servers
• Hong Kong: 2 PoP, 472 kVA, 2 185 servers
• Paris: 3 PoP, 1 800 kVA, 5 003 servers
• Amsterdam: 2 PoP, +2 500 kVA, 3 874 servers
• Tokyo: 2 PoP, 455 kVA, 2 564 servers
• Shanghai: 1 PoP, 200 kVA, 907 servers
• Ashburn: 2 PoP, 1.1 MVA, 1 170 servers
• Worldwide: 16 PoP, ~8 MVA contracted, 20 526 servers, up to 90 Gbps, 3M QPS
![Page 11: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/11.jpg)
Some of the many technologies used at Criteo
![Page 12: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/12.jpg)
4. What does “Platforms” mean in Criteo?
![Page 13: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/13.jpg)
[Diagram labels] Top Level Applications, Platforms, Infrastructure, SRE
Platforms: Advertiser, Publisher, WebScale, Prediction, Dynamic Creative, Recommendation, Engine, Systems
Advertiser:
• Catalog
• User Events
• Campaigns
• Reporting
Publisher:
• RTB
• Direct
• Campaigns
• Reporting
![Page 14: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/14.jpg)
Analytics Platforms
Advertiser Publisher
Analytics
AX/BI
Reporting / Billing Reporting / Payments
![Page 15: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/15.jpg)
5. Tonight’s programme
![Page 16: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/16.jpg)
Tonight’s menu
Bill of Fare
***
1st talk: FastTrack: scaling customer integration (Nicolas Laveau, Léo-Paul Goffic & Camille Coueslant)
2nd talk: Evolution of data structures in Yandex.Metrica (Alexey Milovidov)
3rd talk: Don't take your software for granted (Cedrick Montout)
4th talk: Evolution of analytics at Criteo (Justin Coffey)
***
21:05 - 22:00 Networking
![Page 17: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/17.jpg)
Thank you!
![Page 18: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/18.jpg)
FastTrack
Scaling customer integration
Camille Coueslant, Léo-Paul Goffic, Nicolas Laveau
2017/02/22
![Page 19: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/19.jpg)
What do we do in Criteo?
Deliver the right message to the right user at the right time
![Page 20: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/20.jpg)
Integration: Creatives settings
• Banners need branding
  • Logo
  • Font
  • Color palette
• Banners come in many formats
![Page 21: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/21.jpg)
Integration: Tags
• Banners are based on user intent
• Tags on customer store
• Different types of intent
  • Home page view
  • Product view
  • Listing view
  • Basket
  • Sales
• Intent at product level

Home
<script type="text/javascript" src="//static.criteo.net/js/ld/ld.js" async="true"></script>
<script type="text/javascript">
window.criteo_q = window.criteo_q || [];
window.criteo_q.push(
  { event: "setAccount", account: 666 },
  { event: "setEmail", email: "[email protected]" },
  { event: "setSiteType", type: "g" },
  { event: "viewHome" }
);
</script>

Sales
<script type="text/javascript" src="//static.criteo.net/js/ld/ld.js" async="true"></script>
<script type="text/javascript">
window.criteo_q = window.criteo_q || [];
window.criteo_q.push(
  { event: "setAccount", account: 666 },
  { event: "setEmail", email: "[email protected]" },
  { event: "setSiteType", type: "g" },
  { event: "trackTransaction", id: "tr-56182-2123", item: [
    { id: "patronus", price: 12.54, quantity: 3 },
    { id: "avada-kedavra", price: 1099.99, quantity: 1 }
    /* add a line for each item in the user's basket */
  ]}
);
</script>
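As a rough illustration of how a store could assemble the "Sales" tag from its own basket data, here is a minimal sketch. The helper name `buildSalesEvents`, its parameters, and the sample email are hypothetical; only the event names and fields visible on the slide are used.

```javascript
// Hypothetical helper (not part of Criteo's loader): builds the event list
// shown in the "Sales" snippet from a store's own basket data.
function buildSalesEvents(accountId, email, transactionId, basket) {
  // Keep only the three per-item fields that appear on the slide.
  const items = basket.map(({ id, price, quantity }) => ({ id, price, quantity }));
  return [
    { event: "setAccount", account: accountId },
    { event: "setEmail", email: email },
    { event: "setSiteType", type: "g" },
    { event: "trackTransaction", id: transactionId, item: items },
  ];
}

// In a page, the events would be pushed onto the queue read by ld.js:
//   window.criteo_q = window.criteo_q || [];
//   window.criteo_q.push(...events);
const events = buildSalesEvents(666, "user@example.com", "tr-56182-2123", [
  { id: "patronus", price: 12.54, quantity: 3, color: "silver" }, // extra field is dropped
]);
```

Building the list in one place keeps the per-item shape consistent, which is the part the later slides show customers getting wrong most often.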
![Page 22: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/22.jpg)
Integration: Product Feed
• Banners contain products
• Characteristics of products are used for recommendation
• Name, description, image, price for display

XML
<item>
  <g:id>0</g:id>
  <title>Abracadabra</title>
  <g:image_link>http://www.magic.com/assets/spells/abracadabra.png</g:image_link>
  <link>http://www.magic.com/spells/abracadabra</link>
  <description>Multi-purpose spell. Your companion for every occasion!</description>
  <g:price>625.99</g:price>
  <g:google_product_category>35</g:google_product_category>
</item>

CSV
id;title;image_link;link;description;price;google_product_category
0;Abracadabra;http://www.magic.com/assets/spells/abracadabra.png;http://www.magic.com/spells/abracadabra;Multi-purpose spell. Your companion for every occasion!;625.99;Arts & Entertainment > Hobbies & Creative Arts > Magic & Novelties
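To make the CSV variant concrete, the sketch below parses a semicolon-separated feed and reports missing display fields. Function names are hypothetical, and treating every column except the category as required is an assumption made for the example, not a documented Criteo rule.

```javascript
// Columns from the CSV example; which ones are mandatory is an assumption.
const REQUIRED_FIELDS = ["id", "title", "image_link", "link", "description", "price"];

// Parse a semicolon-separated feed: header line, then one product per line.
// Naive split: quoted fields containing ';' are not handled.
function parseCsvFeed(text) {
  const [headerLine, ...rows] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(";");
  return rows.map((row) => {
    const values = row.split(";");
    return Object.fromEntries(headers.map((h, i) => [h, values[i] ?? ""]));
  });
}

// Return a list of human-readable problems for one product.
function validateProduct(product) {
  const errors = REQUIRED_FIELDS
    .filter((field) => !product[field])
    .map((field) => `missing ${field}`);
  if (product.price && Number.isNaN(Number(product.price))) {
    errors.push("price is not a number");
  }
  return errors;
}

const sampleFeed = [
  "id;title;image_link;link;description;price;google_product_category",
  "0;Abracadabra;http://www.magic.com/assets/spells/abracadabra.png;http://www.magic.com/spells/abracadabra;Multi-purpose spell. Your companion for every occasion!;625.99;Arts & Entertainment > Hobbies & Creative Arts > Magic & Novelties",
].join("\n");
const products = parseCsvFeed(sampleFeed);
```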
![Page 23: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/23.jpg)
Back in 2014
When the customer saw what he had to implement
![Page 24: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/24.jpg)
Back in 2014
When technical support saw the first implementation
![Page 25: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/25.jpg)
Back in 2014
When the customer tried to debug his implementation
![Page 26: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/26.jpg)
Criteo grows… fast!
This does not scale!
“Performance is everything” BUT we need to onboard first
Clients
TS
![Page 27: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/27.jpg)
All is not lost!
Technology & UX to the rescue!
![Page 28: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/28.jpg)
Tags Part 1: Tag Validation Dashboard
![Page 29: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/29.jpg)
Goal
• Show near real-time metrics on tracker format issues
• Detect mismatches between the trackers and the product feed
• Provide fine-grained data (max 24 hours)
• Available for each of our clients (= worldwide)
![Page 30: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/30.jpg)
How
Initial trackers architecture
![Page 31: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/31.jpg)
How
1. Audit the tracker events
2. Send this audit event to Kafka
3. Consume it from Druid
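The audit step can be pictured as a small function that checks one tracker event against the product feed and emits a flat record. This is only a sketch: the record fields, the error codes, and the function name are hypothetical, and the Kafka producer call is left out because the slides do not say which client is used.

```javascript
// Hypothetical audit step: check one tracker event against the set of
// product ids known from the feed, and build a flat record for streaming.
function auditTrackerEvent(event, feedProductIds, now = Date.now()) {
  const errors = [];
  const items = Array.isArray(event.item) ? event.item : [];
  for (const item of items) {
    if (typeof item.price !== "number" || item.price < 0) {
      errors.push("invalid_price"); // format issue in the tracker
    }
    if (!feedProductIds.has(item.id)) {
      errors.push("product_not_in_feed"); // tracker/feed mismatch
    }
  }
  return {
    timestamp: new Date(now).toISOString(),
    account: event.account,
    eventType: event.event,
    itemCount: items.length,
    errors: [...new Set(errors)], // one entry per error type
  };
}

const auditRecord = auditTrackerEvent(
  { account: 666, event: "trackTransaction", item: [{ id: "patronus", price: 12.54, quantity: 3 }] },
  new Set(["abracadabra"]),
  Date.UTC(2017, 1, 22)
);
```

A flat record with a timestamp, a few dimensions, and countable error types is the kind of shape that aggregates well in a column store.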
![Page 32: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/32.jpg)
Why Druid
• Druid is an open-source, column-oriented, distributed data store
• Advantages:
  • Fast aggregation queries on huge amounts of metrics
  • Real-time streaming ingestion
  • Scalable
  • Highly available
![Page 33: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/33.jpg)
How
1. Audit the tracker events
2. Send this audit event to Kafka
3. Consume it from Druid
4. Query Druid from Integrate
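Step 4 could look roughly like the query builder below, which follows the general shape of a Druid native timeseries query (JSON sent to a broker). The data source name, dimension, and metric field names are assumptions for illustration; the actual schema used by Integrate is not shown in the slides, and the query shape should be checked against the documentation of the deployed Druid version.

```javascript
// Interval covering the last 24 hours, matching the "max 24 hours" goal.
function last24Hours(now = new Date()) {
  const end = now.toISOString();
  const start = new Date(now.getTime() - 24 * 60 * 60 * 1000).toISOString();
  return { start, end };
}

// Build a native timeseries query: per-minute event and error counts for one account.
function buildErrorCountQuery(account, now = new Date()) {
  const { start, end } = last24Hours(now);
  return {
    queryType: "timeseries",
    dataSource: "tracker_audit", // hypothetical data source name
    granularity: "minute",
    intervals: [`${start}/${end}`],
    filter: { type: "selector", dimension: "account", value: String(account) },
    aggregations: [
      { type: "longSum", name: "events", fieldName: "count" },      // assumed metric
      { type: "longSum", name: "errors", fieldName: "errorCount" }, // assumed metric
    ],
  };
}

const druidQuery = buildErrorCountQuery(666, new Date("2017-02-22T12:00:00.000Z"));
```

The resulting object would be serialized with `JSON.stringify` and posted to the broker's query endpoint.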
![Page 34: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/34.jpg)
Result
![Page 35: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/35.jpg)
Tags Part 2: Tag Debug Mode
![Page 36: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/36.jpg)
Tag Debug Mode
How do I make sure I send Criteo the right information from my website?
?? Fig 1: Criteo Hotline
![Page 37: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/37.jpg)
Tag Debug Mode
How do I make sure I send Criteo the right information from my website?
Fig 2: Happy customer
![Page 38: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/38.jpg)
How tags work
https://www.mvmtwatches.com/
![Page 39: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/39.jpg)
How tags work
https://www.mvmtwatches.com/
ld.js
![Page 40: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/40.jpg)
How tags work
https://www.mvmtwatches.com/
ld.js
GET /event?a=%5B30072%…
![Page 41: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/41.jpg)
How tags work
https://www.mvmtwatches.com/
ld.js
GET /event?a=%5B30072%…
200 OK
![Page 42: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/42.jpg)
Tag Debug Mode
![Page 43: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/43.jpg)
Tag Debug Mode
https://www.mvmtwatches.com/#enable-tag-debug-mode
![Page 44: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/44.jpg)
Tag Debug Mode
https://www.mvmtwatches.com/#enable-tag-debug-mode ld.js
if (document.location.hash == debugHash) loadLdDebug();
![Page 45: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/45.jpg)
Tag Debug Mode
https://www.mvmtwatches.com/#enable-tag-debug-mode ld.js
ld-debug.js
if (document.location.hash == debugHash) loadLdDebug();
addDebugIframe();
![Page 46: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/46.jpg)
Tag Debug Mode
https://www.mvmtwatches.com/#enable-tag-debug-mode ld.js
GET /event?a=%5B30072%…&debugMode=1
ld-debug.js
if (document.location.hash == debugHash) loadLdDebug();
addDebugIframe();
![Page 47: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/47.jpg)
Tag Debug Mode
https://www.mvmtwatches.com/#enable-tag-debug-mode

ld.js
if (document.location.hash == debugHash) loadLdDebug();

ld-debug.js
addDebugIframe();

GET /event?a=%5B30072%…&debugMode=1

200 OK
Content-Type: application/javascript

sendDebugInformationToIframe({
  audit: {
    product: { image: '…' },
    errors: […]
  }
});
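The flow above boils down to two decisions: is the debug hash present, and if so, flag the event request. Here is a minimal sketch of that logic as pure functions. The function names and the sample payload are hypothetical, and the real content of the `a` parameter is truncated on the slide, so a placeholder value is used instead.

```javascript
// Hash value visible in the slide URL.
const DEBUG_HASH = "#enable-tag-debug-mode";

// True when the page was opened with the debug hash.
function isDebugMode(locationHash) {
  return locationHash === DEBUG_HASH;
}

// Build the event request path, adding debugMode=1 only in debug mode.
function buildEventPath(params, locationHash) {
  const query = new URLSearchParams(params);
  if (isDebugMode(locationHash)) {
    query.set("debugMode", "1");
  }
  return `/event?${query.toString()}`;
}

// Placeholder payload: the actual value of `a` is elided in the source.
const normalPath = buildEventPath({ a: "[123]" }, "");
const debugPath = buildEventPath({ a: "[123]" }, "#enable-tag-debug-mode");
```

Because the same endpoint answers both requests, the debug response reflects the validation that the normal pipeline applies, which is the property the next slide highlights.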
![Page 48: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/48.jpg)
Tag Debug Mode
• Gives you fine-grained insights into the quality of the information sent
• Requires no technical knowledge
• Mirrors exactly what will be processed down the line
![Page 49: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/49.jpg)
Feed
![Page 50: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/50.jpg)
Goal
• Provide feedback ASAP on a subset of products
• Provide feedback on the whole feed
• Automatic format detection (Google specs)
• User can validate the structure of the feed
• User can review some products
• As close as possible to the daily feed import
![Page 51: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/51.jpg)
Full import
Daily import architecture
![Page 52: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/52.jpg)
Full import
Update the feed-processing Hadoop job to compute error and attribute statistics
![Page 53: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/53.jpg)
Full import
Launch full import from Integrate, retrieve and display statistics
![Page 54: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/54.jpg)
Test import
Create a Marathon application that:
- Streams the incoming feed
- Detects the format
- Reuses part of the feed-processing Hadoop job's Java code
- Saves the import & statistics in a DB
- Provides an API to fetch statistics
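Format detection on a streamed feed can work from the first chunk alone. The sketch below distinguishes XML from delimited text and guesses the delimiter; it is an illustration of the idea, not the application's actual logic, and the candidate delimiter list is an assumption.

```javascript
// Guess the feed format from the first chunk of the stream.
function detectFeedFormat(firstChunk) {
  // Drop a UTF-8 byte order mark and leading whitespace, if any.
  const head = firstChunk.replace(/^\uFEFF/, "").trimStart();
  if (head.startsWith("<")) {
    return { format: "xml" };
  }
  // Otherwise pick the candidate delimiter that occurs most in the header line.
  const firstLine = head.split(/\r?\n/, 1)[0];
  const candidates = [";", ",", "\t", "|"]; // assumed set of delimiters
  let best = null;
  for (const delimiter of candidates) {
    const count = firstLine.split(delimiter).length - 1;
    if (count > 0 && (best === null || count > best.count)) {
      best = { delimiter, count };
    }
  }
  return best ? { format: "csv", delimiter: best.delimiter } : { format: "unknown" };
}

const xmlGuess = detectFeedFormat("<item><g:id>0</g:id></item>");
const csvGuess = detectFeedFormat("id;title;price\n0;Abracadabra;625.99");
```

Deciding from the first chunk lets the application give feedback on a subset of products before the whole feed has been downloaded.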
![Page 55: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/55.jpg)
Result
![Page 56: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/56.jpg)
Result
![Page 57: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/57.jpg)
Creatives
![Page 58: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/58.jpg)
How banners work at Criteo
• Actual humans pick predefined layouts, colors, CTAs
• Then those are combined with product information and optimized on the fly
CTA examples: “Je découvre !” (“Discover!”), “J’achète !” (“Buy!”)
[Figure: the picked elements multiplied together (× × × =) to form a banner]
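The "×" in the figure suggests a cross product of the human-picked elements. A minimal sketch of that combination step follows; the names and sample values are hypothetical, and the on-the-fly optimization that chooses among variants is not shown because the slides do not describe it.

```javascript
// Enumerate every layout/palette/CTA combination picked by designers.
function enumerateVariants(layouts, palettes, ctas) {
  const variants = [];
  for (const layout of layouts) {
    for (const palette of palettes) {
      for (const cta of ctas) {
        variants.push({ layout, palette, cta });
      }
    }
  }
  return variants;
}

// Combine one variant with product information from the feed.
function renderBanner(variant, product) {
  return { ...variant, title: product.title, price: product.price };
}

const variants = enumerateVariants(
  ["300x250", "728x90"],       // sample layouts
  ["brand", "light"],          // sample palettes
  ["Discover!", "Buy!"]        // sample CTAs
);
const banner = renderBanner(variants[0], { title: "Abracadabra", price: "625.99" });
```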
![Page 59: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/59.jpg)
How banners work at Criteo
“Can I have drop shadows on my products?”
“I’m not sure about the pink”
“Could it autoplay loud music?”
As a result, clients worry
“What will my banners look like?”
![Page 60: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/60.jpg)
How banners work at Criteo
There is stuff we can’t do, and stuff we don’t necessarily want to do
“What will my banners look like?”
“Can I have drop shadows on my products?”
“I’m not sure about the pink”
“Could it autoplay loud music?”
![Page 61: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/61.jpg)
Creatives to the rescue
And it takes back and forth.
Our goal:
• Give advertisers a preview of what it’ll look like
• Give advertisers customization options
• Give feedback on the performance impact

• 80% of advertisers validate their Creatives in < 2 minutes
• 80% of advertisers don’t ask for a change
![Page 62: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/62.jpg)
Creatives
Bring on UX, R&D, Product, Sales, Creatives & Technical Support
![Page 63: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/63.jpg)
Creatives
Bring on UX, R&D, Product, Sales, Creatives & Technical Support
![Page 64: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/64.jpg)
Creatives
[Screenshot callouts 1 to 4: Education, Preview, Performance, Customization]
![Page 65: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/65.jpg)
Going further!
And mostly faster
![Page 66: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/66.jpg)
eCommerce Platforms
Lots of our clients run on ready-to-use platforms that have APIs
As a result, we can completely automate the integration workflow for them!
![Page 67: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/67.jpg)
Shopify integration
Only 2 clicks needed!
Reduced integration time from 14 days to 20 minutes
![Page 68: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/68.jpg)
Integration today
![Page 69: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/69.jpg)
How customers / technical support / we feel
![Page 70: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/70.jpg)
Integrate achievements
“I’m in love with the Tag Debug Mode” (Nassim Aissat, Global TS)
• 75%: tags without help
  • Only 25% in 2014
  • 66% complete feed in < 1h
• 14d: median integration time
  • 43 days in 2014
• 2014: 600 integrations/quarter
• 2016: 1800 integrations/quarter
• 50% handled through Integrate
• 92%: validate Creatives in < 2 mn
  • 95% accept “as-is”
  • 4% accept with performance downgrade
  • Only 1% ask for modification
• 20mn: integration w/ Shopify App
![Page 71: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/71.jpg)
Questions?
![Page 72: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/72.jpg)
![Page 73: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/73.jpg)
What does Black Friday mean at Criteo?
![Page 74: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/74.jpg)
Getting ready for Black Friday
Release freeze: trying to guarantee the stability of the platform...
... with nasty side-effects
![Page 75: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/75.jpg)
Monitoring the datacenter
How can we evaluate the health of the datacenter at a glance?
Enter Grafana
![Page 76: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/76.jpg)
With specific filters, deviant machines can be spotted easily
Monitoring the datacenter
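One way to think about "deviant machines" is as hosts whose metric sits far from the fleet median. The sketch below uses the median absolute deviation for that; it illustrates the idea only and is not how the Grafana dashboards in the talk are built. The threshold and sample values are assumptions.

```javascript
// Median of a list of numbers (does not mutate the input).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hosts whose metric is more than `threshold` MADs away from the median.
function findDeviantMachines(metricByHost, threshold = 3) {
  const values = Object.values(metricByHost);
  const med = median(values);
  const mad = median(values.map((v) => Math.abs(v - med)));
  if (mad === 0) return []; // all hosts (nearly) identical: nothing to flag
  return Object.entries(metricByHost)
    .filter(([, v]) => Math.abs(v - med) / mad > threshold)
    .map(([host]) => host);
}

// Sample CPU percentages per host (made-up values).
const deviant = findDeviantMachines({ web01: 40, web02: 42, web03: 41, web04: 39, web05: 95 });
```

Median-based statistics are less sensitive to the outliers themselves than a mean and standard deviation would be, which matters when a single misbehaving host is exactly what is being looked for.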
![Page 77: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/77.jpg)
Drilling down...
Monitoring the datacenter
![Page 78: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/78.jpg)
Until finding a likely culprit
Monitoring the datacenter
![Page 79: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/79.jpg)
Monitoring the datacenter
And switching to micro analysis to find the root cause
• Process Explorer
• Profiling
• WinDbg
• ClrMD
![Page 80: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/80.jpg)
Load Balancing
HAProxy
![Page 81: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/81.jpg)
Basics of Client-Side Load Balancing
![Page 82: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/82.jpg)
Basics of Client-Side Load Balancing
![Page 83: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/83.jpg)
Mixed technical specifications
![Page 84: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/84.jpg)
Gen8 Load test
![Page 85: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/85.jpg)
Gen8 vs Gen9 servers
![Page 86: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/86.jpg)
Observable result
2/3
1/3
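When servers of different generations share a pool, a client-side balancer can weight its choices so that stronger machines receive proportionally more requests. The sketch below uses smooth weighted round-robin to produce a 2/3 versus 1/3 split. The server names and weights are hypothetical, and the slides do not say which generation ended up with which share or which algorithm Criteo's clients use.

```javascript
// Smooth weighted round-robin: deterministic, spreads picks evenly over time.
function createWeightedPicker(servers) {
  const state = servers.map((s) => ({ name: s.name, weight: s.weight, current: 0 }));
  const total = state.reduce((sum, s) => sum + s.weight, 0);
  return function pick() {
    let best = null;
    for (const s of state) {
      s.current += s.weight; // every server earns credit each round
      if (best === null || s.current > best.current) best = s;
    }
    best.current -= total; // the chosen server pays for its turn
    return best.name;
  };
}

// Hypothetical pool: one server with twice the weight of the other.
const pick = createWeightedPicker([
  { name: "server-a", weight: 2 },
  { name: "server-b", weight: 1 },
]);
const counts = { "server-a": 0, "server-b": 0 };
for (let i = 0; i < 300; i++) {
  counts[pick()] += 1;
}
```

With equal weights the same picker degrades to plain round-robin, which is what makes a mixed fleet behave badly: the slower generation saturates first.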
![Page 87: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/87.jpg)
Conclusion
Do not take your software for granted
• Internal infrastructure will change
• External workload will change
… be prepared
![Page 88: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/88.jpg)
The Analytics Stack at Criteo
Yesterday, Today and Tomorrow with an assist from Bill Murray
Justin Coffey, Team Lead
![Page 89: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/89.jpg)
The Ghost of Christmas Present
What do we have now?
![Page 90: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/90.jpg)
Criteo: Scale of Data
• 4 billion ads served each day
• 200+ billion events logged each day
• 50 TB of data ingested each day
• 10 trillion records processed each day
![Page 91: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/91.jpg)
Criteo: Scale of the Analytics Stack
• 50+ TB ingested / day
• 2000+ jobs / day
• 7+ PB under management
• 200+ analysts
• 400+ engineers
• 1000+ sales and ops
![Page 92: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/92.jpg)
Criteo: Scaling Analysts
[Chart: Analysts Hired since 2010. x-axis: Sep 2010 to Mar 2016; y-axis: 0 to 180]
![Page 93: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/93.jpg)
Criteo: Scaling Data
[Chart: Growth of a Single Dataset Since July 2014. x-axis: 7/13/14 to 9/18/16; y-axis: 0 to 140,000,000,000]
![Page 94: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/94.jpg)
Criteo: The Analytics Stack Today
Ad-Hoc Analysis
Hadoop for primary storage and point of ingestion
Data Transformation on top of Hadoop
Hive (7PB) and Vertica (100+ TB) Data Warehouses
Ad-Hoc SQL on Hive and Vertica, Reporting on Tableau and Vertica
Orchestration via Langoustine
![Page 95: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/95.jpg)
Our Stack is Simple
• Few moving parts
• Purposefully built with Shiny Thing blinders on
• It's okay to not have the "latest and greatest" tech
• Good enough is, actually, always good enough
![Page 96: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/96.jpg)
On Shiny Things: the universe is vast
so be selective, and master what you select
![Page 97: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/97.jpg)
The Ghost of Christmas Past
Before we continue, a quick history lesson of how we got here is in order...
![Page 98: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/98.jpg)
Everything starts somewhere
and it's not always pretty.
![Page 99: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/99.jpg)
In early 2013, you could use SQL Server…
AdServer_Db
Publisher_Db
LogStatus_Db
BlogWidgetStat_Db
BlogWidgetAdStat_db
Traffic_custom_db
Extranet_Db
Traffic_custom_db
CATEGORY_DB
Mail_MonitorDB
Inventory_Db
AdServerBo_Db
AdServerStat_Db
DashBoard_DB
Dashboard_Security_DB
WebServerStat_db
ABTesting_DB
AdvertiserFatigueStats_db
ADVERTISING_DB
StatPrediction_DB
CAST_DB
CriteoRefdb
ImportDB
RISK_DB
GalacticaStats_DB
MaxCpc_DB
UserProfilingDB
WorkflowPersistency_db
CAST_DB_HOURLY
StatEngine_Db
Crawler_Db
BICustom_DB
Lookalike_DB
Widget_db
AOC_DB
AOC_DB
Build_Deploy_Fake_db
publisher_stats_db
TestFwk_Db
LogMonitorDb
ADMINLOGS_DB
SqoopExport_db
FraudDetection_db
HPClink_DB
DW_DB
tsuissesbenl_stat_db, Heyokr_Stat_db, kiabiit_stat_db, Ultaus_Stat_db, Crutchfieldus_Stat_db, Forzierijp_Stat_db, Retailchoiceuk_Stat_db, Ryanairhotelses_Stat_db, Speakyplanetfr_Stat_db, Autowayjp_Stat_db, Sicilianobr_Stat_db, Jukenhousingjp_Stat_db, Cosyforyoufr_Stat_db, Tripadvisorru_Stat_db, Linasmatkassese_Stat_db, Ellepassionsfr_Stat_db, Skyde_Stat_db, Swimdoctormallkr_Stat_db, Sitescoutbr_Stat_db, Travelzoousnewusers_Stat_db, Platekompanietno_Stat_db, Testaoc110413frcom_Stat_db, Megapoolnl_Stat_db, Elektrototaalmarktnl_Stat_db, Intersportuk_Stat_db, Usineadesignfr_Stat_db, Lekmerno_Stat_db, Vuelingit_Stat_db,
Valuedopinions_Stat_db, Forzierino_Stat_db, Artisantiuk_Stat_db, Idbusit_Stat_db, Cocostorykr_Stat_db, Artnaturejp_Stat_db, Byggmaxse_Stat_db, Corporatecriteopmit_Stat_db, Aramisauto_Stat_db, Migoaes_Stat_db, Degrotespeelgoedwinkelnl_Stat_db, Diorcouturit_Stat_db, Kaufuniquede_Stat_db, Codigallerykr_Stat_db, Mandarinaduckfr_Stat_db, Comarketingorangenokiafr_Stat_db, Sinbiangkr_Stat_db, Cheapflightsuk_Stat_db, Undergirlkr_Stat_db, Agradinl_Stat_db, Kofferprofide_Stat_db, Domodipl_Stat_db, Mandarinaduckat_Stat_db, Mobilegermany_Stat_db, Chlit_Stat_db, Spreadshirtuk_Stat_db, Casalrunningfr_Stat_db, Bloomfm_Stat_db,
Hotelsbe_Stat_db, Strumentimusicaliit_Stat_db, Bathroomworlduk_Stat_db, Verivoxde_Stat_db, Mcmkr_Stat_db, Viaggiedreamsit_Stat_db, Brille24de_Stat_db, Yjgakuseikaikan_Stat_db, Stylepitnl_Stat_db, Cvlibraryrecruiter_Stat_db, Preis24de_Stat_db, Tigershedsuk_Stat_db, Duvetandpillowuk_Stat_db, Noths_Stat_db, Wizwidkr_Stat_db, Ticketonlinede_Stat_db, Lifestyleeuropeuk_Stat_db, Shopeccose_Stat_db, Swanhellenicuk_Stat_db, Deguisementdiscountfr_Stat_db, Freshcottonnl_Stat_db, Tikamoonfr_Stat_db, Testfp1_Stat_db, warehouse_stat_db, Hisjeans_Stat_db, Mountfieldlawnmowers_Stat_db, Sitescoutnl_Stat_db, Lancomeus_Stat_db,
Brandelijp_Stat_db, Mesdessousfr_Stat_db, Beautyplanningjp_Stat_db, Lgcobrandingpriceminister_Stat_db, Stockngous_Stat_db, Kickzde_Stat_db, Rockymountaindecorus_Stat_db, Cellbesse_Stat_db, Yvesrocheres_Stat_db, Toshibadirectjp_Stat_db, Seneukr_Stat_db, Waterfeaturesuk_Stat_db, Cottagesforyouuk_Stat_db, Camif_Stat_db, Lojaskdbr_Stat_db, Hipmunkhotels_Stat_db, Sorteonline_Stat_db, Ediets_Stat_db, Bonsportru_Stat_db, Jobjsenjp_Stat_db, Redcoonit_Stat_db, Hmuk_Stat_db, Srtestcetelem2_Stat_db, Iamprettykr_Stat_db, Lebunnybleushopkr_Stat_db, Condenastit_Stat_db, Hotusaes_Stat_db, Chilitvit_Stat_db,
Hellinefr_Stat_db, Cobrasonfr_Stat_db, madeindesign_stat_db, Megagadgetsnl_Stat_db, Todaofertabr_Stat_db, bulbus_Stat_db, Calcioshopit_Stat_db, Edenlyes_Stat_db, Recruiterucajp_Stat_db, Engelhornde_Stat_db, Spreadshirtno_Stat_db, Dusparstde_Stat_db, Tabletbr_Stat_db, Ventesecretfr_Stat_db, Venteunique_Stat_db, Dellchde_Stat_db, Dressforlessnl_Stat_db, Multipopkr_Stat_db, allheartus_Stat_db, Trovitdejobs_Stat_db, lesjeudisfr_stat_db, Expediaukcrosssell_Stat_db, Furniturebrituk_Stat_db, Yooxbe_Stat_db, Skyscannerno_Stat_db, Bluetomatoat_Stat_db, Mechakaitaijp_Stat_db, Destinationlightingus_Stat_db
and 10K+ more
![Page 100: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/100.jpg)
SQL Server was Production Infrastructure
• Analyst access to data was an afterthought
• Production databases were not designed for analytics
• Reports and queries were tightly coupled to production
• The user experience was poor, and Analysts occasionally broke production systems!
![Page 101: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/101.jpg)
Hive also made an early appearance…
2013-04-22 11:28:59,942 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:01,010 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:02,071 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:03,134 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:04,876 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:05,112 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:06,047 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
2013-04-22 11:29:06,984 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 365222.27 sec
ZZZZ…
![Page 102: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/102.jpg)
But Hive was also an afterthought
• Raw production data batch loaded with no transformations
• Query tools were non-existent
• Queries were slow and only expert analysts could run them
• UX and productivity were extremely low
![Page 103: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/103.jpg)
This just wasn't working!
We needed a new approach
![Page 104: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/104.jpg)
First things first
We need a database!
![Page 105: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/105.jpg)
Requirements for an Analytic Database
• It must be extremely fast
• It must be able to store our most actionable data sets
  • Dozens (at the time!) of TBs, now hundreds
• It must be queryable with proper SQL
• It must be deployable on hardware we specify
![Page 106: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/106.jpg)
Defining a Proof of Concept Evaluation
• Work with Analysts to identify key data sets
• Analyze query patterns
• Define benchmark queries
• Work with vendors to test closed source solutions
• Test OSS in-house
![Page 107: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/107.jpg)
The results
• Vertica struck the right balance between cost, performance and deployment options
• PoC evaluation took ~3 months
• Initial deployment took another ~3 months
• Operations ramped up over the following ~6 months
![Page 108: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/108.jpg)
Working with Analysts during deployment
• Analysts in the team helped define and document the data model
• They also created training materials
• Training was done in concert with engineers
![Page 109: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/109.jpg)
But was it a success?
• Within a year of the rollout we were able to decommission SQL Server for analytics
• Today Vertica has over 100 unique ad-hoc users connected each day
• It executes hundreds of thousands of queries each day
• It is the most important piece of analytics infrastructure at Criteo
![Page 110: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/110.jpg)
A fresh deployment to mature infrastructure
• Vertica at Criteo has scaled from ~12TB to ~120TB (going PB soon)
• Ad-hoc users have grown from ~40 to ~200
• Reporting users have grown from ~300 to ~1500
• The number of tables has grown from ~50 to >500
![Page 111: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/111.jpg)
Wait, 500 tables in 3 years?
That's a lot of data modelling!
![Page 112: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/112.jpg)
Analysts contribute to the data model
• Engineers know how the DB works and know how to optimize a data model, but they don't always know what to put in it
• With good tools, Analysts contribute to the evolution of the data model, including schema additions and modifications
• Engineers in the team can help guide them in the finer details
• Rinse and repeat
![Page 113: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/113.jpg)
Sidebar: We also had dashboards with SSRS
But we were told it was ugly and complicated.
We traded ugly for slow, btw, and it's still complicated
![Page 114: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/114.jpg)
From SSRS to Tableau and SQL Server to Vertica
• Actually, "slow" is just our current perception—we had SSRS dashboards with timeouts on the order of hours.
• SSRS served as our de facto ETL between those 10K+ SQL Server DBs
• Those SQL Server DBs were also production databases.
![Page 115: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/115.jpg)
So to Summarize the Past
• Analysts had to query across thousands of DBs
• Dashboards were slow and complicated
• Analytics work was strongly coupled to production
Life was great back then, wasn't it?
![Page 116: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/116.jpg)
We're done then?
Not quite. Things can go awry!
![Page 117: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/117.jpg)
The Ghost of Christmas Future
...here's hoping it's a near future...
![Page 118: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/118.jpg)
Criteo is World Wide
We have hundreds of analysts spread across dozens of countries!
![Page 119: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/119.jpg)
Criteo has a Rich Product Offering
• Banner Ads, Mobile, In-App, Email, Search
• Tens of Thousands of Advertisers and Publishers
• Some of them very big and very demanding
![Page 120: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/120.jpg)
And (reminder!) our Scale Never Seems to Stop Growing
[Chart: Growth of a Single Dataset Since July 2014. X-axis: dates at three-week intervals from 7/13/14 to 9/18/16; y-axis: 0 to 140,000,000,000 (unit not given).]
![Page 121: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/121.jpg)
(reminder #2) Number of analysts hired since 2010
[Chart: x-axis: two-month intervals from Sep 2010 to Mar 2016; y-axis: 0 to 180.]
![Page 122: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/122.jpg)
What could go wrong?
![Page 123: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/123.jpg)
New Challenges
• With so many hungry analysts to feed and with so much volume and variety of data, Vertica's query planner is working overtime
• We need to instrument and monitor more
• We need to level-up analysts' SQL skills
• And yes, finally, we do need some data governance*
*oh how I've resisted this day!
![Page 124: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/124.jpg)
2 Analysts and 3 Engineers ain't gonna cut it
• We have scaled up our PM team
• We are moving from a proto-CoE team to an official CoE team
• We are scaling engineering operations
![Page 125: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/125.jpg)
What's on the TODO list?
• Documentation, and automating it as much as possible
• Non-invasive, but very intimate query monitoring
• Workload isolation
• Query suggestions and preëmptive query blocking
![Page 126: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/126.jpg)
More about query inspection
• No matter how wonderful a database may be, its performance comes down to how much IO it has and how much contention there is for it
• For the IO subsystem, the difference between a poorly optimized query and a well-optimized one can be orders of magnitude
• Better queries mean more concurrent, happier users
![Page 127: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/127.jpg)
More about query inspection
• Vertica offers lots of ways to find out what is going on behind the scenes, but one of the best ways is to EXPLAIN your users' queries and identify those who need to be trained!
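The idea of explaining users' queries to spot the ones that need attention can be sketched roughly as follows. This is only an illustration: it uses SQLite's `EXPLAIN QUERY PLAN` from Python's standard library as a stand-in because it runs anywhere, whereas Vertica's `EXPLAIN` output looks different and would need its own parsing. The `facts` table, the `facts_by_time` index, and the `needs_review` helper are all hypothetical.

```python
import sqlite3

def plan_steps(conn, sql):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The descriptive text of each plan step is the last field of the row.
    return [row[-1] for row in rows]

def needs_review(conn, sql):
    """Flag queries whose plan contains a full scan (a step starting with SCAN)."""
    return any(step.startswith("SCAN") for step in plan_steps(conn, sql))

# Hypothetical schema, loosely modelled on the fact table used later in the talk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (time_id INTEGER, country_code TEXT, clicks INTEGER)")
conn.execute("CREATE INDEX facts_by_time ON facts (time_id)")

unbounded = "SELECT SUM(clicks) FROM facts"
bounded = "SELECT SUM(clicks) FROM facts WHERE time_id BETWEEN 1 AND 7"

for query in (unbounded, bounded):
    print(needs_review(conn, query), query)
```

In this sketch the unbounded query is flagged because it has to read the whole table, while the bounded one can use the index on `time_id`. The same pattern, run against a real query log, would give a list of users whose queries are worth a conversation.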
![Page 128: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/128.jpg)
Recalling our Current Challenges
• Tableau Workbooks are Slow
• Vertica is Overloaded
• Reporting Data is Frequently Late
![Page 129: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/129.jpg)
Patches and the Arc of History
• Each of our current challenges can be addressed in the short term
• But we need long term solutions to avoid regressions
![Page 130: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/130.jpg)
Tableau Relief Program (TaRP)
Short Term:
• Double the cores on production server
• Isolate critical workbooks
Medium Term:
• Require all production workbooks to go through gerrit/git review
• Score workbook complexity pre-release
• Monitor released workbooks for QoS
Not So Long Term:
• Work with Product and Central Ops to create Tableau Center of Excellence and level up BI
![Page 131: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/131.jpg)
TaRP: reporting alchemy
Push to production
Productive Analyst
Angry Sales Person
No SLA dataset
Productive Analyst
Happy Sales Person
SLA dataset
Push to review
Automated deploy
Knowledgeable Analyst
Compliance checks passed
Peer-reviewed
![Page 132: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/132.jpg)
Why impose a dev cycle on report building?
not to be trite, but, well:
that's good money!
![Page 133: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/133.jpg)
More seriously
• Tableau workbooks consume data
• Data comes in all sorts of volumes and velocities (sorry)
• Data query complexity is linked to workbook complexity and features
• If you don't know what you're doing, your workbooks will be:
  • slow, because of internal workbook complexity
  • slow, because of complex database queries
  • out of date, if they don't query the proper data sources
Tableau workbook developers are developers, full stop. Treat them like they are.
![Page 134: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/134.jpg)
Vertica Roadmap
[Diagram components: Consul, RTIngester, HDFSIngester, HLL, JDBC, VProxy, Admin, VIcO, JVMIngester, DataDisco]
![Page 135: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/135.jpg)
Vertica as a Service
Short Term:
• Scale out as fast as reasonable
• Split reporting and ad hoc workloads
• Better hardware configuration
• More monitoring
Not So Long Term:
• Better monitoring
• Control Input: Trickle and Bulk Loading, Consistently, Durably and Efficiently
• Control Output: Query inspection/prioritization, Workload management
![Page 136: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/136.jpg)
Fixing Your Latent Data Problem
Short Term:
• Migrate critical data workflows to Langoustine
• Optimize DAG and long running queries
Medium Term:
• Migrate long-tail datasets to Langoustine
• Better metrics, capacity planning
Not So Long Term:
• Refactor data model to cull useless data sets
• Better complexity analysis of workflow modifications pre-release
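When reporting data is late, the critical path of the workflow DAG (the longest chain of dependent jobs) is a natural place to start optimizing. Below is a minimal sketch, assuming a hypothetical set of jobs with made-up durations in minutes. It does not use Langoustine's actual API, which this talk does not show.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: name -> (duration in minutes, upstream dependencies).
JOBS = {
    "ingest_logs": (30, []),
    "clean_events": (45, ["ingest_logs"]),
    "build_facts": (60, ["clean_events"]),
    "load_dimensions": (10, ["ingest_logs"]),
    "export_to_vertica": (20, ["build_facts", "load_dimensions"]),
}

def critical_path(jobs):
    """Return (total minutes, job names) for the longest dependency chain."""
    graph = {name: deps for name, (_, deps) in jobs.items()}
    finish = {}    # earliest possible finish time of each job
    previous = {}  # the upstream job that determines that finish time
    for name in TopologicalSorter(graph).static_order():
        duration, deps = jobs[name]
        start = max((finish[d] for d in deps), default=0)
        previous[name] = max(deps, key=lambda d: finish[d]) if deps else None
        finish[name] = start + duration
    last = max(finish, key=finish.get)
    total = finish[last]
    path = []
    while last is not None:
        path.append(last)
        last = previous[last]
    return total, path[::-1]

total, path = critical_path(JOBS)
print(total, " -> ".join(path))
```

With these made-up numbers the data cannot land in less than 155 minutes, and speeding up `load_dimensions` would not help because it is not on the critical path.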
![Page 137: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/137.jpg)
We're going to need better instrumentation
Better Workflow Insights in Langoustine
Better Hadoop Job Performance Metrics
![Page 138: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/138.jpg)
Let's spend less time making data workflows
Langoustine IDE makes building Hive workflows trivial
![Page 139: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/139.jpg)
Langoustine IDE promotes best practices
Workflows are source controlled:
Reviews are built-in:
![Page 140: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/140.jpg)
We'll need better dev tools (e.g. dev-cluster)
build an AWS hadoop cluster:
connect to it via a local docker container:
and load it with data saved in S3:
![Page 141: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/141.jpg)
SLAB: SLA Boards That Say A Lot
![Page 142: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/142.jpg)
Wait, what about Opera and Vizatra?
Didn't you guys do a lot of work on that?
![Page 143: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/143.jpg)
A Quick Opera Recap
Opera is the internal replacement for CPOP, built in two parts:
A scalding-langoustine data pipeline:
And a vizatra-OLAP web app:
![Page 144: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/144.jpg)
We learned a lot from building Opera
• How to use SQL to describe a dashboard
• How to master SQL queries executed from an OLAP app
• How to build big, fast databases
• How to build optimal (or so we think) data processing pipelines
• How to make a decent UI with decent UX
![Page 145: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/145.jpg)
Let's focus on the SQL stuff
![Page 146: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/146.jpg)
Using SQL for dashboard meta-data
SELECT time_id as hour,
       country_code as country,
       network_id as network,
       SUM(clicks) as clicks,
       SUM(displays) as displays,
       SUM(clicks) / SUM(displays) as ctr
FROM facts
WHERE time_id BETWEEN ?start AND ?end
GROUP BY time_id, country_code, network_id
Time dimensions
Dimensions
Metrics
Parameters
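One way to read this slide is that a single annotated SQL query carries enough information to describe a dashboard. A rough sketch of that extraction follows. It assumes the simple `expr AS alias` shape of the query above, treats the `time_id` column as the time dimension, and treats any aggregate call as a metric. The `describe` helper and its regular expressions are hypothetical and are not Opera's actual parser.

```python
import re

QUERY = """
SELECT time_id AS hour, country_code AS country, network_id AS network,
       SUM(clicks) AS clicks, SUM(displays) AS displays,
       SUM(clicks) / SUM(displays) AS ctr
FROM facts
WHERE time_id BETWEEN ?start AND ?end
GROUP BY time_id, country_code, network_id
"""

AGGREGATE = re.compile(r"\b(SUM|COUNT|AVG|MIN|MAX)\s*\(", re.IGNORECASE)

def describe(query, time_column="time_id"):
    """Split a dashboard query into time dimensions, dimensions, metrics, parameters."""
    select_list = re.search(
        r"SELECT(.*?)\bFROM\b", query, re.IGNORECASE | re.DOTALL
    ).group(1)
    meta = {"time_dimensions": [], "dimensions": [], "metrics": [], "parameters": []}
    # Splitting on commas is only safe because no expression here contains one.
    for item in select_list.split(","):
        expr, alias = re.split(r"\s+AS\s+", item.strip(), flags=re.IGNORECASE)
        if AGGREGATE.search(expr):
            meta["metrics"].append(alias)
        elif expr == time_column:
            meta["time_dimensions"].append(alias)
        else:
            meta["dimensions"].append(alias)
    # Named parameters are written as ?name in the query text.
    meta["parameters"] = re.findall(r"\?(\w+)", query)
    return meta

print(describe(QUERY))
```

A real implementation would use a proper SQL parser rather than regular expressions, but the classification rule (grouped columns become dimensions, aggregates become metrics, placeholders become parameters) stays the same.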
![Page 147: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/147.jpg)
Using SQL for dashboard meta-data
Time dimension
Dimensions
Metrics
Parameters
![Page 148: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/148.jpg)
Big-O(lap)
SELECT time_id as hour,
       country_code as country,
       network_id as network,
       SUM(clicks) as clicks,
       SUM(displays) as displays,
       SUM(clicks) / SUM(displays) as ctr
FROM facts
WHERE time_id BETWEEN ?start AND ?end
GROUP BY time_id, country_code, network_id
PROJECTION: Revenue by country
SELECTION: Last 7 days in EUR
![Page 149: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/149.jpg)
![Page 150: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/150.jpg)
Big-O(lap)
SELECT country_code as country,
       SUM(clicks) as clicks,
       SUM(displays) as displays
FROM facts
WHERE time_id BETWEEN '2016-03-01' AND '2016-03-07'
GROUP BY country_code
PROJECTION: Revenue by country
SELECTION: Last 7 days in EUR
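The step from the full dashboard query to the narrower one above can be sketched as query generation: keep only the dimensions and metrics the user asked for (the projection) and bind the parameters (the selection). This is an illustrative sketch, not Opera's implementation. The `DIMENSIONS` and `METRICS` tables and the `build_query` helper are assumptions made up for this example, and the sketch returns positional placeholders plus values so that the database driver does the binding.

```python
# Hypothetical metadata, in the spirit of the annotated dashboard query.
DIMENSIONS = {"hour": "time_id", "country": "country_code", "network": "network_id"}
METRICS = {"clicks": "SUM(clicks)", "displays": "SUM(displays)"}

def build_query(dimensions, metrics, start, end):
    """Build a query restricted to the requested dimensions and metrics.

    Returns the SQL text with positional placeholders plus the values to bind.
    """
    columns = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    columns += [f"{METRICS[m]} AS {m}" for m in metrics]
    sql = (
        f"SELECT {', '.join(columns)} "
        "FROM facts "
        "WHERE time_id BETWEEN ? AND ?"
    )
    if dimensions:
        # Group only by what is displayed, so the database aggregates as much as it can.
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dimensions)
    return sql, (start, end)

sql, params = build_query(["country"], ["clicks", "displays"], "2016-03-01", "2016-03-07")
print(sql)
print(params)
```

Dropping the unused `time_id` and `network_id` grouping columns is what shrinks the result: the database returns one row per country instead of one row per hour, country, and network.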
![Page 151: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/151.jpg)
Now that we've gotten intimate with SQL...
Let's see what else we can build...
![Page 152: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/152.jpg)
Vizatra Client: One DB Client to Rule Them All
![Page 153: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/153.jpg)
Vizatra Client: One DB Client to Rule Them All
• Parse every query and analyze complexity before executing it
• Enforce best practices (e.g. predicates on partitions)
• Degrade gracefully (e.g. don't submit queries to an overloaded DB)
• Score users and queries, share with other users
• Provide basic visualizations to increase analytic productivity
• Support non-SQL datasources
• And your feature?
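Two of the ideas above, enforcing a predicate on the partition column and refusing to submit work to an overloaded database, can be sketched as a pre-flight check. Everything here is hypothetical: the `preflight` function, the `time_id` partition column, and the load threshold are illustrative, and the regular-expression check is a crude stand-in for the real SQL parsing the slide calls for.

```python
import re

def has_partition_predicate(sql, partition_column="time_id"):
    """Crude check: does the WHERE clause mention the partition column?"""
    match = re.search(
        r"\bWHERE\b(.*?)(\bGROUP BY\b|\bORDER BY\b|\bLIMIT\b|$)",
        sql,
        re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        return False
    where_clause = match.group(1)
    return re.search(rf"\b{re.escape(partition_column)}\b", where_clause) is not None

def preflight(sql, running_queries, max_running=50):
    """Return (accepted, reason) before a query is sent to the database."""
    # Degrade gracefully: do not pile more work onto a busy database.
    if running_queries >= max_running:
        return False, "database is overloaded, try again later"
    # Enforce best practices: require a predicate on the partition column.
    if not has_partition_predicate(sql):
        return False, "add a predicate on the partition column"
    return True, "ok"

good = "SELECT SUM(clicks) FROM facts WHERE time_id BETWEEN 1 AND 7 GROUP BY country_code"
bad = "SELECT SUM(clicks) FROM facts GROUP BY time_id"
print(preflight(good, running_queries=3))
print(preflight(bad, running_queries=3))
```

Returning a reason alongside the verdict matters for the training goal mentioned earlier in the talk: a rejected query is only useful to an analyst if the client says what to change.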
![Page 154: Criteo Infrastructure (Platform) Meetup](https://reader035.vdocuments.mx/reader035/viewer/2022062310/58b8a35a1a28abc06d8b58ad/html5/thumbnails/154.jpg)
The End.
Thanks for listening. If any of this sounds fun, we're hiring!