IronSource Atom - Redshift - Lessons Learned
![Page 1: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/1.jpg)
All content is the property and proprietary interest of CloudZone, The removal of any proprietary notices, including attribution information, is strictly prohibited.
![Page 2: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/2.jpg)
Big Data Month 2016 – Up Next…
Event dates: 14.11, 15.11, 22.11, 22.11, 28.11, 30.11
![Page 3: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/3.jpg)
Master AWS Redshift - Agenda
13:00 – 13:20 Intro to Amazon Redshift by IronSource
13:20 – 15:00 LAB I – Using Amazon Redshift
15:00 – 15:15 Break
15:15 – 17:25 LAB II – Table Layout and Schema Design with Amazon Redshift
17:25 – 17:30 Your next steps on AWS by CloudZone
![Page 4: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/4.jpg)
Shimon Tolts, General Manager, Data Solutions
Atom Data Pipeline: Processing 200B events with Node.js and Docker on AWS
![Page 5: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/5.jpg)
About ironSource: Hypergrowth
● 800M People Reached Each Month
● 4200 Apps Installed Every Minute with the ironSource Platform
● 200B Registered & Analyzed Data Events Every Month
[Chart: monthly volume, Jun 2015 to May 2016; y-axis from 0 to 200B]
![Page 6: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/6.jpg)
Our Business Challenge
We needed a way to manage this data:
Collect, Process, Store
![Page 7: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/7.jpg)
![Page 8: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/8.jpg)
Collection
● Multi-region layer - latency-based routing
● Low latency from client to Atom servers
● High availability - AWS regions do fail!
● Storing raw data + headers upon receipt
![Page 9: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/9.jpg)
Data Enrichment
● Enrich data before storing it in your Data Lake and/or Warehouse
  ○ IP to Country
  ○ Currency conversion
  ○ Decrypt data
  ○ User Agent parsing - OS, Browser, Device...
● Any custom logic you would like! - fully extensible
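The enrichment steps listed above can be pictured as a chain of small functions applied to each event before it is stored. The following is only a minimal sketch under stated assumptions: the lookup tables, field names (`ip`, `amount`, `currency`, `user_agent`) and function names are hypothetical and are not taken from Atom itself.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Enricher = Callable[[Event], Event]

# Hypothetical lookup tables. A real deployment would use a GeoIP database
# and a live exchange-rate source instead of these hard-coded values.
IP_PREFIX_TO_COUNTRY = {"203.0.113.": "AU", "198.51.100.": "US"}
USD_RATES = {"USD": 1.0, "EUR": 1.1}


def ip_to_country(event: Event) -> Event:
    """Add a 'country' field based on a (toy) IP prefix lookup."""
    ip = str(event.get("ip", ""))
    country = next(
        (c for prefix, c in IP_PREFIX_TO_COUNTRY.items() if ip.startswith(prefix)),
        "unknown",
    )
    return {**event, "country": country}


def convert_currency(event: Event) -> Event:
    """Add 'amount_usd' when the event carries an amount in a known currency."""
    if "amount" not in event:
        return event
    rate = USD_RATES.get(str(event.get("currency", "USD")))
    if rate is None:
        return event  # unknown currency: leave the event untouched
    return {**event, "amount_usd": round(float(event["amount"]) * rate, 2)}


def parse_user_agent(event: Event) -> Event:
    """Add a coarse 'os' field using naive substring checks."""
    ua = str(event.get("user_agent", ""))
    if "Android" in ua:
        os_name = "Android"
    elif "iPhone" in ua or "iPad" in ua:
        os_name = "iOS"
    elif "Windows" in ua:
        os_name = "Windows"
    else:
        os_name = "other"
    return {**event, "os": os_name}


def enrich(event: Event, enrichers: List[Enricher]) -> Event:
    """Run the event through each enrichment step in order."""
    for step in enrichers:
        event = step(event)
    return event
```

Custom logic fits the same shape: any function that takes an event and returns an event can be appended to the list passed to `enrich`.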
![Page 10: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/10.jpg)
Data Targets
● Near real-time data insertion - 1 minute!
● Stream data to Google Storage and/or AWS S3
● Smart insertion of data into AWS Redshift
  ○ Set the number of parallel COPYs
  ○ Configure priority on tables
● BigQuery - Streaming data using batch file imports (saves 20% cost)
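One way to read "set the number of parallel COPYs" and "configure priority on tables" is as a priority queue that releases at most a fixed number of loads at a time. The sketch below is an illustration of that idea only, not Atom's actual implementation; the class name, the default priority of 100 and the example S3 paths are all assumptions.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class CopyQueue:
    """Hypothetical COPY scheduler: bounded parallelism plus per-table priority."""

    def __init__(self, max_parallel: int, table_priority: Dict[str, int]) -> None:
        if max_parallel < 1:
            raise ValueError("max_parallel must be at least 1")
        self.max_parallel = max_parallel
        self.table_priority = table_priority
        self._heap: List[Tuple[int, int, str, str]] = []
        # Monotonic counter used as a tie-breaker so equal priorities stay FIFO.
        self._counter = itertools.count()

    def submit(self, table: str, s3_path: str) -> None:
        """Queue one pending load; a lower priority number runs earlier."""
        priority = self.table_priority.get(table, 100)  # assumed default
        heapq.heappush(self._heap, (priority, next(self._counter), table, s3_path))

    def next_batch(self) -> List[Tuple[str, str]]:
        """Pop up to max_parallel loads that may run concurrently."""
        batch: List[Tuple[str, str]] = []
        while self._heap and len(batch) < self.max_parallel:
            _, _, table, s3_path = heapq.heappop(self._heap)
            batch.append((table, s3_path))
        return batch


queue = CopyQueue(max_parallel=2, table_priority={"revenue": 0, "clicks": 5})
queue.submit("clicks", "s3://example-bucket/clicks/batch-1.json.gz")
queue.submit("logs", "s3://example-bucket/logs/batch-1.json.gz")
queue.submit("revenue", "s3://example-bucket/revenue/batch-1.json.gz")
first_wave = queue.next_batch()   # revenue first, then clicks
second_wave = queue.next_batch()  # logs, which has the default priority
```

A worker would issue one COPY per entry in a batch and call `next_batch` again only after those loads finish, which keeps the cluster from running more concurrent COPYs than configured.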
![Page 11: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/11.jpg)
![Page 12: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/12.jpg)
Micro-Services Architecture
● Everything is a service
● Decoupling
● Distributed systems
● Separate lifecycle
● Communication using RESTful / Queue / Streams
![Page 13: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/13.jpg)
Docker
● Linux Container
● Save provisioning time
● Infrastructure as code
● Dev-Test-Production - identical container
● Ship easily
![Page 14: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/14.jpg)
Cloud infrastructure
● Pay as you go - (grow)
● SaaS services
● Auto-scaling-groups
● DynamoDB
● RDS *SQL
● Redshift data warehouse
![Page 15: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/15.jpg)
Continuous Integration
● From commit to production
● Jenkins commit hook
● Git branching model
● AWS dynamic slaves
● Unit tests
● Docker builds
● Updating live environment
![Page 16: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/16.jpg)
Diagram
![Page 18: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/18.jpg)
STARTING POINT
● Xplenty - Hadoop service - ~40 min query
● One big cluster - 96 xlarge nodes
● No WLM configuration
● CSV copy
● No reserved nodes
● Different ETL process implemented by every department
![Page 21: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/21.jpg)
SOLUTION:
● Using 8xl nodes if needed
● Redshift cluster per department
● “Hot and cold” clusters - SSD: fast and furious, HDD: slow but cheap
● WLM configuration
● Reserved Nodes
● JSON copy
● One pipeline to rule them all - ironBeast - currently supporting over 50B events per month, inserting data into more than 10 Redshift clusters
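For the "JSON copy" item, a Redshift load from S3 can be expressed as a COPY statement with `FORMAT AS JSON`. The helper below is a small sketch that only builds the SQL text; the function name, table name, bucket and IAM role ARN are placeholders, and the deck does not say which COPY options were actually used.

```python
from typing import Optional


def build_json_copy(
    table: str,
    s3_prefix: str,
    iam_role: str,
    jsonpaths: Optional[str] = None,
    gzip: bool = True,
) -> str:
    """Return a Redshift COPY statement that loads JSON files from S3.

    With no jsonpaths file, 'auto' maps JSON keys to column names.
    Values are interpolated as-is, so only pass trusted identifiers.
    """
    json_option = jsonpaths if jsonpaths else "auto"
    parts = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",
        f"IAM_ROLE '{iam_role}'",
        f"FORMAT AS JSON '{json_option}'",
    ]
    if gzip:
        parts.append("GZIP")  # input files are gzip-compressed
    return "\n".join(parts) + ";"


# Placeholder values for illustration only.
statement = build_json_copy(
    table="analytics.events",
    s3_prefix="s3://example-bucket/events/2016/11/",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
```

The resulting string would be executed through whatever SQL client the pipeline already uses.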
![Page 23: IronSource Atom - Redshift - Lessons Learned](https://reader036.vdocuments.mx/reader036/viewer/2022062523/586fd91a1a28ab18428b5835/html5/thumbnails/23.jpg)
THINGS WE LEARNED ALONG THE WAY
● https://github.com/awslabs/amazon-redshift-utils (AdminViews)
● User permissions do not apply to new tables created in a schema
● Vacuum Vacuum Vacuum
● Avoid parallel inserts (especially on 8xl nodes) - if you copy to multiple tables, it is better to implement a COPY queue
● STL_LOAD_ERRORS - money on the floor
● A columnar datastore does not mean you can use as many columns as you want - it is better to split into multiple tables
● Encode your columns - ‘analyze compression’
● Instances that query Redshift should use MTU 1500 - link
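Several of these lessons correspond to plain SQL statements that can be run on a schedule. The sketch below collects them as strings; the schema, table and group names are hypothetical, and the statements are a reasonable starting point rather than the exact commands used by the presenters.

```python
from typing import Dict


def maintenance_sql(schema: str, table: str, reader_group: str) -> Dict[str, str]:
    """Build Redshift statements matching the lessons listed above.

    All identifiers are interpolated as-is, so only pass trusted names.
    """
    qualified = f"{schema}.{table}"
    return {
        # Grants on existing tables do not cover tables created later.
        # Default privileges apply to tables created afterwards by the
        # user who runs this statement.
        "default_privileges": (
            f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} "
            f"GRANT SELECT ON TABLES TO GROUP {reader_group};"
        ),
        # Reclaim space and re-sort rows after heavy loads or deletes.
        "vacuum": f"VACUUM {qualified};",
        # Ask Redshift to suggest a column encoding for each column.
        "analyze_compression": f"ANALYZE COMPRESSION {qualified};",
        # Inspect the most recent rows rejected by COPY.
        "load_errors": (
            "SELECT starttime, filename, line_number, colname, err_code, err_reason "
            "FROM stl_load_errors ORDER BY starttime DESC LIMIT 20;"
        ),
    }


# Hypothetical names for illustration only.
statements = maintenance_sql("analytics", "events", "readers")
```

Checking `stl_load_errors` after every load is what turns the "money on the floor" into recovered rows, since rejected records otherwise go unnoticed.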