![Page 1: CloudCamp Chicago lightning talk "Building warehousing systems on Redshift" - Tristan Crockett, Software Engineer at Edgeflip @thcrock](https://reader035.vdocuments.mx/reader035/viewer/2022071820/55b562d0bb61eb4a4c8b47ec/html5/thumbnails/1.jpg)
Redshift: Lessons Learned
Tristan Crockett – Software Engineer, Edgeflip
Basics
● Analytical database
● PostgreSQL with column storage engine
● Automatic data compression
● No traditional indexes; specify a sort key (how are records in the table sorted?) and distribution key (which node will house a record?)
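As a rough illustration of the last bullet, the sketch below builds a Redshift-style `CREATE TABLE` statement that declares a distribution key and a sort key. The helper `build_create_table`, the `posts` table, and its columns are hypothetical and not taken from the talk; only the `DISTKEY` and `SORTKEY` clauses are Redshift syntax.

```python
def build_create_table(table, columns, dist_key, sort_keys):
    """Return a Redshift-style CREATE TABLE statement with DISTKEY and SORTKEY.

    `columns` is a list of (name, sql_type) pairs. All names here are
    illustrative assumptions, not the schema used in the talk.
    """
    names = [name for name, _ in columns]
    for key in [dist_key, *sort_keys]:
        if key not in names:
            raise ValueError(f"unknown column: {key}")
    col_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {col_sql}\n)\n"
        f"DISTKEY({dist_key})\n"                 # which node houses a record
        f"SORTKEY({', '.join(sort_keys)});"      # how records are ordered on disk
    )


ddl = build_create_table(
    "posts",
    [("user_id", "BIGINT"), ("post_id", "BIGINT"), ("created_at", "TIMESTAMP")],
    dist_key="user_id",
    sort_keys=["user_id", "created_at"],
)
print(ddl)
```

Distributing and sorting on the column most often used in joins and range filters is one common choice, but the right keys depend on the query workload.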
My Work with Redshift
● Data warehouse for Facebook user feeds and related app data
● Inputs
  – RDS (MySQL)
  – DynamoDB
● Stats
  – ~2 TB of compressed data
  – Two main tables, ~5 billion and ~25 billion rows respectively
Advantages / Disadvantages
● Fast at copying data in from S3
● Fast at computing aggregate/analytical functions over an entire table
● Decent at intra-db operations (create table as select, insert into select)
● Most everything else is slow
● Without traditional indexes, table design isn't as flexible
Lessons/Tips
● Optimize load size (1 MB to 1 GB per file)
● Compress input
● Upsert when needed, and always vacuum
● Don't populate tables with 'CREATE TABLE AS' if you like compression (which you do)
● To avoid complicated joins, consider computing single-table aggregates and joining on the results
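The first two tips (bounded file sizes, compressed input) can be sketched as a small pre-load step that splits rows into several gzip files. This is a minimal sketch, not the pipeline from the talk: the helper `write_gzip_parts`, the file naming, and the 64 MiB default are assumptions, and the size limit is measured on uncompressed bytes for simplicity.

```python
import gzip
import os
import tempfile


def write_gzip_parts(lines, out_dir, max_raw_bytes=64 * 1024 * 1024, prefix="part"):
    """Write lines into numbered gzip files.

    A new file is started once adding a line would push the uncompressed
    size of the current file past max_raw_bytes. Returns the file paths.
    """
    paths = []
    handle = None
    written = 0
    for line in lines:
        data = (line.rstrip("\n") + "\n").encode("utf-8")
        if handle is None or (written > 0 and written + len(data) > max_raw_bytes):
            if handle is not None:
                handle.close()
            path = os.path.join(out_dir, f"{prefix}-{len(paths):05d}.gz")
            handle = gzip.open(path, "wb")
            paths.append(path)
            written = 0
        handle.write(data)
        written += len(data)
    if handle is not None:
        handle.close()
    return paths


# Small demonstration with a tiny limit so that several files are produced.
with tempfile.TemporaryDirectory() as tmp:
    rows = [f"{i}|user{i}" for i in range(1000)]
    parts = write_gzip_parts(rows, tmp, max_raw_bytes=4096)
    print(len(parts), "files written")
```

Files produced this way would then be uploaded to S3 and loaded with Redshift's `COPY` command using its `GZIP` option.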
Upsert
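The code shown on this slide did not survive extraction. What follows is a minimal sketch of one common upsert pattern, a staging-table merge: load new rows into a staging table, delete the matching rows from the target, then insert everything from staging, all in one transaction. It runs against an in-memory SQLite database purely so it can be executed locally; the table names and the helper `upsert_from_staging` are hypothetical, and on Redshift the delete step is usually written as `DELETE FROM target USING staging WHERE ...`.

```python
import sqlite3


def upsert_from_staging(conn, target, staging, key):
    """Replace target rows whose key appears in staging, then append all staged rows."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            f"DELETE FROM {target} WHERE {key} IN (SELECT {key} FROM {staging})"
        )
        conn.execute(f"INSERT INTO {target} SELECT * FROM {staging}")
        conn.execute(f"DELETE FROM {staging}")  # empty staging for the next load


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE users_staging (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany(
    "INSERT INTO users_staging VALUES (?, ?)", [(2, "robert"), (3, "cat")]
)
conn.commit()

upsert_from_staging(conn, "users", "users_staging", "id")
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'ann'), (2, 'robert'), (3, 'cat')]
```

On Redshift, deleted rows keep occupying space until the table is vacuumed, which is why the earlier tip pairs upserts with `VACUUM`.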
Keep an Eye on Compression
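The content of this slide is also missing from the extracted text. As a hedged sketch of the tip about avoiding 'CREATE TABLE AS', the helpers below generate the statements for one alternative: create the new table with `CREATE TABLE ... (LIKE ...)`, which in Redshift inherits the parent table's column encodings, then fill it with `INSERT INTO ... SELECT`. A second helper builds a query against `pg_table_def` to inspect the encoding of each column. The function names and the `events` tables are hypothetical, and this may not be the approach the speaker showed.

```python
def compression_safe_copy(source, target, select_sql=None):
    """Return statements that create `target` with `source`'s column encodings
    and then populate it, instead of using CREATE TABLE AS."""
    select_sql = select_sql or f"SELECT * FROM {source}"
    return [
        f"CREATE TABLE {target} (LIKE {source});",
        f"INSERT INTO {target} {select_sql};",
    ]


def encoding_report_sql(table):
    """Return a query listing each column's type and compression encoding."""
    return (
        'SELECT "column", type, encoding '
        f"FROM pg_table_def WHERE tablename = '{table}';"
    )


for statement in compression_safe_copy("events", "events_new"):
    print(statement)
print(encoding_report_sql("events_new"))
```

Running the report query after a load is a quick way to spot columns that ended up with no encoding.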
Single-Table Aggregates
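The query from this slide is not in the extracted text. A minimal sketch of the idea, under assumed table names (`posts`, `likes`) and run on in-memory SQLite for convenience: aggregate each table down to one row per user first, then join the two small result sets, rather than joining the raw tables and aggregating afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (user_id INTEGER, post_id INTEGER);
    CREATE TABLE likes (user_id INTEGER, like_id INTEGER);
    INSERT INTO posts VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO likes VALUES (1, 100), (1, 101), (1, 102), (2, 103);
    """
)

# Each subquery scans a single table and yields one row per user;
# the join then only has to match those pre-aggregated rows.
query = """
SELECT p.user_id, p.post_count, l.like_count
FROM (SELECT user_id, COUNT(*) AS post_count FROM posts GROUP BY user_id) AS p
JOIN (SELECT user_id, COUNT(*) AS like_count FROM likes GROUP BY user_id) AS l
  ON l.user_id = p.user_id
ORDER BY p.user_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 2, 3), (2, 1, 1)]
```

Joining the raw tables on `user_id` before counting would multiply rows per user (2 posts x 3 likes = 6 rows for user 1), which inflates both the intermediate result and the counts.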