cloudcamp chicago lightning talk "building warehousing systems on redshift" - tristan...

Redshift: Lessons Learned

Tristan Crockett – Software Engineer, Edgeflip

Basics

● Analytical database

● PostgreSQL with column storage engine

● Automatic data compression

● No traditional indexes; specify a sort key (how are records in the table sorted?) and a distribution key (which node will house a record?)
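
As an illustration of the sort key and distribution key, here is a minimal sketch that builds a Redshift CREATE TABLE statement. The table name `posts`, its columns, and the helper `create_table_sql` are hypothetical and are not taken from the talk.

```python
def create_table_sql(table, columns, distkey, sortkeys):
    """Build a Redshift CREATE TABLE statement with a distribution key and a sort key.

    columns is a list of (name, type) pairs; all names here are illustrative.
    """
    col_defs = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n    {col_defs}\n)\n"
        f"DISTKEY ({distkey})\n"                 # which node will house a record
        f"SORTKEY ({', '.join(sortkeys)});"      # how records are sorted on disk
    )


# Hypothetical feed table: distribute by user, sort by user and time.
ddl = create_table_sql(
    "posts",
    [("user_id", "BIGINT"), ("post_id", "BIGINT"), ("created_at", "TIMESTAMP")],
    distkey="user_id",
    sortkeys=["user_id", "created_at"],
)
print(ddl)
```

Distributing on the column most often used in joins keeps matching rows on the same node, and sorting on the columns most often used in filters lets Redshift skip blocks it does not need to read.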

My Work with Redshift

● Data warehouse for Facebook user feeds and related app data

● Inputs

– RDS (MySQL)

– DynamoDB

– Facebook

● Stats

– ~2 TB of compressed data

– Two main tables, ~5 billion and ~25 billion rows respectively

Advantages / Disadvantages

● Fast at copying data in from S3

● Fast at computing aggregate/analytical functions over an entire table

● Decent at intra-db operations (create table as select, insert into select)

● Most everything else is slow

● Without traditional indexes, table design isn't as flexible

Lessons/Tips

● Optimize load size (1 MB to 1 GB per file)

● Compress input

● Upsert when needed, and always vacuum

● Don't populate tables with 'CREATE TABLE AS' if you like compression (which you do)

● To avoid complicated joins, consider computing single-table aggregates and joining on the results
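
The first two tips can be sketched as a small pre-load step: split the input into bounded parts, gzip each part, and point COPY at the common S3 prefix. This is a rough sketch only. The helper names, the `|` delimiter, the bucket path, and the IAM role are assumptions, and for simplicity the size limit is measured on uncompressed bytes.

```python
import gzip
import os


def write_gzip_parts(lines, out_dir, prefix, max_bytes):
    """Write lines into gzip files of roughly max_bytes (uncompressed) each.

    A single line larger than max_bytes still gets a file of its own.
    Returns the list of file paths written, in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths, size, handle = [], 0, None
    for line in lines:
        data = line.encode("utf-8")
        # Start a new part when there is none yet or the current one is full.
        if handle is None or size + len(data) > max_bytes:
            if handle is not None:
                handle.close()
            path = os.path.join(out_dir, f"{prefix}.part{len(paths):04d}.gz")
            handle = gzip.open(path, "wb")
            paths.append(path)
            size = 0
        handle.write(data)
        size += len(data)
    if handle is not None:
        handle.close()
    return paths


def copy_sql(table, s3_prefix, iam_role):
    """Build a COPY statement that loads every gzipped part under an S3 prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' GZIP DELIMITER '|';"
    )


# Placeholder bucket and role, for illustration only.
print(copy_sql("posts", "s3://example-bucket/posts/",
               "arn:aws:iam::123456789012:role/example-role"))
```

Uploading the parts to S3 is left out; any S3 client can do it. Several similarly sized parts under one prefix let COPY load them in parallel across the cluster.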

Upsert
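
A common way to upsert on Redshift is the staging-table pattern: load the new rows into a temporary table, delete the matching rows from the target, insert the staged rows, and vacuum afterwards. The sketch below only builds the statements. It is not necessarily the code shown in the talk, and the table names, key columns, and `upsert_statements` helper are hypothetical.

```python
def upsert_statements(target, staging, key_columns, load_sql):
    """Return the SQL statements for a staging-table upsert, in execution order.

    load_sql is whatever statement fills the staging table (typically a COPY).
    """
    match = " AND ".join(
        f"{target}.{col} = {staging}.{col}" for col in key_columns
    )
    return [
        f"CREATE TEMP TABLE {staging} (LIKE {target});",
        load_sql,
        "BEGIN;",
        # Remove rows that are about to be replaced by staged versions.
        f"DELETE FROM {target} USING {staging} WHERE {match};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "COMMIT;",
        f"DROP TABLE {staging};",
        # VACUUM cannot run inside a transaction, so it comes after COMMIT.
        f"VACUUM {target};",
    ]


statements = upsert_statements(
    "posts",
    "posts_staging",
    ["user_id", "post_id"],
    "COPY posts_staging FROM 's3://example-bucket/posts/' "
    "IAM_ROLE 'arn:aws:iam::123456789012:role/example-role' GZIP;",
)
for statement in statements:
    print(statement)
```

The delete and insert share one transaction so readers never see the target with rows missing. The vacuum afterwards reclaims the space left by deleted rows and restores sort order.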

Keep an Eye on Compression
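
One way to act on the 'CREATE TABLE AS' tip is to declare the table with explicit column encodings, fill it with INSERT INTO ... SELECT, and then check what encodings the columns actually have. The sketch below builds those statements. The table, columns, chosen encodings, and helper names are illustrative assumptions, and Redshift's handling of 'CREATE TABLE AS' may have changed since the talk.

```python
def create_encoded_table_sql(table, columns):
    """Build a CREATE TABLE statement with an explicit ENCODE per column.

    columns is a list of (name, type, encoding) tuples.
    """
    col_defs = ",\n    ".join(
        f"{name} {ctype} ENCODE {encoding}" for name, ctype, encoding in columns
    )
    return f"CREATE TABLE {table} (\n    {col_defs}\n);"


def populate_sql(table, select_sql):
    """Fill an existing (already encoded) table instead of using CREATE TABLE AS."""
    return f"INSERT INTO {table} {select_sql};"


def encoding_report_sql(table):
    """List each column's current encoding from the pg_table_def catalog view."""
    return (
        'SELECT "column", type, encoding FROM pg_table_def '
        f"WHERE tablename = '{table}';"
    )


def analyze_compression_sql(table):
    """Ask Redshift to suggest encodings based on a sample of the table's data."""
    return f"ANALYZE COMPRESSION {table};"


# Hypothetical summary table built from a larger posts table.
create_sql = create_encoded_table_sql(
    "post_summary",
    [("user_id", "BIGINT", "delta"), ("message", "VARCHAR(1000)", "lzo")],
)
fill_sql = populate_sql("post_summary", "SELECT user_id, message FROM posts")
print(create_sql)
print(fill_sql)
print(encoding_report_sql("post_summary"))
print(analyze_compression_sql("post_summary"))
```

Note that pg_table_def only lists tables in schemas on the current search path.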

Single-Table Aggregates
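
The idea is to aggregate each large table on its own and join only the small result sets, rather than joining the raw tables and aggregating afterwards. The sketch below runs the query shape against an in-memory SQLite database as a stand-in for Redshift. The `posts` and `likes` tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (user_id INTEGER, post_id INTEGER);
CREATE TABLE likes (user_id INTEGER, like_id INTEGER);
INSERT INTO posts VALUES (1, 10), (1, 11), (2, 20);
INSERT INTO likes VALUES (1, 100), (2, 200), (2, 201), (2, 202);
""")

# Each subquery scans exactly one table and reduces it to one row per user;
# the join then works on the aggregated results, not on the raw rows.
query = """
SELECT p.user_id, p.post_count, l.like_count
FROM (SELECT user_id, COUNT(*) AS post_count
      FROM posts GROUP BY user_id) AS p
JOIN (SELECT user_id, COUNT(*) AS like_count
      FROM likes GROUP BY user_id) AS l
  ON p.user_id = l.user_id
ORDER BY p.user_id;
"""

rows = conn.execute(query).fetchall()
print(rows)  # [(1, 2, 1), (2, 1, 3)]
```

Joining the raw tables on user_id first would multiply every post row by every like row for the same user, which inflates both the intermediate result and the counts. Aggregating per table first avoids that.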

Thanks for Listening!

[email protected]

@thcrock
