![Page 1: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/1.jpg)
The Evolution of Netflix’s S3 Data Warehouse
Ryan Blue & Dan WeeksSeptember 2018 - Strata NY
![Page 2: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/2.jpg)
Overview
● Netflix Architecture
● S3 Data Warehouse
● Iceberg Tables
● What’s Next
![Page 3: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/3.jpg)
Netflix Architecture
September 2018 - Strata NY
![Page 4: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/4.jpg)
Cloud native data warehouse
![Page 5: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/5.jpg)
● Separate Compute and Storage
● Isolate Different Workloads
● Single Source of Truth
Architectural Principles
![Page 6: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/6.jpg)
● S3 as storage layer○ Metadata in Hive Metastore
● EC2 as compute layer○ Hadoop + YARN
● Spark, Presto (and a little Hive and Pig)
Tech Stack
![Page 7: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/7.jpg)
S3 Data Warehouse
September 2018 - Strata NY
![Page 8: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/8.jpg)
Hadoop file system compatibility with S3
![Page 9: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/9.jpg)
S3HDFScreate()
open()
listStatus()
delete()
rename()
REST.PUT.OBJECT
REST.GET.OBJECT
REST.GET.BUCKET
REST.DELETE.OBJECT
REST.COPY.OBJECT + REST.DELETE.OBJECT
S3 as a File System
![Page 10: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/10.jpg)
What about performance?
![Page 11: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/11.jpg)
● Performance○ Individual operations take longer○ Some operations do not map cleanly○ Break contracts to optimize
● Commit Path○ Relies on expensive rename○ Creates multiple copies with versioning
Performance & Compatibility
![Page 12: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/12.jpg)
● The “batch” pattern○ Never delete data as part of a job○ Always write data to new paths○ Atomically swap data locations
● S3 Committer○ Use features like multi-part upload○ Allows for “append” support
Optimizing Commits
![Page 13: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/13.jpg)
What about consistency?
![Page 14: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/14.jpg)
● Overlay a consistent view of metadata○ Track file system metadata externally○ Expire old metadata and rely on S3
● Check listings against consistent system○ Fail or delay until view is consistent○ Manually resolve collisions
Consistent Listing (s3mper)
![Page 15: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/15.jpg)
● Maintenance Cost is High○ Custom changes per execution engine○ Never implemented in Presto or Hive○ Behaviors differ slightly by implementation
● Platform issues are surfaced to users○ Append is not atomic○ Automatic overwrite○ Table operations can be inconsistent
Challenges
![Page 16: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/16.jpg)
● File System○ Works around differences in behaviors○ Trades correctness for fewer S3 calls
● s3mper○ Works around S3 prefix-listing inconsistency
● S3 committers and Batch Pattern○ Works around lack of atomic changes to file listings○ Works around lack of cheap rename in S3○ Needed to avoid using S3 file system for silly operations
Common Threads
![Page 17: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/17.jpg)
Maybe the problem is usingS3 as a file system?
![Page 18: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/18.jpg)
Why are we using S3 this way?
![Page 19: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/19.jpg)
Iceberg
September 2018 - Strata NY
![Page 20: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/20.jpg)
● Key idea: organize data in a directory tree
date=20180513/ |- hour=18/ | |- ... |- hour=19/ | |- part-000.parquet
| |- ... | |- part-031.parquet |- hour=20/ | |- ... |- ...
Hive Table Design
![Page 21: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/21.jpg)
● Filter by directories as columnsSELECT ... WHERE date = '20180513' AND hour = 19
date=20180513/ |- hour=18/ | |- ... |- hour=19/ | |- part-000.parquet
| |- ... | |- part-031.parquet |- hour=20/ | |- ... |- ...
Hive Table Design
![Page 22: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/22.jpg)
● Table state is stored in two places○ Partitions in the Hive Metastore○ Files in a FS with no transaction support
● Still requires directory listing to plan jobs○ O(n) listing calls, n = # matching partitions○ Eventual consistency breaks correctness
● Requires elaborate locking for “correctness”○ Nothing respects the locking scheme
Design Problems
![Page 23: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/23.jpg)
● Key idea: track all files in a table over time
○ A snapshot is a complete list of files in a table○ Each write produces and commits a new snapshot
Iceberg’s Design
S1 S2 S3 ...
![Page 24: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/24.jpg)
● Snapshot isolation without locking○ Readers use a current snapshot○ Writers produce new snapshots in isolation, then commit
● Any change to the file list is an atomic operation○ Append data across partitions○ Merge or rewrite files
Snapshot Design Benefits
S1 S2 S3 ...
R W
![Page 25: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/25.jpg)
Design Benefits
● No expensive or eventually-consistent FS operations:○ No directory or prefix listing○ No rename: data files written in place
● Reads and writes are isolated and all changes are atomic
● Faster scan planning, distributed across the cluster○ O(1) manifest reads, not O(n) partition list calls○ Upper and lower bounds used to eliminate files
● Reliable CBO metrics
![Page 26: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/26.jpg)
Iceberg replaces s3mper, batch pattern, and S3 committers
![Page 27: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/27.jpg)
Want more specifics?
Come to the Iceberg talk!
At 5:25 today in 1E09
![Page 28: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/28.jpg)
What’s next?
September 2018 - Strata NY
![Page 29: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/29.jpg)
● New to Hadoop? Big data is great! Just remember . . .○ You need to know the physical layout of tables you read○ Make sure you don’t write too many files – or too few○ Appends are actually overwrites, except in Presto○ Don’t write from Presto (but nothing will stop you)○ You shouldn’t use timestamps or nested types○ You can’t drop columns in CSV tables○ And by CSV, we don’t really mean CSV○ You can’t rename columns in JSON tables○ If you rename columns in Parquet,
either Presto or Spark will work, but not both○ . . .
Today: A narrow paved path
![Page 30: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/30.jpg)
● Hidden partitioning○ Partition filters derived from data filters○ No more accidental full table scans
● Full schema evolution○ Supports add, drop, and rename columns
● Reliable support for types○ date, time, timestamp, and decimal○ struct, list, map, and mixed nesting
While we’re fixing tables . . .
![Page 31: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/31.jpg)
● Queries are not broken by layout changes
● Physical layout can evolve without painful migration○ Mistakes can be fixed○ Prototypes can move to production faster○ Tables can change as volume grows over time
● Data Platform can transparently fix table layout
Table Layout is Hidden
![Page 32: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/32.jpg)
● Any write is atomic – either complete or invisible○ Rewrite files instead of partitions○ Tables never have partially committed data
● Simple, built-in change detection○ Cache and materialized view maintenance○ Incremental processing
● Data Platform can monitor and fix data files○ Compact small files○ Repartition to a new layout
Snapshot-based Tables
![Page 33: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/33.jpg)
● Common implementation for table operations○ Write settings are per table, like row group size○ Read defaults are set in one place, like split combination
● Simple data gathering○ Log scan predicates and projection to Kafka○ Recommend optimizations based on actual use
● Data Platform can automate tuning recommendations○ Test file format tuning settings per table○ Update table to affect all writes
Table Format Library
![Page 34: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/34.jpg)
Questions?
September 2018 - Strata NY
![Page 35: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/35.jpg)
Additional Iceberg Slides
September 2018 - Strata NY
![Page 36: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/36.jpg)
● Historical Atlas data:○ Time-series metrics from Netflix runtime systems○ 1 month: 2.7 million files in 2,688 partitions○ Problem: cannot process more than a few days of data
● Sample query:
select distinct tags['type'] as typefrom iceberg.atlaswhere name = 'metric-name' and date > 20180222 and date <= 20180228order by type;
Case Study: Atlas
![Page 37: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/37.jpg)
● Hive table – with Parquet filters:○ 400k+ splits per day, not combined○ EXPLAIN query: 9.6 min (planning wall time)
● Iceberg table – partition data filtering:○ 15,218 splits, combined○ 13 min (wall time) / 61.5 hr (task time) / 10 sec (planning)
● Iceberg table – partition and min/max filtering:○ 412 splits○ 42 sec (wall time) / 22 min (task time) / 25 sec (planning)
Case Study: Atlas Performance
![Page 38: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/38.jpg)
● Implementation of snapshot-based tracking○ Adds table schema, partition layout, string properties○ Tracks old snapshots for eventual garbage collection
● Table metadata is immutable and always moves forward● The current snapshot (pointer) can be rolled back
Iceberg Metadata
v1.json
S1 S2
v2.json
S1 S2 S3
v3.json
S2 S3
![Page 39: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/39.jpg)
● Snapshots are split across one or more manifest files○ Manifests store partition data for each data file○ Reused to avoid high write volume
Manifest Files
v1.json
S1 S2
v2.json
S1 S2 S3
v3.json
S2 S3
m0.avro m1.avro m2.avro
![Page 40: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/40.jpg)
● Basic data file info:○ File location and format○ Iceberg tracking data
● Values to filter files for a scan:○ Partition data values○ Per-column lower and upper bounds
● Metrics for cost-based optimization:○ File-level: row count, size○ Column-level: value count, null count, size
Manifest File Contents
![Page 41: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/41.jpg)
● To commit, a writer must:○ Note the current metadata version – the base version○ Create new metadata and manifest files○ Atomically swap the base version for the new version
● This atomic swap ensures a linear history
● Atomic swap is implemented by:○ A custom metastore implementation○ Atomic rename for HDFS or local tables
Commits
![Page 42: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/42.jpg)
● Writers optimistically write new versions:○ Assume that no other writer is operating○ On conflict, retry based on the latest metadata
● To support retry, operations are structured as:○ Assumptions about the current table state○ Pending changes to the current table state
● Changes are safe if the assumptions are all true
Commits: Conflict Resolution
![Page 43: Data Warehouse The Evolution of Netflix’s S3 - O'Reilly...Data Warehouse Ryan Blue & Dan Weeks September 2018 - Strata NY Overview Netflix Architecture S3 Data Warehouse Iceberg](https://reader036.vdocuments.mx/reader036/viewer/2022071415/611146abf5c6391fa7109b75/html5/thumbnails/43.jpg)
● Use case: safely merge small files○ Merge input: file1.avro, file2.avro○ Merge output: merge1.parquet
● Rewrite operation:○ Assumption: file1.avro and file2.avro are still present○ Pending changes:
Remove file1.avro and file2.avroAdd merge1.parquet
● Deleting file1.avro or file2.avro will cause a commit failure
Commits: Resolution Example