massively distributed backups at facebook scale - shlomo priymak, facebook - devopsdays tel aviv...
Massively Distributed Backup at Facebook Scale
Shlomo Priymak ([email protected], @shlomoid)
Production Engineering Manager, MySQL Infrastructure
MySQL at Facebook
Sharding
fbid is a 64-bit integer
map(fbid) = shard id
Graph API example: graph.facebook.com/me
{ "id": "101231234567123", "name": "Shlomo Priymak" }
[Diagram: four server instances, each holding shard #1 to shard #4]
Master
Slaves
Replica Set
Prineville, Oregon
Altoona, Iowa
Forest City, North Carolina
Ashburn, Virginia
Luleå, Sweden
100
1000+
Backup Fundamentals
Full Dumps
• `mysqldump`
• `--single-transaction`
• Logical Read Ahead
Logical vs. Physical
                             Logical      Physical
External Tools               Yes          No
Size                         Small        Large
Single Table Restore         Easy         Difficult
Debug Corruption             Easy         Difficult
Compressibility              Excellent    Meh
Backup / Restore Duration    Long         Short
Differential Backup
[Chart: % of space taken by differential backups, Day 1 to Day 4]
[Chart: Relative backup space usage, Day 1 to Day 4; series: Full Backup, Differential Backup]
Differential Backup Generation
CREATE TABLE t (id int, city char(50)); /* ORDERING KEY : (id) */
Full Backup (old):
INSERT INTO t VALUES (1, 'San Francisco'), (2, 'Oakland'), (3, 'Menlo Park'), [...];
Full Backup (new):
INSERT INTO t VALUES (1, 'San Francisco'), (2, 'Santa Clara'), (400, 'Los Angeles'), [...];

id 1: No Change
Inserted Rows: (empty)
Deleted Rows: (empty)

id 2: Row Updated
Inserted Rows: INSERT INTO t1 VALUES (2, 'Santa Clara');
Deleted Rows: INSERT INTO t1 VALUES (2, 'Oakland');

id 3: Row Deleted
Inserted Rows: INSERT INTO t1 VALUES (2, 'Santa Clara');
Deleted Rows: INSERT INTO t1 VALUES (2, 'Oakland'), (3, 'Menlo Park');

id 400: Row Inserted
Inserted Rows: INSERT INTO t1 VALUES (2, 'Santa Clara'), (400, 'Los Angeles');
Deleted Rows: INSERT INTO t1 VALUES (2, 'Oakland'), (3, 'Menlo Park');
Final Output
Inserted Rows: INSERT INTO t1 VALUES (2, 'Santa Clara'), (400, 'Los Angeles');
Deleted Rows: INSERT INTO t1 VALUES (2, 'Oakland'), (3, 'Menlo Park');
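The walkthrough above amounts to a merge over two dumps that are both sorted by the ordering key. A minimal Python sketch of that idea follows; the function name `diff_sorted_dumps` and the tuple representation of rows are illustrative assumptions, not the tool used in the talk.

```python
from typing import Iterable, Iterator, Optional, Tuple

Row = Tuple[int, str]  # (id, city); both dumps are assumed sorted by id


def diff_sorted_dumps(old: Iterable[Row], new: Iterable[Row]) -> Iterator[Tuple[str, Row]]:
    """Walk two id-sorted dumps in lockstep and yield ('deleted'|'inserted', row)."""
    old_it, new_it = iter(old), iter(new)
    o: Optional[Row] = next(old_it, None)
    n: Optional[Row] = next(new_it, None)
    while o is not None or n is not None:
        if n is None or (o is not None and o[0] < n[0]):
            # Key exists only in the old dump: the row was deleted.
            yield ("deleted", o)
            o = next(old_it, None)
        elif o is None or n[0] < o[0]:
            # Key exists only in the new dump: the row was inserted.
            yield ("inserted", n)
            n = next(new_it, None)
        else:
            # Same key in both dumps: an update is recorded as delete + insert.
            if o != n:
                yield ("deleted", o)
                yield ("inserted", n)
            o = next(old_it, None)
            n = next(new_it, None)


old_full = [(1, "San Francisco"), (2, "Oakland"), (3, "Menlo Park")]
new_full = [(1, "San Francisco"), (2, "Santa Clara"), (400, "Los Angeles")]
changes = list(diff_sorted_dumps(old_full, new_full))
inserted_rows = [row for kind, row in changes if kind == "inserted"]
deleted_rows = [row for kind, row in changes if kind == "deleted"]
```

Because the function only ever holds one row from each input, it can consume both dumps as streams rather than loading them into memory.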
Restoring Diff Backup
Full Backup (old):
INSERT INTO t VALUES (1, 'San Francisco'), (2, 'Oakland'), (3, 'Menlo Park'), [...];
Inserted Rows:
INSERT INTO t1 VALUES (2, 'Santa Clara'), (400, 'Los Angeles');
Deleted Rows:
INSERT INTO t1 VALUES (2, 'Oakland'), (3, 'Menlo Park');
3-Way Merge
Full Backup (new):
INSERT INTO t VALUES (1, 'San Francisco'), (2, 'Santa Clara'), (400, 'Los Angeles'), [...];
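One way to picture the restore step is as follows: drop the deleted rows from the old full backup, then merge in the inserted rows by ordering key. This is a simplified sketch under stated assumptions: `restore_full` is a hypothetical name, and the deleted rows are held in an in-memory set, which a true streaming 3-way merge would avoid.

```python
import heapq
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # (id, city); all inputs are assumed sorted by id


def restore_full(old_full: Iterable[Row], inserted: Iterable[Row], deleted: Iterable[Row]) -> List[Row]:
    """Rebuild the new full backup from the old full backup plus a diff."""
    deleted_set = set(deleted)  # simplification: held in memory for lookup
    kept = (row for row in old_full if row not in deleted_set)
    # Merge the surviving old rows with the inserted rows, keeping id order.
    return list(heapq.merge(kept, inserted, key=lambda row: row[0]))


restored = restore_full(
    old_full=[(1, "San Francisco"), (2, "Oakland"), (3, "Menlo Park")],
    inserted=[(2, "Santa Clara"), (400, "Los Angeles")],
    deleted=[(2, "Oakland"), (3, "Menlo Park")],
)
```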
Binary Logs
• Point in time recovery
• Global Transaction IDs
Continuous Restore
∞
Backup Schedule: What, When, Where
• Everything, Every Day
• Streaming Binary Logs
• Multiple stages
• HDFS
• Offsite
Backup Schedule: Full, Diff, Diff, Diff, Full, Diff, Diff, Diff, Full, Diff, Diff, Diff…
[Calendar: Day 1 Full, Days 2 to 4 Diff, Day 5 Full, Days 6 to 8 Diff, Day 9 Full, Days 10 to 12 Diff]
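The repeating Full, Diff, Diff, Diff pattern on the calendar can be expressed as a small function of the day number. This is only a sketch: `backup_kind` and the `cycle` parameter are hypothetical names, and day numbering is assumed to start at 1 as on the calendar.

```python
def backup_kind(day: int, cycle: int = 4) -> str:
    """Return 'Full' on the first day of each cycle and 'Diff' on the others."""
    if day < 1:
        raise ValueError("day numbering starts at 1")
    return "Full" if (day - 1) % cycle == 0 else "Diff"


schedule = [backup_kind(day) for day in range(1, 13)]
```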
Backup Traffic
3.5 Tb/s Peak
~0.5 Tb/s Trough
Differential Backup Stages
1) mysql → Full (new) → HDFS
2) HDFS: Full (new) + Full (old) → Differ → Diff (new) → HDFS
#fail
• Too much HDFS I/O
• Too much network I/O
• Too long
Differential Streaming
[Diagram: on the Database Server, the Differ takes Full (new) from mysql and Full (old) from HDFS, and writes Diff (new) to HDFS]
Prineville, Oregon
Altoona, Iowa
Forest City, North Carolina
Ashburn, Virginia
Luleå, Sweden
1. Target
2. Source
System Design Goals
• Equalize HDFS cluster usage
• Minimize cross-region traffic
• Avoid broken replicas
• Consistency
• Backup at least once!
Distribution Algorithm
• Define allocation globally
• Hash shards into 1000 buckets
• Allocate buckets to clusters, proportional to size
[Diagram: shards 1 to 14 on a server instance are hashed into buckets numbered 1, 2, 3, 4, ..., 997, 998, 999, 1000]
1000 Buckets
Cluster    Size      Buckets        Bucket range
HDFS 1     1 PB      100 buckets    1 to 100
HDFS 2     3 PB      300 buckets    101 to 400
HDFS 3     2 PB      200 buckets    401 to 600
HDFS 4     2.5 PB    250 buckets    601 to 850
HDFS 5     1.5 PB    150 buckets    851 to 1000
Total buckets: 1000
Total size: 10 PB (Example)
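The example figures above can be reproduced with a short sketch that hashes shards into buckets and hands out contiguous bucket ranges in proportion to cluster size. The hash function (MD5 here) and the names `shard_bucket`, `allocate_buckets` and `cluster_for_bucket` are assumptions for illustration; the slides do not say which hash is used.

```python
import hashlib
from typing import Dict, Tuple

NUM_BUCKETS = 1000


def shard_bucket(shard_id: int) -> int:
    """Map a shard id to a bucket in 1..NUM_BUCKETS (hash choice is an assumption)."""
    digest = hashlib.md5(str(shard_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS + 1


def allocate_buckets(cluster_sizes_pb: Dict[str, float]) -> Dict[str, Tuple[int, int]]:
    """Give each cluster a contiguous bucket range proportional to its size."""
    total = sum(cluster_sizes_pb.values())
    ranges: Dict[str, Tuple[int, int]] = {}
    start, cumulative = 1, 0.0
    for name, size in cluster_sizes_pb.items():
        cumulative += size
        # Rounding the cumulative share keeps the ranges gap-free and ends at 1000.
        end = round(NUM_BUCKETS * cumulative / total)
        ranges[name] = (start, end)
        start = end + 1
    return ranges


def cluster_for_bucket(bucket: int, ranges: Dict[str, Tuple[int, int]]) -> str:
    """Find the cluster whose range contains the given bucket."""
    for name, (low, high) in ranges.items():
        if low <= bucket <= high:
            return name
    raise ValueError(f"bucket {bucket} is outside 1..{NUM_BUCKETS}")


ranges = allocate_buckets(
    {"HDFS 1": 1, "HDFS 2": 3, "HDFS 3": 2, "HDFS 4": 2.5, "HDFS 5": 1.5}
)
target = cluster_for_bucket(shard_bucket(14), ranges)
```

Since the allocation depends only on the global list of cluster sizes, every host that runs it arrives at the same mapping without coordination.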
[Diagram: example buckets placed on the clusters by bucket range: Bucket 20 and Bucket 30 on HDFS 1 (1 PB), Bucket 200 and Bucket 400 on HDFS 2 (3 PB), Bucket 500 on HDFS 3 (2 PB), Bucket 650, Bucket 700 and Bucket 800 on HDFS 4 (2.5 PB), Bucket 900 on HDFS 5 (1.5 PB)]
Unified Pool
Some are More Equal than Others?
[Charts: sorted vs. unsorted]
Rebalance / Convergence
[Diagram: servers A, B and C each arrive at the same ordering: 1. C, 2. A, 3. B]
Is Alive?
[Diagram: MySQL servers A, B and C; HDFS clusters α and ɣ]
Δ(A, α) = 0
Δ(B, α) = 10
Δ(A, ɣ) = 20
Δ(B, ɣ) = 15
Δ(C, ɣ) = 0
HDFS Priority
ɣ: 0
α: 1
rank(mysql, hdfs) ≡ (Δ(mysql, hdfs), priority(hdfs))
𝕄 = {A, B, C} // MySQL Servers
𝓗 = {α, ɣ} // HDFS Clusters
𝕄×𝓗 = {(m, h): m∈𝕄 ∧ h∈𝓗}
return sort(𝕄×𝓗, key=rank(m, h))

Unsorted:
(m, h)    Δ     priority
A, α      0     1
A, ɣ      20    0
B, α      10    1
B, ɣ      15    0
C, α      20    1
C, ɣ      0     0

Sorted by rank:
(m, h)    Δ     priority
C, ɣ      0     0
A, α      0     1
B, α      10    1
B, ɣ      15    0
A, ɣ      20    0
C, α      20    1
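The ranking above is a plain lexicographic sort on the pair (Δ, priority). A minimal Python sketch using the example values from the slides follows; `ranked_pairs` and `server_order` are hypothetical names, and Δ is treated as an opaque number since the slides do not define it further.

```python
from itertools import product
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (MySQL server, HDFS cluster)

# Example values copied from the slides.
DELTA: Dict[Pair, int] = {
    ("A", "α"): 0, ("A", "ɣ"): 20,
    ("B", "α"): 10, ("B", "ɣ"): 15,
    ("C", "α"): 20, ("C", "ɣ"): 0,
}
PRIORITY: Dict[str, int] = {"ɣ": 0, "α": 1}


def ranked_pairs(servers: List[str], clusters: List[str]) -> List[Pair]:
    """Sort every (server, cluster) pair by (delta, cluster priority)."""
    return sorted(
        product(servers, clusters),
        key=lambda pair: (DELTA[pair], PRIORITY[pair[1]]),
    )


def server_order(pairs: List[Pair]) -> List[str]:
    """Order servers by their first appearance in the ranked pairs."""
    return list(dict.fromkeys(server for server, _ in pairs))


pairs = ranked_pairs(["A", "B", "C"], ["α", "ɣ"])
order = server_order(pairs)
```

Taking each server's first appearance in the ranked list gives C, A, B, which appears to match the 1. C, 2. A, 3. B ordering on the Rebalance / Convergence slide.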
Pile it Up
[Diagram: servers A, B and C]
War Story: Cluster Turn Up
New Region Cluster Turn Up
• New HDFS Cluster
• New Datacenter
• Slow Ramp-Up
New Region Cluster Turn Up
• Network woes
• Pulling full backups to create diffs!
• Fix: run full when target HDFS changes
New Region Cluster Turn Up, cont.
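The fix from this war story, running a full backup whenever the target HDFS cluster changes, can be sketched as a small decision rule. The function and parameter names are hypothetical; the point is only that a diff is worthwhile when the previous full backup already lives on the target cluster.

```python
from typing import Optional


def choose_backup_kind(
    scheduled_kind: str,
    target_cluster: str,
    last_full_cluster: Optional[str],
) -> str:
    """Force a full backup if the previous full is not on the target cluster."""
    if last_full_cluster is None or last_full_cluster != target_cluster:
        # A diff would need the old full pulled from another cluster
        # (possibly across regions), so take a fresh full instead.
        return "Full"
    return scheduled_kind
```

For example, `choose_backup_kind("Diff", "HDFS 5", "HDFS 2")` returns `"Full"`, while the same call with `last_full_cluster="HDFS 5"` keeps the scheduled `"Diff"`.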
Call From the Engine Room
Divert Power to the Shields!
• Emergency meeting
• Fix: Turn off a few racks
The Future!
Row Based Binary Logs
• Record previous value by default
• Binary Logs + Binary Logs => Diff
• Full + Diff => Full
• In theory, run full backup only once!
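To illustrate how binary logs could be folded into a diff: if every row event carries both the previous and the new value, a sequence of events collapses into net deleted and inserted rows per key. This is a sketch under stated assumptions, not a binlog parser: events are modelled as hypothetical `(before, after)` tuples, and the key column (the id) is assumed never to change in an update.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[int, str]  # (id, city)
# (before image, after image): insert = (None, row), delete = (row, None)
Event = Tuple[Optional[Row], Optional[Row]]


def binlog_to_diff(events: Iterable[Event]) -> Tuple[List[Row], List[Row]]:
    """Collapse ordered row events into (inserted_rows, deleted_rows)."""
    first_before: Dict[int, Optional[Row]] = {}
    last_after: Dict[int, Optional[Row]] = {}
    for before, after in events:
        key = (before or after)[0]  # assumes at least one image is present
        if key not in first_before:
            first_before[key] = before  # row state before the first event
        last_after[key] = after  # row state after the latest event
    inserted: List[Row] = []
    deleted: List[Row] = []
    for key in sorted(first_before):
        before, after = first_before[key], last_after[key]
        if before == after:
            continue  # net effect is nothing, e.g. insert followed by delete
        if before is not None:
            deleted.append(before)
        if after is not None:
            inserted.append(after)
    return inserted, deleted


events: List[Event] = [
    ((2, "Oakland"), (2, "San Jose")),
    ((2, "San Jose"), (2, "Santa Clara")),
    ((3, "Menlo Park"), None),
    (None, (400, "Los Angeles")),
    (None, (5, "Fremont")),
    ((5, "Fremont"), None),
]
inserted_rows, deleted_rows = binlog_to_diff(events)
```

The result has the same shape as the inserted and deleted row sets of a differential backup, which is what makes the "Full + Diff => Full" step possible without taking a new full dump.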
Questions!
Shlomo Priymak ([email protected]), @shlomoid