zero-downtime rebalancing and data migration of a mature ......4 2 [email protected] daniella 5 3...
TRANSCRIPT
Zero-Downtime Rebalancing and Data Migration of a
Mature Multi-Shard Platform
F L O R I A N W E I N G A R T E N @fw1729
J U S T I N L I @pushrax
Zero-Downtime Rebalancing and Data Migration of a
Mature Multi-Shard Platform
2
Zero-Downtime Rebalancing and Data Migration of a
Mature Multi-Shard Platform
Zero-Downtime Rebalancing and Data Migration of a
Mature Multi-Shard Platform
Zero-Downtime Rebalancing and Data Migration of a
Mature Multi-Shard Platform
Zero-Downtime Rebalancing and Data Migration of a
Mature Multi-Shard Platform
Zero-Downtime Rebalancing and Data Migration of a
Mature Multi-Shard Platform
I N T R O D U C T I O NS H O P I F Y ’ S M U LT I - S H A R D P L AT F O R M
9
Shopify’s data axioms
• Many shops can share the same database shard, but…
• All of a shop’s data is stored in the same database shard
• Shop datasets are completely independent of each other, i.e., shop A’s data never references shop B’s data
• Every MySQL table has a shop_id column
• Every “unit of work” can only access data of one shop
10
Shard-aware primary key generation
id shop_id product_name
1 1 …
2 2 …
3 2 …
4 1 …
5 3 …
6 1 …
11
Shard-aware primary key generation
id shop_id product_name
1 1 …
2 2 …
3 2 …
4 1 …
5 3 …
6 1 …
id shop_id product_name
12
Shard-aware primary key generation
id shop_id product_name
1 1 …
2 2 …
3 2 …
4 1 …
5 3 …
6 1 …
id shop_id product_name
1 17 …
2 24 …
3 31 …
13
Shard-aware primary key generation
id shop_id product_name
1 1 …
2 2 …
3 2 …
4 1 …
5 3 …
6 1 …
id shop_id product_name
1 17 …
2 24 …
3 31 …
14
Shard-aware primary key generation
id shop_id product_name
2 1 …
4 2 …
6 2 …
8 1 …
10 3 …
… … …
2*j + 0 … …
id shop_id product_name
1 17 …
3 24 …
5 31 …
7 17 …
11 17 …
… … …
2*j + 1 … …
15
Shard-aware primary key generationShard 0 Shard 1 Shard 2 Shard 3
0 - - -
- 1 - -
- - 2 -
- - - 3
4 - - -
- 5 - -
- - 6 -
- - - 7
8 - - -
- 9 - -
- - 10 -
- - - 11
• shard i generates ids n*j + i
• id spaces are disjoint, no collisions
• 100% decentralized, no “id generator” authority required
• auto_increment_increment (n=4)
• auto_increment_offset (i)
16
Shard-aware request routing(simplified)
…
ROUTER
INTERNET
LB LB
WEB WEB WEB WEB WEB WEBWEB
…SHARD 1 SHARD 2 SHARD k
domain shop_id shard_id
foo.myshopify.com 1 1
bar.myshopify.com 2 3
fashionnova.com 3 17
startupsocks.com 4 1
snowdevil.com 5 27
kyliecosmetics.com 6 5
S H A R D I N G I N P R O D U C T I O NT H E P R O B L E M S N O B O D Y TO L D Y O U A B O U T
18
shard0
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.com
shard0
shard1
19
shard0-copy
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.com
shard0
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.comshard0
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.com
20
shard1
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.com
shard0
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.comshard0
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.com
21
shard1
snowdevil.com
kyliecosmetics.com
shard0
store.wikimedia.org
lakersstore.com
shard1
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.com
shard0
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.comshard0
store.wikimedia.org
snowdevil.com
kyliecosmetics.com
lakersstore.com
22
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Tenants by shard
23
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Stored data by shard
24
shard0
id shop_id data
1 42 socks
3 77 laptop
5 42 hat
7 42 shirt
9 77 watch
shard1
id shop_id data
2 58 umbrella
4 58 vase
25
shard0
id shop_id data
1 42 socks
3 77 laptop
5 42 hat
7 42 shirt
9 77 watch
shard1
id shop_id data
2 58 umbrella
4 58 vase
26
shard0 shard1
id shop_id data
1 42 socks
2 58 umbrella
4 58 vase
5 42 hat
7 42 shirt
id shop_id data
1 42 socks
3 77 laptop
5 42 hat
7 42 shirt
9 77 watch
27
shard0 shard1
id shop_id data
1 42 socks
2 58 umbrella
4 58 vase
5 42 hat
7 42 shirt
id shop_id data
3 77 laptop
9 77 watch
28
data integrity > availability > throughput
AV O I D I N G L O S T W R I T E S
P R O B L E M 1 :
30
Avoiding lost writes• Problem: Can’t allow shop to get modified during move
• Ideas:
• Set database to readonly?
• Kill all ongoing work for the shop?
• Mark the shop as locked and don’t allow new work for it to start?
31
• Problem: Can’t allow shop to get modified during move
• Solution: “Readers-writers problem”
Avoiding lost writes
32
Shared/exclusive shop locking• Data structure:
• Any number of processes can acquire a shared lock concurrently, but only if no process is holding an exclusive lock.
• Only one process can acquire an exclusive lock, but only if no process holds a shared lock.
• Idea:
• Regular work that accesses shop data needs to "register" itself by acquiring a shared lock.
• Shop mover acquires exclusive lock before moving the shop.
33
Shared/exclusive shop locking• Idea:
• Regular work that accesses shop data needs to "register" itself by acquiring a shared lock.
• Some advice: • Safety depends on assumption that all work does this correctly • Make it hard for developers to circumvent this • Make this invisible and baked into your framework • Don’t underestimate required refactoring effort in legacy codebase
M I N I M I Z I N G D O W N T I M E
P R O B L E M 2 :
35
‣ copy data in batches
‣ replicate writes from the source
36
shard0
id shop_id data shard0 binlog
37
shard0
id shop_id data
1 42 socks
shard0 binlog
1 NULL 1 42 socks
38
shard0
id shop_id data
1 42 gloves
shard0 binlog
1 NULL 1 42 socks
2 1 42 socks 1 42 gloves
39
shard0
id shop_id data
1 42 gloves
3 77 laptop
5 42 hat
7 42 shirt
9 77 watch
shard0 binlog
6 NULL 9 77 watch
shard1
id shop_id data
2 58 umbrella
4 58 vase
40
shard0
id shop_id data
1 42 gloves
3 77 laptop
5 42 hat
7 42 shirt
9 77 watch
shard0 binlog
6 NULL 9 77 watch
shard1
id shop_id data
2 58 umbrella
4 58 vase
41
shard0
id shop_id data
1 42 gloves
3 77 laptop
5 42 hat
7 42 shirt
9 77 watch
shard1
id shop_id data
1 42 gloves
2 58 umbrella
4 58 vase
shard0 binlog
6 NULL 9 77 watch
42
shard0
id shop_id data
1 42 mitts
3 77 laptop
5 42 hat
7 42 shirt
9 77 watch
shard1
id shop_id data
1 42 gloves
2 58 umbrella
4 58 vase
shard0 binlog
6 NULL 9 77 watch
7 1 42 gloves 1 42 mitts
43
shard0
id shop_id data
1 42 mitts
3 77 laptop
5 42 hat
7 42 shirt
9 77 watch
shard1
id shop_id data
1 42 mitts
2 58 umbrella
4 58 vase
shard0 binlog
6 NULL 9 77 watch
7 1 42 gloves 1 42 mitts
44
shard0
id shop_id data
1 42 mitts
3 77 laptop
5 42 hat
7 42 pants
9 77 watch
shard1
id shop_id data
1 42 mitts
2 58 umbrella
4 58 vase
5 42 hat
7 42 shirt
shard0 binlog
6 NULL 9 77 watch
7 1 42 gloves 1 42 mitts
8 7 42 shirt 7 42 pants
45
shard0
id shop_id data
1 42 mitts
3 77 laptop
5 42 hat
7 42 pants
9 77 watch
shard1
id shop_id data
1 42 mitts
2 58 umbrella
4 58 vase
5 42 hat
7 42 pants
db0 binlog
6 NULL 9 77 watch
7 1 42 gloves 1 42 mitts
8 7 42 shirt 7 42 pants
shop locked, no new writes
0101100001100111000110110101111110111110000001110010101100001010000111100000101010100111110000001001111100011000111101011001010110001111110010110011111010000010010100000111010011110100010111010010010010100010101000110010111100110010010001111010110101111011010001001110011101100101110111110101100101110001100110101110010100100000110000000101101110111110101101110001000110011110011100001101100011110110001001101011101001101101110011000111001001111010110110101101101100111010111100101011100110000010110101001100111010110110010010001011100111011011011001011000100011011101011000010100100000000000011010001001110000010011101000011010000111011011110010111110000111001101100101100010100000011011001110100110110101000100110101101001001100110000110000011100100010110011001000011011110000101101101011000010110000000011000101001001101010010010010100011001010111001111011101101000011000111011101101100010011011011100001101111100000111111111100111110011010100001000000101001011000100111011100000111100000011110000111010011001100011110101011000111011101110010100100011010110110111111000000010110000000100001011110011001111100101001011110101110001101111010100100001011110000010001101001111110011011011010001111111001111100100101100101110000011000111111100000111100100010000110100001001101100110101000001101011101010101110110011000110110000010011111001000100000001101010010101
D ATA I N T E G R I T Y V E R I F I C AT I O N
P R O B L E M 3 :
47
Data integrity verification
id shop_id email name
1 1 [email protected] Simon
2 2 [email protected] Flo
3 1 [email protected] Bob
4 2 [email protected] Daniella
5 3 [email protected] Camilio
6 2 [email protected] Tobi
id shop_id email name
8 5 [email protected] Hormoz
2 2 [email protected] Flo
9 6 [email protected] Footer
4 2 [email protected] Daniella
7 8 [email protected] Dunnolol
6 2 [email protected] Tobi
48
Data integrity verification
id shop_id email name
1 1 [email protected] Simon
2 2 [email protected] Flo
3 1 [email protected] Bob
4 2 [email protected] Daniella
5 3 [email protected] Camilio
6 2 [email protected] Tobi
id shop_id email name
8 5 [email protected] Hormoz
2 2 [email protected] Flo
9 6 [email protected] Footer
4 2 [email protected] Daniella
7 8 [email protected] Dunnolol
6 2 [email protected] Tobi
49
Naive approach
• SELECT all of the shop’s data from the source shard
• SELECT all of the shop’s data from the destination shard
• Compare all rows client-side
• 😴
50
Idea: Fingerprinting verification
• Digest shop’s dataset to a fingerprint
• Such that A = B if (and only if) fingerprint(A) = fingerprint(B)
• Do this computation server-side (in MySQL)
51
Fingerprinting verification
id shop_id email name
1 1 [email protected] Simon
2 2 [email protected] Flo
3 1 [email protected] Bob
4 2 [email protected] Daniella
5 3 [email protected] Camilio
6 2 [email protected] Tobi
52
Fingerprinting verification
id shop_id email name
2 2 [email protected] Flo
4 2 [email protected] Daniella
6 2 [email protected] Tobi
53
Fingerprinting verification
id shop_id email name
2 2 [email protected] Flo
4 2 [email protected] Daniella
6 2 [email protected] Tobi
54
Fingerprinting verification
MD5(id) MD5(shop_id) MD5(email) MD5(name)
2 2 [email protected] Flo
4 2 [email protected] Daniella
6 2 [email protected] Tobi
55
Fingerprinting verification
MD5(id) MD5(shop_id) MD5(email) MD5(name)
c81e7 c81e72 1feb9 300bb
a87ff c81e7 f776f 8f216
616790 c81e7 43bc2 97d15
56
Fingerprinting verification
GROUP_CONCAT MD5(id)
GROUP_CONCAT MD5(shop_id)
GROUP_CONCAT MD5(email)
GROUP_CONCAT MD5(name)
c81e7 a87ff 616790
c81e72 c81e72 c81e72
1feb9 f776f 43bc2
300bb 8f216 97d15
57
Fingerprinting verification
MD5( GROUP_CONCAT
MD5(id))
MD5( GROUP_CONCAT MD5(shop_id))
MD5( GROUP_CONCAT MD5(email))
MD5( GROUP_CONCAT MD5(name))
MD5( c81e7 a87ff 616790
)
MD5( c81e72 c81e72 c81e72
)
MD5( 1feb9 f776f 43bc2 )
MD5( 300bb 8f216 97d15 )
58
Fingerprinting verification
MD5( GROUP_CONCAT
MD5(id))
MD5( GROUP_CONCAT MD5(shop_id))
MD5( GROUP_CONCAT MD5(email))
MD5( GROUP_CONCAT MD5(name))
19d25 3fea8 0a63d 5d38e
59
Fingerprinting verification
CONCAT( MD5(GROUP_CONCAT(MD5(id)),
MD5(GROUP_CONCAT(MD5(shop_id)), MD5(GROUP_CONCAT(MD5(email)), MD5(GROUP_CONCAT(MD5(name))
)
19d25 3fea8 0a63d 5d38e
60
Fingerprinting verification
MD5( CONCAT(
MD5(GROUP_CONCAT(MD5(id)), MD5(GROUP_CONCAT(MD5(shop_id)), MD5(GROUP_CONCAT(MD5(email)), MD5(GROUP_CONCAT(MD5(name))
))
MD5(19d25 3fea8 0a63d 5d38e)
61
Fingerprinting verification
MD5(...) AS fingerprint
5218e
62
Fingerprinting verification
63
Fingerprinting verification
• Heavy lifting happens in the database
• Light on the network
• O(M*N) hashing operations (M columns, N rows)
• 5x speedup vs. naive idea (same database load)
• Iterative re-verification as binlog changes come in
W H AT E L S E ?
65
• Queueing moves
• Orchestrating mover processes
• Interrupting and transferring background jobs
• Replicating other datastores (sessions, etc.)
• Dealing with schema migrations
T L ; D RS U M M A R Y A N D K E Y TA K E A W AY S
67
Sharding functions
68
ID generation
69
Data migrations
70
Shared/exclusive locking for safety
71
Replication logprovides minimal downtime
72
Fingerprinting verification
73
You probably don't need to build this
74
github.com/shopify/ghostferry
Thanks! Questions?
F L O R I A N W E I N G A R T E N @fw1729
Learn more:
Ghostferry: A Data Migration Tool for Incompatible Cloud Platforms Shuhao Wu, Percona Live 2018
Shopify's Move from the Data Centre to the Cloud Scott Francis, SREcon Asia 2018
Scaling Shopify's multi-tenant architecture across multiple data centers Florian Weingarten, O’Reilly Velocity NY 2016
Scripting NGINX for Overload Protection
Justin Li, nginx.conf 2016
J U S T I N L I @pushrax