memcache @ facebook
TRANSCRIPT
-
8/2/2019 Memcache @ Facebook
1/47
memcache@facebook
Marc Kwiatkowski
memcache tech lead
QCon 2010
-
How big is facebook?
-
-
Objects
More than 60 million status updates posted each day (694/s)
More than 3 billion photos uploaded to the site each month (23/s)
More than 5 billion pieces of content (web links, news stories, blog posts, notes, photo albums, etc.) shared each week (8K/s)
Average user has 130 friends on the site
50 billion friend-graph edges
Average user clicks the Like button on 9 pieces of content each month
-
-
Infrastructure
Thousands of servers in several data centers in two regions
Web servers
DB servers
Memcache servers
Other services
-
The scale of memcache @ facebook
Memcache ops/s
over 400M gets/sec
over 28M sets/sec
over 2T cached items
over 200 Tbytes
Network I/O
peak rx 530M pkts/s, 60 GB/s
peak tx 500M pkts/s, 120 GB/s
-
A typical memcache server's P.O.V.
Network I/O
rx 90K pkts/s, 9.7 MB/s
tx 94K pkts/s, 19 MB/s
Memcache ops
80K gets/s
2K sets/s
200M items
-
Evolution of facebook's architecture
-
-
Scaling Facebook: Interconnected data
Bob
-
Scaling Facebook: Interconnected data
Bob Brian
-
Scaling Facebook: Interconnected data
Bob Brian Felicia
-
Memcache Rules of the Game
GET object from memcache; on miss, query database and SET object to memcache
Update database row and DELETE object in memcache
No derived objects in memcache
Every memcache object maps to persisted data in the database
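These rules are the classic cache-aside pattern. A minimal sketch in Python, with `cache` and `db` as stand-in dicts for memcache and MySQL (names and data are illustrative, not Facebook's):

```python
# Cache-aside sketch: cache and db are stand-in dicts for memcache and MySQL.
cache = {}
db = {"user:1": {"name": "Bob"}}

def get_object(key):
    # GET from memcache; on a miss, query the database and SET the object.
    if key in cache:
        return cache[key]
    value = db.get(key)          # query database
    if value is not None:
        cache[key] = value       # SET object to memcache
    return value

def update_object(key, value):
    # Update the database row, then DELETE the object from memcache.
    # Never SET derived data: every cached object maps to persisted rows.
    db[key] = value
    cache.pop(key, None)

get_object("user:1")                      # miss: fills cache from db
update_object("user:1", {"name": "Robert"})
assert "user:1" not in cache              # delete-on-update, not set-on-update
```

Deleting rather than re-setting on update keeps the cache from ever holding a value the database does not.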
-
Scaling memcache
-
Phatty Phatty Multiget
[embedded video]
-
Phatty Phatty Multiget (notes)
The PHP runtime is single-threaded and synchronous. To get good performance for data-parallel operations, like retrieving info for all friends, it's necessary to dispatch memcache get requests in parallel.
Initially we just used polling I/O in PHP.
Later we switched to true asynchronous I/O in a PHP C extension.
In both cases the result was reduced latency through parallelism.
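The multiget idea can be sketched as: group keys by destination server, then issue the per-server gets in parallel. This Python sketch uses threads to stand in for the C extension's asynchronous I/O; the servers, key names, and toy key-to-server function are all hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# Two hypothetical memcache servers, modeled as dicts.
SERVERS = [{}, {}]

def server_for(key):
    # Deterministic toy key-to-server mapping (real clients hash the key).
    return sum(key.encode()) % len(SERVERS)

def mc_set(key, value):
    SERVERS[server_for(key)][key] = value

def multiget(keys):
    # Group keys by server, then dispatch the per-server gets in parallel:
    # one round trip per server instead of one per key.
    by_server = {}
    for k in keys:
        by_server.setdefault(server_for(k), []).append(k)
    def fetch(item):
        sid, ks = item
        return {k: SERVERS[sid].get(k) for k in ks}
    results = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(fetch, by_server.items()):
            results.update(partial)
    return results

for i in range(6):
    mc_set(f"friend:{i}", f"info-{i}")
```

Latency for the whole batch is then bounded by the slowest server, not the sum of all round trips.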
-
Pools and Threads
PHP Client
-
PHP Client
sp:12345 sp:12346 sp:12347
cs:12345 cs:12346 cs:12347
-
PHP Client
sp:12345 sp:12346 sp:12347 cs:12345 cs:12346 cs:12347
-
PHP Client
-
Pools and Threads (notes)
Privacy objects are small but have poor hit rates
User-profile objects are large but have good hit rates
We achieve better overall caching by segregating different classes of objects into different pools of memcache servers
Memcache was originally a classic single-threaded unix daemon
This meant we needed to run 4 instances, with 1/4 the RAM each, on each memcache server
4X the number of connections to each box
4X the meta-data overhead
We needed a multi-threaded service
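Pool segregation can be sketched as routing by object class before hashing to a server within the pool. The pool names, prefixes, and routing rule below are hypothetical stand-ins for what was really per-class configuration:

```python
# Hypothetical pool layout: segregate object classes into separate memcache
# pools so small, low-hit-rate objects don't compete for RAM with large,
# high-hit-rate ones.
POOLS = {
    "privacy": ["pp-1", "pp-2"],            # small objects, poor hit rate
    "profile": ["up-1", "up-2", "up-3"],    # large objects, good hit rate
    "default": ["gen-1", "gen-2"],
}

def pool_for(key):
    # Route by key prefix (illustrative; real routing was configured
    # per object class, not parsed out of the key).
    prefix = key.split(":", 1)[0]
    return POOLS.get(prefix, POOLS["default"])

def server_for(key):
    # Hash only within the chosen pool.
    servers = pool_for(key)
    return servers[sum(key.encode()) % len(servers)]
```

Segregation also isolates eviction behavior: a flood of one class of objects cannot push another class out of cache.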
-
-
Connections and Congestion (notes)
As we added web servers, the connections to each memcache box grew.
Each web server ran 50-100 PHP processes
Each memcache box has 100K+ TCP connections
UDP could reduce the number of connections
As we added users and features, the number of keys per multiget increased
Popular people and groups
Platform and FBML
We began to see incast congestion on our ToR switches.
UDP allowed us to do congestion detection and admission control
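The admission-control idea can be sketched as a small client-side congestion window over the UDP gets: shrink on detected drops, grow on success. This is only an illustration of the principle, not Facebook's actual scheme or parameters:

```python
class CongestionWindow:
    # Client-side sliding window for in-flight UDP requests: multiplicative
    # decrease on a detected drop, additive increase on success
    # (TCP-style intuition; constants here are arbitrary).
    def __init__(self, size=32, floor=1, ceiling=256):
        self.size, self.floor, self.ceiling = size, floor, ceiling

    def on_success(self):
        self.size = min(self.size + 1, self.ceiling)

    def on_drop(self):
        self.size = max(self.size // 2, self.floor)

    def may_send(self, in_flight):
        # Admission control: refuse to dispatch more requests than the
        # current window allows.
        return in_flight < self.size
```

With TCP the kernel did this implicitly per connection; moving to UDP meant the client had to detect loss and throttle itself.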
-
Serialization and Compression
We noticed our short profiles weren't so short
1K PHP serialized object
fb-serialization
based on thrift wire format
3X faster
30% smaller
gzcompress serialized strings
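The size win can be illustrated in Python, with `json` standing in for PHP's `serialize()` and `zlib.compress` matching what PHP's `gzcompress` does. The profile fields are made up; fb-serialization itself was a compact binary format based on the Thrift wire protocol, which this sketch does not reproduce:

```python
import json
import zlib

# Hypothetical "short" profile that isn't so short once serialized.
profile = {"id": 12345, "name": "Bob", "status": "ok",
           "about": "lorem ipsum " * 40}

raw = json.dumps(profile).encode()   # stand-in for PHP serialize()
packed = zlib.compress(raw)          # PHP's gzcompress is zlib deflate
savings = 1 - len(packed) / len(raw)
```

Compressing serialized strings trades a little CPU on the web tier for proportionally more objects per memcache box.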
-
Multiple Datacenters
SF Web
SF Memcache
SC Memcache
SC Web
SC MySQL
Memcache Proxy
Memcache Proxy
-
Multiple Datacenters (notes)
In the early days we had two data-centers
The one we were about to turn off
The one we were about to turn on
Eventually we outgrew a single data-center
Still only one master database tier
Rules of the game require that after an update we need to broadcast deletes to all tiers
The mcproxy era begins
-
Multiple Regions
SF Web
SF Memcache
SC Memcache
SC Web
SC MySQL
Memcache Proxy
Memcache Proxy
MySQL replication
East Coast
VA MySQL
VA Web
VA Memcache
Memcache Proxy
West Coast
-
Multiple Regions (notes)
Latency to east coast and European users was/is terrible.
So we deployed a slave DB tier in Ashburn, VA
The slave DB syncs with the master via the MySQL binlog
This introduces a race condition; mcproxy to the rescue again
Add a memcache-delete pragma to MySQL update and insert ops
Added a thread to the slave mysqld to dispatch deletes in the east coast via mcproxy
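The pragma mechanism can be sketched as follows: the master tags each write with the memcache keys it invalidates, and the thread on the slave mysqld parses those tags out of the replicated statements and hands the deletes to mcproxy. The comment syntax here is invented for illustration, not Facebook's actual format:

```python
import re

def extract_deletes(sql):
    # Parse a hypothetical memcache-delete pragma embedded as a SQL comment.
    # On the slave, deletes are dispatched only after the statement has
    # replicated, which closes the race with the master-side delete.
    m = re.search(r"/\*\s*mc-delete:\s*([^*]+)\*/", sql)
    return [k.strip() for k in m.group(1).split(",")] if m else []

stmt = ("UPDATE profiles SET name='Bob' WHERE id=1 "
        "/* mc-delete: profile:1, feed:1 */")
```

The race being closed: without this, an east-coast read could miss, fetch stale data from the not-yet-caught-up slave, and re-cache it after the delete had already happened.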
-
key
Replicated Keys
key key key
PHP Client PHP Client PHP Client
Memcache Memcache Memcache Memcache
-
key
Replicated Keys
key#0 key#1 key#3
PHP Client PHP Client PHP Client
Memcache Memcache Memcache
-
-
Memcache Rules of the Game
New Rule
If a key is hot, pick an alias and fetch that for reads
Delete all aliases on updates
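A sketch of the alias rule, with the replica count and key format (`key#0` ... `key#3`) chosen for illustration:

```python
import random

N_REPLICAS = 4  # illustrative replica count for a hot key

def read_alias(key, hot=False):
    # Reads of a hot key pick a random alias, spreading load across
    # several memcache servers instead of hammering one.
    if not hot:
        return key
    return f"{key}#{random.randrange(N_REPLICAS)}"

def all_aliases(key):
    # Updates must DELETE every alias, or readers of the surviving
    # aliases would see stale data.
    return [f"{key}#{i}" for i in range(N_REPLICAS)]
```

Each alias hashes to a (likely) different server, so a key read millions of times per second fans out across N servers instead of one.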
-
Mirrored Pools
General pool with wide fanout
Shard 1 Shard 2
Specialized Replica 2
Shard 1 Shard 2
Shard 1 Shard 2 Shard 3 Shard n
Specialized Replica 1
...
-
Mirrored Pools (notes)
As our memcache tier grows, the ratio of keys/packet decreases
100 keys / 1 server = 1 packet
100 keys / 100 servers = 100 packets
More network traffic
More memcache server kernel interrupts per request
Confirmed Info - critical account meta-data
Have you confirmed your account?
Are you a minor?
Pulled from large user-profile objects
-
Hot Misses
[animation]
-
Hot Misses (notes)
Remember the rules of the game
update and delete
miss, query, and set
When the object is very, very popular, that query rate can kill a database server
We need flow control!
-
Memcache Rules of the Game
For hot keys, on miss grab a mutex before issuing the db query
memcache-add a per-object mutex
key:xxx => key:xxx#mutex
If the add succeeds, do the query
If the add fails (because the mutex already exists), back off and try again
After the set, delete the mutex
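The steps above can be sketched with dicts standing in for memcache and the database. The only memcache primitive this relies on is `add` (set-if-absent), whose atomicity is what makes the mutex work; retries here are a simple counter rather than a real back-off:

```python
cache = {}
db = {"obj": "value"}

def mc_add(key, value):
    # memcache "add": succeeds only if the key is absent. Its atomicity
    # is what lets it serve as a distributed lock.
    if key in cache:
        return False
    cache[key] = value
    return True

def get_with_mutex(key, retries=10):
    while retries:
        if key in cache:
            return cache[key]
        if mc_add(key + "#mutex", 1):    # won the lock
            cache[key] = db[key]         # only this caller queries the db
            del cache[key + "#mutex"]    # release the mutex after the set
            return cache[key]
        retries -= 1                     # lost the race: back off, retry
    return None
```

Only one client per miss ever reaches the database; everyone else waits for the winner's set to land.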
-
Hot Deletes
[hot groups graphics]
-
Hot Deletes (notes)
We're not out of the woods yet
The cache mutex doesn't work for frequently updated objects like membership lists and walls for viral groups and applications.
Each process that acquires the mutex finds that the object has been deleted again
...and again
...and again
-
Rules of the Game: Caching Intent
Each memcache server is in the perfect position to detect and mitigate contention
Record misses
Record deletes
Serve stale data
Serve lease-ids
Don't allow updates without a valid lease-id
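The lease mechanism can be sketched server-side: a miss hands out a lease-id, a later delete invalidates it, and a set is honored only if it still carries a valid lease. This is a minimal illustration of the idea, not memcached's actual implementation:

```python
import itertools

cache, leases = {}, {}
_lease_ids = itertools.count(1)   # monotonically increasing lease ids

def mc_get(key):
    # Hit: return the value. Miss: hand out a lease-id recording the
    # caller's intent to fill this key.
    if key in cache:
        return cache[key], None
    lease = next(_lease_ids)
    leases[key] = lease
    return None, lease

def mc_set(key, value, lease):
    # Honor the set only if the lease is still valid; a delete in between
    # means this value may be stale, so it is silently dropped.
    if leases.get(key) != lease:
        return False
    cache[key] = value
    leases.pop(key, None)
    return True

def mc_delete(key):
    cache.pop(key, None)
    leases.pop(key, None)   # invalidates any outstanding lease
```

Unlike the client-side mutex, the server sees every miss and delete, so it can drop stale sets even under rapid update churn.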
-
Next Steps
-
Shaping Memcache Traffic
mcproxy as router
admission control
tunneling inter-datacenter traffic
Cache Hierarchies
-
Cache Hierarchies
Warming up Cold Clusters
Proxies for Cacheless Clusters
Big Low Latency Clusters
-
Big Low Latency Clusters
Bigger Clusters are Better
Low Latency is Better
L2.5
UDP
Proxy
Facebook Architecture
-
Worse IS better
Richard Gabriel's famous essay contrasted
ITS and Unix
LISP and C
MIT and New Jersey
-
Why Memcache Works
Uniform, low latency with partial results is a better user experience
memcache provides a few robust primitives
key-to-server mapping
parallel I/O
flow-control
traffic shaping
that allow ad hoc solutions to a wide range of scaling issues
-
(c) 2010 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc. All rights reserved. 1.0