mixi.jp: scaling out with open source
Introduction
•Batara Kesuma
•CTO of mixi, Inc.
What is mixi?
•Social networking service
  • Diary, community, message, review, photo album, etc.
  • Invitation only
•Largest and fastest growing SNS in Japan
Screenshot callouts:
•Latest information
  • Friends' new diaries
  • Comment history
  • Community topics
  • Friends' new reviews
  • Friends' new albums
•My latest diaries and reviews
•User testimonials
•Friends
•Community listing
History of mixi
•Development started in December 2003
  • Only 1 engineer (me)
  • 4 months of coding
•Opened in February 2004
Two months later
•10,000 users
•600,000 PV/day
The “Oh crap!” factor
•This model works
•But how do we scale out?
The first year
•The online population of mixi grew significantly
•600 users to 210,000 users
The second year
•210,000 users to 2 million users
And now?
More than 3.7 million users
15,000 new users/day
Population of Japan: 127 million
Internet users: 86.7 million
Source: CIA Factbook
70% are active users (last login less than 72 hours ago)
The average user spends 3 hours 20 minutes on mixi per week
Ranked 35th on Alexa worldwide, and 3rd in Japan
PV growth in 2 years (chart comparing mixi with Google Japan and Amazon Japan)
User growth in 2 years (chart: number of users, 0 to 3,500,000, from 04/03 to 06/03)
Our technology solutions
The technology behind
•Linux 2.6
•Apache 2.0
•MySQL
•Perl 5.8
•memcached
•Squid
Architecture diagram: requests pass through mod_proxy to mod_perl, which uses memcached for HOT OBJECTS and the diary cluster, message cluster and other cluster for data; images are shown separately.
MySQL
•More than 100 MySQL servers
•Add more than 10 servers/month
•Non-persistent connections
•Mostly InnoDB
•Rely heavily on DB partitioning (our own solution)
DB replication
•MySQL server load gets heavy
•Add more slaves
Diagram: mod_perl sends QUERY (WRITE) to the master DB and QUERY (READ) to a slave DB; the master replicates to the slave.
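As a rough illustration of the read/write split in this diagram, here is a Perl sketch. The DSNs and the `pick_dsn` helper are hypothetical, and choosing a slave at random is an assumption; real code would pass the chosen DSN to `DBI->connect`.

```perl
use strict;
use warnings;

# Hypothetical DSNs: one master for writes, several slaves for reads.
my $MASTER = 'DBI:mysql:host=db-master;database=mixi';
my @SLAVES = (
    'DBI:mysql:host=db-slave1;database=mixi',
    'DBI:mysql:host=db-slave2;database=mixi',
);

# Writes always go to the master; reads go to a randomly chosen slave.
sub pick_dsn {
    my ($mode) = @_;
    return $MASTER if $mode eq 'write';
    return $SLAVES[ int rand @SLAVES ];
}

print pick_dsn('write'), "\n";   # DBI:mysql:host=db-master;database=mixi
print pick_dsn('read'),  "\n";   # one of the slave DSNs
```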
DB replication
•Classic problem with DB replication
  • Incoming load: 100 reads/s and 50 writes/s (writes go to the MASTER)
  • 2 SLAVES: each handles 50 reads/s + 50 writes/s
  • 4 SLAVES: each handles 25 reads/s + 50 writes/s
  • More slaves split the reads, but every slave still has to replay all 50 writes/s
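The arithmetic behind this slide can be checked with a few lines of Perl, using the 100 reads/s and 50 writes/s from the slide (the 8-slave case is an extrapolation added for illustration). The load on each slave is writes + reads / number of slaves, so it can never fall below the write rate.

```perl
use strict;
use warnings;

# Every slave replays every write, but reads are shared among the slaves.
sub per_slave_load {
    my ($reads, $writes, $slaves) = @_;
    return $writes + $reads / $slaves;
}

for my $n (2, 4, 8) {
    printf "%d slaves: %g ops/s per slave\n", $n, per_slave_load(100, 50, $n);
}
# 2 slaves: 100 ops/s per slave
# 4 slaves: 75 ops/s per slave
# 8 slaves: 62.5 ops/s per slave
```

Doubling the slaves from 2 to 4 only cuts the per-slave load from 100 to 75 ops/s, which is why replication alone eventually stops helping.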
Some statistics
•Diary-related tables
  • Read 85%
  • Write 15%
•Message-related tables
  • Read 75%
  • Write 25%
DB partitioning
•Replication couldn’t keep up anymore
•Try to split the DB
How to split?
Diagram: one DB holding message tables, diary tables and other tables, each with rows for user A, user B and user C
Splitting vertically by users or splitting horizontally by table types
Vertical partition
Diagram: the DB (message tables, diary tables, other tables) is split by user, with the rows of users A, B and C divided between DB 1 and DB 2
Vertical partition
•Too many tables to deal with at one time
•The transition in splitting gets complex and difficult
Horizontal partition
Diagram: message tables and diary tables move out of the OLD DB into their own NEW DBs; other tables stay in the OLD DB
Also called level 1 partitioning within mixi
    # message tables (NEW DB)
    $dbh = $db->load_dbh(type => "message");
    # diary tables (NEW DB)
    $dbh = $db->load_dbh(type => "diary");
    # other tables (OLD DB)
    $dbh = $db->load_dbh();
Partition map for level 1
•Small and static
•Just put it in the configuration file
•For example:
    $DB_DIARY   = 'DBI:mysql:host=db1;database=diary';
    $DB_MESSAGE = 'DBI:mysql:host=db2;database=message';
    ...
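A minimal sketch of how a level 1 map like this might be consulted. `dsn_for_type` is a hypothetical stand-in for `load_dbh`: it returns the DSN rather than a connected handle, and the default DSN for the remaining tables (`db0`) is an assumption.

```perl
use strict;
use warnings;

# Level 1 partition map: table type => DSN (values from the slide).
my %DSN_FOR_TYPE = (
    diary   => 'DBI:mysql:host=db1;database=diary',
    message => 'DBI:mysql:host=db2;database=message',
);
# Hypothetical default for "other tables" that stayed in the old DB.
my $DB_DEFAULT = 'DBI:mysql:host=db0;database=mixi';

# Hypothetical stand-in for load_dbh(): returns the DSN that a real
# implementation would hand to DBI->connect.
sub dsn_for_type {
    my (%args) = @_;
    my $type = $args{type};
    return $DB_DEFAULT unless defined $type;
    return $DSN_FOR_TYPE{$type} // die "unknown table type '$type'\n";
}

print dsn_for_type(type => 'diary'), "\n";   # DBI:mysql:host=db1;database=diary
print dsn_for_type(), "\n";                  # DBI:mysql:host=db0;database=mixi
```

Because the map is small and static, a plain hash loaded from configuration is enough; no lookup service is involved.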
Easy transition
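Diagram: mod_perl, OLD DB, NEW DB
1 mod_perl writes to both DBs (reads still go to the OLD DB)
2 Existing rows are copied in the background (SELECT from the OLD DB, INSERT IGNORE into the NEW DB)
3 Reads shift to the NEW DB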
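The three transition steps can be simulated with in-memory hashes standing in for the two DBs. This is a sketch under simplifying assumptions: writes are modeled as upserts, and the `exists` check plays the role of `INSERT IGNORE`, so rows already written in step 1 are not overwritten by the background copy.

```perl
use strict;
use warnings;

# In-memory stand-ins for the OLD DB and NEW DB tables (id => row).
my %old_db = (1 => 'diary 1', 2 => 'diary 2');   # existing data
my %new_db;
my $read_from_new = 0;                            # step 3 flips this

# Step 1: every write goes to both DBs.
sub write_row {
    my ($id, $row) = @_;
    $old_db{$id} = $row;
    $new_db{$id} = $row;
}

# Step 2: background copy. Like INSERT IGNORE, rows that already exist
# in the NEW DB (written in step 1) are left untouched.
sub copy_in_background {
    for my $id (sort keys %old_db) {
        $new_db{$id} = $old_db{$id} unless exists $new_db{$id};
    }
}

# Step 3: once the copy is complete, reads shift to the NEW DB.
sub read_row {
    my ($id) = @_;
    return $read_from_new ? $new_db{$id} : $old_db{$id};
}

write_row(2, 'diary 2 (edited)');   # lands in both DBs
write_row(3, 'diary 3');
copy_in_background();
$read_from_new = 1;
print read_row($_), "\n" for 1 .. 3;
# diary 1
# diary 2 (edited)
# diary 3
```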
Problems with level 1
•Cannot use JOIN anymore
  • Use FEDERATED tables (MySQL 5)
  • Or do SELECT twice, which is faster than using FEDERATED tables
  • If a table is small, just duplicate it
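Doing SELECT twice means performing the join in the application. A sketch with hypothetical table and column names (`message.from_user_id`, `member.nickname`), where a closure stands in for the second query against the other DB:

```perl
use strict;
use warnings;

# Hypothetical rows that now live on two different DBs.
my @message_rows = (    # message DB
    { id => 10, from_user_id => 1, subject => 'hello' },
    { id => 11, from_user_id => 2, subject => 're: hello' },
    { id => 12, from_user_id => 1, subject => 'lunch?' },
);
my %member_rows = (     # member DB, keyed by user id
    1 => { id => 1, nickname => 'alice' },
    2 => { id => 2, nickname => 'bob' },
);

# Stand-in for the second query:
#   SELECT id, nickname FROM member WHERE id IN (?, ?, ...)
my $fetch_users = sub {
    my @ids = @_;
    return +{ map { ($_ => $member_rows{$_}) } grep { exists $member_rows{$_} } @ids };
};

# Join in the application: collect the distinct user ids from the first
# result set, fetch those users in one query, then merge row by row.
sub app_side_join {
    my ($messages, $fetch) = @_;
    my %seen;
    my @ids   = grep { !$seen{$_}++ } map { $_->{from_user_id} } @$messages;
    my $users = $fetch->(@ids);
    return [
        map {
            my $u = $users->{ $_->{from_user_id} };
            +{ %$_, nickname => $u ? $u->{nickname} : undef };
        } @$messages
    ];
}

my $joined = app_side_join(\@message_rows, $fetch_users);
printf "%d %s (%s)\n", @{$_}{qw(id subject nickname)} for @$joined;
# 10 hello (alice)
# 11 re: hello (bob)
# 12 lunch? (alice)
```

Fetching all needed users with one `IN (...)` query keeps it to two round trips regardless of how many messages are listed.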
Next step
•When the new DB gets overloaded
•We split the DB yet again
•Get ready for level 2
Partitioning key
•user id, message id
•Choose wisely!
Diagram: message tables partitioned either by user id (user A, user B) or by message id
Level 2 partition
Diagram: the message tables in the LEVEL 1 DB (users A, B, C, D) are split across NEW DBs, NODE 1 and NODE 2
Partition map for level 2
•Big and dynamic
•Cannot put it all in the configuration file
Partition map for level 2
•Manager based
  • Use another DB to do the partition mapping
•Algorithm based
  • Partition map is computed inside the application
  • node_id = member_id % TOTAL_NODE
Manager based
Diagram: mod_perl, MANAGER DB, and NODE 1 to NODE 3 (message tables)
1 mod_perl asks the MANAGER DB for the node_id of user_id=14
2 The MANAGER DB returns node_id=2
3 mod_perl connects to NODE 2
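A sketch of the manager-based lookup, with a hash standing in for the MANAGER DB's mapping table. The table name in the comment, the node DSNs and the `move_user` helper are hypothetical; the query counter makes the extra round trip per lookup visible.

```perl
use strict;
use warnings;

# Stand-in for the MANAGER DB's mapping table (user_id => node_id).
# In production this would be a query such as
#   SELECT node_id FROM partition_map WHERE user_id = ?
# (table and column names are hypothetical).
my %partition_map   = (14 => 2, 15 => 1, 16 => 3);
my $manager_queries = 0;

my %NODE_DSN = map { ($_ => "DBI:mysql:host=node$_;database=message") } 1 .. 3;

# Steps 1-2: ask the manager for the node_id of a user.
sub node_id_for_user {
    my ($user_id) = @_;
    $manager_queries++;                  # the extra query per lookup
    my $node_id = $partition_map{$user_id};
    die "no partition for user $user_id\n" unless defined $node_id;
    return $node_id;
}

# Step 3: the DSN mod_perl would connect to.
sub dsn_for_user { return $NODE_DSN{ node_id_for_user($_[0]) } }

# Moving a user only means copying its rows and updating one map entry.
sub move_user { my ($user_id, $new_node) = @_; $partition_map{$user_id} = $new_node }

print dsn_for_user(14), "\n";   # DBI:mysql:host=node2;database=message
```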
Algorithm based
Diagram: mod_perl and NODE 1 to NODE 3 (message tables); number of nodes = 3
1 mod_perl computes node_id = (user_id % 3) + 1, here node_id=3
2 mod_perl connects to that node
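The algorithm-based map is a one-line function. This sketch uses the 1-based form from the diagram; the earlier slide's `member_id % TOTAL_NODE` is the same idea with 0-based node ids. Plugging in user_id=14, the value used in the manager-based diagram, gives node_id=3.

```perl
use strict;
use warnings;

# Algorithm-based partition map: node ids are 1-based, as in the diagram.
sub node_id {
    my ($user_id, $total_nodes) = @_;
    return ($user_id % $total_nodes) + 1;
}

print node_id(14, 3), "\n";   # 3

# Consecutive user ids spread evenly over the nodes.
my %count;
$count{ node_id($_, 3) }++ for 1 .. 9;
print join(', ', map { "node $_: $count{$_}" } sort keys %count), "\n";
# node 1: 3, node 2: 3, node 3: 3
```

No lookup is needed, but the mapping is tied to the node count, which is what makes adding nodes tricky.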
Manager based
•Pros:
  • Easy to manage
  • Easy to add a new node or move data between nodes
•Cons:
  • Adds one query per lookup, to read the partition map
  • Every lookup needs a request to the manager
Algorithm based
•Cons:
  • Difficult to manage
  • Adding new nodes is tricky
•Pros:
  • Application servers can compute the node id by themselves
  • No connection to the manager is needed
Adding nodes is tricky
Diagram: mod_perl; NODE 1 and NODE 2 (existing), NODE 3 and NODE 4 (new)
    old_node_id = (member_id % 2) + 1    (number of nodes = 2)
    new_node_id = (member_id % 4) + 1    (number of nodes = 4)
1 Adds the new node_id logic to the application
2 Writes to both DBs if old_node_id and new_node_id differ
3 Copies existing data to the new nodes in the background
4 Shifts reads to the new nodes
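A sketch of the transition logic using the two formulas from the diagram. Because the node count doubles from 2 to 4, each member either stays where it is or moves from NODE 1 to NODE 3 or from NODE 2 to NODE 4, so half of the member ids need to be copied.

```perl
use strict;
use warnings;

sub old_node_id { my ($member_id) = @_; return ($member_id % 2) + 1 }   # 2 nodes
sub new_node_id { my ($member_id) = @_; return ($member_id % 4) + 1 }   # 4 nodes

# Step 2: during the transition, write to both nodes when they differ.
sub nodes_to_write {
    my ($member_id) = @_;
    my ($old, $new) = (old_node_id($member_id), new_node_id($member_id));
    return $old == $new ? ($old) : ($old, $new);
}

# Which members must be copied in step 3, and where do they go?
for my $member_id (1 .. 8) {
    my @nodes = nodes_to_write($member_id);
    printf "member %d: %s\n", $member_id,
        @nodes == 1 ? "stays on NODE $nodes[0]" : "NODE $nodes[0] -> NODE $nodes[1]";
}
# member 1: stays on NODE 2
# member 2: NODE 1 -> NODE 3
# member 3: NODE 2 -> NODE 4
# member 4: stays on NODE 1
# (the pattern repeats every 4 ids)
```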
Problems with level 2
Diagram: member tables on NODE 1, NODE 2 and NODE 3; community tables on NODE 1 and NODE 2
•Too many connections to different DBs
  • Fortunately, on mixi, the majority are small data sets
  • Cache them all using distributed memory caching
  • We rarely hit the DB
  • Average page load time is about 0.02 sec*
* Average load time may vary depending on the data set
Caching
•memcached
  • Also used by LiveJournal, Slashdot, etc.
•Install the memcached server on the mod_perl machines
•39 machines x 2 GB memory
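A cache-aside sketch of how small, hot records might be read through memcached. `FakeCache` is an in-memory stand-in so the example runs on its own; a real client such as Cache::Memcached, configured with the list of memcached servers, offers `get` and `set` calls of the same shape. The key format and `load_member_from_db` are hypothetical, and expiry and invalidation on writes are left out.

```perl
use strict;
use warnings;

# In-memory stand-in for a memcached client (get/set by key).
package FakeCache;
sub new { my ($class) = @_; return bless { data => {} }, $class }
sub get { my ($self, $key) = @_; return $self->{data}{$key} }
sub set { my ($self, $key, $value) = @_; $self->{data}{$key} = $value; return 1 }

package main;

my $cache   = FakeCache->new;
my $db_hits = 0;

# Hypothetical DB lookup for a small member record.
sub load_member_from_db {
    my ($member_id) = @_;
    $db_hits++;
    return { id => $member_id, nickname => "member$member_id" };
}

# Cache-aside read: try the cache first, fall back to the DB and fill the cache.
sub get_member {
    my ($member_id) = @_;
    my $key    = "member:$member_id";
    my $member = $cache->get($key);
    return $member if defined $member;
    $member = load_member_from_db($member_id);
    $cache->set($key, $member);
    return $member;
}

get_member(42) for 1 .. 5;
print "DB hits: $db_hits\n";   # DB hits: 1
```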
Summary of DB partitioning
•Level 1 partition (split by table types)
•Level 2 partition (split by partitioning key)
  • Manager based
  • Algorithm based
Summary of DB partitioning
Diagram:
1 Split by table types (LEVEL 1): message tables, diary tables and other tables in the OLD DB are moved to separate DBs
2 Split by partitioning key (LEVEL 2): the message tables are divided by user (user A, user B, user C) across LEVEL 2 DBs
Image Servers
Statistics
•Total size is more than 8 TB of storage
•Growth rate is about 23 GB/day
•We use MySQL to store metadata only
Two types of images
•Frequently accessed images
  • Number of image files is relatively small (a few million files)
  • For example, user profile photos, community logos
•Rarely accessed images
  • Hundreds of millions of image files
  • Diary photos, album photos, etc.
Frequently accessed images
•A few hundred GB of files
•Distributed via FTP and Squid
•Third-party content delivery network (CDN)
Frequently accessed images
Diagram: mod_perl, Storage, Squid and CDN (sto1.mixi.jp, sto2.mixi.jp)
1 mod_perl uploads images to storage
2 Squid and the CDN pull images from storage
Rarely accessed images
•A few TB of files
•Newer files get accessed more often
•Cache hit ratio is very bad
•Distribute directly from storage
Uploading rarely accessed images
Diagram: mod_perl, MANAGER DB, and Storage sto1.mixi.jp to sto4.mixi.jp
1 Assigns an id to the image file (abc.gif)
2 Arranges a pair of area_ids (area_id=1,2)
3 Uploads the image to both storage servers in the pair
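A sketch of steps 1 and 2 under stated assumptions: ids come from a counter, storage servers are paired as sto1/sto2 and sto3/sto4, pairs are handed out round robin, and area_id N maps to stoN.mixi.jp. The slides do not say how mixi actually chooses the pair.

```perl
use strict;
use warnings;

# Hypothetical manager state: the next image id and the storage pairs.
# Pairing sto1/sto2 and sto3/sto4 is an assumption for this sketch.
my $next_image_id = 1;
my @AREA_PAIRS    = ([1, 2], [3, 4]);
my %image_areas;                       # image id => [area ids]

# Steps 1-2: assign an id and a pair of area_ids (round robin over pairs).
sub register_image {
    my $id   = $next_image_id++;
    my $pair = $AREA_PAIRS[ ($id - 1) % @AREA_PAIRS ];
    $image_areas{$id} = [@$pair];
    return ($id, @$pair);
}

# Step 3: the storage hosts the image is uploaded to (one copy on each).
sub upload_targets {
    my (@area_ids) = @_;
    return map { "sto$_.mixi.jp" } @area_ids;
}

my ($id, @areas) = register_image();      # e.g. for abc.gif
print "image $id -> ", join(', ', upload_targets(@areas)), "\n";
# image 1 -> sto1.mixi.jp, sto2.mixi.jp
```

Keeping two copies means a single storage failure does not make an image unavailable.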
Viewing rarely accessed images
Diagram: User, mod_perl, MANAGER DB, and Storage sto1.mixi.jp to sto4.mixi.jp
1 The user asks for view_diary.pl
2 mod_perl detects abc.gif in view_diary.pl
3 mod_perl asks the MANAGER DB for the area_id of abc.gif
4 The MANAGER DB returns area_id = 1
5 mod_perl creates the image URL
6 mod_perl returns view_diary.pl and the URL for abc.gif
7 The user asks storage for abc.gif
8 Storage returns abc.gif
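A sketch of steps 3 to 5, building the URL from the stored area_id. The hash standing in for the MANAGER DB and the URL layout `http://stoN.mixi.jp/<file>` are assumptions; the slides only show that the URL is created from the area_id.

```perl
use strict;
use warnings;

# Stand-in for the MANAGER DB: image file name => area_ids holding a copy.
my %areas_for_image = ('abc.gif' => [1, 2]);

# Steps 3-5: look up the area_id and build the image URL.
# The URL layout (http://stoN.mixi.jp/<file>) is hypothetical.
sub image_url {
    my ($file) = @_;
    my $areas = $areas_for_image{$file} or die "unknown image $file\n";
    my $area_id = $areas->[0];    # first copy; either copy would work
    return "http://sto$area_id.mixi.jp/$file";
}

print image_url('abc.gif'), "\n";   # http://sto1.mixi.jp/abc.gif
```

The browser then fetches the image straight from storage (steps 7 and 8), so mod_perl never streams image bytes itself.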
To do
•Try MySQL Cluster
•Try to implement a better algorithm
  • Consistent hashing?
  • Linear hashing?
•Level 3 partitioning?
  • Split again by timestamp?
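The to-do list mentions consistent hashing. The sketch below is a generic textbook version, not something the slides describe mixi using; the node names, 100 virtual points per node and the MD5-based hash are assumptions. Adding a node only hands keys over to that new node, whereas going from 2 to 3 nodes with a modulo rule would move about two thirds of the users.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a ring: each node gets $replicas points, placed by a 32-bit hash.
sub build_ring {
    my ($nodes, $replicas) = @_;
    my %owner;
    for my $node (@$nodes) {
        for my $i (1 .. $replicas) {
            $owner{ hex substr(md5_hex("$node#$i"), 0, 8) } = $node;
        }
    }
    my @points = sort { $a <=> $b } keys %owner;
    return { points => \@points, owner => \%owner };
}

# A key belongs to the first point clockwise from its hash (wrapping around).
sub node_for_key {
    my ($ring, $key) = @_;
    my $h = hex substr(md5_hex($key), 0, 8);
    for my $p (@{ $ring->{points} }) {
        return $ring->{owner}{$p} if $p >= $h;
    }
    return $ring->{owner}{ $ring->{points}[0] };
}

my $before = build_ring(['node1', 'node2'], 100);
my $after  = build_ring(['node1', 'node2', 'node3'], 100);

my $moved = grep { node_for_key($before, "user:$_") ne node_for_key($after, "user:$_") } 1 .. 1000;
print "moved $moved of 1000 keys\n";   # every moved key now lives on node3
```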
Questions?