Your backend architecture is what matters Scaling your application Colin Charles, [email protected] @bytebot / http://bytebot.net/blog/ KL Facebook Developer Garage, February 26 2011

Upload: colin-charles

Post on 06-May-2015

DESCRIPTION

Overview of understanding your stack at the KL Facebook Developer Garage

TRANSCRIPT

Page 1: Your backend architecture is what matters slideshare

Your backend architecture is what matters

Scaling your application

Colin Charles, [email protected]

@bytebot / http://bytebot.net/blog/ KL Facebook Developer Garage, February 26 2011

Page 2: Your backend architecture is what matters slideshare
Page 3: Your backend architecture is what matters slideshare

What are you building?

• a bunch of static FBML pages + FQL + database triggers + views?

• the next cool game Zynga wants to acquire?

• needs to be database driven with proper architecture planning

Page 4: Your backend architecture is what matters slideshare

http://xkcd.com/327/

Page 5: Your backend architecture is what matters slideshare

Don’t prematurely optimise

Just remember the 7 Ps: Prior & Proper Planning Prevents Piss Poor Performance

Page 6: Your backend architecture is what matters slideshare

Reference Architectures

• Is there one for the Web world?

• Your choices are to:

• scale up

• scale out

• Which do you pick?

Page 7: Your backend architecture is what matters slideshare

Scaling out

• Buying (renting) commodity hardware

• Using the cloud to expand

• Or using the cloud totally: e.g. http://heroku.com/facebook

Page 8: Your backend architecture is what matters slideshare

OS

• If you didn’t go open source, you’re silly

• Tuning Linux/BSD is mandatory

• filesystem: xfs, ext3(4)

• swap is the devil

• different schedulers work better for different tasks (web, database, etc.)

• NFS? You’d better tune that! (stopgap, scaling is hard)

Page 9: Your backend architecture is what matters slideshare

Web server

• Apache, lighttpd, nginx

• They all require configuration (e.g. httpd.conf for Apache)

• Simple things like maximum connections, worker MPM, usually go unconfigured

Page 10: Your backend architecture is what matters slideshare

Language

• “PHP doesn’t scale.” - Cal Henderson, when he was at Flickr.com

• Languages are not meant to scale for you

• Use bytecode caches (PHP, Python, etc.)

• Compile away -- HipHop

• Library, driver support; developer communities

Page 11: Your backend architecture is what matters slideshare

Databases

• are slow, period.

• partition data into shards

• tune that database

Page 12: Your backend architecture is what matters slideshare

And that was your basic LAMP stack

Page 13: Your backend architecture is what matters slideshare

How do you scale easily?

• Use caches

• Disk-based caching (cache_lite via php-pear). RAM disks on SSDs... fast!

• In-memory caching (APC, memcached)

• Cloud-based caching (S3, MogileFS)

Page 14: Your backend architecture is what matters slideshare

memcached

• Easy to set up and use

• Very fast over the network

• Scales, has failover, widely supported

• Centralised and shared across the site

Page 15: Your backend architecture is what matters slideshare

S3

• Databases are good for storing relational data, but suck for blob storage

• S3 is a file & data store, running over HTTP

• In theory, infinitely scalable

• Centralised & shared across site

• Costs money, no Malaysian POP

• See OpenStack’s Swift Object Store

Page 16: Your backend architecture is what matters slideshare

CDN

• Outsource it

• Costs a lot of money

• Aflexi is a Malaysian company making a pretty darn good CDN (resold via Exabytes?)

• Out of your control but will help you scale, scale, scale

Page 17: Your backend architecture is what matters slideshare

Back to the database...

• Sharding

• not all data lives in one place

• hash records to partitions

• partition alphabetically? put n-users/shard? organise by postal code?

• horizontal vs vertical partitioning
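
The "hash records to partitions" idea can be sketched as below. This is one possible approach, not necessarily what the talk had in mind: the shard addresses and the `shard_for` helper are hypothetical, and simple modulo hashing is assumed.

```python
import hashlib

# Hypothetical shard hosts (addresses chosen for illustration only).
SHARDS = ["192.168.0.1", "192.168.0.2", "192.168.0.3"]


def shard_for(key, shards=SHARDS):
    """Map a record key to a shard using a stable hash.

    md5 is used only for an even, repeatable distribution, not for security.
    Python's built-in hash() is randomised per process for strings, so it
    would send the same key to different shards after a restart.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


for username in ["alice", "bob", "carol"]:
    print(username, "->", shard_for(username))
```

Modulo hashing remaps most keys when the number of shards changes; consistent hashing or a lookup table are common alternatives when shards are added often.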

Page 18: Your backend architecture is what matters slideshare

Horizontal vs Vertical Partitioning

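Horizontal partitioning: every server holds the same User table, each with a different subset of rows

• 192.168.0.1: User (id int(10), username char(15), password char(15), email char(50))

• 192.168.0.2: User (id int(10), username char(15), password char(15), email char(50))

• 192.168.0.3: User (id int(10), username char(15), password char(15), email char(50))

Vertical partitioning: the columns are split across servers

• 192.168.0.1: User (id int(10), username char(15), password char(15), email char(50))

• 192.168.0.2: UserInfo (login datetime, md5 varchar(32), guid varchar(32))

Better if INSERT heavy and there’s less frequently changed data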

Page 19: Your backend architecture is what matters slideshare

MySQL has engines

• InnoDB (XtraDB) for transactional use

• MyISAM for “data warehousing” use

• Maria in time

Page 20: Your backend architecture is what matters slideshare

MySQL has replication!

• Simple, easy to implement (async)

• Row based replication is better than statement based replication

• You do not need MySQL Cluster (NDB)

• Look at Tungsten Replicator, Galera, etc. for other topologies (e.g. many masters)

Page 21: Your backend architecture is what matters slideshare

Use INDEXes

• Covering index: all fields in SELECT for specific table are contained in index

• EXPLAIN will say “Using index”
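
A small, runnable illustration of a covering index. It uses SQLite (via Python's standard `sqlite3` module) rather than MySQL so that it runs without a server; SQLite's `EXPLAIN QUERY PLAN` reports "COVERING INDEX" where MySQL's EXPLAIN would say "Using index". The table, column, and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT, "
    "password TEXT, email TEXT)"
)
# The index contains both columns the query touches (username and email),
# so the query can be answered from the index without reading the table.
conn.execute("CREATE INDEX idx_user_name_email ON user (username, email)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT email FROM user WHERE username = ?",
    ("alice",),
).fetchall()

# The last column of each plan row is a human-readable description.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

Selecting a column that is not in the index (for example `password`) would force a lookup into the table itself, and the plan would no longer report a covering index.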

Page 22: Your backend architecture is what matters slideshare

Monitor everything!

• Benchmarking allows tracking performance over time

• Nagios

• MySQL (MariaDB/Percona Server)

• slow query log, extended stats in slow query log, use EXPLAIN, microsecond process list, userstats v2, SHOW PROCESSLIST, etc.

Page 23: Your backend architecture is what matters slideshare

Fulltext Search

• Don’t use the database!

• Sphinx

• SphinxSE for MariaDB

• Lucene

Page 24: Your backend architecture is what matters slideshare

Don’t

• SELECT * FROM room WHERE room_date BETWEEN '2011-02-25' AND '2011-02-27'

• not have an INDEX on field being operated on by range operator => full table scan

• not allocate a primary key

• over-normalise (3NF is fine)
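
The effect of a missing index on a range query can be seen with a short sketch. SQLite stands in for MySQL here so the example runs anywhere; the `room` table layout beyond `room_date` and the index name `idx_room_date` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE room (id INTEGER PRIMARY KEY, name TEXT, room_date TEXT)"
)

QUERY = (
    "EXPLAIN QUERY PLAN SELECT * FROM room "
    "WHERE room_date BETWEEN '2011-02-25' AND '2011-02-27'"
)


def plan_text(connection):
    """Join the description column of every query-plan row."""
    return " ".join(row[-1] for row in connection.execute(QUERY))


before = plan_text(conn)   # no index on room_date: the whole table is scanned
conn.execute("CREATE INDEX idx_room_date ON room (room_date)")
after = plan_text(conn)    # the range is now resolved through the index

print("before:", before)
print("after: ", after)
```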

Page 25: Your backend architecture is what matters slideshare

Keeping state

• Session data in DB

• PHP has files, doesn’t scale. DB+Memcached goes far

• Replicate/Partition/Cache state

• Cookies can be validated by checksums and timestamps (encryption consumes CPU)
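
One way to validate a cookie with a checksum and a timestamp is to sign it with an HMAC, which is a keyed checksum. The sketch below is an assumption about how this could look, not the speaker's implementation: `SECRET_KEY`, `MAX_AGE_SECONDS`, the `value|timestamp|signature` layout, and both function names are hypothetical.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"   # hypothetical server-side secret
MAX_AGE_SECONDS = 3600      # hypothetical cookie lifetime


def _signature(value, ts):
    payload = f"{value}|{ts}".encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def sign_cookie(value, now=None):
    """Return 'value|timestamp|signature'."""
    ts = str(int(now if now is not None else time.time()))
    return f"{value}|{ts}|{_signature(value, ts)}"


def verify_cookie(cookie, now=None):
    """Return the value if the signature matches and the cookie is fresh."""
    try:
        value, ts, sig = cookie.rsplit("|", 2)
        issued = int(ts)
    except ValueError:
        return None             # malformed cookie
    expected = _signature(value, ts)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None             # tampered value or timestamp
    current = now if now is not None else time.time()
    if current - issued > MAX_AGE_SECONDS:
        return None             # expired
    return value


cookie = sign_cookie("user=42")
print(verify_cookie(cookie))
```

The cookie contents stay readable by the client; the signature only guarantees they were not altered, which is cheaper than encrypting the payload.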

Page 26: Your backend architecture is what matters slideshare

General advice

• Your DB servers are not your web servers and they’re not your load balancers

• Write non-locking code

• Don’t block loading unnecessarily

• Cache partially (esp. w/dynamic pages)

• Use UTC for time (replication across geographies?)

• Keep everything in version control

• Migrations are never recommended unless you’ve exceeded capabilities of current solutions. Beware v2 disasters.
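
For the "use UTC" advice above, a minimal sketch of storing timestamps in UTC and converting only for display. The helper names are hypothetical, and a fixed UTC+8 offset is assumed for Malaysia time.

```python
from datetime import datetime, timedelta, timezone

MYT = timezone(timedelta(hours=8), "MYT")  # Malaysia time, fixed UTC+8


def to_utc(local_dt):
    """Convert a timezone-aware datetime to UTC for storage."""
    if local_dt.tzinfo is None:
        raise ValueError("refusing naive datetime: attach a timezone first")
    return local_dt.astimezone(timezone.utc)


def for_display(utc_dt, tz):
    """Convert a stored UTC datetime to a viewer's timezone."""
    return utc_dt.astimezone(tz)


event_local = datetime(2011, 2, 26, 14, 0, tzinfo=MYT)
stored = to_utc(event_local)
print(stored.isoformat())
print(for_display(stored, MYT).isoformat())
```

Servers replicating across regions then compare and order the same UTC values regardless of their local clock settings.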

Page 27: Your backend architecture is what matters slideshare

NoSQL

• MongoDB

• Redis

• HBase/Hadoop

• CouchDB

• And the 45 other solutions out there...

"I don't foresee StumbleUpon ever giving up on all of its MySQL instances. RDBMSs are just too useful. The plan, though, is to shrink what MySQL does over time, let MySQL do what its good at and have HBase take over where MySQL is running up against limits handling ever-growing write rates, table sizes, etc." - Michael Stack, HBase project chair, StumbleUpon DBA

http://www.theregister.co.uk/2011/01/19/hbase_on_the_rise/

Page 28: Your backend architecture is what matters slideshare

A lot of web scale tech comes from...

• Brad Fitzpatrick

• LiveJournal infrastructure

• memcached (distributed caching, hits less DB), MogileFS (distributed file system), Perlbal (reverse proxy load balancer), Gearman (remotely run code, load balanced, in parallel)

• next: camlistore (http://camlistore.org/)

Page 29: Your backend architecture is what matters slideshare

“Without money the site can't function. Okay, let me tell you the difference between Facebook and everyone else, we don't crash EVER! If those servers are down for even a day, our entire reputation is irreversibly destroyed! Users are fickle, Friendster has proved that. Even a few people leaving would reverberate through the entire userbase. The users are interconnected, that is the whole point. College kids are online because their friends are online, and if one domino goes, the other dominos go, don't you get that?” -- Mark Zuckerberg (okay, not really, Jesse Eisenberg, in The Social Network)

Page 30: Your backend architecture is what matters slideshare

Resources

• High Performance Web Sites (Steve Souders)

• High Performance MySQL (Jeremy Zawodny, Baron Schwartz, Peter Zaitsev, et al)

• Study HyperDB (powers WordPress.com)

• http://kb.askmonty.org/