engineering scalable social games dr. robert zubek
Post on 11-Jan-2016
221 Views
Preview:
TRANSCRIPT
Engineering ScalableSocial Games
Dr. Robert Zubek
Top social game developer
Growth Example
Roller Coaster Kingdom
1M DAUs over a weekend
source: developeranalytics.com
Growth Example
FishVille
6M DAUs in its first week
source: developeranalytics.com
Growth Example
FarmVille
25M DAUs over five months
source: developeranalytics.com
Talk Overview
Part I. Game Architectures
Part II. Scaling Solutions
Introduce game developers to best
practices for large-scale web development
• server approaches• client approaches
Part I. Game Architectures
Three Major Types
Web server stack Web + MMO stack
HTML FlashFlash
• Game logic in MMO server (eg. Java)
• Web stack for everything else
Server Side
• Usually based on a LAMP stack
• Game logic in PHP
• HTTP communication
Mixed stackWeb stack
Example 1
FishVille
Web Servers + CDN
Cache and Queue Servers
DB Servers
Example 2
YoVille
Web Servers + CDN
MMO Servers
Cache and Queue Servers
DB Servers
• HTTP scales very well• Stateless request/response• Easy to load-balance across servers• Easy to add more servers
Some limitations• Server-initiated actions• Storing state between requests
Why web stack?
caches
example
Web stack and state
Web #1
DB Servers
Web #2
Web #3Client
Web #4
xxx
MMO servers
• Minimally “MMO”:• Persistent socket connection per client• Live game support: chat, live events• Supports data push• Keeps game state in memory
• Very different from web!
Why MMO servers?
• Harder to scale out!• But on the plus side:
1. Supports live communication and events
2. Game logic can run on the server, independent of client’s actions
3. Game state can be entirely in RAM
Example Web + MMO stack
Web Servers
MMO Servers
Client
caches
DB Servers
• The game is “just” a web page
• Minimal sys reqs
• Limited graphics and animation
Client Side
• High production quality games
• Game logic on client
• Can keep open socket
HTML + AJAXFlash
Social Network Integration
• “Easy” because not related to scaling
• Calling the host network to get a list of friends, post something, etc.
• Networks provide REST APIs, and sometimes client libraries
Architectures
Data
Server
Client
database, cache, etc.
Web server stack
HTML Flash
Web + MMO
Flash
Part I. Game Architectures
Part II. Scaling Solutions
Introduce game developers to best
practices for large-scale web development
not blowing upas you grow
ScalingScale up vs. scale out
I need more boxes
I need a bigger box
ScalingScale up vs. scale out
Scale out has been a clear win• At some point you can’t
get a box big enough, quickly enough
• Much easier to add more boxes
• But: requires architectural support from the app!
Example:
Scaling up vs. out
• 500,000 daily players in first week
• Started with just one DB, which quickly became a bottleneck
• Short term: scaled up, optimized DB accesses, fixed slow queries, caching, etc.
• Long term: must scale out with the size of the player base!
ScalingWe’ll look at four elements:
Databases
Game servers (web + MMO)
Caching
Capacity planning
II A. Databases
App Servers
App Servers
Client
caches
queues
DBDB
DBs are the first to fall over
App
W R
App
W R
M S S
App
M S S
App
M
App
M
M
App
MM0
App
M1
How to partition data?
• Two most common ways:• Vertical – by table
• Easy but doesn’t scale with DAUs• Horizontal – by row
• Harder to do, but gives best results!• Different rows live on different DBs• Need a good mapping from row # to DB
id Data
100 foo
101 bar
102 baz
103 xyzzy
Row striping
M0 M1• Row-to-shard mapping:• Primary key modulo # of DBs• Like a “logical RAID 0”• More clever schemes exist, of course
S0 S1M2 M3
x x
Scaling out your shards
M0
ServersServersServers
M1
Sharding in a nutshell
• It’s just data striping across databases (“logical RAID 0”)• There’s no automatic replication, etc.• No magic about how it works• That’s also what makes it robust,
easy to set up and maintain!
Example:
Sharding
• Biggest challenge: designing how to partition huge existing dataset
• Partitioned both horizontally and vertically
• Lots of joins had to be broken
• Data patterns needed redesigned • With sharding, you need to know the shard
ID for each query!
• Data replication had difficulty catching up with high-volume usage
Sharding surprises
• Can’t do joins across shards
• Instead, do multiple selects, or denormalize your data
Be careful with joins
Sharding surprises
• CPU-expensive; easier to pay this cost in the application layer (just be careful)
• The more you keep in RAM, the less you’ll need these
Skip transactions and foreign key constraints
II B. Caching
• DBs can be too slow• Spend memory to buy speed!
caches
queues
caches
queues
App Servers
App Servers
ClientDB
Caching
• Two main uses:• Speed up DB access to commonly
used data• Shared game state storage for stateless
web game servers
• Memcached: network-attached RAM cache
Memcache
• Stores simple key-value pairs• eg. “uid_123” => {name:”Robert”, …}
• What to put there?• Structured game data• Mutexes for actions across servers
• Caveat• It’s an LRU cache, not a DB!
No persistence!
Memcache scaling out
DBs MCs App
Shard it just like the DB!
II C. Game Servers
Web Servers
MMO Servers
Client
caches
queues
DBWeb
Servers
MMO Servers
LB
LB
DNSLB
Scaling web servers
LB
Scaling content delivery
CDN
game dataonly
mediafiles
mediafiles
Web Servers
Client
Scaling MMO servers
• Scaling out is easiest when servers don’t need to talk to each other• Databases: sharding• Memcache: sharding• Web: load-balanced farms• MMO: ???
• Our MMO server approach: shard just like other servers
Scaling MMO servers
• Keep all servers separate from each other• Minimize inter-server communication• All participants in a live event should be on
the same server• Scaling out means no direct sharing
• Sharing through third parties is okay
Problem #1:
Load balancing • What if a server gets overloaded?
• Can’t move a live player to a different server
• Poor man’s load-balancing:• Players don’t pick servers; the LB does• If a server gets hot, remove it from the
LB pool• If enough servers get hot, add more server
instances and send new connections there
Problem #2:
Deployment
Downtime = lost revenues• You can measure cost of
downtime in $/minute!
Problem #2:
Deployment
Deployment comparison:• Web: just copy over PHP files• Socket servers – much harder
• How to deploy with zero downtime?
• Ideally: set up shadow new server and slowly transition players over
II D. Capacity planning
Capacity
We believein scaling out
• Demand could change rapidly• How to provision enough servers?
Different logistics thanscaling up
Capacity:
colo vs. cloud
• Low or zero fixed cost
• Higher variable cost
• Little or no lead time on additional capacity
• Less control over ops
• Limited choice of VMs
• Virtualized storage
• Can’t scale up! Only out.
CloudColo
• Higher fixed cost
• Lower variable cost
• Significant lead time on additional capacity
• More control over ops
• Any chips you want
• Any disks you want
• Scale up or out
Operations
• Scaling out: a legion of servers
• You’ll need tools• App health – custom dashboard• Server monitoring – Munin• Automatic alerts – Nagios
Ops: Dashboard
Ops: Munin
Server stats from the entire farm
Ops: Nagios
• SMS alerts for your server stats• Connections drop to zero• Or CPU load exceeds 4• Or test account can’t connect, etc.
• Put alerts on business stats, too!• DAUs dropping below daily avg, etc.
Server failures
• Common in the cloud• Dropping off the net• Sometimes even shutting down
• Be defensive• Ops should be prepared• Devs should program defensively
The EndMany thanks to those who taught me a lot:
Virtual Worlds: Scott Dale, Justin WaldronCoaster: Biren Gandhi, Nathan Brown, Tatung Mei
http://robert.zubek.net/gdc2010
top related