Building a World in the Clouds: MMO Architecture on AWS (MBL304) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Building a World in the Clouds: MMO Architecture on AWS

Jeffrey Berube, Director of Technical Operations

Red 5 Studios, Inc.

November 15, 2013

Overview

• What is Firefall?

• Why build in the cloud?

• Infrastructure Goals

• Evolution of the Platform

• Tools for Success

What is Firefall?

• Free-to-play cooperative open world shooter

• “Shardless” world

• Instance-based maps

    – Both persistent and transient map types

    – Possible for many instances of a map to exist at the same time

Why build in the cloud?

• Players are unpredictable

• Developers are unpredictable

• Cyclical player behavior opens up opportunities for significant cost savings


• Players are unpredictable

    – Forecasts can be (and usually are) wrong

        • Too little hardware, too many players

            – Bad for everyone

        • Too much hardware, too few players

            – Good for players (sort of) but bad for the business

            – What if they don’t stick around?



• Developers are unpredictable

    – Active development has risks

        • Performance can change drastically

        • New services can “appear” the day of the patch

    – MMOs are ALWAYS being actively developed!

        • (If you want to be successful…)

• Cyclical player behavior opens up opportunities for significant cost savings

[Figure: Player Graph (36 hours)]

[Figure: Player Graph with Server Overlay]

[Figure: Player Graph with Efficient Server Overlay]

35%–37.5% Savings
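The arithmetic behind a savings figure like this can be sketched by comparing server-hours under two policies: provisioning for the peak all day, versus following the player curve hour by hour. The function names, the `players_per_server` capacity, and the `headroom` factor below are illustrative assumptions, not values from the talk, and the sketch does not attempt to reproduce the 35%–37.5% result.

```python
import math


def servers_needed(players, players_per_server, headroom=1.0):
    """Servers required for a player count (hypothetical capacity model)."""
    return max(1, math.ceil(players * headroom / players_per_server))


def savings_fraction(hourly_players, players_per_server, headroom=1.0):
    """Fraction of server-hours saved by following demand instead of the peak."""
    # Static policy: run enough servers for the busiest hour, every hour.
    peak_servers = servers_needed(max(hourly_players), players_per_server, headroom)
    static_hours = peak_servers * len(hourly_players)
    # Elastic policy: run only what each hour actually needs.
    elastic_hours = sum(
        servers_needed(p, players_per_server, headroom) for p in hourly_players
    )
    return 1 - elastic_hours / static_hours
```

For a toy curve that alternates between 100 and 50 players with 50 players per server, the static policy uses 8 server-hours and the elastic policy uses 6, a saving of 25%. Real savings depend on how deep the nightly trough is and how much headroom is kept.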

Infrastructure Goals

Deployment and Recovery

How do we make site management better?

• Expansion

• Scalability

• Disaster Recovery

• Self-Healing

Platform

How can the platform make the player experience better?

• Downtime

• Player Mobility

Infrastructure Goals: Deployment and Recovery

Deployment and Recovery Goals

• Quick regional expansion

• On-demand scalability

• Disaster recovery with minimal downtime

• Self-healing


• Quick regional expansion

    – Traditionally, expansion is a multi-month process

        • Contracts, purchase and shipping, installation, etc.

    – Today, adding a region is about a week-long task

        • Additional improvements are in the works

• On-demand scalability

    – Automated scale up and down without* limits

        • Instance sizes desired may not always be available, however

• Disaster recovery with minimal downtime

    – Traditionally, DR sites are expensive and are not always properly maintained

    – Our goal is to automate disaster recovery safely

        • We do a lot manually at present

• Self-healing
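One way to cope with the caveat that a desired instance size may not always be available is to try an ordered list of acceptable sizes. The sketch below assumes a hypothetical `launch` callable and a hypothetical `CapacityUnavailable` exception; neither is an AWS API, and the talk does not say how Red 5 handles this case.

```python
class CapacityUnavailable(Exception):
    """Raised by the (hypothetical) launcher when a size cannot be provisioned."""


def launch_with_fallback(launch, preferred_sizes):
    """Try each size in order of preference and return (size, instance)."""
    failures = []
    for size in preferred_sizes:
        try:
            return size, launch(size)
        except CapacityUnavailable as exc:
            # Remember why this size failed, then fall through to the next one.
            failures.append((size, str(exc)))
    raise CapacityUnavailable(f"no capacity for any requested size: {failures}")
```

The caller supplies `launch`, so the same logic can wrap whatever provisioning call is actually in use.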

Infrastructure Goals: Platform

Platform Goals

• Zero downtime game updates

• Players can play globally without restrictions

• Zero downtime game updates

    – Blue-Green deployment

    – Doesn’t preclude scheduled maintenance

        • Some things are more safely done offline
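The Blue-Green idea can be sketched as two identical stacks, only one of which receives players at a time. A new build goes to the idle stack, and traffic switches only after a health check passes. The class, its method names, and the `is_healthy` callback are assumptions for illustration and do not describe Red 5's actual tooling.

```python
class BlueGreen:
    """Minimal model of two stacks with a single 'live' pointer."""

    def __init__(self, version):
        self.versions = {"blue": version, "green": version}
        self.live = "blue"

    @property
    def idle(self):
        """Name of the stack that is not currently serving players."""
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version, is_healthy):
        """Stage a version on the idle stack and switch only if it is healthy."""
        target = self.idle
        self.versions[target] = version
        if not is_healthy(target):
            return False  # live stack is untouched, players see no change
        self.live = target
        return True

    def rollback(self):
        """Point traffic back at the other stack."""
        self.live = self.idle
```

Because the previous version keeps running on the other stack until the next deploy, a rollback right after a successful switch is just a pointer change.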


• Players can play globally without restrictions

    – Characters won’t be held hostage

        • Player data is available everywhere they want to be

        • We don’t charge a player so that they can play with their friends

    – Prefer closest healthy region, however
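"Prefer closest healthy region" amounts to filtering out unhealthy regions and then taking the lowest-latency one that remains. The sketch below assumes per-region latency measurements and health flags are already available; the `pick_region` function and the latency numbers in the usage note are hypothetical.

```python
def pick_region(latency_ms, healthy):
    """Return the lowest-latency region that is marked healthy, or None."""
    # Regions with no health report are treated as unhealthy.
    candidates = {
        region: ms for region, ms in latency_ms.items() if healthy.get(region, False)
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

With latencies of 40 ms to us-west-2, 95 ms to us-east-1, and 160 ms to eu-west-1, a player is sent to us-west-2 while it is healthy and to us-east-1 when it is not.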

Evolution of the Platform

The Beginning March 2011

AWS Services in Use

Elastic Compute Cloud (EC2)

[Architecture diagram: US-West-1, Availability Zone b. Networks: INET, AWS, CORP. Subnet: Outside-Game.]

Alpha October 2011

AWS Services in Use

Simple Storage Service (S3)

Elastic Load Balancing (ELB)

Simple Queue Service (SQS)

[Architecture diagram: US-West-1, Availability Zone b. Networks: INET, AWS, CORP. Subnets: Inside-Core, Inside-Game, Outside-Game, Inside-LB, Inside-DB, Inside-App, Outside-LB. Node labels: Ic, Chef, Log, MCP, MD, MM, HP, I, C, U, Ar, Ad, Mx, PvP, AWS ELB, Operator.]

Closed Beta April 2012

AWS Services in Use

Relational Database Service (RDS)

ElastiCache

[Architecture diagram: US-West-1, Availability Zones b and c. Networks: INET, AWS, CORP. Subnets: Inside-Core, Inside-Game, Outside-Game, Inside-LB, Inside-App, Outside-LB. Node labels: Gr, Ic, Chef, Log, MCP, MD, MM, HP, I, C, U, Ar, Ad, S, OW, PvP, AWS RDS, AWS ELB, Operator.]

Gamescom August 2012

AWS Services in Use

Relational Database Service (RDS)

ElastiCache

CloudFront

Virtual Private Cloud (VPC)

[Architecture diagram: Regions: US-West-2, EU-West-1, US-East-1; region label on diagram: US-West-2. Networks: INET, AWS, CORP, HQ. Subnets and groups: Inside-Core, Inside-Game, Outside-Game, Outside-LB, Inside-DB, Inside-App, Tasks. Node labels: Gr, Ic, Chef, Log, Task, MCP, MD, MM, HP, I, C, U, A, P, W, Ar, L, In, Ad, Co, S, OW, NPE, PvE, PvP, VPC, AWS ELB, Operator.]

Open Beta July 2013

AWS Services in Use

CloudFront

Virtual Private Cloud (VPC)

Elastic MapReduce (EMR)

[Architecture diagram: Regions and sites: US-West-2, Ops, EU-West-1, US-East-1; region label on diagram: US-East-1. Networks: INET, AWS, CORP, HQ. Subnets and groups: Inside-Core, Inside-Game, Outside-Game, Outside-LB, Inside-LB, Inside-DB, Inside-App, Tasks, Inside-Search. Node labels: ES, Gr, Ic, Chef, Log, Task, MCP, MD, MM, HP, I, C, U, A, P, W, Ar, L, In, Ad, Co, S, OW, NPE, PvE, PvP, VPC, AWS ELB, Operator.]

Today November 2013

AWS Services in Use

CloudFront

Virtual Private Cloud (VPC)

Elastic MapReduce (EMR)

[Architecture diagram: Regions and sites: US-West-2, Ops, EU-West-1, US-East-1, AP-NorthEast-1, SA-East-1; region label on diagram: US-East-1. Networks: INET, AWS, CORP, HQ. Subnets and groups: Inside-Core, Inside-Game, Outside-Game, Outside-LB, Inside-LB, Inside-DB, Inside-App, Tasks, Inside-Search. Node labels: ES, Gr, Ic, Chef, Log, Task, MCP, MD, MM, HP, I, C, U, A, P, W, Ar, L, In, Ad, Co, S, OW, NPE, PvE, PvP, VPC, AWS ELB, Operator.]

Tools for Success

Third-party Tools

• Opscode Chef

• collectd

• Icinga (Nagios)

• Graphite

• Graylog2

• HAProxy

• Keepalived

• elasticsearch

• Others (Bluepill, Thin, memcached, RabbitMQ, etc.)

Third-party Services

• Dyn

    – Global Load Balancing

    – DNS Failover

• PagerDuty

    – On-call scheduling

    – Phone calls!

• Pingdom

    – Transaction monitoring

• Duo Security

Internal Tools

• Architect

• Cartographer

• Dashboards (Everywhere)

• Architect

    – Everything heartbeats

    – Service-specific data aggregation
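The "everything heartbeats" idea behind Architect can be sketched as a registry that records when each service last checked in, together with the stats it reported, so that silent services stand out and per-service numbers can be summed. The `HeartbeatRegistry` class, its timeout, and the stats keys are assumptions for illustration; Architect's real interface is not described in the talk.

```python
import time


class HeartbeatRegistry:
    """Tracks the last heartbeat and latest stats reported by each service."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so tests can control time
        self._last_seen = {}        # service id -> time of last heartbeat
        self._stats = {}            # service id -> latest reported stats

    def beat(self, service_id, stats=None):
        """Record a heartbeat and the stats that came with it."""
        self._last_seen[service_id] = self.clock()
        self._stats[service_id] = dict(stats or {})

    def alive(self):
        """Services heard from within the timeout window."""
        now = self.clock()
        return sorted(s for s, t in self._last_seen.items() if now - t <= self.timeout_s)

    def missing(self):
        """Services that have gone quiet for longer than the timeout."""
        now = self.clock()
        return sorted(s for s, t in self._last_seen.items() if now - t > self.timeout_s)

    def total(self, key):
        """Sum one stat across all services that are still alive."""
        return sum(self._stats[s].get(key, 0) for s in self.alive())
```

A tool that replaces failed components could consume `missing()`, while dashboards could read aggregates such as `total("players")`.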


• Cartographer

    – Builds new game server stacks

    – Replaces failed game server components

    – Scales up (or down) the servers within a pool depending on player demand
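Cartographer's responsibilities for a server pool (replace what has failed, then grow or shrink to match player demand) can be sketched as a single planning step. The `plan_pool` function, the `players_per_server` capacity, and the shape of the returned plan are hypothetical; the talk does not describe Cartographer's actual algorithm.

```python
import math


def plan_pool(servers, players, players_per_server, min_servers=1):
    """Decide what to terminate, launch, or retire for one server pool.

    servers maps a server name to True (healthy) or False (failed).
    """
    failed = sorted(name for name, ok in servers.items() if not ok)
    healthy = len(servers) - len(failed)
    # Capacity target for current demand, never below the pool minimum.
    desired = max(min_servers, math.ceil(players / players_per_server))
    return {
        "terminate": failed,                   # failed components to remove
        "launch": max(0, desired - healthy),   # replacements plus scale-up
        "retire": max(0, healthy - desired),   # healthy servers to drain
    }
```

For a pool of three servers with one failed, 250 players at 100 players per server yields a plan to terminate the failed server and launch one replacement; at 90 players the same pool terminates the failed server and retires one healthy server.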


• Dashboards (Everywhere)

    – Multiple data sources

        • Graphite

        • Production databases

        • Business Intelligence databases
