Building a World in the Clouds: MMO Architecture on AWS (MBL304) | AWS re:Invent 2013

Posted on 22-Nov-2014


DESCRIPTION

Can you really build the infrastructure required to bring a massively multiplayer online game (MMO) to life in the cloud? This session discusses the evolution of Red 5 Studios' Firefall, a free-to-play MMO. Firefall runs entirely on the AWS platform and allows players from around the world to play together in the cloud. The session covers some of the design decisions made over the last two years, including what worked well and what did not. The session also presents some of the solutions Red 5 implemented to ease the transition from dedicated data center hardware to virtual servers in AWS.

TRANSCRIPT

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Building a World in the Clouds: MMO Architecture on AWS

Jeffrey Berube, Director of Technical Operations

Red 5 Studios, Inc.

November 15, 2013

Overview

• What is Firefall?

• Why build in the cloud?

• Infrastructure Goals

• Evolution of the Platform

• Tools for Success

What is Firefall?

• Free-to-play cooperative open world shooter

• “Shardless” world

• Instance-based maps

  – Both persistent and transient map types

  – Possible for many instances of a map to exist at the same time

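The instance-based, shardless model can be illustrated with a small registry. This is a minimal sketch, not Red 5's implementation: the `MAP_TYPES` catalogue, the per-instance capacities and the `InstanceRegistry` class are hypothetical. It shows how several instances of one map can run at the same time, and how transient instances can be torn down once empty while persistent ones keep running.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical map catalogue: map name -> (persistent?, max players per instance).
MAP_TYPES = {
    "open_world": (True, 3),
    "dungeon": (False, 2),
}


@dataclass
class MapInstance:
    instance_id: int
    map_name: str
    persistent: bool
    capacity: int
    players: set[str] = field(default_factory=set)


class InstanceRegistry:
    """Tracks running map instances; several can exist for the same map."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self.instances: dict[int, MapInstance] = {}

    def join(self, map_name: str, player: str) -> MapInstance:
        persistent, capacity = MAP_TYPES[map_name]
        # Reuse a running instance of this map that still has room.
        for inst in self.instances.values():
            if inst.map_name == map_name and len(inst.players) < capacity:
                inst.players.add(player)
                return inst
        # Otherwise start another instance of the same map.
        inst = MapInstance(next(self._next_id), map_name, persistent, capacity)
        inst.players.add(player)
        self.instances[inst.instance_id] = inst
        return inst

    def leave(self, instance_id: int, player: str) -> None:
        inst = self.instances[instance_id]
        inst.players.discard(player)
        # Transient instances are torn down once empty; persistent ones keep running.
        if not inst.players and not inst.persistent:
            del self.instances[instance_id]
```

With a capacity of two, a third player joining `"dungeon"` lands in a second instance of the same map, and that instance disappears when its last player leaves.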

Why build in the cloud?

• Players are unpredictable

  – Forecasts can be (and usually are) wrong

    • Too little hardware, too many players

      – Bad for everyone

    • Too much hardware, too few players

      – Good for players (sort of) but bad for the business

      – What if they don’t stick around?

• Developers are unpredictable

  – Active development has risks

    • Performance can change drastically

    • New services can “appear” the day of the patch

  – MMOs are ALWAYS being actively developed!

    • (If you want to be successful…)

• Cyclical player behavior opens up opportunities for significant cost savings

[Figure: Player graph over 36 hours, shown alone, with a server overlay, and with an efficient server overlay. Efficient server provisioning: 35%–37.5% savings.]
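The savings come from resizing the fleet to follow the daily player cycle instead of provisioning for the peak around the clock. The arithmetic can be sketched as follows. The hourly resolution, the players-per-server ratio and the headroom factor are assumptions, and any curve passed in is illustrative rather than Firefall data, so this does not reproduce the 35%–37.5% figure.

```python
import math


def servers_needed(players: int, players_per_server: int, headroom: float) -> int:
    """Servers required for a given load, keeping a fractional safety margin."""
    return max(1, math.ceil(players * (1 + headroom) / players_per_server))


def savings(hourly_players: list[int], players_per_server: int,
            headroom: float = 0.2) -> float:
    """Fraction of server-hours saved by tracking demand instead of sizing for peak."""
    per_hour = [servers_needed(p, players_per_server, headroom) for p in hourly_players]
    static = max(per_hour) * len(per_hour)  # fleet sized for the daily peak, all day
    elastic = sum(per_hour)                 # fleet resized every hour
    return 1 - elastic / static
```

For example, a made-up curve of 1,000, 1,000, 2,000 and 4,000 players over four hours, at 500 players per server and no headroom, needs 2, 2, 4 and 8 servers: 16 server-hours when elastic versus 32 when sized for the peak, a 50% saving. A perfectly flat curve saves nothing, which is why the cyclical shape of player behavior matters.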

Infrastructure Goals

Deployment and Recovery

How do we make site management better?

• Expansion

• Scalability

• Disaster Recovery

• Self-Healing

Platform

How can the platform make the player experience better?

• Downtime

• Player Mobility

Infrastructure Goals: Deployment and Recovery

Deployment and Recovery Goals

• Quick regional expansion

  – Traditionally, expansion is a multi-month process

    • Contracts, purchase and shipping, installation, etc.

  – Today, adding a region is about a week-long task

    • Additional improvements are in the works

• On-demand scalability

  – Automated scale up and down without* limits

    • *The desired instance sizes may not always be available, however

• Disaster recovery with minimal downtime

  – Traditionally, DR sites are expensive and are not always properly maintained

  – Our goal is to automate disaster recovery safely

    • We do a lot manually at present

• Self-healing
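A scaling decision of the kind described above can be sketched in a few lines. Everything here is an assumption for illustration: the players-per-server ratio, the headroom, the minimum pool size, the step-down limit and the instance-type preference list (real EC2 type names, but not Red 5's choices). The fallback function reflects the caveat that the desired instance sizes may not always be available; how availability is discovered is left out.

```python
import math
from typing import Optional

# Illustrative preference order only; not Red 5's actual instance choices.
PREFERRED_TYPES = ["c3.2xlarge", "c1.xlarge", "m3.xlarge"]


def scaling_delta(current: int, players: int, players_per_server: int,
                  min_servers: int = 2, headroom: float = 0.25,
                  max_step_down: int = 2) -> int:
    """Servers to add (positive) or remove (negative) for the current player count."""
    target = max(min_servers, math.ceil(players * (1 + headroom) / players_per_server))
    if target >= current:
        return target - current                    # scale up immediately
    return -min(current - target, max_step_down)   # scale down a few at a time


def pick_instance_type(available: set[str]) -> Optional[str]:
    """First preferred instance type that can currently be launched, if any."""
    for itype in PREFERRED_TYPES:
        if itype in available:
            return itype
    return None
```

Scaling up all at once while scaling down a few servers at a time is a common way to avoid shedding capacity during a brief lull; it is a design choice for this sketch, not something the talk specifies.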

Infrastructure Goals: Platform

Platform Goals

• Zero downtime game updates

  – Blue-Green deployment

  – Doesn’t preclude scheduled maintenance

    • Some things are more safely done offline

• Players can play globally without restrictions

  – Characters won’t be held hostage

    • Player data is available everywhere they want to be

    • We don’t charge players to play with their friends

  – We still prefer the closest healthy region, however
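Blue-green deployment keeps two identical stacks and switches players to the new one only after it checks out. The sketch below models just the cutover decision. The `BlueGreen` class and the `healthy` callback are hypothetical, and a real cutover would also need to drain sessions from the old stack, which is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BlueGreen:
    """Two identical stacks; players are routed only to the live one."""
    builds: dict[str, str] = field(default_factory=lambda: {"blue": "", "green": ""})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, build: str, healthy: Callable[[str, str], bool]) -> bool:
        target = self.idle
        self.builds[target] = build   # install the new build on the idle stack only
        if not healthy(target, build):
            return False              # live stack untouched; players never notice
        self.live = target            # cut over: new sessions go to the new stack
        return True
```

Because the previous build stays on the other stack after a successful cutover, rolling back is just another flip of `live`.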

Evolution of the Platform

The Beginning: March 2011

AWS Services in Use

• Elastic Compute Cloud (EC2)

[Architecture diagram: US-West-1, Availability Zone b; INET, AWS and CORP zones; an Outside-Game tier.]

Alpha: October 2011

AWS Services in Use

• Elastic Compute Cloud (EC2)

• Simple Storage Service (S3)

• Elastic Load Balancing (ELB)

• Simple Queue Service (SQS)

[Architecture diagram: US-West-1, Availability Zone b; INET, AWS and CORP zones; Inside-Core, Inside-Game, Outside-Game, Inside-LB, Inside-DB and Inside-App tiers, plus an Outside-LB tier with AWS ELB; nodes labeled Chef, Log, Ic, PvP and abbreviated roles (MCP, MD, MM, HP, Ar, Ad, Mx and others); an Operator.]

Closed Beta: April 2012

AWS Services in Use

• Elastic Compute Cloud (EC2)

• Simple Storage Service (S3)

• Elastic Load Balancing (ELB)

• Simple Queue Service (SQS)

• Relational Database Service (RDS)

• ElastiCache

[Architecture diagram: US-West-1 across Availability Zones b and c; INET, AWS and CORP zones; Inside-Core, Inside-Game, Outside-Game, Inside-LB, Inside-App and Outside-LB tiers in zone b, with the game, app and load-balancer tiers repeated in zone c; AWS RDS and AWS ELB; nodes labeled Gr, Ic, Chef, Log and abbreviated roles (MCP, MD, MM, HP, Ar, Ad, S, OW, PvP and others); an Operator.]

Gamescom: August 2012

AWS Services in Use

• Elastic Compute Cloud (EC2)

• Simple Storage Service (S3)

• Elastic Load Balancing (ELB)

• Simple Queue Service (SQS)

• Relational Database Service (RDS)

• ElastiCache

• CloudFront

• Virtual Private Cloud (VPC)

[Architecture diagram: US-West-2 (Availability Zone b), with EU-West-1 and US-East-1; INET, AWS, CORP and HQ zones; a VPC containing Inside-Core, Inside-Game, Outside-Game, Outside-LB, Inside-DB, Inside-App and Tasks tiers; AWS ELB; nodes labeled Gr, Ic, Chef, Log and abbreviated roles (MCP, MD, MM, HP, OW, NPE, PvE, PvP and others); an Operator.]

Open Beta: July 2013

AWS Services in Use

• Elastic Compute Cloud (EC2)

• Simple Storage Service (S3)

• Elastic Load Balancing (ELB)

• Simple Queue Service (SQS)

• CloudFront

• Virtual Private Cloud (VPC)

• Elastic MapReduce (EMR)

[Architecture diagram: US-West-2 (Availability Zone b; Ops), with EU-West-1 and US-East-1; INET, AWS, CORP and HQ zones; VPCs containing Inside-Core, Inside-Game, Outside-Game, Outside-LB, Inside-LB, Inside-DB, Inside-App and Tasks tiers, plus a new Inside-Search tier of six ES nodes; AWS ELB; nodes labeled Gr, Ic, Chef, Log and abbreviated roles (MCP, MD, MM, HP, OW, NPE, PvE, PvP and others); an Operator.]

Today: November 2013

AWS Services in Use

• Elastic Compute Cloud (EC2)

• Simple Storage Service (S3)

• Elastic Load Balancing (ELB)

• Simple Queue Service (SQS)

• CloudFront

• Virtual Private Cloud (VPC)

• Elastic MapReduce (EMR)

[Architecture diagram: US-West-2 (Availability Zone b; Ops), with EU-West-1, US-East-1, AP-NorthEast-1 and SA-East-1; the same INET, AWS, CORP and HQ zones and VPC tiers as in Open Beta, including the Inside-Search tier of six ES nodes; AWS ELB; an Operator.]
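Each architecture diagram splits the environment into named Inside and Outside tiers. One way to lay those tiers out as subnets of a VPC is sketched below with Python's standard `ipaddress` module. The tier names come from the diagrams; the `10.0.0.0/16` range, the /20 subnet size and the one-subnet-per-tier layout are assumptions, not Red 5's actual network plan.

```python
import ipaddress

# Tier names taken from the architecture diagrams; the ordering is arbitrary.
TIERS = ["Outside-LB", "Outside-Game", "Inside-LB", "Inside-App",
         "Inside-Game", "Inside-Core", "Inside-DB", "Inside-Search"]


def plan_subnets(vpc_cidr: str, new_prefix: int = 20) -> dict[str, ipaddress.IPv4Network]:
    """Assign one equally sized subnet of the VPC range to each tier, in order."""
    vpc = ipaddress.ip_network(vpc_cidr)
    blocks = list(vpc.subnets(new_prefix=new_prefix))
    if len(blocks) < len(TIERS):
        raise ValueError(f"{vpc_cidr} holds only {len(blocks)} /{new_prefix} subnets")
    return dict(zip(TIERS, blocks))
```

With the assumed `10.0.0.0/16` range, `Outside-LB` gets `10.0.0.0/20` and `Inside-Search` gets `10.0.112.0/20`; the same plan could be repeated per region so that every region looks alike.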

Tools for Success

Third-party Tools

• Opscode Chef

• collectd

• Icinga (Nagios)

• Graphite

• Graylog2

• HAProxy

• Keepalived

• elasticsearch

• Others (Bluepill, Thin, memcached, RabbitMQ, etc.)
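Graphite accepts metrics over Carbon's plaintext protocol: one `<metric path> <value> <timestamp>` line per data point, sent over TCP to port 2003 by default. The sketch below formats and sends such lines. The host name and metric paths are placeholders, and collectd can also ship metrics to Graphite itself through its `write_graphite` plugin.

```python
import socket
import time
from typing import Iterable, Optional


def format_metric(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """One line of Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_metrics(lines: Iterable[str], host: str = "graphite.example.internal",
                 port: int = 2003) -> None:
    """Push pre-formatted lines to a Carbon plaintext listener (placeholder host)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `format_metric("game.players_online", 1234, 1384473600)` produces `"game.players_online 1234 1384473600\n"`.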

Third-party Services

• Dyn

  – Global Load Balancing

  – DNS Failover

• PagerDuty

  – On-call scheduling

  – Phone calls!

• Pingdom

  – Transaction monitoring

• Duo Security
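Global load balancing with DNS failover, combined with the goal of preferring the closest healthy region, comes down to a selection rule like the one below. This is a minimal sketch of that rule only: it does not use Dyn's API or configuration, and the latency figures and the set of healthy regions are assumed inputs.

```python
from typing import Optional


def choose_region(latency_ms: dict[str, float], healthy: set[str]) -> Optional[str]:
    """Lowest-latency region that passes health checks; None if every region is down."""
    candidates = [region for region in latency_ms if region in healthy]
    return min(candidates, key=latency_ms.__getitem__) if candidates else None
```

When the nearest region fails its health check, the same rule fails over to the next-closest one, which is the behavior DNS failover aims to provide.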

Internal Tools

• Architect

  – Everything heartbeats

  – Service-specific data aggregation

• Cartographer

  – Builds new game server stacks

  – Replaces failed game server components

  – Scales up (or down) the servers within a pool depending on player demand

• Dashboards (Everywhere)

  – Multiple data sources

    • Graphite

    • Production databases

    • Business Intelligence databases
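Architect's "everything heartbeats" and Cartographer's replacement of failed components can be illustrated together. A monitor records the last heartbeat from each component, and a repair pass replaces any component that has been silent for longer than a timeout. The `HeartbeatMonitor` class, the 30-second default timeout and the `launch` hook are hypothetical; the talk does not describe how either tool is implemented.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical stand-in for 'everything heartbeats': last-seen time per component."""

    def __init__(self, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be tested without waiting
        self.last_seen: dict[str, float] = {}

    def beat(self, component: str) -> None:
        self.last_seen[component] = self.clock()

    def stale(self) -> list[str]:
        """Components whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(c for c, t in self.last_seen.items() if now - t > self.timeout_s)


def replace_failed(monitor: HeartbeatMonitor, launch: Callable[[str], str]) -> list[str]:
    """Cartographer-style repair pass: swap each silent component for a fresh one."""
    replacements = []
    for component in monitor.stale():
        del monitor.last_seen[component]  # forget the dead component
        new_id = launch(component)        # hypothetical provisioning hook
        monitor.beat(new_id)              # the replacement starts with a fresh heartbeat
        replacements.append(new_id)
    return replacements
```

Running `replace_failed` periodically gives a simple self-healing loop; scaling a pool up or down would be a separate decision driven by player demand.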
