Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc. DAT207 - Accelerating Application Performance with Amazon ElastiCache Omer Zaki (AWS) / Nick Dor (GREE) / James Kenigsberg (2U) November 14, 2013


DESCRIPTION

Learn how you can use Amazon ElastiCache to easily deploy a Memcached or Redis compatible, in-memory caching system to speed up your application performance. We show you how to use Amazon ElastiCache to improve your application latency and reduce the load on your database servers. We'll also show you how to build a caching layer that is easy to manage and scale as your application grows. During this session, we go over various scenarios and use cases that can benefit by enabling caching, and discuss the features provided by Amazon ElastiCache.

TRANSCRIPT

Page 1: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013


DAT207 - Accelerating Application Performance with Amazon ElastiCache

Omer Zaki (AWS) / Nick Dor (GREE) / James Kenigsberg (2U)

November 14, 2013

Page 2: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Speakers

• Omer Zaki – Senior Product Manager, AWS


• Nick Dor – Senior Director of Engineering, GREE International, Inc.

• James Kenigsberg – Chief Technology Officer, 2U, Inc.

Page 3: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

What is a Cache?

• Specialized data store that keeps frequently accessed data in memory

• Memory is orders of magnitude faster than disk

Page 4: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Why Use a Cache?

• “Latency is the mother of interactivity”*

• Handle hot data, handle spikes

• Reduce load on backend

• For a majority of web applications, workloads are read heavy
  – Often as high as 80-90% reads vs. writes

* http://highscalability.com/blog/2009/7/25/latency-is-everywhere-and-it-costs-you-sales-how-to-crush-it.html

Page 5: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Caches, caches, caches

• Types – browser cache, proxy cache, server cache, database cache, file system cache

• Characteristics – persistence, scalability, data model, warming

• Architecture – side cache, read through, write back

• Options – Memcached, Redis, etc.

Page 6: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Memcached

• Free, open-source, high-performance, in-memory key-value store

• Developed for LiveJournal in 2003

• Used by many of the world's top websites – YouTube, Facebook, Twitter, Pinterest, Tumblr, …

Page 7: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Memcached: Architecture

[Diagram labels: APP, API, Client Lib, memcached servers, database]

• No communication between servers
• Persistent TCP session
• Can handle large number of TCP sessions
• Which memcached server? server = server_list[key mod n]
• Operations: value = get(key); set(key, value, expiry); add(key, value, expiry); replace(key, value, expiry)
• Flows: app reads / cache updates; database reads / writes

Source: http://architects.dzone.com/news/notes-memcached
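The server-selection rule on this slide, server = server_list[key mod n], can be sketched in a few lines. This is a minimal illustration, not the algorithm of any particular client library: the CRC32 hash and the server addresses are assumptions chosen only to make the example runnable.

```python
import zlib


def pick_server(key: str, server_list: list[str]) -> str:
    """Map a key to one server: hash the key, then take it modulo n."""
    # crc32 is an assumed stand-in for whatever hash a real client uses.
    # It is stable across processes, unlike Python's built-in hash() for str.
    key_hash = zlib.crc32(key.encode("utf-8"))
    return server_list[key_hash % len(server_list)]


# Hypothetical node addresses, for illustration only.
servers = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]
chosen = pick_server("player:42", servers)
```

Because the servers never talk to each other, every client must apply the same rule to the same server list; otherwise two clients would look for one key on different nodes.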

Page 8: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Redis

• High speed, in-memory, key-value data store

• Data structure support – strings, lists, sets, sorted sets

• Asynchronous replication

• Optional durability (persistence via snapshot or append-only file)

• Pub/sub functionality

Page 9: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Redis: Architecture

[Diagram labels: App, Clients, MySQL DB Instance, Redis Master, Redis Read Replica; flows: App Reads, Cache Updates]

Page 10: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Amazon ElastiCache

• Web service that lets you easily create and use cache clusters in the cloud

• Memcached, Redis compatible

• Managed, scalable, secure

• Pay-as-you-go and flexible, so you can add capacity when you need it

Page 11: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Amazon ElastiCache Architecture

Page 12: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Where is Amazon ElastiCache used?

• Gaming

• Social

• Media & Entertainment

• Mobile

• E-Commerce

• Ad Tech

• Many more…

Page 13: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Sample Deployment: Gaming

• Auto Scaling front end
• Amazon ElastiCache
• Amazon RDS
• Amazon S3
• Amazon CloudFront

ZADD leaderboard 556 "Andy"
ZADD leaderboard 819 "Barry"
ZADD leaderboard 105 "Carl"
ZADD leaderboard 1312 "Derek"

ZREVRANGE leaderboard 0 -1
1) "Derek"
2) "Barry"
3) "Andy"
4) "Carl"
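To make the leaderboard commands concrete, here is a small pure-Python stand-in for the two sorted-set operations used on the slide. It is only a sketch of the semantics, not a Redis client: the class name `Leaderboard` is hypothetical, only `-1` is supported as a negative stop index, and ties are not ordered the way Redis orders them.

```python
class Leaderboard:
    """Tiny in-process stand-in for a Redis sorted set (not a Redis client)."""

    def __init__(self) -> None:
        self._scores: dict[str, float] = {}

    def zadd(self, score: float, member: str) -> None:
        # As with ZADD, adding an existing member overwrites its score.
        self._scores[member] = score

    def zrevrange(self, start: int, stop: int) -> list[str]:
        # Highest score first; stop is inclusive and -1 means "to the end".
        ranked = sorted(self._scores, key=lambda m: self._scores[m], reverse=True)
        end = None if stop == -1 else stop + 1
        return ranked[start:end]


board = Leaderboard()
for score, player in [(556, "Andy"), (819, "Barry"), (105, "Carl"), (1312, "Derek")]:
    board.zadd(score, player)

top = board.zrevrange(0, -1)  # ["Derek", "Barry", "Andy", "Carl"]
```

In Redis the set is kept ordered as scores change, so the ranking query does not have to re-sort on every read as this sketch does.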

Page 14: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Design Patterns

• Low latency / high throughput store
• Database offloading
• Session management
• In-memory storage for difficult or time-consuming tasks
• Leaderboards
• High-speed sorting
• Atomic counters
• Queuing systems
• Activity streams

Page 15: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013


Amazon ElastiCache at GREE

Nick Dor – Sr. Director, Engineering

GREE International, Inc.

November 14, 2013

Page 16: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

GREE International

• 2004 – GREE is founded in Japan
• 2011 – establishes office in US
  – Hosting games in traditional datacenters
  – 2 weeks to procure and provision new servers + 1 week to set up application
  – ITIL practices (Dev / Ops separation)
• 2012 – acquires Funzio
  – AWS hosted
  – Quick provisioning of servers (minutes), but still manual setup (days)
  – Hybrid hosting environment
• 2013 – consolidates in AWS
  – Migrated games from traditional datacenter to AWS
  – Automated application setup
  – DevOps practices

(c) GREE

Page 17: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

GREE Games

• All Mobile, all Free-to-Play
  – iOS & Android smart phones
  – Big focus on tablets
• Role Playing Games (RPG+)
  – Multi-million dollar franchise, top-grossing titles
  – Some of the oldest games on the App Store
• Hardcore
  – Deeper, more intense gameplay mechanics
• Real-Time Strategy (RTS)
  – Fast action, small unit management
• Casino & Casual Games
  – Familiar games, wider audience, casual play

(c) GREE

Page 18: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Some Scale

• Over 60 ELB endpoints hosted in AWS
  – Games, shared services, analytics infrastructure

• 1200 Amazon EC2 instances

• 400 Amazon ElastiCache nodes

• 260 Amazon RDS database servers

• 1TB daily logs from app servers

• Millions of monthly active users

(c) GREE

Page 19: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Example Game Architecture – RPG+

• Application Servers
  – PHP
  – Game events → Analytics
• Cache Layer
  – Memcached → ElastiCache
• Batch Processing Servers
  – Node.js (moving to Go)
  – Batches database writes
• Database
  – MySQL → RDS

[Diagram labels: Elastic Load Balancing; App ×4; Cache ×4; Batch ×2; RDS ×3; Failover DB]

(c) GREE

Page 20: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Caching Strategy

• Game architecture predates stable NoSQL
  – We wanted similar performance at scale
  – Keep combined average internal response times below 500 ms
• Memcache authoritative
  – Still use an RDBMS; potential data loss is limited
• Allows for cheaper/simpler DB layer
  – Always do full row replacements (i.e., no current_row_value + 1)

(c) GREE

Page 21: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Data Flow

• Reads
  – ELB → App → Cache
• Writes (Synchronous)
  – ELB → App → Cache → DB
  – ELB → App → Cache → Batch → DB
  – Standard write-through
  – No blind writes; always fetch current version
• Writes (Asynchronous)
  – Batch → DB
  – Batch writes to DB every 30 seconds

[Diagram labels: Elastic Load Balancing; App ×4; Cache ×4; Batch ×2; RDS ×3]

(c) GREE

Page 22: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Batch Processor

• 80% of game write traffic is asynchronous
• Ex: Player items (loot) after multiple quests
  – 10 items in 30 sec; app server sends 10 writes downstream
  – Batch processor sends last record with final item count to DB
• Greatly reduced writes on DB
  – Shard at table and DB server level for larger games

(c) GREE
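The loot example on the Batch Processor slide works because every write is a full row replacement, so only the last record per key in a window needs to reach the database. The following is a minimal sketch of that coalescing idea. The names `BatchWriter`, `enqueue`, and `flush` are hypothetical, and the 30-second timer that would call `flush` is left out.

```python
class BatchWriter:
    """Keeps only the newest full-row write per key until the next flush."""

    def __init__(self, write_to_db) -> None:
        self._write_to_db = write_to_db  # callable(key, row); assumed DB hook
        self._pending: dict[str, dict] = {}

    def enqueue(self, key: str, row: dict) -> None:
        # A later full-row write replaces an earlier one for the same key.
        self._pending[key] = row

    def flush(self) -> int:
        # Swap out the pending map, then send one write per key.
        batch, self._pending = self._pending, {}
        for key, row in batch.items():
            self._write_to_db(key, row)
        return len(batch)


db_writes = []
writer = BatchWriter(lambda key, row: db_writes.append((key, row)))

# Ten loot updates arrive within one window.
for item_count in range(1, 11):
    writer.enqueue("player:42:loot", {"item_count": item_count})

flushed = writer.flush()  # one DB write carrying the final item count
```

Here ten downstream writes collapse into a single database write. The same trick would not be safe for relative updates such as `current_row_value + 1`, which is one reason to insist on full row replacements.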

Page 23: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Memcache Writes - Key Facts

• App handles memcache key hashing & sharding
  – DB rows are usually just a key, version, timestamp & JSON blob
  – Look familiar?
• NEVER do blind writes
  – Always fetch current value in MC, perform operation, then write
• If version collision, then simply fail
  – Extremely rare; application will retry for some calls

(c) GREE
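A rough sketch of the "no blind writes" rule follows: fetch the current record, apply the operation, and write only if the version has not moved, failing otherwise. The record layout (a JSON blob with `version` and `data` fields), the function name, and the plain dict standing in for memcache are all assumptions for illustration. With a real cache the check and the write would have to be a single atomic step, which a dict does not give you.

```python
import json


class VersionCollision(Exception):
    """Raised when the stored version changed between fetch and write."""


def versioned_write(cache: dict, key: str, operation) -> dict:
    fetched = json.loads(cache[key])        # 1. fetch the current value
    new_data = operation(fetched["data"])   # 2. perform the operation
    latest = json.loads(cache[key])
    if latest["version"] != fetched["version"]:
        # Another writer got in first: simply fail, as the slide says.
        raise VersionCollision(key)
    record = {"version": fetched["version"] + 1, "data": new_data}
    cache[key] = json.dumps(record)         # 3. write the full record back
    return record


cache = {"player:42": json.dumps({"version": 1, "data": {"gold": 100}})}
result = versioned_write(cache, "player:42", lambda d: {**d, "gold": d["gold"] + 50})
```

The caller decides what a `VersionCollision` means: most calls can just surface the failure, while a few may choose to retry.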

Page 24: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Memcache Writes – High Concurrency

• Player vs. Player Events (World Domination)
  – These have much higher concurrency
  – Match-making, battles/results, leaderboards
• Here we do relative updates at MC layer
  – Yes, we contradict ourselves here a little
• If we get a version collision/failure
  – App server reloads MC value and tries again, up to 5 times
  – Usually on 2nd or 3rd try we succeed
  – This happens VERY fast in the code

(c) GREE
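The reload-and-retry behaviour for high-concurrency updates can be sketched as a bounded loop. Everything here is illustrative: `VersionedStore` is a hypothetical in-memory stand-in for the cache, and `ContendedStore` is a test double that simulates a rival writer so the retries are visible.

```python
class VersionedStore:
    """In-memory stand-in: each key maps to a (version, value) pair."""

    def __init__(self) -> None:
        self._items: dict[str, tuple[int, int]] = {}

    def get(self, key: str) -> tuple[int, int]:
        return self._items.get(key, (0, 0))

    def put_if_version(self, key: str, expected_version: int, value: int) -> bool:
        version, _ = self.get(key)
        if version != expected_version:
            return False  # version collision
        self._items[key] = (version + 1, value)
        return True


def relative_update(store: VersionedStore, key: str, delta: int, max_attempts: int = 5) -> int:
    """Apply value += delta, reloading and retrying on collision."""
    for attempt in range(1, max_attempts + 1):
        version, value = store.get(key)  # reload the current value
        if store.put_if_version(key, version, value + delta):
            return attempt  # number of tries it took
    raise RuntimeError(f"gave up on {key} after {max_attempts} attempts")


class ContendedStore(VersionedStore):
    """Test double: a rival writer adds 1 just before our first few writes."""

    def __init__(self, rival_writes: int) -> None:
        super().__init__()
        self._rival_writes = rival_writes

    def put_if_version(self, key: str, expected_version: int, value: int) -> bool:
        if self._rival_writes > 0:
            self._rival_writes -= 1
            version, current = self.get(key)
            self._items[key] = (version + 1, current + 1)
        return super().put_if_version(key, expected_version, value)


store = ContendedStore(rival_writes=2)
attempts = relative_update(store, "battle:7:score", delta=10)  # succeeds on try 3
```

With two rival writes the update lands on the third attempt, and the final value includes both the rival's increments and ours.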

Page 25: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Failure Scenarios

• Memcache node fails
  – Go straight to the database; versioning is key here
• Hashing compartmentalizes impact
  – During failure, only players assigned to that node are affected
  – Usually only a small performance drop
• Node comes back online…
  – Cache is refilled organically
  – DB load for that subset of operations decreases over time

(c) GREE
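The failure path described above can be illustrated with a read function that falls back to the database when a node is down and refills the cache on a miss once the node returns. This is a sketch under assumptions: `FakeCacheNode`, `CacheNodeDown`, and the dict used as the database are hypothetical, and versioning is omitted for brevity.

```python
class CacheNodeDown(Exception):
    """Raised by the fake node while it is offline."""


class FakeCacheNode:
    def __init__(self) -> None:
        self.up = True
        self.data: dict[str, str] = {}

    def get(self, key: str):
        if not self.up:
            raise CacheNodeDown()
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        if not self.up:
            raise CacheNodeDown()
        self.data[key] = value


def read(key: str, node: FakeCacheNode, database: dict, stats: dict) -> str:
    try:
        value = node.get(key)
    except CacheNodeDown:
        stats["db_reads"] += 1
        return database[key]      # node failed: go straight to the database
    if value is None:
        stats["db_reads"] += 1
        value = database[key]     # miss: load from the database...
        node.set(key, value)      # ...and refill the cache organically
    return value


database = {"player:42": "row-v7"}
stats = {"db_reads": 0}
node = FakeCacheNode()

node.up = False
during_outage = read("player:42", node, database, stats)  # served by the DB

node.up = True                    # replacement node comes back empty
first_after = read("player:42", node, database, stats)    # miss, refills cache
second_after = read("player:42", node, database, stats)   # served by the cache
```

After the node returns, each key costs one more database read and then none, which is the "DB load decreases over time" effect from the slide.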

Page 26: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Why Amazon ElastiCache?

• Fairly stable
  – Fails less regularly than Amazon EC2
• Automatic node replacement
  – Same node name/DNS
• Good performance
  – Highest performance with larger instances (network layer)
• Configuration endpoint
  – Application can dynamically add/remove nodes
  – Automatically rebalance hashes to accommodate new nodes
  – No more manual memcache migrations – YAY!

(c) GREE
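The slides do not say how hashes are rebalanced when nodes are added. One common technique is consistent hashing, and the sketch below assumes it purely for illustration: it compares how many keys change nodes when a fifth node joins a four-node pool, under a hash ring versus a plain key mod n rule. The `HashRing` class, the MD5 hash, and the replica count of 100 are all assumptions.

```python
import bisect
import hashlib


def _hash(text: str) -> int:
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (replicas) per real node."""

    def __init__(self, nodes: list[str], replicas: int = 100) -> None:
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._hashes, _hash(key)) % len(self._points)
        return self._points[index][1]


def mod_n(nodes: list[str]):
    return lambda key: nodes[_hash(key) % len(nodes)]


def moved_fraction(pick_before, pick_after, keys: list[str]) -> float:
    moved = sum(1 for k in keys if pick_before(k) != pick_after(k))
    return moved / len(keys)


keys = [f"player:{i}" for i in range(2000)]
old_nodes = [f"node-{i}" for i in range(4)]
new_nodes = old_nodes + ["node-4"]

old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(old_ring.node_for, new_ring.node_for, keys)
mod_moved = moved_fraction(mod_n(old_nodes), mod_n(new_nodes), keys)
```

With the ring, the only keys that move are the ones the new node takes over, so most of the cache stays warm. With key mod n, most keys land on a different node after the pool grows, which is what makes manual migrations painful.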

Page 27: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Newer Games - Architecture

• MUCH more modern in terms of arch/tech
• Shift towards real-time games
  – Longer play sessions; higher player engagement
  – Will impact our caching model: fewer pools, but larger
• Streaming, queuing
  – Go, nsqd
• Moving (finally) to memcached
  – Had used old memcache libraries for a long time

(c) GREE

Page 28: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Future Trends in Caching at GREE

• Check and Set tokens (CAS)
  – A sort of internal versioning in memcached
  – Ensures data is latest before updating
  – Atomic transactions
• Investigate real NoSQL implementation
• Redis - Promising
  – Need to see how I/O performance goes when hitting disk

(c) GREE

Page 29: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013


How 2U is Building the World’s Premier Online Learning Programs

James Kenigsberg, CTO, 2U, Inc.

November 14, 2013

Page 30: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2U partners with top universities to deliver the world’s best online programs

- Real degrees
- Real live classes
- Real faculty
- Real outcomes

Page 31: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

3

1

Graduate Undergraduate

Page 32: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013
Page 33: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Our best-in-class, proprietary technology platform can be integrated across numerous university clients, program verticals, and individual classes

[Diagram labels: University Partner; Prospect Mgmt; App Process; Online Campus; Content Mgmt; CRM; Security; Learning and Management Stack]

Page 34: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2U Online Campus

Far more than what you think of as a Learning Management System...

...the 2U Online Campus represents the single hub for students’ asynchronous study, live class sessions, and dynamic social tools to create a rich, online student community.

Page 35: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

“No man ever steps in the same river twice, for it’s not the same river and he’s not the same man.”

- Heraclitus

Page 36: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2008 Servers Engineers Developer

Page 37: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2009 Servers Engineers

Page 38: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2009

• Surly French Canadians
• Configuring our own load balancers
• No MySQL clustering
• Save us! SOS…

Page 39: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Set Amazon’s servers on fire, not ours

Page 40: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2010

• Amazon to the rescue!
• Release of Amazon RDS for databases
• Release of Elastic Load Balancing for load balancing
• Caching helps students communicate!
• Memcache
• No file redundancy

Page 41: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2011: 100 instances, 2 engineers

Page 42: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2011

• Redundancy!
• GlusterFS
• More Availability Zones
• Using new AWS services as fast as they release them
• Amazon S3 – Backups
• Amazon SES – Outbound email
• Amazon Route 53 – A lifesaver! (Zerigo outage)

Page 43: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2012: 200 instances, 3 engineers

Page 44: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2012

• Stack growth
• API layer
• Amazon ElastiCache
• DevOps!
• Puppet
• Jenkins
• AWS CloudFormation

Page 45: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2013: 400 instances, 4 engineers

Page 46: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

2013

• Amazon S3 to the rescue
• More AWS!
• Amazon Redshift data warehouse

Page 47: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Amazon is committed to customers

Page 48: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

We are committed to changing your life

Page 49: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Impact: Education

1,704 Graduates

3,287,000 K–12 Students

1,097 Current Students

2,116,000 K–12 Students

Through 2019

12,496,000 K–12 Students

Page 50: Accelerating Application Performance with Amazon ElastiCache (DAT207) | AWS re:Invent 2013

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

DAT207

Want more caching: Attend Amazon ElastiCache Architecture and Design Patterns

Friday @ 11:30am – 12:30pm

Lido 3006