smugmug: from mysql to amazon dynamodb (dat204) | aws re:invent 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

DAT204 - SmugMug: From MySQL to

Amazon DynamoDB (and some of the tools we used to get there)

Brad Clawsie, SmugMug.com

November 14, 2013

Welcome!

• I'm Brad Clawsie, an engineer at SmugMug.com

• SmugMug.com is a platform for hosting, sharing,

and selling photos

• Our target market is professional and enthusiast

photographers

This Talk…

• Isn't an exhaustive Amazon DynamoDB tutorial

• Is about some observations made while

migrating some features from MySQL to

Amazon DynamoDB

• Is an introduction to some of the tools that have

helped us migrate to Amazon DynamoDB

Background

• SmugMug.com started in 2003

• LAMP code base

• A few machines/colocation

→ a lot of machines/colocations

→ Amazon Web Services

• Hundreds of thousands of paying customers

• Millions of viewers, billions of photos, petabytes

of storage

Amazon DynamoDB in a Nutshell

• Tables → [Keys → Items]

• Items → [AttributeName → Attribute]

• Attribute → {Type:Value}

• Provisioned throughput

• NoSQL-database-as-a-service

• Create, Get, Put, Update, Delete, Query, Scan
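The nesting above (table → key → item → attribute name → typed value) can be sketched as a few Go types. This is a minimal illustration, not GoDynamo's actual type definitions: the names `AttributeValue`, `Item`, and `encodeItem` are assumptions made for this example, and only the string ("S") and number ("N") types are shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AttributeValue models one typed value, e.g. {"S":"abc"} or {"N":"1"}.
// Hypothetical type for illustration; only two of the data types are shown.
type AttributeValue struct {
	S string `json:"S,omitempty"`
	N string `json:"N,omitempty"` // numbers travel as strings on the wire
}

// Item maps attribute names to typed values.
type Item map[string]AttributeValue

// encodeItem renders an Item as JSON.
func encodeItem(it Item) (string, error) {
	b, err := json.Marshal(it)
	return string(b), err
}

func main() {
	it := Item{
		"UserID": {N: "1"},
		"Name":   {S: "brad"},
	}
	s, err := encodeItem(it)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Note that every value carries its own type tag, which is why there is no table-wide schema beyond the key.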

MySQL at SmugMug

MySQL on Our Terms...

• “SQL”, but not relational

• We avoid joins, foreign keys, complex queries, views, etc.

• Simplified our model so that caching was easier

to implement

• Used like a key (id) → values (row) system

MySQL on Our Terms...

• Aggressive denormalization is common in many

online applications

• Upside – easy to migrate some of these tables

and supporting code to “NoSQL” style database

• Downside(?) – database does less, code does

more

So Why Change?

We're hitting roadblocks that can't be addressed

by:

• More/better hardware

• More ops staff

• Best practices

Notable Issue #1: “OFFLINE OPS”

like ALTER TABLE

• We used to schedule a fair amount of read-only/site-maintenance downtime to ALTER tables

• As the number of users grows, this always inconveniences someone

• Introduces risk into the code

• Other RDBMSs are better about this

Temporary Relief...

• Introduced the concept of treating a column as a

JSON-like BLOB type for embedding

varying/new data

• Bought us some time and flexibility, and reduced

the need for ALTER TABLE-related downtime

• But MySQL wasn't intended to be an ID →

BLOB system, and other issues remained

Notable Issue #2: Concurrency

• MySQL can manifest some non-graceful

degradation under heavy write load

• We're already isolating non-essential tables to

their own databases and denormalizing where

we can...the problem persists

Notable Issue #3: Replication

• A necessary headache, but in fairness MySQL is

pretty good at it

• Performance issues (single threaded etc.)

• Makes it harder to reason about consistency in

code

• Big ops headache

Notable Issue #4: Ops

Keeping all of this going requires an ops team...

• People

• Colocation

• “Space” concerns – storage, network

capacity, and all the hardware to meet

anticipated capacity needs

Intangibles

• We have the resources to try out some new things

• We were already AWS fan boys

• Big users of Amazon S3

• Recently moved out of colocations and into Amazon

EC2

• Our ops staff has become AWS experts

• So we would give an AWS database consideration

Immediate Observations

• Limited key structure

• Limited data types

• ACID-like on Amazon DynamoDB's terms

• Query/Scan operations not that interesting

• But, freedom from most space constraints

• Leaving the developer with primarily time

constraints

First Steps

• Start with a solved problem – stats/analytics

• SmugMug's stats stack is a relatively simple

data model:

{"u":"1","i":"123","a":"321"...}

• We measure hits on the frontend and create

lines of JSON with user, image, album, time, etc.

First Steps

• Analytics needs reliable throughput – new data is

always being generated

• Space concerns (hardware, storage, replication)

It was obvious that Amazon DynamoDB would free

us from some space constraints. However, we

were naive about Amazon DynamoDB's special

time constraints.

Very Simple Tables

• A site key (user, image, album id) as HashKey

• A date as RangeKey

• The rest of the data

• Just a few tables

• We'll have to manage removing data from them over time

• Obvious: fewer tables → lower bill
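One way to picture the key layout is to turn a single hit line into one (HashKey, RangeKey) pair per table. The sketch below is an assumption-laden illustration: the table names ("user", "image", "album"), the `Key` type, and the YYYYMMDD date format are hypothetical, and the date is passed in separately because the sample line does not show the name of its time field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hit holds the three fields visible in the sample line; other fields are ignored.
type Hit struct {
	User  string `json:"u"`
	Image string `json:"i"`
	Album string `json:"a"`
}

// Key is a hash key plus range key pair (hypothetical shape).
type Key struct {
	Hash  string // site id: user, image, or album
	Range string // date, assumed here to be YYYYMMDD
}

// keysFor returns one key per (hypothetical) table for a hit seen on the given day.
func keysFor(line, day string) (map[string]Key, error) {
	var h Hit
	if err := json.Unmarshal([]byte(line), &h); err != nil {
		return nil, err
	}
	return map[string]Key{
		"user":  {Hash: h.User, Range: day},
		"image": {Hash: h.Image, Range: day},
		"album": {Hash: h.Album, Range: day},
	}, nil
}

func main() {
	keys, err := keysFor(`{"u":"1","i":"123","a":"321"}`, "20131017")
	if err != nil {
		panic(err)
	}
	fmt.Println(keys["user"], keys["image"], keys["album"])
}
```

With a date as the range key, removing old data becomes a matter of walking a key's range and deleting everything before a cutoff.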

Need for Tools

• Even with our simple initial test bed, we saw the

need for more tooling

• We are huge users of memcache multi*

functions

• So we wanted to be able to have arbitrary-sized

“batch” input requests

• PHP doesn't do concurrency

So...a Proxy

• A long-running proxy to take requests and

manage concurrency for PHP

• A proxy to allow us to cheat a little with sizing

our requests*

• Needed a tool that was geared toward building

tools like proxies

• Go fit the bill
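The core idea of the proxy (accept an arbitrary-sized batch, split it into service-sized requests, run those concurrently) might look roughly like the sketch below. It is not BBPD's code: `chunk`, `fetchAll`, and the caller-supplied `fetch` function are hypothetical names, and the limit of 100 keys per request is the BatchGetItem limit as I understand it.

```go
package main

import (
	"fmt"
	"sync"
)

// maxBatch is the per-request key limit assumed in this sketch.
const maxBatch = 100

// chunk splits keys into slices of at most size elements (size must be > 0).
func chunk(keys []string, size int) [][]string {
	var out [][]string
	for len(keys) > size {
		out = append(out, keys[:size])
		keys = keys[size:]
	}
	if len(keys) > 0 {
		out = append(out, keys)
	}
	return out
}

// fetchAll runs fetch on every chunk concurrently and merges the results.
func fetchAll(keys []string, fetch func([]string) map[string]string) map[string]string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		all = make(map[string]string)
	)
	for _, c := range chunk(keys, maxBatch) {
		wg.Add(1)
		go func(c []string) {
			defer wg.Done()
			res := fetch(c) // one service-sized request
			mu.Lock()
			defer mu.Unlock()
			for k, v := range res {
				all[k] = v
			}
		}(c)
	}
	wg.Wait()
	return all
}

func main() {
	keys := make([]string, 250)
	for i := range keys {
		keys[i] = fmt.Sprintf("k%d", i)
	}
	// echo stands in for a real batch request.
	echo := func(c []string) map[string]string {
		m := make(map[string]string, len(c))
		for _, k := range c {
			m[k] = "v-" + k
		}
		return m
	}
	fmt.Println(len(chunk(keys, maxBatch)), len(fetchAll(keys, echo)))
}
```

The PHP caller sees a single request and a single merged response; the fan-out stays inside the long-running process.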

A Little Risk

• Writing tools for a young database in a young

programming language

• Resulted in two codebases we will share:

• GoDynamo: equivalent of the AWS SDKs

• BBPD: an HTTP proxy based on GoDynamo

Observation #1:

On Amazon DynamoDB's Terms

• Sensible key ↔ application mapping

• Denormalization

• No reliance on complex queries or other

relational features

• Many at-scale MySQL users are already using it

in this way anyway

Observation #1:

On Amazon DynamoDB's Terms

• Avoid esoteric features

• Don't force it

• Amazon DynamoDB is not the only AWS database

• Nice to have a “control” to use as a yardstick of

success

Observation #2:

Respect Throttling

• Coming from MySQL, graceful degradation is an

expected artifact of system analysis

• But Amazon DynamoDB is a shared service

using a simple WAN protocol

• You get either a 2xx (success) or a 4xx/5xx (some kind of failure)

• A binary distinction

Observation #2:

Respect Throttling

• Throttling is the failure state of a properly formatted request

• Throttling happens when the rate of growth of requests changes quickly (my observation)

• Correlate your throttling to your provisioning

Observation #2:

Respect Throttling

• Typically, throttling happens well below the

provisioning level

• Don't reflexively increase your provisioning

• Amazon DynamoDB behaves best when you

optimize requests for space and time

Space Optimizations

• Compress data (reduce requests)

• Cache data (read locally when possible)

• Avoid clustering requests to tables/keys

• Use key/table structures if possible (often the

application dictates this)
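As an illustration of the compression point, the sketch below gzips a payload before it would be stored (for example as a binary attribute) and restores it on read. The helper names `pack` and `unpack` are hypothetical, and whether compression pays off depends on your data; small or already-compressed values may not shrink at all.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// pack gzips data so that a smaller value is sent and stored.
func pack(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // Close flushes the remaining bytes
		return nil, err
	}
	return buf.Bytes(), nil
}

// unpack reverses pack.
func unpack(packed []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(packed))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	raw := []byte(strings.Repeat(`{"u":"1","i":"123","a":"321"}`, 50))
	packed, err := pack(raw)
	if err != nil {
		panic(err)
	}
	back, err := unpack(packed)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(raw), len(packed), bytes.Equal(raw, back))
}
```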

Time Optimizations

• Reduce herding/spikes if possible

• Queue requests to be processed at a controlled rate of flow elsewhere

• Experiment with concurrency to achieve

optimum reqs/sec
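A queue drained at a controlled rate by a fixed number of workers can be sketched in a few lines of Go. This is one possible shape, not the approach used in BBPD: `drain`, its parameters, and the one-job-per-tick pacing are assumptions chosen for the example. Varying `workers` and `interval` is the kind of experiment the last bullet suggests.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// drain processes jobs with at most `workers` goroutines (workers must be >= 1),
// releasing no more than one job per `interval` across all workers.
// It returns the number of jobs handled.
func drain(jobs []string, workers int, interval time.Duration, handle func(string)) int {
	queue := make(chan string)
	tick := time.NewTicker(interval)
	defer tick.Stop()

	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				handle(j)
				mu.Lock()
				done++
				mu.Unlock()
			}
		}()
	}
	for _, j := range jobs {
		<-tick.C // pace the flow: one job released per tick
		queue <- j
	}
	close(queue)
	wg.Wait()
	return done
}

func main() {
	jobs := []string{"a", "b", "c", "d", "e"}
	n := drain(jobs, 2, time.Millisecond, func(j string) {
		// A real handler would send one request here.
	})
	fmt.Println("handled", n)
}
```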

Don't Obsess Over Throttling

• Some throttling is unavoidable

• “Hot keys” are unavoidable

• The service will get better about adapting to

typical use

• Experiment: flow, distribution, mix of requests,

types of requests, etc.

• Throttling is a strong warning
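Since some throttling is unavoidable, callers usually need a retry path. Below is a minimal sketch of retrying with exponential backoff; it is a generic technique, not GoDynamo's retry logic. `ErrThrottled` is a hypothetical sentinel standing in for a throttled response, and `retry` and its parameters are names invented for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrThrottled stands in for a throttled response (hypothetical sentinel).
var ErrThrottled = errors.New("throttled")

// retry calls op until it succeeds, fails with a non-throttle error, or
// `attempts` (must be >= 1) is exhausted. The wait doubles after each
// throttled attempt. It returns the number of attempts made and the last error.
func retry(attempts int, base time.Duration, op func() error) (int, error) {
	var err error
	wait := base
	for i := 1; i <= attempts; i++ {
		if err = op(); !errors.Is(err, ErrThrottled) {
			return i, err // success, or an error that retrying will not fix
		}
		if i < attempts {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return attempts, err
}

func main() {
	calls := 0
	n, err := retry(5, time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return ErrThrottled // first two attempts are throttled
		}
		return nil
	})
	fmt.Println(n, err)
}
```

Counting how often the retry path is taken is a cheap way to treat throttling as the warning it is, rather than hiding it.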

Observation #3:

Develop with Real(ish) Data

• “Test” data and “test” volume will fail you when

you launch

• Again, no graceful degradation

• Your real data has its own flow and distribution

• You must optimize for that

• Once again, set up a control to validate

observations

Observation #4:

Live with the Limits

• Don't try to recreate relational features in

Amazon DynamoDB

• Query/Scan are limited, be realistic

• You can't really see behind the curtain

• Feedback from the console is limited

• Expect to iterate

Success?

Recall our original MySQL gripes:

(1) ALTER TABLE: kind of solved

Amazon DynamoDB doesn't have full table

schemas so to speak, so while we are able to

add Attributes to an Item at will, we can only

change a table's provisioning once created.

Success?

(2) Replication: solved

But opaque to users of Amazon DynamoDB.

(3) Concurrency: kind of solved

Throttling introduces a new kind of

concurrency issue, but at least it is limited to a

single table.

Success?

(4) Ops: mostly solved

Ops doesn't have to babysit servers anymore,

but they need to learn the peculiarities of

Amazon DynamoDB and accept the limited

value of the console and available body of

knowledge.

Recap: What We Wrote

• GoDynamo: like the AWS SDK, but in Go

• BBPD: a proxy written on GoDynamo

• See github.com/smugmug

Recap: Why a Proxy?

• Allows us to integrate Amazon DynamoDB with

PHP so concurrency can be put to use

• Moves operations to an efficient runtime

• Provides for simple debugging via curl and can

check for well-formedness of requests locally

• Hides details like renewing IAM credentials

Trivial Examples

# Convenience endpoints available directly:

$ curl -X POST -d '{"TableName":"user","Key":{"UserID":{"N":"1"},"Date":{"N":"20131017"}}}' \
    http://localhost:12333/GetItem

# Or specify the endpoint in a header:

$ curl -H 'X-Amz-Target: DynamoDB_20120810.GetItem' \
    -X POST -d '{"TableName":"user","Key":{"UserID":{"N":"1"},"Date":{"N":"20131017"}}}' \
    http://localhost:12333/

BBPD is Just a Layer

• GoDynamo is where the heavy lifting is done

• Libraries for all endpoints

• AWS Signature Version 4 support

• IAM support (transparent and thread-safe)*

• Other nonstandard goodies

• Pro-concurrency, high performance

• Enables some cool hacks

GoDynamo: Why Go?

• Strong types, concurrency, Unicode, separate

compilation, fast startup, low(ish) memory use,

static binary as output of compiler (deploy →

scp my_program)

• Types ↔ JSON is easy, flexible, and idiomatic

• Easy to learn and sell to your boss

Trivial Example

// control our concurrent access to IAM creds in the background
iam_ready_chan := make(chan bool)
go conf_iam.GoIAM(iam_ready_chan)

// try to get an Item from a table
var get1 get_item.Request
get1.TableName = "my-table"
get1.Key = make(endpoint.Item)
// struct fields must be exported (capitalized) to be set from another package
get1.Key["myhashkey"] = endpoint.AttributeValue{S: "thishashkey"}
get1.Key["myrangekey"] = endpoint.AttributeValue{N: "1"}
body, code, err := get1.EndpointReq()
if err != nil || code != http.StatusOK {
	panic("uh oh")
} else {
	fmt.Printf("%v\n", body)
}

AWS Identity and Access

Management (IAM)

• Included as a dependency is another package

worth mentioning: goawsroles

• An interface that describes how to handle IAM

credentials

• An implementation for text files

• Suspends threads as credentials are being

updated
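The "suspend while updating" behavior can be pictured with a read/write lock: any number of request goroutines read the current credentials at once, and they wait only while a refresh holds the write lock. This is a sketch of the general pattern, not goawsroles' interface or implementation; `Creds` and `Store` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Creds is a hypothetical credential set.
type Creds struct{ AccessKey, Secret, Token string }

// Store guards credentials with a read/write lock.
type Store struct {
	mu sync.RWMutex
	c  Creds
}

// Get returns the current credentials; it blocks only while Set is running.
func (s *Store) Get() Creds {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.c
}

// Set swaps in refreshed credentials, briefly suspending all readers.
func (s *Store) Set(c Creds) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.c = c
}

func main() {
	var s Store
	s.Set(Creds{AccessKey: "key-1"})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = s.Get() // readers never see a half-written Creds
		}()
	}
	s.Set(Creds{AccessKey: "key-2"}) // refresh while readers run
	wg.Wait()
	fmt.Println(s.Get().AccessKey)
}
```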

Just the Beginning

• Available at github.com/smugmug

• Standard disclaimer – works for us, but YMMV!

• Would love for you to use it and help create a

community of contributors

Thanks! :)

