DAT302: Under the Covers of Amazon DynamoDB - AWS re:Invent 2012
DESCRIPTION
Learn about the thought and decisions that went into designing and building DynamoDB. We'll talk about its roots and how we can deliver the performance and throughput you enjoy today. We'll also show you how to model data, maintain maximum throughput, and drive analytics against the data with DynamoDB. Finally, you'll hear from some of our customers on how they've built large-scale applications on DynamoDB and about the lessons they've learned along the way.
TRANSCRIPT
Under the Covers of Amazon DynamoDB
Matt Wood, Chief Data Scientist
Hello
Amazon DynamoDB
Two decisions + three clicks
= ready for use
Primary keys
Level of throughput
Amazon DynamoDB
Provisioned throughput
Data patterns
DynamoDB is a managed NoSQL
database service
Store and retrieve any amount of data
Serve any level of request traffic
Without the operational burden
Consistent, predictable performance
Single digit millisecond latency
Backed by solid-state drives
Flexible data model
Key/attribute pairs. No schema required.
Easy to create. Easy to adjust.
Seamless scalability
No table size limits. Unlimited storage.
No downtime.
Durable
Consistent, disk-only writes
Replication across data centers
and availability zones
Without the operational burden
Focus on your app
Provisioned throughput
Reserve IOPS for reads and writes
Scale up or down at any time
Pay per capacity unit
Priced per hour of provisioned throughput
Write throughput
Size of item x writes per second
$0.01 for 10 write units
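The write-throughput arithmetic above can be sketched in a few lines. The 1 KB-per-write-unit sizing and the $0.01-per-10-units price are taken from the slides of this era and may differ from current DynamoDB pricing:

```python
import math

def write_units(item_size_kb: float, writes_per_second: int) -> int:
    """Write capacity units = (item size rounded up to whole KB) x writes per second.

    Assumes the 1 KB write-unit sizing quoted in the talk.
    """
    return math.ceil(item_size_kb) * writes_per_second

def write_cost_per_hour(units: int, dollars_per_10_units: float = 0.01) -> float:
    """$0.01 per hour buys 10 write units, per the slide's pricing."""
    return (units / 10) * dollars_per_10_units

# A 0.5 KB item written 100 times per second needs 100 write units,
# costing $0.10 per hour at the quoted rate.
units = write_units(0.5, 100)
```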
Consistent writes
Atomic increment and decrement
Optimistic concurrency control: conditional writes
Transactions
Item level transactions only
Puts, updates and deletes are ACID
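Conditional writes give you optimistic concurrency control via compare-and-set: the write succeeds only if the attributes still hold the values the writer last read. A minimal in-memory sketch of the idea (this is not the DynamoDB API; `TinyStore` and `ConditionalWriteFailed` are illustrative names):

```python
class ConditionalWriteFailed(Exception):
    """Raised when an expected attribute value no longer matches."""

class TinyStore:
    def __init__(self):
        self._items = {}

    def put(self, key, item, expected=None):
        """Unconditional put, or a conditional put if `expected` is given.

        `expected` maps attribute names to the values they must currently
        hold for the write to go through.
        """
        current = self._items.get(key, {})
        if expected is not None:
            for attr, value in expected.items():
                if current.get(attr) != value:
                    raise ConditionalWriteFailed(attr)
        self._items[key] = dict(item)

    def get(self, key):
        return dict(self._items.get(key, {}))

store = TinyStore()
store.put("mza", {"score": 100})
store.put("mza", {"score": 150}, expected={"score": 100})  # succeeds
```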
Read throughput
Strong or eventual consistency
Provisioned units = size of item x reads per second
$0.01 per hour for 50 units with strongly consistent reads
$0.01 per hour for 100 units with eventually consistent reads
Same latency expectations
Mix and match at 'read time'
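The read pricing works the same way as the write pricing, with eventually consistent reads needing half the units (hence half the cost). A sketch of that arithmetic; the `kb_per_unit` sizing is an assumption of the talk's era, so treat it as a parameter:

```python
import math

def read_units(item_size_kb, reads_per_second, strongly_consistent=True, kb_per_unit=1):
    """Read capacity units = ceil(item size / unit size) x reads per second.

    Eventually consistent reads need half the units, which is why the slide's
    $0.01 buys 50 strongly consistent units or 100 eventually consistent ones.
    """
    units = math.ceil(item_size_kb / kb_per_unit) * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)
```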
Provisioned throughput is
managed by DynamoDB
Data is partitioned and managed
by DynamoDB
Achieving full provisioned throughput
requires a uniform workload
The DynamoDB Uniform Workload
DynamoDB divides table data into multiple partitions
Data is distributed across partitions by hash key
Provisioned throughput is divided evenly across partitions
The DynamoDB Uniform Workload
To achieve and maintain full provisioned throughput
for a table, spread the workload evenly
across primary keys
Non-uniform workloads
Some requests might be throttled,
even at high levels of provisioned throughput
Model data for a uniform workload
Amazon DynamoDB
Provisioned throughput
Data patterns
DynamoDB semantics
Tables, items and attributes
id = 100 | date = 2012-05-16-09-00-10 | total = 25.00
id = 101 | date = 2012-05-15-15-00-11 | total = 35.00
id = 101 | date = 2012-05-16-12-00-10 | total = 100.00
id = 102 | date = 2012-03-20-18-23-10 | total = 20.00
id = 102 | date = 2012-03-20-18-23-10 | total = 120.00
The whole grid is the table; each row is an item; each name = value pair is an attribute.
Items are indexed by primary key
Single hash keys and composite range keys
In the example table, id is the hash key and date is the range key.
Items are retrieved by primary key
Range keys for queries
For example: all items for November
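Conceptually, a composite-key query selects one hash key and a contiguous slice of range keys. A pure-Python sketch of those semantics (not the DynamoDB API), using the "all items for November" example with hypothetical data:

```python
# Items keyed by hash key `id` and range key `date`.
items = [
    {"id": 101, "date": "2012-05-15-15-00-11", "total": 35.00},
    {"id": 101, "date": "2012-11-03-10-00-07", "total": 12.00},
    {"id": 101, "date": "2012-11-21-09-30-02", "total": 60.00},
    {"id": 102, "date": "2012-03-20-18-23-10", "total": 20.00},
]

def query(items, hash_key, range_prefix):
    """Return items for one hash key whose range key starts with the prefix,
    sorted by range key (DynamoDB keeps items in range-key order)."""
    return sorted(
        (i for i in items
         if i["id"] == hash_key and i["date"].startswith(range_prefix)),
        key=lambda i: i["date"],
    )

november = query(items, 101, "2012-11")  # all of id 101's November items
```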
Relationships are not hard coded,
but can be modeled
Players
user_id = mza      | location = Cambridge | joined = 2011-07-04
user_id = jeffbarr | location = Seattle   | joined = 2012-01-20
user_id = werner   | location = Worldwide | joined = 2011-05-15

Scores
user_id = mza    | game = angry-birds | score = 11,000
user_id = mza    | game = tetris      | score = 1,223,000
user_id = werner | game = bejewelled  | score = 55,000

Leader boards
game = angry-birds | score = 11,000    | user_id = mza
game = tetris      | score = 1,223,000 | user_id = mza
game = tetris      | score = 9,000,000 | user_id = jeffbarr

Query Scores by user_id for scores by user; query Leader boards by game for high scores by game.
NoSQL data modeling for maximal
provisioned throughput
Distinct values for hash keys
Hash key elements should have a high
number of distinct values
user_id = mza      | first_name = Matt   | last_name = Wood
user_id = jeffbarr | first_name = Jeff   | last_name = Barr
user_id = werner   | first_name = Werner | last_name = Vogels
user_id = mattfox  | first_name = Matt   | last_name = Fox
...
Lots of unique user IDs: workload well distributed
Limited response codes: workload poorly distributed
status = 200 | date = 2012-04-01-00-00-01
status = 404 | date = 2012-04-01-00-00-01
status = 404 | date = 2012-04-01-00-00-01
status = 404 | date = 2012-04-01-00-00-01
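The two examples above can be made concrete with a toy partitioner: DynamoDB spreads items across partitions by hashing the hash key, so a table keyed on a handful of status codes can never use more than a handful of partitions, no matter how much throughput is provisioned. The partition count and hash function here are illustrative, not DynamoDB's internals:

```python
import hashlib

PARTITIONS = 16  # illustrative; real partition counts are managed by DynamoDB

def partition_for(hash_key: str) -> int:
    """Map a hash key to a partition by hashing it (toy stand-in)."""
    digest = hashlib.md5(hash_key.encode()).hexdigest()
    return int(digest, 16) % PARTITIONS

user_ids = [f"user-{n}" for n in range(1000)]
status_codes = ["200", "404", "500"]

user_partitions = {partition_for(k) for k in user_ids}
status_partitions = {partition_for(k) for k in status_codes}
# Many distinct user IDs spread across (nearly) every partition; three
# status codes touch at most three, leaving the throughput provisioned
# to the other partitions idle while those three are throttled.
```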
Amazon DynamoDB
Provisioned throughput
Data patterns
NYT faбrik
AWS re:Invent – November 2012
Andrew Canaday & Michael Laing
New York Times Digital
What we’ll cover
faбrik overview
Getting more out of DynamoDB with python/boto
– More throughput / provisioned capacity
– Across more endpoints / table
– More reliably and controllably
Frank McCloud: “He wants more, don't you, Rocco?”
Johnny Rocco: “Yeah. That's it. More. That's right! I want more!”
James Temple: “Will you ever get enough?”
Frank McCloud: “Will you, Rocco?”
Johnny Rocco: “Well, I never have. No, I guess I won't.”
Takeaways
Messaging infrastructure is cool (again)
Old dogs have tricks you can apply
– The Internet is your friend
– BUT: much good computer science was done prior
– HENCE: not so readily findable
Boto is great – clone and contribute!
NYT Mission
Enhance society by creating, collecting and distributing high quality
news, information and entertainment
- Distributing: publish / subscribe
- Collecting: gather / analyze
- High Quality: fast, reliable, accurate
faбrik
Asynchronous Messaging Framework
For client devices as well as our apps
Enabled by:
– Websockets
– Robust message handling software
– Amazon Web Services
Focusing on simple, common services
Typical Web Architecture
Clients interact with front-end via load balancers
Front-end makes requests to back-end on behalf of client
Bottlenecks abound
Information transfer is initiated by client
Typical Request Flow
[diagram: Client → Load Balancer → Front End → API → Data]
Typical Response Flow
[diagram: Data → API → Front End → Load Balancer → Client]
faбrik Web Architecture
Clients interact with the nearest “App Buddy” front-end
The “App Buddy” is connected to the “Bad Rabbit” backbone
The “Bad Rabbit” backbone is clustered regionally and federated globally
NYT content producers connect directly to the backbone
Information flow is bidirectional and event-driven
faбrik Information Flow
[diagram: NYT internal systems and clients worldwide, connected through a globally distributed "faбrik" layer]
faбrik – basic
[diagram: apps exchanging messages through a message broker]
Runs on Amazon Web Services: EC2, S3, Identity & Access Management, DynamoDB, Route 53, …
faбrik – basic++
[diagram: apps and other apps connect through "App Buddy" front-ends ("retail"); services connect through "Service Buddy" endpoints ("wholesale"); both sides meet at federated message brokers]
faбrik: Current Implementation
Open source:
– Erlang/OTP R14B04
– RabbitMQ 2.8.7 / 3.0 pre-release
– Node.js 0.8.x
– SockJS (websockets +)
– Python 2.6/2.7
– ZeroMQ
Automated deployment using CloudFormation
DynamoDB & S3 for persistence
faбrik – active/active cluster
[diagram: Service Buddy 'a' in Zone 'a' and Service Buddy 'b' in Zone 'b', clustered across availability zones in Region 'Wherever']
faбrik
So why DynamoDB?
faбrik services are reliable but stateless (mostly)
A happy faбrik has short queues (measurable by the way)
So persist everything as rapidly as possible (enter DynamoDB)
Plus we want to gather & analyze
– Pulse: Map / Reduce, rapid cycle
– Longitudinal analysis
– Complex Event Processing in parallel (maybe)
Note: the faбrik is asynchronous and facilitates parallelization
DynamoDB requirements
Store all messages crossing each ‘virtual host’
Note: think of a ‘virtual host’ as a horizontal band of related, reliable
services/endpoints across zones/instances in a region
Store log messages for all application and system instances
Facilitate ‘burst’ loads as well as steady state
Support gather / analyze for all of the above
Generational storage: DDB to S3 to Glacier (with some weeding)
Fairly allocate resources among many competing endpoints
Conventional wisdom…
Uh oh – we have an unpredictable mix of all these…
More conventional wisdom…
"In addition to simple retries, we recommend using an exponential backoff algorithm
for better flow control. The concept behind exponential backoff is to use progressively
longer waits between retries for consecutive error responses. For example, up to 50
milliseconds before the first retry, up to 100 milliseconds before the second, up to 200
milliseconds before the third, and so on. However, after a minute, if the request has not
succeeded, the problem might be the request size exceeding your provisioned
throughput, and not the request rate. Set the maximum number of retries to stop around
one minute. If the request is not successful, investigate your provisioned throughput
options." [i.e. increase provisioned throughput – hmmmm…]
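The schedule the documentation describes can be sketched in a few lines. The full-jitter randomization in `backoff_ms` is a common refinement, not something the quote prescribes:

```python
import random

def backoff_ms(attempt: int, base_ms: int = 50) -> float:
    """Random (full-jitter) wait before retry number `attempt` (1-based),
    capped at base_ms doubled per attempt: up to 50 ms, 100 ms, 200 ms, ..."""
    cap_ms = base_ms * 2 ** (attempt - 1)
    return random.uniform(0, cap_ms)

def retry_schedule_caps(max_total_ms: int = 60_000, base_ms: int = 50):
    """Upper-bound waits until the cumulative cap passes roughly one minute,
    the point at which the docs say to stop retrying and investigate."""
    caps, total, attempt = [], 0, 1
    while total < max_total_ms:
        cap = base_ms * 2 ** (attempt - 1)
        caps.append(cap)
        total += cap
        attempt += 1
    return caps
```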
So…
We would have to provision for peaks
Exponential backoff would give us about a 1 minute buffer
But! The faбrik does buffering and we can monitor queue lengths
Plus we have asynchronous event scheduling/handling facilities built in…
First strategy
With node.js, asynchronously blast all requests at dynamo, reschedule exponentially
based on backpressure
This worked pretty well!
– Dynamo would deliver about 3 times stated capacity in bursts
– Nothing got lost
– Converged reasonably onto table capacity
But…
– Problems exerting backpressure on the faбrik from node.js… hence requests could get scheduled
WAY into the future… and WAY out of order
– Competition among endpoints was ‘unfair’ and fostered convergence problems
Current strategy
Be smarter, look for similar patterns and tested solutions, plus select tools that give
the right level of control
Old dog:
– “I remember when TCP was new and throughput was not very high…”
(time passes)
– “The ‘ThroughputExceeded’ backpressure from DynamoDB is sort of like TCP backpressure…”
(more time passes)
– “Perhaps we could leverage that thought by applying the research and practices that have
improved TCP etc. to our use of DynamoDB.”
(time for a nap)
Current strategy
Be smarter, look for similar patterns and tested solutions, plus select tools that give
the right level of control
Token Bucket (circa 1986) for traffic shaping
“…an algorithm used in packet switched computer networks and telecommunications networks to
check that data transmissions conform to defined limits on bandwidth and burstiness.” – Wikipedia
Additive Increase/Multiplicative Decrease (AIMD)
“…a feedback control algorithm best known for its use in TCP congestion control…combines linear
growth of the congestion window with an exponential reduction when congestion takes place…flows
will eventually converge to use equal amounts of a contended link.” – Wikipedia
Explicit Congestion Notification (ECN) etc. etc.
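The two ideas on this slide compose naturally: a token bucket shapes the request rate, and AIMD adjusts the bucket's refill rate, growing it additively while requests succeed and cutting it multiplicatively when DynamoDB signals backpressure via ThroughputExceeded. A minimal sketch with illustrative parameters (not tuned values from the talk):

```python
class AIMDTokenBucket:
    """Token bucket whose refill rate is governed by AIMD."""

    def __init__(self, rate, capacity, increase=1.0, decrease=0.5, floor=1.0):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # burst size
        self.tokens = capacity
        self.increase = increase    # additive step on success
        self.decrease = decrease    # multiplicative cut on backpressure
        self.floor = floor          # never throttle below this rate

    def refill(self, elapsed_seconds):
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_seconds)

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; otherwise the caller should wait."""
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def on_success(self):
        self.rate += self.increase

    def on_throughput_exceeded(self):
        self.rate = max(self.floor, self.rate * self.decrease)

bucket = AIMDTokenBucket(rate=100, capacity=200)
```

As with TCP, the multiplicative cut backs off quickly when the table is saturated, while the additive growth lets competing endpoints converge toward fair shares of the provisioned capacity.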
Current strategy
Be smarter, look for similar patterns and tested solutions, plus select reliable tools
that give the right level of control
Tools:
– Use python to get a more mature and lower level event-driven interface (pika) to RabbitMQ –
easier to exert backpressure on the message source
– Use boto to get a mature interface to DynamoDB that can be easily ‘tweaked’ to give better
information about backpressure from DynamoDB (ThroughputExceeded exception)
– Use python’s concurrent futures to easily add asynchronous capability to boto, making use of
boto’s connection pooling
Managed Access to DynamoDB
Placeholder
Here we show some code, describe the testing methodology briefly, and show
generated results.
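The deck leaves this slide as a placeholder, so here is a hypothetical minimal sketch of the shape the tools list implies: writes go through a thread pool (as concurrent futures add asynchrony on top of boto's connection pooling) while a semaphore bounds in-flight requests so backpressure can be exerted on the message source. `write_item` stands in for a real boto call; a real version would catch ThroughputExceeded and feed the rate controller:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Semaphore

MAX_IN_FLIGHT = 8  # illustrative bound, not a value from the talk
written = []

def write_item(item):
    # Stand-in for a boto put; just records the item.
    written.append(item)
    return item

def managed_put(executor, gate, item):
    gate.acquire()  # blocks the producer when too many writes are in flight
    future = executor.submit(write_item, item)
    future.add_done_callback(lambda _f: gate.release())
    return future

with ThreadPoolExecutor(max_workers=4) as executor:
    gate = Semaphore(MAX_IN_FLIGHT)
    futures = [managed_put(executor, gate, {"id": n}) for n in range(100)]
    results = [f.result() for f in as_completed(futures)]
```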
Amazon DynamoDB
Provisioned throughput
Data patterns
We are sincerely eager to hear your feedback on this presentation and on re:Invent.
Please fill out an evaluation form when you have a chance.