DAT302: Under the Covers of Amazon DynamoDB - AWS re:Invent 2012
DESCRIPTION
Learn about the thought and decisions that went into designing and building DynamoDB. We'll talk about its roots and how we can deliver the performance and throughput you enjoy today. We'll also show you how to model data, maintain maximum throughput, and drive analytics against the data with DynamoDB. Finally, you'll hear from some of our customers on how they've built large-scale applications on DynamoDB and about the lessons they've learned along the way.
TRANSCRIPT
Under the Covers of Amazon DynamoDB
Matt Wood, Chief Data Scientist
Hello
Amazon DynamoDB
Two decisions + three clicks
= ready for use
Primary keys
Level of throughput
Amazon DynamoDB
Provisioned throughput
Data patterns
DynamoDB is a managed NoSQL
database service
Store and retrieve any amount of data
Serve any level of request traffic
Without the operational burden
Consistent, predictable performance
Single digit millisecond latency
Backed by solid-state drives
Flexible data model
Key/attribute pairs. No schema required.
Easy to create. Easy to adjust.
Seamless scalability
No table size limits. Unlimited storage.
No downtime.
Durable
Consistent, disk-only writes
Replication across data centers
and availability zones
Without the operational burden
Focus on your app
Provisioned throughput
Reserve IOPS for reads and writes
Scale up or down at any time
Pay per capacity unit
Priced per hour of provisioned throughput
Write throughput
Size of item x writes per second
$0.01 for 10 write units
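The write-throughput arithmetic above can be sketched in a few lines. The 1 KB-per-write-unit sizing and the $0.01-per-10-units price are taken from the slides of this era and may differ from current DynamoDB pricing:

```python
import math

def write_units(item_size_kb: float, writes_per_second: int) -> int:
    """Write capacity units = (item size rounded up to whole KB) x writes per second.

    Assumes the 1 KB write-unit sizing quoted in the talk.
    """
    return math.ceil(item_size_kb) * writes_per_second

def write_cost_per_hour(units: int, dollars_per_10_units: float = 0.01) -> float:
    """$0.01 per hour buys 10 write units, per the slide's pricing."""
    return (units / 10) * dollars_per_10_units

# A 0.5 KB item written 100 times per second needs 100 write units,
# costing $0.10 per hour at the quoted rate.
units = write_units(0.5, 100)
```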
Consistent writes
Atomic increment and decrement
Optimistic concurrency control: conditional writes
Transactions
Item level transactions only
Puts, updates and deletes are ACID
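Conditional writes give you optimistic concurrency control via compare-and-set: the write succeeds only if the attributes still hold the values the writer last read. A minimal in-memory sketch of the idea (this is not the DynamoDB API; `TinyStore` and `ConditionalWriteFailed` are illustrative names):

```python
class ConditionalWriteFailed(Exception):
    """Raised when an expected attribute value no longer matches."""

class TinyStore:
    def __init__(self):
        self._items = {}

    def put(self, key, item, expected=None):
        """Unconditional put, or a conditional put if `expected` is given.

        `expected` maps attribute names to the values they must currently
        hold for the write to go through.
        """
        current = self._items.get(key, {})
        if expected is not None:
            for attr, value in expected.items():
                if current.get(attr) != value:
                    raise ConditionalWriteFailed(attr)
        self._items[key] = dict(item)

    def get(self, key):
        return dict(self._items.get(key, {}))

store = TinyStore()
store.put("mza", {"score": 100})
store.put("mza", {"score": 150}, expected={"score": 100})  # succeeds
```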
Read throughput
Strong or eventual consistency
Provisioned units = size of item x reads per second
$0.01 per hour for 50 units with strongly consistent reads
$0.01 per hour for 100 units with eventually consistent reads
Same latency expectations
Mix and match at 'read time'
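The read pricing works the same way as the write pricing, with eventually consistent reads needing half the units (hence half the cost). A sketch of that arithmetic; the `kb_per_unit` sizing is an assumption of the talk's era, so treat it as a parameter:

```python
import math

def read_units(item_size_kb, reads_per_second, strongly_consistent=True, kb_per_unit=1):
    """Read capacity units = ceil(item size / unit size) x reads per second.

    Eventually consistent reads need half the units, which is why the slide's
    $0.01 buys 50 strongly consistent units or 100 eventually consistent ones.
    """
    units = math.ceil(item_size_kb / kb_per_unit) * reads_per_second
    return units if strongly_consistent else math.ceil(units / 2)
```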
Provisioned throughput is
managed by DynamoDB
Data is partitioned and managed
by DynamoDB
Achieving full provisioned throughput
requires a uniform workload
The DynamoDB Uniform Workload
DynamoDB divides table data into multiple partitions
Data is distributed across partitions by hash key
Provisioned throughput is divided evenly across partitions
The DynamoDB Uniform Workload
To achieve and maintain full provisioned throughput
for a table, spread the workload evenly
across primary keys
Non-uniform workloads
Some requests might be throttled,
even at high levels of provisioned throughput
Model data for a uniform workload
Amazon DynamoDB
Provisioned throughput
Data patterns
DynamoDB semantics
Tables, items and attributes
id = 100 | date = 2012-05-16-09-00-10 | total = 25.00
id = 101 | date = 2012-05-15-15-00-11 | total = 35.00
id = 101 | date = 2012-05-16-12-00-10 | total = 100.00
id = 102 | date = 2012-03-20-18-23-10 | total = 20.00
id = 102 | date = 2012-03-20-18-23-10 | total = 120.00
The whole grid is the table; each row is an item; each name = value pair is an attribute.
Items are indexed by primary key
Single hash keys and composite range keys
In the example table, id is the hash key and date is the range key.
Items are retrieved by primary key
Range keys for queries
For example: all items for November
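Conceptually, a composite-key query selects one hash key and a contiguous slice of range keys. A pure-Python sketch of those semantics (not the DynamoDB API), using the "all items for November" example with hypothetical data:

```python
# Items keyed by hash key `id` and range key `date`.
items = [
    {"id": 101, "date": "2012-05-15-15-00-11", "total": 35.00},
    {"id": 101, "date": "2012-11-03-10-00-07", "total": 12.00},
    {"id": 101, "date": "2012-11-21-09-30-02", "total": 60.00},
    {"id": 102, "date": "2012-03-20-18-23-10", "total": 20.00},
]

def query(items, hash_key, range_prefix):
    """Return items for one hash key whose range key starts with the prefix,
    sorted by range key (DynamoDB keeps items in range-key order)."""
    return sorted(
        (i for i in items
         if i["id"] == hash_key and i["date"].startswith(range_prefix)),
        key=lambda i: i["date"],
    )

november = query(items, 101, "2012-11")  # all of id 101's November items
```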
Relationships are not hard coded,
but can be modeled
Players
user_id = mza      | location = Cambridge | joined = 2011-07-04
user_id = jeffbarr | location = Seattle   | joined = 2012-01-20
user_id = werner   | location = Worldwide | joined = 2011-05-15

Scores
user_id = mza    | game = angry-birds | score = 11,000
user_id = mza    | game = tetris      | score = 1,223,000
user_id = werner | game = bejewelled  | score = 55,000

Leader boards
game = angry-birds | score = 11,000    | user_id = mza
game = tetris      | score = 1,223,000 | user_id = mza
game = tetris      | score = 9,000,000 | user_id = jeffbarr

Query Scores by user_id for scores by user; query Leader boards by game for high scores by game.
NoSQL data modeling for maximal
provisioned throughput
Distinct values for hash keys
Hash key elements should have a high
number of distinct values
user_id = mza      | first_name = Matt   | last_name = Wood
user_id = jeffbarr | first_name = Jeff   | last_name = Barr
user_id = werner   | first_name = Werner | last_name = Vogels
user_id = mattfox  | first_name = Matt   | last_name = Fox
...
Lots of unique user IDs: workload well distributed
Limited response codes: workload poorly distributed
status = 200 | date = 2012-04-01-00-00-01
status = 404 | date = 2012-04-01-00-00-01
status = 404 | date = 2012-04-01-00-00-01
status = 404 | date = 2012-04-01-00-00-01
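The two examples above can be made concrete with a toy partitioner: DynamoDB spreads items across partitions by hashing the hash key, so a table keyed on a handful of status codes can never use more than a handful of partitions, no matter how much throughput is provisioned. The partition count and hash function here are illustrative, not DynamoDB's internals:

```python
import hashlib

PARTITIONS = 16  # illustrative; real partition counts are managed by DynamoDB

def partition_for(hash_key: str) -> int:
    """Map a hash key to a partition by hashing it (toy stand-in)."""
    digest = hashlib.md5(hash_key.encode()).hexdigest()
    return int(digest, 16) % PARTITIONS

user_ids = [f"user-{n}" for n in range(1000)]
status_codes = ["200", "404", "500"]

user_partitions = {partition_for(k) for k in user_ids}
status_partitions = {partition_for(k) for k in status_codes}
# Many distinct user IDs spread across (nearly) every partition; three
# status codes touch at most three, leaving the throughput provisioned
# to the other partitions idle while those three are throttled.
```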
Amazon DynamoDB
Provisioned throughput
Data patterns
NYT faбrik
AWS re:Invent – November 2012
Andrew Canaday & Michael Laing
New York Times Digital
What we’ll cover
faбrik overview
Getting more out of DynamoDB with python/boto
– More throughput / provisioned capacity
– Across more endpoints / table
– More reliably and controllably
Frank McCloud: “He wants more, don't you, Rocco?”
Johnny Rocco: “Yeah. That's it. More. That's right! I want more!”
James Temple: “Will you ever get enough?”
Frank McCloud: “Will you, Rocco?”
Johnny Rocco: “Well, I never have. No, I guess I won't.”
Takeaways
Messaging infrastructure is cool (again)
Old dogs have tricks you can apply
– The Internet is your friend
– BUT: much good computer science was done prior
– HENCE: not so readily findable
Boto is great – clone and contribute!
NYT Mission
Enhance society by creating, collecting and distributing high quality
news, information and entertainment
- Distributing: publish / subscribe
- Collecting: gather / analyze
- High Quality: fast, reliable, accurate
faбrik
Asynchronous Messaging Framework
For client devices as well as our apps
Enabled by:
– Websockets
– Robust message handling software
– Amazon Web Services
Focusing on simple, common services
Typical Web Architecture
Clients interact with front-end via load balancers
Front-end makes requests to back-end on behalf of client
Bottlenecks abound
Information transfer is initiated by client
Typical Request Flow
[diagram: Client → Load Balancer → Front End → API → Data]
Typical Response Flow
[diagram: Data → API → Front End → Load Balancer → Client]
faбrik Web Architecture
Clients interact with the nearest “App Buddy” front-end
The “App Buddy” is connected to the “Bad Rabbit” backbone
The “Bad Rabbit” backbone is clustered regionally and federated globally
NYT content producers connect directly to the backbone
Information flow is bidirectional and event-driven
faбrik Information Flow
[diagram: NYT internal systems and clients worldwide, connected through a globally distributed "faбrik" layer]
faбrik – basic
[diagram: apps exchanging messages through a message broker]
Runs on Amazon Web Services: EC2, S3, Identity & Access Management, DynamoDB, Route 53, …
faбrik – basic++
[diagram: apps and other apps connect through "App Buddy" front-ends ("retail"); services connect through "Service Buddy" endpoints ("wholesale"); both sides meet at federated message brokers]
faбrik: Current Implementation
Open source:
– Erlang/OTP R14B04
– RabbitMQ 2.8.7 / 3.0 pre-release
– Node.js 0.8.x
– SockJS (websockets +)
– Python 2.6/2.7
– ZeroMQ
Automated deployment using CloudFormation
DynamoDB & S3 for persistence
faбrik – active/active cluster
[diagram: Service Buddy 'a' in Zone 'a' and Service Buddy 'b' in Zone 'b', clustered across availability zones in Region 'Wherever']
faбrik
So why DynamoDB?
faбrik services are reliable but stateless (mostly)
A happy faбrik has short queues (measurable by the way)
So persist everything as rapidly as possible (enter DynamoDB)
Plus we want to gather & analyze
– Pulse: Map / Reduce, rapid cycle
– Longitudinal analysis
– Complex Event Processing in parallel (maybe)
Note: the faбrik is asynchronous and facilitates parallelization
DynamoDB requirements
Store all messages crossing each ‘virtual host’
Note: think of a ‘virtual host’ as a horizontal band of related, reliable
services/endpoints across zones/instances in a region
Store log messages for all application and system instances
Facilitate ‘burst’ loads as well as steady state
Support gather / analyze for all of the above
Generational storage: DDB to S3 to Glacier (with some weeding)
Fairly allocate resources among many competing endpoints
Conventional wisdom…
Uh oh – we have an unpredictable mix of all these…
More conventional wisdom…
"In addition to simple retries, we recommend using an exponential backoff algorithm
for better flow control. The concept behind exponential backoff is to use progressively
longer waits between retries for consecutive error responses. For example, up to 50
milliseconds before the first retry, up to 100 milliseconds before the second, up to 200
milliseconds before the third, and so on. However, after a minute, if the request has not
succeeded, the problem might be the request size exceeding your provisioned
throughput, and not the request rate. Set the maximum number of retries to stop around
one minute. If the request is not successful, investigate your provisioned throughput
options." [i.e. increase provisioned throughput – hmmmm…]
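The schedule the documentation describes can be sketched in a few lines. The full-jitter randomization in `backoff_ms` is a common refinement, not something the quote prescribes:

```python
import random

def backoff_ms(attempt: int, base_ms: int = 50) -> float:
    """Random (full-jitter) wait before retry number `attempt` (1-based),
    capped at base_ms doubled per attempt: up to 50 ms, 100 ms, 200 ms, ..."""
    cap_ms = base_ms * 2 ** (attempt - 1)
    return random.uniform(0, cap_ms)

def retry_schedule_caps(max_total_ms: int = 60_000, base_ms: int = 50):
    """Upper-bound waits until the cumulative cap passes roughly one minute,
    the point at which the docs say to stop retrying and investigate."""
    caps, total, attempt = [], 0, 1
    while total < max_total_ms:
        cap = base_ms * 2 ** (attempt - 1)
        caps.append(cap)
        total += cap
        attempt += 1
    return caps
```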
So…
We would have to provision for peaks
Exponential backoff would give us about a 1 minute buffer
But! The faбrik does buffering and we can monitor queue lengths
Plus we have asynchronous event scheduling/handling facilities built in…
First strategy
With node.js, asynchronously blast all requests at dynamo, reschedule exponentially
based on backpressure
This worked pretty well!
– Dynamo would deliver about 3 times stated capacity in bursts
– Nothing got lost
– Converged reasonably onto table capacity
But…
– Problems exerting backpressure on the faбrik from node.js… hence requests could get scheduled
WAY into the future… and WAY out of order
– Competition among endpoints was ‘unfair’ and fostered convergence problems
Current strategy
Be smarter, look for similar patterns and tested solutions, plus select tools that give
the right level of control
Old dog:
– “I remember when TCP was new and throughput was not very high…”
(time passes)
– “The ‘ThroughputExceeded’ backpressure from DynamoDB is sort of like TCP backpressure…”
(more time passes)
– “Perhaps we could leverage that thought by applying the research and practices that have
improved TCP etc. to our use of DynamoDB.”
(time for a nap)
Current strategy
Be smarter, look for similar patterns and tested solutions, plus select tools that give
the right level of control
Token Bucket (circa 1986) for traffic shaping
“…an algorithm used in packet switched computer networks and telecommunications networks to
check that data transmissions conform to defined limits on bandwidth and burstiness.” – Wikipedia
Additive Increase/Multiplicative Decrease (AIMD)
“…a feedback control algorithm best known for its use in TCP congestion control…combines linear
growth of the congestion window with an exponential reduction when congestion takes place…flows
will eventually converge to use equal amounts of a contended link.” – Wikipedia
Explicit Congestion Notification (ECN) etc. etc.
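The two ideas on this slide compose naturally: a token bucket shapes the request rate, and AIMD adjusts the bucket's refill rate, growing it additively while requests succeed and cutting it multiplicatively when DynamoDB signals backpressure via ThroughputExceeded. A minimal sketch with illustrative parameters (not tuned values from the talk):

```python
class AIMDTokenBucket:
    """Token bucket whose refill rate is governed by AIMD."""

    def __init__(self, rate, capacity, increase=1.0, decrease=0.5, floor=1.0):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # burst size
        self.tokens = capacity
        self.increase = increase    # additive step on success
        self.decrease = decrease    # multiplicative cut on backpressure
        self.floor = floor          # never throttle below this rate

    def refill(self, elapsed_seconds):
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_seconds)

    def try_acquire(self, cost=1.0):
        """Take `cost` tokens if available; otherwise the caller should wait."""
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def on_success(self):
        self.rate += self.increase

    def on_throughput_exceeded(self):
        self.rate = max(self.floor, self.rate * self.decrease)

bucket = AIMDTokenBucket(rate=100, capacity=200)
```

As with TCP, the multiplicative cut backs off quickly when the table is saturated, while the additive growth lets competing endpoints converge toward fair shares of the provisioned capacity.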
Current strategy
Be smarter, look for similar patterns and tested solutions, plus select reliable tools
that give the right level of control
Tools:
– Use python to get a more mature and lower level event-driven interface (pika) to RabbitMQ –
easier to exert backpressure on the message source
– Use boto to get a mature interface to DynamoDB that can be easily ‘tweaked’ to give better
information about backpressure from DynamoDB (ThroughputExceeded exception)
– Use python’s concurrent futures to easily add asynchronous capability to boto, making use of
boto’s connection pooling
Managed Access to DynamoDB
Placeholder
Here we show some code, describe the testing methodology briefly, and show
generated results.
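The deck leaves this slide as a placeholder, so here is a hypothetical minimal sketch of the shape the tools list implies: writes go through a thread pool (as concurrent futures add asynchrony on top of boto's connection pooling) while a semaphore bounds in-flight requests so backpressure can be exerted on the message source. `write_item` stands in for a real boto call; a real version would catch ThroughputExceeded and feed the rate controller:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from threading import Semaphore

MAX_IN_FLIGHT = 8  # illustrative bound, not a value from the talk
written = []

def write_item(item):
    # Stand-in for a boto put; just records the item.
    written.append(item)
    return item

def managed_put(executor, gate, item):
    gate.acquire()  # blocks the producer when too many writes are in flight
    future = executor.submit(write_item, item)
    future.add_done_callback(lambda _f: gate.release())
    return future

with ThreadPoolExecutor(max_workers=4) as executor:
    gate = Semaphore(MAX_IN_FLIGHT)
    futures = [managed_put(executor, gate, {"id": n}) for n in range(100)]
    results = [f.result() for f in as_completed(futures)]
```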
Amazon DynamoDB
Provisioned throughput
Data patterns
We are sincerely eager to hear your feedback on this presentation and on re:Invent.
Please fill out an evaluation form when you have a chance.