AWS Webcast - Data Modeling for Low Cost and High Performance with DynamoDB
DESCRIPTION
Efficient schema design reduces cost and eliminates barriers to scalability. This requires a different approach to data modeling, with a focus on optimizing for usage patterns rather than merely describing objects. In this session, you will learn best practice techniques for minimizing payload size and modeling one-to-many relationships in DynamoDB, leveraging the versatility of hash+range primary keys. These methods have been used extensively by customers with substantial workloads on DynamoDB, enabling them to grow their applications quickly and cost-effectively.
TRANSCRIPT
© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.
Data Modeling for
Low Cost and High Performance
with Amazon DynamoDB
David Pearson
Siva Raghupathy
What is AWS?
Compute Storage
AWS Global Infrastructure
Database
Application Services
Deployment & Administration
Networking
AWS Database Services
Scalable High Performance Application Storage in the Cloud
• Amazon DynamoDB: Fast, Predictable, Highly-Scalable NoSQL Data Store
• Amazon RDS: Managed Relational Database Service for MySQL, Oracle and SQL Server
• Amazon ElastiCache: In-Memory Caching Service
• Amazon Redshift: Fast, Powerful, Fully Managed, Petabyte-Scale Data Warehouse Service
Fully Managed
Non-Relational
Predictable Performance
Massively Scalable =
= FAST Develop in days Scale in minutes
Low Latency: single-digit millisecond latency with on-disk durability, spanning multiple AZs
Admin-Free Scalability: requested-capacity provisioning of read and write throughput
Rapid Deployment: simple APIs and no effort needed to configure and maintain a cluster
Data Lifecycle Integration with Redshift
Direct integration with COPY command
High velocity data ages into Redshift
Low cost, high scale option for new apps
DynamoDB: OLTP, Web Apps
Redshift: Reporting and BI
New! Local Secondary Indexes
support for new access patterns
query flexibility
application complexity
consistent latency
Access Pattern Modeling
Method
1. Describe the overall use case – maintain context
2. Identify the individual access patterns of the use case
3. Model each access pattern to its own discrete data set
4. Consolidate data sets into tables and indexes
Benefits
• Data is stored in the format it is accessed
• Payloads are minimal for each access
Agenda
DynamoDB Recap
Modeling Primitives
Modeling Examples
Tables, Items, Attributes
An AWS account owns a collection of Tables
A Table is a collection of Items
An Item is an arbitrary collection of Attributes (Name-Value pairs)
Primary Key
A table must have a Primary Key
• Each item must have a unique primary key
Types of Primary keys:
• Hash
  • One attribute is chosen as the Hash key
  • Unordered hash index
• Hash and Range
  • Two attributes constitute a composite key
    – First is Hash
      » Unordered hash index
    – Second is Range
      » Sorted range index
  • Sorted collection within a hash bucket
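The hash-and-range layout described above can be pictured with a small in-memory sketch. It only illustrates the data layout (an unordered lookup by hash key, with a sorted collection of range keys inside each bucket) and says nothing about how DynamoDB is implemented. The class name `HashRangeTable` and its methods are hypothetical.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Toy model of a hash-and-range table (hypothetical, for illustration)."""

    def __init__(self):
        # hash key -> sorted list of range keys in that bucket
        self._range_keys = defaultdict(list)
        # (hash key, range key) -> item attributes
        self._items = {}

    def put(self, hash_key, range_key, attributes):
        key = (hash_key, range_key)
        if key not in self._items:
            # Keep range keys sorted within the hash bucket.
            bisect.insort(self._range_keys[hash_key], range_key)
        # The primary key is unique, so an existing item is replaced.
        self._items[key] = dict(attributes)

    def get(self, hash_key, range_key):
        return self._items.get((hash_key, range_key))

    def query(self, hash_key):
        # Items come back in range-key order within one hash bucket.
        return [(rk, self._items[(hash_key, rk)])
                for rk in self._range_keys.get(hash_key, [])]


table = HashRangeTable()
table.put("bob", "Game3", {"HighScore": 20000})
table.put("bob", "Game1", {"HighScore": 10500})
table.put("fred", "Game2", {"HighScore": 12000})
bob_games = [rk for rk, _ in table.query("bob")]
```

Here `bob_games` comes back in range-key order regardless of insertion order, which is the property the "sorted range index" bullet refers to.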
Online Gaming Example
DynamoDB Namespace
AWS
  Account1
    Region1
      Table1
        Item1
          Attribute1 (Hash key)
          Attribute2 (Range key)
          Attribute3
        Item2
          Attribute1
          Attribute2
          ...
        …
      Table2
        Item1
          Attribute1 (Hash key)
          Attribute2
          …
        Item2
          Attribute1
          Attribute2
          …
        …
      …
    Region2 …
Indexing
Data is indexed by the primary key
Local Secondary Indexes provide an “alternate range key” for your table for efficient queries
Partitioning
DynamoDB automatically partitions data by the hash key
• Hash key spreads data & workload across partitions
Auto-partitioning driven by:
• Data set size
• Provisioned Throughput
Tip: A large number of unique hash keys and a uniform distribution of workload across hash keys lend themselves well to massive scale!
Data types
Scalar data types
• String (S)
  • Unicode with UTF8 binary encoding
• Number (N)
  • Up to 38 digits precision and can be between 10^-128 and 10^+126
  • Variable width encoding can occupy up to 21 bytes
• Binary (B)
  • Base64 encoded binary data
Multi-valued types
• String Set (SS)
• Number Set (NS)
• Binary Set (BS)
DynamoDB API
Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
Read and write items: PutItem, GetItem, UpdateItem, DeleteItem
Bulk get or update: BatchGetItem, BatchWriteItem
Query specific items OR scan the full table: Query, Scan
Read Patterns
GetItem
• Returns a set of attributes for an item that matches the primary key
Query
• Only works on a table with a composite hash and range key
• Hash key = ‘xxxxx’ and no range key condition
• Hash key = ‘xxxxx’ and a range key condition (EQ, GT, LT, GE, LE, BEGINS_WITH, BETWEEN)
• Count of items (that match a hash key value or hash key + range condition)
• Top N / Bottom N items (via ScanIndexForward = T/F & Limit N)
• Paging via Limit N & ExclusiveStartKey = LastEvaluatedKey
BatchGetItem
• Returns the attributes for multiple items from multiple tables using their primary keys
Scan
• Scans a table from beginning to end and applies filters
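The Top N and paging bullets for Query can be sketched with a small simulation. This is a simplified model over an in-memory list, not a call to the service: the function `query` and its parameter names are hypothetical stand-ins that mirror Limit, ScanIndexForward, ExclusiveStartKey and LastEvaluatedKey. One simplification to note: here a LastEvaluatedKey is returned only when more items remain, whereas the real service may return one even when the following page turns out to be empty.

```python
def query(items, hash_key, limit, scan_index_forward=True, exclusive_start_key=None):
    """Simulate Query paging over (hash key, range key) pairs. Illustrative only."""
    ranges = sorted(r for h, r in items if h == hash_key)
    if not scan_index_forward:
        ranges.reverse()
    if exclusive_start_key is not None:
        # Resume just after the last key returned by the previous page.
        ranges = ranges[ranges.index(exclusive_start_key) + 1:]
    page = ranges[:limit]
    # Simplified model: hand back a resume key only when items remain.
    last_evaluated_key = page[-1] if len(ranges) > limit else None
    return page, last_evaluated_key


items = [("bob", n) for n in range(1, 6)] + [("fred", 1)]

# Paging: feed each LastEvaluatedKey back in as ExclusiveStartKey.
pages, start = [], None
while True:
    page, start = query(items, "bob", limit=2, exclusive_start_key=start)
    pages.append(page)
    if start is None:
        break

# Top N: read the sorted range in reverse and stop after N items.
top_two = query(items, "bob", limit=2, scan_index_forward=False)[0]
```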
Write Patterns
PutItem
• Add a new item, or replace an item with a new item
• Conditional: Insert a new item only if the PK does not exist
• Can return ALL_OLD
UpdateItem
• Add, update or delete an attribute (other than the PK)
• Increment an attribute (X = X + 10)
  • Atomic increment and get
• Insert a new item and attributes
• Conditional: Insert a new attribute if it does not exist
• Can return ALL_OLD, UPDATED_OLD, ALL_NEW or UPDATED_NEW (optional)
DeleteItem
• Delete an item
• Conditional: Delete an item if it exists or if it has an expected attribute value
• Can return ALL_OLD (optional)
BatchWriteItem
• Up to 25 put or delete operations (or 1 MB payload) in a single API call
• Not atomic across multiple items or tables (but individual updates are atomic)
• No conditional updates or return values
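Because BatchWriteItem accepts at most 25 operations per call, a client with more writes has to split them into batches. A minimal sketch of that split follows. The helper name `chunk_write_requests` is hypothetical, the request dictionaries are simplified placeholders rather than the exact wire format, and the 1 MB payload limit is not checked here.

```python
def chunk_write_requests(requests, max_per_call=25):
    """Split write requests into batches of at most max_per_call operations."""
    return [requests[i:i + max_per_call]
            for i in range(0, len(requests), max_per_call)]


# Hypothetical put requests; the item shape is only for illustration.
requests = [{"PutRequest": {"Item": {"UserId": f"user{i}"}}} for i in range(60)]
batches = chunk_write_requests(requests)
batch_sizes = [len(b) for b in batches]
```

Since the batch is not atomic across items, a real client would also need to resend any operations the service reports as unprocessed.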
Agenda
DynamoDB Recap
Modeling Primitives
Modeling Examples
Modeling 1:1 relationships
Use a table with a hash key
Examples:
• Users
• Hash key = UserId
• Games
• Hash key = GameId
Users Table
Hash key        Attributes
UserId = bob    Email = [email protected], JoinDate = 2011-11-15
UserId = fred   Email = [email protected], JoinDate = 2011-12-01, Sex = M

Games Table
Hash key         Attributes
GameId = Game1   LaunchDate = 2011-10-15, Version = 2
GameId = Game2   LaunchDate = 2010-05-12, Version = 3
GameId = Game3   LaunchDate = 2012-01-20, Version = 1
Modeling 1:N relationships
Use a table with hash and range key
Example:
• One (1) User can play many (N) Games
• User_Games table
– Hash key = UserId
– Range key = GameId
User_Games table
Hash key        Range key        Attributes
UserId = bob    GameId = Game1   HighScore = 10500, ScoreDate = 2011-10-20
UserId = fred   GameId = Game2   HighScore = 12000, ScoreDate = 2012-01-10
UserId = bob    GameId = Game3   HighScore = 20000, ScoreDate = 2012-02-12
Modeling N:M relationships
Use two hash and range tables
Example:
• One User can play many Games
• Hash key = UserId
• Range key = GameId
• One Game can have many Users
• Hash key = GameId
• Range key = UserId
User_Games
Hash key        Range key
UserId = bob    GameId = Game1
UserId = fred   GameId = Game2
UserId = bob    GameId = Game3

Game_Users
Hash key         Range key
GameId = Game1   UserId = bob
GameId = Game2   UserId = fred
GameId = Game3   UserId = bob
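The two-table pattern means every relationship is written twice, once per direction, so that each access pattern is answered from a single table. A minimal in-memory sketch follows. The dictionaries stand in for the User_Games and Game_Users tables, and the helper `record_play` is hypothetical.

```python
from collections import defaultdict

# Two toy "tables": hash key -> set of range keys (sorted when read).
user_games = defaultdict(set)   # hash = UserId, range = GameId
game_users = defaultdict(set)   # hash = GameId, range = UserId


def record_play(user_id, game_id):
    """Write both directions so each lookup touches only one table."""
    user_games[user_id].add(game_id)
    game_users[game_id].add(user_id)


for user_id, game_id in [("bob", "Game1"), ("fred", "Game2"), ("bob", "Game3")]:
    record_play(user_id, game_id)
# Extra row that is not on the slide, added to show a game with two users.
record_play("fred", "Game1")

games_for_bob = sorted(user_games["bob"])
users_for_game1 = sorted(game_users["Game1"])
```

In a real application the two writes are separate operations, so the client has to handle the case where one succeeds and the other fails.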
Modeling Multi-tenancy
Use tenant id as the hash key
Example:
ForumName is the tenant id in the Thread table
Agenda
DynamoDB Recap
Modeling Primitives
Modeling Examples
Example 1: Multi-tenant application for file storing and sharing
Access Patterns
1. Users should be able to query all the files they own
2. Search by File Name
3. Search by File Type
4. Search by Date Range
5. Keep track of Shared Files
6. Search by descending order of File Size
Entities and Relationships
Entities:
• Users
• Files
Relationship
• One User has many Files (1:N)
DynamoDB Data Model
Users
• Hash key = UserId (S)
• Attributes = User Name (S), Email (S), Address (SS), etc.
User_Files
• Hash key = UserId (S) – This is also the tenant id
• Range key = FileId (S)
• Attributes = Name (S), Type (S), Size (N), Date (S), SharedFlag (S), S3key (S)
Access Pattern 1
Find all files owned by a user
• Query (TableName = User_Files, UserId = 2)
UserId (Hash)  FileId (Range)  Name   Date        Type  SharedFlag  Size  S3key
1              1               File1  2013-04-23  JPG               1000  bucket\1
1              2               File2  2013-03-10  PDF   Y           100   bucket\2
2              1               File3  2013-03-10  PNG   Y           2000  bucket\3
2              2               File4  2013-03-10  DOC               3000  bucket\4
3              1               File5  2013-04-10  TXT               400   bucket\5
Access Pattern 2
Search by file name
• Query (TableName = User_Files, IndexName = NameIndex, UserId = 1, Name = File1)
NameIndex
UserId  Name   FileId
1       File1  1
1       File2  2
2       File3  1
2       File4  2
3       File5  1
Access Pattern 3
Search for file name by file Type
• Query (TableName = User_Files, IndexName = TypeIndex, UserId = 2, Type = DOC)
TypeIndex
UserId  Type  FileId  Name (Projection)
1       JPG   1       File1
1       PDF   2       File2
2       DOC   2       File4
2       PNG   1       File3
3       TXT   1       File5
Access Pattern 4
Search for file name by date range
• Query (TableName = User_Files, IndexName = DateIndex, UserId = 1, Date between 2013-03-01 and 2013-03-29)
DateIndex
UserId  Date        FileId  Name (Projection)
1       2013-03-10  2       File2
1       2013-04-23  1       File1
2       2013-03-10  1       File3
2       2013-03-10  2       File4
3       2013-04-10  1       File5
Access Pattern 5
Search for names of Shared files
• Query (TableName = User_Files, IndexName = SharedFlagIndex, UserId = 1, SharedFlag = Y)
SharedFlagIndex
UserId  SharedFlag  FileId  Name (Projection)
1       Y           2       File2
2       Y           1       File3
Access Pattern 6
Query for file names by descending order of size
• Query (TableName = User_Files, IndexName = SizeIndex, UserId = 1, ScanIndexForward = false)
SizeIndex
UserId  Size  FileId  Name (Projection)
1       100   2       File2
1       1000  1       File1
2       2000  1       File3
2       3000  2       File4
3       400   1       File5
Local Secondary Indexes
Table Name  Index Name       Attribute to Index  Projected Attribute
User_Files  NameIndex        Name                KEYS
User_Files  TypeIndex        Type                KEYS + Name
User_Files  DateIndex        Date                KEYS + Name
User_Files  SharedFlagIndex  SharedFlag          KEYS + Name
User_Files  SizeIndex        Size                KEYS + Name
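The index-based access patterns can be sketched as a small simulation over the User_Files rows from Access Pattern 1. Each index keeps the table's hash key (UserId) and swaps in a different attribute as the range key, with Name projected alongside the keys. The functions `build_index` and `query_index` are hypothetical helpers, not DynamoDB API calls.

```python
# Rows from the User_Files table: (UserId, FileId, Name, Date, Type, Size)
FILES = [
    (1, 1, "File1", "2013-04-23", "JPG", 1000),
    (1, 2, "File2", "2013-03-10", "PDF", 100),
    (2, 1, "File3", "2013-03-10", "PNG", 2000),
    (2, 2, "File4", "2013-03-10", "DOC", 3000),
    (3, 1, "File5", "2013-04-10", "TXT", 400),
]
COLUMNS = {"FileId": 1, "Name": 2, "Date": 3, "Type": 4, "Size": 5}


def build_index(attribute):
    """Index entries: (UserId, indexed attribute, FileId, projected Name)."""
    col = COLUMNS[attribute]
    entries = [(row[0], row[col], row[1], row[2]) for row in FILES]
    # Same hash key as the table, sorted by the alternate range key.
    return sorted(entries)


def query_index(index, user_id, low=None, high=None, scan_index_forward=True):
    """Return entries for one hash key, optionally bounded on the range key."""
    hits = [e for e in index if e[0] == user_id
            and (low is None or e[1] >= low)
            and (high is None or e[1] <= high)]
    return hits if scan_index_forward else hits[::-1]


date_index = build_index("Date")
size_index = build_index("Size")

# Access Pattern 4: files for user 1 dated within March 2013.
march_files = [e[3] for e in query_index(date_index, 1, "2013-03-01", "2013-03-29")]
# Access Pattern 6: files for user 1, largest first.
largest_first = [e[3] for e in query_index(size_index, 1, scan_index_forward=False)]
```

Because Name is projected into the index entries, both lookups return file names without a second read of the base table.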
Example 2: Modeling large items
Store large attributes in Amazon S3
MESSAGE-ID (hash key)  Attributes
1                      FROM = ‘user1’
                       TO = ‘user2’
                       DATE = ‘12/12/2011’
                       SUBJECT = ‘DynamoDB Best practices’
                       BODY = ‘The first few Kbytes…..’
                       BODY_OVERFLOW = ‘S3bucket+key’

Break large attributes across multiple DynamoDB items
MESSAGE-ID (hash key)  PART (range key)  Attributes
1                      0                 FROM = ‘user1’
                                         TO = ‘user2’
                                         DATE = ‘12/12/2011’
                                         SUBJECT = ‘DynamoDB Best practices’
                                         BODY = ‘The first few Kbytes…..’
1                      1                 BODY = ‘ the next 64k’
1                      2                 BODY = ‘ the next 64k’
1                      3                 EOM
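Splitting a large body across several items keyed by MESSAGE-ID and PART can be sketched as follows. This is an illustration under stated assumptions: the helper names `split_message` and `join_message` are hypothetical, the 64 KB chunk size is taken from the slide's "next 64k", and the chunks are measured in characters rather than encoded bytes, so a real implementation would need to size them against the actual item limit.

```python
CHUNK_SIZE = 64 * 1024  # from the slide's "next 64k"; counted in characters here


def split_message(message_id, headers, body, chunk_size=CHUNK_SIZE):
    """Return items keyed by (MESSAGE-ID, PART); part 0 also carries the headers."""
    chunks = [body[i:i + chunk_size] for i in range(0, len(body), chunk_size)] or [""]
    items = []
    for part, chunk in enumerate(chunks):
        item = {"MESSAGE-ID": message_id, "PART": part, "BODY": chunk}
        if part == 0:
            item.update(headers)
        items.append(item)
    # Final marker item, mirroring the EOM row on the slide.
    items.append({"MESSAGE-ID": message_id, "PART": len(chunks), "EOM": True})
    return items


def join_message(items):
    """Reassemble the body from items sorted by the PART range key."""
    parts = sorted(items, key=lambda it: it["PART"])
    return "".join(it["BODY"] for it in parts if "BODY" in it)


body = "x" * (150 * 1024)
items = split_message(1, {"FROM": "user1", "TO": "user2"}, body)
```

Since PART is the range key, a single Query on the MESSAGE-ID returns the parts already in order, and the client only has to concatenate them.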
Example 2: Modeling large items
Use an overflow table for large attributes
Retrieve items via Batch Get
MailBox Table
ID (hash key)
Timestamp (range key)
Attribute1
Attribute2
Attribute3
….
AttributeN
LargeAttribute
MailBox Table
ID (hash key)
Timestamp (range key)
Attribute1
Attribute2
Attribute3
….
AttributeN
LargeAttributeUUID
Overflow Table
LargeAttributeUUID LargeAttribute
Example 3: Modeling Time Series Data
Your application wants to keep one year of historical data
You can pre-create one table per week (or per day or per month) and insert records into the appropriate table based on timestamp

Events_table_2012
Event_id (Hash key)  Timestamp (range key)  Attribute1 …. Attribute N

Events_table_2012_05_week1
Event_id (Hash key)  Timestamp (range key)  Attribute1 …. Attribute N

Events_table_2012_05_week2
Event_id (Hash key)  Timestamp (range key)  Attribute1 …. Attribute N

Events_table_2012_05_week3
Event_id (Hash key)  Timestamp (range key)  Attribute1 …. Attribute N
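Routing a record to the right table comes down to deriving the table name from the event timestamp. A minimal sketch follows, using the naming shown on the slide. The function `table_for` is hypothetical, and the week numbering (days 1 to 7 are week 1, days 8 to 14 are week 2, and so on) is an assumption, since the slide does not define it.

```python
from datetime import datetime


def table_for(timestamp):
    """Map an event timestamp to a weekly table name.

    Assumption: "week N" means days 1-7, 8-14, ... of the month; the slide
    does not say how weeks are numbered.
    """
    week = (timestamp.day - 1) // 7 + 1
    return f"Events_table_{timestamp.year}_{timestamp.month:02d}_week{week}"


name_a = table_for(datetime(2012, 5, 3, 9, 30))
name_b = table_for(datetime(2012, 5, 17, 23, 59))
```

With one table per period, old data can be retired by dropping a whole table instead of deleting items one by one.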
Example 4: Modeling Global Secondary Indexes
Create global secondary indexes
• Example: First_name_index & Last_name_index
Query: Get me all the Users data for First_name = ‘Tim’
• Query First_name_index for hash key = ‘Tim’
• This will return User_id = (101, 201)
• BatchGet (Users, [101, 201])

Users
User_Id (hash key)  First_name  Last_name  …
101                 Tim         White
201                 Tim         Black
301                 Ted         White
401                 Keith       Brown
501                 Keith       White
601                 Keith       Black

First_name_index
First_name (hash key)  User_id (range key)
Tim                    101
Tim                    201
Ted                    301
Keith                  401
Keith                  501
Keith                  601

Last_name_index
Last_name (hash key)  User_id (range key)
White                 101
Black                 201
White                 301
Brown                 401
White                 501
Black                 601
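The two-step lookup (query the index table, then batch-get the matching users) can be sketched in memory. The data comes from the slide. The helpers `build_index` and `batch_get` are hypothetical stand-ins for the index tables and the BatchGet call.

```python
from collections import defaultdict

# Rows from the Users table on the slide.
USERS = {
    101: {"First_name": "Tim", "Last_name": "White"},
    201: {"First_name": "Tim", "Last_name": "Black"},
    301: {"First_name": "Ted", "Last_name": "White"},
    401: {"First_name": "Keith", "Last_name": "Brown"},
    501: {"First_name": "Keith", "Last_name": "White"},
    601: {"First_name": "Keith", "Last_name": "Black"},
}


def build_index(attribute):
    """Index table: attribute value (hash key) -> sorted User_id list (range key)."""
    index = defaultdict(list)
    for user_id in sorted(USERS):
        index[USERS[user_id][attribute]].append(user_id)
    return index


def batch_get(user_ids):
    """Stand-in for BatchGet (Users, [...]); returns the full items."""
    return [dict(USERS[u], User_id=u) for u in user_ids]


first_name_index = build_index("First_name")
tim_ids = first_name_index["Tim"]   # step 1: query the index for hash key 'Tim'
tim_users = batch_get(tim_ids)      # step 2: fetch the full items by User_id
```

The index tables hold only keys, so they stay small, at the cost of a second read to fetch the full user items.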
Example 5: Model for Avoiding Hot Keys
Use multiple keys (aliases) instead of a single hot key
Generate aliases by prefixing or suffixing a known range (N)
Use BatchGetItem API to retrieve ticket counts for all the aliases (1_Avatar, 2_Avatar, 3_Avatar,…, N_Avatar) and sum them in your client application

MOVIES
MNAME (hash key)
1_Avatar          TicketCount = 4,000,000
2_Avatar          TicketCount = 2,000,000
3_Avatar          TicketCount = 4,000,000
….
N_Avatar          TicketCount = 4,000,000
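The alias technique can be sketched as follows: each write increments a randomly chosen alias, and a read sums the counters across all aliases. The `Counter` stands in for the MOVIES table, the helper names are hypothetical, and N = 10 is an arbitrary choice for illustration, since the right range depends on the write rate.

```python
import random
from collections import Counter

N = 10  # size of the alias range; chosen arbitrarily for this sketch
ticket_counts = Counter()  # stands in for the MOVIES table


def alias_keys(movie, n=N):
    """All hash keys for one movie: 1_<movie> ... n_<movie>."""
    return [f"{i}_{movie}" for i in range(1, n + 1)]


def sell_ticket(movie, rng=random):
    """Spread writes by incrementing a randomly chosen alias."""
    ticket_counts[f"{rng.randint(1, N)}_{movie}"] += 1


def total_tickets(movie):
    """Read every alias (BatchGetItem in the real API) and sum client-side."""
    return sum(ticket_counts[key] for key in alias_keys(movie))


rng = random.Random(0)
for _ in range(1000):
    sell_ticket("Avatar", rng)
total = total_tickets("Avatar")
```

The trade-off is that a read now costs N item lookups instead of one, in exchange for spreading the write load over N hash keys.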
Tips for Minimizing Storage & Throughput Costs
Keep item size as small as possible
• Consider compressing attribute values and storing them as binary
• Keep attribute names succinct
Use the right storage service
• Example: keep Blobs in S3 and metadata in DynamoDB
Use an overflow table for large items and do batch gets
Use a table per time period for time series data
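The compression tip can be illustrated with Python's standard `zlib` module. The sample text is made up and highly repetitive, so the saving shown here is larger than typical real data would give. The compressed bytes would be stored as a Binary (B) attribute.

```python
import zlib

body = "DynamoDB best practices. " * 200  # repetitive sample text
raw = body.encode("utf-8")
compressed = zlib.compress(raw)           # store this as a Binary (B) attribute

# Reading the item back: decompress, then decode.
restored = zlib.decompress(compressed).decode("utf-8")
saved_bytes = len(raw) - len(compressed)
```

Compressed values cannot be used in key conditions or filters, so this suits attributes that are only ever read back whole.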
Summary
Access Pattern Modeling enables applications to scale with minimal overhead and cost
Use Case → Access Patterns → Data Design
DynamoDB enables 1:M relationships within tables via support for hash+range primary keys
Local Secondary Indexes provide complex query support without performance degradation
Questions? http://aws.amazon.com/resources/databaseservices/webinars