deep dive on amazon dynamodb - aws online tech talks

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Sean Shriver, AWS DynamoDB Solutions Architect November 2017 Deep Dive: Amazon DynamoDB

Page 1: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Sean Shriver, AWS DynamoDB Solutions Architect

November 2017

Deep Dive: Amazon DynamoDB

Page 2: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Plan

Amazon DynamoDB
o Foundations
o Tables
o Indexes
o Partitioning

New Features
o TTL
o Tagging
o VPC Endpoints
o Auto Scaling
o DAX

Dating Website
o DAX
o GSIs

Serverless IoT
o TTL
o Streams
o DAX

Getting Started
o Developer Resources

Page 3: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Dynamo whitepaper

Page 4: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

NoSQL foundations

NoSQL data models: Key-Value, Graph, Document, Column-family

[Figure: small examples of the data models, including key-value pairs 0000 {"Texas"}, 0001 {"Illinois"}, 0002 {"Oregon"} and a record keyed 0000-0000-0000-0001 (Game Heroes, Version 3.4, CRC ADE4)]

Timeline:
Fall 2007: "Dynamo: Amazon's Highly Available Key-value Store"
June 2009: Meetup, 235 2nd St, San Francisco
January 2012: Amazon DynamoDB

Page 5: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Scaling relational vs. non-relational databases

Traditional SQL: scale up (one DB on a larger host)

NoSQL: scale out to many shards (DynamoDB: partitions), with a DB on each of hosts 1, 2, 3, ... n

Page 6: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Scaling NoSQL

- Good sharding (partitioning) scheme affords even distribution of both data and workload, as they grow

- Key concept: partition key

- Ideal scaling conditions:

1. Partition key is from a high cardinality set (that grows)

2. Requests are evenly spread over the key space

3. Requests are evenly spread over time

Page 7: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

What (some) customers store in NoSQL DBs

- Market Orders
- Tokenization (PHI, Credit Cards)
- Chat Messages
- User Profiles (Mobile)
- IoT Sensor Data (& device status!)
- File Metadata
- Social Media Feeds

Page 8: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Use case: DataXu’s attribution engine

[Architecture diagram. Labels shown: Meta, Amazon EMR Job, Amazon CloudWatch, DynamoDB, AWS Data Pipeline, 3rd Party, S3 Buckets, 1st Party, AWS Direct Connect, Amazon VPC, Amazon EC2, Amazon RDS, Amazon SNS, AWS IAM]

"Attribution" is the marketing term for the allocation of credit to individual advertisements that eventually lead to a desired outcome (e.g., purchase).

Page 10: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Technical challenges

- Amazon EC2 Instances
- EBS Volumes
- CloudWatch Metrics
- Notifications
- Scaling new AZs, new Regions

Page 11: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Amazon DynamoDB

- Highly available
- Consistent, single-digit millisecond latency at any scale
- Fully managed
- Secure
- Integrates with AWS Lambda, Amazon Redshift, and more

Page 12: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Elastic is the new normal

[Chart: consumed capacity units over time. Series: Write Capacity Units, Read Capacity Units. Annotations: ">200% increase from baseline" and ">300% increase from baseline"]

Page 13: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Scaling high-velocity use cases with DynamoDB

Ad Tech, Gaming, Mobile, IoT, Web

Page 14: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DynamoDB Table

Partition Key
- Mandatory
- Key-value access pattern
- Determines data distribution

Sort Key
- Optional
- Model 1:N relationships
- Enables rich query capabilities

[Diagram: a table made of items. Each item has A1 (partition key) and A2 (sort key); the other attributes vary per item, e.g. A3 A4 A7, A6 A4 A5, none, or A3 A4 A5]

Page 15: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Local Secondary Indexes

• Alternate sort key attribute
• Index is local to a partition key

10 GB max per partition key, i.e. LSIs limit the # of sort keys!

[Diagram: three LSIs, each keyed by A1 (partition key) plus an alternate sort key:
A1 (partition key), A3 (sort key), A2 A4 A5
A1 (partition key), A4 (sort key), A2 A3 A5
A1 (partition key), A5 (sort key), A2 A3 A4]

Page 16: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Global Secondary Indexes

• Alternate partition (+sort) key
• Index is across all table partition keys
• Can be added or removed anytime

RCUs/WCUs provisioned separately for GSIs

[Diagram: index items keyed by A3 (partition key) that carry A1 (table key), shown per projection type:
ALL: A3 (partition key), A1 (table key), A2 A4 A7
INCLUDE A2: A3 (partition key), A1 (table key), A2
KEYS_ONLY: A3 (partition key), A1 (table key)]

Page 17: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Data types

Type             DynamoDB Type
String           String
Integer, Float   Number
Timestamp        Number or String
Blob             Binary
Boolean          Bool
Null             Null
List             List
Set              Set of String, Number, or Binary
Map              Map

Page 18: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Table creation options

CreateTable

Required:
- TableName (unique to Account and Region)
- PartitionKey, Type: AttributeName [S,N,B] (String, Number, Binary ONLY)
- Provisioned Reads: 1+ (per second)
- Provisioned Writes: 1+ (per second)

Optional:
- SortKey, Type: AttributeName [S,N,B] (String, Number, Binary ONLY)
- LSI Schema
- GSI Schema, with Provisioned Reads: 1+ and Provisioned Writes: 1+

Optionally: TTL and Auto Scaling
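As a rough sketch of the options above, the helper below assembles a dictionary shaped like a CreateTable request. The helper name is hypothetical, and using OrderId as the partition key of CustomerOrdersTable is an assumption made for illustration; the field names (TableName, AttributeDefinitions, KeySchema, ProvisionedThroughput) follow the DynamoDB CreateTable API.

```python
def build_create_table_request(table_name, partition_key, sort_key=None,
                               read_units=1, write_units=1):
    """Build keyword arguments shaped like a DynamoDB CreateTable request.

    partition_key and sort_key are (attribute_name, type) tuples, where type
    is "S", "N", or "B", the only types allowed for key attributes.
    """
    keys = [(partition_key, "HASH")]          # partition key is required
    if sort_key is not None:
        keys.append((sort_key, "RANGE"))      # sort key is optional

    for (name, attr_type), _ in keys:
        if attr_type not in {"S", "N", "B"}:
            raise ValueError(f"{name}: key type must be S, N, or B")
    if read_units < 1 or write_units < 1:
        raise ValueError("provisioned reads and writes must be at least 1")

    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for (name, attr_type), _ in keys
        ],
        "KeySchema": [
            {"AttributeName": name, "KeyType": key_type}
            for (name, _), key_type in keys
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# Assumed key choice: OrderId (Number) as the partition key, no sort key.
request = build_create_table_request(
    "CustomerOrdersTable", ("OrderId", "N"), read_units=5, write_units=5
)
```

With the AWS SDK for Python installed and credentials configured, a dictionary like this could be passed as `boto3.client("dynamodb").create_table(**request)`; the sketch itself makes no network call.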

Page 19: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Provisioned capacity

Read Capacity Unit (RCU): 1 RCU returns 4KB of data for strongly consistent reads, or double the data at the same cost for eventually consistent reads

Write Capacity Unit (WCU): 1 WCU writes 1KB of data, and each item consumes 1 WCU minimum

Capacity is per second, rounded up to the next whole number
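The capacity rules above can be turned into a small calculator. This is a minimal sketch, assuming 1KB means 1,024 bytes and that every request touches one item of the given size; the function names are made up for illustration.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed: 4KB per unit, rounded up per item; half for eventual consistency."""
    units_per_read = max(1, math.ceil(item_size_bytes / 4096))
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads return double the data at the same cost.
        return math.ceil(total / 2)
    return total


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed: 1KB per unit, rounded up per item, 1 WCU minimum per item."""
    units_per_write = max(1, math.ceil(item_size_bytes / 1024))
    return units_per_write * writes_per_second


# A 6,000-byte item read 10 times per second needs 2 RCUs per read.
strong_rcu = read_capacity_units(6000, 10)
eventual_rcu = read_capacity_units(6000, 10, strongly_consistent=False)
# A 1,500-byte item written 10 times per second needs 2 WCUs per write.
wcu = write_capacity_units(1500, 10)
```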

Page 20: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Horizontal Sharding

Host 1 Host 99 Host n

~Each new host brings compute, storage and network bandwidth~

CustomerOrdersTable

Page 21: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Partitioning

CustomerOrdersTable keyspace: Hash.MIN = 0, Hash.MAX = FF, with boundaries at 00, 55, AA, FF

Partition A: 33.33 % Keyspace, 33.33 % Provisioned Capacity
Partition B: 33.33 % Keyspace, 33.33 % Provisioned Capacity
Partition C: 33.33 % Keyspace, 33.33 % Provisioned Capacity

Items:
OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E], Hash(1) = 7B
OrderId: 2, CustomerId: 4, ASIN: [B00OQVZDJM], Hash(2) = 48
OrderId: 3, CustomerId: 3, ASIN: [B00U3FPN4U], Hash(3) = CD
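To make the keyspace picture concrete, here is a small sketch that routes a hash value to a partition by range lookup. The hash values are taken from the slide as given (DynamoDB's real hash function is internal), and the assumption that Partition A ends at 54 and Partition B at A9 follows from the 00, 55, AA, FF boundaries.

```python
import bisect

# Inclusive upper bound of each partition's slice of the 00..FF keyspace
# (assumed from the 00 / 55 / AA / FF boundaries on the slide).
PARTITION_UPPER_BOUNDS = [0x54, 0xA9, 0xFF]
PARTITION_NAMES = ["A", "B", "C"]


def partition_for_hash(hash_value):
    """Return the name of the partition whose key range contains hash_value."""
    if not 0x00 <= hash_value <= 0xFF:
        raise ValueError("hash value outside the 00..FF keyspace")
    index = bisect.bisect_left(PARTITION_UPPER_BOUNDS, hash_value)
    return PARTITION_NAMES[index]


# Hash values for OrderId 1, 2, and 3 as shown on the slide.
placement = {order_id: partition_for_hash(h)
             for order_id, h in {1: 0x7B, 2: 0x48, 3: 0xCD}.items()}
```

Under these assumptions OrderId 1 lands in Partition B, OrderId 2 in Partition A, and OrderId 3 in Partition C.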

Page 22: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Partitioning

CustomerOrdersTable keyspace: Hash.MIN = 0, Hash.MAX = FF, with boundaries at 00, 55, AA, FF. Partitions A, B, and C each start with 33.33 % Keyspace and 33.33 % Provisioned Capacity.

Split for partition size
The desired size of a partition is 10GB* and when a partition surpasses this it can split
[Diagram, over time: Partitions A and B keep 33.33 % each; Partition C is replaced by Partitions D and E, each with 16.66 % Keyspace and 16.66 % Provisioned Capacity]

Split for provisioned capacity
The desired capacity of a partition is expressed as: 3w + 1r < 3000*
Where w = WCU & r = RCU
[Diagram, over time: Partitions A, B, and C become Partitions A through F, each with 16.66 % Keyspace and 16.66 % Provisioned Capacity]

* = subject to change
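The two split rules (10GB per partition, and 3w + 1r < 3000 per partition) can be combined into a back-of-the-envelope partition estimate. This is only a sketch: both limits are marked "subject to change" on the slide, and taking the larger of the two estimates is an assumption rather than documented service behavior.

```python
import math


def fits_in_one_partition(rcu, wcu, limit=3000):
    """Check the slide's per-partition rule: 3w + 1r < 3000."""
    return 3 * wcu + rcu < limit


def estimated_partitions(rcu, wcu, table_size_gb, max_partition_gb=10, limit=3000):
    """Estimate how many partitions satisfy both the capacity and size rules."""
    # Smallest n such that (3w + r) / n stays strictly below the limit.
    by_capacity = (3 * wcu + rcu) // limit + 1
    # Smallest n such that each partition holds at most max_partition_gb.
    by_size = math.ceil(table_size_gb / max_partition_gb)
    return max(1, by_capacity, by_size)


# 1000 RCUs and 100 WCUs: 3*100 + 1000 = 1300, which fits in one partition.
single = fits_in_one_partition(1000, 100)
# 3000 RCUs, 300 WCUs, 25 GB: capacity suggests 2 partitions, size suggests 3.
estimate = estimated_partitions(3000, 300, 25)
```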

Page 23: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Partitioning: 3-way replication

Data is replicated to three Availability Zones by design

CustomerOrdersTable
Partition A (00:0 to 54:∞): 1000 RCUs, 100 WCUs; replicas on Host A (Availability Zone A), Host E (Availability Zone B), Host H (Availability Zone C)
Partition B (55:0 to A9:∞): 1000 RCUs, 100 WCUs; replicas on Host B, Host F, Host I
Partition C (AA:0 to FF:∞): 1000 RCUs, 100 WCUs; replicas on Host C (Availability Zone A), Host G (Availability Zone B), Host J (Availability Zone C)

Example item: OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E], Hash(1) = 7B

Page 24: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DynamoDB Streams

- Ordered stream of item changes
- Exactly once, strictly ordered by key
- Highly durable, scalable
- 24 hour retention
- Sub-second latency
- Compatible with Kinesis Client Library

Shards have a lineage and automatically close after time or when the associated DynamoDB partition splits

[Diagram: updates (1, 2, 3) to an Amazon DynamoDB table with Partitions A, B, and C appear in the shards of a DynamoDB Streams stream; the KCL workers of an Amazon Kinesis Client Library application read them with GetRecords]

Page 25: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Time-To-Live (TTL)

Time-To-Live: an epoch timestamp marking when an item can be deleted by a background process, without consuming any provisioned capacity

[Diagram: a TTL job acts on the Amazon DynamoDB table CustomerActiveOrder (example item: OrderId: 1, CustomerId: 1, MyTTL: 1492641900). Also shown: DynamoDB Streams stream, Amazon Kinesis, Amazon Redshift]

Removes data that is no longer relevant
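A TTL attribute is just a number of epoch seconds stored on the item. The sketch below computes one from a creation time; the 90-day lifetime is borrowed from the design-pattern requirements later in the talk, and the helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch(created_at, days_to_live=90):
    """Return the expiry time as whole epoch seconds, the format TTL expects."""
    if created_at.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return int((created_at + timedelta(days=days_to_live)).timestamp())


# Item created at epoch 1492641900, expiring 90 days (7,776,000 seconds) later.
created = datetime.fromtimestamp(1492641900, tz=timezone.utc)
item = {"OrderId": 1, "CustomerId": 1, "MyTTL": ttl_epoch(created)}
```

The attribute name MyTTL matches the slide; any Number attribute can serve as the TTL attribute once it is configured on the table.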

Page 26: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Time-To-Live (TTL)

- TTL items identifiable in DynamoDB Streams
- Configuration protected by AWS Identity and Access Management (IAM), auditable with AWS CloudTrail
- Eventual deletion, free to use
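One way to pick out TTL deletions in a stream consumer is sketched below. It relies on the documented convention that TTL deletes arrive as REMOVE records whose userIdentity has type "Service" and principalId "dynamodb.amazonaws.com"; the function names and sample records are illustrative, and OldImage is only present when the stream view type includes old images.

```python
def is_ttl_deletion(record):
    """True if a DynamoDB Streams record was produced by the TTL background process."""
    identity = record.get("userIdentity") or {}
    return (
        record.get("eventName") == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )


def expired_items(records):
    """Return the old image (or just the keys) of every TTL-deleted item."""
    result = []
    for record in records:
        if is_ttl_deletion(record):
            change = record["dynamodb"]
            result.append(change.get("OldImage") or change["Keys"])
    return result


# Illustrative records: one TTL delete, one user delete, one insert.
sample_records = [
    {"eventName": "REMOVE",
     "userIdentity": {"type": "Service", "principalId": "dynamodb.amazonaws.com"},
     "dynamodb": {"Keys": {"DeviceId": {"N": "1"}},
                  "OldImage": {"DeviceId": {"N": "1"}, "MyTTL": {"N": "1492736400"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"DeviceId": {"N": "2"}}}},
    {"eventName": "INSERT", "dynamodb": {"Keys": {"DeviceId": {"N": "3"}}}},
]
archived = expired_items(sample_records)
```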

Page 27: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Cost allocation tagging

Features

• Track costs: AWS bills broken down by tags in detailed monthly bills and Cost Explorer
• Flexible: Add customizable tags to tables, indexes and DAX clusters

Key Benefits

• Transparency: know exactly how much your DynamoDB resources cost
• Consistent: report of spend across AWS services

Page 28: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DynamoDB Auto Scaling

Specify: 1) Target capacity in percent 2) Upper and lower bound
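The two settings can be read as a simple rule: provision enough capacity that consumption sits at the target percentage, but never outside the bounds. The sketch below is a simplified model of that idea, not the service's actual algorithm (which reacts to CloudWatch alarms over time); the function name is hypothetical.

```python
import math


def target_provisioned_capacity(consumed_units, target_utilization_pct,
                                lower_bound, upper_bound):
    """Capacity that puts consumption at the target utilization, clamped to bounds."""
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target utilization must be in (0, 100]")
    desired = math.ceil(consumed_units * 100 / target_utilization_pct)
    return max(lower_bound, min(upper_bound, desired))


# Consuming 700 units at a 70% target calls for 1000 provisioned units.
steady = target_provisioned_capacity(700, 70, lower_bound=100, upper_bound=4000)
# Low traffic is held at the lower bound, a spike at the upper bound.
quiet = target_provisioned_capacity(10, 70, lower_bound=100, upper_bound=4000)
spike = target_provisioned_capacity(5000, 70, lower_bound=100, upper_bound=4000)
```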

Page 29: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DynamoDB Auto Scaling

Page 30: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Amazon DynamoDB Accelerator (DAX)

[Diagram: Your App in VPC -> DAX (μs) -> Amazon DynamoDB (ms)]

Page 31: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DynamoDB in the VPC

DAX
o Microseconds latency in-memory cache
o Millions of requests per second
o Fully managed, highly available
o Role based access control
o No IGW or VPC endpoint required

VPC Endpoints
o DynamoDB-in-the-VPC
o IAM resource policy restricted

[Diagram: Availability Zone #1 and Availability Zone #2, each with a private subnet containing a web app server and DAX in security groups. Also shown: AWS Lambda and a VPC endpoint]

Page 32: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DynamoDB Accelerator (DAX)

- Private IP, client-side discovery
- Supports AWS Java and NodeJS SDK, with more AWS SDKs to come
- Cluster based, Multi-AZ
- Separate Query and Item cache

Page 33: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Elements of even access in NoSQL

1) Time

2) SPACE

Page 34: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DynamoDB key choice

"To get the most out of DynamoDB throughput, create tables where the partition key has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible."

Amazon DynamoDB Developer Guide

Page 35: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Burst capacity is built-in

[Chart: capacity units (0 to 1600) over time, with Provisioned and Consumed series. Annotations: "Save up" unused capacity; Consume saved up capacity]

Burst: 300 seconds (1200 × 300 = 360k CU)

DynamoDB "saves" 300 seconds of unused capacity per partition
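The save-up-then-spend behavior resembles a token bucket. The simulation below is a simplified sketch under that assumption; the real per-partition accounting is internal to DynamoDB, and the function name is made up for illustration.

```python
BURST_WINDOW_SECONDS = 300  # DynamoDB "saves" 300 seconds of unused capacity


def simulate_burst(provisioned, demand_per_second):
    """Return the units throttled in each second for a given demand series."""
    max_saved = provisioned * BURST_WINDOW_SECONDS
    saved = 0
    throttled = []
    for demand in demand_per_second:
        available = provisioned + saved
        served = min(demand, available)
        throttled.append(demand - served)
        # Whatever was not used this second is saved, up to the burst ceiling.
        saved = min(max_saved, available - served)
    return throttled


# The slide's figure: 1200 provisioned units bank at most 1200 x 300 units.
burst_ceiling = 1200 * BURST_WINDOW_SECONDS
# Two idle seconds bank 200 units, which covers the first spike but not the second.
result = simulate_burst(100, [0, 0, 300, 300])
```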

Page 36: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Burst capacity may not be sufficient

[Chart: capacity units (0 to 1600) over time, with Provisioned, Consumed, and Attempted series. Annotation: Throttled requests]

Burst: 300 seconds (1200 × 300 = 360k CU)

Don't completely depend on burst capacity… provision sufficient throughput

Page 37: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Hot shards

Host 1 Host 2 Host 3

Hot key!

AZ A AZ B AZ C

Page 38: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Throttling

Occurs if sustained throughput goes beyond provisioned throughput per partition

• Possible causes
  • Non-uniform workloads
  • Hot keys/hot partitions
  • Very large items
  • Mixing hot data with cold data
    • Remedy: Use TTL or a table per time period
• Disable retries, write your own retry code, and log all throttled or returned keys
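A hand-written retry loop that logs every throttled key might look like the sketch below. ThrottledError and the write callable are stand-ins invented for this example; with the AWS SDK for Python the equivalent signal would be a ClientError whose error code is ProvisionedThroughputExceededException.

```python
import logging
import random
import time

logger = logging.getLogger("throttle")


class ThrottledError(Exception):
    """Stand-in for the SDK's throttling error (hypothetical name)."""


def write_with_backoff(write, key, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call write(key), retrying on throttling and logging each throttled key."""
    for attempt in range(max_attempts):
        try:
            return write(key)
        except ThrottledError:
            # Logging the key makes hot keys visible in the application logs.
            logger.warning("throttled key=%s attempt=%d", key, attempt + 1)
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter before the next attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Counting the logged keys per time window is one simple way to spot the hot keys and hot partitions listed above.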

Page 39: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Design Patterns

Page 40: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DESIGN PATTERNS: DynamoDB Accelerator and GSIs

Dating Website

- Online dating website running on AWS
- Users have people they like, and conversely people who like them
- Hourly batch job matches users
- Data stored in Likes and Matches tables

Page 41: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Schema Design Part 1

Requirements:
1. Get all people I like
2. Get all people that like me
3. Expire likes after 90 days

LIKES table
Likes: user_id_self (partition key), user_id_other (sort key), MyTTL (TTL attribute), … Attribute N
GSI_Other: user_id_other (partition key), user_id_self (sort key)
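The first two requirements map onto two queries: one against the table and one against the GSI. The sketch below only builds the request parameters, using the table, index, and attribute names from the slide; treating user ids as strings is an assumption, and the helper names are hypothetical.

```python
def people_i_like_query(user_id):
    """Query parameters for "get all people I like" (table partition key)."""
    return {
        "TableName": "Likes",
        "KeyConditionExpression": "user_id_self = :u",
        "ExpressionAttributeValues": {":u": {"S": user_id}},
    }


def people_who_like_me_query(user_id):
    """Query parameters for "get all people that like me" (GSI partition key)."""
    return {
        "TableName": "Likes",
        "IndexName": "GSI_Other",
        "KeyConditionExpression": "user_id_other = :u",
        "ExpressionAttributeValues": {":u": {"S": user_id}},
    }


outgoing = people_i_like_query("alice")
incoming = people_who_like_me_query("alice")
```

With the AWS SDK for Python, either dictionary could be passed as `boto3.client("dynamodb").query(**outgoing)`. The third requirement needs no query at all: setting MyTTL on each like lets TTL expire it.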

Page 42: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Schema Design Part 2

Requirements:
1. Get my matches

MATCHES table
Matches: event_id (partition key), timestamp (sort key), UserIdLeft (GSI left), UserIdRight (GSI right), Attribute N
GSI Left: UserIdLeft (partition key), event_id (table key), timestamp (table key), UserIdRight
GSI Right: UserIdRight (partition key), event_id (table key), timestamp (table key), UserIdLeft

Page 43: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Matchmaking

Requirements:
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in matches table

[Diagram: matchmaking servers (Auto Scaling group, security group, public subnet, Availability Zone) and the LIKES table with Partition 1 … Partition N]

Page 44: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Matchmaking

Requirements:
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in matches table

[Diagram: matchmaking servers (Auto Scaling group, security group, public subnet, Availability Zone) and the LIKES table with Partition 1 … Partition N, marked THROTTLE!]

Page 45: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Matchmaking

Requirements:
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in matches table

Even Access:
1. Key choice: High key cardinality
2. Uniform access: access is evenly spread over the key-space
3. Time: requests arrive evenly spaced in time

Page 46: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Matchmaking

Requirements:
0. Write like to like table, then query by user id to warm cache, then queue for batch processing
1. Get all new likes every hour
2. For each like, get the other user's likes
3. Store matches in matches table

[Diagram: matchmaking servers (Auto Scaling group, security group, public subnet, Availability Zone), DAX in its own security group, and the LIKES table with Partition 1 … Partition N]

Page 47: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Takeaways:

Keep DAX warm by querying after writing

Use GSIs for many to many relationships

Dating Website

DESIGN PATTERNS:

DynamoDB Accelerator and GSIs

Page 48: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DESIGN PATTERNS: TTL, DynamoDB Streams, and DAX

Serverless IoT

- Single DynamoDB table for storing sensor data
- Tiered storage to archive old events to S3
- Data stored in data table

Page 49: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Schema Design

DATA table
Data: DeviceId (partition key), EventEpoch (sort key), MyTTL (TTL attribute), … Attribute N
Requirements:
1. Get all events for a device
2. Archive old events after 90 days

USERDEVICES table
UserDevices: UserId (partition key), DeviceId (sort key), Attribute 1 … Attribute N
Requirements:
1. Get all devices for a user

References

Page 50: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Serverless IoT

- Single DynamoDB table for storing sensor data
- Tiered storage to archive old events to S3
- Data stored in data table

[Diagram: Amazon DynamoDB tables DATA (example item: DeviceId: 1, EventEpoch: 1492641900, MyTTL: 1492736400) and USERDEVICES. On expiry, items go through Amazon DynamoDB Streams and AWS Lambda to an Amazon S3 bucket]

Page 51: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Serverless IoT

DATA table: Partition A, Partition B, Partition C, Partition D

Throttling: a noisy sensor produces data at a rate several times greater than others

Page 52: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Data table keyspace: Hash.MIN = 0, Hash.MAX = FF, with boundaries at 00, 3F, 7F, BF, FF

Partition A: 25.0 % Keyspace, 25.0 % Provisioned Capacity
Partition B: 25.0 % Keyspace, 25.0 % Provisioned Capacity
Partition C: 25.0 % Keyspace, 25.0 % Provisioned Capacity
Partition D: 25.0 % Keyspace, 25.0 % Provisioned Capacity

Page 53: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Data table keyspace: Hash.MIN = 0, Hash.MAX = FF, with boundaries at 00, 3F, 7F, BF, FF

Partition A: 25.0 % Keyspace, 25.0 % Provisioned Capacity
Partition B: 25.0 % Keyspace, 25.0 % Provisioned Capacity
Partition C: 25.0 % Keyspace, 25.0 % Provisioned Capacity
Partition D: 25.0 % Keyspace, 25.0 % Provisioned Capacity

Even Access:
1. Key choice: High key cardinality
2. Uniform access: access is evenly spread over the key-space
3. Time: requests arrive evenly spaced in time

Page 54: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Serverless IoT

Requirements:
0. Capable of dynamically sharding to overcome throttling
1. Single DynamoDB table for storing sensor data
2. Tiered storage to archive old events to S3
3. Data stored in data table

Page 55: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Schema Design

Naïve Sharding: a sharding scheme where the number of shards is not predefined, and will grow over time but never contract. Contrast with a fixed shard count.

SHARD table
Shard: DeviceId (partition key), ShardCount (Range: 0..1,000)
Requirements:
1. Get shard count for given device
2. Always grow the count of shards

DATA table
Data: DeviceId (partition key), EventEpoch (sort key), MyTTL (TTL attribute), … Attribute N
Requirements:
1. Get all events for a device
2. Archive old events after 90 days

Page 56: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Serverless IoT: Naïve Sharding

Request path:
1. Read ShardCount from Shard table
2. Write to a random shard
3. If throttled, review shard count

SHARD item (step 1): DeviceId: 1, ShardCount: 10
DATA item (step 2): DeviceId_ShardId: 1_3, EventEpoch: 1492641900, MyTTL: 1492736400 (Expiry)
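The request path can be sketched in a few lines. The code below assumes shard ids run from 0 to ShardCount - 1 and that "reviewing" the shard count means adding one shard up to the 1,000 ceiling; both are assumptions, as are the helper names.

```python
import random


def sharded_key(device_id, shard_count, rng=random):
    """Step 2: build the DeviceId_ShardId partition key for a random shard."""
    shard_id = rng.randrange(shard_count)
    return f"{device_id}_{shard_id}"


def all_shard_keys(device_id, shard_count):
    """Read side: every partition key to query to get all events for a device."""
    return [f"{device_id}_{shard_id}" for shard_id in range(shard_count)]


def grow_shard_count(current, maximum=1000):
    """Step 3: after throttling, grow the count; it never contracts."""
    return min(maximum, current + 1)


# Device 1 with ShardCount 10, as on the slide.
write_key = sharded_key(1, 10)
read_keys = all_shard_keys(1, 10)
```

The cost of spreading writes this way shows up on the read path: getting all events for a device means querying every key in `read_keys` and merging the results by EventEpoch.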

Page 57: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Serverless IoT

DATA table: Partition A, Partition B, Partition C, Partition D

1. SHARD item: DeviceId: 1, ShardCount: 10
2. Pick a random shard to write data to: DeviceId_ShardId: 1_Rand(0,10), EventEpoch: 1492641900, MyTTL: 1492736400

Page 58: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Serverless IoT

- Single DynamoDB table for storing sensor data
- Tiered storage to archive old events to S3
- Data stored in data table
- Capable of dynamically sharding to overcome throttling

[Diagram: tables DATA (example item: DeviceId: 1, EventEpoch: 1492641900, MyTTL: 1492736400), USERDEVICES, and SHARD (DeviceId: 1, ShardCount: 10), with DAX. On expiry, items go through Amazon DynamoDB Streams and AWS Lambda; Amazon Kinesis Firehose and an Amazon S3 bucket are also shown]

Page 59: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

DESIGN PATTERNS:

TTL, DynamoDB Streams, and DAX

Takeaways:

Use naïve write sharding to dynamically expand shards

Use DAX for hot reads, especially from Lambda

Use TTL to create tiered storage

Serverless IoT

Page 60: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Getting started?

- DynamoDB Local
- Document SDKs
- DynamoDB Developer Resources: https://aws.amazon.com/dynamodb/developer-resources/

Page 61: Deep Dive on Amazon DynamoDB - AWS Online Tech Talks

Thank you!