AWS Summit Berlin 2013 - Understanding Database Options on AWS

Jan Borch - AWS Solutions Architect

Understanding Database Options on AWS

Jan Borch

#awssummit

Berlin


We want to make it easy for you to start

1. Zero to Application in ____ Minutes

2. Zero to Millions of users in ____ Days

3. Zero to “Profits!” ASAP

AWS can help

Totally up to you!


Spot the critical component!


https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

http://nosql-database.org/

Spectrum of options on AWS

(Figure: database options arranged from SQL to NoSQL, from low cost to high cost, and from do-it-yourself to fully managed; some combinations are marked "Not available on AWS".)

SQL, do-it-yourself: MySQL, Oracle, SQL Server, PostgreSQL, your favorite RDBMS

SQL, fully managed: Amazon RDS (MySQL, Oracle, SQL Server)

NoSQL, do-it-yourself: MongoDB, Cassandra, Redis, Memcached…

NoSQL, fully managed: Amazon DynamoDB, Amazon ElastiCache

Thinking about the questions

Should I use SQL or NoSQL?

Should I use MySQL on EC2 or RDS?

Should I use MongoDB, Cassandra, or DynamoDB?

Should I use Redis, Memcached, or ElastiCache?

Actually, thinking about the right questions

What are my scale and latency needs?

What are my transactional and consistency needs?

What are my read/write, storage and IOPS needs?

What are my time-to-market and server control needs?

Focus on your application


“I need root access to the instance to do some custom configuration”

“My object persistence framework does not support Amazon DynamoDB”

“My team has strong PostgreSQL expertise”

Option 1: Run your databases on EC2

Amazon Elastic Compute Cloud (Amazon EC2)

m1.small: 1 virtual core, 1.7 GiB memory, moderate I/O performance

cc2.8xlarge: 32 virtual cores (2 x Intel Xeon), 60.5 GiB memory, 10 Gbit I/O performance

cr1.8xlarge: 32 virtual cores (2 x Intel Xeon), 240 GiB memory, 10 Gbit I/O performance, 240 GB SSD instance store

hi1.4xlarge: 16 virtual cores, 60.5 GiB memory, 10 Gbit I/O performance, 2 x 1 TB SSD instance store

hs1.8xlarge: 16 virtual cores, 117 GiB memory, 10 Gbit I/O performance, 24 x 2 TB instance store

Choose an Amazon Machine Image


Leverage AWS services

EBS storage volumes with EBS snapshots

S3 for backups (for example, Oracle RMAN)

Automation with the AWS API or AWS CloudFormation

Option 2: Let AWS manage my databases

“I want to reduce the time developers spend on database administration tasks”

“I need a database that is simple to deploy and easy to scale”

Why Managed Databases?

(Figure: pie chart of database administration effort, split across backup & recovery and data load & unload; performance tuning; scripting & coding; security planning; install, upgrade, patch and migrate; and documentation, licensing & training. The slice values shown are 40%, 25%, 20%, 5%, 5% and 5%.)

Differentiated effort increases the uniqueness of an application.

We believe in choice: one size does not fit all

Use cases: traditional apps with relational DB needs; high-performance, high-scale data warehouses; new web apps needing massive scalability

Services: Amazon RDS, Amazon ElastiCache, Amazon DynamoDB, Amazon Redshift

Option 2.1: Managed SQL database

“I have a complex data model and need integrity constraints”

“My business apps only understand SQL”

“I need complex transactions, joins, updates”

Amazon Relational Database Service (Amazon RDS)

RDS is a fully managed relational database service that is simple to deploy, easy to scale, reliable and cost-effective.

Choice of database options


Rapid deployment via Web Console


Backups and Recovery


Push-Button Scaling

Scale …

• vertically, up or down

• storage, vertically

High Availability: Multi-AZ Deployments

Price reduction: Multi-AZ price reductions ranging from 15% to 32%

A few clicks or one API call

Horizontal Scaling with Read Replicas

New features:

• Endpoint renaming

• Read Replica to master promotion


A few clicks or one API call
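With Read Replicas, the application keeps sending writes to the master endpoint and spreads reads across the replica endpoints. A minimal sketch of that routing decision, assuming hypothetical endpoint names and a naive "starts with SELECT" check; it is not an RDS API. Replication to Read Replicas is asynchronous, so a read routed to a replica may briefly miss a very recent write.

```python
import itertools


class ReplicaRouter:
    """Route SQL statements to the master or to Read Replicas (illustrative sketch only)."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over replica endpoints; None means "no replicas, use the master".
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql):
        # Naive classification: plain SELECTs are reads, everything else is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.master


# Hypothetical endpoint names, for illustration only.
router = ReplicaRouter("mydb-master", ["mydb-replica-1", "mydb-replica-2"])
print(router.endpoint_for("SELECT * FROM orders"))           # mydb-replica-1
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # mydb-master
print(router.endpoint_for("select 1"))                       # mydb-replica-2
```

Keeping the routing in one small component makes it easy to point reads back at the master after a replica is promoted.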


High Performance RDS

Up to 3 TB of storage and 30,000 IOPS

Security

Oracle Native Network Encryption and Transparent Data Encryption on Oracle EE

SSL support for SQL Server and MySQL

Availability and performance options

(Figure: the Amazon RDS configuration options Push-Button Scaling, Multi-AZ, Read Replicas and Provisioned IOPS, matched against three goals: improve availability, increase throughput and reduce latency. A diagram shows a region in which a Multi-AZ deployment spans two Availability Zones, alongside Read Replicas, Push-Button Scaling and Provisioned IOPS.)

Use case

Who is succeeding with RDS? Thousands of developers use RDS every single day

Gaming, web apps, mobile/social media

Amazon ElastiCache

Amazon ElastiCache is a fully managed Memcached-compatible caching service
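A common way to use a Memcached-compatible cache in front of a database is the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache. A minimal sketch, assuming a plain Python dict as a stand-in for the ElastiCache cluster and a hypothetical `load_product` database lookup; a real deployment would use a Memcached client pointed at the cluster endpoint instead of the dict.

```python
import time


class CacheAside:
    """Cache-aside reads with a time-to-live; the dict stands in for a Memcached cluster."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: no database round trip
        value = self._load_from_db(key)  # cache miss: query the database
        self._cache[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call after writing to the database so the next read reloads fresh data.
        self._cache.pop(key, None)


db_calls = []


def load_product(product_id):
    # Hypothetical database lookup; records each call so cache hits are visible.
    db_calls.append(product_id)
    return {"ProductID": product_id, "Name": f"product-{product_id}"}


cache = CacheAside(load_product, ttl_seconds=30)
cache.get(1234)
cache.get(1234)
print(len(db_calls))  # 1: the second read was served from the cache
```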


Option 2.2: Managed NoSQL database

“I have very low latency requirements”

“I do not require complex queries or transactions”

“I need to scale (now or in the future)”

“I want to eliminate administrative costs”

Amazon DynamoDB

Amazon DynamoDB is a fully managed NoSQL database service


Single-digit millisecond latency.

Backed by solid-state drives.

Consistent, predictable performance

No table size limits. Unlimited storage

No downtime.

Seamless scalability


Consistent, disk-only writes.

Replication across data centers and Availability Zones.

Durable

Without the operational burden: managed by DynamoDB

Three clicks or one API call

Table name + primary key + level of throughput

Optional: local secondary indexes
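The three inputs map onto the parameters of the CreateTable API call. A sketch of such a request written as a plain Python dict, using the parameter names of the current DynamoDB API; the table name, attribute names, index name and capacity numbers are hypothetical examples based on the order data used later in this deck.

```python
# CreateTable request parameters (hypothetical names and values).
create_table_request = {
    "TableName": "Orders",
    # Every attribute used in a key schema must be declared with its type.
    "AttributeDefinitions": [
        {"AttributeName": "id", "AttributeType": "N"},     # number
        {"AttributeName": "date", "AttributeType": "S"},   # string
        {"AttributeName": "total", "AttributeType": "N"},  # number
    ],
    # Primary key: hash key "id" plus range key "date" (a composite primary key).
    "KeySchema": [
        {"AttributeName": "id", "KeyType": "HASH"},
        {"AttributeName": "date", "KeyType": "RANGE"},
    ],
    # Optional: a local secondary index keeps the hash key but sorts by another attribute.
    "LocalSecondaryIndexes": [
        {
            "IndexName": "id-total-index",
            "KeySchema": [
                {"AttributeName": "id", "KeyType": "HASH"},
                {"AttributeName": "total", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    # Level of throughput: provisioned read and write capacity units.
    "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
}

# With an AWS SDK such as boto3 this dict could be sent as:
# boto3.client("dynamodb").create_table(**create_table_request)
```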


Reserve IOPS for reads and writes.

Scale up or down at any time.

Provisioned throughput.

Pay per capacity unit

READ

Capacity units = size of item (KB) x reads per second

Consistent reads: $0.0065 per hour for 50 read units

Eventually consistent reads: $0.0065 per hour for 100 read units

WRITE

Capacity units = size of item (KB) x writes per second

$0.0065 per hour for 10 write units
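The capacity and price formulas above can be turned into a quick estimate. A minimal sketch, assuming item sizes are rounded up to a whole KB (the 1 KB capacity unit the formula implies) and using the hourly prices quoted on the slide; these are 2013 list prices, not current ones, and the function names are hypothetical.

```python
import math

# Prices quoted on the slide, in USD per hour.
PRICE_PER_BLOCK = 0.0065
READ_UNITS_PER_BLOCK_CONSISTENT = 50
READ_UNITS_PER_BLOCK_EVENTUAL = 100
WRITE_UNITS_PER_BLOCK = 10


def capacity_units(item_size_kb, requests_per_second):
    """Capacity units = size of item (KB, rounded up) x requests per second."""
    return math.ceil(item_size_kb) * requests_per_second


def hourly_read_cost(item_size_kb, reads_per_second, consistent=True):
    units = capacity_units(item_size_kb, reads_per_second)
    per_block = READ_UNITS_PER_BLOCK_CONSISTENT if consistent else READ_UNITS_PER_BLOCK_EVENTUAL
    return units / per_block * PRICE_PER_BLOCK


def hourly_write_cost(item_size_kb, writes_per_second):
    units = capacity_units(item_size_kb, writes_per_second)
    return units / WRITE_UNITS_PER_BLOCK * PRICE_PER_BLOCK


# 3.5 KB items read 100 times per second -> 4 KB x 100 = 400 read units.
print(capacity_units(3.5, 100))                                # 400
print(round(hourly_read_cost(3.5, 100), 4))                    # 0.052
print(round(hourly_read_cost(3.5, 100, consistent=False), 4))  # 0.026
print(round(hourly_write_cost(1, 50), 4))                      # 0.0325
```

The example shows why eventually consistent reads halve the read bill for the same traffic.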


Reserved capacity

Up to 53% savings with a 1-year reservation

Up to 76% savings with a 3-year reservation

Transactions

Item-level transactions only

Puts, updates and deletes are ACID

Atomic increment and decrement

Conditional writes

Optimistic concurrency control
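Conditional writes are what make optimistic concurrency control work: a writer reads an item together with its version number and writes back only if the version is unchanged. A toy in-memory sketch of that pattern and of an atomic counter; `ItemStore`, `ConditionalCheckFailed` and the `version` attribute are hypothetical names, not the DynamoDB API. In DynamoDB the same idea is expressed as a condition attached to a PutItem or UpdateItem request.

```python
import copy
import threading


class ConditionalCheckFailed(Exception):
    """Raised when a conditional write's expectation does not hold."""


class ItemStore:
    """Toy in-memory store showing item-level conditional writes; not the DynamoDB API."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # makes each single-item operation atomic

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._items.get(key))

    def put(self, key, item, expected_version=None):
        """Write only if the stored 'version' equals expected_version (None: no version yet)."""
        with self._lock:
            current = self._items.get(key)
            current_version = current.get("version") if current else None
            if current_version != expected_version:
                raise ConditionalCheckFailed(
                    f"expected version {expected_version}, found {current_version}"
                )
            new_item = dict(item, version=(current_version or 0) + 1)
            self._items[key] = new_item
            return dict(new_item)

    def increment(self, key, attribute, delta=1):
        """Atomic counter: read-modify-write of one attribute under the lock."""
        with self._lock:
            item = self._items.setdefault(key, {})
            item[attribute] = item.get(attribute, 0) + delta
            return item[attribute]


store = ItemStore()
store.put("order-101", {"total": 35.00})   # creates the item at version 1
first = store.get("order-101")
second = store.get("order-101")            # a concurrent reader also sees version 1
store.put("order-101", dict(first, total=40.00), expected_version=first["version"])
try:
    store.put("order-101", dict(second, total=50.00), expected_version=second["version"])
except ConditionalCheckFailed as err:
    print("lost update prevented:", err)   # expected version 1, found 2
```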


Read Consistency

Strong or eventually consistent reads

Same latency expectations for strongly consistent reads

Mix and match at ‘read time’
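The difference between the two read modes can be illustrated with a toy model in which one copy receives writes first and other copies catch up later. This is a hypothetical teaching model, not how DynamoDB is implemented; the `consistent_read` flag only mirrors the idea of choosing the mode per read (the real GetItem request has a `ConsistentRead` parameter).

```python
class ToyReplicatedTable:
    """Toy model of strong vs. eventually consistent reads; not DynamoDB's design."""

    def __init__(self, replica_count=3):
        self._replicas = [dict() for _ in range(replica_count)]
        self._pending = []  # writes applied to replica 0 but not yet copied elsewhere

    def put(self, key, value):
        self._replicas[0][key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Replication catching up; a real system does this continuously in the background.
        for key, value in self._pending:
            for replica in self._replicas[1:]:
                replica[key] = value
        self._pending.clear()

    def get(self, key, consistent_read=False, replica=1):
        # Strong read: reflects every acknowledged write.
        # Eventual read: may hit a copy that has not caught up yet.
        source = self._replicas[0] if consistent_read else self._replicas[replica]
        return source.get(key)


table = ToyReplicatedTable()
table.put("order-100", 25.00)
print(table.get("order-100", consistent_read=True))  # 25.0
print(table.get("order-100"))                        # None: replica not caught up yet
table.replicate()
print(table.get("order-100"))                        # 25.0
```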


Data Modeling

Tables do not require a formal schema

Items are an arbitrarily sized hash.

Data modeling: table, items and attributes

id = 100  date = 2012-05-16-09-00-10  total = 25.00

id = 101  date = 2012-05-15-15-00-11  total = 35.00

id = 101  date = 2012-05-16-12-00-10  total = 100.00

id = 102  date = 2012-03-20-18-23-10  total = 20.00

id = 102  date = 2012-03-20-18-23-10  total = 120.00

The whole set of rows is the table; each row is an item; each name = value pair (id, date, total) is an attribute.

Indexing

Items are indexed by primary and secondary keys

Primary keys can be composite

Secondary keys are local to the table

Example attributes: ID, Date, Total

ID: hash key

ID + Date: composite primary key (hash key + range key)

Total: secondary range key

Programming DynamoDB. Small but perfectly formed API.

Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables

“Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem

Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem

Query specific items OR scan the full table: Query, Scan

Query patterns

Retrieve all items by hash key.

Range key conditions:

==, <, >, >=, <=, begins with, between.

Counts. Top and bottom n values.

Paged responses.
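The query patterns above can be illustrated with a small in-memory sketch over some of the sample orders from the data-modeling slides: all items for one hash key (`id`), ordered by range key (`date`), with an optional range-key condition, descending order plus a limit for top-n, and paging via the last evaluated range key. The `query` function and its parameters are hypothetical; the real Query API expresses the same ideas through key conditions, `ScanIndexForward`, `Limit`, `ExclusiveStartKey` and `LastEvaluatedKey`.

```python
ORDERS = [
    {"id": 100, "date": "2012-05-16-09-00-10", "total": 25.00},
    {"id": 101, "date": "2012-05-15-15-00-11", "total": 35.00},
    {"id": 101, "date": "2012-05-16-12-00-10", "total": 100.00},
    {"id": 102, "date": "2012-03-20-18-23-10", "total": 20.00},
]


def query(items, hash_value, range_condition=None, scan_forward=True,
          limit=None, exclusive_start=None):
    """Return (page, last_evaluated) for items whose hash key 'id' equals hash_value."""
    # 1. Retrieve all items by hash key, optionally filtered by a range-key condition.
    matches = [
        it for it in items
        if it["id"] == hash_value
        and (range_condition is None or range_condition(it["date"]))
    ]
    # 2. Order by range key; descending order plus a limit gives "top n".
    matches.sort(key=lambda it: it["date"], reverse=not scan_forward)
    # 3. Paging: resume strictly after the last range key of the previous page.
    if exclusive_start is not None:
        if scan_forward:
            matches = [it for it in matches if it["date"] > exclusive_start]
        else:
            matches = [it for it in matches if it["date"] < exclusive_start]
    page = matches if limit is None else matches[:limit]
    has_more = limit is not None and len(matches) > limit
    return page, (page[-1]["date"] if has_more else None)


all_101, _ = query(ORDERS, 101)
print(len(all_101))                                    # 2 (count)
latest, _ = query(ORDERS, 101, scan_forward=False, limit=1)
print(latest[0]["total"])                              # 100.0 (top 1 by date)
on_day, _ = query(ORDERS, 101, range_condition=lambda d: d.startswith("2012-05-16"))
print([o["total"] for o in on_day])                    # [100.0] (begins with)
page1, cursor = query(ORDERS, 101, limit=1)
page2, cursor2 = query(ORDERS, 101, limit=1, exclusive_start=cursor)
print(page1[0]["total"], page2[0]["total"], cursor2)   # 35.0 100.0 None
```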

500,000 writes per second during the Super Bowl

Amazon DynamoDB: who is succeeding with it?


Option 2.3: Managed data warehouse database

“I need to query high volumes of data”

“I primarily run SQL analytic queries”

“I need high performance for my reporting queries”

OLTP <-> OLAP

SELECT ProductID, Name FROM Products WHERE ProductID = 1234;

SELECT ProductID, count(*) FROM Page_Hits WHERE hour IN (12, 13) GROUP BY ProductID;

OLTP <-> OLAP

Transactional Processing (OLTP)

• Transactional context: get order total

• Latency

• Indexed access

• Random IO

• Disk seek times

Analytical Processing (OLAP)

• Global context: daily revenue report

• Throughput

• Full table scans

• Sequential IO

• Disk transfer rates
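The two example queries can be run side by side against an in-memory SQLite database to see the difference in shape: the OLTP query fetches one row through the primary key, while the OLAP query scans and aggregates many rows. The table contents are made up for illustration, and an `ORDER BY` is added only to make the output deterministic. SQLite is a single-node row store, so this shows the two workload shapes, not Redshift's performance characteristics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Page_Hits (ProductID INTEGER, hour INTEGER);
    """
)
# Made-up sample data.
conn.executemany("INSERT INTO Products VALUES (?, ?)", [(1234, "Widget"), (5678, "Gadget")])
conn.executemany(
    "INSERT INTO Page_Hits VALUES (?, ?)",
    [(1234, 12), (1234, 13), (5678, 12), (1234, 9), (5678, 14)],
)

# OLTP: indexed point lookup of a single row (latency-bound).
oltp = conn.execute("SELECT ProductID, Name FROM Products WHERE ProductID = 1234;").fetchall()

# OLAP: scan and aggregate many rows (throughput-bound).
olap = conn.execute(
    "SELECT ProductID, count(*) FROM Page_Hits "
    "WHERE hour IN (12, 13) GROUP BY ProductID ORDER BY ProductID;"
).fetchall()

print(oltp)  # [(1234, 'Widget')]
print(olap)  # [(1234, 2), (5678, 1)]
```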

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service

Amazon Redshift


Fast and powerful

Parallelize and distribute everything: MPP, load, query, resize, backup, restore

Dramatically reduce I/O: direct-attached storage, large data block sizes, column data store, data compression, zone maps
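Zone maps reduce I/O by recording the minimum and maximum value of each storage block, so a range filter can skip blocks that cannot contain matches; storing each column separately means a query touches only the columns it needs. A toy Python sketch of the idea on one sorted column; the block layout and names are hypothetical and far simpler than Redshift's on-disk format. Skipping works best when the data is sorted on the filtered column, as it is here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    values: list
    min_value: int
    max_value: int


def build_column(values, block_size):
    """Split one column into fixed-size blocks and record each block's min/max (its zone map)."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_in_range(blocks, low, high):
    """Count values in [low, high]; return (count, number of blocks actually read)."""
    count = 0
    blocks_read = 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # zone map rules this block out, so it is never read
        blocks_read += 1
        count += sum(1 for v in block.values if low <= v <= high)
    return count, blocks_read


# A sorted "hour" column stored on its own, as a column store would: 10 rows per hour.
hours = sorted(h % 24 for h in range(240))
column = build_column(hours, block_size=20)
print(count_in_range(column, 12, 13))  # (20, 1): only 1 of 12 blocks is read
```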


Fully Managed

Protect operations: Redshift data is always encrypted; continuously backed up to S3; automatic node recovery; transparent handling of disk failures

Simplify provisioning: create a cluster in minutes; automatic OS and software patching; scale up to 1.6 PB with a few clicks and no downtime

Amazon Redshift architecture

(Figure: SQL clients and BI tools connect over JDBC/ODBC to a leader node, which coordinates several compute nodes, each with 16 cores, 128 GB RAM and 16 TB disk, over a 10 GigE (HPC) interconnect; ingestion, backup and restore go through Amazon S3.)

Focus on your application


Best of both worlds: Use both SQL and NoSQL models in one app


More on Amazon Redshift?

03:15pm to 03:45pm

Introducing the Amazon Redshift data warehouse

Room Zero

Speaker: Steffen Krause, Amazon

Thanks!
