AWS Webcast - Understanding Database Options
DESCRIPTION
Power your apps with a secure, scalable, and durable back end on Amazon Web Services. Whether you are looking to minimize your operational overhead or to maintain tight control, AWS offers a spectrum of database options so you can choose the right architecture for your needs. Learn about your options and how to choose the right architecture for your apps.
TRANSCRIPT
Database Options on AWS
Miles Ward, Solution Architect
Database Services on AWS
Relational databases on EC2: Oracle, SQL Server, MySQL, IBM DB2, Informix, …
Amazon RDS Service: MySQL, SQL Server, Oracle
NoSQL databases on EC2: MongoDB, Riak, Couchbase, Cassandra, …
Amazon Managed NoSQL Services: Amazon SimpleDB, Amazon DynamoDB
In-Memory Database: Amazon ElastiCache, SAP HANA, Memcached on EC2
Data Warehouses on EC2: Vertica, Teradata, Hive on EC2, HBase on EC2, …
Amazon Managed Data Warehouses: Amazon Redshift, Hive on EMR, HBase on EMR
SQL Deployment Options on EC2
Running Oracle on AWS
http://aws.amazon.com/oracle
Dev, Test and Production Environments: Deploy Oracle software in minutes using AWS instance types.
Use Oracle Database 11g and Oracle Enterprise Linux to build enterprise-grade solutions in the cloud.
Free up CapEx budget: No need to pre-allocate hardware budgets; pay as you go. Amazon and Oracle provide businesses with a scalable, reliable, and cost-effective business application platform.
Better end-user experience: Use Amazon Machine Images (AMIs) with pre-configured Oracle solutions.
Full Oracle license portability: Customers can use their existing licenses or buy new licenses.
Certification: AWS is the first supported cloud platform; Oracle has certified and fully supports Oracle products running on AWS.
Complete Oracle Stack: Oracle VM (OVM), Oracle Linux, Enterprise Manager Grid Control (EMGC), Oracle Database, Fusion Middleware (FMW), Enterprise Applications
Deploying Oracle on Amazon EC2
Step 1: Create an account at aws.amazon.com
Step 2: Log in to the AWS Web Console
Step 3: Right-click an Oracle AMI and click “Launch Instance”
Step 4: Right-click your EC2 instance and connect to your server over SSH
Oracle has delivered a set of Amazon Machine Images (AMIs) to make it easy for customers to get started deploying Oracle solutions on Amazon EC2. Have an Oracle database up and running in a few minutes.
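The console steps above can also be scripted. The sketch below only assembles the keyword arguments for boto3's EC2 `run_instances` call; the AMI ID, instance type, and key pair name are hypothetical placeholders, and the network call is kept in a separate function so the builder runs without boto3 or AWS credentials.

```python
from typing import Any, Dict


def build_launch_request(ami_id: str, instance_type: str, key_name: str) -> Dict[str, Any]:
    """Assemble arguments for EC2 RunInstances: one instance from one AMI."""
    if not ami_id.startswith("ami-"):
        raise ValueError(f"not an AMI ID: {ami_id!r}")
    return {
        "ImageId": ami_id,            # the Oracle AMI picked in Step 3 (placeholder here)
        "InstanceType": instance_type,
        "KeyName": key_name,          # key pair used for the SSH connection in Step 4
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch(request: Dict[str, Any]) -> str:
    """Send the request to EC2; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so build_launch_request works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]
```

For example, `launch(build_launch_request("ami-0123456789abcdef0", "m1.large", "my-oracle-key"))` would start one instance; all three values are hypothetical and must be replaced with a real Oracle AMI ID, a size you choose, and your own key pair.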
Running Microsoft SQL Server on AWS
http://aws.amazon.com/windows/
Dev, Test and Production Environments: Deploy SQL Server software in minutes using Amazon EC2 running Windows Server with SQL Server.
Free up CapEx budget: No need to pre-allocate hardware budgets; pay as you go. Amazon and Microsoft provide businesses with a scalable, reliable, and cost-effective business application platform.
Better end-user experience: Use Amazon Machine Images (AMIs) with a pre-configured SQL Server solution and only pay for what you use.
Pay by the hour: Customers can launch SQL Server Standard editions and pay by the hour.
Full license portability: Customers can use their existing licenses or buy new licenses.
Deploying SQL Server on Amazon EC2
Step 1: Create an account at aws.amazon.com
Step 2: Log in to the AWS Web Console and search for a SQL Server 2008 AMI
Step 3: Right-click any SQL Server AMI and click “Launch Instance”
Step 4: Right-click your EC2 instance and connect to your server over RDP
Amazon has delivered a set of Amazon Machine Images (AMIs) to make it easy for customers to get started deploying SQL Server solutions on Amazon EC2. Have a SQL Server database up and running in five minutes.
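The AMI search in Step 2 can also be done against the EC2 `DescribeImages` response. The sketch below assumes the shape boto3's `describe_images` returns (an `Images` list of dicts with `ImageId`, `Name`, and `CreationDate`) and picks the newest image whose name contains a search term. The sample image names and IDs are hypothetical; with boto3 you could also narrow the list server-side with a filter such as `Filters=[{"Name": "name", "Values": ["*SQL_2008*"]}]`.

```python
from typing import Dict, List, Optional


def newest_matching_image(images: List[Dict[str, str]], term: str) -> Optional[str]:
    """Return the ImageId of the newest image whose Name contains `term` (case-insensitive)."""
    matches = [img for img in images if term.lower() in img.get("Name", "").lower()]
    if not matches:
        return None
    # CreationDate is an ISO 8601 string with a fixed format,
    # so string order matches chronological order.
    newest = max(matches, key=lambda img: img["CreationDate"])
    return newest["ImageId"]


# Hypothetical sample shaped like describe_images(...)["Images"].
sample_images = [
    {"ImageId": "ami-11111111", "Name": "Windows_Server-2008-R2-SQL_2008_R2-Standard-2013.03.14",
     "CreationDate": "2013-03-14T10:00:00.000Z"},
    {"ImageId": "ami-22222222", "Name": "Windows_Server-2008-R2-SQL_2008_R2-Standard-2013.04.10",
     "CreationDate": "2013-04-10T10:00:00.000Z"},
    {"ImageId": "ami-33333333", "Name": "Windows_Server-2012-Base-2013.04.10",
     "CreationDate": "2013-04-10T11:00:00.000Z"},
]
```

Searching `sample_images` for `"sql_2008"` returns `ami-22222222`, the newer of the two SQL Server images; the Windows Server 2012 base image is ignored even though it is the newest overall.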
SQL Deployment Options on EC2 Managed
Amazon RDS
RDS is a fully managed relational database service that is simple to deploy, easy to scale, reliable, and cost-effective.
Ease of Deployment
Choice of Database Engines and Application Compatibility
Automated Backups and Disaster Recovery
Amazon Relational Database Service (RDS)
Microsoft SQL Server…
Monitoring and Auto Host Replacement
Pre-configured Parameters
Monitoring and Metrics
Automatic Software Patching
Replication – Multi AZ, Read Replicas
Isolation and Security
Pay by hour
Rapid deployment via Web Console
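Deployment is also available through the API. This sketch assembles keyword arguments for boto3's RDS `create_db_instance`, turning on two features listed above: Multi-AZ replication and automated backups. The identifier, instance class, storage size, and master username are hypothetical placeholders.

```python
from typing import Any, Dict


def build_rds_request(identifier: str, engine: str, password: str,
                      multi_az: bool = True, backup_days: int = 7) -> Dict[str, Any]:
    """Assemble arguments for RDS CreateDBInstance."""
    if not 0 <= backup_days <= 35:
        # RDS accepts a backup retention period of 0-35 days; 0 disables automated backups.
        raise ValueError("backup retention must be between 0 and 35 days")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,                       # e.g. "mysql"
        "DBInstanceClass": "db.m1.large",       # placeholder instance size
        "AllocatedStorage": 100,                # GB, placeholder
        "MasterUsername": "dbadmin",            # placeholder
        "MasterUserPassword": password,
        "MultiAZ": multi_az,                    # synchronous standby in another AZ
        "BackupRetentionPeriod": backup_days,   # days of point-in-time restore coverage
    }
```

The resulting dict would be passed as `boto3.client("rds").create_db_instance(**request)`; the builder itself makes no network calls.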
Operational DBA tasks & Amazon RDS
Install, upgrade, and migrations
Troubleshooting and corrective actions
Space and account management
Database monitoring and reporting
Performance and tuning
Backup and recovery
Capacity planning
Data load/unload and synchronization
Source: http://www.forrester.com/Events/Content/0,5180,-1110,00.ppt
Focus on Applications – Convert Ops DBA resources to more productive Apps DBA resources
Distribution of time
[Chart: share of DBA time spent on performance/troubleshooting, security planning, backup and recovery, data load/unload, license/documentation training, scripting/coding, and install/upgrade/patch/migration]
Data Durability – Backups and Recovery
• DB Snapshots: User-driven snapshots of the database, kept until explicitly deleted
• Automated Backups: Nightly system snapshots plus transaction log backups, enabling point-in-time restore to any point in the retention period, up to the last 5 minutes
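The restore window can be modeled as a time interval: the earliest restorable time is the start of the retention period, and the latest is about five minutes before now. This is a simplified sketch of that rule; the five-minute lag comes from the slide, and the real service reports its own `LatestRestorableTime` for each DB instance.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Simplified model: "up to the last 5 minutes", as stated on the slide.
RESTORE_LAG = timedelta(minutes=5)


def restore_window(now: datetime, retention_days: int) -> Tuple[datetime, datetime]:
    """Return (earliest, latest) restorable times for a given retention period."""
    earliest = now - timedelta(days=retention_days)  # oldest point still covered by backups
    latest = now - RESTORE_LAG                        # newest point covered by transaction backups
    return earliest, latest


def can_restore_to(target: datetime, now: datetime, retention_days: int) -> bool:
    earliest, latest = restore_window(now, retention_days)
    return earliest <= target <= latest
```

With a 7-day retention period and the clock at 12:00 on May 1, 2013, restoring to 09:30 on April 28 is allowed, while 11:58 on May 1 is still too recent and anything before 12:00 on April 24 is outside the window.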
High Availability – Multi-AZ Deployments
Enterprise-grade fault tolerance solution for production databases.
With a few clicks, Amazon RDS creates and synchronously maintains a standby in a different Availability Zone.
Automatic failover in case of:
– Loss of availability in the primary AZ
– Loss of connectivity to the primary
– Host or storage failure on the primary
– Vertical scaling
– Software patching
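To make "synchronously maintains a standby" concrete, here is a toy model, not how RDS is implemented: a write is acknowledged only after both the primary and the standby have applied it, so when the primary fails, the promoted standby already holds every acknowledged write. The class and method names are hypothetical.

```python
from typing import Dict, Optional


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.up = True


class MultiAZPair:
    """Toy primary/standby pair with synchronous replication and automatic failover."""

    def __init__(self) -> None:
        self.primary = Node("primary-az-a")
        self.standby = Node("standby-az-b")

    def write(self, key: str, value: str) -> bool:
        if not self.primary.up:
            self.failover()
        # Synchronous replication: apply on both copies before acknowledging.
        self.primary.data[key] = value
        self.standby.data[key] = value
        return True

    def read(self, key: str) -> Optional[str]:
        if not self.primary.up:
            self.failover()
        return self.primary.data.get(key)

    def failover(self) -> None:
        promoted = self.standby
        replacement = Node("replacement-standby")
        replacement.data = dict(promoted.data)  # re-seed a new standby from the promoted node
        self.primary, self.standby = promoted, replacement
```

After `write("order:1", "paid")`, marking the primary as down and reading the key triggers a failover, and the read still returns `"paid"` because the standby received the write before it was acknowledged.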
Scalability – Read Replicas
A Read Replica is a copy of a specified DB Instance that can serve read traffic.
Intended use cases: read scaling, business reporting
Not intended as a fault-tolerance substitute for Multi-AZ
Unlike Multi-AZ, Read Replicas use native, asynchronous MySQL replication, so a replica can lag behind its source.
A Read Replica can use a Multi-AZ deployment as its source.
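Because replicas apply changes asynchronously, applications typically send writes to the source instance and spread reads across replicas, falling back to the source when a read must see the latest data. The sketch below shows that routing rule with hypothetical endpoint names; it does not connect to any database.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Choose an endpoint: the source DB instance or one of its read replicas."""

    def __init__(self, source: str, replicas: List[str]) -> None:
        self.source = source
        self.replicas = replicas
        self._cycle = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, is_write: bool, needs_fresh_data: bool = False) -> str:
        # Writes, and reads that cannot tolerate replica lag, go to the source.
        if is_write or needs_fresh_data or self._cycle is None:
            return self.source
        # Other reads (e.g. business reporting) rotate round-robin over the replicas.
        return next(self._cycle)
```

A reporting query would call `endpoint_for(is_write=False)` and land on a replica, while a read immediately following a user's own update would pass `needs_fresh_data=True` so that replica lag cannot hide the change.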
High Performance Relational Databases
Amazon RDS Configuration
Improve Availability: Multi-AZ
Increase Throughput: Read Replicas
Reduce Latency: ElastiCache
Push-Button Scaling
[Diagram: a Multi-AZ primary and standby in separate Availability Zones within a Region, with Read Replicas, ElastiCache, and push-button scaling]
Oracle on RDS Licensing
• Oracle 11gR2 Database: EE, SE, SE1 editions
• Runs on Oracle VM with hard partitioning
• Several licensing options available:
– Use existing licenses from Oracle
– Purchase new Oracle Database licenses directly from Oracle or an Oracle partner
– License Included from Amazon
• Two pricing models for Amazon RDS:
– On-demand, hourly pricing
– Amazon RDS Reserved Instances
NoSQL Deployment Options on EC2
Running NoSQL Databases on AWS
You can run NoSQL data stores in the cloud on Amazon EC2 and Amazon EBS.
Free up CapEx budget: Running your own NoSQL databases on Amazon EC2 and Amazon EBS gives you full control over your database without the burden of provisioning and installing hardware.
We recommend running non-relational databases on Amazon EC2 for customers who:
– Want to exert complete administrative control over their NoSQL server
– Have in-house expertise in managing and scaling their own distributed database clusters
Implementation Best Practices
MongoDB: http://d36cz9buwru1tt.cloudfront.net/AWS_NoSQL_MongoDB.pdf
Riak: http://media.amazonwebservices.com/AWS_NoSQL_Riak.pdf
Couchbase: http://media.amazonwebservices.com/AWS_NoSQL_Couchbase.pdf
More coming soon!
NoSQL Storage Options – Testing random 4K reads
• EBS: One volume: ~200 MongoOPS with some variability, <1 MB/s. Loaded instance: ~1,000 MongoOPS with some variability, <10 MB/s
• EBS with PIOPS: One volume: 4,000 MongoOPS with <1% variability, 6 MB/s. Loaded instance: 40,000 MongoOPS with <1% variability, 120 MB/s
• SSD (hi1.4xlarge ephemeral): ~64,000 MongoOPS with low variability, ~245 MB/s
Amazon Managed NoSQL Services
Amazon SimpleDB
Amazon SimpleDB is a managed NoSQL database service designed for smaller datasets.
Zero Administration
Manage & Query Structured Data
10 GB Storage per Domain
AWS Identity Integration
Conditional Puts
Consistent or Eventually Consistent Read Requests
Amazon SimpleDB Use Cases
Need consistent reads.
Need to scale automatically up or down in response to demand.
Require the highest availability and can’t tolerate downtime for data backup or software maintenance.
No need for complex transactions or joins.
Index store for data stored in S3.
State management when using Spot Instances with EMR processes.
Can’t afford a significant administrative burden managing structured data.
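Conditional puts let a client update an item only if an attribute still has the value it last read, which gives optimistic concurrency without locks. The in-memory class below models that check; it is not the SimpleDB API (whose `PutAttributes` request expresses the condition through `Expected` parameters), and the version-attribute scheme is an assumption for illustration.

```python
from typing import Dict, Optional


class ConditionalCheckFailed(Exception):
    """Raised when the expected attribute value no longer matches."""


class Domain:
    """Hypothetical in-memory stand-in for a SimpleDB domain: item name -> attributes."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, str]] = {}

    def get(self, item: str) -> Dict[str, str]:
        return dict(self.items.get(item, {}))

    def put(self, item: str, attrs: Dict[str, str],
            expected_name: Optional[str] = None,
            expected_value: Optional[str] = None) -> None:
        current = self.items.get(item, {})
        # Reject the write if the attribute no longer holds the value the caller last read.
        if expected_name is not None and current.get(expected_name) != expected_value:
            raise ConditionalCheckFailed(f"{expected_name!r} no longer equals {expected_value!r}")
        self.items.setdefault(item, {}).update(attrs)
```

Two clients that both read `version = "1"` can race to write `version = "2"`; the first succeeds, the second raises `ConditionalCheckFailed`, re-reads the item, and retries against the new version.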
Amazon DynamoDB
DynamoDB is a fully managed NoSQL database service that provides extremely fast and predictable performance with seamless scalability.
Zero Administration
Low-Latency SSDs
Reserved Capacity
Unlimited Potential Storage and Throughput
Reducing Risks
• Consistency
– DynamoDB writes are always consistent
– Reads are consistent or eventually consistent (default)
• Durability
– All writes occur to disk, not memory
– A write is only acknowledged (committed) once it exists in at least two physical data centers
• Availability
– Regional service that spans multiple Availability Zones
– All data is continuously replicated to multiple AZs
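The read-consistency choice also affects provisioned read throughput. The arithmetic below uses DynamoDB's currently documented unit sizes, which are an assumption relative to this webcast because the unit sizes have changed over time: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one write capacity unit covers one write per second of an item up to 1 KB.

```python
import math

READ_UNIT_KB = 4   # assumption: current read capacity unit size
WRITE_UNIT_KB = 1  # assumption: current write capacity unit size


def read_capacity_units(item_kb: float, reads_per_sec: int, consistent: bool) -> int:
    units_per_read = math.ceil(item_kb / READ_UNIT_KB)  # item size rounds up to whole units
    total = units_per_read * reads_per_sec
    # Eventually consistent reads cost half as much as strongly consistent reads.
    return total if consistent else math.ceil(total / 2)


def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    return math.ceil(item_kb / WRITE_UNIT_KB) * writes_per_sec
```

For 6 KB items read 100 times per second, each read rounds up to 2 units, giving 200 units for strongly consistent reads and 100 for eventually consistent reads; writing the same items 10 times per second needs 60 write units.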
Amazon DynamoDB Use Cases
No administrative burden managing structured data.
Need consistent writes and reads; set up scaling based on application needs.
Require the highest availability and can’t tolerate downtime for data backup or software maintenance.
No need for complex transactions or joins.
Item size is 64 KB or less; larger items can be stored in Amazon S3, with just a pointer stored in Amazon DynamoDB.
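The pointer pattern for large items can be sketched with two in-memory dictionaries standing in for the DynamoDB table and the S3 bucket. The 64 KB limit comes from the slide; the `s3_key` attribute name and the key scheme are hypothetical, and the size check approximates item size by the JSON-encoded payload.

```python
import json
from typing import Any, Dict

ITEM_LIMIT_BYTES = 64 * 1024  # item size limit quoted on the slide

table: Dict[str, Dict[str, Any]] = {}  # stands in for a DynamoDB table keyed by "id"
bucket: Dict[str, bytes] = {}          # stands in for an S3 bucket


def put_item(item_id: str, payload: Dict[str, Any]) -> None:
    body = json.dumps(payload).encode("utf-8")  # approximate item size by encoded payload
    if len(body) <= ITEM_LIMIT_BYTES:
        table[item_id] = {"id": item_id, "payload": payload}
    else:
        key = f"items/{item_id}.json"                    # hypothetical key scheme
        bucket[key] = body                               # large body goes to "S3"
        table[item_id] = {"id": item_id, "s3_key": key}  # only the pointer stays in the table


def get_item(item_id: str) -> Dict[str, Any]:
    record = table[item_id]
    if "s3_key" in record:
        return json.loads(bucket[record["s3_key"]].decode("utf-8"))
    return record["payload"]
```

Readers call `get_item` either way; whether the body lives in the table or behind a pointer is hidden from them.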
Amazon SimpleDB and Amazon DynamoDB
Scalability
– Amazon SimpleDB: Highly scalable options available
– Amazon DynamoDB: Seamless scalability and fast, predictable performance
Size
– Amazon SimpleDB: 10 GB data limit per domain
– Amazon DynamoDB: No limit on the amount of data
Capacity
– Amazon SimpleDB: Great fit for lower-scale workloads that require query flexibility
– Amazon DynamoDB: Efficient throughput model to meet the capacity levels your apps need
Running Amazon ElastiCache
Amazon ElastiCache gives you access to the capabilities of a familiar Memcached-compatible caching environment.
Pre-configured Parameters
Automatic Failure Detection and Recovery
Detailed Monitoring and Metrics
Automatic Software Patching
Push-Button Scaling
If your application already relies on Memcached, you can easily port it to take advantage of Amazon ElastiCache.
Control access to your Cache Clusters through Cache Security Groups.
Free up CapEx budget: No need to pre-allocate hardware budgets; pay as you go.
Usage & Pricing: On-Demand Instances
Reserved Cache Nodes: Light, Medium, and Heavy Utilization
Amazon ElastiCache Use Cases
Low administration: simplifies and offloads the management, monitoring, and operation of in-memory cache environments.
Significantly improve latency and throughput for many read-heavy applications.
Increase performance of I/O-intensive applications.
Easily port applications that rely on Memcached.
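A common way to put a Memcached-compatible cache in front of a database is the cache-aside (lazy loading) pattern: read from the cache, and on a miss load from the database and populate the cache with a time-to-live. The sketch uses a small in-process cache with the same get/set-with-expiry shape as a Memcached client; `load_from_db` is a hypothetical stand-in for a real query.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Memcached client: get/set with expiry in seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (value, self._clock() + ttl)


def get_user(user_id: int, cache: TTLCache,
             load_from_db: Callable[[int], Any], ttl: float = 300) -> Any:
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:                  # cache miss
        value = load_from_db(user_id)  # the database is queried only on a miss
        cache.set(key, value, ttl)
    return value
```

Repeated reads of the same user within the TTL are served from the cache; after the entry expires, the next read goes back to the database and refreshes it. Swapping `TTLCache` for a real Memcached client pointed at an ElastiCache endpoint keeps the same structure.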
Data Warehouse Deployment Options on EC2
Amazon Redshift: Cloud Data Warehouse
A fast and powerful, petabyte-scale data warehouse that is:
A Lot Faster
A Lot Cheaper
A Lot Simpler
Amazon Redshift
Managed
Common Customer Use Cases
Traditional Enterprise DW
• Reduce costs by extending DW rather than adding HW
• Migrate completely from existing DW systems
• Respond faster to business; provision in minutes
Companies with Big Data
• Improve performance by an order of magnitude
• Make more data available for analysis
• Access business data via standard reporting tools
SaaS Companies
• Add analytic functionality to applications
• Scale DW capacity as demand grows
• Reduce HW & SW costs by an order of magnitude
Amazon Redshift Customers
• 5x – 20x reduction in query times; 4x cost reduction over Hive
• 20x – 40x reduction in query times
• Nokia: 50% reduction in costs, 2x improvement in query times
Amazon Redshift dramatically reduces I/O
• Column storage
• Data compression
• Zone maps
• Direct-attached storage
• Large data block sizes
• Use direct-attached storage to maximize throughput
• Hardware optimized for high-performance data processing
• Large block sizes to make the most of each read
• Amazon Redshift manages durability for you
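Column storage and zone maps work together to skip I/O: each column is stored in blocks, and a zone map records the minimum and maximum value in each block, so a range predicate only reads blocks whose range overlaps it. The sketch below illustrates that idea on an in-memory list; the block size and data are arbitrary, and it does not reflect Redshift's on-disk format.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # tiny blocks for illustration; real blocks are much larger

Block = Tuple[int, int, List[int]]  # (min, max, values)


def build_blocks(column: List[int]) -> List[Block]:
    """Split one column into blocks and record a (min, max) zone map entry for each."""
    blocks = []
    for start in range(0, len(column), BLOCK_SIZE):
        values = column[start:start + BLOCK_SIZE]
        blocks.append((min(values), max(values), values))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block_min, block_max, values in blocks:
        if block_max < low or block_min > high:
            continue  # the zone map proves no value here can match, so skip the block
        blocks_read += 1
        matches.extend(v for v in values if low <= v <= high)
    return matches, blocks_read
```

On a sorted 100-value column split into 25 blocks, a query for values 10 through 13 reads only 2 blocks. On unsorted data the min/max ranges overlap far more, and fewer blocks can be skipped.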
Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
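[Diagram: clients connect to the leader node via JDBC/ODBC; compute nodes are interconnected over 10 GigE (HPC); ingestion, backup, and restore run through Amazon S3]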
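The leader/compute split follows a scatter-gather pattern: the leader sends the same query step to every compute node, each node works on its local slice of the data in parallel, and the leader combines the partial results. The sketch below models a `SUM ... GROUP BY` this way with a thread pool standing in for separate machines; the round-robin data placement and the function names are illustrative assumptions, not Redshift internals.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, amount)


def distribute(rows: List[Row], node_count: int) -> List[List[Row]]:
    """Place rows on compute nodes round-robin (an even-distribution assumption)."""
    slices: List[List[Row]] = [[] for _ in range(node_count)]
    for i, row in enumerate(rows):
        slices[i % node_count].append(row)
    return slices


def compute_node_partial(local_rows: List[Row]) -> Counter:
    """Each compute node aggregates only the rows stored locally."""
    partial: Counter = Counter()
    for region, amount in local_rows:
        partial[region] += amount
    return partial


def leader_sum_by_region(slices: List[List[Row]]) -> Dict[str, int]:
    """Leader: run the step on all nodes in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node_partial, slices))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts
    return dict(total)
```

Because only small partial aggregates travel back to the leader, adding compute nodes shrinks each node's share of the scan without increasing the leader's merge work much.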
Amazon Redshift lets you start small and grow big
Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores
– Single Node (2 TB)
– Cluster 2-32 Nodes (4 TB – 64 TB)
Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
– Cluster 2-100 Nodes (32 TB – 1.6 PB)
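The cluster ranges follow from per-node storage multiplied by node count. This quick check reproduces the figures quoted above, using 1 PB = 1,000 TB.

```python
from typing import Tuple

NODE_STORAGE_TB = {"HS1.XL": 2, "HS1.8XL": 16}            # per-node storage from the slide
CLUSTER_NODES = {"HS1.XL": (2, 32), "HS1.8XL": (2, 100)}  # min/max nodes in a multi-node cluster


def cluster_range_tb(node_type: str) -> Tuple[int, int]:
    """Return (smallest, largest) cluster storage in TB for a node type."""
    low, high = CLUSTER_NODES[node_type]
    per_node = NODE_STORAGE_TB[node_type]
    return low * per_node, high * per_node
```

HS1.XL gives 2 × 2 = 4 TB up to 32 × 2 = 64 TB; HS1.8XL gives 2 × 16 = 32 TB up to 100 × 16 = 1,600 TB, which is the 1.6 PB on the slide.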
Monitor query performance
Point-and-click resize
Amazon Redshift provides multiple data loading options
• Work with a partner (Data Integration, Systems Integrators)
• Upload to Amazon S3
• AWS Direct Connect
• AWS Import/Export
More coming soon…
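Once files are uploaded to Amazon S3, Redshift loads them in parallel with the SQL `COPY` command. The helper below only builds the statement text. The table, S3 prefix, and IAM role ARN are hypothetical, and `IAM_ROLE` authorization is the currently documented form (older examples pass a `CREDENTIALS` string instead). The resulting SQL would be run through any JDBC/ODBC or PostgreSQL-compatible client.

```python
import re


def build_copy(table: str, s3_prefix: str, iam_role_arn: str, gzip: bool = False) -> str:
    """Build a Redshift COPY statement that loads CSV files under an S3 prefix."""
    # Accept plain or schema-qualified identifiers only, to keep the SQL well-formed.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not s3_prefix.startswith("s3://"):
        raise ValueError("S3 prefix must start with s3://")
    for literal in (s3_prefix, iam_role_arn):
        if "'" in literal:
            raise ValueError("single quotes are not allowed in literals")
    options = "CSV" + (" GZIP" if gzip else "")
    return f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role_arn}' {options};"
```

Pointing `COPY` at a prefix rather than a single file lets every file under that prefix load in parallel across the compute nodes, so splitting input into several files generally loads faster than one large file.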
Amazon Redshift works with your existing analysis tools
Connect via JDBC/ODBC
More coming soon…
Secure Options for Moving Data to and from the AWS Cloud
AWS Direct Connect: Dedicated connection between your datacenter and AWS
Amazon Virtual Private Cloud (VPC): Private VPN connection to your AWS resources
AWS Import/Export Service: Move data into AWS using portable storage devices
More Information
Amazon EC2
https://aws.amazon.com/ec2/
Amazon RDS
http://aws.amazon.com/RDS
Amazon ElastiCache
http://aws.amazon.com/elasticache/
Amazon DynamoDB
https://aws.amazon.com/dynamodb/
Amazon SimpleDB
https://aws.amazon.com/simpledb/
Amazon Redshift
http://aws.amazon.com/redshift