©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Amazon Aurora: Relational databases reimagined.
Ronan Guilfoyle, Solutions Architect, AWS
Brian Scanlan, Engineer, Intercom
Current DB Architectures are Monolithic
Multiple layers of functionality all on a single box
SQL
Transactions
Caching
Logging
Current DB Architectures are Monolithic
Even when you scale it out, you’re still replicating the same stack
Application
SQL
Transactions
Caching
Logging
SQL
Transactions
Caching
Logging
Storage
This is a problem. For cost. For flexibility. And for availability.
Re-imagining the Relational Database
What if we were inventing the database today?
You wouldn’t design it the way we did in 1970. At least not entirely. You’d build something scale-out and self-healing that leverages existing AWS services.
Relational databases reimagined for the cloud.
speed and availability of high-end commercial databases
simplicity and cost-effectiveness of open source databases
drop-in compatibility with MySQL
simple pay-as-you-go pricing
Delivered as a managed service.
Amazon Aurora: applying a service-oriented architecture to the database
• Moved the logging and storage layer into a multi-tenant, scale-out, database-optimized storage service
• Integrated with other AWS services like EC2, VPC, DynamoDB, SWF and Route 53 for control plane operations
• Integrated with S3 for continuous backup and 99.999999999% durability
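[Diagram: data plane with SQL, Transactions and Caching on the database instance and Logging + Storage in the storage service; control plane using DynamoDB, Amazon SWF and Amazon Route 53; Amazon S3 for continuous backup]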
Aurora Works with Your Existing Apps
An Established Ecosystem
Business Intelligence, Data Integration, Query & Monitoring, SI & Consulting
“It is great to see Amazon Aurora remains MySQL compatible; we have found our connectors work with Aurora seamlessly. Today, customers can take our drivers and connect to Aurora, MariaDB or MySQL without worrying about compatibility. We look forward to working with the Aurora team in the future to further accelerate innovation within the MySQL ecosystem.” – Rasmus Johansson, VP Engineering
Amazon Aurora is Easy to Use
Aurora Makes it Easy to Run Your Databases
• Create a database in minutes
• Automatic patching
• Push-button scaling
• Failure detection and failover
• Read Replicas are available as failover targets, with no data loss
Amazon RDS
Aurora simplifies storage management
• Instant creation of user snapshots
• Continuous backups to S3
• Automatic storage scaling up to 64 TB, with no performance or availability impact
• Automatic restriping, mirror repair, hot spot management, encryption
Aurora simplifies Data Security
• Encryption to secure data at rest
  – AES-256; hardware accelerated
  – All blocks on disk and in Amazon S3 encrypted
  – Key management via AWS KMS
• SSL to secure data in transit
• Network isolation via Amazon VPC by default
• No direct access to nodes
• Supports industry standard security and data protection certifications
[Diagram: MySQL App, Primary Instance and Replica Instance across AZ 1, AZ 2 and AZ 3, with Customer VPC and Internal VPC network boundaries and Amazon S3]
Amazon Aurora is Highly Available
Aurora is Highly Available
• Highly available by default
  – 6-way replication across 3 AZs
  – 4 of 6 write quorum
  – Automatic fallback to 3 of 4 if an AZ is unavailable
  – 3 of 6 read quorum
• SSD, scale-out, multi-tenant storage
  – Seamless storage scalability
  – Up to 64 TB database size
  – Only pay for what you use
• Log-structured storage
  – Many small segments, each with their own redo logs
  – Log pages used to generate data pages
  – Eliminates chatter between database and storage
[Diagram: database instance (SQL, Transactions, Caching) over storage replicated across AZ 1, AZ 2 and AZ 3, with Amazon S3]
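Why 4-of-6 writes and 3-of-6 reads: a short check against the standard quorum-intersection conditions. The symbols V (number of copies), V_w (write quorum) and V_r (read quorum) are our own notation, not the slide's, and the two conditions are the textbook ones rather than anything the slide states explicitly:

```latex
\begin{aligned}
&V = 6, \qquad V_w = 4, \qquad V_r = 3 \\
&V_r + V_w = 7 > V
  &&\text{every read quorum overlaps the most recent write quorum} \\
&2V_w = 8 > V
  &&\text{any two write quorums overlap, so conflicting writes cannot both succeed} \\
&V' = 4,\; V'_w = 3:\quad 2V'_w = 6 > V'
  &&\text{the 3-of-4 fallback after losing an AZ keeps the write-overlap property}
\end{aligned}
```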
Aurora Performs Consistent, Low Latency Writes
Improvements
• Consistency: tolerance to outliers
• Latency: synchronous 2-phase commit (MySQL) vs. asynchronous 4/6 quorum writes (Aurora)
• Significantly more efficient use of network IO
[Diagram: MySQL Multi-AZ with Standby vs. Amazon Aurora. MySQL: Primary Instance (AZ 1) and Standby Instance (AZ 2), each on EBS with an EBS mirror, kept in sync with 2-phase commit; PiTR data in Amazon S3; sequential writes. Aurora: Primary Instance and Replica Instance across AZ 1, AZ 2 and AZ 3; async 4/6 quorum; distributed writes; Amazon S3. Types of writes shown: log records, binlog, data, doublewrite buffer, FRM files and metadata.]
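The "tolerance to outliers" point can be illustrated with a toy model, assuming each of the six copies acknowledges a write after some known latency. The latency values are hypothetical, and "wait for every copy" is a simplified stand-in for a synchronous scheme, not an exact model of MySQL Multi-AZ:

```python
def quorum_commit_latency(ack_latencies_ms, quorum=4):
    """A write is durable once `quorum` copies acknowledge it, so its latency
    is the quorum-th fastest acknowledgement (4 of 6 in Aurora)."""
    if not 1 <= quorum <= len(ack_latencies_ms):
        raise ValueError("quorum must be between 1 and the number of copies")
    return sorted(ack_latencies_ms)[quorum - 1]


def wait_for_all_latency(ack_latencies_ms):
    """If every copy must acknowledge, the slowest copy sets the latency."""
    return max(ack_latencies_ms)


if __name__ == "__main__":
    # Hypothetical: six copies, one of them an outlier (say, a congested disk).
    acks_ms = [1.1, 1.3, 1.2, 1.4, 1.5, 45.0]
    print(quorum_commit_latency(acks_ms))  # 1.4
    print(wait_for_all_latency(acks_ms))   # 45.0
```

Because the commit only waits for the fourth-fastest acknowledgement, up to two slow copies never appear in the write latency.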
Self-healing and fault-tolerant
• Lose 2 copies, or an entire AZ, without read or write availability impact
• Lose 3 copies without read availability impact
• Automatic detection, replication and repair
[Diagram: two panels, “Read & Write Availability” and “Read Availability”, each showing a database instance (SQL, Transactions, Caching) above storage replicated across AZ 1, AZ 2 and AZ 3]
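The failure claims above follow from counting surviving copies against the 4-of-6 write and 3-of-6 read quorums. A minimal sketch, assuming two copies per AZ; the copy-to-AZ mapping is our own, and the sketch ignores the 3-of-4 fallback and background repair:

```python
from itertools import combinations

# Assumed layout: 6 copies, 2 in each of 3 AZs (copy index -> AZ).
AZ_OF_COPY = {0: "AZ 1", 1: "AZ 1", 2: "AZ 2", 3: "AZ 2", 4: "AZ 3", 5: "AZ 3"}
COPIES = len(AZ_OF_COPY)
WRITE_QUORUM = 4
READ_QUORUM = 3


def availability(failed_copies):
    """Return (can_read, can_write) when the given copies are lost."""
    alive = COPIES - len(set(failed_copies))
    return alive >= READ_QUORUM, alive >= WRITE_QUORUM


def worst_case(k):
    """Availability guaranteed for every possible way of losing k copies."""
    outcomes = [availability(lost) for lost in combinations(range(COPIES), k)]
    return all(r for r, _ in outcomes), all(w for _, w in outcomes)


def lose_az(az):
    """Availability after every copy in one AZ is lost."""
    return availability([c for c, zone in AZ_OF_COPY.items() if zone == az])


if __name__ == "__main__":
    for az in ("AZ 1", "AZ 2", "AZ 3"):
        print(az, lose_az(az))   # (True, True) for each AZ
    for k in range(1, 5):
        print(k, worst_case(k))  # 1, 2: (True, True); 3: (True, False); 4: (False, False)
```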
Aurora Has Instant Crash Recovery
Traditional Databases
• Have to replay logs since the last checkpoint
• Single-threaded in MySQL; requires a large number of disk accesses
• A crash at T0 requires re-applying the redo log since the last checkpoint
Amazon Aurora
• Underlying storage replays redo records on demand as part of a disk read
• Parallel, distributed, asynchronous
• A crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously
[Diagram: checkpointed data followed by the redo log, with a crash at T0 in each case]
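Redo-on-read can be sketched as a segment that keeps its pending redo records and applies only a page's own records, in LSN order, when that page is read. This is a toy model under a simplifying assumption: each redo record carries the full new value of a page, whereas real redo records describe changes to a page. Many segments would do this independently, which is where the parallelism comes from:

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One storage segment with its own redo log (toy model).

    Assumption: a redo record holds a page's full new value.
    """
    pages: dict = field(default_factory=dict)  # page_id -> materialized value
    redo: list = field(default_factory=list)   # pending (lsn, page_id, value)

    def append_redo(self, lsn, page_id, value):
        self.redo.append((lsn, page_id, value))

    def read_page(self, page_id):
        # Apply only this page's pending records, in LSN order, as part of the read.
        pending = sorted((r for r in self.redo if r[1] == page_id), key=lambda r: r[0])
        for _, pid, value in pending:
            self.pages[pid] = value
        self.redo = [r for r in self.redo if r[1] != page_id]
        return self.pages.get(page_id)


if __name__ == "__main__":
    seg = Segment()
    seg.append_redo(1, "p1", "a")
    seg.append_redo(2, "p2", "b")
    seg.append_redo(3, "p1", "c")
    # "Crash and restart": no replay pass runs before reads are accepted.
    print(seg.read_page("p1"))  # c
    print(len(seg.redo))        # 1 (p2's record waits until p2 is read)
```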
Aurora’s Cache Survives a DB Restart
• We moved the cache out of the database process
• Cache remains warm in the event of a database restart
• Lets you resume fully loaded operations much faster
• Instant crash recovery + survivable cache = quick and easy recovery from DB failures
[Diagram: the caching process sits outside the database process (SQL, Transactions) and remains warm across a database restart]
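The idea of a cache that outlives the database process can be shown with two objects standing in for two processes: the engine borrows the cache instead of owning it, so replacing the engine leaves the cache warm. The class names and the page loader are hypothetical and say nothing about how Aurora actually implements the separation:

```python
class BufferCache:
    """Page cache that lives outside the engine (a separate process in the
    slide; a plain object here)."""

    def __init__(self):
        self.pages = {}
        self.hits = 0
        self.misses = 0

    def get(self, page_id, load):
        if page_id in self.pages:
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page_id] = load(page_id)
        return self.pages[page_id]


class DatabaseEngine:
    """Stand-in for the database process: it uses the cache but does not own it."""

    def __init__(self, cache):
        self.cache = cache

    def read(self, page_id):
        # Hypothetical loader standing in for a read from the storage service.
        return self.cache.get(page_id, lambda pid: f"page-{pid}")


if __name__ == "__main__":
    cache = BufferCache()
    engine = DatabaseEngine(cache)
    for pid in range(3):
        engine.read(pid)            # cold cache: 3 misses
    engine = DatabaseEngine(cache)  # "restart": a new engine, same cache
    for pid in range(3):
        engine.read(pid)            # warm cache: 3 hits
    print(cache.hits, cache.misses)  # 3 3
```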
Multiple failover targets, without data loss.
MySQL Read Scaling
• Replicas must replay logs
• Replicas place additional load on the master
• Replica lag can grow indefinitely
• Failover results in data loss
[Diagram: MySQL Master (30% read, 70% write) and MySQL Replica (30% new reads, 70% write), each with its own data volume, the replica kept current by single-threaded binlog apply. Aurora Master (30% read, 70% write) and Aurora Replica (100% new reads) on shared Multi-AZ storage, with page cache invalidation from master to replica.]
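The contrast in the diagram can be sketched in a few classes, following the slide's "page cache invalidation" label: the master writes once to shared storage, and the replica never replays the write, it only drops its stale cached copy and rereads from storage on the next access. This is a simplification under stated assumptions; ordering, LSNs and replica lag are ignored, and all names are hypothetical:

```python
class SharedStorage:
    """Multi-AZ volume shared by the master and its replicas (toy model)."""

    def __init__(self):
        self.pages = {}
        self.reads = 0

    def read(self, page_id):
        self.reads += 1
        return self.pages.get(page_id)

    def write(self, page_id, value):
        self.pages[page_id] = value


class Replica:
    def __init__(self, storage):
        self.storage = storage
        self.cache = {}

    def on_log_record(self, page_id):
        # No binlog replay: just drop a stale cached copy, if any.
        self.cache.pop(page_id, None)

    def read(self, page_id):
        if page_id not in self.cache:
            self.cache[page_id] = self.storage.read(page_id)
        return self.cache[page_id]


class Master:
    def __init__(self, storage, replicas):
        self.storage = storage
        self.replicas = replicas

    def write(self, page_id, value):
        self.storage.write(page_id, value)  # written once, to shared storage
        for replica in self.replicas:
            replica.on_log_record(page_id)  # replicas only invalidate


if __name__ == "__main__":
    storage = SharedStorage()
    replica = Replica(storage)
    master = Master(storage, [replica])
    master.write("p1", "v1")
    print(replica.read("p1"))  # v1
    master.write("p1", "v2")
    print(replica.read("p1"))  # v2 (the stale copy was invalidated)
    print(storage.reads)       # 2
```

Since the replica does no write work of its own, all of its capacity is left for reads, which is the "100% New Reads" in the diagram.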
You Can Simulate Failures Using SQL
• To cause the failure of a component at the database node: ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}]
• To simulate the failure of disks: ALTER SYSTEM SIMULATE percent_failure DISK failure_type IN [DISK index | NODE index] FOR INTERVAL interval
• To simulate the failure of networking: ALTER SYSTEM SIMULATE percent_failure NETWORK failure_type [TO {ALL | read_replica | availability_zone}] FOR INTERVAL interval
Intermission…
Intercom
Intercom’s technology
- Largely monolithic Ruby on Rails application. Strong culture around Continuous Deployment and general DevOps best practices. Some adoption of SOA/Microservices.
- Built using MySQL and Ruby on Rails’ ActiveRecord ORM for rapid development. Unstructured customer data is stored in MongoDB; however, all messages are stored in MySQL.
- Heavy use of AWS services and other SaaS services (New Relic, Code Climate, CodeShip, LogEntries).
- Custom infrastructure/code orchestration & deployment system.
Recent MySQL woes
- Highly sensitive to MySQL performance; started experiencing regular, inexplicable performance degradation. Could not vertically scale our way out of the problem. Engaged RDS Support & MySQL consultants.
- Adjusted parameters, e.g. lock wait timeouts, transaction isolation levels and innodb_flush_log_at_trx_commit. Instrumented and reduced the number of long-running transactions in an attempt to reduce lock contention.
- Greatly increased the number of MySQL metrics being collected, built application-level fingerprinting and automated data collection during outages. Got our hands dirty with MySQL’s performance schema. Reducing read throughput stabilised performance.
Why we’re interested in Aurora
- While we bought ourselves time with improved caching, Aurora gives us more options and a lot more vertical scaling opportunities.
- Operational experience of using read replicas with RDS/MySQL means we don’t trust them for customer-facing queries. The eventual consistency guarantees of Aurora look good enough for our application to use for practically all read queries.
Enterprise-grade features and performance at open-source prices
Aurora Pricing
Simple pricing
• No licenses
• No lock-in
• Pay only for what you use
Discounts
• 44% with a 1-year RI
• 63% with a 3-year RI
Instance        vCPU   Memory (GiB)   Hourly price
db.r3.large        2          15.25          $0.29
db.r3.xlarge       4          30.5           $0.58
db.r3.2xlarge      8          61             $1.16
db.r3.4xlarge     16         122             $2.32
db.r3.8xlarge     32         244             $4.64
• Storage consumed, up to 64 TB, is $0.10/GB/month
• IOs consumed are billed at $0.20 per million IOs
• Prices are for Virginia
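The three billing dimensions combine into a simple on-demand estimate. A minimal sketch using the Virginia prices listed above; the 730-hour month and the example workload (100 GB stored, 50 million IOs) are assumptions, and Reserved Instance discounts are left out because the slide does not say exactly how they apply:

```python
# Prices from the slide (Virginia); illustrative only, not current pricing.
HOURLY_PRICE_USD = {
    "db.r3.large": 0.29,
    "db.r3.xlarge": 0.58,
    "db.r3.2xlarge": 1.16,
    "db.r3.4xlarge": 2.32,
    "db.r3.8xlarge": 4.64,
}
STORAGE_USD_PER_GB_MONTH = 0.10
IO_USD_PER_MILLION = 0.20
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost(instance, storage_gb, ios, hours=HOURS_PER_MONTH):
    """On-demand estimate: instance hours + storage consumed + IOs consumed."""
    instance_cost = HOURLY_PRICE_USD[instance] * hours
    storage_cost = storage_gb * STORAGE_USD_PER_GB_MONTH
    io_cost = ios / 1_000_000 * IO_USD_PER_MILLION
    return round(instance_cost + storage_cost + io_cost, 2)


if __name__ == "__main__":
    # One db.r3.large, 100 GB stored, 50 million IOs in the month:
    # 211.70 (instance) + 10.00 (storage) + 10.00 (IO)
    print(monthly_cost("db.r3.large", storage_gb=100, ios=50_000_000))  # 231.7
```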
Aurora – Enterprise Grade. Open Source Prices
• Expanding to unlimited preview
• Adding preview support for US West (Oregon) and EU (Ireland)
• Signup for preview access at: https://aws.amazon.com/rds/aurora/preview
• Full service launch in the coming months
LONDON