Disaster Recovery with the AWS Cloud
Paul Duffy
Some Context
What you told us…
“DR ‘top of the list’ for budgeted cloud initiatives in 2012”
Source: AWS Customer Survey, 2011
Why Disaster Recovery on AWS..?
Low cost
Instant Elasticity
Open & Flexible
Secure
AWS encourages some new HA/DR thinking…
[Diagram labels: Region; AZ-1 and AZ-2; Elastic Load Balancer; Auto Scaling Group; Amazon EC2 Instances (Web Server); Amazon S3 Bucket; Amazon SNS Notifications]
DR on AWS does not mean “rearchitect”
Existing, on-premises Infrastructure
This talk is about using AWS for DR for your datacenters
Spot the difference!
Production:
Routers
Firewalls
IP Network
Application Licenses
Operating Systems
Hypervisor
Servers
Storage Network
Primary Storage
Backup SW
Backup Tapes
Tape Silos
Archive SW
Archive Storage
DR Site (Datacenter):
Routers
Firewalls
IP Network
Application Licenses
Operating Systems
Hypervisor
Servers
Storage Network
Snapshot Storage
Backup SW
Backup Tapes
Tape Silos
Archive SW
Archive Storage
DR Site (AWS):
Routers
Firewalls
IP Network
Application Licenses
Operating Systems
Hypervisor
Servers
Storage Network
Snapshot Storage
Backup SW
Backup Tapes
Tape Silos
Archive SW
Archive Storage
A different cost model
[Chart: Infrastructure Cost (y-axis) against Time (x-axis), with Test, Test, Failover, and Failback events marked; curves for 2nd Site Cost, AWS Cost, and Demand]
Cost savings with AWS
Ability to scale: no arbitrary time limit to failback
How to do DR with AWS..?
AWS - Flexible, Global Infrastructure
AWS Regions
AWS Edge Locations
AWS Security Standards
Certifications
SOC 1 Type 2 (formerly SAS-70)
ISO 27001
PCI DSS for EC2, S3, EBS, VPC, RDS, ELB, IAM
FISMA Moderate Compliant Controls
HIPAA & ITAR Compliant Architecture
Physical Security
Datacenters in nondescript facilities
Physical access strictly controlled
Must pass two-factor authentication at least twice for floor access
Physical access logged and audited
HW, SW, Network
Systematic change management
Phased updates deployment
Safe storage decommission
Automated monitoring and self-audit
Advanced network protection
Which AWS Services for DR..?
Amazon Simple Storage Service (Amazon S3)
AWS Import/Export
Amazon Elastic Compute Cloud (Amazon EC2)
AWS Storage Gateway
Amazon Route 53
Scenarios
Disaster Recovery Terms
RTO: Recovery Time Objective
• Acceptable time period within which normal operation (or degraded operation) needs to be restored after an event
RPO: Recovery Point Objective
• Acceptable data loss measured in time
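As a minimal sketch of how the two terms are measured, the snippet below computes the achieved RPO and RTO for one incident and compares them with the stated objectives. The function names and the timeline are hypothetical and are not from the talk.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recovery_point: datetime, disaster_at: datetime) -> timedelta:
    """Data loss window: time between the last recoverable copy and the event."""
    return disaster_at - last_recovery_point


def achieved_rto(disaster_at: datetime, service_restored_at: datetime) -> timedelta:
    """Downtime: time between the event and restored (or degraded) operation."""
    return service_restored_at - disaster_at


def meets_objective(achieved: timedelta, objective: timedelta) -> bool:
    """An objective is met when the achieved value does not exceed it."""
    return achieved <= objective


# Hypothetical timeline for a DR test (all values are made up).
last_backup = datetime(2012, 5, 1, 2, 0)   # nightly backup finished at 02:00
disaster = datetime(2012, 5, 1, 14, 30)    # event at 14:30
restored = datetime(2012, 5, 1, 18, 30)    # service back at 18:30

rpo = achieved_rpo(last_backup, disaster)  # 12 h 30 min of data at risk
rto = achieved_rto(disaster, restored)     # 4 h of downtime

print(meets_objective(rpo, timedelta(hours=24)))  # True
print(meets_objective(rto, timedelta(hours=2)))   # False
```

In this made-up timeline a 24-hour RPO is met by nightly backups, while a 2-hour RTO is missed, which would argue for one of the warmer scenarios.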
Backup and Restore
On-premises Infrastructure
Traditional server
Amazon Route 53
AWS Import/Export
S3 Bucket with Objects
Data copied to S3
Backup and Restore – Storage Gateway
Backup and Restore
[Diagram labels: AWS Region; Availability Zone; Amazon S3 Bucket; AMI; Amazon EC2 Instance; Data Volume]
Data copied from objects in S3
Instance quickly provisioned from AMI
Pre-bundled with OS and applications
Backup and Restore
Advantages
• Simple to get started
• Extremely cost effective (mostly backup storage)
Preparation Phase
• Take backups of current systems
• Store backups in S3
• Describe procedure to restore from backup on AWS
  • Know which AMI to use, build your own as needed
  • Know how to restore system from backups
  • Know how to switch to new system
  • Know how to configure the deployment
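The "store backups in S3" step might look like the sketch below. It assumes the caller passes in a boto3 S3 client (for example `boto3.client("s3")`), whose `upload_file(Filename, Bucket, Key)` method is the only call used. The bucket name, key layout, and helper names are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def backup_key(prefix: str, filename: str, taken_at: datetime) -> str:
    """Build a date-partitioned object key so a restore can find a backup by day."""
    return f"{prefix}/{taken_at:%Y/%m/%d}/{filename}"


def store_backup(s3_client, archive: Path, bucket: str, prefix: str,
                 taken_at: datetime) -> str:
    """Upload one backup archive and return the key it was stored under.

    s3_client is assumed to expose upload_file(Filename, Bucket, Key),
    as the boto3 S3 client does.
    """
    key = backup_key(prefix, archive.name, taken_at)
    s3_client.upload_file(str(archive), bucket, key)
    return key


# Example call (hypothetical bucket and file):
#   import boto3
#   store_backup(boto3.client("s3"), Path("db.tar.gz"),
#                "example-dr-backups", "db", datetime(2012, 5, 1))
```

Keeping the key layout in one function means the written restore procedure can refer to the same naming rule.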
Backup and Restore – Storage Gateway
Advantages
• Simple to get started
• Extremely cost effective (mostly backup storage)
Preparation Phase
• Download AWS Storage Gateway software appliance
• Install and configure Storage Gateway
• Use Storage Gateway
• Describe procedure to restore from backup on AWS
  • Know which AMI to use, build your own as needed
  • Know how to switch to new system
  • Know how to configure the deployment
Backup and Restore
In Case of Disaster
• Retrieve backups from S3
• Bring up required infrastructure
  • EC2 instances with prepared AMIs, Load Balancing, etc.
• Restore system from backup
• Switch over to the new system
  • Adjust DNS records to point to AWS
Objectives
• RTO: as long as it takes to bring up infrastructure and restore system from backups
• RPO: time since last backup
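The "adjust DNS records to point to AWS" step can be sketched as a Route 53 change batch. This is an illustration under assumptions, not the speaker's procedure: it assumes a boto3 Route 53 client and its `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)` call, and the zone ID, record name, target, and helper names are hypothetical.

```python
def dns_failover_change(record_name: str, target_dns_name: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch that repoints a CNAME at the AWS endpoint."""
    return {
        "Comment": "DR failover: point traffic at the AWS recovery site",
        "Changes": [{
            "Action": "UPSERT",  # create the record, or overwrite it if present
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": target_dns_name}],
            },
        }],
    }


def switch_dns(route53_client, hosted_zone_id: str, record_name: str,
               target_dns_name: str):
    """Submit the change; route53_client is assumed to be a boto3 Route 53 client."""
    batch = dns_failover_change(record_name, target_dns_name)
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch)
```

A short TTL on the record, set well before any disaster, limits how long resolvers keep sending users to the failed site, so it feeds directly into the RTO.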
Pilot Light
[Diagram labels: User or system; Amazon Route 53; Web Server, Application Server, Database Server, and Data Volume (each shown twice); Data Mirroring/Replication; Not Running; Smaller Instance]
Pilot Light
[Diagram labels: User or system; Amazon Route 53; Web Server, Application Server, Database Server, and Data Volume; Data Mirroring/Replication; Not Running; Smaller Instance]
Pilot Light
[Diagram labels: User or system; Amazon Route 53; Web Server, Application Server, Database Server, and Data Volume; Data Mirroring/Replication; Start in minutes; Resize as desired]
Pilot Light
Advantages
• Very cost effective (fewer 24/7 resources)
Preparation Phase
• Enable replication of all critical data to AWS
• Prepare all required resources for automatic start
  • AMIs, Network Settings, Load Balancing, etc.
• Reserved Instances
Pilot Light
In Case of Disaster
• Automatically bring up resources around the replicated core data set
• Scale the system as needed to handle current production traffic
• Switch over to the new system
  • Adjust DNS records to point to AWS
Objectives
• RTO: as long as it takes to detect need for DR and automatically scale up replacement system
• RPO: depends on replication type
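Bringing up the resources around the replicated core might be sketched as follows. It assumes a boto3 EC2 client, using its `modify_instance_attribute` and `start_instances` calls, and instances that were prepared in advance and left stopped. The function name and instance IDs are hypothetical.

```python
def ignite_pilot_light(ec2_client, stopped_instance_ids, resize=None):
    """Resize (optionally) and start the pre-built, stopped instances.

    resize maps instance ID -> target instance type. An instance type can
    only be changed while the instance is stopped, so resizing happens
    before the start call.
    """
    for instance_id, instance_type in (resize or {}).items():
        ec2_client.modify_instance_attribute(
            InstanceId=instance_id,
            InstanceType={"Value": instance_type},
        )
    ec2_client.start_instances(InstanceIds=list(stopped_instance_ids))


# Example call (hypothetical IDs):
#   import boto3
#   ignite_pilot_light(boto3.client("ec2"),
#                      ["i-web", "i-app"], resize={"i-app": "m1.large"})
```

The DNS switch still follows once the started instances pass their checks.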
Fully-Working Low-Capacity Standby
[Diagram labels: User or system; Amazon Route 53; Web Server, Application Server, Database Server, and Data Volume; Data Mirroring/Replication; Low Capacity]
Fully-Working Low-Capacity Standby
[Diagram labels: User or system; Amazon Route 53; Web Server, Application Server, Database Server, and Data Volume; Data Mirroring/Replication; Low Capacity]
Fully-Working Low-Capacity Standby
[Diagram labels: User or system; Amazon Route 53; Web Server, Application Server, Database Server, and Data Volume; Data Mirroring/Replication; Grow Capacity]
Fully-Working Low-Capacity Standby
Advantages
• Can take some production traffic at any time
• Cost savings (IT footprint smaller than full DR)
Preparation
• Similar to Pilot Light
• All necessary components running 24/7, but not scaled for production traffic
• Best practice: continuous testing
  • “Trickle” a statistical subset of production traffic to DR site
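One way to trickle a subset of traffic is DNS weighting. The sketch below builds two weighted record sets in the shape Route 53 expects (`SetIdentifier` plus `Weight`) and computes the share of answers each should receive. The talk does not say which mechanism was used, so treat this as an assumption; the names and hostnames are hypothetical.

```python
def weighted_records(record_name: str, primary_target: str, dr_target: str,
                     dr_percent: int, ttl: int = 60) -> list:
    """Two weighted CNAME record sets; answers are returned in proportion to Weight."""
    if not 0 <= dr_percent <= 100:
        raise ValueError("dr_percent must be between 0 and 100")

    def record(set_id: str, target: str, weight: int) -> dict:
        return {
            "Name": record_name,
            "Type": "CNAME",
            "TTL": ttl,
            "SetIdentifier": set_id,  # distinguishes records sharing a name
            "Weight": weight,
            "ResourceRecords": [{"Value": target}],
        }

    return [
        record("primary", primary_target, 100 - dr_percent),
        record("dr-aws", dr_target, dr_percent),
    ]


def expected_share(records: list, set_id: str) -> float:
    """Fraction of DNS answers a record set should receive: weight / total weight."""
    total = sum(r["Weight"] for r in records)
    chosen = next(r["Weight"] for r in records if r["SetIdentifier"] == set_id)
    return chosen / total


records = weighted_records("www.example.com.", "onprem.example.com.",
                           "dr.example.com.", dr_percent=5)
print(expected_share(records, "dr-aws"))
```

Because resolvers cache answers, the observed split only approximates the weights over a large number of clients.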
Fully-Working Low-Capacity Standby
In Case of Disaster
• Immediately fail over most critical production load
  • Adjust DNS records to point to AWS
• (Auto) Scale the system further to handle all production load
Objectives
• RTO: for critical load, as long as it takes to fail over; for all other load, as long as it takes to scale further
• RPO: depends on replication type
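Scaling the standby further can be sketched in two parts: estimating how many instances the full production load needs, then asking an Auto Scaling group for that number. The sizing rule and headroom figure are illustrative assumptions. The second function assumes a boto3 Auto Scaling client and its `set_desired_capacity` call, and the group name is hypothetical.

```python
import math


def instances_needed(peak_requests_per_sec: float, per_instance_rps: float,
                     headroom: float = 0.25) -> int:
    """Instances required to serve the peak load with spare headroom."""
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    required = peak_requests_per_sec * (1 + headroom) / per_instance_rps
    return max(1, math.ceil(required))


def grow_standby(autoscaling_client, group_name: str, desired: int) -> None:
    """Request the new capacity; it must lie within the group's min/max size."""
    autoscaling_client.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired,
        HonorCooldown=False,  # do not wait out a cooldown during a failover
    )


print(instances_needed(900, 100))  # 12
```

With 900 requests per second, 100 per instance, and 25% headroom, the estimate is 11.25 instances, rounded up to 12.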
Multi-Site Hot Standby
[Diagram labels: User or system; Amazon Route 53; Web Server, Application Server, Database Server, and Data Volume; Data Mirroring/Replication; Full Capacity]
Multi-Site Hot Standby
Advantages
• At any moment can take all production load
Preparation
• Similar to Low-Capacity Standby
• Fully scaling in/out with production load
In Case of Disaster
• Immediately fail over all production load
  • Adjust DNS records to point to AWS
Objectives
• RTO: as long as it takes to fail over
• RPO: depends on replication type
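When the RTO is only "as long as it takes to fail over", detection time dominates. The sketch below shows one common detection rule, which is an assumption and not something the talk prescribes: declare the primary site down after a fixed number of consecutive failed health checks. The class and function names are hypothetical.

```python
class FailoverDetector:
    """Declare a site down only after N consecutive failed health checks."""

    def __init__(self, failure_threshold: int = 3):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True when failover should start."""
        if healthy:
            self.consecutive_failures = 0  # any success resets the count
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


def detection_delay_seconds(probe_interval: float, failure_threshold: int) -> float:
    """Upper bound on time from outage to detection, ignoring probe timeouts."""
    return probe_interval * failure_threshold


detector = FailoverDetector(failure_threshold=3)
probes = [False, False, True, False, False, False]
print([detector.record(p) for p in probes])
```

A lower threshold shortens detection but makes a single dropped probe more likely to trigger an unnecessary failover.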
Please test..!
Keith Meade
Senior Account Director, Partner
What’s Up Interactive
@keithmeade
About What’s Up Interactive
True Digital Agency – Marketing, Creative, Technology
Atlanta, GA
25 employees
Clients: FOX, AT&T, DS Waters, Georgia Lottery, Georgia Aquarium
Growth Businesses (Building a Web Presence)
Our Experience
April 2011 – Implemented new backup and offsite storage solution utilizing S3
August 2011 – Backup of select sites & systems in EC2
Rolling Out this Year
Pilot light and Low Capacity solutions for all our clients
Savings of $2,500/month via Pilot Light
What We Learned
Try It
• Get a good understanding of capabilities and potential
• Get a handle on costs
Test It, and Test Again
• Test systems and data flow to ensure replication is working correctly
Realize Its Potential
• Quickly scalable to add resources to production
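A replication test along the lines of "Test It, and Test Again" could compare checksums of what the primary holds against what arrived at the DR site. This is a minimal sketch that treats both sides as name-to-bytes mappings. How the data is actually read from each site is left out, and the names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest used to compare content without comparing raw bytes."""
    return hashlib.sha256(data).hexdigest()


def replication_drift(primary: dict, replica: dict) -> dict:
    """Compare name -> bytes mappings and report what differs on the replica."""
    missing = sorted(k for k in primary if k not in replica)
    stale = sorted(k for k in primary
                   if k in replica and digest(primary[k]) != digest(replica[k]))
    extra = sorted(k for k in replica if k not in primary)
    return {"missing": missing, "stale": stale, "extra": extra}


# Made-up sample data: one object is missing and one is out of date.
primary = {"orders.db": b"v2", "users.db": b"v1", "logo.png": b"img"}
replica = {"orders.db": b"v1", "logo.png": b"img"}
print(replication_drift(primary, replica))
```

An empty result for all three lists suggests the replica matched the primary at the moment of the check.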
Next steps…
Resources
http://aws.amazon.com/disaster-recovery/
• Whitepapers
• Customer Videos
• Case Studies
AWS Storage Solution Provider Partners
Call to action
Learn more about our DR resources
Evaluate using AWS for a DR project
Start testing – first steps are simple
Give us feedback
Thank you!
Come and talk to us at the booth!