AWS Government, Education, and Nonprofit Symposium | Washington, DC | June 25-26, 2015
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Automate Your Backups at Scale
Jeff Gentile, The Common Application :: DevOps Engineer
Aaron Armstrong, The Common Application :: Director of Engineering
About the Organization
• Founded in 1975
• Not-for-profit
• 500+ colleges / universities in US & abroad
• 800,000 Students
• 3.6 million Application Forms
• 14 million Counselor & Teacher Recommendations
AWS at the Common App
• Run all prod and dev assets on AWS (nothing on-premises)
• Use most “core” services: Amazon EC2, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon VPC
• Some app-specific services: Amazon SWF, Amazon SQS
• Completing 2nd year of production operation in AWS
• Continually improving efficiencies, controls, and costs
Implement → Learn → Refine
Agenda
• The problem we faced
• DevOps Ecosystem
• Where we wanted to go as an organization / group
• What options were available
• The solution we implemented
• Where we are now
• How to leverage and other benefits
• Questions
The Problem We Faced…
• Long-lived (static) vs. scale-out servers
• Restorability
– Quick return to operation
– Retrieval of lost data
• Use case example: FTP
– 100s of GB of data
– Everyday use
– Hundreds of users
– Critical system
DevOps Ecosystem
Backup: EC2 Instance → Rundeck Job → Python Snapshot Script → Amazon EBS Snapshots
Restore: Amazon EBS Snapshots → Rundeck Job → Python Restore Script → Restored EC2 Instance
Where We Wanted to Go…
• Automated schedule
• Rapid restore to normal operation
• Manual and scripted restore options
• Add / remove servers over time
• Monitoring and oversight of backup process
What Options Were Available?
• Third-Party Software (e.g., Skeddly)
– Quick implementation
• Homegrown Implementation
– Growing expertise in Python and Boto
– Didn’t want another system to manage / learn
– Tailored for specific needs
– No net new costs
The Solution We Implemented…
• Snapshots using the Python SDK, Boto
• Instance and snapshot tagging
• Snapshot removal based on tag
• Push button restore process
– More soon…
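The snapshot-and-tag step might look roughly like the sketch below. It is not the presenters' script: it assumes boto3-style EC2 client calls (`describe_instances`, `create_snapshot`, `create_tags`) rather than the original Boto library, and the tag names `Backup`, `InstanceId`, `DeviceName`, and `DeleteOn` are hypothetical. The client is passed in so the function can be exercised without AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tag names; the talk does not say which tags were used.
BACKUP_TAG = "Backup"
DELETE_ON_TAG = "DeleteOn"


def snapshot_tagged_instances(ec2, retention_days=7, now=None):
    """Snapshot every EBS volume on instances tagged Backup=true.

    `ec2` is any object exposing the boto3 EC2 client methods used below.
    Returns the list of new snapshot IDs.
    """
    now = now or datetime.now(timezone.utc)
    delete_on = (now + timedelta(days=retention_days)).strftime("%Y-%m-%d")
    created = []

    # Single call only; pagination of large fleets is not handled here.
    response = ec2.describe_instances(
        Filters=[{"Name": "tag:" + BACKUP_TAG, "Values": ["true"]}]
    )
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            for mapping in instance.get("BlockDeviceMappings", []):
                ebs = mapping.get("Ebs")
                if not ebs:
                    continue  # defensive: skip entries without EBS details
                device = mapping["DeviceName"]
                snapshot = ec2.create_snapshot(
                    VolumeId=ebs["VolumeId"],
                    Description="Backup of %s (%s)" % (instance_id, device),
                )
                # Tag the snapshot so later jobs can find and expire it.
                ec2.create_tags(
                    Resources=[snapshot["SnapshotId"]],
                    Tags=[
                        {"Key": "InstanceId", "Value": instance_id},
                        {"Key": "DeviceName", "Value": device},
                        {"Key": DELETE_ON_TAG, "Value": delete_on},
                    ],
                )
                created.append(snapshot["SnapshotId"])
    return created
```

With real credentials, one would presumably call it as `snapshot_tagged_instances(boto3.client("ec2"))` from a scheduled job.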
Solution continued… (Backup)
• Instance Tags
• Snapshot Tags
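One way snapshot tags can drive removal is to store an expiry date on each snapshot and delete whatever has passed it. The sketch below illustrates that idea under assumptions: the `DeleteOn` tag name and its `YYYY-MM-DD` format are hypothetical, and the calls are boto3-style (`describe_snapshots`, `delete_snapshot`), not necessarily what the presenters used.

```python
from datetime import date, datetime

DELETE_ON_TAG = "DeleteOn"  # hypothetical tag name, value like "2015-07-02"


def expired_snapshot_ids(snapshots, today):
    """Return IDs of snapshots whose DeleteOn tag is on or before `today`.

    `snapshots` is a list of dicts shaped like describe_snapshots output.
    Snapshots without the tag are never selected.
    """
    expired = []
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        delete_on = tags.get(DELETE_ON_TAG)
        if delete_on is None:
            continue
        if datetime.strptime(delete_on, "%Y-%m-%d").date() <= today:
            expired.append(snap["SnapshotId"])
    return expired


def purge_expired_snapshots(ec2, today=None):
    """Delete expired snapshots owned by this account; return their IDs."""
    today = today or date.today()
    # Single call only; pagination is not handled in this sketch.
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag-key", "Values": [DELETE_ON_TAG]}],
    )
    doomed = expired_snapshot_ids(response["Snapshots"], today)
    for snapshot_id in doomed:
        ec2.delete_snapshot(SnapshotId=snapshot_id)
    return doomed
```

Keeping the selection logic in a pure function makes it easy to dry-run the retention rule before any delete call is issued.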
Solution continued… (Restore)
• Rundeck UI
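Behind a push-button job, a restore script could pick the newest completed snapshot and turn it back into an attached volume. The following is only a sketch under assumptions: it restores a data volume onto an existing instance (the presenters' script may instead launch a new instance), and it uses boto3-style calls (`create_volume`, the `volume_available` waiter, `attach_volume`) rather than the original Boto library.

```python
def latest_completed_snapshot(snapshots):
    """Return the newest snapshot dict with State 'completed', or None."""
    completed = [s for s in snapshots if s.get("State") == "completed"]
    if not completed:
        return None
    return max(completed, key=lambda s: s["StartTime"])


def restore_volume(ec2, snapshot_id, instance_id, device, availability_zone):
    """Create a volume from a snapshot and attach it to an instance.

    `ec2` is any object exposing the boto3 EC2 client methods used below.
    The availability zone must match the target instance's zone.
    Returns the new volume ID.
    """
    volume = ec2.create_volume(
        SnapshotId=snapshot_id, AvailabilityZone=availability_zone
    )
    volume_id = volume["VolumeId"]
    # Block until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

A scheduler job would then only need to supply the instance ID, device name, and zone as job options.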
Solution continued… (Restore)
• How long will it take to restore?
– 15:00?
– 10:00?
– 5:00?
– 1:35
Inspect What We Expect…
• Weekly reporting via Rundeck showing snapshot status
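A status report of this kind can be reduced to one question per instance: is there a recent completed snapshot? The sketch below shows one possible shape for that check. The `InstanceId` snapshot tag, the one-day freshness threshold, and the `OK` / `STALE` / `MISSING` labels are all assumptions for illustration, not details from the talk.

```python
from datetime import timedelta


def snapshot_status(snapshots, instance_ids, now, max_age=timedelta(days=1)):
    """Map each instance ID to 'OK', 'STALE', or 'MISSING'.

    `snapshots` is a list of dicts shaped like describe_snapshots output,
    each carrying a hypothetical 'InstanceId' tag and a datetime StartTime.
    """
    newest = {}
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        instance_id = tags.get("InstanceId")
        if instance_id is None or snap.get("State") != "completed":
            continue
        if instance_id not in newest or snap["StartTime"] > newest[instance_id]:
            newest[instance_id] = snap["StartTime"]

    status = {}
    for instance_id in instance_ids:
        if instance_id not in newest:
            status[instance_id] = "MISSING"
        elif now - newest[instance_id] > max_age:
            status[instance_id] = "STALE"
        else:
            status[instance_id] = "OK"
    return status


def format_report(status):
    """Render the status map as one line per instance, sorted by ID."""
    return "\n".join("%-12s %s" % (i, s) for i, s in sorted(status.items()))
```

The text output can be printed from a scheduled job so that it lands in the job log or a notification email.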
Where We Are…
• Automated schedule
• Rapid restore to normal operation
• Manual and scripted restore options
• Add / remove servers over time
• Monitoring and oversight of backup process
How to Leverage…
• Figure out what your backup / restore strategy should be
– Live snapshots
– Stop instance, then snapshot
– No snapshot, native application backups
• Determine which instances need regular backups
– Create backup schedules for those instances
• Document the process!
– Have a fresh set of eyes follow the steps for verification
Other Benefits
• Copy snapshots / AMIs across regions / accounts
– Load Testing
– Disaster Recovery
• Incremental backup
– First snapshot takes the longest
– Subsequent backups have lower storage overhead (i.e., cost efficiency)
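Copying a snapshot to another region is a single API call issued against the destination region. A minimal sketch, assuming a boto3-style EC2 client (`copy_snapshot`) created for the destination region; the function name and description text are illustrative only.

```python
def copy_snapshot_to_region(dest_ec2, source_region, snapshot_id):
    """Copy a snapshot into the region that `dest_ec2` is bound to.

    `dest_ec2` is any object exposing the boto3 EC2 client copy_snapshot
    method, e.g. boto3.client("ec2", region_name="us-west-2").
    Returns the ID of the new snapshot in the destination region.
    """
    response = dest_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description="Copy of %s from %s" % (snapshot_id, source_region),
    )
    return response["SnapshotId"]
```

Sharing across accounts is a separate step (modifying the snapshot's create-volume permission) and is not covered by this sketch.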
Questions
Thank You.
This presentation will be loaded to SlideShare the week following the Symposium.
http://www.slideshare.net/AmazonWebServices