Automate Your Backups at Scale

AWS Government, Education, and Nonprofit Symposium, Washington, DC | June 25-26, 2015
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Jeff Gentile, The Common Application :: DevOps Engineer
Aaron Armstrong, The Common Application :: Director of Engineering



Page 2: Automate Your Backups at Scale

About the Organization

• Founded in 1975
• Not-for-profit
• 500+ colleges / universities in US & abroad
• 3.6 million application forms
• 14 million counselor & teacher recommendations
• 800,000 students

Page 3: Automate Your Backups at Scale

AWS at the Common App

• Run all prod and dev assets on AWS (nothing on-premises)
• Use most “core” services: Amazon EC2, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon VPC
• Some app-specific services: Amazon SWF, Amazon SQS
• Completing 2nd year of production operation in AWS
• Continually improving efficiencies, controls, and costs

Implement / Learn / Refine

Page 4: Automate Your Backups at Scale

Agenda

• The problem we faced
• DevOps Ecosystem
• Where we wanted to go as an organization / group
• What options were available
• The solution we implemented
• Where we are now
• How to leverage and other benefits
• Questions

Page 5: Automate Your Backups at Scale

The Problem We Faced…

• Long-lived (static) vs. scale-out servers
• Restorability
  – Quick return to operation
  – Retrieval of lost data
• Use case example: FTP
  – 100s of GB of data
  – Everyday use
  – Hundreds of users
  – Critical system

Page 6: Automate Your Backups at Scale

DevOps Ecosystem

[Diagram]
• Backup: EC2 Instance → Rundeck Job → Python Snapshot Script → Amazon EBS Snapshots
• Restore: Amazon EBS Snapshots → Rundeck Job → Python Restore Script → Restored EC2 Instance

Page 7: Automate Your Backups at Scale

Where We Wanted to Go…

• Automated schedule
• Rapid restore to normal operation
• Manual and scripted restore options
• Add / remove servers over time
• Monitoring and oversight of backup process

Page 8: Automate Your Backups at Scale

What Options Were Available?

• Third-Party Software (e.g., Skeddly)
  – Quick implementation
• Homegrown Implementation
  – Growing expertise in Python and Boto
  – Didn’t want another system to manage / learn
  – Tailored for specific needs
  – No net new costs

Page 9: Automate Your Backups at Scale

The Solution We Implemented…

• Snapshots using the Python SDK, Boto
• Instance and snapshot tagging
• Snapshot removal based on tag
• Push button restore process
  – More soon…
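The tag-driven snapshot step could look roughly like the sketch below. This is an illustration under stated assumptions, not the presenters' script: it uses boto3 (the current successor to Boto) call names, and the instance tag `Backup=true` and the snapshot tags `SourceInstance`, `DeviceName`, and `CreatedBy` are hypothetical. The EC2 client is passed in so the function can be exercised without AWS access.

```python
from datetime import datetime, timezone


def snapshot_tagged_instances(ec2, tag_key="Backup", tag_value="true"):
    """Snapshot every EBS volume attached to instances carrying the backup tag.

    `ec2` is an EC2 client such as boto3.client("ec2"). The tag key and value
    are hypothetical; substitute whatever convention your instances follow.
    """
    created = []
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    response = ec2.describe_instances(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            for mapping in instance.get("BlockDeviceMappings", []):
                if "Ebs" not in mapping:
                    continue  # skip mappings that have no EBS volume
                snapshot = ec2.create_snapshot(
                    VolumeId=mapping["Ebs"]["VolumeId"],
                    Description=f"Automated backup of {instance_id} at {stamp}",
                    # Hypothetical tag scheme, applied at creation time.
                    TagSpecifications=[{
                        "ResourceType": "snapshot",
                        "Tags": [
                            {"Key": "SourceInstance", "Value": instance_id},
                            {"Key": "DeviceName", "Value": mapping["DeviceName"]},
                            {"Key": "CreatedBy", "Value": "backup-script"},
                        ],
                    }],
                )
                created.append(snapshot["SnapshotId"])
    return created
```

A scheduled job (for example in Rundeck) could call `snapshot_tagged_instances(boto3.client("ec2"))`. A production script would also need to paginate `describe_instances` for larger fleets.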

Page 10: Automate Your Backups at Scale

Solution continued… (Backup)

• Instance Tags
• Snapshot Tags
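The slides do not show the actual tag names, so the following is only a sketch of how snapshot tags can drive removal. It assumes a hypothetical `DeleteOn` tag holding an ISO date, a hypothetical 14-day default retention, and boto3-style client calls (`describe_snapshots`, `delete_snapshot`).

```python
from datetime import date, timedelta


def delete_on_tag(created, retention_days=14):
    """Build the hypothetical DeleteOn tag for a snapshot created on `created`."""
    expires = created + timedelta(days=retention_days)
    return {"Key": "DeleteOn", "Value": expires.isoformat()}


def expired_snapshot_ids(snapshots, today):
    """Return IDs of snapshots whose DeleteOn date is earlier than `today`.

    `snapshots` has the shape of the "Snapshots" list returned by
    describe_snapshots; snapshots without the tag are never selected.
    """
    expired = []
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        delete_on = tags.get("DeleteOn")
        if delete_on and date.fromisoformat(delete_on) < today:
            expired.append(snap["SnapshotId"])
    return expired


def prune_snapshots(ec2, today=None):
    """Delete this account's expired snapshots and return the deleted IDs."""
    today = today or date.today()
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag-key", "Values": ["DeleteOn"]}],
    )
    doomed = expired_snapshot_ids(response["Snapshots"], today)
    for snapshot_id in doomed:
        ec2.delete_snapshot(SnapshotId=snapshot_id)
    return doomed
```

Keeping the selection logic in a pure function (`expired_snapshot_ids`) makes the retention rule easy to test before any snapshot is actually deleted.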

Page 11: Automate Your Backups at Scale

Solution continued… (Restore)

• Rundeck UI

Page 12: Automate Your Backups at Scale

Solution continued… (Restore)

• How long will it take to restore?
  – 15:00?
  – 10:00?
  – 5:00?
  – 1:35
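The restore script itself is not shown in the slides. As a rough sketch of one core step, the code below picks the newest completed snapshot for an instance and creates a volume from it. The `SourceInstance` tag is a hypothetical convention, and the calls are boto3-style (`describe_snapshots`, `create_volume`); attaching the volume or launching a replacement instance is left out.

```python
def latest_snapshot(snapshots):
    """Return the most recent completed snapshot, or None if there is none."""
    completed = [s for s in snapshots if s.get("State") == "completed"]
    return max(completed, key=lambda s: s["StartTime"], default=None)


def restore_volume(ec2, instance_id, availability_zone):
    """Create a new EBS volume from the newest snapshot of `instance_id`.

    Assumes snapshots carry a hypothetical SourceInstance tag naming the
    instance they were taken from.
    """
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag:SourceInstance", "Values": [instance_id]}],
    )
    snapshot = latest_snapshot(response["Snapshots"])
    if snapshot is None:
        raise LookupError(f"no completed snapshot found for {instance_id}")
    volume = ec2.create_volume(
        SnapshotId=snapshot["SnapshotId"],
        AvailabilityZone=availability_zone,
    )
    return volume["VolumeId"]
```

The volume must be created in the same Availability Zone as the instance it will be attached to, which is why the zone is an explicit parameter.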

Page 13: Automate Your Backups at Scale

Inspect What We Expect…

• Weekly reporting via Rundeck showing snapshot status
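What the weekly report contains is not described, so this is one possible shape for it: a per-instance summary that flags instances whose newest snapshot is older than a threshold. The `SourceInstance` tag and the 7-day default threshold are assumptions for illustration; the input has the shape of the `Snapshots` list returned by a boto3 `describe_snapshots` call.

```python
from datetime import timedelta


def snapshot_report(snapshots, now, max_age=timedelta(days=7)):
    """Summarise snapshots per source instance and flag stale backups.

    Groups by a hypothetical SourceInstance tag; snapshots without it are
    reported under "untagged". Returns one text line per group.
    """
    counts = {}
    newest = {}
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        source = tags.get("SourceInstance", "untagged")
        counts[source] = counts.get(source, 0) + 1
        if source not in newest or snap["StartTime"] > newest[source]:
            newest[source] = snap["StartTime"]

    lines = []
    for source in sorted(counts):
        status = "OK" if now - newest[source] <= max_age else "STALE"
        lines.append(
            f"{source}: {counts[source]} snapshot(s), "
            f"newest {newest[source]:%Y-%m-%d %H:%M}, {status}"
        )
    return lines
```

A scheduled job could print or email these lines; a "STALE" entry means no snapshot newer than `max_age` exists for that instance.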

Page 14: Automate Your Backups at Scale

Where We Are…

• Automated schedule
• Rapid restore to normal operation
• Manual and scripted restore options
• Add / remove servers over time
• Monitoring and oversight of backup process

Page 15: Automate Your Backups at Scale

How to Leverage…

• Figure out what your backup / restore strategy should be
  – Live snapshots
  – Stop instance, then snapshot
  – No snapshot, native application backups
• Determine which instances need regular backups
  – Create backup schedules for those instances
• Document the process!
  – Have a fresh set of eyes follow the steps for verification
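For the "stop instance, then snapshot" strategy, a minimal sketch might look like the following. It assumes boto3-style calls (`stop_instances`, the `instance_stopped` waiter, `create_snapshot`, `start_instances`) and handles a single volume; it is an illustration, not the presenters' code.

```python
def snapshot_while_stopped(ec2, instance_id, volume_id):
    """Stop the instance, snapshot one volume, then start the instance again."""
    ec2.stop_instances(InstanceIds=[instance_id])
    # Block until EC2 reports the instance as stopped.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    try:
        snapshot = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"Offline backup of {instance_id}",
        )
    finally:
        # An EBS snapshot captures the volume as of the moment it is
        # initiated, so the instance can restart while it completes.
        ec2.start_instances(InstanceIds=[instance_id])
    return snapshot["SnapshotId"]
```

The trade-off against live snapshots is downtime in exchange for a volume that is guaranteed to be quiescent when the snapshot begins.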

Page 16: Automate Your Backups at Scale

Other Benefits

• Copy snapshots / AMIs across regions / accounts
  – Load Testing
  – Disaster Recovery
• Incremental backup
  – First snapshot takes the longest
  – Subsequent backups have lower storage overhead (i.e., cost efficiency)
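Copying a snapshot to another region or sharing it with another account can be sketched as below. The function names are illustrative, and the calls assume boto3's EC2 client (`copy_snapshot`, `modify_snapshot_attribute`); note that the copy is requested from a client bound to the destination region.

```python
def copy_snapshot_to_region(dest_ec2, source_region, snapshot_id):
    """Start copying `snapshot_id` from `source_region` into the destination.

    `dest_ec2` must be a client created for the destination region, for
    example boto3.client("ec2", region_name="us-west-2").
    Returns the ID of the new snapshot in the destination region.
    """
    response = dest_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"Copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]


def share_snapshot_with_account(ec2, snapshot_id, account_id):
    """Allow another AWS account to create volumes from the snapshot."""
    ec2.modify_snapshot_attribute(
        SnapshotId=snapshot_id,
        Attribute="createVolumePermission",
        OperationType="add",
        UserIds=[account_id],
    )
```

The copy runs asynchronously, so a disaster-recovery job would typically check the new snapshot's state before relying on it.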

Page 17: Automate Your Backups at Scale

Questions

Page 18: Automate Your Backups at Scale

Thank You.

This presentation will be loaded to SlideShare the week following the Symposium.

http://www.slideshare.net/AmazonWebServices
