Automate Your Backups at Scale

Post on 14-Aug-2015

AWS Government, Education, and Nonprofit Symposium Washington, DC | June 25-26, 2015

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Automate Your Backups at Scale

Jeff Gentile, The Common Application :: DevOps Engineer

Aaron Armstrong, The Common Application :: Director of Engineering

About the Organization

• Founded in 1975
• Not-for-profit
• 500+ colleges / universities in US & abroad
• 3.6 million application forms
• 14 million counselor & teacher recommendations
• 800,000 students

AWS at the Common App

• Run all prod and dev assets on AWS (nothing on-premises)
• Use most “core” services: Amazon EC2, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon VPC
• Some app-specific services: Amazon SWF, Amazon SQS
• Completing 2nd year of production operation in AWS
• Continually improving efficiencies, controls, and costs

Implement Learn Refine

Agenda

• The problem we faced
• DevOps ecosystem
• Where we wanted to go as an organization / group
• What options were available
• The solution we implemented
• Where we are now
• How to leverage and other benefits
• Questions

The Problem We Faced…

• Long-lived (static) vs. scale-out servers
• Restorability
– Quick return to operation
– Retrieval of lost data
• Use case example: FTP
– Hundreds of GB of data
– Everyday use
– Hundreds of users
– Critical system

DevOps Ecosystem

[Diagram: EC2 Instance → Rundeck Job → Python Snapshot Script → Amazon EBS Snapshots → Rundeck Job → Python Restore Script → Restored EC2 Instance]

Where We Wanted to Go…

• Automated schedule
• Rapid restore to normal operation
• Manual and scripted restore options
• Add / remove servers over time
• Monitoring and oversight of backup process

What Options Were Available?

• Third-party software (e.g., Skeddly)
– Quick implementation
• Homegrown implementation
– Growing expertise in Python and Boto
– Didn’t want another system to manage / learn
– Tailored for specific needs
– No net new costs

The Solution We Implemented…

• Snapshots using the Python SDK, Boto
• Instance and snapshot tagging
• Snapshot removal based on tag
• Push-button restore process
– More soon…
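A minimal sketch of the snapshot-and-tag step, not the speakers' actual script. The deck names Boto; this sketch assumes a boto3-style EC2 client is passed in as `ec2`, and the tag names (`Backup`, `InstanceId`, `DeviceName`, `DeleteOn`) and the 7-day retention are hypothetical choices made for illustration. Pagination of `describe_instances` is left out for brevity.

```python
from datetime import datetime, timedelta, timezone


def snapshot_tagged_instances(ec2, tag_key="Backup", tag_value="true",
                              retention_days=7, now=None):
    """Snapshot every EBS volume attached to instances carrying the backup tag.

    `ec2` is assumed to behave like a boto3 EC2 client.
    Returns the IDs of the snapshots that were started.
    """
    now = now or datetime.now(timezone.utc)
    # Hypothetical convention: record the expiry date on the snapshot itself.
    delete_on = (now + timedelta(days=retention_days)).strftime("%Y-%m-%d")
    created = []

    # Instances opt in to backups through an instance tag.
    response = ec2.describe_instances(
        Filters=[{"Name": f"tag:{tag_key}", "Values": [tag_value]}]
    )
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                ebs = mapping.get("Ebs")
                if not ebs:
                    continue  # skip non-EBS devices
                snapshot = ec2.create_snapshot(
                    VolumeId=ebs["VolumeId"],
                    Description=(
                        f"Backup of {instance['InstanceId']} "
                        f"{mapping['DeviceName']}"
                    ),
                )
                # Tag the snapshot so later jobs can find and expire it.
                ec2.create_tags(
                    Resources=[snapshot["SnapshotId"]],
                    Tags=[
                        {"Key": "InstanceId", "Value": instance["InstanceId"]},
                        {"Key": "DeviceName", "Value": mapping["DeviceName"]},
                        {"Key": "DeleteOn", "Value": delete_on},
                    ],
                )
                created.append(snapshot["SnapshotId"])
    return created
```

In a scheduled job this would be called with something like `boto3.client("ec2")` as the `ec2` argument; taking the client as a parameter also lets the function be exercised against a stub.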

Solution continued… (Backup)

• Instance tags
• Snapshot tags
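One way tag-driven snapshot removal might look, offered as a sketch under assumptions rather than the deck's implementation. It assumes each snapshot carries a hypothetical `DeleteOn` tag holding an ISO date, and that `ec2` is a boto3-style EC2 client. Snapshots without the tag are never touched.

```python
from datetime import date


def expired_snapshot_ids(snapshots, today):
    """Return IDs of snapshots whose DeleteOn tag is on or before `today`."""
    expired = []
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        delete_on = tags.get("DeleteOn")
        if delete_on is None:
            continue  # not managed by this scheme, so leave it alone
        if date.fromisoformat(delete_on) <= today:
            expired.append(snap["SnapshotId"])
    return expired


def remove_expired_snapshots(ec2, today=None):
    """Delete this account's snapshots whose DeleteOn date has passed."""
    today = today or date.today()
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "tag-key", "Values": ["DeleteOn"]}],
    )
    doomed = expired_snapshot_ids(response["Snapshots"], today)
    for snapshot_id in doomed:
        ec2.delete_snapshot(SnapshotId=snapshot_id)
    return doomed
```

Keeping the date comparison in a separate pure function makes the retention rule easy to check without touching AWS.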

Solution continued… (Restore)

• Rundeck UI

Solution continued… (Restore)

• How long will it take to restore?
– 15:00?
– 10:00?
– 5:00?
– 1:35
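The deck does not show the restore script, so the following is only a sketch of what the core of a scripted restore could involve: pick the most recent completed snapshot for a given instance and device, then create a fresh volume from it. The `InstanceId` and `DeviceName` snapshot tags are hypothetical, and `ec2` is assumed to be a boto3-style EC2 client. Attaching the volume and launching the replacement instance are left out.

```python
def latest_completed_snapshot(snapshots):
    """Return the newest snapshot in the 'completed' state, or None."""
    completed = [s for s in snapshots if s["State"] == "completed"]
    if not completed:
        return None
    return max(completed, key=lambda s: s["StartTime"])


def restore_volume(ec2, instance_id, device_name, availability_zone):
    """Create a new EBS volume from the latest snapshot of one device.

    Returns the ID of the new volume.
    """
    response = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[
            {"Name": "tag:InstanceId", "Values": [instance_id]},
            {"Name": "tag:DeviceName", "Values": [device_name]},
        ],
    )
    snapshot = latest_completed_snapshot(response["Snapshots"])
    if snapshot is None:
        raise LookupError(
            f"no completed snapshot for {instance_id} {device_name}"
        )
    volume = ec2.create_volume(
        SnapshotId=snapshot["SnapshotId"],
        AvailabilityZone=availability_zone,
    )
    return volume["VolumeId"]
```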

Inspect What We Expect…

• Weekly reporting via Rundeck showing snapshot status
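A snapshot status report could be as simple as grouping snapshots by the instance they came from. The sketch below is illustrative only: the report format is invented, the `InstanceId` tag is a hypothetical convention, and the input is assumed to be a list of snapshot records shaped like the `Snapshots` entries a boto3 `describe_snapshots` call returns.

```python
from collections import defaultdict


def snapshot_status_report(snapshots):
    """Summarize snapshots per source instance as plain text lines."""
    by_instance = defaultdict(list)
    for snap in snapshots:
        tags = {t["Key"]: t["Value"] for t in snap.get("Tags", [])}
        by_instance[tags.get("InstanceId", "untagged")].append(snap)

    lines = []
    for instance_id in sorted(by_instance):
        snaps = by_instance[instance_id]
        completed = sum(1 for s in snaps if s["State"] == "completed")
        newest = max(s["StartTime"] for s in snaps)
        lines.append(
            f"{instance_id}: {len(snaps)} snapshots, "
            f"{completed} completed, newest {newest:%Y-%m-%d %H:%M}"
        )
    return "\n".join(lines)
```

A scheduled job could print this text so that it shows up in the job's output or gets mailed out.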

Where We Are…

• Automated schedule
• Rapid restore to normal operation
• Manual and scripted restore options
• Add / remove servers over time
• Monitoring and oversight of backup process

How to Leverage…

• Figure out what your backup / restore strategy should be
– Live snapshots
– Stop instance, then snapshot
– No snapshot, native application backups
• Determine which instances need regular backups
– Create backup schedules for those instances
• Document the process!
– Have a fresh set of eyes follow the steps for verification
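For the "stop instance, then snapshot" strategy, a sketch of the sequence might look like the following. It assumes a boto3-style EC2 client passed in as `ec2`; the function name and arguments are hypothetical. Because an EBS snapshot captures the volume as of the moment it is started, the instance is restarted as soon as the snapshots have been initiated rather than after they complete.

```python
def snapshot_while_stopped(ec2, instance_id, volume_ids):
    """Stop an instance, start snapshots of its volumes, then restart it.

    Returns the IDs of the snapshots that were started.
    """
    ec2.stop_instances(InstanceIds=[instance_id])
    # Block until the instance is fully stopped so no writes are in flight.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    snapshot_ids = []
    try:
        for volume_id in volume_ids:
            snapshot = ec2.create_snapshot(
                VolumeId=volume_id,
                Description=f"Offline backup of {instance_id}",
            )
            snapshot_ids.append(snapshot["SnapshotId"])
    finally:
        # Bring the instance back even if a snapshot request failed.
        ec2.start_instances(InstanceIds=[instance_id])
    return snapshot_ids
```

The trade-off against live snapshots is downtime for the duration of the stop and start, in exchange for a snapshot taken with nothing writing to the volume.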

Other Benefits

• Copy snapshots / AMIs across regions / accounts
– Load testing
– Disaster recovery
• Incremental backup
– First snapshot takes the longest
– Subsequent backups have lower storage overhead (i.e., cost efficiency)
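Copying a snapshot to another region for disaster recovery or load testing can be sketched as below. This assumes `dest_ec2` is a boto3-style EC2 client created for the destination region; the `SourceSnapshotId` and `Purpose` tags are hypothetical and are added so the copy can be traced back to its origin.

```python
def copy_snapshot_to_region(dest_ec2, source_region, snapshot_id,
                            purpose="disaster-recovery"):
    """Copy a snapshot into the region `dest_ec2` is bound to.

    Returns the ID of the new snapshot in the destination region.
    """
    response = dest_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"Copy of {snapshot_id} from {source_region} ({purpose})",
    )
    new_id = response["SnapshotId"]
    # Label the copy so its origin and purpose stay visible.
    dest_ec2.create_tags(
        Resources=[new_id],
        Tags=[
            {"Key": "SourceSnapshotId", "Value": snapshot_id},
            {"Key": "Purpose", "Value": purpose},
        ],
    )
    return new_id
```

The copy request is issued against the destination region, so the client would be built with something like `boto3.client("ec2", region_name="us-west-2")`, where the region name is only an example.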

Questions

Thank You.

This presentation will be loaded to SlideShare the week following the Symposium.

http://www.slideshare.net/AmazonWebServices
