Building a PaaS with Docker and AWS

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Empire: Building a PaaS with Docker and AWS

Upload: amazon-web-services

Post on 11-Aug-2015



Empire: Building a PaaS with Docker and AWS

Agenda

• A little background about why we decided to build an internal PaaS.
• Introduction to Empire.
• How we’re leveraging Amazon EC2 Container Service (ECS) as the backend.
• Demo
• Q&A

Who am I

• Eric Holmes
• Infrastructure Engineer at Remind
• I like building things for other developers
• Work mostly with Go and Ruby
• You can find my open source stuff at https://github.com/ejholmes

What’s Remind?

• Remind is a messaging platform for teachers, students and parents.
• Chat/Announcements/Files
• ~25 million users. ~350,000 new users per day during back-to-school (BTS)
• ~5 million messages per day.
• ~50 employees. ~30 engineers.

Architecture

Started as a monorail

We started growing...

Broke apart the monolith

• Sidekiq queues were IO bound and constantly backed up during BTS
• Message delivery workers were tightly coupled to the rest of the application. Difficult to scale out horizontally
• Database would need to be sharded
• Started breaking the monolith apart into loosely coupled services.
• Now have ~50 production services

Heroku

• Entirely hosted on Heroku
• Heroku has been awesome; never needed an ops team.
• Allowed us to focus on building product.

But we ran into issues...

• “Internal” micro-services need to be exposed publicly.
• Databases need to be opened up to all traffic.
• Little visibility into performance of hosts.
• No control over the routing layer.

What do we want?

• Want to use AWS services.
• Want to maintain operational simplicity.
• Support 12 factor apps. http://12factor.net/
• Maintain shared patterns for deployment. Faster iteration and build + release cycles
• No ops.
• Decrease our surface area and only expose a single app publicly.
• Robust and resilient to failure. Self-healing.
• If we can, continue to use containers as a unit of deployment.

Why containers?

• Fast to build*
• Let us isolate dependencies as a portable, easy-to-distribute package.
• Allow us to create better development environments with more dev/prod parity.
• Limit the number of moving parts when we deploy.
• Better resource utilization and cost management

We’re not the first company to want a PaaS

• Netflix - Asgard
• SoundCloud - Bazooka
• Every other company in our investor’s portfolio...

Something we can re-use?

• Flynn
  – Alpha
  – Undergoing many architectural changes
  – Custom load balancer
• Deis
  – More than it needed to be
  – Nobody using it successfully in production (that we knew of)

Empire was born

• Initially started as a management layer on top of CoreOS + fleet.

• Load balancing via nginx configured through confd + etcd.

• Unit of deployment was Docker containers
• Implemented a subset of the Heroku API

Therein lies the rub...

• Fleet initially worked well, until we started testing failure modes.

• Fleet had a lot of bugs
• etcd was fragile
• We needed resilience and stability
• We didn’t want to run and operate our own clustering.

Amazon EC2 Container Service (ECS) becomes GA

• Amazon ECS became GA while we were looking for an alternative scheduler.

• Looked promising to serve as the scheduling backend.

What is Amazon ECS?

• Pools hosts together as a single compute resource.

• Provides a set of APIs for placing tasks on machines

• Scheduler supports “services” for scaling tasks horizontally and maintaining desired state.

• Services integrate with ELB for connection draining, zero downtime, and healthchecks.
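The "desired state" idea behind services can be illustrated with a tiny reconciliation step. This is a minimal sketch of the general concept, not how the Amazon ECS scheduler is actually implemented; the function name `reconcile` and its return shape are made up for illustration.

```python
def reconcile(desired_count, running_task_ids):
    """Compute the actions that move a service toward its desired task count.

    Hypothetical helper for illustration only; not part of any AWS API.
    """
    if desired_count < 0:
        raise ValueError("desired_count must be non-negative")
    running = list(running_task_ids)
    missing = desired_count - len(running)
    if missing > 0:
        # Too few tasks are running: start the missing ones.
        return {"start": missing, "stop": []}
    # Enough or too many tasks: stop any surplus beyond the desired count.
    return {"start": 0, "stop": running[desired_count:]}
```

For example, `reconcile(3, ["a"])` asks for two more tasks to be started, while `reconcile(1, ["a", "b", "c"])` asks for `"b"` and `"c"` to be stopped. A scheduler would run a step like this repeatedly, which is what makes a failed task get replaced without manual intervention.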

Amazon ECS Components

• Container Instance
• Amazon ECS Agent
• Amazon ECS Scheduler

Amazon ECS Resources

• Task Definition
• Service
• Task
• Cluster
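To make the Task Definition resource more concrete, here is a rough sketch of the JSON shape of a single-container task definition. The field names (`family`, `containerDefinitions`, `name`, `image`, `cpu`, `memory`, `command`, `environment`) follow the Amazon ECS task definition format; the helper `task_definition`, its defaults, and the example values are assumptions made for illustration and are not taken from the talk.

```python
import json


def task_definition(family, image, command, env, cpu=256, memory=512):
    """Build a minimal single-container task definition as a plain dict.

    Hypothetical helper; the default cpu and memory values are arbitrary.
    """
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "cpu": cpu,          # CPU units reserved for the container
                "memory": memory,    # memory limit in MiB
                "command": list(command),
                # Environment variables are a list of name/value pairs.
                "environment": [
                    {"name": key, "value": value}
                    for key, value in sorted(env.items())
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Example values are placeholders.
    td = task_definition("web", "example/app:latest", ["./bin/web"], {"PORT": "8080"})
    print(json.dumps(td, indent=2))
```

A Service then refers to a task definition and a desired count, a Task is one running copy of the definition, and a Cluster is the pool of container instances the tasks are placed on.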

Amazon ECS for Empire

• Solid set of primitives to serve as the scheduling backend

• Managed service
• Failure modes behaved as we expected them to
• ELB integration allowed us to remove custom routing layer
• Service discovery via DNS

What is Empire?

• Open source internal PaaS for micro-services
• A layer of usability on top of Amazon ECS for 12 factor apps
• Single binary. Minimal deps. Easy to run.
• Provides an API and CLI to create apps, deploy Docker images, update configuration, run one-off tasks, etc.

• Allows you to use Procfiles to build multiple Amazon ECS services
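The Procfile idea can be sketched as follows: each `name: command` line describes one process type, and each process type could become its own Amazon ECS service. This is a minimal sketch under stated assumptions, not Empire's actual code; the functions `parse_procfile` and `service_names` and the `<app>-<process>` naming scheme are hypothetical.

```python
def parse_procfile(text):
    """Parse Heroku-style 'name: command' lines into a dict of process types."""
    processes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only, so commands may contain colons.
        name, sep, command = line.partition(":")
        if not sep or not name.strip() or not command.strip():
            raise ValueError(f"invalid Procfile line: {raw!r}")
        processes[name.strip()] = command.strip()
    return processes


def service_names(app, processes):
    """Map each process type to a service name (hypothetical naming scheme)."""
    return {proc: f"{app}-{proc}" for proc in processes}


if __name__ == "__main__":
    procfile = "web: ./bin/web\nworker: bundle exec sidekiq -q default\n"
    procs = parse_procfile(procfile)
    print(service_names("acme", procs))
```

With a Procfile like the one above, an app with a `web` and a `worker` process would end up as two independently scalable services.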

Is it ready for production?

• Running ~15 production services within Amazon ECS managed via Empire for a little over a month

• Empire is hands off after you’ve deployed. AWS services take over

• Moving directly onto EC2 showed huge performance improvements for services

Demo

What does Empire not do?

• Bring your own logging and metrics (soon?)
• It doesn’t handle building your Docker images
• Doesn’t handle the creation of attached resources like Databases

Things to keep an eye on

• http://www.convox.com/

Thank you

• GH: @ejholmes
• Twitter: @vesirin
• https://github.com/remind101/empire
• https://github.com/ejholmes/empire-demo
• http://12factor.net/