Pie on AWS
About me
• Kuan-Yen Heng (Chris)
• Software Engineer at Pie (pie.co)
• Primarily backend and DevOps
• [email protected]
• https://github.com/gigablah
• @gigablah
About Pie
• Chat for work
• Multi device sync (web, iOS, Android)
• Rich media integration
• We build Pie using Pie!
Requirements
• Realtime websocket messaging
• Horizontal scalability with autoscaling
• Load balancing across availability zones
• Job queue / background worker system
• Rapid develop-build-test-deploy cycle
• Zero downtime deployment with rollback
Infrastructure as code
• We use Terraform to define and manage our staging and production clusters
• AWS resources (VPC, Security Groups, Launch Configurations, Autoscaling Groups, Instances, Load Balancers) configured in HCL
• Version control for your infrastructure
• Separate planning and execution phases
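As a sketch of what this looks like in practice (resource names, AMI ID, sizes and zones below are placeholders, not Pie's actual configuration), a launch configuration and autoscaling group spanning availability zones can be defined in HCL like so:

```hcl
# Hypothetical sketch: names, AMI and instance sizes are placeholders
resource "aws_launch_configuration" "api" {
  name_prefix     = "api-"
  image_id        = "ami-xxxxxxxx"        # e.g. a CoreOS AMI
  instance_type   = "m3.medium"
  security_groups = ["${aws_security_group.api.id}"]
  user_data       = "${file("cloud-config.yml")}"
}

resource "aws_autoscaling_group" "api" {
  name                 = "api"
  launch_configuration = "${aws_launch_configuration.api.name}"
  availability_zones   = ["us-east-1a", "us-east-1b", "us-east-1c"]
  min_size             = 3
  max_size             = 9
  load_balancers       = ["${aws_elb.api.name}"]
}
```

Running `terraform plan` shows the pending changes without touching AWS; `terraform apply` executes them, which is the separation of planning and execution phases mentioned above.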
Cons
• Terraform is not yet mature
• Not all AWS resources and parameters supported
• Currently not possible to port in existing infrastructure
• Considering AWS CloudFormation
Docker workflow
[Diagram: Docker workflow]

Dockerfile → docker build / docker tag → image → docker run → container → docker commit → image
image → docker push → registry → docker pull → image

FROM debian:wheezy
MAINTAINER blah <[email protected]>
RUN apt-get update && apt-get install -y rabbitmq-server
EXPOSE 5672 15672
ENTRYPOINT ["/bin/bash", "-c"]
CMD ["/usr/sbin/rabbitmq-server"]
Why containers?
• Lightweight, fast startup compared to VMs
• Repeatable, consistent builds
• Dependency isolation
• Pristine host OS; only Docker installed
• Homogenous hosts, easier management
• “Servers as cattle”
Scheduling units
• Basically writing systemd units
• Fleet specific metadata [X-Fleet]
• Schedule global units, specify constraints and dependencies, restart policies
• Deploy units based on machine fleet metadata, e.g. role=api and role=worker
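A minimal fleet unit along these lines might look as follows; the unit name, registry URL and image name are illustrative, not Pie's actual files. It is an ordinary systemd service plus the fleet-specific [X-Fleet] section:

```ini
# [email protected] -- illustrative sketch
[Unit]
Description=Pie API (%i)
After=docker.service
Requires=docker.service

[Service]
TimeoutStartSec=0
# "-" prefix: ignore failure if no old container exists
ExecStartPre=-/usr/bin/docker kill api-%i
ExecStartPre=-/usr/bin/docker rm api-%i
ExecStartPre=/usr/bin/docker pull registry.example.com/api:%i
ExecStart=/usr/bin/docker run --name api-%i registry.example.com/api:%i
ExecStop=/usr/bin/docker stop api-%i

[X-Fleet]
# Run one instance on every machine tagged role=api
Global=true
MachineMetadata=role=api
```

Setting Global=true schedules the unit on every machine matching the metadata constraint, which is how one unit file ends up running across all three availability zones in the listing below.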
> fleetctl list-units
UNIT                              MACHINE                  ACTIVE  SUB
api-discovery@master_123.service  75e1c8bd.../10.0.10.xxx  active  running
api-discovery@master_123.service  f54a4d78.../10.0.11.xxx  active  running
api-discovery@master_123.service  320af1d0.../10.0.12.xxx  active  running
api-proxy.service                 75e1c8bd.../10.0.10.xxx  active  running
api-proxy.service                 f54a4d78.../10.0.11.xxx  active  running
api-proxy.service                 320af1d0.../10.0.12.xxx  active  running
api@master_123.service            75e1c8bd.../10.0.10.xxx  active  running
api@master_123.service            f54a4d78.../10.0.11.xxx  active  running
api@master_123.service            320af1d0.../10.0.12.xxx  active  running
logspout.service                  75e1c8bd.../10.0.10.xxx  active  running
logspout.service                  17291bf6.../10.0.11.xxx  active  running
logspout.service                  320af1d0.../10.0.12.xxx  active  running
logspout.service                  e1c8ca4c.../10.0.10.xxx  active  running
logspout.service                  f54a4d78.../10.0.11.xxx  active  running
logspout.service                  d28b5a20.../10.0.12.xxx  active  running
logspout.service                  db206400.../10.0.10.xxx  active  running
rabbitmq.service                  e1c8ca4c.../10.0.10.xxx  active  running
rabbitmq.service                  17291bf6.../10.0.11.xxx  active  running
rabbitmq.service                  d28b5a20.../10.0.12.xxx  active  running
[email protected]            e1c8ca4c.../10.0.10.xxx  active  running
[email protected]            17291bf6.../10.0.11.xxx  active  running
[email protected]            d28b5a20.../10.0.12.xxx  active  running
[Diagram: deployment flow]

“hubot deploy api:master” → hubot on the bastion host pulls the deploy container from the registry → cluster machines (docker / fleet / etcd) pull the api container
monitoring / metrics container
• etsy/statsd
• datadog/docker-dd-agent
• scoutapp/docker-scout
• logentries/docker-logentries
Amazon CloudWatch
Amazon ECS
• CoreOS: too many moving parts?
• fleet and etcd still evolving
• Problems with btrfs
• Studying a move to Amazon Linux with ECS
ECS parallels
• The ECS agent container takes the place of fleetd
• Cluster and task management through the AWS CLI
• Task definitions in JSON
• ECS handles container lifecycle; in fleet unit files you still have to manage your containers
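For comparison with the fleet unit above, an ECS task definition is plain JSON; the family, image and port values here are made up for illustration:

```json
{
  "family": "api",
  "containerDefinitions": [
    {
      "name": "api",
      "image": "registry.example.com/api:master_123",
      "cpu": 512,
      "memory": 512,
      "essential": true,
      "portMappings": [
        { "containerPort": 8080, "hostPort": 8080 }
      ]
    }
  ]
}
```

It would be registered with `aws ecs register-task-definition --cli-input-json file://api.json` and started with `aws ecs run-task`; unlike the fleet unit, there are no docker pull/run/stop commands to write, since the ECS agent manages the container lifecycle itself.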