AWS re:Invent 2016: Development Workflow with Docker and Amazon ECS (CON302)


© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Development Workflows with Docker and Amazon ECS

Jon Todd, Chief Architect, Okta

Tim Secor, Manager of Developer Productivity, Okta

Danielle Greshock, Manager, Solutions Architecture, AWS

CON302

December 1, 2016

What to Expect from the Session

• Review the CI/CD Pipeline

• How would you use containers with CI/CD?

• Okta Engineering: How they work and ship code

• CI with Docker and ECS

The Continuous Everything… Nirvana

Goal | Design | Develop | Deploy | Test | Run and monitor

Continuous integration

Continuous delivery

Continuous deployment

Continuous feedback

Virtual machine Container

Why Use Containers for Continuous Delivery?

• Roll out features as quickly as possible

• Predictable and reproducible environment

• They are immutable! They will run the same in every environment

• Fast feedback

The Lifecycle:

Stage 1 – Source

Docker and Docker Toolbox

• Docker (Linux kernel 3.10 or later)

• Docker Toolbox or Docker Beta (OS X, Windows)

• Define app environment with Dockerfile

Dockerfile

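# Base image pinned to a specific Ruby version
FROM ruby:2.2.2
# Build tools and PostgreSQL headers needed by native gems
RUN apt-get update -qq && apt-get install -y build-essential libpq-dev
RUN mkdir -p /opt/web
# Install gems first so this layer stays cached until the Gemfile changes
WORKDIR /tmp
ADD Gemfile /tmp/
ADD Gemfile.lock /tmp/
RUN bundle install
# Add the application code last
ADD . /opt/web
WORKDIR /opt/web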

Docker Compose

Define and run multi-container applications:

1. Define app environment with Dockerfile

2. Define services that make up your app in docker-compose.yml

3. Run docker-compose up to start and run entire app

The Lifecycle:

Stage 2 – Build

Containers as Build Execution Environment

Containers as Build Artifacts

Amazon EC2 Container Registry

• Security

• IAM resource-based policies

• CloudTrail audit logs

• Images encrypted in transit and at rest

• Easily manage & deploy images

• Tight integration with ECS

• Integration with Docker toolset

• AWS Management Console & AWS CLI

• Reliability & performance

• S3-backed

The Lifecycle:

Stage 3 – Test

Running Tests Inside a Container

Usual Docker commands available within your test environment

Run the container with the commands necessary to execute your tests, e.g.:

docker run web bundle exec rake test
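As a rough sketch of how a CI step might wrap the command above, the following Python builds the `docker run` argument list and reports success from the container's exit status. The helper names `build_test_command` and `run_tests` are hypothetical, and the `--rm` flag and the environment variable handling are additions for illustration; only the `web` image name and the rake command come from the slide.

```python
import subprocess


def build_test_command(image, test_cmd, env=None):
    """Return the argv list that runs test_cmd in a throwaway container."""
    argv = ["docker", "run", "--rm"]  # --rm removes the container afterwards
    for key, value in sorted((env or {}).items()):
        argv += ["-e", f"{key}={value}"]  # pass environment variables through
    argv.append(image)
    argv += list(test_cmd)
    return argv


def run_tests(image, test_cmd, env=None):
    """Run the tests; True when the container exits with status 0."""
    result = subprocess.run(build_test_command(image, test_cmd, env))
    return result.returncode == 0
```

Calling `run_tests("web", ["bundle", "exec", "rake", "test"])` would execute the same command as the slide, assuming Docker is installed and the `web` image has been built.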

Running Tests Against a Container

Start a container running in detached mode with an exposed port serving your app

Run browser tests or other black box tests against the container, e.g., headless browser tests
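Black box tests against a detached container usually need to wait until the app is accepting connections. A minimal sketch of that wait, assuming the app is reachable over TCP on a published port; `wait_for_port` is a hypothetical helper, not something named in the talk.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not ready yet, try again shortly
    return False
```

For example, after starting the container with something like `docker run -d -p 8080:80 web` (the port numbers are assumed), a test runner could call `wait_for_port("localhost", 8080)` before launching the headless browser tests.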

The Lifecycle:

Stage 4 – Deploy

Amazon EC2 Container Service

• Highly scalable container management service

• Easily manage clusters for any scale

• Flexible container placement

• Integrated with other AWS services

• Extensible

• ECS concepts

• Cluster and container instances

• Task definition and task

AWS Elastic Beanstalk

• Deploy and manage applications without worrying about the infrastructure

• Elastic Beanstalk manages your database, Elastic Load Balancing, ECS cluster, monitoring, and logging

• Docker support

• Single container (on EC2)

• Multi container (on ECS)

Amazon ECS CLI

• Easily create ECS clusters & supporting resources such as EC2 instances

• Run Docker Compose configuration files on ECS

• Available today – http://amzn.to/1jBf45a

Continuous Delivery Workflows

Continuous Delivery To ECS with Jenkins

1. Code push triggers build

2. Build image from sources

3. Run test on image

4. Push image to Docker registry

5. Update service

6. Pull image

Continuous Delivery To ECS with Jenkins

Easy deployment

Developers – Merge into master, done!

Jenkins build steps

Trigger via webhooks, monitoring, Lambda

Build Docker image via Build and Publish plugin

Push Docker image into registry

Register updated job with ECS API
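The last build step amounts to producing a new task definition revision that points at the freshly pushed image. A minimal sketch of that transformation on a plain dictionary follows; `with_new_image` is a hypothetical helper, and the actual registration (the ECS RegisterTaskDefinition and UpdateService API actions, called through whichever SDK or plugin the build uses) is deliberately left out.

```python
import copy


def with_new_image(task_definition, container_name, image):
    """Return a copy of task_definition whose named container uses image."""
    updated = copy.deepcopy(task_definition)
    matches = [c for c in updated["containerDefinitions"]
               if c.get("name") == container_name]
    if not matches:
        raise KeyError(f"no container named {container_name!r}")
    for container in matches:
        container["image"] = image
    # Drop fields that describe an existing revision rather than a new one.
    for read_only in ("taskDefinitionArn", "revision", "status",
                      "requiresAttributes"):
        updated.pop(read_only, None)
    return updated
```

Keeping this step a pure function makes it easy to test in the build itself before anything is sent to ECS.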

Continuous Delivery To ECS with CodePipeline

1. Code push triggers pipeline

2. Lambda function creates EC2 instance

3. Image is built and pushed to ECR

4. Lambda function terminates EC2 instance

5. Lambda function deploys new task revision to ECS

Continuous Delivery To ECS with CodePipeline

• Lambda custom actions

• Create and terminate EC2 instance

• Update ECS service

• EC2 instance uses user data to build an image and push it to ECR

Continuous Delivery To ECS with Shippable

About Okta

Millions of People Use Okta Every Day

An identity platform for developers

1. Connect to any data source

2. Customizable login w/ MFA

3. Support all application types w/ modern identity standards

Learn more at: developer.okta.com

© Okta and/or its affiliates. All rights reserved.

The case for ECS & Docker

The problem

Inspired by: http://dev2ops.org/2010/02/what-is-devops/

Dev | Wall of turmoil | Ops

Dev: I want change

Ops: I want stability

Domain boundary

Container frameworks | Cluster scheduler

Dev | Ops

Continuous integration


Options

Container frameworks: LXC

Cluster schedulers: Amazon ECS

Okta’s CI with ECS

Okta Engineering

Okta Engineering: How Do We Work, How Do We Ship Our Code?

• 200 engineers, split into teams with embedded specialists

• 1-week sprints, with weekly deploys to production

• Capability to do more than one hotfix per day at customers’ request or for bugs found in CI or pre-prod

• Every merge to master is a potential release candidate

Okta Engineering: How Do We Test Our Code?

• Every topic branch goes through the same amount of rigor in testing as release candidates.

• Passing automated tests is enforced at commit time.

• Largest repo: 33K tests, takes 60 minutes (22 parallel runs)

• Smallest repo: 100 tests, 5 minutes

• The Developer Productivity team is responsible for supporting engineering.
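To put the largest-repo numbers in perspective: 33,000 tests over 22 parallel runs is about 1,500 tests per run. The talk does not say how tests are divided among runs; the sketch below assumes a simple round-robin split, and `shard` is a hypothetical helper name.

```python
def shard(tests, runs):
    """Split tests into `runs` buckets whose sizes differ by at most one."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Bucket i takes every runs-th test starting at index i.
    return [tests[i::runs] for i in range(runs)]


tests = [f"test_{n}" for n in range(33_000)]  # placeholder test names
buckets = shard(tests, 22)
sizes = [len(bucket) for bucket in buckets]
```

A real splitter would more likely balance by historical test duration than by count, since the 60-minute wall time is set by the slowest run.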

Challenge of Developer Productivity Team

• Developer experience: developers expect fast turnaround time and reliable results

• Quality: we need to run all the tests required to guarantee quality

• Cost: we need to run an infrastructure which is as cost-effective as possible

• Cloud first: we aim to use cloud services first, wherever possible

Problems

CI Using Open Source, Monolithic Applications

Vision

• Clean testing environments: isolate test environments from others, parallel and serial runs

• Dynamic worker scaling: workers should survive the loss of their build server; worker pool should scale quickly; number of workers should not affect memory footprint of build server

• Spot Instances for cost: run our services for cheaper rates, as we have many short-lived tasks and could certainly handle a few failures

• Versioned testing: enable testing of infrastructure changes in topic branches

• Improved queuing system: should survive build server reboots; shouldn’t be tied to specific workers or build servers; centralized; should have good visibility; re-queuing of lost tasks

• Less infrastructure flakiness: push testing and creation of test machines to developers

• The correct privileges, to maintain security: launch tasks in secure environments
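One common way to get the "re-queuing of lost tasks" property from the queuing goals above is a visibility timeout, the mechanism SQS offers: a received job is hidden for a while and reappears if the worker never acknowledges it. The talk does not describe how Okta implemented this, so the in-memory class below is only an illustration of the idea, with hypothetical names throughout.

```python
class RequeueingQueue:
    """FIFO queue that hands a job out again if it is not acknowledged in time."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._ready = []       # jobs waiting for a worker
        self._in_flight = {}   # job -> time at which it becomes visible again

    def put(self, job):
        self._ready.append(job)

    def receive(self, now):
        # Jobs whose worker went silent (e.g. a lost Spot Instance) come back.
        for job, deadline in list(self._in_flight.items()):
            if deadline <= now:
                del self._in_flight[job]
                self._ready.append(job)
        if not self._ready:
            return None
        job = self._ready.pop(0)
        self._in_flight[job] = now + self.visibility_timeout
        return job

    def ack(self, job):
        # The worker finished; the job must not be delivered again.
        self._in_flight.pop(job, None)
```

Because the queue, not the build server, tracks in-flight work, jobs survive the loss of any single worker or server, which matches the other queuing goals in the list.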

Solutions

Custom Reporting

ECS and Docker

• AWS + Java app tailored to Okta process

• Immutable and disposable build workers: created for one-time use, destroyed when job is done

• Near ZERO cost on weekends, scales with load

• ECS allows us to maximize usage of EC2 instances

• Same containers for multiple types and numbers of builds

• Same AMI can run multiple Docker images

Amazon ECS

IAM separation per service

• Either one service per cluster, or use the new IAM roles for ECS tasks functionality

Sharing the Docker daemon to allow running Docker within Docker

Pre-fetching large data blobs and making them available on the hosts is an option

Multiple containers: MySQL, Redis, kinesalite
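Sharing the Docker daemon, as mentioned above, is typically done by mounting the host's Docker socket (and here also the Docker binary) into the build container, so containers it starts run as siblings on the host. The sketch below builds the matching `volumes` and `mountPoints` entries and checks that every mount refers to a declared volume. The volume names and host paths follow the task definition shown later in this deck; the helper names are hypothetical.

```python
def docker_socket_sharing():
    """Return (volumes, mount_points) exposing the host Docker socket and CLI."""
    host_paths = {
        "docker_socket": "/var/run/docker.sock",
        "docker_daemon": "/usr/bin/docker",
    }
    volumes = [{"name": name, "host": {"sourcePath": path}}
               for name, path in host_paths.items()]
    mount_points = [{"sourceVolume": name, "containerPath": path}
                    for name, path in host_paths.items()]
    return volumes, mount_points


def unresolved_volumes(task_definition):
    """List mount point volume names that the task definition never declares."""
    declared = {v["name"] for v in task_definition.get("volumes", [])}
    return [mount["sourceVolume"]
            for container in task_definition.get("containerDefinitions", [])
            for mount in container.get("mountPoints", [])
            if mount["sourceVolume"] not in declared]


volumes, mount_points = docker_socket_sharing()
task_definition = {
    "family": "base-container-box-task",
    "containerDefinitions": [
        {"memory": 15000, "essential": True, "mountPoints": mount_points},
    ],
    "volumes": volumes,
}
```

Anything that can reach the socket can control the host's Docker daemon, so this arrangement only suits trusted build workloads.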

Docker Update

• Update Dockerfile and our CI system builds the new image, uploading it to our repository

• Update task definition for cluster updates

Docker Conventions

• Dockerfiles live with project code, versioned together

• docker-compose used for development, so a clone plus build will have a full service running locally

• Single repo for library and third-party service definitions

• Secrets or any form of config NEVER baked into containers

• Start from minimal, audited base OS

• Strict rules around “FROM” clause

• Build owns creating immutable version and publishing

Docker Build Process

Task Definitions

{
  "taskDefinitionArn": "arn:aws:ecs:us-east-1:262205085595:task-definition/base-container-box-task:1",
  "containerDefinitions": [
    {
      "memory": 15000,
      "essential": true,
      "mountPoints": [
        {
          "containerPath": "/usr/bin/docker",
          "sourceVolume": "docker_daemon",
          "readOnly": null
        },
        {
          "containerPath": "/var/run/docker.sock",
          "sourceVolume": "docker_socket",
          "readOnly": null
        }
      ]
    }
  ],
  "volumes": [
    {
      "host": {
        "sourcePath": "/var/run/docker.sock"
      },
      "name": "docker_socket"
    },
    {
      "host": {
        "sourcePath": "/usr/bin/docker"
      },
      "name": "docker_daemon"
    }
  ],
  "family": "base-container-box-task"
}

Clean Testing Environments

• Docker images

• Nearly instant machine refresh

• Easy for users to create and upload images that have been tested to work locally

• Efficient machine use

• ECS with ECR and private repository back end

Dynamic Worker Scaling

Diagram labels: SQS, Lambda, SNS, Lambda, Scaling, Bin packing, ECS

Dynamic Worker Scaling

Lambda allocates jobs using bin packing

This is one of the changes we had to make in order to use ECS for long-running tasks, rather than services spread across many stateless instances

Disconnects unneeded nodes from the cluster, allowing them to self-terminate when they are idle
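Bin packing here means filling existing instances before starting new ones, so that idle instances can be released. The talk does not show the Lambda function's algorithm; the sketch below uses first-fit decreasing, a standard bin packing heuristic, as a stand-in. The 30,000 MiB instance capacity and the smaller task sizes are assumed values; 15,000 is the memory figure from the task definition slide.

```python
def pack(task_sizes, capacity):
    """First-fit decreasing: place each task on the first instance with room."""
    instances = []  # each entry: {"free": remaining memory, "tasks": [sizes]}
    for size in sorted(task_sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"task of size {size} exceeds instance capacity")
        for instance in instances:
            if instance["free"] >= size:
                instance["free"] -= size
                instance["tasks"].append(size)
                break
        else:
            # No existing instance has room, so a new one is needed.
            instances.append({"free": capacity - size, "tasks": [size]})
    return [instance["tasks"] for instance in instances]


placement = pack([15000, 15000, 4000, 4000, 8000, 2000], capacity=30000)
```

With these numbers the six tasks fit on two instances, whereas a spread strategy would touch as many instances as possible and leave none idle to terminate.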


Spot Instances

• We use Spot Instances across all Availability Zones

• Manually switch between On-Demand and Spot Instances 3 times per week during Spot price spikes

• We are planning on moving to Spot Fleet soon

• Bid price is set to the On-Demand price; we lose build slaves whenever the Spot price goes above it

• 4000-6000 instance hours per day, about 1500 Spot losses per week
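The usage and loss figures above can be turned into a rough loss rate. The calculation below only divides the quoted numbers; it assumes losses are spread evenly over the week, which the slide does not claim.

```python
def losses_per_instance_hour(losses_per_week, instance_hours_per_day):
    """Average number of Spot losses per instance-hour over one week."""
    return losses_per_week / (instance_hours_per_day * 7)


low = losses_per_instance_hour(1500, 6000)   # busiest days: 42,000 hours/week
high = losses_per_instance_hour(1500, 4000)  # quietest days: 28,000 hours/week
hours_between_losses = (1 / high, 1 / low)
```

That works out to roughly one lost worker per 19 to 28 instance-hours, which is tolerable only because build tasks are short-lived and can be re-queued.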


Versioned Jobs

Scripts checked into repositories make the transition to Docker jobs easy

Versioned Jobs with ECS

• Versioned build and test scripts can now be run in versioned Docker containers, using versioned task definitions

• Creates extreme flexibility

• CloudFormation allows us to stand up whole new clusters with all different versions in a matter of minutes for long-term testing

ECS + Docker Problems

• Docker containers not launching

• ECS agent failing

• Docker containers stopping

• Incompatibility with certain services

• Docker OS availability

• Cleanup - AWS has made this configurable

• Image size

Building CI with Amazon Web Services

EC2, SQS, Lambda, ECS, S3, RDS, Amazon Kinesis, Spot Instances, ECR, CloudFormation, SNS, CloudWatch, CloudTrail

Future

Expand Use

• Use ECS for more services

• Allow developers to control their test suites and Docker images more directly

• Developer environments

• Use Docker for local long-running services

• Use a VM running the same OS version

• Remote updates to keep it in line with CD system

• Aim to enable running CD containers right out of the box

ECS Services In Production


Requirements

• Support for our multi-AZ & multi-region architecture

• Compliance – SOC2 type 2, HIPAA, ISO 27001, FedRAMP

• Least-privilege principle - independent IAM roles per service

• Host to host encryption

• Deployment support for:

• Rollback

• Canary

• Blue-green

• 0-downtime deployments

0-Downtime Testing

https://github.com/jontodd/aries


Test Assumptions

• ECS config
  • Agent version 1.11.0
  • Docker version 1.11.2

• Cluster config
  • 8 instances backed by ASG

• ASG config
  • 8 instances across 3 AZs
  • Default termination policy
  • 5 min health check grace period

• ELB
  • Timeout 4s
  • Interval 5s
  • Unhealthy threshold 2
  • Healthy threshold 10
  • Enable connection draining, 300s timeout

• Load generation
  • 16 threads

• Throughput
  • Interactive ➔ 490 r/s
  • 10s long poll ➔ 1.5 r/s


Operation                                            Interactive errors          Long poll errors
                                                     (~70ms latency, 490rps)     (~10s latency, 1.5rps)
Upsize ECS service 4 → 8                             0                           0
Downsize ECS service 8 → 4                           0                           0
Deploy ECS service – 50% min healthy                 0                           0
Stop task*                                           0                           0
Downsize Auto Scaling group                          0                           0
Terminate EC2 instance                               0                           0
Stop Docker daemon (service docker stop)*            0                           0
Stop EC2 instance**                                  0                           0
Kill Docker container (docker kill <containerId>)*   2                           2
Fail health check                                    450                         5

* No intention of running operation in practice
** Caused inconsistent state
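The "Fail health check" row is the only one with a large error count, and the ELB settings from the test assumptions suggest why. The back-of-the-envelope calculation below uses the stated 5s interval, thresholds of 2 and 10, and 490 r/s; the even spread of traffic across 8 backends and the failure of exactly one backend are assumptions of this sketch, not details given in the talk.

```python
def seconds_to_change_state(interval_s, threshold):
    """Approximate time for the ELB to flip a backend's health state."""
    return interval_s * threshold


unhealthy_after = seconds_to_change_state(5, 2)   # two failed checks in a row
healthy_after = seconds_to_change_state(5, 10)    # ten passing checks in a row

per_backend_rps = 490 / 8                          # assumes an even spread
exposed_requests = per_backend_rps * unhealthy_after
```

Under these assumptions about 600 interactive requests could reach a failing backend before the ELB removes it, the same order of magnitude as the 450 errors observed, which is consistent with the later advice to tune the ELB health check.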

Workflow

Diagram labels:

Dev | Test / Preview / Production

Git repo: Dockerfile, docker_compose.yml, Application YAML

CI Pipeline

Promoted artifacts

Docker Registry (Artifactory)

Conductor (Bastion ECS controller)

Deploy new version

Auto Scaling group, Launch config, EC2

ECS cluster: ECS service, ECS canary service

ELB

Images pulled when tasks start


Application definition

• Developers define YAML for their application

• Deploy time configuration is supplied to the ECS task definition

• Secrets are pulled by the application at startup
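Supplying deploy-time configuration to a task definition usually means setting the `environment` list (name/value pairs) of a container definition. A minimal sketch of that merge follows; `with_environment` is a hypothetical helper and the variable names in the usage are invented. In keeping with the last bullet, secrets are not part of this step.

```python
def with_environment(container_definition, config):
    """Return a copy of the container definition with config merged into its environment."""
    merged = {entry["name"]: entry["value"]
              for entry in container_definition.get("environment", [])}
    # Deploy-time values override whatever the definition already carried.
    merged.update({name: str(value) for name, value in config.items()})
    updated = dict(container_definition)
    updated["environment"] = [{"name": name, "value": merged[name]}
                              for name in sorted(merged)]
    return updated
```

Sorting the entries keeps the generated definition stable between deploys, so a new revision differs only when the configuration actually changes.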

Demo


Feature requests

• Dynamic port mapping (Application load balancing)

• Service autoscaling

• Per-container IAM roles

• Per-container security groups

• Bin-packing scheduler


Lessons learned

• /etc/ecs/ecs.config
  • ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION for forensics (default 1hr)
  • ECS_LOGLEVEL=debug

• Tune ELB health check

• Docker 1.10 for security enhancements

• Canary & blue/green separate service attached to same ELB

• ECS is incredibly easy to get up and running

• The ecosystem is changing quickly

Thank you!

Jon Todd – @JonToddDotCom

Tim Secor - @TimSecor

Danielle Greshock – greshock@amazon.com

Remember to complete your evaluations!
