Amazon EC2 Container Service Deep dive

Ryosuke Iwanaga, Solutions Architect, Amazon Web Services Japan

Upload: amazon-web-services-japan

Post on 16-Apr-2017


Page 1: Amazon EC2 Container Service Deep dive

1

Amazon EC2 Container Service

Deep dive

Ryosuke Iwanaga

Solutions Architect, Amazon Web Services Japan

Page 2: Amazon EC2 Container Service Deep dive

2

Agenda

• Containers and DevOps

• Case studies
– Sony: Batch jobs

• Summary

Page 3: Amazon EC2 Container Service Deep dive

3

Containers and DevOps

Page 4: Amazon EC2 Container Service Deep dive

4

DevOps lifecycle

[Diagram: stages Development, Build, Test, Production; application code is built into an artifact.]

Page 5: Amazon EC2 Container Service Deep dive

5

DevOps lifecycle

[Diagram: stages Development, Build, Test, Production; application code and provisioning code are built into an artifact with an AMI, and each of three environments has its own config ({}).]

Page 6: Amazon EC2 Container Service Deep dive

6

After Docker…

[Diagram: stages Development, Build, Test, Production, with code (<>) and per-environment configs ({}) combined.]

Page 7: Amazon EC2 Container Service Deep dive

7

After Docker…

[Diagram: provisioning code becomes a Dockerfile; application code plus the Dockerfile produce a Docker image that moves through Development, Build, Test, Production.]

Page 8: Amazon EC2 Container Service Deep dive

8

Dev and Ops

• Dev
– Apply CI/CD to the app without writing infrastructure code
– Keep producing valuable code

• Ops
– Manage infrastructure independently of which apps run on it
– Scalability, cost optimization

Page 9: Amazon EC2 Container Service Deep dive

9

Infrastructure for Docker containers

[Diagram: Development, Build, Test, Production, and a Registry; Service A, Service B, Service C]

Page 10: Amazon EC2 Container Service Deep dive

10

Resource Utility of Heterogeneous Containers

35%

85%

~80%

Amazon ECS

Page 11: Amazon EC2 Container Service Deep dive

11

Resource Utility of Heterogeneous Instances

[Diagram: Amazon ECS + Amazon EC2 Spot Fleet, mixing instance types: c3.xlarge, c3.4xlarge, c3.8xlarge, r3.2xlarge, r3.8xlarge (three of each)]

Page 12: Amazon EC2 Container Service Deep dive

12

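Cluster Management and Scheduler

• Cluster management
– Continuously tracks the resources and state of the machines in the cluster

• Scheduler
– Looks at the whole cluster and places containers appropriately

[Diagram: a Scheduler on top of a Cluster Manager, placing containers with requirements such as CPU: 500 / Mem: 300, CPU: 2000 / Mem: 1000, and CPU: 10 / Mem: 30 (three of these)]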
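The split between cluster management (tracking free resources) and scheduling (choosing where a container goes) can be illustrated with a small sketch. This is only a first-fit toy under stated assumptions, not how Amazon ECS places tasks: the `Instance` class, the `place` function, the instance names, and the capacities are all hypothetical; the CPU/Mem requirements are the ones shown on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Cluster-manager view of one machine: what is still free on it."""
    name: str
    cpu_free: int
    mem_free: int
    tasks: list = field(default_factory=list)


def place(task_name, cpu, mem, instances):
    """First-fit scheduler: pick the first instance with enough CPU and memory.

    Returns the chosen instance name, or None if nothing fits.
    """
    for inst in instances:
        if inst.cpu_free >= cpu and inst.mem_free >= mem:
            inst.cpu_free -= cpu
            inst.mem_free -= mem
            inst.tasks.append(task_name)
            return inst.name
    return None


# Hypothetical two-instance cluster.
cluster = [
    Instance("i-a", cpu_free=1024, mem_free=1024),
    Instance("i-b", cpu_free=2048, mem_free=2048),
]

# Requirements taken from the slide: (name, CPU, Mem).
requests = [("web", 500, 300), ("batch", 2000, 1000), ("sidecar", 10, 30)]
placements = [place(name, cpu, mem, cluster) for name, cpu, mem in requests]
print(placements)
```

The large request only fits on the second instance, while the small ones fill the gaps on the first, which is the kind of bin-packing the slide's picture suggests.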

Page 13: Amazon EC2 Container Service Deep dive

13 Source: eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf

Page 14: Amazon EC2 Container Service Deep dive

14

[Diagram: Amazon ECS architecture. Container Instances in AZ 1 and AZ 2 each run Docker, an ECS Agent, and Tasks made up of Containers; ELBs sit between the Internet and the Container Instances. A User / Scheduler calls the API; behind it are the Cluster Management Engine, a Key/Value Store, and the Agent Communication Service.]

http://aws.typepad.com/sajp/2015/07/under-the-hood-of-the-amazon-ec2-container-service.html

Page 15: Amazon EC2 Container Service Deep dive

15

Amazon ECS: Task Definition

Page 16: Amazon EC2 Container Service Deep dive

16

Amazon ECS: Task Definition

• Defines a set of containers
– They always run on the same instance
– Specifies the resources they require
  • CPU, memory, (port)

• Volumes can also be defined
– The instance's file system can be used

• Can be versioned

Page 17: Amazon EC2 Container Service Deep dive

17

Task Definition: Container Definition

{
  "name": "simple-demo",
  "image": "foo/my-demo",
  "cpu": 10,
  "memory": 500,
  "portMappings": [
    {
      "containerPort": 80,
      "hostPort": 80
    }
  ],
  "mountPoints": [
    {
      "sourceVolume": "my-vol",
      "containerPath": "/var/www/my-vol"
    }
  ],
  "entryPoint": [
    "/usr/sbin/apache2",
    "-D",
    "FOREGROUND"
  ],
  "essential": true
},
{
  "name": "busybox",
  "image": "busybox",
  "cpu": 10,
  "memory": 500,
  "volumesFrom": [
    {
      "sourceContainer": "simple-demo"
    }
  ],
  "entryPoint": [
    "sh",
    "-c"
  ],
  "command": [
    "while true; do /bin/date > /var/www/my-vol/date; sleep 1; done"
  ],
  "essential": false
}
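Because every container in a Task Definition lands on the same instance, the scheduler needs the combined requirements of all containers. A minimal sketch of that arithmetic follows, using a trimmed copy of the two definitions above. The `family` name `demo` and the helper names are assumptions for illustration; `containerDefinitions`, `cpu`, `memory`, and `essential` are the task definition keys shown on the slide.

```python
import json

# Trimmed copy of the two container definitions from the slide,
# wrapped in a task definition with a hypothetical family name.
TASK_DEFINITION = json.loads("""
{
  "family": "demo",
  "containerDefinitions": [
    {"name": "simple-demo", "image": "foo/my-demo",
     "cpu": 10, "memory": 500, "essential": true},
    {"name": "busybox", "image": "busybox",
     "cpu": 10, "memory": 500, "essential": false}
  ]
}
""")


def required_resources(task_def):
    """Sum CPU units and memory (MiB) over all containers in the task."""
    containers = task_def["containerDefinitions"]
    return {
        "cpu": sum(c["cpu"] for c in containers),
        "memory": sum(c["memory"] for c in containers),
    }


def essential_containers(task_def):
    """Names of containers marked essential."""
    return [c["name"] for c in task_def["containerDefinitions"] if c["essential"]]


print(required_resources(TASK_DEFINITION))
print(essential_containers(TASK_DEFINITION))
```

An instance must have at least this much free CPU and memory for the whole task to be placed on it.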

Page 18: Amazon EC2 Container Service Deep dive

18

Task Definition: Overview

[Diagram: a Task Definition containing a PHP App and a Time of day App that share a data volume]

Page 19: Amazon EC2 Container Service Deep dive

19

Task Definition: Overview

[Diagram: the Task Definition (PHP App, Time of day App, shared data volume) is scheduled onto a Container Instance as a Task]

Page 20: Amazon EC2 Container Service Deep dive

20

Amazon ECS: Service

Page 21: Amazon EC2 Container Service Deep dive

21

Amazon ECS: Service

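• Best suited to long-running workloads such as Web/API

• A scheduler that keeps the required number of Tasks running
– Also handles automatic recovery

• Switches over to a new Task Definition while deploying it

• Can integrate with Elastic Load Balancing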

Page 22: Amazon EC2 Container Service Deep dive

22

Amazon ECS: Service example

Page 23: Amazon EC2 Container Service Deep dive

23

Amazon ECS: Updating a Service

• Updating the Task Definition that a Service uses deploys new Tasks

• New Tasks are started on free resources while old Tasks are gradually stopped
– Blue-Green deployment
– Midway through, old and new Tasks coexist

Page 24: Amazon EC2 Container Service Deep dive

24

Amazon ECS: Updating a Service

[Diagram: Cluster with a Service using Task Definition:1; running Task:1, Task:1, Task:1]

Page 25: Amazon EC2 Container Service Deep dive

25

Amazon ECS: Updating a Service

[Diagram: the Service now has Task Definition:1 and Task Definition:2; running Task:1, Task:1, Task:1]

Page 26: Amazon EC2 Container Service Deep dive

26

Amazon ECS: Updating a Service

[Diagram: Task Definition:1 and Task Definition:2; running Task:1, Task:1, Task:1, Task:2, Task:2]

Page 27: Amazon EC2 Container Service Deep dive

27

Amazon ECS: Updating a Service

[Diagram: Task Definition:1 and Task Definition:2; running Task:1, Task:2, Task:2]

Page 28: Amazon EC2 Container Service Deep dive

28

Amazon ECS: Updating a Service

[Diagram: Task Definition:1 and Task Definition:2; running Task:1, Task:2, Task:2, Task:2]

Page 29: Amazon EC2 Container Service Deep dive

29

Amazon ECS: Updating a Service

[Diagram: Task Definition:1 and Task Definition:2; running Task:2, Task:2, Task:2]
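The update sequence on the preceding slides (start new Tasks where there is room, then stop old ones, repeat) can be sketched as a small simulation. This is an illustrative model only, assuming the cluster has room for a fixed number of Tasks and that the Service never drops below its desired count; the function name and the `capacity` parameter are hypothetical and this is not the actual ECS service scheduler.

```python
def rolling_update(desired, capacity):
    """Simulate replacing `desired` old tasks with new ones.

    `capacity` is the total number of tasks the cluster can hold at once.
    Returns the list of (old_count, new_count) states, starting with the
    state before the update.
    """
    old, new = desired, 0
    steps = [(old, new)]
    while old > 0 or new < desired:
        # Start new tasks in whatever room is currently free.
        free = capacity - old - new
        start = min(free, desired - new)
        if start > 0:
            new += start
            steps.append((old, new))
        # Stop old tasks that are now in excess of the desired count.
        stop = min(old, old + new - desired)
        if stop > 0:
            old -= stop
            steps.append((old, new))
        if start == 0 and stop == 0:
            raise RuntimeError("no spare capacity to start new tasks")
    return steps


# Three tasks desired, room for five: same shape as the slide sequence.
print(rolling_update(desired=3, capacity=5))
```

With three desired Tasks and room for five, the states are three old, then three old plus two new, one old plus two new, one old plus three new, and finally three new, matching the diagrams. With no spare room at all the sketch raises an error, since it never stops an old Task before a replacement is running.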

Page 30: Amazon EC2 Container Service Deep dive

30

Case: Sony (Batch jobs)

Page 31: Amazon EC2 Container Service Deep dive

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Konstantin Wilms, Enterprise Solutions Architect, AWS

Ben Masek, CTO, Solutions Engineering Group, Sony PSA

October 2015

CMP405

Containerizing Video

Creating the Next Generation Video Transcoding Pipeline

Page 32: Amazon EC2 Container Service Deep dive

Every second counts

Why?

Speed translates directly into cost

Page 33: Amazon EC2 Container Service Deep dive

Change

Why?

The video encoding lifecycle moves fast

Page 34: Amazon EC2 Container Service Deep dive

Software gets thrown away

Why?

Often at high engineering cost: v1.0, v2.0, v3.0

Page 35: Amazon EC2 Container Service Deep dive

How did AWS help?

How?

To solve our encoding challenges

Page 36: Amazon EC2 Container Service Deep dive

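Core services

• Amazon ECS (schedule): unit of work; multiple clusters; takes the place of a scheduler
• Amazon EC2 (process): specialized for encoding; ordinary containers; uses the CPU fully
• AWS Lambda (glue): billed per 100 ms; no infrastructure to manage; custom binaries
• Amazon EFS & Amazon S3 (store): core storage; alternative to S3; linked to containers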

Page 37: Amazon EC2 Container Service Deep dive

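Overall architecture

[Diagram labels: Source Storage, Ingest, Container Storage, EC2 Bootstrap, Transcode, ECS, EC2, Manager, Target, Stitch, Manager]

When an input file arrives, the entire event-driven workflow starts

A shell script bootstraps Amazon ECS for encoding

Amazon S3 is for long-term storage; Amazon EFS is temporary/container storage

A centralized manager

Uses the Amazon ECS scheduler

The manager splits and merges

Lambda for auxiliary processing

Integrates with other workflows in an event-driven way as well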

Page 38: Amazon EC2 Container Service Deep dive

Auto Scaling

Simple scaling across multiple AZs

Min=Max=Desired per unit of workload, for deterministic scheduling

Can easily be thrown away at any time

Instance type and total memory matter (~1.5MB encoder)

Page 39: Amazon EC2 Container Service Deep dive

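Initializing Amazon EC2 instances

Join the ECS cluster at launch

Mount Amazon EFS on each instance in the ECS cluster

Query the instance metadata to find out which AZ the instance is in

All the other magic happens in Docker and Lambda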
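The three initialization steps above (join the cluster, find the AZ, mount EFS) might look like the following user-data script, rendered here from Python so the parameters are explicit. This is a sketch under assumptions, not the script used in the talk: the function name, cluster name, file system ID, region, and mount point are hypothetical, the per-AZ EFS DNS name format is assumed, and the instance is assumed to already have an NFS client and the ECS agent installed.

```python
def render_user_data(cluster_name, efs_id, region, mount_point="/mnt/efs"):
    """Render a bash user-data script for an ECS container instance.

    All argument values are supplied by the caller; nothing here is taken
    from the original talk.
    """
    return f"""#!/bin/bash
# 1. Join the ECS cluster at launch (the ECS agent reads this file).
echo "ECS_CLUSTER={cluster_name}" >> /etc/ecs/ecs.config

# 2. Ask the instance metadata service which AZ this instance is in.
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)

# 3. Mount EFS through this AZ's mount target (assumed DNS name format).
mkdir -p {mount_point}
mount -t nfs4 -o nfsvers=4.1 "$AZ.{efs_id}.efs.{region}.amazonaws.com:/" {mount_point}
"""


# Hypothetical values for illustration only.
script = render_user_data("transcode", "fs-12345678", "us-west-2")
print(script)
```

Keeping the script this small fits the slide's point that everything else is left to Docker and Lambda.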

Page 40: Amazon EC2 Container Service Deep dive

Media processing with Amazon EFS

Temporary storage for media chunks

No need to move media sources and chunks around

Scales linearly; ideal for reading and writing huge files in parallel

Each AZ has an endpoint for attaching to the ECS cluster

Page 41: Amazon EC2 Container Service Deep dive

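Amazon ECS cluster and scheduler configuration

Unit of capacity

Shared state, optimistic concurrency control

Simple scheduler design – list*, describe*, run*, start*

We manage capacity in units of 10 CPU

Model the encoding architecture as 'swimlanes'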
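"Shared state, optimistic concurrency control" means a scheduler reads the cluster state, decides, and commits only if nobody else changed the state in the meantime. A minimal in-memory sketch of that idea follows. Every name in it is hypothetical, capacity is counted in abstract units (each standing for 10 CPU as on the slide), and it does not call the ECS API.

```python
CPU_PER_UNIT = 10  # one capacity unit = 10 CPU, as on the slide


class StaleStateError(Exception):
    """Raised when a commit is based on an out-of-date view of the cluster."""


class ClusterState:
    """Shared state: free capacity plus a version counter."""

    def __init__(self, free_units):
        self.version = 0
        self.free_units = free_units

    def snapshot(self):
        return self.version, self.free_units

    def commit(self, seen_version, units):
        # Optimistic check: reject if someone committed since our snapshot.
        if seen_version != self.version:
            raise StaleStateError("cluster state changed since snapshot")
        if units > self.free_units:
            raise ValueError("not enough free capacity")
        self.free_units -= units
        self.version += 1


def reserve(state, units, retries=3):
    """Try to reserve capacity, re-reading the state after a conflict."""
    for _ in range(retries):
        version, free = state.snapshot()
        if free < units:
            return False
        try:
            state.commit(version, units)
            return True
        except StaleStateError:
            continue
    return False


state = ClusterState(free_units=4)
stale_version, _ = state.snapshot()
print(reserve(state, 3), state.free_units * CPU_PER_UNIT)
```

No locks are held while a scheduler makes its decision; a conflicting commit simply fails and is retried against fresh state.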

Page 42: Amazon EC2 Container Service Deep dive

Container structure

FROM ubuntu:14.04
ENV FLASK_JOB https://s3/xxx/jobs/h264.sh
ENV FLASK_FRAG https://s3/xxx/media/frag-000.mp4
ENV FLASK_THREADS 4
RUN wget http://xxx.com/ffmpeg/builds/ffmpeg-static.tar.xz \
    && tar xvfJ ffmpeg-static.tar.xz -C /usr/local/bin
RUN apt-get -y install libav-tools
RUN wget http://xxx.com/qpsnr.deb && dpkg -i qpsnr.deb

Startup behavior and thread count are set through environment variables

Fixed environment variables make development and testing easy (the default invocation is debug)

Analyze memory utilization so the container does not exhaust the host's resources

Tied to a single Amazon ECS Task Definition

420MB image; 350MB without the CLI

CMD: s3 cp s3://…/bootstrap.sh . && chmod && bootstrap.sh
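Since the job, the media fragment, and the thread count are all passed as environment variables, the process inside the container only has to read them at startup. A minimal sketch of that pattern follows; the function name and returned keys are hypothetical, and the defaults simply repeat the placeholder values from the Dockerfile above.

```python
import os


def load_job_config(env=None):
    """Read the transcode job settings from environment variables.

    Falls back to the Dockerfile's placeholder defaults so the image can
    be run unchanged during development and testing.
    """
    env = os.environ if env is None else env
    return {
        "job_script": env.get("FLASK_JOB", "https://s3/xxx/jobs/h264.sh"),
        "fragment": env.get("FLASK_FRAG", "https://s3/xxx/media/frag-000.mp4"),
        "threads": int(env.get("FLASK_THREADS", "4")),
    }


# Defaults, then an override such as a Task Definition might supply.
default_config = load_job_config({})
override_config = load_job_config({"FLASK_THREADS": "8"})
print(default_config["threads"], override_config["threads"])
```

Overriding a variable per task run changes the work done without rebuilding the image.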

Page 43: Amazon EC2 Container Service Deep dive

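Container deployment pipeline

docker save 2408e7a48bac \
  | sudo docker-squash -t \
    transcode-toolkit:1.01s \
    -verbose | docker load

Very simple, because the logic is started from the bootstrap step

Kitematic is helpful

Use statically linked binaries to reduce size and speed up deployment

Can be automated, or done manually

Webhooks & build automation

Squash the container, since union file system layers accumulate

Use a public or private Docker registry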

Page 44: Amazon EC2 Container Service Deep dive

44

Summary

Page 45: Amazon EC2 Container Service Deep dive

45

Amazon ECS: Some Examples…

• Batch Processing
• Open-source PaaS
• Real-time Image Transformation
• Solr Search Cluster
• PaaS
• Gaming Engine
• Web Application Platform
• Microservices Backend
• Mapping Solution

Page 46: Amazon EC2 Container Service Deep dive

46 https://aws.amazon.com/docker/

AWS Container Partners

Page 47: Amazon EC2 Container Service Deep dive

47

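Amazon ECS summary

• Many production deployments and many partners
– Customers running 2,000 containers; enterprises and large companies

• Very simple, yet powerful
– No need to manage master servers, a configuration KVS, and the like
– The Service scheduler handles Blue-Green deployment and automatic recovery

• Next-generation service infrastructure
– Just like “Amazon EC2” in 2006!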

Page 48: Amazon EC2 Container Service Deep dive

48

Appendix

Page 49: Amazon EC2 Container Service Deep dive

49

Case: Remind

Page 50: Amazon EC2 Container Service Deep dive


Eric Holmes & Michael Barrett, Remind

October 2015

DVO308

Docker & ECS in Production

How We Migrated Our Infrastructure from Heroku to AWS

Page 51: Amazon EC2 Container Service Deep dive

Remind

• A messaging platform for teachers.

• Chat/announcements/files

• Over 30 million users

• Used actively in ~50% of U.S. public schools

• Over 2 billion messages delivered

• ~50 employees. ~30 engineers.

Page 52: Amazon EC2 Container Service Deep dive

Heroku was great, but…

• Every app on Heroku is publicly accessible

• Databases need to be exposed to Internet traffic

• Limited visibility and control

Page 53: Amazon EC2 Container Service Deep dive

What we want from a PaaS

• AWS

• Flexibility

• Shared patterns for deployment

• Easy service operation

• Containers/Docker

Page 54: Amazon EC2 Container Service Deep dive

Building an Empire

Page 55: Amazon EC2 Container Service Deep dive

Design Goals

• Easy to operate

• Open source

• Support 12-factor stateless apps (12factor.net)

• Swappable scheduling back-ends

• Stability!

• Docker images as a unit of deployment

Page 56: Amazon EC2 Container Service Deep dive

Empire :: V2

• Scheduler: ECS
• Router: ELB
• Control Plane: Heroku Platform API Spec + emp CLI

Page 57: Amazon EC2 Container Service Deep dive

Empire :: V2

An open-source, self-hosted PaaS for running twelve-factor Docker apps backed by AWS services

Page 58: Amazon EC2 Container Service Deep dive

Twelve-Factor Tenets

I. Codebase

II. Dependencies

III. Config

IV. Backing Services

V. Build, release, run

VI. Processes

VII. Port binding

VIII. Concurrency

IX. Disposability

X. Dev/prod parity

XI. Logs

XII. Admin processes

Page 59: Amazon EC2 Container Service Deep dive

12factor :: Dependencies

“Explicitly declare and isolate dependencies”

FROM ruby
RUN apt-get update && apt-get install -y imagemagick
RUN bundle install

Page 60: Amazon EC2 Container Service Deep dive

12factor :: Build, release, run

“Strictly separate build and run stages”

Empire

Page 61: Amazon EC2 Container Service Deep dive

12factor :: Build

$ git push

Page 62: Amazon EC2 Container Service Deep dive

12factor :: Release, Run

Config{}

Release

Amazon ECS

Page 63: Amazon EC2 Container Service Deep dive

12factor :: Release, Run

$ cat Procfile

web: ./bin/web

worker: ./bin/worker

$ aws ecs list-services

arn:aws:ecs:us-east-1:***:service/api--web

arn:aws:ecs:us-east-1:***:service/api--worker

$ emp deploy org/api:latest

Status: Created v1 release.
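The listing above suggests that each process type in the Procfile becomes its own ECS service, named after the app and the process. A small sketch of that mapping follows. The `app--process` naming is inferred from the `list-services` output on the slide, the function names are hypothetical, and this is not Empire's actual code.

```python
def parse_procfile(text):
    """Parse 'name: command' lines from a Procfile into a dict."""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, command = line.partition(":")
        processes[name.strip()] = command.strip()
    return processes


def service_names(app, processes):
    """One service per process type, using the naming seen on the slide."""
    return [f"{app}--{name}" for name in processes]


PROCFILE = """\
web: ./bin/web
worker: ./bin/worker
"""

processes = parse_procfile(PROCFILE)
print(service_names("api", processes))
```

Scaling a process type then amounts to changing the desired count of the matching service.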

Page 64: Amazon EC2 Container Service Deep dive

Service Discovery

$ aws ecs describe-services --services api--web

"loadBalancers": [{
  "containerName": "web",
  "containerPort": 9001,
  "loadBalancerName": "2888...a31d4c"
}]

$ curl http://api

Ok

Page 65: Amazon EC2 Container Service Deep dive

12factor :: Concurrency

“Scale out via the process model”

$ emp scale web=10

$ aws ecs describe-services --services api--web

"desiredCount": 10

Page 66: Amazon EC2 Container Service Deep dive

12factor :: Dev/prod parity

“Keep development, staging, and production as similar as possible”

$ docker run --env-file <(emp env -a api) org/api

Page 67: Amazon EC2 Container Service Deep dive

12factor :: Logs

“Treat logs as event streams”

$ emp log

“GET / HTTP/1.1” 200

STDOUT Amazon Kinesis

Page 68: Amazon EC2 Container Service Deep dive

12factor :: Admin processes

“Run admin/management tasks as one-off processes”

$ emp run rake db:migrate

Migrated

Page 69: Amazon EC2 Container Service Deep dive

69

Case: Coursera

Page 70: Amazon EC2 Container Service Deep dive


Frank Chen, Coursera

Brennan Saeta, Coursera

October 2015

CMP406

Amazon ECS at Coursera

Powering a general-purpose near-line execution microservice, while defending against untrusted code

Page 73: Amazon EC2 Container Service Deep dive

Bad Old Days of Batch Processing @ Coursera

Cascade

• PHP-based job runner

• Originally ran in screen sessions

• Polled APIs for new jobs

• Forced restarts on a regular basis due to unidentified memory leaks

• Fragile and unreliable

The early days…

Page 74: Amazon EC2 Container Service Deep dive

Bad Old Days of Batch Processing @ Coursera

Saturn

• Scala scheduled batch job runner

• Powered by Quartz Scheduler library

• Better than Cascade, but…

• All jobs ran on the same JVM, causing interference

The not-so-early days?

Page 75: Amazon EC2 Container Service Deep dive

What Else Did We Look At?

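Home-grown Tech
• Tried, but proved to be unreliable
• Difficult to handle coordination and synchronization

[Second option; its name appeared as a logo and was not captured]
• Powerful, but hard to productionize
• Needs developers with experience

[Third option; its name appeared as a logo and was not captured]
• Designed for GCE first
• Not a managed service, higher Ops load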

Page 76: Amazon EC2 Container Service Deep dive

Amazon ECS to the Rescue

• Little maintenance
• Integrated with rest of AWS
• Easy to develop for

Page 77: Amazon EC2 Container Service Deep dive

However…

Amazon ECS is a great building block, but we still need to build tools around it for our purposes.

Page 78: Amazon EC2 Container Service Deep dive

What We Built: Iguazú

Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0

• Batch Job Scheduler for Amazon ECS
– Immediately
– Deferred (run once at X time)
– Scheduled recurring (cron-like)

• Programmatically accessible internally via our standard APIs and clients

• Named for Iguazú falls
– World’s largest waterfall by volume
– We hope Iguazú handles a similar volume of jobs
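The three ways of running a job (immediately, deferred, recurring) can all be expressed with one time-ordered queue. The sketch below is a toy illustration of that idea and makes no claim about how Iguazú is built: the `JobQueue` class and its methods are hypothetical, times are plain numbers of seconds, and "running" a job just records its name instead of launching an ECS task.

```python
import heapq
import itertools


class JobQueue:
    """Time-ordered queue covering immediate, deferred, and recurring jobs."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps submit order

    def submit(self, name, run_at, every=None):
        """Schedule `name` at time `run_at`; repeat each `every` seconds if set."""
        if every is not None and every <= 0:
            raise ValueError("every must be positive")
        heapq.heappush(self._heap, (run_at, next(self._counter), name, every))

    def run_due(self, now):
        """Run (here: just collect) every job whose time has come."""
        ran = []
        while self._heap and self._heap[0][0] <= now:
            run_at, _, name, every = heapq.heappop(self._heap)
            ran.append(name)
            if every is not None:
                # Recurring job: put the next occurrence back on the queue.
                self.submit(name, run_at + every, every)
        return ran


queue = JobQueue()
queue.submit("immediate", run_at=0)            # run now
queue.submit("deferred", run_at=100)           # run once at t=100
queue.submit("recurring", run_at=0, every=60)  # cron-like, every 60 s

first = queue.run_due(now=0)
second = queue.run_due(now=100)
print(first, second)
```

An immediate job is simply a deferred job whose time is now, and a recurring job re-submits itself after each run.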

Page 79: Amazon EC2 Container Service Deep dive

Iguazú: Architecture

[Diagram components: Devs, Users, Iguazú Frontend, Iguazú Admin, Iguazú Scheduler, Iguazú Backend, Services, Cassandra, SQS, ECS API, ECS Workers]

Page 80: Amazon EC2 Container Service Deep dive

Iguazú: Developer / Ops User Interface

Page 81: Amazon EC2 Container Service Deep dive

Deploying Jobs

Easy Deployment

1. Developers merge into master. Done!

Jenkins Build Steps:

1. Builds zip package from master

2. Prepares Docker image with zip file

3. Pushes image into Docker registry

4. Registers updated jobs with Amazon ECS API

Page 82: Amazon EC2 Container Service Deep dive

Since April 2015…

• 65 jobs in production
• >1000 runs per day
• 44 different scheduled jobs

Page 83: Amazon EC2 Container Service Deep dive

Programming Assignments at Coursera

Page 84: Amazon EC2 Container Service Deep dive

The Security Challenge

Compiling and running untrusted, arbitrary code in Amazon EC2

Would you like to compile and run C code from random people on the Internet on your servers?

Page 85: Amazon EC2 Container Service Deep dive

What We Built: GrID

Patrick Hoesly (https://www.flickr.com/photos/zooboing/5665221326/) CC-BY-2.0

• Service + architecture for grading programming assignments

• Builds on Amazon ECS and Iguazú

• Named for Tron’s “digital frontier”

• Backronym: Grading Inside Docker

Page 86: Amazon EC2 Container Service Deep dive

High-level GrID Architecture

[Diagram components: Learners, GrID, Iguazú, S3 Bucket, ECS API, Grading Machines, VPC Firewalls; split between the Production Acct and the GrID Grading Account]