© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Dan Gerdesmeier, Sr. Software Development Engineer
June 21st, 2016
Scheduling Containers on Amazon ECS
What is Amazon ECS?
Amazon EC2 Container Service (ECS) is a highly scalable,
high-performance container management service. You
can use Amazon ECS to schedule the placement of
containers across your cluster. You can also integrate your
own scheduler or a third-party scheduler to meet business
or application-specific requirements.
Agenda
Introduction to Scheduling
Resource Management in ECS
Scheduling in ECS
Demo
Introduction to Scheduling
Why is Scheduling Important?
Utilization and Cost
Application Performance
Placement and Launch Time
Application Availability
Types of Schedulers
Monolithic Schedulers
Two-Level Schedulers
Shared-State Schedulers
Monolithic Scheduler
[Diagram: a single Scheduler and a cluster of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
Monolithic Scheduler
Pros
• Entire State
• Consistent View
• Easy to Understand
Cons
• Head-of-Line blocking
• Difficult to maintain
• Single Point of Failure
Two-Level Scheduling
[Diagram: Scheduler 1, Scheduler 2, and Scheduler 3, a Resource Manager, and a cluster of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
Two-Level Scheduling
Pros
• Avoids Contention
• Enables Multiple Schedulers
Cons
• Partial State
• Difficult to Repartition
• Difficult to Achieve High Utilization
Shared State
[Diagram, shown in two builds: Scheduler 1, Scheduler 2, and Scheduler 3, a Resource Manager, and a cluster of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
Shared State Scheduling
Pros
• Full State
• Enables Multiple Schedulers
• No Head-of-Line Blocking
Cons
• Contention Can Slow Progress
• Multiple Copies of State
• Stale Information
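The trade-offs listed above can be illustrated with a small sketch. It is not how any particular scheduler is implemented: `SharedState`, `place`, the instance names, and the single whole-cluster version counter are all hypothetical. Each scheduler takes a full copy of the state, and a commit is rejected when that copy has gone stale, which is where contention costs progress.

```python
from dataclasses import dataclass


@dataclass
class SharedState:
    """Authoritative cluster state: free CPU units per instance plus a version."""
    free_cpu: dict
    version: int = 0

    def snapshot(self):
        # Each scheduler works from its own full copy of the state.
        return self.version, dict(self.free_cpu)

    def commit(self, seen_version, instance, cpu):
        # Reject the placement if the state changed since the snapshot
        # was taken, or if the instance no longer has room.
        if seen_version != self.version or self.free_cpu[instance] < cpu:
            return False
        self.free_cpu[instance] -= cpu
        self.version += 1
        return True


def place(state, cpu, max_attempts=3):
    """Pick the instance with the most free CPU, retrying on conflict."""
    for attempt in range(1, max_attempts + 1):
        version, view = state.snapshot()
        instance = max(view, key=view.get)
        if view[instance] < cpu:
            return None, attempt  # nothing fits in this view
        if state.commit(version, instance, cpu):
            return instance, attempt
    return None, max_attempts


shared = SharedState({"i-a": 1024, "i-b": 512})
v1, _ = shared.snapshot()                 # scheduler 1 reads the state
v2, _ = shared.snapshot()                 # scheduler 2 reads the same version
first = shared.commit(v1, "i-a", 800)     # succeeds
second = shared.commit(v2, "i-a", 800)    # rejected: scheduler 2's copy is stale
retry = place(shared, 400)                # a fresh snapshot succeeds on i-b
```

A single version for the whole cluster is the coarsest possible conflict check. A finer-grained check (per instance, for example) would reject fewer commits at the cost of more bookkeeping.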
Tiered Scheduling
[Diagram: Scheduler 1, a Resource Manager, and a cluster of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
Resource Management
What is a Resource Manager?
Maintains Available Resources
Tracks Resource Changes
Accepts Resource Requests
Guarantees Accuracy and Consistency
Resources
CPU
Memory
Ports
Disk Space
Disk IOPS
Network Bandwidth
[Diagram: a Container Instance with Docker, the ECS Agent, Tasks, and Containers]
Available Resources
register-container-instance --total-resources
[
  {
    "name" : "cpu",
    "type" : "integerValue",
    "integerValue" : 2048
  },
  …
]
Modifying Exposed Resources
Requesting Resources
register-task-definition --family demo --cli-input-json
{
  "name": "simple-demo",
  "image": "my-demo",
  "cpu": 10,
  "memory": 500,
  "portMappings": [
    {
      "containerPort": 80,
      "hostPort": 80
    }
  ...
}
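To make the request side concrete, here is a minimal sketch of how a resource manager might check a container's request against what an instance has left and then reserve it. The functions `fits` and `reserve` and the `remaining` dictionary layout are hypothetical. The CPU figure is taken from the registration example, while the memory value and the already-reserved port are assumed for illustration.

```python
def host_ports(container):
    """Host ports requested by a container definition."""
    return {m["hostPort"] for m in container.get("portMappings", [])}


def fits(remaining, container):
    """Return True if the container's request fits the remaining resources."""
    return (
        container["cpu"] <= remaining["cpu"]
        and container["memory"] <= remaining["memory"]
        and not (host_ports(container) & remaining["ports_in_use"])
    )


def reserve(remaining, container):
    """Deduct the request from the remaining resources, or raise if it does not fit."""
    if not fits(remaining, container):
        raise ValueError("insufficient resources on this container instance")
    remaining["cpu"] -= container["cpu"]
    remaining["memory"] -= container["memory"]
    remaining["ports_in_use"] |= host_ports(container)


# Container definition fields shown above.
container = {
    "name": "simple-demo",
    "image": "my-demo",
    "cpu": 10,
    "memory": 500,
    "portMappings": [{"containerPort": 80, "hostPort": 80}],
}

# cpu matches the registration example; memory and port 22 are assumed values.
remaining = {"cpu": 2048, "memory": 3768, "ports_in_use": {22}}

reserve(remaining, container)
second_copy_fits = fits(remaining, container)  # False: host port 80 is now taken
```

The second copy is rejected even though CPU and memory are plentiful, which shows why a fixed `hostPort` limits a task to one copy per container instance.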
Accepting Resource Requests
Tasks
[Diagram: Volume Definitions and Container Definitions, a Shared Data Volume and Containers, and a launch onto a Container Instance]
Starting a Task
[Diagram, built up over five slides: a User / Scheduler calls StartTask on the API; behind the API sit the Cluster Management Engine and Agent Communication; Agent Communication connects over a WebSocket to the ECS Agent on the Container Instance, which runs Docker, Tasks, and Containers; the final build adds SubmitStateChange]
Tracking Resource Changes
Terminated Task
[Diagram: User / Scheduler, StartTask, API, Cluster Management Engine, Agent Communication, and a Container Instance with Docker, the ECS Agent, a Task, and a Container, with SubmitStateChange from the ECS Agent]
Missing Container Instance
[Diagram: User / Scheduler, StartTask, API, Cluster Management Engine, Agent Communication, and a Container Instance with Docker, the ECS Agent, a Task, and a Container, with a "?" between the ECS Agent and Agent Communication]
Terminated Container Instance
[Diagram: User / Scheduler, StartTask, API, Cluster Management Engine, Agent Communication, a Termination Notifier, and a Container Instance with Docker, the ECS Agent, a Task, and a Container]
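The three cases in this part of the talk (a terminated task, a missing container instance, and a terminated container instance) can be sketched as three different updates to the tracked state. This is an illustrative model only, not the ECS implementation: `ClusterState` and its method names are hypothetical, and treating a missing instance as "stop placing new work there" is an assumption.

```python
class ClusterState:
    """Toy resource tracker keyed by instance and task ids."""

    def __init__(self):
        self.free_cpu = {}    # instance id -> unreserved CPU units
        self.connected = {}   # instance id -> is the agent reachable?
        self.tasks = {}       # task id -> (instance id, reserved CPU units)

    def register_instance(self, instance, cpu):
        self.free_cpu[instance] = cpu
        self.connected[instance] = True

    def start_task(self, task, instance, cpu):
        if not self.connected[instance] or self.free_cpu[instance] < cpu:
            raise ValueError("cannot place task")
        self.free_cpu[instance] -= cpu
        self.tasks[task] = (instance, cpu)

    def task_stopped(self, task):
        # Terminated task: the agent reports the change and the
        # reserved resources return to the pool.
        instance, cpu = self.tasks.pop(task)
        self.free_cpu[instance] += cpu

    def agent_disconnected(self, instance):
        # Missing instance: its state is unknown, so stop placing work there.
        self.connected[instance] = False

    def instance_terminated(self, instance):
        # Terminated instance: drop it and every task it was running.
        del self.free_cpu[instance], self.connected[instance]
        self.tasks = {t: v for t, v in self.tasks.items() if v[0] != instance}

    def schedulable(self):
        return sorted(i for i, ok in self.connected.items() if ok)


cluster = ClusterState()
cluster.register_instance("i-a", 2048)
cluster.register_instance("i-b", 2048)
cluster.start_task("t1", "i-a", 10)
cluster.start_task("t2", "i-b", 10)

cluster.task_stopped("t1")          # i-a is back to 2048 free CPU units
cluster.agent_disconnected("i-a")   # i-a stays known but is not schedulable
cluster.instance_terminated("i-b")  # i-b and task t2 are removed
```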
Guaranteeing Accuracy and Consistency
Amazon ECS under the Hood
[Diagram, shown in two builds: a sequence of entries IDN-1, IDN, IDN+1, IDN+2, IDN+3, IDN+4, IDN+5, with a WRITE labeled IDN+6 and a READ labeled IDN+5; the second build adds a second WRITE labeled IDN+3 and a second READ labeled IDN+2]
Scheduling in ECS
[Diagram: a cluster of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
ECS Schedulers
Batch Jobs
ECS Task scheduler
Run tasks once
Batch jobs
RunTask (random)
StartTask (placed)
Long-Running Apps
ECS Service scheduler
Health management
Scale-up and scale-down
AZ aware
Grouped Containers
Custom Schedulers
1. Calls the ECS List* and Describe* API operations to determine the current state of the cluster.
2. Selects one (or more) container instances according to the logic implemented.
3. Calls the StartTask API to start a task on the selected container instance.
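The three steps above can be sketched as a single function. This is a minimal illustration rather than a reference implementation: the `schedule` function, the "most remaining CPU" policy, and the `FakeECS` stand-in are hypothetical. The method names, keyword arguments, and response keys are written to match the boto3 ECS client as I understand it, so passing `boto3.client("ecs")` in place of the fake should work, but verify against the current API reference before relying on it.

```python
def schedule(ecs, cluster, task_definition, cpu_needed):
    """One pass of a custom scheduler: read state, choose an instance, start a task."""
    # 1. Determine the current state of the cluster.
    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    if not arns:
        return None
    described = ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )["containerInstances"]

    # 2. Select an instance: here, the one with the most remaining CPU.
    def remaining_cpu(instance):
        for resource in instance["remainingResources"]:
            if resource["name"].upper() == "CPU":
                return resource["integerValue"]
        return 0

    best = max(described, key=remaining_cpu)
    if remaining_cpu(best) < cpu_needed:
        return None

    # 3. Start the task on the selected container instance.
    ecs.start_task(
        cluster=cluster,
        taskDefinition=task_definition,
        containerInstances=[best["containerInstanceArn"]],
    )
    return best["containerInstanceArn"]


class FakeECS:
    """Stand-in client so the sketch runs without AWS credentials."""

    def __init__(self, cpu_by_arn):
        self.cpu_by_arn = cpu_by_arn
        self.started = []

    def list_container_instances(self, cluster):
        return {"containerInstanceArns": list(self.cpu_by_arn)}

    def describe_container_instances(self, cluster, containerInstances):
        return {"containerInstances": [
            {
                "containerInstanceArn": arn,
                "remainingResources": [
                    {"name": "CPU", "type": "INTEGER",
                     "integerValue": self.cpu_by_arn[arn]},
                ],
            }
            for arn in containerInstances
        ]}

    def start_task(self, cluster, taskDefinition, containerInstances):
        self.started.append((taskDefinition, containerInstances[0]))
        return {"tasks": [], "failures": []}


fake = FakeECS({"instance-1": 512, "instance-2": 1536})
chosen = schedule(fake, "default", "demo:1", cpu_needed=10)
```

The sketch ignores pagination of the list call and any per-call limit on how many instances can be described at once. A real scheduler would also need to handle a StartTask failure caused by state that changed between steps 1 and 3.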
Scheduling API Examples
start-task --cluster default --task-definition demo:1 --container-instances be21c208-1554-4e38-b9e2-fe236bf0a555
run-task --task-definition demo:1
create-service --task-definition demo:1 --service-name demo-service --desired-count 1 --deployment-configuration maximumPercent=200,minimumHealthyPercent=100
Integration with Third-Party Schedulers
ECS allows you to use third-party schedulers, e.g.
Marathon and Chronos
Integration via ECS API
For Mesos schedulers, the ECSSchedulerDriver interprets
the command given when scheduling jobs with Mesos and
starts a task with TaskDefinition family:revision
https://github.com/awslabs/ecs-mesos-scheduler-driver
Multiple Schedulers on the Same Cluster
Amazon ECS Service Scheduler
Service Scheduling Responsibilities
Determine Desired State
Check Against Current State
Perform Action
Discovering Differences
Deployment  Status   Desired  Pending  Running
ecs-svc/1   PRIMARY  5        0        0

Minimum Healthy  Maximum Healthy
50%              200%
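The percentages in this table translate into task counts for the deployment. The sketch below shows the arithmetic for the values on the slide. The function name `deployment_bounds` is hypothetical, and the rounding directions (minimum rounded up, maximum rounded down) reflect my understanding of the ECS documentation, so treat them as an assumption.

```python
import math


def deployment_bounds(desired, minimum_healthy_percent, maximum_percent):
    """Return (fewest tasks to keep running, most tasks allowed) during a deployment."""
    lower = math.ceil(desired * minimum_healthy_percent / 100)
    upper = math.floor(desired * maximum_percent / 100)
    return lower, upper


# Values from the table: desired count 5, minimum healthy 50%, maximum 200%.
lower, upper = deployment_bounds(
    desired=5, minimum_healthy_percent=50, maximum_percent=200
)
```

With these inputs the scheduler would keep at least 3 tasks running and allow at most 10 across all deployments of the service.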
Service State Machine
[Diagram: states Steady State, Determine Placement Options, and Deploy Task; transitions labeled RUNNING == DESIRED, RUNNING != DESIRED && STATUS == PRIMARY, and ALL_RUNNING < MAX_HEALTHY]
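One possible reading of these guards is sketched below. The slide does not spell out which transition each guard belongs to, so the mapping here is an assumption: a deployment is steady when RUNNING == DESIRED, a PRIMARY deployment deploys one task per pass while ALL_RUNNING < MAX_HEALTHY, and otherwise the pass does nothing. The function `step` and the action names are hypothetical.

```python
def step(status, running, desired, all_running, max_healthy):
    """Return the next action for one deployment, using the guards on the slide."""
    if running == desired:
        return "steady"
    if status == "PRIMARY" and all_running < max_healthy:
        return "deploy"
    # Assumption: no headroom (or not the PRIMARY deployment), so wait.
    return "wait"


# Deployment ecs-svc/1: PRIMARY, desired 5, running 0.
# A maximum of 200% of 5 desired tasks gives MAX_HEALTHY = 10.
desired, running, max_healthy = 5, 0, 10
actions = []
for _ in range(20):  # bounded so the demo always terminates
    # Only one deployment exists, so ALL_RUNNING equals RUNNING here.
    action = step("PRIMARY", running, desired, running, max_healthy)
    if action == "steady":
        break
    actions.append(action)
    if action == "deploy":
        running += 1  # pretend the task started successfully
```

The loop deploys five tasks and then settles into the steady state, matching the desired count in the table.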
Discovering Differences
Deployment  Status   Desired  Pending  Running
ecs-svc/2   PRIMARY  10       0        0
ecs-svc/1   ACTIVE   5        0        5

Minimum Healthy  Maximum Healthy
50%              200%
Service State Machine
[Diagram: states Steady State, Determine Placement Options, Deploy Task, Clear Deployment, Kill Task, and Mark Inactive; transitions labeled RUNNING == DESIRED, RUNNING != DESIRED && STATUS == PRIMARY, ALL_RUNNING < MAX_HEALTHY, RUNNING != DESIRED && STATUS == ACTIVE, ALL_RUNNING > MAX_HEALTHY, and ALL_RUNNING == 0]
Service State Machine
[Diagram: states Steady State, Determine Placement Options, Deploy Task, Register ELB, Clear Deployment, Wait for Drain, Deregister & Kill Task, and Mark Inactive; transitions labeled RUNNING == DESIRED, RUNNING != DESIRED && STATUS == PRIMARY, ALL_RUNNING < MAX_HEALTHY, RUNNING != DESIRED && STATUS == ACTIVE, ALL_RUNNING > MAX_HEALTHY, and ALL_RUNNING == 0]
Other Considerations
Task AutoScaling
Availability Zone Balancing
Permissions and Errors
Task Health
Demo
Thank You!