intro to cluster scheduler for linux containers
TRANSCRIPT
Introduction to cluster schedulers (for Linux Containers)
by Kumar Gaurav. Presented at the Kubernetes meetup, Bangalore, 25th July 2015
Requirements (of a cluster scheduler)
• Goals:
  • High resource utilization
  • User-supplied placement constraints
  • Rapid decision making
  • "Fairness" and business priority
  • Robust and always available
• Types of jobs to cater to:
  • Service: long-running, so spend time scheduling it for a perfect fit
  • Batch: performs a computation and then finishes, so demand is point-in-time
Can you spot the differences in scheduling containers vs VMs?
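The service/batch split above suggests different placement policies: a long-running service justifies a slower, careful search, while a batch task just needs a node quickly. A minimal sketch of that idea (node names, capacities, and the best-fit/first-fit pairing are all illustrative, not from any real scheduler):

```python
# Toy placement routine: service jobs get a best-fit search, batch jobs
# take the first node with room, trading placement quality for speed.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_cpu: float  # cores still available

def place(job_cpu, job_type, nodes):
    candidates = [n for n in nodes if n.free_cpu >= job_cpu]
    if not candidates:
        return None  # no feasible node; the job waits
    if job_type == "service":
        # long-running: spend time finding the tightest fit
        chosen = min(candidates, key=lambda n: n.free_cpu - job_cpu)
    else:
        # batch: point-in-time demand, first fit is good enough
        chosen = candidates[0]
    chosen.free_cpu -= job_cpu
    return chosen.name

nodes = [Node("n1", 4.0), Node("n2", 1.5), Node("n3", 8.0)]
print(place(1.0, "service", nodes))  # tightest fit: n2
print(place(1.0, "batch", nodes))    # first feasible node: n1
```

Note how the second call skips n2: the service job already consumed most of its capacity, which is exactly the high-utilization behavior the goals above ask for.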
Types of cluster schedulers
• Monolithic: a single, centralized scheduler handles all jobs
• Two-level: a single active resource manager offers compute resources to multiple parallel, independent "scheduler frameworks"
• Shared-state: lock-free optimistic concurrency control over the full cluster state

General philosophy: run a mix of workloads on the same machines for efficiency
Taxonomy
• Design issues to be addressed:
  • Partitioning the scheduling work: load balancing, specialized schedulers
  • Choice of resources: a subset or all of the cluster's resources
  • Interference: internal competition (only in shared-state schedulers)
  • Allocation granularity: when a job contains many tasks, whether to place them all-or-nothing
  • Cluster-wide behavior: fairness, a common priority definition
• Monolithic schedulers:
  • A single instance of scheduling code applies the same algorithm to all incoming jobs
  • To support different scheduling policies, it can provide multiple code paths
• Two-level schedulers:
  • Static partitioning may lead to fragmentation and sub-optimal utilization
  • In Mesos, a centralized resource allocator dynamically partitions the cluster
• Shared-state schedulers:
  • Grant each scheduler full access to the entire cluster; schedulers compete in a free-for-all
  • Once a scheduler makes a placement decision, it updates the shared copy of cell state in an atomic commit
  • Schedulers can have different policies; fairness is not guaranteed
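The atomic-commit step above can be sketched with a version counter standing in for optimistic concurrency control. The CellState class below is illustrative, not taken from any real scheduler's code: each scheduler snapshots the state, decides optimistically, and the commit is rejected if another scheduler committed in the meantime.

```python
# Hedged sketch of shared-state scheduling with optimistic concurrency.
class CellState:
    def __init__(self):
        self.version = 0
        self.assignments = {}  # task -> node

    def snapshot(self):
        # every scheduler sees the whole cell state
        return self.version, dict(self.assignments)

    def try_commit(self, seen_version, task, node):
        # compare-and-swap style: commit only if nobody else committed
        # since our snapshot was taken
        if seen_version != self.version:
            return False  # conflict: caller must re-read and retry
        self.assignments[task] = node
        self.version += 1
        return True

cell = CellState()
v, _ = cell.snapshot()
print(cell.try_commit(v, "task-a", "node-1"))  # True: first commit wins
print(cell.try_commit(v, "task-b", "node-2"))  # False: snapshot is stale
```

The losing scheduler simply retries against the new state, which is why the free-for-all works without locks but cannot guarantee fairness.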
Cluster schedulers for Containers
• Mesos, fleet, YARN, Kubernetes, Swarm, etc.
• Make a bunch of nodes appear as one big computer so you can deploy anything to your own private cloud; just like Docker, but across any number of nodes
• Two-level scheduling: resource and job
• Common traits:
  • leaders, for coordination and scheduling
  • some service discovery component
  • some underlying cluster coordination tool (like ZooKeeper)
  • followers, for processing
1. Apache Mesos, Marathon for Docker
• Mesos
  • a general-purpose cluster management solution; a "distributed systems kernel"
  • originated at UC Berkeley and popularized by Twitter; initially targeted at Hadoop
• master
  • a node which does cluster management and resource scheduling
  • generally has mesos, marathon, and chronos deployed
• marathon
  • a framework built upon Apache Mesos
  • a web service for container scheduling, fault tolerance, service discovery, elastic scaling, linking and life-cycle management
  • http://<master>:8080/apps for handling requests
• mesos-slave
  • a process which should run on all nodes
  • should have the Docker engine running
https://mesosphere.com/docs/tutorials/launch-docker-container-on-mesosphere/
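To make the Marathon request handling concrete, here is a minimal app definition for a Docker container, built as JSON. The field names used (id, instances, cpus, mem, container.docker.image, cmd) follow Marathon's app-definition format; the values are made up, and a real deployment would POST this payload to Marathon's versioned REST API on the master (typically under /v2/apps).

```python
# Illustrative Marathon app definition for a Docker container.
import json

app = {
    "id": "hello-docker",          # app identifier (invented)
    "instances": 2,                # how many copies Marathon should run
    "cpus": 0.25,
    "mem": 64,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "busybox"},
    },
    "cmd": "while true; do echo hello; sleep 5; done",
}

payload = json.dumps(app)
# this is the body a client would POST to the Marathon master,
# e.g. to http://<master>:8080/v2/apps with Content-Type: application/json
print(len(payload) > 0)
```

Marathon then keeps the requested number of instances running, restarting containers on other slaves if a node fails, which is the fault-tolerance trait listed above.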
1. Marathon demo
2. Docker Swarm
• Docker Swarm is native clustering for Docker, released late 2014
• Create and access a pool of Docker hosts using the full suite of Docker tools
• Advantage: any tool that already communicates with a Docker daemon can use Swarm to transparently scale to multiple hosts
• Scheduling is controlled by:
  • filters (constraint | affinity | port | dependency | health)
  • strategies (spread | binpack | random)
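The two non-random strategies can be sketched as follows, assuming each node reports a container count and its free memory (all names and figures invented). "spread" evens load across the cluster; "binpack" fills one node up before touching the next, which leaves whole nodes free for big containers.

```python
# Sketch of Swarm-style spread vs binpack placement strategies.
def spread(nodes, mem_needed):
    fits = [n for n in nodes if n["free_mem"] >= mem_needed]
    # fewest running containers wins -> even load across the cluster
    return min(fits, key=lambda n: n["containers"])["name"] if fits else None

def binpack(nodes, mem_needed):
    fits = [n for n in nodes if n["free_mem"] >= mem_needed]
    # least free memory that still fits wins -> pack nodes tightly
    return min(fits, key=lambda n: n["free_mem"])["name"] if fits else None

nodes = [
    {"name": "h1", "containers": 3, "free_mem": 2048},
    {"name": "h2", "containers": 1, "free_mem": 4096},
    {"name": "h3", "containers": 5, "free_mem": 512},
]
print(spread(nodes, 256))   # h2: fewest containers
print(binpack(nodes, 256))  # h3: least free memory that still fits
```

In real Swarm, filters first narrow the candidate set (constraints, affinities, ports) and the strategy then ranks what remains; the two functions above correspond to that ranking step only.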
• (Node) discovery is done by:
  • swarm join: running a swarm agent on each host
  • a static file describing the cluster: one line per node
  • a registry: etcd/consul/zookeeper
  • a network subnet
• Primitive networking and storage support
3. Kubernetes
• A service for container cluster management
• Open-sourced by Google; first announced at Google I/O, June 2014
• Supports GCE, CoreOS, Azure, Rackspace, vSphere
• Manages Docker containers as its default implementation
• Written in Go, like Docker
• Basic installation has 2 node roles:
  • Master
  • Node (formerly, minion)
• Uses etcd as its persistent store & service registry
https://github.com/GoogleCloudPlatform/kubernetes
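To make the master/etcd flow concrete, here is a minimal pod object of the kind the API server persists in etcd, built as a plain dict. The v1 Pod fields used (apiVersion, kind, metadata, spec.containers) come from the Kubernetes API; the pod and container names and the image are placeholders.

```python
# Minimal Kubernetes v1 Pod manifest expressed as a Python dict.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod"},   # placeholder name
    "spec": {
        "containers": [
            {"name": "web", "image": "nginx"}  # placeholder container
        ]
    },
}

# the API server on the master would persist this object in etcd;
# the scheduler later binds the pending pod to a node
print(json.dumps(pod, indent=2))
```

Because all master components are stateless and only etcd holds this data, replicating etcd is what makes the multi-master setup on the next slide possible.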
3. Kubernetes
• Kubernetes is:
  • lean: lightweight, simple, accessible
  • portable: public, private, hybrid, multi-cloud
  • extensible: modular, pluggable, hookable, composable
  • self-healing: auto-placement, auto-restart, auto-replication
• Kubernetes inherits the shared-state scheduler design: multiple master nodes are possible
• All of the software on the master is stateless and uses etcd as its backing store; etcd can be configured in a multi-master setup

"An HA setup for the master would be to have three different master nodes, with etcd set up multi-master between them, and a load balancer set up in front of the API server to balance traffic between clients and the masters."
Comparison
• Compare (a) Swarm, (b) Mesos, (c) Kubernetes
• Swarm: scheduler busyness & wait time increase linearly with the number of jobs. Good for very small clusters; main advantage is ease of use
• Mesos (with Marathon): achieves fairness by alternately offering all available cluster resources to the different schedulers. Assumption: resources become available frequently and scheduler decisions are quick
• Kubernetes: average job wait times are comparable to Mesos at normal scale. GKE (Google Container Engine) is hosted Kubernetes. Expected to perform better at huge scale