intro to cluster scheduler for linux containers
TRANSCRIPT
Introduction to cluster schedulers (for Linux Containers)
by Kumar Gaurav. Presented at the Kubernetes meetup, Bangalore, 25th July 2015
Requirements (of a cluster scheduler)
• Goals:
  • High resource utilization
  • User-supplied placement constraints
  • Rapid decision making
  • "Fairness" and business priority
  • Robust and always available
• Types of jobs to cater to:
  • Service: long-running, so spend time scheduling it for a perfect fit
  • Batch: performs a computation and then finishes, so demand is point-in-time
Can you spot the differences in scheduling containers vs VMs?
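The service/batch split above suggests different placement policies: a long-running service justifies a slower, careful search, while a batch task just needs a node quickly. A minimal sketch of that idea (node names, capacities, and the best-fit/first-fit pairing are all illustrative, not from any real scheduler):

```python
# Toy placement routine: service jobs get a best-fit search, batch jobs
# take the first node with room, trading placement quality for speed.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_cpu: float  # cores still available

def place(job_cpu, job_type, nodes):
    candidates = [n for n in nodes if n.free_cpu >= job_cpu]
    if not candidates:
        return None  # no feasible node; the job waits
    if job_type == "service":
        # long-running: spend time finding the tightest fit
        chosen = min(candidates, key=lambda n: n.free_cpu - job_cpu)
    else:
        # batch: point-in-time demand, first fit is good enough
        chosen = candidates[0]
    chosen.free_cpu -= job_cpu
    return chosen.name

nodes = [Node("n1", 4.0), Node("n2", 1.5), Node("n3", 8.0)]
print(place(1.0, "service", nodes))  # tightest fit: n2
print(place(1.0, "batch", nodes))    # first feasible node: n1
```

Note how the second call skips n2: the service job already consumed most of its capacity, which is exactly the high-utilization behavior the goals above ask for.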
Types of cluster schedulers
• Monolithic: a single, centralized scheduler handles all jobs
• Two-level: a single active resource manager offers compute resources to multiple parallel, independent "scheduler frameworks"
• Shared-state: lock-free optimistic concurrency control over the full cluster state

General philosophy: run a mix of workloads on the same machines for efficiency
Taxonomy
• Design issues to be addressed:
  • Partitioning the scheduling work: load balancing, specialized schedulers
  • Choice of resources: a subset or all of the cluster's resources
  • Interference: internal competition (only in shared-state schedulers)
  • Allocation granularity: when a job contains many tasks, whether to place them all-or-nothing
  • Cluster-wide behavior: fairness, a common priority definition
• Monolithic schedulers:
  • A single instance of scheduling code applies the same algorithm to all incoming jobs
  • To support different scheduling policies, it can provide multiple code paths
• Two-level schedulers:
  • Static partitioning may lead to fragmentation and sub-optimal utilization
  • In Mesos, a centralized resource allocator dynamically partitions the cluster
• Shared-state schedulers:
  • Grant each scheduler full access to the entire cluster; schedulers compete in a free-for-all
  • Once a scheduler makes a placement decision, it updates the shared copy of cell state in an atomic commit
  • Schedulers can have different policies; fairness is not guaranteed
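The atomic-commit step above can be sketched with a version counter standing in for optimistic concurrency control. The CellState class below is illustrative, not taken from any real scheduler's code: each scheduler snapshots the state, decides optimistically, and the commit is rejected if another scheduler committed in the meantime.

```python
# Hedged sketch of shared-state scheduling with optimistic concurrency.
class CellState:
    def __init__(self):
        self.version = 0
        self.assignments = {}  # task -> node

    def snapshot(self):
        # every scheduler sees the whole cell state
        return self.version, dict(self.assignments)

    def try_commit(self, seen_version, task, node):
        # compare-and-swap style: commit only if nobody else committed
        # since our snapshot was taken
        if seen_version != self.version:
            return False  # conflict: caller must re-read and retry
        self.assignments[task] = node
        self.version += 1
        return True

cell = CellState()
v, _ = cell.snapshot()
print(cell.try_commit(v, "task-a", "node-1"))  # True: first commit wins
print(cell.try_commit(v, "task-b", "node-2"))  # False: snapshot is stale
```

The losing scheduler simply retries against the new state, which is why the free-for-all works without locks but cannot guarantee fairness.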
Cluster schedulers for Containers
• Mesos, fleet, YARN, Kubernetes, Swarm, etc.
• Make a bunch of nodes appear as one big computer so you can deploy anything to your own private cloud; just like Docker, but across any number of nodes
• Two-level scheduling: resource and job
• Common traits:
  • leaders, for coordination and scheduling
  • some service discovery component
  • some underlying cluster coordination tool (like ZooKeeper)
  • followers, for processing
1. Apache Mesos, Marathon for Docker
• Mesos
  • a general-purpose cluster management solution; a "distributed systems kernel"
  • originated at UC Berkeley and popularized by Twitter; initially targeted at Hadoop
• master
  • a node which does cluster management and resource scheduling
  • generally has mesos, marathon, and chronos deployed
• marathon
  • a framework built upon Apache Mesos
  • a web service for container scheduling, fault tolerance, service discovery, elastic scaling, linking and life-cycle management
  • http://<master>:8080/apps for handling requests
• mesos-slave
  • a process which should run on all nodes
  • should have the Docker engine running
https://mesosphere.com/docs/tutorials/launch-docker-container-on-mesosphere/
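To make the Marathon request handling concrete, here is a minimal app definition for a Docker container, built as JSON. The field names used (id, instances, cpus, mem, container.docker.image, cmd) follow Marathon's app-definition format; the values are made up, and a real deployment would POST this payload to Marathon's versioned REST API on the master (typically under /v2/apps).

```python
# Illustrative Marathon app definition for a Docker container.
import json

app = {
    "id": "hello-docker",          # app identifier (invented)
    "instances": 2,                # how many copies Marathon should run
    "cpus": 0.25,
    "mem": 64,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "busybox"},
    },
    "cmd": "while true; do echo hello; sleep 5; done",
}

payload = json.dumps(app)
# this is the body a client would POST to the Marathon master,
# e.g. to http://<master>:8080/v2/apps with Content-Type: application/json
print(len(payload) > 0)
```

Marathon then keeps the requested number of instances running, restarting containers on other slaves if a node fails, which is the fault-tolerance trait listed above.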
1. Marathon demo
2. Docker Swarm
• Docker Swarm is native clustering for Docker, released late 2014
• Create and access a pool of Docker hosts using the full suite of Docker tools
• Advantage: any tool that already communicates with a Docker daemon can use Swarm to transparently scale to multiple hosts
• Scheduling is controlled by:
  • filters (constraint | affinity | port | dependency | health)
  • strategies (spread | binpack | random)
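The two non-random strategies can be sketched as follows, assuming each node reports a container count and its free memory (all names and figures invented). "spread" evens load across the cluster; "binpack" fills one node up before touching the next, which leaves whole nodes free for big containers.

```python
# Sketch of Swarm-style spread vs binpack placement strategies.
def spread(nodes, mem_needed):
    fits = [n for n in nodes if n["free_mem"] >= mem_needed]
    # fewest running containers wins -> even load across the cluster
    return min(fits, key=lambda n: n["containers"])["name"] if fits else None

def binpack(nodes, mem_needed):
    fits = [n for n in nodes if n["free_mem"] >= mem_needed]
    # least free memory that still fits wins -> pack nodes tightly
    return min(fits, key=lambda n: n["free_mem"])["name"] if fits else None

nodes = [
    {"name": "h1", "containers": 3, "free_mem": 2048},
    {"name": "h2", "containers": 1, "free_mem": 4096},
    {"name": "h3", "containers": 5, "free_mem": 512},
]
print(spread(nodes, 256))   # h2: fewest containers
print(binpack(nodes, 256))  # h3: least free memory that still fits
```

In real Swarm, filters first narrow the candidate set (constraints, affinities, ports) and the strategy then ranks what remains; the two functions above correspond to that ranking step only.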
• (Node) discovery is done by:
  • swarm join: running a swarm agent on each host
  • a static file describing the cluster: one line per node
  • a registry: etcd/consul/zookeeper
  • a network subnet
• Primitive networking and storage support
3. Kubernetes
• A service for container cluster management
• Open-sourced by Google; first announced at Google I/O, June 2014
• Supports GCE, CoreOS, Azure, Rackspace, vSphere
• Manages Docker containers as its default implementation
• Written in Go, like Docker
• Basic installation has 2 node roles:
  • Master
  • Node (formerly, minion)
• Uses etcd as its persistent store & service registry
https://github.com/GoogleCloudPlatform/kubernetes
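To make the master/etcd flow concrete, here is a minimal pod object of the kind the API server persists in etcd, built as a plain dict. The v1 Pod fields used (apiVersion, kind, metadata, spec.containers) come from the Kubernetes API; the pod and container names and the image are placeholders.

```python
# Minimal Kubernetes v1 Pod manifest expressed as a Python dict.
import json

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "demo-pod"},   # placeholder name
    "spec": {
        "containers": [
            {"name": "web", "image": "nginx"}  # placeholder container
        ]
    },
}

# the API server on the master would persist this object in etcd;
# the scheduler later binds the pending pod to a node
print(json.dumps(pod, indent=2))
```

Because all master components are stateless and only etcd holds this data, replicating etcd is what makes the multi-master setup on the next slide possible.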
3. Kubernetes
• Kubernetes is:
  • lean: lightweight, simple, accessible
  • portable: public, private, hybrid, multi-cloud
  • extensible: modular, pluggable, hookable, composable
  • self-healing: auto-placement, auto-restart, auto-replication
• Kubernetes inherits the shared-state scheduler design: multiple master nodes are possible
• All of the software on the master is stateless and uses etcd as its backing store; etcd can be configured in a multi-master setup

"An HA setup for the master would be to have three different master nodes, with etcd set up multi-master between them, and a load balancer set up in front of the API server to balance traffic between clients and the masters."
Comparison
• Compare (a) Swarm, (b) Mesos, (c) Kubernetes
• Swarm: scheduler busyness & wait time increase linearly with the number of jobs. Good for very small clusters; main advantage is ease of use
• Mesos (with Marathon): achieves fairness by alternately offering all available cluster resources to the different schedulers. Assumption: resources become available frequently and scheduler decisions are quick
• Kubernetes: average job wait times are comparable to Mesos at normal scale. GKE (Google Container Engine) is hosted Kubernetes. Expected to perform better at huge scale