containers @ google

Google Confidential and Proprietary Victor Marmol ([email protected] ) Rohit Jnagal ([email protected] ) Let Me Contain That For You Containers @ Google SF Bay Area Large-Scale Production Engineering: Lightweight Containers Meetup February 20, 2014

DESCRIPTION

Slides from our presentation at the SF Bay Area Large Scale Production Engineering meetup on Lightweight Containers.

TRANSCRIPT

Page 1: Containers @ Google

Victor Marmol ([email protected]), Rohit Jnagal ([email protected])

Let Me Contain That For You

Containers @ Google

SF Bay Area Large-Scale Production Engineering: Lightweight Containers Meetup, February 20, 2014

Page 2: Containers @ Google

Containers in the Wild

● Used to provide VM-like instances
● High density (lower costs) and high performance
● Fast to start
● Migration is hard, but possible

[Diagram: User 1 to User 4 running on a shared Linux kernel]

Page 3: Containers @ Google

The Need for Isolation: A Shared Google Machine

[Diagram: a single machine shared by a sensitive task (I/O, CPU, memory), front-end and back-end tasks, an alloc, and background tasks: system daemons, batch workload, and soaker workload]

Page 4: Containers @ Google

Containers @ Google

● Container-aware tasks use asymmetric subcontainers
● Provide different guarantees of quality of service
● Overcommit resources to achieve high utilization
● Early users, few namespaces, and near-zero overhead

[Diagram: Linux kernel hosting Alloc 1 and Tasks 1 and 2, with nested subcontainers labeled Sub 1 to Sub 4 and SS1 to SS4]

Page 5: Containers @ Google

Asymmetric Isolation

Isolating only certain resources (e.g., CPU but not memory).

[Table: Containers 1, 2, and 3 against CPU, memory, and network isolation; the per-cell markings were not recoverable]
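Under cgroup v1 each resource has its own hierarchy, which makes asymmetric isolation easy to express: a container gets its own cgroup only in the hierarchies it isolates and otherwise stays in its parent's cgroup. The following is a minimal sketch of that idea, assuming a v1 mount layout of `/sys/fs/cgroup/<controller>/...` and using `net_cls` as a stand-in for network isolation; the `Resource` enum and `CgroupPathFor` helper are hypothetical, not lmctfy code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical resource set; names are illustrative only.
enum class Resource { kCpu, kMemory, kNet };

// Controller directory names under an assumed cgroup v1 mount.
std::string ControllerName(Resource r) {
  switch (r) {
    case Resource::kCpu: return "cpu";
    case Resource::kMemory: return "memory";
    case Resource::kNet: return "net_cls";  // assumption: net isolation via net_cls
  }
  return "";
}

// Returns the cgroup directory a container's tasks should join for resource
// `r`. If the container isolates `r`, it gets its own cgroup below the
// parent's; otherwise it simply shares the parent's cgroup.
std::string CgroupPathFor(Resource r, const std::string& parent,
                          const std::string& name,
                          const std::set<Resource>& isolated) {
  std::string base = "/sys/fs/cgroup/" + ControllerName(r) + parent;
  if (isolated.count(r) == 0) return base;  // not isolated: inherit parent
  if (base.back() != '/') base += '/';
  return base + name;
}
```

For a container isolated only on CPU, `CgroupPathFor(Resource::kCpu, "/", "batch", {Resource::kCpu})` yields `/sys/fs/cgroup/cpu/batch`, while its memory path stays at the parent's `/sys/fs/cgroup/memory/`, so its memory is accounted and limited together with the parent's.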

Page 6: Containers @ Google

Containers @ Google Today

● Historically
  ○ 2004: No isolation
  ○ 2006: Cgroups
  ○ Now: Namespaces
● Primarily Linux cgroups + user-space policies and monitoring
● We skipped VMs due to high overhead
● Used everywhere: SaaS, PaaS, IaaS; Android, Chrome OS
● Heterogeneous workloads: latency, bandwidth, and priority
● High task churn

Page 7: Containers @ Google

Goals

● Isolation
  ○ Tasks do not impact each other
  ○ The behavior of a task is the same regardless of what else is on the machine
● Predictability
  ○ Tasks behave the same each time they run
  ○ Unless they are specifically configured to use "slack"
● Quality of Service
  ○ Different tasks get different quality of resources
● Overcommitment
  ○ Oversell machine resources within QoS guarantees
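A common way to implement this kind of overcommitment is to admit work only while the sum of guaranteed amounts (reservations) fits the machine, and to let the sum of limits exceed capacity. The sketch below assumes exactly that rule for a single resource measured in arbitrary units; `Task`, `Admits`, and `OvercommitRatio` are hypothetical names, and the slides do not describe the admission rule Google actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-task request for one resource (e.g. memory units).
struct Task {
  int64_t reservation;  // amount guaranteed to the task
  int64_t limit;        // maximum the task may use when spare capacity exists
};

// Admits `candidate` only if all reservations (guarantees) still fit in
// `capacity`. Limits are deliberately not checked, so their sum may exceed
// capacity: that headroom is the overcommitted "slack".
bool Admits(const std::vector<Task>& running, const Task& candidate,
            int64_t capacity) {
  int64_t reserved = candidate.reservation;
  for (const Task& t : running) reserved += t.reservation;
  return reserved <= capacity;
}

// Ratio of promised limits to capacity; a value above 1.0 means overcommitted.
double OvercommitRatio(const std::vector<Task>& tasks, int64_t capacity) {
  int64_t limits = 0;
  for (const Task& t : tasks) limits += t.limit;
  return static_cast<double>(limits) / static_cast<double>(capacity);
}
```

With two running tasks reserving 4 and 2 units (limits 8 and 6) on a 10-unit machine, a third task reserving 3 is admitted and one reserving 5 is not, even though the limits already add up to 1.4 times capacity.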

Page 8: Containers @ Google

lmctfy: Let Me Contain That For You

An open-source container stack based on Google’s.

github.com/google/lmctfy/

Provides the Container abstraction to higher levels by abstracting away the kernel interfaces.

Motivation
● Existing code, systems, and design around containers
● Problems with LXC
  ○ No abstraction (direct knob exposure)
  ○ No easy way to access programmatically

Page 9: Containers @ Google

lmctfy: Let Me Contain That For You

Objectives
● Abstract away enforcement: separate policy from enforcement
● Scalability and parallel access
● Intent-based container specifications
● Asymmetric isolation
● Subcontainer support
● Provides tiers of quality of service

System Layers
● CL1
  ○ Container abstraction and enforcement
  ○ Thin and light layer
  ○ Current lmctfy
● CL2
  ○ Sets policy (QoS, overcommitment)
  ○ Higher level logic, monitoring, and control loops
  ○ Stateful entity
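The CL1/CL2 split can be pictured as an interface boundary: CL1 applies settings it is given and keeps no policy state, while CL2 remembers demand and recomputes settings in a control loop. A minimal sketch under those assumptions; `Enforcer`, `RecordingEnforcer`, `Policy`, and the rule of handing leftover memory to a `/batch` container are all hypothetical illustrations, not lmctfy's design.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// CL1 (hypothetical interface): thin enforcement. It applies exactly what it
// is told and makes no policy decisions of its own.
class Enforcer {
 public:
  virtual ~Enforcer() = default;
  virtual void SetMemoryLimit(const std::string& container, int64_t bytes) = 0;
};

// A fake CL1 that records the limits it was asked to apply.
class RecordingEnforcer : public Enforcer {
 public:
  void SetMemoryLimit(const std::string& container, int64_t bytes) override {
    limits[container] = bytes;
  }
  std::map<std::string, int64_t> limits;
};

// CL2 (hypothetical): stateful policy. It remembers each container's demand
// and gives the (assumed) "/batch" container whatever capacity is left.
class Policy {
 public:
  Policy(Enforcer* cl1, int64_t capacity) : cl1_(cl1), capacity_(capacity) {}

  // One control-loop step: record new demand, then push all limits to CL1.
  void OnDemandChanged(const std::string& container, int64_t bytes) {
    demand_[container] = bytes;
    int64_t used = 0;
    for (const auto& [name, d] : demand_) {
      cl1_->SetMemoryLimit(name, d);
      used += d;
    }
    cl1_->SetMemoryLimit("/batch", used < capacity_ ? capacity_ - used : 0);
  }

 private:
  Enforcer* cl1_;
  int64_t capacity_;
  std::map<std::string, int64_t> demand_;
};
```

Keeping all state in `Policy` is what lets the enforcement side stay "stateless" and easy to test: a fake like `RecordingEnforcer` can stand in for the kernel.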

Page 10: Containers @ Google

lmctfy: Fine-tuned resource isolation

The current cgroup API is complicated, with lots of knobs (each a cgroup file):

Common: 5+ files
  cgroup.clone_children, cgroup.event_control, cgroup.procs, notify_on_release, release_agent

CPU: 8+ files
  cpuacct.stat, cpuacct.usage, cpuacct.usage_percpu, cpu.cfs_period_us, cpu.cfs_quota_us, cpu.rt_period_us, cpu.rt_runtime_us, cpu.shares, cpu.stat

Memory: 12+ files
  memory.failcnt, memory.force_empty, memory.limit_in_bytes, memory.max_usage_in_bytes, memory.move_charge_at_immigrate, memory.numa_stat, memory.oom_control, memory.pressure_level, memory.soft_limit_in_bytes, memory.stat, memory.swappiness, memory.usage_in_bytes, memory.use_hierarchy

Cpuset: 12+ files
  cpuset.cpu_exclusive, cpuset.cpus, cpuset.mem_exclusive, cpuset.mem_hardwall, cpuset.memory_migrate, cpuset.memory_pressure, cpuset.memory_pressure_enabled, cpuset.memory_spread_page, cpuset.memory_spread_slab, cpuset.mems, cpuset.sched_load_balance, cpuset.sched_relax_domain_level

+ DiskIO + Net + ...
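Because every knob is a separate file, a caller without an abstraction layer has to know the controller, the hierarchy path, the file name, and the value format for each setting. A minimal sketch of that raw interface, assuming a cgroup v1 layout; `WriteKnob` and `ReadKnob` are hypothetical helpers, and the `root` parameter (normally `/sys/fs/cgroup`) is configurable so the example can run against a scratch directory instead of a real cgroup mount.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes `value` to <root>/<controller>/<container>/<knob>, for example
// /sys/fs/cgroup/cpu/mycontainer/cpu.shares. Returns false on failure.
bool WriteKnob(const fs::path& root, const std::string& controller,
               const std::string& container, const std::string& knob,
               const std::string& value) {
  fs::path dir = root / controller / container;
  std::error_code ec;
  // On a real cgroup v1 mount, creating the directory creates the cgroup.
  fs::create_directories(dir, ec);
  if (ec) return false;
  std::ofstream out(dir / knob);
  out << value;
  out.flush();
  return out.good();
}

// Reads a knob back (first whitespace-delimited token of the file).
std::string ReadKnob(const fs::path& root, const std::string& controller,
                     const std::string& container, const std::string& knob) {
  std::ifstream in(root / controller / container / knob);
  std::string value;
  in >> value;
  return value;
}
```

Setting `cpu.shares` to `2048` would then be `WriteKnob("/sys/fs/cgroup", "cpu", "mycontainer", "cpu.shares", "2048")`. Repeating that for every knob listed above, with the right units for each, is the direct knob exposure that lmctfy's intent-based specifications aim to hide.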

Page 11: Containers @ Google

Initial version of lowest layer
● Written entirely in C++
● Delivered as a CLI and a C++ library (C and Go bindings soon)
● Isolation for CPU, memory, and perf event
● Full support for subcontainers
● “Stateless” and lightweight
● Initial support for namespaces, more to come in the next week

Can be augmented with custom kernel patches
● CPU latency and accounting
● OOM priority

Supported configurations
● Target configuration is well supported
● Designed to be flexible, but we test on a limited set of them
● More target configurations being added
● Contributions to add more are welcome

Released 0.4.0 (This Week!)

Page 12: Containers @ Google

Container Specifications

message ContainerSpec {
  optional int64 owner = 1;
  optional CpuSpec cpu = 2;
  optional MemorySpec memory = 3;
  optional DiskIoSpec diskio = 4;
  optional NetworkSpec network = 5;
  optional VirtualHost virtualhost = 6;
  ...
}

message CpuSpec {
  optional SchedulingLatency scheduling_latency = 1;
  optional uint64 limit = 2;
  optional uint64 max_limit = 3;
  ...
}

Create: “cpu:<limit:1000 max_limit:2000> memory:<limit:4096000 reservation:1024000>”
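The `Create` argument above is a `ContainerSpec` written in protocol buffer text format, with nested messages in `<...>` and unset fields omitted. To show that shape without depending on the protobuf library, the sketch below uses hypothetical plain C++ structs that mirror only four fields (`cpu.limit`, `cpu.max_limit`, `memory.limit`, `memory.reservation`; the memory field types are assumed) and a hand-written serializer; a real client would presumably use the protobuf-generated classes instead.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-ins for a small subset of CpuSpec / MemorySpec fields.
struct CpuSpec {
  std::optional<uint64_t> limit;
  std::optional<uint64_t> max_limit;
};
struct MemorySpec {
  std::optional<int64_t> limit;        // assumed type
  std::optional<int64_t> reservation;  // assumed type
};
struct ContainerSpec {
  std::optional<CpuSpec> cpu;
  std::optional<MemorySpec> memory;
};

// Emits "name:value" only when the optional field is set.
template <typename T>
void Field(std::ostringstream& out, bool& first, const char* name,
           const std::optional<T>& v) {
  if (!v) return;
  if (!first) out << ' ';
  out << name << ':' << *v;
  first = false;
}

// Serializes in the compact text form used on the slide: nested messages
// are written as name:<...>, and unset fields and messages are omitted.
std::string ToText(const ContainerSpec& spec) {
  std::ostringstream out;
  bool top_first = true;
  if (spec.cpu) {
    out << "cpu:<";
    bool first = true;
    Field(out, first, "limit", spec.cpu->limit);
    Field(out, first, "max_limit", spec.cpu->max_limit);
    out << '>';
    top_first = false;
  }
  if (spec.memory) {
    if (!top_first) out << ' ';
    out << "memory:<";
    bool first = true;
    Field(out, first, "limit", spec.memory->limit);
    Field(out, first, "reservation", spec.memory->reservation);
    out << '>';
  }
  return out.str();
}
```

Setting only the fields a caller cares about and omitting the rest is what makes the specification intent-based: the caller states limits and reservations, not cgroup files.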

Page 13: Containers @ Google

Cgroup Specifications

Create: “cpu:<limit:1000 max_limit:2000 scheduling_latency:PRIORITY> memory:<limit:4096000 reservation:1024000>”

Equivalent LXC cgroup config:
lxc.cgroup.cpu.shares = 2048
lxc.cgroup.cpu.cfs_period_us = 50000
lxc.cgroup.cpu.cfs_quota_us = 10000
lxc.cgroup.cpu.lat = 25
.. cpu performance knobs ..
lxc.cgroup.memory.limit_in_bytes = 4096000
lxc.cgroup.memory.soft_limit_in_bytes = 1024000
.. memory performance knobs ..
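The memory half of that translation is one-to-one: `limit` maps to `memory.limit_in_bytes` and `reservation` to `memory.soft_limit_in_bytes`. A sketch of that step with hypothetical `MemoryIntent`, `IsValid`, and `MemoryKnobs` names; the rule that a reservation must not exceed the limit is an assumed feasibility check, and the CPU half is left out because the slides do not give the conversion from CPU units to shares and CFS quota.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical intent-level memory request, in bytes.
struct MemoryIntent {
  int64_t limit;        // hard cap
  int64_t reservation;  // amount the container should be able to keep
};

// Assumed feasibility rule: a reservation cannot exceed the limit.
bool IsValid(const MemoryIntent& m) {
  return m.reservation >= 0 && m.reservation <= m.limit;
}

// Translates intent into cgroup v1 memory knobs (file name -> value).
std::map<std::string, std::string> MemoryKnobs(const MemoryIntent& m) {
  return {
      {"memory.limit_in_bytes", std::to_string(m.limit)},
      {"memory.soft_limit_in_bytes", std::to_string(m.reservation)},
  };
}
```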

Page 14: Containers @ Google

C++ API

::containers::lmctfy::ContainerApi
● Create
● Get
● Destroy
● Detect
● InitMachine

::containers::lmctfy::Container
● Update
● Run
● Notifications
● List (threads, PIDs, and subcontainers)
● Stats
● Pause/Resume
● KillAll

CLI is a thin wrapper around the C++ API
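To illustrate the split between the factory-style `ContainerApi` (Create, Get, Destroy) and per-container operations (Run, List, KillAll), here is a toy in-memory model. `ToyContainerApi` and `ToyContainer` only echo the slide's class names: every signature below is a hypothetical simplification for illustration, not the real lmctfy headers, and nothing here touches the kernel.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy container: tracks the PIDs placed in it. Hypothetical, in-memory only.
class ToyContainer {
 public:
  explicit ToyContainer(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }
  void Enter(int pid) { pids_.insert(pid); }  // stand-in for Run
  std::vector<int> ListPids() const {
    return std::vector<int>(pids_.begin(), pids_.end());
  }
  void KillAll() { pids_.clear(); }

 private:
  std::string name_;
  std::set<int> pids_;
};

// Toy factory mirroring the Create/Get/Destroy split (absolute names only).
class ToyContainerApi {
 public:
  // Returns nullptr if the name is taken or the parent does not exist.
  ToyContainer* Create(const std::string& name) {
    if (containers_.count(name)) return nullptr;
    std::string parent = name.substr(0, name.find_last_of('/'));
    if (!parent.empty() && !containers_.count(parent)) return nullptr;
    auto c = std::make_unique<ToyContainer>(name);
    ToyContainer* raw = c.get();
    containers_[name] = std::move(c);
    return raw;
  }

  ToyContainer* Get(const std::string& name) const {
    auto it = containers_.find(name);
    return it == containers_.end() ? nullptr : it->second.get();
  }

  // Kills the container's processes and removes it; refuses while it still
  // has subcontainers.
  bool Destroy(const std::string& name) {
    auto it = containers_.find(name);
    if (it == containers_.end()) return false;
    const std::string prefix = name + "/";
    auto child = containers_.lower_bound(prefix);
    if (child != containers_.end() && child->first.rfind(prefix, 0) == 0) {
      return false;  // still has subcontainers
    }
    it->second->KillAll();
    containers_.erase(it);
    return true;
  }

 private:
  std::map<std::string, std::unique_ptr<ToyContainer>> containers_;
};
```

A thin CLI over such an API would mostly parse arguments and call one of these methods, which matches the slide's description of the real CLI as a thin wrapper.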

Page 15: Containers @ Google

Container Names

Path-like hierarchy of container names:
● Absolute: /parent/self
● Relative: self when in /parent

Container Name                      Refers To
/                                   The root top-level container
/sys                                The sys top-level container
/sys/sub                            The sub subcontainer of the sys top-level container
. or ./                             The current container (current relative to the calling process)
..                                  The parent container (parent relative to the calling process)
./foo_container or foo_container    The foo_container subcontainer of the current container
/foo_container                      The foo_container top-level container
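These naming rules amount to ordinary path resolution against the calling process's container. A minimal sketch, assuming a hypothetical `ResolveContainerName` helper (not lmctfy's API) that takes the caller's current container and a possibly relative name and returns the absolute name; `..` at the root is assumed to stay at `/`, as with filesystem paths.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Resolves `name` (absolute or relative) against `current`, the absolute
// name of the caller's container, and returns the absolute container name.
std::string ResolveContainerName(const std::string& current,
                                 const std::string& name) {
  // Absolute names start at the root; relative ones start at `current`.
  std::string full =
      (!name.empty() && name[0] == '/') ? name : current + "/" + name;

  std::vector<std::string> parts;
  std::stringstream ss(full);
  std::string part;
  while (std::getline(ss, part, '/')) {
    if (part.empty() || part == ".") continue;  // skip "//" and "."
    if (part == "..") {
      if (!parts.empty()) parts.pop_back();  // ".." at the root stays at "/"
      continue;
    }
    parts.push_back(part);
  }

  std::string out;
  for (const std::string& p : parts) out += "/" + p;
  return out.empty() ? "/" : out;
}
```

For a process in `/sys`, `ResolveContainerName("/sys", "foo_container")` returns `/sys/foo_container`, matching the table row for relative names, while `/foo_container` resolves to the top-level container of that name.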

Page 16: Containers @ Google

Roadmap

Towards Version 1.0
● Improve VirtualHost support
● Root file systems
● Checkpoint restore
● Support and target most major distros
● Fully compatible with Docker’s use of containers

Higher Layer
● Admission control and feasibility checks
● Monitoring, notifications, and statistics
● Tiers of quality of service guarantees

Contributions Welcome!

Page 17: Containers @ Google

Questions?

Repository: https://github.com/google/lmctfy/
Mailing list: [email protected]

Victor Marmol: [email protected]
Rohit Jnagal: [email protected]