Container & Kubernetes
Written by Ted Jung ([email protected])(Cloud Native Engineer)
I. Base Techs (container): FS, CGroups, Namespaces, COW
II. Kubernetes (service networking)
What is Container?
Lightweight VM. But it’s not quite like a VM:
1. Uses the host kernel
2. Does not need to boot a different OS
3. Does not have its own modules
4. Does not need init as PID 1
It’s just normal processes on a host machine.
What is Container?
Containers wrap a piece of software in a complete filesystem that contains everything it needs to run:
• Code
• Runtime
• System tools
• System libraries
Anything you can install on a server.
This guarantees that it will always run the same, regardless of the environment it is running in.
VM vs. Container
[Diagram: two stacks side by side]
VM: Infrastructure > Operating system > Hypervisor > Guest OS (one per app) > Bins/Libs > App1, App2, App3
Container: Infrastructure > Operating system > Docker Engine > Bins/Libs > App1, App2, App3
Containers:
• Share the kernel with other containers
• Run as isolated processes in user space
• Are not tied to any specific infrastructure
What is Docker?
[Diagram labels: lmctfy, openvz, zone, libcontainer, lxc, rkt]
Why Docker?
• Easy to use: simple and accessible tooling
• High degree of reuse and extensibility: stackable file system
Before going further...
FS, Cgroups, Namespaces
Base tech of container(AUFS)
A group of branches, in order:
• A branch is a single directory
• Each branch is stored in a directory on the host
• At least a single read-only branch, plus many read-write branches
[Diagram: one read-only branch and three read-write branches]
Base tech of container(AUFS)
Mount point
• With AUFS, the mount point of a container is: /var/lib/docker/aufs/mnt/$CONTAINER_ID/
• It is only mounted when the container is running
• AUFS branches (read-only and read-write) are in: /var/lib/docker/aufs/diff/$CONTAINER_OR_IMAGE_ID
Base tech of container(AUFS)
e.g. Create Container
• /proc/mounts
• /sys/fs/aufs/si_XXXX/br*
• /var/lib/docker/aufs/diff/XXX
Container = a group of branches
[Diagram: host and container]
Base tech of container(AUFS)
[Diagram: a file (container / host); delete container]
Base tech of container(AUFS)
Docker V1.10: content addressable storage model
[Diagram: Ubuntu 15.04 image]
Thin R/W layer (container layer)
Image layers (R/O):
  C84bfc126a2    188MB
  D14bfc54ea1    194.5KB
  c80179960767   1.895KB
  6d45a3841788   0 B
The Docker storage driver:
• Enables and manages both the image layers and the container layer
• Stacks the layers and provides a single unified view
• Location: /var/lib/docker/
Layer IDs: random UUIDs are replaced by cryptographic content hashes
• Security
• Avoid ID collisions
• Guarantees data integrity
Storage drivers: AUFS, Btrfs, Device mapper, OverlayFS, ZFS
File modification (create, delete, update) steps
1. Search through the image layers, top-down, for the file
2. Perform a “copy-up” operation: copy the file to the thin writable layer
3. Modify the copy of the file
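The copy-up steps can be sketched with a toy model. This is only an illustration, not how a real storage driver is implemented: the class name LayeredFS, the dictionary-based layers, and the file names are all assumptions made for the example.

```python
class LayeredFS:
    """Toy union filesystem: read-only image layers plus one thin writable layer."""

    def __init__(self, image_layers):
        # image_layers is ordered top-down; each layer maps path -> content.
        self.image_layers = image_layers
        self.rw_layer = {}

    def read(self, path):
        # The writable layer sits on top, so it is searched first.
        for layer in [self.rw_layer, *self.image_layers]:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        if path not in self.rw_layer:
            # Steps 1 and 2: find the file top-down, then copy it up.
            for layer in self.image_layers:
                if path in layer:
                    self.rw_layer[path] = layer[path]
                    break
        # Step 3: modify the copy only; image layers are never touched.
        self.rw_layer[path] = data


# Hypothetical image layers shared by two containers.
base = {"/etc/hostname": "ubuntu", "/bin/sh": "shell-binary"}
top = {"/etc/motd": "hello"}
container_a = LayeredFS([top, base])
container_b = LayeredFS([top, base])

container_a.write("/etc/hostname", "web-1")
print(container_a.read("/etc/hostname"))  # web-1
print(container_b.read("/etc/hostname"))  # ubuntu
```

Only container_a sees the change, because the modified copy lives in its own writable layer while the shared image layers stay as they were.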
[Diagram: Ubuntu 15.04 image before and after a modification. A 2B modification to a file in layer 6d45a3841788 (0 B) triggers a copy-up into the thin R/W layer, where the modification is applied.]
Base tech of container(CGroups)
• Started by Paul Menage and Rohit Seth at Google in 2006 under the name “Process Containers”
• Kernel capability to limit, account for (meter) and isolate resources: CPU, memory, disk I/O, network
• Reduces resource contention and increases predictability in performance
Cgroup controllers:
• Memory controller
• CPUset controller
• CPU accounting controller
• CPU scheduler controller
• Devices controller
• I/O controller for block devices
• Freezer
• Network class controller
Controller   Description
memory       Sets limits on RAM and resource usage; reports cumulative usage of all processes in the group
cpuset       Binds processes in a group to a set of CPUs and controls migration between CPUs
cpuacct      Reports CPU usage for a group of processes
cpu          Controls the prioritization of processes in the group
devices      Access control lists on character and block devices
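To see which cgroup a process belongs to for each controller, Linux exposes /proc/<pid>/cgroup with lines of the form hierarchy-ID:controller-list:cgroup-path. The sketch below parses that format. The sample lines and the container ID in them are made up for the example, and the helper names are assumptions.

```python
from typing import NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: tuple
    path: str


def parse_cgroup_line(line):
    # Split at most twice: the cgroup path itself may contain ':'.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    names = tuple(c for c in controllers.split(",") if c)
    return CgroupEntry(int(hierarchy_id), names, path)


def controllers_for(lines):
    """Map each controller name to the cgroup path the task belongs to."""
    result = {}
    for line in lines:
        entry = parse_cgroup_line(line)
        for name in entry.controllers:
            result[name] = entry.path
    return result


# Hypothetical content of /proc/<pid>/cgroup for a containerized process.
sample = [
    "5:memory:/docker/0123abcd",
    "4:cpu,cpuacct:/docker/0123abcd",
    "3:cpuset:/",
]
membership = controllers_for(sample)
print(membership["memory"])  # /docker/0123abcd
print(membership["cpuset"])  # /
```

On a Linux host the same function could be fed the lines of open("/proc/self/cgroup") instead of the sample list.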
Base tech of container(CGroups)
Cgroups (control groups)
• A ‘cgroup’ associates a set of tasks with a set of parameters for one or more subsystems.
• A ‘subsystem’ is a module that makes use of the task grouping facilities provided by cgroups to treat groups of tasks in particular ways. It is typically a “resource controller” that schedules a resource and applies per-cgroup limits.
• A ‘hierarchy’ is a set of cgroups arranged in a tree, such that every task in the system is in exactly one of the cgroups in the hierarchy, together with a set of subsystems. Each subsystem has system-specific state attached to each cgroup in the hierarchy, and each hierarchy has an instance of the cgroup virtual filesystem associated with it.
Cgroup subsystems
• Isolation and special controls: cpuset, namespace, freezer, device, checkpoint/restart
• Resource control: cpu (scheduler), memory, disk I/O, network
Base tech of container(Namespace)
Namespaces handle the six items in the table below.
Namespace   Isolates
PID         Processes (process IDs)
NET         Network interfaces, iptables, routing tables, sockets
MNT         Root file system (mount points)
UTS         Hostname
IPC         Inter-process communication
USER        UID/GID, security improvement
Base tech of container(Namespace)
• Namespaces are created with the system call clone()
• Namespaces are materialized by pseudo-files in /proc/<pid>/ns
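The pseudo-files in /proc/<pid>/ns are symbolic links whose targets look like pid:[4026531836]; two processes are in the same namespace when the number matches. A minimal sketch for reading them follows. The function names are assumptions, and the code returns an empty result on systems without /proc.

```python
import os
import re

NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Parse a link target such as 'pid:[4026531836]' into (kind, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self"):
    """Return {file name: inode} for a process, or {} if /proc/<pid>/ns is unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    found = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        return found
    for name in names:
        try:
            _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        except (OSError, ValueError):
            continue
        found[name] = inode
    return found


def same_namespace(pid_a, pid_b, kind):
    """True when both processes share the namespace named by kind (e.g. 'net')."""
    a = namespaces_of(pid_a).get(kind)
    return a is not None and a == namespaces_of(pid_b).get(kind)


print(parse_ns_link("net:[4026531992]"))  # ('net', 4026531992)
```

Comparing a containerized process with PID 1 on the host through same_namespace would typically show different pid, net and mnt namespaces, which is the isolation listed in the table.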
Base tech of container(Summarize)
Why do we need CGroups?
• SLA management: reduce resource contention and increase predictability in performance
• Large virtual consolidation: prevent a single virtual machine, or a group of them, from monopolizing resources or impacting other environments
Cgroups: limit the use of resources
Namespaces: limit what resources can be seen; namespaces provide processes with their own view of the system
[Diagram: Docker on libcontainer, on the Linux kernel (namespaces, cgroups)]
Base tech of container(COW)
Everyone shares a single copy of the same data until it is overwritten, and then a copy is made.
Docker uses COW, which essentially means that every instance of your Docker image uses the same files until one of them needs to change a file.
K8S terms
ReplicationControllers
• Dynamically manage (create, kill, etc.) the lifecycle of pods (scaling up/down, rolling updates)
Clusters
• Pool of Kubernetes resources
Services
• An abstraction
• A REST object
• A logical set of pods and a policy
Pods
• A collocated group of Docker containers with shared volumes
• Pods are born and die
• Deployable unit: created, scheduled, managed
[Diagram: services in front of groups of pods; pods hold containers and run on servers; iptables rule; endpoints]
K8S terms
{
  "kind": "Service",
  "apiVersion": "v1",
  "metadata": {
    "name": "my-service"
  },
  "spec": {
    "selector": {
      "app": "MyApp"
    },
    "ports": [
      {
        "protocol": "TCP",
        "port": 80,
        "targetPort": 9376
      }
    ]
  }
}
[Diagram: the service “my-service” has a cluster IP; the selector “app: MyApp” picks the pods; each selected pod is an endpoint on targetPort 9376, reached through the service proxy]
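How a selector turns into endpoints can be sketched in a few lines. This is a simplified model, not the Kubernetes implementation: the pod list, the pod IPs, and the function names are assumptions made for the example, while the service definition is the one from the slide.

```python
def matches(selector, labels):
    """A pod matches when every selector key/value pair appears in its labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints_for(service, pods):
    """Return the 'ip:targetPort' endpoints for pods selected by the service."""
    selector = service["spec"]["selector"]
    target_port = service["spec"]["ports"][0]["targetPort"]
    return [
        f"{pod['ip']}:{target_port}"
        for pod in pods
        if matches(selector, pod["labels"])
    ]


service = {
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {"name": "my-service"},
    "spec": {
        "selector": {"app": "MyApp"},
        "ports": [{"protocol": "TCP", "port": 80, "targetPort": 9376}],
    },
}

# Hypothetical pods; the IPs are made up for the example.
pods = [
    {"ip": "192.168.1.10", "labels": {"app": "MyApp"}},
    {"ip": "192.168.1.11", "labels": {"app": "MyApp", "tier": "web"}},
    {"ip": "192.168.2.20", "labels": {"app": "Other"}},
]

print(endpoints_for(service, pods))
# ['192.168.1.10:9376', '192.168.1.11:9376']
```

Clients talk to the service on port 80; the traffic ends up on port 9376 of whichever pods currently carry the label app: MyApp.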
K8S terms (routing mode of service traffic)
mode: userspace
[Diagram: pod, iptables rule (redirect), kube-proxy, service, three endpoints; kube-proxy is connected to the master]
mode: iptables
[Diagram: same components as the userspace mode]
• Fast
• Reliable
• But no retry
How K8S works
Kubernetes Master
• API server: validates and configures Pods, Services, and Replication controllers; serves REST operations
• ETCD: the master’s status is stored here
• Scheduler
• Kubernetes controller manager server
• Schedules pods to worker nodes; synchronizes pod status
Worker Node
• kubelet: container manifest (YAML description of a pod)
• kube-proxy: services
[Diagram: master and worker node components with pods; port labels 8080 and 4001]
K8S Service Traffic Flows
• Cluster-domain: 10.100.0.10 (Service_Cluster_IP_Range, virtual IP)
• Cluster-pool: 192.168.0.0/16
[Diagram: a request enters Service 1 (front-end); Service 2 (…) and Service 3 (back-end) follow. Each service has a kube-proxy and sits in front of a replication controller (rc:1, rc:2, rc:3) with its pods and containers. Two skydns instances belong to the ClusterDomain; the pods belong to the ClusterPool.]
K8S Service Traffic Flows (e.g.)
Then, what is kube-proxy?
• Watches the Kubernetes master for added and removed objects: Service, Endpoints
• Can do simple TCP and UDP stream forwarding
• Round-robin TCP and UDP forwarding
• The VIP is managed by kube-proxy
• Watches all services
• Updates iptables after the backends change
• Translates the Service IP to a Pod IP
[Diagram: Node #1 and Node #2, each with kube-proxy, an iptables rule, and pods with containers. On the master, the API server talks to the ETCD cluster, which holds the cluster status and current configuration.]
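The "Service IP to Pod IP" translation with round-robin selection can be modeled as a small lookup table. This is a toy sketch, not kube-proxy's code: the class ServiceTable, its methods, and the service VIP 10.100.0.50 are assumptions, and the pod IPs are made-up addresses from the cluster pool range.

```python
import itertools


class ServiceTable:
    """Toy model of the VIP -> backend mapping that kube-proxy maintains."""

    def __init__(self):
        self._backends = {}
        self._cursors = {}

    def update(self, service_ip, pod_ips):
        # Called when Service or Endpoints objects change on the master.
        self._backends[service_ip] = list(pod_ips)
        self._cursors[service_ip] = itertools.cycle(self._backends[service_ip])

    def remove(self, service_ip):
        self._backends.pop(service_ip, None)
        self._cursors.pop(service_ip, None)

    def translate(self, service_ip):
        """Pick the next pod IP for a connection to service_ip, round robin."""
        if not self._backends.get(service_ip):
            raise LookupError(f"no endpoints for {service_ip}")
        return next(self._cursors[service_ip])


table = ServiceTable()
table.update("10.100.0.50", ["192.168.1.10", "192.168.1.11"])
picks = [table.translate("10.100.0.50") for _ in range(3)]
print(picks)  # ['192.168.1.10', '192.168.1.11', '192.168.1.10']
```

In a real cluster the update and remove calls correspond to events watched from the master, and the table is realized as iptables rules rather than a Python dictionary.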
SkyDNS
SkyDNS in Kubernetes?
• Kubernetes offers a DNS cluster add-on, which most of the supported environments enable by default.
• SkyDNS is a DNS service, with some custom logic to slave it to the Kubernetes API server.
• When a service is created, a DNS name is mapped to the service.
• A virtual IP address is assigned to the service.
kubelet --v=5 --address=0.0.0.0 --port=10250 --hostname_override=105.144.47.24 --api_servers=105.*.*.23:8080 --healthz_bind_address=0.0.0.0 --healthz_port=10248 --network_plugin=calico --cluster-domain=cluster.local --cluster-dns=10.100.0.10 --logtostderr=true
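The two DNS-related kubelet settings, cluster-domain and cluster-dns, determine the name a service gets and the resolver a pod uses. The sketch below assumes the <service>.<namespace>.svc.<cluster-domain> naming scheme and a typical search path; both function names are hypothetical, and the exact resolver settings depend on the Kubernetes version and the pod's DNS policy.

```python
def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Build the cluster-internal DNS name of a service (assumed naming scheme)."""
    for label in (service, namespace):
        if not label or "." in label:
            raise ValueError(f"invalid DNS label: {label!r}")
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolv_conf(cluster_dns, namespace="default", cluster_domain="cluster.local"):
    """Sketch of the resolver settings a pod might receive from the kubelet."""
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return f"nameserver {cluster_dns}\nsearch {' '.join(search)}\n"


print(service_dns_name("my-service"))
# my-service.default.svc.cluster.local
print(resolv_conf("10.100.0.10"), end="")
```

With a search path like this, a pod in the same namespace can reach the service simply as my-service.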
SkyDNS(cont..)
Components:
• ETCD in pod (DNS records)
• SkyDNS in pod (DNS server)
• Kube2SKY in pod (bridges between Kubernetes and ETCD)
• Kubernetes (kubelet), with running pods
• Kubernetes (Master)
Flow:
• Service info is pulled from the master into SkyDNS (e.g. which services have changed?)
• Service info is published/written into etcd; SkyDNS is then able to retrieve the name of the service
• The kubelet points pods at the DNS server (the cluster-dns address)
[Diagram arrows: Retrieve, Search, Query, Update]
Thank You