of running kubernetes for publication deep dive: the value · 2018-09-05 · deep dive: the value...
TRANSCRIPT
#vmworld
Deep Dive: The Value of Running Kubernetes on vSphere
Frank Denneman, VMware, Inc.
@FrankDenneman
Michael Gasch, VMware Global Inc.
@embano1
CNA1553BU
#CNA1553BU
VMworld 2018 Content: Not for publication or distribution
Disclaimer
©2018 VMware, Inc.
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented, have not been determined.
Agenda
Kubernetes Primer
Customer Scenario – Making the Case for Bare Metal
Experience Report – Kubernetes on Bare Metal vs. vSphere
QnA
Kubernetes Primer
The Origin of Kubernetes
Google Search (late 1990s)
The Origin of Kubernetes
Google Search
Revolutionizing the Way we build Distributed (cloud-native) Applications today.
Google Search Pillars:
• Commodity
• Fault-Tolerant Software
• Fraction of the Cost of High-End Servers
Source: https://ai.google/research/pubs/pub49
Platform Engineering Responsibilities
“We must treat the Datacenter itself as one massive Warehouse-scale Computer.”
The Origin of Kubernetes
The Datacenter as a Computer
SSH
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
Cgroups (2007)
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
Cgroups (2007)
Omega (~2012)
The Origin of Kubernetes
Containers become Mainstream
In Search of a Common Language
The Origin of Kubernetes
So what is a Container, really?
[Diagram: User Mode (Docker Engine, ContainerCreate(), syscall.Exec(ENTRYPOINT/CMD)*) and Kernel Mode (Syscall, Cgroups, Namespaces, Security Capabilities, Scheduler with sched_class fair.c (CFS), task_struct with … and Scheduling Entity (se), “running”)]
A Structure in Kernel Memory. The Kernel has no Notion of a “Container”. It’s yet another Executable.
* After Container Sandbox Initialization (nsenter.go/nsexec.c)
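One way to see that a container is an ordinary process with extra kernel bookkeeping is to look at the cgroup membership the kernel records for every process in /proc/<pid>/cgroup. The following is a minimal Python sketch, assuming a Linux host; the helper name parse_proc_cgroup and the sample text (IDs and paths) are invented for illustration and do not come from the slides.

```python
def parse_proc_cgroup(text):
    """Parse the contents of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    On cgroup v2 the controller list is empty (e.g. "0::/some/path").
    Returns a dict mapping controller name ("" for cgroup v2) to path.
    """
    membership = {}
    for line in text.strip().splitlines():
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            membership[controller] = path
    return membership


# Hypothetical sample, shaped like what a containerized process might show
# on a cgroup v1 host; the IDs and paths are invented.
SAMPLE = """\
4:cpu,cpuacct:/docker/abc123
3:memory:/docker/abc123
"""

if __name__ == "__main__":
    print(parse_proc_cgroup(SAMPLE))
```

On a Linux machine, passing the contents of /proc/self/cgroup to the same function shows the membership of the current process, whether or not it runs in a container.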
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
Cgroups (2007)
Omega (~2012)
Docker (2013)
The Origin of Kubernetes
Google Search (late 1990s)
Borg (~2003)
Cgroups (2007)
Omega (~2012)
Docker (2013)
Kubernetes (2014)
The Origin of Kubernetes
Container Orchestration
“Kubernetes is an open-source System for automating Deployment, Scaling, and Management of containerized Applications.”
Kubernetes
High-Level Architecture
[Diagram: Kubernetes Cluster with Control Plane (API) and Workers running Pods; Kubernetes Cloud Provider; Infrastructure (Compute, Storage, Networking)]
Customer Scenario
Making the Case for Bare Metal
Meet ABC Inc.
ABC Inc. is the Leader in Manufacturing Wayback Machines
Enterprise IT Organization with separate Infrastructure, Linux/Middleware and Development Teams (Silos)
>90% standardized and virtualized on VMware vSphere
Going through Digital Transformation to become more Customer and Feedback driven
• Need to develop (iterate) faster with an agile Approach
Technical Vehicle: Containers and “cloud-native” Application Architectures
• Kubernetes as the Framework to build and run these new Applications
• Embrace and contribute to Open Source Software
ABC Inc.’s Decision to go Bare Metal
Linux Team at ABC Inc. decided to deploy Kubernetes on Bare Metal
Justification:
• New (cloud-native) Applications don’t need vSphere Features like HA and vMotion
• Containers are more lightweight, replacing VMs and the Hypervisor
• Kubernetes provides Hypervisor Functionality, e.g. Resource Management and HA
• Virtualization reduces Performance of containerized Applications
• Reduce Complexity and Costs by eliminating the Hypervisor from the Stack
• IT Infrastructure not agile enough (no Self-Service)
ABC Inc.’s vSphere Team reached out to VMware for Help
Back to 2005
merchoid.com
Experience Report
Kubernetes on Bare Metal vs. vSphere
Terminology
Day 0: Planning and “Green Lights”
Day 1: Experiences with first Deployments
Day 2: Container and Cluster Sprawl
Day 3: Maintenance & Availability
Day 0
Planning and “Green Lights”
Day 0
Planning and “Green Lights”
[Diagram: Kubernetes Cluster with Cloud Provider (Custom) on Infrastructure (Compute, Storage, Networking)]
External Dependencies: DNS, DBs, IPAM, Images, CA, Auth, Secrets, Monitoring, Logging
Custom Integrations
Label: AZ=AZ-1
Day 0
Realization: Managing Bare Metal Systems is hard
How vSphere Can Help
Average Time to get HW in DC – Unpredictable Process
Average Time to get Hardware in Data Center
86 Days
Hardware Compatibility
Decoupling the OS from the Hardware reduces operational Overhead
A simple NIC revision change can directly impact the Kubernetes host.
Virtualized hardware decouples the OS from the underlying hardware. Hardware abstraction reduces the operational overhead of tracking supported firmware versions of components.
Configuration management is done at the physical layer (firmware, drivers, etc.). (Drift?) (Supported?)
Non-Disruptive Patching
vMotion Workload away for Hardware, Firmware or Driver Update
a simple NIC revision change can directly impact the Kubernetes host
Need to patch – vMotion workload, no disruption
Need to patch – Kill workload
Kill doesn’t matter for Stateless Workloads
ETCD isn’t stateless, and what about the top 10 Workloads in Containers today?
Strong Security Isolation
Strong Isolation between Workloads with efficient Resource Usage
a simple NIC revision change can directly impact the Kubernetes host
VMs provide strong isolation between guests, allowing multi-tenancy and efficient use of resources
Containers are processes in the Linux kernel; security concerns can lead to reduced resource utilization
Tenant A Tenant B
Functional Use of Hardware
General vs Dedicated
Modern DCs operate various workloads in different packaging formats
vSphere provides a unified platform for these workloads
Use your current tools and skill set to manage these workloads
Focus on creating value
General-purpose hardware allows mixed workloads and superior resource utilization
Dedicating hardware to a particular function hinders resource optimization
Day 1
Experiences with first Deployments
Day 1
In the old Bare Metal Days (Pre-Virtualization Era)
[Diagram: Physical Host with App (threads M, G, G, W, W, W), Bins/Libs, Kernel (NUMA, sysctl, nice, IRQ-Balance) and Hardware (16 Cores, 128GB, RAID, NIC-Teaming)]
Legend: M = OS Thread: main(); G = OS Thread: GC; W = OS Thread: Worker/Pool
Almost exclusive Access to Host Resources for this App (1:1)
Best Practices for this Deployment Type were developed
App uses Host Information for Runtime Tuning
Downside: Utilization & Agility
Day 1
Containers and Kubernetes on Bare Metal to the Rescue?
[Diagram: Physical Host with Kubelet, Container Runtime, Kernel (NUMA, sysctl, Cgroups, IRQ-Balance) and Hardware (64 Cores (HT), 384GB, RAID, NIC-Teaming); kube-scheduler]
Node: BM001
Capacity: cpus: 64, memory: 384GB
Allocatable: cpus: 60, memory: 360GB
Utilization & Agility
Not all Runtimes are Cgroup-aware!
How to tune per Workload?
Resource Contention and Workload Interference!
Isolation (Security)?
How much to reserve vs. Waste?
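The warning that not all runtimes are cgroup-aware can be made concrete. A runtime that sizes its thread pools from the host CPU count would see all 64 cores of a node like BM001, even when its container is limited to far fewer. The following is a minimal Python sketch, assuming cgroup v2 and its cpu.max file format; effective_cpus is a hypothetical helper and the limit values are invented.

```python
import os


def effective_cpus(cpu_max_text, host_cpus):
    """Return the CPU budget implied by a cgroup v2 cpu.max value.

    cpu.max holds "<quota> <period>" in microseconds, or "max <period>"
    when no limit is set. A quota of 200000 with a period of 100000
    means the group may use two CPUs' worth of time.
    """
    quota, period = cpu_max_text.split()
    if quota == "max":
        return float(host_cpus)
    return min(float(host_cpus), int(quota) / int(period))


if __name__ == "__main__":
    host = os.cpu_count() or 1
    # Hypothetical limit of 2 CPUs on a host that may have many more.
    print("host view:", host)
    print("cgroup view:", effective_cpus("200000 100000", host))
```

A runtime that tunes itself from the first number while being throttled to the second is one plausible source of the contention and interference listed on the slide.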
How vSphere Can Help
Hyperthreading
Hyperthreading in vSphere
Consistent performance is obtained by avoiding NUMA boundaries
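As a rough illustration of what staying within NUMA boundaries means for sizing, the check below tests whether a VM fits inside a single NUMA node. It is a simplified Python sketch: the NumaNode type, the fits_in_one_node helper and the host dimensions are assumptions made for this example, and real placement is decided by the hypervisor's NUMA scheduler using more inputs than these two numbers.

```python
from dataclasses import dataclass


@dataclass
class NumaNode:
    cores: int
    memory_gb: int


def fits_in_one_node(vcpus, memory_gb, node):
    """True if a VM of this size can be placed inside one NUMA node."""
    return vcpus <= node.cores and memory_gb <= node.memory_gb


# Hypothetical host: each NUMA node has 16 cores and 192 GB of memory.
NODE = NumaNode(cores=16, memory_gb=192)

if __name__ == "__main__":
    for vcpus, mem in [(8, 64), (16, 192), (24, 128)]:
        print(vcpus, mem, fits_in_one_node(vcpus, mem, NODE))
```

A Kubernetes node VM that fails this check spans NUMA nodes, so part of its memory access becomes remote.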
NUMA Architecture
NUMA Architecture
Day 2
Container and Cluster Sprawl
[Diagram: Container and Cluster Sprawl; axis labels: Organization, Environment]
[Diagram: Container and Cluster Sprawl; axis labels: Imbalance, Cost]
How vSphere Can Help
Multi-Tenancy
general purpose allows mixed workloads and superior resource utilization
[Diagram: vSphere Cluster hosting regular VMs alongside “K8S Prod” and “K8S Test” Kubernetes node VMs]
Multi-Tenancy
Increase in Utilization: Scale out & redistribute
general purpose allows mixed workloads and superior resource utilization
[Diagram: vSphere Cluster hosting regular VMs alongside “K8S Prod” and “K8S Test” Kubernetes node VMs]
Resource Distribution
Use Per-VM Reservations or Resource Pool Structures
Day 3
Maintenance & Availability
Day 3
Maintenance & Availability (Control Plane)
Admission Control and Failover Capacity (MTTR)?
Proactive HA?
Impact of Host Maintenance/Failure on Control Plane?
Day 3
Maintenance & Availability (Workloads)
[Diagram: Kubernetes Control Plane (Controller Manager, Scheduler) and worker nodes with 4 CPUs each, running “Prod”, “QA” and “Dev” pods of 1 or 2 CPUs; the Scheduler picks from a Queue: “QA”, “Dev”, “Prod”]
pod-eviction-timeout: 5min*
ReconcilerMaxWaitForUnmountDuration: 6min**
* Default, configurable
** Fixed
Only considering beta/stable and in-tree Kubernetes Features
Disruptive Pod Priority & Preemption, incl. Priority Queue, beta in v1.11
Example assumes shared File System for Persistent Volumes
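The two timers on this slide can be added up to get a feel for how long a pod from a failed node may stay down. The following Python sketch rests on an assumption that is not spelled out on the slide: that for a pod with a persistent volume both timers elapse one after the other, while a stateless pod only waits for eviction. Failure detection and scheduling/startup time are ignored, and worst_case_delay is a hypothetical helper.

```python
from datetime import timedelta

# Values as shown on the slide.
POD_EVICTION_TIMEOUT = timedelta(minutes=5)   # default, configurable
MAX_WAIT_FOR_UNMOUNT = timedelta(minutes=6)   # fixed


def worst_case_delay(stateful):
    """Rough delay before a pod from a failed node can run elsewhere.

    Assumption: a stateful pod waits for eviction and then for the
    unmount timer; a stateless pod waits for eviction only.
    """
    delay = POD_EVICTION_TIMEOUT
    if stateful:
        delay += MAX_WAIT_FOR_UNMOUNT
    return delay


if __name__ == "__main__":
    print("stateless:", worst_case_delay(stateful=False))
    print("stateful:", worst_case_delay(stateful=True))
```

Under these assumptions a stateful workload is unavailable for roughly twice as long as a stateless one.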
How vSphere Can Help
Multi Cluster Configuration
Priority
[Diagram: vSphere Cluster hosting regular VMs alongside “K8S Prod” and “K8S Test” Kubernetes node VMs, labeled by priority: “Important”, “More Important”, “Meh”]
HA Restart Priority
Ensure “Prod” Systems get restarted first
Restart Dependency
Or “HA Orchestrated Restart” as it is also called
Works based on VM to VM rules
Only 1 level, so for A-B-C create two rules
• VM Group B depends on A
• VM Group C depends on B
• ETCD-Masters-Workers
Specify when to start next batch
• Resources allocated
• Powered On
• Guest Heartbeat
• App Heartbeat
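The point that a three-stage chain needs two one-level rules can be sketched in a few lines. The Python example below only models the ordering logic and is not the vSphere API; the rule list and the restart_order helper are assumptions made for illustration. It uses graphlib from the standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each rule is one level deep: (group, group it depends on).
RULES = [
    ("Masters", "ETCD"),
    ("Workers", "Masters"),
]


def restart_order(rules):
    """Derive the order of restart batches from pairwise dependency rules."""
    graph = {}
    for group, prerequisite in rules:
        graph.setdefault(group, set()).add(prerequisite)
    # static_order() yields every group after the groups it depends on.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(" -> ".join(restart_order(RULES)))
```

Each arrow in the printed chain corresponds to one rule, and the condition for starting the next batch (for example Guest Heartbeat) decides when the arrow is followed.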
Kubernetes Cluster Node Roles
Control Plane (Masters) and Workers
[Diagram: vSphere Cluster hosting regular VMs alongside “K8S Prod” and “K8S Test” Kubernetes node VMs; “K8S Prod” node VMs are labeled (Master) or (Worker)]
Multi-Tenancy
DRS Affinity Rules
[Diagram: vSphere Cluster hosting regular VMs alongside “K8S Prod” and “K8S Test” Kubernetes node VMs; “K8S Prod” node VMs are labeled (Master) or (Worker)]
Multiple Fault Domains
Quorum dictates Design
[Diagram: Fault Domain A and Fault Domain B hosting regular VMs alongside “K8S Prod” and “K8S Test” Kubernetes node VMs; “K8S Prod” Masters and Workers are spread across both domains using VM Anti-Affinity and Host-VM Rules]
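Why quorum dictates the design can be shown with a small calculation. The Python sketch below assumes three control plane members placed two in Fault Domain A and one in Fault Domain B; that placement, and the helper names quorum and survives_loss_of, are assumptions for illustration rather than a reading of the diagram.

```python
def quorum(members):
    """Smallest majority of a cluster with this many voting members."""
    return members // 2 + 1


def survives_loss_of(domain, placement):
    """True if quorum is kept after every member in `domain` is lost.

    `placement` maps a fault domain name to the number of control plane
    members running there.
    """
    total = sum(placement.values())
    remaining = total - placement.get(domain, 0)
    return remaining >= quorum(total)


# Assumed placement: three masters spread over two fault domains.
PLACEMENT = {"A": 2, "B": 1}

if __name__ == "__main__":
    for domain in PLACEMENT:
        print(domain, survives_loss_of(domain, PLACEMENT))
```

With only two fault domains, one of them always holds the majority of an odd-sized control plane, so losing that domain costs quorum regardless of how the VMs are arranged.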
Proactive HA
Moving Workloads away at first Signs of Trouble
DRS proactively avoids the need to use HA
• Integrates with server vendor’s monitoring software
• Health states are passed to DRS
• DRS reacts based on health state of hardware
None of the DRS affinity/anti-affinity rules are violated
A host in Quarantine Mode still accepts workloads if performance degradation is otherwise imminent
HA Admission Control
Use Per-VM Reservations to ensure Resource Availability during HA Events
Admission Control ensures that VM-level reservations can be satisfied
The percentage-based policy works on the reservation per VM
• Set aside a percentage for fail-over capacity
• The reservations (and overhead) are subtracted from the total percentage of available resources
Slots are based on the highest reservation + overhead
• If you have a 32GB mem reservation, your slot size is 32GB for memory + memory overhead
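The two policies can be sketched as simple arithmetic on memory reservations. The Python example below is a simplified model, not the vSphere algorithm: the VM type, the helper names, the overhead values and the cluster size are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class VM:
    reservation_gb: int
    overhead_gb: int


def slot_size_gb(vms):
    """Slot policy: one slot is as large as the biggest reservation plus overhead."""
    return max(vm.reservation_gb + vm.overhead_gb for vm in vms)


def can_power_on(new_vm, running, cluster_gb, failover_percent):
    """Percentage policy: all reservations must fit outside the fail-over share."""
    usable = cluster_gb * (100 - failover_percent) / 100
    reserved = sum(vm.reservation_gb + vm.overhead_gb for vm in running)
    return reserved + new_vm.reservation_gb + new_vm.overhead_gb <= usable


# Invented example: two running VMs in a 1000 GB cluster, 25% kept for fail-over.
RUNNING = [VM(reservation_gb=32, overhead_gb=1), VM(reservation_gb=16, overhead_gb=1)]

if __name__ == "__main__":
    print("slot size (GB):", slot_size_gb(RUNNING))
    print("600 GB VM admitted:", can_power_on(VM(600, 1), RUNNING, 1000, 25))
    print("700 GB VM admitted:", can_power_on(VM(700, 1), RUNNING, 1000, 25))
```

In this model a single large reservation inflates the slot size for every VM, which is one reason to set reservations per VM deliberately.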
VM Latency Sensitivity
CPU Core Isolation
THANK YOU!
VMware Cloud-Native Apps
More Sessions on Kubernetes
Tuesday, August 28
CNA2084BU Intro to VMware Kubernetes Engine – Managed K8s Service on Public Cloud
CNA1656BU Put a Lid on It: Securing Containers and Kubernetes on vSphere
CNA3146BU Technical Deep Dive: Kubernetes Networking and Security with NSX-T on PKS
CNA2755BU Architecting PKS for Production: Lessons Learned from PKS Deployments
Wednesday, August 29
CNA1075BU Operating and Managing Kubernetes on Day 2 with PKS
CNA2009BU Run Stateful Apps on Kubernetes with PKS: Highlight WebLogic Server
CNA3124BU Deep Dive: VMware Kubernetes Engine – Kubernetes as a Service on Public Cloud
Try HOL
193201CNA_U VMware Kubernetes Engine – Getting Started
193101NET_U VMware Pivotal Container Service and Kubernetes – Getting Started
193002CNA_U VMware vSphere Integrated Containers – Getting Started
Follow Us
https://blogs.vmware.com/cloudnative
https://www.youtube.com/c/VMwareCloudNativeApps
@cloudnativeapps
Top 5 Best Practices for Kubernetes on vSphere
#1
The Basics (“80/20” Rule)
Kubernetes is Linux
• Learn Components/Implementation
• Don’t treat it as “Black Box” VMs which someone else will hopefully manage
vSphere Best Practices for HW and OS Configuration/Patching/Security still apply
Use a tested and certified (supported) Kubernetes Distribution
• Optimal Integration
• E2E Support Experience
• Reduce Overhead on Customer Side (create Business Value vs. undifferentiated Plumbing)
#2
Embrace vSphere Resource Management
NUMA Rules and Sizing Best Practices still apply
• Defer NUMA Handling from Guest OS to the Hypervisor
CPU/MEM Reservations for critical Components
• Control Plane and PROD-Workers
DRS for intelligent Placement and X-K8s Cluster Balancing
Leverage vSphere Overcommit Capabilities, but wisely
• VM “Flat Rate”, i.e. slice & dice the Host into VMs as needed
• vSphere Host and Cluster Resource Controls allow Overcommit Flexibility for different Workloads and Classes (Dev/Test vs. Prod)
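Treating overcommit differently per workload class can be pictured as a ratio check. The Python sketch below is illustrative only: the class names, the limits in MAX_RATIO and the helper functions are invented for this example and are not VMware guidance.

```python
# Invented example limits per workload class; not VMware guidance.
MAX_RATIO = {"prod": 1.0, "dev-test": 4.0}


def vcpu_overcommit(vcpus_per_vm, physical_cores):
    """Ratio of configured vCPUs to physical cores on one host."""
    return sum(vcpus_per_vm) / physical_cores


def within_policy(workload_class, vcpus_per_vm, physical_cores):
    """True if the host's overcommit ratio stays within the class limit."""
    return vcpu_overcommit(vcpus_per_vm, physical_cores) <= MAX_RATIO[workload_class]


if __name__ == "__main__":
    vms = [8, 8, 16]  # vCPUs of the node VMs on a hypothetical 16-core host
    print("ratio:", vcpu_overcommit(vms, 16))
    print("ok for dev-test:", within_policy("dev-test", vms, 16))
    print("ok for prod:", within_policy("prod", vms, 16))
```

The same set of VMs can be acceptable on a Dev/Test host and too dense for a Prod host, which is the flexibility the last bullet refers to.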
#3
Embrace vSphere High Availability Features
Use DRS Host Groups and Affinity/Anti-Affinity Rules for increased Availability
• Use “should” instead of “must” Rules for K8s whenever possible
• Respected by HA
Set higher HA Restart Priorities for K8s Control Plane and Dependencies
vMotion to avoid planned Downtime, e.g. Host Maintenance
Coming soon: Automatic Kubernetes vSphere Topology & Availability Zone Awareness
• https://github.com/kubernetes/kubernetes/issues/64021
#4
Embrace the vSphere Ecosystem
#5
Provide Guidance
Read and understand Kubernetes Vendor Documentation to provide Guidance
Example:
“Lastly, set all of the VMs created to High VM Latency to ensure some additional tuning recommended by VMware for latency sensitive workloads [...]” (Source: Red Hat OpenShift v3.9 Reference Architecture for vSphere)