xVI05 Open Source Virtualization with KVM

© 2010 IBM Corporation. Session xVI05: Open Source Virtualization with KVM for IBM System x. Tom Schwaller, Linux Architect. IBM Systems Technical University, Budapest, 3-6 May 2010.

Page 1: xVI05 Open Source Virtualization With KVM

© 2010 IBM Corporation

Session xVI05: Open Source Virtualization with KVM for IBM System x
Tom Schwaller - Linux Architect

IBM Systems Technical University - Budapest, 3-6 May 2010

Page 2: xVI05 Open Source Virtualization With KVM

IBM Systems Technical University - Budapest, 3-6 May 2010 - xVI06 - Open Source Virtualization with KVM on IBM System x

Tom Schwaller, Linux IT Architect, IBM Germany

Tom Schwaller studied Mathematics and Theoretical Physics at the Swiss Federal Institute of Technology in Zurich and afterwards worked on several high performance computing research projects. From 1996-2001 he was editor-in-chief of the German Linux Magazine and also co-founded Linux New Media AG.

Since June 2001 he has worked as Linux IT Architect at IBM Germany and, as a member of the Linux Impact Team (until the end of 2005), helped many IBM customers with their Linux migration. As IBM's Linux Evangelist he also supported hundreds of Linux customer briefings, gave dozens of radio, TV and press interviews, represented IBM as keynote speaker at all major German Linux events (LinuxTag, LinuxWorldExpo, LinuxPark, etc.) and was advisory board member / chairman of several conferences (SambaXP, iX Eclipse Conference, First International Virtualization Conference, etc.). As EMEA Linux Desktop Technical Leader he also coauthored the very successful IBM Linux Client Migration Cookbook.

Since 2006 his main focus has been on high performance and cloud computing, virtualization, high end x86 systems, iDataPlex & BladeCenter and high speed Infiniband/10GbE networking/storage (incl. GPFS). From 2007-2008 he was Deep Computing Lead Architect in CEEMEA. At the moment he works as Lead Architect for a major Linux Desktop Cloud project in Germany.

Page 3: xVI05 Open Source Virtualization With KVM

Agenda

x86-Virtualization
KVM (Kernel-based Virtual Machine)
KVM Performance Tuning
  – KSM (Kernel SamePage Merging)
  – VirtFS
KVM I/O-Architecture (Evolution)
  – Virtio
  – I/O Acceleration with vhost-net & SRIOV
KVM Security
  – SELinux & sVirt
QEMU
  – Creating Disk Images & manual Installation

Thanks to Anthony Liguori, Hollis Blanchard, Ram Pai

Page 4: xVI05 Open Source Virtualization With KVM

x86-Virtualization

Page 5: xVI05 Open Source Virtualization With KVM

Some Virtualization Use Cases

Datacenter Consolidation
  – Primary driver: increase utilization (primarily CPU, memory)
  – Special case: security isolation (e.g. hosting providers)
Hardware Abstraction
  – Driver: new hardware with older guest OS
  – Support via virtual device drivers, e.g. ATA disk
Leverage New Technologies (FCoE, 10 GbE)
Cloud Computing
  – Resources on demand, pay per use
Development and Testing
  – Driver: multiple development and testing environments, isolation from main workspace in host. Fault injection.
Virtual Desktop
  – Multiple OS
  – Legacy applications (DOS, Windows)
Thin Clients
  – Provision, manage, high availability
  – Accessible from everywhere

Page 6: xVI05 Open Source Virtualization With KVM

Evolution of x86 Hypervisors

[Diagram: three hypervisor stacks on x86 hardware, marking where the virtualization logic, PV kernel and PV drivers live]
First Generation: Software based (hypervisor with Binary Translation, VMs on top)
Second Generation: Para-virtualization (hypervisor with Domain 0 and VMs)
Third Generation: Hardware-based / OS virtualization (Linux with KVM on VT-enabled x86, VMs on top)

Page 7: xVI05 Open Source Virtualization With KVM

KVM (Kernel-based Virtual Machine)

Page 8: xVI05 Open Source Virtualization With KVM

An Alternative to Xen

In 2006-2007, kernel developers started talking about an alternative to Xen that was more closely aligned with Linux
  – A few issues stood out:
    • NUMA Support
    • Control Tool Stack
A startup, Qumranet, wanted to build a VDI solution using Open Virtualization
Qumranet never intended to be a hypervisor vendor

Page 9: xVI05 Open Source Virtualization With KVM

Leveraging Hardware

Design virtualization support around hardware virtualization
  – Hardware virtualization support is pervasive
  – Modern Intel VT-x/AMD-V outperforms paravirtualization (PV)
Tremendous simplification comes from not supporting older hardware
  – Intrusive Linux patching is unnecessary
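Whether a given host offers the hardware support KVM is designed around can be checked from the CPU flags: vmx indicates Intel VT-x and svm indicates AMD-V. The following is a minimal sketch; the function name hw_virt_flavor is made up for illustration, and it only inspects cpuinfo-style text, so it cannot tell whether the feature is switched off in firmware.

```shell
# Report which hardware virtualization extension a cpuinfo dump advertises.
# Reads /proc/cpuinfo-formatted text on stdin; prints vt-x, amd-v or none.
# (hw_virt_flavor is a hypothetical helper name, not part of KVM.)
hw_virt_flavor() {
    awk '
        /^flags/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "vmx") { found = "vt-x" }
                if ($i == "svm") { found = "amd-v" }
            }
        }
        END { print (found == "" ? "none" : found) }
    '
}

# Typical use on a Linux host:
#   hw_virt_flavor < /proc/cpuinfo
```

A result of "none" means the KVM module for that CPU vendor will not be usable on the host.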

Leveraging Linux

Xen is virtualization added to an exokernel
Linux is a proof-by-example that monolithic kernels are more scalable/secure/fast/stable than microkernels/exokernels
  – Linux dominates the top 100
  – Linux has a large share in the embedded space
  – Rising desktop/server market shares
If Linux can be used in naval destroyers for systems control, why can't it be used to run a couple dozen Windows XP instances running Office?

Page 10: xVI05 Open Source Virtualization With KVM

KVM Design Philosophy

Guest is a userspace process
  – Not just from a management perspective
From a performance perspective, switching to and from the guest is roughly equivalent to switching between kernel and userspace
The problems that need to be solved to do something in virtualization are roughly the same as to do them in userspace
  – PCI passthrough == userspace PCI drivers
  – Unsurprisingly, interrupt sharing is the major issue for both
If we solve the problems for Linux userspace in general, we solve the problems for virtualization

Page 11: xVI05 Open Source Virtualization With KVM

Kernel-based Virtual Machine (KVM) - Overview

Converts Linux into a Type-1 Hypervisor
Incorporated into the Linux kernel in 2006
Runs Windows, Linux and other guests
KVM architecture leverages the power of Linux
  – Built on trusted, stable enterprise grade platform
  – Scheduler, memory management, hardware support etc.
  – Ease of management - same Linux paradigm
Advanced features
  – Inherit scalability, NUMA support, power management, hot-plug from Linux
    • Red Hat Hypervisor (KVM) expected to support >96 cores / 1 TB RAM on host and 16 vCPU / 64 GB RAM on guest!
  – SELinux security, Real-Time scheduler, RAS support, OpenGL for guests
  – Live Migration of virtual machines
  – VM Storage access (iSCSI, AoE, FCoE, GNBD, cLVM, ...) from Linux
Hybrid-mode operation
  – Run regular Linux applications side-by-side with Virtual Machines on the same server - much higher degree of hardware efficiency

http://www.linux-kvm.org
http://www.linux-kvm.com

Page 12: xVI05 Open Source Virtualization With KVM

Linux KVM - Architecture

Type 1 Hypervisor
  – Not “bare metal” in a classical sense, but hypervisor is kernel-integrated
Introduces new instruction execution mode – Guest Mode
  – Executes VMs closer to the kernel, avoiding the User Mode context switching of a traditional non-kernel-integrated Type 2 Hypervisor
Slightly modified QEMU is used for HVM construct and I/O
  – virtio utilizes user mode virtio drivers inherent in Kernel/QEMU for performance

Page 13: xVI05 Open Source Virtualization With KVM

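KVM Execution Model

[Diagram: execution loop across three columns: Userspace, Kernel, Guest]
Userspace: ioctl() into the kernel; Userspace Exit Handler
Kernel: Switch to Guest Mode; Kernel Exit Handler
Guest: Native Guest Execution
Lightweight Exit: handled by the Kernel Exit Handler
Heavyweight Exit: handled by the Userspace Exit Handler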

Page 14: xVI05 Open Source Virtualization With KVM

KVM is a Virtualization Driver

KVM is a small kernel driver that adds virtualization support on multiple architectures
  – AMD, Intel (included in 2.6.20)
    • KVM-lite: PV Linux guest on non-VTx / non-SVM host
  – IA64 (included in 2.6.26)
  – S390 (included in 2.6.26)
  – Embedded PowerPC (power.org, included in 2.6.26)
About 30k LOC
Compared to ~250k LOC for Xen
Uses QEMU in userspace as a device model
Safe to use by unprivileged userspace processes
Can leverage almost all Linux features

Page 15: xVI05 Open Source Virtualization With KVM

KVM Features

Power Management
  – C and P state support
  – Advanced governors
  – Suspend/resume
Memory Management
  – NUMA support
    • Policy control
    • Memory migration
  – Swapping
  – Overcommit
  – Compression (KSM)
Resource Control
  – cgroups
  – CFS tunables
Anything that Linux supports
All Hardware that Linux supports is supported in KVM
  – Compare this to ESX!

Page 16: xVI05 Open Source Virtualization With KVM

KVM Upstream Maintainerships

Released Hypervisor components:

Component               Size      Company   Maintainer
Qemu Device Model       120 kloc  IBM       Anthony Liguori
Libvirt CIM Providers   25 kloc   IBM       Kaitlin Rupert
KVM Modules             15 kloc   Red Hat   Avi Kivity
Libvirt Toolchain       100 kloc  Red Hat   Daniel Berrange

Page 17: xVI05 Open Source Virtualization With KVM

RHEL KVM Roadmap

RHEL 5.5 (03/2010) – Functions
  – Nehalem-EX performance optimizations
  – Device model improvements
  – More flexible interrupt handling
  – PCI pass-through
  – Bug fixes (vis-a-vis 5.4)
RHEL 6.0 (2Q/2010) – Functions
  – Improved I/O throughput
  – SR-IOV support
  – Virtual Switch enhancements
  – Improved RAS via Intel MCA
  – Storage and Network Management APIs
RHEL 6.x (tbd) – Functions
  – LPAR Mode
  – Aptus support
  – HA and node failover
  – Clustered File System
  – UEFI guest BIOS
  – Westmere performance optimizations

Page 18: xVI05 Open Source Virtualization With KVM

KSM (Kernel SamePage Merging)

Page 19: xVI05 Open Source Virtualization With KVM

KSM - Memory Page Sharing

Implemented as loadable kernel module
  – Kernel SamePage Merging (KSM) included in Linux Kernel 2.6.32 (Izik Eidus)
  – modprobe ksm
  – cat /sys/kernel/mm/ksm/max_kernel_pages (-> 2000)
  – cat /sys/kernel/mm/ksm/pages_sharing (>0)
Kernel scans memory of virtual machines
  – Looks for identical pages
  – Merges identical pages
  – Only stores one copy (read only) of shared memory
  – If a guest changes the page it gets its own private copy
qemu-kvm KSM-patch added to kvm development tree after kvm-88 release
Significant hardware savings
  – Better consolidation ratio
  – Allows more virtual machines to run per host
    • Memory Overcommit (avoiding Linux Swapping)
    • 600 VMs (web) on host with 48 cores and 256 GB RAM! (Red Hat claim)

http://www.linux-kvm.com/content/using-ksm-kernel-samepage-merging-kvm
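The pages_sharing counter can be turned into a rough figure for memory saved: each page counted there is a mapping that would otherwise need its own physical page. A minimal sketch, assuming the mainline sysfs layout (lowercase /sys/kernel/mm/ksm/); the helper name ksm_saved_kib is made up for illustration.

```shell
# Estimate memory saved by KSM from the pages_sharing counter.
# $1 = pages_sharing count, $2 = page size in bytes. Prints KiB saved.
# (ksm_saved_kib is a hypothetical helper name.)
ksm_saved_kib() {
    pages_sharing=$1
    page_size=$2
    echo $(( pages_sharing * page_size / 1024 ))
}

# On a host with KSM running (path assumes the mainline sysfs layout):
#   ksm_saved_kib "$(cat /sys/kernel/mm/ksm/pages_sharing)" "$(getconf PAGESIZE)"
```

The figure is an estimate only: it ignores the bookkeeping memory KSM itself needs.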

Page 20: xVI05 Open Source Virtualization With KVM

IBM Linux Technology Center - KSM Recommendations

Include the virtio balloon driver and auto-ballooning daemon in RHEL 5.x guests. Tests show that ballooning is necessary for KVM over-commitment to be effective for Linux guests (1.7x).

Substitute a more recent 2.6.3x kernel for the default RHEL 5.x 2.6.18 kernel. These later kernels are significantly better at handling low-memory situations than the default kernel.

Decrease the frequency or randomize the runtime of periodic daemons. For example, the yum-updatesd daemon loads by default 1 hour after boot. When this daemon loads concurrently on over-committed guests, all the guests fault in the python runtime, causing a concurrent resource spike. By changing the setting to be more random, there would be no spike in memory usage.

Setting the swappiness tunable in the KVM host to zero significantly improves performance by causing the host to prefer evicting page cache pages before guest pages.

Turn zone reclaim off. Initial testing showed that memory pressure caused by over-commitment can trigger some unexpected performance reductions when specific NUMA memory zones become fully allocated.

Turn NUMA off. Provisionally we see better results with NUMA turned off. We suspect there are some latent bugs in the NUMA allocation that we should find and fix.
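The swappiness and zone reclaim recommendations correspond to two standard Linux sysctl keys, vm.swappiness and vm.zone_reclaim_mode. The sketch below only prints the settings; the helper name and the file name in the usage comment are illustrative assumptions, not something the recommendations prescribe.

```shell
# Print sysctl.conf-style lines for the two host tunables:
# swappiness zero and zone reclaim off.
# (ksm_host_sysctl_lines is a hypothetical helper name.)
ksm_host_sysctl_lines() {
    printf '%s\n' \
        'vm.swappiness = 0' \
        'vm.zone_reclaim_mode = 0'
}

# Possible usage as root (the file name is an arbitrary example):
#   ksm_host_sysctl_lines > /etc/sysctl.d/90-kvm-overcommit.conf
#   sysctl -p /etc/sysctl.d/90-kvm-overcommit.conf
```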

Page 21: xVI05 Open Source Virtualization With KVM

VirtFS

Page 22: xVI05 Open Source Virtualization With KVM

VirtFS - Overview

What is it?
  – Filesystem pass-through mechanism between the KVM host and guest operating systems (para-virtualized file system)
  – Uses Plan-9 Protocol (9P2000.L) between Client and Server
    • Simple, efficient protocol, maintained by IBM
  – Server is on Host and is part of QEMU with VirtIO transport
  – Client is part of the Guest Kernel
What are the expectations?
  – Provide secure and isolated Filesystem exports between the KVM Host and Guest
  – Close to native Filesystem (GPFS) performance
  – Multi-tenancy
Who Needs it?
  – VSC (Virtual Storage Cloud)
    • SoNAS on top of VirtFS client in a KVM guest
  – SoNAS
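For orientation, this is roughly how a VirtFS export looks in QEMU releases that ship the feature: the host side uses -fsdev with a virtio-9p-pci device, and the guest mounts the share with the 9p filesystem type. The sketch only prints the arguments. The function names, the fs0 id and the security_model=mapped choice are assumptions made for the example, and the option set may differ between QEMU versions.

```shell
# Print the QEMU arguments that export a host directory over virtio-9p.
# $1 = host directory, $2 = mount tag the guest will refer to.
# (Function names, the fs0 id and security_model=mapped are illustrative.)
virtfs_qemu_args() {
    printf '%s\n' "-fsdev local,id=fs0,path=$1,security_model=mapped -device virtio-9p-pci,fsdev=fs0,mount_tag=$2"
}

# Print the guest-side mount command for the same tag.
# $1 = mount tag, $2 = mount point inside the guest.
virtfs_guest_mount_cmd() {
    printf '%s\n' "mount -t 9p -o trans=virtio,version=9p2000.L $1 $2"
}

# Example:
#   virtfs_qemu_args /export hostshare
#   virtfs_guest_mount_cmd hostshare /mnt
```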

Page 23: xVI05 Open Source Virtualization With KVM

VirtFS - Block Diagram

[Diagram: layers of the VirtFS data path]
Guest: Apps on Guest, GPFS API, VFS Interface, VirtFS (v9fs) Client in the Guest Kernel
Transport: VirtIO Ring
Host User Space: VirtFS Server (v9fs server in QEMU)
Host Kernel: VFS Interface, GPFS Client
Hardware

Page 24: xVI05 Open Source Virtualization With KVM

VirtFS and GPFS

Why not use GPFS Directly on the Guest?
  – GPFS limits the number of cluster cross mounts, which limits the number of virtual machines.
  – GPFS in the guests would use significant resources (memory) due to fixed i-node cache allocation.
  – GPFS is much more sensitive (narrow) about supported kernel versions and Operating Systems.
  – Disk management becomes difficult in a dynamically changing environment.

Page 25: xVI05 Open Source Virtualization With KVM

Virtio

Page 26: xVI05 Open Source Virtualization With KVM

Virtio

First proposed by Rusty Russell
  – Based on our experiences with Xen frontend/backend architecture
Addressed a number of concerns:
  – Clear separation between protocol and transport to allow multiple hypervisors to utilize
  – Each component uses well defined interface and is replaceable
  – Minimum driver implementation required
  – Fits on top of existing hardware abstraction well (PCI)
Linux will support lguest, KVM, Xen, KVM-lite, PHYP, VMware, Viridian, and possibly more
  – If each has 4-5 PV drivers, that's 35 new drivers!
  – All drivers would be doing the same thing
virtio is an abstraction of the common mechanism of VMMs
  – A single driver could, with little modification, run on many different VMMs
Especially important for “small” drivers (entropy driver, CPU hotplug, ballooning, etc.)

Page 27: xVI05 Open Source Virtualization With KVM

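Virtio Architecture

[Diagram: virtio drivers on top of a common virtio layer and per-hypervisor backends]
Drivers: virtio-net, virtio-blk, virtio-video, virtio-9p
Common layer: virtio
Backends: lguest virtio backend, xen virtio backend, kvm virtio backend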

Page 28: xVI05 Open Source Virtualization With KVM

Virtio for KVM

Paravirtualized Drivers for KVM/Linux
  – virtio was chosen to be the main platform for IO virtualization in KVM
  – The idea behind it is to have a common framework for hypervisors for IO virtualization (same in XEN)
  – network/block/balloon/PCI passthrough devices are supported for KVM
  – The host implementation is in userspace - qemu, so no driver is needed in the host (but still has some performance issues)
Hardware assisted Virtualization
  – Support for advanced hardware features for both KVM and Xen
    • VT-d for secure PCI Pass-thru on Intel platforms
    • IOMMU for secure PCI Pass-thru on AMD platforms
    • PCI Single-Root I/O Virtualization (SR-IOV)
  – Delivers native I/O performance for network and block devices
Support for Microsoft Windows Server guests
  – Paravirtualized drivers for network and disk (WHQL certified -> Enterprise Distros)
  – Microsoft SVVP Certification (-> Enterprise Distros)

Page 29: xVI05 Open Source Virtualization With KVM

Current State of Virtio

Drivers:
  − virtio-net
  − virtio-blk
  − virtio-console
  − virtio-balloon
  − virtio-random
Transports:
  − virtio-pci
  − virtio-s390
  − virtio-lguest

Page 30: xVI05 Open Source Virtualization With KVM

Virtio-net

Page 31: xVI05 Open Source Virtualization With KVM

virtio-net

Where most of the work is happening these days
Current performance is, in most cases, better than Xen netfront/netback
  − Xen suffers from asymmetric RX/TX performance
  − KVM maintains symmetry on both
The mainline bits are still only roughly 50% of native
  − Active work underway to improve that further
Uses the tun/tap device
  − Added GSO support to improve performance

Page 32: xVI05 Open Source Virtualization With KVM

KVM netperf with virtio-net

© 2009 Chris Wright

Page 33: xVI05 Open Source Virtualization With KVM

KVM netperf with Device Assignment

© 2009 Chris Wright

Page 34: xVI05 Open Source Virtualization With KVM

I/O Acceleration with vhost-net & SRIOV

Thanks to Ram Pai (IBM)

Page 35: xVI05 Open Source Virtualization With KVM

Virtualization Phase I - Qemu Emulation

Emulation entirely done in user space
Qemu emulated hardware
  – E1000/RTL NIC
  – IDE driver
No hardware support
No Host OS support
Qemu/Guest is just a user process for the Host OS
Performance – VERY SLOW

[Diagram: Guest OS (NIC Driver, Disk Driver) inside QEMU (Virtual NIC, Virtual Disk), running on the Host OS]

Page 36: xVI05 Open Source Virtualization With KVM

Virtualization Phase II - KVM Acceleration

Qemu emulates devices and runs in User Mode
Guest still part of the QEMU process
Guest image run in Guest Mode facilitated by KVM
KVM exploits Intel VT / AMD-V CPU support
Performance
  – Guest CPU speed is near native
  – I/O is slow
    • Guest->Host->User mode and vice versa context switch penalty for each I/O operation

[Diagram: Guest OS (NIC Driver, Disk Driver) inside QEMU (Virtual NIC, Virtual Disk), running on the Host OS with KVM]

Page 37: xVI05 Open Source Virtualization With KVM

Virtualization Phase III - I/O Acceleration through Virtio

Qemu emulated virtio device
Guest runs a paravirt virtio driver
I/O is buffered in circular send and receive queue
Context switch from Guest->Host->User and vice versa reduced significantly
Performance
  – Guest CPU speed is near native
  – Better I/O throughput and lower latency at lower CPU utilization

[Diagram: Guest OS with Virtio Driver and Send Q / Recv Q, QEMU with Virtual NIC and Virtual Disk, Host OS with KVM, NIC Driver and Disk Driver]

Page 38: xVI05 Open Source Virtualization With KVM

Virtualization Phase IV - I/O Accel. through vhost-net

Kernel emulated virtio device through vhost-net
Guest runs a paravirt virtio driver
I/O is buffered in circular send and receive queue
Context switch from kernel to user and vice versa eliminated per vmexit
Performance
  – Guest CPU speed is near native
  – Further lower latencies

[Diagram: Guest OS with Virtio Driver and Send Q / Recv Q, QEMU with Virtual Disk, Host OS with KVM, vhost-net, NIC Driver and Disk Driver]

Page 39: xVI05 Open Source Virtualization With KVM

SRIOV Overview

SRIOV: Single Root I/O Virtualization
PCI-SIG Standard
Ability to drive a PCIe function from multiple independent software entities
Each software entity believes it has exclusive access
Provides high throughput, low CPU utilization, high scalability
Requires platform support

[Diagram: PCIe port with a configuration resource and physical functions PF0, PF1, PF2, each fanning out into virtual functions VF 0 ... VF n]

Replicates each hardware physical function into multiple virtual functions
  – PF -> Physical Function
  – VF -> Virtual Function
Virtual Function is a replica of the Physical Function
Each Physical Function can have up to 256 Virtual Functions
Each Virtual Function can be driven by an independent software entity
Virtual Functions are lightweight and lack configuration resources

Page 40: xVI05 Open Source Virtualization With KVM

Virtualization Phase V - SRIOV Hardware Acceleration

Host configures and enables PF and all VF
Host and Qemu are bypassed in the I/O path
Each guest controls a VF
Side band communication path between PF and VF
  – For communicating device information
  – Co-ordination between PF and VF on device reset
Native CPU speed
Promises native I/O performance at negligible CPU overhead

[Diagram: Guest OS1 ... Guest OSn, each with a VF driver attached to VF1 ... VFn; Host OS with KVM and the PF driver attached to the PF]

Page 41: xVI05 Open Source Virtualization With KVM

Issues with SRIOV

All guest pages have to be pinned
  – Cannot overcommit guest memory
Requires PCI pass-thru platform support
Guest migration across hosts is challenging

Page 42: xVI05 Open Source Virtualization With KVM

TCP Bandwidth Comparison

SRIOV provides higher bandwidth at lower CPU utilization

  • 3400M2 Nehalem, 16cpu, 4G memory
  • Intel 1G SRIOV adapter
  • rhel5u4 host running rhel5u4 guest

Page 43: xVI05 Open Source Virtualization With KVM

TCP Latency Comparison

SRIOV surprisingly has higher latency
  • 3400M2 Nehalem, 16cpu, 4G memory
  • Intel 1G SRIOV adapter
  • rhel5u4 host running rhel5u4 guest

Page 44: xVI05 Open Source Virtualization With KVM

How to use SRIOV

Ensure SRIOV is supported in BIOS/UEFI
If kernel not enabled use command line workaround: pci=assign-busses
Activate driver to enable VF:
    modprobe igb max_vfs=1
Check for the existence of VF:
    lspci
    07:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    08:10.0 Ethernet controller: Intel Corporation 82576 Virtual Function (rev 01)
Enable the VF for pci-passthru:
    # $bus holds the PCI address of the VF as listed by lspci, e.g. 08:10.0
    pciid=$(lspci -n | grep $bus | awk '{print $3}' | sed -e 's/:/ /')
    echo -n $pciid > /sys/bus/pci/drivers/pci-stub/new_id
    echo -n 0000:$bus > /sys/bus/pci/devices/0000:$bus/driver/unbind
    echo -n 0000:$bus > /sys/bus/pci/drivers/pci-stub/bind
Pass the VF pciid to the guest:
    qemu-kvm -pcidevice host=$bus
Verify that the device is grabbed by the corresponding driver in the guest:
    # ethtool -i eth0
    driver: igbvf
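The pciid step above converts the vendor:device pair from lspci -n into the space-separated form that pci-stub's new_id file expects. A minimal sketch of that one step as a function follows; the name pci_stub_id is made up, and the sample lspci -n lines in the usage comment are illustrative data rather than output captured from the test system. Matching the address exactly avoids the partial matches that a plain grep on $bus could produce.

```shell
# Turn one `lspci -n` line into the "vendor device" pair for pci-stub.
# Reads lspci -n output on stdin; $1 is the PCI address (e.g. 08:10.0).
# (pci_stub_id is a hypothetical helper name.)
pci_stub_id() {
    awk -v addr="$1" '$1 == addr { sub(":", " ", $3); print $3 }'
}

# On a real host:
#   pciid=$(lspci -n | pci_stub_id "$bus")
# With illustrative input:
#   printf '08:10.0 0200: 8086:10ca (rev 01)\n' | pci_stub_id 08:10.0
```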

Page 45: xVI05 Open Source Virtualization With KVM

QEMU

Page 46: xVI05 Open Source Virtualization With KVM

QEMU

QEMU is a community-driven project
  – No company has sponsored major portions of its development
QEMU does a really amazing thing
  – Can emulate 9 target architectures on 13 host architectures!
  – Provides full system emulation supporting ~200 distinct devices
  – Very sophisticated and complete command line interface (CLI)
  – There are more than 90 options in the output of qemu-kvm --help
Is the basis of KVM, Xen HVM, and xVM VirtualBox
  – Every Open Source virtualization project uses QEMU
Userspace device model for KVM
  − Provides management interface
  − Provides device emulation
  − Provides paravirtual IO backends
libvirt communicates directly with QEMU
libvirt-cim communicates with libvirt

Page 47: xVI05 Open Source Virtualization With KVM

Creating a virtual Disk with qemu-img

# qemu-img create
  – Options: format; filename; size; compression; encryption; base image
  – Formats: qcow/qcow2, VMware, Virtual PC, Parallels, raw and 8 more
  – A base image is an existing virtual disk to use as the initial state for a copy on write snapshot
    • No coordination so you should manually make it read only
  – Example: # qemu-img create -f qcow2 /path/to/virtualdisk 6G
# qemu-img info /path/to/virtualdisk
  – Give information about the virtual disk, e.g. size (on-disk and virtual), format, snapshots
# qemu-img convert -f <fmt> /path/to/virtualdisk \
      -O <fmt> /path/to/converteddisk
  – Convert the virtual disk format, e.g.:
    • Virtual PC to qcow
    • Add compression or encryption
# qemu-img commit /path/to/virtualdisk
  – Write a virtual disk's changes to its base image
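The base image workflow can be sketched as three commands: make the base read only, create a qcow2 overlay that points at it with -b, then inspect the result. The function below only prints the commands so they can be reviewed first; overlay_commands and both file names are placeholders. Newer qemu-img releases may additionally require the backing format to be given with -F.

```shell
# Print (do not run) the commands for a copy-on-write overlay on a base image.
# $1 = existing base image, $2 = overlay file to create.
# (overlay_commands and the example paths are hypothetical.)
overlay_commands() {
    printf 'chmod a-w %s\n' "$1"
    printf 'qemu-img create -f qcow2 -b %s %s\n' "$1" "$2"
    printf 'qemu-img info %s\n' "$2"
}

# Example:
#   overlay_commands base.img vm1.qcow2
```

Writes from the guest then land in the overlay, and qemu-img commit folds them back into the base when desired.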

Page 48: xVI05 Open Source Virtualization With KVM

Windows 2003 Installation on Fedora 12

Install KVM package
    # yum install qemu kvm
Create an image file for the virtual hard disk
    # qemu-img create -f qcow2 windows2003-1.img 4G
Start the virtual machine
    # qemu-kvm -no-acpi -m 384 -hda windows2003-1.img \
          -cdrom w2k3.iso -boot d -smp 2
  -m = memory (in MB)
  -hda = first hard drive (many image file types supported)
  -cdrom = ISO Image or CD/DVD drive
  -boot [a|c|d] = boot from Floppy (a), Hard disk (c) or CDROM (d)
  -smp = number of CPUs
Install Windows, download PV drivers to VM and restart with virtio network option
    # qemu-kvm -no-acpi -m 384 -boot c -smp 2 windows2003-1.img \
          -net nic,model=virtio
Install paravirtualized (PV) network drivers and reboot with virtio network option
For paravirtualized Windows block device (driver installation and usage) check e.g.
http://www.linux-kvm.com/content/redhat-54-windows-virtio-drivers-part-2-block-drivers

Page 49: xVI05 Open Source Virtualization With KVM

Result of Windows 2003 Installation

Page 50: xVI05 Open Source Virtualization With KVM

OpenSolaris 2008.11 on RHEL-5.3/5.4 (KVM)

Page 51: xVI05 Open Source Virtualization With KVM

KVM Example with more Networking Options

# qemu-kvm -hda /path/to/virtualdisk \
      -net nic,model=e1000,macaddr=ac:de:48:64:61:1c \
      -net tap,script=no,ifname=kvmtap679a \
      -m 512 -smp 2 -usb -localtime -name "MyVM"

Networking Options
  – User space stack [-net user,vlan=<n>]
    • Port forwarding [-redir tcp|udp:host-port:guestIP:guest-port]
  – Tap a host interface [-net tap,vlan=<n>,ifname=<tapname>,script=no]
  – Socket (private shared network with another host)
    [-net socket,vlan=<n>,listen=:<port>]
    [-net socket,vlan=<n>,connect=<host>:<port>]
  – Multicast socket (shared network with multiple hosts) [-net socket,vlan=<n>,mcast=<addr>:<port>]
  – VM with no NICs [-net none]
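When several guests share a tap or socket network, each NIC needs its own macaddr value. A small sketch for generating one follows. It uses the 52:54:00 prefix that is commonly seen on QEMU/KVM guests instead of the prefix in the example above; the function name random_guest_mac is made up, and nothing here checks for collisions with addresses already in use.

```shell
# Generate a MAC address for a guest NIC: fixed 52:54:00 prefix,
# last three octets taken from /dev/urandom.
# (random_guest_mac is a hypothetical helper name.)
random_guest_mac() {
    od -An -N3 -tx1 /dev/urandom |
        awk '{ printf "52:54:00:%s:%s:%s\n", $1, $2, $3 }'
}

# Example:
#   qemu-kvm ... -net nic,model=e1000,macaddr=$(random_guest_mac) ...
```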

Page 52: xVI05 Open Source Virtualization With KVM

KVM Security

Page 53: xVI05 Open Source Virtualization With KVM

sVirt: Hardening Linux Virtualization with Mandatory Access Control

sVirt: pluggable security design for libvirt
  – supports MAC security schemes like SELinux, SMACK
MAC policy enforced by host kernel
Guests and resources uniquely labeled: svirt_t, virt_image_t, virt_content_t, …
Coarse rules for all isolated guests applied to svirt_t
For simple isolation: all accesses between different UUIDs are denied
Current status
  – Low-level libvirt integration done
  – Can launch labeled guest
  – Basic label support in virsh
Future enhancements
  – Different types of isolated guests: svirt_web_t
  – Virtual network security
  – Controlled flow between guests
  – Distributed guest security
Related work
  – Labeled NFS
  – Labeled Networking
  – XACE
Similar work
  – XSM (port of Flask to Xen)

http://selinuxproject.org/page/SVirt

XEN Vulnerability
http://www.hacker-soft.net/Soft/Soft_13289.htm

© 2009 James Morris

Page 54: xVI05 Open Source Virtualization With KVM

sVirt Dynamic Labeling

Generates a random unused MCS (Multi-Category Security) label.
Labels the image file/device: svirt_image_t:MCS1
Launches the image: svirt_t:MCS1
Labels R/O Content: virt_content_t:s0
Labels Shared R/W Content: svirt_t:s0
Labels image on completion: virt_image_t:s0
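The dynamic labeling steps can be illustrated with a small Python model. This is a toy sketch, not libvirt's implementation: the function names are hypothetical, the category range c0..c1023 and the "s0:cX,cY" label form are assumptions based on the usual SELinux MCS policy, and the access check only compares MCS parts for equality rather than evaluating real policy.

```python
import random

MAX_CATEGORY = 1024  # assumption: categories c0..c1023

def pick_mcs_label(in_use, rng=random):
    """Pick a random, not yet used MCS label and record it in in_use."""
    while True:
        x, y = sorted(rng.sample(range(MAX_CATEGORY), 2))
        label = f"s0:c{x},c{y}"
        if label not in in_use:
            in_use.add(label)
            return label

def labels_for_guest(mcs):
    """Labels from the slide, with MCS1 replaced by the chosen label."""
    return {
        "image": f"svirt_image_t:{mcs}",        # image file/device while running
        "process": f"svirt_t:{mcs}",            # the launched guest process
        "readonly_content": "virt_content_t:s0",
        "image_after_shutdown": "virt_image_t:s0",
    }

def may_access(process_label, file_label):
    """Toy isolation rule: access only when the MCS parts are identical."""
    return process_label.split(":", 1)[1] == file_label.split(":", 1)[1]

in_use = set()
guest_a = labels_for_guest(pick_mcs_label(in_use))
guest_b = labels_for_guest(pick_mcs_label(in_use))

print(guest_a["process"], "->", guest_a["image"],
      may_access(guest_a["process"], guest_a["image"]))
print(guest_a["process"], "->", guest_b["image"],
      may_access(guest_a["process"], guest_b["image"]))
```

Because every running guest receives a different category pair, a compromised guest process cannot reach another guest's image even though both run under the same svirt_t type.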

Page 55: xVI05 Open Source Virtualization With KVM

sVirt Static Label (Multi-Level Security)

Administrator must specify image label svirt_t:TopSecret
Launches the image: svirt_t:TopSecret
Libvirt will NOT label any content.
Administrator responsible for labeling content.

virt-manager with static SELinux labeling
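Outside of virt-manager, a static label is expressed as a seclabel element in the libvirt domain XML. The Python sketch below generates such an element; the helper name static_seclabel is hypothetical, and the full context string is an assumed expansion of the slide's shorthand svirt_t:TopSecret, so the exact label must come from the local SELinux policy.

```python
import xml.etree.ElementTree as ET

def static_seclabel(label, model="selinux"):
    """Build a <seclabel type='static'> element carrying the given label."""
    seclabel = ET.Element("seclabel", {"type": "static", "model": model})
    ET.SubElement(seclabel, "label").text = label
    return seclabel

# Assumption: user and role parts added to the slide's "svirt_t:TopSecret".
elem = static_seclabel("system_u:system_r:svirt_t:TopSecret")
print(ET.tostring(elem, encoding="unicode"))
```

The printed element would be placed inside the guest's domain definition; as the slide notes, labeling the image and other content remains the administrator's job in this mode.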

Page 56: xVI05 Open Source Virtualization With KVM

Questions?

Page 57: xVI05 Open Source Virtualization With KVM

Trademarks

The following are trademarks of the International Business Machines Corporation in the United States and/or other countries. For a complete list of IBM Trademarks, see www.ibm.com/legal/copytrade.shtml: AS/400, DBE, e-business logo, ESCO, eServer, FICON, IBM, IBM Logo, iSeries, MVS, OS/390, pSeries, RS/6000, S/390, VM/ESA, VSE/ESA, WebSphere, xSeries, z/OS, zSeries, z/VM

The following are trademarks or registered trademarks of other companies:

Lotus, Notes, and Domino are trademarks or registered trademarks of Lotus Development Corporation.
Java and all Java-related trademarks and logos are trademarks of Sun Microsystems, Inc., in the United States and other countries.
LINUX is a registered trademark of Linus Torvalds.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Microsoft, Windows and Windows NT are registered trademarks of Microsoft Corporation.
SET and Secure Electronic Transaction are trademarks owned by SET Secure Electronic Transaction LLC.
Intel is a registered trademark of Intel Corporation.
* All other products may be trademarks or registered trademarks of their respective companies.

NOTES:

Performance is in Internal Throughput Rate (ITR) ratio based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput improvements equivalent to the performance ratios stated here.

IBM hardware products are manufactured from new parts, or new and serviceable used parts. Regardless, our warranty terms apply.

All customer examples cited or described in this presentation are presented as illustrations of the manner in which some customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics will vary depending on individual customer configurations and conditions.

This publication was produced in the United States. IBM may not offer the products, services or features discussed in this document in other countries, and the information may be subject to change without notice. Consult your local IBM business contact for information on the product or services available in your area.

All statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Information about non-IBM products is obtained from the manufacturers of those products or their published announcements. IBM has not tested those products and cannot confirm the performance, compatibility, or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

Prices subject to change without notice. Contact your IBM representative or Business Partner for the most current pricing in your geography.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Any proposed use of claims in this presentation outside of the United States must be reviewed by local IBM country counsel prior to such use.

The information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.