Libvirt/KVM Driver Update (Kilo)

OPENSTACK COMPUTE 101 Libvirt/KVM Driver Update Stephen Gordon (@xsgordon) Sr. Technical Product Manager

Upload: stephen-gordon

Post on 02-Aug-2015



Agenda
● Architecture Refresher
● Kilo Features
● Liberty Predictions/Speculation


ARCHITECTURE REFRESHER

OpenStack Components

OpenStack Compute
● Execution and management of compute workloads
● Relatively technology agnostic (VMs, bare metal, containers)
● Pluggable virtualization/container backends:
  ○ Libvirt (KVM, LXC, Parallels CT, Parallels VM, QEMU, Xen), Ironic, Hyper-V, VMware vCenter, XenServer, etc.
  ○ http://docs.openstack.org/developer/nova/support-matrix.html

Components
● RESTful nova-api interface exposed on TCP port 8774.
● AMQP message queue used for RPC communications.
● nova-scheduler handles hypervisor selection for instance placement.
● nova-conductor handles database access.

Components (cont.)
● nova-compute acts as the Compute agent, interacting with the relevant hypervisor APIs to launch/manage guests.

Libvirt/KVM
● Driver used for 85% of production OpenStack deployments. [1]
● Free and Open Source Software end-to-end stack:
  ○ Libvirt - Abstraction layer providing an API for hypervisor and virtual machine lifecycle management. Supports many hypervisors and architectures.
  ○ QEMU - Machine emulator that runs guests using dynamic translation or, with hypervisor assistance (e.g. KVM), hardware virtualization.
  ○ KVM - Kernel-based Virtual Machine, a kernel module providing full virtualization for the Linux kernel.
● Why Libvirt instead of speaking straight to QEMU?
[1] http://superuser.openstack.org/articles/openstack-users-share-how-their-deployments-stack-up

Why Libvirt?
$ /usr/libexec/qemu-kvm -name instance-00000007 -S \
  -machine pc-i440fx-rhel7.1.0,accel=tcg,usb=off \
  -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 \
  -object memory-backend-ram,size=2048M,id=ram-node0,host-nodes=1,policy=bind \
  -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
  -uuid 57d7852e-0286-4913-bd7e-f897c5197d21 \
  -smbios type=1,manufacturer=Red Hat,product=OpenStack Nova,version=2014.2.2-19.el7ost,serial=c3758f33-342b-4350-adf0-a67798b56209,uuid=57d7852e-0286-4913-bd7e-f897c5197d21 \
  -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-00000007.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc -no-shutdown -boot strict=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -drive file=/var/lib/nova/instances/57d7852e-0286-4913-bd7e-f897c5197d21/disk,if=none,id=drive-virtio-disk0,format=qcow2,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -netdev tap,fd=25,id=hostnet0 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:45:de:c3,bus=pci.0,addr=0x3 \
  -chardev file,id=charserial0,path=/var/lib/nova/instances/57d7852e-0286-4913-bd7e-f897c5197d21/console.log \
  -device isa-serial,chardev=charserial0,id=serial0 \
  -chardev pty,id=charserial1 \
  -device isa-serial,chardev=charserial1,id=serial1 \
  -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -k en-us \
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
  -msg timestamp=on
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none
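For contrast with the command line above, here is a minimal sketch of how a few of the same settings might be written as a Libvirt domain definition, built with Python's standard library. The helper name build_domain_xml is hypothetical, and only the name, UUID, memory and vCPU count are shown; a real definition carries many more elements.

```python
import xml.etree.ElementTree as ET


def build_domain_xml(name, uuid, memory_mib, vcpus, domain_type="kvm"):
    """Build a deliberately tiny Libvirt domain definition (illustrative only)."""
    domain = ET.Element("domain", type=domain_type)
    ET.SubElement(domain, "name").text = name
    ET.SubElement(domain, "uuid").text = uuid
    memory = ET.SubElement(domain, "memory", unit="MiB")
    memory.text = str(memory_mib)
    ET.SubElement(domain, "vcpu").text = str(vcpus)
    return ET.tostring(domain, encoding="unicode")


# Values taken from the command line shown on the slide.
xml_text = build_domain_xml(
    "instance-00000007",
    "57d7852e-0286-4913-bd7e-f897c5197d21",
    2048,
    2,
)
print(xml_text)
```

The point of the sketch is only that a structured document is easier to generate and review than a single long argument string.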

Libvirt/KVM Guest Configuration
● CPU
● NIC
● Disks
● PCI devices
● Serial consoles
● SMBIOS info
● CPU pinning
● VNC or SPICE
● QEMU + SPICE agents
● Clock (PIT, RTC) parameters
● Scheduler, disk, network tunables

Supporting Tool Highlights
● virsh - CLI for interacting with Libvirt.
● virt-rescue - Run a rescue shell on a virtual machine (using libguestfs).
● virt-sysprep - Reset a virtual machine so that clones can be made. Removes SSH host keys, udev rules, etc.
● virt-v2v - Convert guests from other platforms (VMware, Xen, Hyper-V).
● virt-sparsify - Convert disk image to thin provisioned.

Libvirt/KVM
● nova-compute agent communicates with Libvirt.
● Libvirt launches and manages QEMU processes for each guest.
● KVM uses the Linux kernel for direct hardware access as needed.

Guest Enhancements
● VirtIO drivers provide paravirtualized devices to virtual machines, improving speed over emulation.
  ○ Built into modern enterprise Linux guest operating systems.
  ○ Available for Windows.
● QEMU guest agent optionally runs inside guests and facilitates external interaction by users and/or management platforms including OpenStack.
● Anti-VENOM protection provided using sVirt (SELinux and AppArmor security drivers supported).

Virtual Interface Drivers
● Responsible for plugging/unplugging guest interfaces.
● Different interface types = different Libvirt XML definitions.
● Simplified LibvirtGenericVIFDriver implementation supports a wide range of VIF types.
● Not easily pluggable by out-of-tree implementations.
  ○ Live in nova/virt/libvirt/vif.py
  ○ More on this later...

Virtual Interface Drivers Example
● passthrough:
<interface type="direct">
  <mac address="DE:AD:BE:EF:CA:FE"/>
  <model type="virtio"/>
  <source dev="eth0" mode="passthrough"/>
</interface>
● vhost-user:
<interface type="vhostuser">
  <mac address="DE:AD:BE:EF:CA:FE"/>
  <model type="virtio"/>
  <source type="unix" mode="server" path="/vhost-user/test.sock"/>
</interface>
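The two definitions above differ only in the interface type and the attributes of the source element, which a short sketch can make explicit. The function build_interface_xml is a hypothetical helper written for this illustration; it is not the code in nova/virt/libvirt/vif.py.

```python
import xml.etree.ElementTree as ET


def build_interface_xml(vif_type, mac, **source_attrs):
    """Return <interface> XML for a given Libvirt interface type.

    vif_type is the Libvirt interface type ("direct", "vhostuser", ...);
    source_attrs become the attributes of the <source> element.
    """
    interface = ET.Element("interface", type=vif_type)
    ET.SubElement(interface, "mac", address=mac)
    ET.SubElement(interface, "model", type="virtio")
    ET.SubElement(interface, "source", **source_attrs)
    return ET.tostring(interface, encoding="unicode")


# Same values as the two slide examples.
passthrough = build_interface_xml(
    "direct", "DE:AD:BE:EF:CA:FE", dev="eth0", mode="passthrough")
vhost_user = build_interface_xml(
    "vhostuser", "DE:AD:BE:EF:CA:FE",
    type="unix", mode="server", path="/vhost-user/test.sock")
print(passthrough)
print(vhost_user)
```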

Volume Drivers
● Conceptually similar to VIF drivers, albeit no “generic” driver.
● volume_drivers=iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver,iser=nova.virt.libvirt.volume.LibvirtISERVolumeDriver,local=nova.virt.libvirt.volume.LibvirtVolumeDriver...etc.
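The volume_drivers value is a list of name=class pairs. As a rough sketch of how such a value could be turned into a lookup table (parse_volume_drivers is a hypothetical helper, not Nova's own loader, and only the three entries shown above are used):

```python
def parse_volume_drivers(entries):
    """Map protocol name -> dotted class path from 'name=path' entries."""
    drivers = {}
    for entry in entries:
        # Split on the first '=' only; the class path never contains one.
        name, _, class_path = entry.partition("=")
        drivers[name.strip()] = class_path.strip()
    return drivers


setting = (
    "iscsi=nova.virt.libvirt.volume.LibvirtISCSIVolumeDriver,"
    "iser=nova.virt.libvirt.volume.LibvirtISERVolumeDriver,"
    "local=nova.virt.libvirt.volume.LibvirtVolumeDriver"
)
drivers = parse_volume_drivers(setting.split(","))
print(drivers["iscsi"])
```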


KILO FEATURES

Performance Features
● CPU Pinning
● Huge Pages
● NUMA-aware scheduling (cont.)
  ○ Memory binding
  ○ I/O device locality awareness

CPU Pinning
● Extends NUMATopologyFilter added in Juno:
  ○ Adds concept of a “dedicated resource” guest.
  ○ Implicitly pins vCPUs and emulator threads to pCPU cores for increased performance, trading off the ability to overcommit.
● Combine with existing techniques for isolating cores for maximum benefit.

Example - Hardware Layout
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 8191 MB
node 0 free: 6435 MB
node 1 cpus: 4 5 6 7
node 1 size: 8192 MB
node 1 free: 6634 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10
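To work with this layout programmatically, the per-node lines of the numactl output can be parsed into a small dictionary. This is a sketch only: parse_numactl is a hypothetical helper, the distance table is ignored, and the sample text is copied from the output above.

```python
import re

NUMACTL_OUTPUT = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 8191 MB
node 0 free: 6435 MB
node 1 cpus: 4 5 6 7
node 1 size: 8192 MB
node 1 free: 6634 MB
"""


def parse_numactl(text):
    """Return {node_id: {"cpus": [...], "size_mb": int, "free_mb": int}}."""
    nodes = {}
    for line in text.splitlines():
        match = re.match(r"node (\d+) (cpus|size|free): (.*)", line)
        if not match:
            continue  # header and any other lines are skipped
        node = nodes.setdefault(int(match.group(1)), {})
        field, value = match.group(2), match.group(3)
        if field == "cpus":
            node["cpus"] = [int(cpu) for cpu in value.split()]
        else:
            # "8191 MB" -> 8191
            node[field + "_mb"] = int(value.split()[0])
    return nodes


layout = parse_numactl(NUMACTL_OUTPUT)
print(layout)
```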

Example - Hardware Layout
[Diagram: two NUMA nodes. Node 0 holds cores 0-3 and RAM banks 0 and 1; node 1 holds cores 4-7 and RAM banks 0 and 1.]

Example - Virsh Capabilities
<cells num='2'>
  <cell id='0'>
    <memory unit='KiB'>8387744</memory>
    <pages unit='KiB' size='4'>2096936</pages>
    <pages unit='KiB' size='2048'>0</pages>
    <distances>
      <sibling id='0' value='10'/>
      <sibling id='1' value='20'/>
    </distances>
    <cpus num='4'>
      <cpu id='0' socket_id='0' core_id='0' siblings='0'/>
      <cpu id='1' socket_id='0' core_id='1' siblings='1'/>
      ...

Example - Configuration
● Scheduler:
  ○ Enable NUMATopologyFilter and AggregateInstanceExtraSpecsFilter
● Compute Node(s):
  ○ Alter kernel boot params to add isolcpus=2,3,6,7
  ○ Set vcpu_pin_set=2,3,6,7 in /etc/nova/nova.conf
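The same CPU list is given to the kernel and to Nova, so it helps to see which cores that leaves for the host. The sketch below expands a CPU list into a set; parse_cpu_set is a hypothetical helper that handles only single IDs and simple ranges, and the real options may accept more syntax than is handled here. The eight-core host is taken from the hardware layout example.

```python
def parse_cpu_set(spec):
    """Expand a CPU list such as "2,3,6,7" or "2-3,6-7" into a set of ints."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


host_cpus = set(range(8))                   # cores 0-7 on the example host
guest_cpus = parse_cpu_set("2,3,6,7")       # isolcpus / vcpu_pin_set value
housekeeping_cpus = host_cpus - guest_cpus  # what remains for host processes
print(sorted(guest_cpus), sorted(housekeeping_cpus))
```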

Example - Hardware Layout
[Diagram: the same two-node layout, with cores divided between Host Processes and Guests; per the configuration above, cores 2, 3, 6 and 7 are the ones set aside for guests.]

Example - Configuration
● Flavor:
  ○ Add hw:cpu_policy=dedicated extra specification:
$ nova flavor-key m1.small.performance set hw:cpu_policy=dedicated
● Instance:
$ nova boot --image rhel-guest-image-7.1-20150224 \
    --flavor m1.small.performance test-instance

Example - Resultant Libvirt XML
● vCPU placement is static, with a 1:1 vCPU:pCPU relationship:
<vcpu placement='static'>2</vcpu>
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <emulatorpin cpuset='2-3'/>
</cputune>
● Memory is strictly aligned to the NUMA node:
<numatune>
  <memory mode='strict' nodeset='0'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
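One way to check the 1:1 relationship in a generated guest definition is to read the vcpupin elements back out. The sketch below does this for the cputune fragment shown above; vcpu_pinning is a hypothetical helper, and the check is a simplification that treats any cpuset naming more than one core as not dedicated.

```python
import xml.etree.ElementTree as ET

CPUTUNE_XML = """
<cputune>
  <vcpupin vcpu='0' cpuset='2'/>
  <vcpupin vcpu='1' cpuset='3'/>
  <emulatorpin cpuset='2-3'/>
</cputune>
"""


def vcpu_pinning(cputune_xml):
    """Return {vcpu: cpuset string} from a <cputune> fragment."""
    root = ET.fromstring(cputune_xml)
    return {int(pin.get("vcpu")): pin.get("cpuset")
            for pin in root.findall("vcpupin")}


pins = vcpu_pinning(CPUTUNE_XML)
# Dedicated placement: each vCPU maps to a single pCPU and no pCPU is shared.
is_one_to_one = (all(cpuset.isdigit() for cpuset in pins.values())
                 and len(set(pins.values())) == len(pins))
print(pins, is_one_to_one)
```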

Huge Pages
● Huge pages allow the use of larger page sizes (2 MB, 1 GB), increasing CPU TLB efficiency.
  ○ Backing guest memory with huge pages allows predictable memory access, at the expense of the ability to over-commit.
  ○ Different workloads extract different performance characteristics from different page sizes - bigger is not always better!
● Administrator reserves large pages during compute node setup and creates flavors to match:
  ○ hw:mem_page_size=large|small|any|2048|1048576 (explicit sizes are in KiB)
● User requests using flavor or image properties.
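The trade-off between page sizes is easier to see with numbers. The sketch below counts how many pages are needed to back a guest at each of the sizes named above; pages_needed is a hypothetical helper, and the 2048 MB guest is an assumed flavor size chosen for illustration.

```python
def pages_needed(flavor_ram_mb, page_size_kb):
    """Number of pages of page_size_kb required to back flavor_ram_mb of RAM."""
    ram_kb = flavor_ram_mb * 1024
    if ram_kb % page_size_kb:
        raise ValueError("RAM is not a whole multiple of the page size")
    return ram_kb // page_size_kb


small_pages = pages_needed(2048, 4)        # 4 KiB pages
large_pages = pages_needed(2048, 2048)     # 2 MiB pages
huge_pages = pages_needed(2048, 1048576)   # 1 GiB pages
print(small_pages, large_pages, huge_pages)
```

Fewer, larger pages mean fewer translations for the TLB to cache, but each page must be reserved on the host in full.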

Example - Host Configuration
# grubby --update-kernel=ALL --args="hugepagesz=2M hugepages=2048"
# grub2-install /dev/sda
# shutdown -r now
# cat /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
1024
# cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages
1024

Example - Virsh Capabilities
<topology>
  <cells num='2'>
    <cell id='0'>
      <memory unit='KiB'>4193780</memory>
      <pages unit='KiB' size='4'>524157</pages>
      <pages unit='KiB' size='2048'>1024</pages>
      ...
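The pages elements report how many pages of each size a NUMA cell offers. As a sketch, they can be read into a dictionary and converted to a memory total; page_pools is a hypothetical helper, and the cell fragment is closed by hand here so that it parses, since the slide excerpt is truncated.

```python
import xml.etree.ElementTree as ET

CELL_XML = """
<cell id='0'>
  <memory unit='KiB'>4193780</memory>
  <pages unit='KiB' size='4'>524157</pages>
  <pages unit='KiB' size='2048'>1024</pages>
</cell>
"""


def page_pools(cell_xml):
    """Return {page_size_kib: page_count} for one <cell> element."""
    cell = ET.fromstring(cell_xml)
    return {int(p.get("size")): int(p.text) for p in cell.findall("pages")}


pools = page_pools(CELL_XML)
# Memory available as 2 MiB pages on this cell, in MiB.
huge_mib = pools[2048] * 2048 // 1024
print(pools, huge_mib)
```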

Example - Flavor Configuration
$ nova flavor-key m1.small.performance set hw:mem_page_size=2048
$ nova boot --flavor=m1.small.performance \
    --image=rhel-guest-image-7.1-20150224 \
    numa-lp-test

Example - Result
$ virsh dumpxml instance-00000001
...
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB' nodeset='0'/>
  </hugepages>
</memoryBacking>
...

Example - Hardware Layout w/ PCIe
[Diagram: the same two-node layout (node 0 with cores 0-3, node 1 with cores 4-7, two RAM banks each), with a PCIe block attached to each node.]

I/O-based NUMA Scheduling
● Extends PciDevice model to include NUMA node the device is associated with.
● Extends NUMATopologyFilter to make use of this information when scheduling.
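The idea of device locality can be sketched as a placement check: prefer the NUMA node the PCI device hangs off, and only accept it if enough CPUs are free there. The function pick_numa_node and its data structures are hypothetical and far simpler than the NUMATopologyFilter; the free-CPU numbers are made up for the example.

```python
def pick_numa_node(free_cpus_by_node, pci_device_node, vcpus_needed):
    """Pick a host NUMA node for a guest that needs a PCI device.

    Accepts only the node the device is attached to, and returns None when
    that node lacks enough free CPUs (a stricter policy than a real
    scheduler may use).
    """
    free = free_cpus_by_node.get(pci_device_node, [])
    if len(free) >= vcpus_needed:
        return pci_device_node
    return None


# Assumed state: node 0 has two free cores, node 1 has one.
free_cpus = {0: [2, 3], 1: [6]}
print(pick_numa_node(free_cpus, pci_device_node=0, vcpus_needed=2))
print(pick_numa_node(free_cpus, pci_device_node=1, vcpus_needed=2))
```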

Quiesce Guest Filesystem
● Libvirt 1.2.5 and later supports an fsFreeze/fsThaw API.
● Freezes/thaws guest filesystem(s) using QEMU guest agent.
● Ensures consistent snapshots.
● To enable:
  ○ hw_qemu_guest_agent image property must be set to yes.
  ○ hw_require_fsfreeze image property must be set to yes.
  ○ QEMU guest agent must be installed inside guest.
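The freeze, snapshot, thaw ordering can be sketched as a context manager that always thaws, even if the snapshot step fails. The wrapper quiesced and the FakeDomain stand-in are hypothetical; the sketch assumes only that the domain object offers fsFreeze() and fsThaw() methods, as a libvirt-python virDomain does, and it does not talk to a real hypervisor.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(domain):
    """Freeze guest filesystems for the duration of the with-block."""
    domain.fsFreeze()
    try:
        yield domain
    finally:
        # Always thaw, even if taking the snapshot raised.
        domain.fsThaw()


class FakeDomain:
    """Stand-in used so the sketch runs without a hypervisor."""

    def __init__(self):
        self.calls = []

    def fsFreeze(self):
        self.calls.append("freeze")

    def fsThaw(self):
        self.calls.append("thaw")


dom = FakeDomain()
with quiesced(dom):
    dom.calls.append("snapshot")  # the snapshot would be taken here
print(dom.calls)
```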

Hyper-V Enlightenment
● Windows guests support several additional paravirt features when running on Hyper-V (similar to virtio, kvmclock, etc. on KVM).
● Helps avoid BSOD in guests on heavily loaded hosts, enhances performance.
● QEMU/KVM is able to support several of these natively.
● Expands behavior of os_type="windows" image property.

vhost-user support
● VIF driver for new type of network interface implemented in QEMU/Libvirt.
● Intended to provide a more efficient path between a guest and userspace vswitches.

Liberty Predictions

Liberty Predictions/Speculation
● Libvirt hardware policy from libosinfo (approved)
● Post-plug VIF scripts (under review)
● Further work around SR-IOV incl.:
  ○ Interface attach/detach (under review)
  ○ Live migration when using macvtap (under review)
● Ability to select guest CPU model and/or features (under review)
● VM HA (under review)
● VirtIO network performance enhancements (under review)
● Hot resize (under review)

Thank You


Questions?

@xsgordon