linux storage and virtualization · storage protocol drivers qemu allows other images to reside...

46
Linux Storage and Virtualization Christoph Hellwig

Upload: others

Post on 26-Jul-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Linux Storage and Virtualization

Christoph Hellwig

Page 2: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Storage in Virtual Machines – Why?

A disk is an integral part of a normal computer Most operating systems work best with local

disks Boot from NFS / iSCSI still has problems

Simple integrated local storage is: Easier to setup than network storage More flexible to manage

For the highend use PCI passthrough or Fibre channel NPIV instead

Page 3: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

A view 10.000 feet

The host system or virtual machine monitor (VMMVMM): exports virtual disks to the guest The guest uses them like real disks

The virtual disks are backed by real devices.. Whole disks / partitions / logical volumes

.. or files Either raw files on a filesystem or image

formats

Page 4: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

A virtual storage stack

We have two full storage stacks in the host and in the guest Potentially also two

filesystem Potentially also a image

format (aka mini filesystem)

Guest storage driver

Storage hw emulation

Image format

Host volume manager

Host filesystem

Guest filesystem

Host storage driver

Page 5: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Requirements (high level)

The traditional storage requirements apply: Data integrityData integrity – data should actually be on disk

when the user / application require it Space efficiencySpace efficiency – we want to store the user /

application data as efficient as possible PerformancePerformance – do all of the above as fast as

possible Additionally there is a strong focus on:

ManageabilityManageability – we potentially have a lots of hosts to deal with

Page 6: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Requirements – guest

None - Guests should work out of the box Migrating old operating system images to virtual

machines is a typical use case Any guest changes should be purely

optimizations for: Storage efficiency or Performance

Page 7: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Requirements – host

The host is where all the intelligence sits Ensures data integrity

Aka: the data really is on disk when the guest thinks so

Optimizantions of storage space usage

Page 8: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

A practical implementation: QEMU/KVM

KVM is the major virtualization solution for Linux Included in the mainline kernel, with lots of

development from RedHat, Novell, Intel, IBM and various individual contributors

Page 9: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

What is QEMU and what is KVM?

QEMUQEMU primarily is a CPU emulator Grew a device model to emulate a whole

computer Actually not just one but a whole lot of them

KVMKVM is a kernel module to use expose hardware virtualization capabilities e.g. Intel VT-x or AMD SVM KVM uses QEMU for device emulation

As far as storage is concerned they're the same

Page 10: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

QEMU Storage stack overview

Host KernelHost Kernel

ATAATA SCSISCSI VirtioVirtio

VMDK / etc..VMDK / etc..RawRaw

Raw Posix / fileRaw Posix / file

LinuxLinux GuestGuest

OthersOthersGuestsGuests

WindowsWindows GuestGuest

QEMUQEMU

TransportsTransports

Image FormatsImage Formats

Backend /Backend /ProtocolProtocol

Posix file

Qcow2Qcow2

Page 11: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Storage transports

QEMU provides a simple Intel ATAATA controller emulation by default Works with about every operating systems

because it is so common Alternatively QEMU can emulate a Symbios SCSISCSI

controller

Page 12: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Paravirtualization

ParavirtualizationParavirtualization means providing interfaces more optimal than real hardware AdvantageAdvantage: should be faster than full

virtualization DisadvantageDisadvantage: requires special drivers for each : requires special drivers for each

guestguest

Page 13: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Paravirtualized storage transport

QEMU provides paravirtualized devices using the virtio framework

Virtio-blk provides a simple block driver ontop of Virtio-blk provides a simple block driver ontop of virtiovirtio Just simple read/write requestsJust simple read/write requests And SCSI requests through ioctlsAnd SCSI requests through ioctls ......

Page 14: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Life of an I/O request

Guest KernelGuest Kernel

Host KernelHost Kernel

QEMUQEMU

MMIO/PIOMMIO/PIO CompletionCompletionInterruptInterrupt

InterruptInterruptInjectionInjection

TrapTrap

DMADMA

Page 15: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Posix file storage backend

The primary storage backend Almost all I/O eventually ends up thereAlmost all I/O eventually ends up there

Simply backs disk images using a regular file or device file

Page 16: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Posix storage backend - AIO

The qemu main loop is effectively singe threaded: Time spent there blocks execution of the guest I/O needs to be offloaded as fast as possible

I/O backends need to implement asynchronous semantics

Page 17: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Posix storage backend - AIO

AIO support in hosts is severly lacking Use a thread pool to hand off I/O by default

Alternatively support for native Linux AIO: Only works for uncached access (O_DIRECT) Still can be synchronous for many use cases

Page 18: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Posix storage backend – vectored I/O

Typical I/O requests from guests are split into non-contingous parts scatter/gather lists

In the optimal case a whole SG list is sent to the host kernel in one request

preadv/pwritev system calls

Page 19: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Posix storage backend - AIO

AIO support in hosts is severly lacking Use a thread pool to hand off I/O by default

Alternatively support for native Linux AIO: Only works for uncached access (O_DIRECT) Still can be synchronous for many use cases

Page 20: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Posix storage backend – I/O restrictions

O_DIRECT requires strict alignment and specific I/O sizes Try to align memory allocations inside Qemu The Posix backend needs to perform

read/modify/write cycles in the worst case The alignment and size restrictions vary

There is no proper way to query the kernel for the restrictions

In Linux they generally depend on the sector size

Page 21: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Spot the error..

Page 22: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Data integrity in QEMU / caching modes

cache=none uses O_DIRECT I/O that bypasses the

filesystem cache on the host cache=writethrough

uses O_SYNC I/O that is guaranteed to be commited to disk on return to userspace

cache=writeback uses normal buffered I/O that is written back

later by the operating system

Page 23: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Data integrity - cache=writethrough

This mode is the safest as far as qemu is This mode is the safest as far as qemu is concernedconcerned There are no additional volatile write caches in There are no additional volatile write caches in

the hostthe host The downside is that it's rather slowThe downside is that it's rather slow

Page 24: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Data integrity - cache=writeback

When the guest writes data we simply put it in When the guest writes data we simply put it in the filesystem cachethe filesystem cache No guarantee that it actually goes to diskNo guarantee that it actually goes to disk Which is actually very similar to how modern Which is actually very similar to how modern

disks workdisks work

Page 25: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Data integrity - cache=writeback

The guest needs to issue a cache flush command The guest needs to issue a cache flush command to make sure data goes to diskto make sure data goes to disk Similar to real modern disks with writeback Similar to real modern disks with writeback

cachescaches Modern operating systems can deal with thisModern operating systems can deal with this

And the host needs to actually implement the And the host needs to actually implement the cache flush command and advertise it:cache flush command and advertise it: The QEMU SCSI emulation has always done thisThe QEMU SCSI emulation has always done this IDE and virtio only started this very recentlyIDE and virtio only started this very recently

Page 26: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Data integrity - cache=none

Direct transfer to disk should imply it's safeDirect transfer to disk should imply it's safe Except that it is not:Except that it is not:

Does not guarantee disk caches are flushedDoes not guarantee disk caches are flushed Does not give any guarantees about metadataDoes not give any guarantees about metadata

Thus also needs an explicit cache flushThus also needs an explicit cache flush

Page 27: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Performance – large sequential I/O

sequential read 8GBsequential read 8GB sequential write 8GBsequential write 8GB0 MB/s0 MB/s

20 MB/s20 MB/s

40 MB/s40 MB/s

60 MB/s60 MB/s

80 MB/s80 MB/s

100 MB/s100 MB/s

120 MB/s120 MB/s

140 MB/s140 MB/s

NativeNativeQEMU pthreadsQEMU pthreadsQEMU AIOQEMU AIO

Page 28: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Performance – 256 kilobyte random I/O

random read 256KBrandom read 256KB random write 256KBrandom write 256KB0 MB/s0 MB/s

20 MB/s20 MB/s

40 MB/s40 MB/s

60 MB/s60 MB/s

80 MB/s80 MB/s

100 MB/s100 MB/s

120 MB/s120 MB/s

140 MB/s140 MB/s

160 MB/s160 MB/s

180 MB/s180 MB/s

NativeNativeQEMU pthreadsQEMU pthreadsQEMU AIOQEMU AIO

Page 29: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Performance – 16 kilobyte random I/O

random read 16KBrandom read 16KB random write 16KBrandom write 16KB0 MB/s0 MB/s

10 MB/s10 MB/s

20 MB/s20 MB/s

30 MB/s30 MB/s

40 MB/s40 MB/s

50 MB/s50 MB/s

60 MB/s60 MB/s

70 MB/s70 MB/s

80 MB/s80 MB/s

NativeNativeQEMU pthreadsQEMU pthreadsQEMU AIOQEMU AIO

Page 30: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Disk image formats

Users want volume-manager like features in image files Copy-on write snapshots Encryption Compression

Also VM snapshots need to store additional metadata

Page 31: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Disk Image formats - Qcow2

QcowQcow was the initial QEMU image format to provide copy on write snapshots

In Qemu 0.8.3 Qcow2Qcow2 was added to add additional features and now is the standard image format for QEMU

Provides cluster based copy on write snaphots Supports encryption and compression Allows to store additional metadata for VM

snaphots

Page 32: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Disk image format data integrity issues

A disk image is a minimal filesystem

Metadata for block allocation tables Reference counts for snapshots

So the same integrity issues apply:

A guest cache flush needs to guarantee all metadata updates are on disk

Multiple metadata updates need to be ordered Multiple metadata updates should be transactional

or a image check is required on an unclean shutdown

Page 33: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Qcow2 data integrity issues

Until recently Qcow2 did not care about metadata integrity.

Recently Qcow2 was changed to write metadata synchronously.

Performance for some workloads decreased to a large extent.

Work is under way to implement metadata integrity more efficiently.

Page 34: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Qcow2 performance – streaming write

Qemu 0.12.4Qemu 0.12.4 Qemu 0.12.5Qemu 0.12.50 seconds0 seconds

50 seconds50 seconds

100 seconds100 seconds

150 seconds150 seconds

200 seconds200 seconds

250 seconds250 seconds

cache=writebackcache=writebackcache=writethroughcache=writethroughcache=nonecache=none

Numbers from Alexander Loob <[email protected]>1GB streaming write using dd bs=1024k count=1024

Page 35: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Qed image format

New simplified image format proposed by IBM in September 2010:

No support for internal snapshots, encyption, compression.

Requires sparse file support. Initial implementation supports very efficient

metadata operations

But requires a image check on unclean shutdown.

Page 36: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Qcow2 vs Qed performance

sequential write, 1 threadsequential write, 1 thread sequential write, 16 threadssequential write, 16 threads random write, 1 threadrandom write, 1 thread random write, 16 threadsrandom write, 16 threads0 MB/s0 MB/s

20 MB/s20 MB/s

40 MB/s40 MB/s

60 MB/s60 MB/s

80 MB/s80 MB/s

100 MB/s100 MB/s

120 MB/s120 MB/s

140 MB/s140 MB/s

160 MB/s160 MB/s

QedQedQcow2Qcow2

Numbers from Khoa Huynh <[email protected]>FFSB on ext4 in the guest

Page 37: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Storage protocol drivers

Qemu allows other images to reside outside the Qemu allows other images to reside outside the local filesystem.local filesystem.

ProtocolProtocol drivers implement the storage access: drivers implement the storage access: The The curlcurl backend allows using VM images from backend allows using VM images from

the internet over http and ftp connectionsthe internet over http and ftp connections. The nbdnbd backend allows direct access to nbd

servers. The sheepdogsheepdog backs images by a distributed

storage protocol.

Page 38: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Thin provisioning

Simple example: a sparse image fileSimple example: a sparse image file Initially does not have blocks allocated to itInitially does not have blocks allocated to it Block get allocate on the first writeBlock get allocate on the first write

To make it fully useful also needs to support To make it fully useful also needs to support reclaiming space after deletionsreclaiming space after deletions

An important topic both for high-end storage An important topic both for high-end storage arrays and virtualizationarrays and virtualization

Page 39: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Thin provisioning - standards

The T10 SBC standard for SCSI disks / storage The T10 SBC standard for SCSI disks / storage arrays contains TP support in it's newest revisionsarrays contains TP support in it's newest revisions The UNMAP and WRITE SAME commands allow The UNMAP and WRITE SAME commands allow

telling the storage device to free datatelling the storage device to free data Perfect use case for qemu to know that the Perfect use case for qemu to know that the

guest has freed the storageguest has freed the storage The ATA spec has a similar TRIM command for The ATA spec has a similar TRIM command for

Solid State Drives (SSDs)Solid State Drives (SSDs)

Page 40: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Thin provisioning – implementation

The guest needs extensive enablement for thin The guest needs extensive enablement for thin provisioning:provisioning: Support in the device drivers to actually send Support in the device drivers to actually send

the commandsthe commands Code in the filesystem to track deleted spaceCode in the filesystem to track deleted space Guest enablement is shared with support for Guest enablement is shared with support for

thinkly provisionen RAID arrays and SSDsthinkly provisionen RAID arrays and SSDs

Page 41: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Thin provisioning - implementation

Decoding the Decoding the WRITE SAMEWRITE SAME / / UNMAPUNMAP / / TRIMTRIM commands is easycommands is easy

Actually freeing space is harder:Actually freeing space is harder: The standard Posix APIs don't allow punching The standard Posix APIs don't allow punching

holes into filesholes into files Need filesystem specific extensions for that Need filesystem specific extensions for that

(e.g. in (e.g. in XFSXFS))

Page 42: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Thin provisioning - implementation

Fine grained allocation and freeing of blocks is Fine grained allocation and freeing of blocks is problematic:problematic: Causes fragmentation of the backing fileCauses fragmentation of the backing file Allocation overhead can be highAllocation overhead can be high

Need good thresholds for freeing blocksNeed good thresholds for freeing blocks Similar problem faced by storage arraysSimilar problem faced by storage arrays SBC allows to communicate these thresholdsSBC allows to communicate these thresholds

Block allocation also needs the same thresholdsBlock allocation also needs the same thresholds

Page 43: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Storage efficiency

A big virtualization specific problem is to avoid A big virtualization specific problem is to avoid duplicate storage of data data:duplicate storage of data data: Often many similar guests run on the same Often many similar guests run on the same

hosthost Two approaches:Two approaches:

Image clones – start with a common image and Image clones – start with a common image and track changes with a copy on write schemetrack changes with a copy on write scheme

Data deduplication – find duplicate blocks and Data deduplication – find duplicate blocks and merge them after the factmerge them after the fact

Page 44: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Backing images

QEMU allows for copy on write images in the QEMU allows for copy on write images in the QCOW2 format.QCOW2 format. Simple to set up and useSimple to set up and use

LVM supports copy on write volumes (snapshots)LVM supports copy on write volumes (snapshots) Requires the usage of full block devices,Requires the usage of full block devices,

Some modern filesystems (Some modern filesystems (btrfsbtrfs, , ocfs2ocfs2) allow ) allow file level snapshotsfile level snapshots

None of the above two integrated with QEMU yetNone of the above two integrated with QEMU yet

Page 45: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Data deduplication

All the above options have one disadvantage:All the above options have one disadvantage: The data sharing needs to be planned aheadThe data sharing needs to be planned ahead

The term data deduplication is used for the The term data deduplication is used for the process of finding these duplicates later and process of finding these duplicates later and merging themmerging them A relatively expensive and slow process A relatively expensive and slow process

without additional metadatawithout additional metadata Hot topic in the storage industryHot topic in the storage industry Not yet implemented in QEMU or lower layersNot yet implemented in QEMU or lower layers

Page 46: Linux Storage and Virtualization · Storage protocol drivers Qemu allows other images to reside outside the local filesystem. Protocol drivers implement the storage access: The curl

Questions?

Thanks for your attention! Feel free to contact me at: [email protected]