better utilization of storage features from kvm guest via...

39
© Hitachi, Ltd. 2013. All rights reserved. Hitachi, Ltd. Information & Telecommunication Systems Company IT Platform Division Group Masaki Kimura Better Utilization of Storage Features from KVM Guest via virtio-scsi <[email protected]>

Upload: lylien

Post on 24-Jul-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

© Hitachi, Ltd. 2013. All rights reserved.

Hitachi, Ltd.

Information & Telecommunication Systems Company IT Platform Division Group

Masaki Kimura

Better Utilization of Storage Features from KVM Guest via virtio-scsi

<[email protected]>

© Hitachi, Ltd. 2013. All rights reserved.

1. Background (Use Cases and Requirements)

2. KVM features for guest SCSI commands

3. Current Status of these features

Better Utilization of Storage Features from KVM Guest via virtio-scsi

1

Contents

4. Summary

5. Future work

© Hitachi, Ltd. 2013. All rights reserved.

1. Background

2

Better Utilization of Storage Features from KVM Guest via virtio-scsi

3 © Hitachi, Ltd. 2013. All rights reserved.

1-1. Background

Enterprise systems expect that virtualized environment has the same level of manageability, availability, and reliability achieved in bare-metal.

For example: 1. Thin-provisioned Storage for manageability, 2. HA cluster for availability, 3. Backup server for reliability.

In bare-metal environment, some of these requirements are achieved by using storage features, such as SCSI commands.

In virtualized environment, the same use cases exists for guests. Issuing SCSI commands from guests are required.

Three use cases, thin-provisioned storage, HA cluster, and backup server will be explained in the next slides.

4 © Hitachi, Ltd. 2013. All rights reserved.

1-2. Use case #1: Thin-provisioned Storage

Server

OS

Server

Host OS

QEMU

WriteSame

reclaiming disk blocks

Bare-metal KVM Guest

WriteSame

Guest OS

Guest OS

This use case exists for both bare-metal and KVM guests. WriteSame is required to be issued to storage from guests.

- Many types of enterprise storage have thin-provision function. - For achievement of thin-provision, a disk block is allocated on access. - However, once it is allocated, it can not be reclaimed by storage automatically even when the disk block becomes unused by OS. - To reclaim unused disk block, OS needs to let storage know unused blocks by issuing WriteSame SCSI command.

5 © Hitachi, Ltd. 2013. All rights reserved.

1-3. Use Case #2: HA cluster (1/2)

LU

Server

OS

Server

OS

LU

Server

Host OS

Server

Host OS

QEMU QEMU HA Cluster

HA Cluster

Persistent Reservation Persistent

Reservation

Guest OS

Guest OS

Guest OS

Guest OS

This use case also exists for KVM guests. Persistent Reservation is required to be issued to storage from guest.

- To improve availability, HA cluster is commonly used in bare-metal.

- For HA cluster, Persistent Reservation SCSI command is generally used to guarantee an exclusive access from an active system.

Bare-metal KVM Guest

blocked blocked

Active Standby

Standby Active

6 © Hitachi, Ltd. 2013. All rights reserved.

1-4. Use Case #2: HA cluster (2/2)

- Persistent Reservation is held by so-called I_T nexus, the combination of initiator ID and target ID.

- I/Os from standby system are blocked, because the I_T nexus is different.

- Therefore, I_T nexus is required to be unique for Persistent Reservation to work properly.

LU

Server

OS

Server

OS

Bare-metal

LU

Server

Host OS

Server

Host OS

QEMU QEMU

KVM Guest

HA Cluster

HA Cluster

Persistent Reservation

Persistent Reservation

Guest OS

Guest OS

Guest OS

Guest OS

initiator_1 initiator_2

initiator_3 initiator_4

target_A target_B

Reserved (initiator_1, target_A)

Reserved (initiator_3, target_B)

initiator_* target_* : Initiator ID. : Target ID.

blocked blocked

Active Standby

Standby Active

7 © Hitachi, Ltd. 2013. All rights reserved.

1-5. Use Case #3: Backup Server

- Persistent Reservation is also used by some backup-server products to guarantee an exclusive access from a backup server on backup.

- Persistent Reservation to storage and unique I_T nexus are required by these products.

Server

OS

Bare-metal

Server

Host OS

QEMU

KVM Guest

Persistent Reservation

Persistent Reservation

Guest OS

Guest OS

Guest OS

initiator_1

initiator_4

initiator_* target_* : Initiator ID. : Target ID.

blocked blocked

Backup Server

Server

OS

LU

initiator_2

Server

OS

Business systems

LU

initiator_3

initiator_5

Backup Server

LU LU

Business systems

initiator_6

target_A target_B

8 © Hitachi, Ltd. 2013. All rights reserved.

1-6. Summary of Requirements

Requirements from the use cases: Requirement #1: SCSI commands to storage from guests.

- Thin-provisioned storage requires WriteSame to storage from guests.

- HA cluster and backup server require Persistent Reservation to storage from guests.

Requirement #2: Unique initiator ID across guests. - HA cluster and backup server require I_T nexus to

be unique.

© Hitachi, Ltd. 2013. All rights reserved.

2. KVM features for guest SCSI commands

9

Better Utilization of Storage Features from KVM Guest via virtio-scsi

10 © Hitachi, Ltd. 2013. All rights reserved.

2-1. KVM features for guest disks

This presentation focuses on following three device types and their configurations.

# Configuration

Device Type Initiator Target Backend

1

(1) virtio-blk -

File

2 Device

3 LUN

4

(2) virtio-scsi

- (a)qemu

File

5 Device

6 LUN

7 - (b)lio

block

8 pscsi

9 (c) libiscsi - iSCSI storage

10 (3)PCI device assignment

(a)Legacy - PCI device

11 (b)VFIO - PCI device

11 © Hitachi, Ltd. 2013. All rights reserved.

2-2. (1) virtio-blk (1/2)

- Para-virtualized disk. - Shown as /dev/vdX on Linux guests. - Maximum number of disks is limited by maximum number of PCI devices (32). - Improving performance with virtio-blk plane and bio-based I/O.

Guest kernel

QEMU

Host kernel

Hardware

virtio-blk

Block Layer

SCSI Layer

/dev/vdx

/dev/sdx

Block Layer

SCSI Layer

virtio-blk Device Layer

Device

Device Layer

Characteristic (1)virtio-blk

12 © Hitachi, Ltd. 2013. All rights reserved.

2-3. (1) virtio-blk (2/2)

Configurations and how SCSI commands are handled.

- SCSI command from guest reaches to storage only when attached as LUN.

Guest

QEMU

Host kernel

Hardware

virtio-blk

Block

/dev/vdx Block

SCSI

Device

Device

File Block

LUN VFS

virtio-blk /dev/vdx

virtio-blk /dev/vdx

virtio-blk

Pass-through

SCSI Command Capability

Backend KVM Command Line Libvirt XML SCSI command

Disk (file)

-device virtio-blk-pci,scsi=off <disk type=‘file’ device=‘disk’> <target dev='vda' bus='virtio'/>

Not Supported

Disk (block)

-device virtio-blk-pci,scsi=off <disk type=‘block’ device=‘disk’> <target dev='vda' bus='virtio'/>

Not Supported

LUN -device virtio-blk-pci,scsi=on <disk type=‘block’ device=‘lun’> <target dev='vda' bus='virtio'/>

Pass-through

virtio-blk virtio-blk

13 © Hitachi, Ltd. 2013. All rights reserved.

2-4. (2) virtio-scsi

virtio-scsi has three types of configurations.

(a)Qemu target (b)lio target (c)libiscsi

Guest kernel

QEMU

Host kernel

Hardware

virtio-scsi scsi_mod

Block Layer

SCSI Layer

/dev/sdx

/dev/sdx

Block Layer

SCSI Layer

virtio-scsi Device Layer

Device

Device Layer

virtio-scsi scsi_mod

/dev/sdx

virtio-scsi

virtio-scsi scsi_mod

/dev/sdx

libiscsi

tcm_vhost

/dev/sdx LIO

14 © Hitachi, Ltd. 2013. All rights reserved.

2-5. (2) virtio-scsi: (a) qemu target (1/2)

[virtio-scsi] - Para-virtualized SCSI transport. - Shown as /dev/sdX on Linux guests.

[Qemu target] - User space target

Guest kernel

QEMU

Host kernel

Hardware

Block Layer

SCSI Layer

Block Layer

SCSI Layer

Device Layer

Device

Device Layer

virtio-scsi

scsi_mod

/dev/sdx

/dev/sdx

virtio-scsi

Characteristic (a)Qemu target

15 © Hitachi, Ltd. 2013. All rights reserved.

2-6. (2) virtio-scsi (a) qemu-target (2/2)

Configurations and how SCSI commands are handled.

- SCSI command from guest reaches to storage only when attached as LUN. - Emulated results return to guest when attached as file or device.

Guest

QEMU

Host kernel

Hardware

Block

Block

SCSI

Device

Device

File Device

LUN VFS

/dev/sdx

virtio-scsi

virtio-scsi scsi_mod

/dev/sdx

virtio-scsi

/dev/sdx

virtio-scsi

virtio-scsi scsi_mod

virtio-scsi scsi_mod

Emulated result Pass-

through

SCSI Command Capability

Backend KVM Command Line Libvirt XML SCSI command

Disk (file)

-device scsi-hd <disk type=‘file’ device=‘disk’> <target dev='sda' bus='scsi'/>

Emulated

Disk (block)

-device scsi-hd <disk type=‘block’ device=‘disk’> <target dev='sda' bus='scsi'/>

Emulated

LUN -device scsi-block <disk type=‘block’ device=‘lun’> <target dev='sda' bus='scsi'/>

Pass-through

16 © Hitachi, Ltd. 2013. All rights reserved.

2-7. (2) virtio-scsi: (b) lio target

Guest kernel

QEMU

Host kernel

Hardware

Block Layer

SCSI Layer

Block Layer

SCSI Layer

Device Layer

Device

Device Layer

virtio-scsi scsi_mod

/dev/sdx

tcm_vhost

/dev/sdx

[virtio-scsi] - Para-virtualized SCSI transport. - Shown as /dev/sdX on Linux guests.

[lio target] - Kernel space target. - Using LIO (linux-iscsi.org ) as backend. - LIO supports following back-stores: - block - fileio - pscsi - ramdisk

- Not yet evaluated. (pscsi is pass-through SCSI, therefore it is expected to work well.)

Characteristic

SCSI Command Capability

LIO

(b)lio target

17 © Hitachi, Ltd. 2013. All rights reserved.

2-8. (2) virtio-scsi: (c) libiscsi

Guest kernel

QEMU

Host kernel

Hardware

Block Layer

SCSI Layer

Device Layer

Device

virtio-scsi scsi_mod

/dev/sdx

virtio-scsi libiscsi

[virtio-scsi] - Para-virtualized SCSI transport. - Shown as /dev/sdX on Linux guests.

[libiscsi] - User space iSCSI initiator. - Only support iSCSI. - QEMU directly talk to iSCSI storage, therefore host does not see guest disks.

- SCSI command from guest reaches to storage.

Characteristic

SCSI Command Capability

Directly talk to iSCSI storage.

(c)libiscsi

18 © Hitachi, Ltd. 2013. All rights reserved.

2-9. (3) PCI Device assignment

(a) Legacy

Guest kernel

QEMU

Host kernel

Hardware PCI device

SCSI LLD scsi_mod

PCI device

/dev/sdx Block Layer

SCSI Layer

pci-stub

Device Layer

Device

Device Layer

PCI device

SCSI LLD scsi_mod

PCI device

/dev/sdx

vfio

(b) VFIO

PCI Device assignment has two types:

Characteristic

SCSI Command Capability

- Assign PCI device to guests. - Host PCI device is dedicated to one guest, therefore the number of guests is limited to the number of PCI devices (or their ports.)

- SCSI command from guest reaches to storage in both legacy and VFIO configurations.

19 © Hitachi, Ltd. 2013. All rights reserved.

2-10. Summary of SCSI command capability

# Configuration Whether guest

SCSI commands reach to storage Device Type Initiator Target Backend

1

virtio-blk -

File No

2 Device No

3 LUN Yes

4

virtio-scsi

- qemu

File No

5 Device No

6 LUN Yes

7 - lio

block No

8 pscsi ???

9 libiscsi - iSCSI storage Yes

10 PCI Device assignment

Legacy - PCI device Yes

11 VFIO - PCI device Yes

In the next chapter, we will see SCSI command capabilities deeper only with configurations marked “Yes” in above table.

© Hitachi, Ltd. 2013. All rights reserved.

3. Current Status of these features

20

Better Utilization of Storage Features from KVM Guest via virtio-scsi

21 © Hitachi, Ltd. 2013. All rights reserved.

3-1. Evaluation Item and Configuration

Evaluation Items:

# Device Type Initiator Target Backend

1 virtio-blk - LUN

2 virtio-scsi

- qemu LUN

3 libiscsi - iSCSI storage

4 PCI Device assignment

Legacy - PCI device

5 VFIO - PCI device

From next slide, I will share what problems remain in which configurations.

Configurations:

# Evaluation Item Remarks

(a) Whether SCSI commands reach to storage. Requirement #1

(b) Whether unique initiator ID is assigned. Requirement #2

(c) Whether SCSI commands return proper results. -

22 © Hitachi, Ltd. 2013. All rights reserved.

[virtio-blk] [virtio-scsi]

Guest kernel

QEMU

Host kernel

Hardware

Block Layer

SCSI Layer

/dev/vdx /dev/sdx Block Layer

SCSI Layer

Device Layer

Device

Device Layer

Permission Check

Permission Check

CAP_SYS_RAWIO

read_ok command

write_ok command && has write permission

No

No

Not allowed Allowed No

Yes

- There is a permission check in host kernel, when guest SCSI commands is issued via virtio-blk or virtio-scsi with qemu target.

- Libvirt-managed KVM guests run as qemu user, who lacks CAP_SYS_RAWIO.

Some SCSI commands, such as Persistent Reserve and Write Same, are blocked by this check unless KVM is running as root user.

: Inquiry, Report LUN, etc

: Persistent Reserve, WriteSame, etc

3-2. (a) Whether SCSI commands reach to storage(1/4) Problem

virtio-blk

virtio-scsi scsi_mod

/dev/sdx /dev/sdx

virtio-blk virtio-scsi

23 © Hitachi, Ltd. 2013. All rights reserved.

However, neither of them has been merged yet.

To solve this issue, following patches have been submitted.

3-3. (a) Whether SCSI commands reach to storage(2/4)

Subject : [PATCH 00/13] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542) Date : January 24, 2013 Committer: Paolo Bonzini URL : https://lkml.org/lkml/2013/1/24/279

Subject : [PATCH v3 0/2] add per-device sysfs knob to enable unrestricted, unprivileged SG_IO Date : November 13, 2012 Committer: Paolo Bonzini URL : https://lkml.org/lkml/2012/11/13/440

Subject : [PATCH v3 part2] Add per-device sysfs knob to enable unrestricted, unprivileged SG_IO Date : May 23, 2013 Committer: Paolo Bonzini URL : https://lkml.org/lkml/2013/5/23/294

24 © Hitachi, Ltd. 2013. All rights reserved.

Concept of these patches is to introduce a flag, unpriv_sgio, to allow non-root users to issue SCSI commands. Permission Check

CAP_SYS_RAWIO || unpriv_sgio

read_ok command

write_ok command && has write permission

No

No

Not allowed Allowed No

Yes # cat /sys/block/sda/queue/unpriv_sgio 0 # echo 1 > /sys/block/sda/queue/unpriv_sgio

* Kernel side interface (Not merged yet.)

If this kernel patch is merged, KVM guest running as qemu user will be able to configure to issue any SCSI commands to storages.

(*) Unpriv_sgio flag can be set per disk.

3-4. (a) Whether SCSI commands reach to storage(3/4)

: Inquiry, Report LUN, etc

: Persistent Reserve, WriteSame, etc

25 © Hitachi, Ltd. 2013. All rights reserved.

3-5. (a) Whether SCSI commands reach to storage(4/4) Why these patches have not been merged yet? Still under discussion on how to avoid opcodes-overlap problem.

READ SUBCHANNEL

UNMAP

0x42 0x42

READ SUBCHANNEL

0x42

0x42 => UNMAP

Intended in guest.

Interpreted in host.

When issued from host When issued from guest

Share same opcode.

Different device classes.

Guest

QEMU

Host

Hardware

[Problem] : Destructive commands might pass-throughed to the host from guests. [Implementation]: Split permission check by device class (*1)

or Introduce per-device filter w/o unpriv_sgio (*2)

Permission Check

Permission Check

Permission Check

Permission Check

w/ unpriv_sgio.

(*1) https://lkml.org/lkml/2013/1/24/299 (*2) https://lkml.org/lkml/2013/5/27/230

26 © Hitachi, Ltd. 2013. All rights reserved.

Guest 2 Guest 1

3-6. (b) Whether unique initiator ID is assigned(1/3)

In such a condition, exclusive access is not guaranteed.

When both guest1 and guest2 are on the same host and use the same HBA, they share the same initiator ID. (virtio-blk or vitio-scsi w/ qemu target)

[virtio-blk] [virtio-scsi]

Guest kernel

QEMU

Host kernel

Hardware

Block Layer SCSI Layer

Block Layer SCSI Layer

virtio-blk virtio-scsi Device Layer

Device

Device Layer

Persistent Reservation

virtio-blk /dev/vda

virtio-blk /dev/vda

Guest 2 Guest 1

virtio-scsi scsi_mod

/dev/sda

virtio-scsi scsi_mod

/dev/sda Persistent Reservation

initiator_* target_* : Initiator ID. : Target ID.

target_a target_b

Problem

/dev/sdx /dev/sdx

HBA HBA initiator_1 initiator_2

27 © Hitachi, Ltd. 2013. All rights reserved.

3-7. (b) Whether unique initiator ID is assigned(2/3)

With NPIV (N-Port ID Virtualization), virtio-blk and vitio-scsi w/ qemu target can assign unique initiator ID.

Guest 2 Guest 1

[virtio-blk] [virtio-scsi]

Guest kernel

QEMU

Host kernel

Hardware

Block Layer SCSI Layer

Block Layer SCSI Layer

virtio-blk virtio-scsi Device Layer

Device

Device Layer

Persistent Reservation

virtio-blk /dev/vda

virtio-blk /dev/vda

Guest 2 Guest 1

virtio-scsi scsi_mod

/dev/sda

virtio-scsi scsi_mod

/dev/sda Persistent Reservation

initiator_* target_* : Initiator ID. : Target ID.

target_a target_b

initiator_1_1 initiator_1_2 initiator_2_1 initiator_2_2

/dev/sdy /dev/sdy

NPIV

Solution

/dev/sdx

HBA initiator_1

/dev/sdx

HBA initiator_2 blocked blocked

28 © Hitachi, Ltd. 2013. All rights reserved.

3-8. (b) Whether unique initiator ID is assigned(3/3)

With libiscsi or PCI device assignment, exclusive access is guaranteed, because initiator IDs are unique with these configurations.

FYI

: Initiator ID. : Target ID.

Guest 2 Guest 1

[PCI device assignment]

Guest kernel

QEMU

Host kernel

Hardware

Block Layer SCSI Layer

Block Layer SCSI Layer Device Layer

Device

Device Layer

Guest 2 Guest 1

initiator_* target_*

target_a target_b

virtio-scsi scsi_mod

/dev/sda

virtio-scsi libiscsi

initiator_1

virtio-scsi scsi_mod

/dev/sda

virtio-scsi libiscsi

Persistent Reservation

initiator_2

[virtio-scsi + libiscsi]

SCSI LLD scsi_mod

pci_device

/dev/sda

pci-stub/vfio

SCSI LLD scsi_mod

pci_device

/dev/sda

pci-stub/vfio

HBA initiator_4 HBA initiator_3

Persistent Reservation

blocked

blocked

29 © Hitachi, Ltd. 2013. All rights reserved.

With virtio-blk, Report LUNs returns the list of LUNs including LUNs which are not assigned to the guest.

Guest kernel

QEMU

Host kernel

Hardware

virtio-blk

virtio-scsi scsi_mod

Block Layer SCSI Layer

/dev/vdx /dev/sdx

/dev/sdb /dev/sdb

Block Layer SCSI Layer

LUN1 LUN1

virtio-blk virtio-scsi Device Layer

Device

Device Layer

LUN0 LUN2 LUN0 LUN2

/dev/sda /dev/sdc /dev/sda /dev/sdc

Report LUNs Report LUNs

Just pass-through Returns emulated results, depending on SCSI commands.

HBA HBA

3-9. (c) Whether SCSI commands return proper results

Problem

* The list of LUNs that host sees.

LUN0 LUN1 LUN2

* The list of LUNs that guest sees.

LUN1

Virtio-blk needs emulation functions to return proper results for particular SCSI commands, such as Report LUNs.

[virtio-blk] [virtio-scsi]

© Hitachi, Ltd. 2013. All rights reserved.

4. Summary

30

Better Utilization of Storage Features from KVM Guest via virtio-scsi

31 © Hitachi, Ltd. 2013. All rights reserved.

4. Summary

Enterprise system requires SCSI commands in virtualized environments. KVM has some configurations which can issue SCSI commands to

storage from guests, however each configuration has some restrictions.

# Configuration

SCSI command Unique WWN

Restriction Persistent Reservation

Write Same

Inquiry Report LUNs

1 virtio-blk - No No Yes No Yes Requires NPIV for unique WWN.

2 virtio-scsi

qemu No No Yes Yes Yes Requires NPIV for unique WWN.

3 libiscsi Yes Yes Yes Yes Yes iSCSI only.

4 PCI Device assignment

Legacy Yes Yes Yes Yes Yes Max number of guests is limited by the number of HBA ports.

5 VFIO Yes Yes Yes Yes Yes

: Already available. : Patch exists, but not merged yet. : No patch, but could be fixed.

© Hitachi, Ltd. 2013. All rights reserved.

5. Future Work

32

Better Utilization of Storage Features from KVM Guest via virtio-scsi

33 © Hitachi, Ltd. 2013. All rights reserved.

5. Future work

1. To allow qemu user to issue Persistent Reservation and WriteSame, proper permission check in host kernel is needed.

2. To make Report LUNs return proper results with virtio-blk, an emulation function for Report LUNs is needed.

3. To make SCSI command capability of virtio-scsi w/ lio target clear, evaluation is needed for virtio-scsi w/ lio target.

34 © Hitachi, Ltd. 2013. All rights reserved.

Questions?

© Hitachi, Ltd. 2013. All rights reserved.

LinuxCon North America

Better Utilization of Storage Features from KVM Guest via virtio-scsi

END

35

Hitachi Ltd.

Information & Telecommunication Systems Company IT Platform Division Group

Masaki Kimura <[email protected]>

36 © Hitachi, Ltd. 2013. All rights reserved.

Trademarks

Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.