Hardware virtualization technology and its security
Dr. Qingni Shen
Peking University
Intel UPO Supported
Virtual Machine Monitors (VMMs)
VMM is a software layer
Allows many virtual machines to share the hardware
Allows unmodified software to run directly
...
[Figure: Virtual Machine Monitor (VMM): virtual machines VM0 ... VMn, each running a guest OS (Guest OS0 ... Guest OSn) with its applications (App0 ... Appn), on top of the VMM and the platform hardware (processor/chipset, memory, I/O devices)]
Purpose of Virtualization
Workload Isolation · Workload Consolidation · Workload Migration · Workload Embedding
[Figure: diagrams of workload isolation, consolidation, migration and embedding, showing applications and operating systems running on a VMM across one or more hardware platforms]
Virtualization has powerful capabilities
Virtualization Usage Models
Client:
Legacy software support
Test
The active partition
Manageability
…
Server:
Server consolidation
Failure recovery architecture
Highly elastic data center
Manageability
…
[Figure: each usage model builds on the isolation, consolidation, migration and embedding capabilities]
What is Intel VT technology? Formerly known by the codenames Vanderpool & Silvervale
VT is a collection of hardware enhancements
VT is designed to simplify virtualization software
VT brings new value and opens up various opportunities
VT-x and VT-i are the first VT products, implemented in Intel processors and chipsets:
VT-x: CPU virtualization enhancements for IA-32
VT-i: CPU virtualization enhancements for IPF (Itanium)
Main components of Intel VT
Intel VT, designed by Intel Corporation, is a hardware-assisted virtualization solution. It includes:
VT-x/VT-i for the CPU
VT-d for the chipset
VT-c for the network
Core functions of VT-x/VT-i
Intel VT FlexPriority (flexible priority)
Intel VT FlexMigration (flexible migration)
Extended Page Tables (EPT)
Intel VT FlexPriority
While the processor executes a task, it receives requests, or interrupts, from other devices or applications that need attention. To minimize the impact on performance, a special register in the processor's local APIC, the Task Priority Register (TPR), tracks the priority of the current task, so that only interrupts with a higher priority than the currently running task are serviced promptly. Intel FlexPriority creates a virtual copy of the TPR that the guest OS can read and, in some cases, modify without any VMM intervention. This significantly improves performance for 32-bit OSes that access the TPR frequently (for instance, application performance on Windows 2000 Server improved by 35%).
Intel VT FlexMigration
An important advantage of virtualization is that running applications can be migrated between physical machines without downtime. Intel VT FlexMigration aims to enable seamless migration between current and future Intel processor-based servers, even when the newer systems include enhanced instruction sets. With this technology, management software can present a consistent instruction set on all servers in the migration pool, so that workloads migrate seamlessly. The result is a more flexible, unified server resource pool that runs seamlessly across hardware generations.
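The "consistent instruction set" idea amounts to exposing only the features that every host in the migration pool supports. The sketch below computes that baseline as a set intersection; the host names and feature flags are hypothetical, and how a VMM actually hides the extra features from the guest (for example through CPUID virtualization) is not modeled.

```python
from functools import reduce


def migration_baseline(host_features):
    """Features supported by every host in the pool (set intersection)."""
    return reduce(lambda a, b: a & b, host_features.values())


def can_migrate(guest_visible, target_host):
    """A VM can move to any host that supports every feature it was shown."""
    return guest_visible <= target_host


# Hypothetical migration pool spanning two processor generations.
pool = {
    "old-server": {"sse2", "sse3", "ssse3"},
    "new-server": {"sse2", "sse3", "ssse3", "sse4_1", "sse4_2"},
}
baseline = migration_baseline(pool)
```

A VM started with only `baseline` visible can move freely in both directions; a VM that was shown the newer server's full feature set could not safely move to the older one.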
Challenges of VMM development
[Figure: the VM Monitor on platform hardware, hosting VM0 ... VM1, each with a guest OS (Guest OS0, Guest OS1) and its applications]
The OS and applications should not know that they are sharing CPU resources with others
The VMM should be able to protect itself from threats posed by guest software
The VMM should keep the software stacks in different VMs isolated from each other
The VMM should provide a virtual hardware platform interface to guest software
CPU virtualization on the current IA architecture requires complex software design.
[Figure: the VM Monitor on platform hardware hosting Guest OS0 ... Guest OS1 and their applications; the VMM runs in ring 0 to handle faults raised while the guest OSes run]
Software solution: guest deprivileging
Virtualization holes of the IA architecture: • Ring aliasing • Non-trapping instructions • Excessive faulting • Interrupt virtualization • Context switching of CPU state • Address-space compression
Complex software techniques: • Source code modification • Binary code modification
Sensitive instructions misbehave when the guest OS runs outside ring 0
The VMM must gain control before guest software executes privileged instructions
VT closes the virtualization holes and removes the need for complex software techniques
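POPF is a common illustration of a non-trapping sensitive instruction. The sketch below models only the EFLAGS.IF bit under protected-mode rules: when the deprivileged guest kernel's CPL is above IOPL, an attempt to change IF is silently ignored instead of faulting, so a VMM relying on trap-and-emulate never learns that the guest tried to disable interrupts. Other EFLAGS bits and virtual-8086 mode are deliberately left out, and the function name is illustrative.

```python
IF_FLAG = 1 << 9  # EFLAGS.IF, the interrupt-enable flag


def popf(cpl: int, iopl: int, eflags: int, popped: int) -> int:
    """Return EFLAGS after POPF, modeling only the IF bit.

    In protected mode, IF is updated only when CPL <= IOPL. Otherwise the
    change is dropped silently and no exception is raised.
    """
    if cpl <= iopl:
        return (eflags & ~IF_FLAG) | (popped & IF_FLAG)
    return eflags  # no fault, so the VMM is never notified


# A guest kernel deprivileged to ring 1 (IOPL 0) tries to clear IF.
after = popf(cpl=1, iopl=0, eflags=IF_FLAG, popped=0)
```

Here `after` still has IF set: the guest believes interrupts are off while they remain on, which is exactly the kind of hole VT-x removes by letting the guest kernel run at ring 0 in non-root mode.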
Intel® Virtualization Technology
Guest software runs in a new mode with reduced privilege:
• Applications still run in ring 3 • The guest OS runs in ring 0 of the new, deprivileged mode • The VMM runs in a new mode with full privileges
[Figure: the VM Monitor runs on the platform hardware in the new privileged mode; VM0 ... VM1, with Guest OS0 ... Guest OS1 and their applications, run in the deprivileged mode]
An overview of VT-x
Operation modes
Guest OS/VMM transitions
Virtual-machine control structure (VMCS)
Principle of VM exits
Benefits
Operation modes
VMX root mode:
Has all privileges; used to run the VMM
VMX non-root mode:
Has a subset of privileges; used to run guest software
Does not rely on ring levels to deprivilege guest software, which eliminates ring aliasing and ring compression
VMX operation modes
Root operation mode
The VMM runs in root operation mode
Non-root operation mode
Guest software runs in non-root operation mode
VM Entry and VM Exit
VM entry: from the VMM into the guest
Loads guest state from the VMCS and enters non-root mode
The VMLAUNCH instruction performs the initial entry; VMRESUME re-enters the virtual machine
[Figure: the VM Monitor on the physical host hardware; VM entry passes control to VM0 ... VM1 (Guest OS0, Guest OS1 and their applications), and VM exit returns it to the VMM]
VM exit
➤ From the guest into the VMM ➤ Enters VMX root mode ➤ Saves guest state into the VMCS ➤ Loads VMM state from the VMCS
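The entry/exit sequence can be sketched as a small state machine. This is a toy Python model, not Intel's implementation: the class names are hypothetical, register state is a plain dictionary, and VMfail is reduced to a `False` return value. It does follow two documented rules: VMLAUNCH requires a VMCS whose launch state is "clear" (as set by VMCLEAR), and VMRESUME requires one that has already been launched.

```python
class Vmcs:
    """Toy VMCS: guest-state area, host-state area and launch state."""

    def __init__(self, guest_state, host_state):
        self.launch_state = "clear"  # state after VMCLEAR
        self.guest_state = dict(guest_state)
        self.host_state = dict(host_state)
        self.exit_reason = None      # stands in for the exit-information fields


class LogicalCpu:
    def __init__(self):
        self.mode = "root"   # assume VMXON has already been executed
        self.current = None  # current VMCS, selected with VMPTRLD
        self.regs = {}

    def vmclear(self, vmcs):
        vmcs.launch_state = "clear"
        if self.current is vmcs:
            self.current = None  # clearing the current VMCS invalidates the pointer

    def vmptrld(self, vmcs):
        self.current = vmcs

    def _vm_entry(self, required_state):
        """Return False (VMfail) if the entry is not allowed."""
        vmcs = self.current
        if self.mode != "root" or vmcs is None or vmcs.launch_state != required_state:
            return False
        self.regs = dict(vmcs.guest_state)  # load guest state from the VMCS
        vmcs.launch_state = "launched"
        self.mode = "non-root"
        return True

    def vmlaunch(self):
        return self._vm_entry("clear")

    def vmresume(self):
        return self._vm_entry("launched")

    def vm_exit(self, reason):
        vmcs = self.current
        vmcs.guest_state = dict(self.regs)  # save guest state into the VMCS
        vmcs.exit_reason = reason
        self.regs = dict(vmcs.host_state)   # load VMM (host) state
        self.mode = "root"
```

A typical round trip is VMPTRLD, VMLAUNCH, a VM exit (for example on CPUID), and then VMRESUME for every later entry; a second VMLAUNCH on the same VMCS fails until VMCLEAR resets its launch state.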
VT-x Operation
[Figure: the VMM runs in VMX root operation (rings 0 and 3); VM 1, VM 2, ..., VM n each run in VMX non-root operation with their own ring 0 and ring 3. VMLAUNCH enters a VM, and each VM has its own VMCS (VMCS1, VMCS2, ..., VMCSn).]
Virtual Machine Control Structure (VMCS)
The VMCS is a control structure stored in memory
Only one VMCS is current on a logical processor at any time
VMCS contents:
VM-execution, VM-exit and VM-entry controls
Guest-state and host-state areas
VM-exit information fields
The in-memory layout of the VMCS is not architecturally defined, so it may differ between processor implementations
VMPTRLD: loads the pointer to the current VMCS
VMREAD/VMWRITE: new instructions for accessing VMCS fields
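Because the in-memory layout is implementation-specific, software names VMCS fields by an encoding passed to VMREAD/VMWRITE. The sketch below decodes the bit fields of such an encoding as described in the Intel SDM (access type in bit 0, index in bits 9:1, type in bits 11:10, width in bits 14:13); the function name is illustrative. For example, 0x681E is the guest RIP field, 0x6C16 the host RIP field and 0x4402 the exit-reason field.

```python
WIDTHS = {0: "16-bit", 1: "64-bit", 2: "32-bit", 3: "natural-width"}
TYPES = {0: "control", 1: "VM-exit information", 2: "guest state", 3: "host state"}


def decode_vmcs_field(encoding: int) -> dict:
    """Split a VMCS field encoding into its architectural components."""
    return {
        "access": "high" if encoding & 1 else "full",  # bit 0
        "index": (encoding >> 1) & 0x1FF,              # bits 9:1
        "type": TYPES[(encoding >> 10) & 0x3],         # bits 11:10
        "width": WIDTHS[(encoding >> 13) & 0x3],       # bits 14:13
    }


guest_rip = decode_vmcs_field(0x681E)
```

The type bits line up with the VMCS areas listed above: guest-state and host-state fields, control fields, and the read-only VM-exit information fields.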
Virtual machine control structure (VMCS)
Intel defines the VMCS for VMX operation. The structure can only be manipulated with VMCLEAR, VMPTRLD, VMREAD and VMWRITE.
a) Guest-state area: processor state loaded on VM entry (root mode to non-root mode) and saved on VM exit;
b) Host-state area: processor state loaded on VM exit (non-root mode to root mode);
c) VM-execution control fields: control processor behavior in non-root operation, including which events force an exit from non-root operation to root operation;
d) VM-exit control fields: control the behavior of VM exits from non-root operation;
e) VM-entry control fields: control the behavior of VM entries into non-root operation;
f) VM-exit information fields: record the reason for a VM exit from non-root operation to root operation.
Reasons for VM exits
Paging-state exits, so the VMM can handle page-table operations
CR3 accesses, the INVLPG instruction (TLB invalidation)
Page faults
CR0/CR4 accesses
Processor state that needs virtualization
CPUID, RDMSR, WRMSR, RDPMC, RDTSC, MOV DRx
Exceptions and I/O accesses
32-entry exception bitmap, I/O-port access bitmaps
Control of asynchronous events
When the guest has interrupts blocked, the VMM must handle pending interrupts
Detecting guest state to facilitate VM scheduling
HLT, MWAIT, PAUSE
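Two of the mechanisms above, the exception bitmap and the I/O bitmaps, can be sketched as follows. The model assumes the "use I/O bitmaps" control is enabled and follows the rules described in the Intel SDM: bitmap A covers ports 0x0000-0x7FFF and bitmap B covers 0x8000-0xFFFF, a multi-byte access exits if any port it touches has its bit set or if it wraps past 0xFFFF, and page faults additionally consult the page-fault error-code mask/match fields. Function and parameter names are hypothetical.

```python
PF_VECTOR = 14  # page-fault exception vector


def exception_exits(bitmap, vector, pfec=0, pfec_mask=0, pfec_match=0):
    """Does an exception with this vector cause a VM exit?"""
    bit_set = bool((bitmap >> vector) & 1)
    if vector != PF_VECTOR:
        return bit_set
    # Page faults: bit 14 is interpreted together with the PFEC mask/match.
    if (pfec & pfec_mask) == pfec_match:
        return bit_set
    return not bit_set


def io_exits(bitmap_a, bitmap_b, port, size):
    """Does an I/O access of `size` bytes starting at `port` cause a VM exit?"""
    for p in range(port, port + size):
        if p > 0xFFFF:
            return True  # the access wraps around the 16-bit port space
        bitmap = bitmap_a if p < 0x8000 else bitmap_b
        offset = p & 0x7FFF
        if (bitmap[offset // 8] >> (offset % 8)) & 1:
            return True
    return False


# Each bitmap is 4 KiB: one bit per port. Trap guest accesses to port 0x60.
bitmap_a, bitmap_b = bytearray(4096), bytearray(4096)
bitmap_a[0x60 // 8] |= 1 << (0x60 % 8)
```

Leaving most bits clear lets the guest touch harmless ports directly, while the VMM still intercepts the few ports it emulates.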
Benefits: VT helps improve VMMs
VT reduces dependence on modifying the guest OS
No need for binary patching or translation
Supports legacy systems
VT improves robustness
No need for complex software techniques
Simpler VMM
Smaller Trusted Computing Base (TCB)
VT improves performance
Fewer transitions between the VM and the VMM
Device Virtualization (VT-d)
For servers, I/O is an important component. Faster CPUs only speed up data processing if data reaches the CPU smoothly. As a result, I/O capability, whether for storage, networking, graphics cards, memory and so on, is a critical part of enterprise-level architecture.
Without VT-d technology, the VMM must be directly involved in every I/O interaction, which not only slows down data transfer but also increases the processor's workload because of frequent VMM activity. VT-d gives guest OSes a mechanism for direct access to real hardware, which greatly reduces the server processor's workload.
Current approaches to I/O virtualization
Device emulation: The VMM emulates an I/O device for the guest by fully simulating the device's functionality, so that the guest can use the corresponding real drivers. This approach provides perfect compatibility (whether or not the device physically exists), but emulation noticeably hurts performance.
Additional software interface: This approach resembles device emulation, but the VMM provides the VM with a set of direct device interfaces to improve virtualization efficiency. It is somewhat like DirectX on Windows: it performs better than device emulation but reduces compatibility.
Design of VT-d
The key to I/O virtualization is solving the problems of DMA and interrupt requests (IRQs).
Intel VT-d is hardware-assisted virtualization built into the North Bridge. The DMA virtualization hardware and IRQ virtualization hardware in the North Bridge greatly enhance the reliability, flexibility and performance of I/O.
Traditional IOMMUs (I/O memory management units) distinguish devices by memory address range. This is easy to implement but makes DMA isolation difficult. VT-d therefore redesigns the IOMMU architecture to support multiple DMA protection domains, which ultimately achieves DMA virtualization. This is also called DMA remapping.
I/O devices generate many interrupt requests, so I/O virtualization must separate these requests correctly and route them to the right virtual machines. Traditional devices raise interrupts in two ways: through the I/O interrupt controller (I/O APIC), or through MSI (message signaled interrupts), which are sent directly as DMA write requests. Because the interrupt target is embedded in the address and data of the DMA request, this architecture lets a device reach any interrupt destination and provides no interrupt isolation.
VT-d's interrupt-remapping architecture solves this problem by redefining the MSI format. The new MSI is still a DMA write request, but instead of embedding the target, it carries a message ID. By maintaining a table structure, the hardware can identify the VM domain that each message ID belongs to. The interrupt-remapping architecture implemented by VT-d supports all I/O sources, including IOAPICs, and all interrupt types, such as standard MSI and extended MSI-X.
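The message ID corresponds to the interrupt handle of VT-d's remappable MSI format. The sketch below builds and decodes such an address (handle bits 14:0 in address bits 19:5, the format bit in bit 4, SHV in bit 3, handle bit 15 in bit 2) and looks the handle up in a hypothetical remapping table keyed by index. Real interrupt-remapping table entries carry many more fields; only the source-ID check, destination and vector are kept here, and compatibility-format interrupts are simply blocked in this sketch, although real hardware can be configured to pass them through.

```python
MSI_ADDRESS_BASE = 0xFEE00000


def remappable_msi_address(handle, shv=False):
    """Build a remappable-format MSI address for a 16-bit handle."""
    if not 0 <= handle <= 0xFFFF:
        raise ValueError("the handle is 16 bits")
    addr = MSI_ADDRESS_BASE
    addr |= (handle & 0x7FFF) << 5     # handle[14:0] in address bits 19:5
    addr |= 1 << 4                     # interrupt format: 1 = remappable
    addr |= int(shv) << 3              # SHV: subhandle valid
    addr |= ((handle >> 15) & 1) << 2  # handle[15] in address bit 2
    return addr


def irte_index(address, data):
    """Recover the remapping-table index from an MSI address/data pair."""
    handle = ((address >> 5) & 0x7FFF) | (((address >> 2) & 1) << 15)
    if (address >> 3) & 1:             # SHV set: add the subhandle in data[15:0]
        handle += data & 0xFFFF
    return handle


def remap_interrupt(table, source_id, address, data):
    """Return (destination APIC ID, vector), or None if the MSI is blocked."""
    if not (address >> 4) & 1:
        return None  # compatibility format: blocked in this sketch
    entry = table.get(irte_index(address, data))
    if entry is None or entry["source_id"] != source_id:
        return None  # unknown index, or the request came from another device
    return entry["dest"], entry["vector"]


# Hypothetical table: index 5 belongs to the device with requester ID 0x0300.
table = {5: {"source_id": 0x0300, "dest": 1, "vector": 0x41}}
```

Because the device only names an index and the hardware checks the requester ID, a device assigned to one VM cannot choose an arbitrary destination or vector.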
DMA Remapping
DMA remapping provides hardware isolation of device accesses to memory. Each device is assigned to a specific domain through its own set of I/O page tables. When a device attempts to access system memory, the DMA-remapping hardware intercepts the access, decides whether to allow it, and determines the actual physical address at the same time. Frequently used I/O page-table entries are cached. DMA remapping can be configured independently for each device.
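A minimal model of DMA remapping, assuming a single-level page table per domain (real VT-d walks multi-level tables reached through root and context tables): the requester ID selects the device's domain, the domain's table translates the I/O page and checks read/write permission, recent translations are cached, and blocked requests are logged instead of reaching memory. All names are illustrative.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def requester_id(bus, dev, fn):
    """PCI requester ID (bus/device/function) that identifies a device."""
    return (bus << 8) | (dev << 3) | fn


class DmaRemapper:
    def __init__(self):
        self.context = {}      # requester ID -> domain ID
        self.page_tables = {}  # domain ID -> {I/O page: (host page, perms)}
        self.iotlb = {}        # cached translations: (domain, I/O page) -> entry
        self.fault_log = []    # blocked requests, similar to fault records

    def assign(self, rid, domain):
        self.context[rid] = domain

    def map_page(self, domain, io_page, host_page, perms):
        self.page_tables.setdefault(domain, {})[io_page] = (host_page, perms)
        self.iotlb.pop((domain, io_page), None)  # drop any stale cached entry

    def translate(self, rid, io_addr, write):
        """Return the host physical address, or None if the DMA is blocked."""
        domain = self.context.get(rid)
        if domain is None:
            self.fault_log.append((rid, io_addr, "device not assigned"))
            return None
        key = (domain, io_addr >> PAGE_SHIFT)
        entry = self.iotlb.get(key)
        if entry is None:
            entry = self.page_tables.get(domain, {}).get(key[1])
            if entry is None:
                self.fault_log.append((rid, io_addr, "not mapped"))
                return None
            self.iotlb[key] = entry  # cache the frequently used translation
        host_page, perms = entry
        if ("w" if write else "r") not in perms:
            self.fault_log.append((rid, io_addr, "permission denied"))
            return None
        return (host_page << PAGE_SHIFT) | (io_addr & PAGE_MASK)
```

Two devices can use the same I/O address and still land in different host pages, which is the per-domain isolation described above.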
Interrupt Remapping
Interrupt remapping provides the
functions of remapping and routing the
interrupt requests from I/O devices.
New design of IOMMU
The IOMMU manages device access to system memory. It sits between the peripheral devices and the host, translates the addresses in device requests into system memory addresses, and checks the appropriate permissions for each access.
With an IOMMU, each device can be assigned to a protection domain, which defines the I/O page translations used by every device in the domain and specifies the read/write permissions of every I/O page. For virtualization, the VMM can assign all devices of a specific guest OS to the same protection domain, creating a set of address translations and access restrictions for the devices used by that guest OS.
Two kinds of new device virtualization based on VT-d
Direct assignment of I/O devices: A physical I/O device is assigned directly to a VM. In this model, drivers inside the VM communicate directly with the hardware device, with little or no involvement from the VMM. For system robustness, hardware virtualization support is needed to isolate and protect hardware resources so that only the specified VM can use them. The hardware must also be able to provide multiple I/O container partitions for multiple VMs simultaneously. This model almost completely eliminates the need to run drivers in the VMM. The CPU, although not an I/O device in the usual sense, is allocated to VMs in this way, while CPU resources remain under VMM management.
Shared I/O devices: This model extends the I/O assignment model. It requires devices that support multiple function interfaces, each of which can be assigned to a VM independently. This model can provide very high virtualization performance.
Network Virtualization (VT-c)
Intel VT-c further optimizes networking for virtualization. In essence, this combination of technologies works like a post office: it sorts all received letters, packages and envelopes and delivers them to their respective destinations. By implementing these functions in dedicated network controller chips, Intel VT-c significantly speeds up delivery and reduces the workload of the VMM and the server processor. VT-c includes:
Virtual Machine Device Queue (VMDq)
Virtual Machine Direct Connection (VMDc)
VMDq
In a traditional server virtualization environment, the VMM must sort every individual data packet and deliver it to its assigned VM, which consumes many processor cycles. With VMDq, this sorting is performed by dedicated hardware in the Intel server network card, and the VMM is only responsible for delivering pre-sorted groups of packets to the appropriate guest OS. This reduces I/O latency and frees more processor cycles for business applications. Intel VT-c can more than double I/O throughput, so virtualized applications can reach the throughput level of the host. Each server can consolidate more applications while encountering fewer I/O bottlenecks.
Network virtualization model
Currently, all VM software with networking capabilities has a built-in virtual switch, and most of these also provide routing functions on top of it. Their aim is to connect multiple virtual machines into one or more networks, just as a physical switch or router would.
Structure of VMDq
VMDq technology provides a classification/sorting engine that operates at layer 2 of the ISO OSI 7-layer model and implements part of a switch's functions. To offer suitable performance, it needs multiple buffer queues, so network cards that support VMDq also support the receive-side scaling (RSS) extension.
On a network card that supports VMDq, a layer 2 classification/sorting device implemented in hardware sends packets to a specified VM queue (called a pool) based on the MAC address or VLAN. The VMM software that performs the virtual switch task then only needs to do a simple final data copy, which greatly improves the efficiency of the virtual network.
A network card that supports VMDq queues usually supports RSS queues as well. For example, the Intel 82576EB network card supports 8 VM queues and 16 RSS queues. These are essentially 16 send/receive queue pairs, which means every VM can be assigned two pairs.
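The layer 2 sorting step can be sketched as follows, using the 8 pools and 16 queue pairs of the 82576EB example, so two queue pairs per pool. Packets are matched on destination MAC and VLAN, unmatched traffic goes to a default pool, and a hash picks one of the pool's queue pairs. CRC32 is only a stand-in for the Toeplitz hash that RSS hardware actually uses, and the class and method names are hypothetical.

```python
import zlib


class VmdqClassifier:
    """Hypothetical layer 2 sorter: (MAC, VLAN) -> pool -> queue pair."""

    def __init__(self, num_pools=8, queue_pairs=16, default_pool=0):
        self.pairs_per_pool = queue_pairs // num_pools  # 16 // 8 = 2
        self.default_pool = default_pool
        self.filters = {}  # (destination MAC, VLAN ID) -> pool

    def add_filter(self, mac, vlan, pool):
        self.filters[(mac.lower(), vlan)] = pool

    def classify(self, dst_mac, vlan, flow_key):
        """Return (pool, queue pair index) for a received packet."""
        pool = self.filters.get((dst_mac.lower(), vlan), self.default_pool)
        # Stand-in for RSS: spread flows over this pool's queue pairs.
        spread = zlib.crc32(flow_key.encode()) % self.pairs_per_pool
        return pool, pool * self.pairs_per_pool + spread


nic = VmdqClassifier()
nic.add_filter("52:54:00:AA:00:03", 100, 3)  # hypothetical MAC of the VM in pool 3
```

After classification the VMM only has to copy each pool's packets to the matching VM, which is the "simple final data copy" mentioned above.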
Diagram of VMDq Acceleration Structure
[Figure: VMDq acceleration structure] Hardware takes over part of the software switching (routing) work.
Virtual Machine Direct Connection (VMDc)
With the help of the PCI-SIG single root I/O virtualization (SR-IOV) standard, VM direct connection (VMDc) lets VMs access network I/O hardware directly, which significantly improves performance. As mentioned before, Intel VT-d supports a direct communication channel between a guest OS and an I/O port. SR-IOV extends this by supporting multiple communication channels for each I/O port. For example, a single Intel 10 Gigabit server network card can give each of 10 guest OSes a protected, private 1 Gb/s link. These links bypass the VMM's virtual switch, which further improves I/O performance and reduces the workload of server processors.
Security Analysis of VT-d
Hardware virtualization addresses many security problems of virtualized systems and provides better isolation of system hardware resources.
However, the hardware is complicated, so some security problems remain to be solved. Meanwhile, some attackers have already discovered loopholes in hardware virtualization.
Attack Scenario
Assume a virtualized system that builds a driver domain with the help of Intel VT-d technology. Driver domains are similar to traditional VMs, but they are given privileged access to selected devices, such as a network card or disk controller.
An attacker can attempt to gain complete control of the whole system through such a driver domain. In this attack scenario, we assume the attacker has already gained full control of a driver domain.
MSI (Message Signaled Interrupts)
MSI format (from the Intel developer manual): [Figure: MSI address and data register layout]
All three attacks described later use I/O devices to generate MSIs in order to carry out the attack.
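Since the attacks rely on a device writing crafted MSIs, it helps to see how the compatibility-format fields are laid out. This decoder follows the Intel SDM description of the MSI address (destination ID in bits 19:12, redirection hint in bit 3, destination mode in bit 2) and data (vector in bits 7:0, delivery mode in bits 10:8, trigger mode in bit 15); the function name is illustrative. Delivery modes 011 and 110 are reserved in this format, while in the local APIC's interrupt command register the value 110 means Start-Up.

```python
MSI_DELIVERY_MODES = {
    0b000: "Fixed",
    0b001: "Lowest Priority",
    0b010: "SMI",
    0b100: "NMI",
    0b101: "INIT",
    0b111: "ExtINT",
}  # 0b011 and 0b110 are reserved in the MSI data format


def decode_msi(address, data):
    """Decode a compatibility-format MSI address/data pair."""
    if (address >> 20) != 0xFEE:
        raise ValueError("not an MSI address")
    mode = (data >> 8) & 0x7
    return {
        "dest_id": (address >> 12) & 0xFF,
        "redirection_hint": (address >> 3) & 1,
        "dest_mode": "logical" if (address >> 2) & 1 else "physical",
        "vector": data & 0xFF,
        "delivery_mode": MSI_DELIVERY_MODES.get(mode, "Reserved"),
        "trigger": "level" if (data >> 15) & 1 else "edge",
    }


# A device write that targets APIC ID 1 with vector 0x11 (#AC).
msi = decode_msi(0xFEE01000, 0x0011)
```

A single 32-bit DMA write to 0xFEE01000 with data 0x0011 is therefore enough to make CPU 1 run its vector 0x11 handler, which is the building block of the #AC attack below.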
1) Threat based on SIPI construction
The SIPI (Start-up Inter-Processor Interrupt) is a key function of any multiprocessor (or multi-core) system based on Intel processors. At startup, the BIOS uses SIPIs to initialize all processors and distribute tasks to them.
When the system starts, only one processor, called the bootstrap processor (BSP), is active. Its job is to initialize the other processors so that they work properly.
A SIPI tells the target processor to start executing special boot code at address 0xVV000, where VV is the vector carried by the SIPI. For a SIPI to take effect, the target CPU must first receive an INIT interrupt, which resets the CPU into the wait-for-SIPI state. Under normal circumstances, the BSP sends SIPIs to all the other processors.
The only mechanism for sending a SIPI is through the local advanced programmable interrupt controller (local APIC).
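The address arithmetic and the INIT-then-SIPI ordering described above can be modeled as below. The vector VV is used as the real-mode segment VV00h with offset 0, which gives the physical start address 0xVV000. The class and method names are hypothetical, and a real application processor has more states than the two modeled here.

```python
def sipi_start_address(vector):
    """Physical address 0xVV000 where an AP begins executing after a SIPI."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("the SIPI vector is 8 bits")
    return vector << 12  # real-mode CS = VV00h, IP = 0


class ApplicationProcessor:
    def __init__(self):
        self.state = "running"
        self.start_address = None

    def receive_init(self):
        self.state = "wait-for-SIPI"  # INIT resets the core

    def receive_sipi(self, vector):
        """Return True if the SIPI took effect."""
        if self.state != "wait-for-SIPI":
            return False  # a SIPI without a preceding INIT is ignored
        self.start_address = sipi_start_address(vector)
        self.state = "running"
        return True
```

Whoever can deliver INIT followed by a SIPI therefore decides where a processor restarts, which is why the attack targets this sequence.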
3) #AC-based injection attack
#AC can be used to confuse the stack layout expected by the exception handler.
The #AC exception is the only exception that meets both of the following requirements:
Its vector is greater than 15, so it can be delivered by an MSI;
The processor pushes an error code when raising it, so its handler expects an error code on the stack.
Normal stack layout of an #AC exception, from low to high addresses (the error code is stored at the top of the stack):
ErrorCode
RIP
CS
RFLAGS
RSP
SS
If some device sends an MSI with vector 0x11 (#AC), the #AC handler is triggered on the target CPU. Because the handler expects an error code at the top of the stack, it misinterprets the other values on the stack: CS is read as RIP, RFLAGS is treated as CS, and so on. When the handler finishes, it executes the IRET instruction to pop the saved register values and jump back to CS:RIP, which means the handler actually returns to RFLAGS:CS.
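The misinterpretation can be reproduced with a list-based model of the stack (lowest address first). It is a simplification: it ignores the 16-byte stack alignment the CPU applies in 64-bit mode and assumes the handler simply discards one slot before IRET. The function name and the register values are made up for illustration.

```python
SLOTS = ["RIP", "CS", "RFLAGS", "RSP", "SS"]


def iret_targets(stack, handler_pops_error_code=True):
    """Values IRET loads, given the stack contents at handler entry.

    stack: saved values, from the lowest address upward.
    """
    start = 1 if handler_pops_error_code else 0
    return dict(zip(SLOTS, stack[start:start + len(SLOTS)]))


rip, cs, rflags, rsp, ss = 0xFFFFFFFF81000000, 0x10, 0x246, 0xFFFFC90000001000, 0x18
genuine_ac = [0, rip, cs, rflags, rsp, ss]            # real #AC: error code pushed
injected_msi = [rip, cs, rflags, rsp, ss, 0xDEAD]     # MSI vector 0x11: no error code
```

For `genuine_ac` IRET restores the interrupted context, while for `injected_msi` every slot shifts by one: the saved CS becomes the new RIP and the saved RFLAGS becomes the new CS, matching the RFLAGS:CS return described above.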