Front. Comput. Sci., 2013, 7(1): 34–43
DOI 10.1007/s11704-012-2084-0
Design and verification of a lightweight reliable virtual machine monitor for a many-core architecture
Yuehua DAI, Yi SHI, Yong QI, Jianbao REN, Peijian WANG
School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
© Higher Education Press and Springer-Verlag Berlin Heidelberg 2012
Abstract Virtual machine monitors (VMMs) play a central
role in cloud computing. Their reliability and availability are
critical for cloud computing. Virtualization and device emulation make the VMM code base large and the interface between the OS and the VMM complex, which makes the security of the VMM very hard to verify. For example, a malicious guest OS can corrupt the whole VMM by misusing a VMM hyper-call. The complexity of the VMM
also makes it hard to formally verify the correctness of the
system’s behavior. In this paper a new VMM, operating sys-
tem virtualization (OSV), is proposed. The multiprocessor boot interface and memory configuration interface of the Linux kernel are virtualized by OSV at boot time. After booting,
only inter-processor interrupt operations are intercepted by
OSV, which makes the interface between OSV and OS sim-
ple. The interface is verified using formal model checking,
which ensures a malicious OS cannot attack OSV through
the interface. Currently, OSV is implemented based on the
AMD Opteron multi-core server architecture. Evaluation re-
sults show that Linux running on OSV has a similar perfor-
mance to native Linux. OSV has a performance improvement
of 4%–13% over Xen.
Keywords virtual machine monitor, model, operating sys-
tem, many core, formal verification
1 Introduction
Virtualization has been widely used in cloud computing. With
Received March 14, 2012; accepted June 16, 2012
E-mail: [email protected]
the help of virtual machine monitors (VMMs), cloud comput-
ing can provide dynamic on-demand services for users at an
attractive cost. In order to host multiple instances of an oper-
ating system in a resource limited server, traditional VMMs
have to virtualize the limited resources, which makes the sys-
tem much more complex [1,2]. For example, Xen has about
200 k lines of code in the hypervisor itself, and over 1 M
lines in the host OS. The I/O devices to be shared between the
virtual machines are virtualized by the VMM, which greatly
contributes to the size of the code base of the VMM. The large
code base and complex hyper-call interfaces for the VMM
make it hard to give a formal verification for the reliability of
the VMM. The best formal verification techniques available
today are only able to handle around 10 k lines of code [3].
Furthermore, the complexity of a VMM also makes it hard to ship bug-free code [4]. A malicious guest OS in the VMM can exploit these bugs to attack the VMM and gain control of the whole VMM. These potential risks make
many companies hesitant about moving to the cloud [5].
On the other hand, the hardware infrastructure for cloud
computing is very powerful. For example, servers with tens
of processor cores and hundreds of gigabytes of RAM are com-
mon today. The trend indicates that computers with hundreds
of cores will appear in the near future [6]. The computing resources in a single computer will be abundant, so virtualizing them in order to multiplex scarce hardware is no longer essential.
In this paper, we present a new VMM model, the governor. The governor removes the virtualization layer of current VMMs.
By preallocating resources to operating systems, the perfor-
mance overhead of the VMM can be reduced. We have implemented OSV, an instance of the governor model. Formal verification results show that the VMM is immune to attacks made
by malicious guest OSes through the interfaces between OSV
and the OS. In particular, we make three contributions in this paper.
First, a reliable micro-VMM model for many-core architectures is proposed. The model avoids virtualization of devices by us-
ing existing distributed protocols. This reduces the complexity of the interface between the VMM and the OS. With pre-allocated
processor cores and memory on many-core platforms, the
guest-OS can access them directly, which improves the over-
all performance of the guest-OS.
Second, we explore the implications of applying the model
to a concrete VMM implementation and present an instance
of the governor, OSV. Compared to other solutions, OSV is relatively small and portable, with only about 8 000 lines of code.
Our evaluation results show that OSV has a performance im-
provement of 4%–13% over Xen.
Finally, a formal reliability verification of OSV is given.
Due to the large code base and complex interfaces, traditional
VMMs are rarely formally verified. By using the SPIN model
checker, a formal model for OSV is constructed. The verifi-
cation results show that OSV is secure and reliable.
In Section 2 we discuss related work on the reliability of
VMMs and introduce our motivation. Our proposed gover-
nor model is discussed in Section 3. A formal verification of
OSV is given in Section 4. The performance of OSV is also
discussed in Section 4. Finally, we discuss the limitations and
future work in Section 5 and conclude in Section 6.
2 Related work
Although based on a new viewpoint, the governor model is
related to much previous work on both VMMs and operating
systems.
In order to reduce the complexity of operating systems,
micro-kernel operating systems such as exokernel [7] and Corey [6] have been proposed. These systems can be tuned for
performance and reliability. Multikernel [8] is another type
of operating system, which treats a multi-core computer as a
distributed system and uses message passing as its basic op-
eration.
As VMMs become more complex, some work has
been done on simplifying the VMM. One example is SecVi-
sor [9], a tiny VMM, which can support a single guest-OS
and is used for the protection of the operating system ker-
nel. TrustVisor [10] is designed to protect the code and data
of higher layer applications. Like SecVisor, TrustVisor can
only run a single guest-OS in a protection domain. NoHype
[11] is similar to our work in that it removes the virtualization
layer of a traditional VMM. With pre-allocated resources, it
can host many instances of an operating system. However, there is no interaction between NoHype and a running guest OS. This sacrifices the flexibility of the VMM, which is important
for cloud computing. BitVisor [12] is a lightweight VMM
which encrypts the data communication between the I/O de-
vices and the guest-OS. In order to protect private data pass-
ing between the guest-OS and VMM, a nested VMM, Cloud-
Visor [13], was proposed. CloudVisor runs at a higher privilege level than a traditional VMM such as Xen. With the help of CloudVisor, the guest-OS can keep its data private and prevent it from being leaked to Xen. NOVA [14] is a micro-kernel-like VMM.
There is an additional layer called the user level VMM in
Nova. The user level VMM is used to provide the services
needed by other guest-OSes. The interfaces between NOVA and the guest-OS are as complex as those of a traditional VMM. Despite much work on micro-kernel VMMs, the interfaces in these models have not been well verified.
There has been some work on formal verification of system
software, such as seL4 [15]. seL4 is a micro-kernel operating
system, which is well verified from design to implementa-
tion. Franklin et al. [16] built a formal model for SecVisor.
This model focuses on whether the design principles can pro-
tect the operating system kernel. However, our work focuses more on the reliability of the VMM itself.
Micro-kernels and lightweight VMMs reduce the attackable surface and shrink the trusted computing base (TCB).
But the lightweight VMMs mentioned above lose some func-
tionality of traditional VMMs. NoHype reduces the attackable surface by removing interaction between the OS and the VMM. It is based on Xen and needs new hardware architecture support. Xen also contributes to the large TCB of NoHype. In this paper, we propose the OSV VMM based on current x86 processors. OSV removes the virtualization
layer, allocates resources with non-uniform memory access
(NUMA) awareness, and has a verified interface between the guest OS and OSV. As a result, OSV has a limited attackable surface and a small TCB.
3 Design and implementation
In this section we present our VMM architecture for multi-
core machines, which we call the governor model. In a nut-
shell, we construct the VMM as a resource-guidance tool that allocates resources to the OSes, guards access to these
resources, and uses distributed protocols instead of virtual-
ization to multiplex the devices. The design of the governor
VMM model is guided by three design principles:
1) Make each OS access resources directly.
2) Affinity aware allocation of resources to OSes.
3) Utilize existing distributed protocols for resource shar-
ing between OSes.
These principles keep the VMM lightweight and achieve safety and flexibility with a small code base. Furthermore, protocols and systems developed for existing distributed systems can also be reused in the
VMM based on the governor model.
After discussing the principles in detail, we explore the im-
plications of these principles by describing the implementa-
tion of OSV, which is a new VMM based on the governor
model.
3.1 Application of design principles
3.1.1 Make each OS access resources directly
Within the governor VMM, all resources allocated to OSes
can be accessed directly without interception by the VMM.
No resource is shared between OSes running in the VMM,
except for some interrupts and memory used for OSes to
communicate with each other. Letting OSes access resources directly removes virtualization work from the VMM, which simplifies the VMM.
Virtualization used in traditional VMM isolates the OSes
from each other and shares the resources between OSes. In order to run many instances of an OS on a resource-limited machine, a traditional VMM virtualizes the resources, so
that several OSes can access the same resources at the same
time. For example, the scheduler in Xen allows different OSes
to run on the same processor core. The OSes access the re-
sources in an exclusive manner, so the VMM must isolate
them from each other. In order to protect the OSes from each
other, the VMM must virtualize an advanced programmable
interrupt controller (APIC) and machine specific registers in
a modern CPU. Therefore, the Xen VMM contains a virtual APIC.
Modern processors take into account efficient virtualization.
Extra privilege levels and additional page-table management have been added to processors. These technologies can improve performance and reduce the complexity of a VMM.
But the virtualization for APIC, devices and memory is still
needed, which complicates the VMM, and increases the code
base of the VMM.
In order to reduce the complexity of the VMM, an OS
should manage the resources by itself. This approach enables
the OS to provide more reliable resource management. The
resources allocated to an OS can only be accessed by itself.
This means that the applications running in an OS will not
be affected by applications running in another OS. Further-
more, this approach makes the VMM more reliable. A tradi-
tional VMM intercepts some operations of the OS, such as
the syscall instruction that is issued by an application for a
system call. Attackers may compromise the VMM through these
interfaces [11,17], by which they can gain control of all the
OSes running in the VMM. Running the OS directly on the
resources allocated to it reduces these attack interfaces and
makes the system more reliable.
Finally, running the OS directly on hardware in the VMM
promises good service for cloud computing [11]. Quality
of service (QoS) is very important in cloud computing. Users can subscribe to a virtual machine from a cloud computing provider. OSes running in a traditional VMM affect each other and lose performance when they request the same resources from the VMM. This has a negative impact on QoS. A VMM
based on direct resource access provides good performance
isolation for OSes and QoS.
3.1.2 Affinity aware allocation of resources to OSes
Processor cores and memory are critical to the performance
of an OS. Depending on the system topology, the communication latency between processor cores, and between cores and memory, varies. Processor cores that share a cache have the lowest communication latency. Accessing memory connected to the processor core’s
native memory controller is faster than remote access [6].
An affinity-aware governor VMM allocates resources to an OS accordingly. Processor cores allocated to a single OS should share a cache (contemporary processor cores in a die share the L3 cache). The
memory allocated to one OS should lie in the same NUMA
node. This has several important potential benefits.
Firstly, allocating processor cores and memory with loca-
tion awareness will improve the communication performance
of the OS and reduce the latency of accessing the memory. In
particular, inter-core communication can be more efficient.
This is helpful for the performance of the OS.
Secondly, OSes running on processor cores sharing the
same cache will improve the hit rate of the cache. The cache
will not be polluted by applications from other OSes. In this
way, the shared data in an OS are stored in the cache shared
by the processor cores. If one processor modifies the data, the
operation is performed in the shared cache. This avoids cache-line migrations. Frequent cache-line migrations cause cache ping-pong, which is harmful to application performance.
Thirdly, this approach makes the VMM suitable for het-
erogeneous processor architectures, where GPU or DSP cores are integrated into the processor die. Both GPU and CPU cores have their
own local or shared memory for their own use. The mem-
ory allocation policy to these cores has the most significant
impact on the performance. With an affinity aware allocation
policy, the performance will be improved.
3.1.3 Utilize existing distributed protocols for resource
sharing between OSes
Operating systems running in the VMM need to share re-
sources with each other, such as disks, memory, processors,
and so on. A traditional VMM provides a specific driver for
each device, with which an OS can share the device. A device
driver has two parts: one daemon server running in the privi-
leged OS for processing requests and driver clients in guest-
OSes for sending requests to the server, as in the Xen front-end/back-end driver model. This driver model increases
the complexity of the VMM.
Distributed protocols are common in modern OSes for
sharing resources in a distributed environment. These protocols are designed to work over a network. The governor
can use these protocols for device sharing among the OSes
by providing a virtual network interface card (NIC). In gen-
eral, an OS can access the devices exported by another OS
through such distributed protocols. The security and reliabil-
ity of the distributed system can be inherited by the VMM.
Most of the devices in the machine can be exported by
existing distributed protocols. The disks can be shared via
the network file system (NFS). The keyboard, mouse, and
displays can be shared through Virtual Network Comput-
ing (VNC). A protocol for accessing a remote system’s 3D graphics card is under development [18]. All these protocols
are common in recent OSes, such as Linux and Windows.
The performance of devices accessed via distributed protocols depends on the virtual NIC, which can achieve low latency and high bandwidth. The protocols themselves can also be optimized: in traditional distributed environments the network is unreliable, but within the governor the virtualized network is reliable, so the protocols can be simplified for better performance. By using existing proto-
cols, the governor can be small and flexible. Furthermore,
without the need for device drivers, the governor cannot be
attacked through the device drivers. This can make the gov-
ernor more reliable and secure.
3.1.4 Applying the model
Like all models, the governor, while theoretically elegant, takes an idealist position: no device should be virtualized and the
governor should never intercept the OS. This has several im-
plications for a real VMM.
As discussed previously, the OS manages processor cores using logical and physical APIC IDs. The physical ID is unique in the machine, but different OSes can use
the same logical ID for different processor cores. When one
OS sends an inter-processor interrupt (IPI) using a logical ID,
other OSes will be confused and will misbehave. However, an
idealist governor model does not intercept the OS.
In order to make the OS work correctly with logical IDs, the VMM must either intercept the OS operations that manage processor core IDs or require modifications to the OS source code.
The VMM must replace the logical ID with the physical ID
in the IPI. For each OS, the VMM maintains a list that maps each logical ID to the corresponding physical ID. The
logical ID for a CPU in an OS is not changed once the sys-
tem is initialized. Thus, there is little overhead for managing
the mapping list. On the other hand, when sending an IPI, the OS has to trap into the VMM to translate the logical ID into a physical ID, which adds some latency to the IPI.
From a research perspective, a legitimate question is to
what extent a real implementation can adhere to the model,
and the consequent effect on system performance and reli-
ability. We are implementing OSV, a substantial prototype
VMM structured according to the governor model. The goals
for OSV are:
• To demonstrate the approach for a governor model in
current x86 architecture;
• To make the VMM lightweight, more reliable, and se-
cure;
• To utilize existing software for sharing the devices with-
out new drivers.
3.2 Implementation
The OSV VMM is implemented as a multithreaded program
running on multi-core processors. The current implementa-
tion of OSV is based on AMD processors. A port to Intel
processors is in progress. OSV differs from existing systems
in that it pays more attention to resource isolation rather than
virtualizing these resources. For example, OSV does not con-
tain any structure for virtualizing the processor cores and
main memory. The code size of OSV is about 8 000 lines, small enough to audit line by line. The overall architec-
ture of OSV is shown in Fig. 1. The current implementation is
based on AMD Opteron processors with SVM [19] technol-
ogy support and can support multiple Linux operating system
instances. The components used by OSV to host multiple op-
erating systems are as follows:
• NUMA nodes Each operating system can access the
memory belonging to its NUMA node. The memory in other nodes is invisible to the OS. This is established via the E820 memory map.
• Paging For the privileged operating system, paging works the same as on bare metal. For other OSes, OSV man-
ages their memory using a nested paging table (NPT).
The NPT manages the mapping of OS physical ad-
dresses to machine physical addresses. In the AMD pro-
cessor, the NCR3 register is used to store the NPT ad-
dress and the processor computes the mapping automat-
ically as in traditional page mapping. The NPT is initial-
ized based on the physical memory allocated to the OS.
Thus, the OS does not cause any NPT page fault during
its running period. If the OS demands more memory, it
must request it from the OSV. The OSV prepares the
newly allocated pages in the NPT, and the OS accesses
these pages using mmap.
• Multi-Processor The processor cores are allocated to
an OS in a NUMA-node fashion. If an OS demands more cores than one NUMA node provides, OSV allocates all cores in that node, then extra cores from another NUMA node. The memory allocated to an OS is from
the NUMA node which has the most cores allocated to
the OS. This can reduce the remote cache access and
memory access latency.
• Interrupt All I/O interrupts are delivered to the privi-
leged OS. Other OSes access the I/O through distributed
protocols.
• Timer The external timer interrupt is dispatched by
the privileged OS through the IPI.
• Network interface card The privileged OS controls
all the network cards. A virtual network card (VNIC)
is provided to each OS. The VNIC is based on shared
memory. OSV allocates a memory region, and each
OS maps the memory region into its address space us-
ing mmap, and OSV constructs the mapping for these
pages in the corresponding NPT. When transmitting the
data, the OS can read and write the memory region di-
rectly without accessing the OSV. The OS polls its data
using a timer.
• Disks, etc. These devices are exported as services by
the privileged OS. Other OSes can access these services
through standard distributed protocols.
Fig. 1 The architecture of the OSV VMM
3.3 Booting procedure
Currently, OSV can run up to 32 Linux kernels concurrently
on a 32-core server. OSV boots Linux using the 32-bit boot protocol defined by Linux, which is based on a boot_params structure. OSV provides a boot_params structure
for each Linux instance. The boot_params structure includes
the memory information and kernel information. OSV fills in the memory information in boot_params. Linux reads
the configuration information of the virtual machine from
boot_params and then initializes its internal structures. Linux
kernels will only access memory based on the boot_params
information.
Multi-processors are supported in commodity operating
systems. The Linux kernel gets the multi-processor information from either ACPI or the Intel MultiProcessor Specification. The multi-processor boot and initialization in Linux are
based on this information. OSV provides a predefined Multi-
processor configuration table for each Linux. OSV configures
the table based on the CPU cores allocated to the Linux OS.
The unallocated CPU cores are masked in this table. So, the
OS will only initialize those CPU cores assigned to it. The kernel can then boot and run applications.
Multi-processor boot in x86 is based on IPIs. The bootstrap processor (BSP) sends an INIT IPI to each application processor (AP). When an AP receives the INIT IPI, it is reset and starts execution from a dedicated address. If OSV directly used the INIT IPI, the reset would clear all the internal state initialized by OSV, and OSV would lose control of the processor. Thus, we intercept the INIT IPI in
OSV. The INIT IPI is redirected into an exception. Once OSV catches the exception, it initializes the processor for Linux and sets the IP register to the address corresponding
to the Linux multi-processor booting code. After these oper-
ations, the AP will execute the Linux kernel.
4 Modeling and evaluation of OSV
In this section we present an overview of the model for OSV,
and detail our formal OSV model, including the state transi-
tion system and isolation properties.
4.1 Overview of the model
The memory and interrupt system are the only two interfaces
used in OSV for communication between OSes. The reliabil-
ity of these two systems is critical for the reliability of the
VMM. In this context, we first describe the relevant memory
protection data structures, the entities that use and manipulate
these data structures, and the functional relationship between
these entities. For the interrupt system, the model focuses
on the inter-processor interrupt (IPI). This is because in the
governor model, the IPI is the only interrupt source that can
be dispatched by an OS running in OSV, whereas external interrupts are all dedicated to a specific OS. The IPI protec-
tion data structures and functional relationship between these
structures are also described.
4.2 Modeling data
We use the SPIN [20] model checker for our OSV VMM.
SPIN uses PROMELA as its verification language. In order
to model the hardware platform, memory mapping, and other
data structures, a number of basic data types and macros are
defined.
The system is modeled as a record data type containing physical memory, a CPU mode corresponding to either VMM or guest mode, an ip pointer to the instruction address, and an integer specifying the guest-OS id. All capitalized words are pre-defined integer constants.
typedef sys {
    bit mode;          /* VMM_MODE or GUEST_MODE */
    unsigned ip : 6;   /* instruction address */
    byte osid;         /* guest id; 0 for the VMM */
};
sys computer_sys;
The physical memory is modeled as an array of bits, one
bit for each page. We model virtual memory at the granular-
ity of pages with the page table entry (PTE). A PTE includes
a read/write bit, execute bit, and the physical address of the
entry. The nested page tables (NPT) used by OSV to manage
the guest-OS memory are modeled as an array of PTE entries.
bit mem[MEM_INDEX];
Some macros are pre-defined to describe the system’s state, as follows:
#define r (computer_sys.mode==GUEST_MODE)
#define p (mem[VMM_CODE]==1)
#define q (mem[VMM_DATA]==1)
#define w (computer_sys.mode==VMM_MODE)
Here “r” denotes that the system is in GUEST_MODE, and “w” that it is in VMM_MODE. Access to VMM code is denoted by “p”, and access to VMM data by “q”.
4.3 Memory system
A traditional VMM provides complex interfaces to the guest
OS. A malicious guest-OS may compromise the VMM through
these interfaces, so that it can then execute malicious code
with VMM privileges. The interfaces between OSV and
guest OS consist of only four simple hyper-calls: get_did, mem_map, send_ipi, and guest_run, plus one exception handler: npt_fault. In this section we give a formal model of the hyper-calls and the exception handler, except for send_ipi, which is discussed in the next section.
In the VMM mode, the CPU should not execute any code
in the guest-OS memory region. In addition, the VMM can-
not be modified by the guest-OS. We specify this invariant
as
#define code_invt []( (!(r&&(p||q)))&& (!(w&&p)) )
There are two atomic operations in OSV VMM, vmm_run
and vmm_exit. The vmm_run is called when a transition
to VMM mode occurs. On a transition to guest mode, the
vmm_exit is called. The PROMELA code is listed as follows:
inline vmm_run()
{
computer_sys.mode = VMM_MODE;
computer_sys.ip = VMM_CODE;
computer_sys.osid = 0;
npt[VMM_CODE] = X|R; npt[VMM_DATA] = RW|NO_X;
npt[OS_CODE] = RW|NO_X; npt[OS_DATA] = RW|NO_X;
}
inline vmm_exit()
{
computer_sys.mode = GUEST_MODE;
npt[VMM_CODE] = NO_RW|NO_X; npt[VMM_DATA] = NO_RW|NO_X;
npt[OS_CODE] = R|X; npt[OS_DATA] = RW|NO_X;
}
The model begins with a call to the init function, which
makes a call to system_run. Then the model launches all
guest-OSes. In order to simplify the model, we only define
two guest-OSes.
The total number of states in the model is 25 483. The ver-
ification results show that no error states occurred, which in-
dicates that code_invt is satisfied in all situations. This means that a malicious OS cannot attack OSV through the interfaces.
4.4 Interrupt system
In OSV, the IPI is used for signals and notification of guest-
OSes. The guest-OSes can also communicate with each other
via the IPI. A misused IPI can corrupt other guest-OSes or keep them busy responding to IPIs, resulting in denial of service (DoS). We enforce that the IPI
can only be sent from and to authorized guest-OSes. In this
way, we can prevent malicious guests from attacking other
guests through IPI interfaces.
In order to model the IPI, additional data types are pre-
defined. Sending an IPI on current x86 platforms is based on the
advanced programmable interrupt controller (APIC) in pro-
cessor cores. We model the APIC as an array of registers.
The register index of 0x300 is used to send the IPI. The desti-
nation of the IPI is specified in register 0x310. The data types
are as follows:
typedef general_regs {
    unsigned reg[APIC_REG];   /* APIC registers */
    unsigned index;           /* register being written */
};
general_regs regs;
#define ltl_r ((regs.index==REG_300))
#define ltl_p des_id
#define ltl_q auth_des_id
We specify that the guest-OS may send an IPI only to the processor cores authorized by the OSV VMM. When a guest-OS writes to the 0x300 register to send the IPI, the actual destination must equal an id authorized by the OSV VMM. This can be defined as an invariant:
ltl ipi_invt { ltl_r -> <>[] (ltl_p == ltl_q) }
The total number of states in this model is 26 735, and no error state is reached. The results show that the IPI system is reliable.
4.5 Reliability discussion
A VMM may be compromised through its interfaces or through bugs in its code base. A VMM with a large code base may have many bugs and a large attackable surface. Software en-
gineers estimate that the density of bugs in production qual-
ity source code is about one to ten bugs in 1 000 lines of
code [4].
OSV removes the virtualization layer of traditional VMMs.
In OSV, device multiplexing is implemented using existing
distributed protocols. OSV avoids intercepting most privileged instructions. The interfaces between the guest OS and OSV are simple, with only four hyper-calls. All of these approaches reduce the number
of lines of code in OSV. The codebase implementing OSV is
about 8 000 lines of code, while Xen has about 200 k lines of
code. This reduces the attack surface of OSV.
4.6 Performance evaluation
We measure the overhead of virtualization through a set of
operating system benchmarks. The performance is compared
with Xen and a native Linux kernel. The experiments are per-
formed on two servers. One is a Dell T605 with two quad-core Opteron 2350 processors at 2.0 GHz, 16 GB RAM, a Broadcom NetXtreme 5722 NIC, and a 146 GB 3.5-inch 15 k RPM SAS hard drive. The other is a Sun x4600M2 with eight quad-core Opteron 8478 processors at 2.8 GHz, 256 GB RAM, and 4×256 GB HDDs arranged in RAID 0. The
Dell machine has two NUMA nodes, each node has a quad-
core processor and 8 GB RAM. The Sun x4600M2 server
has eight NUMA nodes, each node has a quad-core proces-
sor and 32 GB RAM. Linux kernel 2.6.31 is employed, compiled for the x86_64 architecture. NFS version 4.1 is used.
The Opteron processors in the machine support SVM and
NPT which are essential for OSV. In order to measure perfor-
mance we use lmbench [21]. Linux running on OSV and Xen is allocated a NUMA node, including processor cores
and memory. For both servers, each NUMA node has four
processor cores.
Figure 2 shows the local communication latency. The
tests evaluate the latency of two processes exchanging data
through three different methods: shared memory, pipe, and
AF_UNIX sockets. Xen HVM-based guest OSes achieve similar performance to OSV kernels and perform better than native Linux and Xen PVM guests. This is mainly caused by the
NUMA architecture of multi-core server: time cost for local
access is smaller than for remote access. The resources used
by the OSV kernel and Xen HVM are bound to a NUMA
node so that CPU cores only access local memory. Thus, the performance of OSV and Xen HVM guests is better than Linux. Xen HVM has a similar latency to Linux and OSV. This is because these operations in Xen HVM are not
intercepted by Xen. The latency of Xen PVM guests is about
three to seven times that of OSV and Linux. This is because
the Xen PVM is limited by the intervention of Xen when ac-
cessing system resources.
Fig. 2 Local communication latencies
System call latency is critical to an application's performance. This benchmark tests some common system call latencies: the null system call, I/O system calls, file open/close operations, process fork, and exec. The results are listed in Fig. 3. Raw Linux has the lowest system call latency, while the OSV kernel and the HVM guest on Xen produce similar results. The privileged domain of the OSV kernel, called domain 0, has similar performance to raw Linux. The normal domain of the OSV kernel, called domain 1, has very high latency in the open/close test. This is mainly caused by access to NFS: when opening and closing a file, domain 1 needs two extra network connections to finish the job, which introduces high latency. PVM guests on Xen have high latency in all tests, caused by the intervention of Xen when a PVM guest performs privileged operations.
Fig. 3 Processor and system call latency
Figure 4 shows SPEC_int benchmark results. The
SPEC_int evaluates the overall performance of a system.
The perlbench test measures the execution speed of the Perl scripting language. The bzip2 and h264ref tests evaluate the compression and encoding speed of the system. The gcc and xalancbmk tests measure the speed of code generation and XML processing. These five tests are bound to system resources. Mcf, gobmk, hmmer, sjeng, and astar are computing intensive; they are used in artificial intelligence and path searching. Omnetpp is a discrete event simulation test: it models a large Ethernet campus network and is computing intensive. The results are normalized into speedup ratios and show the relative performance increase or decrease of Xen/OSV against native Linux. The grey bar represents Linux on Xen, and the black bar Linux on OSV. For most of the benchmarks, OSV and Xen exhibit some performance overhead. For the perlbench, sjeng, and gobmk benchmarks, Xen and OSV show some performance improvement. There are two reasons: A) these benchmarks make few system calls; B) both Xen and OSV are confined to a single NUMA node, whereas native Linux makes some accesses across the NUMA nodes' memory, and the latency of a cross-node access is higher than that of a local access [6].
Fig. 4 The SPEC_int speedup compared to native Linux
For benchmarks with frequent system calls and memory operations, such as gcc, omnetpp, and libquantum, the performance of OSV is much better than that of Xen. The virtualization of system calls in Xen induces significant latencies; for example, the syscall instruction is intercepted by Xen, which contributes to the system call latency.
5 Limitations and future work
The VMM we propose in this paper is mainly focused on
reducing the attackable surface and trust computing base.
OSV removes the virtualization layer of traditional VMMs.
Compared with Xen and VMware, OSV is not capable of
server consolidation. On a single server, OSV runs multiple OSs concurrently with static resource allocation, and the runtime interactions between the guest OS and OSV are removed.
In this way, OSV achieves a similar performance to native
Linux. Thus, OSV is suitable for real-time workloads that
have fixed resource demands. Dynamic resource reallocation
and scheduling are not supported in OSV. With the help of Xen, OSV could provide the OS with more virtual cores and memory, as well as live migration and dynamic scheduling. To make OSV more useful in cloud computing, we are porting Xen to run on top of OSV using nested virtualization technology.
The security problems of VMMs are mainly caused by the complex interfaces between the OS and the VMM [11,22–24]. In this paper, we
have verified the interfaces of OSV. This ensures that the OS
cannot attack OSV through these interfaces.
The reliability of OSV itself is important, and a system fully verified from design through implementation can be kept bug-free [16]. The internal states of the VMM are not yet verified. As future work, we will provide a full formal verification of OSV to keep the system bug-free and reliable. Furthermore, we will add support for recovery from corruption of the VMM. This will make the system more robust.
Currently OSV is based on NFS, and NFS on OSV runs over TCP/IP on the VNIC; the NFS latency in OSV is thus caused by the VNIC. An inter-OS communication socket based on OSV has been implemented [25]. As future work, we will implement NFS on top of this socket to improve its performance.
6 Conclusion
The reliability of VMM is a key factor for cloud computing.
In this paper we propose a reliable and tiny VMM model, the governor. OSV, an implementation of the governor model, is presented; it has about 8 000 lines of code and a reduced attackable surface. The formal verification of OSV shows that its communication interface is well defined and safe, which keeps the system reliable.
The evaluation results indicate that OSV has a comparable
performance to native Linux. OSV has a performance im-
provement of 4%–13% compared to Xen.
Acknowledgements This work was supported in part by the National Natural Science Foundation of China (Grant Nos. 60933003 and 61272460).
References
1. Barham P, Dragovic B, Fraser K, Hand S, Harris T, Ho A, Neugebauer
R, Pratt I, Warfield A. Xen and the art of virtualization. In: Proceedings
of the 19th ACM Symposium on Operating Systems Principles. 2003,
164–177
2. Understanding Memory Resource Management in VMware ESX
Server. VMWare white paper. www.vmware.com/files/pdf/perf-
vsphere-memory_management.pdf
3. Klein G, Elphinstone K, Heiser G, Andronick J, Cock D, Derrin P,
Elkaduwe D, Engelhardt K, Kolanski R, Norrish M, Sewell T, Tuch H,
Winwood S. seL4: formal verification of an OS kernel. In: Proceedings
of the ACM SIGOPS 22nd Symposium on Operating Systems Princi-
ples. 2009, 207–220
4. Holzmann G J. The logic of bugs. In: Proceedings of Foundations of
Software Engineering. 2002
5. Gens F. IT cloud services user survey, part.2: top benefits & challenges.
http://blogs.idc.com/ie/?p=210
6. Boyd-Wickizer S, Chen H, Chen R, Mao Y, Kaashoek F, Morris R,
Pesterev A, Stein L, Wu M, Dai Y. Corey: an operating system for
many cores. In: Proceedings of the 8th USENIX Conference on Oper-
ating Systems Design and Implementation. 2008, 43–57
7. Engler D, Kaashoek M. Exokernel: an operating system architecture
for application-level resource management. ACM SIGOPS Operating
Systems Review, 1995, 29(5): 251–266
8. Baumann A, Barham P, Dagand P, Harris T, Isaacs R, Peter S, Roscoe
T, Schupbach A, Singhania A. The multikernel: a new OS architecture
for scalable multicore systems. In: Proceedings of the ACM SIGOPS
22nd Symposium on Operating Systems Principles. 2009, 29– 44
9. Seshadri A, Luk M, Qu N, Perrig A. SecVisor: a tiny hypervisor to pro-
vide lifetime kernel code integrity for commodity OSes. ACM SIGOPS
Operating Systems Review, 2007, 41(6): 335–350
10. McCune J M, Li Y, Qu N, Zhou Z, Datta A, Gligor V, Perrig A. TrustVi-
sor: efficient TCB reduction and attestation. IEEE Symposium on Se-
curity and Privacy. 2010, 143–158
11. Keller E, Szefer J, Rexford J, Lee R B. NoHype: virtualized cloud
infrastructure without the virtualization. ACM SIGARCH Computer
Architecture News, 2010, 38(3): 350–361
12. Shinagawa T, Eiraku H, Tanimoto K, Omote K, Hasegawa S, Horie T,
Hirano M, Kourai K, Oyama Y, Kawai E. BitVisor: a thin hypervisor for enforcing I/O device security. In: Proceedings of the 2009 ACM
SIGPLAN/SIGOPS International Conference on Virtual Execution En-
vironments. 2009, 121–130
13. Zhang F, Chen J, Chen H, Zang B. CloudVisor: retrofitting protection
of virtual machines in multi-tenant cloud with nested virtualization. In:
Proceedings of the 23rd ACM Symposium on Operating Systems Prin-
ciples. 2011, 203–216
14. Steinberg U, Kauer B. NOVA: a microhypervisor-based secure virtual-
ization architecture. In: Proceedings of the 5th European Conference
on Computer Systems. 2010, 209–222
15. Klein G, Elphinstone K, Heiser G, Andronick J, Cock D, Derrin P,
Elkaduwe D, Engelhardt K, Kolanski R, Norrish M. seL4: formal ver-
ification of an OS kernel. In: Proceedings of the ACM SIGOPS 22nd
Symposium on Operating Systems Principles. 2009, 207–220
16. Franklin J, Seshadri A, Qu N, Chaki S, Datta A. Attacking, repairing,
and verifying SecVisor: a retrospective on the security of a hypervisor.
Technical Report CMU-CyLab-08-008. 2008
17. Wang Z, Jiang X. Hypersafe: a lightweight approach to provide life-
time hypervisor control-flow integrity. IEEE Symposium on Security
and Privacy (SP). 2010, 380–395
18. Ravi V, Becchi M, Agrawal G, Chakradhar S. Supporting GPU shar-
ing in cloud environments with a transparent runtime consolidation
framework. In: Proceedings of the International Symposium on High-
Performance Parallel and Distributed Computing. 2011
19. AMD. AMD64 Architecture Programmer's Manual Volume 2: System Programming. 2007
20. Holzmann G J. The model checker SPIN. IEEE Transactions on Soft-
ware Engineering, 1997, 23(5): 279–295
21. McVoy L, Staelin C. Lmbench: portable tools for performance analy-
sis. In: Proceedings of the 1996 Annual Conference on USENIX An-
nual Technical Conference. 1996, 23
22. Kortchinsky K. Hacking 3D (and breaking out of VMWare). In: Pro-
ceedings of Black Hat conference. 2009
23. Wojtczuk R, Rutkowska J. Xen Owning trilogy. In: Proceedings of
Black Hat conference. 2008
24. Secunia. Xen multiple vulnerability report. http://secunia.com/
advisories/44502/
25. Ren J, Qi Y, Dai Y, Xuan Y. Inter-domain communication mechanism
design and implementation for high performance. In: Proceedings of
the 4th International Symposium on Parallel Architectures, Algorithms
and Programming (PAAP). 2011, 272–276
Yuehua Dai received his BS in com-
puter software and theory from Xi’an
Jiaotong University in 2004. He is cur-
rently a PhD candidate in computer
science at Xi’an Jiaotong University.
His research interests include operating
systems, VMM, cloud computing and
system security.
Yi Shi received her PhD in computer
software and theory from Xi’an Jiao-
tong University in 2008. She is a lec-
turer in the School of Electronic and In-
formation Engineering, Xi’an Jiaotong
University. Her research interests in-
clude operating systems, network secu-
rity, cloud computing, and VMM.
Yong Qi received his PhD in computer
software and theory from Xi’an Jiao-
tong University in 2001. He is cur-
rently a professor in the School of
Electronic and Information Engineer-
ing, Xi’an Jiaotong University and the
director of the Institute of Computer
Software and Theory. His research in-
terests include operating systems, distributed systems, pervasive
computing, software aging and VMM. He has published more than
80 papers in international conferences and journals, including ACM
SenSys, IEEE PerCom, ICNP, ICDCS, ICPP, IEEE TMC, and IEEE
TPDS.
Jianbao Ren received his BS in com-
puter software and theory from Xi’an
Jiaotong University in 2009. He is cur-
rently a PhD candidate in computer
science at Xi’an Jiaotong University.
His research interests include operating
systems, VMM, cloud computing, and
system security.
Peijian Wang received his BS in computer software and theory from
Xi’an Jiaotong University in 2004. He is currently a PhD candi-
date in computer science at the same university. His research inter-
ests include power management, cloud computing, and Internet data
center.