Virtual machine monitor-based lightweight intrusion detection
Fatemeh Azmandian, Northeastern University
Micha Moffie, Northeastern University
Malak Alshawabkeh, Northeastern University
Javed Aslam, Northeastern University, jaa@ccs.neu.edu
David Kaeli, Northeastern University, kaeli@ece.neu.edu
ABSTRACT
As virtualization technology gains in popularity, so do attempts to compromise the security and integrity of virtualized computing resources. Anti-virus software and firewall programs are typically deployed in the guest virtual machine to detect malicious software. These security measures are effective in detecting known malware, but do little to protect against new variants of intrusions. Intrusion detection systems (IDSs) can be used to detect malicious behavior. Most intrusion detection systems for virtual execution environments track behavior at the application or operating system level, using virtualization as a means to isolate themselves from a compromised virtual machine.
In this paper, we present a novel approach to intrusion detection of virtual server environments which utilizes only information available from the perspective of the virtual machine monitor (VMM). Such an IDS can harness the ability of the VMM to isolate and manage several virtual machines (VMs), making it possible to provide monitoring of intrusions at a common level across VMs. It also offers unique advantages over recent advances in intrusion detection for virtual machine environments. By working purely at the VMM-level, the IDS does not depend on structures or abstractions visible to the OS (e.g., file systems), which are susceptible to attacks and can be modified by malware to contain corrupted information (e.g., the Windows registry). In addition, being situated within the VMM provides ease of deployment as the IDS is not tied to a specific OS and can be deployed transparently below different operating systems.
Due to the semantic gap between the information available to the VMM and the actual application behavior, we employ the power of data mining techniques to extract useful nuggets of knowledge from the raw, low-level architectural data. We show in this paper that by working entirely at the VMM-level, we are able to capture enough information to characterize normal executions and identify the presence of abnormal malicious behavior. Our experiments on over 300 real-world malware and exploits illustrate that there is sufficient information embedded within the VMM-level data to allow accurate detection of malicious attacks, with an acceptable false alarm rate.
Categories and Subject Descriptors
D.4.6 [Security and Protection]

General Terms
Virtualization, Security, Data Mining, Intrusion Detection

Keywords
Virtual Machine, Virtual Machine Monitor, Intrusion Detection System, Data Mining
1. INTRODUCTION
Virtual execution environments provide many advantages over traditional computing environments, such as server consolidation, increased reliability and availability, and enhanced security through isolation of virtual machines (VMs). Anti-virus programs and firewalls can guard a system against known exploits, but these mechanisms provide little protection against new classes of attacks and insider threats. Virtualization can provide us the ability to isolate and inspect VM-based execution. Virtual machines themselves are not completely immune to viruses and malicious attacks. To protect the guest OS running inside a virtual machine and guard against the existence of malicious software, or malware, there needs to be an intrusion detection system (IDS) in place.
Traditionally, an IDS can be categorized as one of two types: a host-based intrusion detection system (HIDS) or a network-based intrusion detection system (NIDS). An HIDS resides on the system that is being monitored and thus has the advantage of a rich view of the internal workings of the system. The disadvantage of this approach is that malware can determine the existence of the HIDS and subsequently compromise it or attempt to evade detection. An NIDS, on the other hand, performs intrusion detection from outside the target system, using information from the network flow. This makes it more resistant to attacks and evasion, but at the cost of poor visibility of the system.
In a virtualized execution environment, the virtual machine monitor (VMM) is a software layer that allows the multiplexing of the underlying physical machine between different virtual machines, each running its own operating system. In this paper we propose a VMM-based IDS, a variant of host-based intrusion detection systems wherein the IDS resides on the physical host machine, yet remains outside of the virtual machine being monitored. As such, a VMM IDS is able to enjoy the advantages offered by both HIDSs and NIDSs: a rich view of the target system (the VM) combined with a greater resistance to attacks and evasion by the malware. The latter is one of the benefits of isolation provided by the VMM.
The VMM IDS only uses information available at the VMM-level to detect intrusions. There exists a large semantic gap between this low-level architectural data and the actual program behavior. Consequently, we utilize sophisticated data mining algorithms to extract meaningful and useful information to distinguish normal (non-malicious) from abnormal (malicious) behavior.
There are two main approaches to intrusion detection: misuse detection and anomaly detection. In misuse detection, the behavior of the system is compared to patterns of known malicious behavior, or attack signatures. A weakness of this approach is its inability to detect new and previously unseen attacks, known as zero-day attacks. In anomaly detection, a profile of normal behavior is built and any deviation from this normal profile is flagged as a potential attack. While anomaly detection has the ability to detect zero-day attacks, it is also prone to false alarms, i.e., previously unseen normal behavior may incorrectly be identified as an attack. As virtualization and the information available to the VMM facilitate the profiling of normal behavior, in our VMM IDS we take the second approach to intrusion detection. We use system events visible to the VMM and incorporate data mining algorithms to help characterize normal execution patterns and distinguish deviating anomalous behavior, while trying to balance the trade-off between true detections and false alarms.
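To make the anomaly-detection idea concrete, here is a minimal sketch under stated assumptions; it is not the detector used in this work. It profiles normal behavior as a per-feature mean and standard deviation and scores a new observation by its largest absolute z-score. The function names, the example feature vectors, and the threshold of 3.0 are all illustrative choices.

```python
from statistics import mean, pstdev

def build_profile(normal_samples):
    """Per-feature (mean, std) computed from normal-only feature vectors."""
    columns = list(zip(*normal_samples))
    # Replace a zero standard deviation so scoring never divides by zero.
    return [(mean(c), pstdev(c) or 1e-9) for c in columns]

def anomaly_score(profile, sample):
    """Largest absolute z-score across features; higher means more unusual."""
    return max(abs(x - mu) / sigma for x, (mu, sigma) in zip(sample, profile))

def is_anomalous(profile, sample, threshold=3.0):
    """Flag a sample whose score exceeds the (assumed) threshold."""
    return anomaly_score(profile, sample) > threshold

# Hypothetical feature vectors: (page faults, disk I/O events) per time window.
normal = [(10, 5), (12, 6), (11, 5), (9, 4), (10, 6)]
profile = build_profile(normal)
```

Raising the threshold reduces false alarms at the cost of missed detections, which is the trade-off described above.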
A key advantage that a pure VMM-level IDS provides is ease of deployment. Only the VMM needs to be modified to extract low-level architectural events during runtime. This ties the IDS to a particular VMM and instruction set architecture (ISA). No modification to the operating system is required. Hence, it can be deployed in any virtualized computing environment with minimal effort. In our work, we focus on virtualized server applications. These applications are combined with a customized commodity operating system to run optimally in a virtual environment. As there are no login operations and typical execution consists of one main process running alongside background processes, we expect the normal behavior of these workloads to be fairly stable in time and space. Our IDS uses data mining algorithms to characterize the normal behavior of the workload. A malicious attack would introduce deviations from the normal behavior, which should be identified by the data mining algorithms and flagged by the IDS. Along these lines, a VMM IDS has the advantage of being able to detect zero-day attacks, in addition to previously known malware.
As part of our contributions, we have implemented a prototype of a pure VMM-level intrusion detection system using VirtualBox, an open-source full-virtualization VMM. To the best of our knowledge, this is the first work to utilize only the low-level architectural information visible to the VMM for detecting the existence of malware. Our IDS consists of two key components:
• A front-end, whose duties include:
    - Event Extraction - Capturing the low-level architectural data available to the VMM such as disk and network IO accesses, page faults, translation look-aside buffer (TLB) flushes, and control register updates.
    - Feature Construction - Using statistical techniques to transform the raw data into features, which are used by the data mining algorithms.
• A back-end, whose duties include:
    - Feature Reduction - Reducing the large space of possible features, which improves both the time complexity and the accuracy of data mining algorithms.
    - Normal Model Creation - Profiling the normal execution of the workloads and building a model of normal behavior.
    - Anomaly Detection - Identifying anomalous behavior as deviations from the model of normal behavior.
    - Raising an Alarm - Flagging behavior that deviates from the norm as a possible threat.
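As an illustration of the front-end duties listed above, the following sketch turns a stream of timestamped events into per-window event counts. It is only one plausible form of feature construction; the windowing scheme, the event type names, and the function name `window_features` are assumptions made for this example and are not taken from the prototype.

```python
from collections import Counter

def window_features(events, window, event_types):
    """Turn (timestamp, event_type) pairs into one count vector per time window."""
    if not events:
        return []
    start = min(t for t, _ in events)
    counts = {}
    for t, kind in events:
        idx = int((t - start) // window)  # index of the window this event falls in
        counts.setdefault(idx, Counter())[kind] += 1
    n_windows = max(counts) + 1
    # Windows with no events yield all-zero vectors.
    return [[counts.get(i, Counter())[k] for k in event_types]
            for i in range(n_windows)]

# Hypothetical event stream: times in seconds, names chosen for illustration.
events = [(0.0, "page_fault"), (0.2, "disk_io"), (0.7, "page_fault"),
          (1.1, "tlb_flush"), (2.5, "page_fault")]
types = ["page_fault", "disk_io", "tlb_flush"]
features = window_features(events, window=1.0, event_types=types)
```

Each row of `features` is one candidate data point for the back-end; a real front-end would likely derive richer statistics than raw counts.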
Figure 1: High-level design of our VMM IDS
A high-level overview of our VMM IDS design is presented in Figure 1. There are two main phases of the VMM IDS operation: a calibration phase and a testing phase. In the calibration phase, the front-end extracts the VMM-level events and constructs all the possible features (using methods described in section 3.2). These features are passed on to the back-end where feature reduction takes place. The reduced set of features is provided to the data mining algorithms to build a model of normal execution behavior. Next, anomaly detection is performed on a set of both normal and abnormal data points, assigning a score to each based on how much it deviates from the normal model. The scores are then passed through a filter to remove noise and determine when to raise an alarm.¹ During the calibration phase, we evaluate the true detection and false alarm rates of our IDS to select an optimal set of features and filter configuration.
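The role of the filter can be sketched as follows. This example assumes a simple moving-average filter with a fixed threshold, which is one common way to suppress isolated noisy scores; the actual filter and its configuration are not specified here, and `first_alarm`, `window`, and `threshold` are hypothetical names and values.

```python
from collections import deque

def first_alarm(scores, window=3, threshold=0.5):
    """Return the index at which the smoothed score first exceeds the
    threshold, or None if no alarm is raised."""
    recent = deque(maxlen=window)  # keeps only the last `window` scores
    for i, s in enumerate(scores):
        recent.append(s)
        if len(recent) == window and sum(recent) / window > threshold:
            return i
    return None

# A single spike (index 1) is smoothed away; a sustained rise triggers the alarm.
scores = [0.1, 0.9, 0.1, 0.2, 0.8, 0.9, 0.7]
alarm_index = first_alarm(scores)
```

A longer window removes more noise but delays the alarm, so the window length and threshold are natural candidates for tuning during calibration.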
In the testing phase of the IDS, once the VMM-level events are extracted, only the reduced set of features is constructed. Using the model and IDS configuration from the previous phase, anomaly detection is performed on a previously unseen set of normal and abnormal data points. The scores assigned to them are passed through the filter to determine an appropriate time to raise an alarm.
To examine the effectiveness of our VMM IDS in detecting real-world attacks, we evaluated the IDS on several different server workloads, injecting more than 300 malware samples obtained from a repository of real attacks. It is important that the IDS not only detect the malware, but do so within a reasonable amount of time. To this end, we present both the accuracy of the IDS (in terms of true detections and false alarms) and the time-to-detection results. We show that on average, we are able to correctly detect about 93% of the malicious attacks within about 20 seconds from the start of the attack, at a cost of only 3% false alarms.
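The three quantities reported above can be computed along the following lines. This is a sketch under assumed definitions: a detection is an alarm raised at or after the start of the attack, and the false alarm rate is the fraction of malware-free runs in which any alarm is raised. The exact definitions used in the evaluation may differ, and the data below is made up purely to exercise the function.

```python
def detection_metrics(attack_runs, normal_runs):
    """attack_runs: list of (attack_start, alarm_time or None).
    normal_runs: list of alarm_time or None for malware-free runs.
    Returns (detection_rate, false_alarm_rate, mean_time_to_detection)."""
    detected = [(start, alarm) for start, alarm in attack_runs
                if alarm is not None and alarm >= start]
    detection_rate = len(detected) / len(attack_runs)
    false_alarm_rate = sum(a is not None for a in normal_runs) / len(normal_runs)
    delays = [alarm - start for start, alarm in detected]
    mean_delay = sum(delays) / len(delays) if delays else None
    return detection_rate, false_alarm_rate, mean_delay

# Hypothetical runs (times in seconds); not results from the paper.
attack_runs = [(10, 25), (10, None), (5, 30), (0, 20)]
normal_runs = [None, None, 12.0, None]
rate, false_rate, delay = detection_metrics(attack_runs, normal_runs)
```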
The remainder of the paper is organized as follows. In section 2, we present a revised IDS taxonomy and use it to classify the current state of the art in IDSs. In section 3, we describe the front-end of our VMM IDS, including the information we are able to extract from the VMM and how it is used to build features. In section 4, we review the approach taken by our back-end to best learn the normal behavior and identify malware. In section 5, we evaluate our VMM-based IDS in terms of its detection and false alarm rate, as well as its ability to detect intrusions in a timely manner. We discuss several aspects of our work in section 6. Finally, we conclude the paper and present directions for future work in section 7.
2. RELATED WORK
Much work has been done in the area of host-based IDSs. We organize our discussion here according to the information, or semantics, utilized by the IDS:
1. Program-level IDS: An IDS that uses information available at the program/application abstraction level. This includes source code, static or dynamic information flow, and application execution state.

2. OS-level IDS: An IDS that utilizes information available at the OS level such as system calls and system state.

3. VMM-level IDS: An IDS that uses semantics and information available at the VMM-level. This includes architectural information.
¹ When the alarm is raised, we assert that malware has been found.
A related characterization of IDSs can be found in the work done by Gao et al. They use the terms white box, gray box, and black box to refer to the class of information available to the IDS. Black box systems only use system call information, white box systems include all information available including high-level program source or binary analysis, and gray box lies in between.
In our classification criteria, we consider a broader range of semantics available to the IDS. Program-level IDSs use information similar to that available in white or gray box systems. OS-level IDSs can use all system-level information available including (but not limited to) system calls (i.e., a black box system). VMM IDSs extend the characterization even further to include VMM-level information. We use this classification to contrast and compare current IDSs in the next sections, and to highlight the novelty of our own work.
2.1 Program-Level IDS
Wagner et al. show how static analysis can be used to thwart attacks that change the run-time behavior of a program. They build a static model of the expected behavior (using system calls, call graph, etc.) and compare it to the runtime program behavior. In the work done by Kirda et al., both static and dynamic analysis (including information leakage) are used to determine if behavior is malicious.
There have been a number of information flow tracking systems that fall into this category. These systems include static [7, 31] and dynamic [45, 33, 41] data flow analysis to extract program-level information available to the application only.
2.2 Operating System-Level IDS
System calls have been used extensively to distinguish normal from abnormal behavior. One example is the work done by Kosoresow et al. In this work, they use system call traces to find repeated system calls and common patterns, and store them in an efficient deterministic finite automaton. Then, during execution they compare and verify that all system call traces have been seen before.
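The general idea of checking runtime traces against previously seen patterns can be sketched as below. Note that this example swaps in a simpler technique: it stores fixed-length sliding windows (n-grams) of system calls in a set rather than building a deterministic finite automaton, so it should be read as an illustration of the principle and not as the cited method. The function names and the system call traces are made up for the example.

```python
def ngrams(trace, n):
    """All length-n sliding windows of a system call trace, as a set of tuples."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def learn_normal(traces, n=3):
    """Collect every n-gram observed in the normal training traces."""
    seen = set()
    for trace in traces:
        seen |= ngrams(trace, n)
    return seen

def unseen_ngrams(seen, trace, n=3):
    """N-grams in a runtime trace that never occurred during training."""
    return ngrams(trace, n) - seen

# Hypothetical training traces.
normal_traces = [["open", "read", "write", "close"],
                 ["open", "read", "read", "close"]]
seen = learn_normal(normal_traces)
suspicious = unseen_ngrams(seen, ["open", "write", "exec", "close"])
```

A non-empty result indicates call sequences that were never observed during training and may warrant an alarm.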
Many other intrusion detection systems have used system call profiles to successfully detect malicious code [14, 40, 49]. System call tracing can be done very efficiently and can provide much insight into program activities.
Stolfo et al. use Windows registry accesses to detect anomalous behavior. The underlying idea is that while registry activity is regular in time and space, attacks tend to launch programs never launched before and change keys not modified since OS installation.
A disk-based IDS has also been presented. This IDS monitors data accesses, metadata accesses, and access patterns when looking for suspicious behavior. This disk-based IDS uses semantics available at the OS level: it is able to read and interpret on-disk structures used by the file system....