Post on 19-Dec-2015
ReVirt: Enabling Intrusion Analysis through Virtual Machine Logging And Replay
Authors:
George W. Dunlap, Samuel T. King, Sukru Cinar, Murtaza A. Basrai, Peter M. Chen
Presentation by: Will Hrudey
Introduction
ReVirt is an intrusion analysis solution that facilitates post-attack analysis
ReVirt applies VM and fault-tolerance techniques to enable the administrator to replay the long-term, instruction-by-instruction execution of a computer system
ReVirt runs the target operating system (OS) and applications in a VM whose monitor (VMM) is a kernel module in a host OS, allowing:
– Migration of logging from the target OS to the host OS below the VM
– Playback of the target system’s execution before, during, and after an intruder compromises the system
Motivation
The improvement of today’s computer system security is an urgent and difficult problem
The complexity and rapid change in software systems prevent developers from verifying their code to eliminate all vulnerabilities
Administrators have to routinely cope with computer break-ins
– CERT Coordination Center reports a steady increase in incidents handled and in the number of vulnerabilities over the past 4 years
Goals
Solve two problems with current audit logging:
1. Improve the integrity of the logger because:
– Existing loggers depend on the integrity of the OS
– Attackers can disable, modify or delete system logs
– Kernels are large and complex, so they tend to contain many bugs
Solution: Encapsulate the target system within a VM and place logging below the VM
2. Improve the completeness of the logger because:
– Existing loggers don’t save enough data to replay and analyze attacks, so the administrator still has to guess what happened
– Existing loggers can’t account for non-determinism
Solution: Utilize checkpointing, logging and roll-forward recovery
Virtual Machines
A virtual-machine monitor (VMM) is a software layer that emulates the hardware of a complete computer system
The VMM creates an abstraction called a virtual machine (VM)
The host platform that the VMM runs on can be another OS (the host OS) or the bare hardware
– So the VMM runs in a separate domain from the guest OS and applications
Although the VMM can still be compromised, it makes a better trusted computing base (TCB) than the guest OS due to its narrow interface and small size
Virtual Machines
The VMM interface is similar to the physical hardware whereas the interface provided by a typical OS is much richer
The narrower interface restricts an attacker’s actions, and the smaller code base makes the VMM easier to verify
VMs can be classified by how similar they are to the host hardware
– On one end, VMs such as IBM VM/370 export an interface that is backwards compatible with the host hardware. OSes and applications intended to run on the host platform can run on these VMMs without change
– On the other end, language-level VMs like the Java VM export an interface completely different from the host hardware. These VMMs can run only OSes and applications written specifically for them
UMLinux
ReVirt uses UMLinux as the virtual machine
– VMM in UMLinux exports an interface similar but not identical to the host hardware
– Custom optimizations in the underlying host OS make the VMM faster
Virtual machine in UMLinux runs as a user process on the host
– Guest OS and guest applications run inside this host user process
– Guest OS uses host services (system calls and signals) as the interface to peripheral devices, hence OS-on-OS architecture
By contrast, the normal structure, in which target applications run directly on the host OS, is the direct-on-host architecture
UMLinux
VMM in UMLinux is a loadable kernel module in the host OS
– Module is called before/after each signal and system call to/from the VM process
– Most instructions executed within the VM execute directly on host CPU
Memory accesses are translated by the host’s MMU based on translations that are set up via the host OS’s memory system calls
A host X application displays console output and reads keyboard input
The VMM module maintains a virtual privilege level (VPL)
– Set to kernel when transferring control to the guest kernel
– Set to user when transferring control to a guest application
UMLinux
If the current VPL is kernel, the VMM knows the guest OS made the system call; it checks that it is a call the guest OS should be making, then passes it on to the host OS
If the current VPL is user, the VMM knows the guest application made the system call and it sends a SIGUSR1 to the guest OS to notify it
– SIGUSR1 signal handler in the guest kernel is the equivalent of the system-call trap handler in a normal OS
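The VPL-based dispatch described above can be sketched as a toy model. This is only an illustration, not UMLinux code: the class `ToyVMM`, its method names, and the contents of the allowlist are all hypothetical, since the slides do not list which host calls the guest kernel may make.

```python
from enum import Enum


class VPL(Enum):
    KERNEL = "kernel"
    USER = "user"


# Hypothetical allowlist; the real set of permitted host calls is not given in the slides.
ALLOWED_GUEST_KERNEL_CALLS = {"read", "write", "mmap", "munmap", "sigprocmask"}


class ToyVMM:
    """Toy model of the VMM hook that runs before each system call from the VM process."""

    def __init__(self):
        self.vpl = VPL.KERNEL
        self.host_calls = []      # calls passed on to the host OS
        self.guest_signals = []   # signals sent to the guest kernel

    def on_system_call(self, name):
        if self.vpl is VPL.KERNEL:
            # The guest OS made the call: check it, then pass it on to the host OS.
            if name not in ALLOWED_GUEST_KERNEL_CALLS:
                return "rejected"
            self.host_calls.append(name)
            return "forwarded"
        # A guest application made the call: notify the guest kernel via SIGUSR1.
        self.guest_signals.append("SIGUSR1")
        self.vpl = VPL.KERNEL     # control transfers to the guest kernel
        return "redirected"

    def return_to_user(self):
        # Control transfers back to a guest application.
        self.vpl = VPL.USER
```

In this model a guest application's system call never reaches the host OS directly; it only ever results in a SIGUSR1 to the guest kernel, which is what makes the SIGUSR1 handler play the role of the trap handler.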
SIGALRM, SIGIO, and SIGSEGV signals are used to emulate the hardware timer, I/O device interrupts, and memory exceptions
UMLinux emulates the enabling/disabling of interrupts by masking signals
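As a rough illustration of emulating interrupt enable/disable by masking signals, the sketch below queues signals that arrive while "interrupts" are disabled and delivers them on re-enable. The class `ToySignalMask` and the choice to mask only SIGALRM and SIGIO are assumptions made for the example; the slides do not say which signals UMLinux masks.

```python
# Mapping taken from the slides: host signals stand in for hardware events.
VIRTUAL_EVENT_TO_SIGNAL = {
    "timer": "SIGALRM",
    "io_interrupt": "SIGIO",
    "memory_exception": "SIGSEGV",
}

# Assumption for this sketch: only the asynchronous signals are maskable.
MASKABLE = {"SIGALRM", "SIGIO"}


class ToySignalMask:
    """Toy model: disabling virtual interrupts == masking signals."""

    def __init__(self):
        self.interrupts_enabled = True
        self.pending = []     # signals held back while masked
        self.delivered = []   # signals the guest kernel has handled

    def raise_event(self, event):
        sig = VIRTUAL_EVENT_TO_SIGNAL[event]
        if sig in MASKABLE and not self.interrupts_enabled:
            self.pending.append(sig)
        else:
            self.delivered.append(sig)

    def disable_interrupts(self):
        self.interrupts_enabled = False

    def enable_interrupts(self):
        self.interrupts_enabled = True
        # Deliver everything that was held back, in arrival order.
        self.delivered.extend(self.pending)
        self.pending.clear()
```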
The TCB is comprised of the VMM kernel module and the host OS
UMLinux
Attacker strategies:
From above
DoH (direct-on-host): Attacker can cause application processes to exploit any/all host OS functionality in dangerous ways
OoO (OS-on-OS): Attacker can take similar avenues to attack the guest OS; however, the VMM limits the available system calls to < 7% and the guest OS can only access a limited number of host files and devices
From below
DoH: Attacker can send dangerous network packets to the host to compromise lower levels of the protocol stack
OoO: Less of the host OS network stack is exposed to the same dangerous packets
Logging And Replaying
Logging is used to recover state
– Start from a checkpoint of a prior state, then roll forward using the log
Most events are deterministic and needn’t be logged; however, any host system calls that can yield non-deterministic results must be logged
Non-deterministic events are categorized as either time or external input
– Time refers to the point in the execution stream at which an event takes place
– External input is data received from a non-logged entity (keyboard, mouse, etc)
Output to peripherals does not affect the replay process
Log records are added and saved to disk similar to Linux syslogd daemon
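To record when an asynchronous virtual interrupt occurred, the program counter (PC) and the number of branches executed since the last interrupt are logged
During replay, new asynchronous virtual interrupts are kept from perturbing the replaying VM process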
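The checkpoint-plus-roll-forward idea can be sketched with a toy deterministic machine. Only the non-deterministic inputs are logged; replaying from the same checkpoint with the logged inputs reproduces the original final state. The function `run`, the state-update formula, and the event schedule are invented for illustration and are not ReVirt's mechanism.

```python
import random


def run(checkpoint, steps, next_input, log=None):
    """Roll forward from a checkpoint; the only non-determinism is next_input()."""
    state = checkpoint
    for step in range(steps):
        # Deterministic work: reproduced exactly on replay, so never logged.
        state = (state * 31 + step) % 1_000_003
        if step % 4 == 0:
            # Non-deterministic event (e.g. external input): log it while recording.
            value = next_input()
            if log is not None:
                log.append(value)
            state ^= value
    return state


# Logging run: inputs come from an unseeded random source.
rng = random.Random()
log = []
original = run(7, 20, lambda: rng.randrange(256), log)

# Replay run: same checkpoint, inputs come from the log instead.
logged_inputs = iter(log)
replayed = run(7, 20, lambda: next(logged_inputs))
```

The log holds five small integers for twenty steps of execution, which mirrors the point that most events are deterministic and need not be logged.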
Logging And Replaying
ReVirt goes through two phases to find the right instruction at which to deliver the original asynchronous virtual interrupt
– 1st phase configures the branch_retired performance counter to generate an interrupt after most of the logged branches have executed
– 2nd phase is needed to stop at exactly the right instruction
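A minimal simulation of the two-phase search is given below, assuming the logged pair is (PC, branch count). The instruction stream, the helper names, and the `SLACK` value are assumptions for illustration: phase 1 runs at full speed until the branch count is within `SLACK` of the target, and phase 2 checks the branch count each time the target PC is reached.

```python
SLACK = 128  # assumed margin: how many branches before the target phase 1 stops


def make_stream(iterations):
    """A 4-instruction loop; the instruction at PC 3 is the backward branch."""
    return [(pc, pc == 3) for _ in range(iterations) for pc in range(4)]


def log_interrupt(stream, index):
    """What the logger records for an interrupt delivered before stream[index]."""
    pc = stream[index][0]
    branches = sum(1 for _, is_branch in stream[:index] if is_branch)
    return pc, branches


def find_delivery_point(stream, target_pc, target_branches):
    i = 0
    branches = 0
    # Phase 1: run freely until the branch counter says we are close.
    coarse_stop = max(0, target_branches - SLACK)
    while branches < coarse_stop:
        branches += int(stream[i][1])
        i += 1
    # Phase 2: at each arrival at the target PC, compare the branch count.
    while i < len(stream):
        pc, is_branch = stream[i]
        if pc == target_pc and branches == target_branches:
            return i
        branches += int(is_branch)
        i += 1
    raise ValueError("delivery point not found in stream")
```

The PC alone is ambiguous inside a loop, and the branch count alone does not pin down a single instruction; the pair together identifies one point in the stream.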
Replay can occur on any host with a processor type similar to that of the original host
Most non-deterministic sources generate small amounts of log data
Received network messages can generate massive logs
– Cooperative logging can reduce the amount of logged network data: the receiver doesn’t need to log message data that the sender can recreate via replay
– Requires cooperating computers to trust each other to regenerate the same message data during replay
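The trade-off in cooperative logging can be sketched as follows. Here a receiver either logs each full payload or logs only a sequence number and asks the (trusted, replaying) sender to regenerate the payload. `ToySender`, `receive`, and `replay_inputs` are hypothetical names, and deriving the payload from a hash is just a stand-in for "the sender's replay deterministically recreates the message".

```python
import hashlib


class ToySender:
    """Stand-in for a cooperating host whose replay regenerates identical messages."""

    def __init__(self, seed):
        self.seed = seed

    def message(self, seq):
        # Deterministic 1 KiB payload derived from the sender's state.
        return hashlib.sha256(f"{self.seed}:{seq}".encode()).digest() * 32


def receive(sender, count, cooperative):
    """Return the receiver's log: sequence numbers if cooperative, else payloads."""
    log = []
    for seq in range(count):
        payload = sender.message(seq)
        log.append(seq if cooperative else payload)
    return log


def replay_inputs(log, sender, cooperative):
    """Reconstruct the network input stream during replay."""
    if cooperative:
        return [sender.message(seq) for seq in log]
    return list(log)
```

The cooperative log stores one small integer per message instead of the payload, but replay now depends on the sender being available and trusted to regenerate the same data.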
Logging And Replaying
Administrator tools used in understanding the attack:
– Tools that run inside the guest VM to probe the VM state: edit files, list current processes, etc.
– Tools that run outside the guest VM to analyze the state of a VM: X server, debuggers, disk analyzer, etc.
Experiments: Testbed
VM is configured to use 192 MB of physical memory
Virtual hard disk is stored on a raw disk partition
Experiments: Objective
Measure Virtualization Overhead:
– Application runtimes within UMLinux vs. runtimes on the host OS
– Evaluates 5 workloads with a warm cache, averaged over 3 runs
Validate Correctness:
– Micro-benchmarks run in the VM to verify virtual interrupts are being replayed at the same point at which they occurred during logging
– Macro-benchmark verifies ReVirt faithfully plays back input from external systems
Measure Logging And Replaying Overhead:
– Quantify the time and space overhead of logging
– Checkpoint overhead is not included
Attack Analysis:
– Exploit the ptrace race condition and verify replay
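For the virtualization-overhead measurement, one plausible way to turn per-run runtimes into a percentage is shown below. The function name and the formula (relative slowdown of mean runtimes) are assumptions; the slides do not state how the reported figures were computed, and any numbers passed in would be the reader's own measurements.

```python
from statistics import mean


def overhead_percent(host_runs, vm_runs):
    """Relative slowdown of the VM versus the host, from repeated run times (seconds)."""
    host = mean(host_runs)
    vm = mean(vm_runs)
    return 100.0 * (vm - host) / host
```

Averaging over three runs per configuration, as in the testbed description, would mean passing two three-element lists.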
Future Work
Make checkpointing faster and more convenient
– Accelerate disk copy done during checkpointing
– Enable the VMM to checkpoint a running VM
Reduce host OS size used to support UMLinux
Build higher level analysis tools to leverage ability to replay detailed, long-term executions
Move the X server into another VM
Use ReVirt as a building block for new security services
Cooperative logging in ReVirt?
Conclusion
ReVirt adopts VM and fault-tolerance techniques to enable replay of long-term, instruction-by-instruction execution to facilitate attack analysis
Target OS and applications run within the VM
ReVirt can replay execution before, during and after an intrusion
ReVirt logs all non-deterministic events so it can replay non-deterministic attacks and executions
ReVirt provides arbitrarily detailed observations about what transpired
ReVirt is implemented as a set of modifications to the host OS
ReVirt adds “reasonable?” time and space overhead
Observations
Total overhead for kernel-intensive workloads: up to 66%
– Is this overhead justifiable?
– Should have reported total overhead in tables for increased clarity
Checkpoint time and space overhead not characterized
Host OS can still be compromised
– No quantitative data to support the claim that a narrower interface is more secure
Tests seem to focus on overhead rather than ability to enable analysis
There are no specific tools to analyze potentially large ReVirt logs
Log growth could be much larger since the SPECweb99 benchmark was based on only 15 simultaneous connections
Replay must start from a powered-off VM state; is this practical?
How portable is ReVirt to other guest/host OSes?
“No perceptible time overhead” is a weak measurement. Better metric?
No multiprocessor support yet (published in late 2002)
Discussion
1. The authors state that they “believe that even an overhead of 58% is not prohibitive for sites that value security.” (p11) I believe that an overhead of 58% is pretty big, especially for busy systems. How much of a concern is this really?
2. They show the average space per day that logging takes. But does this include the daily snapshot as well? If you're running a lot of guest OSes concurrently, couldn't this become a bottleneck (or does ReVirt only run one guest OS at a time)? They give results for both virtualization overhead and logging overhead, but not both at the same time (which is the real-world scenario). Is there any indication as to how much the total overhead is?
Discussion
3. The authors talk about checkpointing in a few areas of the paper. They claim it will be a rare event and so do not test the time and space overhead to run one. They then say that their future work is to “make checkpointing faster and more convenient.”
I wonder how slow and inconvenient checkpointing is at this point for them to avoid testing it (or releasing the test results)? I think this should have been included in the paper as, even though checkpointing may not happen often, it is still part of the system overhead.
Discussion
4. If ReVirt detects the non-deterministic events that occurred during the attack, what can it do to prevent further attacks? Is it possible to isolate them?
5. Is UMLinux the only guest OS that can be used in ReVirt? Have any other OSes been ported to ReVirt? And what about further development of ReVirt or some system like it?
Discussion
6. The authors introduce ReVirt to address two shortcomings of current systems - integrity and completeness. They state that the "current system loggers lack integrity because they assume the operating system kernel is trustworthy." However, they also indicate that "even the VMM may be subject to security breaches," but that the VMM is more trustworthy than operating system because the interface is narrower.
Does a narrower interface really make that much of a difference in securing the system? Can't attackers still do a lot of damage?
Discussion
7. They talk about how this approach is useful in analyzing an attack, and in section 5.4 give an example of this. But to do so they introduced a vulnerability and then used the logging method to analyze an attack that they themselves initiated. While the example may have some validity, it would have been nice to see something that they didn't set up themselves.
Discussion
8. Cooperative logging is cited as being capable of significantly reducing storage, as no LAN data needs to be logged (it can just be regenerated); however, you lose the ability to replay independent machines without replaying the whole network (or so it seems). Are there any schemes that let you do both?
9. They use a modified version of Linux 2.4.18 as the host OS. I’m wondering how modified it is? They claim that the host OS is safe from attack, but because it is still just an ordinary OS, I’m not sure about this. What do you think?
Discussion
10. ReVirt logs all input from external devices. Could these logs be used to pick up passwords from keyboard input or other security input (e.g. fingerprint readers and files from memory sticks)?
11. "ReVirt log all input from external entities. These include most virtual devices: keyboard, mouse, network interface card, ..." When we want to analyze the intrusion of a highly-used web server, logging all input from the network device seems quite expensive (I believe it would be much more than 1.4 GB/day as shown in the experiment). Any solution for that?
Discussion
12. So does it make more sense to add this VM layer just so we can track, or is it just easier? (i.e. what are the arguments for not having a VM layer?)
13. When they used ReVirt to analyze an attack, they only tested it with one attack. I think a broader range of attacks should have been tested to get an accurate account of what ReVirt can do. What do you think about this?
14. What kind of analysis tools do the authors suggest/provide? They were able to find an error, but only when they themselves knew exactly what they were looking for.
Discussion
15. In section 4.4, the paper mentioned alternative architectures for logging and replay. Basically, they compared OS-on-OS structure with direct-on-host structure. How about the direct-on-VMM structure? Does removing host OS improve the performance and stability of ReVirt?
16. In section 6, the paper compared hypervisors with ReVirt and argued that they are targeting different goals. However, since Hypervisors already have similar logging functionalities, why not design ReVirt as a plugin (i.e. a special VM) for some hypervisors?
Discussion
17. Is there some other way to improve security that does not involve loading the VMM as a kernel module?
18. The guest doesn't run X itself, but rather connects to a remote X server (say on the host). Doesn't this introduce a hook that a malicious user could use to gain access to (or at least destabilize) the host?
Discussion
19. Why does ReVirt have only a single disk checkpoint which is the virtual machine being powered off? Why did they not think to add in other checkpoints? Why did they "envision checkpointing being a rare event?" Is this because they don't see their system being attacked more frequently than that?