EFFECTIVE AND EFFICIENT MALWARE DETECTION AT THE END HOST

Presentation by Clark Wachsmuth

C. Kolbitsch, P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou and X. Wang

or: How I Learned to Stop Worrying and Love the Malware-infested Internet

2

THE PROBLEM

Malware!

Ineffective (and/or inefficient) detection models

Can be evaded by malware authors through fairly simple means, such as polymorphism, obfuscation or system call reordering

Resource-heavy detectors may be effective, but they are not efficient enough for the average consumer computer

3

PAST IMPLEMENTATIONS: Network-based detection

Pros:

Useful for detecting some network malware

Modern malware is heavily network-bound

Cons:

Works for network-based malware only

Content sniffers are thwarted by encrypted data

Blending attacks make malicious data match normal data signatures

4

PAST IMPLEMENTATIONS: Host-based detection

Pros:

Can see the complete work of malware programs; not limited to a specific host resource

Some pre-emptive strategies

Cons:

Code obfuscation and polymorphism can easily bypass methods such as byte signatures while keeping the same functionality

System call sequence-based detection is, again, easily bypassed by reordering calls or making unused calls
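
To make the last con concrete, here is a minimal sketch of a sequence-based detector and a trace that slips past it. The signature, the traces and the helper `matches_sequence` are all made up for illustration; the call names are ordinary Windows native API names used only as strings.

```python
def matches_sequence(trace, signature):
    """Return True if `signature` appears as a contiguous run in `trace`."""
    n = len(signature)
    return any(trace[i:i + n] == signature for i in range(len(trace) - n + 1))


# Hypothetical signature for "copy a file": open source, read, create target, write.
signature = ["NtOpenFile", "NtReadFile", "NtCreateFile", "NtWriteFile"]

original = ["NtOpenFile", "NtReadFile", "NtCreateFile", "NtWriteFile", "NtClose"]

# Same copy, but the target is created first and an unused call is slipped in.
evaded = ["NtCreateFile", "NtOpenFile", "NtQuerySystemTime",
          "NtReadFile", "NtWriteFile", "NtClose"]

print(matches_sequence(original, signature))
print(matches_sequence(evaded, signature))
```

The first trace matches and the second does not, even though the data read still ends up in the written file. That data flow is exactly what a purely order-based signature cannot see.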

5

PAST IMPLEMENTATIONS: Static analysis-based detection

Pros:

More effective due to its focus on malware behavior, thus less stymied by obfuscation and polymorphism

Cons:

The method itself is difficult to employ

Has its own vulnerabilities, such as difficulty detecting metamorphic code and runtime packing

Takes a heavy toll on system resources, making it unreasonable for home computer systems

6

PAST IMPLEMENTATIONS: Dynamic analysis-based detection

Pros:

Less rigid focus on malware behavior, allowing for a more general and broad way of detecting malware

Cons:

Can require special hardware for detection (data tainting)

Large associated overhead, making it unusable in the home computer realm

7

THE SOLUTION

Effective and efficient malware detection, duh!

But how?

Effective:

Can’t be duped by simple order-changing, rearranging schemes

Doesn’t rely only on known quantities; can detect unknown running programs

No false positives

Efficient:

Doesn’t consume a significant chunk of system resources

8

THE PLAN

In a sandboxed environment, observe different malware and develop fine-grained models

Efficiently match these models up with the run-time behavior of an unknown program

If a match is found, terminate and eliminate

9

BUT HOW?!

By creating a behavior graph where each node is an “interesting” system call

The nodes store a symbolic expression (simple node) or a program “slice” (complex node) that can calculate the output of the system call

These expressions/slices are used at runtime to detect whether that output is the argument of another interesting system call

If it is, an edge is created between the two nodes
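
A minimal sketch of what such a graph could look like as a data structure. Everything here (`Node`, `Edge`, `BehaviorGraph`, the byte-string arguments) is a hypothetical layout for illustration, not the authors' implementation; for simplicity the expression or slice is attached to the edge it justifies rather than to the node.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    syscall: str                      # name of the "interesting" system call


@dataclass
class Edge:
    src: int                          # index of the call that produced the data
    dst: int                          # index of the call that consumes it
    func: Callable[[bytes], bytes]    # maps src output to the expected dst argument
    is_complex: bool = False          # True: program slice, False: symbolic expression


@dataclass
class BehaviorGraph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, syscall: str) -> int:
        self.nodes.append(Node(syscall))
        return len(self.nodes) - 1

    def add_edge(self, src, dst, func, is_complex=False) -> None:
        self.edges.append(Edge(src, dst, func, is_complex))

    def parents(self, idx: int) -> List[int]:
        """Indices of the calls whose output feeds node `idx`."""
        return [e.src for e in self.edges if e.dst == idx]


graph = BehaviorGraph()
read = graph.add_node("NtReadFile")
write = graph.add_node("NtWriteFile")
# Simple case: the written buffer is exactly what was read (identity expression).
graph.add_edge(read, write, lambda data: data)

edge = graph.edges[0]
print(graph.parents(write), edge.func(b"payload") == b"payload")
```

The point of the sketch is only that an edge carries a checkable relationship between two calls, not just their order.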

10

THE CONTROLLED ENVIRONMENT

Uses Anubis (Analyzing Unknown Binaries)

Disassembles instructions (including system calls) and keeps an instruction log

Keeps a memory log recording where in memory each instruction reads and writes

Each byte is tainted to detect data dependencies between system calls

Anything labeled within a branch also receives the taint of the controlling instruction, which captures control dependencies
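
The byte-level tainting idea can be sketched in a few lines. This is a toy shadow-memory model, assuming made-up addresses and labels; `TaintTracker` and its methods are hypothetical names, and control dependencies are left out.

```python
class TaintTracker:
    """Byte-granular shadow memory: address -> set of system call labels."""

    def __init__(self):
        self.shadow = {}

    def taint_output(self, label, addr, size):
        """Label every byte a system call wrote as its output."""
        for a in range(addr, addr + size):
            self.shadow[a] = frozenset([label])

    def copy(self, dst, src, size):
        """A move instruction: taint travels with the bytes."""
        labels = [self.shadow.get(src + i, frozenset()) for i in range(size)]
        for i, lab in enumerate(labels):
            self.shadow[dst + i] = lab

    def combine(self, dst, a, b):
        """An arithmetic instruction: the result depends on both operands."""
        self.shadow[dst] = self.shadow.get(a, frozenset()) | self.shadow.get(b, frozenset())

    def labels(self, addr, size):
        """Which earlier system calls does this argument buffer depend on?"""
        found = set()
        for a in range(addr, addr + size):
            found |= self.shadow.get(a, frozenset())
        return found


t = TaintTracker()
t.taint_output("NtReadFile#1", 0x1000, 4)       # output buffer of a read
t.taint_output("NtQueryValueKey#2", 0x1100, 1)  # one byte from a registry query
t.copy(0x2000, 0x1000, 4)                       # buffer is copied elsewhere
t.combine(0x2000, 0x2000, 0x1100)               # first byte mixed with the key byte
print(sorted(t.labels(0x2000, 4)))
```

When a later system call takes the buffer at `0x2000` as an argument, the labels on its bytes say which earlier calls it depends on.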

11

THE INITIAL BEHAVIOR GRAPH

With all the instructions labeled, an initial graph is created by placing all system calls on it as nodes

Edges are created when a dependency is found

Using the logs, a recursive backwards trace of system call arguments is made to determine how each argument’s bits were created

These instructions are gathered into a program slice until the trace reaches either an instruction that can’t be traced further (its input comes from the outside) or a value produced by an immediate operand or coming from the initialized data segment
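
The backwards trace can be sketched over a toy instruction log. The log format (`Instr` with one destination and a tuple of sources) and the function `backward_slice` are assumptions made for this example; a real log works on bytes and addresses, not named registers.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str
    dst: str                  # location written by the instruction
    srcs: Tuple[str, ...]     # locations read (empty for constants)
    kind: str = "op"          # "op", "imm" (constant) or "syscall_out" (outside input)


def backward_slice(log: List[Instr], arg: str, before: int):
    """Trace how `arg` was computed by the instructions preceding index `before`."""
    in_slice, inputs = set(), set()

    def trace(loc, upto):
        # Find the most recent writer of `loc`, then follow its sources.
        for i in range(upto - 1, -1, -1):
            if log[i].dst != loc:
                continue
            if log[i].kind == "syscall_out":
                inputs.add(i)             # can't be traced further: slice input
            elif i not in in_slice:
                in_slice.add(i)
                if log[i].kind == "op":   # constants end the trace
                    for src in log[i].srcs:
                        trace(src, i)
            return

    trace(arg, before)
    return sorted(in_slice), sorted(inputs)


log = [
    Instr("NtReadFile -> buf", "buf", (), "syscall_out"),
    Instr("mov eax, 1", "eax", (), "imm"),
    Instr("mov ebx, buf", "ebx", ("buf",)),
    Instr("mov ecx, 7", "ecx", (), "imm"),           # unrelated to the argument
    Instr("add ebx, eax", "ebx", ("ebx", "eax")),
]
# Suppose the next system call takes ebx as its argument.
print(backward_slice(log, "ebx", len(log)))
```

Here the slice is instructions 1, 2 and 4, its input is the read at index 0, and the unrelated `mov ecx, 7` is left out.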

12

PROGRAM SLICES TO FUNCTIONS

With the slice, we know how the argument of the sys call was created and by whom

It’s not necessarily the direct program code, though (unrolled loops won’t match with different sizes)

Each line in the binary that appears at least once in the slice is marked and the appropriate code is copied to a function. Non-marked lines become nops.

The stack needs fixing because stack-creating code is often not part of the slice (uses the instruction log)
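
A rough sketch of the marking step, assuming a made-up eight-instruction loop and a hypothetical helper `slice_to_function`. The slice lists executed instructions, so a loop body shows up once per iteration, while the binary holds each line only once. Stack fixing is not modeled.

```python
def slice_to_function(binary, slice_addrs):
    """Keep every binary line that occurs at least once in the slice; nop the rest."""
    marked = set(slice_addrs)
    return [(addr, text if addr in marked else "nop") for addr, text in binary]


# Hypothetical loop that XOR-decodes a buffer byte by byte.
binary = [
    (0x401000, "mov al, [esi]"),
    (0x401002, "xor al, 0x5a"),
    (0x401004, "push edx"),        # not involved in computing the argument
    (0x401005, "mov [edi], al"),
    (0x401007, "inc esi"),
    (0x401008, "inc edi"),
    (0x401009, "dec ecx"),
    (0x40100A, "jnz 0x401000"),
]

one_iteration = [0x401000, 0x401002, 0x401005, 0x401007,
                 0x401008, 0x401009, 0x40100A]
slice_addrs = one_iteration * 2    # the loop ran twice during the observed run

for addr, text in slice_to_function(binary, slice_addrs):
    print(hex(addr), text)
```

Because marking works on the static lines, the resulting function keeps the loop as a loop and still works when the buffer size differs from the observed run.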

13

SIMPLIFYING FUNCTIONS

Yay! We have a function that gives an expected output for a given input

Some functions can be quite long and fairly basic

We can optimize such a function to a smaller symbolic expression

This optimization can greatly reduce overhead at the end host

Other functions aren’t so basic, so we retain the program code of the function rather than have a symbolic reduction
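
As a toy illustration of "long but basic", the sketch below folds a chain of 32-bit add/xor steps into a shorter, equivalent one. This is a simple peephole simplification swapped in for illustration, not the symbolic execution engine used by the authors; the op format and the names `run` and `simplify` are assumptions.

```python
MASK = 0xFFFFFFFF  # model 32-bit registers


def run(ops, x):
    """Evaluate a chain of ('add', k) / ('xor', k) steps on input x."""
    for op, k in ops:
        if op == "add":
            x = (x + k) & MASK
        elif op == "xor":
            x ^= k
        else:
            raise ValueError(f"unknown op: {op}")
    return x


def simplify(ops):
    """Merge adjacent steps of the same kind and drop steps that do nothing."""
    out = []
    for op, k in ops:
        if out and out[-1][0] == op:
            prev = out.pop()[1]
            k = (prev + k) & MASK if op == "add" else prev ^ k
        if k == 0:          # add 0 and xor 0 are both identities
            continue
        out.append((op, k))
    return out


long_form = [("add", 1), ("xor", 5), ("xor", 5), ("add", 2)]
short_form = simplify(long_form)
print(short_form)
print(all(run(long_form, x) == run(short_form, x) for x in (0, 1, 41, MASK)))
```

The four-step chain collapses to a single `add 3`, which is cheaper to evaluate every time the scanner has to check the dependency.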

14

SCANNING END HOST

Scanner monitors running programs for sys calls

Has admin privileges, running in user mode

Assume programs can’t get to the kernel

All nodes are inactive in the initial behavior graph

When a system call is made, the scanner checks the graph for inactive nodes of the same type and sees if their parent nodes are active

If found, it checks the sys call’s arguments against all simple functions; complex functions are deferred for later but provisionally allowed to hold

If all simple function arguments hold, the node becomes active

15

SCANNING END HOST

When do we check the complex functions?

When we reach an interesting node

Interesting if it is a security-relevant system call (writing to the file system, network or registry, starting new processes)

Also interesting if the node has no outgoing edges

If the complex function holds, the interesting node is confirmed

Otherwise, the node with the complex function becomes inactive, as do any subgraph rooted under it and the edge being formed
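
The matching rules from the two scanner slides can be sketched as follows. This is a simplified model under loud assumptions: each system call has a single integer argument and output, the caller sets the `interesting` flag by hand, and `Scanner`, `Node`, `Edge` and `demo_graph` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Edge:
    parent: int                        # index of the node that produced the data
    func: Callable[[int], int]         # parent output -> expected argument
    is_complex: bool = False


@dataclass
class Node:
    syscall: str
    in_edges: List[Edge] = field(default_factory=list)
    interesting: bool = False          # security-relevant, or no outgoing edges
    active: bool = False
    arg: Optional[int] = None
    output: Optional[int] = None


class Scanner:
    def __init__(self, nodes: List[Node]):
        self.nodes = nodes

    def _deactivate(self, idx):
        """Deactivate a node and every active node that depends on it."""
        self.nodes[idx].active = False
        for j, n in enumerate(self.nodes):
            if n.active and any(e.parent == idx for e in n.in_edges):
                self._deactivate(j)

    def _complex_hold(self, idx):
        """Evaluate the deferred complex functions on the way back to the roots."""
        node = self.nodes[idx]
        for e in node.in_edges:
            if e.is_complex and e.func(self.nodes[e.parent].output) != node.arg:
                self._deactivate(idx)
                return False
            if not self._complex_hold(e.parent):
                return False
        return True

    def on_syscall(self, name, arg, output) -> bool:
        """Return True when an interesting node is confirmed (malware match)."""
        for idx, node in enumerate(self.nodes):
            if node.active or node.syscall != name:
                continue
            if not all(self.nodes[e.parent].active for e in node.in_edges):
                continue
            # Simple functions are checked right away; complex ones are deferred.
            if any(not e.is_complex and e.func(self.nodes[e.parent].output) != arg
                   for e in node.in_edges):
                continue
            node.active, node.arg, node.output = True, arg, output
            if node.interesting and self._complex_hold(idx):
                return True
        return False


def demo_graph():
    """Read a value, then write it XOR-ed with 0x5A (treated as a complex function)."""
    return [Node("NtReadFile"),
            Node("NtWriteFile", [Edge(0, lambda x: x ^ 0x5A, True)], interesting=True)]


malware = Scanner(demo_graph())
malware.on_syscall("NtReadFile", 0, 0x10)
print(malware.on_syscall("NtWriteFile", 0x10 ^ 0x5A, 0))

benign = Scanner(demo_graph())
benign.on_syscall("NtReadFile", 0, 0x10)
print(benign.on_syscall("NtWriteFile", 0x99, 0))
```

In the first run the written value is the read value passed through the recorded function, so the interesting node is confirmed. In the second run the same two calls occur in the same order, but the dependency does not hold, so the write node is deactivated again and nothing is reported.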

16

MATCHING MALWARE

If an interesting node is confirmed, then the program is matched as malware

However, if there is no complex function dependency, then the graph created is not used to help detect future malware programs

The subgraph created with the interesting node is also a behavior graph that denotes a trait of the particular malware running

17

DETECTION EFFECTIVENESS

Generated behavior graphs for six popular malware families (Table 1)

100 samples of each family were selected from the database and the non-interesting samples were tossed out

50 random samples were chosen from the remaining bunch to create behavior graphs and serve as the training dataset

Not all samples could be detected due to non-interesting behavior and complex function crashes

18

TESTING DATA

19

IS IT EFFECTIVE?

Detection was effective for some families and not so much for others

AV software is notoriously bad at classifying malware

Confirmed by manual inspection, especially for Agent

Restricting samples to 155 known variants yielded 92% effectiveness

Also restricted data samples to 108 unknown variants and still achieved 23% effectiveness, indicating that this method can even detect some unknown variants

This behavior-based method is more general than an AV scanner and therefore requires fewer graphs than signatures

20

WHAT ABOUT FALSE POSITIVES?

Tested on WinXP using IE, Firefox, Thunderbird, PuTTY and Notepad

Yielded no false positives

When complex functions were unchecked and allowed to hold, all of the above yielded false positives

Therefore, system call dependencies are at the root of this method’s success

21

OK, BUT IS IT EFFICIENT?

System setup for testing: WinXP, single-core 1.8 GHz P4 with 1 GB RAM

Tested using 7-Zip, IE, Visual Studio

22

UMM, DID THAT SAY 40%?

CPU-bound and I/O-bound tests showed low overhead

Compiling seems quite high at 40%

Compiling makes about 5000 system calls/sec compared to 7-Zip’s 700/sec

Compilation is a worst-case scenario

An improved symbolic execution engine could possibly reduce the high complex function evaluation cost of 16.7%

Still performed well for common tasks
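
A quick back-of-the-envelope check on the two system call rates quoted above; the variable names are just for this example.

```python
compile_rate = 5000    # system calls per second while compiling
sevenzip_rate = 700    # system calls per second in the 7-Zip test

ratio = compile_rate / sevenzip_rate
print(f"compiling issues about {ratio:.1f}x as many system calls per second")
```

The compile workload gives the scanner roughly seven times as many calls to inspect each second, which fits with it being the worst case.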

23

LIMITATIONS

Authors could use time-triggered behavior or command and control mechanisms to prevent malware behavior during test

A reactive method that only works on running malware

But new graphs can be employed quickly and it can detect some unknown variants

Authors could change algorithms, rendering program slices unusable

Changing algorithms is a lot of work and this method still raises the bar considerably higher for malware authors

24

TECHNICAL CONTRIBUTIONS

Developed effective models with detailed semantic information about the malware family

Created a scanner that efficiently matches the behavior of an unknown, running program against the models by tracking system call dependencies

Experimental evidence that the approach is feasible and usable in practice

25

CONCLUSION

Effective? Check

With correctly labeled, known variants, 92% effectiveness was obtained with no false positives

Efficient? Check

While compiling was a worst-case scenario, tasks common to the average end user incurred only a low overhead
