Mining Specifications of Malicious Behavior
Mihai Christodorescu (work done at University of Wisconsin)
Somesh Jha University of Wisconsin
Christopher Kruegel Technical University Vienna
IBM Research
ESEC/FSE 2007, Sept. 5 Mihai Christodorescu 2
Wide Spectrum of Detectors
• Static detectors:
  – Semantics-aware malware detection [Christodorescu et al. 2005]
  – Model checking-based malware detection [Kinder et al. 2005]
• Dynamic/hybrid detectors, host IDS:
  – Behavior-based spyware detection [Kirda et al. 2006]
  – Shadow Honeypots [Anagnostakis et al. 2005]
  – …
Misuse Detection
Distinct techniques, fundamentally similar…
They all require high-quality specifications of malicious behavior.
Current Specification Generation
• Specifications are manually developed by experts.
Two issues:
  – Time consuming
  – Error prone
Problem 1: Specification Delay
Time from appearance of new malware to availability of specification:
• Manual analysis of the binary
• Testing of the specification
Anti-virus industry: 4-18 hours to generate a new specification.
This delay is the window of vulnerability.
Problem 2: Spec. Imprecision
• Too general = false positives → angry users
• Too specific = false negatives → infected machines
Our Solution
MINIMAL: a technique for mining malicious specifications
• Automatic
• Flexible specification language
• Fast
• Performs well (compared to a human expert)
Specification Language
What’s In a Specification?
Requirements for obfuscation resilience:
1. Describe only relevant operations
2. Capture dependencies where present
3. Preserve independence of operations
Specifying Malicious Operations
• We chose system calls
  – Compatible with specifications for behavior-based detectors
  – Define interface between trusted OS and untrusted programs
• Mining algorithm is not restricted to the system-call interface.
Specifying Malicious Constraints
• Program operations are insufficient to distinguish malicious from benign.
• We need to capture relations between operations:
F=open("file") ; read(F,buf) ; send(S,buf)
Constraints = logical formulas over system-call arguments
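The open/read/send example can be made concrete with a small sketch. It assumes a hypothetical trace record (`SysCall`) and identifies buffers by name, which is a simplification; it is not the specification language used by MINIMAL, only an illustration of why the relations between arguments matter.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class SysCall:
    """One observed call; a hypothetical trace record, not the paper's format."""
    name: str
    args: Tuple[Any, ...]
    ret: Optional[Any] = None


def leaks_file_over_socket(trace: List[SysCall]) -> bool:
    """True if a buffer read from an opened file is later passed to send().

    Encodes the constraints of the slide's example:
    F = open("file"); read(F, buf); send(S, buf)
    """
    open_handles = set()   # values returned by open()
    file_buffers = set()   # buffers filled by read() on one of those handles
    for call in trace:
        if call.name == "open":
            open_handles.add(call.ret)
        elif call.name == "read" and call.args[0] in open_handles:
            file_buffers.add(call.args[1])
        elif call.name == "send" and call.args[1] in file_buffers:
            return True
    return False


related = [
    SysCall("open", ("file",), ret=3),
    SysCall("read", (3, "buf")),
    SysCall("send", (7, "buf")),
]
unrelated = [
    SysCall("open", ("file",), ret=3),
    SysCall("read", (4, "buf")),      # reads from a different handle
    SysCall("send", (7, "other")),    # sends a different buffer
]

print(leaks_file_over_socket(related))    # True
print(leaks_file_over_socket(unrelated))  # False
```

Both traces contain the same three operations; only the argument constraints tell them apart.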
A Sample Specification (Malspec)
• Mass-mailing malware:
[Figure: malspec dependence graph. Nodes are system calls; edges carry the constraints listed on the right.]
S := process_name()
Z := open(S2)            S2 = S
Y := read(Z2)            Z2 = Z
X := socket()
connect(X2)              X2 = X
send(X3, "EHLO")         X3 = X2
send(X4, "DATA")         X4 = X3
send(X5, T)              X5 = X4, StringEqual(T, Base64(Y))
Mining Algorithm
The Specification Mining Problem
[Figure: known malware and known benign programs are the inputs to MINIMAL; the output is a specification of malicious behavior, such as the sample mass-mailing malspec.]
The Basic Mining Operation
Step 1: Compute dependence graphs (known malware → malware dependence graph; known benign program → benign dependence graph).
Step 2: Compute graph difference (malware dependence graph vs. benign dependence graph → minimal malspec).
[Figure: malware and benign dependence graphs with nodes labeled by system-call letters; the resulting minimal malspec has nodes O, O, O, Q, M, C, P.]
Multi-Program Mining
Maximal union of malspecs:
[Figure: malspecs mined from individual programs, drawn as small graphs of system-call nodes, are combined into their maximal union.]
System-Call Dependence Graph
• We use a dynamic analysis to construct the dependence graph
  – Static analysis too imprecise on binary code
• Steps:
  1. Collect system-call trace
  2. Infer dependencies between system calls
  3. Construct (an underapproximation of the) dependence graph
Step 1
Discovering Dependences
NtOpenKey( 372, 0x20019, {24, 356, "ActiveComputerName", 0x40, 0, 0} )
NtQueryValueKey( 372, "ComputerName", Full, {TitleIdx=0, Type=1, Name="ComputerName", Data="Z..."}, 108, 76 )
NtClose( 372 )
• Def-use dependences (the handle 372 flows from NtOpenKey to NtQueryValueKey and NtClose)
• Substring dependences ("ComputerName" is a substring of "ActiveComputerName")
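A rough sketch of how the two kinds of dependence could be inferred from the three-call trace on this slide. The record layout (which handles a call defines, which it uses, which strings it carries) is a hypothetical preprocessing step and is assumed to be available. The substring rule used here (a string in a later call occurs inside a string of an earlier call) is an assumption drawn from the slide's example, not the paper's exact definition.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical trace record: (call name, handles defined, handles used, string args)
Call = Tuple[str, Set[int], Set[int], List[str]]
Edge = Tuple[int, int]  # (index of earlier call, index of later call)


def infer_dependences(trace: List[Call]) -> Tuple[List[Edge], List[Edge]]:
    """Return (def-use edges, substring edges) between calls in the trace."""
    def_use: List[Edge] = []
    substring: List[Edge] = []
    last_def: Dict[int, int] = {}  # handle -> index of the call that last defined it
    for j, (_, defs, uses, strings) in enumerate(trace):
        # Def-use: this call uses a handle that an earlier call produced.
        for handle in sorted(uses):
            if handle in last_def:
                def_use.append((last_def[handle], j))
        # Substring: one of this call's strings occurs inside an earlier string.
        for i in range(j):
            earlier = trace[i][3]
            if any(s and s in e for s in strings for e in earlier):
                substring.append((i, j))
        for handle in defs:
            last_def[handle] = j
    return def_use, substring


trace: List[Call] = [
    ("NtOpenKey", {372}, set(), ["ActiveComputerName"]),
    ("NtQueryValueKey", set(), {372}, ["ComputerName"]),
    ("NtClose", set(), {372}, []),
]
def_use, substring = infer_dependences(trace)
print(def_use)    # [(0, 1), (0, 2)]
print(substring)  # [(0, 1)]
```

Because only dependences visible in the observed values are recorded, the resulting graph underapproximates the true dependence graph, which matches the caveat stated on the preceding slide.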
Step 1
Discovering Local Constraints
• Access to well-defined resources:
  – Windows registry
  – Access to self
  – System files/directories
Step 1
NtCreateFile ( …, { …, "I-Worm.Mydoom.l.exe", … }, … )
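One way to picture a local constraint is as a coarse label attached to a string argument. The sketch below is illustrative only: the label names, the `\Registry\` prefix test, and the default system root `C:\WINNT` are assumptions, not details taken from the slides.

```python
import ntpath


def local_constraint(arg: str, process_image: str,
                     system_root: str = "C:\\WINNT") -> str:
    """Label a string argument with a coarse resource class (hypothetical labels)."""
    lowered = arg.lower()
    # "Access to self": the argument names the running program's own image file.
    if ntpath.basename(lowered) == ntpath.basename(process_image.lower()):
        return "self"
    # Registry keys, assuming kernel-style \Registry\... paths in the trace.
    if lowered.startswith("\\registry\\"):
        return "registry"
    # System files/directories, assuming they live under system_root.
    if lowered.startswith(system_root.lower() + "\\"):
        return "system"
    return "other"


image = "C:\\samples\\I-Worm.Mydoom.l.exe"
print(local_constraint("I-Worm.Mydoom.l.exe", image))  # self
```

With such labels, a malspec can say "creates a copy of itself" rather than naming one specific file, which is what makes the NtCreateFile example above generalize beyond a single sample.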
Graph Differencing
Problem: Find the smallest subgraph of malicious operations that does not appear in any benign graph.
Solution: Minimal Contrast Subgraph [Ting, Bailey SDM 2006]
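To give a feel for the differencing problem, here is a deliberately simplified sketch. It is not the minimal contrast subgraph algorithm of Ting and Bailey: it only reports single labeled edges of the malware graph that occur in no benign graph, ignoring larger structure. The edge encoding and the labels in the example are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical edge encoding: (source call, constraint label, target call)
Edge = Tuple[str, str, str]


def contrast_edges(malware: Set[Edge],
                   benign_graphs: Iterable[Set[Edge]]) -> Set[Edge]:
    """Edges of the malware graph that appear in none of the benign graphs."""
    seen_in_benign: Set[Edge] = set()
    for graph in benign_graphs:
        seen_in_benign |= graph
    return malware - seen_in_benign


malware = {
    ("open", "def-use", "read"),
    ("socket", "def-use", "connect"),
    ("read", "base64", "send"),
}
benign = [
    {("open", "def-use", "read")},
    {("socket", "def-use", "connect")},
]
print(contrast_edges(malware, benign))  # {('read', 'base64', 'send')}
```

A single-edge contrast is the smallest case; the real problem also has to consider subgraphs whose individual edges all occur in benign programs but whose combination does not.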
Step 2
Minimal Contrast Subgraphs
• Idea: Minimal contrast subgraphs and maximal common edge sets are duals.
• Vertex and edge labels (i.e., system calls and constraint formulas) help the search.
Step 2
Mining Contrast Subgraphs
• Size of graphs: 100K-1.5M nodes, similar for edges
• Worst-case complexity: O(N!)
Step 2
[Figure: malware dependence graph and benign dependence graph.]
Heuristics Reduce Problem Size
• Normalize dependence graph
  – Replace system-call sequences with shorter equivalents
• Eliminate disconnected subgraphs
• Eliminate trivial subgraphs
[see paper for details]
Evaluation
Evaluating MINIMAL
• Goals:
  – Compare MINIMAL malspecs with those from human expert
  – Use mined malspecs with behavior-based detector
Experimental Setup
• Trace collection in Windows 2000:
  – Malware samples run with no user input (cf. expected execution model)
  – Benign samples run with normal user input
  – Execution for 1 or 2 minutes
• 16 malware samples:
  – Netsky, MyDoom, Beagle
• 6 benign programs:
  – Firefox, Thunderbird, installers
MINIMAL vs. Human Expert
Average success rate: 77.26%
Average mining time: 8 minutes

Sample      MINIMAL malspecs    Behavioral features as given by Symantec website
Netsky.A    5                   7
Netsky.H    5                   6
Bagle.C     7                   12
MyDoom.E    7                   10
Mined Malspecs for Netsky.A
MINIMAL malspecs:
Create mutex
Self-installation
Modify boot sequence
Terminate antivirus
Email self as ZIP file
Copy self to network drive
Limitations of MINIMAL
• Sensitive to test environment
  – Malicious behavior might not be observed during tracing.
• Underapproximation of dependence graph
  – Complex constraints are not discovered.
• Sensitive to test-set selection
  – Not all differences are malicious behaviors.
Future Work
Mining Specifications of Malicious Behavior
Mihai Christodorescu
Here Be Dragons!
Mining Malspecs from Malware
Dynamic differential analysis
Malware system-call trace
Benign program system-call trace
Malspec:
Specifying Malicious Behavior
• Example
• Now manual
• NEED: automatic
[Figure: system-call trace A C Q A O O O Q C F O Q O M C P F O M C]
1. Only Relevant Operations
• System calls
Approach: Collect system-call traces.
2. Dependences
3. Independence of Operations
Good Specifications
• One can write specifications satisfying the requirements.
• The algorithm to generate specifications must write specifications satisfying the requirements.
Misuse-Detection Fundamentals
• Database of specifications (“signatures”) defines what is malicious.
[Figure: unknown executable, malware detector with signature database (SigDB), protected computers.]
Failures of Current Detectors
• Byte signatures are not rich enough to capture malicious behavior.
Hackers evade detection through obfuscation.
[Figure: the set of all mass-mailing worms, compared with an underapproximating byte signature and an overapproximating byte signature.]
Behavior-Based Detectors
• New detection techniques use higher-level specifications
  – Semantics-Aware Malware Detection (2005)
  – Model Checking-based Malware Detection (2005)
  – Behavior-based Spyware Detection (2006)
These still depend on the quality of the specifications!
MINIMAL: Malware-Mining Algorithm
Step 1: Monitor system calls during execution
Known malware
NtOpenKey( 372, 0x20019, {24, 356, "ActiveComputerName", 0x40, 0, 0} )
System-call trace: A C Q A O O O Q C F O Q O M C P F O M C
MINIMAL: Malware-Mining Algorithm
Step 2: Discover dependences between syscalls
System-call trace: A C Q A O O O Q C F O Q O M C P F O M C
[Figure: data dependence graph built from the trace.]
MINIMAL: Malware-Mining Algorithm
Step 3: Find malicious–benign graph difference
[Figure: malware dependence graph and benign dependence graph; their difference is the malspec, with nodes O, O, O, Q, M, C, P.]
Dependence Graph Normalization
• Many sequences of system calls are equivalent:
read( …, 1 ) read( …, 1 ) read( …, 1 ) read( …, 1 ) read( …, 1 ) ≡ read( …, 5 )
Aggregation replaces such sequences with a canonical sequence. (see paper for details)
Step 2
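The read example can be sketched as a small aggregation pass. The record layout is hypothetical: the slide elides the other arguments, so the sketch assumes each call carries a handle and a byte count, and it merges only consecutive reads on the same handle. The actual aggregation rules are in the paper.

```python
from typing import List, Tuple

# Hypothetical record: (call name, handle, byte count)
Call = Tuple[str, int, int]


def aggregate_reads(trace: List[Call]) -> List[Call]:
    """Merge runs of consecutive read() calls on the same handle into one call."""
    out: List[Call] = []
    for name, handle, count in trace:
        prev = out[-1] if out else None
        if prev and name == "read" and prev[0] == "read" and prev[1] == handle:
            # Extend the previous read instead of adding a new entry.
            out[-1] = ("read", handle, prev[2] + count)
        else:
            out.append((name, handle, count))
    return out


five_small_reads = [("read", 3, 1)] * 5
print(aggregate_reads(five_small_reads))  # [('read', 3, 5)]
```

After this pass, a program that reads a file one byte at a time and one that reads it in a single call yield the same node, so the difference between them no longer shows up as a spurious contrast.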
MINIMAL Specs in Detection
• Missed malspecs:
  – Due to unrecovered dependences (e.g., ZIP compression of data)
  – Due to incompleteness of trace data (e.g., certain behaviors did not execute)
• Using mined malspecs in semantics-aware malware detection: Netsky.A malspec → Netsky.D, E, F, …
Conclusions
• The mining of malicious behavior can be automated
• Mined malspecs compare well with those from human experts
• Mining time significantly reduced over manual specification