Linear Obfuscation to Combat Symbolic Execution
Zhi Wang1, Jiang Ming2, Chunfu Jia1 and Debin Gao3
1 Nankai University, 2 Pennsylvania State University, 3 Singapore Management University
European Symposium on Research in Computer Security 2011
Outline
• Introduction
• Linear Obfuscation
• Evaluation
• Conclusion
Trigger-based Code and Symbolic Execution
• Trigger-based code only executes when specific inputs are received.
• Symbolic execution
  – Combined with dynamic taint analysis and theorem proving
  – Discovers trigger-based code
  – Finds the trigger condition
Conditional Code Obfuscation
• Sharif et al. proposed a conditional code obfuscation scheme:
  – Obfuscates equality conditions
  – Uses a one-way hash function
  – Makes trigger conditions hard to reason about
  – Drawback: the presence of cryptographic functions might make the malware easier to detect
  – Drawback: inequality conditions are not handled
Our goals
• Be less suspicious by not using cryptographic functions.
• Support both equality and inequality conditions.
Linear Obfuscation
• Use linear operations to combat symbolic execution without any cryptographic functions.
  – The obfuscated code becomes less suspicious in malware detection.
• Introduce unsolved conjectures into trigger conditions, so that inequality conditions can also be obfuscated easily.
Unsolved Conjectures
• Many unsolved conjectures involve simple linear operations.
• Such operations are usually fast and commonly used in basic algorithms.
• They are perfect candidates to be used in linear obfuscation.
• Another advantage is that they can be used to obfuscate inequality conditions.
Collatz Conjecture (3x+1 Conjecture)
Take any natural number n. If n is even, divide it by 2; if n is odd, multiply it by 3 and add 1. Repeat the process. The conjecture states that the resulting sequence a_i eventually reaches 1 regardless of the starting value n.
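The iteration described above can be sketched in C as follows. The function names `collatz_next` and `collatz_stopping_time` are illustrative and do not come from the slides; the sketch assumes n >= 1 and that intermediate values fit in 64 bits.

```c
#include <stdint.h>

/* One step of the 3x+1 map: halve even numbers, map odd n to 3n + 1. */
uint64_t collatz_next(uint64_t n) {
    return (n % 2 == 0) ? n / 2 : 3 * n + 1;
}

/* Number of steps until the sequence starting at n first reaches 1.
 * Assumes n >= 1 and no 64-bit overflow along the way. */
unsigned collatz_stopping_time(uint64_t n) {
    unsigned steps = 0;
    while (n > 1) {
        n = collatz_next(n);
        steps++;
    }
    return steps;
}
```

For instance, starting from 6 the sequence is 6, 3, 10, 5, 16, 8, 4, 2, 1, so the stopping time is 8.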
Unsolved conjectures
• Other unsolved conjectures are similar to the Collatz conjecture in that they are all believed to converge to a fixed value regardless of the starting value.
Overview
• Linear obfuscation does not hide the malicious behavior itself; it hides the trigger conditions.
• Linear obfuscation complicates symbolic execution in 3 steps:
  – Inserting a spurious input variable
  – Choosing an unsolved conjecture
  – Rebuilding the trigger condition
A linear obfuscation example
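The original figure for this slide is not part of the transcript. As a stand-in, here is a minimal sketch of what the three steps might produce for an equality trigger x == 30, using the spurious variable y = x + 1000 and the Collatz loop mentioned elsewhere in the deck. The function names are hypothetical, the overflow guard is an addition of this sketch, and the rebuilt condition is evaluated only after the loop has converged to y == 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original trigger: fires only for the single input x == 30. */
bool trigger_original(uint32_t x) {
    return x == 30;
}

/* Obfuscated trigger (illustrative sketch, not the authors' exact code). */
bool trigger_obfuscated(uint32_t x) {
    int64_t sx = x;
    int64_t y = sx + 1000;              /* step 1: spurious variable derived from x */
    while (y > 1) {                     /* step 2: Collatz loop, expected to end at 1 */
        if (y % 2 == 0) {
            y /= 2;
        } else {
            if (y > (INT64_MAX - 1) / 3) /* guard added here against overflow */
                return false;
            y = 3 * y + 1;
        }
    }
    /* step 3: rebuilt condition; with y == 1 it is equivalent to x == 30 */
    return (sx + y >= 31) && (sx - y <= 29);
}
```

Both functions agree on every input for which the loop converges, but the obfuscated version forces an analyzer to reason about the loop before it can recover the constant 30.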
Semantics
• Symbolic execution has a hard time figuring out the trigger condition. Can we figure it out ourselves?
  – The new trigger conditions introduced by unsolved conjectures are undecidable for symbolic execution.
  – But within the integer ranges programs commonly use (2^32 or 2^64), the new trigger conditions are decidable.
  – The 3x+1 conjecture has been tested and found to always reach 1 for all integers <= 20 * 2^58.
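To illustrate why the condition is decidable over a bounded range, the following sketch exhaustively checks that every starting value up to a limit reaches 1 within a step budget. The names `reaches_one` and `all_reach_one`, the step budget, and the overflow guard are assumptions made for this sketch; it is far smaller in scale than the published verification effort.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the Collatz sequence from n reaches 1 after at most max_steps steps.
 * Returns false if the budget runs out or 3n + 1 would overflow 64 bits. */
bool reaches_one(uint64_t n, unsigned max_steps) {
    for (unsigned i = 0; i <= max_steps; i++) {
        if (n == 1)
            return true;
        if (n % 2 == 0) {
            n /= 2;
        } else {
            if (n > (UINT64_MAX - 1) / 3)
                return false;
            n = 3 * n + 1;
        }
    }
    return false;
}

/* Brute-force check over the whole range 1..limit. */
bool all_reach_one(uint64_t limit, unsigned max_steps) {
    for (uint64_t n = 1; n <= limit; n++) {
        if (!reaches_one(n, max_steps))
            return false;
    }
    return true;
}
```

A defender who knows the conjecture holds on the program's integer range can conclude that the loop ends with y == 1, which is exactly the knowledge a generic symbolic executor lacks.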
How to insert a spurious variable
• Only variables derived from program input are treated as symbols in symbolic execution.
• Spurious variables must depend on real program inputs.
• A more complicated relationship between y and x does not necessarily make symbolic execution take longer. Examples of such complicated relationships:
  – Floating point operations
  – Complex pointer operations
How to insert a spurious variable (2)
• Symbolic execution will use concrete values to simplify the constraints.
• So the relationship between x and y should be simple enough.
How to choose an unsolved conjecture
• Convergent: the loop converges.
• Partially decidable: although no proof exists, testing has confirmed that the terminating condition is reached for all values within a certain range.
• Machine implementable: it can be easily implemented in common programming languages.
• Simple/Linear: the implementation is simple and involves linear operations
Variation
• Intuitively, the trigger condition is tied to the convergence value.
  – The convergence value is not the only option. For the Collatz conjecture, any of 1, 2, or 4 can be used as the terminating condition.
  – The stopping time can also be used as the terminating condition.
• Loop headers for the two variants: while (y > 1) versus for (i=0; i<1000; i++)
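The fixed-iteration variant can be sketched as below. Because the Collatz sequence cycles through 1, 4, 2 once it reaches 1, a loop that runs a fixed number of times ends on one of those three values if the sequence converged within the budget. The names `collatz_run` and `in_terminal_cycle` are hypothetical, and the sketch assumes no 64-bit overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Apply the 3x+1 map exactly iters times, in the style of
 * for (i = 0; i < 1000; i++). Assumes values stay within 64 bits. */
uint64_t collatz_run(uint64_t y, unsigned iters) {
    for (unsigned i = 0; i < iters; i++) {
        y = (y % 2 == 0) ? y / 2 : 3 * y + 1;
    }
    return y;
}

/* After convergence the sequence cycles 1 -> 4 -> 2 -> 1, so any of the
 * three values can serve as the terminating condition. */
bool in_terminal_cycle(uint64_t y) {
    return y == 1 || y == 2 || y == 4;
}
```

Whether a given iteration count is large enough depends on the range of starting values; the count of 1000 is taken from the slide and is not verified here for every 32-bit input.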
Rebuild Trigger Condition
• What do we have now?
  – a new spurious variable y = x+1000
  – an unsolved conjecture with a trigger condition y == 1
• Depending on the original trigger condition, we modify it in three different ways.
Rebuild Trigger Condition (2)
• > or >= (e.g., x > 30): Since the spurious variable y is always greater than or equal to 1, the condition becomes x - y > 29 (29 = 30 - 1).
• < or <= (e.g., x < 30): Similarly, the condition becomes x + y < 31 (31 = 30 + 1).
• == (e.g., x == 30): This is equivalent to the intersection of two inequalities, (x >= 30) && (x <= 30), and therefore we have (x + y >= 31) && (x - y <= 29), which matches x == 30 once y has converged to 1.
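The three rewrites can be written as small predicates over x and the spurious variable y. This is a sketch with hypothetical function names; it takes y as a parameter so that each form can be compared against the original condition at y == 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* x > 30 rebuilt with the spurious variable y (y >= 1). */
bool gt30_rebuilt(int64_t x, int64_t y) {
    return x - y > 29;
}

/* x < 30 rebuilt with the spurious variable y (y >= 1). */
bool lt30_rebuilt(int64_t x, int64_t y) {
    return x + y < 31;
}

/* x == 30 rebuilt as the intersection of two inequalities.
 * Note: this form matches x == 30 only when y == 1. */
bool eq30_rebuilt(int64_t x, int64_t y) {
    return (x + y >= 31) && (x - y <= 29);
}
```

The two inequality forms stay sound for any y >= 1: x - y > 29 implies x > 30, and x + y < 31 implies x < 30. The equality form, as written on the slide, should be evaluated at y == 1; for larger y it can hold for other values of x.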
Overhead in Size
Malware   Size of original binary   Increase in size (bytes) after obfuscation
                                    Before memory alignment   After memory alignment
Blaster   29,426                    72                        64
Mydoom    28,240                    46                        64
NetSky    36,182                    60                        64
• Small: the obfuscated code is less than one hundred bytes larger than the original program.
Dynamic trigger condition
• The obfuscated trigger condition is a sequence of dynamic conditions in the execution trace.
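One way to picture a dynamic condition is to test the rebuilt trigger on every loop iteration, so that the execution trace contains a different concrete comparison each time. The sketch below does this for an inequality trigger x > 30 with y = x + 1000. The function name and the overflow guard are assumptions of this sketch, not details from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inequality trigger x > 30, checked inside the Collatz loop.
 * Each iteration compares x against a different value of y. */
bool trigger_gt30_dynamic(uint32_t x) {
    int64_t sx = x;
    int64_t y = sx + 1000;               /* spurious variable derived from x */
    while (y > 1) {
        if (y % 2 == 0) {
            y /= 2;
        } else {
            if (y > (INT64_MAX - 1) / 3) /* guard added here against overflow */
                return false;
            y = 3 * y + 1;
        }
        if (sx - y > 29)                 /* holds only if x > 30, since y >= 1 */
            return true;
    }
    return false;
}
```

Because x - y > 29 implies x > 30 whenever y >= 1, an early hit never fires the trigger for an input that the original condition would reject.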
Pattern Match
• Linear obfuscation might be susceptible to pattern recognition, assuming that the unsolved conjecture we use is known to attackers.
• Solutions:
  – randomly choosing among various unsolved conjectures
  – combining with other existing obfuscation techniques (e.g., opaque constants)
Control Flow Comparison
• The control flow of the obfuscated code is similar to that of common program algorithms.
[Figure panels: a quick sort algorithm; our obfuscated code]
Limitation
• In our analysis, we assume that there is a single trigger condition, and show that symbolic execution has a hard time figuring it out.
• However, the results may change when there is a larger set of trigger inputs that satisfy the trigger condition.
• For example, x > 5.
Conclusion
• In this paper, we introduce a novel linear obfuscation scheme that makes it difficult for symbolic execution to find trigger conditions.
• Our obfuscator applies the concept of unsolved conjectures and only adds a loop to the obfuscated code without cryptographic functions.
• Our security analysis shows that no other analysis strategy makes the analysis simpler.
Thank you!