
Linear Obfuscation to Combat Symbolic Execution

Zhi Wang¹, Jiang Ming², Chunfu Jia¹ and Debin Gao³

¹ Nankai University, ² Pennsylvania State University, ³ Singapore Management University

European Symposium on Research in Computer Security 2011

Outline

• Introduction
• Linear Obfuscation
• Evaluation
• Conclusion

Trigger-based Code and Symbolic Execution

• Trigger-based code only executes when specific inputs are received.

• Symbolic execution
  – Combined with dynamic taint analysis and theorem proving
  – Discovers trigger-based code
  – Finds the trigger condition

Conditional Code Obfuscation

• Sharif et al. proposed a conditional code obfuscation scheme:
  – Obfuscates equality conditions
  – Uses a one-way hash function
  – Makes it hard to reason about trigger conditions
• Drawbacks:
  – Cryptographic functions might make the malware easier to detect
  – Inequality conditions are not supported
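To make the contrast concrete, here is a minimal C sketch of a hashed equality trigger in the spirit of Sharif et al. It is an illustration under stated assumptions, not their implementation: `toy_hash` and `hashed_trigger` are hypothetical names, the non-cryptographic FNV-1a hash stands in for the one-way hash, and the scheme's encryption of the guarded code with the input value is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a over the 4 bytes of x. This is a NON-cryptographic stand-in
   for the one-way hash in Sharif et al.'s scheme (illustration only). */
uint32_t toy_hash(uint32_t x)
{
    uint32_t h = 2166136261u;              /* FNV offset basis */
    for (int i = 0; i < 4; i++) {
        h ^= (x >> (8 * i)) & 0xFFu;       /* mix in one byte of x */
        h *= 16777619u;                    /* FNV prime, wraps mod 2^32 */
    }
    return h;
}

/* Original trigger:   if (x == 30) payload();
   Obfuscated trigger: only the hash of the constant is stored in the
   binary, so the constant 30 no longer appears in the comparison. */
bool hashed_trigger(uint32_t x, uint32_t stored_hash)
{
    return toy_hash(x) == stored_hash;
}
```

At obfuscation time the stored value would be `toy_hash(30)`. Because the hash destroys ordering, a condition such as x > 30 cannot be checked by comparing hashes, which is why this style of obfuscation covers only equality conditions.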

Our goals

• Be less suspicious by not using cryptographic functions
• Support both equality and inequality conditions

Linear Obfuscation

• Use linear operations to combat symbolic execution without any cryptographic functions.
  – The obfuscated code is less suspicious to malware detection.
• Introduce unsolved conjectures into trigger conditions so that inequality conditions can also be obfuscated easily.

Unsolved Conjectures

• Many unsolved conjectures involve simple linear operations.

• Such operations are usually fast and commonly used in basic algorithms.

• They are perfect candidates to be used in linear obfuscation.

• Another advantage is that they can be used to obfuscate inequality conditions.

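Collatz Conjecture (3x+1 Conjecture)

Take any natural number n. If n is even, divide it by 2; if n is odd, multiply it by 3 and add 1. Repeating the process gives the sequence a_0 = n, with a_{i+1} = a_i / 2 if a_i is even and a_{i+1} = 3a_i + 1 if a_i is odd. The conjecture states that a_i eventually reaches 1 regardless of the value of n.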
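The iteration can be written as a short C loop. This sketch counts how many steps a start value n needs to reach 1 (its total stopping time); the name `collatz_steps` is hypothetical, and the sketch assumes n >= 1 and that intermediate values fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Number of 3x+1 steps needed for n to reach 1.
   Assumes n >= 1 and no 64-bit overflow along the trajectory. */
unsigned collatz_steps(uint64_t n)
{
    assert(n >= 1);
    unsigned steps = 0;
    while (n > 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;  /* even: halve; odd: 3n+1 */
        steps++;
    }
    return steps;
}
```

For example, `collatz_steps(27)` returns 111: the sequence starting at 27 climbs to 9232 before falling back to 1.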

Unsolved conjectures

• These conjectures are similar to the Collatz conjecture in that they all converge to a fixed value regardless of the starting value.

Overview

• Linear obfuscation does not hide the malicious behavior itself; it hides the trigger conditions.

• Linear obfuscation complicates symbolic execution in three steps:
  – Inserting a spurious input variable
  – Choosing an unsolved conjecture
  – Rebuilding the trigger condition

A linear obfuscation example
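The three steps can be sketched in C for the example trigger x > 30, using the spurious variable y = x + 1000 and the Collatz loop. The function names are hypothetical, and the sketch assumes x is small enough (roughly -999 <= x <= 1,000,000) that y starts at 1 or more and its trajectory stays within a `long long`.

```c
#include <assert.h>
#include <stdbool.h>

/* Original (unobfuscated) trigger check. */
bool original_trigger(int x)
{
    return x > 30;
}

/* Linearly obfuscated version of the same check.
   Assumes -999 <= x <= 1000000 (see lead-in). */
bool obfuscated_trigger(int x)
{
    /* Step 1: spurious variable derived from the real input x. */
    long long y = (long long)x + 1000;
    assert(y >= 1);

    /* Step 2: unsolved conjecture (Collatz); the loop ends with y == 1. */
    while (y > 1) {
        if (y % 2 == 0)
            y = y / 2;
        else
            y = 3 * y + 1;
    }

    /* Step 3: rebuilt trigger condition; with y == 1 this is x > 30. */
    return x - y > 29;
}
```

A symbolic executor sees `x - y > 29` with y produced by a data-dependent loop, rather than the plain comparison `x > 30`.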

Semantics

• Symbolic execution has a hard time figuring out the trigger condition. Can we still figure it out?
  – The new trigger conditions introduced by unsolved conjectures are undecidable for symbolic execution.
  – But within the integer ranges common in programs (2^32 or 2^64), the new trigger conditions are decidable.
  – The 3x+1 conjecture has been tested and found to always reach 1 for all integers <= 20 × 2^58.
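Within a bounded range, the terminating condition can simply be checked exhaustively. This C sketch (hypothetical function `collatz_verified_up_to`) checks every start value up to a small limit. It only illustrates bounded decidability and is not the method behind the large-scale verification cited above. It conservatively reports failure on 64-bit overflow or when a step budget runs out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if every start value in [1, limit] reaches 1 within
   max_steps steps without overflowing 64 bits; false otherwise. */
bool collatz_verified_up_to(uint64_t limit, unsigned max_steps)
{
    for (uint64_t n = 1; n <= limit; n++) {
        uint64_t y = n;
        unsigned steps = 0;
        while (y > 1) {
            if (steps++ == max_steps)
                return false;              /* no convergence within budget */
            if (y % 2 == 0) {
                y /= 2;
            } else {
                if (y > (UINT64_MAX - 1) / 3)
                    return false;          /* 3y+1 would overflow */
                y = 3 * y + 1;
            }
        }
    }
    return true;
}
```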

How to insert a spurious variable

• Only variables derived from program input are treated as symbolic in symbolic execution.

• Spurious variables must depend on real program inputs.

• A more complicated relationship between the spurious variable y and the input x does not necessarily make symbolic execution take longer, e.g., when the relationship involves:
  – Floating-point operations
  – Complex pointer operations

How to insert a spurious variable (2)

• Symbolic execution will use concrete values to simplify the constraints.

• So the relationship between x and y should be simple enough.

How to choose an unsolved conjecture

• Convergent: the loop converges.

• Partially decidable: although no proof exists, the conjecture has been tested, so the terminating condition is known within a certain range.

• Machine implementable: it can be easily implemented in common programming languages.

• Simple/Linear: the implementation is simple and involves linear operations

Variation

• Intuitively, the trigger condition is related to the convergence value.
  – The convergence value is not the only choice: for the Collatz conjecture, any of 1, 2 and 4 can serve as the terminating condition.
  – The stopping time can also be used as the terminating condition.

while (y > 1)
for (i = 0; i < 1000; i++)
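The two loop headers correspond to two ways of terminating. Here is a minimal C sketch of the second, fixed-iteration form. After the loop, y is one of 1, 2 or 4 rather than necessarily 1. The sketch assumes that 1000 iterations exceed the stopping time of every input of interest, which must be checked for the intended input range. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Run the 3x+1 map exactly `iterations` times. Once y reaches 1 it keeps
   cycling 1 -> 4 -> 2 -> 1, so the result lies in {1, 2, 4} whenever the
   start value reaches 1 within `iterations` steps. */
uint64_t collatz_fixed(uint64_t y, unsigned iterations)
{
    assert(y >= 1);
    for (unsigned i = 0; i < iterations; i++)
        y = (y % 2 == 0) ? y / 2 : 3 * y + 1;
    return y;
}

/* Terminating condition that accepts any value of the final cycle. */
bool in_final_cycle(uint64_t y)
{
    return y == 1 || y == 2 || y == 4;
}
```

For example, 27 reaches 1 after exactly 111 steps, and after 1000 steps it sits at 4 in the final cycle.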

Rebuild Trigger Condition

• Now, what do we have?
  – A new spurious variable y = x + 1000
  – An unsolved conjecture whose loop terminates with y == 1

• Depending on the original trigger condition, we modify it in one of three ways.

Rebuild Trigger Condition (2)

• > or >= (e.g., x > 30): Since the spurious variable y equals 1 once the loop terminates, x > 30 becomes x - y > 29 // 29 = 30 - 1.

• < or <= (e.g., x < 30): Similarly, x < 30 becomes x + y < 31 // 31 = 30 + 1.

• == (e.g., x == 30): This is equivalent to the intersection of two inequalities, (x >= 30) && (x <= 30), and therefore we have (x + y >= 31) && (x - y <= 29).
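The three rewrite rules can be expressed as a small source-to-source helper. This C sketch is hypothetical: the name `rebuild_condition` and its string output are illustrative choices. It applies the same offsets to >= and <= as to > and <. It does not guard against overflow of c - 1 or c + 1 near INT_MIN/INT_MAX.

```c
#include <stdio.h>
#include <string.h>

/* Rewrite "x OP c" into a condition that also mentions the spurious
   variable y, relying on y == 1 after the loop. Writes the rebuilt C
   expression into buf; returns 0 on success, -1 for an unsupported op. */
int rebuild_condition(const char *op, int c, char *buf, size_t len)
{
    if (strcmp(op, ">") == 0 || strcmp(op, ">=") == 0)
        snprintf(buf, len, "x - y %s %d", op, c - 1);   /* x > c  <=>  x - 1 > c - 1 */
    else if (strcmp(op, "<") == 0 || strcmp(op, "<=") == 0)
        snprintf(buf, len, "x + y %s %d", op, c + 1);   /* x < c  <=>  x + 1 < c + 1 */
    else if (strcmp(op, "==") == 0)
        snprintf(buf, len, "(x + y >= %d) && (x - y <= %d)", c + 1, c - 1);
    else
        return -1;                                      /* e.g. != is not handled */
    return 0;
}
```

For instance, `rebuild_condition(">", 30, buf, sizeof buf)` writes `x - y > 29`, matching the first rule above.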

Overhead in Size

Malware    Size of original binary    Increase in size after obfuscation (bytes)
                                      Before memory alignment    After memory alignment
Blaster    29,426                     72                         64
Mydoom     28,240                     46                         64
NetSky     36,182                     60                         64

• Small: the obfuscated program is less than one hundred bytes larger than the original program.

Dynamic trigger condition

• The obfuscated trigger condition is a sequence of dynamic conditions in the execution trace.
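To see why the trigger condition becomes a sequence of dynamic conditions, this C sketch records the branch taken in each loop iteration (even or odd) for a given start value. Each recorded outcome is one more path condition a symbolic executor would have to collect. The name `collatz_trace` and the bitmask encoding are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of *parity is set if iteration i took the odd (3y+1) branch.
   Returns the number of loop iterations; parity bits beyond the first
   64 iterations are not recorded. */
unsigned collatz_trace(uint64_t y, uint64_t *parity)
{
    assert(y >= 1);
    unsigned steps = 0;
    *parity = 0;
    while (y > 1) {
        if (y % 2 == 0) {
            y /= 2;
        } else {
            if (steps < 64)
                *parity |= (uint64_t)1 << steps;  /* odd branch taken */
            y = 3 * y + 1;
        }
        steps++;
    }
    return steps;
}
```

For y = 6 the trace has 8 iterations, with the odd branch taken in iterations 1 and 3 (counting from 0), so the mask is 0b1010. A different input generally yields a trace of different length and shape.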

Pattern Match

• Linear obfuscation might be susceptible to pattern recognition, assuming that the unsolved conjecture we use is known to attackers.

• Solutions:
  – Randomly choosing among various unsolved conjectures
  – Combining with other existing obfuscation techniques (e.g., opaque constants)

Control Flow Comparison

• The control flow of the obfuscated code is similar to that of a common program algorithm.

[Figure: control flow of a quick sort algorithm vs. our obfuscated code]

Limitation

• In our analysis, we assume that there is a single trigger condition, and show that symbolic execution has a hard time figuring it out.

• However, the results may change when there is a larger set of trigger inputs that satisfy the trigger condition.

• For example, x > 5.
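A minimal C sketch of why a large trigger set weakens the scheme: rebuilding x > 5 as x - y > 4 (4 = 5 - 1) leaves the set of triggering inputs unchanged. Even naive enumeration of concrete inputs, used here as a stand-in for more systematic exploration, therefore hits the trigger after a few tries. Both function names are hypothetical, and inputs are assumed to be >= -999 so that y starts at 1 or more.

```c
#include <stdbool.h>

/* Obfuscated form of the trigger x > 5: y = x + 1000 is driven to 1 by
   the Collatz loop, then x - y > 4 holds exactly when x > 5. */
bool obfuscated_gt5(int x)
{
    long long y = (long long)x + 1000;
    while (y > 1)
        y = (y % 2 == 0) ? y / 2 : 3 * y + 1;
    return x - y > 4;
}

/* Try concrete inputs 0, 1, 2, ... and return the first one that fires
   the trigger, or -1 if none is found below limit. */
int first_trigger_input(int limit)
{
    for (int x = 0; x < limit; x++)
        if (obfuscated_gt5(x))
            return x;
    return -1;
}
```

Here the seventh candidate, x = 6, already fires the trigger. With a single triggering value such as x == 30, the search would instead have to reach that exact value.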

Conclusion

• In this paper, we introduce a novel linear obfuscation scheme that makes it difficult for symbolic execution to find trigger conditions.

• Our obfuscator applies the concept of unsolved conjectures and only adds a loop to the obfuscated code, without any cryptographic functions.

• Our security analysis shows that no other analysis strategy makes the analysis simpler.

Thank you!
