Investigating Provably Secure and Practical Software Protection
Lt Col Todd McDonald
AFIT/ENG
[email protected], x4639
Research Interests
Program encryption
Program protection / secure coding
Obfuscation / tamperproofing
Mobile agent security / mobile code
Information / database security
Multi-agent architectures
Trust-based computing
Three Focus Areas for Program Protection
Semantic transformation
Random program security model / randomizing obfuscators
Perfectly secure white-box obfuscators
Goal: Characterize the aspects of program protection that can be done with some measurable degree of security
Program Scenario
Program Protection
Adversarial observation:
Black-box analysis
White-box analysis
If the adversary cannot determine the function/intent of the device by input/ output analysis, we say it is black-box protected
If the adversary cannot determine the function/intent of the device by analyzing the structure of the code, we say it is white-box protected
Intent protected: combined black-box and white-box protection does not reveal the function/intent of the program
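A point function illustrates the gap between the two notions; the password example and helper names below are hypothetical, and std::hash only gestures at a real cryptographic digest:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical point function (password check).  From input/output
// queries alone an adversary must guess the secret, so it is (nearly)
// black-box protected; but this source is not white-box protected,
// since the secret sits in a literal for any reader to see.
bool check_plain(const std::string& guess) {
    return guess == "s3cret";   // secret visible to a white-box observer
}

// Toward white-box protection: ship only a digest of the secret.  In a
// real build the constant would be precomputed offline; std::hash is
// NOT cryptographic and stands in for something like salted SHA-256.
const std::size_t kSecretDigest = std::hash<std::string>{}("s3cret");

bool check_hashed(const std::string& guess) {
    return std::hash<std::string>{}(guess) == kSecretDigest;
}
```

Both functions have identical input/output behavior; only the white-box view differs.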
How to Define/Measure "Program Protection" Security
Explicitly: define the adversary's task and require that it is computationally difficult
Disadvantage: there are many threats, and some are difficult to formulate in terms of computational problems
Implicitly: define an ideal security model and require that our case is nearly as good as the ideal one
Disadvantage: the Barak et al. result shows this is impossible in general (virtual black-box obfuscation of arbitrary programs cannot exist)
Where are we at?
Obfuscation: somehow make something less recognizable
Known methods of obfuscation are the reverse of good software engineering
None guarantees the impossibility of retrieving sensitive information or algorithms (concealment is a deterrent, not strong security)
A determined specialist, given enough time and resources, can de-obfuscate any obfuscated program
Heuristic Metrics
Metric / Definition
Cyclomatic complexity: function complexity increases as the number of predicates increases
Nesting complexity: function complexity increases as conditional structure nesting levels increase
Data structure complexity: function complexity increases as static data structure complexity increases
Potency: measures the complexity of the obfuscated program versus that of the original program
Resilience: measures the ability of an obfuscation to withstand a deobfuscation attack
Overhead: measures the time or space increase of an obfuscation
Stealth: measures the recognizable difference between obfuscated code and normal code within a program
Quality: measures the combined qualities of potency, overhead, and resilience
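The cyclomatic-complexity and potency entries above can be sketched together. The keyword-counting shortcut, the predicate list, and the ratio form of potency are all simplifying assumptions here; a real tool would build the control-flow graph:

```cpp
#include <string>
#include <vector>

// Crude approximation of McCabe's cyclomatic complexity: count
// branching constructs in the source text and add 1.  Only a sketch
// for comparing an original function against its obfuscated version.
int crude_cyclomatic(const std::string& src) {
    const std::vector<std::string> preds = {"if", "for", "while",
                                            "case", "&&", "||", "?"};
    int count = 1;
    for (const auto& p : preds) {
        for (std::size_t pos = src.find(p); pos != std::string::npos;
             pos = src.find(p, pos + p.size()))
            ++count;   // one decision point per occurrence
    }
    return count;
}

// Potency read as a ratio: complexity of the obfuscated program
// versus that of the original (one common way to instantiate the
// definition in the table above).
double potency(const std::string& original, const std::string& obfuscated) {
    return static_cast<double>(crude_cyclomatic(obfuscated)) /
           crude_cyclomatic(original);
}
```

A straight-line statement scores 1; each added predicate raises the score, so inserting opaque predicates raises potency.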
Heuristic Obfuscation
Technique / Methodology
Opaque predicates: using predicates with values known at obfuscation time (always true, sometimes true, always false)
Variable renaming: renaming variables and data structures to cognitively meaningless names
Control flow mangling: reordering normal program control and execution flow to prevent decompilation and disassembly
Memory mangling: adding or reversing the order of addressing/dereferencing operations
String encryption: encrypting sensitive data strings using a data cipher and decrypting them prior to use
Multiple functions: introducing additional functions into code to obscure the original (intended) function
Code encryption: encrypting parts of the code using a data cipher and decrypting them prior to execution
Loop unrolling: confusing the normal logic of a loop by altering indexes or executing some number of loop runs
Array merging/splitting: splitting an array into two arrays, or merging two arrays into one large one, in order to confuse the index logic
Method cloning: creating different versions of the same method
Code interleaving: merging two pieces of code in parallel and using specific means to distinguish the original methods
Code concatenation: merging two pieces of code serially by taking the output of one and using it as input to the other: f(x), g(y) -> g(f(x))
Code outlining: taking a statement sequence and creating a separate function
Code inlining: replacing a function call with its actual code
Random statements: inserting execution-neutral statements with proper characteristics in random and pre-selected places
Randomized ciphers: altering well-known data ciphers in random ways to produce unique embedded key-based encryptions
Code morphing: creating self-modifying code; changes the runtime and static code structure of the obfuscated program on execution
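Two of the techniques above, opaque predicates and string encryption, are small enough to sketch directly. The specific predicate and the one-byte XOR cipher are illustrative choices, not the deck's:

```cpp
#include <string>

// Opaque predicate: n*(n+1) is always even, so this condition is
// always true at runtime, yet a static analyzer must prove that fact
// before it can simplify the branch away.  Unsigned arithmetic avoids
// signed-overflow undefined behavior.
bool opaquely_true(unsigned n) {
    return (n * (n + 1)) % 2 == 0;
}

// String "encryption" with a repeating one-byte XOR key: the plaintext
// never appears literally, only key and ciphertext.  Trivially
// breakable; real obfuscators use a proper data cipher.
std::string xor_cipher(std::string s, char key) {
    for (char& c : s) c ^= key;
    return s;            // XOR is its own inverse: same call decrypts
}

int guarded_compute(unsigned x) {
    if (opaquely_true(x))              // taken on every run
        return static_cast<int>(x) + 1;
    return -1;                         // dead code that inflates the CFG
}
```

The dead branch and the recoverable string both add apparent complexity (potency) at small overhead.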
Information Theoretic Definition of Obfuscation
Virtual black box (VBB): anything one can compute from the obfuscated program could also be computed from input-output behavior of the original program
P' = O(P)
[Figure: obfuscated program P' compared with an oracle for P accessed through a trusted third party (TTP); under VBB, the two views should be indistinguishable]
Black Box Intent Protection (a.k.a. Semantic Transformation)
[Figure: input x -> program p -> output y]
Transformation: t(p, k) = (r, p')
Recovery: r(y', k^-1) = y
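One way to read the transformation/recovery pair is as catenation with a cipher: the protected program emits E_k(p(x)), so an observer of its input/output sees only ciphertext, and the host applies k^-1 to recover y. A minimal sketch, assuming a one-byte XOR stand-in for a real semantically secure cipher and hypothetical names transform/recover:

```cpp
#include <cstdint>
#include <functional>

using Program = std::function<std::uint8_t(std::uint8_t)>;

// t(p, k): wrap p so its output is enciphered under key k.  The XOR
// "cipher" is a placeholder; any keyed encryption would slot in here.
Program transform(Program p, std::uint8_t key) {
    return [p, key](std::uint8_t x) {
        return static_cast<std::uint8_t>(p(x) ^ key);
    };
}

// r(y', k^-1): the trusted host undoes the cipher to recover the real
// output.  For XOR, the inverse key equals the key itself.
std::uint8_t recover(std::uint8_t y_enc, std::uint8_t key) {
    return y_enc ^ key;
}
```

An adversary running the transformed program black-box sees only enciphered outputs, which is the "semantically secure black box protection" picture above.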
Semantically Secure Black Box Protection
P' = O(P), realized as circuit P'
White-box protection: ??
Two “Provable” Approaches to White Box Protection
Try to hide/interleave the seam between P and E (using randomization and a random program model); how do we/can we characterize the hiding?
Completely hide all intermediate operations (using perfect white-box protection via canonical reduction)
Random Programs/Circuits
Random Programs/Circuits
Correlating Program and Data Encryption
Randomizing Obfuscators
Perfect White Box Protection
int main(int argc, char *argv[]) {
    int x, y;
    /* Get input from the user */
    x = atoi(argv[1]);
    /* Super secret algorithm */
    ........
    /* Output the result */
    cout << y;
}
Perfect White Box Protection
What is the best we can hope for to protect the "structure" of the code that performs the secret algorithm?
We want the program to act just like an oracle would
We want the program to be a "black-box" implementation
Perfect White Box Protection = Black Box Implementation
int main(int argc, char *argv[]) {
    int x, y;
    /* Get input from the user */
    x = atoi(argv[1]);
    /* Super secret algorithm */
    if (x == 1)
        y = 281827391;
    else if (x == 2)
        y = 23;
    else if (x == 3)
        y = 1867575;
    ....
    /* Output the result */
    cout << y;
}
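The if/else chain above is just an input/output table written as code. The construction generalizes: exhaustively tabulate the secret function over a small input domain and ship only the table. A sketch assuming 8-bit inputs (the names are hypothetical):

```cpp
#include <array>
#include <cstddef>

// Build the "black-box implementation" of a secret algorithm by
// recording its entire input/output behavior.  Feasible only for small
// input size n -- here n = 8 bits, i.e. 256 entries.
std::array<int, 256> tabulate(int (*secret)(int)) {
    std::array<int, 256> table{};
    for (int x = 0; x < 256; ++x)
        table[x] = secret(x);      // query the function like an oracle
    return table;
}

// The shipped program: its structure reveals nothing beyond the table
// itself, which is exactly what an oracle for the function would expose.
int black_box_eval(const std::array<int, 256>& table, int x) {
    return table[static_cast<std::size_t>(x)];
}
```

The table's size doubles with each added input bit, which is the efficiency problem the next slide raises.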
Perfect White Box Protection
Problems with this approach:
You have to know all inputs/outputs
Therefore, the algorithm could never be efficient for all input sizes n
Therefore, the algorithm could never be general for all programs
Which lends support to impossibility results...
Perfect White Box Protection
But:
Mobile code programs are targeted at small information exchanges
Input size might be limited
You may not care about the full range of possible inputs, only some...
Perfect White Box Protection
Regardless of efficiency:
We can define a methodology for perfect white-box protection
We could apply that method to programs of small input size n (where "small" is defined only by the amount of time or resources you want to apply to get the result)
Those programs would be perfectly white-box protected
Circuits
Consider circuit P; 3 representations:
• Algebraically (Boolean function)
• Structurally (circuit diagram)
• Truth table (input/output behavior)
Netlist of P:
INPUT(1), INPUT(2), INPUT(3)
OUTPUT(6), OUTPUT(7)
4 = AND(3, 2)
5 = OR(4, 1)
6 = XOR(4, 3)
7 = NAND(5, 6)
Structural view of P:
[Figure: circuit diagram with gates 4 (AND), 5 (OR), 6 (XOR), 7 (NAND) wired from inputs 1, 2, 3]
Circuits
Behavioral view of P:
[Figure: truth table of P over inputs x1, x2, x3 and outputs y6, y7]
Circuits
Functional view of P: fP
1) Derive it from the structure (reading off the gates):
y6 = x3x2 ⊕ x3
y7 = ((x3x2 + x1)(x3x2 ⊕ x3))'
2) Derive it from the truth table (sum of minterms):
y6 = x1'x2'x3 + x1x2'x3
y7 = x1'x2'x3' + x1'x2'x3 + x1'x2x3' + x1'x2x3 + x1x2'x3' + x1x2x3' + x1x2x3
Both pairs reduce to the same functions: y6 = x2'x3 and y7 = (x1x2'x3)'.
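The two derivations should describe the same function. A small check, assuming the netlist 4 = AND(3,2), 5 = OR(4,1), 6 = XOR(4,3), 7 = NAND(5,6) from the structural view:

```cpp
// Outputs read off the gate structure of P.
bool y6_structural(bool x1, bool x2, bool x3) {
    (void)x1;                 // y6 happens not to depend on x1
    bool g4 = x3 && x2;       // 4 = AND(3, 2)
    return g4 != x3;          // 6 = XOR(4, 3)
}
bool y7_structural(bool x1, bool x2, bool x3) {
    bool g4 = x3 && x2;       // 4 = AND(3, 2)
    bool g5 = g4 || x1;       // 5 = OR(4, 1)
    bool g6 = g4 != x3;       // 6 = XOR(4, 3)
    return !(g5 && g6);       // 7 = NAND(5, 6)
}

// Outputs read off the truth table (sum-of-products form).
bool y6_sop(bool x1, bool x2, bool x3) {
    return (!x1 && !x2 && x3) || (x1 && !x2 && x3);
}
bool y7_sop(bool x1, bool x2, bool x3) {
    return !(x1 && !x2 && x3);   // every minterm except x1 x2' x3
}

// Exhaustive equivalence check over all 8 input assignments.
bool derivations_agree() {
    for (int v = 0; v < 8; ++v) {
        bool x1 = v & 4, x2 = v & 2, x3 = v & 1;
        if (y6_structural(x1, x2, x3) != y6_sop(x1, x2, x3)) return false;
        if (y7_structural(x1, x2, x3) != y7_sop(x1, x2, x3)) return false;
    }
    return true;
}
```

The exhaustive loop is exactly what building the behavioral view amounts to: 2^n oracle queries.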
So what does canonical minimization do?
All you need is the truth table or behavioral view to get an SOP form
So what does canonical minimization do for us?
This is what an oracle for P would “use” when asked questions about P …
Any circuit that implements this truth table would then be a “black box implementation” of P
The “Logic” of Canonical P
if ((x1 == 0) && (x2 == 0) && (x3 == 0)) {
    y6 = 0; y7 = 1;
} else if ((x1 == 0) && (x2 == 0) && (x3 == 1)) {
    y6 = 1; y7 = 1;
}
......
Can I ever recover the structure of the original P from canonical P?
Original P
[Figure: the original circuit of P, with internal gates 4, 5, 6, 7]
Canonical P
[Figure: the canonical (two-level sum-of-products) circuit implementing the same truth table]
Perfect White Box Protection Architecture
For Designing Catenation-Based Obfuscators: P' = P + E
Questions
???