Investigating Provably Secure and Practical Software Protection
Lt Col Todd McDonald
AFIT/ENG
[email protected], x4639
Research Interests
Program encryption
Program protection / secure coding
Obfuscation / tamperproofing
Mobile agent security / mobile code
Information / database security
Multi-agent architectures
Trust-based computing
Three Focus Areas for Program Protection
Semantic transformation
Random program security model / randomizing obfuscators
Perfectly secure white-box obfuscators
Goal: Characterize the aspects of program protection that can be done with some measurable degree of security
Program Scenario
Program Protection
Adversarial observation:
Black-box analysis
White-box analysis
If the adversary cannot determine the function/intent of the device by input/ output analysis, we say it is black-box protected
If the adversary cannot determine the function/intent of the device by analyzing the structure of the code, we say it is white-box protected
Intent protected: combined black-box and white-box protection does not reveal the function/intent of the program
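A point function illustrates the gap between the two notions; the password example and helper names below are hypothetical, and std::hash only gestures at a real cryptographic digest:

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical point function (password check).  From input/output
// queries alone an adversary must guess the secret, so it is (nearly)
// black-box protected; but this source is not white-box protected,
// since the secret sits in a literal for any reader to see.
bool check_plain(const std::string& guess) {
    return guess == "s3cret";   // secret visible to a white-box observer
}

// Toward white-box protection: ship only a digest of the secret.  In a
// real build the constant would be precomputed offline; std::hash is
// NOT cryptographic and stands in for something like salted SHA-256.
const std::size_t kSecretDigest = std::hash<std::string>{}("s3cret");

bool check_hashed(const std::string& guess) {
    return std::hash<std::string>{}(guess) == kSecretDigest;
}
```

Both functions have identical input/output behavior; only the white-box view differs.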
How to Define/Measure "Program Protection" Security
Explicitly: define the adversary's task and require that it is computationally difficult
Disadvantage: there are many threats, and some are difficult to formulate in terms of computational problems
Implicitly: define an ideal security model and require that our case is nearly as good as the ideal one
Disadvantage: the Barak et al. result shows this is impossible in general (virtual black-box obfuscation of arbitrary programs cannot exist)
Where are we at?
Obfuscation: somehow make something less recognizable
Known methods of obfuscation are the reverse of good software engineering
None guarantees the impossibility of retrieving sensitive information or algorithms (concealment is a deterrent, not strong security)
A determined specialist, given enough time and resources, can de-obfuscate any obfuscated program
Heuristic Metrics
Metric / Definition
Cyclomatic complexity: function complexity increases as the number of predicates increases
Nesting complexity: function complexity increases as conditional structure nesting levels increase
Data structure complexity: function complexity increases as static data structure complexity increases
Potency: measures the complexity of the obfuscated program versus that of the original program
Resilience: measures the ability of an obfuscation to withstand a deobfuscation attack
Overhead: measures the time or space increase of an obfuscation
Stealth: measures the recognizable difference between obfuscated code and normal code within a program
Quality: measures the combined qualities of potency, overhead, and resilience
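The cyclomatic-complexity and potency entries above can be sketched together. The keyword-counting shortcut, the predicate list, and the ratio form of potency are all simplifying assumptions here; a real tool would build the control-flow graph:

```cpp
#include <string>
#include <vector>

// Crude approximation of McCabe's cyclomatic complexity: count
// branching constructs in the source text and add 1.  Only a sketch
// for comparing an original function against its obfuscated version.
int crude_cyclomatic(const std::string& src) {
    const std::vector<std::string> preds = {"if", "for", "while",
                                            "case", "&&", "||", "?"};
    int count = 1;
    for (const auto& p : preds) {
        for (std::size_t pos = src.find(p); pos != std::string::npos;
             pos = src.find(p, pos + p.size()))
            ++count;   // one decision point per occurrence
    }
    return count;
}

// Potency read as a ratio: complexity of the obfuscated program
// versus that of the original (one common way to instantiate the
// definition in the table above).
double potency(const std::string& original, const std::string& obfuscated) {
    return static_cast<double>(crude_cyclomatic(obfuscated)) /
           crude_cyclomatic(original);
}
```

A straight-line statement scores 1; each added predicate raises the score, so inserting opaque predicates raises potency.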
Heuristic Obfuscation
Technique / Methodology
Opaque predicates: using predicates with values known at obfuscation time (always true, sometimes true, always false)
Variable renaming: renaming variables and data structures to cognitively meaningless names
Control flow mangling: reordering normal program control and execution flow to prevent decompilation and disassembly
Memory mangling: adding or reversing the order of addressing/dereferencing operations
String encryption: encrypting sensitive data strings using a data cipher and decrypting them prior to use
Multiple functions: introducing additional functions into code to obscure the original (intended) function
Code encryption: encrypting parts of the code using a data cipher and decrypting them prior to execution
Loop unrolling: confusing the normal logic of a loop by altering indexes or executing some number of loop runs
Array merging/splitting: splitting an array into two arrays, or merging two arrays into one large one, in order to confuse the index logic
Method cloning: creating different versions of the same method
Code interleaving: merging two pieces of code in parallel and using specific means to distinguish the original methods
Code concatenation: merging two pieces of code serially by taking the output of one and using it as input to the other: f(x), g(y) -> g(f(x))
Code outlining: taking a statement sequence and creating a separate function
Code inlining: replacing a function call with its actual code
Random statements: inserting execution-neutral statements with proper characteristics in random and pre-selected places
Randomized ciphers: altering well-known data ciphers in random ways to produce unique embedded key-based encryptions
Code morphing: creating self-modifying code; changes the runtime and static code structure of the obfuscated program on execution
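Two of the techniques above, opaque predicates and string encryption, are small enough to sketch directly. The specific predicate and the one-byte XOR cipher are illustrative choices, not the deck's:

```cpp
#include <string>

// Opaque predicate: n*(n+1) is always even, so this condition is
// always true at runtime, yet a static analyzer must prove that fact
// before it can simplify the branch away.  Unsigned arithmetic avoids
// signed-overflow undefined behavior.
bool opaquely_true(unsigned n) {
    return (n * (n + 1)) % 2 == 0;
}

// String "encryption" with a repeating one-byte XOR key: the plaintext
// never appears literally, only key and ciphertext.  Trivially
// breakable; real obfuscators use a proper data cipher.
std::string xor_cipher(std::string s, char key) {
    for (char& c : s) c ^= key;
    return s;            // XOR is its own inverse: same call decrypts
}

int guarded_compute(unsigned x) {
    if (opaquely_true(x))              // taken on every run
        return static_cast<int>(x) + 1;
    return -1;                         // dead code that inflates the CFG
}
```

The dead branch and the recoverable string both add apparent complexity (potency) at small overhead.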
Information Theoretic Definition of Obfuscation
Virtual black box (VBB): anything one can compute from the obfuscated program could also be computed from input-output behavior of the original program
P' = O(P)
[Figure: obfuscated program P' compared with an oracle for P accessed through a trusted third party (TTP); under VBB, the two views should be indistinguishable]
Black Box Intent Protection (a.k.a. Semantic Transformation)
[Figure: input x -> program p -> output y]
Transformation: t(p, k) = (r, p')
Recovery: r(y', k^-1) = y
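One way to read the transformation/recovery pair is as catenation with a cipher: the protected program emits E_k(p(x)), so an observer of its input/output sees only ciphertext, and the host applies k^-1 to recover y. A minimal sketch, assuming a one-byte XOR stand-in for a real semantically secure cipher and hypothetical names transform/recover:

```cpp
#include <cstdint>
#include <functional>

using Program = std::function<std::uint8_t(std::uint8_t)>;

// t(p, k): wrap p so its output is enciphered under key k.  The XOR
// "cipher" is a placeholder; any keyed encryption would slot in here.
Program transform(Program p, std::uint8_t key) {
    return [p, key](std::uint8_t x) {
        return static_cast<std::uint8_t>(p(x) ^ key);
    };
}

// r(y', k^-1): the trusted host undoes the cipher to recover the real
// output.  For XOR, the inverse key equals the key itself.
std::uint8_t recover(std::uint8_t y_enc, std::uint8_t key) {
    return y_enc ^ key;
}
```

An adversary running the transformed program black-box sees only enciphered outputs, which is the "semantically secure black box protection" picture above.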
Semantically Secure Black Box Protection
P' = O(P), realized as circuit P'
White-box protection: ??
Two “Provable” Approaches to White Box Protection
Try to hide/interleave the seam between P and E (using randomization and a random program model); how do we/can we characterize the hiding?
Completely hide all intermediate operations (using perfect white-box protection via canonical reduction)
Random Programs/Circuits
Random Programs/Circuits
Correlating Program and Data Encryption
Randomizing Obfuscators
Perfect White Box Protection
int main(int argc, char *argv[]) {
    int x, y;
    /* Get input from the user */
    x = atoi(argv[1]);
    /* Super secret algorithm */
    ........
    /* Output the result */
    cout << y;
}
Perfect White Box Protection
What is the best we can hope for to protect the "structure" of the code that performs the secret algorithm?
We want the program to act just like an oracle would
We want the program to be a "black-box" implementation
Perfect White Box Protection = Black Box Implementation
int main(int argc, char *argv[]) {
    int x, y;
    /* Get input from the user */
    x = atoi(argv[1]);
    /* Super secret algorithm */
    if (x == 1)
        y = 281827391;
    else if (x == 2)
        y = 23;
    else if (x == 3)
        y = 1867575;
    ....
    /* Output the result */
    cout << y;
}
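The if/else chain above is just an input/output table written as code. The construction generalizes: exhaustively tabulate the secret function over a small input domain and ship only the table. A sketch assuming 8-bit inputs (the names are hypothetical):

```cpp
#include <array>
#include <cstddef>

// Build the "black-box implementation" of a secret algorithm by
// recording its entire input/output behavior.  Feasible only for small
// input size n -- here n = 8 bits, i.e. 256 entries.
std::array<int, 256> tabulate(int (*secret)(int)) {
    std::array<int, 256> table{};
    for (int x = 0; x < 256; ++x)
        table[x] = secret(x);      // query the function like an oracle
    return table;
}

// The shipped program: its structure reveals nothing beyond the table
// itself, which is exactly what an oracle for the function would expose.
int black_box_eval(const std::array<int, 256>& table, int x) {
    return table[static_cast<std::size_t>(x)];
}
```

The table's size doubles with each added input bit, which is the efficiency problem the next slide raises.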
Perfect White Box Protection
Problems with this approach:
You have to know all inputs/outputs
Therefore, the algorithm could never be efficient for all input sizes n
Therefore, the algorithm could never be general for all programs
Which lends support to impossibility results...
Perfect White Box Protection
But:
Mobile code programs are targeted at small information exchanges
Input size might be limited
You may not care about the full range of possible inputs, only some...
Perfect White Box Protection
Regardless of efficiency:
We can define a methodology for perfect white-box protection
We could apply that method to programs of small input size n (where "small" is defined only by the amount of time or resources you want to apply to get the result)
Those programs would be perfectly white-box protected
Circuits
Consider circuit P; 3 representations:
• Algebraically (Boolean function)
• Structurally (circuit diagram)
• Truth table (input/output behavior)
Netlist of P:
INPUT(1), INPUT(2), INPUT(3)
OUTPUT(6), OUTPUT(7)
4 = AND(3, 2)
5 = OR(4, 1)
6 = XOR(4, 3)
7 = NAND(5, 6)
Structural view of P:
[Figure: circuit diagram with gates 4 (AND), 5 (OR), 6 (XOR), 7 (NAND) wired from inputs 1, 2, 3]
Circuits
Behavioral view of P:
[Figure: truth table of P over inputs x1, x2, x3 and outputs y6, y7]
Circuits
Functional view of P: fP
1) Derive it from the structure (reading off the gates):
y6 = x3x2 ⊕ x3
y7 = ((x3x2 + x1)(x3x2 ⊕ x3))'
2) Derive it from the truth table (sum of minterms):
y6 = x1'x2'x3 + x1x2'x3
y7 = x1'x2'x3' + x1'x2'x3 + x1'x2x3' + x1'x2x3 + x1x2'x3' + x1x2x3' + x1x2x3
Both pairs reduce to the same functions: y6 = x2'x3 and y7 = (x1x2'x3)'.
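The two derivations should describe the same function. A small check, assuming the netlist 4 = AND(3,2), 5 = OR(4,1), 6 = XOR(4,3), 7 = NAND(5,6) from the structural view:

```cpp
// Outputs read off the gate structure of P.
bool y6_structural(bool x1, bool x2, bool x3) {
    (void)x1;                 // y6 happens not to depend on x1
    bool g4 = x3 && x2;       // 4 = AND(3, 2)
    return g4 != x3;          // 6 = XOR(4, 3)
}
bool y7_structural(bool x1, bool x2, bool x3) {
    bool g4 = x3 && x2;       // 4 = AND(3, 2)
    bool g5 = g4 || x1;       // 5 = OR(4, 1)
    bool g6 = g4 != x3;       // 6 = XOR(4, 3)
    return !(g5 && g6);       // 7 = NAND(5, 6)
}

// Outputs read off the truth table (sum-of-products form).
bool y6_sop(bool x1, bool x2, bool x3) {
    return (!x1 && !x2 && x3) || (x1 && !x2 && x3);
}
bool y7_sop(bool x1, bool x2, bool x3) {
    return !(x1 && !x2 && x3);   // every minterm except x1 x2' x3
}

// Exhaustive equivalence check over all 8 input assignments.
bool derivations_agree() {
    for (int v = 0; v < 8; ++v) {
        bool x1 = v & 4, x2 = v & 2, x3 = v & 1;
        if (y6_structural(x1, x2, x3) != y6_sop(x1, x2, x3)) return false;
        if (y7_structural(x1, x2, x3) != y7_sop(x1, x2, x3)) return false;
    }
    return true;
}
```

The exhaustive loop is exactly what building the behavioral view amounts to: 2^n oracle queries.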
So what does canonical minimization do?
All you need is the truth table or behavioral view to get an SOP form
So what does canonical minimization do for us?
This is what an oracle for P would “use” when asked questions about P …
Any circuit that implements this truth table would then be a “black box implementation” of P
The “Logic” of Canonical P
if ((x1 == 0) && (x2 == 0) && (x3 == 0)) {
    y6 = 0; y7 = 1;
} else if ((x1 == 0) && (x2 == 0) && (x3 == 1)) {
    y6 = 1; y7 = 1;
}
......
Can I ever recover the structure of the original P from canonical P?
Original P
[Figure: the original circuit of P, with internal gates 4, 5, 6, 7]
Canonical P
[Figure: the canonical (two-level sum-of-products) circuit implementing the same truth table]
Perfect White Box Protection Architecture
For Designing Catenation-Based Obfuscators: P' = P + E
Questions
???