![Page 1: Wire - A Formal Intermediate Language for Binary Analysis](https://reader035.vdocuments.mx/reader035/viewer/2022081413/546b19aeaf79599b248b4c83/html5/thumbnails/1.jpg)
Wire – A Formal Intermediate Language for Binary Analysis
Silvio Cesare and Yang Xiang
School of Information Technology
Deakin University
Introduction - Motivation
• Static analysis has many benefits
• Applications include:
  – Bug detection
  – Plagiarism detection
  – Code optimisation
• Mostly source-level, but binary-level analysis offers additional benefits and applications:
  – Malware detection
  – Software theft detection
  – Bug detection of compiled and link-edited programs
Introduction - Challenges
• Binary analysis is hard.
  – Even separating code from data is undecidable.
  – Perfect disassembly of x86 is undecidable.
• Many challenges.
  – Native CISC architectures have hundreds of complex instructions.
  – Native instructions have side effects which require hidden assumptions in analysis.
  – Native architectures require separate implementations on each platform.
Innovation in our work
• Wire - a new formal intermediate language (IL).
• Translation of native assembly to our IL.
• Applications - semantic equivalence proofs of obfuscated assembly.
• Applications - Malwise, a malware classification system from our previous work, uses Wire as the IL.
Related Work
• A compiler’s intermediate representation
  – Three Address Code
• Dynamic Binary Instrumentation
  – QEMU
  – VEX (used in Valgrind)
• A decompiler’s intermediate representation
  – DCC
  – Boomerang
  – IDA Pro and HexRays
• Binary analysis
  – Vine (based on VEX), BIL (BitBlaze)
  – REIL

Example of three address code:

    i := 0
L1: if i >= 10 goto L2
    t0 := i*i
    t1 := &b
    t2 := t1 + i
    *t2 := t0
    i := i + 1
    goto L1
L2:
Translating Native Code (1)
• Load object file format
  – x86 ELF32, PE32
  – Some Java class file support.
• Disassemble
  – Linear Sweep
  – Recursive Traversal
  – Speculative
• Translate each native instruction to n three address codes.

$$\mathit{native} \rightarrow \{\,(n, (\mathit{Opcode}, \mathit{Operand}_1, \mathit{Operand}_2, \mathit{Operand}_3)) \mid n \in \mathbb{N}\,\}$$
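The one-to-n translation step can be sketched as follows. This is a minimal illustration, not the Wire translator itself: the function names `translate` and `number` are hypothetical, the supported instruction subset is tiny, and emitting a zero-flag update after every arithmetic instruction is an assumption. The opcode names and operand order follow the listings on the later application slides.

```python
from typing import List, Optional, Tuple

# A three address code: (opcode, operand1, operand2, operand3).
# Unused operand slots are None.
Tac = Tuple[str, Optional[str], Optional[str], Optional[str]]


def translate(mnemonic: str, src: str, dst: str) -> List[Tac]:
    """Translate one two-operand AT&T-syntax x86 instruction (hypothetical subset)."""
    if mnemonic == "mov" and src.startswith("$"):
        return [("ASSIGNC", src, None, dst)]
    if mnemonic in ("add", "sub", "xor"):
        # Constant sources use the BOPC prefix, register sources use BOP.
        prefix = "BOPC" if src.startswith("$") else "BOP"
        # Assumption: arithmetic also defines the zero flag, so one native
        # instruction expands to two three address codes.
        return [
            (prefix + mnemonic.upper(), dst, src, dst),
            ("UMKBOOLIsZero", dst, None, "%zf"),
        ]
    raise ValueError(f"unsupported instruction: {mnemonic}")


def number(tacs: List[Tac]) -> List[Tuple[int, Tac]]:
    """Pair each three address code with its index n, as in the mapping above."""
    return list(enumerate(tacs))


xor_il = number(translate("xor", "%eax", "%eax"))
mov_il = number(translate("mov", "$2", "%eax"))
```

Here `xor_il` holds two numbered three address codes for a single native instruction, while `mov_il` holds one.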
Translating Native Code (2)
• Map registers between IL’s abstract machine and native architecture.
• Assign labels to beginning of basic blocks.
• Assign results of arithmetic (etc.) instructions to condition code variables:
  – E.g. eq_cond = mkbool x == y
• Decompile parts of IL for additional information.
• Optimise IL code.
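Assigning labels to the beginning of basic blocks can be sketched with the standard leader rule: the first instruction, every jump target, and every instruction following a jump starts a block. The instruction encoding, the opcode names `jmp` and `cjmp`, and the function name `assign_labels` are assumptions made for this example only.

```python
from typing import Dict, List, Optional, Tuple

# (opcode, target_index); target_index is None unless the instruction jumps.
Instr = Tuple[str, Optional[int]]


def assign_labels(code: List[Instr]) -> Dict[int, str]:
    """Return a map from the index of each basic-block leader to a label name."""
    leaders = {0} if code else set()
    for idx, (opcode, target) in enumerate(code):
        if opcode in ("jmp", "cjmp") and target is not None:
            leaders.add(target)          # a jump target starts a block
            if idx + 1 < len(code):
                leaders.add(idx + 1)     # so does the instruction after a jump
    return {idx: f"L{n}" for n, idx in enumerate(sorted(leaders))}


code = [
    ("assign", None),  # 0
    ("cjmp", 4),       # 1: conditional jump out of the loop
    ("add", None),     # 2
    ("jmp", 1),        # 3: back edge
    ("halt", None),    # 4
]
labels = assign_labels(code)
```

Indirect jumps (`ijmp r` in the syntax that follows) have no static target, so a real implementation needs additional analysis for them.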
Formal Syntax
Instructions $I ::= n \to i$ (maps instruction number to instruction)
Heap $H ::= n \times n \to n$ (maps heap address and memory size to non-overlapping memory addresses)
Memory $M ::= n \to n$ (maps address to numeric value)
Register $R ::= r \to n$ (maps register name to numeric value)
Labels $L ::= l \to pc$ (maps label to instruction address pc)
AllocAMemory $V ::= n \times n \to n$ (maps alloca address and memory size to non-overlapping memory addresses)
Program p ::= p i | i
Instruction i ::= m | m t
Type t ::= u8_t | u16_t | u32_t | s8_t | s16_t | s32_t
Instructions m ::=
    *(r3) := r1
  | r3 := (*r1)
  | r3 := r1
  | r3 := n
  | r3 := uop r1
  | r3 := r1 bop r2
  | r3 := r1 bop n
  | mkbool r1 ucond
  | mkbool r1 bcond r2
  | nop
  | halt
  | label l
  | jmp l
  | ijmp r
  | if r1 cond1 jmp l
  | if r1 cond2 r2 jmp l
  | lcall s
  | cast(r1, t)
  | r3 := getpc()
  | r3 := returnaddress()
  | pusharg(n, r)
  | r3 := malloc(r)
  | free(r)
  | r3 := alloca(r)
Operations uop ::= - | ~ | !
           bop ::= +, -, *, /, %, >>, <<, |, &, ^
Conditions ucond ::= == 0 | != 0
           bcond ::= == | != | > | >= | < | <=
Operands v ::= n (an integer literal)
             | r (a register)
             | l (a label)
             | s (a symbol)
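As an illustration of the type annotations in the syntax, the sketch below gives one plausible reading of `cast(r1, t)`: the value is truncated to the width of `t` and, for signed types, reinterpreted in two's complement. This interpretation is an assumption; the paper's semantics are authoritative. The table `TYPES` and the Python function `cast` are names chosen for this example.

```python
# Bit width and signedness for each type name in the grammar.
TYPES = {
    "u8_t": (8, False), "u16_t": (16, False), "u32_t": (32, False),
    "s8_t": (8, True), "s16_t": (16, True), "s32_t": (32, True),
}


def cast(value: int, type_name: str) -> int:
    """Wrap an integer to the given type (assumed two's-complement truncation)."""
    bits, signed = TYPES[type_name]
    wrapped = value & ((1 << bits) - 1)          # keep the low `bits` bits
    if signed and wrapped >= 1 << (bits - 1):    # top bit set: negative value
        wrapped -= 1 << bits
    return wrapped
```

For example, `cast(300, "u8_t")` keeps only the low eight bits, and `cast(200, "s8_t")` yields a negative number because bit 7 is set.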
Formal Semantics
• Operational semantics define the state transitions that occur from execution of the program.
$$\frac{\mathit{premise}_1 \quad \dots \quad \mathit{premise}_n}{(i, P) \Rightarrow P'}\ \textit{NAME}$$

where i is the current instruction, P is the program state, and P’ is the new program state.
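To make the rule format concrete, here is an illustrative instance for the constant assignment `r3 := n`. It is a sketch only: splitting the program state into a register map $R$, a memory map $M$ and a program counter $pc$, and naming the rule ASSIGNC after the opcode used in the later examples, are assumptions. The paper gives the actual rules.

```latex
\frac{R' = R[r_3 \mapsto n] \qquad pc' = pc + 1}
     {(r_3 := n,\ (R, M, pc)) \Rightarrow (R', M, pc')}\ \textit{ASSIGNC}
```

Read bottom-up: executing `r3 := n` in state $(R, M, pc)$ yields a state whose register map binds $r_3$ to $n$, whose memory is unchanged, and whose program counter has advanced by one.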
Formal Semantics of Wire
• Control Flow Instructions
• Arithmetic Instructions
• Boolean Instructions
• Memory Access Instructions
• Casting Instructions
• Decompiled Instructions
  – Address Instructions
  – Memory Allocation Instructions
  – Procedural Instructions
Formal Semantics Examples
• See paper for full instruction semantics
The LOAD instruction implements a memory read.
The STORE instruction implements a memory write.
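The effect of LOAD (`r3 := (*r1)`) and STORE (`*(r3) := r1`) on the state can be sketched as pure state-transition functions. The `State` class, the function names, the use of register numbers as keys, and the choice to advance `pc` by one are assumptions for illustration; the paper's rules are authoritative.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class State:
    """Illustrative program state: register map, memory map and pc."""
    regs: Dict[int, int] = field(default_factory=dict)
    mem: Dict[int, int] = field(default_factory=dict)
    pc: int = 0


def load(state: State, dst: int, addr_reg: int) -> State:
    """r3 := (*r1): read memory at the address held in addr_reg into dst."""
    regs = dict(state.regs)
    regs[dst] = state.mem[state.regs[addr_reg]]
    return State(regs, dict(state.mem), state.pc + 1)


def store(state: State, addr_reg: int, src: int) -> State:
    """*(r3) := r1: write the value of src to the address held in addr_reg."""
    mem = dict(state.mem)
    mem[state.regs[addr_reg]] = state.regs[src]
    return State(dict(state.regs), mem, state.pc + 1)


s0 = State(regs={0: 7, 1: 0x1000})
s1 = store(s0, addr_reg=1, src=0)   # memory[0x1000] becomes 7
s2 = load(s1, dst=2, addr_reg=1)    # register 2 becomes 7
```

Each function returns a new state instead of mutating its argument, which mirrors the (i, P) ⇒ P’ shape of the rules. In this sketch, reading an address that was never written raises `KeyError`.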
Formal Semantics – Three Address Code
Applications
• A formal language leads to formal proofs.
• Equivalence proofs enable detection of obfuscated code in malware.
• We assume the translation from the native assembly architecture to the IL is correct.
Applications - Dead Code Insertion
• Dead code or junk code is a semantic nop (no operation).
• Inserted into malware to evade signature detection of code.
• The native assembly and Wire’s three address code are shown below:

native assembly
mov $0,%eax
add $50,%eax
sub $50,%eax
mov $0,%eax

Wire’s IL
ASSIGNC $0,,%eax
BOPCADD %eax,$50,%eax
BOPCSUB %eax,$50,%eax
ASSIGNC $0,-,%eax
How the equivalence proofs work
• In the first part of the proofs, the original code is executed following the operational semantics of Wire.
• In the second part of the proofs, the obfuscated code is executed.
• The proofs are constructed by showing that the final states of the two parts are the same when both start from the same initial state.
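The shape of this argument can be mimicked with a small interpreter: run both sequences from the same initial state and compare the final states with the program counter removed. This is a sketch under assumptions, not the proof itself. It checks one concrete initial state, whereas the proofs reason about arbitrary states. The interpreter `run`, the helper names, the 32-bit mask, and the split of the dead code listing into an "original" and an "obfuscated" sequence are choices made for this example.

```python
from typing import Dict, List, Tuple

# (opcode, operand1, operand2, destination); "-" marks an unused operand.
Instr = Tuple[str, str, str, str]
MASK = 0xFFFFFFFF  # assumed 32-bit registers


def value(operand: str, state: Dict[str, int]) -> int:
    """Resolve a constant ($n) or a register (%name) to an integer."""
    if operand.startswith("$"):
        return int(operand[1:], 0)
    return state[operand]


def run(code: List[Instr], initial: Dict[str, int]) -> Dict[str, int]:
    """Execute straight-line code and return the final state, including pc."""
    state = dict(initial)
    state["pc"] = 0
    for opcode, a, b, dst in code:
        if opcode == "ASSIGNC":
            state[dst] = value(a, state) & MASK
        elif opcode == "BOPCADD":
            state[dst] = (value(a, state) + value(b, state)) & MASK
        elif opcode == "BOPCSUB":
            state[dst] = (value(a, state) - value(b, state)) & MASK
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
        state["pc"] += 1
    return state


def without_pc(state: Dict[str, int]) -> Dict[str, int]:
    """Drop the program counter before comparing states."""
    return {name: val for name, val in state.items() if name != "pc"}


original = [("ASSIGNC", "$0", "-", "%eax")]
obfuscated = [
    ("ASSIGNC", "$0", "-", "%eax"),
    ("BOPCADD", "%eax", "$50", "%eax"),  # dead code
    ("BOPCSUB", "%eax", "$50", "%eax"),  # dead code
]
start = {"%eax": 1234, "%ebx": 7}
s_final = run(original, start)
t_final = run(obfuscated, start)
equivalent = without_pc(s_final) == without_pc(t_final)
```

The two final states differ only in `pc`, which is why the comparison ignores it.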
Dead Code Insertion Proof
Reg_name(“eax”) = 0
Reg_name(“ebx”) = 1
Reg_name(“zf”) = 100

In the first part of the dead code equivalence proof we execute the instructions without the dead code.

In the second part of the proof we execute the instructions with the dead code.

Now we can see that t’’’-pc = s’-pc, which means they are semantically equivalent when ignoring the effect the code has on the program counter. We also note that s’ and s’’ are semantically equivalent. We have thus proven that the obfuscated and deobfuscated code samples are equivalent.
Applications – Code Reordering
• Code reordering changes the order of instructions while maintaining semantic equivalence.
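First sequence, native assembly
mov $2,%eax
mov $1,%ebx
add %eax,%ebx

First sequence, Wire’s IL
ASSIGNC $0x2,,%eax
ASSIGNC $1,,%ebx
BOPADD %ebx,%eax,%ebx

Second sequence, native assembly
mov $1,%ebx
mov $2,%eax
add %eax,%ebx

Second sequence, Wire’s IL
ASSIGNC $0x1,-,%ebx
ASSIGNC $2,-,%eax
BOPADD %ebx,%eax,%ebx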
Code Reordering Proof
For the first part of the proof we execute the first instruction sequence.
For the second part of the proof we execute the second instruction sequence.
Thus we see that t’’’-pc = s’’’-pc and therefore the two instruction sequences are semantically equivalent.
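A complementary, purely syntactic way to see why this particular reordering is safe is a data-dependence check: two adjacent instructions can be swapped when neither writes a register that the other reads or writes. The sketch below is a standard compiler-style check, not the proof technique used in this work. The helper names are hypothetical, and it assumes straight-line code with no memory or flag side effects.

```python
from typing import Set, Tuple

# (opcode, operand1, operand2, destination); "-" marks an unused operand.
Instr = Tuple[str, str, str, str]


def reads(instr: Instr) -> Set[str]:
    """Registers read by the instruction (constants and '-' are ignored)."""
    _, a, b, _ = instr
    return {op for op in (a, b) if op.startswith("%")}


def writes(instr: Instr) -> Set[str]:
    """Registers written by the instruction."""
    return {instr[3]}


def can_swap(first: Instr, second: Instr) -> bool:
    """True when no read-after-write, write-after-read or write-after-write
    dependence links the two instructions."""
    return (writes(first).isdisjoint(reads(second) | writes(second))
            and writes(second).isdisjoint(reads(first)))


mov_eax = ("ASSIGNC", "$2", "-", "%eax")
mov_ebx = ("ASSIGNC", "$1", "-", "%ebx")
add = ("BOPADD", "%ebx", "%eax", "%ebx")

independent = can_swap(mov_eax, mov_ebx)   # the two assignments may be reordered
dependent = can_swap(mov_ebx, add)         # the add must stay after the assignment
```

The two assignments touch different registers, so swapping them is safe; the addition reads both registers and therefore cannot move ahead of either assignment.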
Applications – Opaque Predicate Insertion
• An opaque predicate is a predicate that always evaluates to the same value, but this value is hard to determine statically.

First code sequence, native assembly
xor %eax,%eax
mov $2,%eax

First code sequence, Wire’s IL
BOPXOR %eax,%eax,%eax
UMKBOOLIsZero %eax,,%zf
ASSIGNC $2,-,%eax

Second code sequence, native assembly
xor %eax,%eax
jnz $0x80482000
mov $2,%eax

Second code sequence, Wire’s IL
BOPXOR %eax,%eax,%eax
UMKBOOLIsZero %eax,,%zf
UCJMPIsNotZero %zf,,$target
ASSIGNC $2,-,%eax
Opaque Predicate Insertion Proof
In the first part of the proof we execute the first code sequence.
In the second part of the proof we execute the second code sequence.
We see that register 100 (zf) is set, which makes the conditional branch in the following instruction use a false condition.
Thus we see that s’’-pc = t’’’’-pc, and this proves semantic equivalence.
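The behaviour of the opaque predicate can be checked on concrete inputs with a tiny model of the native instructions. This sketch models x86 flag semantics directly (`xor` sets ZF when its result is zero, `jnz` branches only when ZF is clear) and does not use Wire's opcodes. The interpreter `run`, the tuple encoding, and the use of an instruction index in place of the branch address are assumptions for illustration. Testing a few initial values is weaker than the proof, which covers all states.

```python
from typing import Dict, List, Tuple

MASK = 0xFFFFFFFF  # 32-bit registers


def run(code: List[tuple], regs: Dict[str, int]) -> Tuple[Dict[str, int], List[bool]]:
    """Execute a tiny x86 subset; return the final state and each jnz outcome."""
    state = dict(regs)
    state.setdefault("zf", 0)
    branches: List[bool] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op[0] == "xor":      # ("xor", src, dst): dst ^= src and ZF is updated
            result = (state[op[2]] ^ state[op[1]]) & MASK
            state[op[2]] = result
            state["zf"] = int(result == 0)
        elif op[0] == "mov":    # ("mov", constant, dst): flags are unchanged
            state[op[2]] = op[1] & MASK
        elif op[0] == "jnz":    # ("jnz", target_index): taken only when ZF == 0
            taken = state["zf"] == 0
            branches.append(taken)
            if taken:
                pc = op[1]
                continue
        else:
            raise ValueError(f"unsupported instruction: {op[0]}")
        pc += 1
    return state, branches


original = [("xor", "eax", "eax"), ("mov", 2, "eax")]
# Index 3 stands in for the branch target; it lies past the end of the sequence.
obfuscated = [("xor", "eax", "eax"), ("jnz", 3), ("mov", 2, "eax")]

results = []
for eax in (0, 1, 0xDEADBEEF):
    plain_state, _ = run(original, {"eax": eax})
    opaque_state, outcomes = run(obfuscated, {"eax": eax})
    results.append((plain_state == opaque_state, outcomes))
```

For every tested input the branch is never taken, so both sequences end in the same state.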
Conclusion
• Wire is a new formal intermediate language.
• Formally defined semantics allow for formal reasoning.
• Wire has demonstrated applications in binary analysis.