RAID 2010
Hybrid Analysis and Control of Malware
Barton P. Miller
Kevin A. Roundy
Computer Science Department
Hybrid Analysis of Program Binaries
Need for forensic analysis
Malware attacks cost billions of dollars annually [1]
65% of users feel effect of cyber crime [2]
28 days to resolve an average cybercrime [2]
90% of malware resists analysis [3]
[Figure: hex dump of a malware binary]
Our approach
analyze code before executing it
CFG-based interface for instrumentation
bring malware under analyst’s control
[1] Computer Economics. 2007
[2] Norton. 2010
[3] McAfee. 2008
Malware analysis factory
[Figure: hex dump of a malware binary passed to SD-Dyninst]
SD-Dyninst
code coverage instrumentation
network call instrumentation
Stack trace at 1st network communication
Control flow graph showing code coverage
Defensive tactics report: unpacked code, overwritten code, control flow obfuscations
Trace of Win API calls
storm worm
Obfuscated control flow
Entry Point
40d002: e8 03 00 00 00   CALL 40d00a
40d007: e9 eb 04 5d 45   JMP 459dd4f7
40d008: eb 04            JMP 40d00e
40d00a: 5d               POP ebp
40d00b: 45               INC ebp
40d00c: 55               PUSH ebp
40d00d: c3               RET
CALL ptr[eax] (target: ?)
XOR eax,eax
MOV ecx,*[eax] (exception handler, target: ?)
obfuscated control flow
handler-based ctrl flow
unpacked code
overwritten code
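The byte sequence on this slide can be checked by hand. The following is a minimal sketch, not part of the original talk, that decodes the relative branches in the twelve bytes and emulates the POP/INC/PUSH/RET sequence. It assumes the first byte sits at 40d002; the helper names are illustrative only.

```python
import struct

BASE = 0x40D002  # assumed address of the first byte (the CALL)
CODE = bytes.fromhex("e8 03 00 00 00 e9 eb 04 5d 45 55 c3")

def rel32_target(addr):
    """Target of a 5-byte CALL/JMP rel32 (opcode e8 or e9) located at addr."""
    (rel,) = struct.unpack_from("<i", CODE, addr - BASE + 1)
    return addr + 5 + rel

def rel8_target(addr):
    """Target of a 2-byte JMP rel8 (opcode eb) located at addr."""
    (rel,) = struct.unpack_from("<b", CODE, addr - BASE + 1)
    return addr + 2 + rel

call_target = rel32_target(0x40D002)  # CALL at the entry point
fake_jmp = rel32_target(0x40D007)     # JMP a linear disassembler would report

# Emulate the callee: POP ebp; INC ebp; PUSH ebp; RET
stack = [0x40D002 + 5]                # CALL pushes its return address
ebp = stack.pop()                     # POP ebp
ebp += 1                              # INC ebp
stack.append(ebp)                     # PUSH ebp
ret_target = stack.pop()              # RET jumps to the modified address

real_jmp = rel8_target(ret_target)    # bytes eb 04, hidden inside the JMP
print(hex(call_target), hex(fake_jmp), hex(ret_target), hex(real_jmp))
```

Under these assumptions the RET lands one byte past the normal return address, in the middle of the apparent JMP, which is why a parser that trusts the call's fall-through decodes the wrong instruction.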
storm worm
Unpacked code
Entry Point
[Figure: hex dump of the binary, shown twice]
obfuscated control flow
handler-based ctrl flow
unpacked code
overwritten code
Overwritten code
Upack packer
Entry Point
[Figure: hex dump of the binary, shown twice]
obfuscated control flow
handler-based ctrl flow
unpacked code
overwritten code
Factory results for Conficker A
initial bootstrap code
packed payload
Factory results for Conficker A
[Figure: control flow graph; legend: API func, non executed block, static block, unpacked block]
Factory results for Conficker A
Stack-walk of Conficker’s communications thread
Frame pc=0x7c901231 func: DbgBreakPoint at 0x7c901230 [Win DLL]
Frame pc=0x10003c83 func: DYNbreakPoint at 0x10003c70 [instrument.]
Frame pc=0x100016f7 func: DYNstopThread at 0x10001670 [instrument.]
Frame pc=0x71ab2dc0 func: select at 0x71ab2dc0 [Win DLL]
Frame pc=0x401f34 func: nosym1f058 at 0x41f058 [Conficker]
Instrument select and perform a stack-walk
Outline
Related work
Hybrid analysis algorithm
Parsing
Dynamic analysis components
Results
Non-Defensive Binary Analysis
program binary: static code
Static tool (pre-execution): parsing, value-set analysis, binary slicing; e.g., Dyninst, CodeSurfer-x86
CFG
Dynamic instrumenter: CFG-based API for instrumentation; e.g., ATOM, Vulcan (static), Dyninst (dynamic)
Process: un-controlled execution
Non-Defensive Binary Analysis
analysis resistant binary: obfuscated code, static code
Static tool (pre-execution): parsing, value-set analysis, binary slicing; e.g., Dyninst, CodeSurfer-x86
CFG
Dynamic instrumenter: CFG-based API for instrumentation; e.g., ATOM, Vulcan (static), Dyninst (dynamic)
Process: dynamic code, un-controlled execution
Non-Defensive Binary Analysis
analysis resistant binary: obfuscated code, static code
pre-execution
Dynamic instrumenter: instruction-filter based API for instrumentation; e.g.: PIN, Valgrind, DynamoRIO, DIOTA
Process: dynamic code, un-controlled execution
Trace
post-execution analysis: Trace analysis; e.g.: Madou et al. 2005; Quist, Liebrock. 2009
CFG
Our approach
SD-Dyninst
analysis resistant binary: obfuscated code, static code
Parser (pre-execution)
CFG
Dynamic instrumenter: CFG-based API for instrumentation
Process: dynamic code, un-controlled execution
(source,dest)
Parser
CFG
Code discovery algorithm
Hybrid algorithm:
Parse from known entry points
Instrument control flow that may lead to new code
Resume execution
Diagram labels: instrument, exception, overwrite; CALL ptr[eax]; DIV eax, 0
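The three steps of the hybrid algorithm can be sketched as a loop. This is an illustrative toy model, not SD-Dyninst code: `PROGRAM`, `RUNTIME_TARGETS`, `parse`, and `discover` are hypothetical names, and the "execution" step is replaced by a lookup table that stands in for what the running process would reveal.

```python
# Toy program: address -> (kind, statically visible successors).
# "indirect" blocks have successors that only the running process reveals.
PROGRAM = {
    0x1000: ("direct", [0x1010]),
    0x1010: ("indirect", []),      # e.g. CALL ptr[eax]
    0x2000: ("direct", [0x2010]),  # reachable only through 0x1010
    0x2010: ("indirect", []),
    0x3000: ("direct", []),
}
RUNTIME_TARGETS = {0x1010: 0x2000, 0x2010: 0x3000}  # stand-in for execution

def parse(entry, cfg):
    """Static step: follow direct control flow from a known entry point."""
    work = [entry]
    while work:
        addr = work.pop()
        if addr in cfg:
            continue
        kind, succs = PROGRAM[addr]
        cfg[addr] = kind
        work.extend(succs)

def discover(entry):
    cfg, instrumented = {}, set()
    parse(entry, cfg)
    while True:
        # Instrument control flow that may lead to new code.
        pending = [a for a, k in cfg.items()
                   if k == "indirect" and a not in instrumented]
        if not pending:
            return cfg
        for addr in pending:
            instrumented.add(addr)
            # "Resume execution": the instrumentation reports the real target.
            target = RUNTIME_TARGETS.get(addr)
            if target is not None and target not in cfg:
                parse(target, cfg)

cfg = discover(0x1000)
print(sorted(hex(a) for a in cfg))
```

In this sketch each round of instrumentation exposes one more region of code, and parsing resumes from the newly found target until no unresolved transfers remain.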
Accurate parsing
Standard control-flow traversal [1]
start from known entry points
follow control flow to find code
New conservative assumption
un-analyzed calls (pointer-based) may not return
New stack tamper detection
backwards slice at return instruction
call 40d00a
pop ebp
inc ebp
push ebp
ret
garbage
[1] Sites et al., Binary Translation. 1993.
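A small sketch can show how the traversal and the stack tamper check interact on the slide's example. Everything here is an assumption made for illustration: the toy `CODE` table, the one-byte "garbage" entry, and `returns_to_caller`, which replaces a real backwards slice with a tiny symbolic run of the callee.

```python
# Toy instruction stream: address -> (mnemonic, operand, length).
CODE = {
    0x40D002: ("call", 0x40D00A, 5),
    0x40D007: ("garbage", None, 1),  # bytes after the call, assumed junk
    0x40D00A: ("pop", "ebp", 1),
    0x40D00B: ("inc", "ebp", 1),
    0x40D00C: ("push", "ebp", 1),
    0x40D00D: ("ret", None, 1),
}

def returns_to_caller(entry):
    """Stand-in for the backwards slice: symbolically run the callee and
    check whether RET pops the unmodified return address."""
    stack, regs, addr = ["retaddr"], {}, entry
    while addr in CODE:
        op, arg, size = CODE[addr]
        if op == "pop":
            regs[arg] = stack.pop()
        elif op == "inc":
            regs[arg] = regs.get(arg, "?") + "+1"
        elif op == "push":
            stack.append(regs.get(arg, "?"))
        elif op == "ret":
            return bool(stack) and stack[-1] == "retaddr"
        else:
            return False  # unknown instruction: stay conservative
        addr += size
    return False

def parse(entry):
    """Control-flow traversal that only trusts verified call fall-throughs."""
    found, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in found or addr not in CODE:
            continue
        found.add(addr)
        op, arg, size = CODE[addr]
        if op == "call":
            work.append(arg)
            if returns_to_caller(arg):  # otherwise assume it may not return
                work.append(addr + size)
        elif op not in ("ret", "garbage"):
            work.append(addr + size)
    return found

found = parse(0x40D002)
print(sorted(hex(a) for a in found))
```

Because the callee increments its return address before returning, the fall-through after the call is never parsed, so the garbage bytes stay out of the CFG.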
Instrumentation-based discovery
Invalid control transfers: call 401000 (target in Invalid Region)
Indirect jumps/calls: call ptr [eax] (target: ?), jmp eax (target: ?)
Abnormal return instructions: push eax, ret
Instrumentation-based discovery
process: call ptr[eax] (target: ?), instrumented with findTarget (ptr[eax])
SD-Dyninst: findTarget (ptr[eax])
new target 0x402d8a
resume execution at call ptr[eax]
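The exchange on this slide can be modeled in a few lines. This is a hypothetical sketch rather than the Dyninst API: `Analyzer`, `find_target`, and `run_indirect_call` are invented names, the source address 0x401000 and the pointer location are assumptions, and only the new target 0x402d8a comes from the slide.

```python
class Analyzer:
    """Hypothetical stand-in for the SD-Dyninst side of the exchange."""

    def __init__(self, known_code):
        self.known_code = set(known_code)
        self.parsed_from = []  # (source, target) pairs handed to the parser

    def find_target(self, source, target):
        # Called by the instrumentation before the transfer executes.
        if target not in self.known_code:
            self.known_code.add(target)
            self.parsed_from.append((source, target))  # parse at new target
        return "resume"

def run_indirect_call(analyzer, source, memory, eax):
    """Instrumented `call ptr[eax]`: compute the target, report it, go on."""
    target = memory[eax]
    analyzer.find_target(source, target)
    return target

analyzer = Analyzer(known_code={0x401000})
memory = {0x500000: 0x402D8A}  # assumed location of the function pointer
first = run_indirect_call(analyzer, 0x401000, memory, eax=0x500000)
second = run_indirect_call(analyzer, 0x401000, memory, eax=0x500000)
print(hex(first), analyzer.parsed_from)
```

The second call reports a target that is already known, so no further parsing is requested and execution simply resumes.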
Overwritten code discovery
Overwrite Detection
Possible strategies
Check each executed instruction for changes [1]
Monitor writes to code
Page-level write detection [2]
Remove write permissions from code pages
Write to code causes exception
Handle exception
SD-Dyninst: code write handler
[Figure: a write to code pages whose permissions change between RWE and R E]
[1] Royal et al. PolyUnpack. ACSAC ’06
[2] Maebe, De Bosschere. AADEBUG ’03
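Page-level write detection can be illustrated with a small simulation. This sketch does not touch real page protections; the `Memory` class, the 4 KB page size, and the permission strings are assumptions chosen to mirror the slide's labels.

```python
PAGE = 0x1000  # assumed page size

class Memory:
    def __init__(self, code_pages):
        # Code pages start as "R E": write permission removed.
        self.perms = {page: "R E" for page in code_pages}
        self.data = {}
        self.overwrites = []

    def write(self, addr, value):
        page = addr & ~(PAGE - 1)
        if self.perms.get(page) == "R E":
            self.on_code_write(page, addr)  # write to code raises an exception
        self.data[addr] = value

    def on_code_write(self, page, addr):
        # Code write handler: record the event, then let the write proceed.
        self.overwrites.append(addr)
        self.perms[page] = "RWE"

mem = Memory(code_pages=[0x401000])
mem.write(0x401010, 0x90)  # first write to the code page is detected
mem.write(0x401011, 0x90)  # page is now writable, no second exception
mem.write(0x500000, 0x01)  # data page, never monitored
print([hex(a) for a in mem.overwrites], mem.perms)
```

Only the first write to a protected page is trapped here, which keeps the cost low but also means later writes go unobserved until write permission is removed again.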
Overwritten code discovery
When to update
Cases to consider: large incremental overwrites, writes to data, writes to own page
Delaying the update until write routine terminates
SD-Dyninst: code write handler, CFG update routine
[Figure: a write to code pages with permissions R E]
Overwritten code discovery
Delayed updates
Two components
1. Handle overwrite signal
a) instrument write loop
b) copy overwritten page
c) restore write permissions
2. Update CFG when writes end
a) remove overwritten and unreachable blocks
b) parse at entry points to overwritten regions
c) remove write permissions
SD-Dyninst: code write handler, CFG update routine
[Figure: a write to code pages; page permissions RWE and R E; cb]
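The two components of the delayed update can be sketched as follows. This is a simplified model under stated assumptions: `DelayedUpdater` is a hypothetical class covering a single code page, the end of the write loop is signaled by an explicit `writes_ended()` call instead of real instrumentation, and removal of unreachable blocks is not modeled.

```python
class DelayedUpdater:
    def __init__(self, page_bytes, blocks):
        self.page = bytearray(page_bytes)  # one code page, offsets as addresses
        self.blocks = dict(blocks)         # block name -> (start, end) offsets
        self.perms = "R E"
        self.snapshot = None
        self.reparse = []

    def write(self, offset, value):
        if self.perms == "R E":
            self.handle_overwrite_signal()
        self.page[offset] = value

    def handle_overwrite_signal(self):
        # 1a) instrumenting the write loop is modeled by writes_ended();
        # 1b) copy the page; 1c) restore write permissions.
        self.snapshot = bytes(self.page)
        self.perms = "RWE"

    def writes_ended(self):
        changed = [i for i, (old, new) in enumerate(zip(self.snapshot, self.page))
                   if old != new]
        for name, (start, end) in list(self.blocks.items()):
            if any(start <= i < end for i in changed):
                del self.blocks[name]       # 2a) drop overwritten blocks
                self.reparse.append(start)  # 2b) parse again at their entry
        self.perms = "R E"                  # 2c) remove write permissions
        return changed

updater = DelayedUpdater(bytes(16), {"A": (0, 4), "B": (8, 12)})
updater.write(1, 0x90)
updater.write(2, 0x90)
changed = updater.writes_ended()
print(changed, updater.blocks, updater.reparse, updater.perms)
```

Comparing the page against the copy taken at the first write lets one CFG update cover the whole batch of writes instead of reacting to each byte.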
Handler-based CF obfuscations [1]
Monitored Program
xor eax,eax
mov ecx,*[eax]
push eax
...
Operating System
Exception State: eip 401002 ...
access violation handler
…
mov *[ebp+10],eax
mov 402d8a,edx
mov edx,*[eax+b8]
Exception State: eip 402d8a
[1] Popov, Debray, Andrews. Usenix 2007. Danekhar. http://www.codeproject.com/KB/system/inject2exe.aspx 2005.
Resolving handler-based CF
Monitored Program
xor eax,eax
mov ecx,*[eax]
push eax
...
Operating System
Exception State: eip 401002 ...
access violation handler
…
mov *[ebp+10],eax
mov 402d8a,edx
mov edx,*[eax+b8]
Exception State: eip 402d8a
SD-Dyninst: instrument exit, analyze code at new target
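The resolution step can be sketched as a check made when the handler exits. This is an illustrative model only: the exception state is a plain dictionary, `dispatch_exception` is a hypothetical name, and the offset 0xb8 is taken from the slide's `mov edx,*[eax+b8]` and assumed to be where the saved eip lives.

```python
EIP_OFFSET = 0xB8  # offset the slide's handler writes to: mov edx,*[eax+b8]

def access_violation_handler(context):
    """Models the malware's handler, which rewrites the saved eip."""
    context[EIP_OFFSET] = 0x402D8A

def dispatch_exception(fault_eip, handler, known_code, parse):
    context = {EIP_OFFSET: fault_eip}  # exception state built by the OS
    handler(context)
    # Instrumentation at handler exit: compare saved eip with the fault address.
    resume_eip = context[EIP_OFFSET]
    if resume_eip != fault_eip and resume_eip not in known_code:
        parse(resume_eip)              # analyze code at new target
    return resume_eip

parsed = []
resume = dispatch_exception(
    fault_eip=0x401002,
    handler=access_violation_handler,
    known_code={0x401000, 0x401002},
    parse=parsed.append,
)
print(hex(resume), [hex(a) for a in parsed])
```

If the handler leaves the saved eip untouched, nothing new is parsed and the program resumes where it faulted.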
Fully analyzed packed programs
Packer              Malware market share [1]
MEW                 0.13%
WinUPack            0.17%
Yoda's Protector    0.33%
Armadillo           0.37%
Asprotect           0.43%
FSG                 1.26%
Aspack              1.29%
nPack               1.74%
Upack               2.08%
PECompact           2.59%
Themida             2.95%
EXECryptor          4.06%
PolyEnE             6.21%
UPX                 9.45%
Nspack              0.89%
Columns, with number of packers marked yes: unlabeled column: 9; Self-checksumming: 2; Self-modifying: 6; Exception-based ctrl: 5; Obfuscated: 12
[1] Packer (r)evolution. Panda Research, 2008. Two-month average Feb-March 2008.
Self-checksumming techniques
Fully analyzed packed programs
Packer              Malware market share [1]
MEW                 0.13%
WinUPack            0.17%
Yoda's Protector    0.33%
Armadillo           0.37%
Asprotect           0.43%
FSG                 1.26%
Aspack              1.29%
nPack               1.74%
Upack               2.08%
PECompact           2.59%
Themida             2.95%
EXECryptor          4.06%
PolyEnE             6.21%
UPX                 9.45%
Nspack              0.89%
SD-Dyninst: yes for 10 packers (including Nspack)
Time to unpack: 3.9, 23.6, 1.4, 4.4, 1.5, 23.5, 3.2, 1.2, 0.5, 2.7
uninstrumented times are about .02 secs
unoptimized overwrite detection
expensive overwrite detection
[1] Packer (r)evolution. Panda Research, 2008. Two-month average Feb-March 2008.
Instrumentation costs
            Pre-payload execution time                             Instrumented locations
Packer      SD-Dyninst  Renovo  Saffron Intel-PIN  Ether Unpack    SD-Dyninst  Renovo  Saffron Intel-PIN
UPX         0.5         5       2.7                7.6             6           2,278   4,526
Aspack      4.4         5       fail               18.7            34          2,045   4,141
FSG         1.6         8       1.4                31.1            14          18,822  31,854
WinUpack    23.6        8       23.5               67.8            23          18,826  32,945
MEW         4.0         6       fail               150.5           22          21,186  35,466
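The gap in instrumented locations is easier to read as a ratio. The short sketch below copies the three "Instrumented locations" columns from the table and divides the Renovo and Saffron counts by the SD-Dyninst count. The `ratios` helper is illustrative and makes no claim about how the original measurements were taken.

```python
# Instrumented locations from the table: (SD-Dyninst, Renovo, Saffron).
LOCATIONS = {
    "UPX": (6, 2278, 4526),
    "Aspack": (34, 2045, 4141),
    "FSG": (14, 18822, 31854),
    "WinUpack": (23, 18826, 32945),
    "MEW": (22, 21186, 35466),
}

def ratios(table):
    """How many times more locations each tool instruments than SD-Dyninst."""
    return {packer: (renovo / sd, saffron / sd)
            for packer, (sd, renovo, saffron) in table.items()}

for packer, (renovo_x, saffron_x) in ratios(LOCATIONS).items():
    print(f"{packer:9s} Renovo {renovo_x:7.1f}x  Saffron {saffron_x:7.1f}x")
```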
Conclusion
Analysis before execution allows for
Understanding & control before execution
Selective monitoring
Build-your-own analysis factory
Ongoing work
Handling self-checksumming code
Releasing Dyninst w/ SD-Dyninst inside
http://www.paradyn.org/