rozzle de-cloaking internet malware

37
Rozzle De-Cloaking Internet Malware Ben Livshits with Clemens Kolbitsch, Ben Zorn, Christian Seifert, Paul Rebriy Microsoft Research

Upload: candice-bowers

Post on 02-Jan-2016

34 views

Category:

Documents


1 download

DESCRIPTION

Rozzle De-Cloaking Internet Malware. Ben Livshits with Clemens Kolbitsch, Ben Zorn, Christian Seifert, Paul Rebriy Microsoft Research. Static – Dynamic Analysis Spectrum. + High precision - High overhead - Low coverage. + High precision + High scalability + High coverage - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Rozzle De-Cloaking Internet Malware

RozzleDe-Cloaking Internet Malware

Ben Livshits

with Clemens Kolbitsch, Ben Zorn,Christian Seifert, Paul Rebriy

Microsoft Research

Page 2: Rozzle De-Cloaking Internet Malware

2

Static – Dynamic Analysis Spectrum

Entirely static Entirely runtime

+ High coverage- Low precision- May not scale

+ High precision- High overhead- Low coverage

Symbolic execution

DART, SAGE, KLEE+ High precision+ Scales reasonably well? High coverage

Multi-execution

+ High precision+ High scalability+ High coverage- Watch out for resource usage

Page 3: Rozzle De-Cloaking Internet Malware

3

Blacklisting Malware in Search Results

Page 4: Rozzle De-Cloaking Internet Malware

4

Motivation

Haha, I cannot belive this guy actually does this!! LOL

Page 5: Rozzle De-Cloaking Internet Malware

5

Drive-by Malware Detection Landscape

runtime

static

online(browser-based)

offline (honey-monkey)

Nozzle[Usenix Security ’09]

Zozzle[Usenix Security ’11]

• Instrumented browser• Looks for heap sprays• Moderately high overhead

• Mostly static detection• Low overhead, high reach• Can be deployed in browser

Page 6: Rozzle De-Cloaking Internet Malware

6

Search Engine Crawling

Page 7: Rozzle De-Cloaking Internet Malware

7

detect crawler

Server side

Malware Cloaking

• Source IP• Request (User-

Agent, Browser ID)detect vulnerable target

Client side • Fingerprint browser & plugin versions

• Do this using JavaScript

<script> if (navigator.userAgent.indexOf(‘IE 6’)>=0) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); }</script>

Page 8: Rozzle De-Cloaking Internet Malware

8

Client-side Cloaking Defense

TraditionalRozzle

• Single browser, one visit

• Appear as vulnerable as possible

Page 9: Rozzle De-Cloaking Internet Malware

• Background & Motivation: Cloaking

• Detecting Internet Malware

• Rozzle: Fighting Evasion

• Experiments

Overview

Page 10: Rozzle De-Cloaking Internet Malware

10

Detecting Internet Malware

Dynamic Detection

Nozzle

Static Detection

Zozzle

Nozzle: A Defense Against Heap-spraying Code Injection Attacks

[Usenix Security 2009]• Scan heap allocated objects to identify valid x86 code

sequences

Zozzle: Low-overhead Mostly Static JavaScript Malware Detection

[Usenix Security 2011]• Bayesian classification of hierarchical features of the

JavaScript abstract syntax tree. In the browser (after unpacking)

6/1/2

011

6/3/2

011

6/5/2

011

6/7/2

011

6/9/2

011

6/11/2

011

6/13/2

011

6/15/2

011

6/17/2

011

6/19/2

011

6/21/2

011

6/23/2

011

6/25/2

011

6/27/2

011

6/29/2

011

Page 11: Rozzle De-Cloaking Internet Malware

11

Nozzle: Runtime Heap Spraying Detection

Normalized attack surface (NAS)

good

bad

Page 12: Rozzle De-Cloaking Internet Malware

12

Object Surface Area Calculation

• Each block starts with its own size as weight

• Weights are propagated forward with flow

• Invalid blocks don’t propagate

• Iterate until a fixpoint is reached

• Compute block with highest weight

12

An example object from visiting google.com

4

2

4

2

2

310

14

4

12

6

912

14

12

12

12

15

Page 13: Rozzle De-Cloaking Internet Malware

13

// Shellcodevar shellcode=unescape(‘%u9090%u9090%u9090%u9090%uceba%u11fa%u291f%ub1c9%udb33 […]′);bigblock=unescape(“%u0D0D%u0D0D”);headersize=20;shellcodesize=headersize+shellcode.length;while(bigblock.length<shellcodesize){bigblock+=bigblock;}heapshell=bigblock.substring(0,shellcodesize);nopsled=bigblock.substring(0,bigblock.length-shellcodesize);while(nopsled.length+shellcodesize<0×25000){nopsled=nopsled+nopsled+heapshell}

// Sprayvar spray=new Array();for(i=0;i<500;i++){spray[i]=nopsled+shellcode;}

// Triggerfunction trigger(){ var varbdy = document.createElement(‘body’); varbdy.addBehavior(‘#default#userData’); document.appendChild(varbdy); try { for (iter=0; iter<10; iter++) { varbdy.setAttribute(‘s’,window); } } catch(e){ } window.status+=”;}document.getElementById(‘butid’).onclick();

shellcode unescapebigblock unescape %u0D0D%u0D0D shellcodesize shellcode.length

bigblock.substring shellcodesizenopsled bigblock.substring bigblock.length shellcodesize nopsled.length shellcodesize nopsled nopsled nopsled heapshell

spray spray nopsled shellcode

varbdy.addBehavior #default#userData document.appendChild varbdy

varbdy.setAttribute

butid

Zozzle: Static/Statistical Detection

Page 14: Rozzle De-Cloaking Internet Malware

14

Naïve Bayes Classification

* P(malicious)

Feature P(malicious)

string:0c0c 0.99

function:shellcode 0.99

loop:memory 0.87

Function:ActiveX 0.80

try:activex 0.41

if:msie 7 0.33

function:Array 0.21

function:unescape 0.45

loop:+= 0.55

loop:nop 0.95

eval(""+O(2369522)+O(1949494)+O(2288625)+O(648464)+O(2304124)+O(2080995)+O(2020710)+O(2164958)+O(2168902)+O(1986377)+O(2227903)+O(2005851)+O(2021303)+O(646435)+O(1228455)+O(644519)+O(2346826)+O(2207788)+O(2023127)+O(2306806)+O(1983560)+O(1949296)+O(2245968)+O(2028685)+O(809214)+O(680960)+O(747602)+O(2346412)+O(1060647)+O(1045327)+O(1381007)+O(1329180)+O(745897)+O(2341404)+O(1109791)+O(1064283)+O(1128719)+O(1321055)+O(748985)+...);

Page 15: Rozzle De-Cloaking Internet Malware

• Background & Motivation: Cloaking

• Detecting Internet Malware

• Rozzle: Fighting Evasion

• Experiments

Overview

Page 16: Rozzle De-Cloaking Internet Malware

16

Environment Fingerprinting Prevents Detection

Nozzle

Zozzle

<script> if (navigator.userAgent.indexOf(‘IE 6’)>=0) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); }</script>

<script> var adobe=new ActiveXObject(‘AcroPDF.PDF’); var adobeVersion=adobe.GetVariable (‘$version’); if (navigator.userAgent.indexOf(‘IE 6’)>=0 && adobeVersion == ’9.1.3’) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); }</script>

Is this a practical problem for our malware detectors?

• In 7.7% of JS files, code gets a reference to environment

• In 1.2%, code branches on such sensitive values

• 89.5% of malicious JS branches on such values

Page 17: Rozzle De-Cloaking Internet Malware

17

Typical Malware Cloaking

Page 18: Rozzle De-Cloaking Internet Malware

18

More Complex Fingerprinting

Fingerprint: Q0193807F127J14

Page 19: Rozzle De-Cloaking Internet Malware

19

Avoiding Dynamic Crawlers

Page 20: Rozzle De-Cloaking Internet Malware

20

Avoiding Static Detection

Page 21: Rozzle De-Cloaking Internet Malware

21

How to Allocate Detection Resources?

1.41.52.0

9.09.1

10.0

89

10

Rozzle

Clearly does not scaleHow many resources should be allocated to filter malicious sites?What if the site simply is

not malicious?

Page 22: Rozzle De-Cloaking Internet Malware

22

• Execute individual branches sequentially to increase coverage

• Static analysis: Retain much of runtime precision

• Branch on environment-sensitive checks

• No forking• No snapshotting

• Symbolic execution: re-verting to a previous state similar to running multiple browsers in parallel

Rozzle

Multi-path execution framework for JavaScript

• Multiple browser profiles on single machine

What it is/does

• Cluster of machines: too resource consuming

What it is not

Page 23: Rozzle De-Cloaking Internet Malware

23

Multi-Execution in Rozzle

<script> var adobe=new ActiveXObject(‘AcroPDF.PDF’); var adobeVersion=adobe.GetVariable (‘$version’); if (navigator.userAgent.indexOf(‘IE 7’)>=0 && adobeVersion == ’9.1.3’) { var x=unescape(‘%u4149%u1982%u90 […]’); eval(x); } else if (adobeVersion == ’8.0.1’) { var x=unescape(‘%u4073%u8279%u77 […]’); eval(x); } … </script>

Page 24: Rozzle De-Cloaking Internet Malware

24

Challenges

Consistent updatesof variables

Introduce concept of Symbolic Memory:• Multiple concrete values associated with one variable• New JavaScript data type Symbolic

• 3 subtypes• symbolic value / formula / conditional

• Weak updates for conditional assignments

Page 25: Rozzle De-Cloaking Internet Malware

25

Symbolic Memory

<script> var userAgentString=0; userAgentString = navigator.userAgent; var isIE; isIE = (userAgentString.indexOf(‘IE’)>=0); …

Variable : userAgentStringValue : 0Symbolic : no

Variable : userAgentStringValue : < navigator.userAgent >Symbolic : yes

Hooks into engine, return symbolic values for• Sensitive global objects: navigator.userAgent,

navigator.platform, …• Sensitive functions: ScriptEngine(), allocation of

ActiveXObject, …

Variable : isIEValue : < navigator.userAgent. indexOf(‘IE’) >= 0 >Symbolic : yes

Page 26: Rozzle De-Cloaking Internet Malware

26

Symbolic Memory

<script> var isIE=false; var isIE7=false; if (navigator.userAgent.indexOf(‘IE’)>=0) { isIE=true; if (navigator.userAgent.indexOf(‘IE 7’)>=0) { isIE7=true; } } if (isIE7) { …

Variable : isIE7

Value : false

Symbolic : no

Variable : isIEValue : falseSymbolic : no

Current path predicateValue : < nav.userAgent.indexOf(..)>=0 >Symbolic : yes

Variable : isIEValue : < nav.userAgent.indexOf(…)>=0 > ? true : falseSymbolic : yes

Current path predicateValue : < nav.userAgent.indexOf(..)>=0 > &&

< nav.userAgent.indexOf(..)>=0 >Symbolic : yes

Variable : isIE7

Value : <…>

Symbolic : yes

Page 27: Rozzle De-Cloaking Internet Malware

27

Symbolic Memory

indexOf0

‘IE’navigator

.userAgent

Variable : isIEValue : < nav.userAgent.indexOf(…)>=0 > ? true : falseSymbolic : yes

?

>=true false

Page 28: Rozzle De-Cloaking Internet Malware

28

Challenges

Consistent updatesof variables Handling loops

Indirect control flow: Exception

handlingI/O

Consistent updatesof variables Handling loops

Indirect control flow: Exception

handling

• Loop condition might be symbolic, number of iterations unknown!

• Unroll k iterations (currently k=1)

• Instruction pointer checks (endless loops/recursion)

• try-blocks regularly used to test availabilityof plugins (ActiveXObjects)

• catch-blocks set default values, cannot be ignored

• Execute catch-statement similar to else branch, add virtual if-condition: “ActiveX supported”

• Handling symbolic values when they are…— … written to the DOM— … sent to a remote server— … executed (as part of eval)

• Lazy evaluation to concrete values (only when needed)

Page 29: Rozzle De-Cloaking Internet Malware

29

Experiments

Offline• Controlled Experiment

• 7x more Nozzle detections

Online • Similar to Bing crawling

• Almost 4x more Nozzle detections

• 10.1% more Zozzle detections

Overhead• 1.1% runtime overhead

• 1.4% memory overhead

Page 30: Rozzle De-Cloaking Internet Malware

30

Zozzl

e70k

-2,000 0 2,000 4,000 6,000 8,000 10,000 12,000

10,381

Shared New Detections Errors

+595% runtime detections

Offline• Exploits hosted on our server

• Minimize external influences

• 70,000 known malicious scripts (flagged by Zozzle)

• Fully unrolled/de-obfuscated exploits, wrapped in HTML

Page 31: Rozzle De-Cloaking Internet Malware

31

• List of URLs recently crawled by Bing

• Pre-filtering: Increase likelihood of finding malicious sites

• 57,000 URLs over the last week

Online• Dedicated machine for crawling the web

• Clone of the Bing malware crawler

Nozzle Detections

24

50174

+203% runtime detections

Zozzle Detections225

2,510

156

Page 32: Rozzle De-Cloaking Internet Malware

33

Overhead

• 500 randomly selected URLs crawled by Bing

• Slightly biased towards malicious sites (pre-filtering)

Memory Overhead

Median: 0.6%

80th Percentile: 1.4%

Runtime Overhead

Median: 0.0%

80th Percentile: 1.1%

• Average numbers of 3 repeated runs per configuration

• Base runs (cookie setup)

Page 33: Rozzle De-Cloaking Internet Malware

34

Overhead Numbers

0.7000.736

0.7720.808

0.8440.880

0.9160.952

0.9881.024

1.0601.096

1.1321.168

1.2041.240

1.2761.312

1.3481.384

1.4201.456

1.4921.528

1.5641.600

1.6361.672

1.7081.744

1.7801.816

1.8521.888

1.9241.960

1.9962.032

2.0682.104

2.140

20

40

60

80

100

6 2 2 3 2 1 1 2 3

5 7 6

12

88

70

25

13

5 4 2 2 3 3 2 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 2 1 1 1

1.0

Page 34: Rozzle De-Cloaking Internet Malware

35

For most sites, virtually no overhead

Take Away

Tremendous impacton runtime detector

due to increasedpath coverage

Visible impact onstatic detector

More important with growing trend to

obfuscation

Also improves other existing tools: Exposes detectors to additional site content

Page 35: Rozzle De-Cloaking Internet Malware

36

if (navigator.userAgent.toLowerCase().indexOf("\x6D"+"\x73\x69\x65"+"\x20\x36")>0)

document.write("<iframe src=x6.htm></iframe>");if (navigator.userAgent.toLowerCase().indexOf(

"\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37")>0) document.write("<iframe src=x7.htm></iframe>");

try { var a; var aa=new ActiveXObject("Sh"+"ockw"+"av"+"e"+"Fl"+[…]);} catch(a) { } finally { if (a!="[object Error]") document.write("<iframe src=svfl9.htm></iframe>");}try { var c; var f=new ActiveXObject("O"+"\x57\x43"+"\x31\x30\x2E\x53"+[…]);} catch(c) { } finally { if (c!="[object Error]") { aacc = "<iframe src=of.htm></iframe>"; setTimeout("document.write(aacc)", 3500);} }

Online

… an example pulled from our DB…

"\x6D"+"\x73\x69\x65"+"\x20\x36"

="msie 6"

"\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37"

="msie 7"

"O"+"\x57\x43"+"\x31\x30\x2E\x53"+"pr"+"ea"+"ds"+"he"+"et"

="OWC10.Spreadsheet"

Page 36: Rozzle De-Cloaking Internet Malware

39

Summary• Rozzle: Multi-profile execution– Look as vulnerable as possible– Improve existing malware detectors

• Implementation:– Implemented on top of IE9’s JavaScript engine– Still some flaws, promising results

• Idea of multi-execution is promising in other contexts

Page 37: Rozzle De-Cloaking Internet Malware

40

Static – Dynamic Analysis Spectrum

Entirely static Entirely runtime

+ High coverage- Low precision- May not scale

+ High precision- High overhead- Low coverage

Symbolic execution

DART, SAGE+ High precision+ Scales reasonably well? High coverage

Multi-execution

+ High precision+ High scalability+ High coverage- Watch out for resource usage