
Page 1: Behavior-Based Malware Detection

Behavior-Based Malware Detection

Somesh Jha, University of Wisconsin, Madison

Page 2: Behavior-Based Malware Detection

The Malware Problem

Host-based malicious-code detection:
• A new program arrives at an end-host system.
• We need to identify whether the program is malicious or not.

Viruses, trojans, backdoors, bots, adware, spyware, ...

June 2011 Somesh Jha: Behavior-Based Malware Detection 2

Page 3: Behavior-Based Malware Detection

Malware: A Threat Assessment

[Chart: Win32 viruses and other malware. Total number of viruses and worms per half-year: Jan.-June 2002: 445; July-Dec. 2002: 687; Jan.-June 2003: 994; July-Dec. 2003: 1,702; Jan.-June 2004: 4,496; July-Dec. 2004: 7,360; Jan.-June 2005: 10,866. A second series shows total families.]

Source: Symantec Research

Page 4: Behavior-Based Malware Detection

Malware: A Threat Assessment

[Chart: New Win32 virus and worm variants 2002-2005, by period (Jan.-June 2002 to Jan.-June 2005). Total viruses and worms: 445; 687; 994; 1,702; 4,496; 7,360; 10,866. Total families data labels: 141, 184, 164, 171, 170, N/A, N/A.]

Source: Symantec Research

Page 5: Behavior-Based Malware Detection

Symantec Threat Report 2010
• Highlights from the report
• See
– http://www.symantec.com/en/uk/business/theme.jsp?themeid=threatreport

Page 6: Behavior-Based Malware Detection

Demographics
• Where do attacks originate?
• The US is still at the top of the list
– 19% in 2009 (23% in 2008)
• Other countries have emerged in the top 10 list
– Brazil and India
– Their emergence is related to increased Internet connectivity in these countries

Page 7: Behavior-Based Malware Detection

Attack Targets
• Who are the attackers targeting?
• Old news
– Spam, identity theft, …
– Still important factors
• New trend
– Hackers now appear to be targeting enterprises and government organizations
– The goal seems to be theft of sensitive data, or espionage
– Stuxnet is the most sophisticated example of this kind of attack

Page 8: Behavior-Based Malware Detection

Vulnerabilities Exploited
• What vulnerabilities are attackers exploiting?
• Web-based attacks appear to be the most popular
– Mozilla Firefox appears to be the most vulnerable
• The most common Web-based attack in 2009 was related to malicious PDF activity
– It exploits vulnerabilities in plug-ins that read the attached PDF file

Page 9: Behavior-Based Malware Detection

Malware Trends
• What types of malware were most prevalent?
• Trojans rule!
– Out of 10 malware families detected, 6 were Trojans (plus 2 worms, 1 back door, and 1 virus)
• Toolkits for creating malware and variants have matured
– Popular kits: SpyEye, Fragus, Zeus, …
– In 2009, Symantec encountered 90,000 malware variants created by the Zeus toolkit

Page 10: Behavior-Based Malware Detection

Takeaways
• The demographics of attack origins are expanding
• The Web is the major attack vector
• Trojans are the most prevalent form of malware
• Creating malware variants is easy because the toolkits have matured
• Enterprises and organizations will be increasingly targeted

Page 11: Behavior-Based Malware Detection

Market Trends
• The security market will grow rapidly in other countries (e.g., Brazil and India)
– Reason: the demographics of attack origin
• The enterprise market will expand
– Reason: enterprises are being targeted by attackers
• Other technologies for detection and remediation will become important

Page 12: Behavior-Based Malware Detection

Defenses
• Simple measures
– Having policies in an enterprise can go a long way
– For example, don't open a PDF attachment if you don't recognize the sender
• Signature-based detection is not enough
– In 2009, Symantec created 2,895,000 signatures
– In 2008, they created 1,691,323 signatures
– These detectors need to be complemented with other types of detection

Page 13: Behavior-Based Malware Detection

Defenses
• Complementary technologies
– Behavior-based and reputation-based detection can complement signature-based detection
– These complementary defenses can keep the number of signatures in check
– These two technologies are mentioned throughout the report
• Data breaches
– Keep confidential data secure even if an enterprise gets compromised
– There are several solutions on the market
– Remediation solutions will also gain traction

Page 14: Behavior-Based Malware Detection

Key Definitions

Variants: new strains of viruses that borrow code, to varying degrees, directly from other known viruses.
Source: Symantec Security Response Glossary

Family: a set of variants with a common code base.

The Beagle family has 197 variants (as of Nov. 30); the Warezov family has 218 variants (as of Nov. 27).

Page 15: Behavior-Based Malware Detection

The Malware Problem
• Malware writers use any and all techniques to evade detection.
– Obfuscation / packing / encryption
– Remote code updates
– Rootkit-based hiding
• Detectors use technology from 15 years ago: signature-based detection.

Page 16: Behavior-Based Malware Detection

Signature-Based Detection

    lea  eax, [ebp+Data]
    push offset aServices_exe
    push eax
    call _strcat
    pop  ecx
    lea  eax, [ebp+Data]
    pop  ecx
    push edi
    push eax
    lea  eax, [ebp+ExistingFileName]
    push eax
    call ds:CopyFileA

Signature:
    8D 85 D8 FE FF FF 68 78 8E 40 00 50 E8 69 06 00 00 59 8D 85 D8 FE FF FF 59 57 50 8D 85 D4 FD FF FF 50 FF 15 C0 60 40 00

• Signatures (aka scan-strings) are the most common malware detection mechanism.
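In its simplest form, scan-string detection is a byte search over the file. The Python sketch below assumes the whole file is already in memory and uses the 40-byte signature shown above; the file contents are hypothetical, and real scanners add wildcards, offsets, and file-format parsing, none of which is modeled here.

```python
# Scan-string from the slide: the bytes of the strcat/CopyFileA code sequence.
SIGNATURE = bytes.fromhex(
    "8D85D8FEFFFF" "68788E4000" "50" "E869060000" "59" "8D85D8FEFFFF"
    "59" "57" "50" "8D85D4FDFFFF" "50" "FF15C0604000"
)

def scan(data: bytes, signature: bytes = SIGNATURE) -> int:
    """Return the offset of the first occurrence of the signature, or -1 if absent."""
    return data.find(signature)

# Hypothetical file images: 16 padding bytes, the malicious sequence, then ret (C3).
infected = bytes(16) + SIGNATURE + b"\xC3"
clean = bytes(64)

print(len(SIGNATURE))   # 40
print(scan(infected))   # 16
print(scan(clean))      # -1
```

One signature matches exactly one byte sequence, which is why the next slides argue that this approach does not scale.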

Page 17: Behavior-Based Malware Detection

Signature Detection Does Not Scale
One signature for one malware instance.

Page 18: Behavior-Based Malware Detection

Current Signature Management
McAfee releases daily updates
– Trying to move to hourly "beta" updates

DAT File # | Date    | Threats Detected | New Threats Added | Threats Updated
4578       | Sep. 09 | 147,382          | 22                | 188
4579       | Sep. 12 | 147,828          | 27                | 231
4580       | Sep. 13 | 148,000          | 11                | 236
4581       | Sep. 14 | 148,368          | 42                | 140
4582       | Sep. 15 | 148,721          | 16                | 203
4583       | Sep. 16 | 149,050          | 18                | 117

Source: McAfee DAT Readme

Page 19: Behavior-Based Malware Detection

Huge Signature Databases
• Recently, McAfee announced the addition of its 200,000th signature.
– That is more signatures than there are files on a standard Windows machine (approx. 100k).
• McAfee notes that: "Good family detection becomes crucial for a less worrisome experience on the Internet."

Source: McAfee Avert Labs

Page 20: Behavior-Based Malware Detection

Roadmap to Better Detection
• Make the malware writer's job as hard as possible.
• Detect malware families, not individual malware instances.
• Catch behavior, not syntactic artifacts.

Page 21: Behavior-Based Malware Detection

Outline
• Introduction
• Threat Model
• Evaluation of Current Detectors
• Behavior-Based Detection
• Future Directions

Page 22: Behavior-Based Malware Detection

Threat Model
• Malware writers craft their programs so as to avoid detection.

Two common evasion techniques:
– Program obfuscation (preserves malicious behavior)
– Program evolution (enhances malicious behavior)

Page 23: Behavior-Based Malware Detection

Obfuscations for Evasion
• Nop insertion
• Register renaming
• Junk insertion
• Instruction reordering
• Encryption
• Compression
• Reversing of branch conditions
• Equivalent instruction substitution
• Basic block reordering
• ...

Page 24: Behavior-Based Malware Detection

Evasion Through Junk Insertion

Original code:
    lea  eax, [ebp+Data]
    push offset aServices_exe
    push eax
    call _strcat
    pop  ecx
    lea  eax, [ebp+Data]
    pop  ecx
    push edi
    push eax
    lea  eax, [ebp+ExistingFileName]
    push eax
    call ds:CopyFileA

After junk insertion:
    lea  eax, [ebp+Data]
    nop
    push offset aServices_exe
    nop
    nop
    push eax
    call _strcat
    nop
    nop
    nop
    pop  ecx
    lea  eax, [ebp+Data]
    pop  ecx
    push edi
    push eax
    nop
    lea  eax, [ebp+ExistingFileName]
    push eax
    call ds:CopyFileA

Signature:
    8D 85 D8 FE FF FF 68 78 8E 40 00 50 E8 69 06 00 00 59 8D 85 D8 FE FF FF 59 57 50 8D 85 D4 FD FF FF 50 FF 15 C0 60 40 00

Page 25: Behavior-Based Malware Detection

Evasion Through Reordering

After junk insertion:
    lea  eax, [ebp+Data]
    nop
    push offset aServices_exe
    nop
    nop
    push eax
    call _strcat
    nop
    nop
    nop
    pop  ecx
    lea  eax, [ebp+Data]
    pop  ecx
    push edi
    push eax
    nop
    lea  eax, [ebp+ExistingFileName]
    push eax
    call ds:CopyFileA

After reordering:
        lea  eax, [ebp+Data]
        jmp  label_one
    label_two:
        lea  eax, [ebp+Data]
        ...
        push eax
        call ds:CopyFileA
        jmp  label_three
    label_one:
        ...
        call _strcat
        ...
        jmp  label_two
    label_three:
        ...

Regex signature:
    8D 85 D8 FE FF FF 90* 68 78 8E 40 00 90* 50 90* E8 69 06 00 00 90* 59 90* ... 90* 50 90* FF 15 C0 60 40 00
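The two signatures on these slides can be compared directly. This Python sketch, assuming the instruction encodings shown above, builds the exact scan-string and a regex that allows runs of NOPs (0x90) between instructions, the same idea as the 90* gaps in the regex signature. The NOP counts follow the junk-inserted listing; as on the slide, the relative displacement of call _strcat is left unchanged, which a real relinked binary would not do.

```python
import re

# Instruction encodings of the strcat/CopyFileA sequence from the slides.
INSTRUCTIONS = [bytes.fromhex(h) for h in [
    "8D85D8FEFFFF", "68788E4000", "50", "E869060000", "59", "8D85D8FEFFFF",
    "59", "57", "50", "8D85D4FDFFFF", "50", "FF15C0604000",
]]
EXACT_SIGNATURE = b"".join(INSTRUCTIONS)

# Regex signature: any run of NOPs (0x90) may appear between consecutive instructions.
REGEX_SIGNATURE = re.compile(rb"\x90*".join(re.escape(ins) for ins in INSTRUCTIONS))

# Junk-inserted variant: number of NOPs after each instruction, per the slide's listing.
NOPS_AFTER = [1, 2, 0, 3, 0, 0, 0, 0, 1, 0, 0, 0]
junk = b"".join(ins + b"\x90" * n for ins, n in zip(INSTRUCTIONS, NOPS_AFTER))

print(EXACT_SIGNATURE in junk)                    # False: the scan-string misses it
print(REGEX_SIGNATURE.search(junk) is not None)   # True: the regex tolerates NOPs
```

The regex still assumes the instructions appear contiguously and in their original order, so the jmp-based reordering above, which scatters the same instructions through the file, defeats it.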

Page 26: Behavior-Based Malware Detection

Evasion Through Encryption

After reordering:
        lea  eax, [ebp+Data]
        jmp  label_one
    label_two:
        lea  eax, [ebp+Data]
        ...
        push eax
        call ds:CopyFileA
        jmp  label_three
    label_one:
        ...
        call _strcat
        ...
        jmp  label_two
    label_three:
        ...

Regex signature:
    8D 85 D8 FE FF FF 90* 68 78 8E 40 00 90* 50 90* E8 69 06 00 00 90* 59 90* ... 90* 50 90* FF 15 C0 60 40 00

After encryption:
        lea  esi, data_area
        mov  ecx, 37
    again:
        xor  byte ptr [esi+ecx], 0x01
        loop again
        jmp  data_area
        . . .
    data_area:
        db 8C 84 D9 FF ...
        . . .
        db FE 14 C1 61 ...
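The decryptor XORs bytes of data_area with 0x01, which is why the stored bytes differ from the signature only in their lowest bit (8D 85 D8 FE becomes 8C 84 D9 FF; FF 15 C0 60 becomes FE 14 C1 61). A Python sketch of this single-byte XOR, applied to two instructions taken from the signature; it ignores the exact loop bounds of the assembly stub, and the concatenated "body" is hypothetical.

```python
def xor_bytes(data: bytes, key: int = 0x01) -> bytes:
    """Single-byte XOR; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

# Two instructions from the strcat/CopyFileA signature.
LEA_DATA = bytes.fromhex("8D85D8FEFFFF")        # lea eax, [ebp+Data]
CALL_COPYFILE = bytes.fromhex("FF15C0604000")   # call ds:CopyFileA

body = LEA_DATA + CALL_COPYFILE     # hypothetical plaintext body
on_disk = xor_bytes(body)           # what a scanner sees in data_area

print(on_disk.hex(" ").upper())     # 8C 84 D9 FF FE FE FE 14 C1 61 41 01
print(LEA_DATA in on_disk)          # False: the scan-string no longer matches
print(xor_bytes(on_disk) == body)   # True: decrypting at run time restores the code
```

Only the small decryption stub stays in the clear, and a malware writer can vary the key or the stub itself from one variant to the next.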

Page 27: Behavior-Based Malware Detection

Evasion Through Evolution
• Malware writers are good at software engineering:
– Modular designs
– High-level languages
– Sharing of exploits, payloads, and evasion techniques

Example: the Beagle e-mail virus gained additional functionality with each version.

Page 28: Behavior-Based Malware Detection

Beagle Evolution
Source: J. Gordon, infectionvectors.com

• More than 100 variants, not counting associated components.
– Beagle: mass mailer
– Mitglieder: spam relay
– Tooso: weakens security
– Lodear: update engine
– Monikey: propagation manager
– LDPinch: password theft
– Tarno: password theft
– Formglieder: bank info theft

Page 29: Behavior-Based Malware Detection

Outline
• Introduction
• Threat Model
• Behavior-Based Detection
• Mining Malicious Behaviors

Page 30: Behavior-Based Malware Detection

Empirical Study [Christodorescu & Jha, ISSTA 2004]
• Start with a set of known viruses.
• Create obfuscated versions:
– Reordering
– Register/variable renaming
– Encryption
• Measure resilience to obfuscation (detection rate of obfuscated versions).
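The methodology (generate obfuscated variants, then measure the fraction a detector still flags) can be sketched as a small test harness. Everything in this Python sketch is a toy stand-in: the detector is a plain scan-string match, the only obfuscation is NOP insertion between instructions, and the sample is a fragment of the strcat sequence shown earlier. It is not the tool used in the ISSTA 2004 study.

```python
import random

# Hypothetical toy sample: a list of instruction encodings from the strcat sequence.
SAMPLE = [bytes.fromhex(h) for h in ["8D85D8FEFFFF", "68788E4000", "50", "E869060000"]]
SIGNATURE = b"".join(SAMPLE)

def signature_detector(code: bytes) -> bool:
    return SIGNATURE in code

def identity(instrs, rng):
    """No obfuscation: the baseline every detector should pass."""
    return b"".join(instrs)

def nop_insertion(instrs, rng):
    """Insert 1-3 NOPs after one randomly chosen instruction (never the last one)."""
    out = list(instrs)
    i = rng.randrange(len(out) - 1)
    out[i] = out[i] + b"\x90" * rng.randint(1, 3)
    return b"".join(out)

def detection_rate(detector, samples, obfuscate, variants=20, seed=0):
    """Fraction of obfuscated variants (per sample) that the detector still flags."""
    rng = random.Random(seed)
    results = [detector(obfuscate(s, rng)) for s in samples for _ in range(variants)]
    return sum(results) / len(results)

print(detection_rate(signature_detector, [SAMPLE], identity))       # 1.0
print(detection_rate(signature_detector, [SAMPLE], nop_insertion))  # 0.0
```

Swapping in other obfuscators (register renaming, reordering, encryption) or other detectors only changes the callables passed to detection_rate.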

Page 31: Behavior-Based Malware Detection

Evaluation Goal: Resilience

Question 1:
• How resistant is a virus scanner to obfuscations or variants of known worms?

Question 2:
• Using the limitations of a virus scanner, can a blackhat determine its detection algorithm?

Page 32: Behavior-Based Malware Detection

Outline
• Introduction
• Threat Model
• Evaluation of Current Detectors
• Behavior-Based Detection
• Future Directions

Page 33: Behavior-Based Malware Detection

Describing Malicious Behavior [Christodorescu et al., Oakland 2005]

• Informal description: "Mass-mailing virus"

• A more precise description: "A program that sends messages containing copies of itself, using the SMTP protocol, in a large number over a short period of time."

Page 34: Behavior-Based Malware Detection

Malspec
• A specification of behavior.

Malware instance (Netsky.B):
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP command "HELO ..."
    push eax
    push ecx
    push edi
    call send

Malspec = syntactic info + semantic info
– Syntactic info: connect(Y); send(Z,T);
– Semantic info: constraints linking Y, Z, T, and "HELO"

Page 35: Behavior-Based Malware Detection

Obfuscation Preserves Behavior

Original:
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP command "HELO ..."
    push eax
    push ecx
    push edi
    call send

After junk insertion:
    push 10h
    nop
    push eax
    xor  eax, ebx
    xor  eax, ebx
    push edi
    call connect
    ...              ; compose SMTP command "HELO ..."
    push eax
    push eax
    pop  eax
    push ecx
    push edi
    call send

• Junk insertion + code reordering.

Page 36: Behavior-Based Malware Detection

Obfuscation Preserves Behavior

• Junk insertion + code reordering.

Original:
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP command "HELO ..."
    push eax
    push ecx
    push edi
    call send

After junk insertion and reordering:
        push 10h
        nop
        push eax
        jmp  L1
    L4: push ecx
        push edi
        jmp  L5
    L2: xor  eax, ebx
        push edi
        call connect
        ...              ; compose SMTP command "HELO ..."
        push eax
        push eax
        jmp  L3
    L1: xor  eax, ebx
        jmp  L2
    L3: pop  eax
        jmp  L4
    L5: call send

Page 37: Behavior-Based Malware Detection

Obfuscation Preserves Behavior

• Junk insertion + code reordering.

Original:
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP command "HELO ..."
    push eax
    push ecx
    push edi
    call send

After junk insertion and reordering:
        push 10h
        nop
        push eax
        jmp  L1
    L4: push ecx
        push edi
        jmp  L5
    L2: xor  eax, ebx
        push edi
        call connect
        ...              ; compose SMTP command "HELO ..."
        push eax
        push eax
        jmp  L3
    L1: xor  eax, ebx
        jmp  L2
    L3: pop  eax
        jmp  L4
    L5: call send

Page 38: Behavior-Based Malware Detection

Evolution Preserves Behavior

• Add error handling.

Original:
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP command "HELO ..."
    push eax
    push ecx
    push edi
    call send

Evolved:
        push 10h
        push eax
        push edi
        call connect
        ...              ; check return code
        jnz  error_handler
        ...              ; compose SMTP command "HELO ..."
        push eax
        push ecx
        push edi
        call send
        ...              ; check return code
        jnz  error_handler
        ...
    error_handler:
        ...

Page 39: Behavior-Based Malware Detection

Evolution Preserves Behavior

• Add error handling.

Original:
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP command "HELO ..."
    push eax
    push ecx
    push edi
    call send

Evolved:
        push 10h
        push eax
        push edi
        call connect
        ...              ; check return code
        jnz  error_handler
        ...              ; compose SMTP command "HELO ..."
        push eax
        push ecx
        push edi
        call send
        ...              ; check return code
        jnz  error_handler
        ...
    error_handler:
        ...

Page 40: Behavior-Based Malware Detection

Detection Using Malspecs

Static detection: given an executable binary, check whether it satisfies the malspec φ.

Just like model checking, but...
• Malicious code allows no assumptions to be made
• Real-time constraints

Page 41: Behavior-Based Malware Detection

A Behavior-Based Detector
• Match the syntactic constructs, then check the semantic information.

Malspec:
– Syntactic info: connect(Y); send(Z,T);
– Semantic info: constraints linking Y, Z, T, and "HELO"
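The two-phase check can be sketched in Python over a recorded call trace, assuming a simplified reading of the semantic info: Z must be the same socket as Y, and the buffer T must start with "HELO". The detector described in the talk works statically on disassembled binaries and uses value-predicate oracles for the semantic check; the trace format, the constraint encoding, and the function names here are all hypothetical.

```python
# Syntactic info: the call pattern connect(Y); send(Z, T) with placeholder variables.
TEMPLATE = [("connect", ("Y",)), ("send", ("Z", "T"))]

def semantic_ok(b):
    # Assumed semantic info: send uses the connected socket, and the buffer is an SMTP HELO.
    return b["Z"] == b["Y"] and str(b["T"]).startswith("HELO")

def match(trace, template=TEMPLATE, start=0, bindings=None):
    """Phase 1: find the template calls in order (other calls may sit in between),
    binding variables to argument values. Phase 2: check the semantic info.
    Backtracks if a binding fails the semantic check."""
    bindings = {} if bindings is None else bindings
    if not template:
        return semantic_ok(bindings)
    api, params = template[0]
    for i in range(start, len(trace)):
        name, args = trace[i]
        if name == api and len(args) == len(params):
            extended = {**bindings, **dict(zip(params, args))}
            if match(trace, template[1:], i + 1, extended):
                return True
    return False

# Hypothetical traces: each event keeps only the arguments the template mentions.
infected = [("connect", (3,)), ("GetTickCount", ()), ("send", (3, "HELO mail.example"))]
benign = [("connect", (5,)), ("send", (5, "GET / HTTP/1.0"))]
print(match(infected), match(benign))   # True False
```

Because the syntactic phase skips over intervening calls, inserted junk calls do not break the match; only the semantic constraints decide whether the bound values describe the malicious behavior.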

Page 42: Behavior-Based Malware Detection

Check the Semantic Info

Program (Netsky.O):
        push 10h
        push eax
        push [ebp+s]
        call connect
        ...
        push ebx
        lea  eax, [ebp+s]
        push eax
        call send_email

    send_email():
        ...              ; compose SMTP command "HELO ..."
        lea  eax, [ebp+arg1]
        push eax
        lea  eax, [ebp+buffer]
        push eax
        call SMTP_send_and_rcv

    SMTP_send_and_rcv():
        push eax
        push [ebp+arg1]
        mov  eax, [ebp+arg2]
        push [eax]
        call send

Malspec:
– Syntactic info: connect(Y); send(Z,T);
– Semantic info: constraints linking Y, Z, T, and "HELO"

Page 43: Behavior-Based Malware Detection

Check with the Oracle
• Assume we have an oracle that can validate value predicates.

Does eax before == ebx after for the code sequence:
    push eax
    call foo
    mov  ebx, [ebp+4]
?

Yes.

Page 44: Behavior-Based Malware Detection

Check the Semantic Info

Program (Netsky.O):
        push 10h
        push eax
        push [ebp+s]
        call connect
        ...
        push ebx
        lea  eax, [ebp+s]
        push eax
        call send_email

    send_email():
        ...              ; compose SMTP command "HELO ..."
        lea  eax, [ebp+arg1]
        push eax
        lea  eax, [ebp+buffer]
        push eax
        call SMTP_send_and_rcv

    SMTP_send_and_rcv():
        push eax
        push [ebp+arg1]
        mov  eax, [ebp+arg2]
        push [eax]
        call send

Program points: A, B

Malspec:
– Syntactic info: connect(Y); send(Z,T);
– Semantic info: constraints linking Y, Z, T, and "HELO"

Page 45: Behavior-Based Malware Detection

Query the Oracle

Program (Netsky.O):
        push 10h
        push eax
        push [ebp+s]
        call connect
        ...
        push ebx
        lea  eax, [ebp+s]
        push eax
        call send_email

    send_email():
        ...              ; compose SMTP command "HELO ..."
        lea  eax, [ebp+arg1]
        push eax
        lea  eax, [ebp+buffer]
        push eax
        call SMTP_send_and_rcv

    SMTP_send_and_rcv():
        push eax
        push [ebp+arg1]
        mov  eax, [ebp+arg2]
        push [eax]
        call send

Program points: A, B

Malspec:
– Syntactic info: connect(Y); send(Z,T);
– Semantic info: constraints linking Y, Z, T, and "HELO"

Does memory[ebp@A+4] == memory[ebp@B+4] hold for the code sequence between A and B?

Yes.

Page 46: Behavior-Based Malware Detection

A Recipe for an Oracle
• An instance of the program verification problem: does program P respect property φ?

[Figure: a code fragment P and expressions e1, …, ek are passed to a sequence of oracles, ordered from less to more powerful (and from lower to higher cost): pattern matching, random execution, the Simplify theorem prover, and the UCLID model checker. Answer labels on the figure: Yes, No, Yes, Yes.]
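Random execution, the second oracle in the sequence, can be sketched as: run the fragment on many random initial states and answer Yes only if the value predicate held every time. A Yes is therefore strong evidence rather than a proof, which is what the heavier theorem-proving and model-checking oracles are for. The toy interpreter below is a hypothetical stand-in that understands only push, pop, mov, xor, and nop on four registers.

```python
import random

REGISTERS = ("eax", "ebx", "ecx", "edx")

def run(fragment, regs):
    """Execute a toy fragment (push/pop/mov/xor/nop on 32-bit registers)."""
    regs, stack = dict(regs), []
    for op, *args in fragment:
        if op == "push":
            stack.append(regs[args[0]])
        elif op == "pop":
            regs[args[0]] = stack.pop()
        elif op == "mov":
            regs[args[0]] = regs[args[1]]
        elif op == "xor":
            regs[args[0]] ^= regs[args[1]]
        elif op != "nop":
            raise ValueError(f"unsupported instruction: {op}")
    return regs

def random_execution_oracle(fragment, predicate, trials=1000, seed=0):
    """True only if predicate(before, after) held on every random start state.
    A False answer is definitive: a counterexample state was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        before = {r: rng.getrandbits(32) for r in REGISTERS}
        if not predicate(before, run(fragment, before)):
            return False
    return True

# Junk from the obfuscated Netsky.B listing: it should leave eax unchanged.
junk = [("nop",), ("xor", "eax", "ebx"), ("xor", "eax", "ebx"), ("push", "eax"), ("pop", "eax")]
same_eax = lambda before, after: after["eax"] == before["eax"]
print(random_execution_oracle(junk, same_eax))                      # True
print(random_execution_oracle([("xor", "eax", "ebx")], same_eax))   # False
```

Pattern matching would catch the nop but not the paired xor instructions unless it had a rule for them; random execution handles both without special cases, at a higher cost per query.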

Page 47: Behavior-Based Malware Detection

A Behavior-Based Prototype
• Developed malspecs for several families of worms.
• No false positives.
• Improved resilience to common obfuscations.

Page 48: Behavior-Based Malware Detection

Evaluation of Malspecs

McAfee uses individual signatures for each worm. Malspecs provide forward detection.

[Figure: Netsky.B, Netsky.C, Netsky.D, Netsky.O, Netsky.P, Netsky.T, and Netsky.W, shown against a decryption signature, a mass-mailing signature, and the prototype detector.]

Page 49: Behavior-Based Malware Detection

Performance
• The prototype is slower than commercial anti-virus tools.
• There is plenty of room for improvement, e.g., the disassembler takes 25% of the time.

Malware Family | Average Running Time | Std. Deviation
Netsky         | 99.57 s              | 41.01 s
Beagle         | 56.41 s              | 40.72 s

Page 50: Behavior-Based Malware Detection

Evaluation: False Positive Rate
• Tested the malspecs on 2,000 benign Windows binaries.
• False positive rate: 0%

[Chart: disassembly rate (0% to 100%) versus program size, grouped in 5 kB increments, from 0 B to 143,360 B.]

Page 51: Behavior-Based Malware Detection

Evaluation: Obfuscation Resilience
• Different types of garbage insertion were applied to Beagle.Y to obtain more variants.

Obfuscation Type    | Behavior-Based: Average Time | Behavior-Based: Detection Rate | McAfee: Detection Rate
Nop insertion       | 74.81 s                      | 100%                           | 75%
Stack op. insertion | 159.10 s                     | 100%                           | 25%
Math op. insertion  | 186.50 s                     | 95%                            | 5%

Page 52: Behavior-Based Malware Detection

Formally Assessing Resilience [POPL 2007]
• Soundness (no false positives)
• Completeness (no false negatives)

[Figure: a detector compares a malspec (syntactic info plus semantic info linking Y, Z, T, and "HELO") against an obfuscated program, drawn as the scrambled word "agmoPrr", and must answer whether it matches.]

Page 53: Behavior-Based Malware Detection

Approach to Assessing Resilience
• The detector "filters out" irrelevant aspects of the program (described in terms of trace semantics).

[Figure: the detector compares the malspec against the obfuscated program ("agmoPrr"); under the program abstraction, the obfuscated program equals the original Program.]
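A toy illustration of "filtering out" irrelevant aspects: if the detector first applies an abstraction that forgets nop events, nop insertion can no longer change its answer. This Python sketch works on lists of event names rather than the trace semantics used in the POPL 2007 formalism; the events, the spec, and the abstraction are hypothetical.

```python
def alpha(trace):
    """Hypothetical abstraction: drop nop events, which do not affect behavior."""
    return [e for e in trace if e != "nop"]

def contains(trace, spec):
    """True if spec occurs as a contiguous run inside trace."""
    n = len(spec)
    return any(trace[i:i + n] == spec for i in range(len(trace) - n + 1))

def detect(program, spec, abstraction=lambda t: t):
    return contains(abstraction(program), abstraction(spec))

spec = ["connect", "send"]
program = ["socket", "connect", "send", "close"]
obfuscated = ["socket", "connect", "nop", "nop", "send", "close"]

print(detect(program, spec), detect(obfuscated, spec))                 # True False
print(detect(program, spec, alpha), detect(obfuscated, spec, alpha))   # True True
```

Because alpha(obfuscated) == alpha(program), the abstracted detector is sound and complete with respect to this one obfuscation; each additional obfuscation needs an abstraction that erases it.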

Page 54: Behavior-Based Malware Detection

Dynamic Behavior-Based Detection
• Threatfire
• Sana Security
• NovaShield

Page 55: Behavior-Based Malware Detection

NovaShield Behavior Engine Architecture

[Figure: components include user processes; a file monitor, registry monitor, process monitor, and network monitor; the OS kernel; a behavior engine; and security policies.]

Page 56: Behavior-Based Malware Detection

Additional Information
• Papers
– M. Christodorescu and S. Jha, "Testing Malware Detectors," International Symposium on Software Testing and Analysis (ISSTA), 2004.
– M. Christodorescu, S. Seshia, S. Jha, D. Song, and R. Bryant, "Semantics-Aware Malware Detection," IEEE Symposium on Security and Privacy (Oakland), 2005.
– M. Dalla Preda, M. Christodorescu, S. Debray, and S. Jha, "A Semantics-Based Approach to Malware Detection," Symposium on Principles of Programming Languages (POPL), January 2007.
• Website
– http://www.cs.wisc.edu/~jha/

Page 57: Behavior-Based Malware Detection

Behavior-Based Detection

The old way – match syntactic signatures: one-to-one

The new way – examine underlying behavior: one-to-many

[Figure annotation: < 50% detection]

Page 58: Behavior-Based Malware Detection

Specifying Behaviors

[Behavior graph: NtOpenKey "…\CurrentVersion\Run" → NtDeleteValueKey "McAfee Firewall"]

Page 59: Behavior-Based Malware Detection

Specifying Behaviors

Behavior-graph representation
– Nodes represent events and arguments
• System calls, library calls, high-level events
– Edges represent data dependencies
• Data substring equality, resource generation/use
– Argument values are crucial!
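A minimal Python sketch of building such a graph from a call trace, assuming each event records the resource handle it creates or uses. Only resource generation/use edges are modeled (substring-equality edges are omitted), and the handle "h1" and the intervening NtQuerySystemTime event are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    api: str                       # system or library call
    args: tuple                    # argument values, e.g., registry paths or value names
    creates: Optional[str] = None  # resource handle the call produces, if any
    uses: Optional[str] = None     # resource handle the call consumes, if any

def behavior_edges(trace):
    """Return data-dependency edges (i, j): event j uses a resource created by event i."""
    creator, edges = {}, []
    for j, ev in enumerate(trace):
        if ev.uses is not None and ev.uses in creator:
            edges.append((creator[ev.uses], j))
        if ev.creates is not None:
            creator[ev.creates] = j
    return edges

trace = [
    Event("NtOpenKey", ("…\\CurrentVersion\\Run",), creates="h1"),
    Event("NtQuerySystemTime", ()),
    Event("NtDeleteValueKey", ("McAfee Firewall",), uses="h1"),
]
print(behavior_edges(trace))   # [(0, 2)]
```

The unrelated event gets no edge, so the dependency between opening the Run key and deleting the "McAfee Firewall" value survives no matter how many junk calls sit between them.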

Page 60: Behavior-Based Malware Detection

Finding the Needle in the Haystack

[Behavior graphs: NtOpenKey "…\CurrentVersion\Run" → NtDeleteValueKey "McAfee Firewall"; NtOpenKey "…\InternetSettings\..." → NtSetValueKey "ProxyBypass"]

Page 61: Behavior-Based Malware Detection

Large, Complex Problem
• Behavior graphs are large
– Between tens of thousands and millions of nodes
• New malware is ever-present
– Lower bound of 7,933 samples/day in 2009
• Large, diverse benign application pool
– Windows 7 is backward compatible with NT/95
• Manual analysis and brute force are not feasible

Page 62: Behavior-Based Malware Detection

62 Synthesizing Optimal Malware Specifications June 2011

Large, Complex Problem
• Behavior graphs are large
– Between tens of thousands and millions of nodes
• New malware is ever-present
– Lower bound of 7,933 samples/day in 2009
• Large, diverse benign application pool
– Windows 7 is backward compatible with NT/95
• Manual analysis and brute force are not feasible

Page 63: Behavior-Based Malware Detection

Our Contributions
• New specification-synthesis algorithm
– Performs efficient, large-scale data mining first to uncover suspicious behaviors
– Probabilistically refines and optimizes specifications
• Key algorithms scale to real problem size
– Reduces the window of vulnerability
• Tunable true positive/false positive rate
– 86% TP for low FP, 100% TP for higher FP

Page 64: Behavior-Based Malware Detection

Holmes: Our Approach to Specification Synthesis
• Roadmap:
– Workflow
  1. Mine significant behaviors
  2. Synthesize specification
– Results
– Conclusion

[Behavior graph: NtOpenKey "…\CurrentVersion\Run" → NtDeleteValueKey "McAfee Firewall"]

Page 65: Behavior-Based Malware Detection

Significant Behaviors
• Significant behaviors discriminate between labeled malicious and benign sets
• Measured statistically via frequency counting of subgraphs
– Can use information gain, cross entropy, G-test, …

[Behavior graph: NtOpenKey "…\CurrentVersion\Run" → NtDeleteValueKey "McAfee Firewall"]

Page 66: Behavior-Based Malware Detection

Key Requirement
• A significant behavior appears in many malware graphs and in few benign graphs
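Information gain, one of the measures listed above, turns this requirement into a number. The Python sketch treats a behavior as a binary feature ("the subgraph occurs in the sample's behavior graph") and computes how much knowing that feature reduces the entropy of the malicious/benign label. The counts are hypothetical, and the sketch takes the occurrence counts as given rather than mining them.

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos malicious and neg benign samples."""
    total, h = pos + neg, 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h

def information_gain(mal_with, mal_total, ben_with, ben_total):
    """Entropy reduction from splitting the labeled set on 'behavior present'."""
    n = mal_total + ben_total
    with_b = mal_with + ben_with
    without_b = n - with_b
    h_split = 0.0
    if with_b:
        h_split += with_b / n * entropy(mal_with, ben_with)
    if without_b:
        h_split += without_b / n * entropy(mal_total - mal_with, ben_total - ben_with)
    return entropy(mal_total, ben_total) - h_split

print(information_gain(100, 100, 0, 100))   # 1.0: in every malware graph, no benign graph
print(information_gain(50, 100, 50, 100))   # 0.0: equally common in both sets
print(information_gain(90, 100, 2, 100))    # in between: a strong but imperfect behavior
```

A mining algorithm would compute this score for candidate subgraphs and keep those with the highest values.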

Page 67: Behavior-Based Malware Detection

Leap Mining: Extracting Significant Behaviors
• Want to find the subgraph that optimizes the significance measure
• Problem: the number of candidate subgraphs is factorial in (# nodes + # edges)

Page 68: Behavior-Based Malware Detection

Leap Mining (Contd)
• Insight: use the correlation between structural similarity and significance-score similarity to guide the search [Yan et al., SIGMOD '08]
– "Leap" over branches in the search tree with similar structure
• Future: probabilistically compress source graphs to mine behaviors more efficiently [Chen et al., VLDB '09]

Page 69: Behavior-Based Malware Detection

Leap Mining: Example

[Figure: a search tree of candidate patterns, each annotated with its significance score (values shown include 0.1, 0.2, and 0.8). Callouts: "Significance score similar to parent! This means we can prune siblings." and "Most significant pattern!"]

Page 70: Behavior-Based Malware Detection

Holmes: Our Approach to Specification Synthesis
• Roadmap:
– Workflow
  1. Mine significant behaviors
  2. Synthesize specification
– Results
– Conclusion

[Behavior graph: NtOpenKey "…\CurrentVersion\Run" → NtDeleteValueKey "McAfee Firewall"]

Page 71: Behavior-Based Malware Detection

Naïve Synthesis: Just Significant Behaviors
• Use all significant behaviors exhibited by a specific sample
• Pros:
– Not path-dependent
– The significance metric is likely to select behaviors that give low false positives
• Cons:
– Some significant behaviors may be variant-specific → false negatives!
– Some samples may not exhibit many mined suspicious behaviors → false positives!

Page 72: Behavior-Based Malware Detection

Searching for the Optimal Specification
• Insight: significant behaviors are suspicious behaviors
• A good specification is the right combination of suspicious behaviors
• Given a malware set, search using concept analysis
– A concept is a pair: ({malware samples}, {suspicious behaviors})
– Find the set of concepts with optimal true/false positive characteristics
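In formal concept analysis, a concept pairs a set of samples with exactly the behaviors they all share, such that no other sample also exhibits all of those behaviors. The Python sketch below enumerates the concepts of a tiny hypothetical sample/behavior table by closing every subset of samples. This brute force is exponential in the number of samples, so it only illustrates the definition; it is not how Holmes searches.

```python
from itertools import combinations

# Hypothetical context: suspicious behaviors observed for each malware sample.
CONTEXT = {
    "m1": {"b1", "b2", "b3"},
    "m2": {"b1", "b2"},
    "m3": {"b2", "b4"},
}
ALL_BEHAVIORS = set().union(*CONTEXT.values())

def shared_behaviors(samples):
    """Intent: behaviors exhibited by every sample in the set."""
    result = set(ALL_BEHAVIORS)
    for s in samples:
        result &= CONTEXT[s]
    return result

def samples_with(behaviors):
    """Extent: samples that exhibit every behavior in the set."""
    return {s for s, observed in CONTEXT.items() if behaviors <= observed}

def all_concepts():
    """Close every subset of samples; exponential, so only for toy contexts."""
    found = set()
    names = sorted(CONTEXT)
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            intent = shared_behaviors(subset)
            found.add((frozenset(samples_with(intent)), frozenset(intent)))
    return found

concepts = all_concepts()
print(len(concepts))                                                   # 5
print((frozenset({"m1", "m2"}), frozenset({"b1", "b2"})) in concepts)  # True
```

Each concept is a candidate building block for a specification: its behaviors describe what a group of samples has in common, and its extent says which samples that description covers.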

Page 73: Behavior-Based Malware Detection

Simulated Annealing

• The concept space is enormous: factorial in the number of suspicious behaviors
• Simulated annealing: probabilistic search over localized portions of the solution space
– Derive new solutions greedily most of the time
– With a certain probability, move to sub-optimal solutions to avoid local minima
– Known sampling methods and cooling schedules can guarantee optimal convergence
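A minimal simulated-annealing sketch in Python, under simplifying assumptions that are not Holmes's actual formulation: a specification is a plain set of behaviors, a sample is flagged if it exhibits any behavior in the set, and the score is the detection rate minus twice the false-positive rate. The data and the weight are hypothetical. Each step toggles one behavior, always accepts improvements, and accepts a worse move with probability exp(delta / t) while the temperature t cools geometrically.

```python
import math
import random

# Hypothetical data: suspicious behaviors exhibited by each sample.
MALWARE = [{"b1"}, {"b1", "b2"}, {"b2", "b3"}, {"b4"}]
BENIGN = [{"b3"}, {"b5"}, {"b0"}]
BEHAVIORS = ["b0", "b1", "b2", "b3", "b4", "b5"]
FP_WEIGHT = 2.0  # assumed trade-off: a false positive costs twice a detection

def flagged(spec, sample):
    return bool(spec & sample)   # flag a sample if it shows any behavior in the spec

def score(spec):
    tp = sum(flagged(spec, s) for s in MALWARE) / len(MALWARE)
    fp = sum(flagged(spec, s) for s in BENIGN) / len(BENIGN)
    return tp - FP_WEIGHT * fp

def anneal(steps=2000, t0=1.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current = frozenset()
    best, t = current, t0
    for _ in range(steps):
        neighbor = current ^ {rng.choice(BEHAVIORS)}   # toggle one behavior
        delta = score(neighbor) - score(current)
        # Always take improvements; take worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = neighbor
        if score(current) > score(best):
            best = current
        t *= cooling
    return best

best = anneal()
print(sorted(best), score(best))   # ['b1', 'b2', 'b4'] 1.0
```

Early on, the high temperature lets the search wander through worse specifications (here, ones that include the benign-looking b3); as t shrinks, the search becomes greedy and settles on the combination that detects all four samples with no false positives.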

Page 74: Behavior-Based Malware Detection

Simulated Annealing: Example

[Figure: successive candidate specifications plotted by detection rate and false positives]

Probabilistically take sub-optimal solution!

Page 75: Behavior-Based Malware Detection


Workflow

• Behavior Mining: Known Malware + Benign Apps → Significant Behaviors
• Specification Synthesis: Significant Behaviors + Recent Malware + Benign Apps → Discriminative Specification

Page 76: Behavior-Based Malware Detection

[Figure: behavior graph NtOpenKey “…\CurrentVersion\Run” → NtDeleteValueKey “McAfee Firewall”]

Holmes: Our Approach to Specification Synthesis
• Roadmap:
– Workflow
  1. Mine significant behaviors
  2. Synthesize specification
– Results
– Conclusion

Page 77: Behavior-Based Malware Detection

Evaluation Workflow

• Behavior Mining: Known Malware (492 samples) + Benign Apps (11 apps) → Significant Behaviors (166 behaviors)
• Specification Synthesis: Significant Behaviors + Recent Malware (378 samples) + Benign Apps (28 apps) → Discriminative Specification (1 specification, with 10-fold cross-validation)
• Behavior-Based Malware Detection: Discriminative Specification + New Malware (42 samples) + Benign Apps (28 apps) → Detection Results

Page 78: Behavior-Based Malware Detection

78 Synthesizing Optimal Malware Specifications June 2011

Corpus Details
• 912 malware samples
– 18 AV-labeled families (spyware, worms, bots, filesystem viruses, …)
– 492 samples in 6 families for mining
– 420 samples in 12 families for synthesis & evaluation
• 49 benign applications
– Behaviorally diverse set: browsers, system administration, media, …

Page 79: Behavior-Based Malware Detection

Corpus Details (Contd)
• Trace collection covers a single execution path
– 120 seconds per malware sample
– Typical usage patterns for benign applications

Page 80: Behavior-Based Malware Detection


Behavior Mining Results
• Mined 109 unique behaviors
– 18.1 per family, on average
– 77 manually deemed malicious
• Non-malicious behaviors are attributable to sample size
• Most behaviors correspond to those in AV databases
– Mined some unreported by AV, e.g. code injection & browser reconfiguration in worms and viruses
– Some behaviors missing, likely due to single-path trace collection

Page 81: Behavior-Based Malware Detection


Specification Synthesis Results

• 0 false positives on the test corpus at an 86.5% detection rate
• TP/FP tradeoff is configurable
• Better than commercial AV on our corpus: Sana (42.61%), Threatfire (61.70%)

Page 82: Behavior-Based Malware Detection


Page 83: Behavior-Based Malware Detection


Performance and Scalability
• Behavior mining runtime varies between families
– Worst-case exponential; can trade accuracy for runtime
– Similarity between malicious/benign graphs affects runtime
– Can easily parallelize for linear speedup
• Specification synthesis works quickly
– Most specifications found in under one minute (near-optimal solutions)
– Optimal solution can be found in exponential time using the same algorithm

Page 84: Behavior-Based Malware Detection


Conclusions

[Figure: behavior graph NtOpenKey “…\CurrentVersion\Run” → NtDeleteValueKey “McAfee Firewall”]

• Synthesizing specifications is hard!
• Holmes uses large-scale data mining to extract suspicious behaviors
• Holmes probabilistically searches for near-optimal specifications built from suspicious behaviors
• Detection results beat those of commercial AV products on our corpus
• Algorithms scale to real problem sizes

Page 85: Behavior-Based Malware Detection

Additional Information
• Matt Fredrikson, Somesh Jha, Mihai Christodorescu, Reiner Sailer, Xifeng Yan. Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors. IEEE Symposium on Security and Privacy, 2010.


Page 86: Behavior-Based Malware Detection


Outline
• Introduction
• Threat Model
• Evaluation of Current Detectors
• Behavior-Based Detection
• Future Directions

Page 87: Behavior-Based Malware Detection


Takeaways
• Malware detection is a $5-6 billion industry
• There is no well-defined threat model
• We need to formally define a threat model and design detection techniques based on it
• Behavior-based malware detection is a move towards that vision

Page 88: Behavior-Based Malware Detection


On the theoretical side
• Can we prove oracle-completeness results?
– For example, if the oracle can give me a perfect control-flow graph, I can handle reordering heuristics perfectly
• How about bounding the adversary?
– Computational power (as in cryptography)
– Limit the class of obfuscations

Page 89: Behavior-Based Malware Detection

Questions?

Page 90: Behavior-Based Malware Detection


Naïve Synthesis: Full Specification
• Use the entire behavior graph of a malware sample
• Pros:
– Fits the malware very tightly
– Low false positives
• Cons:
– Path-specific: e.g. some looping/branching behavior and non-determinism are not critical for the specification
– Impossible to build the full graph: behaviors not in the training run are not accounted for
– Likely to miss variants

Page 91: Behavior-Based Malware Detection


Specifying Behaviors
• Behavior graph representation
– Nodes represent events & arguments
• System calls, library calls, high-level events
– Edges represent data dependencies
• Data substring equality, resource generation/use
– Argument values are crucial!

[Figure: three candidate specifications of the same behavior, each with a DefUse(1, 1) edge]
– Too general!: NtOpenKey → NtDeleteValueKey (no argument values)
– Too specific!: NtOpenKey(501, ACC_WRITE, “Run”, ) → NtDeleteValueKey(501, “… Firewall”, )
– Just right: NtOpenKey “…\CurrentVersion\Run” → NtDeleteValueKey “McAfee Firewall”

Page 92: Behavior-Based Malware Detection


Multi-Faceted Problem
• Detailed behavior information makes for a large, data-rich raw source
• Difficult to extract complete behavior information
– See the multi-path problem [Cadar et al., CCS ‘06], [Moser et al., Oakland ‘07]
• Malicious and benign behaviors look similar
– Benign application update vs. malicious dropping
– Benign network activity vs. malicious C&C
– Benign software patching vs. malicious code injection

Page 93: Behavior-Based Malware Detection


Startup
• There is a startup commercializing some of the ideas presented in this talk
• Securitas Technologies Inc.
– See www.securitastech.com

Page 94: Behavior-Based Malware Detection

Here be Dragons!

Page 95: Behavior-Based Malware Detection


Disclaimer

Virus detection is undecidable. [Cohen 1984]

Best approximation up to now: byte signatures.

Page 96: Behavior-Based Malware Detection


My Proposal for a Solution
• Make the malware writer’s job as hard as possible.
• Stop malware based on behavior:
– Employ semantics of instructions
– Use enforceable interfaces
– Combine static and dynamic techniques

Page 97: Behavior-Based Malware Detection


Current AV Detection Methods
• Scan strings (byte sequences from a malicious executable)
– Enhanced using regular expressions
• Heuristics
– Binary file structure
– APIs used
– Byte (n-gram) distribution

Page 98: Behavior-Based Malware Detection


Previous Research
• Different structures over bytes
– N-gram distributions [Li, Wang, & Stolfo, SMC 2005]
– Neural networks, Bayes [Arnold & Tesauro, VB2000]
– Additional features: DLL imports, syscalls [Schultz, Eskin, Zadok, & Stolfo, Oakland 2001]
• Different information about the program
– Slices from syscalls [Lo, Levitt, & Olsson, 1995]
– Recovery of high-level constructs [Bergeron, Debbabi, Erhioui, & Ktari, SREIS 2001]
– Model checking [Kinder, Katzenbeisser, Schallhart, & Veith, DIMVA 2005]

Page 99: Behavior-Based Malware Detection


Key Observations
Variants: New strains of viruses that borrow code, to varying degrees, directly from other known viruses.
(Source: Symantec Security Response Glossary)

• Syntactic signatures cannot capture variants.
• Syntactic signature methods do not scale.

Need to focus on behavior.

Page 100: Behavior-Based Malware Detection


Behavior-Based Detection
[Slide labels: My Previous Research, Proposed Research]
• How to describe malicious behavior?
• How to identify malicious behavior?
– Static techniques
– Static + dynamic techniques
• How to automatically learn malicious behavior?
• How effective are these techniques?

Page 101: Behavior-Based Malware Detection

A Language to Describe Malicious Behaviors

Previous Research

Page 102: Behavior-Based Malware Detection


Establishing a Threat Model
A threat model has three components:
• Attack Model: How is the attack performed? → Malicious behavior
• Defensive Goal: What is the system designed to protect? → Trusted Computing Base
• Time: How long is the protection operational? → Forever?

Page 103: Behavior-Based Malware Detection


Choosing a TCB
• Interface to TCB has to be enforceable.
• For this talk: TCB = OS + Processor.

[Figure: layered stack Program → Libraries (API calls) → OS Kernel (system calls) → Processor (instructions), with three possible TCB choices: Libraries/Interpreter + OS + Processor; OS + Processor; Processor]

Page 104: Behavior-Based Malware Detection


Formal Definition of Malspec
$\Sigma = \{\sigma_k\}_{k \geq 1}$ is the set of system calls
$V = \{v_i\}_{i \geq 1}$ is the set of uninterpreted variables
$A$ is a logic of formulas over $V$
$G = (N, E)$ is a graph:
– Vertices are labeled with system calls from $\Sigma$ instantiated with variables from $V$.
– Edges are labeled with predicates in $A$.
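To make the definition concrete, here is a minimal Python sketch that checks a malspec against a recorded sequence of system-call events. The event format, the simplified two-argument call signatures, and the brute-force search are assumptions for illustration; the detector in this talk matches malspecs against program code, not traces. Vertices carry a system call and one variable per argument, and edge predicates are Python functions over the variable binding. Because vertices may map to events in any order, the check is order-independent as long as the predicates hold.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Callable, Dict, List, Optional, Tuple

Event = Tuple[str, Tuple[object, ...]]  # (system call, concrete arguments)
Binding = Dict[str, object]             # uninterpreted variable -> value

@dataclass
class Malspec:
    # Vertex id -> (system call from Sigma, variables from V, one per argument).
    nodes: Dict[str, Tuple[str, Tuple[str, ...]]]
    # Edge (src, dst) -> predicate from A, evaluated over the variable binding.
    edges: Dict[Tuple[str, str], Callable[[Binding], bool]] = field(default_factory=dict)

def bind(var_names: Tuple[str, ...], args: Tuple[object, ...],
         binding: Binding) -> Optional[Binding]:
    """Extend `binding` with var -> arg; None if a variable is already bound differently."""
    new = dict(binding)
    for v, a in zip(var_names, args):
        if v in new and new[v] != a:
            return None
        new[v] = a
    return new

def satisfies(spec: Malspec, trace: List[Event]) -> bool:
    """Brute force: try every assignment of vertices to distinct events, in any order."""
    ids = list(spec.nodes)
    for chosen in permutations(range(len(trace)), len(ids)):
        binding: Optional[Binding] = {}
        for node_id, idx in zip(ids, chosen):
            call, var_names = spec.nodes[node_id]
            name, args = trace[idx]
            if name != call or len(args) != len(var_names):
                binding = None
            else:
                binding = bind(var_names, args, binding)
            if binding is None:
                break
        if binding is not None and all(p(binding) for p in spec.edges.values()):
            return True
    return False

# Simplified, hypothetical argument lists: (key handle, key or value name).
spec = Malspec(
    nodes={"open": ("NtOpenKey", ("H1", "K")),
           "delete": ("NtDeleteValueKey", ("H2", "V"))},
    edges={("open", "delete"): lambda b: b["H1"] == b["H2"]},  # DefUse on the handle
)
trace = [("NtOpenKey", (501, "Run")), ("NtDeleteValueKey", (501, "McAfee Firewall"))]
print(satisfies(spec, trace))
```

The search is exponential in the number of vertices; it is meant only to show how vertices, variables, and edge predicates fit together.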

Page 105: Behavior-Based Malware Detection


Malspec Benefits
• Representation-independent
– Depends only on the interface to the TCB
– Ignores function boundaries
– Ignores specific data structures
– Ignores process boundaries
• Order-independent
– Allows any order of operations, as long as the dependence predicates are satisfied.

Page 106: Behavior-Based Malware Detection

Static Detection of Malicious Behavior

Previous Research

Page 107: Behavior-Based Malware Detection


Step 1: Matching Nodes
Straightforward… except for encrypted code!
• Encryption & compression effectively hide the system calls (i.e., the TCB operations).
• Solution: Malware normalization

Page 108: Behavior-Based Malware Detection


A Malware Normalizer
• Dynamic analysis technique:
– Run program in a contained environment
– Stop as soon as control flow reaches a previously written address
– Reconstruct program with current memory snapshot

[Figure: Packed Executable → Normalizer, running on Qemu (system emulator) → Unpacked Executable]
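The normalizer's stopping rule can be illustrated on a toy machine. This Python sketch assumes a hypothetical four-instruction set (`store`, `jmp`, `nop`, `halt`) in place of x86 under QEMU: it runs the program on a copy of its memory, records every written address, and stops as soon as control reaches one of them, returning that address and a memory snapshot as the unpacked program.

```python
from typing import Dict, Optional, Set, Tuple

Instr = Tuple  # ("store", addr, instr) | ("jmp", addr) | ("nop",) | ("halt",)
Memory = Dict[int, Instr]

def normalize(memory: Memory, entry: int,
              max_steps: int = 10_000) -> Optional[Tuple[int, Memory]]:
    """Run a toy program until control reaches an address it wrote earlier.

    Returns (new entry point, memory snapshot) for the unpacked program, or None
    if the program halts or exhausts `max_steps` without executing written code.
    """
    mem = dict(memory)  # "contained environment": never touch the caller's memory
    written: Set[int] = set()
    pc = entry
    for _ in range(max_steps):
        if pc in written:
            # Control flow reached a previously written address: snapshot and stop.
            return pc, dict(mem)
        instr = mem.get(pc, ("halt",))
        op = instr[0]
        if op == "store":  # self-modification, e.g. a decryption loop writing code
            _, addr, value = instr
            mem[addr] = value
            written.add(addr)
            pc += 1
        elif op == "jmp":
            pc = instr[1]
        elif op == "nop":
            pc += 1
        elif op == "halt":
            return None
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return None

# A hypothetical "packed" program: it writes two instructions, then jumps to them.
packed = {0: ("store", 100, ("nop",)), 1: ("store", 101, ("halt",)), 2: ("jmp", 100)}
print(normalize(packed, 0))
```

A real unpacker may write and execute code in several layers; rerunning the same rule on the snapshot is one way to peel them off, though the talk does not say how the normalizer handles that case.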

Page 109: Behavior-Based Malware Detection


Detector Characteristics
• Intraprocedural:
– Flow sensitive → Handles many syntactic obfuscations
• Interprocedural:
– Context sensitive OR context insensitive → Handles changes through evolution

Page 110: Behavior-Based Malware Detection


Step 2: Predicate Verification
Check whether a program path satisfies the corresponding malspec predicate.

Requirements for the predicate logic:
• Addition, comparison, multiplication
• Bit-vector arithmetic
• Arrays
• On 32-bit values (and soon 64-bit values)

Page 111: Behavior-Based Malware Detection


A Simple Verifier
For predicates that express preservation of values: $\varphi(A): A_1 = A_2$
• Syntactic check: compare the code sequence with a known set of obfuscations
– Nops, pushes & pops
– Operations on non-live variables

[Figure: malspec with predicate φ]
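A minimal sketch of such a syntactic check in Python, assuming a hypothetical tuple encoding of instructions in which the first operand is the destination. It accepts a code sequence only if every instruction is a nop, a push/pop pair restoring the same register, or a write to a register that is neither the tracked variable nor live; everything else is conservatively rejected, so a False answer means 'not recognized', not 'not preserving'.

```python
from typing import List, Set, Tuple

Instr = Tuple[str, ...]  # e.g. ("nop",), ("push", "ebx"), ("pop", "ebx"), ("mov", "ecx", "5")

def preserves(code: List[Instr], var: str, live: Set[str]) -> bool:
    """Syntactic check that `code` leaves `var` unchanged (phi(A): A1 = A2).

    Accepts only known obfuscations: nops, push r / pop r pairs restoring the
    same register, and writes whose destination is neither `var` nor live.
    Anything else is conservatively rejected.
    """
    stack: List[str] = []
    for instr in code:
        op = instr[0]
        if op == "nop":
            continue
        if op == "push":
            stack.append(instr[1])
        elif op == "pop":
            # The pop must restore the register pushed most recently.
            if not stack or stack.pop() != instr[1]:
                return False
        elif len(instr) < 2 or instr[1] == var or instr[1] in live:
            # Unknown form, or it writes the tracked variable or a live register.
            return False
    # Unbalanced pushes change the stack, so they are not treated as junk.
    return not stack

junk = [("nop",), ("push", "ebx"), ("mov", "ebx", "7"), ("pop", "ebx"), ("mov", "ecx", "5")]
print(preserves(junk, "eax", live={"eax"}))
```

This is cheap but brittle: any junk pattern outside the known set (for example, add 7 followed by sub 7 on the tracked register) is rejected, which motivates the semantic verifier that follows.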

Page 112: Behavior-Based Malware Detection


Preliminary Results [Christodorescu & Jha, USENIX Security 2003]

Detection succeeds in the presence of:
– Code reordering
– Simple junk insertion
– Register renaming

Zero missed detections (compared to very high missed-detection rates for commercial virus scanners)

Page 113: Behavior-Based Malware Detection


A Value-Preservation Verifier
• Express the program path as a state transformer.
– Use instruction semantics
• Use decision procedures.

[Figure: does the state transformer of the code path satisfy the malspec predicate φ?]
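The state-transformer view can be sketched in Python for a hypothetical subset of 32-bit instructions. Each instruction's semantics maps a register state to a new state, and composing them gives the transformer for the whole path. Instead of a decision procedure, this sketch checks φ(A): A1 = A2 by random execution, one of the cheaper tools on the following slide: a counterexample disproves the predicate, but passing all trials is evidence, not a proof.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple, Union

MASK = 0xFFFFFFFF  # registers hold 32-bit values
State = Dict[str, int]
Instr = Tuple[str, str, Union[str, int]]  # (opcode, destination, register or immediate)

def step(state: State, instr: Instr) -> State:
    """Semantics of a tiny, hypothetical instruction subset as a state transformer."""
    op, dst, src = instr
    s = dict(state)
    val = s[src] if isinstance(src, str) else src
    if op == "mov":
        s[dst] = val & MASK
    elif op == "add":
        s[dst] = (s[dst] + val) & MASK
    elif op == "sub":
        s[dst] = (s[dst] - val) & MASK
    elif op == "xor":
        s[dst] = (s[dst] ^ val) & MASK
    else:
        raise ValueError(f"unsupported opcode {op!r}")
    return s

def transformer(code: List[Instr]) -> Callable[[State], State]:
    """Compose per-instruction semantics into one transformer for the whole path."""
    def run(state: State) -> State:
        for instr in code:
            state = step(state, instr)
        return state
    return run

def preserves_value(code: List[Instr], reg: str,
                    regs: Sequence[str] = ("eax", "ebx", "ecx"),
                    trials: int = 1000, seed: int = 0) -> bool:
    """Random execution for phi(A): A1 = A2. False means a counterexample was found;
    True only means none was found in `trials` runs, which is not a proof."""
    rng = random.Random(seed)
    run = transformer(code)
    for _ in range(trials):
        start = {r: rng.getrandbits(32) for r in regs}
        if run(start)[reg] != start[reg]:
            return False
    return True

# Junk that a purely syntactic check would reject but that preserves eax and ebx.
junk = [("add", "eax", 7), ("sub", "eax", 7), ("xor", "ebx", "ecx"), ("xor", "ebx", "ecx")]
print(preserves_value(junk, "eax"), preserves_value([("add", "eax", "ebx")], "eax"))
```

Replacing the random loop with a query to a bit-vector decision procedure over the same transformer would turn the evidence into a proof, at higher cost.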

Page 114: Behavior-Based Malware Detection


Verification Tools
• Instance of the program verification problem: Does program P respect property φ?

[Figure: a code fragment and predicate φ are checked by a range of tools: pattern matching, random execution, random abstract interpretation, the Simplify theorem prover, and the UCLID model checker; tools further along the range are more powerful but have higher cost]

Page 115: Behavior-Based Malware Detection


Evaluation of Value-Preservation [Christodorescu & Jha, Oakland 2005]

McAfee uses individual signatures for each worm. Semantic malspecs provide forward detection.

[Figure: Netsky.B; decryption malspec and mass-mailing malspec; prototype detector; variants Netsky.C, Netsky.D, Netsky.O, Netsky.P, Netsky.T, Netsky.W]

Page 116: Behavior-Based Malware Detection


Architecture (up to now)

[Figure: Executable → Malware Normalizer → Normalized Executable → Semantics-Aware Malware Detector; supporting components: Malspec Library, Semantic Query Engine (Decision Procedures, Static Analyses, Instruction/Syscall Semantics)]

Page 117: Behavior-Based Malware Detection

Hybrid Detection of Malicious Behavior

Proposed Research

Page 118: Behavior-Based Malware Detection


Static Analysis is Not Perfect
• Safety at the cost of precision
– Good for strict security, bad for usable security.

[Figure: malspec predicate φ and the Perl interpreter]

Page 119: Behavior-Based Malware Detection


Imprecision of Static Analysis
• Many sources of imprecision:
– Disassembly
– Control flow reconstruction
– Loops, recursion
– Malspec predicate verification (decision procedures)
• Leads to false positives

Page 120: Behavior-Based Malware Detection


Dynamic Analysis
• As precise as possible for a particular execution
– Can retrieve any part of program state
– Adds time dimension
• But... adds runtime overhead
– Emulators are orders of magnitude slower

Page 121: Behavior-Based Malware Detection


A Hybrid Malware Detector
Combine static + dynamic:
– Identify where static analysis loses precision
– Have the dynamic analyzer check those locations

Detection goal: check only whether malicious behavior appears in the current execution.

Runtime overhead must be small (<10%).

Page 122: Behavior-Based Malware Detection


Example

Runtime monitor determines whether a portion of the trace satisfies the predicate.

[Figure: static stage: Perl interpreter analyzed against malspec predicate φ; dynamic stage: Perl interpreter under runtime monitoring]

Page 123: Behavior-Based Malware Detection


Hybrid Detector Operation
1. Determine path validity
Static analysis identifies a certain path as possibly malicious. Dynamic analysis confirms that the current execution trace follows that path.
2. Check that the trace satisfies the predicate
At the end of the trace segment that matches the path, verify the malspec predicate.
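The two steps can be sketched as a small runtime monitor in Python. The trace format (program location plus observed register values), the addresses, and the example predicate are hypothetical, and the sketch only checks contiguous segments of the trace against one statically flagged path.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

Event = Tuple[str, Dict[str, int]]  # (program location, observed register values)

def monitor(trace: Iterable[Event], flagged_path: List[str],
            predicate: Callable[[List[Event]], bool]) -> bool:
    """Report malicious behavior if the execution follows a statically flagged path
    and the trace segment matching that path satisfies the malspec predicate."""
    window: Deque[Event] = deque(maxlen=len(flagged_path))
    for event in trace:
        window.append(event)
        # Step 1: does the most recent segment of the trace follow the flagged path?
        if len(window) == len(flagged_path) and [loc for loc, _ in window] == flagged_path:
            # Step 2: at the end of the matching segment, verify the predicate.
            if predicate(list(window)):
                return True
    return False

# Hypothetical flagged path and value-preservation predicate on eax.
path = ["0x401000", "0x401010", "0x401020"]

def eax_preserved(segment: List[Event]) -> bool:
    return segment[0][1]["eax"] == segment[-1][1]["eax"]

trace = [("0x400ff0", {"eax": 1}), ("0x401000", {"eax": 5}),
         ("0x401010", {"eax": 9}), ("0x401020", {"eax": 5})]
print(monitor(trace, path, eax_preserved))
```

Only the flagged locations need instrumentation in practice, which is how the hybrid design hopes to keep runtime overhead small.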

Page 124: Behavior-Based Malware Detection


Architecture [hybrid]

[Figure: Executable → Malware Normalizer → Normalized Executable → Semantics-Aware Malware Detector; supporting components: Malspec Library, Semantic Query Engine (Decision Procedures, Static + Dynamic Analyses, Instruction/Syscall Semantics)]

Page 125: Behavior-Based Malware Detection

Automatic Extraction of Malicious Behavior

Proposed Research

Page 126: Behavior-Based Malware Detection


Deriving Malspecs
Goal: Extract a malspec from a sample program labeled as malicious.
• Requirements
– Capture behavior, not implementation
– Low to no false positives
• Two options: from multiple samples, or from one sample

Page 127: Behavior-Based Malware Detection


Malspec from Multiple Samples
Learning a malspec from multiple samples:
1. Identify common sequences of system calls.
– Subgraph isomorphism
2. For each pair of system calls, construct a predicate describing the actual code paths.
– Symbolic execution, human expert
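The slide names subgraph isomorphism for step 1. As a simpler stand-in that works only on linear sequences, the Python sketch below computes a longest common subsequence of two system-call sequences after inlining a known wrapper, which is what the Beagle.C wrapper foo in the next example would require. The call sequences mirror that example; the helper names are hypothetical.

```python
from typing import Dict, List, Sequence

def inline(calls: Sequence[str], wrappers: Dict[str, List[str]]) -> List[str]:
    """Replace calls to known wrapper functions with the system calls they make."""
    out: List[str] = []
    for c in calls:
        out.extend(wrappers.get(c, [c]))
    return out

def common_sequence(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """One longest common subsequence of two system-call sequences (dynamic programming)."""
    n, m = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover the subsequence itself.
    result: List[str] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return result

beagle_b = ["socket", "connect", "write", "write", "write", "write", "close"]
beagle_c = ["socket", "connect", "foo", "foo", "foo", "foo", "close"]
# In the example, foo(A, B) performs write(A, B) and then read(C).
print(common_sequence(beagle_b, inline(beagle_c, {"foo": ["write", "read"]})))
```

Sequences lose the data-flow edges a malspec graph keeps, so this only approximates step 1; the predicates of step 2 still have to be built separately.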

Page 128: Behavior-Based Malware Detection


Example

Beagle.B:
X = socket()
connect( Y )
write( Z, “EHLO ...” )
write( A, “TO ” + address )
write( B, “DATA” )
write( C, body )
close( D )

Beagle.C:
X = socket()
connect( Y )
foo( Z, “EHLO ...” )
foo( A, “TO ” + address )
foo( B, “DATA” )
foo( C, body )
close( D )

where foo( A, B ): write( A, B ) then read( C )

Page 129: Behavior-Based Malware Detection


Malspec from One Sample
Additional semantic information needed:
• System call API usage rules
– Provides sequencing information and some data flow information
• Network protocol semantics
– Provides sequencing information and additional data flow information

Page 130: Behavior-Based Malware Detection


Example: Beagle.B

System call rules:
socket connect (write|read)* close

SMTP protocol:
write(“EHLO”)
write(“MAILTO”+addr)
write(“DATA”)
write(body)

Beagle.B:
X = socket()
connect( Y )
write( Z, “EHLO ...” )
write( A, “TO ” + address )
write( B, “DATA” )
write( C, body )
close( D )
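Both kinds of rules can be sketched as checks over a system-call trace in Python. The API rule is the slide's regular expression applied to call names. The SMTP check is a simplification: it tests, in order, that the first three writes contain "EHLO", "TO", and "DATA" and that a body follows; the substring tests and the sample payloads are assumptions.

```python
import re
from typing import List, Tuple

Call = Tuple[str, str]  # (system call name, data written; "" when not a write)

# API usage rule from the slide: socket connect (write|read)* close
API_RULE = re.compile(r"socket connect( (write|read))* close")

# Simplified SMTP markers, checked in order against successive writes.
# The slide writes "MAILTO"+addr and Beagle.B writes "TO " + address; "TO" matches both.
SMTP_STEPS = ["EHLO", "TO", "DATA"]

def follows_api_rule(trace: List[Call]) -> bool:
    return API_RULE.fullmatch(" ".join(name for name, _ in trace)) is not None

def follows_smtp(trace: List[Call]) -> bool:
    writes = [data for name, data in trace if name == "write"]
    if len(writes) <= len(SMTP_STEPS):
        return False  # need the protocol commands plus a message body
    return all(step in data for step, data in zip(SMTP_STEPS, writes))

# Hypothetical payloads for the Beagle.B trace.
beagle_b = [("socket", ""), ("connect", ""), ("write", "EHLO example.org"),
            ("write", "TO victim@example.org"), ("write", "DATA"),
            ("write", "<message body>"), ("close", "")]
print(follows_api_rule(beagle_b), follows_smtp(beagle_b))
```

Together the two checks recover the ordering of the Beagle.B writes from a single sample, which is the information the multiple-sample approach would otherwise get by comparing variants.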

Page 131: Behavior-Based Malware Detection


Complete Architecture

[Figure: Executable → Malware Normalizer → Normalized Executable → Semantics-Aware Malware Detector; supporting components: Malspec Library, Malspec Generator, Semantic Query Engine (Decision Procedures, Static + Dynamic Analyses, Instruction/Syscall Semantics)]

Page 132: Behavior-Based Malware Detection

Theoretical Limits of Behavior-Based Detection

Proposed Research

Page 133: Behavior-Based Malware Detection


What Does This Buy Us?
• How strong (theoretically) is this system? OR How much harder does the malware writer have to work to evade my system?

Goal: “Design” a computationally-bounded adversary. Assess the behavior-based detector against this adversary.

Page 134: Behavior-Based Malware Detection


Timeline

[Figure: timeline spanning 2005 to 2007 (June markers) with the following work items]
• Malspec extraction from many samples
• Malspec extraction from one sample
• Hybrid detection
– runtime monitor
– path checking
– predicate checking
• Theoretical work
• Thesis writing
• Interview season

Page 135: Behavior-Based Malware Detection

Behavior-Based Malware Detection

Somesh Jha

Joint work with Mihai Christodorescu

Page 136: Behavior-Based Malware Detection


Page 137: Behavior-Based Malware Detection


Step 2: Unification
• One-way unification to associate program expressions with the uninterpreted variables in the malspec.
• Result: one binding map for each matched pair (malspec node, program location).
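One-way unification is essentially pattern matching in which only the malspec side contains variables. The Python sketch below assumes a hypothetical term encoding (constants, `Var` objects, and tuples of the form (operator, arguments...)) and builds one binding map per matched (malspec node, program location) pair; the nodes, locations, and operands are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Var:
    """An uninterpreted malspec variable."""
    name: str

Binding = Dict[str, object]

def match(pattern: object, expr: object, binding: Optional[Binding] = None) -> Optional[Binding]:
    """One-way unification: bind malspec variables in `pattern` to program expressions.

    Only the pattern side has variables, so program expressions are never
    rewritten. Terms are Var, constants (str/int), or tuples (operator, args...).
    Returns the extended binding map, or None if the terms cannot be matched.
    """
    binding = {} if binding is None else binding
    if isinstance(pattern, Var):
        if pattern.name in binding:
            # A variable used twice must bind to the same expression both times.
            return binding if binding[pattern.name] == expr else None
        return {**binding, pattern.name: expr}
    if isinstance(pattern, tuple):
        if not isinstance(expr, tuple) or len(pattern) != len(expr):
            return None
        result: Optional[Binding] = binding
        for p, e in zip(pattern, expr):
            result = match(p, e, result)
            if result is None:
                return None
        return result
    return binding if pattern == expr else None

# Hypothetical malspec nodes and program locations (operands as tuples).
nodes = {
    "data": ("write", Var("B"), "DATA"),
    "rcpt": ("write", Var("A"), ("+", "TO ", Var("address"))),
}
program = {
    0x401000: ("write", ("reg", "ebx"), "DATA"),
    0x401020: ("write", ("reg", "ebx"), ("+", "TO ", ("mem", "esi"))),
    0x401040: ("read", ("reg", "ebx")),
}
# One binding map per matched (malspec node, program location) pair.
maps = {}
for node, pat in nodes.items():
    for loc, e in program.items():
        b = match(pat, e)
        if b is not None:
            maps[(node, loc)] = b
print(maps)
```

The binding maps feed step 2: the malspec predicates are then checked over the program expressions bound to their variables.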

Page 138: Behavior-Based Malware Detection


Evaluation: Obfuscation Resilience
• Different types of junk insertion were applied to Beagle.Y to obtain more variants.

Obfuscation type | Semantics-aware: average time | Semantics-aware: detection rate | McAfee: detection rate
Nop insertion | 74.81 s | 100% | 75%
Stack op. insertion | 159.10 s | 100% | 25%
Math op. insertion | 186.50 s | 95% | 5%

Page 139: Behavior-Based Malware Detection


Problems with Dynamic Analysis
• Execution may have affected the host machine in a malicious way.

Goal: Stop execution as soon as it enters a path that is certainly malicious.
• Static analysis can help identify these points of no return.

[Figure: Perl interpreter]