
Automatic Code Features Extraction Using Bio-inspired Algorithms

EICAR 2013

Ciprian Oprișa, George Cabău and Adrian Coleșa

Bitdefender, Technical University of Cluj-Napoca

November 18, 2013


Agenda

1 Introduction

2 Objectives

3 OpCodes Extraction and Normalization

4 Automatic Filters Selection

5 Experimental results

6 Conclusions and future work

C. Oprișa (Bitdefender) Automatic Code Features Extraction Using Bio-inspired Algorithms November 18, 2013 2 / 25


1. Introduction


Where are we? (1)

We need to detect malware.

[Diagram: the file is reduced to one or more hashes, which are looked up in the malware database, with two possible outcomes]
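A minimal sketch of the hash-based lookup in the diagram. It assumes SHA-256 as the hash and a plain in-memory set as the "malware database"; the slide names neither, so both are illustrative choices. The sketch also shows the brittleness of exact hashes: changing a single byte defeats the match.

```python
import hashlib
from typing import Set


def sha256_hex(data: bytes) -> str:
    """Hex digest of the whole file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def is_known_malware(data: bytes, malware_hashes: Set[str]) -> bool:
    """Exact-match lookup: any single changed byte yields a different hash."""
    return sha256_hex(data) in malware_hashes


# Hypothetical "database" holding the hash of one sample
known = {sha256_hex(b"MZ...sample-1")}
print(is_known_malware(b"MZ...sample-1", known))   # True
print(is_known_malware(b"MZ...sample-1!", known))  # False: one extra byte
```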


Where are we? (2)

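Instead of hashing the whole file, the code is disassembled into a sequence of OpCodes. Each OpCode is mapped to a one-letter symbol, and a window of length 8 slides over the symbol string to extract the n-grams:

→ push, mov, sub, mov, push, lea, push, call, mov, . . .

→ pmsmplpcmlpctjczczczmJ

→ <pmsmplpc>, <msmplpcm>, <smplpcml>, <mplpcmlp>, <plpcmlpc>, . . .

[Diagram: the n-grams are looked up in the malware database, with two possible outcomes]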
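The pipeline above can be sketched in a few lines. The one-letter symbol table is hypothetical and covers only the mnemonics spelled out on the slide. The window length of 8 matches the n-grams shown. This is a minimal sketch, not the authors' implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical OpCode-to-symbol table, limited to the mnemonics on the slide
SYMBOLS: Dict[str, str] = {"push": "p", "mov": "m", "sub": "s", "lea": "l", "call": "c"}


def normalize(opcodes: Sequence[str]) -> str:
    """Map each OpCode to its one-letter symbol; unknown OpCodes are skipped."""
    return "".join(SYMBOLS[op] for op in opcodes if op in SYMBOLS)


def ngrams(symbols: str, n: int = 8) -> List[str]:
    """Slide a window of length n over the symbol string, one position at a time."""
    return [symbols[i:i + n] for i in range(len(symbols) - n + 1)]


ops = ["push", "mov", "sub", "mov", "push", "lea", "push", "call", "mov"]
print(normalize(ops))                           # pmsmplpcm
print(ngrams("pmsmplpcmlpctjczczczmJ")[:5])
# ['pmsmplpc', 'msmplpcm', 'smplpcml', 'mplpcmlp', 'plpcmlpc']
```

A string of length L yields L − n + 1 n-grams, so the 22-symbol string above produces 15 windows of length 8.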


2. Objectives


Objectives

Goal: improve the detection of .NET malware by filtering the OpCodes to extract more meaningful n-grams.

Extract OpCode sequences from .NET applications.

Eliminate unreachable code.

Design a fitness function to evaluate the quality of an OpCode filter.

Use bio-inspired algorithms to find the best filter.


3. OpCodes Extraction and Normalization


Parsing and disassembling .NET

.NET assemblies use an extension of the Microsoft Portable Executable format.

The metadata is spread over many, many tables.

```
=== Method 4: name='mpress._::Main'; RVA=0x0000254C; FA=0x0000074C; size=0x9A ===
= Exception handlers: 000025D6; =
0000254C: [00] nop
0000254D: [28 0E 00 00 0A] call 0x0A00000E
00002552: [12 00] ldloca.s 0x00
00002554: [28 03 00 00 06] call 0x06000003
00002559: [13 06] stloc.s 0x06
0000255B: [11 06] ldloc.s 0x06
0000255D: [2D 10] brtrue.s 0x10
0000255F: [00] nop
00002560: [72 01 00 00 70] ldstr 0x70000001
00002565: [72 23 00 00 70] ldstr 0x70000023
0000256A: [28 0F 00 00 0A] call 0x0A00000F
0000256F: [26] pop
00002570: [15] ldc.i4.m1
00002571: [13 05] stloc.s 0x05
00002573: [2B 02] br.s 0x02
00002575: [26] pop
00002576: [06] ldloc.0
00002577: [28 10 00 00 0A] call 0x0A000010
0000257C: [80 01 00 00 04] stsfld 0x04000001
00002581: [7E 01 00 00 04] ldsfld 0x04000001
00002586: [6F 11 00 00 0A] callvirt 0x0A000011
0000258B: [0B] stloc.1
...
```
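The short branches in the listing carry a one-byte operand. Per ECMA-335, a short-form branch such as br.s or brtrue.s takes a signed 8-bit offset measured from the start of the next instruction. The helper name `short_branch_target` below is our own. This is a small sketch that resolves the two branches shown above.

```python
def short_branch_target(address: int, operand: int, size: int = 2) -> int:
    """Target of a CIL short-form branch (br.s, brtrue.s, ...).

    The operand is a signed int8 offset relative to the next instruction,
    which starts at address + size (opcode byte + operand byte).
    """
    offset = operand - 0x100 if operand >= 0x80 else operand  # sign-extend int8
    return address + size + offset


print(hex(short_branch_target(0x255D, 0x10)))  # brtrue.s -> 0x256f
print(hex(short_branch_target(0x2573, 0x02)))  # br.s     -> 0x2577
```

So br.s at 00002573 jumps to 00002577 and skips the pop at 00002575. Unless some branch outside the visible part of the listing targets that address, the pop is unreachable. This is the kind of code the elimination step targets.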


CIL instruction types

- Instructions that move data around: ldc (load constant), ldarg (load argument), . . .
- Arithmetic and logic instructions: add, div, or, and, xor, . . .
- Object model instructions: newobj, . . .
- Instructions that modify the control flow:
  - returning instructions, after which execution resumes at the next instruction (call, callvirt)
  - unconditional branches (br, br.s)
  - conditional branches (brtrue, brfalse, beq.s)
  - flow-disruptive instructions (ret, throw, jmp)


Eliminating unreachable code

1. Enqueue the entry point and the exception handlers.
2. While the queue is not empty:
   - Dequeue the next address.
   - Sweep forward from it until already-reached code or the end of the buffer is encountered:
     - unconditional branch → follow the branch;
     - conditional branch → enqueue the branch target and continue sweeping;
     - flow-disruptive instruction → stop the current sweep.

[Figure: the work queue (i1, i2, i3, . . ., ik, ik+1) next to a code buffer containing br, brtrue and ret instructions]
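The steps above can be sketched as a queue-driven sweep. The method body is assumed to be already decoded into a dictionary of the form address → (mnemonic, size, branch target). The toy method `TOY_METHOD` is hypothetical, loosely modeled on the br.s in the listing. Multi-target instructions such as switch are not handled in this sketch.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Set, Tuple

# (mnemonic, size in bytes, branch target or None) for each instruction address
Instr = Tuple[str, int, Optional[int]]

UNCONDITIONAL = {"br", "br.s"}
CONDITIONAL = {"brtrue", "brtrue.s", "brfalse", "brfalse.s", "beq", "beq.s"}
DISRUPTIVE = {"ret", "throw", "jmp"}


def reachable(code: Dict[int, Instr], starts: Iterable[int]) -> Set[int]:
    """Addresses reachable from the entry point and the exception handlers."""
    reached: Set[int] = set()
    queue: Deque[int] = deque(starts)
    while queue:
        addr: Optional[int] = queue.popleft()
        # Sweep until already-reached code or the end of the buffer
        while addr is not None and addr in code and addr not in reached:
            reached.add(addr)
            mnemonic, size, target = code[addr]
            if mnemonic in UNCONDITIONAL:
                addr = target             # follow the branch
            elif mnemonic in CONDITIONAL:
                if target is not None:
                    queue.append(target)  # explore the taken path later
                addr += size              # keep sweeping the fall-through path
            elif mnemonic in DISRUPTIVE:
                break                     # stop the current sweep
            else:
                addr += size              # ordinary instruction, incl. call/callvirt
    return reached


# Hypothetical toy method: br.s jumps over a pop that nothing else targets
TOY_METHOD: Dict[int, Instr] = {
    0: ("ldloc.0", 1, None),
    1: ("brtrue.s", 2, 6),
    3: ("br.s", 2, 7),
    5: ("pop", 1, None),
    6: ("nop", 1, None),
    7: ("ret", 1, None),
}
live = reachable(TOY_METHOD, [0])
print(sorted(live))            # [0, 1, 3, 6, 7]
print(set(TOY_METHOD) - live)  # {5}
```

Exception handler entry points go into `starts` next to the method entry point. Code reached only through a handler is therefore kept.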


OpCodes normalization

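Definition (basic normalization function), where O is the set of OpCodes and Σ the alphabet of symbols:

$$\mathrm{normal} : O \to \Sigma \cup \{\varepsilon\}$$

For example, $\mathrm{normal}(\texttt{nop}) = \varepsilon$ (the instruction is dropped) and $\mathrm{normal}(\texttt{brtrue}) = \mathrm{normal}(\texttt{brfalse})$ (related OpCodes share a symbol).

Definition (filtering, or Λ-normalization), for $\Lambda \subseteq \Sigma$:

$$\mathrm{normal}_\Lambda(o) = \begin{cases} \mathrm{normal}(o), & \text{if } \mathrm{normal}(o) \in \Lambda \\ \varepsilon, & \text{otherwise} \end{cases}$$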
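The two definitions translate directly into code. In this minimal sketch, ε is encoded as the empty string, so concatenating the results simply removes filtered-out OpCodes. The table `NORMAL` and its symbols are hypothetical; only the nop and brtrue/brfalse entries reflect the examples in the definition.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical basic normalization table, normal: O -> Σ ∪ {ε}, with ε encoded as ""
NORMAL: Dict[str, str] = {
    "nop": "",      # normal(nop) = ε
    "brtrue": "b",  # normal(brtrue) = normal(brfalse)
    "brfalse": "b",
    "call": "c",
    "ldstr": "s",
    "pop": "p",
}


def normal(opcode: str) -> str:
    return NORMAL[opcode]


def normal_lambda(opcode: str, lam: FrozenSet[str]) -> str:
    """Λ-normalization: keep normal(o) if it belongs to Λ, otherwise map it to ε."""
    symbol = normal(opcode)
    return symbol if symbol in lam else ""


def normalize_sequence(opcodes: Iterable[str], lam: FrozenSet[str]) -> str:
    # Joining the ε results ("") drops the filtered-out OpCodes from the string
    return "".join(normal_lambda(o, lam) for o in opcodes)


SIGMA = frozenset({"b", "c", "s", "p"})
ops = ["nop", "call", "ldstr", "ldstr", "call", "pop", "brtrue"]
print(normalize_sequence(ops, SIGMA))                  # csscpb
print(normalize_sequence(ops, frozenset({"b", "c"})))  # ccb
```

With Λ = Σ the filter keeps everything, so $\mathrm{normal}_\Sigma = \mathrm{normal}$. Smaller filters shorten the string and change which n-grams appear.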


4. Automatic Filters Selection


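Λ-detectability

Sequences of symbols are extracted from:
- 558695 clean methods
- 272 malware clusters

Different filters Λ yield different n-grams. The n-grams extracted from p1, p2 and p3 are then filtered against the clean set:

p1 → ng1, ng3, ng4, ng5, ng7; after clean-set filtering: ng1, ng4, ng7
p2 → ng2, ng4, ng5, ng8, ng9; after clean-set filtering: ng2, ng4, ng8, ng9
p3 → ng3, ng4, ng5, ng6; after clean-set filtering: ng4, ng6

Here ng3 and ng5 also occur in clean methods, so they are discarded.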
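The clean-set filtering step can be sketched with plain sets. Here the clean set is inferred from the example, as the n-grams that disappear. The slide does not spell out how Λ-detectability is computed from the surviving n-grams. The intersection at the end is only one plausible reading, added for illustration, and is not the authors' definition.

```python
from functools import reduce
from typing import Dict, List, Set


def clean_set_filter(ngrams: List[str], clean: Set[str]) -> List[str]:
    """Drop every n-gram that also occurs in the clean methods."""
    return [ng for ng in ngrams if ng not in clean]


samples: Dict[str, List[str]] = {
    "p1": ["ng1", "ng3", "ng4", "ng5", "ng7"],
    "p2": ["ng2", "ng4", "ng5", "ng8", "ng9"],
    "p3": ["ng3", "ng4", "ng5", "ng6"],
}
clean = {"ng3", "ng5"}  # inferred from the example above

filtered = {name: clean_set_filter(ngs, clean) for name, ngs in samples.items()}
print(filtered)

# Assumption, for illustration only: n-grams that survive in every sample
common = reduce(set.intersection, (set(v) for v in filtered.values()))
print(common)  # {'ng4'}
```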


The fitness function

Definition (the fitness function):

$$f : \mathcal{P}(\Sigma) \to \mathbb{R}, \qquad f(\Lambda) = \frac{\text{clusters detectability}}{\text{number of clusters}}$$

Search space: $|\mathcal{P}(\Sigma)| = 2^{|\Sigma|}$, since every subset of the alphabet is a candidate filter.

Example with two sequences, ecbeceaaed and bedccecaeed:

- Λ = Σ = {a, b, c, d, e}: the sequences are unchanged and share only fragments of length 2, such as be, ce and ed.
- Λ = {e}: both sequences collapse to the same string, eeee.
- Λ = {a, b, e}: the sequences become ebeeaae and beeaee, which share the longer fragment beea.
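The example can be reproduced by filtering both sequences and intersecting their n-gram sets. The n-gram length of 4 is an assumption chosen to match the shared fragment beea. The real fitness additionally needs the clean set and the malware clusters, which this sketch does not model.

```python
from typing import FrozenSet, Set


def apply_filter(symbols: str, lam: FrozenSet[str]) -> str:
    """Λ-normalization of an already-normalized string: drop symbols outside Λ."""
    return "".join(s for s in symbols if s in lam)


def ngram_set(symbols: str, n: int) -> Set[str]:
    return {symbols[i:i + n] for i in range(len(symbols) - n + 1)}


def shared_ngrams(a: str, b: str, lam: FrozenSet[str], n: int = 4) -> Set[str]:
    """n-grams common to both sequences after filtering with Λ."""
    return ngram_set(apply_filter(a, lam), n) & ngram_set(apply_filter(b, lam), n)


A, B = "ecbeceaaed", "bedccecaeed"
for lam in (frozenset("abcde"), frozenset("e"), frozenset("abe")):
    print(sorted(lam), shared_ngrams(A, B, lam))
# ['a', 'b', 'c', 'd', 'e'] set()
# ['e'] {'eeee'}
# ['a', 'b', 'e'] {'beea'}
```

With |Σ| = 5, the search space holds only 2**5 = 32 filters and could be enumerated exhaustively. With a realistic CIL alphabet, 2^{|Σ|} grows far too fast for that, which is why a heuristic search is needed.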


Evolutionary algorithms

Start with a population of random solutions. At each step, the individuals interact and evolve towards better solutions. Eventually, they should reach an optimal solution (global or local).

Two such algorithms are used:
- Genetic Algorithm
- Particle Swarm Optimization


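Genetic Algorithm

- Binary encoding: 0 1 1 0 1 1 0 . . . 1 0, one bit per symbol of Σ, set when the symbol belongs to Λ
- Crossover: $\Lambda_1, \Lambda_2 \xrightarrow{\text{crossover}} \Lambda'_1, \Lambda'_2$
- Mutation: $\Lambda \xrightarrow{\text{mutation}} \Lambda'$
- Roulette wheel selection: $P_{\text{selection}}(\Lambda_k) = \dfrac{f(\Lambda_k)}{\sum_{\Lambda} f(\Lambda)}$, where the sum runs over the current population
- Elitism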
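A compact genetic algorithm over bit-encoded filters, following the operators listed above. Several choices are assumptions, not details from the slides: one-point crossover, bit-flip mutation at rate 0.05, an elite of 2, a population of 20, 30 generations, and a toy fitness that scores agreement with a hypothetical target filter in place of the real f(Λ).

```python
import random
from typing import Callable, List, Tuple

Genome = List[int]  # bit i == 1  <=>  the i-th symbol of Σ is kept in Λ


def roulette(pop: List[Genome], fits: List[float], rng: random.Random) -> Genome:
    # P(select Λk) = f(Λk) / Σ f(Λ); a tiny epsilon avoids an all-zero wheel
    return rng.choices(pop, weights=[f + 1e-9 for f in fits], k=1)[0]


def crossover(a: Genome, b: Genome, rng: random.Random) -> Tuple[Genome, Genome]:
    cut = rng.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(g: Genome, rate: float, rng: random.Random) -> Genome:
    return [1 - bit if rng.random() < rate else bit for bit in g]  # bit-flip mutation


def genetic_algorithm(fitness: Callable[[Genome], float], n_bits: int,
                      pop_size: int = 20, generations: int = 30,
                      mutation_rate: float = 0.05, elite: int = 2,
                      seed: int = 0) -> Tuple[Genome, List[float]]:
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history: List[float] = []  # best fitness of each generation
    for _ in range(generations):
        fits = [fitness(g) for g in pop]
        ranked = sorted(zip(fits, pop), key=lambda t: t[0], reverse=True)
        history.append(ranked[0][0])
        nxt = [g for _, g in ranked[:elite]]  # elitism: the best survive unchanged
        while len(nxt) < pop_size:
            c1, c2 = crossover(roulette(pop, fits, rng), roulette(pop, fits, rng), rng)
            nxt += [mutate(c1, mutation_rate, rng), mutate(c2, mutation_rate, rng)]
        pop = nxt[:pop_size]
    return max(pop, key=fitness), history


# Hypothetical toy fitness: agreement with a fixed target filter over 10 symbols
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]


def toy_fitness(g: Genome) -> float:
    return sum(x == y for x, y in zip(g, TARGET)) / len(TARGET)


best, history = genetic_algorithm(toy_fitness, len(TARGET))
print(best, history[-1])
```

Because elitism copies the best individuals into the next generation untouched, the best fitness per generation can never decrease.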


Particle Swarm Optimization

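Representation: $p = (X, V, X_{best}, \text{best fitness})$, with $X \in [0, 1]^{|\Sigma|}$ and $V \in [-1, 1]^{|\Sigma|}$

Update:

$$X' = X + V$$

$$V' = \omega V + \varphi_1 r_1 (X_{best} - X) + \varphi_2 r_2 (\text{global best} - X)$$

Here ω is the inertia weight, φ1 and φ2 weight the pull toward the particle's own best position and the swarm's global best, and r1, r2 are random factors in [0, 1].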
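A minimal particle swarm sketch using the update equations above, applied in the order written: X′ uses the old V, and V′ uses the old X. Several choices are assumptions not stated on the slide: clamping X to [0, 1] and V to [−1, 1], decoding a filter by keeping symbol i when X_i ≥ 0.5, the values ω = 0.7 and φ1 = φ2 = 1.5, and the toy fitness against a hypothetical target filter.

```python
import random
from typing import Callable, List, Tuple


def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


def decode(x: List[float], threshold: float = 0.5) -> List[int]:
    # Assumed decoding: symbol i belongs to Λ when its coordinate reaches the threshold
    return [1 if xi >= threshold else 0 for xi in x]


def pso(fitness: Callable[[List[int]], float], dim: int, n_particles: int = 15,
        iterations: int = 40, omega: float = 0.7, phi1: float = 1.5,
        phi2: float = 1.5, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]         # [0,1]^|Σ|
    V = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]   # [-1,1]^|Σ|
    X_best = [x[:] for x in X]
    best_fit = [fitness(decode(x)) for x in X]
    g = max(range(n_particles), key=lambda i: best_fit[i])
    G_best, G_fit = X_best[g][:], best_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                x, v = X[i][d], V[i][d]
                X[i][d] = clamp(x + v, 0.0, 1.0)  # X' = X + V
                # V' = ωV + φ1 r1 (X_best - X) + φ2 r2 (global best - X)
                V[i][d] = clamp(omega * v + phi1 * r1 * (X_best[i][d] - x)
                                + phi2 * r2 * (G_best[d] - x), -1.0, 1.0)
            f = fitness(decode(X[i]))
            if f > best_fit[i]:  # new personal best
                X_best[i], best_fit[i] = X[i][:], f
                if f > G_fit:    # new global best
                    G_best, G_fit = X[i][:], f
    return decode(G_best), G_fit


# Hypothetical toy fitness: agreement with a fixed target filter over 10 symbols
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]


def toy_fitness(bits: List[int]) -> float:
    return sum(x == y for x, y in zip(bits, TARGET)) / len(TARGET)


filt, fit = pso(toy_fitness, len(TARGET))
print(filt, fit)
```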


5. Experimental results


5. Experimental results (1)

Parallel speedup for the fitness function, following Amdahl's law. B is the fraction of the computation that cannot be parallelized, and k is the number of workers:

$$S(k) = \frac{T(1)}{T(k)} = \frac{1}{B + \frac{1 - B}{k}}$$

Experimentally, B = 0.04, so

$$S_{\max} = \lim_{k \to \infty} S(k) = \frac{1}{B} = 25$$
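A quick numeric check of the formula with the measured B = 0.04. The worker counts are arbitrary illustration values.

```python
def amdahl_speedup(k: int, b: float) -> float:
    """S(k) = 1 / (B + (1 - B) / k), with B the serial (non-parallelizable) fraction."""
    return 1.0 / (b + (1.0 - b) / k)


B = 0.04
for k in (1, 8, 24, 1000):
    print(k, round(amdahl_speedup(k, B), 2))  # 1.0, 6.25, 12.5, 24.41
print("S_max =", 1 / B)
```

With k = 24 the speedup is already 12.5, half of the ceiling. Beyond that point, additional workers yield quickly diminishing returns.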


5. Experimental results (2)

Learning evaluation, best fitness learnt:
- GA: 0.3965
- PSO: 0.4029

Cross-validation results:

| | GA best | PSO best |
|---|---|---|
| Similar malware samples | 0.1819 | 0.1833 |
| Obfuscated samples | 0.8859 | 0.8859 |


6. Conclusions and future work


Summary


Conclusions

n-grams are a robust way to classify programs.

Existing methods can be improved by filtering the OpCode sequences.

Bio-inspired algorithms can be used for finding good filters.


Thank you!

Questions?