building & leveraging white database for antivirus testing

Building and Leveraging a Whitelist Database for Anti-Virus Testing Mario Vuksan, Director, Knowledgebase Services

DESCRIPTION

Presented at the International Antivirus Testing Workshop 2007 by Mario Vuksan, Director, Knowledgebase Services, Bit9

TRANSCRIPT

Page 1: Building & Leveraging White Database for Antivirus Testing

Building and Leveraging a Whitelist Database for Anti-Virus Testing
Mario Vuksan, Director, Knowledgebase Services

Page 2: Building & Leveraging White Database for Antivirus Testing

Agenda
• Growing Signature/Definition Problem
• Building a Global Whitelist
• Leveraging a Global Whitelist
• QA

Page 3: Building & Leveraging White Database for Antivirus Testing

Growing Signature Problem
• Cumulative unique variants have grown ten-fold over the last 5 years (Yankee Group)
• “Denial-of-Service” Attacks: Malware changing signature every 10 minutes
• Solutions
– Heuristic & Behavioral Detections
• New Problem: High “False Positive” Count

Page 4: Building & Leveraging White Database for Antivirus Testing

Whitelist: a Google-sized Project
Sizing Software Universe
• Number of Files Released Daily by:
– Microsoft: 500K / IBM: 100K / Sourceforge: 500K / Mozilla.Org: 250K
• More Components, Daily Builds, Auto Updaters
• 2.7B Files Indexed, heading for 10B
• 30TB of Installers, heading for 100TB
• Daily acquiring 50M File Records, ¼ of YouTube
• Tracking 20,000 Software Companies
– E.g. DMOZ tracks 200,000+ Entities

[Chart: Files Indexed and Storage over time]
Date          Files Indexed   Storage
June 2005     30M             1 TB
March 2006    300M            8 TB
May 2007      3B              30 TB
Dec 2007      10B             100 TB

Page 5: Building & Leveraging White Database for Antivirus Testing

Mechanics of a Whitelist
[Diagram labels]
• Stages: Collect, Extract, Analyze, Publish (Interfaces), Consumers
• Layers: Software Infrastructure, Hardware Infrastructure
• Data flows: Outbound Metadata, Inbound User Metadata
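
The Collect, Extract, Analyze and Publish stages named on this slide can be illustrated with a toy pipeline. This is only a sketch under stated assumptions: the slide does not describe the real implementation, so the function names (`extract_records`, `build_index`), the choice of ZIP as the package format, SHA-256 as the file key, and the "MZ" check as the analysis step are all hypothetical.

```python
import hashlib
import io
import zipfile


def extract_records(package_name, package_bytes):
    """Extract stage: unpack a ZIP package and yield one record per contained file."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            data = archive.read(info)
            yield {
                "source": package_name,
                "path": info.filename,
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),  # assumed file key
                "magic": data[:2],
            }


def build_index(packages):
    """Run the stages over already-collected (name, bytes) pairs."""
    index = {}
    for name, blob in packages:                       # Collect (already downloaded)
        for record in extract_records(name, blob):    # Extract
            # Analyze (toy check): DOS/PE executables start with b"MZ".
            record["looks_like_pe"] = record.pop("magic") == b"MZ"
            index[record["sha256"]] = record          # Publish as a lookup table
    return index
```

A consumer would then query `index` by file hash; a production system would replace the dictionary with a database and add many more analysis steps.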

Page 6: Building & Leveraging White Database for Antivirus Testing

Building a Whitelist
• Trusted Partners
– Benefits
  • Trusted Source of Binary Material
  • In-depth Information on the Binary Data Indexed
– Realities
  • Expensive Partner Programs
  • Complicated Applications
  • Lack of Interest
  • Lack of Comprehensive Repositories

Page 7: Building & Leveraging White Database for Antivirus Testing

Certifying Software
– Certificate Mechanism
  • As a Component for Validation
  • Costly Process, Cumbersome for QA Departments
  • Great When Seen on Shareware Sites; Less than 10% Penetration
– First-Seen Date
  • Microsoft & Shared Installer Components
  • Long Time & No Detection: Likely Good
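
The first-seen rule on this slide (a file known for a long time with no detections is likely good) can be written as a small predicate. This is a minimal sketch: the function name `likely_good` and the 365-day threshold are assumptions for illustration, since the slide gives no concrete cut-off.

```python
from datetime import date


def likely_good(first_seen, today, detections, min_age_days=365):
    """Return True when a file has been known for a long time with no detections.

    min_age_days is a hypothetical threshold; the slide only says "long time".
    """
    age_days = (today - first_seen).days
    return detections == 0 and age_days >= min_age_days
```

For example, `likely_good(date(2005, 6, 1), date(2007, 5, 1), detections=0)` holds, while a file first seen a month ago does not qualify even with zero detections.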

Page 8: Building & Leveraging White Database for Antivirus Testing

Challenges of Software Acquisition

• Buying/Getting Physical Media
– Retail Prices vs. eBay
– How to process 35K DVDs?
• FTP Sites
• Web Sites
– Simple: Links and Forms
– Complicated: JavaScript
– Super Complicated: Frames and AJAX
• Shareware Sites
• Warez
– Legal Ramifications
– Users vs. Collectors

Page 9: Building & Leveraging White Database for Antivirus Testing

Harvesting the Internet
• Order of Difficulty
– FTPs: Wget, Curl
– Simple HTTPs: Open Source Spiders
– Try Grabbing Download.com
– Try Grabbing Downloads.microsoft.com
– Try Grabbing Canon or any Driver Site
• Datacenter Requirements
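
The "simple" end of this difficulty scale, pages whose downloads are plain links, can be handled with a few lines of standard-library Python. The sketch below is an illustration, not the crawler used for the talk: `LinkCollector`, `find_downloads`, and the suffix list are hypothetical names and choices. It does not execute JavaScript, so it would miss exactly the "complicated" sites the slide warns about, and it ignores links that carry a query string after the file name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of interesting file types; adjust as needed.
DOWNLOAD_SUFFIXES = (".exe", ".msi", ".zip", ".cab")


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_downloads(base_url, html):
    """Return links on the page whose path ends in a known download suffix."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [u for u in collector.links if u.lower().endswith(DOWNLOAD_SUFFIXES)]
```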

Page 10: Building & Leveraging White Database for Antivirus Testing

Assuring Software is Trustworthy
• Anti-Malware Scanning
– Name and Type Normalization
• Behavior Scanning
• Code Inspection
• External Meta Data Collection and Matching

Page 11: Building & Leveraging White Database for Antivirus Testing

Software Analysis Results
• Basic Embedded Data
• PE Header Analysis
– Processor, Language, Binary Type
• Packers and Protectors
– 500+ Variants
– ASPack and Adobe
– PECompact and Google
• Install Formats
– Proprietary (like Skype)
– Binary Diffs (Patch Factory, MS PSF)
• Runtime Analysis and Sandboxing
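
As an illustration of the PE header analysis bullet, the sketch below reads the processor type, the DLL flag, and the subsystem from raw file bytes using only the `struct` module. The offsets follow the published PE/COFF layout; the function name `read_pe_summary` and the abbreviated lookup tables are assumptions for this example, and the language information mentioned on the slide (stored in resources) is not covered.

```python
import struct

# Partial lookup tables; values not listed are reported numerically.
MACHINES = {0x014C: "Intel 386", 0x0200: "Intel Itanium", 0x8664: "AMD64"}
SUBSYSTEMS = {1: "Native", 2: "Windows GUI", 3: "Windows character", 9: "Windows CE"}
IMAGE_FILE_DLL = 0x2000  # bit in the COFF Characteristics field


def read_pe_summary(data):
    """Return machine, DLL flag and subsystem for a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # 20-byte COFF file header follows the 4-byte signature.
    machine, _sections, _stamp, _symptr, _nsyms, _optsize, characteristics = (
        struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    )
    optional = pe_offset + 24
    # Subsystem sits at offset 68 of the optional header in both PE32 and PE32+.
    (subsystem,) = struct.unpack_from("<H", data, optional + 68)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        "subsystem": SUBSYSTEMS.get(subsystem, str(subsystem)),
    }
```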

Page 12: Building & Leveraging White Database for Antivirus Testing

Software Classifications
• Classifying Source
– Trust-based vs. Type-based
• Classifying Files
– Functional (Font, Driver, Screensaver) vs. Descriptive
• Classifying Products
– Basic
  • Open Source
  • Commercial: Driver vs. Application
  • IM / P2P / Games
– Better
  • Malware Classifications
– Interesting
  • Steganography/Watermarking/Hacking/Hiding

Page 13: Building & Leveraging White Database for Antivirus Testing

Industry & Government Certifications

• Government Certifications
– NIAP, FIPS, DCTS
• Vulnerability Reports
– CVE, CERT, SANS, MSB, etc.
• For Good Software:
– Certification Programs
  • Built for Vista, Windows Certified, Java Approved
– eTrust Download
• For Malware:
– StopBadware, CME

Page 14: Building & Leveraging White Database for Antivirus Testing

Leveraging the Whitelist
Distribution of language
[Pie chart. Values in chart order: 85%, 2%, 1% (eight slices), 0% (all remaining slices)]
Legend, in chart order: English (U.S.), Japanese, Chinese (Traditional), Chinese (Simplified), Korean, German, Italian, French, Spanish, Portuguese (Brazil), Dutch, Polish, Turkish, Russian, Swedish, Czech, Danish, Norwegian Bokmal, Finnish, Hungarian, Greek, Portuguese (Portugal), Hebrew, Arabic, English (Canadian), Slovak, Slovenian, Basque, Catalan, Croatian, Bulgarian, Ukrainian

Page 15: Building & Leveraging White Database for Antivirus Testing

PE Header Subsystem

Distribution of subsystem
• The Windows graphical user interface (GUI) subsystem: 65%
• The Windows character subsystem: 29%
• Device drivers and native Windows processes: 5%
• Windows CE: 1%
• The Posix character subsystem: 0%
• Unknown subsystem: 0%
• An Extensible Firmware Interface (EFI) application: 0%
• An EFI driver with boot services: 0%
• An EFI driver with run-time services: 0%

Page 16: Building & Leveraging White Database for Antivirus Testing

Other PE Header Data

Percentage of .NET applications (based on COR20 header)
• .NET application: 6%
• Others: 94%

Percentage of binaries recognized as DLLs (based on file characteristics bitmask)
• DLL: 76%
• Others: 24%

Percentage of binaries with a bound import table
• Bound import table: 29%
• Unbound: 71%

Distribution of machine code
• Intel 386 or later processors and compatible processors: 87%
• Intel Itanium processor family: 8%
• AMD64: 4%
• Alpha_AXP: 1%
• MIPS little endian, Power PC little endian, ARM little endian, Thumb, Hitachi SH3, MIPS with FPU, Hitachi SH4, MIPS16: 0% each
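
Two of the statistics on this slide, the .NET share (COR20 header) and the bound-import share, come from the optional header's data directory table. The following is a minimal sketch of how such a check could be made; the helper names are hypothetical, while the directory indices (11 for bound imports, 14 for the CLR/COR20 header) and offsets follow the published PE/COFF layout.

```python
import struct

BOUND_IMPORT_DIR = 11  # IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT
CLR_HEADER_DIR = 14    # IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR (COR20 header)


def data_directory(data, index):
    """Return (rva, size) for one data directory, or (0, 0) if it is not present."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    optional = pe_offset + 24
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic == 0x10B:        # PE32
        count_offset = optional + 92
    elif magic == 0x20B:      # PE32+
        count_offset = optional + 108
    else:
        raise ValueError("unknown optional header magic")
    (count,) = struct.unpack_from("<I", data, count_offset)  # NumberOfRvaAndSizes
    if index >= count:
        return (0, 0)
    # Each directory entry is an 8-byte (RVA, Size) pair.
    return struct.unpack_from("<II", data, count_offset + 4 + 8 * index)


def is_dotnet(data):
    """True when the image carries a non-empty CLR (COR20) header directory."""
    rva, size = data_directory(data, CLR_HEADER_DIR)
    return rva != 0 and size != 0


def has_bound_imports(data):
    """True when the image carries a non-empty bound import directory."""
    rva, size = data_directory(data, BOUND_IMPORT_DIR)
    return rva != 0 and size != 0
```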

Page 17: Building & Leveraging White Database for Antivirus Testing

What about False Positives?

• Typical Suspects:
– Internet Explorer
– Drivers (Network, File Access)
– OS Components
– Universal Installer and Uninstaller Components
• Optimized Applications:
– Using Obscure Third-Party Software
– ASPack, PECompact, Themida

Page 18: Building & Leveraging White Database for Antivirus Testing

Archive Format Distribution
• Most popular archive/packer formats
– ARC/GZIP: 44%
– ARC/MSCAB: 27%
– ARC/ZIP: 7%
– ARC/BZIP2: 7%
– ARC/TAR: 6%
– SFX/MSCAB: 2%
– ARC/LZ: 1%
– SFX/UPX: 1%
– ARC/MSI: 1%
– SFX/MSDelta: 1%
– ARC/PSF: 1%
• Formats shown at 0% (chart legend, repeated entries listed once)
– Archives and self-extractors: ARC/RAR, ARC/ISCAB, SFX/ZIP, SFX/Nullsoft, SFX/RAR, SFX/IS, SFX/WISE, ARC/ISO, ARC/7ZIP, SFX/WISE/Embedded, SFX/BZIP2, SFX/NOS, ARC/UDF, ARC/WIM, SFX/7ZIP (ARC/PSF and ARC/MSCAB also recur in the legend at 0%)
– Packers and protectors: UPX 0.72, UPX 0.8x - 2.xx, ASPack 1.06b - 1.061b, ASPack 1.07b, ASPack 1.08.00 - 1.08.01, ASPack 1.08.02, ASPack 1.08.03, ASPack 1.08.04, ASPack 2.000, ASPack 2.001, ASPack 2.1, ASPack 2.11, ASPack 2.11 - 2.11d, ASPack 2.12, PECompact 1.30 - 1.32, PECompact 1.68 - 1.76, PECompact 2.xx, WinUPack 0.28 - 0.3x, WinUPack 0.37 - 0.39, Private exeProtector 2.0, CExe 1.0a, PE Pack 1.0, PC Guard 5.00
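
A distribution like this one presupposes a step that identifies each file's container format. One common approach, shown here as a hedged sketch rather than the method used for the talk, is to match well-known leading magic bytes. The labels reuse the slide's ARC/ naming; `identify_archive` and the signature table are assumptions for illustration, and self-extracting (SFX) or packed executables need deeper inspection than a prefix check.

```python
# (magic prefix, label) pairs for a few formats named on the slide.
SIGNATURES = [
    (b"\x1f\x8b", "ARC/GZIP"),
    (b"MSCF", "ARC/MSCAB"),
    (b"PK\x03\x04", "ARC/ZIP"),
    (b"BZh", "ARC/BZIP2"),
    (b"Rar!\x1a\x07", "ARC/RAR"),
    (b"7z\xbc\xaf\x27\x1c", "ARC/7ZIP"),
    # OLE compound file header; MSI packages use this container,
    # but so do other document types, so this label is only a first guess.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ARC/MSI"),
]


def identify_archive(data):
    """Return a format label based on the leading bytes, or None if unknown."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None
```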

Page 19: Building & Leveraging White Database for Antivirus Testing

Or Are They False Positives? (FTP Injection Attacks)

• HP

Page 20: Building & Leveraging White Database for Antivirus Testing

Or Are They False Positives? (FTP Injection Attacks)
• Nero AG

Page 21: Building & Leveraging White Database for Antivirus Testing

Vertical Detection
• Malware Sample Vertical File Detection Chart
• Good File Vertical Analysis
• Anti-Malware Reports per Web Site
– Bit9 ISV Safe Software Program

Page 22: Building & Leveraging White Database for Antivirus Testing

Use Case: Anti-Malware
• Benefits
– R&D Tool
  • Packers, Metadata, Sources
– QA Tool
  • False Positives
– Performance Accelerator
  • Robin Bloor’s AVID
  • Next Generation Anti-Malware
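
The QA use case, catching false positives, can be sketched as a lookup of each detected file against the whitelist. This is an assumption-laden illustration: the slides do not say how files are keyed, so SHA-256 is chosen here arbitrarily, and `find_false_positives` and its input shape are hypothetical.

```python
import hashlib


def sha256_of(data):
    """Hex SHA-256 digest of a file's bytes (assumed whitelist key)."""
    return hashlib.sha256(data).hexdigest()


def find_false_positives(detections, whitelist_hashes):
    """Return (file_name, detection_name) pairs for detected files that are whitelisted.

    detections: iterable of (file_name, file_bytes, detection_name) tuples
    whitelist_hashes: set of hex digests of known-good files
    """
    return [
        (name, detection)
        for name, data, detection in detections
        if sha256_of(data) in whitelist_hashes
    ]
```

Each returned pair is a candidate false positive for the scanner under test; as the FTP injection slides suggest, a hit still deserves a manual look before the detection is dismissed.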

Page 23: Building & Leveraging White Database for Antivirus Testing

About Bit9
• What We Do:
– Application and Device Control Solutions and Software Metadata Reporting
• What We Offer:
– Bit9 Parity Protects against Malicious Software and Data Leakage
– The Bit9 Knowledgebase is the Largest Collection of Actionable Intelligence about the World’s Software
• Background
– Founded in 2002 by founders of Okena (Cisco)
– $2 Million NIST ATP Grant in 2003
– Headquartered in Cambridge, Mass.
– Venture Funded