
SATE 2010 Background

Vadim Okun, NIST

vadim.okun@nist.gov

October 1, 2010

The SAMATE Project

http://samate.nist.gov/

Cautions on Using SATE Data

• Our analysis procedure has limitations

• In practice, users write special rules, suppress false positives, and write code in certain ways to minimize tool warnings

• There are many other factors that we did not consider: user interface, integration, etc.

• So do NOT use our analysis to rate/choose tools

Overview

[Diagram. Labels: Program; Tool A, Tool B, Tool C (tools that work on source code); warnings such as Buf, Leak, Race, …; Human findings; CVE entries; verdicts Security? Quality? Insignificant? False?]

SATE 2010 timeline

• Choose test programs (2 C, 1 C++, 2 Java). Provide them to tool makers (28 June)

• Teams run their tools, return reports (30 July)

• Analyze the tool reports (22 Sept)

• Report at the workshop (1 Oct)

• Teams submit a research paper (Dec)

• Publish data (between Feb and May 2011)

Participating teams

• Armorize CodeSecure

• Concordia University Marfcat

• Coverity Static Analysis for C/C++

• Cppcheck

• Grammatech CodeSonar

• LDRA Testbed

• Red Lizard Software Goanna

• Seoul National University Sparrow

• SofCheck Inspector

• Veracode

– a service company

Test cases

• Dovecot: secure IMAP and POP3 server – C

• Pebble: weblog server – Java

• Wireshark – C

• Google Chrome – C++

• Apache Tomcat – Java

• All are open source programs

• All have aspects relevant to security

• From 30k LoC (Pebble) to 4.7M LoC (Chrome)

CVE-based pairs (vulnerable and fixed)

Tool reports

[Diagram: original tool formats (XML, HTML, DB) → SATE format]

• Teams converted reports to SATE format

– Several teams also provided original reports

• Described environment in which they ran tool

• Some teams tuned their tools

• Several teams provided analysis of their tool warnings

Analysis procedure

[Flow diagram: Tool warnings (~60K warnings) → 3 selection methods (select randomly; related to human findings; related to CVEs) → Selected warnings → Analyze for correctness and associate → Analyze the data]

Method 1 – Warning Selection

For Dovecot and Pebble only

• We assigned severity if a tool did not

• Mostly avoid warnings with severity 5 (lowest)

• Statistically select from each warning class

• Select more warnings from higher severities

• Select 30 warnings from each of 10 tool reports

– 1 report had only 6 warnings

– Did not analyze Marfcat warnings

• Total is 276
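The selection steps above can be sketched in Python. This is only an illustration under stated assumptions: the `ToolWarning` record, the per-severity weights, the round-robin over classes, and the fixed seed are all hypothetical and are not the procedure SATE actually used. Only the quota of 30, the preference for higher severities, sampling from each warning class, and the avoidance of severity 5 come from the slide (the slide says "mostly avoid"; the sketch skips severity 5 entirely for simplicity).

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolWarning:
    ident: int
    warning_class: str  # hypothetical class label, e.g. "buf" or "leak"
    severity: int       # 1 (highest) to 5 (lowest)


# Hypothetical weights: a higher severity gets a larger chance of selection.
SEVERITY_WEIGHT = {1: 4.0, 2: 3.0, 3: 2.0, 4: 1.0}


def select_warnings(report, quota=30, seed=0):
    """Pick up to `quota` warnings from one tool report."""
    rng = random.Random(seed)

    # Group by warning class, skipping severity 5 (a simplification).
    by_class = defaultdict(list)
    for w in report:
        if w.severity < 5:
            by_class[w.warning_class].append(w)

    # Within each class, order by a weighted random key so that higher
    # severities tend to come first (weighted sampling without replacement).
    for warnings in by_class.values():
        warnings.sort(
            key=lambda w: rng.random() ** (1.0 / SEVERITY_WEIGHT[w.severity]),
            reverse=True,
        )

    # Take one warning per class in turn until the quota is met.
    selected = []
    while len(selected) < quota and any(by_class.values()):
        for name in sorted(by_class):
            if by_class[name] and len(selected) < quota:
                selected.append(by_class[name].pop(0))
    return selected
```

A report with fewer eligible warnings than the quota is returned in full, which matches the case of the report that had only 6 warnings. The slide's total is consistent with nine reports at the full quota plus that short report: 9 × 30 + 6 = 276.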

Method 2 – Human findings

For Dovecot and Pebble only

• Security experts analyze test cases

• A small number of findings

– Root cause, with an example trace

• Find related warnings from tools

• Goal: focus our analysis on weaknesses found most important by security experts

Method 3 – CVEs

For Wireshark, Chrome, and Tomcat

• Identify the CVEs

– Locations in code

• Find related warnings from tools

• Goal: focus our analysis on real-life exploitable vulnerabilities
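One way to picture the "find related warnings" step is a location match between the code locations identified for a CVE and the trace locations of each tool warning. The sketch below is an assumption about how such a first pass could look, not the SATE procedure; the `Location` type, the `window` tolerance, the warning ids, and the file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    path: str
    line: int


def related_warnings(cve_locations, warnings, window=0):
    """Return ids of warnings with a trace location on or near a CVE location.

    `warnings` maps a warning id to the list of Locations in its traces.
    `window` is a hypothetical tolerance in lines; 0 means an exact match.
    """
    related = []
    for warning_id, trace in warnings.items():
        if any(
            loc.path == cve.path and abs(loc.line - cve.line) <= window
            for loc in trace
            for cve in cve_locations
        ):
            related.append(warning_id)
    return related


# Hypothetical data: one CVE location and warnings from two tools.
cve = [Location("src/parser.c", 130)]
tool_warnings = {
    "A-17": [Location("src/parser.c", 120), Location("src/parser.c", 131)],
    "B-3": [Location("src/util.c", 40)],
}
print(related_warnings(cve, tool_warnings, window=2))  # prints ['A-17']
```

A match of this kind can only narrow the candidates. Whether a warning really describes the vulnerability still has to be judged by a reviewer.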

Correctness categories

• True security weakness

• True quality weakness

• True but insignificant weakness

• Weakness status unknown

• Not a weakness
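For anyone working with the published data, the five categories map naturally onto an enumeration. The sketch below is illustrative only: the member names and the `tally` helper are hypothetical, and only the category labels are taken from the slide.

```python
from collections import Counter
from enum import Enum


class Correctness(Enum):
    TRUE_SECURITY = "True security weakness"
    TRUE_QUALITY = "True quality weakness"
    TRUE_INSIGNIFICANT = "True but insignificant weakness"
    UNKNOWN = "Weakness status unknown"
    NOT_A_WEAKNESS = "Not a weakness"


def tally(judgments):
    """Count judgments per category, listing every category even at zero."""
    counts = Counter(judgments)
    return {category: counts[category] for category in Correctness}
```

Listing every category, including those with a count of zero, keeps summaries comparable across tool reports.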

Differences from SATE 2009

• Add CVE-selected test cases

• Include a C++ test case

• Larger test cases: Chrome - 4.7 MLOC

• More correctness categories (true quality)

• More detailed guidelines for analysis

• Still, much can be improved…

Thanks

• Romain Gaucher, Ram Sugasi

• Aurelien Delaitre, Sue Wang, Paul Black, Charline Cleraux, and other SAMATE team members

• Most of all, the participating teams!

Questions

• What weaknesses exist in real programs?

• What do tools report for real programs?

• Do tools find important weaknesses?

• Focus on tools that work on source code

• Defects that may affect security

• Goal is NOT to choose the “best” tools

• This is the 3rd SATE (1st in 2008)

SATE goals

• To enable empirical research based on large test sets

• To encourage improvement and adoption of tools

• NOT to choose the “best” tools

SATE common tool output format

<name cweid="89">SQL Injection</name>

<output> Query is constructed with user supplied input … </output>

</weakness>

[Slide callouts: one or more traces; severity 1 to 5, with 1 the highest; CWE id is optional; probability that it is true; …and other annotation]

Lessons learned

• Guidelines for analysis often ambiguous – need to be refined even more

• Our analysis has inconsistencies and lapses

• Careful analysis takes longer than expected

– We do not know the code well

• Tool interface is important to understand a weakness

Analysis procedure

• We cannot know all weaknesses in the test cases

• Impractical to analyze all tool warnings

So analyze the following:

• Method 1. A subset of warnings from each tool report

• Method 2. Tool warnings related to manually identified weaknesses

SATE tool output format

• Common format in XML

• For each weakness

– One or more traces: locations, each with line number and pathname

– Name of weakness and (optional) CWE id

– Severity: 1 to 5 (ordinal scale), with 1 the highest

– Probability that the problem is a true positive

– Original message from the tool

– And other annotation
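The fields listed above can be modelled as a small record type, which makes the constraints explicit. This is a sketch under assumptions and not the SATE schema itself: the class and attribute names are hypothetical, and the 0.0 to 1.0 range for probability is an assumption, since the slide does not state a scale.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TraceLocation:
    path: str
    line: int


@dataclass
class Weakness:
    name: str
    traces: List[List[TraceLocation]]  # one or more traces
    severity: int                      # 1 (highest) to 5 (lowest)
    probability: float                 # assumed range 0.0 to 1.0
    output: str                        # original message from the tool
    cwe_id: Optional[int] = None       # optional

    def __post_init__(self):
        if not self.traces or any(not trace for trace in self.traces):
            raise ValueError("at least one non-empty trace is required")
        if self.severity not in range(1, 6):
            raise ValueError("severity must be between 1 and 5")
        if not 0.0 <= self.probability <= 1.0:
            raise ValueError("probability must be between 0.0 and 1.0")


# Hypothetical record; the path, line, and probability are made up.
example = Weakness(
    name="SQL Injection",
    traces=[[TraceLocation("src/Login.java", 42)]],
    severity=1,
    probability=0.8,
    output="Query is constructed with user supplied input",
    cwe_id=89,
)
```

Validating at construction time means a malformed report fails as soon as it is loaded, before any analysis uses it.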

Our analysis

• Correctness of warning

• Associate warnings that refer to the same (or similar) weakness
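As a rough illustration of the association step, warnings from different tools can be grouped when they point at the same file and line. This is a crude assumption made for the sketch: the tuple layout and the warning ids are hypothetical, and deciding that two warnings refer to a "similar" weakness takes more judgment than a location match.

```python
from collections import defaultdict


def associate(warnings):
    """Group warning ids that point at the same file and line.

    `warnings` is an iterable of (warning_id, path, line) tuples.
    Only groups with more than one warning are returned.
    """
    groups = defaultdict(list)
    for warning_id, path, line in warnings:
        groups[(path, line)].append(warning_id)
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical warnings from three tools.
reported = [
    ("A-1", "src/imap.c", 210),
    ("B-7", "src/imap.c", 210),
    ("C-2", "src/pop3.c", 15),
]
print(associate(reported))  # prints [['A-1', 'B-7']]
```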
