session id: air-w02 predicting … · -cvss temporal-remote code...

SESSION ID: AIR-W02

#RSAC

PREDICTING EXPLOITABILITY - FORECASTS FOR VULNERABILITY MANAGEMENT

Michael Roytman
Chief Data Scientist, Kenna Security
@mroytman


“Prediction is very difficult, especially about the future.”

-Niels Bohr


3 Types of Data-Driven


THE PROBLEM


Too many vulnerabilities.
How do we derive risk from vulnerability in a data-driven manner?


EXPLOITABILITY


1. RETROSPECTIVE
2. REAL-TIME
3. PREDICTIVE


EXPLOITABILITY


Retrospective Model: CVSS


Analyst Input

Vulnerability Management Programs Augmenting Data

Temporal Score Estimation

Vulnerability Researchers


EXPLOITABILITY


Real-Time - The Data


Vulnerability Scans (Qualys, Rapid7, Nessus, etc.):
• 7,000,000 Assets (desktops, servers, URLs, IPs, MAC addresses)
• 1,400,000,000 Vulnerabilities (unique asset/CVE pairs)

Exploit Events - Successful Exploitations
• ReversingLabs’ backend metadata
  • Hashes for each CVE
  • Number of found pieces of malware corresponding to each hash
• AlienVault Backdoor
  • “attempted exploits” correlated with open vulnerabilities


Attackers Are Fast


Positive Predictive Value of Remediating:


DATA OF FUTURE PAST


Q: “Of my current vulnerabilities, which ones should I remediate?”

A: Old ones with stable, weaponized exploits


FUTURE OF DATA PAST


Q: “A new vulnerability was just released. Do we scramble?”

A:


EXPLOITABILITY


Learning Machine Learning


The Future


• Classification: output is qualitative

• Prediction: “Will this vulnerability have an exploit written for it?” (== cause more risk later)

Enter: AWS ML


The Data


N = 81303

All CVEs. Described by:
1. National Vulnerability Database
2. Common Platform Enumeration
3. Occurrences in Kenna Scan Data

Labelled as Exploit Available/Not:
1. Exploit DB
2. Metasploit
3. D2 Elliot/Canvas
4. Blackhat Exploit Kits
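
One way to read the labelling step: a CVE counts as "exploit available" if any of the listed sources has an exploit for it. A minimal sketch of that union, assuming each source is available as a set of CVE IDs. The IDs, the source keys, and the helper name `label_exploit_available` are placeholders for illustration, not real feed data or the speaker's code.

```python
# Hypothetical sketch: a CVE is labelled 1 if ANY exploit source lists it.
# All CVE IDs below are made-up placeholders, not real vulnerabilities.

def label_exploit_available(all_cves, sources):
    """Return {cve_id: 1 or 0}; 1 means at least one source lists an exploit."""
    exploited = set()
    for cve_ids in sources.values():
        exploited.update(cve_ids)
    return {cve: int(cve in exploited) for cve in all_cves}


all_cves = ["CVE-0000-0001", "CVE-0000-0002", "CVE-0000-0003", "CVE-0000-0004"]
sources = {
    "exploit_db": {"CVE-0000-0001"},
    "metasploit": {"CVE-0000-0001", "CVE-0000-0003"},
    "d2_elliot_canvas": set(),
    "exploit_kits": {"CVE-0000-0003"},
}

labels = label_exploit_available(all_cves, sources)
positive_rate = sum(labels.values()) / len(labels)  # share of CVEs labelled exploited
```

Keeping the sources separate until the final union makes it easy to add or drop a source later and see how the positive rate moves.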


All Models:

N = 81303
70% Training, 30% Evaluation Split
L2 regularizer
1 GB
100 passes over the data
Receiver operating characteristics for comparisons
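
The setup above (70/30 split, L2 regularizer, 100 passes, ROC for comparison) can be sketched in plain Python. This is not the AWS ML implementation: it is a small logistic regression trained with SGD on synthetic stand-in data, and the learning rate, L2 strength, and two toy features are all assumptions. The "1 GB" setting is not modelled here.

```python
import math
import random


def sigmoid(z):
    # Clamp to avoid overflow in math.exp for extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_l2(rows, labels, passes=100, lr=0.1, l2=1e-4, seed=0):
    """SGD logistic regression with an L2 penalty on the weights."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(passes):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            weights = [w - lr * (err * v + l2 * w) for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic stand-in data (NOT the talk's dataset): about 23% positives,
# one informative feature and one pure-noise feature.
rng = random.Random(42)
rows, labels = [], []
for _ in range(400):
    y = 1 if rng.random() < 0.23 else 0
    rows.append([rng.gauss(2.0 if y else 0.0, 1.0), rng.gauss(0.0, 1.0)])
    labels.append(y)

split = int(0.7 * len(rows))  # 70% training, 30% evaluation
train_x, eval_x = rows[:split], rows[split:]
train_y, eval_y = labels[:split], labels[split:]

weights, bias = train_logistic_l2(train_x, train_y, passes=100)
eval_scores = [predict(weights, bias, x) for x in eval_x]
auc = roc_auc(eval_scores, eval_y)
```

Holding the split, regularizer, and pass count fixed means any change in the evaluation AUC between models comes from the features alone.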


Predictive - The Expectations


Distribution is not uniform. 77% of dataset is not exploited
1. Accuracy of 77% would be bad (always predicting “not exploited” already scores 77%)

Precision matters more than Recall
1. No one would use this model absent actual exploit available data.
2. False Negatives matter less than False Positives - wasted effort

We are not modeling when something will be exploited, just IF
Could be tomorrow or in 6 months. Re-run the model every day
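
Why 77% accuracy is a bad bar, worked on toy numbers: 100 vulnerabilities with the slide's 77/23 class balance. The second model and its counts are invented for illustration only.

```python
def confusion_counts(predictions, labels):
    """Return (tp, fp, fn, tn) for binary predictions against binary labels."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    # Of everything flagged, how much was really exploited?
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Of everything really exploited, how much was flagged?
    return tp / (tp + fn) if (tp + fn) else 0.0


# 77 not-exploited followed by 23 exploited (toy data, slide's class balance).
labels = [0] * 77 + [1] * 23

# Model A: always answers "not exploited".
always_no = [0] * 100

# Model B (hypothetical): flags 10 vulnerabilities, 8 of them correctly.
cautious = [1] * 2 + [0] * 75 + [1] * 8 + [0] * 15

tp_a, fp_a, fn_a, tn_a = confusion_counts(always_no, labels)
tp_b, fp_b, fn_b, tn_b = confusion_counts(cautious, labels)

acc_a = accuracy(tp_a, fp_a, fn_a, tn_a)   # 77 correct out of 100
acc_b = accuracy(tp_b, fp_b, fn_b, tn_b)   # 83 correct out of 100
prec_b = precision(tp_b, fp_b)             # 8 of 10 flags are right
rec_b = recall(tp_b, fn_b)                 # 8 of 23 exploited are caught
```

Model A reaches 77% accuracy while flagging nothing. Model B misses most exploited vulnerabilities, yet nearly every flag it raises is worth acting on, which is the tradeoff the slide argues for.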


Model 1: Baseline


- CVSS Base
- CVSS Temporal
- Remote Code Execution
- Availability
- Integrity
- Confidentiality
- Authentication
- Access Complexity
- Access Vector
- Publication Date
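
The baseline features map onto a CVSS v2 record. A sketch of turning one record into model inputs, assuming the categorical metrics arrive as a CVSS v2 base vector string and are one-hot encoded. The record field names, the helper names, and the example values are hypothetical, and expressing publication date as "days since publication" is one possible encoding, not necessarily the one used in the talk.

```python
from datetime import date

# CVSS v2 base metrics and their allowed abbreviated values.
CVSS_V2_VALUES = {
    "AV": ["L", "A", "N"],  # Access Vector: Local, Adjacent Network, Network
    "AC": ["H", "M", "L"],  # Access Complexity: High, Medium, Low
    "Au": ["M", "S", "N"],  # Authentication: Multiple, Single, None
    "C": ["N", "P", "C"],   # Confidentiality impact: None, Partial, Complete
    "I": ["N", "P", "C"],   # Integrity impact
    "A": ["N", "P", "C"],   # Availability impact
}


def parse_cvss_v2_vector(vector):
    """Split 'AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    metrics = dict(part.split(":") for part in vector.split("/"))
    for name, value in metrics.items():
        if value not in CVSS_V2_VALUES.get(name, []):
            raise ValueError(f"unexpected CVSS v2 metric {name}:{value}")
    return metrics


def baseline_features(record, reference_date):
    """Build a flat numeric feature dict from one vulnerability record."""
    features = {
        "cvss_base": record["cvss_base"],
        "cvss_temporal": record["cvss_temporal"],
        "remote_code_execution": int(record["remote_code_execution"]),
        "days_since_publication": (reference_date - record["published"]).days,
    }
    metrics = parse_cvss_v2_vector(record["cvss_vector"])
    for name, allowed in CVSS_V2_VALUES.items():
        for value in allowed:
            # One-hot column per (metric, value) pair.
            features[f"{name}={value}"] = int(metrics.get(name) == value)
    return features


# Made-up example record; the temporal score is illustrative only.
record = {
    "cvss_base": 7.5,
    "cvss_temporal": 5.9,
    "remote_code_execution": True,
    "cvss_vector": "AV:N/AC:L/Au:N/C:P/I:P/A:P",
    "published": date(2016, 1, 1),
}
features = baseline_features(record, reference_date=date(2016, 1, 31))
```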

LMGTFY:

Moar Simple?

Measuring Performance


Model 2: Patches


- CVSS Base
- CVSS Temporal
- Remote Code Execution
- Availability
- Integrity
- Confidentiality
- Authentication
- Access Complexity
- Access Vector
- Publication Date
- Patch Exists


Model 3: Affected Software


- CVSS Base
- CVSS Temporal
- Remote Code Execution
- Availability
- Integrity
- Confidentiality
- Authentication
- Access Complexity
- Access Vector
- Publication Date
- Patch Exists
- Vendors
- Products


Model 4: Words!


- CVSS Base
- CVSS Temporal
- Remote Code Execution
- Availability
- Integrity
- Confidentiality
- Authentication
- Access Complexity
- Access Vector
- Publication Date
- Patch Exists
- Vendors
- Products
- Description, Ngrams 1-5
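
"Description, Ngrams 1-5" presumably means every run of 1 to 5 consecutive words in the CVE description becomes a feature. A minimal sketch under that assumption (word n-grams, lowercased, whitespace tokenization). The helper name and the sample description are made up.

```python
from collections import Counter


def word_ngrams(text, min_n=1, max_n=5):
    """Return all word n-grams of length min_n..max_n, in order of appearance."""
    tokens = text.lower().split()
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


# Made-up description in the style of a CVE entry.
description = "Buffer overflow allows remote attackers to execute arbitrary code"

grams = word_ngrams(description)
gram_counts = Counter(grams)  # sparse bag-of-n-grams feature vector
```

Longer n-grams keep phrases such as "execute arbitrary code" intact, which single words cannot express, at the cost of a much larger and sparser feature space.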


Model 5: Vulnerability Prevalence


- CVSS Base
- CVSS Temporal
- Remote Code Execution
- Availability
- Integrity
- Confidentiality
- Authentication
- Access Complexity
- Access Vector
- Publication Date
- Patch Exists
- Vendors
- Products
- Description, Ngrams 1-5
- Vulnerability Prevalence
- Number of References


Model 6: “Somewhat Likely”


Model 6: “Highly Likely”


Model 6: “Most Likely”
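
The three Model 6 labels are not defined in the surviving slide text. One plausible reading is that they are progressively stricter cutoffs on the model's score, trading recall for precision. A sketch under that assumption: the cutoff values, scores, and labels below are invented, not results from the talk.

```python
def precision_recall_at(scores, labels, cutoff):
    """Precision and recall when everything scoring >= cutoff is flagged."""
    flagged = [y for s, y in zip(scores, labels) if s >= cutoff]
    true_positives = sum(flagged)
    total_positives = sum(labels)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# Invented model scores with their true labels (1 = exploit appeared).
scores = [0.95, 0.90, 0.80, 0.70, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

# Hypothetical cutoffs for the three labels.
cutoffs = {"Somewhat Likely": 0.5, "Highly Likely": 0.75, "Most Likely": 0.9}

tradeoffs = {
    name: precision_recall_at(scores, labels, cutoff)
    for name, cutoff in cutoffs.items()
}
```

On this toy data, raising the cutoff flags fewer vulnerabilities: recall falls at each step while precision rises or holds, so less remediation effort is wasted on false positives.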


Future Work

- Track Predictions vs. Real Exploits
- Integrate 20+ BlackHat Exploit Kits - FP reduction?
- Find better vulnerability descriptions - mine advisories for content? FN reduction?
- Predict Breaches, not Exploits
- Attempt Models by Vendor


PROBLEM

Too many vulnerabilities.
How do we derive risk from vulnerability in a data-driven manner?


SOLUTION

1. Gather data about known successful attack paths
2. Issue forecasts where data is lacking in order to predict new exploits
3. Gather MORE data about known successful attack paths


Takeaways

1. Simple, Powerful Questions make Machine Learning Useful in Security
2. When Risk is Rare, Precision is Difficult
3. When Precision is Difficult, Be Smart about Tradeoffs


The Takeaway

____ Machine Learn!


Putting It All Together


Thank You for waking up so early for this!

@mroytman

www.kennasecurity.com
