Improving Intrusion Detectors by Crook-Sourcing

Frederico Araujo, IBM Research

Gbadebo Ayoade, Khaled Al-Naami, Yang Gao, Kevin Hamlen, and Latifur Khan, The University of Texas at Dallas

The 35th Computer Security Applications Conference

The research reported herein was supported in part by ONR award N00014-17-1-2995; NSA award H98230-15-1-0271; AFOSR award FA9550-14-1-0173; NSF FAIN awards DGE-1931800, OAC-1828467, and DGE-1723602; NSF awards DMS-1737978 and MRI-1828467; an IBM faculty award (Research); and an HP grant. Any opinions, recommendations, or conclusions expressed are those of the authors and not necessarily of the aforementioned supporters.



Page 2

Information Asymmetry (Kasparov vs. Deep Blue, 1997)


1997: IBM's Deep Blue becomes the first machine to defeat a reigning world chess champion (Garry Kasparov) in a match under standard tournament conditions.

After the match, Kasparov complained that the match was unfair:

“It was difficult to prepare for an opponent with no games. … I couldn’t prepare myself properly for such an event. … You have to know your opponent!” –Garry Kasparov

In contrast, Deep Blue had trained using every match Kasparov had ever played.

IBM Research / June 29, 2018 / © 2018 IBM Corporation

Page 3

Information asymmetry in cyber defense

Attackers have months or years to study vulnerabilities and defenses

Defenders have seconds to react to never-before-seen attacks


1 in 13 web requests lead to malware (ISTR, vol. 23, 2018)

19.2% of all web application vulnerabilities are high or critical (24.9% on internal networks) (Edgescan, 2019)

Ponemon, 2019

Page 4

Information asymmetry & ML for intrusion detection

§ ML offers great promise for powerful, fast intrusion detection
– as in face and speech recognition, recommendation systems, natural language translation, …

§ Yet most deployed IDS solutions are still human rule-based with weak AI support. Why?

(1) Unbalanced data: it is hard to obtain enough malicious data to properly train ML-based IDSes.

(2) Huge feature space: the security-relevant features within the data are not known in advance.

(3) Encryption opacity: encrypted traffic is commonplace and hides much of the most informative data.

(4) False alarms: high false alarm rates lead to very low base detection rates.

The task of identifying attacks is fundamentally different from other application domains where machine learning is applied.
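Point (4) is the base-rate fallacy: when attacks are rare, even a detector with a small false-alarm rate produces mostly false alarms. A minimal worked sketch using Bayes' rule follows; the base rate, detection rate, and false-alarm rate are illustrative assumptions, not measurements from this study.

```python
def bayesian_detection_rate(base_rate: float, tpr: float, fpr: float) -> float:
    """Return P(intrusion | alarm) using Bayes' rule.

    base_rate: prior probability that an event is an intrusion
    tpr: P(alarm | intrusion), the detector's true positive rate
    fpr: P(alarm | no intrusion), the detector's false positive rate
    """
    p_alarm = tpr * base_rate + fpr * (1.0 - base_rate)  # total probability of an alarm
    return tpr * base_rate / p_alarm


# Illustrative (assumed) numbers: 1 in 10,000 events is an attack, the detector
# catches 99% of attacks and misfires on 1% of benign events.
rate = bayesian_detection_rate(base_rate=1e-4, tpr=0.99, fpr=0.01)
print(f"P(intrusion | alarm) = {rate:.4f}")  # 0.0098: fewer than 1 alarm in 100 is real
```

Under the same assumptions, cutting the false-alarm rate to 0.1% raises the figure only to about 9%, which is why false-positive reduction matters as much as raw detection accuracy.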

Page 5

Detected attacks are missed IDS training opportunities

Main idea:

When an attack is detected, don’t disconnect it!

Keep the attacker talking to harvest threat data.

Apply automated data mining for IDS training.

The IDS learns over time with no data collection burden.

Research question: Does such an IDS actually learn concepts useful for thwarting real attacks?

(Spoiler alert: Yes, with surprising effectiveness!)

crook-sourcing (noun): the conscription and manipulation of attackers into performing free penetration testing for improved IDS model training and adaptation.

Page 6

Attack kill chain: a vicious cycle

[Figure: kill-chain diagram; labels: secrets, attack]

Page 7

Attack kill chain: a vicious cycle

[Figure: kill-chain diagram; labels: attack, reject]

Conventional software security patches advertise themselves to attackers:

§ they facilitate low-risk reconnaissance
§ they accentuate the information and time asymmetry that favors attackers
§ they amplify the impact of n-day exploits

Page 8

Enhancing IDSes through crook-sourcing

[Figure: kill-chain diagram; labels: attack, fake secrets]

software security patches repurposed as feature extractors
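How a security patch can be repurposed can be sketched on a Heartbleed-style buffer over-read. The conventional patch rejects a heartbeat whose claimed length exceeds its payload; a honey-patched variant performs the same check but answers with fake secrets and flags the session, so the attacker keeps talking and every later trace from that session is labeled as attack data. This is a minimal Python sketch under assumed names (`Session`, `FAKE_SECRETS`, and both handler functions are hypothetical); it is not the authors' honey-patching implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical decoy bytes served to attackers instead of real memory contents.
FAKE_SECRETS = b"user=admin;password=DECOY;key=0000"


@dataclass
class Session:
    """Hypothetical per-connection state kept by the server."""
    session_id: int
    under_attack: bool = False  # set once a honey-patch fires
    audit_log: List[Tuple[str, int]] = field(default_factory=list)


def handle_heartbeat_patched(payload: bytes, claimed_len: int) -> Optional[bytes]:
    """Conventional patch: drop heartbeats that claim more bytes than they carry."""
    if claimed_len > len(payload):
        return None  # the visible rejection tells the attacker the server is patched
    return payload[:claimed_len]


def handle_heartbeat_honeypatched(session: Session, payload: bytes, claimed_len: int) -> bytes:
    """Honey-patch: same check, but deceive and label the session instead of rejecting."""
    if claimed_len > len(payload):
        session.under_attack = True  # later traffic from this session feeds the audit stream
        session.audit_log.append(("heartbeat_overread", claimed_len))
        missing = claimed_len - len(payload)
        # Fill the over-read with repeated fake secrets so the reply looks like a real leak.
        filler = (FAKE_SECRETS * (missing // len(FAKE_SECRETS) + 1))[:missing]
        return payload + filler
    return payload[:claimed_len]


# Usage: an over-long heartbeat is answered, not rejected, and the session is flagged.
s = Session(session_id=1)
reply = handle_heartbeat_honeypatched(s, b"ping", 40)
print(len(reply), s.under_attack)  # 40 True
```

Both handlers contain the same bounds check; only the response to a violation differs, which is what lets a routine patch double as an attack detector that labels data.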

Page 9

Crook-sourcing advantages


§ Deceives attackers into performing free penetration testing for IDS model training and adaptation
– attackers contribute their TTP (tactics, techniques, and procedures) patterns to the data streams processed by the embedded deceptions
– malicious attacker behavior is labeled automatically

§ Enables (semi-)supervised learning for intrusion detection
– improves base detection rates
– enables multi-class detection and contextually richer predictions

§ Overcomes issues related to concept differences between honeypot attacks and those against genuine assets
– deceptions are embedded into the actual targets of attacks

Page 10

System architecture

[Figure: system architecture diagram; labels: user, attacker, monitoring stream, honey-patched, anomaly detector, audit stream, attack traces]

Page 11

System architecture

[Figure: system architecture diagram with attack detection and attack modeling components; labels: user, attacker, honey-patched, anomaly detector, monitoring stream, audit stream, attack traces, feature extraction, monitoring data, attack data, audit data, data queueing, model update, attack model classifier, alerts]
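A minimal sketch of how the two streams in the architecture could feed the classifier, under stated assumptions: every trace on the audit stream comes from a session a honey-patch has already flagged and is therefore labeled attack with no manual effort; benign training samples come from a separate trusted source; queued samples trigger a model update whenever a batch fills; and a nearest-centroid rule stands in for the real attack model classifier. `CrookSourcingPipeline`, `nearest_centroid_fit`, and `batch_size` are hypothetical names, not components of the described system.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Predictor = Callable[[Vector], int]


def nearest_centroid_fit(X: List[Vector], y: List[int]) -> Predictor:
    """Stand-in learner (hypothetical): predict the label of the closest class centroid."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    return predict


class CrookSourcingPipeline:
    """Hypothetical sketch: queue labeled samples, refit in batches, classify monitoring data."""

    def __init__(self, fit: Callable[[List[Vector], List[int]], Predictor], batch_size: int = 4) -> None:
        self.fit = fit
        self.batch_size = batch_size
        self.queue: Deque[Tuple[Vector, int]] = deque()  # the "data queueing" stage
        self.X: List[Vector] = []
        self.y: List[int] = []
        self.predict: Optional[Predictor] = None

    def on_audit_trace(self, features: Vector) -> None:
        """Audit stream: traces from honey-patched sessions are labeled attack (1) for free."""
        self._enqueue(features, 1)

    def on_normal_sample(self, features: Vector) -> None:
        """Benign training traffic (assumed trusted source), labeled normal (0)."""
        self._enqueue(features, 0)

    def _enqueue(self, features: Vector, label: int) -> None:
        self.queue.append((features, label))
        if len(self.queue) >= self.batch_size:
            self._model_update()

    def _model_update(self) -> None:
        """Move queued samples into the training set and refit once both classes exist."""
        while self.queue:
            features, label = self.queue.popleft()
            self.X.append(features)
            self.y.append(label)
        if len(set(self.y)) == 2:
            self.predict = self.fit(self.X, self.y)

    def on_monitoring_sample(self, features: Vector) -> bool:
        """Monitoring stream: return True if the current model raises an alert."""
        return self.predict is not None and self.predict(features) == 1


pipeline = CrookSourcingPipeline(nearest_centroid_fit, batch_size=4)
for v in ([0.0, 0.1], [0.1, 0.0]):
    pipeline.on_normal_sample(v)
for v in ([5.0, 5.1], [5.2, 4.9]):
    pipeline.on_audit_trace(v)
print(pipeline.on_monitoring_sample([5.0, 5.0]), pipeline.on_monitoring_sample([0.0, 0.0]))  # True False
```

Refitting from scratch on each batch keeps the sketch short; an online learner would update incrementally instead.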

Page 12

Feature set models


§ Network features (Bi-Di)
– packet length
– uni-burst size, time, count
– bi-burst size, time

§ System features (N-Gram)
– system calls: enter or exit
– bi-, tri-, and quad-events
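The two feature families can be sketched as follows, assuming a packet trace of (timestamp, direction, length) tuples with direction +1 for outgoing and -1 for incoming packets. A uni-burst is taken to be a maximal run of consecutive same-direction packets, summarized by total size, duration, and packet count; a bi-burst is taken to be two adjacent uni-bursts in opposite directions, summarized by combined size and duration. System-call events (which may be tagged with enter or exit) are counted as n-grams for n = 2, 3, 4. The function names and input formats are illustrative assumptions; the study's exact feature encoding may differ.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Packet = Tuple[float, int, int]  # (timestamp in seconds, direction +1/-1, length in bytes)


def uni_bursts(packets: Sequence[Packet]) -> List[Dict[str, float]]:
    """Group maximal runs of same-direction packets into uni-bursts."""
    bursts: List[Dict[str, float]] = []
    for ts, direction, length in packets:
        if bursts and bursts[-1]["direction"] == direction:
            b = bursts[-1]  # extend the current run
            b["size"] += length
            b["count"] += 1
            b["end"] = ts
        else:  # direction changed: start a new uni-burst
            bursts.append({"direction": direction, "size": length, "count": 1,
                           "start": ts, "end": ts})
    for b in bursts:
        b["time"] = b["end"] - b["start"]  # burst duration
    return bursts


def bi_bursts(bursts: Sequence[Dict[str, float]]) -> List[Tuple[float, float]]:
    """Pair each uni-burst with the next one (always opposite direction): (size, time)."""
    return [(a["size"] + b["size"], b["end"] - a["start"])
            for a, b in zip(bursts, bursts[1:])]


def syscall_ngrams(events: Sequence[str], n: int) -> Counter:
    """Count n-grams (n = 2, 3, 4 for bi-, tri-, quad-events) over system-call events."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


pkts = [(0.0, 1, 100), (0.1, 1, 200), (0.2, -1, 1500), (0.5, 1, 50)]
bursts = uni_bursts(pkts)
print([(b["size"], b["count"]) for b in bursts])  # [(300, 2), (1500, 1), (50, 1)]
```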

Page 13

Attack detection (model 1)


§ Bi-Di-SVM: network features + SVM
§ N-Gram-SVM: system features + SVM
§ Ens-SVM: ensemble
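A minimal sketch of the SVM detectors and their ensemble. It trains a linear SVM with the Pegasos stochastic sub-gradient method (a standard solver swapped in here so the example needs only the standard library; the slide does not give the study's kernel or solver) and fuses the network and system models by summing their signed margins. That fusion rule, `lam`, and `epochs` are assumptions, not the study's Ens-SVM configuration.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos: minimize lam/2 * ||w||^2 + mean hinge loss, with labels y in {-1, +1}.

    A constant 1.0 feature is appended so the last weight acts as a bias
    (it is regularized too, a common simplification).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: regularizer gradient step
            if margin < 1:  # hinge loss active: step toward the violated example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed margin of x under weights w (bias is the last weight)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def ensemble_predict(w_net: Sequence[float], w_sys: Sequence[float],
                     x_net: Sequence[float], x_sys: Sequence[float]) -> int:
    """Ens-SVM stand-in: sum network and system margins; +1 = attack, -1 = normal."""
    return 1 if decision(w_net, x_net) + decision(w_sys, x_sys) > 0 else -1


X_net = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
X_sys = [[1.0], [1.5], [1.2], [-1.0], [-1.5], [-1.2]]
y = [1, 1, 1, -1, -1, -1]  # +1 = attack, -1 = normal
w_net = train_linear_svm(X_net, y)
w_sys = train_linear_svm(X_sys, y)
print(ensemble_predict(w_net, w_sys, [2.5, 2.5], [2.0]))  # 1
```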

Page 14

Attack detection (model 2)


§ Bi-Di-OML: network features + OAML (online adaptive metric learning) + k-NN
§ N-Gram-OML: system features + OAML + k-NN
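The OML variants classify with k-nearest neighbors in a learned metric space. The sketch below covers only the classification step: it assumes the learned metric is given as a linear map `L`, so the distance between x and z is ||Lx - Lz|| (a Mahalanobis distance with M = LᵀL). It does not implement OAML's online metric updates, and the matrix in the example is hand-picked to show why a learned metric can change the vote.

```python
from collections import Counter
from typing import List, Sequence


def transform(L: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Apply the linear map L (rows of weights) to the feature vector x."""
    return [sum(lij * xj for lij, xj in zip(row, x)) for row in L]


def knn_predict(L: Sequence[Sequence[float]], X_train: Sequence[Sequence[float]],
                y_train: Sequence[int], x: Sequence[float], k: int = 3) -> int:
    """Majority label among the k nearest training points under ||Lx - Lz||."""
    q = transform(L, x)
    dists = []
    for xi, yi in zip(X_train, y_train):
        z = transform(L, xi)
        dists.append((sum((a - b) ** 2 for a, b in zip(q, z)), yi))  # squared distance
    dists.sort(key=lambda d: d[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


X_train = [[0.0, 0.0], [0.1, 10.0], [0.2, 10.0], [1.0, 0.0], [1.1, 0.0], [0.9, -1.0]]
y_train = [0, 0, 0, 1, 1, 1]          # 0 = normal, 1 = attack
query = [0.9, 10.0]
identity = [[1.0, 0.0], [0.0, 1.0]]   # plain Euclidean distance
learned = [[1.0, 0.0], [0.0, 0.01]]   # hand-picked: shrink the noisy second feature
print(knn_predict(identity, X_train, y_train, query),
      knn_predict(learned, X_train, y_train, query))  # 0 1
```

With the identity matrix, the two normal points that share the query's large second feature win the vote; shrinking that feature lets the informative first feature decide.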

Page 15

Online Adaptive Metric Learning


Page 16

Experimental framework

[Figure: experimental framework; three instances of the system architecture (user, attacker, honey-patched, anomaly detector, monitoring stream, audit stream, attack traces), plus red teaming]

Page 17

Vulnerability Classes


Page 18

Dataset


§ Raw data: 42 GB of (uncompressed) network packets and system events over a period of three weeks

§ Training data: after feature extraction, the training data comprised 1800 normal instances and 1600 attack instances

§ Testing data: 3400 normal and attack instances gathered from monitors deployed on unpatched servers, where the distribution of normal and attack instances varies per experiment

§ Red teaming data: collected over three days from 10 graduate students with basic to advanced offensive security skills, in sessions averaging 45 min.

Page 19

Detection accuracies on simulated environment


Bi-Di: network features; N-Gram: system features

Page 20

Red teaming validation


Bi-Di: network features; N-Gram: system features

Page 21

False positive rate reduction


Page 22

Crook-sourcing advantage

[Figure: accuracy (%) vs. number of attack classes (1 to 16), comparing no deception with deceptive defense; experiments on synthetic data approximating numerous attackers]

Page 23

Human subject evaluation: a cautionary tale

[Figure: two panels of accuracy (%) vs. number of attack classes, comparing no deception with deceptive defense: experiments on synthetic data approximating numerous attackers, and experiments with 10 actual human attackers (students)]

Page 24

Monitoring performance


Host: 16 cores, 24 GB RAM, 64-bit Ubuntu 16.04 LTS

Benchmark profile: c=10, 500 req/s

Page 25

Conclusions


§ Crook-sourcing yields higher-accuracy detection models
– no additional developer effort apart from routine patching activities
– effortless labeling of the data

§ Deceives attackers into disclosing their TTP patterns for IDS model evolution
– embedded deceptions extract relevant features from attack sessions

§ Enables semi-supervised learning for intrusion detection
– improves base detection rates
– enables multi-class detection and contextually richer predictions

Page 26

Thank you


Frederico Araujo: [email protected]/faraujo
