transmogrifying other peoples’ marketing into … · aliases (sofacy and tg-4127) and 2 are...

36
TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTOTHREAT HUNTINGTREASURES USING MACHINE LEARNING MAGIC AN EXPLORATION OF NATURAL LANGUAGE TECHNIQUES FOR THREAT INTELLIGENCE

Upload: others

Post on 06-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

TRANSMOGRIFYING OTHER PEOPLES’ MARKETING

INTO THREAT HUNTING TREASURES USING MACHINE LEARNING MAGIC

AN EXPLORATION OF NATURAL LANGUAGE TECHNIQUES FOR THREAT INTELLIGENCE

Page 2: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

• Windows Defender Research team

• Past: Threat Intelligence and APT response

• Present: Builds algorithms to classify malware in real-time

• Future: ??

אניBhavna Soman

Security Researcher

Page 3: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Threat Analysis is largely manual work

• Some Reverse Engineering• Some Network and Endpoint Forensics

• Lots of… READING!!

(besides IOC feedsAnd pew pew maps)

Alphaspirit/Shutterstock.com

Page 4: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

https://www.fireeye.com/content/dam/fireeye-www/global/en/current-threats/pdfs/rpt-apt28.pdf

https://www.csoonline.com/article/3321911/security/russian-cozy-bear-apt-29-hackers-may-be-impersonating-state-department.html

https://www.symantec.com/blogs/threat-intelligence/dragonfly-energy-sector-cyber-attacks

Page 5: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

APT 28Data Obfuscation, Connection Proxy, Standard Application Layer Protocol, Remote File Copy, Rundll32 ,Indicator Removal on Host, Timestomp, Credential Dumping, Screen Capture, Bootkit, Component Object Model Hijacking, Exploitation for Privilege Escalation, Obfuscated Files or Information, Input Capture, Replication Through Removable Media, Communication Through Removable Media, Pass the Hash, Data Staged, Data from Removable Media, Peripheral Device Discovery, Access Token Manipulation, Valid Accounts, Office Application Startup, System Owner/User Discovery, Process Discovery, System Information Discovery, File Deletion, Credentials in Files, File and Directory Discovery, Network Sniffing, Dynamic Data Exchange, Data from Local System, Hidden Files and Directories, Scripting, Logon Scripts, Spearphishing Attachment, Deobfuscate/Decode Files or Information, Exploitation of Remote Services, Exploitation for Defense Evasion, Data from Information Repositories, User Execution

APT 29PowerShell, Scripting, Indicator Removal on Host, Software Packing, Scheduled Task, Registry Run Keys / Start Folder, Bypass User Account Control, Windows Management Instrumentation Event Subscription, Windows Management Instrumentation, Pass the Hash, Accessibility Features, Domain Fronting ,Multi-hop Proxy, Spearphishing Attachment, Spearphishing Link, Exploitation for Client Execution, User Execution

Shoebox of Tools, Tactics and Procedures

DragonflyScreen Capture, PowerShell, Remote File Copy, File Deletion, Create Account, Disabling Security Tools, External Remote Services, Brute Force, Credential Dumping, Scripting, Masquerading, Indicator Removal on Host, Web Shell, Commonly Used Port, Email Collection, Remote Desktop Protocol, Network Share Discovery, Scheduled Task, Forced Authentication

Page 6: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Graph of Techniques used by /bear/APTs

Dragonfly

APT29

APT28

https://mitre-attack.github.io/caret/#/

Page 7: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Graph of Techniques used by /bear/ APTs with prevalence of each technique

Prioritize

DeprioritizeDragonfly

APT29

APT28

https://mitre-attack.github.io/caret/#/

Page 8: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

pag

Can this be done by machine learning?

Input {black box} Output

• Written material• Blogs• Whitepapers• Incident Response

reports

• Extract actor names• Extract tools names• Extract techniques• Extract relationships

• Attacker graphs• Timelines

Page 9: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

• Introduce the idea of Named Entity Extraction

• Build a machine learning/deep learning based Cyber Entity Extractor

• Training Data

• Feature Extraction

• Architecture and Models

• Evaluation

• Demo

• Driving Impact

pag

Agenda

Page 10: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

What is Named Entity Extraction?

Page 11: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Israeli : GEOPOLITICAL ENTITY, GPE

Static: PERSON

Ben El Tavori: PERSON

American: GEOPOLITICAL ENTITY, GPE

Capitol Records: ORGANIZATION

Dropping Elephant: BADACTOR

Chinastrats: BADACTOR

Patchwork: BADACTOR

spear-phishing: TECHNIQUE

watering hole: TECHNIQUE

https://www.timesofisrael.com/pop-duo-static-ben-el-sign-with-major-us-record-label/ https://securelist.com/the-dropping-elephant-actor/75328/

Page 12: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

pag

Training Data Feature Extraction

Architecture Assessments

Add a footer

Training our own Cyber Entity Extractor

Page 13: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Training Data

• APT Notes

• Public threat intelligence blogs collected since June 2018

• 2704 Documents

• On average, ~1% of the tokens are “interesting”

Page 14: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Training Data

• Labels: Caret Dataset (MITRE)

• Automatically annotated using longest extent pattern matching

• Kinda noisy, but best we can do short of manual annotation

Page 15: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Numbered Panda (also known as IXESHE, DynCalc, DNSCALC, and APT12) is a cyber espionage group believed to be linked with the Chinese military.

('Numbered', 'B-BADACTOR’), ('Panda', 'I-BADACTOR’), ('(', 'O’), ('also', 'O’), ('known', 'O’), ('as', 'O’), ('IXESHE', 'B-BADACTOR’),(',', 'O’), ('DynCalc', 'B-BADACTOR’), (',', 'O’),('DNSCALC', 'B-BADACTOR’), (',', 'O’),('and', 'O’),('APT12', 'B-BADACTOR’),(')', 'O’), ('is', 'O’), ('a', 'O’), ('cyber', 'O’),

Training Data

('espionage', 'O’), ('group', 'O’), ('believed', 'O’), ('to', 'O’), ('be', 'O’), ('linked', 'O’),('with', 'O’),

('the', 'O’), ('Chinese', 'O’), ('military', 'O’),('.', 'O')

IOB Style

Page 16: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

pag

Add a footer

Training our own Cyber Entity Extractor

Training Data Feature Extraction

Architecture Assessments

Page 17: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

• Traditional

• Word itself

• Part of speech

• Lemma

• Word type (alphanumeric, digits, punctuation)

• Orthographic features (lowercase, ALLCAPS, Upper initial, MiXedCaPs etc.)

• Unsupervised

• Word Embeddings

Word Embeddings

pag

Feature Extraction

Page 18: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Generic Word Embeddings

Page 19: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

pag

apt29

sofacy

dymalloy

apt28

tg-4127

Of the 5 vectors closest to “apt28”, 2 are aliases (sofacy and tg-4127) and 2 are related by attribution

Page 20: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Dogcall, ruhappy, pooraim and shutterspeed are all malware used by APT37

ruhappydogcall

shutterspeed

pooraim

slowdrift

Page 21: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Tokens occurring in techniques

Tokens occurring in Actor Names

Tokens occurring in Malware Names

Page 22: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

pag

Add a footer

Training our own Cyber Entity Extractor

Training Data Feature Extraction

Architecture Assessments

Page 23: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

page 23

Architecture

HTML and PDF Parsers

Text Pre-processing

Word embeddings(word2vec)

Blogs,White Papers,TI Marketing

Labeled trainingdata

Labeled testdata

Custom Feature Extraction

Word Embedding Features

Model (CRF or LSTM)

Custom Feature Extraction

Word Embedding Features

FITTEST

Page 24: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

• Statistical modeling method

• Not Deep Learning

• Used for sequence labeling tasks.

• Has short term memory

• Commonly used in Natural Language processing, biological sequences and computer vision

• 2 Experiments with CRF (one with and one without the word embeddings)

Conditional Random Fields (CRF)

Page 25: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Long Short-term Memory (LSTM)

• Special type of RNN

• 2 Stacked Bidirectional LSTM Layers

• With Dropout

• Categorical Cross Entropy Loss Function

• Softmax activation for the final layer

• Keras + tensorflow

Embedding Layer Output Shape: None, 75, 100

Bidirectional (LSTM) Output Shape: None, 75, 300

Dropout Output Shape: None, 75, 100

Bidirectional (LSTM) Output Shape: None, 75, 300

Dropout Output Shape: None, 75, 100

Timedistributed (dense) Output Shape: None, 75, 300

Page 26: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

pag

Add a footer

Training our own Cyber Entity Extractor

Training Data Feature Extraction

Architecture Assessments

Page 27: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

pag

Assessment

3500

2417

966

2396

2087

840

2294

2071

876

3009

1885

942

0 500 1000 1500 2000 2500 3000 3500 4000

Actor

Malware

Technique

Recall CountsLSTM

CRF Without Embeddings

CRF With Embeddings

Actual

Page 28: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Assessment

0.99

0.76

0.98

0.74

0.93

0.93

POSI T I VE PRE C I S I ON POSI T I VE RE C ALL

OVERALL PRECISION AND RECALL

CRF With Embeddings CRF Without Embeddings LSTM

1

0.22

0.47

0.03

0.84

0.07

POSI T I VE PRE C I S I ON POSI T I VE RE C ALL

PRECISION AND RECALL FOR UNSEEN TOKENS

CRF With Embeddings CRF Without Embeddings LSTM

Precision = TP/(FP+TP)Recall = TP/(TP+FN)

Page 29: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Demo

udi Steinwell via PikiWiki - Israel free image collection project. CC Attribution 2.5 Generic

Page 30: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used
Page 31: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Still to do…

• Attention networks

• Data Augmentation

• Sophisticated Relationship Extraction

• Temporal relationships

CCO 1.0 https://picryl.com/media/fish-kids-clip-art-people-b9882d

Page 32: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Driving Impact

Page 33: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Graph of Techniques used by /bear/ APTs with prevalence of each technique

Prioritize

DeprioritizeDragonfly

APT29

APT28

https://mitre-attack.github.io/caret/#/

Page 34: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

Overlap in techniques used by commodity malware vs APTs

Emissary Panda

Emotet

Saffron Rose

Muddywater

Stone Panda

Snake

Hangover

Page 35: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

• Move beyond IOC feeds

• Rich unstructured data can be extracted with Machine Learning

• Graphs

• Timelines

• We can use this to make better decisions to improve security of our orgs

In conclusion

Page 36: TRANSMOGRIFYING OTHER PEOPLES’ MARKETING INTO … · aliases (sofacy and tg-4127) and 2 are related by attribution Dogcall, ruhappy, pooraim and shutterspeed are all malware used

• Contributors:

• Arun Gururajan, Daewoo Chong and Jugal Parikh for Data Science Expertise

• Peter Cap and Jessica Payne for Threat Intelligence Expertise

• Chris Ackerman for the demo website

• Karen Lavi for encouragement, better presentation

• @bsoman3

[email protected]

Acks/Q&A/Thanks

CC BY 2.0 https://www.flickr.com/photos/signote/6102820227