Malicious Client Detection using Machine Learning

Satyam Saxena

Threats

• There are many types of malware for all types of devices and operating systems.

• Most, if not all, malware relies on a support system – command and control (C&C) infrastructure.

• Bad guys use DNS to scale and hide their C&C infrastructure.

• Bad guys use DNS for C&C to bypass corporate security (tunneling).

• Bad guys use cloud providers to roll out, scale, manage and quickly move their C&C infrastructure.

Without relying on any particular endpoint operating system or configuration, we can apply big data analytics to network data to detect malware.
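DNS tunneling, listed among the threats above, usually hides data in long, high-entropy subdomain labels. The sketch below flags such queries with a simple heuristic. It is a hypothetical illustration, not a method from this deck: the function names, the 40-character label threshold and the 3.5 bits/character entropy threshold are assumptions, and the registered-domain split is naive (it ignores multi-label suffixes such as co.uk).

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_like_tunnel(qname: str, max_label_len: int = 40, min_entropy: float = 3.5) -> bool:
    """Hypothetical heuristic: flag a query whose longest subdomain label is
    both unusually long and high-entropy, as encoded payloads tend to be."""
    labels = qname.rstrip(".").split(".")
    subdomain_labels = labels[:-2]  # naively drop the registered domain and TLD
    if not subdomain_labels:
        return False
    longest = max(subdomain_labels, key=len)
    return len(longest) >= max_label_len and shannon_entropy(longest) >= min_entropy


print(looks_like_tunnel("www.example.com"))
print(looks_like_tunnel("abcdefghijklmnopqrstuvwxyz0123456789abcd.t.example.com"))
```

Both thresholds would need tuning on real pDNS traffic; requiring length and entropy together avoids flagging long but repetitive labels.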

Malware use of DNS

[Figure: an end point device queries a DNS server for rndruppbakyokv[.]com and receives 1.2.3.4, the address of the command and control infrastructure.]

A communication channel with the C&C is established. The compromised device receives updates, instructions and targets.
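A random-looking name such as rndruppbakyokv[.]com is typical of a domain generation algorithm (DGA). Because a DGA is deterministic given its seed, the malware and its operator can compute the same candidate list independently, and the operator only has to register one of the names. The toy generator below is a hypothetical illustration (a date seed hashed with SHA-256 and mapped to letters); it is not the algorithm of any real malware family, and `toy_dga` and its parameters are invented for this example.

```python
import hashlib
from datetime import date


def toy_dga(day: date, count: int = 5, length: int = 14, tld: str = "com") -> list[str]:
    """Hypothetical toy DGA: derive `count` pseudo-random domains from a date seed.

    `length` must be at most 32, the size of a SHA-256 digest in bytes.
    """
    domains = []
    for i in range(count):
        seed = f"{day.isoformat()}-{i}".encode()
        digest = hashlib.sha256(seed).digest()
        # Map each of the first `length` digest bytes to a lowercase letter a-z.
        name = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(f"{name}.{tld}")
    return domains


print(toy_dga(date(2024, 1, 1)))
```

The same seed always yields the same list, so anyone who knows the algorithm can precompute it; the DGA model in this deck takes a different route and recognizes such names from their character statistics.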

Machine Learning Pipeline

[Figure: raw pDNS data feeds a domain name classifier (malicious domains), a DNS resolver classifier (malicious resolvers) and a device behavior classifier (behavior anomalies); their outputs feed a compromised device (security event) classifier.]

Architecture

[Figure: architecture diagram with components labeled DGA, Network, Time and Tunnel.]

DGA Model

• Detect randomly generated domains in the pDNS data.

• The model is trained on domains from 6 malware families, such as Zeus, Tinba and Pushdo.

• 29 features are extracted from each domain.

• PCA reduces the 29 features to 16.

• The reduced feature set is then used to train a GBM (gradient boosting machine) classifier.
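The PCA step can be written as a linear projection. In the notation below (introduced here, not taken from the deck), $\mathbf{x}$ is a domain's 29-dimensional feature vector, $\boldsymbol{\mu}$ and $\Sigma$ are the mean and covariance of the training features, and the columns of $W$ are the 16 eigenvectors of $\Sigma$ with the largest eigenvalues. PCA is often applied to standardized features; the deck does not say whether that was done here.

```latex
\begin{aligned}
\Sigma \mathbf{w}_k &= \lambda_k \mathbf{w}_k, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{29} \\
W &= [\mathbf{w}_1, \ldots, \mathbf{w}_{16}] \in \mathbb{R}^{29 \times 16} \\
\mathbf{z} &= W^{\top}(\mathbf{x} - \boldsymbol{\mu}) \in \mathbb{R}^{16}, \qquad \mathbf{x} \in \mathbb{R}^{29} \\
\text{variance retained} &= \frac{\sum_{k=1}^{16} \lambda_k}{\sum_{k=1}^{29} \lambda_k}
\end{aligned}
```

The classifier is then trained on $\mathbf{z}$ rather than $\mathbf{x}$, and the same $\boldsymbol{\mu}$ and $W$ must be reused when scoring new domains.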

Domain Features

• Common letter score

• Entropy

Domain Features (2)

• Length of largest meaningful string

• Mean length of dictionary words
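The four domain features listed above can be sketched as follows. The deck names the features but does not define them, so every definition here is an assumption: the common letter score is taken as the fraction of letters drawn from frequent English letters, entropy is Shannon entropy over characters, and the two dictionary features use a tiny hypothetical word list in place of a real dictionary.

```python
import math
from collections import Counter

# Hypothetical stand-ins: the deck does not define the letter set or the dictionary.
COMMON_LETTERS = set("etaoinshrdlu")
DICTIONARY = {"bank", "book", "face", "login", "mail", "online", "pay", "secure"}


def second_level_label(domain: str) -> str:
    """Naively take the label left of the TLD ('rndruppbakyokv' from 'rndruppbakyokv.com')."""
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def common_letter_score(label: str) -> float:
    """Fraction of letters that belong to COMMON_LETTERS (assumed definition)."""
    letters = [ch for ch in label if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in COMMON_LETTERS for ch in letters) / len(letters)


def entropy(label: str) -> float:
    """Shannon entropy of the label's characters, in bits per character."""
    if not label:
        return 0.0
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in Counter(label).values())


def dictionary_words(label: str) -> list[str]:
    """Dictionary words that occur as substrings of the label."""
    return [word for word in DICTIONARY if word in label]


def domain_features(domain: str) -> dict[str, float]:
    label = second_level_label(domain)
    words = dictionary_words(label)
    return {
        "common_letter_score": common_letter_score(label),
        "entropy": entropy(label),
        # Length of the longest dictionary word found in the label (assumed definition).
        "largest_meaningful_string": float(max((len(w) for w in words), default=0)),
        # Mean length of the dictionary words found in the label (assumed definition).
        "mean_dictionary_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


print(domain_features("securebank.com"))
print(domain_features("rndruppbakyokv.com"))
```

A human-chosen name like securebank.com contains dictionary words, while the DGA-style name from the earlier slide contains none, which is the contrast these features are meant to capture.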

DGA Features

DGA Classification Performance

Overall model performance (Random Forest)

Metric       Performance
Accuracy     98.738%
Precision    99.288%
Recall       98.181%
AUC          99.801%
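For reference, accuracy, precision and recall follow directly from confusion-matrix counts on the test set, taking "DGA domain" as the positive class. A minimal sketch; the counts in the example call are made up for illustration and are not the deck's data.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, precision and recall from confusion-matrix counts (positive = DGA)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,  # share of all domains classified correctly
        "precision": tp / (tp + fp),    # share of flagged domains that really are DGA
        "recall": tp / (tp + fn),       # share of DGA domains that were flagged
    }


# Hypothetical counts, not the deck's test set.
print(classification_metrics(tp=90, fp=5, tn=95, fn=10))
```

High precision matters in this setting because every false positive becomes an alert on a benign domain.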

Performance per malware family

Malware Family   % Detection
Conficker        86.309%
Cryptolocker     98.348%
Pushdo           95.515%
Ramdo            99.823%
Tinba            96.715%
Zeus             100.0%
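Per-family detection is presumably recall computed within each family: the percentage of that family's DGA domains the model labels malicious. A minimal sketch under that assumption; the input format of (family, predicted_malicious) pairs is hypothetical.

```python
from collections import defaultdict


def detection_by_family(results):
    """results: iterable of (family, predicted_malicious) pairs for known-DGA domains.

    Returns {family: % of that family's domains flagged as malicious}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for family, predicted_malicious in results:
        totals[family] += 1
        if predicted_malicious:
            hits[family] += 1
    return {family: 100.0 * hits[family] / totals[family] for family in totals}


print(detection_by_family([("zeus", True), ("zeus", True), ("tinba", True), ("tinba", False)]))
```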

Network Model

• Use the WHOIS record to determine whether a domain is malicious or benign.

• The WHOIS record contains very rich information about a domain.

• Age-based features.

• Registration features.
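Age-based features can be derived from the creation, update and expiration dates in a WHOIS record. A minimal sketch; the function name, its arguments and the four derived quantities are hypothetical choices, since the deck does not spell out how the age features are computed. The `observed` date is assumed to be when the domain appeared in the pDNS data.

```python
from datetime import date


def whois_age_features(creation: date, updated: date, expiration: date, observed: date) -> dict[str, int]:
    """Hypothetical age/registration features, in days, relative to when the domain was observed."""
    return {
        "age_days": (observed - creation).days,                  # how new the domain is
        "days_since_update": (observed - updated).days,          # time since last WHOIS change
        "days_to_expiration": (expiration - observed).days,      # remaining registration time
        "registration_length_days": (expiration - creation).days,  # total registration period
    }


print(whois_age_features(date(2024, 1, 1), date(2024, 1, 10), date(2025, 1, 1), date(2024, 1, 31)))
```

Newly registered domains with short registration periods are commonly associated with malicious use, which is the kind of signal these features expose to the classifier.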

Network Features – WHOIS Server

[Figure: WHOIS server, malicious domains vs. benign domains]

Network Features – Creation Date

Network Model Performance

• Final set of features: creation date, update date, expiration date, admin country, registrant country, tech country, status, WHOIS server

Metric             Performance
Error              0.00450864127
Area Under Curve   0.96615884041
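Unlike the error rate, the area under the ROC curve does not depend on a decision threshold: it equals the probability that a randomly chosen malicious domain receives a higher score than a randomly chosen benign one, with ties counting one half. A minimal pure-Python sketch of that pairwise definition; the scores in the example call are made up.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as P(score of a random positive > score of a random negative), ties count 0.5.

    Compares every positive/negative pair, so it is O(len(pos) * len(neg));
    fine for small illustrations.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical scores: malicious domains first, benign domains second.
print(roc_auc([0.9, 0.3], [0.5, 0.1]))
```

In practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.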

Compromised Client Detection

[Figure: pDNS data stored in Hadoop HDFS is grouped by client IP with Spark compute, giving per-client detection counts.]

IP     #DGA   #WHOIS   #NX   #SERVER
ip1    10     3        4     5
ip2    8      1        2     3
ip3    5      2        0     0
ip4    3      3        0     0
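The group-by step turns per-record detector hits into per-client counts like those in the table. The slide places this on Spark over pDNS data stored in Hadoop HDFS; the sketch below shows the same aggregation on an in-memory list in plain Python. The record layout (a client IP plus the set of detector names that fired) and the total-hits ranking are hypothetical assumptions, not the deck's scoring rule.

```python
from collections import Counter, defaultdict

# Column order of the per-client table.
DETECTORS = ("DGA", "WHOIS", "NX", "SERVER")


def counts_per_client(records):
    """Aggregate (client_ip, tags) records into per-client hit counts.

    `tags` is the set of detector names that fired for one pDNS record
    (hypothetical layout). Returns {client_ip: (DGA, WHOIS, NX, SERVER)}.
    """
    per_client = defaultdict(Counter)
    for client_ip, tags in records:
        counter = per_client[client_ip]  # ensures clients with no hits still appear
        for tag in tags:
            if tag in DETECTORS:
                counter[tag] += 1
    return {ip: tuple(counter[d] for d in DETECTORS) for ip, counter in per_client.items()}


def rank_clients(counts):
    """Order clients by total detector hits, most suspicious first (hypothetical scoring)."""
    return sorted(counts, key=lambda ip: sum(counts[ip]), reverse=True)


records = [("ip1", {"DGA"}), ("ip1", {"DGA", "NX"}), ("ip2", {"WHOIS"}), ("ip3", set())]
print(counts_per_client(records))
```

With Spark DataFrames the same step would be a `groupBy` on the client IP column followed by per-detector sums, which lets it scale to the full pDNS data set.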

Thank You
