Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools: talk for Malaysian Open Source Conference 2012, 9th July 2012, Berjaya Times Square, Kuala Lumpur, Malaysia
Intro | The issues in general | Motivation | Solution | Experiments | Tools | eof()
Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools
Muhammad Najmi Ahmad Zabidi
International Islamic University Malaysia
MOSC 2012
Berjaya Times Square, Kuala Lumpur
9th July 2012
Muhammad Najmi Ahmad Zabidi MOSC 2012 1/34
About
• I am a research graduate student at Universiti Teknologi Malaysia, Skudai, Johor Bahru, Malaysia
• My current employer is International Islamic University Malaysia, Kuala Lumpur
• Research area: malware detection, narrowing down to Windows executables
• For the past few years (since 2003), I have been a Subversion (SVN) committer for the KDE localization project for the Malay language (but I rarely commit now.. need a new intern to replace me :) )
Computing world as we knew it
• Interconnected machines
• Previously less connected, now “socialized” machines
• This has brought real problems to the cyberworld
Risks
• Financial loss
• Company/government level espionage
• Privacy breach
Types of adversaries
• Spam
• Scam
• Phishing
• Malware, botnets, rootkits, etc.
• Anything else?
Spam
• Annoying
• Productivity wasted on unnecessary file deletion
• In extreme cases, difficult to find important email
Scam
• Preying on naive victims
• Sounds too good to be true, but some people still believe it
• Organized crime/syndicate... with mules cooperating
Phishing
• Similar to a scam, but with a different tactic
• More sophisticated, but does not need a mule or physical meetup
• Main purpose is to gain important details, such as an online banking login name and password, and hence access to the victim’s account
• Safer for the criminal
Malware
• Safe to say it covers trojans, viruses, dialers, rabbits, worms and rootkits (often bundled nowadays)
• Has been infecting computers since the 1980s; the threat became more obvious once the Internet arrived
• Attacks any operating system: Linux, Windows, Mac... even Android phones
Problems with adversaries detection
• Some manually crafted, some automated
• React relatively fast, difficult to trace
• Too many (for example, spam), hence too time-consuming for manual work
In-house analysis
• Given enough expertise, in-house analysis could be useful
• Maintains reputation by having your own group of analysts to handle incidents
• Try to minimize costs; use open source tools whenever possible
Categories
Machine Learning
• Associated with Artificial Intelligence
• Mimics human (brain) learning
• Learns through experience
• Deals with known and unknown patterns
• Overlaps with (or to some extent originated from) Data Mining and Pattern Recognition
Table 1: Differences between clustering and classification

Classification                  | Clustering
Deals with known (labeled) data | Deals with unknown (unlabeled) data
Supervised learning             | Unsupervised learning
Popular algorithms include:     | Popular algorithms include:
• Random Forest                 | • K-means
• Neural Networks               | • Fuzzy C-means
• k-Nearest Neighbor            | • Gaussian
• Decision Trees                |
Predictive [Tan et al., 2005]   | Descriptive [Tan et al., 2005]
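The difference between the two columns can be sketched in a few lines of plain Python. This is only a toy illustration, not the setup used in the talk: the 2-D points, the "ham"/"spam" labels, and the function names knn_predict and kmeans are all made up for the example. Classification (k-Nearest Neighbor) needs the labels; clustering (K-means) only sees the points.

```python
import math

def knn_predict(train, labels, point, k=3):
    """Classification (supervised): majority label among the k nearest labeled points."""
    nearest = sorted(range(len(train)), key=lambda i: math.dist(train[i], point))[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

def kmeans(points, k, iterations=10):
    """Clustering (unsupervised): assign each unlabeled point to one of k groups."""
    # Farthest-first initialization keeps this sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]

# Hypothetical 2-D feature vectors; the labels are known only to the classifier.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(points, labels, (4.9, 5.0)))  # predicted label for a new point
print(kmeans(points, k=2))                      # cluster index per point, no labels used
```

Note that the cluster indices returned by kmeans carry no meaning by themselves; an analyst still has to inspect each group to decide what it represents.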
What to look for?
• We look for patterns
• In some cases, have a corpus of spam and phishing mails ready
• We call these patterns “features”
Spam/scam
• The language being used
• Perhaps wording like “You have won GBP100,000,000” in email notifications
• Spam bombards mailboxes; some may come from genuine businesses, but the volume is too much to handle
• Scams ask people to bank in money under false pretenses
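As a rough sketch of how the language of a message can become features, the snippet below counts a handful of keywords and flags large currency amounts. The keyword list, the function name keyword_features and the sample message are assumptions made for illustration; a real corpus would suggest its own vocabulary.

```python
import re
from collections import Counter

# Hypothetical keyword list, chosen for illustration only.
SUSPICIOUS_WORDS = ("won", "winner", "prize", "urgent", "bank", "transfer")

def keyword_features(message):
    """Count how often each suspicious keyword occurs in a message."""
    text = message.lower()
    words = Counter(re.findall(r"[a-z]+", text))
    features = {word: words[word] for word in SUSPICIOUS_WORDS}
    # Large currency amounts such as "GBP100,000,000" become one extra feature.
    features["has_large_amount"] = int(bool(re.search(r"(?:gbp|usd|eur)\s?\d[\d,]{5,}", text)))
    return features

example = "You have won GBP100,000,000! Urgent: send your bank details."
print(keyword_features(example))
```

The resulting dictionary is one row of a feature table; one such row per email is what a classifier or clustering algorithm would consume.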
Phishing mails
• Look at the URLs
• Current efforts, for example by PhishTank, rely on public submission and (I believe) manual verification
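One possible way to turn a URL into features, sketched with the Python standard library only. The chosen features (raw IP address as host, "@" in the URL, number of subdomains, length, HTTPS) are common heuristics assumed for this example, not a list taken from the talk or from PhishTank, and url_features is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

def url_features(url):
    """Turn a URL into a small dictionary of numeric/boolean features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    is_ip = bool(re.fullmatch(r"\d{1,3}(?:\.\d{1,3}){3}", host))
    return {
        "uses_ip_address": is_ip,
        "has_at_symbol": "@" in url,
        # Naive count: ignores multi-part suffixes such as ".com.my".
        "subdomain_count": 0 if is_ip else max(host.count(".") - 1, 0),
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
    }

# Documentation-only address and domain, not real phishing samples.
suspicious = url_features("http://192.0.2.10/secure-login/bank.example.com/")
ordinary = url_features("https://www.example.com/")
print(suspicious)
print(ordinary)
```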
Malware
• Researchers tend to look at Application Programming Interface (API) calls, and some at opcodes
• Analysis is done using either static or dynamic analysis
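A common way to use API calls as features is to count short sequences (n-grams) of consecutive calls. The sketch below assumes the call trace has already been extracted, whether statically or from a sandbox run; the trace itself is invented for the example and api_ngrams is a hypothetical helper name.

```python
from collections import Counter

def api_ngrams(calls, n=2):
    """Count every run of n consecutive API calls in a trace."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))

# Invented trace using Windows API names; not taken from a real sample.
trace = ["CreateFileW", "WriteFile", "CloseHandle",
         "CreateFileW", "WriteFile", "RegSetValueExW"]

bigrams = api_ngrams(trace, n=2)
for pair, count in bigrams.most_common():
    print(" -> ".join(pair), count)
```

The same function works on opcode sequences, since it only assumes a list of strings.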
An example
Figure 1: Automated classification proposed by [Rieck et al., 2009]
The datasets
• Spam email research has been around for quite some time compared to the others (e.g. phishing)
• Sample datasets:
  • http://csmining.org/index.php/spam-email-datasets-.html
  • http://archive.ics.uci.edu/ml/datasets/Spambase
• Scam email is closely associated with spam, since it is also unwanted email. It might as well be categorized as “sub-spam”
• Phishing email samples:
  • Sample dataset:
    • http://phishtank.com
Feature Selection/Extraction
• When analyzing, we’re interested in features
• What kind of features?
  • Important keywords, strong features
  • Unimportant features will be phased out as unnecessary
  • Some features might be redundant
• There are algorithms meant for this:
  • Information Gain
  • Support Vector Machine (SVM)
  • Others... some may be hybrid algorithms (combining several algorithms), also known as ensembles
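To make Information Gain concrete: it measures how much knowing a feature reduces the entropy (uncertainty) of the class label. The sketch below uses the standard entropy-based definition; the six toy emails and the feature names contains_won and has_greeting are invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [label for f, label in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Toy data: six emails, two binary features (hypothetical).
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]
contains_won = [1, 1, 1, 0, 0, 0]   # separates the classes perfectly
has_greeting = [1, 0, 1, 0, 1, 0]   # says little about the class

print(information_gain(contains_won, labels))
print(information_gain(has_greeting, labels))
```

A feature with a gain close to zero, like has_greeting here, is a candidate to be phased out; ranking all features by gain and keeping the top ones is a simple selection strategy.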
Weka | R language | Octave | Python Scipy
List of tools
• Weka
• R language
• Octave (as a replacement for MATLAB)
• Python SciPy with Matplotlib
Figure 2: Weka
Weka
• Results are presented as numbers and visualizations
• Need to do some reading on how to interpret them
• Test with different algorithms to get the best results
R language
• Not merely a tool, but a language in itself
• Commonly used by data analysts
Figure 3: These books use the R language for their analyses
Octave
• Octave is an open source alternative to MATLAB (MATrix LABoratory)
• Works almost the same way as MATLAB does
Figure 4: Octave also has a GUI, QtOctave (discontinued)
Muhammad Najmi Ahmad Zabidi MOSC 2012 28/34
![Page 56: Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools](https://reader033.vdocuments.mx/reader033/viewer/2022052620/55720fa3497959fc0b8c9836/html5/thumbnails/56.jpg)
Python SciPy

```python
#!/usr/bin/env python
"""
Example: simple line plot.
Show how to make and save a simple line plot with labels, title and grid
"""
import numpy
import pylab

# Sample one second of a 2 Hz cosine at 0.01 s intervals
t = numpy.arange(0.0, 1.0+0.01, 0.01)
s = numpy.cos(2*2*numpy.pi*t)
pylab.plot(t, s)

pylab.xlabel('time (s)')
pylab.ylabel('voltage (mV)')
pylab.title('About as simple as it gets, folks')
pylab.grid(True)
pylab.savefig('simple_plot')

pylab.show()
```
![Page 57: Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools](https://reader033.vdocuments.mx/reader033/viewer/2022052620/55720fa3497959fc0b8c9836/html5/thumbnails/57.jpg)
![Page 58: Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools](https://reader033.vdocuments.mx/reader033/viewer/2022052620/55720fa3497959fc0b8c9836/html5/thumbnails/58.jpg)
The flow
Flowchart stages: Feature Selection, Feature Categorization, Clustering, Classification, Visualization
Tools labelled on the chart: Weka, Octave, R (shown twice) and SciPy, Octave, R (shown twice)
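The clustering and classification stages of this flow can be illustrated with a small sketch. It is not the method used in the talk: it assumes plain k-means for clustering and nearest-centroid assignment for classification, uses only the Python standard library, and runs on made-up two-dimensional feature vectors. The names SAMPLES, kmeans and classify are hypothetical and exist only for this illustration.

```python
"""Toy version of the flow: feature vectors -> clustering -> classification."""
import math

# Hypothetical, hand-made feature vectors (for example, counts of two
# suspicious traits per sample). These are NOT data from the talk.
SAMPLES = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),   # low-valued group
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),   # high-valued group
]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: distance(point, centroids[i]))


def kmeans(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Clustering step: seed with the first and last sample for a repeatable run.
CENTROIDS = kmeans(SAMPLES, [SAMPLES[0], SAMPLES[-1]])


def classify(vector):
    """Classification step: index of the cluster nearest to an unseen vector."""
    return nearest(vector, CENTROIDS)
```

Calling classify((1.1, 0.9)) assigns the new vector to the cluster built from the low-valued samples. In practice the tools named in the flow (Weka, Octave, R, SciPy) provide tested implementations of these steps, so a hand-written version like this is only useful for seeing what they do.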
![Page 59: Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools](https://reader033.vdocuments.mx/reader033/viewer/2022052620/55720fa3497959fc0b8c9836/html5/thumbnails/59.jpg)
Conclusion
• Dealing with malicious/unwanted threats from spam, scams, phishing and malware is not easy
• A single sample could perhaps be analysed by hand, but handling thousands per day is tedious
• Machine learning assists in automation
• Open source provides an alternative (free as in minimal cost) for the analysis
• In-house analysis helps safeguard an organization's/enterprise's reputation
![Page 65: Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools](https://reader033.vdocuments.mx/reader033/viewer/2022052620/55720fa3497959fc0b8c9836/html5/thumbnails/65.jpg)
Get in touch!
najmi.zabidi @ gmail.com
http://mypacketstream.blogspot.com
These slides were created with LaTeX Beamer
![Page 66: Machine Learning-based Malicious Adversaries Detection in an Enterprise Environment by Using Open Source Tools](https://reader033.vdocuments.mx/reader033/viewer/2022052620/55720fa3497959fc0b8c9836/html5/thumbnails/66.jpg)