malware detection using machine learning techniques

Malware detection using Machine learning: A review

Submitted To: Ma'am Tahira Mehboob

Presented By:

Anum Nisa, Sumaiya Arshad

MAY 18, 2016 | Machine Learning

ABOUT MALWARE & ITS DETECTION TECHNIQUES:

INTRODUCTION:



Malware is malicious software: viruses, spam, …

Increasing threats:
* Continuous and increasing attacks on infrastructure
* Threats to business, national security, and the personal security of PCs

Attacks are becoming more advanced and sophisticated!


MALWARE Executables

Host-based vs. network-based approaches

Limitations of existing techniques:
- Signature-based approach
  * Fails to detect zero-day attacks.
  * Fails to detect threats with evolving capabilities, such as metamorphic and polymorphic malware.
- Anomaly-based approach
  * Produces a high false-positive rate.
- Supervised-learning-based approach
  * Performs poorly on new and evolving malware.
  * Building a classifier model is challenging due to the diversity of malware classes, imbalanced class distributions, data imperfection issues, etc.
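The false-positive problem of the anomaly-based approach follows directly from how it works. It learns a profile of "normal" behavior and flags anything that deviates from it, so unusual but legitimate activity gets flagged too. Below is a minimal sketch that assumes a single hypothetical numeric feature (outbound connections per minute) and a z-score threshold of 3. Neither the feature nor the threshold comes from the slides.

```python
from statistics import mean, stdev

def fit_profile(benign_values):
    """Learn a 'normal' profile (mean, sample standard deviation) from benign data only."""
    return mean(benign_values), stdev(benign_values)

def is_anomalous(value, profile, threshold=3.0):
    """Flag any value whose z-score exceeds the threshold; no malware samples are needed."""
    mu, sigma = profile
    return abs(value - mu) / sigma > threshold

# Hypothetical feature: outbound connections per minute observed on a clean host.
benign = [10, 12, 11, 9, 13, 10, 12, 11]
profile = fit_profile(benign)

print(is_anomalous(60, profile))  # True: e.g. a scanning worm is caught
print(is_anomalous(25, profile))  # True: a legitimate burst (e.g. a large update) becomes a false positive
print(is_anomalous(11, profile))  # False: typical traffic
```

Lowering the false-positive rate here means raising the threshold, which in turn lets quieter malware through. That trade-off is the limitation listed above.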


Red Hocks (Viruses)


Our Goal

Machine-learning-based approach with two levels:
* A supervised learning approach to detect malicious flows and further identify their specific type
* Unsupervised learning combined with supervised learning to address the new-class discovery problem


Two-level malware detection framework:

Macro-level classifier

Used to isolate malicious flows from the non-malicious ones.

Micro-level classifier

Used to further categorize the malicious flows into one of the pre-existing malware classes or as new malware.
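The two levels can be sketched in code. The slides do not specify the classifiers used at each level, so this minimal sketch substitutes a nearest-centroid rule at both levels. It also treats "farther than a radius from every known family" as the signal for a new class, which stands in for the unsupervised component. The flow features, family names, and the `new_class_radius` parameter are all hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def nearest(x, centroids):
    """Return (label, distance) for the centroid closest to x."""
    return min(((label, math.dist(x, c)) for label, c in centroids.items()),
               key=lambda pair: pair[1])

class TwoLevelDetector:
    def __init__(self, new_class_radius):
        # Hypothetical parameter: a malicious flow farther than this from every
        # known family centroid is reported as a new malware class.
        self.new_class_radius = new_class_radius

    def fit(self, flows, labels):
        malicious = [f for f, y in zip(flows, labels) if y != "benign"]
        # Macro level: one centroid for benign traffic, one for all malicious traffic.
        self.macro = {
            "benign": centroid([f for f, y in zip(flows, labels) if y == "benign"]),
            "malicious": centroid(malicious),
        }
        # Micro level: one centroid per known malware family.
        self.micro = {
            family: centroid([f for f, y in zip(flows, labels) if y == family])
            for family in set(labels) - {"benign"}
        }
        return self

    def predict(self, flow):
        if nearest(flow, self.macro)[0] == "benign":
            return "benign"
        family, distance = nearest(flow, self.micro)
        return family if distance <= self.new_class_radius else "new malware"

# Hypothetical, already-normalised flow features: (packet rate, mean payload size).
flows = [[0.1, 0.1], [0.2, 0.1], [0.1, 0.2],   # benign
         [0.9, 0.1], [1.0, 0.2],               # botnet
         [0.9, 0.9], [1.0, 1.0]]               # worm
labels = ["benign"] * 3 + ["botnet"] * 2 + ["worm"] * 2

detector = TwoLevelDetector(new_class_radius=0.3).fit(flows, labels)
print(detector.predict([0.15, 0.15]))  # benign
print(detector.predict([0.95, 0.2]))   # botnet
print(detector.predict([0.6, 0.6]))    # new malware
```

Any stronger classifier can replace the centroid rule at either level. The separation of concerns (isolate first, then categorize or flag as new) is the point being illustrated.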

Proposed Framework


Proposed Framework Block diagram


Classification Process

Machine learning, data mining, and text classification & detection methods used to detect malicious executables include:

Classify executables as unknown or malicious using ML algorithms:
* Random Forest classifier
* Boosted J48 decision tree
* KNN, Naïve Bayes, SVM, Multilayer Perceptron (MLP)
* Mal-ID basic detection algorithm

Both the Bayes network and random forest classifiers produced more accurate readings, but the boosted decision tree (J48) is the best classifier.
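KNN is one of the classifiers listed above. It labels an unknown executable by majority vote among the most similar known files. The slides do not say which features were used. This minimal sketch assumes byte 2-gram counts compared by cosine similarity, which is a common "text classification" view of binaries. The training byte strings are made-up toy data, not real malware.

```python
import math
from collections import Counter

def ngram_profile(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams, a common static feature for executables."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_predict(sample: bytes, training, k: int = 3) -> str:
    """Label a sample by majority vote among its k most similar training files."""
    profile = ngram_profile(sample)
    ranked = sorted(training, key=lambda item: cosine(profile, ngram_profile(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy byte strings standing in for executables; not real malware or real binaries.
training = [
    (b"\x90\x90\x90\xeb\xfe\x90\x90\x90", "malicious"),
    (b"\x90\x90\xeb\xfe\x90\x90\xcc\xcc", "malicious"),
    (b"\x90\x90\x90\x90\xeb\xfe\x90\x90", "malicious"),
    (b"\x55\x48\x89\xe5\x5d\xc3\x55\x48", "benign"),
    (b"\x55\x48\x89\xe5\x48\x83\xec\x10", "benign"),
    (b"\x48\x89\xe5\x5d\xc3\x55\x48\x89", "benign"),
]

print(knn_predict(b"\x90\x90\xeb\xfe\x90\x90\x90\x90", training))  # malicious
print(knn_predict(b"\x55\x48\x89\xe5\x5d\xc3\x90\x90", training))  # benign
```

The same n-gram profiles could feed Naïve Bayes, SVM, or the tree-based classifiers instead. Only the final voting step is specific to KNN.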

Experimental Evaluation

Our analysis shows that among the three major forms of malware (computer viruses, Internet worms, and Trojan horses), Trojans are the most dangerous.


ANALYSIS


This section introduces analysis techniques for mobile and PC malware, transferring well-known techniques from the conventional computer world to mobile device platforms. The main idea of dynamic analysis is to execute a given sample in a controlled environment, monitor its behavior, and obtain information about its nature and purpose.

This is especially important in the field of malware research because a malware analyst must be able to assess a program’s threat and create proper counter-measures.

While static analysis might provide more precise results, the sheer volume of new malware emerging each day makes it impossible to statically analyze even a small portion of today’s malware.
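Dynamic analysis yields a behavior log rather than a verdict. To feed that log into a classifier, it first has to be turned into features. This minimal sketch assumes the sandbox has already run the sample and recorded the names of the API calls it made. The monitored call list and the flagging rules are illustrative assumptions, not the authors' feature set.

```python
from collections import Counter

# Hypothetical set of monitored Windows API calls; a real sandbox records many more.
MONITORED_CALLS = ["CreateFile", "WriteFile", "RegSetValue", "CreateRemoteThread", "connect", "send"]

def behavior_vector(trace):
    """Turn a recorded API-call trace into a fixed-length frequency vector for a classifier."""
    counts = Counter(call for call in trace if call in MONITORED_CALLS)
    return [counts[name] for name in MONITORED_CALLS]

def suspicious_indicators(trace):
    """Illustrative rules only: report behaviors often associated with malware."""
    calls = set(trace)
    flags = []
    if "CreateRemoteThread" in calls:
        flags.append("code injection into another process")
    if "RegSetValue" in calls:
        flags.append("registry modification (possible persistence)")
    if {"connect", "send"} <= calls:
        flags.append("outbound network communication")
    return flags

# Hypothetical trace, as if recorded while the sample ran in a sandbox.
trace = ["CreateFile", "WriteFile", "RegSetValue", "connect", "send", "send"]
print(behavior_vector(trace))        # [1, 1, 1, 0, 1, 2]
print(suspicious_indicators(trace))  # ['registry modification (possible persistence)', 'outbound network communication']
```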

ANALYSIS OF PARAMETERS:

To analyze malware detection techniques, the following evaluation parameters are used to assess quality factors (non-functional requirements):
* Category/type of virus
* Detection techniques
* Algorithm/technology/mechanism
* Best classification methodology
* Evaluation criterion
* Implementation tools
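The slides name an "evaluation criterion" but do not fix a metric. Detection studies commonly report accuracy, precision, recall (detection rate), and false-positive rate. This minimal sketch computes all four from true and predicted labels, assuming a binary malicious/benign setting and made-up predictions for eight files.

```python
def evaluate(y_true, y_pred, positive="malicious"):
    """Detection metrics from true and predicted labels (binary malicious/benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,               # a.k.a. detection rate
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,  # share of benign files flagged
    }

# Made-up labels for eight files: three malicious, five benign.
y_true = ["malicious"] * 3 + ["benign"] * 5
y_pred = ["malicious", "malicious", "benign", "malicious", "benign", "benign", "benign", "benign"]
print(evaluate(y_true, y_pred))
```

Here one malicious file is missed and one benign file is flagged. That gives an accuracy of 0.75 and a false-positive rate of 0.2. Accuracy alone can look good on imbalanced data, so reporting the false-positive rate alongside it matters.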


J48 is an extension of ID3.

The additional features of J48 include:
* accounting for missing values
* decision tree pruning
* continuous attribute value ranges
* derivation of rules, etc.

In the WEKA data mining tool, J48 is an open-source Java implementation of the C4.5 algorithm.
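C4.5, and therefore J48, selects splits by gain ratio rather than ID3's raw information gain. Gain ratio divides the information gain by the split information, so attributes with many distinct values are not unfairly favored. This minimal sketch covers a categorical attribute only and omits C4.5's handling of continuous attributes, missing values, and pruning. The `is_packed` attribute and the labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """C4.5-style split score for a categorical attribute: information gain / split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    return info_gain / split_info if split_info else 0.0

# Hypothetical attribute "is_packed" for eight files and their class labels.
is_packed = ["yes", "yes", "yes", "no", "no", "no", "yes", "no"]
labels = ["malicious", "malicious", "malicious", "benign", "benign", "benign", "benign", "benign"]
print(round(gain_ratio(is_packed, labels), 4))  # 0.5488
```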

Boosted J48 Decision Tree
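Boosting trains a sequence of weak learners. After each round it re-weights the training samples so the next learner concentrates on the examples the previous ones got wrong, and the final prediction is a weighted vote. The slides do not say which boosting variant or settings were used with J48. The minimal sketch below is discrete AdaBoost, and it substitutes one-level decision stumps for full J48 trees to stay short. The two binary features and the labels are hypothetical.

```python
import math

def stump_predict(x, j, thr, sign):
    """A decision stump: one feature, one threshold, output +1 or -1."""
    return sign if x[j] >= thr else -sign

def train_stump(X, y, w):
    """Pick the stump with the lowest weighted training error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w) if stump_predict(x, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; labels must be +1 (malicious) or -1 (benign)."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, j, thr, sign = train_stump(X, y, w)
        if err >= 0.5:
            break                      # weak learner no better than chance
        err = max(err, 1e-10)          # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(x, j, thr, sign)) for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def boosted_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, j, thr, sign) for alpha, j, thr, sign in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical binary features per file: (is_packed, imports_network_api).
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]                 # malicious only when both features are present

model = adaboost(X, y, rounds=5)
print([boosted_predict(model, x) for x in X])  # [-1, -1, -1, 1]
```

On this toy data no single stump classifies all four files correctly (the best one misses one file), but the boosted vote does. The same re-weighting idea applies when the weak learner is a J48 tree.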




Conclusion:

We proposed an effective malware detection framework based on data mining & machine learning techniques:

Two-level ML-based classifier

New class detection

Encrypted data

A tree-based kernel for SVM was proposed to handle the data imperfection issue in network flow data

The boosted J48 decision tree classifier was found to be the best among a number of different classifiers


Conclusion Contd:

This paper compares the efficiency of different malware detection techniques, including KNN, Naïve Bayes, boosted J48, and SVM (Support Vector Machine). We explain the feasibility of some detection methods and highlight the major causes of the increasing number of malware files, but more research is necessary.


Future Works

Develop a hierarchical multi-class learning method to enhance the testing efficiency when the number of malware classes becomes extremely large.

Malware detection accuracy can be improved through further research into classification algorithms and into ways of labeling malware data more accurately. Moreover, most of the classifiers used are not optimized for hardware operations or applications; hardware-oriented algorithm design could additionally improve precision, accuracy, and efficiency.


Extra

Metamorphic malware is rewritten with each iteration so that each succeeding version of the code is different from the preceding one. The code changes make it difficult for signature-based antivirus software programs to recognize that different iterations are the same malicious program.

Polymorphic malware also makes changes to code to avoid detection. It has two parts, but one part remains the same with each iteration, which makes the malware a little easier to identify.

Can you imagine a piece of malware code that changes its shape and signature each time it appears, making it extremely hard for signature-based antivirus software to detect? This is called polymorphic or metamorphic malware.
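A small experiment shows why a fixed signature breaks down against code that changes with each iteration. This minimal sketch assumes the simplest possible signature, a SHA-256 hash of the whole file. It uses a toy mutation that prepends a single x86 NOP byte (0x90), which this toy model treats as leaving behavior unchanged. Real antivirus signatures and real metamorphic engines are far more elaborate, and the payload bytes are made up.

```python
import hashlib

def signature(code: bytes) -> str:
    """A naive whole-file signature: the SHA-256 digest of the bytes."""
    return hashlib.sha256(code).hexdigest()

def mutate(code: bytes) -> bytes:
    """Toy 'metamorphic' step: prepend one x86 NOP (0x90), assumed not to change behavior."""
    return b"\x90" + code

original = b"\x55\x48\x89\xe5\x5d\xc3"   # made-up payload bytes
known_signatures = {signature(original)}

variant = mutate(original)
print(signature(original) in known_signatures)  # True: the known sample is caught
print(signature(variant) in known_signatures)   # False: the mutated variant slips through
```

A one-byte change defeats the exact-match signature. This is the motivation for the behavior- and feature-based ML detectors discussed in this review.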


software. Trojans can be employed by cyber-thieves and hackers trying to gain access to users' systems. Users are typically tricked by some form of social engineering into loading and executing Trojans on their systems. Once activated, Trojans can enable cyber-criminals to spy on you, steal your sensitive data, and gain backdoor access to your system. These actions can include:

* Deleting data
* Blocking data
* Modifying data
* Copying data
* Disrupting the performance of computers or computer networks
