TRANSCRIPT
Khaled N. Khasawneh*, Meltem Ozsoy***, Caleb Donovick**, Nael Abu-Ghazaleh*, and Dmitry Ponomarev**
Ensemble Learning for Low-level Hardware-supported Malware Detection
* University of California, Riverside, ** Binghamton University, *** Intel Corp.
RAID 2015 – Kyoto, Japan, November 2015
Malware Growth (McAfee Labs)
Over 350M malware programs in their malware zoo
387 new threats every minute
Malware Detection Analysis
Static analysis
Searches for signatures in the executable
Can detect all known malware programs with no false alarms
Can't detect metamorphic malware, polymorphic malware, or targeted attacks
Dynamic analysis
Monitors the behavior of the program
Can detect metamorphic malware, polymorphic malware, and targeted attacks
Adds substantial overhead to the system and has false positives
TWO-LEVEL MALWARE DETECTION FRAMEWORK
Two-Level Malware Detection
MAP was introduced by Ozsoy et al. (HPCA 2015)
Explored a number of sub-semantic feature vectors
Single hardware-supported detector
Detects malware online (in real time)
Two-stage detection
Contributions of this work
Better hardware malware detection using an ensemble of detectors, each specialized for one type of malware
Metrics to measure the advantages of the two-level malware detection framework
EVALUATION METHODOLOGY:
WORKLOADS, FEATURES, PERFORMANCE MEASURES
Data Set & Data Collection
Type       Total   Training   Testing   Cross-Validation
Backdoor    815      489        163           163
Rogue       685      411        137           137
PWS         558      335        111           111
Trojan     1123      673        225           225
Worm        473      283         95            95
Regular     554      332        111           111

• Source of programs
  – Malware: MalwareDB, 2011-2014, 3,690 total malware programs
  – Regular: Windows system binaries and other applications like WinRAR, Notepad++, Acrobat Reader
• Dynamic trace
  – Windows 7 virtual machine
  – Firewall and security services were all disabled
  – Pin tool was used to collect the features during execution
Feature Space
Instruction mix
  INS1: frequency of instruction categories
  INS2: frequency of most variant opcodes
  INS3: presence of instruction categories
  INS4: presence of most variant opcodes
Memory reference patterns
  MEM1: histogram (count) of memory address distances
  MEM2: binary (presence) of memory address distances
Architectural events
  ARCH: total number of memory reads, memory writes, unaligned memory accesses, immediate branches, and taken branches
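The feature vectors above can be sketched in plain Python. This is an illustrative sketch only: the trace record format, the instruction-category list, and the log2 distance binning below are assumptions, not the paper's exact definitions.

```python
from collections import Counter

# Hypothetical instruction-category list; the paper's actual categories
# and "most variant opcodes" are derived from its training data.
CATEGORIES = ["arith", "logic", "branch", "load", "store"]

def ins1_frequency(trace):
    """INS1-style vector: frequency of each instruction category
    within one window of committed instructions."""
    counts = Counter(ins["category"] for ins in trace)
    total = len(trace) or 1
    return [counts[c] / total for c in CATEGORIES]

def ins3_presence(trace):
    """INS3-style vector: binary presence of each category."""
    seen = {ins["category"] for ins in trace}
    return [1 if c in seen else 0 for c in CATEGORIES]

def mem1_histogram(trace, num_bins=8):
    """MEM1-style vector: histogram of distances between consecutive
    memory reference addresses (log2-bucketed here as an illustrative
    binning choice)."""
    addrs = [ins["addr"] for ins in trace if "addr" in ins]
    hist = [0] * num_bins
    for prev, cur in zip(addrs, addrs[1:]):
        dist = abs(cur - prev)
        hist[min(dist.bit_length(), num_bins - 1)] += 1
    return hist

# A tiny hypothetical trace window (real windows span thousands of
# committed instructions collected via Pin).
window = [
    {"category": "load", "addr": 0x1000},
    {"category": "arith"},
    {"category": "store", "addr": 0x1008},
    {"category": "branch"},
]
print(ins1_frequency(window))  # [0.25, 0.0, 0.25, 0.25, 0.25]
print(ins3_presence(window))   # [1, 0, 1, 1, 1]
print(mem1_histogram(window))  # [0, 0, 0, 0, 1, 0, 0, 0]
```

The binary "presence" variants (INS3, INS4, MEM2) trade information for much cheaper hardware, which is one motivation for comparing the feature vectors.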
Detection Performance Measures
Sensitivity: percent of malware that was detected (true positive rate)
Specificity: percent of correctly classified regular programs (true negative rate)
Receiver Operating Characteristic (ROC) curve: summarizes the prediction performance over a range of detection thresholds
Area Under the Curve (AUC): traditional summary metric for the ROC curve
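These measures are standard and can be computed directly; a minimal sketch with made-up labels and scores (the rank formulation of AUC is used, which is equivalent to integrating the ROC curve):

```python
def sensitivity(labels, preds):
    """True positive rate: detected malware / all malware (label 1 = malware)."""
    pos = [p for l, p in zip(labels, preds) if l == 1]
    return sum(pos) / len(pos)

def specificity(labels, preds):
    """True negative rate: correctly classified regular programs (label 0)."""
    neg = [1 - p for l, p in zip(labels, preds) if l == 0]
    return sum(neg) / len(neg)

def roc_auc(labels, scores):
    """AUC via the rank (Mann-Whitney) formulation: the probability that a
    random malware sample scores higher than a random regular one."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]                # 1 = malware, 0 = regular
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]    # detector outputs (illustrative)
preds  = [int(s >= 0.5) for s in scores]   # one threshold point on the ROC curve
print(sensitivity(labels, preds))  # 2/3: one malware sample missed
print(specificity(labels, preds))  # 2/3: one regular program flagged
print(roc_auc(labels, scores))     # 8/9: ranking quality over all thresholds
```

Sensitivity and specificity describe one operating point; AUC summarizes all thresholds at once, which is why the detector comparisons that follow are reported as AUC.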
SPECIALIZING THE DETECTORS FOR DIFFERENT MALWARE TYPES
Constructing Specialized Detectors
Specialized detectors for each malware type were trained only on data of that type
Supervised learning with logistic regression was used
[Figure: MEM1 specialized detectors]
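The per-type training setup can be sketched as follows. This is a stand-in, not the paper's implementation: the gradient-descent logistic learner, the 2-D feature vectors, and the synthetic Gaussian data are all illustrative assumptions; the point is that each specialized detector sees regular programs plus only one malware type.

```python
import math
import random

def train_logistic(X, y, epochs=200, lr=0.5):
    """Plain-Python logistic regression via online gradient descent
    (a stand-in for the slide's supervised learner)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))
            err = p - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(model, xi):
    """Probability that a sample is malware under one detector."""
    w, b = model
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 / (1 + math.exp(-z))

# Synthetic data: regular programs, plus two malware types that differ
# in which feature dimension separates them from regular behavior.
random.seed(0)
regular = [[random.gauss(0.2, 0.05), random.gauss(0.2, 0.05)] for _ in range(50)]
per_type = {
    "Worm":   [[random.gauss(0.8, 0.05), random.gauss(0.3, 0.05)] for _ in range(50)],
    "Trojan": [[random.gauss(0.3, 0.05), random.gauss(0.8, 0.05)] for _ in range(50)],
}

# Each specialized detector is trained only on its own malware type.
specialized = {}
for mtype, samples in per_type.items():
    X = regular + samples
    y = [0] * len(regular) + [1] * len(samples)
    specialized[mtype] = train_logistic(X, y)

for mtype, model in specialized.items():
    print(mtype, round(predict(model, [0.8, 0.3]), 3))
```

Because each detector only needs to separate one malware type from regular programs, its decision boundary can fit that type's behavior more tightly than a single general boundary, which is the opportunity the AUC tables below quantify.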
General vs. Specialized Detectors (AUC)

Feature  Detector      Backdoor   PWS     Rogue   Trojan   Worm
INS1     General        0.713    0.909    0.949   0.715    0.705
         Specialized    0.715    0.892    0.962   0.727    0.819
INS2     General        0.905    0.946    0.993   0.768    0.810
         Specialized    0.895    0.954    0.976   0.782    0.984
INS3     General        0.837    0.909    0.924   0.527    0.761
         Specialized    0.840    0.888    0.991   0.808    0.852
INS4     General        0.866    0.868    0.914   0.788    0.830
         Specialized    0.891    0.941    0.993   0.798    0.869
MEM1     General        0.729    0.893    0.424   0.650    0.868
         Specialized    0.868    0.961    0.921   0.867    0.871
MEM2     General        0.833    0.947    0.761   0.866    0.903
         Specialized    0.843    0.979    0.931   0.868    0.871
ARCH     General        0.702    0.919    0.965   0.763    0.602
         Specialized    0.686    0.942    0.970   0.795    0.560
Is There an Opportunity?

Type      Best General (INS4)   Best Specialized per Type   Difference
Backdoor        0.8662                  0.8956                0.0294
PWS             0.8684                  0.9795                0.1111
Rogue           0.9149                  0.9937                0.0788
Trojan          0.7887                  0.8676                0.0789
Worm            0.8305                  0.9842                0.1537
Average         0.8537                  0.9441                0.0904
ENSEMBLE DETECTORS
Ensemble Learning
Multiple diverse base detectors
  Different learning algorithms
  Different data sets
Combined to solve a problem
Decision Functions
Or’ing
High-Confidence Or’ing
Majority voting
Stacking
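The four decision functions above can be sketched over the base detectors' probability outputs. A minimal sketch: the 0.9 high-confidence cutoff and the meta-learner weights are illustrative assumptions, not the paper's trained values.

```python
import math

def or_ing(probs, threshold=0.5):
    """Flag malware if ANY base detector fires (high sensitivity,
    poor specificity, as the results table shows)."""
    return any(p >= threshold for p in probs)

def high_confidence_or_ing(probs, threshold=0.9):
    """Or'ing, but a base detector only counts when it is highly
    confident (the 0.9 cutoff here is an illustrative assumption)."""
    return any(p >= threshold for p in probs)

def majority_vote(probs, threshold=0.5):
    """Flag malware if more than half of the base detectors fire."""
    votes = sum(p >= threshold for p in probs)
    return votes > len(probs) / 2

def stacking(probs, meta_weights, meta_bias):
    """A second-level (meta) classifier trained on the base detectors'
    outputs; modeled here as a logistic unit with given weights."""
    z = sum(w * p for w, p in zip(meta_weights, probs)) + meta_bias
    return 1 / (1 + math.exp(-z)) >= 0.5

probs = [0.95, 0.40, 0.60]              # outputs of three base detectors
print(or_ing(probs))                    # True
print(high_confidence_or_ing(probs))    # True (0.95 >= 0.9)
print(majority_vote(probs))             # True (2 of 3 fire)
print(stacking(probs, [2.0, 2.0, 2.0], -3.0))  # True
```

Stacking lets the meta-learner weight reliable base detectors more heavily instead of treating every vote equally, which is consistent with it giving the best accuracy in the tables that follow.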
Ensemble Detectors
General Ensemble: combines multiple general detectors (best of INS, MEM, ARCH)
Specialized Ensemble: combines the best specialized detector for each malware type
Mixed Ensemble: combines the best general detector with the best specialized detectors from the same feature vector
Offline Detection Effectiveness

Ensemble               Decision Function   Sensitivity   Specificity   Accuracy
Best General           -                      82.4%         89.3%        85.1%
General Ensemble       Or’ing                 99.1%         13.3%        65.0%
                       High Confidence        80.7%         92.0%        85.1%
                       Majority Voting        83.3%         92.1%        86.7%
                       Stacking               80.7%         96.0%        86.8%
Specialized Ensemble   Or’ing                100%            5%          51.3%
                       High Confidence        94.4%         94.7%        94.5%
                       Stacking               95.8%         96.0%        95.9%
Mixed Ensemble         Or’ing                 84.2%         70.6%        78.8%
                       High Confidence        83.3%         81.3%        82.5%
                       Stacking               80.7%         96.0%        86.7%
Online Detection Effectiveness
A decision is made after each 10,000 committed instructions
Exponentially Weighted Moving Average (EWMA) is used to filter false alarms

Detector                          Sensitivity   Specificity   Accuracy
Best General                         84.2%         86.6%        85.1%
General Ensemble (Stacking)          77.1%         94.6%        84.1%
Specialized Ensemble (Stacking)      92.9%         92.0%        92.3%
Mixed Ensemble (Stacking)            85.5%         90.1%        87.4%
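The EWMA filtering step above can be sketched as follows; the smoothing factor and alarm threshold are illustrative assumptions, not the values used in the evaluation.

```python
def ewma_alarms(window_scores, alpha=0.3, alarm_threshold=0.6):
    """Smooth the per-window detector outputs (one score per 10,000
    committed instructions) with an exponentially weighted moving
    average, raising an alarm only when the smoothed score crosses
    the threshold: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    alpha and alarm_threshold are illustrative choices."""
    smoothed, alarms = [], []
    s = 0.0
    for x in window_scores:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(round(s, 3))
        alarms.append(s >= alarm_threshold)
    return smoothed, alarms

# One spurious spike (the lone 0.9) is filtered out; only a sustained
# run of high scores eventually raises the alarm.
scores = [0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.9, 0.9]
smoothed, alarms = ewma_alarms(scores)
print(smoothed)
print(alarms)  # [False, False, False, False, False, False, True, True]
```

This is why online sensitivity and specificity differ from the offline numbers: the filter suppresses isolated false positives at the cost of delaying detection by a few windows.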
METRICS TO ASSESS RELATIVE PERFORMANCE OF TWO-LEVEL DETECTION FRAMEWORK
Metrics
Work Advantage
Time Advantage
Detection Performance
Time & Work Advantage Results
[Figures: Time Advantage and Work Advantage]
Hardware Implementation
Physical design overhead:
  Area: 2.8% (Ensemble), 0.3% (General)
  Power: 1.5% (Ensemble), 0.1% (General)
  Cycle time: 9.8% (Ensemble), 1.9% (General)
Conclusions & Future Work
Ensemble learning with specialized detectors can significantly improve detection performance
Hardware complexity increases, but several optimizations are still possible
Some features are complex to collect; simpler features may carry the same information
Future work:
  Demonstrate a fully functional system
  Study how attackers could evolve, including adversarial machine learning
Thank you!
Questions?