issn: 1992-8645 feature selection in intrusion detection ... · pdf fileinput features give...

Journal of Theoretical and Applied Information Technology15th December 2016. Vol.94. No.1

© 2005 - 2016 JATIT & LLS. All rights reserved.

ISSN: 1992-8645 www.jatit.org E-ISSN: 1817-3195

30

FEATURE SELECTION IN INTRUSION DETECTION, STATEOF THE ART: A REVIEW

1HELMI B MD RAIS, 2TAHIR MEHMOOD1,2Department of Computer and Information Sciences

Universiti Teknologi PETRONAS, Perak, MalaysiaE-mail: [email protected], [email protected]

ABSTRACT

With the increase of internet usage the need of security for organizations network also increased.Network anomaly intrusion detection systems are designed to monitor abnormal activity in the network.These systems find the behavior that is deviated from the normal behavior. Network anomaly detectionmethods are implemented using different approaches including machine learning, data mining, andmany more. However, intrusion detection systems highly depend on the features of the input data. Theseinput features give information to the learning algorithms which used in intrusion detection system inthe form of the detection method. With irrelevant and redundant features learning algorithm buildsdetection model with less accuracy rate. Also, ambiguous features increase the time complexityand consume other computational resources as well. By removing these irrelevant and redundantfeatures accuracy of the learning algorithms can be increased. In this paper implementation of differentfeature selection techniques have been reviewed. Novel feature selection techniques have beendeveloped due to its importance in network intrusion domain. We have discussed some of it in atechnical aspect. These techniques are being discussed in detail. Moreover, features from these methodsare also given and their results are being. We categorized these techniques according to theirimplementation. Different comparison of these techniques have been given and been discussed.Moreover, the benchmark dataset that is KDD99 widely used for anomaly detection is also discussed inthis paper.

Keywords: Intrusion Detection System, Intrusion Detection Methods, Feature Selection, KDD99

1. INTRODUCTION

Today advancement in the internet bringsadvantages to the organization but at the sametime it benevolent the hackers to undermine thesecurity of the internet. There are usually twodefense walls on the network; the first one isthe firewall that scans and stops the traffic at theport level by using some rules for ports andsource, destination IP addresses. The second wallis the network intrusion detection system (NIDS),which analyzes the whole packet including apacket header and payload for any suspiciousactivity. NIDS is the second line of defense,which let the network administrator to know thatsome attacks or malicious activities have bypassedthe firewall. There are many vendors that provideNIDS products but not a single product can claim100% protection against all attacks.Intrusion Detection System (IDS) is designedto detect intrusions in the data. It monitors the datafor any suspicious activity. Network anomalybased intrusion detection systems are used toobserve the activities that deviated from normalbehavior of the network. Network anomaly

detection method makes a baseline for thenormal activity, any activity that deviatesfrom that baseline is considered as a possibleintrusion.

Many algorithms have been applied to develop theanomaly detection model for intrusion detectionsystem. These algorithms, however, highly dependon input features. High dimensional datacauses learning algorithm to generate detectionmodel with high time complexity and lowaccuracy rate. Poor features selection can affectthe accuracy of these algorithms badly whichleads to errors in the form of false negativesand false positives, which are needed to beminimized. This is the challenging task, and itis the main reason to prevent deployment ofintrusion detection systems. The purpose offeature selection is to decrease the computationaltime for the learning algorithms and enhance theperformance of these algorithms throughremoving redundant and irrelevant features [1].Good features can lead to higher accuracy of thedetection. Features selection improves the learningprocess, good generalization model, and decreasescomputational complexity [2].




31

In this paper different feature selection techniquesfrom previous work have been discussed fromnetwork anomaly detection methods perspective.

Unlike we mainly focused the featureselection

method. The feature selection methods arediscussed in detail. Also the selected features fromthese methods are also given and their results arebeing discussed in detail. Section 1 gives a briefintroduction to types of intrusion detection system,the methods used in the intrusion detectionsystem, and why feature selection is necessary forintrusion detection method. Section 2 givesprevious work related to feature selection method,followed by dataset and evaluation in section 3.The dataset discussed in section 3 is KDD99dataset which is a benchmark dataset for networkanomaly detection. Also feature selection methodsthat are being discussed in section 2 all usedKDD99 dataset. In section 4 Discussion section,discusses the results given by the papers given insection 2. Some issues and challenges are given insection 5. In section 6 paper is concluded in theconclusion section.

1.1. Types of Intrusion Detection SystemIntrusion detection systems are software

applications that analyze the unauthorizedactivities in network or system data and generatealerts about intrusions [3]. There are two types ofintrusion detection system (IDS) named, Network-based IDS and Host-based IDS [4]. These systemsare mainly based on data source [5]. Types ofintrusion detection systems are based on where theintrusion detection system is implemented i.e. onnetwork or host.

1.1.1. Host-based SystemsHost-based intrusion detection system (HIDS)

deals with the information of a certain singlehost or computer. The HIDS collects data aboutthe ongoing events in the monitoring system.Host- based intrusion detection system acquiresdata from system logs or audit trails and otherlogging information that are generated by theoperating system. These log files consist of asequence of system calls and give usefulinformation about host system [6].Host-based intrusion detection system should havethe capability to collect information in sufficientdetail to identify abnormalities in the host. Butas the HIDS collects a finer level of detail, itrequires significant storage. It also results inincreased load on a host system. Thus, Host-basedIDS has high cost and results in a tradeoffbetween time and storage complexity to itsreliable detection rate. If the setup of the

organization is large, then host- based approachcould be economically infeasible. Its cost cannotbe reduced as HIDS are not portable because itdepends on the configuration of the specifichost system as well. Furthermore, to access someinformation from the operating system needs userprivileges to access the information and it is theone of the limitation of such intrusion detectionsystems.

1.1.2. Network-based SystemsA Network-based intrusion detection system

(NIDS) deals with detecting intrusions in networkdata [7]. These systems gather information fromthe network itself, instead of each separate host.Information is gathered from the network traffic,as data travels on the network segments.Network- based intrusion detection systemdetects an attack by examining the content andheader information of all packets that are movingacross the network. Network intrusion detectionsystem has been considered to be one of themost prominent methods for detection ofcomplex and dynamic intrusion behaviors [8].As NIDS is implemented on the network, it istransparent to the user of the system, and this isalso important for intrusion detection systemitself. Transparent monitoring reduces thepossibility that the intruder will be able todetect and cancel security defense capabilities ofthe monitoring system without significant effort.Moreover, NIDS monitors traffic over specificnetwork segments and is not dependent on theoperating system this results in enhancedportability. Deployed network- based intrusiondetection system detects all attacks, regardless ofthe implemented operating system. Such system isuseful in situations where the network topologychanges. However, NIDS does not work well onhigh-speed networks. It starts dropping packets onhigh-speed networks. In such condition, NIDS hasless enough time to monitor network trafficefficiently and detect the intrusions.

1.2. Techniques Used in IntrusionDetection System

There are two main intrusion detectionmethods, signature based detection method andanomaly based detection method [9]. These arethe techniques that are used by the intrusiondetection system to detect intrusions.




32

1.2.1. Signature Based Detection

Signature-based intrusion detection schemesseek defined patterns, or signatures, within theanalyzing data [10]. The main advantage of thistechnique is that the signatures are very easy todevelop if the behavior of the network is known.

This approach has been proved to be veryeffective at detecting known threats but ineffectiveto detect unknown threats [11]. Since signaturebased detection method can detect attacks whosesignatures are previously stored in thedatabase. Therefore, the signature must be createdfor every attack, due to this reason signature baseddetection method cannot detect novel attacks.Fixed behavior pattern as the bases of signaturebased technique fails to detect those worms,having self-modifying behavior like encryptingthemselves. Furthermore, as signature-baseddetection method has to create new signatures forevery variation, the performance of the systemdegrades significantly. By adding new signaturewill increase the amount of pattern to be detectedand with the passage of time it will result inperformance degradation of the detection system.

1.2.2. Anomaly Based DetectionThe anomaly based detection method is based

on defining the network behavior. If the behaviorof the network is according to pre-definedbehavior, then it is accepted. Otherwise, it triggersthe event for the detection of the abnormalbehavior [4]. Anomaly based technique requiresdetail knowledge of the network or host tobuild a baseline for normal behavior [12]. Thebaseline is considered as a threshold for thenormal behavior. To detect anomalies correctly,detailed knowledge of accepted network or systembehavior is required to be developed carefully.Conversely, the malicious behavior goesunnoticed if the behavior of malicious user fallsbelow the baseline. The main advantage ofanomaly based detection method over signaturebased detection method is the detection of novelattacks for which no signatures exist [13]. Thiscan be seen when the system automatically detectsnew worms.

1.3. Feature SelectionFeatures are the trait of the system or

object that predict the behavior or state of thesystem [14]. In other words, features are thedistinctive characteristics of the system thatpredict the behavior, nature or state of thesystem. Abstractly all data points are considered

as features, and can be utilized to predict the stateor behavior of the system. Researchers generallyextract the dominant features from all the datapoints that are invariant in nature to achieverobustness of the system. Researchers are aimingto select the features that can enhance theaccuracy of the learning algorithm, this processis known as feature selection [15].

Feature selection must have the capability ofdetection and removal of noisy and misleadingfeatures among features.Feature selection process selects a subsetof features that represents the whole feature set[16]. Which features should be included orexcluded is being decided in this process.Feature selectiontechniques hinges on two pillars, relevancyandredundancy of the features [17]. Relevantfeatures are those that predict the desired systemresponse, on the other hand, redundant featureshave a high degree of correlation amongthemselves. Thus, removal of the redundantfeatures is desired. The predictive accuracy ofthe machine learning algorithms can beincreased by the feature selection. Robustfeature set also reduces the training time ofthe classifier as robust features are invariant innature and reduces the dimension in high-dimensional data [18]. Reduced dataset alsodecreases dataset which acquire less storage space.In network intrusion detection, featuresare extracted from protocols header at differentlayers of network architecture and contentsof data packets. Due to this reason noise inchannels propagates to extracted features, thisleads to the false intrusion alarm. There are twotypes of feature selection methods: Filter andwrapper. Filter method selects the subset offeatures without involving learning algorithmin evaluation phase and is mainly based onranking of features, which represents therelevancy of the features [19]. In contrast,wrapper method evaluates a subset of featuresusing learning algorithm [20]. Thisevaluating algorithm is called iteratively unlessa robust subset of the feature is selected. Filterbased approach is computationally fastcompared to wrapper based as it doesn’tinvolve any learning algorithm during rankingof features. However, wrapper based featuresubset accomplishes good accuracy rate as itinvolves learning algorithm in the subsetevaluation phase [21].




33

2. FEATURE SELECTIONTECHNIQUE AND INTRUSIONDETECTION

Many works have been done for featureselection in intrusion detection domain. This isdue to the high dependency of intrusion detectionmethod on input features in the learning phase.Inadequate and noisy features affects the learningalgorithm model. The detection model came fromthe learning of the ambiguous features cannotefficiently detect the attack, if the learningalgorithm doesn’t get adequate and conciseinformation from the features. In this paper wediscussed feature selection technique fornetwork

anomaly detection using machine learning based,rough set theory, and evolution computing. Thedetection techniques and feature selectiontechniques are the aspect for the discussion in thefollowing sections.

2.1. Machine learning basedMukkamala et al. [22] Used SVM for

intrusion detection on the kdd99 dataset. Forfeature selection, SVM was used to identify themost significant features for detecting attacks.Feature reduction was done by deleting onefeature at a time, and train SVM with thatreduced data set. Those features, whose deletionresulted in more accurate performance (ascompared to the original SVM trained with 41features) was deemed insignificant. It does featureselection by deleting single feature at a time andcomparing the result with the whole features. Itincreases the time for feature selection as it hasto test the reduced data for each feature and thencomparing the result with the whole featuredatasets classification result. Thus, the methodconsumed more computing memory and requiredmore processing cycle.Lin et al. [23] Used Simulated Annealing (SA)with Support Vector Machine (SVM) andDecision Tree (DT) together for detection ofanomaly and feature selection. SimulatedAnnealing found the best features whereasDecision Tree was used to get rules from thedataset. The data was then classified usingSupport Vector Machine. Simulated Annealingwas also used to adjust automatically the parametersetting for the DT and SVM. Intrusion detectionmethod was trained and tested on the kdd99dataset. The method had a very good result forNormal and DOS classes but for the other threeattack classes Probe, Remote-to-Local (R2L), and

User-to-Remote (U2R), this technique had notgood results.George [24] used Principle ComponentAnalysis (PCA) along with SVM for intrusiondetection. PCA was used for dimension reduction(feature selection) while SVM used forclassification. Evaluation, based on the SVM withPCA approach, gave less misclassificationcompared to SVM method. The result was good ascompared to using SVM alone, but the resultstill needs to improve also false negative rate(FNR) was so high.Authors in [25] used two-phase technique for theclassification of KDD99 dataset into normal andanomalous class. The approach used three featureranking techniques, gain ratio, informationgain, and global method for data handling(GMDH) for feature selection. The proposedmethod used GMDH for the classification as wellin the second phase. The abductive network ofGMDH was used for the validation for featuresubset. The abductive network includedmonolithic abductive model and ensembleabductive model. Proposed method converted thenominal features to binary. Thus, the total numberof features after transformation were

123. After transformation above mentioned threefeatures selection techniques were carried outon the set to get a subset of features. The topcommon features, among above mentioned featureranking approaches, were selected for theconstruction of the GMDH classifier model. Thepaper claimed to have less training time forfeature subset. Less training time was due tothe removal of the instances of the not selectedfeatures.In [26] new method called, cluster center andnearest neighbor (CANN) was used totransform the data dimension into a singledimension, which was based on the Euclideandistance. CANNcalculated two different distances for the datapoint.One was the distance between data point tothe centroid of each cluster. Second distance wascalculated by the sum of the distances betweenthat data point to its neighbor data points.Both distances were summed to single newdimension for the data point. The data points withnew dimension were validated using K-NNclassifier. Before using K-NN, two subset offeatures were used. One subset consisted of 6features while the second one consisted of 19features. These features were taken directly fromthe work of [27] [28]. The author claimed to havea good detection rate of K- NN with new data




34

formation compared to the K- NN result with theoriginal set, but work didn’t put the same k valuefor both cases of data formation. Also, the resultwas not validated for the whole feature set thatmakes it difficult to decide whether the reducedfeature set as well as new data formationresulted better than whole feature set and originaldata formation.

2.2. Rough Set TheoryThe rough set theory deals with the analysis

of uncertain, imprecise and incompleteinformation [29]. It gives rational about theobject represented by features. The rough settheory assumes that every object has someinformation associated with it, which helps to findthe indiscernible objects according to the availableinformation [30]. Many authors use this principlein intrusion detection feature selection and foundit facilitating. Few of them are given below,Ghali [31] used the rough set theory along withthe artificial neural network. The aim of the authorwas to reduce the dataset for intrusion detectionwhich resulted in less consumption of computerresources.

RSNNA (Rough Set Neural Network Algorithm)was used for feature reduction, which founddependencies among the features while feedforward neural network was used for theclassification of the data. The result was good ascompared to NN result for the full feature set andone hidden layer with four neurons. Moreover,from the results it can depicts that the aim of themethodology was just the reduction of data as theauthor didn’t consider the detection rate forintrusion detection method.Rassam and Maarof [32] used unsupervisedlearning method, artificial immune network(aiNet).The artificial immune network was used forthe clustering of the data into two clusters, normalclassand attack classes. Before the clustering features, aselection was done using rough set theory. Themotivation of using clustering approach was thatasit is difficult to get the labelled data. Also,theaccurate labelling of the data is a hard task and isa time consuming job. This research used twophases. In phase one, the rough set theory wasused, with the help of ROSSETA tool, to findfeature subset. This input feature subset wasthen used in the second phase for aiNet to clusterthe data into its respective classes. Feature subsetwas selected by rough set theory. With the reduced

feature set aiNet was used for clustering. Butbefore using aiNet the normalization process wasdone to convert nominal attributes into discretevalues. The result was compared with K-meansclustering and was outperformed. The authoralso used a rough set classifier to see theperformance of the classifier with the wholefeature set and with the reduced feature set. Theclassifier trained with the reduced feature set had abetter result compared to the result of the classifiertrained with the whole feature set.In [33] enhanced SVM has been proposed. Theidea behind is that SVM kernel function treats allfeatures equally, which means that redundant andirrelevant features are treated the same way asother features. This work introduced to giveweights to the features according to theirimportance. The rough set theory was used for thispurpose. Important features were given highweights while least features got very low weightand thus deleted. Before applying rough settheory, redundant records were removed andnominal features were converted into the numericusing feature-value format. After getting thefeature subset, SVM was trained with the reducedfeature set to get detection model. In theexperiment along with KDD99 dataset anotherdataset named, University of New Mexico(UNM), was used. This dataset consists of systemcalls. Result for the enhanced SVM wascompared with the SVM trained with thewhole feature set. For KDD99 dataset, theresult was comparable but not significant. Forthe UNM dataset result was substantially goodbut the same result hold for the SVM trained andtested with the whole feature set.

Shirivastava and Jain [34] used research rough settheory to find the feature subset. The dataset wasnormalized first into numeric values. ThenROSETTA, rough set theory tool, was used tofind feature reduct. Authors used Johnson’salgorithm and genetic algorithm two differentapproaches in ROSETTA to find feature reduct.From Johnson’s one feature reduct was obtainedwhile genetic gave39 feature reduct set from which 4 wereselected. All five reduct sets, Johnsons’ andgenetic, contains six features. This featuresubset was evaluated using SVM accuracy rateas well as false positive rate were measured forthe original feature set and reduced feature set.The result was good for reduct feature set ascompared to the original feature set.




35

2.3. Evolution ComputationEvolutionary computation is based on

biological inspiration and involves survival of thefittest. Evolution computation involves manytechniques like genetic algorithm (GA), antcolony optimization, particle swarm optimization,genetic programming etc. These algorithms arepopulation based and are optimization algorithm,therefore it has been widely adopted for featureselection problem which is an NP-problem. Thefollowing literature gives the application ofevolutionary computation for feature selection inintrusion detection method.In [35] Tsang et al. proposed, multi-objectivegenetic fuzzy intrusion detection system(MOGFIDS) that was applied to improve theaccuracy of classification algorithm by makingeffective interpretability of fuzzy knowledge baserules for classification algorithm. The proposedMOGFIDS applied an agent-based evolutionarycomputation framework to develop anomaly rule-based intrusion detection. This research addressedthe problem that rule-based intrusion detectionshould have interpretable knowledge that canassist security experts for intrusion analysis. Themulti- agent learning system, MOGFIDS,comprised of arbitrator agent (AA) and fuzzy setagent (FSA). Each FSA was an autonomous andintra-behaviors of FSA was done byinterpretability-based regulation strategy. Thisincluded merging of similar fuzzy sets andremoval of fuzzy sets identical to singleton set.The fuzzy sets distribution is comprised of stepsincluded initialization of rule based population,Crossover

and mutation, evaluation criteria, andselection mechanism of fuzzy rule set candidate.Rufai et al. [36] combined membrane computing(MC) and bee algorithm (BA) for their work.Motivated by membrane structure and operationsof living cells MC, gives the solution for BA tofind the best feature subset. Thus, it improved BAfor feature selection. BA was run on differentmembranes in the main membrane to get theinitial solution. The best solution from theindividual membrane was collected, mixed andpassed to output membrane. At the outputmembrane, BA was again run for somespecific number of iterations. After which thebest solution was collected which served as a finalsolution. Feature subsets were selected for adifferent run with different fitness accuracy. Thefeature subset that gave best fitness accuracy wasfinally selected from a different run. The selectedfeature subset was validated using SVMclassification. The attack detection rate was not

outperformed as compared to other researchmethods, but false alarm rate was remarkably lowcompared to other feature selection methods.Zainal et al. [37] used a 2-tier approach,which included rough set and particle swarmoptimization (PSO), Rough-PSO. SVMwas used for classification while fitnessfunction was used to find out the fitness of theproposed feature subset. The 2-tier structurephase used rough set in coarse phase and particleswarm optimization in granular phase. Coarsephase removed the redundant and irrelevantfeature, which was followed by PSO to refineand select most prominent features out of it. Thepurpose of using rough set before PSO was toreduce the complexity in PSO. The methodsresult was compared with other researchwork and proposed method has outperformedamong them. Hasani et al. [38] used twoevolutionary algorithms, bees algorithm (BA) andlinear genetic algorithm (LGP), to find featuresubset in their research. The proposed modelcalled, LGP_BA, used in a wrapperapproach. First candidate chromosomegenerated by LGP was used the input to BA,which performed modification using neighborhoodsearch. New generations were performed byLGP that randomly selected features from theKDD99 dataset. Crossover and mutation wereapplied on each generation. It was performed tocategories it with highest fitness values. Afterwhich fitness function was applied, andchromosomes that fulfill the condition by thefitness function were passed to BA. BA modifiedthe solution from LGP. It evaluated the fitnesssolution using queen-bee evaluation. It alsohelped LGP to converge slowly. Afterevaluating BA applied crossover and mutationuntil it found proper fitness value. The bestcandidate solution was then evaluated by SVM.

Hua et al. [39] used foraging behavior inspiredalgorithm ant system [40], for feature selection ofnormal and intrusive data in the kdd99 dataset.Detection was done by classification in whichSVM was used to classify the data into theirrespective classes. Features were represented bygraphs and ants select the next node usingprobability function, which calculated pheromonevalue of the edge and heuristic information. TheFisher discrimination rate was adopted as theheuristic information in the probability function.A number of ants used was equal to the totalnumber of features i.e. 41 and was positioned onthe nodes randomly. Stopping criteria of the antsystem was selected using the maximumpredefined generations or the global optimal




36

solution. Global pheromone update was used toupdate the pheromone level on those edges whichbelonged to the optimal solution i.e. having theless squared error. Features subset was validatedusing SVM.In [41], the authors proposed a novel featureselection algorithm based on cuttlefish and iscalled, cuttlefish algorithm (CFA). CFA used tosearch prominent features among KDD99features. Thisalgorithm is based on reflection and visibilitysameas cuttlefish do when light strikes it. CFAalgorithm was modified to select features. Sixcases were introduced in the feature selectionprocess using CFA. These cases were carriedout in steps. It followed the initializationprocess in which population for some initialrandomly generated solutions were generated.Each population was associated with twosubsets named, selected features andunselected features. After which case 1 and case 2were carried out in parallel to selected bestfeature subset from each population. In case 3and case 4 single feature extraction was usedto produce new feature subset from its abovecases. In case 5 new subset was generatedand case 6 remaining solutions for populationswere carried out. Each subset was validatedusing decision tree in terms of the fitnessfunction. The paper claimed not to considerfalse positive rate as KDD99 test datasetcontains attacks, which are not included in the

training set. But the aim of anomaly detection Thetask to detect the unseen attacks with less FPR. Sowe cannot ignore FPR in anomaly detectioncase.

2.4. Dataset And Evaluation Analysis2.4.1. KDD CUP 99 Data Set

KDD99 dataset is the network intrusiondataset that is utmost extensively used for theevaluation of anomaly detection methods innetwork intrusions [42]. The dataset came fromDAPRA’98 IDS evaluation program [43, 44].Training data was collected from seven weeksof data in which few week’s data are attackfree while other weeks of data consist ofattacks.Testing data consists of two weeks of datawhich consists of normal and attacks data.Kdd99 has huge records and that’s why itssubset is widely used and is calledkddcup.data_10_percent (kdd99_10%). Thereare 22 attacks are in the training set and 16additional attacks are in the testing set. Thetraining set contains 492020 instances while311029 instances are for testing dataset [45].

KDD99 contains four attack classes, Denial ofService (DoS), Probe, Root-to-Local (R2L), User-to-Root (U2R), and one legitimate data classcalled, Normal [46]. Sub attacks of these attackclasses aregiven in Table 1.

Table 1: Attack Classes and sub attacks

Attack Class Attack name Additional attack in test dataset

DoS smurf, neptune, back, teardrop, pod, land. Processtable, apache2, mailbomb.

PROBE satan, ipsweep, portsweep, nmap. saint, mscan.

U2R buffer_overflow, rootkit, load_module, perl. sqlattack, xterm, httptunnel, ps.

R2Lwarezclient, guess_password, warezmaster,imap, ftp_write,multihop, phf, spy.

sendmail, xsnoop, named, snmpguess, xlock,snmpgetattack,worm.




37

Table 2: KDD99 features with labels

S.No Label Feature S.No Label Feature S.No Label Feature

1 A duration 15 O su_attempted 29 AC same_srv_rate2 B protocol_type 16 P num_root 30 AD diff_srv_rate3 C service 17 Q num_file_creations 31 AE srv_diff_host_rate4 D flag 18 R num_shells 32 AF dst_host_count5 E src_byte 19 S num_access_files 33 AG dst_host_srv_count6 F dst_byte 20 T num_outbound_cm

ds34 AH dst_host_same_srv_ rate

7 G land 21 U is_host_login 35 AI dst_host_diff_srv_rate8 H wrong_

fragment22 V is_guest_login 36 AJ dst_host_same_src_port_rate

9 I urgent 23 W count 37 AK dst_host_srv_diff_host_rate10 J hot 24 X srv_count 38 AL dst_host_serror_rate11 K num_failed_

login25 Y serror_rate 39 AM dst_host_srv_serror_rate

12 L logged_in 26 Z srv_serror_rate 40 AN dst_host_rerror_rate13 M um_promised 27 AA rerror_rate 41 AO dst_host_srv_rerror_rate14 N root_shelL 28 AB srv_rerror_rate

Table 3: KDD99_10% data distribution

Category Instances Distinctive Instances

NORMAL 97277 87831

DoS 391458 54572

PROBE 4107 2130

R2L 1126 999

U2R 52 52

TOTAL 494020 145584

Figure 1: % Distribution Among Original KDD99_10% And Reduced Dataset




38

The instances of each class are described by 41features as shown in Table 2.KDD99 features can be divided into three maincategories [47] as follows,1) Basic features: These features areextracted from single TCP/IP connections.2) Traffic features: Also called time-based features and is extracted from theconnections window interval.3) Content feature: These features deal withthe data portion of the TCP/IP packet.

KDD99 has many redundant data both in trainingand testing dataset. The distribution of differentclasses in KDD99 has been shown in Table 3.It can be seen that in original dataset DoSand PROBE attack classes have a huge number ofinstances compared to U2R and R2L, whichhave very few instances. One reason for that isDoS and PROBE attacks occur more frequently ina short period. Many researches, therefore, claimhaving high detection rate for DoS and PROBEclasses and low detection rate for R2L and U2Rclasses. Furthermore, as this dataset is thesubset of DARPA 1998 which has many flaws,detailed discussion can be found in [48].The percentage distribution in KDD99 (10%)of different classes has been shown in Figure 1. Italso shows the distribution in KDD99 (10%) ofdifferent classes when the redundant data fromKDD99 are removed.

2.4.2. EvaluationThe performance of the classifier is analyzedby measuring the error rate of the detectionsystem. To estimate the error rates andaccuracy of thesystem, the confusion matrix is used, givenin Figure 2.

NORMAL, DoS, and PROBE class but forU2R and R2L it has a very poor result.While [32] has high TPR for the R2L classin which the proposed feature selectionmethod

Selected eight features. In [39] detection methodhas high TPR, 99.4% among listed detectionmethods for probe class. This work used SVMclassification for their detection model and SVMwas trained with 32 features. It also combinedtheinstances of R2L and U2R classes and treatedthem as a single class. The TPR is high for thatcombined class, i.e. 98.7%. Table 4 also depictsthe type of feature selection method that has beenadopted in different methodology. Forunderstanding features

-of KDD99 has been labeled whichis shown in Table 2. From the table,it can be seen that bothFigure 2:Confusion Matrix

All other performance parameters are derivedfrom TP, FN, FP, and TN. True Positive (TP): when the data of a

positive class is correctly classified in itspositive class.

False Positive (FP): when the data ofa negative class is incorrectly classified intopositive class.

False Negative (FN): when the data ofa positive class is incorrectly classifiedin negative class.

True Negative (TN): when the data ofa negative class is correctly classified in itsnegative class.

The aim of the detection algorithm is to reducethe FPs and FNs, as these represents error and aremisclassified data, meanwhile increase TPs andTNs. For some attacks, detection algorithm worksperfectly but for others it is unable to detect. Todetect that types of attack, the algorithm needs tobe modified accordingly so for that particular typeof attacks its performance gets improve.

3. DISCUSSION

In this section results of different listedmethods in literature is discussed. Summary ofdifferent literature has been given in Table 4. Italso includes a number of features selected,using feature selection method, and learningmethods, used for the detection of an intrusion.Not a single detection method has a good resultfor all the classes of KDD99.




39

In [35] genetic and fuzzy method selected 25features and detection method, fuzzy knowledgebase rule, was used to develop a detection methodusing these 25 features. It has good TPRfor types, wrapper based and filter based, offeature selection has been adopted.

Moreover, some author used nominal to binaryconversion in data preprocessing process ofKDD99. Some of them treated all differentnominal values with different numeric valueswhile someused same numeric value for differentnominalvalues. The overall accuracy rate for detectionmethods used in the above literature is given inFigure 3. This includes the total numberof correctly detected instances into its respectiveclasses. It can be seen that the rough set classifier,which was trained by eight features has very lowaccuracy rate. Fuzzy Association Rule [35] whichwas trained by 25 features has better resultcompared to [32]. There are four detectionmethods that used SVM classifier. Proposedmethod in [39] is slightly better than [37] SVMdetection model but [35] SVM used 25 featurewhile [37] SVM used just 6 features. Similarly,although SVM [36] has less accuracy ratecompared to SVM [39] having99.4% accuracy rate but [36] SVM has used tenfeatures while SVM [36] collectively used 32features. In [36] author found separate features foreach class and also combined R2L and U2Rclasses into one class. Here it is worthnoticing that KDD99 consists of a normal classand four attack classes, some methodologies justused two classes for their work, by combining allfour attack classes into, one attack or abnormalclass and the second one is a normal class.Whereas some of them treated all the fiveclasses separately. If all the attack classes arecombined it will give good TPR as DoS class hasa huge number of examples in KDD99 dataset.Enormous examples of DoS dominant otherattack classes. As most of other attack classeshave less TPR due to an insufficient amount ofexamples. So by combining DoS class with otherattack classes, commutatively high TPR ignoringthe FPR for other attack classes.




40

Feature SelectionMethod

No. of Feature Feature Selection Type Selected Features Detection Algorithm Authors

SVM 13 Wrapper based A, B, C, E, F, I, W, X,AC, AF, AG, AH, AJ SVM Mukkammal et al. [22]

SA and DT 23 Wrapper based ------------------------- SVM Lin et al. [23]

SVM 28 Filter based ------------------------- SVM George [24]Information Gain,

Gainratio, GMDH

20 Filter based ------------------------ GMDH Baig et al. [25]

CANN 19 Filter based

B, D, F, L, W, X, Y,Z, AC, AD, AE, AF,

AG, AH, AI, AJ,AK,

AL, AM

K-NN Lin et al. [26]

Genetic Fuzzy 25 Filter based C, E, F, W, Y, AF, AG,AI

Fuzzy KnowledgeBase Rules Tsang et al. [35]

Rough Set 7 Filter based --------------------------- NN Ghali [31]

Rough Set Theory 8 Filter based --------------------------- Rough Set Classifier Rassam and Maroof[32]

Rough SET 16 Filter based

A, C, D, E, F, J, K,M,

N, P, Q, R, S, V, W,X, Y, Z, AA, AB,

AC, AD, AE, AF, AI,AJ,

AK, AL, AM, AN,AO

SVM Yao et al. [33]

Rough set 6 Wrapper based ------------------------- SVM Shrivastava and Jain[34]

MC and BA 10 Filter based B, C, H, M, T, X,AF,

AK, AM, AN

SVM Rufai et al. [36]

Rough Set And PSO 6 Wrapper based B, D, X, AA, AH, AI SVM Zainal et al. [37]

LGP and BA 6 Wrapper based C, L, W, X, AA, AI SVM Hasani et al. [38]

ACO 32 Wrapper based --------------------- SVM Hua et al. [39]

Cuttlefish Algorithm 10 Wrapper based -------------------- DT Eesa et al. [41]

Table 4: Results Of Different Detection Methods For Feature Selection Methods

Figure 3: Accuracy Rate For Intrusion Detection




41

4. OPEN ISSUES AND CHALLENGES

Network speed is moving at high speed likeGbps for which real time anomaly detectionmethods are demanding. Therefore, only therelevant and correlated features can decrease timecomplexity for classification algorithm to developefficient detection model. Mostly detection modelsare being evaluated using KDD99, which is a veryold dataset. Although, many analyst believe thatattacks in nowadays are the variants of the attacksfrom KDD99. If a detection model cannot achieve100% TPR and 0% of FPR for all classes ofKDD99, so how it will work for other latest attackclasses. The data distribution in KDD99 is notuniform that has an impact on the detection model.The attacks in KDD99 are generated by thesynthetic way, so the flaws in those systems arealso included in KDD99. Many work has done byconsidering the data reduction to achieve lesstraining and detection time for IDS while ignoringdetection rate.KDD99 test dataset contains some novel attacks,but many authors ignored to find out whether theirmodel was able to detect those novel attacks or not.They only give the cumulative result for all attackclasses. Many IDS used hybrid detection approach,by combining signature based and anomaly based.This can help to reduce FPR but it can increasetime complexity for IDS operation.

5. CONCLUSION

Intrusion detection plays a vital role in thesecurity of communication. IDS must beintelligent enough to have high detection ratewith the less false positive rate. The high amountof data and irrelevant, redundant features make itdifficult to build the prediction model for anomalydetection method. Furthermore, fast and efficientdetection method can be achieved using robustfeatures. This leads to real time detection ofthe intrusions in networks. This paper reviewsfeature selection techniques in the field of networkanomaly detection. Feature selection techniqueshave been discussed in machine learning, roughset theory, and evolution computing. Comparativeresults for these techniques have been given anddiscussed critically. Open issues and challengesare being discussed. It can be concluded thatfeature selection for intrusion detection method isvery critical because the detection method highlydepends on the features of the input data.

REFERENCES

[1] Shah, B. And B.H. Trivedi. “ReducingFeatures Of KDD CUP 1999 Dataset ForAnomaly Detection Using Back PropagationNeural Network”. In Advanced Computing &Communication Technologies (ACCT), 2015Fifth International Conference On. IEEE, 2015.

[2] Zheng, D. And C. Zhang, “Selecting FeatureSubset For Large-Scale Data Via FuzzyRough Approach”. Journal Of ConvergenceInformation Technology, 8(9): P. 109, 2013.

[3] Kenkre, P.S., A. Pai, And L. Colaco. “RealTime Intrusion Detection And PreventionSystem”. In Proceedings Of The 3rdInternational Conference On Frontiers OfIntelligent Computing: Theory AndApplications (FICTA) 2014, SpringerInternational Publishing, 2015.

[4] Chandola, V., A. Banerjee, And V.Kumar, “Anomaly Detection: A Survey”. ACMComputing Surveys (CSUR), 41(3): P. 15,2009.

[5] Xiaohui, Z. “Research And Design OfIntrusion Detection System In ComputerNetwork”. In 2015 International ConferenceOn Social Science And TechnologyEducation, Atlantis Press, 2015.

[6] Caselli, M., E. Zambon, And F. Kargl.“Sequence- Aware Intrusion Detection InIndustrial Control Systems”. In ProceedingsOf The 1st ACM Workshop On Cyber-PhysicalSystem Security, ACM, 2015.

[7] Bhuyan, M.H., D. Bhattacharyya, And J.K.Kalita, “Network Anomaly Detection:Methods, Systems And Tools”.Communications Surveys & Tutorials, IEEE,

16(1): P. 303-336, 2014.[8] Nguyen, H.A. And D. Choi, “Application Of

Data Mining To Network Intrusion Detection:Classifier Selection Model”, In Challenges ForNext Generation Network Operations AndService Management, Springer. P. 399-408,2008.

[9] Othman, Z.A., Et Al., “Record To RecordFeature Selection Algorithm For NetworkIntrusion Detection”. International Journal OfAdvancements In Computing Technology,6(2): P. 163, 2014.

[10] Garcia-Teodoro, P., Et Al., “Anomaly-BasedNetwork Intrusion Detection: Techniques,Systems And Challenges”. Computers &Security, 28(1): P. 18-28, 2009.

[11] Farhoud Hosseinpour, P.V.A., FahimehFarahnakian, Juha Plosila, Timo Hämäläinen,“Artificial Immune System Based Intrusion




42

Detection: Innate Immunity Using AnUnsupervised Learning Approach”. JDCTA(International Journal Of Digital ContentTechnology And Its Applications), Volume8(5): P. 01-12, 2014.

[12] Friedberg, I., F. Skopik, And R. Fiedler,“Cyber Situational Awareness ThroughNetwork Anomaly Detection: State Of TheArt And New Approaches”. E & IElektrotechnik Und Informationstechnik132(2): P. 101-105, 2015.

[13] Chen, L.-S. And J.-S. Syu. “Feature ExtractionBased Approaches For Improving ThePerformance Of Intrusion Detection Systems”.In Proceedings Of The InternationalMulticonference Of Engineers And ComputerScientists. 2015.

[14] Stańczyk, U. And L.C. Jain, “FeatureSelection For Data And Pattern Recognition:An Introduction, In Feature Selection ForData And Pattern Recognition”, Springer. P.1-7, 2015.

[15] Pramokchon, P. And P. Piamsa-Nga, “Content-Adaptive Feature Selection For ClassifyingClass- Imbalanced Data”. InternationalJournal Of Advancements In ComputingTechnology, 6(5): P. 66, 2014.

[16] García, S., J. Luengo, And F. Herrera,“Feature Selection, In Data Preprocessing” InData Mining, Springer. P. 163-193, 2015.

[17] Gediga, G. And I. Düntsch, “RoughSet Data Analysis—A Road To Non-Invasive Knowledge Discovery”, MethodosPublishers, UK, 2000.

[18] Wang, S., Et Al., “Subspace Learning ForUnsupervised Feature Selection Via MatrixFactorization”. Pattern Recognition, 48(1): P.10-19, 2015

[19] Zhang, F., Et Al., “Adversarial FeatureSelection Against Evasion Attacks”, 2015.

[20] Pitt, E. And R. Nayak. “The Use OfVarious Data Mining And Feature SelectionMethods In The Analysis Of A PopulationSurvey Dataset”. In Proceedings Of The 2ndInternational Workshop On IntegratingArtificial Intelligence And Data Mining-Volume 84, Australian Computer Society, Inc,2007.

[21] Wang, A., Et Al., “AcceleratingWrapper-Based Feature Selection With K-Nearest-Neighbor”. Knowledge-BasedSystems, 83: P. 81-91, 2015.

[22] Mukkamala, S., G. Janoski, And A. Sung.“Intrusion Detection: Support VectorMachines And Neural Networks”. In

Proceedings Of The IEEE International JointConference On Neural Networks (ANNIE).2002.

[23] Lin, S.-W., Et Al., “An Intelligent AlgorithmWith Feature Selection And DecisionRules Applied To Anomaly IntrusionDetection”. Applied Soft Computing,12(10): P. 3285-3290, 2012.

[24] George, A., “Anomaly Detection Based OnMachine Learning: Dimensionality ReductionUsing PCA And Classification Using SVM”‘. International Journal Of ComputerApplications (0975–8887) Volume, 2012.

[25] Baig, Z.A., S.M. Sait, And A. Shaheen,“GMDH- Based Networks For IntelligentIntrusion Detection. Engineering ApplicationsOf Artificial Intelligence”, 26(7): P. 1731-1740, 2013.

[26] Lin, W.-C., S.-W. Ke, And C.-F. Tsai, CANN:“An Intrusion Detection System Based OnCombining Cluster Centers And NearestNeighbors”. Knowledge-Based Systems, 2015.

[27] Tsai, C.-F. And C.-Y. Lin, “A Triangle AreaBased Nearest Neighbors Approach ToIntrusion Detection”. Pattern Recognition,43(1): P. 222-229, 2010.

[28] Xue-Qin, Z., G. Chun-Hua, And L. Jia-Jin.“Intrusion Detection System Based On FeatureSelection And Support Vector Machine”. InCommunications And Networking In China,2006. Chinacom'06. First InternationalConference On, IEEE, 2006.

[29] Rissino, S. And G. Lambert-Torres, “Rough SetTheory–Fundamental Concepts, Principals,Data Extraction, And Applications”. DataMining And Knowledge Discovery In RealLife Applications, P. 438, 2009.

[30] Pawlak, Z., “Rough Sets And IntelligentData Analysis”. Information Sciences, 147(1–4): P. 1-12, 2002.

[31] Ghali, N.I., “Feature Selection For EffectiveAnomaly- Based Intrusion Detection”.International Journal Of Computer ScienceAnd Network Security, 9(3): P. 285-289,2009.

[32] Rassam, M.A. And M.A. Maarof, “ArtificialImmune Network Clustering Approach ForAnomaly Intrusion Detection”. Journal OfAdvances In Information Technology, 3(3):P. 147-154, 2012.

[33] Yao, J., S. Zhao, And L. Fan, “An EnhancedSupport Vector Machine Model ForIntrusion Detection, In Rough Sets AndKnowledge Technology”, Springer. P. 538-543, 2006.




43

[34] Shrivastava, S.K. And P. Jain, “EffectiveAnomaly Based Intrusion Detection UsingRough Set Theory And Support VectorMachine”. International Journal OfComputer Applications, 18(3): P. 35-41, 2011.

[35] Tsang, C.-H., S. Kwong, And H. Wang,“Genetic- Fuzzy Rule Mining ApproachAnd Evaluation Of Feature SelectionTechniques For Anomaly IntrusionDetection”. Pattern Recognition, 40(9): P.2373- 2391, 2007.

[36] Rufai, K.I., R.C. Muniyandi, And Z.A.Othman, “Improving Bee Algorithm BasedFeature Selection In Intrusion DetectionSystem Using Membrane Computing”.Journal Of Networks, 9(3): P. 523-529, 2014.

[37] Zainal, A., M.A. Maarof, And S.M.Shamsuddin, “Feature Selection UsingRough-DPSO In Anomaly IntrusionDetection”, In Computational Science AndIts Applications–ICCSA 2007, Springer. P.512-524, 2007.

[38] Hasani, S.R., Z.A. Othman, And S.M.M.Kahaki, “Hybrid Feature Selection AlgorithmFor Intrusion Detection System”. Journal OfComputer Science, 10(6): P. 1015-1025,2014.

[39] Gao, H.-H., H.-H. Yang, And X.-Y. Wang.“Ant Colony Optimization Based NetworkIntrusion Feature Selection And Detection”.In Machine Learning And Cybernetics,2005. Proceedings Of 2005International Conference On, IEEE, 2005.

[40] Rais, H.M., Z.A. Othman, And A.R. Hamdan.“Improved Dynamic Ant Colony System(DACS) On Symmetric Traveling SalesmanProblem (TSP)”. In Intelligent AndAdvanced Systems, 2007. ICIAS 2007.International Conference On, IEEE, 2007.

[41] Eesa, A.S., Z. Orman, And A.M.A.Brifcani, “A Novel Feature-SelectionApproach Based On The CuttlefishOptimization Algorithm For IntrusionDetection Systems”. Expert SystemsWith Applications, 42(5): P. 2670-2679, 2015.

[42] Tavallaee, Mahbod, Et Al. "A DetailedAnalysis Of The KDD CUP 99 Data Set."Proceedings Of The Second IEEE SymposiumOn Computational Intelligence For SecurityAnd Defence Applications 2009, 2009.

[43] Kayacik, H. Günes, A. Nur Zincir-Heywood, And Malcolm I. Heywood."Selecting Features For Intrusion Detection: AFeature Relevance Analysis On KDD 99Intrusion Detection Datasets." Proceedings

Of The Third Annual Conference On Privacy,Security And Trust, 2005.

[44] Lincoln Laboratory, MassachusettsInstitute Of Technology. [Cited 2015January 15, 2015]; Available From:Http://Www.Ll.Mit.Edu/Mission/Communications/Cybe R/Cstcorpora/Ideval/Data/.

[45] Faisal, M.A., Et Al., “Data-Stream-BasedIntrusion Detection System For AdvancedMetering Infrastructure In Smart Grid: AFeasibility Study”. Systems Journal, IEEE 9(1):P. 31-44, 2015.

[46] KDD Cup 1999 Data. AvailableFrom:Http://Kdd.Ics.Uci.Edu/Databases/Kddcup99/Kddcup99.Html.

[47] Lei, L., T”He Network Abnormal IntrusionDetection Based On The Improved ArtificialFish Swarm Algorithm Features Selection”.Journal Of Convergence InformationTechnology, 8(6), 2013.

[48] Mchugh, J., “Testing Intrusion DetectionSystems: A Critique Of The 1998 And 1999DARPA Intrusion Detection SystemEvaluations As Performed By LincolnLaboratory”. ACM Transactions OnInformation And System Security, 3(4): P.262-294, 2000.

issn: 1992-8645 feature selection in intrusion detection ... · pdf fileinput features give...

Documents