Practical real-time intrusion detection using machine learning approaches








<ul><li><p>Computer Communications 34 (2011) 2227–2235</p><p>Contents lists available at ScienceDirect</p><p>Computer Communications</p><p>journal homepage: www.elsevier.com/locate/comcom</p><p>Practical real-time intrusion detection using machine learning approaches</p><p>Phurivit Sangkatsanee a, Naruemon Wattanapongsakorn a,*, Chalermpol Charnsripinyo b</p><p>a Department of Computer Engineering, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha-utid Road, Toongkru, Bangkok 10140, Thailand</p><p>b National Electronics and Computer Technology Center, 112 Phahonyothin Road, Klong Luang, Pathumthani 12120, Thailand</p><p>Article info</p><p>Article history: Received 6 October 2010; received in revised form 11 March 2011; accepted 2 July 2011; available online 13 July 2011</p><p>Keywords: Network intrusion detection; Machine learning; Denial of Service; Probe</p><p>Abstract</p><p>The growing prevalence of network attacks is a well-known problem which can impact the availability, confidentiality, and integrity of critical information for both individuals and enterprises. In this paper, we propose a real-time intrusion detection approach using a supervised machine learning technique. Our approach is simple and efficient, and can be used with many machine learning techniques. We applied different well-known machine learning techniques to evaluate the performance of our IDS approach. Our experimental results show that the Decision Tree technique can outperform the other techniques. Therefore, we further developed a real-time intrusion detection system (RT-IDS) using the Decision Tree technique to classify on-line network data as normal or attack data. We also identified 12 essential features of network data which are relevant to detecting network attacks, using information gain as our feature selection criterion. Our RT-IDS can distinguish normal network activities from the main attack types (Probe and Denial of Service (DoS)) with a detection rate higher than 98% within 2 s. We also developed a new post-processing procedure to reduce the false-alarm rate as well as increase the reliability and detection accuracy of the intrusion detection system.</p><p>© 2011 Elsevier B.V. All rights reserved.</p><p>1. Introduction</p><p>Internet services have become essential to business as well as to individuals. With the increasing reliance on network services, the availability, confidentiality, and integrity of critical information are increasingly compromised by remote intrusions. Enterprises are forced to fortify their networks against malicious activities and network threats. Therefore, a network system must use one or more security tools, such as a firewall, anti-virus software or an intrusion detection system, to protect important data and services from hackers or intruders.</p><p>Relying on a firewall alone is not sufficient to protect a corporate network from all types of network attacks, because a firewall cannot defend the network against intrusion attempts on the open ports required for network services. Hence, an intrusion detection system (IDS) is usually installed to complement the firewall. An IDS collects information from a network or computer system and analyzes it for symptoms of system breaches. As shown schematically in Fig. 1, a network IDS monitors network data and raises an alarm to the computer user or network administrator when it detects hostile activity on an open port. This alarm allows the recipient to inspect the system for further symptoms of unauthorized network activity.</p><p>Network intrusion detection systems can be classified into two types: host-based and network-based. Host-based detection captures and analyzes network data at the attacked system itself, while network-based detection captures and inspects on-line network data at the network gateway or server, before the attack reaches the end users. In addition, network intrusion detection systems can operate in two modes: off-line detection and on-line detection. An off-line network IDS periodically analyzes or audits network information or log data to identify suspected activities or intrusions. In an on-line network IDS, network traffic data is inspected as it arrives in order to detect network attacks or malicious activities.</p><p>In this paper, we focus on real-time network-based intrusion detection, where the incoming network data is captured on-line and the detection result is reported instantaneously or within a fraction of a minute, so that the network administrator is notified and can stop the on-going attack. Our approach could also be applied to host-based detection. We designed our IDS using a misuse detection technique. The misuse detection approach can classify attacks into categories, whereas the anomaly detection approach can only differentiate between normal activity and abnormal/attack activity. Although many features of network data could serve as input to an IDS, we propose to consider only 12 features of network traffic data extracted from the headers of data packets. We show that these 12 features are effective in identifying normal network activity and classifying the main attack activities into two attack types, namely Port Scanning (PS, or probing) and Denial of Service (DoS). Using a small number of features reduces the complexity of data analysis and thus can increase the detection speed and reduce computer resource (CPU and memory) consumption.</p><p>* Corresponding author. Tel.: +66 2 470 9089; fax: +66 2 872 5050.</p><p>E-mail address: (N. Wattanapongsakorn).</p><p>0140-3664/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.comcom.2011.07.001</p></li><li><p>We are also interested in designing an IDS by applying well-known machine learning algorithms. This gives a simple but efficient intrusion detection approach that anyone can easily adopt. To that end, we compare several existing machine learning algorithms for classifying incoming network data: Decision Tree, Ripper Rule, Back-Propagation Neural Network, Radial Basis Function Neural Network, Bayesian Network, and Naïve Bayesian. Our experimental results showed that the Decision Tree was superior to the other approaches in terms of detection accuracy. Therefore, we further developed the Decision Tree approach for our Real-Time Intrusion Detection System (RT-IDS), where the input network data is captured on-line in a real environment. In addition, we designed an optional post-processing procedure for our RT-IDS to reduce the false alarm rate of intrusion detection and thus increase its reliability. We evaluated our IDS in terms of detection speed, CPU usage, memory consumption, and detection accuracy.</p><p>The remainder of this paper is organized as follows. In the next section, we review related work in network intrusion detection systems. In Section 3, we present our research approach, using information gain to select input features. 
We also provide a brief overview of the machine learning techniques that we evaluated. In Section 4, we present our real-time IDS (RT-IDS) process and algorithm, describing each phase of the process in detail. We present the 12 essential features of network data relevant to classifying network attacks, together with a post-processing method for reducing the false alarm rate of intrusion detection. In Section 5, we present the details of our experiments and performance evaluation: the experimental design and experimental data are described, and the experimental results for different scenarios are presented. Lastly, in Section 6, we offer conclusions from our research work and summarize its contribution.</p><p>Fig. 1. Network intrusion detection system environment.</p><p>2. Literature survey</p><p>In previous research, most researchers have concentrated on off-line intrusion detection, using the well-known KDD99 benchmark dataset to verify their IDS development. The KDD99 dataset is a statistically preprocessed dataset which has been available from DARPA since 1999 [13]. A few on-line intrusion detection approaches also exist; these are discussed in Section 2.2.</p><p>2.1. Off-line network intrusion detection</p><p>The artificial neural network approach is one of the most popular techniques for the design of IDS. Jirapummin et al. [1] proposed a hybrid neural network combining a Self-Organizing Map (SOM) and a Resilient Back-Propagation Neural Network (BPNN). To evaluate their approach, they used the well-known preprocessed KDD99 dataset [2]. The KDD99 dataset is a network packet dataset consisting of normal network activity as well as many network attack types. It is based on the DARPA98 dataset from MIT Lincoln Laboratory, which provides answer classes (labeled data) for evaluating intrusion detection.</p><p>Pan et al. 
[3] designed a hybrid system using a BPNN and a C4.5 Decision Tree, evaluated on the KDD99 dataset. The results showed that, using a BPNN alone without the C4.5 Decision Tree, their system could not detect attack types such as User to Root (U2R) and Root to Local (R2L) at all. Moradi and Zulkernine [4] used a Multi-Layer Perceptron (MLP) artificial neural network in off-line mode to classify normal network activity, Satan (Probe) attacks and Neptune attacks using the KDD99 dataset. Ngamwitthayanon et al. [5] designed a multi-stage IDS to classify normal data and each attack type using the KDD99 dataset. Their results showed a higher detection rate in each classification category than when a single stage was used to classify all categories.</p><p>Fuzzy sets are another technique often used for IDS. This technique generally falls into two categories: fuzzy misuse detection [6,7] and fuzzy anomaly detection [8–10]. Abraham and Jain [6] compared three types of fuzzy rules with linear genetic programming (LGP), Decision Tree, and Support Vector Machines (SVM). Their results showed that one of their fuzzy rules gave the best detection rate using the 41 features of the DARPA 1998 dataset. Liao et al. [7] used fuzzy logic and an expert system with the DARPA 2000 dataset and achieved a detection rate of more than 91.5% over all attack types, while reducing the complexity of traditional techniques for ranking fuzzy numbers. Ngamwitthayanon et al. [8] used Fuzzy Adaptive Resonance Theory (F-ART) to implement network anomaly intrusion detection with the KDD99 dataset. Similarly, Toosi and Kahani [9] and Tsang et al. [10] also used fuzzy rules as their intrusion detection approaches and evaluated them with the KDD99 dataset.</p><p>Pukkawanna et al. [11] proposed the BLINd Classification (BLINC) model, a graphlet model of network packet patterns. BLINC could detect normal network activity and four network attack types using a signature-based detection approach.</p><p>Lee et al. 
[27] proposed a Decision Tree approach using the ID3 learning algorithm for anomaly detection (unknown or new attacks). They considered four types of attacks (DoS, Probe, U2R and R2L) using the DARPA dataset on which the KDD99 dataset is based. Most of their anomaly detection rates were lower than the detection rates on known/trained data.</p><p>Other off-line intrusion detection approaches have also been proposed using KDD99 as the input dataset, such as Katos et al. [12], who investigated data analysis and clustering evaluation, and Chen et al. [13], who computed the anomaly score of a packet from its deviation from normal behavior.</p><p>2.2. On-line (real-time) network intrusion detection</p><p>A system that can detect network intrusion while an attack is occurring is called a real-time detection system. A real-time IDS captures present, on-line network traffic data. In this paper, the terms on-line detection and real-time detection are used interchangeably. In our literature survey, we found very few previously proposed on-line (real-time) network IDS approaches. We discuss these few studies in this section.</p><p>Labib and Vemuri [14] developed a real-time IDS using Self-Organizing Maps (SOM) to detect normal network activity and DoS attacks. They preprocessed their dataset to have 10 features for each data record, where each record contained information from 50 packets. Their IDS was evaluated by human visualization for different</p></li><li><p>[…] was reported.</p><p>Puttini et al. [15] used a Bayesian classification model for anomaly detection to classify normal network activity and attacks using a 3-month training dataset and a 1-month test dataset. They evaluated their approach by adjusting a penalty value to see how it affected the classification results. They also needed a human expert to visualize the normal and abnormal network behaviors. No detection rate was reported.</p><p>Amini et al. 
[16] designed a real-time IDS using two unsupervised neural network algorithms, Adaptive Resonance Theory (ART) and Self-Organizing Map (SOM). They classified two attack types plus normal data during a 4-day experiment with a 27-feature dataset, where each feature captures the number of occurrences of an event in each time interval. The detection results showed that ART-2 gave a higher detection speed and detection rate than the SOM. The detection rate was reported to be over 97% in separating normal traffic data from network attacks. However, the attacks were not classified into types or categories.</p><p>Su et al. [17] created a real-time network IDS using fuzzy association rules and conducted their experiments using four computers with 30 DoS attack types in WIN32. They could separate normal network activity from network attacks, but they did not identify the attack type. They preprocessed the network data to have 16 features. After testing, the results showed that the 30 DoS attack types had a similarity ratio of less than 0.4, while normal network activity gave a similarity ratio of more than 0.75. The similarity ratio represents how close the data is to normal data; a value of 1.0 means a perfect match.</p><p>Li et al. [18] developed a high-speed intrusion detection model using TCP/IP header information. However, their approach is limited to only one type of attack, DoS.</p><p>A well-known intrusion detection tool called SNORT was studied in [19]. SNORT has become a commercial tool, and its attack signature rules are available only to registered customers. The signature rules or patches have to be frequently updated and installed in order to detect current attack types.</p><p>In summary, most researchers proposed IDS classification algorithms based on machine learning techniques and used the KDD99 dataset to evaluate their IDS approaches in an off-line environment, without considering real-time processing/detection. 
The KDD99 dataset, which was created about 10 years ago, is complex and lacks many current attack types. While this approach is of theoretical interest, it provides only post hoc assistance to n...</p></li></ul>
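The abstract and the Section 3 outline state that the 12 input features were chosen using information gain as the feature selection criterion. As a minimal sketch of how such a ranking can be computed (not the authors' code), the snippet below scores each discrete feature by IG(Y; X) = H(Y) - Σ_v P(X = v) H(Y | X = v). It assumes the features are already discretized; the feature names `many_syn`, `many_dst_ports` and `random_noise` and the six toy records are hypothetical and are not the paper's 12 features or data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v) for one discrete feature X."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_features(records, labels, feature_names):
    """Return (feature_name, gain) pairs sorted by decreasing information gain."""
    gains = [
        (name, information_gain([r[i] for r in records], labels))
        for i, name in enumerate(feature_names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already-discretized toy data (1 = "high" within a time window).
FEATURE_NAMES = ["many_syn", "many_dst_ports", "random_noise"]
RECORDS = [(1, 0, 0), (1, 0, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0), (0, 0, 1)]
LABELS = ["DoS", "DoS", "Probe", "Probe", "Normal", "Normal"]

if __name__ == "__main__":
    for name, gain in rank_features(RECORDS, LABELS, FEATURE_NAMES):
        print(f"{name}: {gain:.3f} bits")
```

On this toy data, `many_syn` and `many_dst_ports` each isolate one attack class (gain of about 0.918 bits), while `random_noise` carries no information about the label (gain 0), so a selector that keeps only the top-ranked features would discard it.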
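The Decision Tree classifier underlying the RT-IDS can be illustrated with an ID3-style tree (the algorithm Lee et al. [27] used), which greedily splits on the feature with the highest information gain. This is a sketch under stated assumptions: the tree-induction variant the authors used is not given in the text above, and the three discretized window features (`syn_rate`, `distinct_dst_ports`, `icmp_rate`) and the training rows are hypothetical. Only the class labels Normal, Probe and DoS follow the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_feature(rows, labels, features):
    """Return the feature index (among `features`) with the highest information gain."""
    def gain(i):
        total = len(labels)
        remainder = 0.0
        for v in set(r[i] for r in rows):
            sub = [y for r, y in zip(rows, labels) if r[i] == v]
            remainder += len(sub) / total * entropy(sub)
        return entropy(labels) - remainder
    return max(features, key=gain)


def build_tree(rows, labels, features):
    """Recursively build an ID3 tree.

    A leaf is a class label; an internal node is (feature_index, {value: subtree}, majority_label).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    i = best_feature(rows, labels, features)
    branches = {}
    for v in set(r[i] for r in rows):
        idx = [k for k, r in enumerate(rows) if r[i] == v]
        branches[v] = build_tree(
            [rows[k] for k in idx],
            [labels[k] for k in idx],
            [f for f in features if f != i],  # each feature is used at most once per path
        )
    return (i, branches, majority)


def classify(tree, row):
    """Walk the tree; a value unseen during training falls back to the node's majority class."""
    while isinstance(tree, tuple):
        i, branches, majority = tree
        if row[i] not in branches:
            return majority
        tree = branches[row[i]]
    return tree


# Hypothetical discretized features per time window.
FEATURES = ("syn_rate", "distinct_dst_ports", "icmp_rate")
TRAIN_ROWS = [
    ("high", "low", "low"), ("high", "low", "high"),
    ("low", "high", "low"), ("low", "high", "high"),
    ("low", "low", "low"), ("low", "low", "high"),
]
TRAIN_LABELS = ["DoS", "DoS", "Probe", "Probe", "Normal", "Normal"]
TREE = build_tree(TRAIN_ROWS, TRAIN_LABELS, list(range(len(FEATURES))))

if __name__ == "__main__":
    print(classify(TREE, ("low", "high", "low")))
```

Plain ID3 keeps the sketch short: it handles only discrete values and does no pruning, whereas a production learner such as C4.5 also handles continuous features and prunes the tree.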
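The introduction mentions an optional post-processing procedure that lowers the false alarm rate, but its details fall outside this excerpt. One plausible approach, shown purely as an assumption and not as the authors' procedure, smooths consecutive predictions with a sliding-window majority vote so that an isolated attack label among normal ones is discarded. The window size of 5, the helper names and the toy label sequence are hypothetical, and `detection_rate` uses one common definition (the share of attack records assigned their correct attack class).

```python
from collections import Counter


def smooth_predictions(predictions, window=5):
    """Hypothetical post-processing: relabel each record with the majority label
    of a window centred on it (clipped at the ends of the sequence).
    Ties resolve to the label seen first in the window (Counter keeps insertion order)."""
    half = window // 2
    smoothed = []
    for i in range(len(predictions)):
        neighbourhood = predictions[max(0, i - half): i + half + 1]
        smoothed.append(Counter(neighbourhood).most_common(1)[0][0])
    return smoothed


def false_alarm_rate(actual, predicted, normal="Normal"):
    """Share of truly normal records that were flagged as some attack."""
    normal_idx = [i for i, a in enumerate(actual) if a == normal]
    if not normal_idx:
        return 0.0
    return sum(predicted[i] != normal for i in normal_idx) / len(normal_idx)


def detection_rate(actual, predicted, normal="Normal"):
    """Assumed definition: share of attack records given their correct attack class."""
    attack_idx = [i for i, a in enumerate(actual) if a != normal]
    if not attack_idx:
        return 0.0
    return sum(predicted[i] == actual[i] for i in attack_idx) / len(attack_idx)


# Toy sequence: one spurious DoS alarm (index 2) inside normal traffic.
ACTUAL = ["Normal"] * 6 + ["DoS"] * 5 + ["Normal"] * 4
PREDICTED = list(ACTUAL)
PREDICTED[2] = "DoS"

if __name__ == "__main__":
    smoothed = smooth_predictions(PREDICTED)
    print("false alarm rate before:", false_alarm_rate(ACTUAL, PREDICTED))
    print("false alarm rate after: ", false_alarm_rate(ACTUAL, smoothed))
    print("detection rate after:   ", detection_rate(ACTUAL, smoothed))
```

On the toy sequence, the single spurious `DoS` label at index 2 is voted away, so the false alarm rate drops from 0.1 to 0.0 while all five genuine DoS records are kept. The trade-off is that, with a window of 5, any burst of two or fewer consecutive attack labels would also be removed.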

