


Online Adaboost-Based Parameterized Methods for Dynamic Distributed Network Intrusion Detection

Weiming Hu, Jun Gao, Yanguo Wang, Ou Wu, and Stephen Maybank

Abstract—Current network intrusion detection systems lack adaptability to the frequently changing network environments. Furthermore, intrusion detection in the new distributed architectures is now a major requirement. In this paper, we propose two online Adaboost-based intrusion detection algorithms. In the first algorithm, a traditional online Adaboost process is used where decision stumps are used as weak classifiers. In the second algorithm, an improved online Adaboost process is proposed, and online Gaussian mixture models (GMMs) are used as weak classifiers. We further propose a distributed intrusion detection framework, in which a local parameterized detection model is constructed in each node using the online Adaboost algorithm. A global detection model is constructed in each node by combining the local parametric models using a small number of samples in the node. This combination is achieved using an algorithm based on particle swarm optimization (PSO) and support vector machines. The global model in each node is used to detect intrusions. Experimental results show that the improved online Adaboost process with GMMs obtains a higher detection rate and a lower false alarm rate than the traditional online Adaboost process that uses decision stumps. Both algorithms outperform existing intrusion detection algorithms. It is also shown that our PSO- and SVM-based algorithm effectively combines the local detection models into the global model in each node; the global model in a node can handle the intrusion types that are found in other nodes, without sharing the samples of these intrusion types.

Index Terms—Dynamic distributed detection, network intrusions, online Adaboost learning, parameterized model.

I. Introduction

NETWORK ATTACK detection is one of the most important problems in network information security. Currently, there are mainly firewall, network-based intrusion detection and prevention systems (NIDS/NIPS), and unified threat management (UTM) like devices to detect attacks in network infrastructure.

Manuscript received December 16, 2011; revised August 13, 2012; accepted February 7, 2013. Date of publication March 27, 2013; date of current version December 12, 2013. This work was partly supported by NSFC under Grant 60935002, the National 863 High-Tech Research and Development Program of China under Grant 2012AA012504, the Natural Science Foundation of Beijing under Grant 4121003, and the Guangdong Natural Science Foundation under Grant S2012020011081. This paper was recommended by Associate Editor P. Bhattacharya.

W. Hu, J. Gao, and O. Wu are with the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]; [email protected]; [email protected]).

Y. Wang is with the Institute of Infrastructure Inspection, China Academy of Railway Sciences, Beijing 100190, China (e-mail: [email protected]).

S. Maybank is with the School of Computer Science and Information Systems, Birkbeck College, London WC1E 7HX, U.K. (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCYB.2013.2247592

NIDS/NIPS detect and prevent network behaviors that violate or endanger network security. Basically, firewalls are used to block certain types of traffic to improve security. NIDS and firewalls can be linked to block network attacks. UTM devices combine firewall, NIDS/NIPS, and other capabilities in a single device to detect similar events as standalone firewalls and NIDS/NIPS devices. Deep packet inspection (DPI) [52] adds analysis on the application layer, and then recognizes various applications and their contents. DPI can incorporate NIDS into firewalls. It can increase the accuracy of intrusion detection, but it is more time-consuming than traditional packet header analysis. This paper focuses on the investigation of NIDS.

The current practical solutions for NIDS used in industry are misuse-based methods that utilize signatures of attacks to detect intrusions by modeling each type of attack. As typical misuse detection methods, pattern matching methods search packets for attack features by utilizing protocol rules and string matching. Pattern matching methods can effectively detect well-known intrusions. But they rely on the timely generation of attack signatures, and fail to detect novel and unknown attacks. Given the rapid proliferation of novel and unknown attacks, any defense based only on signatures of known attacks becomes impossible. Moreover, the increasing diversity of attacks obstructs the modeling of signatures.

Machine learning deals with automatically inferring and generalizing dependencies from data to allow extrapolation of dependencies to unseen data. Machine learning methods for intrusion detection model both attack data and normal network data, and allow for detection of unknown attacks using the network features [60]. This paper focuses on machine learning-based NIDS. The machine learning-based intrusion detection methods can be classified as statistics based, data mining based, and classification based. All three classes of methods first extract low-level features and then learn rules or models that are used to detect intrusions. A brief review of each class of methods is given below.

1) Statistics-based methods construct statistical models of network connections to determine whether a new connection is an attack. For instance, Denning [1] constructs statistical profiles for normal behaviors. The profiles are used to detect anomalous behaviors that are treated as attacks. Caberera et al. [2] adopt the Kolmogorov-Smirnov test to compare observed network signals with normal behavior signals, assuming that the number of observed events in a time segment obeys the Poisson distribution. Li and Manikopoulos [22] extract several representative parameters of network flows, and model these parameters using a hyperbolic distribution. Peng et al. [23] use a nonparametric cumulative sum algorithm to analyze the statistics of network data, and further detect anomalies on the network.

2) Data mining-based methods mine rules that are used to determine whether a new connection is an attack. For instance, Lee et al. [3] characterize normal network behaviors using association rules and frequent episode rules [24]. Deviations from these rules indicate intrusions on the network. Zhang et al. [40] use the random forest algorithm to automatically build patterns of attacks. Otey et al. [4] propose an algorithm for mining frequent itemsets (groups of attribute-value pairs) to combine categorical and continuous attributes of data. The algorithm is extended to handle dynamic and streaming datasets. Zanero and Savaresi [25] first use unsupervised clustering to reduce the network packet payload to a tractable size, and then apply a traditional anomaly detection algorithm to intrusion detection. Mabu et al. [49] detect intrusions by mining fuzzy class association rules using genetic network programming. Panigrahi and Sural [51] detect intrusions using fuzzy logic, which combines evidence from a user's current and past behaviors.

3) Classification-based methods construct a classifier that is used to classify new connections as either attacks or normal connections. For instance, Mukkamala et al. [30] use the support vector machine (SVM) to distinguish between normal network behaviors and attacks, and further identify important features for intrusion detection. Mill and Inoue [31] propose the TreeSVM and ArraySVM algorithms for reducing the inefficiencies that arise when a sequential minimal optimization algorithm for intrusion detection is learnt from a large set of training data. Zhang and Shen [7] use SVMs to implement online intrusion detection. Kayacik et al. [5] propose an algorithm for intrusion detection based on the Kohonen self-organizing feature map (SOM). Specific attention is given to a direct labeling of SOM nodes with the connection type. Bivens et al. [26] propose an intrusion detection method, in which SOMs are used for data clustering and multilayer perceptron (MLP) neural networks are used for detection. Hierarchical neural networks [28], evolutionary neural networks [29], and MLP neural networks [27] have been applied to distinguish between attacks and normal network behaviors. Hu and Heywood [6] combine SVM with SOM to detect network intrusions. Khor et al. [55] propose a dichotomization algorithm, in which rare categories are separated from the training data and cascaded classifiers are trained to handle both the rare categories and the other categories. Xian et al. [32] combine the fuzzy K-means algorithm with the clonal selection algorithm to detect intrusions. Jiang et al. [33] use an incremental version of the K-means algorithm to detect intrusions. Hoglund et al. [34] extract features describing network behaviors from audit data, and use the SOM to detect intrusions. Sarasamma et al. [18] propose a hierarchical SOM that selects different feature subset combinations for use in different layers of the SOM to detect intrusions. Song et al. [35] propose a hierarchical random subset selection-dynamic subset selection (RSS-DSS) algorithm to detect intrusions. Lee et al. [8] propose an adaptive intrusion detection algorithm that combines the adaptive resonance theory with the concept vector and the Mercer kernel. Jirapummin et al. [21] employ a hybrid neural network model to detect TCP SYN flooding and port scan attacks. Eskin et al. [20] use a k-nearest neighbor (k-NN)-based algorithm and an SVM-based algorithm to detect anomalies.

Although there is much work on intrusion detection, several issues are still open and require further research, especially in the following areas.

1) Network environments and the intrusion training data change rapidly over time, as new types of attack emerge. In addition, the size of the training data increases over time and can become very large. Most existing algorithms for training intrusion detectors are offline. The intrusion detector must be retrained periodically in batch mode in order to keep up with the changes in the network. This retraining is time-consuming. Online training is more suitable for dynamic intrusion detectors. New data are used to update the detector and are then discarded. The key issue in online training is to maintain the accuracy of the intrusion detector.

2) There are various types of attributes for network connection data, including both categorical and continuous ones, and the value ranges of different attributes differ greatly, from {0, 1} to describe the normal or error status of a connection, to [0, 10^7] to specify the number of data bytes sent from source to destination. The combination of data with different attributes without loss of information is crucial to maintaining the accuracy of intrusion detectors.

3) In traditional centralized intrusion detection, in which all the network data are sent to a central site for processing, the raw data communications occupy considerable network bandwidth. There is a computational burden at the central site, and the privacy of the data obtained from the local nodes cannot be protected. Distributed detection [36], which shares local intrusion detection models learned in local nodes, can reduce data communications, distribute the computational burden, and protect privacy. Otey et al. [4] construct a novel distributed algorithm for detecting outliers (including network intrusions). Its limitation is that many raw network data still need to be shared among distributed nodes. There is a requirement for a distributed intrusion detection algorithm that makes only a small number of communications between local nodes.

In this paper, we address the above challenges and propose a classification-based framework for dynamic distributed network intrusion detection using the online Adaboost algorithm [58]. The Adaboost algorithm is one of the most popular machine learning algorithms. Its theoretical basis is sound, and its implementation is simple. Moreover, the AdaBoost algorithm corrects the misclassifications made by weak classifiers, and it is less susceptible to over-fitting than most learning algorithms. Recognition performances of Adaboost-based classifiers are generally encouraging. In our framework, a hybrid of online weak classifiers and an online Adaboost process results in a parameterized local model at each node for intrusion detection. The parametric models for all the nodes are combined into a global intrusion detector in each node using a small number of samples, and the combination is achieved using an algorithm based on particle swarm optimization (PSO) and SVMs. The global model in a node can handle the attack types that are found in other nodes, without sharing the samples of these attack types. Our framework is original in the following ways.

1) In the Adaboost classifier, the weak classifiers are constructed for each individual feature component, for both continuous and categorical features, in such a way that the relations between these features can be naturally handled, without any forced conversions between continuous and categorical features.

2) New algorithms are designed for local intrusion detection. The traditional online Adaboost process and a newly proposed online Adaboost process are applied to construct local intrusion detectors. The weak classifiers used by the traditional Adaboost process are decision stumps. The new Adaboost process uses online Gaussian mixture models (GMMs) as weak classifiers. In both cases the local intrusion detectors can be updated online. The parameters in the weak classifiers and the strong classifier constitute a parametric local model.

3) The local parametric models for intrusion detection are shared between the nodes of the network. The volume of communications is very small, and it is not necessary to share the private raw data from which the local models are learnt.

4) We propose a PSO and SVM-based algorithm for combining the local models into a global detector in each node. The global detector, which obtains information from other nodes, produces more accurate detection results than the local detector.

The remainder of this paper is organized as follows. Section II introduces the distributed intrusion detection framework. Section III describes the online Adaboost-based local intrusion detection models. Section IV presents the method for constructing the global detection models. Section V shows the experimental results. Section VI summarizes the paper.

II. Overview of Our Framework

In the distributed intrusion detection framework, each node independently constructs its own local intrusion detection model according to its own data. By combining all the local models, at each node, a global model is trained using a small number of the samples in the node, without sharing any of the original training data between nodes. The global model is used to detect intrusions at the node. Fig. 1 gives an overview of our framework, which consists of the modules of data preprocessing, local models, and global models.

1) Data Preprocessing: For each network connection, three groups of features [9] that are commonly used for intrusion detection are extracted: basic features of individual transmission control protocol (TCP) connections, content features within a connection suggested by domain knowledge, and traffic features computed using a two-second time window [14]. The extracted feature values from a network connection form a vector x = (x_1, x_2, . . . , x_D), where D is the number of feature components. There are continuous and categorical features, and the value ranges of the features may differ greatly from each other. The framework for constructing these features can be found in [9]. A set of data is labeled for training purposes. There are many types of attacks on the Internet. The attack samples are labeled as −1, −2, . . . depending on the attack type, and the normal samples are all labeled as +1.
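As a concrete illustration of this labeling convention, the Python sketch below encodes connection labels; the attack-type names and the example feature tuple are hypothetical, not the exact KDD attribute layout.

    # Sketch of the labeling convention described above (illustrative names).
    ATTACK_LABELS = {"dos": -1, "r2l": -2, "u2r": -3, "probe": -4}

    def label_connection(record_class):
        # Normal connections are labeled +1; each attack type gets its own
        # negative label -1, -2, ..., as in the paper.
        if record_class == "normal":
            return 1
        return ATTACK_LABELS[record_class]

    # A connection is a D-dimensional mixed-attribute vector x = (x1, ..., xD):
    # categorical entries stay symbolic, continuous entries stay numeric.
    x = ("tcp", "http", 0, 215.0, 45076.0)   # hypothetical feature values
    y = label_connection("normal")           # -> +1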

2) Local Models: The construction of a local detection model at each node includes the design of weak classifiers and Adaboost-based training. Each individual feature component corresponds to a weak classifier. In this way, the mixed-attribute data of the network connections can be handled naturally, and full use can be made of the information in each feature. The Adaboost training is implemented using only the local training samples at each node. After training, each node contains a parametric model that consists of the parameters of the weak classifiers and the ensemble weights.

3) Global Models: By sharing all the local parametric models, a global model is constructed using the PSO and SVM-based algorithm in each node. The global model in each node fuses the information learned from all the local nodes using a small number of training samples in the node. Feature vectors of new network connections to the node are input to the global classifier, and classified as either normal or attacks. The results of the global model in the node are used to update the local model in the node, and the updated model is then shared with the other nodes.

III. Local Detection Models

The classical Adaboost algorithm [37] carries out the training task in batch mode. A number of weak classifiers are constructed using a training set. Weights, which indicate the importance of the training samples, are derived from the classification errors of the weak classifiers. The final strong classifier is an ensemble of weak classifiers. The classification error of the final strong classifier converges to 0. However, the Adaboost algorithm based on offline learning is not suitable for networks. We apply online versions of Adaboost to construct the local intrusion detection models. It is proved in [38] that the strong classifier obtained by the online Adaboost converges to the strong classifier obtained by the offline Adaboost as the number of training samples increases.

In the following, we first introduce the weak classifiers for intrusion detection, and then describe the online Adaboost-based intrusion detection algorithms.


Fig. 1. Overview of the intrusion detection framework.

A. Weak Classifiers

Weak classifiers that can be updated online match the requirement of dynamic intrusion detection. We consider two types of weak classifier. The first type consists of decision stumps for classifying attacks and normal behaviors. The second type consists of online GMMs that model a distribution of the values of each feature component for each attack type.

1) Decision Stumps: A decision stump is a decision tree with a root node and two leaf nodes. A decision stump is constructed for each feature component of the network connection data.

For a categorical feature f, the set of attribute values C^f is divided into two disjoint subsets C_i^f and C_n^f, and the decision stump takes the form

h_f(x) = \begin{cases} 1 & x_f \in C_n^f \\ -1 & x_f \in C_i^f \end{cases}    (1)

where x_f is the attribute value of x on the feature f. The subsets C_i^f and C_n^f are determined using the training samples: for an attribute value z on a feature f, all the training samples whose attribute values on f are equal to z are found; if the number of attack samples among these samples is larger than the number of normal samples, then z is assigned to C_i^f; otherwise, z is assigned to C_n^f. In this way, the false alarm rate for the training samples is minimized.

For a continuous feature f, the range of attribute values is split by a threshold v, and the decision stump takes the form

h_f(x) = \begin{cases} 1 & x_f > v \\ -1 & x_f \le v \end{cases}    (2)

The threshold v is determined by minimizing the false alarm rate for the training samples.
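As a minimal sketch of how the two stump forms in (1) and (2) can be fitted by counting, consider the Python below. The exhaustive scan over observed values when choosing v, and the use of total misclassifications rather than the false alarm rate alone, are simplifying assumptions of this sketch.

    # Sketch of the per-feature decision stumps (1) and (2).

    def fit_categorical_stump(values, labels):
        # Assign value z to C_i (attack side, -1) when attack samples with
        # that value outnumber normal ones, else to C_n (normal side, +1).
        counts = {}
        for z, y in zip(values, labels):
            normal, attack = counts.get(z, (0, 0))
            if y == 1:
                normal += 1
            else:
                attack += 1          # any negative label is an attack
            counts[z] = (normal, attack)
        c_i = {z for z, (n, a) in counts.items() if a > n}
        return lambda z: -1 if z in c_i else 1

    def fit_continuous_stump(values, labels):
        # Scan candidate thresholds; keep the v with the fewest errors
        # under the rule h(x) = 1 if x > v else -1, as in (2).
        best_v, best_err = None, float("inf")
        for v in sorted(set(values)):
            err = sum((x > v) != (y == 1) for x, y in zip(values, labels))
            if err < best_err:
                best_v, best_err = v, err
        return lambda x: 1 if x > best_v else -1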

The above design of weak classifiers for intrusion detection has the following advantages.

1) The decision stumps operate on individual feature components. This deals naturally, first, with the combination of information from categorical attributes and from continuous attributes and, second, with the large variations in value ranges across the different feature components.

2) There is only one comparison operation in a decision stump. The computational complexity of constructing the decision stumps is very low, and online updating of decision stumps can be easily implemented when new training samples are obtained.

The limitation of the above design of weak classifiers is that the decision stumps do not take into account the different types of attacks. This may influence the performance of the Adaboost method.

2) Online GMM: For the samples of each attack type and for the normal samples, we use a GMM to model the data on each feature component. Let c ∈ {+1, −1, −2, . . . , −M} be a sample label, where +1 represents the normal samples and −1, −2, . . . , −M represent the different types of attacks, M being the number of attack types. The GMM model θ_j^c on the jth feature component for the samples with label c is represented as

θ_j^c = \{ω_j^c(i), μ_j^c(i), σ_j^c(i)\}_{i=1}^{K}    (3)

where K is the number of GMM components indexed by i, and ω, μ, and σ represent the weight, mean, and standard deviation of the corresponding component. Then, the weak classifier on the jth feature is constructed as

h_j(x) = \begin{cases} 1 & \text{if } p(x|θ_j^{+1}) > \frac{1}{M} p(x|θ_j^c) \text{ for } c = -1, -2, \ldots, -M \\ -1 & \text{otherwise} \end{cases}    (4)

where the factor 1/M is used to weight the probabilities for attack types in order to balance the importance of the attack samples and the normal samples. In this way, the sum of the weights for all the types of attack samples is equal to the weight of the normal samples, and the false alarm rate of the final ensemble classifier is therefore reduced.
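Given per-class mixture parameters laid out as in (3), the decision rule (4) is a straightforward density comparison. A sketch, with a small helper for the one-dimensional mixture density (all names are illustrative):

    import math

    def gmm_pdf(xj, theta):
        # theta is a list of (weight, mean, std) triples, as in (3).
        return sum(w * math.exp(-0.5 * ((xj - mu) / sd) ** 2)
                   / (sd * math.sqrt(2.0 * math.pi))
                   for w, mu, sd in theta)

    def gmm_weak_classify(xj, theta_by_class, M):
        # Rule (4): output +1 (normal) only if the normal-class density
        # exceeds every attack-class density weighted by 1/M.
        p_normal = gmm_pdf(xj, theta_by_class[1])
        for c in range(-1, -M - 1, -1):
            if p_normal <= gmm_pdf(xj, theta_by_class[c]) / M:
                return -1
        return 1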

The traditional way of estimating the parameter vectors θ_j^c from the training samples, using K-means clustering and the expectation-maximization (EM) algorithm [41], is time-consuming and is not suitable for online learning. In this paper, we use the online EM-based algorithm [44] to estimate these parameters. Given a new sample (x, y), where x is the feature vector of the sample and y ∈ {1, −1, −2, . . .}, the parameter vectors θ_j^y are updated using the following steps.

Step 1: Update the number N_j^y(i) of the training samples belonging to the ith component of the GMM as follows:

φ_i(x) = \begin{cases} 1 & \text{if } \left| \frac{x - μ_j^y(i)}{σ_j^y(i)} \right| \le T \\ 0 & \text{else} \end{cases}    (5)

N_j^y(i) ← N_j^y(i) + φ_i(x)    (6)

where the binary function φ_i(x) determines whether (x, y) belongs to the ith component, and the threshold T depends on the confidence limits required. Update ω_j^y(i) by

ω_j^y(i) = \frac{N_j^y(i)}{\sum_{k=1}^{K} N_j^y(k)}    (7)

Step 2: Calculate

Z = \sum_{i=1}^{K} ω_j^y(i) \, p(x|μ_j^y(i), σ_j^y(i)) \, φ_i(x)    (8)

where Z describes the relation between (x, y) and the GMM θ_j^y. If Z > 0, then go to Step 3; else (Z = 0 means that there is no relation between (x, y) and the GMM θ_j^y) go to Step 5, which sets a new Gaussian component for (x, y).

Step 3: Update the sum A_j^y(i) of the weighted probabilities of all the input training samples belonging to the ith component by

δ(i) = ω_j^y(i) \, p(x|μ_j^y(i), σ_j^y(i)) \, φ_i(x) / Z    (9)

A_j^y(i) ← A_j^y(i) + δ(i)    (10)

where δ(i) is the probability that (x, y) belongs to the ith component of the GMM.

Step 4: Update μ_j^y(i) and σ_j^y(i) by

μ_j^y(i) ← \frac{A_j^y(i) - δ(i)}{A_j^y(i)} μ_j^y(i) + \frac{δ(i)}{A_j^y(i)} x    (11)

σ_j^y(i)^2 ← \frac{A_j^y(i) - δ(i)}{A_j^y(i)} σ_j^y(i)^2 + \frac{(A_j^y(i) - δ(i)) \, δ(i)}{A_j^y(i)^2} (x - μ_j^y(i))^2    (12)

Exit.

Step 5: Reset a Gaussian component t in θ_j^y by

t = \arg\min_i (ω_j^y(i))    (13)

N_j^y(t) ← N_j^y(t) + 1, \quad A_j^y(t) ← A_j^y(t) + 1    (14)

μ_j^y(t) = x, \quad σ_j^y(t) = σ    (15)

where σ is used to initialize the standard deviation of the component t.

The terms δ(i)/A_j^y(i) in (11) and (A_j^y(i) − δ(i))δ(i)/A_j^y(i)^2 in (12) are the learning rates of the algorithm. They change dynamically, becoming smaller as the number of training samples grows. This behavior of the learning rates ensures that the online EM algorithm strikes a balance between the stability needed to keep the characteristics of the model learned from previous samples and the need to adapt the model as new samples are received.
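The update cycle (5)-(15) for a single feature component can be sketched as follows; the dictionary bookkeeping, the default threshold T, and the initial standard deviation are assumptions of this sketch, and it follows the reading of (12) with A squared in the second learning rate.

    import math

    def online_em_update(x, comps, T=2.5, init_sigma=1.0):
        # One online EM step (5)-(15) for one per-class, per-feature GMM.
        # comps: list of K dicts with keys w, mu, sigma, N, A.
        def pdf(x, c):
            return (math.exp(-0.5 * ((x - c["mu"]) / c["sigma"]) ** 2)
                    / (c["sigma"] * math.sqrt(2.0 * math.pi)))

        # Step 1: membership test (5), counts (6), and weights (7).
        match = [abs((x - c["mu"]) / c["sigma"]) <= T for c in comps]
        for c, m in zip(comps, match):
            c["N"] += int(m)
        total = sum(c["N"] for c in comps)
        if total:
            for c in comps:
                c["w"] = c["N"] / total

        # Step 2: relation Z between the sample and the mixture (8).
        Z = sum(c["w"] * pdf(x, c) for c, m in zip(comps, match) if m)
        if Z > 0:
            # Steps 3-4: responsibilities (9)-(10), mean and variance (11)-(12).
            for c, m in zip(comps, match):
                if not m:
                    continue
                delta = c["w"] * pdf(x, c) / Z
                c["A"] += delta
                lr = delta / c["A"]
                c["mu"] = (1.0 - lr) * c["mu"] + lr * x
                var = ((1.0 - lr) * c["sigma"] ** 2
                       + (c["A"] - delta) * delta / c["A"] ** 2
                       * (x - c["mu"]) ** 2)
                c["sigma"] = math.sqrt(var)
            return
        # Step 5: no component fits; reset the lowest-weight one (13)-(15).
        t = min(range(len(comps)), key=lambda i: comps[i]["w"])
        comps[t]["N"] += 1
        comps[t]["A"] += 1
        comps[t]["mu"] = x
        comps[t]["sigma"] = init_sigma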

The computational complexity of the online GMM for one sample is O(K). While the online GMM, which can be used to model the distributions of each type of attack, has a higher complexity than decision stumps, its complexity is still very low, as K is a small integer. So, the online GMM is suitable for online environments.

B. Online Ensemble Learning

To adapt to online training, in which each training sample is used only once for learning the strong classifier, online Adaboost makes the following changes to the offline Adaboost.

1) All the weak classifiers are initialized.

2) Once a sample is input, a number of weak classifiers are updated using this sample, and the weight of this sample is changed according to certain criteria, such as the weighted false classification rate of the weak classifiers.

In the following, we first apply the traditional online Adaboost algorithm to intrusion detection, and then we propose a new and more effective online Adaboost algorithm.

1) Traditional Online Adaboost: Oza [38] proposes an online version of Adaboost, and gives a convergence proof. This online Adaboost algorithm has been applied in the computer vision field [42]. When a new sample is input to the online Adaboost algorithm, all the weak classifiers (ensemble members) are updated in the same order [10]. Let {h_t}_{t=1,...,D} be the weak classifiers. This online Adaboost algorithm is outlined as follows.

Step 1: Initialize the weighted counters λ_t^{sc} and λ_t^{sw} of "historical" correct and wrong classification results for the tth weak classifier h_t as λ_t^{sc} = 0, λ_t^{sw} = 0 (t = 1, 2, . . . , D).

Step 2: For each new sample (x, y), initialize the weight λ of the current sample as λ = 1. For t = 1, 2, . . . , D:

(a) Randomly sample an integer k from the Poisson distribution Poisson(λ).

(b) Update the weak classifier h_t using the sample (x, y) k times.

(c) If the weak classifier h_t correctly classifies the sample (x, y), i.e., h_t(x) = sign(y), then the weighted correct classification counter λ_t^{sc} is updated by λ_t^{sc} ← λ_t^{sc} + λ; the approximate weighted false classification rate ε_t is updated by

ε_t = \frac{λ_t^{sw}}{λ_t^{sc} + λ_t^{sw}}    (16)

and the weight of the sample (x, y) is updated by

λ ← λ \left( \frac{1}{2(1 - ε_t)} \right)    (17)

Otherwise, λ_t^{sw} is updated by λ_t^{sw} ← λ_t^{sw} + λ; ε_t is updated using (16); and the weight of the sample (x, y) is updated by

λ ← λ \left( \frac{1}{2 ε_t} \right)    (18)

(d) The weight α_t for the weak classifier h_t is

α_t = \log \left( \frac{1 - ε_t}{ε_t} \right)    (19)

The weight α_t reflects the importance of feature t for detecting intrusions.

Step 3: The final strong classifier is defined by

H(x) = \text{sign} \left( \sum_{t=1}^{D} α_t h_t(x) \right) = \text{sign} \left( \sum_{t=1}^{D} \log \left( \frac{1 - ε_t}{ε_t} \right) h_t(x) \right)    (20)
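A condensed sketch of Steps 1-3; the weak learners are abstracted behind update(x, y) and classify(x) methods, which are assumptions of this sketch rather than an interface defined in the paper.

    import math
    import random

    def sample_poisson(lam):
        # Knuth's method; adequate for the small lambda values used here.
        threshold, k, p = math.exp(-lam), 0, 1.0
        while True:
            p *= random.random()
            if p <= threshold:
                return k
            k += 1

    def online_adaboost_step(x, y, weaks, lam_sc, lam_sw):
        # One pass of the traditional online Adaboost update (Steps 1-2).
        lam = 1.0
        sgn_y = 1 if y > 0 else -1
        for t, weak in enumerate(weaks):
            for _ in range(sample_poisson(lam)):            # (a), (b)
                weak.update(x, y)
            if weak.classify(x) == sgn_y:                   # (c), correct branch
                lam_sc[t] += lam
                eps = lam_sw[t] / (lam_sc[t] + lam_sw[t])   # (16)
                lam *= 1.0 / (2.0 * (1.0 - eps))            # (17)
            else:                                           # (c), wrong branch
                lam_sw[t] += lam
                eps = lam_sw[t] / (lam_sc[t] + lam_sw[t])   # (16)
                lam *= 1.0 / (2.0 * eps)                    # (18)

    def strong_classify(x, weaks, lam_sc, lam_sw):
        # Final strong classifier (20), with small guards against log(0).
        s = 0.0
        for t, weak in enumerate(weaks):
            total = lam_sc[t] + lam_sw[t]
            eps = min(max(lam_sw[t] / total, 1e-12), 1 - 1e-12) if total else 0.5
            s += math.log((1.0 - eps) / eps) * weak.classify(x)
        return 1 if s >= 0 else -1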

The differences between the offline Adaboost algorithm and the online Adaboost algorithm are as follows.

1) In the offline Adaboost algorithm, the weak classifiers are constructed in one step using all the training samples. In the online Adaboost algorithm, the weak classifiers are updated gradually as the training samples are input one by one.

2) In the offline Adaboost algorithm, all the sample weights are updated simultaneously according to the classification results of the optimal weak classifier for the samples. In the above online Adaboost algorithm, the weight of the current sample changes as the weak classifiers are updated one by one using the current sample.

3) In the offline Adaboost algorithm, the weighted classification error for each weak classifier is calculated according to its classification results on the whole set of training samples. In the online Adaboost, the weighted classification error for each weak classifier is estimated using the weak classifier's classification results only on the samples that have already been input.

4) The number of weak classifiers is not fixed in the offline Adaboost algorithm. In the online Adaboost, the number of weak classifiers is fixed and equal to the dimension of the feature vectors.

5) When only a few training samples have been input, the online ensemble classifier is less accurate than the offline ensemble classifier. As the number of training samples increases, the accuracy of the online ensemble classifier gradually increases until it approximates the accuracy of the offline ensemble classifier [43].

The limitation of the traditional online Adaboost algorithm is that, for each new input sample, all the weak classifiers are updated in a predefined order. The accuracy of the ensemble classifier depends on this order, and there is a tendency to over-fit to the weak classifiers that are updated first.

2) A New Online Adaboost Algorithm: To deal with the limitation of the traditional online Adaboost, our new online Adaboost algorithm selects a number of weak classifiers according to a certain rule and updates them simultaneously for each input training sample. Let S^+ and S^− be, respectively, the numbers of normal and attack samples that have already been input. Let Λ be the number of samples, each of which is correctly classified by the previous strong classifier, i.e., the strong classifier trained before the sample is input. Let λ_t^{sc} be the sum of the weights of the input samples that are correctly classified by the weak classifier h_t, and let λ_t^{sw} be the sum of the weights of the input samples that are misclassified by h_t. Let C_t be the number of input samples that are correctly classified by h_t. All the parameters S^+, S^−, Λ, λ_t^{sc}, λ_t^{sw}, and C_t are initialized to 0. For a new sample (x, y), y ∈ {1, −1, −2, . . .}, the strong classifier is updated online by the following steps.

Step 1: Update the parameter S^+ or S^− by

\begin{cases} S^+ ← S^+ + 1 & \text{if } y = 1 \\ S^- ← S^- + 1 & \text{else} \end{cases}    (21)

Initialize the weight λ of the new sample by

λ = \begin{cases} (S^+ + S^-)/S^+ & \text{if } y = 1 \\ (S^+ + S^-)/S^- & \text{else} \end{cases}    (22)

where λ follows the change of S^+ and S^−, in favor of balancing the proportions of the normal samples and the attack samples, ensuring that the sum of the weights of the attack samples equals the sum of the weights of the normal samples.

Step 2: Calculate the combined classification rate υ_t for each weak classifier h_t by

υ_t = (1 - ρ) ε_t - ρ \, \text{sign}(y) h_t(x) = (1 - ρ) \frac{λ_t^{sw}}{λ_t^{sc} + λ_t^{sw}} - ρ \, \text{sign}(y) h_t(x)    (23)

where ρ is a weight ranging in (0, 0.5]. The rate υ_t combines the historical false classification rate ε_t of h_t on the samples input previously and the result of h_t on the current sample (x, y). The rate υ_t is more effective than ε_t, as it gives an h_t whose ε_t is high more chance to be updated, and thus increases the detection rate of h_t. Define min υ by

\min υ = \min_{t \in \{1, 2, \ldots, D\}} υ_t    (24)

The weak classifiers {h_t}_{t=1,...,D} are ranked in ascending order of υ_t, and the ranked weak classifiers are denoted by {h_{r_1}, h_{r_2}, . . . , h_{r_D}}, r_i ∈ {1, 2, . . . , D}.

Step 3: The weak classifiers whose combined classification rates υ_{r_i} are not larger than 0.5 are selected from {h_{r_1}, h_{r_2}, . . . , h_{r_D}}. Each of these selected weak classifiers h_{r_i} is updated using (x, y) in the following way. Compute the number P_i of iterations for h_{r_i} by

P_i = \text{Integer}(P \exp(-γ(υ_{r_i} - \min υ)))    (25)

where γ is an attenuation coefficient and P is a predefined maximum number of iterations. Repeat the following steps (a) and (b) P_i times in order to update h_{r_i}:

(a) Update h_{r_i} online using (x, y).

(b) Update the sample weight λ and the counters C_{r_i}, λ_{r_i}^{sc}, and λ_{r_i}^{sw}. If sign(y) = h_{r_i}(x), then

\begin{cases} C_{r_i} ← C_{r_i} + 1 \\ λ_{r_i}^{sc} ← λ_{r_i}^{sc} + λ \\ λ ← λ \left( \frac{1 - 2ρ}{2(1 - υ_{r_i})} \right) \end{cases}    (26)

(λ is decreased); if sign(y) ≠ h_{r_i}(x), then

\begin{cases} λ_{r_i}^{sw} ← λ_{r_i}^{sw} + λ \\ λ ← λ \left( \frac{1 + 2ρ}{2 υ_{r_i}} \right) \end{cases}    (27)

(λ is increased).

Step 4: Update C_t, λ_t^{sc}, and λ_t^{sw} for each weak classifier h_t whose combined classification rate is larger than 0.5 (υ_t > 0.5): if sign(y) = h_t(x), then

\begin{cases} C_t ← C_t + 1 \\ λ_t^{sc} ← λ_t^{sc} + λ \end{cases}    (28)

else λ_t^{sw} ← λ_t^{sw} + λ.

Step 5: Update the parameter Λ: if the current sample is correctly classified by the previous strong classifier, i.e., the strong classifier trained before the current sample (x, y) is input, then Λ ← Λ + 1.

Step 6: Construct the strong ensemble classifier. Calculate the ensemble weight α_t^* of h_t by

α_t^* = β \log \left( \frac{1 - ε_t}{ε_t} \right) + (1 - β) \log \left( \frac{C_t}{Λ} \right)    (29)

The α_t^* is normalized to α_t by

α_t = \frac{α_t^*}{\sum_{i=1}^{D} α_i^*}    (30)

The strong classifier H(x) is defined by

H(x) = \text{sign} \left( \sum_{t=1}^{D} α_t h_t(x) \right)    (31)

We explain two points for the above online Adaboost algorithm.

1) The attenuation coefficient γ controls the number of weak classifiers that are chosen for further updating. When γ is very small, P_i is large and all the chosen weak classifiers are updated using the new sample. When γ is very large, P_i equals 0 if υ_t is large. As a result, for very large γ, only the weak classifiers with small υ_t are updated.

2) The ensemble weight α_t, which is calculated from ε_t and log(C_t/Λ), is different from the one in the traditional Adaboost algorithm. The term log(C_t/Λ), which is called a contributory factor, represents the contribution rate of h_t to the strong classifier. It can be used to tune the ensemble weights to attain better detection performance.

C. Adaptable Initial Sample Weights

We use the detection rate and the false alarm rate to evaluate the performance of the algorithm for detecting network intrusions. It is necessary to pay more attention to the false alarm rate because, in real applications, most network behaviors are normal. A high false alarm rate wastes resources, as each alarm has to be checked. For Adaboost-based learning algorithms, the detection rate and the false alarm rate depend on the initial weights of the training samples. So we propose to adjust the initial sample weights in order to balance the detection rate and the false alarm rate. We introduce a parameter r ∈ (0, 1) for setting the initial weight λ of each training sample:

λ = \begin{cases} \frac{N_{normal} + N_{intrusion}}{N_{normal}} \, r & \text{for normal connections} \\ \frac{N_{normal} + N_{intrusion}}{N_{intrusion}} \, (1 - r) & \text{for network intrusions} \end{cases}    (32)

where N_{normal} and N_{intrusion} are approximated by the numbers of normal samples and attack samples that have been input online to train the classifier. The sums of the weights for the normal samples and the attack samples are (N_{normal} + N_{intrusion}) · r and (N_{normal} + N_{intrusion}) · (1 − r), respectively. By adjusting the value of the parameter r, we change the importance of the normal samples or the attack samples in the training process, and thus make a tradeoff between the detection rate and the false alarm rate of the final detector. The selection of r depends on the proportion of normal samples in the training data, and on the requirements for the detection rate and the false alarm rate in specific applications.
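A direct transcription of (32); the running counts are updated elsewhere as samples arrive, and the default r = 0.3 echoes a value found suitable in the experiments reported later.

    def initial_weight(is_normal, n_normal, n_intrusion, r=0.3):
        # Adaptable initial sample weight, per (32).
        total = n_normal + n_intrusion
        if is_normal:
            return total / n_normal * r
        return total / n_intrusion * (1.0 - r)

    # e.g., with 900 normal and 100 attack samples seen so far:
    # initial_weight(True, 900, 100)  -> ~0.333 (normal weights sum to 300)
    # initial_weight(False, 900, 100) -> 7.0    (attack weights sum to 700)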

D. Local Parameterized Models

Subsequent to the construction of the weak classifiers and the online Adaboost learning, a local parameterized detection model ϕ is formed in each node. The local model consists of the parameters ϕ_w of the weak classifiers and the parameters ϕ_d for constructing the Adaboost strong classifier: ϕ = {ϕ_w, ϕ_d}. The parameters for each decision stump-based weak classifier include the subsets C_i^f and C_n^f for each categorical feature and the threshold v for each continuous feature. The parameters for each GMM-based weak classifier include a set of GMM parameters ϕ_w = {θ_j^c | j = 1, 2, . . . , D; c = 1, −1, −2, . . .}. The parameters of the strong classifier for the online Adaboost algorithm include a set of ensemble weights ϕ_d = {α_t | t = 1, 2, . . . , D} for the weak classifiers.


The parameters in the decision stump-based weak classifiers depend on the differences between normal behaviors and attacks in each component of the feature vectors. The parameters in the GMM-based weak classifiers depend on the distributions of the different types of attacks and normal behaviors in each component of the feature vectors. The parameters in the strong classifier depend on the significance of the individual feature components for intrusion detection. The local detection models capture the distribution characteristics of the observed mixed-attribute data in each node. They can be exchanged between the different nodes.

Compared with the nonparametric distributed outlier detection algorithm proposed by Otey et al. [4], in which a large amount of statistics about frequent itemsets needs to be shared among the nodes, the parametric models are not only concise enough to be suitable for information sharing, but also very useful for generating global intrusion detection models.

IV. Global Detection Models

The local parametric detection models are shared among all the nodes and combined in each node to produce a global intrusion detector, using a small number of samples left in the node (most samples in the node are removed, in order to adapt to changing network environments). This global intrusion detector is more accurate than the local detectors, which may be adequate only for specific attack types, due to the limited training data available at each node.

Kittler et al. [11] develop a common theoretical framework for combining classifiers using the product rule, the sum rule, the max rule, the min rule, the median rule, and the majority vote rule. It is shown that the sum rule performs better than the others. Some researchers fuse multiple classifiers by combining the output results of all the classifiers into a vector, and then using a classifier, such as an SVM or an ANN, to classify the vectors.

The combination of the local intrusion detection models poses two problems. First, there may be large performance gaps between the local detection models for different types of attacks, especially for new attack types that have not appeared previously. So, the sum rule may not be the best choice for combining the local detectors. Second, some of the local models may be similar for a test sample. If the results of the local models for the test sample are combined into a vector, the dimension of the vector has to be reduced to choose the best combination of the results from the local models.

To solve the above two problems, we combine the PSO and SVM algorithms, in each node, to construct the global detection model. The PSO [12], [13] is a population search algorithm that simulates the social behavior of birds flocking. The SVM is a learning algorithm based on the structural risk minimization principle from statistical learning theory. It has good performance even if the set of training samples is small. We use the PSO to search for the optimal combination of local models, and the SVM is trained using the samples left in a node. The trained SVM is then used as the global model in the node. By combining the searching ability of the PSO and the learning ability of the SVM on small sample sets, a global detection model can be constructed effectively in each node.

The state of a particle i used in the PSO for one of the A nodes is defined as X_i = (x_1, x_2, . . . , x_A), x_j ∈ {0, 1}, where x_j = 1 means that the jth local model is chosen, and x_j = 0 means that the jth local model is not chosen. For each particle, an SVM classifier is constructed. Let L be the number of local detection nodes chosen by the particle state X_i. For each network connection sample left in the node, a vector (r_1, r_2, . . . , r_L) is constructed, where r_j is the result of the jth chosen local detection model for the sample [corresponding to (20) and (31)]:

r_j = \sum_{t=1}^{D} α_t h_t(x)    (33)

where D is the dimension of the network connection feature vectors. These results lie in the range [−1, 1]. All the vectors corresponding to the small number of samples left in the node, together with the labels (normal connections or attacks) of the samples, are used to train the SVM classifier.

Each particle i has a corresponding fitness value that is evaluated by a fitness model f(X_i). Let γ(X_i) be the detection rate of the trained SVM classifier for particle i, where the detection rate is estimated using another set of samples in the node. The fitness value is evaluated as

f(X_i) = τ \, γ(X_i) + (1 - τ) \log \frac{A - L}{A}    (34)

where τ is a weight ranging between 0 and 1.

For each particle i, its individual best state S_i^l, which has the maximum fitness value so far, is updated in the nth PSO iteration using

S_i^l = \begin{cases} X_{i,n} & \text{if } f(X_{i,n}) > f(S_i^l) \\ S_i^l & \text{else} \end{cases}    (35)

The global best state S^g over all the particles is updated by

S^g = \arg\max_{S_i^l} f(S_i^l)    (36)

Each particle in the PSO is associated with a velocity that is modified according to the particle's own best state and the current best state among all the particles:

V_{i,n+1} = F(w V_{i,n} + c_1 μ_1 (S_i^l - X_{i,n}) + c_2 μ_2 (S^g - X_{i,n}))    (37)

where w is the inertia weight, which is negatively correlated with the number of iterations; c_1 and c_2 are acceleration constants called the learning rates; μ_1 and μ_2 are relatively independent random values in the range [0, 1]; and F(·) is a function that confines the velocity within a reasonable range:

F(V) = \begin{cases} V & \text{if } \|V\| \le V_{max} \\ V_{max} & \text{else} \end{cases}    (38)

The states of the particles are evolved using the updated velocity:

X_{i,n+1} = X_{i,n} + V_{i,n+1}    (39)

In theory, the PSO can find the global optimum after sufficiently many iterations. In practice, a local optimum may be obtained, due to an insufficient number of iterations. The PSO has a higher probability of reaching the global optimum than gradient descent approaches or particle filtering approaches.

The SVM- and PSO-based fusion algorithm is outlined as follows.

Step 1: Initialization: The particles {X_{i,0}}_{i=1}^{Q} are randomly chosen in the particle space, where Q is the number of particles, and S_i^l = X_{i,0}. An SVM classifier is constructed for each particle, and the detection rate γ(X_{i,0}) of the SVM classifier is calculated. The fitness value f(X_{i,0}) is calculated using (34). The global best state S^g is estimated using (36). n = 0.

Step 2: The velocities {V_{i,n+1}}_{i=1}^{Q} are updated using (37).

Step 3: The particles' states {X_{i,n+1}}_{i=1}^{Q} are evolved using (39).

Step 4: An SVM classifier is constructed for each particle, and the detection rate γ(X_{i,n+1}) is calculated. The fitness values f(X_{i,n+1}) are calculated using (34). The individual best states {S_i^l}_{i=1}^{Q} are updated using (35). The global best state S^g is updated using (36).

Step 5: n ← n + 1. If f(S^g) exceeds a predefined fitness threshold or the predefined number of iterations is reached, then the particle evolution process terminates and the SVM classifier corresponding to S^g is chosen as the final classifier, i.e., the global model in the node; otherwise, go to Step 2 for another loop of evolution.

When certain conditions are met, nodes may transmit their local models to each other. Then, each node can construct a customized global model using a small set of training samples randomly selected from the historical training samples in the node according to the proportions of the various kinds of network behaviors. Once the local nodes have gained their own global models, the global models are used to detect intrusions: for a new network connection, the vector of the results from the local models chosen by the global best particle is used as the input to the global model, whose output determines whether the current network connection is an attack.

We explain two points as follows.

1) A uniform global model can be constructed for all the local nodes. But, besides the parameters of the local models, small sets of samples would have to be shared through communications between the nodes.

2) The computational complexity of the PSO for searching for the optimal local models determines the scale of network to which our global model applies. The computational complexity of the PSO is O(QIA²ℓ²), where I is the number of iterations and ℓ is the number of training samples.

V. Experiments

We use the knowledge discovery and data mining (KDD) CUP 1999 dataset [14]–[16], [45] to test our algorithms. This dataset is still the most trusted and credible public benchmark dataset [53]–[59] for evaluating network intrusion detection algorithms. In the dataset, 41 features, including nine categorical features and 32 continuous features, are extracted for each network connection. Attacks in the dataset fall into the following four main categories.

1) DOS: denial of service.

2) R2L: unauthorized access from a remote machine, e.g., guessing a password.

3) U2R: unauthorized access to local super-user (root) privileges.

4) Probe: surveillance and other probing, e.g., port scanning.

Each of the four categories contains some low-grade attack types. The test dataset includes some attack types that do not exist in the training dataset. The numbers of normal connections and of each category of attacks in the training and test datasets are listed in Table I.

TABLE I
The KDD CUP 1999 Dataset

Categories   Training data   Test data
Normal       97 278          60 593
DOS          391 458         223 298
R2L          1126            5993
U2R          52              39
Probing      4107            2377
Others       0               18 729
Total        494 021         311 029

In the following, we first present the performance of our online learning-based intrusion detection algorithms: one with decision stumps and the traditional online Adaboost process, and the other with online GMMs and our proposed online Adaboost process. Then, the performance of our PSO and SVM-based distributed intrusion detection algorithm is evaluated.

A. Local Models

For the online local models, the ability to handle mixed features, the effects of parameters on detection performance, and the ability to learn new types of attacks are assessed in succession. Finally, the performances of our algorithms are compared with the published performances of existing algorithms.

1) Handling Mixed Features: Table II shows the results of the classifier with decision stumps and the traditional online Adaboost, and of the classifier with online GMMs and our online Adaboost, when only continuous features or both continuous and categorical features are used, tested on the KDD training and test datasets. It is seen that the results obtained by using both continuous and categorical features are much more accurate than the results obtained by using only continuous features. This shows the ability of our algorithms to handle the mixed features of network connection data.

2) Effects of Parameters on Detection Performance: In the following, we illustrate the effects of the important parameters, i.e., the adaptable initial weight parameter r in (32), the data distribution weight coefficient 1/M in (4), the parameter ρ in (23), the parameter β in (29), and the attenuation coefficient γ in (25).


TABLE II
Results Obtained by Using Only Continuous Features or Both Continuous and Categorical Features

                                                          Training data                       Test data
Algorithms                    Features                    Detection     False alarm    Detection     False alarm
                                                          rate (%)      rate (%)       rate (%)      rate (%)
Decision stump +              Only continuous             98.68         8.35           90.05         13.76
traditional online Adaboost   Continuous + categorical    98.93         2.37           91.27         8.38
Online GMM +                  Only continuous             98.79         7.83           91.33         11.34
our online Adaboost           Continuous + categorical    99.02         2.22           92.66         2.67

TABLE III
Results of the Algorithm With Decision Stumps and the Traditional Online Adaboost When Adaptable Initial Weights Are Used

        Training data                              Test data
r       Detection rate (%)   False alarm rate (%)  Detection rate (%)   False alarm rate (%)
0.25    98.71                2.38                  90.83                8.42
0.3     98.69                0.78                  90.69                2.29
0.35    98.63                0.77                  89.80                1.92
0.4     98.53                0.73                  89.66                1.88
0.5     98.16                0.16                  88.97                0.37

Table III shows the results of the algorithm with decision stumps and the traditional online Adaboost when the adaptable initial weights formulated by (32) are used. When r is increased, both the detection rate and the false alarm rate decrease. An appropriate setting of r is 0.3: with this value, the false alarm rate of 2.29% is much smaller than the false alarm rate of 8.42% obtained by setting r to 0.25, while an acceptable detection rate of 90.69% is still obtained. As shown in Table II, when an identical initial weight is used for each training sample in the traditional online Adaboost algorithm with decision stumps as weak classifiers, a high detection rate of 91.27% is obtained on the test data, but the false alarm rate is 8.38%, which is too high for intrusion detection. After the adaptable initial weights are introduced, the detection rate is slightly decreased, but the false alarm rate is much decreased, producing results more suitable for intrusion detection. This illustrates that adjustable initial weights can be used effectively to balance the detection rate and the false alarm rate.

Table IV shows the results of the algorithm with online GMMs and our online Adaboost when the data distribution weight 1/M in (4) and the adaptable initial weights are or are not used. It is seen that when the parameter M is not used, the false alarm rate is 54.15%, which is too high for intrusion detection. When the parameter M is used, the false alarm rate falls to 4.82%. When the adaptable initial weights are also used (r is set to 0.35), the false alarm rate further falls to 1.02%, while an acceptable detection rate is maintained.

Parameters ρ in (23) and β in (29) influence the detection results. Table V shows the detection rates and false alarm rates of the algorithm with online GMMs and our online Adaboost when ρ = 0 or 0.1 and β = 1 or 0.8. From this table, the following two points are deduced.

1) When ρ is set to 0, the false alarm rate is very high. This is because the weak classifiers are chosen only according to the false classification rate ε_t, without taking into account the current sample. Adopting the measure υ_t of the combined classification rate balances the effects of historical samples and the current sample, and thus yields more accurate results.

2) When β is set to 1, the detection rate is too low. This is because in this case only ε_t is used to weight the weak classifiers, i.e., every weak classifier is weighted in the same way as in the offline Adaboost and the traditional online Adaboost. In the online training process, weighting every weak classifier using only ε_t causes over-fitting to some weak classifiers. Namely, the ensemble weights of weak classifiers based only on ε_t are not accurate. Weighting weak classifiers by combining ε_t and contributory factors yields better performance.

Fig. 2. Online learning with decision stumps and the traditional online Adaboost for "guess-passwd" attacks.

causes over-fitting to some weak classifiers. Namely,the ensemble weights of weak classifiers based onlyon εt are not accurate. Weighting weak classifiers bycombining εt and contributory factors yields the betterperformance.

The attenuation coefficient γ in (25) is important for the performance of our online Adaboost algorithm. Table VI shows the detection rates and the false alarm rates of our online Adaboost algorithm based on GMMs when γ ranges from 10 to 50. When γ ∈ {20, 25, 30}, the number of updates Pt is better chosen for the weak classifiers and thus good results are obtained. In fact, when γ is set to 25, the detection rate reaches 91.15% with a false alarm rate of 1.69%, a preferable result compared with those obtained when γ is set


TABLE IV

Results of Our Online Adaboost Algorithm Based on Online GMMs With or Without Adaptable Initial Weights and the Data Distribution Weight "1/M"

Parameters                                       Training DR  Training FAR  Test DR  Test FAR
Without the data distribution weight "1/M"       99.86        48.85         97.49    54.15
With "1/M", without adaptable initial weights    99.22        8.10          91.74    4.82
With "1/M" and adaptable initial weights         98.91        2.65          90.65    1.02

TABLE V

Results of the Algorithm With Online GMMs and Our Online Adaboost When ρ = 0 or 0.1 and β = 0.8 or 1

Parameters            Detection rate (%)  False alarm rate (%)
ρ = 0,   β = 0.8      96.22               77.66
ρ = 0.1, β = 1        85.36               0.60
ρ = 0.1, β = 0.8      91.15               1.69

to 10 or 50.

3) Learning New Attack Types: The ability to learn effectively from new types of attacks, and then correctly detect these types of attacks, is an attractive and important characteristic of our online Adaboost-based intrusion detection algorithms.

Figs. 2 and 3 show the online learning process of the ensemble classifier with decision stumps and the traditional online Adaboost with respect to two attack types, "guess-passwd" and "httptunnel," which have not appeared before in the training data. In the figures, the horizontal coordinate indicates the number of samples of "guess-passwd" or "httptunnel" that have appeared in the learning process, and the vertical coordinate indicates the class label y predicted by the detector for a sample before it takes part in the learning process, where "y = 1" means that the sample is mistakenly classified as a normal connection and "y = −1" means that it is correctly classified as a network attack. It is seen that, at the beginning of learning for new types of attacks, the classifier frequently misclassifies samples of these new attack types as normal network connections. After some samples of these types of attacks take part in the online learning of the classifier, these new attack types are detected much more accurately. These two examples show the effectiveness of the online learning of the algorithm with decision stumps and the traditional online Adaboost for new attack types.
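The curves in Figs. 2–5 thus follow a predict-then-learn (prequential) protocol: each incoming sample is classified before it is used for the online update. A minimal sketch, in which `ensemble` is a hypothetical object exposing the predict/update interface assumed for either online Adaboost variant:

    def prequential_labels(ensemble, stream):
        """stream yields (x, y) pairs, y = -1 for attack, +1 for normal.
        Returns the label predicted for each sample before it is learned."""
        labels = []
        for x, y in stream:
            labels.append(ensemble.predict(x))  # classify first...
            ensemble.update(x, y)               # ...then update online
        return labels

Plotting `labels` for the samples of one new attack type reproduces the kind of curve shown in the figures: early mistakes (predicted +1) followed by correct detections (−1) once enough samples have been learned.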

Figs. 4 and 5 show the online learning process of the ensemble classifier with online GMMs and our online Adaboost for the two attack types, "guess-passwd" and "httptunnel," which have not appeared before in the training data. It is seen that, after some samples of "guess-passwd" and "httptunnel" occur in the online learning, these two types of samples are detected much more accurately by the ensemble classifier. Compared with the classifier with decision stumps and the traditional online Adaboost, the classifier with online GMMs and our online Adaboost learns the new types of attacks more quickly and accurately. This is because the classifier with online GMMs and our online Adaboost has a more powerful learning ability, based on modeling the distribution of each type of attack.
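To make the density-modeling argument concrete, the sketch below builds one Gaussian mixture per class and labels a sample by the larger log-likelihood. It is only a batch stand-in using scikit-learn's GaussianMixture on synthetic data; the paper's online GMM update rules are not reproduced here.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X_attack = rng.normal(loc=3.0, scale=1.0, size=(200, 2))  # synthetic attack features
    X_normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # synthetic normal features

    gmm_attack = GaussianMixture(n_components=2, random_state=0).fit(X_attack)
    gmm_normal = GaussianMixture(n_components=2, random_state=0).fit(X_normal)

    def classify(X):
        """+1 = normal, -1 = attack, by comparing class log-likelihoods."""
        return np.where(gmm_normal.score_samples(X) > gmm_attack.score_samples(X), 1, -1)

    print(classify(np.array([[0.1, -0.2], [3.2, 2.8]])))  # [ 1 -1]

A new attack type shows up as a region of low likelihood under both mixtures until its samples are absorbed into the attack model, which is why the GMM-based ensemble adapts faster than one built on decision stumps.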

TABLE VI

Detection Results of Our Online Adaboost Algorithm Based on Online GMMs With Varying γ

γ    Detection rate (%)  False alarm rate (%)
10   92.50               12.87
20   90.61               1.17
25   91.15               1.69
30   90.55               1.26
40   88.28               0.37
50   24.33               0.34

Fig. 3. Online learning with decision stumps and the traditional online Adaboost for "httptunnel" attacks.


4) Comparisons: We first compare the detection results obtained on part of the labeled KDD CUP 1999 dataset, and then the results obtained on the whole labeled KDD CUP 1999 dataset.

Some intrusion detection algorithms are tested on a smaller dataset that contains only three common attack types, namely neptune, satan, and portsweep, in order to examine the algorithms' ability to detect specific types of attack. The samples in the dataset are randomly selected from the KDD CUP 1999 dataset. Table VII lists the numbers of training and test samples of all the types in this smaller dataset. Table VIII shows the detection results of the algorithm based on hybrid neural networks [21], the algorithm based on the hierarchical SOM [18], the algorithm with decision stumps and the traditional online Adaboost, and the algorithm with online GMMs and our online Adaboost, all tested on the smaller dataset.


Fig. 4. Online learning with online GMMs and our online Adaboost for "guess-passwd" attacks.

Fig. 5. Online learning with online GMMs and our online Adaboost for "httptunnel" attacks.

TABLE VII

Smaller Dataset

Categories   Training data  Test data
Normal       20 676         60 593
Neptune      22 783         58 001
Satan        316            1 633
Portsweep    225            354
Total        44 000         120 581

It is seen that the detection rate of our algorithm with decision stumps and the traditional online Adaboost for the specific types of attacks is 99.55%, which is close to the highest detection rate of the two algorithms in [18] and [21]. The false alarm rate of this algorithm is only 0.17%, which is well below the midpoints of the false alarm rate ranges reported for the algorithms in [18] and [21]. Our algorithm with online GMMs and our online Adaboost obtains even more accurate results. This shows that our online Adaboost-based algorithms are effective for these three types of attacks.

We also compare our online Adaboost-based algorithms with other recently published algorithms for intrusion detection. The algorithms are tested on the whole labeled KDD CUP 1999 dataset. Table IX lists the detection results of the following algorithms: the clustering-based algorithm in [3], the k-NN-based algorithm in [20], the SVM-based approach in [20], the algorithm based on SOMs in [5], the genetic clustering-based algorithm in [17], the hierarchical SOM-based algorithm in [18], the bagged C5 algorithm in [19], the offline Adaboost-based algorithm in [39], the Mercer kernel ART algorithm in [8], our algorithm using decision stumps and the traditional online Adaboost, and our algorithm using online GMMs and our online Adaboost. The Mercer kernel ART algorithm in [8] and our algorithms are online learning algorithms; the others are offline learning algorithms. From the table, the following useful points are deduced.

1) Overall, the detection accuracies of the online Adaboost-based algorithms are comparable with those of the other algorithms.

2) Compared with the offline algorithms, our algorithms not only attain satisfactory detection rates while keeping low false alarm rates, but also adaptively modify the local models in online mode. This adaptability is very important for intrusion detection applications.

3) The detection accuracies of the online Adaboost-based algorithms are comparable with those of the Adaboost-based algorithm in batch mode.

4) Our online Adaboost-based algorithms outperform the online algorithm in [8]. In particular, lower false alarm rates are obtained.

5) The detection accuracy of the algorithm using decision stumps and the traditional online Adaboost is lower than that of the algorithm using online GMMs and our proposed online Adaboost. The reason is that the GMM-based algorithm models the distribution of each type of attack online and also updates the weak classifiers in a more effective way, which yields more accurate weights for the weak classifiers.

Moreover, Brugger and Chow [50] have assessed the performance of Snort, a typical pattern-matching NIDS, on the DARPA 1998 dataset, from which the raw data of the KDD CUP 1999 dataset were derived. It is shown that Snort has a very high false alarm rate. The results of the Snort algorithm are less accurate than those of the machine learning algorithms listed in Table IX.

Our online Adaboost-based algorithms are implemented on a Pentium IV computer with a 2.6 GHz CPU and 256M RAM, using MATLAB 7. The mean training time for our algorithm with decision stumps and the traditional online Adaboost is only 46 s, and that for our algorithm with online GMMs and our online Adaboost is 65 min, using all 494 021 training samples. This is an empirical demonstration that the computational complexity of our algorithm with decision stumps and the traditional online Adaboost is relatively low, owing to the very simple operations for the decision stumps. The computational complexity of our algorithm with online GMMs and our online Adaboost is moderate. In [46], the least training times required for the SOM and the improved competitive learning neural network are, respectively, 1057 s and 454 s, using only 101 000 samples for training. The bagged C5 algorithm [19], [47] took a bit more than a day on a two-processor UltraSPARC2 machine (2 × 300 MHz) with 512M main memory. The RSS-DSS algorithm [35] requires 15 min to finish the learning process on a 1 GHz Pentium III laptop with 256M RAM.


TABLE VIII

Detection Results for Some Specific Attacks

Algorithms                                                      Detection rate (%)  False alarm rate (%)
Hybrid neural networks [21]                                     90–99.72            0.06–4.5
Hierarchical SOM [18]                                           99.17–99.63         0.34–1.41
Our algorithm (decision stumps + traditional online Adaboost)   99.55               0.17
Our algorithm (online GMM + our online Adaboost)                99.99               0.31

TABLE IX

Comparison Between Algorithms Tested on the KDD CUP 1999 Dataset

         Algorithms                                                Detection rate (%)  False alarm rate (%)
Offline  Clustering [3]                                            93                  10
         K-NN [20]                                                 91                  8
         SVM [20]                                                  91–98               6–10
         SOM [5]                                                   89–90.6             4.6–7.6
         Genetic clustering [17]                                   79.00               0.30
         Hierarchical SOM [18]                                     90.94–93.46         2.19–3.99
         Bagged C5 [19]                                            91.81               0.55
         Offline Adaboost [39]                                     90.04–90.88         0.31–1.79
Online   Mercer kernel ART [8]                                     90.00–94.00         2.9–3.4
         Our algorithm (decision stumps + traditional Adaboost)    90.13               2.23
         Our algorithm (online GMMs + our Adaboost)                90.61–91.15         1.17–1.69

TABLE X

Training Sets in the Six Nodes

Attack types  Node 1   Node 2   Node 3   Node 4   Node 5   Node 6
neptune       5 000    1 000    1 000    3 500    2 500    1 500
smurf         5 000    1 000    1 000    3 500    2 500    1 500
portsweep     0        8 000    0        1 500    2 500    3 500
satan         0        0        8 000    1 500    2 500    3 500
Normal        10 000   10 000   10 000   10 000   10 000   10 000

In [4], 8 min are taken for processing the training data in an eight-node cluster, where each node has dual 1 GHz Pentium III processors and 1 GB of memory, running Red Hat Linux 7.2, while 212 min are taken by the algorithm in [48]. The mean training time for the offline Adaboost is 73 s [39].

B. PSO and SVM-Based Global Models

The proposed distributed intrusion detection algorithm is tested with six nodes. The KDD CUP 1999 training dataset is split into six parts, and each part is used to construct a local detection model in a detection node. In this way, the size of the training data in each node is small, with the result that accurate local intrusion detectors cannot be found in some nodes. In each node, a global model is constructed from the six local models.

To simulate a distributed intrusion detection environment, four kinds of attacks, neptune, smurf, portsweep, and satan, in the KDD CUP 1999 training dataset are used for constructing the local detection models, as samples of these four kinds make up 98.46% of all the samples in the KDD training dataset. Table X shows the training sets used for constructing the global models in the six nodes. It is seen that the sizes of the training sets are comparatively small. Node 1 has no portsweep or satan attack samples; Node 2 has no satan samples; and Node 3 has no portsweep attack samples.

TABLE XI

Test Set in Each Node

neptune  smurf    portsweep  satan   Normal
58 001   164 091  10 413     15 892  60 593

Table XI shows the test set used in all the nodes.

In the experiments, the inertia weight w in (37) decreases linearly from 1.4 to 0.4 according to the following equation:

    w = (1.4 − 0.4) · (Titer − Iter) / Titer + 0.4                    (40)

where Titer is the predefined maximum number of iterations and Iter is the current iteration number.
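A small sketch of the schedule in (40); the function and argument names are ours, with the initial and final weights 1.4 and 0.4 taken from the text above.

    def inertia_weight(iteration, total_iters, w_init=1.4, w_end=0.4):
        """Linearly decreasing inertia weight of (40)."""
        return (w_init - w_end) * (total_iters - iteration) / total_iters + w_end

    for it in (0, 50, 100):
        print(it, inertia_weight(it, 100))  # 1.4 at the start, 0.9 midway, 0.4 at the end

A large early weight favors global exploration by the particle swarm; the decay toward 0.4 shifts the search to local refinement near convergence.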

Tables XII and XIII show, for our algorithm with decision stumps and the traditional online Adaboost and for our algorithm with online GMMs and our online Adaboost, respectively, the detection rates and false alarm rates of the local models and of the PSO and SVM-based global models, as well as those obtained by combining the local models using the SVM algorithm, the sum rule, and the majority vote rule. From the tables, the following three points are deduced.

1) It is seen that, overall, our PSO and SVM-based combination algorithm greatly increases the detection rate and decreases the false alarm rate for each node, after the local models, which contain only a small number of parameters, are shared.


TABLE XII

Results for the Six Local Detection Nodes Using Our Algorithm With Decision Stumps and the Traditional Online Adaboost

Methods        Performance           Node 1  Node 2  Node 3  Node 4  Node 5  Node 6  Average
Local model    Detection rate (%)    99.48   99.96   99.98   97.01   97.13   99.96   98.92
               False alarm rate (%)  24.07   9.64    18.45   0.42    0.49    8.28    10.23
PSO-SVM        Detection rate (%)    99.59   99.70   99.67   99.78   99.65   99.74   99.69
               False alarm rate (%)  0.38    0.44    0.43    0.44    0.41    0.48    0.43
SVM            Detection rate (%)    99.76   99.78   97.13   99.78   99.78   97.13   98.89
               False alarm rate (%)  0.44    0.44    0.42    0.44    0.45    0.42    0.44
Sum rule       Detection rate (%)    99.96   99.96   99.96   99.96   99.96   99.96   99.96
               False alarm rate (%)  1.98    1.98    1.98    1.98    1.98    1.98    1.98
Majority vote  Detection rate (%)    99.96   99.96   99.96   99.96   99.96   99.96   99.96
               False alarm rate (%)  7.95    7.95    7.95    7.95    7.95    7.95    7.95

TABLE XIII

Results for the Six Local Detection Nodes Using Our Algorithm With Online GMMs and Our Online Adaboost

Methods        Performance           Node 1  Node 2  Node 3  Node 4  Node 5  Node 6  Average
Local model    Detection rate (%)    96.19   33.66   33.90   99.72   99.82   33.77   66.18
               False alarm rate (%)  0.35    0.39    0.56    0.34    0.37    0.40    0.40
PSO-SVM        Detection rate (%)    99.61   99.64   99.89   99.84   99.84   99.83   99.78
               False alarm rate (%)  0.30    0.31    0.39    0.35    0.34    0.33    0.34
SVM            Detection rate (%)    97.46   99.76   99.85   99.83   99.84   99.85   99.43
               False alarm rate (%)  0.31    0.35    0.36    0.33    0.34    0.34    0.34
Sum rule       Detection rate (%)    99.76   99.76   99.76   99.76   99.76   99.76   99.76
               False alarm rate (%)  0.37    0.37    0.37    0.37    0.37    0.37    0.37
Majority vote  Detection rate (%)    33.79   33.79   33.79   33.79   33.79   33.79   33.79
               False alarm rate (%)  0.34    0.34    0.34    0.34    0.34    0.34    0.34


2) In Node 2, the detection rate of the PSO-SVM method is slightly lower than that of the local model, but its false alarm rate is much lower. In Node 4, the false alarm rate of the PSO-SVM method is slightly higher than that of the local model, but its detection rate is higher. Thus, the PSO-SVM method is more accurate than the local models in Nodes 2 and 4.

3) Our PSO and SVM-based algorithm is superior to the SVM algorithm, the sum rule, and the majority vote rule for combining local detection models. The performance disparities between different local models mean that the sum rule and the majority vote rule are not suitable for distributed intrusion detection. The SVM may not find the optimal local model combination to improve the performance. By dynamically combining a small portion of the local models to obtain the global model, our PSO and SVM-based algorithm effectively chooses the optimal local model combination, reduces the time consumed in detecting intrusions, and thus achieves better performance. The two baseline fusion rules are sketched below.
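For reference, the baseline fusion rules can be sketched as follows, assuming each local model outputs a real-valued score whose sign gives the class (−1 attack, +1 normal); this scoring interface is an illustrative assumption, not the paper's parameterization.

    import numpy as np

    def sum_rule(scores):
        """Average the local model scores and take the sign."""
        return np.sign(np.mean(scores, axis=0))

    def majority_vote(scores):
        """Each local model casts a hard vote; ties come out as 0 and
        would need a tie-break rule in practice."""
        return np.sign(np.sum(np.sign(scores), axis=0))

    # One row per local model, one column per test connection.
    scores = np.array([[ 0.9, -0.2],
                       [ 0.8, -0.7],
                       [-0.1, -0.4]])
    print(sum_rule(scores))       # [ 1. -1.]
    print(majority_vote(scores))  # [ 1. -1.]

Because every node applies the same fixed rule, neither baseline can down-weight a weak local model, which is precisely what the PSO-SVM combination does.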

As stated in the last paragraph of Section IV, a uniform global model can be constructed for all the local nodes if small sets of samples are shared among the nodes, sacrificing the privacy of the raw data. We also test the uniform global model using the dataset of the four kinds of attacks, neptune, smurf, portsweep, and satan, in the KDD CUP 1999 set.

TABLE XIV

Attack Types Used to Train the Local Models in Each Node

Local models  Types of attack
Node 1        neptune
Node 2        smurf
Node 3        portsweep
Node 4        satan
Node 5        neptune, smurf
Node 6        portsweep, satan

The training set used for constructing the global model contains only 4000 randomly chosen samples, and the test set for the global model contains 284 672 samples of the four types of attacks and all the normal samples. Table XIV shows the attack types used to train the local models in each node. It is seen that each node has only one or two types of attack. Tables XV and XVI show, for our algorithm with decision stumps and the traditional online Adaboost and for our algorithm with online GMMs and our online Adaboost, respectively, the detection rates and false alarm rates of the local models in each node and of the PSO and SVM-based global model, as well as those obtained by combining the local models using the sum rule and the SVM algorithm. It is seen that our PSO and SVM-based algorithm greatly increases the detection accuracy in each node and obtains more accurate results than the sum rule and the SVM algorithm.

The distributed outlier detection algorithm in [4] has a detection rate of 95% and a false alarm rate of 0.35% on the KDD CUP 1999 training dataset; however, the algorithm in [4] requires a data preprocessing procedure in which the


TABLE XV

Results of the Uniform Global Model Based on Our Algorithm With Decision Stumps and the Traditional Online Adaboost When Some Samples Are Shared Among Nodes

Detection models          Detection rate (%)  False alarm rate (%)
Local models   Node 1     26.47               0.12
               Node 2     73.20               0.22
               Node 3     2.19                0.10
               Node 4     8.77                0.01
               Node 5     99.62               0.32
               Node 6     5.32                0.11
Global model (PSO-SVM)    99.99               1.35
Sum rule                  9.10                0.01
SVM                       98.98               1.41

TABLE XVI

Results of the Uniform Global Model Based on Our Algorithm With Online GMMs and Our Online Adaboost When Some Samples Are Shared Among Nodes

Detection models          Detection rate (%)  False alarm rate (%)
Local models   Node 1     26.48               0.08
               Node 2     70.16               0.01
               Node 3     7.92                0.17
               Node 4     0.81                0.01
               Node 5     99.62               0.21
               Node 6     26.77               1.80
Global model (PSO-SVM)    99.99               0.37
Sum rule                  26.37               0.01
SVM                       99.99               0.39

attack data are packed into bursts. An intrusion is marked as detected if at least one instance in a burst is flagged as an outlier. Furthermore, the high communication cost and the need to share a large number of the original network data confine the application of the algorithm in [4] to network intrusion detection. In our proposed algorithm, no assumptions are made about bursts of attacks in the training data, no private data are shared among the nodes, and the only communication consists of the concise local model parameters.

VI. Conclusion

In this paper, we proposed online Adaboost-based intrusion detection algorithms, in which decision stumps and online GMMs were used as weak classifiers for the traditional online Adaboost and our proposed online Adaboost, respectively. The results of the algorithm using decision stumps and the traditional online Adaboost were compared with the results of the algorithm using online GMMs and our online Adaboost. We further proposed a distributed intrusion detection framework, in which the parameters in the online Adaboost algorithm formed the local detection model for each node, and the local models were combined into a global detection model in each node using a PSO and SVM-based algorithm. The advantages of our work are as follows: 1) our online Adaboost-based algorithms successfully overcame the difficulties in handling the mixed attributes of network connection data; 2) the online mode in our algorithms ensured their adaptability to changing environments: the information in new samples was incorporated online into the classifier, while high detection accuracy was maintained; 3) our local parameterized detection models were suitable for information sharing: only a very small number of data were shared among nodes; 4) no original network data were shared in the framework, so that data privacy was protected; and 5) each global detection model considerably improved the intrusion detection accuracy for each node.

References

[1] D. Denning, "An intrusion detection model," IEEE Trans. Softw. Eng., vol. SE-13, no. 2, pp. 222–232, Feb. 1987.

[2] J. B. D. Caberera, B. Ravichandran, and R. K. Mehra, "Statistical traffic modeling for network intrusion detection," in Proc. Modeling, Anal. Simul. Comput. Telecommun. Syst., 2000, pp. 466–473.

[3] W. Lee, S. J. Stolfo, and K. Mok, "A data mining framework for building intrusion detection models," in Proc. IEEE Symp. Security Privacy, May 1999, pp. 120–132.

[4] M. E. Otey, A. Ghoting, and S. Parthasarathy, "Fast distributed outlier detection in mixed-attribute data sets," Data Mining Knowl. Discovery, vol. 12, no. 2–3, pp. 203–228, May 2006.

[5] H. G. Kayacik, A. N. Zincir-Heywood, and M. T. Heywood, "On the capability of an SOM based intrusion detection system," in Proc. Int. Joint Conf. Neural Netw., vol. 3, Jul. 2003, pp. 1808–1813.

[6] P. Z. Hu and M. I. Heywood, "Predicting intrusions with local linear model," in Proc. Int. Joint Conf. Neural Netw., vol. 3, Jul. 2003, pp. 1780–1785.

[7] Z. Zhang and H. Shen, "Online training of SVMs for real-time intrusion detection," in Proc. Adv. Inform. Netw. Appl., vol. 2, 2004, pp. 568–573.

[8] H. Lee, Y. Chung, and D. Park, "An adaptive intrusion detection algorithm based on clustering and kernel-method," in Proc. Int. Conf. Adv. Inform. Netw. Appl., 2004, pp. 603–610.

[9] W. Lee and S. J. Stolfo, "A framework for constructing features and models for intrusion detection systems," ACM Trans. Inform. Syst. Security, vol. 3, no. 4, pp. 227–261, Nov. 2000.

[10] A. Fern and R. Givan, "Online ensemble learning: An empirical study," in Proc. Int. Conf. Mach. Learning, 2000, pp. 279–286.

[11] J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 3, pp. 226–238, Mar. 1998.

[12] J. Kennedy, "Particle swarm optimization," in Proc. IEEE Int. Conf. Neural Netw., 1995, pp. 1942–1948.

[13] Y. Shi and R. C. Eberhart, "A modified particle swarm optimizer," in Proc. IEEE Int. Conf. Evolut. Comput., 1998, pp. 69–73.

[14] S. Stolfo et al., The Third International Knowledge Discovery and Data Mining Tools Competition, The University of California, 2002 [Online]. Available: http://kdd.ics.uci.edu/databases/kddCup99/kddCup99.html

[15] S. Mukkamala, A. H. Sung, and A. Abraham, "Intrusion detection using an ensemble of intelligent paradigms," Netw. Comput. Appl., vol. 28, no. 2, pp. 167–182, Apr. 2005.

[16] R. Lippmann, J. W. Haines, D. J. Fried, J. Korba, and K. Das, "The 1999 DARPA off-line intrusion detection evaluation," Comput. Netw., vol. 34, no. 4, pp. 579–595, Oct. 2000.

[17] Y. G. Liu, K. F. Chen, X. F. Liao, and W. Zhang, "A genetic clustering method for intrusion detection," Pattern Recogn., vol. 37, no. 5, pp. 927–942, May 2004.

[18] S. T. Sarasamma, Q. A. Zhu, and J. Huff, "Hierarchical Kohonen net for anomaly detection in network security," IEEE Trans. Syst., Man, Cybern., Part B: Cybern., vol. 35, no. 2, pp. 302–312, Apr. 2005.

[19] B. Pfahringer, "Winning the KDD99 classification cup: Bagged boosting," SIGKDD Explorations, vol. 1, no. 2, pp. 65–66, 2000.

[20] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, "A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data," in Applications of Data Mining in Computer Security, D. Barbara and S. Jajodia, Eds. Norwell, MA, USA: Kluwer, 2002, ch. 4.

[21] C. Jirapummin, N. Wattanapongsakorn, and P. Kanthamanon, "Hybrid neural networks for intrusion detection systems," in Proc. Int. Tech. Conf. Circuits/Syst., Comput. Commun., Jul. 2002, pp. 928–931.

[22] J. Li and C. Manikopoulos, "Novel statistical network model: The hyperbolic distribution," IEE Proc. Commun., vol. 151, no. 6, pp. 539–548, Dec. 2004.


[23] T. Peng, C. Leckie, and K. Ramamohanarao, "Information sharing for distributed intrusion detection systems," J. Netw. Comput. Appl., vol. 30, no. 3, pp. 877–899, Aug. 2007.

[24] M. Qin and K. Hwang, "Frequent episode rules for internet anomaly detection," in Proc. IEEE Int. Symp. Netw. Comput. Appl., 2004, pp. 161–168.

[25] S. Zanero and S. M. Savaresi, "Unsupervised learning techniques for an intrusion detection system," in Proc. ACM Symp. Appl. Comput., 2004, pp. 412–419.

[26] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M. Embrechts, "Network-based intrusion detection using neural networks," in Proc. Artif. Neural Netw. Eng., vol. 12, Nov. 2002, pp. 579–584.

[27] J. J. M. Bonifacio, A. M. Cansian, A. C. P. L. F. De Carvalho, and E. S. Moreira, "Neural networks applied in intrusion detection systems," in Proc. IEEE Int. Joint Conf. Neural Netw., vol. 1, 1998, pp. 205–210.

[28] C. Zhang, J. Jiang, and M. Kamel, "Intrusion detection using hierarchical neural networks," Pattern Recogn. Lett., vol. 26, no. 6, pp. 779–791, 2005.

[29] S. J. Han and S. B. Cho, "Evolutionary neural networks for anomaly detection based on the behavior of a program," IEEE Trans. Syst., Man, Cybern., Part B: Cybern., vol. 36, no. 3, pp. 559–570, Jun. 2006.

[30] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection using neural networks and support vector machines," in Proc. Int. Joint Conf. Neural Netw., vol. 2, 2002, pp. 1702–1707.

[31] J. Mill and A. Inoue, "Support vector classifiers and network intrusion detection," in Proc. Int. Conf. Fuzzy Syst., vol. 1, 2004, pp. 407–410.

[32] J. Xian, F. Lang, and X. Tang, "A novel intrusion detection method based on clonal selection clustering algorithm," in Proc. Int. Conf. Mach. Learning Cybern., vol. 6, 2005, pp. 3905–3910.

[33] S. Jiang, X. Song, H. Wang, J. Han, and Q. Li, "A clustering-based method for unsupervised intrusion detections," Pattern Recogn. Lett., vol. 27, no. 7, pp. 802–810, May 2006.

[34] A. J. Hoglund, K. Hatonen, and A. S. Sorvari, "A computer host-based user anomaly detection system using the self-organizing map," in Proc. Int. Joint Conf. Neural Netw., vol. 5, 2000, pp. 411–416.

[35] D. Song, M. I. Heywood, and A. N. Zincir-Heywood, "Training genetic programming on half a million patterns: An example from anomaly detection," IEEE Trans. Evolut. Comput., vol. 9, no. 3, pp. 225–239, Jun. 2005.

[36] S. Parthasarathy, A. Ghoting, and M. E. Otey, "A survey of distributed mining of data streams," in Data Streams: Models and Algorithms, C. C. Aggarwal, Ed. New York: Springer, Nov. 2006.

[37] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, Aug. 1997.

[38] N. Oza, "Online ensemble learning," Ph.D. thesis, Univ. California, Berkeley, 2001.

[39] W. M. Hu, W. Hu, and S. Maybank, "Adaboost-based algorithm for network intrusion detection," IEEE Trans. Syst., Man, Cybern., Part B: Cybern., vol. 38, no. 2, pp. 577–583, Apr. 2008.

[40] J. Zhang, M. Zulkernine, and A. Haque, "Random-forests-based network intrusion detection systems," IEEE Trans. Syst., Man, Cybern., Part C: Appl. Rev., vol. 38, no. 5, pp. 649–659, Sep. 2008.

[41] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc., Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.

[42] H. Grabner and H. Bischof, "On-line boosting and vision," in Proc. IEEE Conf. Comput. Vision Pattern Recogn., 2006, pp. 260–267.

[43] N. Oza and S. Russell, "Experimental comparisons of online and batch versions of bagging and boosting," in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, 2001, pp. 359–364.

[44] Y. Lei, X. Q. Ding, and S. J. Wang, "Visual tracker using sequential Bayesian learning: Discriminative, generative and hybrid," IEEE Trans. Syst., Man, Cybern., Part B: Cybern., vol. 38, no. 6, pp. 1578–1591, Dec. 2008.

[45] J. McHugh, "Testing intrusion detection systems: A critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory," ACM Trans. Inform. Syst. Security, vol. 3, no. 4, pp. 262–294, Nov. 2000.

[46] J. Z. Lei and A. Ghorbani, "Network intrusion detection using an improved competitive learning neural network," in Proc. Second Annu. Conf. Commun. Netw. Services Res., vol. 4, May 2004, pp. 190–197.

[47] C. Elkan, "Results of the KDD99 classifier learning contest," SIGKDD Explorations, vol. 1, no. 2, pp. 63–64, 2000.

[48] S. Bay and M. Schwabacher, "Mining distance-based outliers in near linear time with randomization and a simple pruning rule," in Proc. ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, 2003, pp. 29–38.

[49] S. Mabu, C. Chen, N. Lu, K. Shimada, and K. Hirasawa, "An intrusion-detection model based on fuzzy class-association-rule mining using genetic network programming," IEEE Trans. Syst., Man, Cybern., Part C: Appl. Rev., vol. 41, no. 1, pp. 130–139, Jan. 2011.

[50] S. T. Brugger and J. Chow, "An assessment of the DARPA IDS evaluation dataset using Snort," Univ. California, Tech. Rep. CSE-2007-1, Jan. 2007.

[51] S. Panigrahi and S. Sural, "Detection of database intrusion using a two-stage fuzzy system," in Information Security, LNCS 5735, 2009, pp. 107–120.

[52] D. Smallwood and A. Vance, "Intrusion analysis with deep packet inspection: Increasing efficiency of packet based investigations," in Proc. Int. Conf. Cloud Service Comput., Dec. 2011, pp. 342–347.

[53] S. Mabu, C. Chen, N. Lu, K. Shimada, and K. Hirasawa, "An intrusion-detection model based on fuzzy class-association-rule mining using genetic network programming," IEEE Trans. Syst., Man, Cybern., Part C: Appl. Rev., vol. 41, no. 1, pp. 130–139, Jan. 2011.

[54] C. Otte and C. Stormann, "Improving the accuracy of network intrusion detectors by input-dependent stacking," Integrated Computer-Aided Eng., vol. 18, no. 3, pp. 291–297, 2011.

[55] K.-C. Khor, C.-Y. Ting, and S. Phon-Amnuaisuk, "A cascaded classifier approach for improving detection rates on rare attack categories in network intrusion detection," Appl. Intell., vol. 36, no. 2, pp. 320–329, 2012.

[56] B. Zhang, "A heuristic genetic neural network for intrusion detection," in Proc. Int. Conf. Internet Comput. Inform. Services, Sep. 2011, pp. 510–513.

[57] C.-F. Tsai, J.-H. Tsai, and J.-S. Chou, "Centroid-based nearest neighbor feature representation for e-government intrusion detection," in Proc. World Telecommun. Congr., Mar. 2012, pp. 1–6.

[58] J. Gao, W. Hu, X. Zhang, and X. Li, "Adaptive distributed intrusion detection using parametric model," in Proc. IEEE/WIC/ACM Int. Joint Conf. Web Intell. Intelligent Agent Technol., vol. 1, Sep. 2009, pp. 675–678.

[59] P. Prasenna, A. V. T. RaghavRamana, R. KrishnaKumar, and A. Devanbu, "Network programming and mining classifier for intrusion detection using probability classification," in Proc. Int. Conf. Pattern Recogn., Informatics Med. Eng., Mar. 2012, pp. 204–209.

[60] K. Rieck, "Machine learning for application-layer intrusion detection," Dissertation, Fraunhofer Inst. FIRST & Berlin Inst. Technol., Berlin, Germany, 2009.

Weiming Hu received the Ph.D. degree from the Department of Computer Science and Engineering, Zhejiang University, Hangzhou, China, in 1998.

From April 1998 to March 2000, he was a Post-Doctoral Research Fellow with the Institute of Computer Science and Technology, Peking University, Beijing, China. He is currently a Professor with the Institute of Automation, Chinese Academy of Sciences, Beijing. His current research interests include visual surveillance and filtering of Internet objectionable information.

Jun Gao received the B.S. degree in automation engineering from Beihang University, Beijing, China, in 2007, and the Ph.D. degree in computer science from the Institute of Automation, Chinese Academy of Sciences, Beijing, in 2012.

His current research interests include machine learning, data mining, and network information security.


Yanguo Wang received the B.Sc. degree in information and computing science from Sun Yat-sen University, Guangzhou, China, in 2005, and the M.Sc. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2008.

He is currently a Project Manager with the Institute of Infrastructure Inspection, China Academy of Railway Sciences, Beijing. His current research interests include pattern recognition, computer vision, and data mining.

Ou Wu received the B.S. degree in electrical engineering from Xi'an Jiaotong University, Shaanxi, China, in 2003, and the M.S. and Ph.D. degrees, both in pattern recognition and machine intelligence, from the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2006 and 2011, respectively.

He is currently an Assistant Professor with NLPR. His current research interests include information filtering, data mining, and web vision computing.

Stephen Maybank received the B.A. degree in mathematics from King's College, Cambridge, U.K., in 1976, and the Ph.D. degree in computer science from Birkbeck College, University of London, London, U.K., in 1988.

He is currently a Professor with the School of Computer Science and Information Systems, Birkbeck College. His current research interests include the geometry of multiple images, camera calibration, and visual surveillance.