Efficient Probabilistic Classification Methods for NIDS


(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010, ISSN 1947-5500, http://sites.google.com/site/ijcsis/

S.M. Aqil Burney, Department of Computer Science, University of Karachi, Karachi, Pakistan
M. Sadiq Ali Khan, Department of Computer Science, University of Karachi, Karachi, Pakistan
Jawed Naseem, Principal Scientific Officer, PARC

Abstract: As technology improves, attackers try to gain access to network system resources by many means, and open loopholes in the network allow them to penetrate it more easily. Various approaches have been tried for the classification of attacks. In this paper we compare two methods, Naïve Bayes and the Junction Tree Algorithm, on a reduced set of features, improving performance compared with the full data set. PCA is used for feature reduction, which helped in proposing a new method for efficient classification. We propose a Bayesian network-based model with a reduced set of features for intrusion detection. The proposed method generates a lower false positive rate, which increases detection efficiency by reducing the workload and thereby improves the overall performance of an IDS. We also investigate whether conditional independence really affects the detection of attacks and threats.

Keywords-Network Intrusion Detection System (NIDS); Bayesian Networks; Junction Tree Algorithm

I.  INTRODUCTION

Network security, whether in a commercial organization or in a critically important research network, is a major concern; with the increasing use of the web, even personal information is under threat. An efficient network intrusion detection system is the only solution to such threats [4].

An IDS is a system that monitors networks in order to protect them from cyber terrorists; equivalently, intrusion detection is the process of examining the events occurring in a network or computer system and detecting signs of incidents that threaten computer security policies. The IDS monitors the network system for any rule violation. When such a violation occurs, an efficient IDS raises an alarm that alerts the administrator to take measures against the vulnerability. Common intrusion attacks are classified on the basis of various features or parameters. The KDD-99 data set is usually used for investigating the nature of attacks. The data set lists 41 features. The information value of these features and the interdependence among them are of interest for investigation. How far can the number of features be reduced without reducing the efficiency of the classification algorithm, and does interdependency really contribute to detection efficiency? We try to answer such questions in this paper. PCA is an effective technique for reducing data dimension. Naïve Bayes classifiers and Bayesian networks both use a probabilistic approach to determine the probability of attack. Naïve Bayes classifiers assume conditional independence, while Bayesian networks assume conditional dependence. The two methods can therefore be used to compare whether conditional independence or interdependency really contributes to the probability of attack. In the next section we discuss related work; in Section 3 we discuss the two classification methods; in Section 4 we describe the methodology; and finally in Section 5 we present results and discussion.

II.  BACKGROUND 

Most network-based systems become targets for hackers, so building an efficient IDS is the main task nowadays [4]. Intrusion-based systems need a component that generates alerts on the basis of a rule set; to detect malicious activity correctly it is necessary to manage the alerts correctly [1]. Researchers are applying data mining approaches to attack detection in their intrusion detection systems [2]. Probabilistic approaches for reducing the false alarm rate have been proposed; for example, see [3]. An enormous amount of network traffic data accumulates each day. A number of data mining approaches are used to collect domain knowledge for intrusion detection, including clustering, association rules, and classification [12]. Data analysis is supported by data mining techniques, which have become one of the important components of intrusion-based systems. The main concern in using data mining techniques in an attack detection system is to differentiate between normal and abnormal packets. To apply data mining to intrusion detection we need a data set and a classification model. That classification model may be a Bayesian network, a neural network, a rule-based or decision-tree-based model, or another soft computing technique such as Support Vector Machines (SVM) [10,11]. An intrusion detection system has now become a necessity for an organization's security system, and its credibility may depend on the data mining techniques used.

2.1  Clustering

The process of labeling data and arranging it in groups is called clustering. Grouping improves the performance of the different classifiers used. A genuine cluster contains data corresponding to a single category [5]. The data belonging to a cluster is modeled with respect to its existing features. Clustering can also be defined as an unsupervised machine learning mechanism for finding patterns in unlabeled data with numerous aspects.

2.2  Classification

In classification we break the data set into different classes; it is much less exploratory than clustering. Here the data must be classified into normal and not normal, and the not-normal class sub-classified into different attack types. Naïve Bayes is used as a classification algorithm in this research to achieve data classification for intrusion detection. Because it requires collecting a huge amount of traffic data, classification is less popular [6].

III.  CLASSIFICATION METHODS 

3.1 Naïve Bayes Classifier 

The Naïve Bayes classifier is an effective technique for classifying data, and it is particularly useful when the data dimension is large. Naïve Bayes is a special case of Bayes' theorem that presupposes independence among the data attributes [7]. Even though Naïve Bayes assumes independence, its performance is efficient and on par with techniques that assume conditional dependence in the data. A Naïve Bayes classifier can handle continuous or categorical data. Consider a set of given variables $X = \{x_1, x_2, \ldots, x_n\}$ with possible outcomes $O = \{o_1, o_2, \ldots, o_m\}$. The posterior probability of the dependent variable is obtained by Bayes' rule:

$$P(o_j \mid x_1, x_2, \ldots, x_n) \propto P(x_1, x_2, \ldots, x_n \mid o_j)\, P(o_j)$$

Under the independence assumption the likelihood factorizes, so a new case $X$ is assigned the class label $o_j$ that has the highest posterior probability:

$$P(o_j \mid X) \propto P(o_j) \prod_{k=1}^{n} P(x_k \mid o_j)$$

The efficiency of the Naïve Bayes classifier lies in the fact that it reduces a multidimensional density estimation problem to a set of one-dimensional density estimates. The probability of the evidence, $P(x_1, x_2, \ldots, x_n)$, is the same for every class and does not affect which class has the highest posterior probability, so the classification task is generally efficient. This is also borne out in this study, where the Naïve Bayes classifier is compared with the Junction Tree algorithm. For modeling a Naïve Bayes classifier, several distributions can be employed, including the normal, gamma, or Poisson density functions.
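The decision rule above can be illustrated with a small categorical Naïve Bayes sketch. This is not the BN classifier software used in the paper; the Laplace smoothing constant `alpha`, the function names, and the toy records (a protocol value and a connection flag) are assumptions made only for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate class counts and per-feature value counts.

    rows   : list of tuples of categorical feature values
    labels : list of class labels, one per row
    alpha  : Laplace smoothing constant (an assumption of this sketch)
    """
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[(label, k)][v] = rows of class `label` whose k-th feature is v
    value_counts = defaultdict(Counter)
    # distinct values seen for each feature, needed for smoothing
    vocab = [set() for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for k, value in enumerate(row):
            value_counts[(label, k)][value] += 1
            vocab[k].add(value)
    return class_counts, value_counts, vocab, alpha


def log_posteriors(model, row):
    """Unnormalized log P(o_j) + sum_k log P(x_k | o_j) for every class."""
    class_counts, value_counts, vocab, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = math.log(count / total)
        for k, value in enumerate(row):
            numerator = value_counts[(label, k)][value] + alpha
            denominator = count + alpha * len(vocab[k])
            score += math.log(numerator / denominator)
        scores[label] = score
    return scores


def predict(model, row):
    """Return the class with the highest posterior score."""
    scores = log_posteriors(model, row)
    return max(scores, key=scores.get)


# Toy records (hypothetical): (protocol, flag) -> class label
rows = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
        ("tcp", "S0"), ("tcp", "S0"), ("icmp", "SF")]
labels = ["normal", "normal", "normal", "neptune", "neptune", "smurf"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("tcp", "S0")))
```

Working in log space avoids numerical underflow when many per-feature probabilities are multiplied, which matters once the number of features grows.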

 3.2  Junction Tree Algorithm 

It is a graphical method for belief updating, or probabilistic reasoning. For probabilistic reasoning we use Bayesian Networks and Decision Graphs (BNDG), details of which can be found in [9]. The basic concept in a junction tree is the clustering of predicted attributes [8]. In belief updating, instead of approximating the joint probability distribution of all targeted variables, clusters of attributes (cliques) are formed and the potentials of the clusters are used to approximate the probability. A junction tree is thus a graphical representation of potential cluster nodes, or cliques, together with a suitable algorithm for updating these potentials. The junction tree algorithm involves several steps: moralizing the graph, triangulation, forming the junction tree, assigning probabilities to cliques, message passing, and reading the cliques' marginal potentials from the junction tree.

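Using the junction tree algorithm requires that the directed graph be changed to an undirected graph to ensure uniform application. This process is called moralization; it involves adding edges between parents that share a child and dropping the direction of every edge. Let $G = (N_G, E_G)$ be a directed graph to be changed into an undirected graph. In effect, two new edge sets are required, i.e.

$$E^{u} = \{\{u, v\} : (u, v) \in E_G\}$$

and

$$E^{m} = \{\{u, v\} : u \neq v,\ \exists\, w \in N_G \text{ with } (u, w) \in E_G \text{ and } (v, w) \in E_G\}$$

The set $E^{m}$ can be defined as the set of edges joining every pair of parents of a common child. In moralization $E^{u} \cup E^{m}$ is obtained, and the new undirected moralized graph is given as

$$G^{m} = (N_G,\ E^{u} \cup E^{m})$$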
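Moralization as described here can be sketched in a few lines. The function name and the small example network are hypothetical; the example only loosely echoes the attributes count and src_bytes mentioned later in the paper and is not the learned structure.

```python
from itertools import combinations


def moralize(nodes, directed_edges):
    """Return the edges of the moral graph as a set of frozensets.

    nodes          : iterable of node names
    directed_edges : iterable of (parent, child) pairs
    """
    directed_edges = list(directed_edges)
    parents = {node: set() for node in nodes}
    for parent, child in directed_edges:
        parents[child].add(parent)

    # Drop directions: every directed edge becomes an undirected one.
    undirected = {frozenset(edge) for edge in directed_edges}

    # "Marry" the parents: join every pair of parents of a common child.
    moral = set()
    for child, pa in parents.items():
        for u, v in combinations(sorted(pa), 2):
            moral.add(frozenset((u, v)))

    return undirected | moral


# Hypothetical structure: two attributes are parents of the attack node.
nodes = ["protocol", "count", "src_bytes", "attack"]
edges = [("protocol", "count"), ("count", "attack"), ("src_bytes", "attack")]
moral_edges = moralize(nodes, edges)
print(sorted(tuple(sorted(e)) for e in moral_edges))
```

In this example the only added edge joins count and src_bytes, because they are the two parents of attack.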

The junction tree is formed after moralization; it is essentially a hypergraph of cliques. If the cliques of the undirected graph G are given by C(G), then the junction tree is a tree over C(G) with the unique property that the intersection of any two nodes is contained in every node on the unique path joining those nodes.
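The junction tree property stated above can be checked mechanically for a candidate tree of cliques. The following is a minimal sketch; the function names and the example cliques are assumptions for illustration, and the input is assumed to be a connected tree.

```python
def path_in_tree(adjacency, start, goal):
    """Return the unique path between two nodes of a tree (depth-first search)."""
    stack = [(start, [start])]
    visited = {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for neighbor in adjacency[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                stack.append((neighbor, path + [neighbor]))
    return None


def has_junction_tree_property(cliques, tree_edges):
    """Check that the intersection of any two cliques lies in every clique
    on the path joining them.

    cliques    : dict mapping a clique name to a set of variables
    tree_edges : list of (clique name, clique name) pairs forming a tree
    """
    adjacency = {name: [] for name in cliques}
    for a, b in tree_edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    names = list(cliques)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = cliques[a] & cliques[b]
            path = path_in_tree(adjacency, a, b)
            if path is None or any(not shared <= cliques[c] for c in path):
                return False
    return True


good = {"C1": {"A", "B"}, "C2": {"B", "C"}, "C3": {"C", "D"}}
bad = {"C1": {"A", "B"}, "C2": {"C", "D"}, "C3": {"B", "C"}}
chain = [("C1", "C2"), ("C2", "C3")]
print(has_junction_tree_property(good, chain))
print(has_junction_tree_property(bad, chain))
```

In the second arrangement, C1 and C3 share B, but the clique between them does not contain B, so the property fails.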

Consider a cluster representation with two neighboring clusters U and V that share a variable S in common, with potentials $\Psi(U)$, $\Psi(V)$ and $\Psi(S)$.

The aim of JTA is to modify the potentials in such a way that the distribution $P(V)$ is obtained from the modified potential $\Psi(V)$. In that case the probability of S is given by

$$P(S) = \sum_{V \setminus S} \Psi(V)$$

Similarly,

$$P(S) = \sum_{U \setminus S} \Psi(U)$$

Let $\Psi(S)$ represent the modified potential, so that $\Psi(S) = P(S)$. If the potential $\Psi(V)$, say, is changed as a result of new evidence, the potentials $\Psi(S)$ and $\Psi(U)$ can both be updated by enforcing the equivalence

$$\sum_{U \setminus S} \Psi(U) = P(S) = \sum_{V \setminus S} \Psi(V)$$

Belief updating in a junction tree is carried out through message passing. Let V and W be two adjacent nodes with separator S. The task is for W to absorb from V through S, that is, to find new potentials $\Psi^*(W)$ and $\Psi^*(S)$ satisfying the condition

$$\sum_{W \setminus S} \Psi^*(W) = \Psi^*(S) = \sum_{V \setminus S} \Psi^*(V)$$

In absorption, $\Psi^*(S)$ and $\Psi^*(W)$ are replaced as follows:

$$\Psi^*(S) = \sum_{V \setminus S} \Psi(V)$$

$$\Psi^*(W) = \Psi(W)\, \frac{\Psi^*(S)}{\Psi(S)}$$

In this way the belief of the whole network is updated through message passing.
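A single absorption step can be sketched with potentials stored as dictionaries keyed by value tuples. The representation, the function names, and the numbers below are assumptions chosen for illustration (a chain A, S, B with binary variables); they are not taken from the paper's network.

```python
def marginalize(potential, variables, keep):
    """Sum a potential over all variables that are not in `keep`."""
    index = [variables.index(name) for name in keep]
    result = {}
    for assignment, value in potential.items():
        key = tuple(assignment[i] for i in index)
        result[key] = result.get(key, 0.0) + value
    return result


def absorb(psi_w, vars_w, psi_v, vars_v, psi_s, vars_s):
    """W absorbs from V through the separator S.

    Returns the updated potentials (psi_w_star, psi_s_star).
    """
    # New separator potential: marginal of the V potential onto S.
    psi_s_star = marginalize(psi_v, vars_v, vars_s)
    index = [vars_w.index(name) for name in vars_s]
    psi_w_star = {}
    for assignment, value in psi_w.items():
        key = tuple(assignment[i] for i in index)
        old = psi_s[key]
        # Convention used in this sketch: 0 / 0 is treated as 0.
        ratio = psi_s_star[key] / old if old != 0 else 0.0
        psi_w_star[assignment] = value * ratio
    return psi_w_star, psi_s_star


# Clique V = (A, S) holds P(A) * P(S | A); clique W = (S, B) holds P(B | S).
psi_v = {(0, 0): 0.54, (0, 1): 0.06, (1, 0): 0.08, (1, 1): 0.32}
psi_w = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.7}
psi_s = {(0,): 1.0, (1,): 1.0}  # separator starts as all ones

psi_w_star, psi_s_star = absorb(psi_w, ["S", "B"], psi_v, ["A", "S"], psi_s, ["S"])
print(psi_s_star)
print(psi_w_star)
```

After the step, summing the updated W potential over B reproduces the updated separator potential, which is the consistency condition stated above.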

IV. METHODOLOGY 

The KDD’99 intrusion detection data set was used. PCA was applied and 14 features were selected on the basis of the analysis. The selection of data sets for training and testing plays a vital role in the accuracy of prediction. In intrusion detection the frequency of some attacks is very large compared with others. To ensure that all attack types were included in learning, stratified random samples were drawn in proportion to each attack type. This produces better results than simple random sampling. For Naïve Bayes classification, two data sets (stratified samples of equal size, 10000 each) were used for learning and testing with the BN classifier software. For the junction tree algorithm, structure learning was carried out by drawing a random sample of 5000 from the KDD data set using Netica. Then five data sets, each of size 1000, were selected by simple random sampling; data set 1 was used for learning and for constructing the junction tree, and data sets 2 to 5 were used for testing the belief updating learned by the junction tree.
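The proportional stratified sampling described in this section can be sketched as follows. The function name, the minimum of one record per class, and the toy record counts are assumptions of this sketch; the paper does not specify how rare classes or rounding were handled, so the resulting sample size can differ slightly from the requested one.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, label_of, sample_size, seed=0):
    """Draw a sample whose class proportions follow those of `records`.

    Every class present in `records` contributes at least one record,
    so rare attack types are not lost (a choice made in this sketch).
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[label_of(record)].append(record)

    total = len(records)
    sample = []
    for label, members in strata.items():
        quota = max(1, round(sample_size * len(members) / total))
        quota = min(quota, len(members))
        sample.extend(rng.sample(members, quota))
    rng.shuffle(sample)
    return sample


# Toy population (hypothetical counts): (label, record id)
records = ([("normal", i) for i in range(900)]
           + [("neptune", i) for i in range(90)]
           + [("rootkit", i) for i in range(10)])
sample = stratified_sample(records, lambda r: r[0], 100)
print(Counter(label for label, _ in sample))
```

With simple random sampling of 100 records, a class that makes up 1% of the data would often be missing entirely; the stratified draw keeps it represented.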

V. RESULTS & DISCUSSION 

The 41 features of the KDD’99 data set were reduced to 14 features. PCA identified 12 major components with eigenvalues above the retention cutoff; more than about 80% of the variability of the data is explained by the selected features, while 98% of the variability can be explained by 24 components.

The difference in explained variability between selecting 24 and 14 features is only 18%, but the computational cost increases sharply if 24 parameters are selected, so 14 were selected to optimize processing speed. The scree plot in Figure 1 shows that the first 24 components represent 98.866% of the data and that 14 components explain 80% of the variability, which is quite sufficient; work was carried out on these components only, neglecting the other components, which seem less worthy. Structure learning also supports the selection of 14 features. The Bayesian network model shown in Figure 2 represents the interdependence among the various attributes. Two factors, count and src_byte, are mainly affected by the other features, and these two in turn affect the attack types. The KDD’99 data set classification lists 18 attack types; however, normal and neptune are the most frequent.

Figure 1: Scree Plot of attributes.
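The trade-off between the number of components and the explained variability can be made concrete with a cumulative-variance rule. The sketch below assumes the eigenvalues of the PCA are already available; the function name and the eigenvalues in the example are hypothetical and are not the values from the KDD’99 analysis.

```python
def components_for_variance(eigenvalues, target):
    """Smallest number of leading components whose eigenvalues explain
    at least `target` (a fraction between 0 and 1) of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalues, for illustration only.
eigenvalues = [4.0, 3.0, 2.0, 0.5, 0.3, 0.2]
print(components_for_variance(eigenvalues, 0.80))
print(components_for_variance(eigenvalues, 0.97))
```

Raising the target from 80% to 97% in this toy case increases the number of retained components from 3 to 5, which mirrors the kind of cost increase the text weighs when comparing 14 and 24 components.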


Figure 2: Bayesian Network Model for Intrusion Detection System

The BN classification also supports the importance of these two types, normal (0.527) and neptune (0.399), as shown in Table 2. The probabilities of buffer_overflow, imap and multihop are less than 0.001, and those of ftp_write, guess_password and load_module are close to 0. This suggests that these classes can be merged.

Figure 3: Prediction accuracy using BN Classifier (bar chart of actual vs. predicted record counts; normal: 4271 actual, 4287 predicted; attack: 3729 actual, 3713 predicted)

Figure 4 shows the predictions for the major attack categories. DoS attacks are 99.86% detected, while probe attacks are about 75% detected.

Category   Actual   Predicted
DoS          2872        2868
Probe         726         538
R2L             7           2
U2R             3           1

Figure 4: Prediction Accuracy of Major Attacks (bar chart of actual vs. predicted record counts per attack category)

The BN classifier learned the more frequent attacks more effectively. In identifying normal traffic it showed an error rate of only 0.8%, and for the most frequent attack, neptune, the error rate is 6.8% (see Table 1).

TABLE 1. ACCURACY OF CLASSIFICATION (BAYESIAN CLASSIFIER)

Class             Actual   Predicted   Diff   Error %
back                  62          62      0         0
buffer_overflow        2           0      2       100
guess_passwd           3           0      3       100
imap                   2           0      2       100
ipsweep              225         284    -59     -26.2
multihop               1           0      1       100
neptune             2630        2587     43       1.6
nmap                  96          35     61      63.5
normal              4271        4287    -16     -0.37
phf                    1           0      1       100
pod                   12           0     12       100
portsweep            186         219    -33     -17.7
rootkit                1           0      1       100
satan                219         273    -54     -24.6
smurf                168         180    -12      -7.1
teardrop              60          39     21        35
warezclient           57          34     23     40.35
warezmaster            4           0      4       100
Total               8000        8000
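The Error % column in Table 1 appears to be the signed difference between the actual and predicted counts, taken relative to the actual count. This is an inference from the table values rather than something the text states, and the function name below is hypothetical. Under that assumption, a few rows can be recomputed as follows.

```python
def error_percent(actual, predicted):
    """Signed error of the predicted count relative to the actual count, in percent."""
    return 100.0 * (actual - predicted) / actual


# (actual, predicted) counts copied from Table 1
table_1 = {
    "ipsweep": (225, 284),
    "neptune": (2630, 2587),
    "nmap": (96, 35),
    "normal": (4271, 4287),
}

for name, (actual, predicted) in table_1.items():
    print(f"{name:10s} {error_percent(actual, predicted):8.2f}")
```

A negative value means the classifier predicted more records of that class than were actually present, as with ipsweep and normal.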

TABLE 2. PROBABILITY OF ATTACK (AVERAGE)

Class             Junction Tree   Naïve Bayes Classifier      Diff
back                     0.0102                   0.0086    0.0016
buffer_overflow          0.0008                   0.001    -0.0002
imap                     0.0006                   0.0005    0.0001
ipsweep                  0.0368                   0.0368    0
multihop                 0.0002                   0         0.0002
neptune                  0.3992                   0.3936    0.0056
nmap                     0.0176                   0.0147    0.0029
normal                   0.527                    0.5432   -0.0162
Total                    1                        1


Using the junction tree algorithm, the accuracy of identification is at most 98%. The junction tree also identified neptune as the most frequent attack. The probabilities identified for the various attacks are shown in Table 2. The probability estimates of the two methods are almost equal, and a statistical comparison found no significant difference between the two methods. The frequencies of the remaining attacks are very small and their probabilities are close to zero.

[Chart: Probability of Attack by attack type (back, buffer_overflow, ftp_write, guess_passwd, imap, ipsweep, land, loadmodule, multihop, neptune, nmap, normal), comparing the average Junction Tree and Naïve Bayes probabilities; probability axis from 0 to 0.6]

VI.  CONCLUSION & FUTURE RECOMMENDATIONS  

Although Naïve Bayes classifiers assume conditional independence and the junction tree algorithm assumes interdependence among parameters, the Naïve Bayes and junction tree classifiers are almost equally effective. It is recommended that only the more frequent attacks be considered in order to achieve better performance. It was also found that using appropriate sampling techniques when selecting the learning and testing data sets leads to better prediction results.

REFERENCES 

[1] Moon Sun Shin, Eun Hee Kim, and Keun Ho Ryu, "False Alarm classification model for network-based IDS", Springer-Verlag Berlin Heidelberg, LNCS 3177, pp. 259-265, 2004.

[2] M. J. Lee, M. S. Shin, H. S. Moon, "Design and implementation of alert analyzer with data mining engine", Proc. IDEAL '03, Hong Kong, 2003.

[3] A. Valdes and K. Skinner, "Probabilistic alert correlation", 4th International Symposium on Recent Advances in Intrusion Detection (RAID), pp. 54-68, 2003.

[4] S.M. Aqil Burney and M. Sadiq Ali Khan, "Network Usage Security Policies for Academic Institutions", International Journal of Computer Applications, October Issue, published by Foundation of Computer Science, 2010.

[5] Anoop Singhal and Sushil Jajodia, "Data warehousing and data mining techniques for intrusion detection systems", Distributed and Parallel Databases, Volume 20, Number 2, pp. 149-166, DOI: 10.1007/s10619-006-9496-5, 2006.

[6] Tasleem Mustafa, Ahmed Mateen, Ahsan Raza Sattar, Nauman ul Haq and M. Yahya Saeed, "Forensic Data Security for Intrusions", European Journal of Scientific Research, ISSN 1450-216X, Vol. 39, No. 2, pp. 296-308, 2010.

[7] Karl Friston, Carlton Chu, Jnaina Mourao, Oliver Hulme, Geraint Rees, Will Penny and John Ashburner, "Bayesian decoding of brain images", Elsevier NeuroImage, Volume 39, Issue 1, pp. 181-205, January 2008.

[8] Jaydip Sen, "An agent-based intrusion detection system for local area networks", IJCNIS, Vol. 2, No. 2, August 2010.

[9] F. V. Jensen and T. S. Nielsen, "Bayesian Networks and Decision Graphs", Springer, Berlin Heidelberg, New York, 2007.

[10] C. Cortes and V. Vapnik, "Support Vector Networks", Machine Learning, 20, pp. 273-297, 1995.

[11] Jungtaek Seo, "An Attack Classification Mechanism Based on Multiple Support Vector Machines", LNCS 4706, Part II, pp. 94-103, Springer-Verlag Berlin Heidelberg, ICCSA 2007.

[12] Hebah H. O. Nasereddin, "Stream Data Mining", International Journal of Web Applications, Volume 1, Number 4, December 2009.

AUTHORS PROFILE

Dr.S.M.Aqil Burney is the Meritorious Professor and apSupervisor in Computer Science and Statistics by the

Education Commission, Govt of Pakistan. He is also the D& Chairman of Computer Science Department, Univer

Karachi. Additionally he is also a Director of Communication Network University of Karachi. He i

member of various higher academic boards of diuniversities of Pakistan. His research interest includes A

Computing, Neural Network, Fuzzy Logic, Data MStatistics, Simulation and Stochastic Modeling of M

Communication system and Networks, Network SecuritMIS in health services. Dr.Burney is also referee of v

  journals and conferences proceedings, nationinternationally. He is member of IEEE(USA), ACM(USA

M.Sadiq Ali Khan received his BS & MS Degree in CompEngineering from SSUET in 1998 and 2003 respectiv

Since 2003 he is serving Computer Science DepartmUniversity of Karachi as an Assistant Professor. He has ab

12 years of teaching experience and his research aincludes Data Communication & Networks, Network Secu

Cryptography issues and Security in Wireless Networks. H

member of CSI, PEC and NSP.

Jawed Naseem is Principal Scientific Officer in Paki

Agricultural Research Council. He has M.Sc(Statistics) MCS from University of Karachi, currently doing

(Computer Science) from University of Karachi. His resea

interest are data modeling, Information ManagementSecurity and Decision Support System particularlyagricultural research. He has been a team member

development of several regional(SAARC) level agricultdatabases.
