

  • A Comparison of Feature-Selection Methods for Intrusion Detection

    Hai Thanh Nguyen, Slobodan Petrović and Katrin Franke

    Gjøvik University College, Norway

  • Introduction

    • The problem of intrusion detection

    – Analyzed as a pattern recognition problem

    • Has to tell normal from abnormal behavior of network traffic and/or command sequences on a host

    • Further classifies abnormal behavior so that adequate countermeasures can be undertaken


  • Introduction

    • Models of IDS usually include

    – A representation algorithm

    • Represents incoming data in the space of selected features

    – A classification algorithm

    • Maps the feature vector representation of the incoming data to elements of a certain set of values (e.g. normal, abnormal, etc.)


  • Introduction

    • Some IDS also include a feature selection algorithm

    – Determines the features to be used by the representation algorithm

    • If a feature selection algorithm is not included in the IDS model, it is assumed that a feature selection algorithm is run before the intrusion detection process


  • Introduction

    • The feature selection algorithm

    – Determines the most relevant features of the incoming traffic

    • Monitoring of those features ensures reliable detection of abnormal behavior

    • The number of selected features heavily influences the effectiveness of the classification algorithm


  • Introduction

    • The task of the feature selection algorithm

    – Minimize the number of selected features without dropping potential indicators of abnormal behavior

    • Feature selection for intrusion detection

    – Manual (mostly) – based on expert knowledge

    – Automatic


  • Introduction

    • Automatic feature selection

    – The filter model

    • Considers statistical characteristics of a data set directly

    • No learning algorithm involved

    – The wrapper model

    • Assesses the selected features by evaluating the performance of the classification algorithm


  • Introduction

    • Individual feature evaluation is based on

    – Their relevance to intrusion detection

    – Relationships with other features

    • Such relationships can make certain features redundant

    • Relevance and relationship are characterized in terms of

    – Correlation

    – Mutual information


  • Introduction

    • We focus on 2 feature selection measures for the IDS task

    – Correlation feature selection (CFS)

    – Minimal-redundancy-maximal-relevance (mRMR)

    • Both feature selection measures contain an objective function, which is maximized over all the possible subsets of features


  • Introduction

    • Nguyen et al. proposed a solution to the problem of maximizing the objective functions in the CFS and mRMR measures

    – Based on polynomial mixed 0-1 fractional programming (PM01FP)


  • Introduction

    • Here we compare CFS and mRMR solved by means of PM01FP with some feature selection measures previously used in intrusion detection

    – SVM wrapper

    – Markov blanket

    – CART (Classification and Regression Trees)


  • Introduction

    • The comparison is empirical, performed on a particular data set (KDD CUP ’99)

    – SVM, Markov blanket and CART were originally evaluated on that data set

    • To avoid known problems with KDD CUP ’99

    – It was split into 4 parts: DoS, Probe, U2R and R2L

    – Only DoS and Probe attacks were considered, since they significantly outnumber the other 2 categories


  • Introduction

    • Comparison by

    – The number of selected features

    – Classification accuracy of the machine learning algorithms chosen as classifiers


  • Feature selection methods

    • Existing approaches – SVM wrapper (1)

    – A feature ranking method – one input feature is deleted from the input data set at a time

    – The resulting data set is then used for training and testing of the SVM (Support Vector Machine) classifier

    – The SVM’s performance is then compared to that of the original SVM (based on all the features)


  • Feature selection methods

    • Existing approaches – SVM wrapper (2)

    – Criteria for SVM comparison

    • Overall classification accuracy

    • Training time

    • Testing time

    – Feature ranking

    • Important

    • Secondary

    • Insignificant
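As a rough illustration of this leave-one-feature-out wrapper, the sketch below ranks features by the cross-validated accuracy drop caused by deleting each one. The choice of scikit-learn's LinearSVC and 5-fold cross-validation is an assumption made for illustration; the original study used its own SVM setup and also compared training and testing times.

```python
# Sketch of leave-one-feature-out SVM wrapper ranking (illustrative;
# classifier and cross-validation choices are assumptions, not the paper's).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def svm_wrapper_ranking(X, y):
    """Rank features by the accuracy drop observed when each is removed."""
    base = cross_val_score(LinearSVC(dual=False), X, y, cv=5).mean()
    drops = []
    for i in range(X.shape[1]):
        X_wo_i = np.delete(X, i, axis=1)   # data set with feature i deleted
        acc = cross_val_score(LinearSVC(dual=False), X_wo_i, y, cv=5).mean()
        drops.append(base - acc)           # large drop => important feature
    return np.argsort(drops)[::-1]         # feature indices, most important first
```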


  • Feature selection methods

    • Existing approaches – Markov blanket (1)

    – Markov blanket MB(T) of an output variable T

    • A set of input variables such that, conditioned on MB(T), all other variables are probabilistically independent of T

    • Knowledge of MB(T) is sufficient for perfect estimation of the distribution of T and consequently for the classification of T


  • Feature selection methods

    • Existing approaches – Markov blanket (2)

    – In IDS feature selection (1)

    • A Bayesian network B = (N, A, Q) is constructed from the original data set

    – N is the set of vertices; each node corresponds to a data set attribute

    – A is the set of arcs; each arc a ∈ A represents a probabilistic dependency between the attributes (variables)

    – That probabilistic dependency is quantified by a conditional probability distribution q ∈ Q attached to each node n ∈ N


  • Feature selection methods

    • Existing approaches – Markov blanket (3)

    – In IDS feature selection (2)

    • A Bayesian network can be used to compute the conditional probability of one node, given the values assigned to the other nodes

    • From the constructed Bayesian network the Markov blanket of the feature T is obtained
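Once the network structure is known, extracting the blanket is purely graph-theoretic: MB(T) consists of T's parents, its children, and its children's other parents. A minimal sketch under that assumption (the dict-of-parents representation is illustrative; learning the structure from data is the hard part and is not shown):

```python
# Markov blanket from a Bayesian network given as {node: set_of_parents}.
# MB(T) = parents(T) | children(T) | other parents of T's children.
def markov_blanket(parents, T):
    children = {n for n, ps in parents.items() if T in ps}
    spouses = set().union(*(parents[c] for c in children)) if children else set()
    return (parents[T] | children | spouses) - {T}

# Toy structure: A -> T -> C <- B, and C -> D.
net = {"A": set(), "B": set(), "T": {"A"}, "C": {"T", "B"}, "D": {"C"}}
print(markov_blanket(net, "T"))   # {'A', 'B', 'C'}; D lies outside the blanket
```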


  • Feature selection methods

    • Existing approaches – CART (1)

    – Classification and Regression Trees (CART)

    • Based on binary recursive partitioning

    – Binary – parent nodes are always split into exactly 2 child nodes

    – Recursive – in the next splitting, each child node is treated as a parent

    • Key elements of the CART methodology

    – A set of splitting rules

    – A decision on when the tree is complete

    – Assigning a class to each terminal node


  • Feature selection methods

    • Existing approaches – CART (2)

    – In IDS feature selection

    • The contribution of each input variable to the construction of the decision tree is assessed

    – by establishing the role the variable plays

    » as the main splitter

    » as a surrogate

    • Feature importance – the sum, across all nodes, of the improvement scores attributed to the variable
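This summed-improvement score can be approximated with scikit-learn's DecisionTreeClassifier, which implements CART-style binary recursive partitioning (but without surrogate splits, so this only approximates the scheme above; the synthetic data are an assumption):

```python
# Feature importance as the impurity improvement summed across all splits,
# using a CART-style tree (sklearn trees do not use surrogate splits).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
ranking = sorted(enumerate(tree.feature_importances_), key=lambda p: -p[1])
print(ranking[:5])   # (feature index, importance) for the top 5 features
```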


  • Feature selection methods

    • The new approach (1)

    – A generic feature selection measure for the filter model

    – The binary variable x_i indicates the presence/absence of the feature f_i

    – A_i(x) and B_i(x) are linear functions of the variables x_1, ..., x_n

    GeFS(x) = \frac{a_0 + \sum_{i=1}^{n} A_i(x)\, x_i}{b_0 + \sum_{i=1}^{n} B_i(x)\, x_i}, \qquad x = (x_1, \ldots, x_n) \in \{0,1\}^n
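A direct transcription of the measure, with A_i and B_i passed in as callables (a minimal sketch; the names and representation are assumptions, not the authors' code):

```python
# Generic feature-selection measure GeFS(x) for a 0-1 vector x.
# A(i, x) and B(i, x) return the linear-function values A_i(x) and B_i(x).
def gefs(x, a0, b0, A, B):
    num = a0 + sum(A(i, x) * xi for i, xi in enumerate(x))
    den = b0 + sum(B(i, x) * xi for i, xi in enumerate(x))
    return num / den
```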

  • Feature selection methods

    • The new approach (2)

    – The feature selection problem: find x ∈ {0,1}^n that maximizes the function GeFS(x), i.e.

      \max_{x \in \{0,1\}^n} GeFS(x)

    – Examples of instances of the GeFS measure

    • Correlation-feature selection (CFS)

    • Minimal-redundancy-maximal-relevance (mRMR)
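For intuition only, the maximization can be written as an exhaustive search over all non-empty subsets; this costs 2^n evaluations and is infeasible for the 41 features of KDD CUP ’99, which is precisely what the PM01FP reformulation described later avoids:

```python
# Brute-force maximization of a GeFS-type objective over {0,1}^n.
# Exponential in n -- shown only to make the optimization problem concrete.
from itertools import product

def maximize_gefs(objective, n):
    best_x, best_val = None, float("-inf")
    for bits in product((0, 1), repeat=n):
        if not any(bits):
            continue                      # exclude the empty feature subset
        val = objective(bits)
        if val > best_val:
            best_x, best_val = bits, val
    return best_x, best_val
```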

  • Feature selection methods

    • The new approach (3)

    – Correlation-feature selection (CFS)

    • Based on the average value of all feature-classification correlations and the average value of all feature-feature correlations

    • Can be expressed as an optimization problem

    \max_{x \in \{0,1\}^n} \frac{\left( \sum_{i=1}^{n} a_i x_i \right)^2}{\sum_{i=1}^{n} x_i + \sum_{i \ne j} 2\, b_{ij} x_i x_j}

    where a_i denotes the correlation between feature f_i and the class, and b_{ij} the correlation between features f_i and f_j
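Translated directly into code (a sketch; a is assumed to hold the precomputed feature-class correlations and b the symmetric feature-feature correlation matrix, with the pair sum taken once per unordered pair):

```python
# CFS objective for a 0-1 vector x, mirroring the formula above.
import numpy as np

def cfs_objective(x, a, b):
    x = np.asarray(x, dtype=float)
    num = (a @ x) ** 2
    pairs = sum(2.0 * b[i, j] * x[i] * x[j]
                for i in range(len(x)) for j in range(i + 1, len(x)))
    return num / (x.sum() + pairs)

# For small n: maximize_gefs(lambda x: cfs_objective(x, a, b), n)
```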

  • Feature selection methods

    • The new approach (4)

    – Minimal-redundancy-maximal relevance (mRMR)

    • Relevance and redundancy of features are considered simultaneously, in terms of mutual information

    • Can be expressed as an optimization problem

    \max_{x \in \{0,1\}^n} \left( \frac{\sum_{i=1}^{n} c_i x_i}{\sum_{i=1}^{n} x_i} - \frac{\sum_{i,j=1}^{n} a_{ij} x_i x_j}{\left( \sum_{i=1}^{n} x_i \right)^2} \right)

    where c_i = I(f_i; C) is the mutual information between feature f_i and the class C, and a_{ij} = I(f_i; f_j) the mutual information between features f_i and f_j
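And the mRMR counterpart, with c and a assumed to be precomputed mutual-information estimates (again a sketch, not the authors' implementation):

```python
# mRMR objective for a 0-1 vector x: mean relevance minus mean pairwise
# redundancy, with c[i] = I(f_i; class) and a[i, j] = I(f_i; f_j).
import numpy as np

def mrmr_objective(x, c, a):
    x = np.asarray(x, dtype=float)
    k = x.sum()
    return (c @ x) / k - (x @ a @ x) / k**2
```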

  • The solution

    • Solving the feature selection problem (1)

    – Represent it as a polynomial mixed 0-1 fractional programming (PM01FP) task

    \min \sum_{i=1}^{m} \frac{a_{i0} + \sum_{j=1}^{n} a_{ij} \prod_{k \in J_j} x_k}{b_{i0} + \sum_{j=1}^{n} b_{ij} \prod_{k \in J_j} x_k}

    under the constraints

    b_{i0} + \sum_{j=1}^{n} b_{ij} \prod_{k \in J_j} x_k > 0, \quad i = 1, \ldots, m

    c_{p0} + \sum_{j=1}^{n} c_{pj} \prod_{k \in J_j} x_k \ge 0, \quad p = 1, \ldots, m

    x_k \in \{0, 1\}

  • The solution

    • Solving the feature selection problem (2)

    – Linearize the PM01FP program to get a Mixed 0-1 Linear Programming (M01LP) problem

    – The M01LP problem can be solved e.g. by means of the branch and bound method

    – In our solution, the number of variables and constraints in the M01LP problem is linear in the number n of full-set features
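The linearization replaces each fractional and product term with auxiliary variables tied down by linear constraints; the resulting M01LP can then be handed to any branch-and-bound solver. As a toy illustration of that final step only, the sketch below solves a small mixed 0-1 linear program with SciPy's MILP interface (the solver choice is an assumption; the paper requires only branch and bound):

```python
# Toy mixed 0-1 linear program solved by branch and bound (SciPy/HiGHS).
# maximize 3*x1 + 2*x2 + 4*x3  subject to  2*x1 + x2 + 3*x3 <= 4, x in {0,1}^3
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

c = np.array([-3.0, -2.0, -4.0])             # milp minimizes, so negate
constraints = LinearConstraint(np.array([[2.0, 1.0, 3.0]]), ub=4.0)
res = milp(c, constraints=constraints,
           integrality=np.ones(3),           # all three variables integral (0-1)
           bounds=Bounds(0, 1))
print(res.x, -res.fun)                       # -> [0. 1. 1.] 6.0
```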


  • Experimental results

    • GeFS_CFS and GeFS_mRMR were implemented

    • The goal

    – Find optimal feature subsets by means of those measures

    – Compare the obtained feature subsets with those obtained with the previously analyzed methods

    • By the cardinalities of the selected subsets

    • By accuracy of the classification


  • Experimental results

    • The classification algorithm used in the experiments was the decision tree algorithm C4.5

    • The 10% subset of the KDD CUP ’99 data set was used

    • Only DoS and Probe attacks were analyzed, for the same reason as before (they vastly outnumber the U2R and R2L categories)


  • Experimental results

    • Thus, 2 data sets were generated

    – Normal traffic + DoS attacks

    – Normal traffic + probes

    • Classification into 2 classes

    • GeFS_CFS and GeFS_mRMR were run first on both data sets, to select features

    • Then the classification algorithm C4.5 was run on the full-sets and the selected feature sets
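A sketch of this evaluation protocol, with scikit-learn's entropy-based decision tree standing in for C4.5 (an assumption: C4.5 proper ships as J48 in Weka, not in scikit-learn) and `selected` being the list of feature indices chosen by GeFS:

```python
# Compare tree accuracy on the full feature set vs. a selected subset.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def compare_feature_sets(X, y, selected):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    def acc(cols):
        clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
        clf.fit(X_tr[:, cols], y_tr)
        return accuracy_score(y_te, clf.predict(X_te[:, cols]))

    return acc(list(range(X.shape[1]))), acc(selected)   # full vs. selected
```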


  • Experimental results

    • The numbers of selected features (on average)


  • Experimental results

    • Classification accuracy (on average)


  • Conclusions

    • The GeFS measure instances (GeFS_CFS and GeFS_mRMR) performed better than the other measures involved in the comparison

    – GeFS_CFS was better at removing redundant features

    – Classification accuracy was sometimes even better than, and in general not worse than, with the other methods

