
Improving Network Intrusion Detection with Growing Hierarchical Self-Organizing Maps

Andres Ortiz1, Julio Ortega2, Alberto Prieto2, Antonio F. Díaz2

1 Communications Engineering Department. University of Malaga.

29004 Malaga, Spain

2 Department of Computer Architecture and Technology. University of Granada.

18060 Granada, Spain

Abstract - The growth of computer networks and the expansion of the Internet have made security a critical issue, and many Intrusion Detection/Prevention Systems (IDS/IPS) have been proposed. These systems try to prevent corrupt or anomalous traffic from reaching the user application or the operating system. Nevertheless, most IDS/IPS proposals only distinguish between normal traffic and anomalous traffic that may be a potential attack. In this paper, we present an IDS/IPS approach based on Growing Hierarchical Self-Organizing Maps (GHSOM) which can not only differentiate between normal and anomalous traffic but also identify different known attacks. The proposed system has been trained and tested using the well-known DARPA/NSL-KDD datasets, and the results obtained are promising, since we can detect over 99.4% of the normal traffic and over 99.2% of the attack traffic. Moreover, the system can be trained on-line by using the probability labeling method presented in this paper.

Keywords: IDS, IPS, Attack Classification, Self-Organizing Maps, Growing Self-Organizing Maps, SOM relabeling, clustering.

1 Introduction

Interest in computer network security has increased in recent years, as the trend towards on-line services available through the Internet has exposed a great deal of sensitive information to intruders and attackers [1]. Although there are several methods to protect information, no encryption method is infallible, and the encryption/decryption process can impose a high overhead in high-speed networks that use the TCP/IP protocol stack. On the other hand, the complexity of newer attacks makes it necessary to use elaborate techniques, such as pattern classification or artificial intelligence, to detect an attack successfully or just to differentiate between normal and abnormal traffic. IDS and IPS are active systems that provide protection by continuously monitoring the network. They compute traffic features in order to classify the traffic, detect abnormal behavior, and react according to predefined rules. There are two design approaches to IDS/IPS systems [2]. The first one consists of looking for patterns that correspond to known intrusion signatures. The second one searches for abnormal patterns using more complex features, in order to discover not only intrusions but also potential intrusions.

Several neural and machine learning techniques have been used to implement IDS/IPS systems [2-6]. In [2], perceptron-like neural networks are used for traffic classification, together with fuzzy classifiers to improve the decision step. In [3, 4], Self-Organizing Maps (SOMs) [5] are applied to perform unsupervised clustering of the data instances in order to classify traffic anomalies according to several known attacks. To improve the classification task, either by distinguishing normal traffic from anomalies or by classifying the different attacks with high accuracy, hierarchical SOMs have also been used. Although the proposal in [3] tries to overcome some of the difficulties of the static structure of classic SOMs by splitting the SOM into three smaller SOMs, the size of these maps is still static.

An alternative to avoid the limitations of the classical

SOM is the Growing Hierarchical SOM (GHSOM) [7]. It

presents a dynamic hierarchical structure composed of several

layers, with several SOMs in each layer. The number of

SOMs in each layer and their respective sizes are determined

in the GHSOM training process. In this paper, GHSOM is

used for both anomaly detection and attack classification.

Moreover, a probability-based mechanism applied to the

previously trained structure is used to label the units when the

GHSOM is applied to new data instances.

The remainder of this paper is organized as follows. Section 2 presents the features of the NSL-KDD dataset that, as in most IDS/IPS works, has been used to evaluate our system, and describes our proposals for data preprocessing and for applying GHSOM to attack classification. Section 3 shows the results obtained, and finally, Section 4 provides the conclusions of the paper and our future work.

2 Attack Classification with GHSOM

The first step in IDS and IPS implementation is the extraction of significant features from the captured traffic. The features in the NSL-KDD dataset [8] (the benchmark set we have used in this work) are split into three classes: basic features, content-based features, and traffic features. The main reason for having these three groups is that detecting and identifying some attacks requires more than one feature class. For instance, time-based features are necessary to detect some attacks, as some statistics have to be calculated over a certain time period. Thus, the first step we perform consists of parsing the dataset in order to extract the features from the text files and to build the vectors that make up the feature space. This feature space is composed of vectors belonging to $\mathbb{R}^{41}$ that contain the connection traffic features. Examples of these features are the duration of the connection, the protocol type, and the number of user-to-root attempts. Nevertheless, feature selection for network-based IDS is not straightforward. Some works [10, 11, 12] use multivariate techniques such as PCA [10] or LDA [11]. These techniques try to obtain the components with the highest variability, assuming that the rest are unimportant or just noise. Although good results are reported in [10, 11, 12], other works have shown that the full set of features outperforms feature selection with PCA/LDA [13]. Moreover, feature reduction has to be applied to each input vector, since the most discriminant component depends on the specific attack. Thus, in [18] a SOM classifier for attack detection is presented without reducing the feature space. However, the authors only use a selection of 28 features from the 41 available in the NSL-KDD dataset, and the influence of each feature is determined through the U-Matrix of each component of the input (feature) space. Instead, in this paper we propose the use of a dynamic structure such as GHSOM and the full set of features.

2.1 SOMs and GHSOMs

SOM is a very useful tool for discovering structures and similarities in high-dimensional data, organizing the information and visualizing it in 2D or 3D. Detailed descriptions of SOM can be found elsewhere [5]. Nevertheless, SOM is not able to reveal the inherent hierarchical structure of data [7], and its performance depends on the size of the map, which has to be set in advance. GHSOM [7] is a hierarchical, non-fixed structure developed to overcome these main limitations of the classical SOM. The structure of GHSOM consists of multiple layers, each composed of several independent SOMs. During the learning process, the number of SOMs on each layer and the size of each SOM are determined by minimizing the quantization error. This adaptive growing process is governed by two parameters, $\tau_1$ and $\tau_2$, that respectively control the breadth of each map (horizontal direction) and the growth of the hierarchy (vertical direction). These two parameters are the only ones that have to be set in advance.

In order to determine how far the GHSOM grows [7], the quantization error of each unit is calculated according to Equation 1, where $C_i$ is the set of input vectors mapped to the i-th unit, $x_j$ is the j-th input vector belonging to $C_i$, and $\omega_i$ is the weight associated with the i-th unit.

$$qe_i = \sum_{x_j \in C_i} \lVert \omega_i - x_j \rVert \qquad (1)$$

Initially, all the input vectors belong to $C_0$. This means that all the inputs are used to compute the initial quantization error, $qe_0$. Then, the quantization error $qe_i$ of each neuron is calculated. Whenever $qe_i > \tau_2 \cdot qe_0$, the i-th neuron is expanded into a new map on the next level of the hierarchy. Each new map is trained as an independent SOM, and its BMU (Best Matching Unit) is calculated using the Euclidean distance. Once the new map is trained, the quantization error of each neuron on this map is computed. Then, the mean quantization error $MQE_m$ of the new map can be determined. Whenever $MQE_m < \tau_1 \cdot qe_u$ (where $qe_u$ is the quantization error of the unit u on the upper layer), the map stops growing. The process is schematized in Figure 1 and explained in detail in [9].
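The growth criteria can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the quantization error is taken as the sum of Euclidean distances between a unit's weight and its mapped inputs, and a unit is assumed to be expanded while its error exceeds τ2·qe0, as in the usual GHSOM formulation.

```python
import numpy as np

def quantization_error(weight, inputs):
    """Sum of Euclidean distances from a unit's weight to its mapped inputs."""
    inputs = np.asarray(inputs, dtype=float)
    if inputs.size == 0:
        return 0.0
    return float(np.linalg.norm(inputs - weight, axis=1).sum())

def should_expand(qe_i, qe_0, tau2):
    """Vertical growth (assumed criterion): expand unit i while its error is large."""
    return qe_i > tau2 * qe_0

def map_stops_growing(unit_qes, qe_u, tau1):
    """Horizontal growth: stop once the map's mean error is below tau1 * qe_u."""
    mqe_m = float(np.mean(unit_qes))
    return mqe_m < tau1 * qe_u

# Toy data: qe_0 uses the mean of all inputs as the single layer-0 weight.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
qe_0 = quantization_error(X.mean(axis=0), X)
```

With a very small τ2, almost any unit with a non-zero error is expanded, which is why a small value leads to a deep hierarchy.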

Figure 1. BMU calculation on the GHSOM hierarchy

To apply GHSOM to IDS/IPS, it is necessary to encode the qualitative features included in the NSL-KDD dataset, because the GHSOM input vectors are numeric. This is the case, for example, of the protocol, service and flag features. These features are encoded by assigning numeric values that keep a high enough distance among them for the SOM classifier to be effective. Specifically, the number 1 and prime numbers separated by a distance of at least 6 have been chosen. For example, in the protocol component of the input vectors, TCP is encoded as 1, UDP as 7, and ICMP as 17. To prevent some features from having more influence than others, the input vectors have been normalized by subtracting the mean and dividing by the standard deviation (zero mean and unit variance). Thus, all dimensions have a comparable scale, and the feature space belongs to $\mathbb{R}^{41}$.
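The preprocessing described above can be illustrated with a short Python sketch. Only the three protocol codes come from the text; the function names, the toy records, and the handling of constant columns are assumptions made for the example, and the code tables for the service and flag features are not given in the paper.

```python
import numpy as np

# Protocol codes quoted in the text; other categorical features (service, flag)
# would need their own tables, which the paper does not list.
PROTOCOL_CODES = {"tcp": 1, "udp": 7, "icmp": 17}

def encode_protocol(name):
    """Map a protocol name to its numeric code."""
    return PROTOCOL_CODES[name.lower()]

def zscore(X):
    """Standardize each column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # assumption: constant columns map to zero, no division by 0
    return (X - mean) / std

# Toy records: (duration, protocol); the real vectors have 41 components.
records = [(0.0, "tcp"), (2.0, "udp"), (4.0, "icmp")]
X = np.array([[d, encode_protocol(p)] for d, p in records])
Z = zscore(X)
```

Note that standardized values are centered on zero and are not bounded, so they can be negative or larger than 1.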

2.2 BMU calculation for GHSOM

In order to calculate the BMU in the GHSOM, we have to go through the whole hierarchy to determine the winning unit and the map to which it belongs. An iterative algorithm has been developed for this purpose, as shown in Figure 1, where an example of BMU calculation on a three-level GHSOM hierarchy is considered. After computing the distances between an input pattern and the weight vectors of the map in layer 0, the minimum of these distances is determined. Once the winning neuron on map 1 is found, we have to check whether it is a parent unit, since another map could have grown from it. This can be accomplished with the parent vectors resulting from the GHSOM training process. If a new map arose from the winning neuron, the BMU on this new map is calculated. This process is repeated until a BMU with no child map is found. Thus, the BMU in the GHSOM is identified inside a map in a layer of the hierarchy (for example, map 5 and unit 4).
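The descent through the hierarchy can be sketched as follows. This is an illustrative Python example rather than the authors' code: the dictionary layout (a `weights` array and a `children` table per map), the map names, and the 0-based unit indices are all assumptions standing in for the parent vectors produced by training.

```python
import numpy as np

def find_bmu(x, maps, root="Map1"):
    """Descend the hierarchy from the root map until a unit with no child map wins."""
    current = root
    while True:
        weights = maps[current]["weights"]
        # Winning unit of the current map: smallest Euclidean distance to x.
        unit = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
        child = maps[current]["children"].get(unit)
        if child is None:
            return current, unit
        current = child

# Hypothetical two-level hierarchy: unit 1 of Map1 was expanded into Map2.
maps = {
    "Map1": {"weights": np.array([[0.0, 0.0], [10.0, 10.0]]), "children": {1: "Map2"}},
    "Map2": {"weights": np.array([[9.0, 9.0], [11.0, 11.0]]), "children": {}},
}
bmu = find_bmu(np.array([10.5, 10.6]), maps)
```

The function returns a (map, unit) pair, matching the way the BMU is reported in the text.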

2.3 GHSOM training and relabeling

The GHSOM structure has been trained using 10% of the training samples provided by the NSL-KDD dataset. Then, the system has been tested using the rest of the training patterns. As noted in Section 2, we have used the full set of features. After several tests using different values for $\tau_1$ (to control the breadth of the map) and $\tau_2$ (to control the depth), we have selected $\tau_1 = 0.6$ and $\tau_2 = 10^{-5}$. This could make the GHSOM grow more than necessary, leaving some of the units unlabeled. Thus, the number of neurons on the output map surrounding the winning neuron for training data is increased. When an input pattern similar to one of the training patterns is presented to the GHSOM, the winning neuron can be labeled or unlabeled. If the winning neuron is unlabeled, a probability-based scheme is used to determine its label. More specifically, the winning neuron is labeled (or relabeled) with the most repeated label in its neighborhood, with a probability calculated according to (2).

$$P_u = \frac{M_{\sigma(u,\varepsilon)}(u)}{n} \qquad (2)$$

In this expression, $P_u$ is the probability for the winning unit u to be successfully relabeled, $M_{\sigma(u,\varepsilon)}(u)$ is the number of occurrences of the label that appears most frequently in the Gaussian neighborhood $\sigma(u,\varepsilon)$ of the winning neuron u, and n is the number of neurons belonging to that neighborhood. The parameter $\varepsilon$ denotes the width of the neighborhood of the winning neuron.

Figure 2. Example of the relabeling process (white units are

unlabeled).

Figure 2 shows an example of the relabeling process. In this figure, the BMU, which has not been initially labeled, is labeled by using $\varepsilon = 1$ to establish its neighborhood. In this neighborhood, we find four units labeled as L1, one unit labeled as L2 and one unit labeled as L3. Then, the probability of successful relabeling for this BMU is $P_u = 4/6 \approx 0.67$ (about 67%), with $M_{\sigma(u,\varepsilon)}(u) = 4$ and $n = 6$.
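The worked example can be reproduced with a small Python sketch. Two simplifications are assumed here and are not taken from the paper: the Gaussian neighborhood is replaced by a square neighborhood of radius ε on the map grid, and n counts only the labeled neighbors, which is what the example suggests (six labeled units around the BMU). The function name and the grid layout are hypothetical.

```python
from collections import Counter

def relabel(labels, row, col, eps=1):
    """Return (label, probability) for the unit at (row, col).

    labels is a 2D list in which None marks an unlabeled unit.
    """
    counts = Counter()
    for r in range(max(0, row - eps), min(len(labels), row + eps + 1)):
        for c in range(max(0, col - eps), min(len(labels[r]), col + eps + 1)):
            if (r, c) != (row, col) and labels[r][c] is not None:
                counts[labels[r][c]] += 1
    n = sum(counts.values())
    if n == 0:
        return None, 0.0  # no labeled neighbor: the unit stays unlabeled
    label, m = counts.most_common(1)[0]
    return label, m / n

# Neighborhood mirroring the example: four L1, one L2, one L3, two unlabeled.
grid = [
    ["L1", "L1", "L2"],
    ["L1", None, None],
    ["L1", "L3", None],
]
label, p_u = relabel(grid, 1, 1, eps=1)
```

Here the unlabeled BMU at the center receives the label L1 with a probability of 4/6.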

3 Experimental results

In this section, we present the experimental results obtained with the NSL-KDD dataset [9] and the GHSOM classifier described in Section 2. Figure 3 shows the detection success for each type of attack included in the NSL-KDD dataset. As this figure shows, the probability-based labeling process performed with new data (black bars in Figure 3) increases the detection success for most attacks. Moreover, some attacks, such as multihop, are not detected at all before the unit relabeling process.

In order to show the effectiveness of the relabeling process and of its associated probability, the ROC (receiver operating characteristic) curves are shown in Figure 4. They constitute an effective way to evaluate the performance of a classifier.

Figure 3. Detection success with (black bar) and without (white bar) unit relabeling

Figure 4.a shows the ROC curve (false positive rate) and Figure 4.b the mirrored ROC curve (true negative rate). As a measure of performance derived from these curves, we computed the Area Under the ROC Curve (AUC). The AUC makes the results from the ROC curve easier to interpret: a perfect classifier provides AUC=1.0, whereas a random classifier provides AUC=0.5. In the graphs of Figure 4, the cut point determines the best performance the classifier can provide. The AUC computed from our ROC curves is 0.71. Hence, the AUC is statistically greater than 0.5, which denotes a fair behavior of the relabeling process.
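As an illustration of how an AUC value is obtained, the following Python sketch builds ROC points by sweeping a threshold over a score and integrates them with the trapezoidal rule. The paper does not detail how its curves were built, so everything here is an assumption: the relabeling probability is used as the score, a label of 1 means that the relabeling was correct, and the data are invented for the example.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) points obtained by sweeping a threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    i = 0
    while i < len(ranked):
        # Process tied scores together so that they yield a single ROC point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical relabeling probabilities and whether each relabeling was correct.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
area = auc(roc_points(scores, labels))
```

For this toy data the area is 8/9, because eight of the nine (correct, incorrect) pairs are ranked in the right order.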

Figure 4. ROC curves for the relabeling process, panels (a) and (b).

In Figure 5, we present the detection success rate per attack type. As can be seen, in all cases the relabeling method increases the classification performance as well as the detection success rate. For User to Root (U2R) attacks, the performance is worse than that obtained for other attacks. Nevertheless, the number of U2R training patterns in the NSL-KDD dataset is significantly lower than for other attacks [14].

Figure 5. Detection success rate for different types of attack.

Moreover, Figure 6 summarizes the performance of our proposal when detecting normal/abnormal traffic. As shown in this figure, 99.6% of the normal traffic patterns and 99.2% of the attack patterns have been correctly classified.

Table 1 shows testing results for previously proposed IDSs based on SOM and GHSOM, extracted from [14]. As shown in this table, our GHSOM with relabeling probabilities (RL-GHSOM in Table 1) reaches a high rate of detected attacks and clearly improves on the false positive rate provided by other similar proposals.

Table 1. Testing results for SOM-based and GHSOM-based IDS implementations

IDS implementation    Detected attacks (%)    False positives (%)
RL-GHSOM              99.68                   0.02
GHSOM                 99.99                   3.72
K-Map                 99.63                   0.34
SOM                   97.31                   0.04

Figure 6. Detection Success rate for normal/attack traffic.

Detection success with unit relabeling (black bar) and without

unit relabeling (white bar).

4 Conclusions and future directions

In this paper we present a network intrusion prevention approach that takes advantage of the discriminating properties of the GHSOM. Moreover, instead of applying a feature selection technique to the dataset, the full set of data features has been used. This has required letting the GHSOM grow more than would otherwise be necessary and devising a labeling (or relabeling) process for the BMUs that uses a probability of relabeling success. An analysis of the effectiveness of the relabeling process, carried out by means of ROC curves, has given acceptable results. The results obtained for the proposed IPS are promising, since it can detect 99.6% of the normal traffic patterns and 99.2% of the abnormal ones.

As future work, we will consider a real-time implementation of the IPS. This could be feasible because we avoid the need for principal component analysis. With such an implementation, as normal and abnormal behaviors could be accurately detected on line, it would be possible to perform complementary actions in real time, such as IP blocking, in order to improve the quality of the network intrusion prevention.

Acknowledgments. This work was supported by

project SAF2010-20558 (Ministerio de Educación, Spain).

References

[1] Ghosh, J., Wanken, J., Charron, F.: Detecting anomalous

and unknown intrusions against programs. Proceedings of the

Annual Computer Security Applications Conference, 1998.

[2] Hoffman, A., Schimitz, C., Sick, B.: Intrusion Detection in Computer Networks with Neural and Fuzzy Classifiers. International Conference on Artificial Neural Networks, ICANN 2003.

[3] Lichodzijewski, P., Zincir-Heywood, N., Heywood, M.:

Host Based Intrusion Detection Using Self-Organizing Maps.

Proceedings of the IEEE International Joint Conference on

Neural Networks. 2002.

[4] Zhang, C., Jiang, J., Kamel, M.: Intrusion Detection

using hierarchical neural networks. Pattern Recognition

Letters, issue 26 (2005), pp. 779-791.

[5] Kohonen T. Self-Organizing Maps. Springer, third

edition, 2001.

[6] Fisch, D., Hofmann, A., Sick, B.: On the versatility of radial basis function neural networks: A case study in the field of intrusion detection. Inf. Sci. 180(12): 2421-2439 (2010).

[7] Rauber, A., Merkl, D., Dittenbach, M.: The Growing Hierarchical Self-Organizing Map: Exploratory Analysis of High-Dimensional Data. IEEE Transactions on Neural Networks, Vol. 13, nº6. 2002.

[8] Oh, H., Doh, I., Chae, K.: Attack Classification based on

data mining technique and its application for reliable medical

sensor communication. International Journal Of Science and

Applications. Vol. 6, nº3, pp. 20-32. 2009.

[9] The NSL-KDD dataset. http://iscx.ca/NSL-KDD/

[10] Lakhina, S., Joseph, S., Verma, B.: Feature Reduction

using Principal Component Analysis for Effective Anomaly-

Based Intrusion Detection on NSL-KDD. International

Journal on Engineering Science and Technology. Vol. 2(6),

2010, pp. 1790-1799.

[11] Datti, R., Verma, B.: Feature Reduction for Intrusion Detection Using Linear Discriminant Analysis. International Journal on Engineering Science and Technology. Vol. 2(4), 2010, pp. 1072-1078.

[12] Zargar, G.R., Kabiri, P.: Selection of Effective Network Parameters in Attacks for Intrusion Detection. IEEE International Conference on Data Mining. 2010.

[13] Mukkamala, S., Sung, A.H.: Feature Ranking and

Selection for Intrusion Detection Systems Using Support

Vector Machines. Proceedings of the Second Digital Forensic

Research Workshop. 2002.

[14] Palomo, E.J., Domínguez, E., Luque, R.M., Muñoz, J.:

Network Security Using Growing Hierarchical Self-

Organizing Maps. International Conference on Adaptive and

Natural Computing Algorithms, ICANNGA 2009.
