Improving Network Intrusion Detection with Growing Hierarchical Self-Organizing Maps
Andres Ortiz1, Julio Ortega2, Alberto Prieto2, Antonio F. Díaz2
1 Communications Engineering Department. University of Malaga.
29004 Malaga, Spain
2 Department of Computer Architecture and Technology. University of Granada.
18060 Granada, Spain
Abstract - The growth of computer networks and the expansion
of the Internet have made security a critical issue, and many
Intrusion Detection/Prevention Systems (IDS/IPS) have been
proposed. These systems try to prevent corrupt or anomalous
traffic from reaching the user application or the operating
system. Nevertheless, most IDS/IPS proposals only distinguish
between normal traffic and anomalous traffic that can be
suspected to be a potential attack. In this paper, we present an
IDS/IPS approach based on Growing Hierarchical Self-Organizing
Maps (GHSOM) which can not only differentiate between normal
and anomalous traffic but also identify different known
attacks. The proposed system has been trained and tested using
the well-known DARPA/NSL-KDD datasets, and the results obtained
are promising, since we can detect over 99.4% of the normal
traffic and over 99.2% of the attack traffic. Moreover, the
system can be trained on-line by using the probability labeling
method presented in this paper.
Keywords: IDS, IPS, Attack Classification,
Self-Organizing Maps, Growing Self-
Organizing Maps, SOM relabeling, clustering.
1 Introduction
The interest in computer network security has increased
in recent years, as the trend towards on-line services available
through the Internet has exposed a lot of sensitive
information to intruders and attackers [1]. Although there are
several methods to protect the information, no encryption
method is infallible, and the encryption/decryption
process can impose a high overhead in high-speed networks
that use the TCP/IP protocol stack. On the other hand, the
complexity of newer attacks makes it necessary to use
elaborate techniques such as pattern classification or artificial
intelligence to successfully detect an attack, or
just to differentiate between normal and abnormal traffic. IDS
and IPS are active systems that provide
protection by continuously monitoring the network. They
compute some traffic features in order to classify the traffic, to
detect abnormal behaviors and to react according to some
predefined rules. There are two design approaches to IDS/IPS
systems [2]. The first one consists of looking for patterns
that correspond to known intrusion signatures. The
second one searches for abnormal patterns using more complex
features, to discover not only intrusions but also potential
intrusions.
Several neural and machine learning techniques have
been used for implementing IDS/IPS systems [2-6]. In [2],
perceptron-like neural networks are used for traffic
classification, together with fuzzy classifiers to improve the
decision step. In [3, 4], Self-Organizing Maps (SOMs) [5] are
applied to implement unsupervised clustering of the data
instances in order to classify the traffic anomalies according
to several known attacks. In order to improve the classification
task, either by distinguishing the normal traffic from
anomalies or by classifying the different attacks with high
accuracy, hierarchical SOMs have been used in previous works.
Although the proposal in [3] tries to overcome some of the
difficulties of the static structure of classic SOMs by splitting
the SOM into three smaller SOMs, the size of these maps is
still static.
An alternative to avoid the limitations of the classical
SOM is the Growing Hierarchical SOM (GHSOM) [7]. It
presents a dynamic hierarchical structure composed of several
layers, with several SOMs in each layer. The number of
SOMs in each layer and their respective sizes are determined
in the GHSOM training process. In this paper, GHSOM is
used for both anomaly detection and attack classification.
Moreover, a probability-based mechanism applied to the
previously trained structure is used to label the units when the
GHSOM is applied to new data instances.
After this introduction, the remainder of this paper is
organized as follows. Section 2 describes the features of the
NSL-KDD dataset that, as in most IDS/IPS works, has
been used to evaluate our system, together with our
proposals for data preprocessing and for applying GHSOM to
attack classification. Section 3 shows the results obtained,
and finally, Section 4 provides the conclusions of the
paper and our future work.
2 Attack Classification with GHSOM
The first step in IDS and IPS implementation is the
extraction of some significant features from the captured
traffic. The features on the NSL-KDD dataset [8] (the
benchmark set we have used in this work) are split into three
classes: basic features, content-based features, and traffic
features. The main reason for having these three groups of
features is that detecting and identifying some attacks
requires the use of more than just one feature class. For
instance, time-based features are necessary to detect some
attacks, as some statistics should be calculated over a certain
time period. Thus, the first step we perform consists of
parsing the dataset in order to extract the features from the
text files and to build the vectors which comprise the feature
space. This feature space is composed of vectors belonging to
$\mathbb{R}^{41}$, which contain the connection traffic features.
Examples of these features are the duration of the connection, the
protocol type, or the number of user-to-root attempts.
Nevertheless, feature selection for network-based IDS is
not straightforward. Some works [10, 11, 12] use multivariate
techniques such as PCA [10] or LDA [11], which try to retain the
components with the highest variability, assuming that the rest
are unimportant or just noise. Although good results are reported
in [10, 11, 12], other works have shown that the full set of
features outperforms feature selection with PCA/LDA [13].
Moreover, feature reduction has to be applied to each input
vector, since the most discriminant component depends on the
specific attack. Thus, in [18] a SOM classifier for attack
detection is presented without reducing the feature space.
However, the authors only use a selection of 28 features from
the 41 available in the NSL-KDD dataset, and the influence
of each feature is determined through the U-Matrix of each
component of the input (feature) space. Instead, in
this paper we propose the use of a dynamic structure such as
GHSOM together with the full set of features.
2.1 SOMs and GHSOMs
SOM is a very useful tool for discovering structures and
similarities in high dimensional data, organizing the
information and visualizing it in a 2D or 3D way. Detailed
descriptions of SOM can be found elsewhere [5].
Nevertheless, SOM is not able to figure out the inherent
hierarchical structure of data [7], and, at the same time, the
performance of SOM depends on the size of the map which
has to be set in advance. Thus, GHSOM [7] is a hierarchical
and non-fixed structure developed to overcome the main
limitations of classical SOM. The structure of GHSOM
consists of multiple layers composed of several independent
SOMs. Hence, during the learning process, the number of the
SOMs on each layer and the size of each SOM are determined
by minimizing the quantization error. Thus an adaptive
growing process is accomplished by using two parameters τ1
and τ2 that respectively control the breadth of each map
(horizontal direction) and the growing of the hierarchy
(vertical direction). Therefore, these two parameters are the
only parameters that have to be set in advance.
In order to determine how far the GHSOM grows [7],
the quantization error of each unit is calculated according to
Equation 1, where $C_i$ is the set of input vectors mapped to
the i-th unit, $x_j$ is the j-th input vector belonging to $C_i$,
and $\omega_i$ is the weight vector associated with the i-th unit.
$$qe_i = \sum_{x_j \in C_i} \lVert \omega_i - x_j \rVert \qquad (1)$$
Initially, all the input vectors belong to $C_0$. This means that
all the inputs are used to compute the initial quantization
error, $qe_0$. Then, the quantization error $qe_i$ of each neuron
is calculated. Whenever $qe_i > \tau_2 \cdot qe_0$, that is, while
the unit still represents its inputs too coarsely, the i-th neuron is
expanded into a new map on the next level of the hierarchy.
Each new map is trained as an independent SOM, and its
BMU (Best Matching Unit) is calculated by using
the Euclidean distance. Once the new map is trained, the
quantization error of each neuron on this map is computed.
Then, the mean quantization error $MQE_m$ of the new map can
be determined. Whenever $MQE_m < \tau_1 \cdot qe_u$ ($qe_u$ is the
quantization error of the unit u on the upper layer), the map
stops growing. The process is outlined in Figure 1
and explained in detail in [9].
Figure 1. BMU calculation on the GHSOM hierarchy
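The quantization error and the two growth tests can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: it assumes the standard GHSOM criteria of [7] (a unit is expanded while its error exceeds τ2·qe0, and a map stops growing once its mean error falls below τ1·qe_u), it takes the layer-0 weight to be the mean of the inputs, and the function names and toy vectors are hypothetical.

```python
import math

def quantization_error(weight, mapped_inputs):
    """Equation 1: sum of Euclidean distances from a unit's weight to its mapped inputs."""
    return sum(math.dist(weight, x) for x in mapped_inputs)

def unit_needs_expansion(qe_unit, qe0, tau2):
    """Vertical growth (assumed criterion from [7]): expand while qe_i > tau2 * qe0."""
    return qe_unit > tau2 * qe0

def map_stops_growing(mqe_map, qe_parent, tau1):
    """Horizontal growth: the map stops growing once MQE_m < tau1 * qe_u."""
    return mqe_map < tau1 * qe_parent

# Hypothetical toy data: layer 0 is a single unit whose weight is the input mean.
inputs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
mean = tuple(sum(column) / len(inputs) for column in zip(*inputs))
qe0 = quantization_error(mean, inputs)  # distances 5 + 0 + 5

print(qe0)
print(unit_needs_expansion(2.0, qe0, tau2=0.1))
print(map_stops_growing(3.0, qe_parent=qe0, tau1=0.6))
```

A smaller τ2 keeps more units above the expansion threshold and therefore yields a deeper hierarchy, while a smaller τ1 makes each map grow wider before it stops.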
To apply GHSOM to IDS/IPS, it is necessary to encode
some qualitative features included in the NSL-KDD dataset,
since the GHSOM input vectors are numeric. This is the case,
for example, of the protocol, service and flag features.
These features are encoded by assigning
numeric values that keep a high enough distance among
them for the SOM classifier to be effective. Specifically,
the number 1 and other prime numbers with distances higher
than 6 among them have been chosen. For example, in the
protocol component of the input
vectors, TCP is encoded as 1, UDP as 7, and ICMP as 17. To
prevent some features from having more influence than
others, the input vectors have been normalized by subtracting
the mean and dividing by the standard deviation (zero mean
and unit variance). Thus, all dimensions are on a comparable
scale, and the feature space belongs to $\mathbb{R}^{41}$.
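The encoding and normalization steps can be illustrated as follows. This is a minimal sketch: only the three protocol codes come from the text, while the record layout, the function names and the use of the population standard deviation are assumptions made for the example.

```python
import statistics

# Codes quoted in the text for the protocol feature.
PROTOCOL_CODES = {"tcp": 1, "udp": 7, "icmp": 17}

def encode_protocol(name):
    """Map a protocol name to its numeric code."""
    return float(PROTOCOL_CODES[name.lower()])

def zscore_columns(rows):
    """Normalize each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [statistics.fmean(column) for column in columns]
    # Assumption: population standard deviation; constant columns map to 0.
    stds = [statistics.pstdev(column) for column in columns]
    return [
        tuple((v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds))
        for row in rows
    ]

# Hypothetical three-feature records: (duration, protocol, source bytes).
raw = [(0.0, "tcp", 100.0), (2.0, "udp", 300.0), (4.0, "icmp", 200.0)]
numeric = [(d, encode_protocol(p), b) for d, p, b in raw]
normalized = zscore_columns(numeric)

for row in normalized:
    print(row)
```

After this step every column has zero mean and unit variance, so no single feature dominates the Euclidean distance; individual values are centered on zero and can be negative or larger than 1.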
2.2 BMU calculation for GHSOM
In order to calculate the BMU in the GHSOM, we have
to traverse the whole hierarchy to determine the winning unit
and the map to which it belongs. Thus, an iterative algorithm
has been developed as shown in Figure 1, where an example
of BMU calculation on a three-level GHSOM hierarchy is
considered. After computing the distances between an input
pattern and the weight vectors of the map in layer 0, the
minimum of these distances is determined. Once the winning
neuron on map 1 is found, we have to check whether it is a
parent unit, since another map could have grown
from it. This can be accomplished with
the parent vectors resulting from the GHSOM training
process. If a new map arose from the winning neuron, the
BMU on this new map is calculated. This process is repeated
until a BMU with no child map is found. Thus, the BMU
in the GHSOM is identified inside a map in a layer of the
hierarchy (for example, map 5 and unit 4).
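The iterative search described above can be sketched as follows. The dictionary-based representation of the hierarchy (a `children` table standing in for the parent vectors) and the toy weights are hypothetical and serve only to illustrate the descent; units are numbered from zero here.

```python
import math

def find_bmu(maps, root, x):
    """Descend the hierarchy until the winning unit has no child map."""
    current = root
    while True:
        weights = maps[current]["weights"]
        # Winning unit on the current map: smallest Euclidean distance to x.
        unit = min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
        child = maps[current]["children"].get(unit)
        if child is None:
            return current, unit
        current = child

# Hypothetical two-level hierarchy: unit 1 of Map1 was expanded into Map5.
maps = {
    "Map1": {"weights": [(0.0, 0.0), (10.0, 10.0)], "children": {1: "Map5"}},
    "Map5": {"weights": [(9.0, 9.0), (11.0, 11.0)], "children": {}},
}

print(find_bmu(maps, "Map1", (10.6, 10.9)))
print(find_bmu(maps, "Map1", (1.0, 0.5)))
```

The first pattern wins on the expanded unit of Map1 and is therefore resolved inside Map5, whereas the second one stops on Map1 because its winning unit has no child map.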
2.3 GHSOM training and relabeling
The GHSOM structure has been trained by using 10% of
the training samples provided by the NSL-KDD dataset.
Then, the system has been tested by using the rest of the
training patterns. As noted in Section 2, we have
used the full set of features. After several tests using different
values for $\tau_1$ (to control the breadth of the map) and $\tau_2$ (to
control the depth), we have selected $\tau_1 = 0.6$ and
$\tau_2 = 10^{-5}$. This
could make the GHSOM grow more than necessary,
leaving some of the units unlabeled. Thus, the number of
neurons on the output map surrounding the winning neuron
for training data is increased. When an input pattern similar to
one of the training patterns is presented to the GHSOM, the
winning neuron can be labeled or unlabeled. If the winning
neuron is unlabeled, a probability-based scheme is used in
order to determine the label of that neuron. More specifically,
the winning neuron is labeled (or relabeled) with the most
frequent label in its neighborhood by using a probability
calculated according to (2).
$$P_u = \frac{M_{\sigma(u,\varepsilon)}(u)}{n} \qquad (2)$$
In this expression, $P_u$ is the probability for the winning
unit u to be successfully relabeled, $M_{\sigma(u,\varepsilon)}(u)$ is
the number of occurrences of the label that appears most
frequently in the Gaussian neighborhood $\sigma(u,\varepsilon)$ of the
winning neuron u, and n is the number of
neurons belonging to the neighborhood $\sigma(u,\varepsilon)$ of u. In
this equation, the parameter $\varepsilon$ denotes the width of the
neighborhood of the winning neuron.
Figure 2. Example of the relabeling process (white units are
unlabeled).
In Figure 2, an example of the relabeling process is
shown. In this figure, the BMU, which has not been initially
labeled, is labeled by using $\varepsilon = 1$ to establish its
neighborhood. In this neighborhood, we find four units labeled
as L1, one unit labeled as L2 and one unit labeled as L3. Then,
$P_u$, the probability of successful relabeling for this BMU, is
4/6 ≈ 0.67 (67%), with $M_{\sigma(u,\varepsilon)}(u) = 4$ and n = 6.
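The relabeling rule can be sketched as follows. Two simplifications are assumed here and are not taken from the paper: the neighborhood is a square window of radius ε around the BMU rather than a Gaussian-weighted one, and n counts only the labeled units in that window. The grid layout is hypothetical and merely reproduces the label counts of the example (four L1, one L2, one L3).

```python
from collections import Counter

def relabel(labels, row, col, eps=1):
    """Return (label, probability) for the unit at (row, col), or (None, 0.0)."""
    neighbors = []
    for r in range(max(0, row - eps), min(len(labels), row + eps + 1)):
        for c in range(max(0, col - eps), min(len(labels[r]), col + eps + 1)):
            # Skip the unit itself and unlabeled units (None).
            if (r, c) != (row, col) and labels[r][c] is not None:
                neighbors.append(labels[r][c])
    if not neighbors:
        return None, 0.0
    label, count = Counter(neighbors).most_common(1)[0]
    return label, count / len(neighbors)

# Hypothetical 3x3 map; None marks unlabeled units, the BMU sits at (1, 1).
grid = [
    ["L1", "L1", "L2"],
    ["L1", None, None],
    ["L1", "L3", None],
]

label, probability = relabel(grid, 1, 1, eps=1)
print(label, round(probability, 2))
```

In an on-line setting the returned probability could be compared against a threshold before the new label is accepted; the paper does not fix such a threshold, so that choice is left open here.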
3 Experimental results
In this section, we present the experimental results
obtained with the NSL-KDD dataset [9] and the GHSOM
classifier described in Section 2. In Figure 3, the detection
success for each type of attack included in the NSL-KDD
dataset is shown. As this figure shows, the probability-based
labeling process performed with new data (black bars in Figure
3) increases the detection success for most attacks. Moreover,
some attacks, such as multihop, are not detected before the
unit relabeling process.
In order to show the effectiveness of the relabeling
process due to the associated probability, the ROC (receiver
operating characteristic) curves are shown in Figure 4. They
constitute an effective alternative to evaluate the performance
of a classifier.
Figure 3. Detection success with (black bar) and without (white bar) unit relabeling
Figure 4.a shows the ROC curve (false positive rate) and
Figure 4.b the mirrored ROC curve (true negative rate).
As a measure of performance derived from these
curves, we computed the Area Under the ROC Curve (AUC).
The use of the AUC makes the interpretation of the results from
the ROC curve easier. Thus, a perfect classifier provides
AUC=1.0, whereas a random classifier provides AUC=0.5. In the
graphs of Figure 4, the cut point determines the best
performance the classifier can provide. The AUC computed
from our ROC curves is 0.71. Hence, the AUC is statistically
greater than 0.5, which denotes a fair behavior of the relabeling
process.
Figure 4. ROC curves for the relabeling process, panels (a) and (b).
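For reference, an AUC value can be obtained from a set of scored decisions with the trapezoidal rule, as sketched below. The paper does not detail how the curves were built, so this example assumes that the relabeling probability is used as the score, that a relabeling counts as positive when it matches the true label, and that all scores are distinct; the four sample decisions are hypothetical.

```python
def roc_points(scores, is_positive):
    """Sweep the threshold from high to low and collect (FPR, TPR) points."""
    positives = sum(is_positive)
    negatives = len(is_positive) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    ranked = sorted(zip(scores, is_positive), key=lambda pair: -pair[0])
    for _, positive in ranked:
        if positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )

# Hypothetical relabeling probabilities and whether each relabeling was correct.
scores = [0.9, 0.8, 0.7, 0.6]
correct = [True, False, True, False]

print(auc(roc_points(scores, correct)))
```

A ranking that places every correct relabeling above every incorrect one gives an area of 1.0, and an uninformative ranking gives a value close to 0.5, which is the baseline used in the text.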
In Figure 5, we present the detection success rate per
attack type. As can be seen, in all cases the relabeling
method increases the classification performance as well as the
detection success rate. Regarding User to Root (U2R)
attacks, the performance is worse than that obtained for other
attacks. Nevertheless, the number of U2R training patterns in
the NSL-KDD dataset is significantly lower than for other attacks [14].
Figure 5. Detection success rate for different types of attack.
Moreover, Figure 6 summarizes the performance of our
proposal when detecting normal/abnormal traffic. As shown
in this figure, 99.6% of normal traffic patterns and 99.2% of
the attack patterns have been correctly classified.
In Table 1, testing results for previously proposed IDSs
based on SOM and GHSOM are extracted from [14]. As
shown in this table, our GHSOM with relabeling probabilities
(RL-GHSOM in Table 1) reaches a high rate of detected
attacks and clearly improves on the false positive rate
provided by other similar proposals.
Table 1. Basic features of individual TCP connections
IDS implementation    Detected attacks (%)    False positive (%)
RL-GHSOM              99.68                   0.02
GHSOM                 99.99                   3.72
K-Map                 99.63                   0.34
SOM                   97.31                   0.04
Figure 6. Detection Success rate for normal/attack traffic.
Detection success with unit relabeling (black bar) and without
unit relabeling (white bar).
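The two percentages reported in this section can be derived from per-pattern predictions as sketched below. The function name and the eight sample patterns are hypothetical and are not taken from the experiments; the sketch only fixes the definitions assumed here: an attack counts as detected when it is assigned any non-normal label, and a false positive is a normal pattern assigned any attack label.

```python
def detection_rates(true_labels, predicted_labels, normal="normal"):
    """Return (detected attack rate, false positive rate) as fractions."""
    pairs = list(zip(true_labels, predicted_labels))
    attacks = [(t, p) for t, p in pairs if t != normal]
    normals = [(t, p) for t, p in pairs if t == normal]
    detected = sum(1 for _, p in attacks if p != normal)
    false_positives = sum(1 for _, p in normals if p != normal)
    return detected / len(attacks), false_positives / len(normals)

# Hypothetical ground truth and classifier output for eight patterns.
truth = ["normal", "normal", "normal", "normal",
         "neptune", "smurf", "multihop", "neptune"]
predicted = ["normal", "normal", "neptune", "normal",
             "neptune", "smurf", "normal", "neptune"]

detected_rate, false_positive_rate = detection_rates(truth, predicted)
print(f"{100 * detected_rate:.2f}% detected, {100 * false_positive_rate:.2f}% false positives")
```

Note that this measures detection only: an attack assigned to the wrong attack class still counts as detected, so classification accuracy per attack type has to be evaluated separately.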
4 Conclusions and future directions
In this paper we present a network intrusion prevention
approach that takes advantage of the discriminating properties
of the GHSOM. Moreover, instead of applying any feature
selection technique to the dataset, the full set of data
features has been used. This has required letting
the GHSOM grow more than would otherwise be necessary and
devising a labeling (or relabeling) process for the BMUs that
uses a probability of relabeling success. Acceptable results
have been obtained from an analysis of the effectiveness of
the relabeling process, which has been carried out by using ROC
curves. The results obtained for the proposed IPS are
promising, since it can detect 99.6% of the normal traffic
patterns and 99.2% of the abnormal ones.
As future work, we will consider a real-time
implementation of the IPS. This could be feasible as we avoid
the need for principal component analysis. With such an
implementation, as normal and abnormal behaviors could be
accurately detected on line, it would be possible to perform
some complementary activities, such as IP blocking, in real
time in order to improve the quality of the network intrusion
prevention.
Acknowledgments. This work was supported by
project SAF2010-20558 (Ministerio de Educación, Spain).
References
[1] Ghosh, J., Wanken, J., Charron, F.: Detecting anomalous
and unknown intrusions against programs. Proceedings of the
Annual Computer Security Applications Conference, 1998.
[2] Hoffman, A., Schimitz, C., Sick, B.: Intrusion Detection
in Computer Networks with Neural and Fuzzy Classifiers.
International Conference on Artificial Neural Networks,
ICANN 2003.
[3] Lichodzijewski, P., Zincir-Heywood, N., Heywood, M.:
Host Based Intrusion Detection Using Self-Organizing Maps.
Proceedings of the IEEE International Joint Conference on
Neural Networks. 2002.
[4] Zhang, C., Jiang, J., Kamel, M.: Intrusion Detection
using hierarchical neural networks. Pattern Recognition
Letters, issue 26 (2005), pp. 779-791.
[5] Kohonen T. Self-Organizing Maps. Springer, third
edition, 2001.
[6] Fisch, D., Hofmann, A., Sick, B.: On the versatility of
radial basis function neural networks: A case study in the field
of intrusion detection. Inf. Sci. 180(12): 2421-2439 (2010).
[7] Rauber, A., Merkl, D., Dittenbach, M.: The Growing
Hierarchical Self-Organizing Map: Exploratory Analysis of
High-Dimensional Data. IEEE Transactions on Neural
Networks, Vol. 13, nº6. 2002.
[8] Oh, H., Doh, I., Chae, K.: Attack Classification based on
data mining technique and its application for reliable medical
sensor communication. International Journal Of Science and
Applications. Vol. 6, nº3, pp. 20-32. 2009.
[9] The NSL-KDD dataset. http://iscx.ca/NSL-KDD/
[10] Lakhina, S., Joseph, S., Verma, B.: Feature Reduction
using Principal Component Analysis for Effective Anomaly-
Based Intrusion Detection on NSL-KDD. International
Journal on Engineering Science and Technology. Vol. 2(6),
2010, pp. 1790-1799.
[11] Datti, R., Verma, B.: Feature Reduction for Intrusion
Detection Using Linear Discriminant Analysis. International
Journal on Engineering Science and Technology. Vol. 2(4),
2010, pp. 1072-1078.
[12] Zargar, G.R., Kabiri, P.: Selection of Effective Network
Parameters in Attacks for Intrusion Detection. IEEE
International Conference on Data Mining. 2010.
[13] Mukkamala, S., Sung, A.H.: Feature Ranking and
Selection for Intrusion Detection Systems Using Support
Vector Machines. Proceedings of the Second Digital Forensic
Research Workshop. 2002.
[14] Palomo, E.J., Domínguez, E., Luque, R.M., Muñoz, J.:
Network Security Using Growing Hierarchical Self-
Organizing Maps. International Conference on Adaptive and
Natural Computing Algorithms, ICANNGA 2009.