MINISTRY OF EDUCATION AND TRAINING
VIETNAMESE ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
NGUYEN VAN TRUONG
IMPROVING SOME ARTIFICIAL IMMUNE
ALGORITHMS FOR NETWORK INTRUSION
DETECTION
THE THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
IN MATHEMATICS
Hanoi - 2019
MINISTRY OF EDUCATION AND TRAINING
VIETNAMESE ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
NGUYEN VAN TRUONG
IMPROVING SOME ARTIFICIAL IMMUNE
ALGORITHMS FOR NETWORK INTRUSION
DETECTION
THE THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN MATHEMATICS
Major: Mathematical Foundations for Informatics
Code: 62 46 01 10
Scientific supervisor:
1. Assoc. Prof., Dr. Nguyen Xuan Hoai
2. Assoc. Prof., Dr. Luong Chi Mai
Hanoi - 2019
Acknowledgments
First of all, I would like to thank my principal supervisor, Assoc. Prof.,
Dr. Nguyen Xuan Hoai, for introducing me to the field of Artificial Immune Systems.
He guided me step by step through research activities such as seminar presentations
and paper writing, and was a constant source of help. I benefited greatly from his
constructive criticism throughout my PhD journey. I also wish to thank my
co-supervisor, Assoc. Prof., Dr. Luong Chi Mai. She was always very enthusiastic in
our discussions of promising research questions. It has been a pleasure and a luxury
to work with her. This thesis would not have been possible without my supervisors' support.
I gratefully acknowledge the support of the Institute of Information Technology,
Vietnamese Academy of Science and Technology, and of Thai Nguyen University
of Education. I also thank the National Foundation for Science and Technology
Development (NAFOSTED) and the ASEAN-European Academic University Network
(ASEA-UNINET) for their financial support.
I thank M.Sc. Vu Duc Quang, M.Sc. Trinh Van Ha and M.Sc. Pham Dinh
Lam, my co-authors on published papers. I thank Assoc. Prof., Dr. Tran Quang
Anh and Dr. Nguyen Quang Uy for many helpful insights into my research. I thank
my colleagues, especially my labmate Mr. Nguyen Tran Dinh Long, at the IT Research
& Development Center, HaNoi University.
Finally, I thank my family for their endless love and steady support.
Certificate of Originality
I hereby declare that this submission is my own work, carried out under the guidance
of my scientific supervisors, Assoc. Prof., Dr. Nguyen Xuan Hoai and Assoc. Prof.,
Dr. Luong Chi Mai. I declare that it contains no material previously published or
written by another person, except where due reference is made in the text of the thesis.
In addition, I certify that all my co-authors have allowed me to present our joint work
in this thesis.
Hanoi, 2019
PhD student
Nguyen Van Truong
Contents
List of Figures v
List of Tables vii
Notation and Abbreviation viii
INTRODUCTION 1
Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Problem statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Outline of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1 BACKGROUND 5
1.1 Detection of Network Anomalies . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Host-Based IDS . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.2 Network-Based IDS . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 A brief overview of human immune system . . . . . . . . . . . . . . . . 8
1.3 AIS for IDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 AIS model for IDS . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.2 AIS features for IDS . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Selection algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.1 Negative Selection Algorithms . . . . . . . . . . . . . . . . . . . 12
1.4.2 Positive Selection Algorithms . . . . . . . . . . . . . . . . . . . 15
1.5 Basic terms and definitions . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1 Strings, substrings and languages . . . . . . . . . . . . . . . . . 16
1.5.2 Prefix trees, prefix DAGs and automata . . . . . . . . . . . . . 17
1.5.3 Detectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5.4 Detection in r-chunk detector-based positive selection . . . . . . 20
1.5.5 Holes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.6 Performance metrics . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.7 Ring representation of data . . . . . . . . . . . . . . . . . . . . 23
1.5.8 Frequency trees . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6.1 The DARPA-Lincoln datasets . . . . . . . . . . . . . . . . . . . 27
1.6.2 UT dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.6.3 Netflow dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2 COMBINATION OF NEGATIVE SELECTION AND POSITIVE SELECTION 30
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 New Positive-Negative Selection Algorithm . . . . . . . . . . . . . . . . 31
2.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3 GENERATION OF COMPACT DETECTOR SET 43
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 New negative selection algorithm . . . . . . . . . . . . . . . . . . . . . 45
3.3.1 Detectors set generation under rcbvl matching rule . . . . . . . 45
3.3.2 Detection under rcbvl matching rule . . . . . . . . . . . . . . . 48
3.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4 FAST SELECTION ALGORITHMS 51
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3 A fast negative selection algorithm based on r-chunk detector . . . . . . 52
4.4 A fast negative selection algorithm based on r-contiguous detector . . . 57
4.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5 APPLYING HYBRID ARTIFICIAL IMMUNE SYSTEM FOR NETWORK SECURITY 66
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Hybrid positive selection algorithm with chunk detectors . . . . . . . . 69
5.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.4.2 Data preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.4.3 Performance metrics and parameters . . . . . . . . . . . . . . . 72
5.4.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
CONCLUSIONS 78
Contributions of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Published works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
BIBLIOGRAPHY 81
List of Figures
1.1 Classification of anomaly-based intrusion detection methods . . . . . . 7
1.2 Multi-layered protection and elimination architecture . . . . . . . . . . 9
1.3 Multi-layer AIS model for IDS . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Outline of a typical negative selection algorithm. . . . . . . . . . . . . . 13
1.5 Outline of a typical positive selection algorithm. . . . . . . . . . . . . . 15
1.6 Example of a prefix tree and a prefix DAG. . . . . . . . . . . . . . . . . 18
1.7 Existence of holes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.8 Negative selections with 3-chunk and 3-contiguous detectors. . . . . . . 23
1.9 A simple ring-based representation (b) of a string (a). . . . . . . . . . . 25
1.10 Frequency trees for all 3-chunk detectors. . . . . . . . . . . . . . . . . . 26
2.1 Binary tree representation of the detectors set generated from S. . . . . 33
2.2 Conversion of a positive tree to a negative one. . . . . . . . . . . . . . . 33
2.3 Diagram of the Detector Generation Algorithm. . . . . . . . . . . . . . 35
2.4 Diagram of the Positive-Negative Selection Algorithm. . . . . . . . . . 37
2.5 One node is reduced in a tree: a compact positive tree has 4 nodes (a)
and its conversion (a negative tree) has 3 nodes (b). . . . . . . . . . . . . 38
2.6 Detection time of NSA and PNSA. . . . . . . . . . . . . . . . . . . . . 40
2.7 Nodes reduction on trees created by PNSA on Netflow dataset. . . . . . 41
2.8 Comparison of nodes reduction on Spambase dataset. . . . . . . . . . . 41
3.1 Diagram of an algorithm to generate a perfect rcbvl detectors set. . . . . 47
4.1 Diagram of the algorithm to generate positive r-chunk detectors set. . . 55
4.2 A prefix DAG G and an automaton M . . . . . . . . . . . . . . . . . . 57
4.3 Diagram of the algorithm to generate negative r-contiguous detectors set. 61
4.4 An automaton represents 3-contiguous detectors set. . . . . . . . . . . . 62
4.5 Comparison of ratios of runtime of r-chunk detector-based NSA to runtime of Chunk-NSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.6 Comparison of ratios of runtime of r-contiguous detector-based NSA to runtime of Cont-NSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
List of Tables
1.1 Performance comparison of NSAs on linear strings and ring strings. . . 24
2.1 Comparison of memory and detection time reductions. . . . . . . . . . 39
2.2 Comparison of nodes generation on Netflow dataset. . . . . . . . . . . . 40
3.1 Data and parameters distribution for experiments and results comparison. 49
4.1 Comparison of our results with the runtimes of previously published algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Comparison of Chunk-NSA with r-chunk detector-based NSA. . . . . . 63
4.3 Comparison of proposed Cont-NSA with r-contiguous detector-based NSA. 64
5.1 Features for NIDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2 Distribution of flows and parameters for experiments. . . . . . . . . . . 73
5.3 Comparison between PSA2 and other algorithms. . . . . . . . . . . . . 74
5.4 Comparison between ring string-based PSA2 and linear string-based PSA2. 76
Notation and Abbreviation
Notation
ℓ Length of data samples
Sr Set of ring representations of all strings in S
|X| Cardinality of set X
Σ An alphabet, a nonempty and finite set of symbols
Σk Set of all strings of length k on alphabet Σ, where k is a
positive integer.
Σ∗ Set of all strings on alphabet Σ, including the empty string.
r Matching threshold
Dpi Set of all positive r-chunk detectors at position i.
Dni Set of all negative r-chunk detectors at position i.
CHUNKp(S, r) Set of all positive r-chunk detectors.
CHUNK(S, r) Set of all negative r-chunk detectors.
CONT(S, r) Set of all r-contiguous detectors.
L(X) Set of all nonself strings detected by X.
rcbvl r-contiguous bit with variable length.
Abbreviation
AIS Artificial Immune System
ACC Accuracy Rate
ACO Ant Colony Optimization
ANIDS Anomaly Network Intrusion Detection System
BBNN Block-Based Neural Network
Chunk-NSA Chunk Detector-Based Negative Selection Algorithm
Cont-NSA Contiguous Detector-Based Negative Selection Algorithm
DR Detection Rate
DAG Directed Acyclic Graph
FAR False Alarm Rate
GA Genetic Algorithm
HIS Human Immune System
HIDS Host Intrusion Detection System
IDS Intrusion Detection System
ML Machine Learning
MLP Multilayer Perceptron
NIDS Network Intrusion Detection System
NS Negative Selection
NSA Negative Selection Algorithm
NSM Negative Selection Mutation
PNSA Positive-Negative Selection Algorithm
PSA Positive Selection Algorithm
PSA2 Two-class Positive Selection Algorithm
PSO Particle Swarm Optimization
PSOGSA Particle Swarm Optimization-Gravitational Search Algorithm
RNSA Real-valued NSA
SVM Support Vector Machines
TCP Transmission Control Protocol
VNSA Variable length detector-based NSA
INTRODUCTION
Motivation
Internet users and computer networks suffer from a rapidly increasing number of
attacks. To keep them safe, effective security monitoring systems, such as Intrusion
Detection Systems (IDS), are needed. However, intrusion detection faces a number of
problems, such as large network traffic volumes, imbalanced data distributions, the
difficulty of drawing decision boundaries between normal and abnormal actions, and a
requirement for continuous adaptation to a constantly changing environment. As a
result, many researchers have explored different approaches to building reliable
intrusion detection systems.
Computational intelligence techniques, known for their adaptability, fault tolerance,
high computational speed and resilience to noisy information, are promising alternative
approaches to this problem.
One promising computational intelligence method for intrusion detection that has
emerged recently is the artificial immune system (AIS), inspired by the biological
immune system. The negative selection algorithm (NSA), a dominant AIS model,
is widely used in intrusion detection systems (IDS) [55, 52]. Despite its successful
application, NSA has some weaknesses: (1) high false positive (false alarm) and false
negative rates; (2) high training and testing time; (3) an exponential relationship
between the size of the training data and the number of detectors possibly generated
for testing; and (4) changeable definitions of "normal data" and "abnormal data" in
dynamic network environments [55, 79, 92]. To overcome these limitations, recent work
has concentrated on complex structures of immune detectors, matching methods and
hybrid NSAs [11, 94, 52].
Following the trends mentioned above, this thesis investigates the ability of
NSA to combine with other classification methods and proposes more effective data
representations to mitigate some of NSA's weaknesses.
Scientific meaning of the thesis: to provide further background for improving the
performance of AIS-based computer security in particular and of IDS in general.
Practical meaning of the thesis: to assist computer security practitioners and experts
in implementing their IDSs with new AIS-derived features.
The major contributions of this research are: proposing a new data representation for
better IDS performance; proposing a combination of existing algorithms and some
statistical approaches in a uniform framework; and proposing a complete, non-redundant
detector representation that achieves optimal time and memory complexity.
Objectives
Since data representation is one of the factors that affect training and testing
time, a compact and complete detector generation algorithm is investigated.
The thesis investigates optimal algorithms for generating detector sets in AIS. These
help to reduce both the training time and the detection time of AIS-based IDSs.
The thesis also aims to propose and investigate an AIS-based IDS that can promptly
detect attacks, whether known or never seen before. The proposed system uses AIS
with statistics as its analysis method and flow-based network traffic as experimental
data.
Problem statements
Since the NSA has the limitations listed in the first section, this thesis
concentrates on three problems:
1. The first problem is to find compact representations of data. The objective here
is not only to minimize memory storage but also to reduce testing time.
2. The second problem is to propose algorithms that reduce training time and
testing time compared with all existing related algorithms.
3. The third problem is to improve detection performance by reducing false alarm
rates while keeping the detection rate and accuracy rate as high as possible.
Solutions to these problems can partly address the first three weaknesses listed in the
first section. Regarding the last NSA weakness, the changeable definitions of "normal
data" and "abnormal data" in dynamic network environments, we consider it a risk in
our proposed algorithms and leave it for future work.
Naturally, it is impossible to find a single optimal algorithm that both reduces time
and memory complexity and obtains the best detection performance; these aspects
always conflict with each other. Thus, in each chapter, we propose algorithms to solve
each problem fairly independently.
The intrusion detection problem addressed in this thesis can be informally
stated as follows:
Given a finite set S of network flows, each labeled self (normal) or nonself
(abnormal), the objective is to build classification models on S that can correctly label
an unlabeled network flow s.
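This two-class framing can be sketched in code as follows. All identifiers here are our own illustration, not names from the thesis; a flow is assumed to be encoded as a string, and the baseline classifier is a deliberately trivial stand-in for the selection algorithms developed later.

```python
from typing import Callable, Sequence, Tuple

Flow = str    # a network flow encoded as a string
Label = str   # "self" (normal) or "nonself" (abnormal)

def accuracy(classifier: Callable[[Flow], Label],
             labeled_flows: Sequence[Tuple[Flow, Label]]) -> float:
    """Fraction of flows labeled correctly (the ACC metric)."""
    hits = sum(1 for f, lab in labeled_flows if classifier(f) == lab)
    return hits / len(labeled_flows)

def train_memorizer(training: Sequence[Tuple[Flow, Label]]) -> Callable[[Flow], Label]:
    """Trivial baseline: memorize the labeled set S; unseen flows default to self."""
    table = dict(training)
    return lambda f: table.get(f, "self")
```

Such a memorizer cannot generalize to unseen flows, which is exactly the gap the detector-based models in the following chapters address.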
Outline of thesis
The first chapter introduces the background knowledge necessary to discuss the
algorithms proposed in the following chapters. First, detection of network anomalies is
briefly introduced. Following that, the human immune system, artificial immune
systems, machine learning and their relevance are reviewed and discussed. Then, the
popular datasets used for experiments in the thesis are examined.
In Chapter 2, a method combining selection algorithms is presented. The
proposed technique reduces the storage of the detectors generated in the training phase.
Testing time, an important measurement in IDS, is also reduced as a direct consequence
of the smaller memory complexity. A tree structure is used in this chapter (and in
Chapter 5) to improve time and memory complexity.
A complete and non-redundant detector set, also called a perfect detector set,
is necessary to achieve acceptable self and nonself coverage of classifiers. A selection
algorithm to generate a perfect detector set is investigated in Chapter 3. Each detector
in the set is a string concatenated from overlapping classical ones. Unlike the
approaches in the other chapters, the discrete structure of the string-based detectors in
this chapter is suitable for detection in distributed environments.
Chapter 4 presents two selection algorithms with fast training phases. These optimal
algorithms can generate a detector set in linear time with respect to the size of the
training data. The experimental results and theoretical proofs show that the proposed
algorithms outperform all existing ones in terms of training time. In terms of detection
time, the first and second algorithms are linear and polynomial, respectively.
Chapter 5 introduces a hybrid approach combining a positive selection algorithm
with statistics for a more effective NIDS. The frequencies of self and nonself data
(strings) are stored in the leaves of the trees representing detectors. This information
plays an important role in improving the performance of the proposed algorithms. The
hybrid approach results in a new positive selection algorithm for two-class classification
that can be trained with samples of both self and nonself data.
Chapter 1
BACKGROUND
The human immune system (HIS) successfully protects our bodies against
attacks from various harmful pathogens, such as bacteria, viruses and parasites. It
distinguishes pathogens from self tissue and then eliminates them. This provides a rich
source of inspiration for computer security systems, especially intrusion detection
systems [92]. Hence, applying theoretical immunology and observed immune functions,
their principles, and their models to IDS has gradually developed into a new research
field, called artificial immune systems (AIS).
How to apply the remarkable features of the HIS to achieve a scalable and robust IDS
is considered a research gap in the field of computer security. In this chapter, we
introduce the background knowledge necessary to discuss the algorithms proposed in
the following chapters, which can partly fill this gap.
Firstly, a brief introduction to network anomaly detection is presented. We
then give an overview of the HIS. Next, immune selection algorithms, detectors,
performance metrics and their relevance are reviewed and discussed. Finally, some
popular datasets are examined.
1.1 Detection of Network Anomalies
The idea of intrusion detection is predicated on the belief that an intruder’s
behavior is noticeably different from that of a legitimate user and that many unautho-
rized actions are detectable [65]. Intrusion detection systems (IDSs) are deployed as a
second line of defense along with other preventive security mechanisms, such as user
authentication and access control. Based on its deployment, an IDS can act either as
a host-based or as a network-based IDS.
1.1.1 Host-Based IDS
A Host-Based IDS (HIDS) monitors and analyzes the internals of a computing
system. A HIDS may detect internal activity, such as which program accesses what
resources and attempts illegitimate access, for example, an activity that modifies the
system password database. Similarly, a HIDS may look at the state of a system and
its stored information, whether in RAM, the file system, log files or elsewhere. Thus,
one can think of a HIDS as an agent that monitors whether anything or anyone,
internal or external, has circumvented the security policy that the operating system
tries to enforce [12].
1.1.2 Network-Based IDS
A Network-Based IDS (NIDS) detects intrusions in network data. Intrusions
typically occur as anomalous patterns. Most techniques model the data in a sequential
fashion and detect anomalous subsequences. The primary reason for these anomalies
is the attacks launched by outside attackers who want to gain unauthorized access to
the network to steal information or to disrupt the network. In a typical setting, a
network is connected to the rest of the world through the Internet. The NIDS reads
all incoming packets or flows, trying to find suspicious patterns. For example, if a
large number of TCP connection requests to a very large number of different ports are
observed within a short time, one could assume that there is someone committing a
port scan against some of the computers in the network. A NIDS also attempts to
detect incoming shell codes in the same manner that an ordinary intrusion detection
system does. In addition to inspecting incoming traffic, a NIDS can provide valuable
information about intrusions from outgoing or local traffic. Some attacks might even be
staged from inside a monitored network or network segment, and are therefore not
regarded as incoming traffic at all. The data available to intrusion detection systems
can be at different levels of granularity, such as packet-level traces or Cisco NetFlow data.
The data is typically high dimensional, with a mix of categorical and continuous
numeric attributes. Misuse-based NIDSs search for known intrusive patterns, while
anomaly-based intrusion detectors search for unusual patterns. Today, intrusion
detection research is mostly concentrated on anomaly-based network intrusion
detection because it can detect both known and unknown attacks [12].
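The port-scan heuristic mentioned above can be sketched as follows. This is purely our illustration; the window length, threshold and event format are hypothetical and do not come from the thesis.

```python
from collections import defaultdict

def detect_port_scans(events, window=10.0, max_ports=100):
    """Flag source IPs that contact more than max_ports distinct
    destination ports within a sliding time window.

    events: iterable of (timestamp, src_ip, dst_port), sorted by timestamp.
    Returns the set of source IPs flagged as scanners."""
    recent = defaultdict(list)   # src_ip -> [(timestamp, dst_port), ...]
    flagged = set()
    for t, src, port in events:
        hits = recent[src]
        hits.append((t, port))
        # keep only connection attempts inside the current window
        recent[src] = hits = [(ts, p) for ts, p in hits if t - ts <= window]
        if len({p for _, p in hits}) > max_ports:
            flagged.add(src)
    return flagged
```

A real anomaly-based NIDS replaces such a fixed-threshold rule with a learned model, but the sketch shows the kind of flow-level evidence such systems operate on.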
1.1.3 Methods
On the basis of the availability of prior knowledge, the detection mechanism
used, the mode of operation and the ability to detect attacks, existing anomaly
detection methods are categorized into six broad categories [41], as shown in Fig. 1.1,
which is adapted from [12].
[Taxonomy diagram: anomaly detection methods are divided into supervised learning,
unsupervised learning (clustering, association mining, outlier mining), probabilistic
(parametric, non-parametric), soft computing (ANN based, rough set based, fuzzy
logic, GA based & ant colony, artificial immune system), knowledge based (rule based
& expert system based, ontology & logic based) and combination learners (ensemble
based, fusion based, hybrid).]
Figure 1.1: Classification of anomaly-based intrusion detection methods
AIS is a fairly new research subfield of computational intelligence, which can be viewed
as the study of systems that act intelligently: what such a system does is appropriate
for its circumstances and its goals; it is flexible to changing environments and changing
goals; it learns from experience; and it makes appropriate choices given perceptual
limitations and finite computation [68].
1.1.4 Tools
IDS tools are used for purposes such as information gathering, victim identification,
packet capture, network traffic analysis and visualization of traffic behavior. Examples
of such tools, both commercial and free, include Snort, Suricata, Bro, OSSEC, Samhain,
Cisco Secure IDS, CyberCop and RealSecure. Immune-related IDS tools include
LISYS [10], which is based on TCP packets, and MILA [26], a multilevel immune
learning algorithm proposed for novel pattern recognition.
However, despite their initially promising and influential properties, immune-based
IDSs never made it beyond the prototype stage [83]. Two main issues impeding the
progress of immune algorithms were identified: the large computational cost of
achieving acceptable coverage of the potentially anomalous region [54], and the failure
of these algorithms to generalize properly beyond the training set [79].
1.2 A brief overview of human immune system
Inspired mainly by the human immune system, researchers have developed AISs
intellectually and innovatively. The HIS is a multi-layered protection architecture
whose main components are physical barriers, physiological barriers, an innate immune
system and an adaptive immune system. The adaptive immune system, which is
capable of adaptively recognizing specific types of pathogens and memorizing them for
accelerated future responses, is a complex of a variety of molecules, cells and organs
spread all over the body [46]. Pathogens are foreign substances, such as viruses,
parasites and bacteria, which attack the body. Figure 1.2, adapted from [77], presents
this multi-layered protection and elimination architecture.
T cells and B cells cooperate to distinguish self from nonself. On the one hand,
T cells recognize antigens with the help of major histocompatibility complex (MHC)
molecules. Antigen presenting cells ingest and fragment antigens to peptides. MHC
molecules transport these peptides to the surface of antigen presenting cells. T cells,
whose receptors bind with these peptide-MHC combinations, are said to recognize
antigens.
Figure 1.2: Multi-layered protection and elimination architecture
On the other hand, B cells recognize antigens by binding their receptors
directly to antigens. The bindings actually are chemical bonds between receptors and
epitopes. The more complementary the structure and the charge between receptors and
epitopes are, the more likely binding will occur. The strength of the bond is termed
affinity. To avoid autoimmunity, T cells and B cells must pass a negative selection
stage, where lymphocytes matching self cells are killed.
Prior to negative selection, T cells undergo positive selection. This is because in
order to bind to the peptide-MHC combinations, they must recognize self MHC first.
Thus, the positive selection will eliminate T cells with weak bonds to self MHC. T cells
and B cells, which survive the negative selection, become mature, and enter the blood
stream to perform the detection task. Since these mature lymphocytes have never
encountered antigens, they are naive. Naive T cells and B cells can possibly auto-react
with self cells, because some peripheral self proteins are never presented during the
negative selection stage. To prevent self-attack, naive cells need two signals in order
to be activated: one occurs when they bind to antigens, and the other is from other
sources as a confirmation. Naive T helper cells receive the second signal from innate
system cells. In the event that they are activated, T cells begin to clone. Some of
the clones will send out signals to stimulate macrophages or cytotoxic T cells to kill
antigens, or send out signals to activate B cells. Others will form memory T cells. The
activated B cells migrate to a lymph node. In the lymph node, a B cell will clone itself.
Meanwhile, somatic hypermutation is triggered; its rate is far higher than that of
germ-line mutation and is inversely proportional to the affinity. Mutation changes the
receptor structures of the offspring; hence the offspring have to bind to pathogenic
epitopes captured within the lymph nodes. If they do not bind, they simply die after
a short time. If they succeed in binding, they leave the lymph node and differentiate
into plasma or memory B cells.
In summary, the HIS is a distributed, self-organizing and lightweight defense
system for the body. These remarkable features fulfill and benefit the design goals of
an intrusion detection system, thus resulting in a scalable and robust system [53].
1.3 AIS for IDS
1.3.1 AIS model for IDS
Figure 1.3 illustrates the steps necessary to obtain an AIS solution for a security
problem, as first envisioned by de Castro and Timmis [27] and later adopted
by Fernandes et al. [35]. First, the security domain of the system to model needs
to be identified. Second, the immune entities that best fit the needs of the system
should be selected from the immunological theories; this should ease pointing out the
representation of the entities. In the affinity-measure step, one should choose a
matching rule that determines whether two elements bind.
Figure 1.3: Multi-layer AIS model for IDS
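For the affinity-measure step, the two matching rules central to this thesis, r-chunk and r-contiguous matching, can be sketched for binary strings as follows (function names and signatures are our own illustration):

```python
def rchunk_match(detector: str, position: int, s: str) -> bool:
    """An r-chunk detector (position, d), with |d| = r, matches string s
    if d equals the substring of s of length r starting at position."""
    r = len(detector)
    return s[position:position + r] == detector

def rcontiguous_match(d: str, s: str, r: int) -> bool:
    """An r-contiguous detector d, with |d| = |s|, matches s if the two
    strings agree in at least r contiguous positions."""
    run = best = 0
    for a, b in zip(d, s):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best >= r
```

In an AIS-based IDS, "binding" between a detector and an encoded network flow is decided by such a rule; the matching threshold r controls how specific the detectors are.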
1.3.2 AIS features for IDS
According to Kim et al. [55], AIS features can be illustrated and summarized
as follows.
Firstly, a distributed IDS supports robustness, configurability, extendibility and
scalability. It is robust because the failure of one local intrusion detection process does
not cripple the overall IDS. It is also easy to configure, since each intrusion detection
process can simply be tailored to the local requirements of a specific host. The addition
of new intrusion detection processes running on different operating systems does not
require modification of existing processes, and hence the system is extensible. It also
scales well, since the high volume of audit data is distributed amongst many local hosts
and analyzed by those hosts.
Secondly, a self-organizing IDS provides adaptability and global analysis. Without
external management or maintenance, a self-organizing IDS automatically detects
intrusion signatures that were previously unknown and/or distributed, and eliminates
and/or repairs compromised components. Such a system is highly adaptive because
there is no need for manual updates of its intrusion signatures as network environments
change. Global analysis emerges from the interactions among a large number of varied
intrusion detection processes.
Next, a lightweight IDS supports efficiency and dynamic features. A lightweight
IDS does not impose a large overhead on a system or place a heavy burden on CPU
and I/O. It places minimal work on each component of the IDS. The primary functions
of hosts and networks are not adversely affected by the monitoring. It also dynami-
cally covers intrusion and non-intrusion pattern spaces at any given time rather than
maintaining entire intrusion and non-intrusion patterns.
One more important feature is a multi-layered IDS which increases robustness.
The failure of one-layer defense does not necessarily allow an entire system to be
compromised. While a distributed IDS allocates intrusion detection processes across
several hosts, a multi-layered IDS places different levels of sensors at one monitoring
place.
Additionally, a diverse IDS provides robustness. A variety of different intrusion
detection processes spread across hosts will slow an attack that has successfully com-
promised one or more hosts. This is because an understanding of the intrusion process
at one site provides limited or no information on intrusion processes at other sites.
Finally, a disposable IDS increases robustness, extendibility and configurability.
A disposable IDS does not depend on any single component. Any component
can be easily and automatically replaced with other components. These properties are
important in an effective IDS, as well as being established properties of the HIS.
1.4 Selection algorithms
The main developments within AIS have focused on three immunological theories:
clonal selection, immune networks and negative selection. Negative selection
approaches are based on self-nonself discrimination in biological systems. This property
makes them attractive to computer and network security researchers. A survey by G. C.
Silva and D. Dasgupta in [71] showed that in the five-year period 2008-2013, NSA predom-
inated over all other AIS models in terms of published papers relating to both network
security and anomaly detection. This trend motivates much of the research work in
this thesis.
A model of AIS, the positive selection algorithm (PSA), is also investigated. Under
some conditions, we will prove in a following section that PSA is equivalent to NSA in terms
of anomaly detection performance.
1.4.1 Negative Selection Algorithms
Negative selection is a mechanism employed to protect the body against self-
reactive lymphocytes. Such lymphocytes can occur because the building blocks of
antibodies are different gene segments that are randomly composed and undergo a fur-
ther somatic hypermutation process. Therefore, this process can produce lymphocytes
which are able to recognise self-antigens [85].
NSAs are among the most popular and extensively studied techniques in ar-
tificial immune systems that simulate the negative selection process of the biological
immune system. Stephanie Forrest et al. [38] proposed an algorithmic model of this
process, which can be considered as a classifier that learns from only self samples
(negative examples)1.
A typical NSA comprises two phases: detector generation and detection [7,
50]. In the detector generation phase (Fig. 1.4.a), the detector candidates are generated
by some random processes and censored by matching them against given self samples
taken from a set S (representing the system components). The candidates that match
any element of S are eliminated and the rest are kept and stored in the detector
set D. In the detection phase (Fig. 1.4.b), the collection of detectors is used to
distinguish self (system components) from nonself (outliers, anomalies, etc.). If an incoming
data instance matches any detector, it is claimed as nonself or anomalous. Figure 1.4 is
adapted from [38].
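The two phases above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis's implementation; the Hamming-distance rule and all function names are stand-ins for an application-specific matching rule.

```python
import random

def hamming_match(a, b, threshold):
    """Partial matching: two strings bind if they differ in at most `threshold` positions."""
    return sum(x != y for x, y in zip(a, b)) <= threshold

def generate_detectors(self_set, matches, num_detectors, length):
    """Generation phase: keep random candidates that match no self sample."""
    detectors = set()
    while len(detectors) < num_detectors:
        cand = ''.join(random.choice('01') for _ in range(length))
        if not any(matches(cand, s) for s in self_set):
            detectors.add(cand)
    return detectors

def classify(sample, detectors, matches):
    """Detection phase: a sample matching any detector is claimed nonself."""
    return 'nonself' if any(matches(sample, d) for d in detectors) else 'self'
```

For instance, with self set {'00000', '00010'} and threshold 1, every surviving detector differs from each self string in at least two positions, so self samples are always classified as self; nonself coverage depends on how many detectors are generated.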
(a) Generation of detector set (b) Detection of new instances
Figure 1.4: Outline of a typical negative selection algorithm.
The concept of matching, or recognition, is used both in the detector generation phase
and in the anomaly detection phase. Regardless of representation, a matching rule on
a detector d and a data sample s can be informally defined as a distance measure
between d and s within a threshold. The matching threshold exposes the concept of partial
matching: two points do not have to be exactly the same to be considered matching.
1 At the time of writing this thesis, the paper had been cited more than 2300 times.
A partial matching rule can support approximation or generalization in the algorithms.
The choice of the matching rule, or of the threshold in a matching rule, must be
application specific and representation dependent [51]. For real-valued representations,
some popular rules are the Euclidean distance and the Manhattan distance. In string
representation, the rcb (r-contiguous bits) matching rule and the r-chunk matching rule
are the most famous ones, and they are formally presented in a following section.
Since its introduction, NSA has had many applications such as in computer virus
detection [37, 5], monitoring UNIX processes [36], anomaly detection [22, 26], intrusion
detection [19, 54, 46, 59, 18, 93], scheduling [64], fault detection and diagnosis [45, 72],
negative database [33, 98], negative authentication [25, 20]. Moreover, NSA has been
quite successfully applied in immunology where they are used as models to provide
insight into fundamental principles of immunity and infection [15], and to illustrate
the immunological processes such as HIV infection [56, 57].
The most significant characteristics of an NSA, which make it unique and strong,
are:
• No prior knowledge of nonself is required [29].
• It is inherently distributable; no communication between detectors is needed [30].
• It can hide the self concept [33].
• Compared with other change detection methods, NSAs do not depend on a globally
defined notion of normal. Consequently, the checking activity of each site can
be based on a signature unique to that site while the same algorithm is used over
multiple sites.
• The quality of the check can be traded off against the cost of performing a check
[38].
• Symmetric protection is provided, so malicious manipulation of the detector set
can be detected by the normal behavior of the system [38].
• If the process of generating detectors is costly, it can be distributed to multiple
sites because of its inherent parallel characteristics.
• Detection is tunable to balance between coverage (matching probability) and the
number of detectors [29].
1.4.2 Positive Selection Algorithms
Contrary to NSAs, PSAs have been less studied in the literature. PSAs are
mainly developed and applied in intrusion detection [23, 73, 44, 66], malware detec-
tion [39], spam detection [81], and classification [40, 67]. Stibor et al. [80] argue that
positive selection might have better detection performance than negative selection. How-
ever, for problems and applications in which the number of detectors generated by NSAs is
much smaller than the number of self samples, negative selection is obviously a better choice [51].
Similar to NSA, a PSA contains two phases: detector generation and detection.
In the detector generation phase (Fig. 1.5.a), the detector candidates are generated by
some random processes and matched against the given self sample set S. The candi-
dates that do not match any element in S are eliminated and the rest are kept and
stored in the detector set D. In the detection phase (Fig. 1.5.b), the collection of
detectors is used to distinguish self from nonself. If an incoming data instance matches any
detector, it is claimed as self. In other words, detector modeling involves generating a
(a) Generation of detector set (b) Detection of new instances
Figure 1.5: Outline of a typical positive selection algorithm.
set of strings (patterns) that do not match any string in a training dataset too strongly
(negative selection) or that weakly match at least one string from the same dataset (positive
selection). Having obtained the detectors, one usually examines a testing dataset
(i.e., "antigens"), for which one searches for one or all matching detectors for classification.
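The dual censoring step can be made concrete with a small sketch (the function name and the exact-match rule are illustrative choices of ours, not the thesis's method): positive selection keeps exactly those candidates that negative selection would discard.

```python
def censor(self_set, matches, candidates, positive):
    """Keep candidates that match some self sample (positive=True)
    or that match no self sample (positive=False, i.e. negative selection)."""
    kept = []
    for cand in candidates:
        hits_self = any(matches(cand, s) for s in self_set)
        if hits_self == positive:
            kept.append(cand)
    return kept
```

For the same candidate stream, `censor(..., positive=True)` and `censor(..., positive=False)` partition the candidates into the positive and negative detector sets.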
1.5 Basic terms and definitions
In selection algorithms, an essential component is the matching rule which de-
termines the similarity between detectors and self samples (in the detector generation
phase) and incoming data instances (in the detection phase). Obviously, the matching
rule is dependent on detector representation. In this thesis, both self and nonself cells
are represented as strings of fixed length. This representation is a simple and popular
representation for detectors and data in AIS, and other representations (such as real-valued
ones) can be reduced to binary, a special case of string [42, 51].
1.5.1 Strings, substrings and languages
An alphabet Σ is a nonempty and finite set of symbols. A string s ∈ Σ∗ is a
sequence of symbols from Σ, and its length is denoted by |s|. A string is called the empty
string if its length equals 0. Given an index i ∈ {1, . . . , |s|}, then s[i] is the symbol
at position i in s. Given two indices i and j, whenever j ≥ i, then s[i . . . j] is the
substring of s with length j − i + 1 that starts at position i and whenever j < i, then
s[i . . . j] is the empty string. If i = 1, then s[i . . . j] is a prefix of s and, if j = |s|,
then s[i . . . j] is a suffix of s. For a proper prefix or suffix s′ of s, we have in addition
|s′| < |s|. Given a string s ∈ Σ`, another string d ∈ Σr with 1 ≤ r ≤ `, and an index
i ∈ {1, . . . , ` − r + 1}, we say that d occurs in s at position i if s[i . . . i + r − 1] = d.
Moreover, concatenation of two strings s and s′ is s + s′.
A set of strings S ⊆ Σ∗ is called a language. For two indices i and j, we define
S[i . . . j] = {s[i . . . j]|s ∈ S}.
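The 1-based conventions above translate directly into code; the following is a small sketch of ours (the helper names are not standard notation):

```python
def substring(s, i, j):
    """s[i . . . j] with 1-based inclusive indices; the empty string when j < i."""
    return s[i - 1:j] if j >= i else ''

def occurs_at(d, s, i):
    """True iff d occurs in s at position i, i.e. s[i . . . i + |d| - 1] = d."""
    return substring(s, i, i + len(d) - 1) == d

def project(S, i, j):
    """The language S[i . . . j] = {s[i . . . j] | s in S}."""
    return {substring(s, i, j) for s in S}
```

For example, with s = 010101 we get s[2 . . . 4] = 101, so the string 101 occurs in s at position 2 but not at position 1.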
1.5.2 Prefix trees, prefix DAGs and automata
A prefix tree T is a rooted directed tree with edge labels from Σ where for all
c ∈ Σ, every node has at most one outgoing edge labeled with c. For a string s, we write
s ∈ T if there is a path from the root of T to a leaf such that s is the concatenation
of the labels on this path. The language L(T ) described by T is defined as the set of
all strings that have a nonempty prefix s ∈ T . For example, for T as in Fig. 1.6.a we
have 0 ∈ T and 10 ∈ T , but 1 ∉ T . Furthermore, 0 ∈ L(T ) and 01 ∈ L(T ) since 0 ∈ T , and
11 ∉ L(T ) since no prefix of 11 lies in T . Trees for self datasets and nonself datasets are
called positive trees and negative trees, respectively.
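A prefix tree and the two membership tests (s ∈ T and s ∈ L(T)) can be sketched with nested dictionaries; this encoding is our illustrative choice, not the thesis's data structure, and it reproduces the Fig. 1.6.a example:

```python
def insert(tree, s):
    """Add string s as a root-to-leaf path of the prefix tree."""
    node = tree
    for c in s:
        node = node.setdefault(c, {})

def in_tree(tree, s):
    """s ∈ T: s labels a complete root-to-leaf path."""
    node = tree
    for c in s:
        if c not in node:
            return False
        node = node[c]
    return node == {}  # the path must end at a leaf

def in_language(tree, s):
    """s ∈ L(T): some nonempty prefix of s lies in T."""
    return any(in_tree(tree, s[:k]) for k in range(1, len(s) + 1))
```

Building T with the strings 0 and 10 gives 0 ∈ T and 10 ∈ T but 1 ∉ T, and 01 ∈ L(T) while 11 ∉ L(T), exactly as in the text.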
A prefix DAG D is a directed acyclic graph with edge labels from Σ, where
again for all c ∈ Σ, every node has at most one outgoing edge labeled with c. Similar
to prefix trees, the terms root and leaf are used to refer to nodes without incoming and
without outgoing edges, respectively. We write s ∈ D if there is a path from a root node to
a leaf node in D that is labeled by s. Given n ∈ D, the language L(D,n) contains
all strings that have a nonempty prefix that labels a path from n to some leaf. For
instance, if D is the DAG in Fig. 1.6.b and n is its lower left node, then L(D,n) consists
of all strings starting with 11. Moreover, we define L(D) = ∪_{n is a root of D} L(D, n).
A finite automaton is a tuple M = (Q, q0, Qa, Σ, ∆), where Q is a set of states
with a distinguished initial state q0 ∈ Q, Qa ⊆ Q is the set of accepting states, Σ is the
alphabet of M , and ∆ ⊆ Q × Σ × Q is the transition relation. Furthermore, we assume
that the transition relation is unambiguous: for every q ∈ Q and every c ∈ Σ there is
at most one q′ ∈ Q with (q, c, q′) ∈ ∆. It is common to represent the transition relation
as a graph with node set Q (with the initial state and the accepting states highlighted
properly) and labeled edges (a c-labeled edge from q to q′ if q′ ∈ Q with (q, c, q′) ∈ ∆).
An automaton M is said to accept a string s if its graph contains a path from q0 to
some q ∈ Qa whose concatenated edge labels equal s (note that this path may contain
loops). The language L(M) contains all strings accepted by M .
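Because the transition relation is unambiguous, running the automaton on a string is a simple deterministic walk; a minimal sketch (the dictionary encoding of ∆ is our own choice):

```python
def accepts(delta, q0, accepting, s):
    """Return True iff the unambiguous automaton (Q, q0, Qa, Σ, ∆) accepts s.
    delta maps (state, symbol) -> state; a missing entry means no transition."""
    q = q0
    for c in s:
        if (q, c) not in delta:
            return False
        q = delta[(q, c)]
    return q in accepting
```

As a usage example, a two-state automaton with delta = {(0, '1'): 1, (1, '1'): 0} and accepting state 0 accepts exactly the strings of 1s of even length, looping through the initial state as the definition allows.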
A prefix DAG can be turned into a finite automaton to decide the membership
of strings in languages. The detailed steps of this process are presented in Chapter 4.
Figure 1.6: Example of a prefix tree and a prefix DAG.
1.5.3 Detectors
In PSAs and NSAs, an essential component is the matching rule which deter-
mines the similarity between detectors and self samples (in the detector generation
phase) and incoming data instances (in the detection phase). Obviously, the matching
rule is dependent on detector representation. For string based AIS, the r-chunk and
r-contiguous detectors are among the most common matching rules. An r-chunk matching
rule can be seen as a generalisation of the r-contiguous matching rule, which helps
AIS to achieve better results on data where adjacent regions of the input data sequence
are not necessarily semantically correlated, such as in network data packets [9].
An important difference between the rcb and r-chunk matching rules lies in the holes, or
undetectable strings, that they may induce. This concept is presented in Section 1.5.5.
Given a nonempty and finite alphabet Σ, positive and negative
r-chunk detectors, r-contiguous detectors, and rcbvl detectors can be defined as follows:
Definition 1.1 (Positive r-chunk detectors). Given a self set S ⊆ Σℓ, a tuple (d, i) of
a string d ∈ Σr, where r ≤ ℓ, and an index i ∈ {1, ..., ℓ − r + 1} is a positive r-chunk
detector if there exists an s ∈ S such that d occurs in s at position i.
Definition 1.2 (Negative r-chunk detectors). Given a self set S ⊆ Σℓ, a tuple (d, i) of
a string d ∈ Σr, r ≤ ℓ, and an index i ∈ {1, ..., ℓ − r + 1} is a negative r-chunk detector
if d does not occur in any s ∈ S at position i.
Although some approaches proposed in the following chapters can be implemented
over any finite alphabet, all strings used in the examples are binary, Σ = {0, 1},
for ease of understanding.
Example 1.1. Let ` = 6, r = 3. Given a set S of five self strings: s1 = 010101, s2
= 111010, s3 = 101101, s4 = 100011, s5 = 010111. The set of some positive r-chunk
detectors is {(010,1), (111,1), (101,2), (110,2), (010,3), (101,3), (101,4), (010,4),
(111,4)}. The set of some negative r-chunk detectors is {(000,1), (001,1), (011,1),
(001,2), (010,2), (100,2), (000,3), (100,3), (000,4), (001,4), (100,4)}.
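Definitions 1.1 and 1.2 are easy to check by brute force. The sketch below (our illustrative code, not the thesis's implementation) computes, per position, the positive chunks actually occurring in S and takes the complement over Σ^r for the negative ones; on Example 1.1 it confirms, for instance, that (010, 1) is positive and (001, 2) is negative.

```python
from itertools import product

def chunk_detectors(S, r, alphabet='01'):
    """All positive and negative r-chunk detectors (d, i), with 1-based positions."""
    length = len(next(iter(S)))
    pos, neg = set(), set()
    for i in range(1, length - r + 2):
        occurring = {s[i - 1:i - 1 + r] for s in S}  # chunks of S at position i
        pos |= {(d, i) for d in occurring}
        neg |= {(''.join(t), i) for t in product(alphabet, repeat=r)
                if ''.join(t) not in occurring}
    return pos, neg
```

At each of the ℓ − r + 1 = 4 positions of Example 1.1, the positive and negative detectors partition the 8 strings of {0, 1}^3, so the two sets together contain 32 tuples.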
Definition 1.3. Given a self set S ⊆ Σℓ, a string d ∈ Σℓ is an r-contiguous detector if
d[i . . . i + r − 1] does not occur in any s ∈ S at position i, for all i ∈ {1, ..., ℓ − r + 1}.
Example 1.2. Let ` = 5, r = 3. Given a set of 7 self strings S = {01111, 00111,
10000, 10001, 10010, 10110, 11111}. The set of all 3-contiguous detectors is {01011,
11011}. This example is adapted from [32].
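Definition 1.3 can likewise be checked exhaustively. The brute-force sketch below is ours, exponential in ℓ and intended only for tiny examples; it recovers the two detectors of Example 1.2.

```python
from itertools import product

def cont_detectors(S, r, alphabet='01'):
    """All r-contiguous detectors: strings d of length l such that no window
    d[i..i+r-1] occurs in any self string at the same position."""
    length = len(next(iter(S)))
    detectors = []
    for t in product(alphabet, repeat=length):
        d = ''.join(t)
        if all(d[i:i + r] != s[i:i + r]
               for i in range(length - r + 1) for s in S):
            detectors.append(d)
    return detectors
```

On the seven self strings of Example 1.2 with r = 3, only 01011 and 11011 survive the censoring of all 32 candidates.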
We also use the following notations:
• Dpi = {(d, i) | (d, i) is a positive r-chunk detector} is the set of all positive r-chunk
detectors at position i, i = 1, . . . , ℓ − r + 1.
• Dni = {(d, i) | (d, i) is a negative r-chunk detector} is the set of all negative r-chunk
detectors at position i, i = 1, . . . , ℓ − r + 1.
• CHUNKp(S, r) = ∪_{i=1}^{ℓ−r+1} Dpi is the set of all positive r-chunk detectors.
• CHUNK(S, r) = ∪_{i=1}^{ℓ−r+1} Dni is the set of all negative r-chunk detectors.
• CONT(S, r) is the set of all r-contiguous detectors that do not match any string
in S.
• For a given detector set X, L(X) is the set of all nonself strings detected by X.
We also say that Σℓ \ L(X) is the set of all strings classified as self under X.
Example 1.3. Let ` = 5, matching threshold r = 3. Suppose that we have the set S
of six self strings s1 = 00000, s2 = 00010, s3 = 10110, s4 = 10111, s5 = 11000, s6 =
11010. Dp1 = {(000,1), (101,1), (110,1)} (Dp1 is the set of all leftmost substrings of
length r of all s ∈ S), Dn1 = {(001,1), (010,1), (011,1), (100,1), (111,1)}, Dp2 =
{(000,2), (001,2), (011,2), (100,2), (101,2)}, Dn2 = {(010,2), (110,2), (111,2)}, Dp3
= {(000,3), (010,3), (110,3), (111,3)}, Dn3 = {(001,3), (011,3), (100,3), (101,3)}
(note that Dpi ∪Dni = Σ3, i = 1, 2, 3).
The self space covered by the set of CHUNKp(S, 3) is {0, 1}5\ L(CHUNKp(S, 3))
= {00000, 00001, 00010, 00011, 00110, 00111, 01000, 01001, 01010, 01011, 01110,
01111, 10000, 10001, 10010, 10011, 10100, 10101, 10110, 10111, 11000, 11001, 11010,
11011, 11110, 11111}. Set of all strings detected by CHUNK(S,3) is L(CHUNK(S,3))
= {00001, 00011, 00100, 00101, 00110, 00111, 01000, 01001, 01010, 01011, 01100,
01101, 01110, 01111, 10000, 10001, 10010, 10011, 10100, 10101, 11001, 11011, 11100,
11101, 11110, 11111}.
Definition 1.4. Given a self set S ⊆ Σℓ, a triple (d, i, j) of a string d ∈ Σk, 1 ≤ k ≤ ℓ,
an index i ∈ {1, ..., ℓ − r + 1} and an index j ∈ {i, ..., ℓ − r + 1} is called a negative
detector under the rcbvl matching rule if d does not occur in any s ∈ S.
In other words, a triple (d, i, j) is an rcbvl detector if there exist (j − i + 1) r-chunk
detectors (d1, i), ..., (dj−i+1, j) such that dk and dk+1 are two (r − 1)-symbol overlapping
strings, k = 1, ..., j − i.
Example 1.4. Given `, r and the set S of self strings as in Example 1.1, S = {010101,
111010, 101101, 100011, 010111}. Triple (0001,1,2) is a rcbvl detector because there
exist two 3-chunk detectors (000,1), (001,2) that 000 and 001 are two 2-bit overlapping
strings. A detector set under the rcbvl matching rule contains 5 variable-length detec-
tors {(0001,1,2), (00100,1,4), (100,4,4), (011110,1,4), (11000,1,3)}. It is a minimum
detector set (23 bits) that covers all the detector space of the r-chunk detector set in Exam-
ple 1.1 (45 bits).
The matching threshold r plays an important role in selection algorithms. The
value of r can be used to balance between underfitting and overfitting. Our proposed
methods in Chapter 5 investigate this value in combination with simple statistics for
better detection performance.
1.5.4 Detection in r-chunk detector-based positive selection
It could be seen from Example 1.3 that L(CHUNKp(S, r)) = {00100, 00101,
01100, 01101, 11100, 11101} 6= L(CHUNK(S, r)), so the detection coverage of Dn
is not the same as that of Dp. This is undesirable for the combination of PSA and
NSA. Hence, to combine PSA and NSA in a unified framework, we have to change the
original semantic of positive selection in the detection phase as follows.
Definition 1.5 (Detection in positive selection). If a new instance matches ℓ − r + 1
positive r-chunk detectors (di, i), i = 1, . . . , ℓ − r + 1, it is claimed as self; otherwise it
is claimed as nonself.
With this new detection semantic, the following theorem on the equivalence of
detection coverage of r-chunk type PSA and NSA could be stated.
Theorem 1.1 (Detection Coverage). The detection coverage of positive and negative
selection algorithms coincide.
L(CHUNKp(S, r)) = L(CHUNK(S, r)) (1.1)
Proof. From the description of NSAs (see Fig. 1.4), if a new data instance matches
a negative r-chunk detector, then it is claimed as nonself, otherwise it is claimed as
self. Obviously, this is dual to the detection of new data instances in positive selection
as given in Definition 1.5.
This theorem lays the foundation for our novel Positive-Negative Selection Al-
gorithm (PNSA) proposed in the next chapter.
1.5.5 Holes
The generalization performed by selection algorithms corresponds to the strings
that are labeled self even though they do not occur in the self sample S. These strings
are called holes. In the rcb matching rule literature there are two types of holes:
crossover holes and length-limit holes. A crossover hole is a crossover of certain self strings, in
which all substrings of length r occur in self strings. A length-limit hole is one that has
at least one substring of length r that exists in CHUNK(S, r). The r-chunk matching
rule eliminates the problem of length-limit holes.
Fig. 1.7 illustrates the existence of holes in a self and a nonself space composed
of self and detector strings. The string universe Σℓ is a square region. Each dark
circle represents a detector, and the grid shape in the middle is self. The universe is
Figure 1.7: Existence of holes.
classified by the detector set as self (grid region and holes - white region) and nonself
(dark region covered by all circles).
In fact, holes are not a "problem"; as pointed out by Stibor et al. [78], they
are a necessary property of the selection algorithms. Without holes, the algorithms
would do nothing but naively memorize the training data for classification.
Fig. 1.8 shows a set of seven self strings as in Example 1.2, S ⊂ {0, 1}5 (left)
along with CHUNK(S, 3) (middle) and CONT(S, 3) (right). For both detector types,
the induced bipartitionings of the shape space {0, 1}5 are illustrated with strings that
are classified as nonself having a gray background and strings that are classified as self
having a white background. Bold strings are members of the self-set. Holes are the
strings that are classified as self but do not occur in the self-set S (non-bold, non-shaded
strings). This figure is adapted from [32].
1.5.6 Performance metrics
We used three metrics to evaluate each machine learning technique. The detection
rate (DR), accuracy rate (ACC) and false alarm rate (FAR) are defined as:
DR = TP / (TP + FN)        (1.2)
ACC = (TP + TN) / (TP + TN + FP + FN)        (1.3)
FAR = FP / (FP + TN)        (1.4)
Figure 1.8: Negative selections with 3-chunk and 3-contiguous detectors.
Where TP (True Positive) is the number of true positives (correctly classified
as nonself), TN (True Negative) is the number of true negatives (correctly classified
as self), FP (False Positive) is the number of false positives (classified as nonself,
actually self), and FN (False Negative) is the number of false negatives (classified as
self, actually nonself).
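As a worked illustration of Eqs. (1.2)-(1.4), here is a one-function sketch with invented confusion-matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """Detection rate, accuracy and false alarm rate from a confusion matrix."""
    dr = tp / (tp + fn)                    # Eq. (1.2)
    acc = (tp + tn) / (tp + tn + fp + fn)  # Eq. (1.3)
    far = fp / (fp + tn)                   # Eq. (1.4)
    return dr, acc, far
```

For example, metrics(90, 80, 20, 10) gives DR = 0.9, ACC = 0.85 and FAR = 0.2.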
We used the 10-fold cross-validation technique and the holdout technique to evaluate our ap-
proaches in experiments. Regarding the former, the dataset was randomly partitioned
into 10 subsets. Of the 10 subsets, a single subset was retained for testing, and the
others were used as training data. The process was then repeated 10 times, with each
of the 10 subsets used exactly once as the testing data. The 10 results from the folds
were then averaged to produce a single performance estimate. Regarding the latter, the dataset
is split into two groups: a training set used to train the classifier and a test set (or holdout
set) used to estimate the performance of the classifier.
1.5.7 Ring representation of data
As is known, most AIS-based applications use two types of data representation:
strings and real-valued vectors. For both popular types, representations are linear
structures of symbols or numbers. They may omit information at the edges (the beginning
and the end) of these structures. For example, to detect a new instance s as self or
nonself in the detection phases of PSA and NSA, it takes ℓ − r + 1 steps in the worst case
to match s against each detector at positions 1, ..., ℓ − r + 1. Therefore, for a given detector
d, the symbols d[1] and s[1] are used in only one match, the symbols d[2] and s[2] are
used in two matches, etc. In other words, the positions in linear structures are not
equal in terms of matching times.
Our earlier experimental implementation of NSA on binary ring-based strings
provides a motivation to address this problem. A set of 50,000 random self strings and a set of
10,000 random nonself strings, all of length 50, were used in the experiment.
Table 1.1 shows the experimental results for values of r ranging from 10 to 16 under the
10-fold cross-validation technique. The results show that both the detection rate and the accuracy
rate of the ring-based NSA are higher than those of the linear-based one, while the false alarm
rates are relatively similar. Accordingly, we could use ring structures instead of linear
ones for more exact classification.
Table 1.1: Performance comparison of NSAs on linear strings and ring strings.
r  | NSA on linear strings   | NSA on ring strings
   | ACC     DR      FAR     | ACC     DR      FAR
10 | 0.8343  0.0102  0.0008  | 0.8345  0.0123  0.0010
11 | 0.8380  0.0535  0.0051  | 0.8390  0.0655  0.0063
12 | 0.8488  0.1817  0.0177  | 0.8522  0.2193  0.0212
13 | 0.8677  0.4054  0.0399  | 0.8723  0.4704  0.0465
14 | 0.8875  0.6456  0.0640  | 0.8932  0.7184  0.0719
15 | 0.9023  0.8293  0.0829  | 0.9075  0.8888  0.0888
16 | 0.9112  0.9340  0.0934  | 0.9139  0.9662  0.0964
With reference to string-based detector sets, a simple technique for this approach
is to concatenate each string representing a detector with its first k symbols. Each new
linear string is a ring representation of its original binary string. Fig. 1.9 shows a ring
representation (b) of its original string (a) with k = 3.
Given a set of strings S ⊂ Σℓ, the set Sr ⊂ Σℓ+r−1 includes the ring representations
of all strings in S, obtained by concatenating each string s ∈ S with its first r − 1 symbols.
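The construction of Sr is a one-line string operation; a small sketch (function names are ours):

```python
def ring(s, r):
    """Ring representation: append the first r - 1 symbols of s to s."""
    return s + s[:r - 1]

def ring_set(S, r):
    """Sr: the ring representations of all strings in S."""
    return {ring(s, r) for s in S}
```

With r = 3, ring('10110', 3) gives '1011010', so every chunk that wraps around the edge of the original string now appears as an ordinary substring of the ring string.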
Figure 1.9: A simple ring-based representation (b) of a string (a).
Note that we can easily apply the idea of ring strings to other data representations
in AIS. One way to do this, for instance, is to create ring representations of other
structures, such as trees and automata, from the set Sr instead of S as usual.
1.5.8 Frequency trees
Given a set D of equal-length strings, a tree T on D, denoted TD, is a rooted
directed tree with edge labels from Σ where, for all c ∈ Σ, every node has at most one
outgoing edge labeled with c. For a string s, we write s ∈ T if there is a path from the
root of T to a leaf such that s is the concatenation of the labels on this path. Each
leaf is associated with an integer, namely the frequency of the string s ∈ D that is
the concatenation of the labels on the path ending at this leaf. This tree structure is
a compact representation of r-chunk detectors in our algorithm in Chapter 5.
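Such a tree can be sketched as nested dictionaries with a leaf counter; the '#' key is our own encoding choice, not part of the definition:

```python
def build_frequency_tree(strings):
    """Frequency tree over equal-length strings: each root-to-leaf path spells
    a string; the leaf stores how often that string occurs in the input."""
    tree = {}
    for s in strings:
        node = tree
        for c in s:
            node = node.setdefault(c, {})
        node['#'] = node.get('#', 0) + 1
    return tree

def frequency(tree, s):
    """Frequency stored for s in the tree (0 if s is absent)."""
    node = tree
    for c in s:
        if c not in node:
            return 0
        node = node[c]
    return node.get('#', 0)
```

For instance, over the chunk multiset ['000', '101', '000'] the string 000 gets frequency 2 and 101 gets frequency 1.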
Example 1.5. Let ℓ = 5 and matching threshold r = 3. Suppose that we have the
set S of four strings: s1 = 00000, s2 = 10110, s3 = 10111, s4 = 11111. Sr =
{0000000, 1011010, 1011110, 1111111}. S1 = {(000,1), (101,1), (111,1)}, S2 =
{(000,2), (011,2), (111,2)}, S3 = {(000,3), (110,3), (111,3)}, S4= {(000,4), (101,4),
(111,4)}, S5 = {(000,5), (010,5), (110,5), (111,5)}.
Assume that S = N ∪ A, where the set of normal data N = {s1, s2} and the set of
abnormal data A = {s3, s4}.
Nr = {0000000, 1011010} (ring representations of all strings in N). N1 =
{(000,1), (101,1)}, N2 = {(000,2), (011,2)}, N3 = {(000,3), (110,3)}, N4= {(000,4),
(101,4)}, N5 = {(000,5), (010,5)}.
Ar = {1011110, 1111111}. A1 = {(101,1), (111,1)}, A2 = {(011,2), (111,2)},
A3 = {(111,3)}, A4= {(111,4)}, A5 = {(110,5), (111,5)}.
Ten trees presenting all 3-chunk detectors are shown in Fig. 1.10. The five trees TNi (TAi),
i = 1, . . . , 5, are in the first (second) row, from left to right, respectively.
Recall that there are some strings that belong to both positive trees and
Figure 1.10: Frequency trees for all 3-chunk detectors.
negative trees. For example, the substring s2[1 . . . 3] = 101 of s2 satisfies s2[1 . . . 3] ∈ TN1
and s2[1 . . . 3] ∈ TA1 . This situation could lead to errors in the detection phase. Therefore,
frequencies of matches will be used to improve the detection performance of the algorithms.
The detailed technique of using the frequencies is presented in Chapter 5.
1.6 Datasets
There are two basic kinds of NIDS, depending on the source of data to be analyzed:
packet-based NIDSs and flow-based ones. M. H. Bhuyan et al. reviewed in [13] that
most NIDSs are based on packet-based data. However, we concentrate only on
flow-based NIDSs for three reasons: 1) they can detect some special attacks,
like DDoS or RDP Brute Force, more efficiently and faster than payload-based ones,
since less information needs to be analyzed [87, 76]; 2) flow-based anomaly
detection methods process only packet headers and therefore reduce the data amount and
processing time for high-speed detection on large networks, which can solve the scalability
problem under conditions of increasing network usage and load [76]; 3) flow-based NIDSs
decrease privacy issues in comparison with packet-based ones because of the absence
of payload [76].
1.6.1 The DARPA-Lincoln datasets
To evaluate the performance of different intrusion detection methodologies,
MIT's Lincoln Laboratory, with sponsorship from the DARPA ITO and the Air Force Re-
search Laboratory, gathered the DARPA Lincoln datasets, which consist of nine weeks
of data in 1998: seven weeks for training and the two remaining weeks for test data. The datasets
are collected and stored in the form of Tcpdump; they are the data source from which
datasets such as KDD99 [3] and NetFlow [86] are extracted.
There were more than 300 instances of 38 different attacks launched against
victim UNIX hosts during the attack period, each falling into one of four categories:
Denial of Service (DoS), Probe, User to Root (U2R), and Remote to Local (R2L).
For each week, inside and outside network traffic data, audit data recorded by the
Basic Security Module on Solaris hosts, and file system dumped from UNIX hosts were
collected. In 1999, another series of datasets, which contained three weeks of training
and two weeks of test data, was collected. More than 200 instances of 58 attack types
were launched against victim UNIX and WindowsNT hosts and a Cisco router. In 2000,
there were three additional scenario-specific datasets generated to address distributed
DoS and Windows NT attacks. Detailed descriptions of these datasets can be found
at [1].
1.6.2 UT dataset
A public labeled flow-based dataset is provided in [75]. This dataset was cap-
tured by monitoring a honeypot hosted in the University of Twente network, so we
call it the UT dataset. The dataset has three categories: malicious traffic, unknown traffic
and side-effect traffic. It has 14,170,132 flows, which are mostly of a malicious nature.
Only a small number of flows, 5,968 flows or 0.042%, are not labeled (unknown); these are
considered normal data in our experiments in the following chapters. Each flow in the
dataset has 13 fields: id: the ID of the flow, src ip: anonymized source IP address
(encoded as 32-bit number), dst ip: anonymized destination IP address (encoded as
32-bit number), packets: number of packets in flow, octets: number of bytes in flow,
start time: UNIX start time (number of seconds), start msec: start time (milliseconds
part), end time: UNIX end time (number of seconds), end msec: end time (millisec-
onds part), src port: source port number, dst port: destination port number, tcp flags:
TCP flags of the flow, and prot: IP protocol number.
Examples of a normal flow and an attack flow are (393, 3145344965, 2463760020,
3, 168, 1222173606, 974, 1222173610, 239, 0, 769, 0, 1) and (1, 2463760020, 3752951033,
1, 60, 1222173605, 985, 1222173605, 985, 4534, 22, 2, 6), respectively. The ID of a flow
is used to distinguish attack flows from the others.
1.6.3 Netflow dataset
Packet-based DARPA dataset [1] is used to generate flow-based DARPA dataset,
called NetFlow, in [86]. This dataset focuses only on flows to a specific port and an IP
address which receives the highest number of attacks. It includes all 129,571 flows
(including attacks) to and from the victims. Each flow in the dataset has 10 fields: Source
IP, Destination IP, Source Port, Destination Port, Packets, Octets, Start Time, End
Time, Flags, and Proto. All 24,538 attack flows are labeled with text labels, such as
neptune, portsweep, ftpwrite, etc.
Examples of a normal flow and an attack flow are (172.16.112.20, 172.16.112.50,
53, 32961, 1, 161, 1999-03-05T08:17:10, 1999-03-05T08:17:10, 17, 00) and (209.167.99.71,
172.16.112.50, 10353, 4288, 1, 46, 1999-03-12T17:23:19, 1999-03-12T17:23:19, 6,
02:::portsweep), respectively.
1.6.4 Discussions
This thesis does not address techniques or algorithms for NIDS on a wider variety
of datasets. Although there exist many other well-known datasets for IDS, such as the
KDD99 datasets [3], the NSL-KDD dataset [4], and the FDFA datasets [2], they are all out of
the scope of this netflow-related thesis.
There have been a number of different viewpoints and studies that criticize
the DARPA datasets. McHugh [62] proposed one of the most important assessments
of the DARPA datasets, which was deeply critical. This assessment gives examples,
such as normal and attack data having unrealistic data rates, the lack of training
datasets for anomaly detection for its intended purpose, and no effort at validation,
to show that some evaluation methodologies are questionable and may have biased
the results. Furthermore, the findings of Mahoney and Chan [60], which can be seen
as a confirmation of McHugh's experiments, revealed that many attributes had small
and fixed ranges in simulation, but large and growing ranges in real traffic.
Despite these criticisms, the DARPA-Lincoln dataset plays a vital public role and remains the most sophisticated benchmark for researchers evaluating intrusion detection or machine learning algorithms [92]. One important role of this data is to serve as a proxy for developing, testing and evaluating detection algorithms, rather than as a solid dataset for a real-time system. If a detection algorithm achieves high performance on the DARPA data, it is more likely to perform well in a real network environment [88].
1.7 Summary
This chapter presents the background topics used in this thesis, including HIS, IDS, and AIS. Some terms and definitions that will be used throughout the thesis are also stated clearly. Two important data structures, ring-based and frequency-based, can be used to improve the classification rate of selection algorithms. Some popular performance metrics and three well-known datasets for NIDS are presented and discussed; these datasets will be used for the experiments in the other chapters. Besides, under the new semantics of detection in the r-chunk based PSA (Section 1.5.4), we proved an important theorem on the coincidence of the detection coverage of PSA and that of NSA. This theorem leads to one contribution of the thesis, discussed in the next chapter.
Chapter 2
COMBINATION OF NEGATIVE SELECTION AND POSITIVE SELECTION
It can be seen from Theorem 1.1 in Chapter 1 that the r-chunk based PSA and NSA are dual in terms of detection. This motivates our approach of combining the two selection algorithms in a way that does not affect the detection performance of either.
2.1 Introduction
NSA and PSA are computational models inspired by the negative and positive selection of the biological immune system. Of the two, NSA has been studied more extensively, resulting in more variants and applications [51]. However, all existing string-based NSAs have worst-case exponential memory complexity for storing the detector set, which limits their practical applicability [31]. In this chapter, we introduce a novel selection algorithm that employs binary representation and the r-chunk matching rule for detectors. The new algorithm combines negative and positive selection to reduce both detector storage complexity and detection time, while maintaining the same detection coverage as that of NSAs (PSAs).
In the following section, we review some related works. Section 2.3 presents in detail a new r-chunk type selection algorithm, called PNSA, that combines positive and negative selection. In our proposed approach, binary trees are used as the data structure for storing the detector set, reducing memory complexity and, therefore, the time complexity of the detection phase. Section 2.4 details preliminary experimental results. A summary of the chapter is given in the final section.
2.2 Related works
Both PSA and NSA achieve quite similar performance for detecting novelty in data patterns [24]. Dasgupta et al. [21] conducted one of the earliest experiments on combining positive with negative selection. The combined process is embedded in a genetic algorithm using a fitness function that assigns a weight to each bit based on domain knowledge. Their method aims to reduce neither detector storage complexity nor detection time. Esponda et al. [34] proposed a generic NSA for anomaly detection problems. Their model of normal behavior is constructed from an observed sample of normally occurring patterns. Such a model could represent either the set of allowed patterns (positive detection) or the set of anomalous patterns (negative detection). However, their NSA does not combine positive and negative selection in the detection phase as the proposed one does. Stibor et al. [80] argued that positive selection might have better detection performance than negative selection. However, the choice between positive and negative selection obviously depends on the representation used in the AIS-based application.
To the best of our knowledge, there has been no published attempt to combine r-chunk type PSA and NSA for the purpose of reducing detector storage complexity and detection time complexity.
2.3 New Positive-Negative Selection Algorithm
Our algorithm first constructs ℓ − r + 1 binary trees (called positive trees) corresponding to the ℓ − r + 1 positive r-chunk detector sets Dpi, i = 1, . . . , ℓ − r + 1. Then, all complete subtrees of these trees are removed to achieve a compact representation of the positive r-chunk detector sets while maintaining the detection coverage. Finally, for every i-th positive tree, we decide whether or not it should be converted to the negative tree, which covers the negative r-chunk detector set Dni; the decision depends on which tree is more compact. When this process is done, we have ℓ − r + 1 compact binary trees, some of which represent positive r-chunk detectors while the others represent negative ones.
The r-chunk matching rule on binary trees is implemented as follows: a given sample s matches the i-th positive (negative) tree if s[i . . . i + k] is a path from the root to a leaf, for i = 1, . . . , ℓ − r + 1 and some k < r. The detection phase is conducted by traversing the compact binary trees one by one: a sample s is claimed as nonself if it matches a negative tree or fails to match at least one positive tree; otherwise it is considered self.
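This matching rule can be sketched on a small nested-dict trie, where an empty dict marks a leaf. This is a hedged illustration, not the thesis implementation: the function names are invented, and indices are 0-based here while the text indexes from 1.

```python
# Minimal sketch of the r-chunk matching rule on a (compact) binary tree,
# with nested dicts as the trie and {} marking a leaf.

def insert(trie, chunk):
    """Insert a binary string (e.g. '010') as a root-to-leaf path."""
    node = trie
    for bit in chunk:
        node = node.setdefault(bit, {})

def matches(trie, s, i):
    """True if some prefix of s starting at index i is a root-to-leaf path."""
    node = trie
    for bit in s[i:]:
        if bit not in node:
            return False
        node = node[bit]
        if not node:        # reached a leaf
            return True
    return False

# A negative tree for the detector set {010, 110, 111}:
t2 = {}
for d in ["010", "110", "111"]:
    insert(t2, d)

print(matches(t2, "10100", 1))  # the chunk starting at index 1 is 010 -> True
```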
Example 2.1. For the set of six self strings from Example 1.3, S = {00000, 00010, 10110, 10111, 11000, 11010}, where ℓ = 5 and r = 3, the six binary trees (the left and right children are labeled 0 and 1, respectively) representing the six 3-chunk detector sets (Dpi and Dni, i = 1, 2, 3) are depicted in Fig. 2.1. In the figure, dashed arrows in some positive trees mark the complete subtrees that will be removed to achieve the compact tree representation. The positive trees for Dp1, Dp2 and Dp3 are in (a), (c) and (e), respectively; the negative trees for Dn1, Dn2 and Dn3 are in (b), (d) and (f), respectively.
The numbers of nodes of the trees in Figures 2.1.a - 2.1.f (after deleting complete subtrees) are 9, 10, 7, 6, 8 and 8, respectively. Therefore, the chosen final trees are those in Figures 2.1.a (9 nodes), 2.1.d (6 nodes) and 2.1.e or 2.1.f (8 nodes). In a real implementation, it is unnecessary to generate both the positive and the negative trees. Since each Dpi can dually be represented either by a positive or by a negative tree, we only need to generate the (compact) positive trees. If a compact positive tree T has more leaves than internal nodes with a single child, the corresponding negative tree T′ has fewer nodes than T; therefore, T′ should be used instead of T to represent Dni more compactly. Figure 2.3 presents a diagram of the algorithm. The following example illustrates this observation.
Figure 2.1: Binary tree representation of the detectors set generated from S.

Example 2.2. Consider again the set of six self strings S from Example 1.3, S = {00000, 00010, 10110, 10111, 11000, 11010}. The compact positive tree for the positive 3-chunk detector set Dp2 = {(000,2); (001,2); (011,2); (100,2); (101,2)} is shown in Fig. 2.2.a. This tree has three leaves and two nodes that have only one child (in dotted circles), so it should be converted to the corresponding negative tree as illustrated in Fig. 2.2.b.
Figure 2.2: Conversion of a positive tree to a negative one.
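The conversion can be sketched as a complement operation on a nested-dict trie: every missing sibling of a single-child node becomes a leaf of the negative tree, and the original leaves disappear, which matches the leaves-minus-single-child-nodes counting rule above. This is a hedged illustration with invented names, not the thesis code.

```python
# Sketch of converting a compact positive tree into its negative
# (complement) tree, on nested-dict tries ({} = leaf).

def to_negative(node):
    neg = {}
    for bit in "01":
        if bit not in node:
            neg[bit] = {}                        # missing branch: all nonself
        elif node[bit]:
            neg[bit] = to_negative(node[bit])    # recurse into internal child
        # a leaf child covers only self strings: omitted from the complement
    return neg

def size(trie):
    """Number of nodes, root included."""
    return 1 + sum(size(child) for child in trie.values())

# Compact positive tree for Dp2 from Example 2.2 (Fig. 2.2.a):
# paths 00 (covering 000, 001), 011 and 10 (covering 100, 101).
pos = {"0": {"0": {}, "1": {"1": {}}}, "1": {"0": {}}}
neg = to_negative(pos)       # paths 010 and 11, i.e. Dn2 = {010, 110, 111}
print(size(pos), size(neg))  # 7 6
```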
Algorithm 2.1 Detector Generation Algorithm.
1: procedure DetectorGeneration(S, r, T)
Input: A set of self strings S ⊆ Σ^ℓ, a matching threshold r ∈ {1, . . . , ℓ}.
Output: A set T of ℓ − r + 1 prefix trees presenting all r-chunk detectors.
2:   T = ∅
3:   for i = 1, . . . , ℓ − r + 1 do
4:     create an empty prefix positive tree Ti
5:     for all s ∈ S do
6:       insert s[i . . . i + r − 1] into Ti
7:     end for
8:     for all internal nodes n ∈ Ti do
9:       if n is the root of a complete binary subtree then
10:        delete this subtree
11:      end if
12:    end for
13:    if (number of leaves of Ti) > (number of nodes of Ti that have only one child) then
14:      for all internal nodes of Ti do
15:        if it has only one child then
16:          if the child is a leaf then
17:            delete the child
18:          end if
19:          create the other child for it
20:        end if
21:      end for
22:      mark Ti as a negative tree
23:    end if
24:    T = T ∪ {Ti}
25:  end for
26: end procedure
The proposed technique is summarized in Algorithm 2.1 and Algorithm 2.2. The first algorithm, Algorithm 2.1, generates ℓ − r + 1 trees, each labeled as positive (self) or negative (nonself). The process of generating the compact binary (positive and negative) trees representing the complete r-chunk detector set is conducted in the outer “for” loop. First, each binary positive tree Ti is constructed by the first inner loop. Then, the compactification of each Ti is conducted by the second one, i = 1, . . . , ℓ − r + 1. The conversion of a positive tree to a negative one takes place in the “if” statement after the second inner “for” loop. The procedure for recognizing a given cell string s as self or nonself is carried out by the “while . . . do” and “if . . . then . . . else” statements of the second algorithm; Figure 2.4 presents a diagram of that algorithm.
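Under the same nested-dict trie encoding as before, the generation step can be sketched end to end. This is a hedged reimplementation with invented names, not the thesis code; on the self set of Example 2.1 it reproduces the chosen trees 2.1.a, 2.1.d and 2.1.e.

```python
# End-to-end sketch of detector generation (cf. Algorithm 2.1) on
# nested-dict tries ({} = leaf): build each positive r-chunk trie, prune
# complete subtrees, then keep the smaller of the positive tree and its
# complement (negative) tree.

def insert(trie, chunk):
    node = trie
    for bit in chunk:
        node = node.setdefault(bit, {})

def prune(node):
    """Bottom-up: collapse every complete subtree into a single leaf."""
    for child in node.values():
        prune(child)
    if len(node) == 2 and not node["0"] and not node["1"]:
        node.clear()

def to_negative(node):
    """Complement trie: add missing siblings as leaves, drop old leaves."""
    neg = {}
    for bit in "01":
        if bit not in node:
            neg[bit] = {}
        elif node[bit]:
            neg[bit] = to_negative(node[bit])
    return neg

def size(trie):
    return 1 + sum(size(child) for child in trie.values())

def detector_generation(S, r):
    ell = len(S[0])
    trees = []
    for i in range(ell - r + 1):            # one tree per chunk position
        t = {}
        for s in S:
            insert(t, s[i:i + r])
        prune(t)
        n = to_negative(t)
        trees.append(("negative", n) if size(n) < size(t) else ("positive", t))
    return trees

S = ["00000", "00010", "10110", "10111", "11000", "11010"]
trees = detector_generation(S, 3)
print([(kind, size(t)) for kind, t in trees])
# [('positive', 9), ('negative', 6), ('positive', 8)]
```

The printed node counts match the final trees chosen in Example 2.1: Figures 2.1.a (9 nodes), 2.1.d (6 nodes) and 2.1.e (8 nodes).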
Figure 2.3: Diagram of the Detector Generation Algorithm.
The detection phase is carried out by the second algorithm, Algorithm 2.2, and is illustrated by the following example.
Example 2.3. Given S and r as in Example 1.3, S = {00000, 00010, 10110, 10111, 11000, 11010}, and s = 10100 as the inputs of the algorithm, the three binary trees constructed as the detector set are those in Figures 2.1.a, 2.1.d and 2.1.e. The output of the algorithm is “s is nonself” because the substring s[2 . . . 4] = 010 is a root-to-leaf path in the negative tree T2 (Fig. 2.1.d).
Algorithm 2.2 Positive-Negative Selection Algorithm.
1: procedure PNSA(T, r, s)
Input: A set T of ℓ − r + 1 prefix trees presenting all r-chunk detectors, a matching threshold r ∈ {1, . . . , ℓ}, an unlabeled string s ∈ Σ^ℓ.
Output: A label of s (as self or nonself).
2:   flag = true ▷ A temporary boolean variable
3:   i = 1
4:   while (i ≤ ℓ − r + 1) and (flag = true) do
5:     if (Ti is a positive tree) and (s ∉ Ti) then
6:       flag = false
7:     end if
8:     if (Ti is a negative tree) and (s ∈ Ti) then
9:       flag = false
10:    end if
11:    i = i + 1
12:  end while
13:  if flag = false then
14:    output s is nonself
15:  else
16:    output s is self
17:  end if
18: end procedure
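The detection loop of Algorithm 2.2 can be sketched over the three final trees of Example 2.3, hand-encoded as nested dicts ({} marks a leaf). This is a hedged illustration with invented names, not the thesis code; indices are 0-based here.

```python
# Sketch of the PNSA detection loop over the trees of Example 2.3.

# T1: positive tree for Dp1 = {000, 101, 110} (Fig. 2.1.a)
T1 = ("positive", {"0": {"0": {"0": {}}},
                   "1": {"0": {"1": {}}, "1": {"0": {}}}})
# T2: negative tree covering Dn2 = {010, 110, 111} (Fig. 2.1.d)
T2 = ("negative", {"0": {"1": {"0": {}}}, "1": {"1": {}}})
# T3: positive tree covering Dp3 = {000, 010, 110, 111} (Fig. 2.1.e)
T3 = ("positive", {"0": {"0": {"0": {}}, "1": {"0": {}}}, "1": {"1": {}}})

def matches(trie, chunk):
    """r-chunk rule: some prefix of chunk is a root-to-leaf path."""
    node = trie
    for bit in chunk:
        if bit not in node:
            return False
        node = node[bit]
        if not node:
            return True
    return False

def classify(trees, s, r):
    for i, (kind, trie) in enumerate(trees):
        hit = matches(trie, s[i:i + r])
        # nonself: fails a positive tree, or hits a negative one
        if (kind == "positive" and not hit) or (kind == "negative" and hit):
            return "nonself"
    return "self"

print(classify([T1, T2, T3], "10100", 3))  # nonself (010 hits T2)
print(classify([T1, T2, T3], "00000", 3))  # self
```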
From the description of DetectorGeneration, it is straightforward to show that it takes |S|·(ℓ − r + 1)·r steps to generate all necessary trees (detector generation time complexity) and (ℓ − r + 1)·r steps to verify a cell string as self or nonself in the worst case (worst-case detection time complexity). These time complexities are similar to those of popular NSAs (PSAs) such as the one proposed in [31]. However, by using compact positive and negative binary trees to store the detector set, PNSA can reduce the storage complexity of the detector set compared with other r-chunk type single NSAs or PSAs that store detectors as binary strings. This storage reduction could in turn lead to better detection time in real and average cases. To see this, we first state the following theorem:
Figure 2.4: Diagram of the Positive-Negative Selection Algorithm.

Theorem 2.1 (PNSA detector storage complexity). Given a self set S and an integer ℓ, the procedure DetectorGeneration produces a detector (binary) tree set that has in total at most (ℓ − r + 1)·2^(r−2) fewer nodes than the detector tree set created by a PSA or NSA alone, where r ∈ {2, . . . , ℓ − r + 1}.
Proof. We prove the theorem only for the PSA case; the NSA case can be proven in a similar way. Since ℓ − r + 1 positive trees can be built from the self set S, it suffices to show that at most 2^(r−2) nodes can be reduced from a single positive tree. The theorem is proved by induction on r (which is also the height of the binary trees).

Note that when a positive tree is converted to a negative tree, the reduction in the number of nodes is exactly the number of leaf nodes minus the number of internal nodes that have only one child.

When r = 2, there are 16 possible positive trees of height 2. By examining all 16 cases, we have found that the maximum reduction in the number of nodes is 1. One example is the positive tree that has 2 leaf nodes after compactification, as in Fig. 2.5.a. Since it has two leaf nodes and one one-child internal node, after being converted to the corresponding negative tree the number of nodes is reduced by 2 − 1 = 1.
Figure 2.5: One node is reduced in a tree: a compact positive tree has 4 nodes (a) and its conversion (a negative tree) has 3 nodes (b).
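The base case can also be checked mechanically. The brute-force script below is a hedged sketch on nested-dict tries: it examines the 15 nonempty detector subsets of {00, 01, 10, 11} and computes the node reduction (leaves minus single-child internal nodes, root included) for each compact positive tree.

```python
from itertools import combinations

def build_compact(chunks):
    """Build the positive trie and prune complete subtrees ({} = leaf)."""
    trie = {}
    for c in chunks:
        node = trie
        for bit in c:
            node = node.setdefault(bit, {})
    def prune(node):
        for child in node.values():
            prune(child)
        if len(node) == 2 and not node["0"] and not node["1"]:
            node.clear()
    prune(trie)
    return trie

def stats(trie):
    """(leaves, single-child internal nodes), root counted as a node."""
    leaves = singles = 0
    stack = [trie]
    while stack:
        node = stack.pop()
        if not node:
            leaves += 1
        elif len(node) == 1:
            singles += 1
        stack.extend(node.values())
    return leaves, singles

universe = ["00", "01", "10", "11"]
reductions = []
for k in range(1, 5):
    for D in combinations(universe, k):
        leaves, singles = stats(build_compact(D))
        reductions.append(leaves - singles)
print(max(reductions))  # maximum reduction over height-2 trees -> 1
```

The maximum is 1 = 2^(r−2) for r = 2, in agreement with the base case above.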
Suppose the theorem's conclusion holds for all r < k; we shall prove that it also holds for r = k. This follows from the observation that among all positive trees of height k, there is at least one tree whose left subtree and right subtree (each of height k − 1) can each be reduced by at least 2^((k−1)−2) nodes after conversion, giving a total reduction of 2·2^(k−3) = 2^(k−2) nodes.

A real experiment on a network intrusion dataset, reported at the end of the following section, shows that the actual storage reduction is only about 0.35% of this theoretical maximum.
2.4 Experiments
Next, we investigate the possible impact of the reduction in detector storage complexity achieved by PNSA on the real (average-case) detection time in comparison with a single NSA (PSA). All experiments are performed on a laptop computer with Windows 8 Pro 64-bit, an Intel Core i5-3210M CPU at 2.50 GHz (4 CPUs), and 4 GB RAM.
Table 2.1 shows the results on detector memory storage and detection time of PNSA compared to a popular NSA proposed in [31] for several combinations of S, ℓ and r. The training self set S contains randomly generated binary strings. The memory reduction is measured as the ratio of the reduction in the number of nodes of the binary tree detectors generated by PNSA to the number of nodes of the binary tree detectors generated by the NSA in [31]. The comparative results show that when ℓ and r are sufficiently large, the detector storage and the detection time of PNSA are significantly smaller than those of the NSA in [31] (up to 36% and 50% less, respectively).
Table 2.1: Comparison of memory and detection time reductions.
|S|     ℓ    r    Memory (%)   Time (%)
1,000   50   12   0            0
2,000   30   15   2.5          5
2,000   40   17   25.9         42.7
2,000   50   20   36.3         50
We conducted another experiment with ℓ = 40 and |S| = 20,000 (S is a set of randomly generated binary strings of length ℓ), varying r from 15 to 40. Then ℓ − r + 1 trees were created using the single NSA, and another ℓ − r + 1 compact trees were created using PNSA. Next, both detector sets were used to detect every s ∈ S. Fig. 2.6 depicts the detection times of PNSA and NSA in this experiment. The results show that the PNSA detection time is significantly smaller than that of NSA; for instance, when r is between 20 and 34, detection in PNSA is about 4.46 times faster.
Figure 2.6: Detection time of NSA and PNSA (t in minutes, plotted against r).

The next experiment is conducted on the NetFlow dataset, a conversion of the Tcpdump data from the well-known DARPA dataset into NetFlow [86]. We use all 105,033 normal flows as self samples. This self set is first converted to binary strings of length 104; then we run our algorithm with r ranging from 5 to 45. Table 2.2 shows some of the experiment steps; the percentage of node reduction is in the final column. Fig. 2.7 depicts the reduction of nodes in trees created by PNSA in comparison to that of NSA for all r = 3, . . . , 45. It shows that the reduction is more than one third when the matching threshold is greater than 19.
Table 2.2: Comparison of nodes generation on Netflow dataset.
r    NSA          PNSA         Reduction (%)
5    727          706          2.89
10   33,461       31,609       5.53
15   1,342,517    1,154,427    14.01
20   9,428,132    6,157,766    34.68
25   18,997,102   11,298,739   40.52
30   29,668,240   17,080,784   42.42
35   42,596,987   2