

Computer Networks 51 (2007) 3448–3470

www.elsevier.com/locate/comnet

An overview of anomaly detection techniques: Existing solutions and latest technological trends

Animesh Patcha *, Jung-Min Park

Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University,

Blacksburg, VA 24061, United States

Received 8 June 2006; received in revised form 8 February 2007; accepted 9 February 2007. Available online 16 February 2007.

Responsible Editor: Christos Douligeris

Abstract

As advances in networking technology help to connect the distant corners of the globe and as the Internet continues to expand its influence as a medium for communications and commerce, the threat from spammers, attackers and criminal enterprises has also grown accordingly. It is the prevalence of such threats that has made intrusion detection systems—the cyberspace's equivalent to the burglar alarm—join ranks with firewalls as one of the fundamental technologies for network security. However, today's commercially available intrusion detection systems are predominantly signature-based intrusion detection systems that are designed to detect known attacks by utilizing the signatures of those attacks. Such systems require frequent rule-base updates and signature updates, and are not capable of detecting unknown attacks. In contrast, anomaly detection systems, a subset of intrusion detection systems, model the normal system/network behavior, which enables them to be extremely effective in finding and foiling both known as well as unknown or "zero day" attacks. While anomaly detection systems are attractive conceptually, a host of technological problems need to be overcome before they can be widely adopted. These problems include: high false alarm rate, failure to scale to gigabit speeds, etc. In this paper, we provide a comprehensive survey of anomaly detection systems and hybrid intrusion detection systems of the recent past and present. We also discuss recent technological trends in anomaly detection and identify open problems and challenges in this area.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Survey; Anomaly detection; Machine learning; Statistical anomaly detection; Data mining

1. Introduction

1389-1286/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2007.02.001
* Corresponding author. Tel.: +1 540 239 0574. E-mail addresses: [email protected] (A. Patcha), [email protected] (J.-M. Park).

Today, the Internet along with the corporate network plays a major role in creating and advancing new business avenues. Business needs have motivated enterprises and governments across the globe to develop sophisticated, complex information networks. Such networks incorporate a diverse array of technologies, including distributed data storage systems, encryption and authentication techniques, voice and video over IP, remote and wireless access, and web services. Moreover, corporate networks


have become more accessible; for instance, most businesses allow access to their services on their internal networks via extranets to their partners, enable customers to interact with the network through e-commerce transactions, and allow employees to tap into company systems through virtual private networks.

The aforementioned access points make today's networks more vulnerable to intrusions and attacks. Cyber-crime is no longer the prerogative of the stereotypical hacker. Joining ranks with the hackers are disgruntled employees, unethical corporations, and even terrorist organizations. With the vulnerability of present-day software and protocols combined with the increasing sophistication of attacks, it comes as no surprise that network-based attacks are on the rise [1–4]. The 2005 annual computer crime and security survey [5], jointly conducted by the Computer Security Institute and the FBI, indicated that the financial losses incurred by the respondent companies due to network attacks/intrusions were US $130 million. In another survey commissioned by VanDyke Software in 2003, some 66% of the companies stated that they perceived system penetration to be the largest threat to their enterprises. Although 86% of the respondents used firewalls, their consensus was that firewalls by themselves are not sufficient to provide adequate protection. Moreover, according to recent studies, an average of twenty to forty new vulnerabilities in commonly used networking and computer products are discovered every month. Such widespread vulnerabilities in software add to today's insecure computing/networking environment. This insecure environment has given rise to the ever evolving field of intrusion detection and prevention. The cyberspace's equivalent to the burglar alarm, intrusion detection systems complement the beleaguered firewall.

An intrusion detection system gathers and analyzes information from various areas within a computer or a network to identify possible security breaches. In other words, intrusion detection is the act of detecting actions that attempt to compromise the confidentiality, integrity or availability of a system/network. Traditionally, intrusion detection systems have been classified as a signature detection system, an anomaly detection system or a hybrid/compound detection system. A signature detection system identifies patterns of traffic or application data presumed to be malicious, while anomaly detection systems compare activities against a "normal" baseline. On the other hand, a hybrid intrusion detection system combines the techniques of the two approaches. Both signature detection and anomaly detection systems have their share of advantages and drawbacks. The primary advantage of signature detection is that known attacks can be detected fairly reliably with a low false positive rate. The major drawback of the signature detection approach is that such systems typically require a signature to be defined for all of the possible attacks that an attacker may launch against a network. Anomaly detection systems have two major advantages over signature based intrusion detection systems. The first advantage that differentiates anomaly detection systems from signature detection systems is their ability to detect unknown attacks as well as "zero day" attacks. This advantage is because of the ability of anomaly detection systems to model the normal operation of a system/network and detect deviations from it. A second advantage of anomaly detection systems is that the aforementioned profiles of normal activity are customized for every system, application and/or network, thereby making it very difficult for an attacker to know with certainty what activities it can carry out without getting detected. However, the anomaly detection approach has its share of drawbacks as well. For example, the intrinsic complexity of the system, the high percentage of false alarms and the associated difficulty of determining which specific event triggered those alarms are some of the many technical challenges that need to be addressed before anomaly detection systems can be widely adopted.

The aim of this paper is twofold. The first is to present a comprehensive survey of recent literature in the domain of anomaly detection. In doing so, we attempt to assess the ongoing work in this area as well as consolidate the existing results. While the emphasis of this paper is to survey anomaly detection techniques proposed in the last six years, we have also described some of the earlier work that is seminal to this area. For a detailed exposition of techniques proposed before 2000, the reader is directed to the survey by Axelsson [6]. Our second aim is to identify the open problems and the research challenges.

The remainder of this article is organized as follows. In Section 2, we define intrusion detection, put forth the generic architectural design of an intrusion detection system, and highlight the three main techniques for detecting intrusions/attacks in


computer networks and systems. In Section 3, we describe the premise of anomaly detection and provide detailed discussions on the various techniques used in anomaly detection. We highlight recently proposed hybrid systems in Section 4, and conclude the paper by discussing open problems and research challenges in Section 5.

2. Intrusion detection

An intrusion detection system is a software tool used to detect unauthorized access to a computer system or network. An intrusion detection system is capable of detecting all types of malicious network traffic and computer usage. This includes network attacks against vulnerable services, data-driven attacks on applications, host-based attacks—such as privilege escalation, unauthorized logins and access to sensitive files—and malware. An intrusion detection system is a dynamic monitoring entity that complements the static monitoring abilities of a firewall. An intrusion detection system monitors traffic in a network in promiscuous mode, very much like a network sniffer. The network packets that are collected are analyzed for rule violations by a pattern recognition algorithm. When rule violations are detected, the intrusion detection system alerts the administrator.

One of the earliest works that proposed intrusion detection by identifying abnormal behavior can be attributed to Anderson [7]. In his report, Anderson presents a threat model that classifies threats as external penetrations, internal penetrations, and misfeasance, and uses this classification to develop a security monitoring surveillance system based on detecting anomalies in user behavior. External penetrations are defined as intrusions that are carried out by unauthorized computer system users; internal penetrations are those that are carried out by authorized users who are not authorized for the data that is compromised; and misfeasance is defined as the misuse of authorized access both to the system and to its data.

In a seminal paper, Denning [8] put forth the idea that intrusions to computers and networks could be detected by assuming that users of a computer/network would behave in a manner that enables automatic profiling. In other words, a model of the behavior of the entity being monitored could be constructed by an intrusion detection system, and subsequent behavior of the entity could be verified against the entity's model. In this model, behavior that deviates sufficiently from the norm is considered anomalous. In the paper, Denning mentioned several models that are based on statistics, Markov chains, time-series, etc.

In a much cited survey on intrusion detection systems, Axelsson [9] put forth a generalized model of a typical intrusion detection system. Fig. 1 depicts such a system, where solid arrows indicate data/control flow while dotted arrows indicate a response to intrusive activity. According to Axelsson, the generic architectural model of an intrusion detection system contains the following modules:

Fig. 1. Organization of a generalized intrusion detection system [9].


• Audit data collection: This module is used in the data collection phase. The data collected in this phase are analyzed by the intrusion detection algorithm to find traces of suspicious activity. The source of the data can be host/network activity logs, command-based logs, application-based logs, etc.

• Audit data storage: Typical intrusion detection systems store the audit data either indefinitely or for a sufficiently long time for later reference. The volume of data is often exceedingly large. Hence, the problem of audit data reduction is a major research issue in the design of intrusion detection systems.

• Analysis and detection: The processing block is the heart of an intrusion detection system. It is here that the algorithms to detect suspicious activities are implemented. Algorithms for the analysis and detection of intrusions have been traditionally classified into three broad categories: signature (or misuse) detection, anomaly detection and hybrid (or compound) detection.

• Configuration data: The configuration data are the most sensitive part of an intrusion detection system. They contain information that is pertinent to the operation of the intrusion detection system itself, such as information on how and when to collect audit data, how to respond to intrusions, etc.

• Reference data: The reference data storage module stores information about known intrusion signatures (in the case of signature detection) or profiles of normal behavior (in the case of anomaly detection). In the latter case, the profiles are updated when new knowledge about system behavior is available.

• Active/processing data: The processing element must frequently store intermediate results, such as information about partially fulfilled intrusion signatures.

• Alarm: This part of the system handles all output from the intrusion detection system. The output may be either an automated response to an intrusion or a suspicious activity alert for a system security officer.

Historically, intrusion detection research has concentrated on the analysis and detection stage of the architectural model shown in Fig. 1. As mentioned above, algorithms for the analysis and detection of intrusions/attacks are traditionally classified into the following three broad categories:

• Signature or misuse detection is a technique for intrusion detection that relies on a predefined set of attack signatures. By looking for specific patterns, the signature detection-based intrusion detection systems match incoming packets and/or command sequences to the signatures of known attacks. In other words, decisions are made based on the knowledge acquired from the model of the intrusive process and the observed trace that it has left in the system. Legal or illegal behavior can be defined and compared with observed behavior. Such a system tries to collect evidence of intrusive activity irrespective of the normal behavior of the system. One of the chief benefits of using signature detection is that known attacks can be detected reliably with a low false positive rate. The existence of specific attack sequences ensures that it is easy for the system administrator to determine exactly which attacks the system is currently experiencing. If the audit data in the log files do not contain the attack signature, no alarm is raised. Another benefit is that the signature detection system begins protecting the computer/network immediately upon installation. One of the biggest problems with signature detection systems is maintaining state information of signatures in which an intrusive activity spans multiple discrete events—that is, the complete attack signature spans multiple packets. Another drawback is that the signature detection system must have a signature defined for all of the possible attacks that an attacker may launch. This requires frequent signature updates to keep the signature database up-to-date.

• An anomaly detection system first creates a baseline profile of the normal system, network, or program activity. Thereafter, any activity that deviates from the baseline is treated as a possible intrusion. Anomaly detection systems offer several benefits. First, they have the capability to detect insider attacks. For instance, if a user or someone using a stolen account starts performing actions that are outside the normal user-profile, an anomaly detection system generates an alarm. Second, because the system is based on customized profiles, it is very difficult for an attacker to know with certainty what activity it can carry out without setting off an alarm. Third, an anomaly detection system has the ability to detect previously unknown attacks. This is due to the fact that a profile of intrusive activity is not based


on specific signatures representing known intrusive activity. An intrusive activity generates an alarm because it deviates from normal activity, not because someone configured the system to look for a specific attack signature. Anomaly detection systems, however, also suffer from several drawbacks. The first obvious drawback is that the system must go through a training period in which appropriate user profiles are created by defining "normal" traffic profiles. Moreover, creating a normal traffic profile is a challenging task. The creation of an inappropriate normal traffic profile can lead to poor performance. Maintenance of the profiles can also be time-consuming. Since anomaly detection systems are looking for anomalous events rather than attacks, they are prone to be affected by time-consuming false alarms. False alarms are classified as being either false positive or false negative. A false positive occurs when an IDS reports as an intrusion an event that is in fact legitimate network activity. A side effect of false positives is that an attack or malicious activity on the network/system could go undetected because of all the previous false positives. This failure to detect an attack is termed a false negative in the intrusion detection jargon. A key element of modern anomaly detection systems is the alert correlation module. However, the high percentage of false alarms typically generated in anomaly detection systems makes it very difficult to associate specific alarms with the events that triggered them. Lastly, a pitfall of anomaly detection systems is that a malicious user can gradually train an anomaly detection system to accept malicious behavior as normal.

• A hybrid or compound detection system combines both approaches. In essence, a hybrid detection system is a signature inspired intrusion detection system that makes a decision using a "hybrid model" that is based on both the normal behavior of the system and the intrusive behavior of the intruders.
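The signature-matching idea in the first category above can be sketched in a few lines. The signature patterns and the sample payload below are invented purely for illustration; real systems match far richer rule languages against packet and command streams:

```python
# Toy signature detection: flag any event whose payload contains a known
# attack pattern. Signatures and payloads are invented for illustration.
SIGNATURES = {
    "cmd-injection": b"; rm -rf",
    "sql-injection": b"' OR '1'='1",
}

def match_signatures(payload: bytes) -> list[str]:
    """Return the names of all signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

alerts = match_signatures(b"GET /login?user=admin' OR '1'='1 HTTP/1.1")
print(alerts)            # the SQL-injection pattern matches
print(match_signatures(b"harmless payload"))   # no signature matches: []
```

Note that, exactly as the text observes, a payload carrying an attack not present in `SIGNATURES` produces no alarm, which is why signature databases need frequent updates.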

3. Anomaly detection techniques

An anomaly detection approach usually consists of two phases: a training phase and a testing phase. In the former, the normal traffic profile is defined; in the latter, the learned profile is applied to new data.
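The two phases can be sketched as a minimal detector. The feature layout (per-record numeric features), the mean-based profile, and the deviation threshold are all invented for illustration; they stand in for whatever profile representation a concrete system uses:

```python
# Minimal two-phase anomaly detector: the training phase learns a per-feature
# mean from normal records; the testing phase flags records whose total
# absolute deviation from that profile exceeds a threshold.
# Feature values and the threshold are illustrative, not from the paper.

def train(records):
    """Training phase: build a 'normal' profile as per-feature means."""
    n = len(records)
    dims = len(records[0])
    return [sum(r[i] for r in records) / n for i in range(dims)]

def is_anomalous(profile, record, threshold):
    """Testing phase: compare a new record against the learned profile."""
    deviation = sum(abs(v - m) for v, m in zip(record, profile))
    return deviation > threshold

profile = train([[10, 1], [12, 1], [11, 2]])     # e.g. [packets/s, logins/min]
print(is_anomalous(profile, [11, 1], threshold=5))   # typical record -> False
print(is_anomalous(profile, [90, 9], threshold=5))   # large deviation -> True
```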

3.1. Premise of anomaly detection

The central premise of anomaly detection is that intrusive activity is a subset of anomalous activity [10]. If we consider an intruder, who has no idea of the legitimate user's activity patterns, intruding into a host system, there is a strong probability that the intruder's activity will be detected as anomalous. In the ideal case, the set of anomalous activities will be the same as the set of intrusive activities. In such a case, flagging all anomalous activities as intrusive activities results in no false positives and no false negatives. However, intrusive activity does not always coincide with anomalous activity. Kumar and Spafford [10] suggested that there are four possibilities, each with a non-zero probability:

• Intrusive but not anomalous: These are false negatives. An intrusion detection system fails to detect this type of activity as the activity is not anomalous. These are called false negatives because the intrusion detection system falsely reports the absence of intrusions.

• Not intrusive but anomalous: These are false positives. In other words, the activity is not intrusive, but because it is anomalous, an intrusion detection system reports it as intrusive. These are called false positives because an intrusion detection system falsely reports intrusions.

• Not intrusive and not anomalous: These are true negatives; the activity is not intrusive and is not reported as intrusive.

• Intrusive and anomalous: These are true positives; the activity is intrusive and is reported as such.

When false negatives need to be minimized, thresholds that define an anomaly are set low. This results in many false positives and reduces the efficacy of automated mechanisms for intrusion detection. It creates additional burdens for the security administrator as well, who must investigate each incident and discard false positive instances.
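The threshold trade-off can be illustrated numerically. The anomaly scores and intrusion labels below are made up; the point is only that lowering the threshold trades false negatives for false positives:

```python
# Lowering the anomaly threshold trades false negatives for false positives.
# Each event is (anomaly_score, is_intrusive); values are invented.
events = [(0.1, False), (0.3, False), (0.45, True), (0.6, False), (0.9, True)]

def confusion(threshold):
    """Count (false positives, false negatives) at a given threshold."""
    fp = sum(1 for score, bad in events if score > threshold and not bad)
    fn = sum(1 for score, bad in events if score <= threshold and bad)
    return fp, fn

print(confusion(0.8))   # high threshold: (0, 1) -- one intrusion missed
print(confusion(0.2))   # low threshold:  (2, 0) -- nothing missed, 2 false alarms
```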

3.2. Techniques used in anomaly detection

In this subsection, we review a number of different architectures and methods that have been proposed for anomaly detection. These include statistical anomaly detection, data-mining based methods, and machine learning based techniques.


3.2.1. Statistical anomaly detection

In statistical methods for anomaly detection, the system observes the activity of subjects and generates profiles to represent their behavior. The profile typically includes such measures as activity intensity measure, audit record distribution measure, categorical measures (the distribution of an activity over categories) and ordinal measure (such as CPU usage). Typically, two profiles are maintained for each subject: the current profile and the stored profile. As the system/network events (viz. audit log records, incoming packets, etc.) are processed, the intrusion detection system updates the current profile and periodically calculates an anomaly score (indicating the degree of irregularity for the specific event) by comparing the current profile with the stored profile using a function of abnormality of all measures within the profile. If the anomaly score is higher than a certain threshold, the intrusion detection system generates an alert.

Statistical approaches to anomaly detection have a number of advantages. Firstly, these systems, like most anomaly detection systems, do not require prior knowledge of security flaws and/or the attacks themselves. As a result, such systems have the capability of detecting "zero day" or the very latest attacks. In addition, statistical approaches can provide accurate notification of malicious activities that typically occur over extended periods of time and are good indicators of impending denial-of-service (DoS) attacks. A very common example of such an activity is a portscan. Typically, the distribution of portscans is highly anomalous in comparison to the usual traffic distribution. This is particularly true when a packet has unusual features (e.g., a crafted packet). With this in mind, even portscans that are distributed over a lengthy time frame will be recorded because they will be inherently anomalous.

However, statistical anomaly detection schemes also have drawbacks. Skilled attackers can train a statistical anomaly detection system to accept abnormal behavior as normal. It can also be difficult to determine thresholds that balance the likelihood of false positives with the likelihood of false negatives. In addition, statistical methods need accurate statistical distributions, but not all behaviors can be modeled using purely statistical methods. In fact, a majority of the proposed statistical anomaly detection techniques require the assumption of a quasi-stationary process, which cannot be assumed for most data processed by anomaly detection systems.

Haystack [11] is one of the earliest examples of a statistical anomaly-based intrusion detection system. It used both user and group-based anomaly detection strategies, and modeled system parameters as independent, Gaussian random variables. Haystack defined a range of values that were considered normal for each feature. If, during a session, a feature fell outside the normal range, the score for the subject was raised. Assuming the features were independent, the probability distribution of the scores was calculated. An alarm was raised if the score was too large. Haystack also maintained a database of user groups and individual profiles. If a user had not previously been detected, a new user profile with minimal capabilities was created using restrictions based on the user's group membership. It was designed to detect six types of intrusions: attempted break-ins by unauthorized users, masquerade attacks, penetration of the security control system, leakage, DoS attacks and malicious use. One drawback of Haystack was that it was designed to work offline. The attempt to use statistical analyses for real-time intrusion detection systems failed, since doing so required high-performance systems. Secondly, because of its dependence on maintaining profiles, a common problem for system administrators was the determination of what attributes were good indicators of intrusive activity.
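The per-feature scoring described above can be approximated as follows. The feature names and normal ranges here are invented; Haystack derived its actual ranges from user and group profiles, and weighted features rather than simply counting them:

```python
# Haystack-style scoring sketch: each feature has a range considered normal;
# a session's suspicion score is raised for every feature falling outside
# its range. Ranges and feature names are illustrative only.
NORMAL_RANGES = {
    "cpu_seconds":    (0.0, 30.0),
    "files_accessed": (0, 200),
    "failed_logins":  (0, 2),
}

def session_score(session):
    """Count how many features of a session fall outside their normal range."""
    score = 0
    for feature, (lo, hi) in NORMAL_RANGES.items():
        if not lo <= session[feature] <= hi:
            score += 1      # Haystack weighted features; we simply count
    return score

print(session_score({"cpu_seconds": 5.0, "files_accessed": 40, "failed_logins": 0}))
print(session_score({"cpu_seconds": 500.0, "files_accessed": 40, "failed_logins": 8}))
```

An alarm would be raised when the score exceeds a chosen bound, mirroring Haystack's "score too large" rule.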

One of the earliest intrusion detection systems was developed at the Stanford Research Institute (SRI) in the early 1980's and was called the Intrusion Detection Expert System (IDES) [13,14]. IDES was a system that continuously monitored user behavior and detected suspicious events as they occurred. In IDES, intrusions could be flagged by detecting departures from established normal behavior patterns for individual users. As the analysis methodologies developed for IDES matured, scientists at SRI developed an improved version of IDES called the Next-Generation Intrusion Detection Expert System (NIDES) [15,16]. NIDES was one of the few intrusion detection systems of its generation that could operate in real time for continuous monitoring of user activity or could run in a batch mode for periodic analysis of the audit data. However, the primary mode of operation of NIDES was to run in real-time. A flow chart describing the real-time operation of NIDES is shown in Fig. 2. Unlike IDES, which is an anomaly detection system, NIDES is a hybrid system that has an upgraded statistical analysis engine. In both IDES and NIDES, a profile of normal behavior based


on a selected set of variables is maintained by the statistical analysis unit. This enables the system to compare the current activity of the user/system/network with the expected values of the audited intrusion detection variables stored in the profile and then flag an anomaly if the audited activity is sufficiently far from the expected behavior. Each variable in the stored profile reflects the extent to which a particular type of behavior is similar to the profile built for it under "normal conditions". The way that this is computed is by associating each measure/variable to a corresponding random variable. The frequency distribution is built and updated over time, as more audit records are analyzed. It is computed as an exponentially weighted sum with a half-life of 30 days. This implies that the half-life value makes audit records that were gathered 30 days in the past contribute with half as much weight as recent records; those gathered 60 days in the past contribute one-quarter as much weight, and so on. The frequency distribution is kept in the form of a histogram with probabilities associated with each one of the possible ranges that the variable can take. The cumulative frequency distribution is then built by using the ordered set of bin probabilities. Using this frequency distribution, and the value of the corresponding measure for the current audit record, it is possible to compute a value that reflects how far away from the "normal" value of the measure the current value is. The actual computation in NIDES [16] renders a value that is correlated with how abnormal this measure is. Combining the values obtained for each measure and taking into consideration the correlation between measures, the unit computes an index of how far the current audit record is from the normal state. Records beyond a threshold are flagged as possible intrusions.

Fig. 2. Flow chart of real-time operation in NIDES [12].
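The 30-day half-life weighting described for the frequency distribution corresponds to multiplying each record's contribution by 2^(-age/30), which reproduces the halving at 30 days and quartering at 60 days stated above:

```python
# Exponential weighting with a 30-day half-life, as described for NIDES:
# a record gathered `age_days` days ago contributes with weight
# 2**(-age_days / 30), so 30-day-old records count half as much as fresh
# ones and 60-day-old records one quarter as much.
HALF_LIFE_DAYS = 30

def weight(age_days: float) -> float:
    return 2 ** (-age_days / HALF_LIFE_DAYS)

print(weight(0))    # 1.0
print(weight(30))   # 0.5
print(weight(60))   # 0.25
```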

However, the techniques used in [13–16] have several drawbacks. Firstly, the techniques are sensitive to the normality assumption. If data on a measure are not normally distributed, the techniques would yield a high false alarm rate. Secondly, the techniques are predominantly univariate in that a statistical norm profile is built for only one measure of the activities in a system. However, intrusions often affect multiple measures of activities collectively.

Statistical Packet Anomaly Detection Engine (SPADE) [17] is a statistical anomaly detection system that is available as a plug-in for SNORT [18], and can be used for automatic detection of stealthy port scans. SPADE was one of the first papers that proposed using the concept of an anomaly score to detect port scans, instead of using the traditional approach of looking at p attempts over q seconds. In [17], the authors used a simple frequency-based approach to calculate the "anomaly score" of a packet. The fewer times a given packet was seen, the higher was its anomaly score. In other words, the authors define an anomaly score as the degree of strangeness based on recent past activity. Once the anomaly score crossed a threshold, the packets were forwarded to a correlation engine that was designed to detect port scans. However, the one major drawback of SPADE is that it has a very high false alarm rate. This is due to the fact that SPADE classifies all unseen packets as attacks regardless of whether they are actually intrusions or not.

Anomalies resulting from intrusions may cause deviations on multiple measures in a collective manner rather than through separate manifestations on individual measures. To overcome the latter problem, Ye et al. [19] presented a technique that used the Hotelling's T2 test1 to analyze the audit trails of activities in an information system and detect host-based intrusions. The assumption is that host-based intrusions leave trails in the audit data. The advantage of using the Hotelling's T2 test is that it aids in the detection of both counter-relationship anomalies as well as mean-shift anomalies. In another paper, Kruegel et al. [20] show that it is possible to find the description of a system that computes a payload byte distribution and combines this information with extracted packet header features. In this approach, the resultant ASCII characters are sorted by frequency and then aggregated into six groups. However, this approach leads to a very coarse classification of the payload.
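The Hotelling's T2 statistic used here can be illustrated with a short numerical sketch (synthetic data and illustrative names; this is not Ye et al.'s experimental setup):

```python
import numpy as np

def hotelling_t2(X):
    """Per-observation Hotelling T^2 scores for a data matrix X (n x p).

    T^2_i = n * (x_i - xbar)' W^{-1} (x_i - xbar), with xbar the sample
    mean and W the sample covariance matrix. A large T^2_i flags x_i as
    far from the in-control population.
    """
    n, _ = X.shape
    xbar = X.mean(axis=0)
    W = np.cov(X, rowvar=False)          # sample covariance matrix
    W_inv = np.linalg.inv(W)
    diffs = X - xbar
    return np.array([n * d @ W_inv @ d for d in diffs])

rng = np.random.default_rng(0)
# 200 'normal' audit records over p = 3 correlated measures,
# plus one record shifted on all measures simultaneously.
normal = rng.multivariate_normal(
    mean=[0, 0, 0],
    cov=[[1.0, 0.6, 0.0], [0.6, 1.0, 0.3], [0.0, 0.3, 1.0]],
    size=200,
)
shifted = np.array([[4.0, 4.0, 4.0]])
X = np.vstack([normal, shifted])
scores = hotelling_t2(X)
worst = int(np.argmax(scores))    # the injected mean-shift record
```

Because T^2 accounts for the covariance between measures, the injected record scores far above the normal population even though no single measure alone is wildly out of range.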

A problem that many network/system administrators face is defining, on a global scale, what network/system/user activity can be termed as "normal". Maxion and Feather [21] characterized the normal behavior in a network by using different templates that were derived by taking the standard deviations of Ethernet load and packet count at various periods in time. An observation was declared anomalous if it exceeded the upper bound of a predefined threshold. However, Maxion et al. did not consider the non-stationary nature of network traffic, which allowed minor deviations in network traffic to go unnoticed.

More recently, analytical studies on anomaly detection systems were conducted. Lee and Xiang [22] used several information-theoretic measures,

1 The Hotelling's T2 test statistic for an observation x_i is T^2 = n (x_i − x̄)′ W^{−1} (x_i − x̄), where x̄ is the sample mean, x_i = (x_{i1}, x_{i2}, ..., x_{ip}) denotes an observation of p measures on a process/system at time i, and W is the sample variance-covariance matrix. A large value of T^2 indicates a large deviation of the observation x_i from the in-control population.

such as entropy and information gain, to evaluate the quality of anomaly detection methods, determine system parameters, and build models. These metrics help one to understand the fundamental properties of audit data. The highlighting features of some of the schemes surveyed in this section are presented in Table 1.

3.2.2. Machine learning based anomaly detection

Machine learning can be defined as the ability of a program and/or a system to learn and improve its performance on a certain task or group of tasks over time. Machine learning aims to answer many of the same questions as statistics or data mining. However, unlike statistical approaches, which tend to focus on understanding the process that generated the data, machine learning techniques focus on building a system that improves its performance based on previous results. In other words, systems that are based on the machine learning paradigm have the ability to change their execution strategy on the basis of newly acquired information.

3.2.2.1. System call based sequence analysis. One of the widely used machine learning techniques for anomaly detection involves learning the behavior of a program and recognizing significant deviations from the normal. In a seminal paper, Forrest et al. [23] established an analogy between the human immune system and intrusion detection. They did this by proposing a methodology that involved analyzing a program's system call sequences to build a normal profile. In their paper, they analyzed several UNIX-based programs like sendmail, lpr, etc., and showed that correlations in fixed-length sequences of system calls could be used to build a normal profile of a program. Programs that showed sequences deviating from the normal sequence profile could then be considered victims of an attack. The system they developed was used only off-line, on previously collected data, and used a quite simple table-lookup algorithm to learn the profiles of programs. Their work was extended by Hofmeyr et al. [24], who collected a database of normal behavior for each program of interest. Once a stable database was constructed for a given program in a particular environment, the database was used to monitor the program's behavior. The sequences of system calls formed the set of normal patterns for the database, and sequences not found in the database indicated anomalies.
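A minimal sketch of this kind of fixed-length sequence profiling (toy traces and illustrative call names; the original work profiled traced UNIX programs such as sendmail):

```python
def ngrams(calls, n=3):
    """Fixed-length sliding windows over a system call trace."""
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}

def build_profile(traces, n=3):
    """Union of all n-grams seen in normal traces of a program."""
    profile = set()
    for trace in traces:
        profile |= ngrams(trace, n)
    return profile

def mismatch_rate(profile, trace, n=3):
    """Fraction of windows in a trace that are absent from the profile."""
    windows = [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]
    if not windows:
        return 0.0
    misses = sum(1 for w in windows if w not in profile)
    return misses / len(windows)

# Toy 'normal' traces standing in for traced program executions.
normal_traces = [
    ["open", "read", "mmap", "read", "close"],
    ["open", "read", "read", "close"],
]
profile = build_profile(normal_traces)
ok = mismatch_rate(profile, ["open", "read", "mmap", "read", "close"])
bad = mismatch_rate(profile, ["open", "exec", "socket", "write", "close"])
```

A high mismatch rate marks the trace as deviating from the normal sequence profile, which is the table-lookup idea behind this line of work.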


Table 1
A summary of statistical anomaly detection systems

Reference | Highlighting feature | Methodology
Haystack [11] | Uses descriptive statistics to model user behavior; also modeled acceptable behavior for a generic user within a particular user group | Host based statistical anomaly detection
NIDES [15] | A distributed intrusion detection system that had both anomaly as well as signature detection modules | Network based statistical anomaly detection
Staniford et al. [17] | Statistical anomaly detection technique that calculates an anomaly score for each packet that it sees; forwards the packets to a correlation engine for intrusion detection purposes when a predefined threshold is crossed | Network based statistical anomaly detection
Ye et al. [19] | Uses the Hotelling's T2 test to analyze the audit trails of activities in a computer system and detect host-based intrusions | Host based multivariate statistical anomaly detection

2 A naive Bayesian network is a restricted network that has only two layers and assumes complete independence between the information nodes (i.e., the random variables that can be observed and measured). These limitations result in a tree-shaped network with a single hypothesis node (root node) that has arrows pointing to a number of information nodes (child nodes). All child nodes have exactly one parent node, that is, the root node, and no other causal relationships between nodes are permitted.


Another machine learning technique that has been frequently used for anomaly detection is the sliding window method. The sliding window method, a sequential learning methodology, converts the sequential learning problem into the classical learning problem. It constructs a window classifier h_w that maps an input window of width w into an individual output value y. Specifically, let d = (w − 1)/2 be the "half-width" of the window. Then h_w predicts y_{i,t} using the window ⟨x_{i,t−d}, x_{i,t−d+1}, ..., x_{i,t}, ..., x_{i,t+d−1}, x_{i,t+d}⟩. The window classifier h_w is trained by converting each sequential training example (x_i, y_i) into windows and then applying a standard machine learning algorithm. A new sequence x is classified by converting it to windows, applying h_w to predict each y_t and then concatenating the y_t's to form the predicted sequence y. The obvious advantage of this sliding window method is that it permits any classical supervised learning algorithm to be applied. While the sliding window method gives adequate performance in many applications, it does not take advantage of correlations between nearby y_t values. Specifically, the only relationships between nearby y_t values that are captured are those that are predictable from nearby x_t values. If there are correlations among the y_t values that are independent of the x_t values, then these are not captured. The sliding window method has been successfully used in a number of machine learning based anomaly detection techniques [25–27]. Warrender et al. [27] proposed a method that utilized sliding windows to create a database of normal sequences for testing against test instances. Eskin et al. [26] improved the traditional sliding window method by proposing a modeling methodology in which the length of the sliding window varies dynamically depending on the context of the system-call sequence.
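The windowing construction above can be sketched as follows; the lookup-table "learner" here is only a stand-in for whatever standard supervised algorithm one would actually plug in:

```python
def make_windows(x, w):
    """Pad a sequence and cut it into windows of width w (w odd)."""
    d = (w - 1) // 2                      # half-width of the window
    padded = ["<pad>"] * d + list(x) + ["<pad>"] * d
    return [tuple(padded[t:t + w]) for t in range(len(x))]

def train_window_classifier(examples, w=3):
    """Learn h_w as a lookup table from windows to output labels.

    Each sequential training example (x_i, y_i) is converted into
    (window, label) pairs, exactly as described in the text.
    """
    table = {}
    for x, y in examples:
        for window, label in zip(make_windows(x, w), y):
            table[window] = label
    return table

def classify(table, x, w=3, default="normal"):
    """Apply h_w per window and concatenate the predictions."""
    return [table.get(win, default) for win in make_windows(x, w)]

# Toy labeled call sequences; names are illustrative only.
train = [
    (["open", "read", "close"], ["normal", "normal", "normal"]),
    (["open", "exec", "close"], ["normal", "anomalous", "normal"]),
]
h = train_window_classifier(train)
pred = classify(h, ["open", "exec", "close"])
```

Note that, as the text points out, the predictions for neighboring positions are made independently: nothing in this construction couples nearby y_t values except through the overlapping x windows.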

However, system call based approaches for host based intrusion detection systems suffer from two drawbacks. Firstly, the computational overhead that is involved in monitoring every system call is very high. This high overhead leads to a performance degradation of the monitored system. The second problem is that system calls themselves are irregular by nature. This irregularity leads to a high false positive rate as it becomes difficult to differentiate between normal and anomalous system calls.

3.2.2.2. Bayesian networks. A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, Bayesian networks have several advantages for data analysis [28]. Firstly, because Bayesian networks encode the interdependencies between variables, they can handle situations where data is missing. Secondly, Bayesian networks have the ability to represent causal relationships. Therefore, they can be used to predict the consequences of an action. Lastly, because Bayesian networks have both causal and probabilistic relationships, they can be used to model problems where there is a need to combine prior knowledge with data. Several researchers have adapted ideas from Bayesian statistics to create models for anomaly detection [29–31]. Valdes et al. [30] developed an anomaly detection system that employed naive Bayesian networks2 to perform intrusion detection on traffic bursts. Their model, which is a part of EMERALD [32], has the capability to potentially detect distributed attacks in which


each individual attack session is not suspicious enough to generate an alert. However, this scheme also has a few disadvantages. First, as pointed out in [29], the classification capability of naive Bayesian networks is identical to a threshold-based system that computes the sum of the outputs obtained from the child nodes. Secondly, because the child nodes do not interact between themselves and their output only influences the probability of the root node, incorporating additional information becomes difficult as the variables that contain the information cannot directly interact with the child nodes.
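A toy illustration of why a naive Bayesian classifier over independent child nodes reduces to summing per-feature contributions (all probabilities below are invented for illustration, not taken from EMERALD):

```python
import math

# Toy conditional probabilities P(feature is high | class).
P = {
    "normal": {"syn_rate_high": 0.1, "unique_ports_high": 0.05},
    "attack": {"syn_rate_high": 0.8, "unique_ports_high": 0.7},
}
PRIOR = {"normal": 0.95, "attack": 0.05}

def log_posterior(cls, observed):
    """log P(cls) plus a sum of per-feature log-likelihoods.

    Because the information nodes are assumed independent given the
    root, the score is just a sum of child-node contributions, which
    is why naive Bayes behaves like a threshold on a weighted sum.
    """
    score = math.log(PRIOR[cls])
    for feature, present in observed.items():
        p = P[cls][feature]
        score += math.log(p if present else 1.0 - p)
    return score

def classify(observed):
    return max(PRIOR, key=lambda cls: log_posterior(cls, observed))

quiet = {"syn_rate_high": False, "unique_ports_high": False}
burst = {"syn_rate_high": True, "unique_ports_high": True}
```

The `quiet` observation is classified as normal and the `burst` observation as an attack; either way, the decision is a comparison of two summed scores, matching the threshold-equivalence argument of [29].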

Another area within the domain of anomaly detection where Bayesian techniques have been frequently used is the classification and suppression of false alarms. Kruegel et al. [29] proposed a multi-sensor fusion approach where the outputs of different IDS sensors were aggregated to produce a single alarm. This approach is based on the assumption that no single anomaly detection technique can classify a set of events as an intrusion with sufficient confidence. Although using Bayesian networks for intrusion detection or intruder behavior prediction can be effective in certain applications, their limitations should be considered in the actual implementation. Since the accuracy of this method depends on certain assumptions that are typically based on the behavioral model of the target system, deviating from those assumptions will decrease its accuracy. Selecting an inaccurate model will lead to an inaccurate detection system. Therefore, selecting an accurate model is the first step towards solving the problem. Unfortunately, selecting an accurate behavioral model is not an easy task, as typical systems and/or networks are complex.

3.2.2.3. Principal components analysis. Datasets for intrusion detection are typically very large and multidimensional. With the growth of high speed networks and distributed, network based, data intensive applications, storing, processing, transmitting, visualizing and understanding the data is becoming more complex and expensive. To tackle the problem of high dimensional datasets, researchers have turned to a dimensionality reduction technique known as principal component analysis (PCA) [33–35]. In mathematical terms, PCA is a technique where n correlated random variables are transformed into d ≤ n uncorrelated variables. The uncorrelated variables are linear combinations of the original variables and can be used to express the data in a reduced form. Typically, the first principal component of the transformation is the linear combination of the original variables with the largest variance. In other words, the first principal component is the projection on the direction in which the variance of the projection is maximized. The second principal component is the linear combination of the original variables with the second largest variance that is orthogonal to the first principal component, and so on. In many data sets, the first several principal components contribute most of the variance in the original data set, so that the rest can be disregarded with minimal loss of variance for dimension reduction of the dataset.

PCA has been widely used in the domains of image compression, pattern recognition and intrusion detection. Shyu et al. [36] proposed an anomaly detection scheme where PCA was used as an outlier detection scheme and was applied to reduce the dimensionality of the audit data and arrive at a classifier that is a function of the principal components. They measured the Mahalanobis distance of each observation from the center of the data for anomaly detection. The Mahalanobis distance is computed based on the sum of squares of the standardized principal component scores. Shyu et al. evaluated their method over the KDD CUP99 data and demonstrated that it exhibits a better detection rate than other well known outlier based anomaly detection algorithms such as the Local Outlier Factor (LOF) approach, the Nearest Neighbor approach and the kth Nearest Neighbor approach. Other notable techniques that employ the principal component analysis methodology include the work done by Wang et al. [35], Bouzida et al. [37] and Wang et al. [38].
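A compact sketch of the PCA/Mahalanobis idea on synthetic data (using all principal components, so the score equals the full Mahalanobis distance; Shyu et al. additionally distinguish between major and minor components, which this sketch does not):

```python
import numpy as np

def pca_anomaly_scores(X, k):
    """Sum of squared standardized scores over the top-k principal components.

    With k equal to the full dimensionality this is the Mahalanobis
    distance from the data center, computed via principal component
    scores standardized by their variances (the eigenvalues).
    """
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]        # top-k components
    comps, variances = eigvecs[:, order], eigvals[order]
    scores = Xc @ comps                          # principal component scores
    return ((scores ** 2) / variances).sum(axis=1)

rng = np.random.default_rng(1)
base = rng.normal(size=(300, 2))
# Correlated 5-dimensional 'audit' data driven by 2 latent factors.
mix = rng.normal(size=(2, 5))
X = base @ mix + 0.05 * rng.normal(size=(300, 5))
X[0] = 6.0                                       # inject one outlier record
d = pca_anomaly_scores(X, k=X.shape[1])
outlier_rank = int(np.argmax(d))
```

The injected record violates the correlation structure of the data, so its standardized score along the low-variance (minor) components is enormous even though its raw coordinates are not extreme in every direction.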

3.2.2.4. Markov models. Markov chains have also been employed extensively for anomaly detection. Ye et al. [39] present an anomaly detection technique that is based on Markov chains. In their paper, system call event sequences from the recent past were studied by opening an observation window of size N. The types of audit events E_{t−N+1}, ..., E_t in the window at time t were examined and the sequence of states X_{t−N+1}, ..., X_t obtained. Subsequently, the probability that the sequence of states X_{t−N+1}, ..., X_t is normal was obtained. The larger the probability, the more likely it is that the sequence of states results from normal activities. A sequence of states from attack activities is presumed to receive a low probability of support from the Markov chain model of the normal profile.
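A minimal sketch of scoring an observation window under a Markov chain learned from normal traces (illustrative event names and smoothing; this is not Ye et al.'s implementation):

```python
from collections import defaultdict

def fit_markov_chain(sequences, smoothing=1e-6):
    """Estimate transition probabilities from normal event traces."""
    counts = defaultdict(lambda: defaultdict(int))
    states = set()
    for seq in sequences:
        states.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    trans = {}
    for a in states:
        total = sum(counts[a].values())
        trans[a] = {
            b: (counts[a][b] + smoothing) / (total + smoothing * len(states))
            for b in states
        }
    return trans

def window_probability(trans, window, floor=1e-9):
    """Probability of the state sequence X_{t-N+1}, ..., X_t under the model.

    A low probability means the window is unlikely to come from normal
    activity, so it is flagged as anomalous.
    """
    p = 1.0
    for a, b in zip(window, window[1:]):
        p *= trans.get(a, {}).get(b, floor)
    return p

normal = [["login", "read", "write", "logout"]] * 20
trans = fit_markov_chain(normal)
p_normal = window_probability(trans, ["login", "read", "write", "logout"])
p_attack = window_probability(trans, ["login", "write", "read", "logout"])
```

Thresholding the window probability implements the "low probability of support" test described above.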


A hidden Markov model, another popular Markov technique, like the one shown in Fig. 3, is a statistical model where the system being modeled is assumed to be a Markov process with unknown parameters. The challenge is to determine the hidden parameters from the observable parameters. Unlike a regular Markov model, where the state transition probabilities are the only parameters and the state of the system is directly observable, in a hidden Markov model the only visible elements are the variables of the system that are influenced by the state of the system, and the state of the system itself is hidden. A hidden Markov model's states represent some unobservable condition of the system being modeled. In each state, there is a certain probability of producing any of the observable system outputs and a separate probability indicating the likely next states. By having different output probability distributions in each of the states, and allowing the system to change states over time, the model is capable of representing non-stationary sequences.

To estimate the parameters of a hidden Markov model for modeling normal system behavior, sequences of normal events collected from normal system operation are used as training data. An expectation-maximization (EM) algorithm is used to estimate the parameters. Once a hidden Markov model has been trained and is confronted with test data, probability measures can be used as thresholds for anomaly detection. In order to use hidden Markov models for anomaly detection, three key problems need to be addressed. The first problem, also known as the evaluation problem, is to determine, given a sequence of observations, the probability that the observed sequence was generated by the model. The second is the learning problem, which involves building from the audit data a model, or a set of models, that correctly describes the observed


Fig. 3. Example of a hidden Markov model.

behavior. Given a hidden Markov model and the associated observations, the third problem, also known as the decoding problem, involves determining the most likely set of hidden states that have led to those observations.
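The evaluation problem can be sketched with the standard forward algorithm on a toy two-state model (the parameters below are invented for illustration, not learned by EM):

```python
def forward_likelihood(init, trans, emit, observations):
    """Evaluation problem: P(observations | model) via the forward algorithm.

    init[s], trans[s][s2] and emit[s][o] are the model parameters; the
    returned likelihood can be thresholded for anomaly detection.
    """
    alpha = {s: init[s] * emit[s].get(observations[0], 0.0) for s in init}
    for o in observations[1:]:
        alpha = {
            s2: sum(alpha[s1] * trans[s1][s2] for s1 in alpha)
            * emit[s2].get(o, 0.0)
            for s2 in init
        }
    return sum(alpha.values())

# A toy 2-state model over system activity; names are illustrative.
init = {"idle": 0.8, "busy": 0.2}
trans = {"idle": {"idle": 0.9, "busy": 0.1},
         "busy": {"idle": 0.2, "busy": 0.8}}
emit = {"idle": {"read": 0.7, "write": 0.3},
        "busy": {"read": 0.1, "write": 0.9}}

likely = forward_likelihood(init, trans, emit, ["read", "read", "write"])
unlikely = forward_likelihood(init, trans, emit, ["exec"])  # unseen symbol
```

Sequences the model assigns negligible likelihood to (such as one containing a symbol with zero emission probability) fall below any reasonable threshold and are flagged as anomalous.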

Warrender et al. [27] compare the performance of four methods, viz., simple enumeration of observed sequences, comparison of relative frequencies of different sequences, a rule induction technique, and hidden Markov models, at representing normal behavior accurately and recognizing intrusions in system call datasets. The authors show that while hidden Markov models outperform the other three methods, the higher performance comes at a greater computational cost. In the proposed model, the authors use a hidden Markov model with fully connected states, i.e., transitions were allowed from any state to any other state. Therefore, a process that issues S distinct system calls will have roughly S states, which implies roughly 2S^2 values to estimate in the state transition and output probability matrices. In a computer system/network, a process typically issues a very large number of system calls. Modeling all of the processes in a computer system/network would therefore be computationally infeasible.

In another paper, Yeung et al. [40] describe the use of hidden Markov models for anomaly detection based on profiling system call sequences and shell command sequences. On training, their model computes the sample likelihood of an observed sequence using the forward or backward algorithm. A threshold on the probability, based on the minimum likelihood among all training sequences, was used to discriminate between normal and anomalous behavior. One major problem with this approach is that it lacks generalization and/or support for users who are not uniquely identified by the system under consideration.

Mahoney et al. [41–43] presented several methods that address the problem of detecting anomalies in the usage of network protocols by inspecting packet headers. The common denominator of all of them is the systematic application of learning techniques to automatically obtain profiles of normal behavior for protocols at different layers. Mahoney et al. experimented with anomaly detection over the DARPA network data [44] by range matching network packet header fields. Packet Header Anomaly Detector (PHAD) [41], LEarning Rules for Anomaly Detection (LERAD) [42] and Application Layer Anomaly Detector (ALAD) [43] use time-based models in which the probability


of an event depends on the time since it last occurred. For each attribute, they collect a set of allowed values and flag novel values as anomalous. PHAD, ALAD, and LERAD differ in the attributes that they monitor. PHAD monitors 33 attributes from the Ethernet, IP and transport layer packet headers. ALAD models incoming server TCP requests: source and destination IP addresses and ports, opening and closing TCP flags, and the list of commands (the first word on each line) in the application payload. Depending on the attribute, it builds separate models for each target host, port number (service), or host/port combination. LERAD also models TCP connections. Even though the data set is multivariate network traffic data containing fields extracted out of the packet headers, the authors break down the multivariate problem into a set of univariate problems and sum the weighted results from range matching along each dimension. While the advantage of this approach is that it makes the technique more computationally efficient and effective at detecting network intrusions, breaking multivariate data into univariate data has significant drawbacks, especially when detecting attacks. For example, an indicator of a typical SYN flood attack is a higher than usual SYN rate combined with a lower than normal ACK rate. Because a higher SYN rate or a lower ACK rate alone can both happen in normal usage (when the network is busy or idle), it is the combination of the higher SYN rate and lower ACK rate that signals the attack.
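A simplified sketch of time-based novel-value scoring in the spirit of these systems (not the exact published scoring formula; the attribute values below are illustrative):

```python
class TimeBasedAttributeModel:
    """Per-attribute novelty model with a time-based anomaly score.

    A simplified variant of the time-based idea above: the score for a
    novel value grows with n/r (observations per distinct value seen,
    i.e. how 'settled' the attribute is) and with the time since the
    last anomaly for this attribute.
    """

    def __init__(self):
        self.seen = set()
        self.n = 0                     # total observations of the attribute
        self.last_anomaly_time = 0.0

    def train(self, value):
        self.seen.add(value)
        self.n += 1

    def score(self, value, now):
        if value in self.seen:
            return 0.0                 # allowed value: not anomalous
        t = now - self.last_anomaly_time
        self.last_anomaly_time = now
        return t * self.n / len(self.seen)

ttl_model = TimeBasedAttributeModel()
for v in [64, 64, 128, 64, 255, 128, 64, 64]:   # benign TTL field values
    ttl_model.train(v)
s_known = ttl_model.score(64, now=100.0)
s_novel = ttl_model.score(1, now=100.0)         # never-seen TTL value
```

An attribute that has settled on a few values over many observations (large n/r) produces a large score for a novel value, while a frequently changing attribute stays tolerant; per-attribute scores would then be summed across the monitored header fields.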

The one major drawback of many of the machine learning techniques, like the system call based sequence analysis approach and the hidden Markov model approach mentioned above, is that they are resource expensive. For example, an anomaly detection technique that is based on the Markov chain model is computationally expensive because it uses parametric estimation techniques based on the Bayes' algorithm for learning the normal profile of the host/network under consideration. If we consider the large amount of audit data and the relatively high frequency of events that occur in today's computers and networks, such a technique for anomaly detection is not scalable for real time operation. The highlighting features of some of the schemes surveyed in this section are presented in Table 2.

3.2.3. Data mining based anomaly detection

To eliminate the manual and ad hoc elementsfrom the process of building an intrusion detection

system, researchers are increasingly looking at using data mining techniques for anomaly detection [45–47]. Grossman [48] defines data mining as being "concerned with uncovering patterns, associations, changes, anomalies, and statistically significant structures and events in data". Simply put, data mining is the ability to take data as input and pull from it patterns or deviations that may not be easily visible to the naked eye. Another term sometimes used is knowledge discovery. Data mining can help improve the process of intrusion detection by adding a level of focus to anomaly detection. By identifying bounds for valid network activity, data mining will aid an analyst in his/her ability to distinguish attack activity from common everyday traffic on the network.

3.2.3.1. Classification-based intrusion detection. An intrusion detection system that classifies audit data as normal or anomalous based on a set of rules, patterns or other affiliated techniques can be broadly defined as a classification-based intrusion detection system. The classification process typically involves the following steps:

1. Identify class attributes and classes from training data.
2. Identify attributes for classification.
3. Learn a model using the training data.
4. Use the learned model to classify the unknown data samples.

A variety of classification techniques have been proposed in the literature. These include inductive rule generation techniques, fuzzy logic, genetic algorithms and neural network-based techniques.

Inductive rule generation algorithms typically involve the application of a set of association rules and frequent episode patterns to classify the audit data. In this context, if a rule states that "if event X occurs, then event Y is likely to occur", then events X and Y can be described as sets of (variable, value) pairs where the aim is to find the sets X and Y such that X "implies" Y. In the domain of classification, we fix Y and attempt to find sets of X which are good predictors for the right classification. While supervised classification typically only derives rules relating to a single attribute, general rule induction techniques, which are typically unsupervised in nature, derive rules relating to any or all the attributes. The advantage of using rules is that they tend to be simple and intuitive, unstructured and less rigid. As


Table 2
A summary of machine learning based anomaly detection systems

Reference | Highlighting features | Methodology
Forrest et al. [23] | Finds correlations in fixed length sequences of system calls in the UNIX operating system, and uses them to build a normal profile for anomaly detection | Sequence analysis based statistical anomaly detection
Eskin et al. [26] | Employs dynamic window sizes to improve system call modeling | System call modeling and sequence analysis
Valdes et al. [30] | Employs Bayesian inference techniques to determine if traffic bursts contain attacks; also provides distributed attack detection capability | Bayesian networks based anomaly detection
Shyu et al. [36] | Uses principal component analysis as an outlier detection scheme to detect intrusions | Principal component analysis based anomaly detection
Yeung et al. [40] | Presents a dynamic modeling approach based on hidden Markov models and a static modeling approach based on event occurrence frequency distributions for modeling system calls; showed that the dynamic modeling approach is better for system call datasets | Host based anomaly detection system
PHAD [41] | Examines IP headers, connections to various ports, as well as Ethernet, IP and transport layer packet headers | Network based anomaly detection
ALAD [43] | Detects anomalies in inbound TCP connections to well known ports on the server | Network based anomaly detection
LERAD [42] | Detects TCP stream anomalies like ALAD, but uses a learning algorithm to pick good rules from the training set, rather than using a fixed set of rules | Network based anomaly detection


for their drawbacks, they are difficult to maintain and, in some cases, inadequate to represent many types of information. A number of inductive rule generation algorithms have been proposed in the literature. Some of them first construct a decision tree and then extract a set of classification rules from the decision tree.3 Other algorithms (e.g., RIPPER [25], C4.5 [49]) directly induce rules from the data by employing a divide-and-conquer approach. A post-learning stage involving either discarding (C4.5 [49]) or pruning (RIPPER [25]) some of the learnt rules is carried out to increase the classifier accuracy. RIPPER has been successfully used in a number of data mining based anomaly detection algorithms to classify incoming audit data and detect intrusions. One of

3 Decision trees are powerful and popular tools for classification and prediction. The attractiveness of tree-based methods is due in large part to the fact that, in contrast to neural networks, decision trees represent rules. A decision tree has three main components: nodes, arcs, and leaves. Each node is labeled with the feature attribute which is most informative among the attributes not yet considered in the path from the root, each arc out of a node is labeled with a feature value for the node's feature, and each leaf is labeled with a category or class. A decision tree can then be used to classify a data point by starting at the root of the tree and moving through it until a leaf node is reached. The leaf node would then provide the classification of the data point.

the primary advantages of using RIPPER is that the generated rules are easy to use and verify. Lee et al. [45,46,50] used RIPPER to characterize sequences occurring in normal data by a smaller set of rules that capture the common elements in those sequences. During monitoring, sequences violating those rules are treated as anomalies.

Fuzzy logic techniques have been in use in the area of computer and network security since the late 1990s [51]. Fuzzy logic has been used for intrusion detection for two primary reasons [52]. Firstly, several quantitative parameters that are used in the context of intrusion detection, e.g., CPU usage time, connection interval, etc., can potentially be viewed as fuzzy variables. Secondly, as stated by Bridges et al. [52], the concept of security itself is fuzzy. In other words, the concept of fuzziness helps to smooth out the abrupt separation of normal behavior from abnormal behavior. Without fuzziness, a given data point falling outside/inside a defined "normal interval" will be considered anomalous/normal to the same degree regardless of its distance from/within the interval. Dickerson et al. [53] developed the Fuzzy Intrusion Recognition Engine (FIRE) using fuzzy sets and fuzzy rules. FIRE uses simple data mining techniques to process the network input data and generate fuzzy sets for every observed feature. The fuzzy sets are then used to define fuzzy rules


4 A self organizing map is a method for unsupervised learning based on a grid of artificial neurons whose weights are adapted to match input vectors in a training set.


to detect individual attacks. FIRE does not establish any sort of model representing the current state of the system; instead, it creates and applies fuzzy logic rules to the audit data to classify it as normal or anomalous, relying on attack-specific rules for detection. Dickerson et al. found that the approach is particularly effective against port scans and probes. The primary disadvantage of this approach is the labor intensive rule generation process.
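A small sketch of how fuzzy membership smooths the crisp normal/abnormal boundary (the break points below are invented for illustration, not taken from FIRE):

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear ramps."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def anomaly_degree(cpu_usage):
    """Degree to which a CPU usage reading is anomalous, in [0, 1].

    Instead of a crisp 'normal interval', membership decays gradually,
    so a reading just past the boundary is only mildly anomalous.
    """
    normal = trapezoid(cpu_usage, 0.0, 0.0, 0.6, 0.9)   # fuzzy 'normal' set
    return 1.0 - normal

inside = anomaly_degree(0.5)      # well inside the normal range
edge = anomaly_degree(0.7)        # just past the flat region
outside = anomaly_degree(0.95)    # clearly outside
```

Under a crisp interval, `edge` and `outside` would both be flagged identically; with fuzzy membership the degree of anomaly grows with distance from the normal region, which is exactly the smoothing property described above.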

Genetic algorithms, a search technique used to find approximate solutions to optimization and search problems, have also been extensively employed in the domain of intrusion detection to differentiate normal network traffic from anomalous connections. The major advantage of genetic algorithms is their flexibility and robustness as a global search method. In addition, a genetic algorithm search converges to a solution from multiple directions and is based on probabilistic rules instead of deterministic ones. In the domain of network intrusion detection, genetic algorithms have been used in a number of ways. Some approaches [54,55] have used genetic algorithms directly to derive classification rules, while others [52,56] use genetic algorithms to select appropriate features or determine optimal parameters of related functions, while different data mining techniques are then used to acquire the rules. The earliest attempt to apply genetic algorithms to the problem of intrusion detection was made by Crosbie and Spafford [57] in 1995, when they applied multiple agent technology to detect network based anomalies. While the advantage of the approach was that it used numerous agents to monitor a variety of network based parameters, the lack of intra-agent communication and a lengthy training process were issues that were not addressed.

Neural network based intrusion detection systems have traditionally been host-based systems that focus on detecting deviations in program behavior as a sign of an anomaly. In the neural network approach to intrusion detection, the neural network learns to predict the behavior of the various users and daemons in the system. The main advantage of neural networks is their tolerance to imprecise data and uncertain information, and their ability to infer solutions from data without prior knowledge of the regularities in the data. This, in combination with their ability to generalize from learned data, has made them an appropriate approach to intrusion detection. However, neural network based solutions have several drawbacks. First, they may fail to find a satisfactory solution, either because of a lack of sufficient data or because there is no learnable function. Second, neural networks can be slow and expensive to train. The lack of speed is partly because of the need to collect and analyze the training data and partly because the neural network has to manipulate the weights of the individual neurons to arrive at the correct solution. Several groups have advocated various approaches to using neural networks for intrusion detection. Ghosh et al. [58–60] used the feed-forward back-propagation network and the Elman recurrent network [61] for classifying system-call sequences. Their experimental results with the 1998 and 1999 DARPA intrusion detection evaluation datasets verified that the application of Elman networks in the domain of program-based intrusion detection provided superior results compared to the standard multilayer perceptron based neural network. However, training the Elman network was expensive and the number of neural networks required was large. In another paper, Ramadas et al. [62] present Anomalous Network-Traffic Detection with Self-Organizing Maps (ANDSOM), the anomaly detection module for the network-based intrusion detection system INBOUNDS being developed at Ohio University. The ANDSOM module creates a two-dimensional Self-Organizing Map (SOM) for each network service that is being monitored; in the paper, the authors test the proposed methodology using the DNS and HTTP services. Neurons are trained with normal network traffic during the training phase to capture characteristic patterns. When real-time data is fed to the trained neurons, an anomaly is detected if the distance of the incoming traffic to the best-matching neuron is more than a preset threshold.
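The ANDSOM idea, training a SOM on normal traffic and then flagging inputs whose distance to the best-matching unit exceeds a preset threshold, can be sketched as follows. This is not the ANDSOM implementation: the synthetic data, the 4x4 map, the decay schedules, and the 99th-percentile threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "normal" traffic: two standardized features per connection.
normal = rng.normal(0.0, 1.0, size=(500, 2))

# A small 4x4 self-organizing map: 16 grid positions, 16 weight vectors.
grid = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float)
weights = rng.normal(0.0, 0.5, size=(16, 2))

for t, x in enumerate(np.tile(normal, (3, 1))):          # a few passes over the data
    lr = 0.5 * np.exp(-t / 500.0)                        # decaying learning rate
    sigma = 2.0 * np.exp(-t / 500.0) + 0.5               # decaying neighborhood width
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1)) # best-matching unit
    # Pull the BMU and its grid neighbors toward the sample.
    h = np.exp(-np.linalg.norm(grid - grid[bmu], axis=1) ** 2 / (2 * sigma ** 2))
    weights += lr * h[:, None] * (x - weights)

def anomaly(x, threshold):
    # Distance from x to its best-matching unit; a large distance is anomalous.
    return np.min(np.linalg.norm(weights - x, axis=1)) > threshold

# Set the threshold from the training data, e.g. its 99th-percentile BMU distance.
train_d = [np.min(np.linalg.norm(weights - x, axis=1)) for x in normal]
threshold = np.percentile(train_d, 99)

print(anomaly(np.array([0.1, -0.2]), threshold), anomaly(np.array([8.0, 8.0]), threshold))
```

After training, the weight vectors quantize the normal-traffic region, so a typical point sits close to some unit while traffic far outside that region does not.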

Anomaly detection schemes also involve other data mining techniques, such as support vector machines (SVMs) and other types of neural network models [63,64]. Because data mining techniques are data driven and do not depend on previously observed patterns of network/system activity, some of these techniques have been very successful at detecting new kinds of attacks. However, these techniques often have a very high false positive rate. For example, as pointed out in [65], the approach


6 The basic Euclidean distance treats each variable as equally important in calculating the distance. An alternative approach is to scale the contribution of individual variables to the distance value according to the variability of each variable. This approach is illustrated by the Mahalanobis distance, which is a measure of the distance between each observation in a multidimensional cloud of points and the centroid of the cloud; in other words, it scales distances by the covariance structure of the data.


adopted by Sung and Mukkamala [66], which uses an SVM technique to realize an intrusion detection system for class-specific detection, is flawed because it totally ignores the relationships and dependencies between the features.

3.2.3.2. Clustering and outlier detection. Clustering is a technique for finding patterns in unlabeled data with many dimensions.5 Clustering has attracted interest from researchers in the context of intrusion detection [67–69]. The main advantage that clustering provides is the ability to learn from and detect intrusions in the audit data without requiring the system administrator to provide explicit descriptions of the various attack classes/types. As a result, the amount of training data that needs to be provided to the anomaly detection system is also reduced. Clustering and outlier detection are closely related: from the viewpoint of a clustering algorithm, outliers are objects not located in the clusters of a data set, and in the context of anomaly detection they may represent intrusions/attacks. The statistics community has studied the concept of outliers quite extensively [70]. In these studies, data points are modeled using a stochastic distribution, and points are determined to be outliers based on their relationship with this model. However, with increasing dimensionality it becomes increasingly difficult to accurately estimate the multidimensional distributions of the data points [71]. Recent outlier detection algorithms [68,72,73] are based on the full-dimensional distances between the points as well as the densities of local neighborhoods.

There exist at least two approaches to clustering-based anomaly detection. In the first approach, the anomaly detection model is trained using unlabeled data that consists of both normal and attack traffic. In the second approach, the model is trained using only normal data, from which a profile of normal activity is created. The idea behind the first approach is that anomalous or attack data forms only a small percentage of the total data. If this assumption holds, anomalies and attacks can be detected based on cluster sizes: large clusters correspond to normal data, and the rest of the data points, which are outliers, correspond to attacks.
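A minimal sketch of the first (cluster-size) approach on synthetic two-feature audit records; the k-means variant, the planted attack group, and the 5% size cutoff are all illustrative assumptions, not a specific published system.

```python
import numpy as np

rng = np.random.default_rng(1)

# Unlabeled audit data: 490 "normal" records plus a small, tight group of 10
# "attack" records (toy 2-feature data standing in for connection features).
normal = rng.normal(0.0, 1.0, size=(490, 2))
attacks = rng.normal(6.0, 0.3, size=(10, 2))
data = np.vstack([normal, attacks])

def kmeans(X, k, iters=20):
    # Deterministic farthest-point initialization, then standard Lloyd updates.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(data, k=5)
sizes = np.bincount(labels, minlength=5)
# The heuristic from the text: big clusters are normal traffic; records in
# clusters holding under 5% of the data are treated as likely attacks.
suspect = np.isin(labels, np.where(sizes < 0.05 * len(data))[0])
print(int(suspect.sum()))
```

Because the attack records form a compact, isolated group, they end up in their own small cluster and are flagged, while no attack labels were ever supplied.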

The distances between points play an important role in clustering. The most popular distance metric is the Euclidean distance. Since each feature contributes equally to the calculation of the Euclidean distance, this metric is undesirable in many applications, especially when features have very different variability or are measured on different scales: features with large scales of measurement or high variability dominate those with smaller scales or less variability. A distance-based approach that incorporates this measure of variability is the Mahalanobis distance6 [74,75] based outlier detection scheme. In this scheme, the threshold is computed according to the most distant points from the mean of the ‘‘normal’’ data, and it is set to be a user-defined percentage, say m%, of the total number of points. All data points in the audit data whose distances to the mean (calculated during the training phase) are greater than the threshold are detected as outliers. The Mahalanobis distance utilizes group means and variances for each variable, as well as the correlations and covariances between measures.

5 The number of dimensions is equivalent to the number of attributes.
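The Mahalanobis-distance scheme described above can be sketched as follows; the synthetic correlated training data and the 2% cutoff (the paper's user-defined m%) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Normal" training data with correlated, differently scaled features.
cov = np.array([[4.0, 1.5], [1.5, 1.0]])
train = rng.multivariate_normal([0.0, 0.0], cov, size=1000)

mu = train.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(train, rowvar=False))

def mahalanobis(x):
    # D_M(x) = sqrt((x - mu)^T Sigma^-1 (x - mu))
    d = x - mu
    return float(np.sqrt(d @ inv_cov @ d))

# Threshold chosen so that m% (here 2%) of the training points fall beyond it.
dists = np.array([mahalanobis(x) for x in train])
threshold = np.percentile(dists, 98)

print(mahalanobis(np.array([0.5, 0.2])) > threshold,   # typical point
      mahalanobis(np.array([-4.0, 4.0])) > threshold)  # violates the correlation
```

Note that the flagged point is not extreme in either coordinate alone; it is anomalous only because it breaks the positive correlation the covariance matrix encodes, which a plain Euclidean threshold would miss.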

The notion of distance-based outliers was recently introduced in the study of databases [68]. According to this notion, a point P in a multidimensional data set is an outlier if there are fewer than p points from the data in the ε-neighborhood of P, where p is a user-specified constant. Ramaswamy et al. [68] described an approach based on computing the Euclidean distance of the kth nearest neighbor from a point O. In other words, the k-nearest neighbor algorithm classifies points by assigning them to the class that appears most frequently amongst the k nearest neighbors. For a given point O, Dk(O) denotes the Euclidean distance from the point O to its kth nearest neighbor and can be considered the ‘‘degree of outlierness’’ of O. If one is interested in the top n outliers, this approach defines an outlier as follows: given values for k and n, a point O is an outlier if the distance to its kth nearest neighbor is smaller than the corresponding value for no more than (n − 1) other points. In other words, the top n outliers with the

The Mahalanobis distance between a particular point x and the mean μ of the ‘‘normal’’ data is computed as D_M(x) = √((x − μ)^T Σ⁻¹ (x − μ)), where Σ is the covariance matrix.


maximum Dk(O) values are considered outliers. While the advantage of the k-nearest neighbor approach is that it is robust to noisy data, it suffers from the drawback that choosing an optimal value for k is very difficult in practice. Several papers [76,77] have proposed using k-nearest neighbor outlier detection algorithms for the purpose of anomaly detection.
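A small sketch of the Dk(O) score of Ramaswamy et al. on synthetic points with two planted outliers; the choices k = 5 and n = 2 are arbitrary for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
points = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                    np.array([[9.0, 9.0], [-8.0, 7.0]])])  # two planted outliers

def dk(X, k):
    # D_k(O): Euclidean distance from each point O to its kth nearest neighbor.
    dist = np.linalg.norm(X[:, None] - X[None], axis=2)
    return np.sort(dist, axis=1)[:, k]   # column 0 is the point itself (distance 0)

scores = dk(points, k=5)
top_n = np.argsort(scores)[-2:]          # the n = 2 points with the largest D_k
print(sorted(top_n.tolist()))            # indices 200 and 201, the planted outliers
```

Points inside the cloud have a tiny 5th-neighbor distance, while the planted points must reach all the way back to the cloud, so ranking by Dk(O) surfaces exactly the top-n outliers.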

The Minnesota Intrusion Detection System (MINDS) [78] is another network-based anomaly detection approach that utilizes data mining techniques. The MINDS anomaly detection module assigns a degree of outlierness to each data point, called the local outlier factor (LOF) [72]. The LOF takes into consideration the density of the neighborhood around the observation point to determine its outlierness; in this scheme, outliers are objects that tend to have high LOF values. The advantage of the LOF algorithm is its ability to detect all forms of outliers, including those that cannot be detected by the distance-based algorithms.
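A compact, deliberately brute-force rendering of the LOF computation on synthetic data (not the MINDS code). The planted point sits off the dense cluster; its local density is far lower than that of its neighbors, which is exactly the ratio LOF measures.

```python
import numpy as np

rng = np.random.default_rng(4)
# A dense cluster, a sparse cluster, and one point floating off the dense one.
dense = rng.normal(0.0, 0.2, size=(100, 2))
sparse = rng.normal(8.0, 1.5, size=(100, 2))
odd = np.array([[2.0, 2.0]])
X = np.vstack([dense, sparse, odd])

def lof(X, k=10):
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    nn = np.argsort(d, axis=1)[:, 1:k + 1]          # k nearest neighbors (skip self)
    k_dist = np.sort(d, axis=1)[:, k]               # distance to the kth neighbor
    # Reachability distance reach(p, o) = max(k-dist(o), d(p, o)), then the
    # local reachability density lrd(p) = 1 / mean reachability to p's neighbors.
    reach = np.maximum(k_dist[nn], d[np.arange(len(X))[:, None], nn])
    lrd = 1.0 / reach.mean(axis=1)
    return (lrd[nn] / lrd[:, None]).mean(axis=1)    # LOF: neighbors' density vs own

scores = lof(X)
print(int(np.argmax(scores)))                       # index 200, the planted point
```

Points deep inside either cluster score near 1 (their density matches their neighbors'), regardless of whether the cluster is dense or sparse; only the planted point's score is large.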

3.2.3.3. Association rule discovery. Association rules [79,80] are one of many data mining techniques that describe events that tend to occur together. The concept of association rules can be understood as follows: given a database D of transactions, where each transaction T ∈ D denotes a set of items, an association rule is an implication of the form X ⇒ Y, where X and Y are itemsets and X ∩ Y = ∅. The rule X ⇒ Y holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. Two important concepts when dealing with association rules are rule confidence and rule

Fig. 4a. The training phase of ADAM [81].

support. The rule confidence is defined as the conditional probability P(Y ⊆ T | X ⊆ T). The rule X ⇒ Y has support s in the transaction database D if s% of the transactions in D contain X ∪ Y. Association rules have been successfully used to mine audit data to find normal patterns for anomaly detection [46,50,81]. They are particularly important in the domain of anomaly detection because association rules can be used to construct a summary of the anomalous connections detected by the intrusion detection system. There is evidence to suggest that program executions and user activities exhibit frequent correlations among system features. These consistent behaviors can be captured in association rules.
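Support and confidence as defined above can be computed directly. The "connection records" below are invented discretized features for illustration, not the authors' data.

```python
# Toy "connection records" reduced to sets of discretized features.
transactions = [
    {"service=http", "flag=SF", "dst_port=80"},
    {"service=http", "flag=SF", "dst_port=80"},
    {"service=http", "flag=SF", "dst_port=8080"},
    {"service=ftp", "flag=SF", "dst_port=21"},
    {"service=http", "flag=REJ", "dst_port=80"},
]

def support(itemset):
    # Fraction of transactions containing every item in the set.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    # P(Y subset of T | X subset of T) = support(X | Y) / support(X)
    return support(X | Y) / support(X)

X = frozenset({"service=http"})
Y = frozenset({"flag=SF"})
print(support(X | Y), confidence(X, Y))   # support 3/5, confidence 3/4
```

Here the rule service=http ⇒ flag=SF has support 60% (three of five records contain both items) and confidence 75% (three of the four http records also have flag SF).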

Lee et al. [46,50] proposed an association rule-based data mining approach for anomaly detection in which raw data was converted into ASCII network packet information, which in turn was converted into connection-level information. These connection-level records contained connection features like service, duration, etc. Association rules were then applied to this data to create models to detect intrusions. In another paper, Barbara et al. describe Audit Data Analysis and Mining (ADAM) [81], a real-time anomaly detection system that uses a module to classify suspicious events into false alarms or real attacks. ADAM's training and intrusion detection phases are illustrated in Figs. 4a and 4b, respectively. ADAM was one of seven systems tested in the 1999 DARPA evaluation [82]. It uses data mining to build a customizable profile of rules of normal behavior and then classifies attacks (by name) or declares false alarms. To discover attacks in a TCPdump audit trail, ADAM uses a




Fig. 4b. The intrusion detection phase of ADAM [81].

Table 3. A summary of data mining-based anomaly detection systems.

- Lee et al. [46] — Highlighting features: uses inductive rule generation to generate rules for important, yet infrequent events. Methodology: classification-based anomaly detection.
- FIRE [53] — Highlighting features: generates fuzzy sets for every observed feature, which are in turn used to define fuzzy rules to detect individual attacks. Methodology: classification-based anomaly detection.
- ANDSOM [62] — Highlighting features: uses a pre-processor to summarize certain connection parameters (source and destination host and port) and then adds several values to track and classify the connection's behavior. Methodology: classification-based anomaly detection.
- MINDS [78] — Highlighting features: clusters data and uses density-based local outliers to detect intrusions. Methodology: clustering-based anomaly detection.
- ADAM [81] — Highlighting features: performs anomaly detection to filter out most of the normal traffic, then uses a classification technique to determine the exact nature of the remaining activity. Methodology: association rule and classification-based anomaly detection.


combination of association rule mining and classification. During the training phase, ADAM builds a database of ‘‘normal’’ frequent itemsets using attack-free data. It then runs an online sliding-window algorithm that finds frequent itemsets in the last D connections and compares them with those stored in the normal itemset repository. Association rules are used to gather the necessary knowledge about the nature of the audit data: if an itemset's support surpasses a threshold, that itemset is reported as suspicious, and the system annotates suspicious itemsets with a vector of parameters. Since the system knows where the attacks are in the training set, the corresponding suspicious itemsets and their feature vectors are used to train a classifier. Given a suspicious itemset and a vector of features, the trained classifier can then classify the itemset as a known attack (labeling it with the name of the attack), an unknown attack, or a false alarm. The highlighting features of some of the schemes surveyed in this section are presented in Table 3.
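A toy sketch of ADAM's comparison step, assuming invented connection records and a 50% support threshold; the real system mines itemsets with more scalable algorithms and adds the trained classifier, both omitted here.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(window, min_support):
    # Count every 1- and 2-item subset of each record; keep the frequent ones.
    counts = Counter()
    for record in window:
        for size in (1, 2):
            for items in combinations(sorted(record), size):
                counts[frozenset(items)] += 1
    return {s for s, c in counts.items() if c / len(window) >= min_support}

# Profile of frequent itemsets mined offline from attack-free training data.
normal_profile = frequent_itemsets(
    [{"http", "SF"}, {"http", "SF"}, {"ftp", "SF"}, {"http", "SF"}], 0.5)

# Online phase: mine the last D connections in a sliding window and flag
# frequent itemsets that are missing from the normal profile as suspicious.
window = [{"http", "S0"}, {"http", "S0"}, {"http", "S0"}, {"http", "SF"}]
suspicious = frequent_itemsets(window, 0.5) - normal_profile
print(sorted(sorted(s) for s in suspicious))
```

The burst of half-open (S0) connections makes the itemsets {S0} and {S0, http} frequent in the window but absent from the attack-free profile, so they are reported as suspicious; in ADAM these would then go to the classifier for naming.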

4. Hybrid systems

It has been suggested in the literature [14,15,32,83,84] that the monitoring capability of current intrusion detection systems can be improved by taking a hybrid approach that consists of both anomaly and signature detection strategies. In such a hybrid system, the anomaly detection technique aids in the detection of new or unknown attacks, while the signature detection technique detects known attacks. The signature detection technique will also be able to detect attacks launched by a patient attacker who attempts to change the behavior


patterns with the objective of retraining the anomaly detection module so that it will accept attack behavior as normal. Tombini et al. [83] used an approach wherein the anomaly detection technique produces a list of suspicious items, and a classifier module that uses a signature detection technique then classifies the suspicious items into false alarms, attacks, and unknown attacks. This approach works on the premise that the anomaly detection component has a high detection rate, since missed intrusions cannot be detected by the follow-up signature detection component. In addition, it assumes that the signature detection component will be able to identify false alarms. While the hybrid system can still miss certain types of attacks, its reduced false alarm rate increases the likelihood that most of the alerts will be examined.

EMERALD [32] was developed in the late 1990s at SRI and is an extension of the seminal work done in [8,13–15]. EMERALD is a hierarchical intrusion detection system that monitors systems at a variety of levels (individual host machines, domains, and enterprises) to form an analysis hierarchy. EMERALD uses a subscription-based communication scheme both within and between monitors. However, the inter-monitor subscription methodology is hierarchical and therefore limits access to events and/or results to the layer immediately below. The system has a built-in feedback mechanism that enables the higher layers to request more information about particular anomalies from the lower layers. To achieve a high rate of detection, the architects of EMERALD employed an ensemble of techniques such as statistical analysis engines and expert systems. The single most defining feature of EMERALD is its ability to analyze system-wide, domain-wide, and enterprise-wide attacks, such as Internet worms and DDoS attacks, at the top level.

In another paper [84], Zhang et al. [85] employed the random forests algorithm in the signature detection module to detect known intrusions; thereafter, the outlier detection provided by the random forests algorithm is utilized to detect unknown intrusions. Approaches that use signature detection and anomaly detection in parallel have also been proposed. In such systems, two sets of reports of possible intrusive activity are produced, and a correlation component analyzes both sets to detect intrusions. An example of such a system is NIDES [15,16].

Lee et al. [45,86] extended their earlier work [50] and proposed a hybrid detection scheme that utilized the Common Intrusion Detection Framework (CIDF) to automatically gather audit data, build models, and distribute signatures for novel attacks, so that the time required to detect them is reduced. The advantage of using CIDF was that it enabled different intrusion detection and response components to interoperate and share information and resources in a distributed environment.

Although it is true that combining multiple intrusion detection technologies into a single system can theoretically produce a much stronger intrusion detection system, the resulting hybrid systems are not always better. Different intrusion detection technologies examine system and/or network traffic and look for intrusive activity in different ways. Therefore, the major challenge in building an operational hybrid intrusion detection system is getting these different technologies to interoperate effectively and efficiently.
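A serial hybrid pipeline of the kind described in this section can be sketched in a few lines; the signature table, the anomaly heuristic, and the thresholds below are invented placeholders, not any particular system's rules.

```python
# Hypothetical signature rules, keyed on (protocol, port).
SIGNATURES = {("tcp", 31337): "backdoor-port", ("icmp", 0): "ping-flood"}

def anomaly_score(event, baseline_rate):
    # Toy anomaly heuristic: relative deviation of the event rate from baseline.
    return abs(event["rate"] - baseline_rate) / baseline_rate

def hybrid_detect(event, baseline_rate=100.0, threshold=2.0):
    # Stage 1 (anomaly detection): pass events that look statistically normal.
    if anomaly_score(event, baseline_rate) < threshold:
        return "normal"
    # Stage 2 (signature detection): name the attack if a rule matches,
    # otherwise report it as a new/unknown attack.
    return SIGNATURES.get((event["proto"], event["port"]), "unknown-attack")

print(hybrid_detect({"proto": "tcp", "port": 31337, "rate": 900.0}))  # backdoor-port
print(hybrid_detect({"proto": "udp", "port": 53, "rate": 900.0}))     # unknown-attack
print(hybrid_detect({"proto": "udp", "port": 53, "rate": 105.0}))     # normal
```

The design mirrors the premise noted above: the anomaly stage must have a high detection rate, because anything it passes as "normal" never reaches the signature stage.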

5. The road ahead: open challenges

In the last twenty years, intrusion detection systems have slowly evolved from host- and operating-system-specific applications to distributed systems that involve a wide array of operating systems. The challenges that lie ahead for the next generation of intrusion detection systems, and more specifically for anomaly detection systems, are many. First and foremost, traditional intrusion detection systems have not adapted adequately to new networking paradigms like wireless and mobile networks, nor have they scaled to meet the requirements posed by high-speed (gigabit and terabit) networks (an analysis of intrusion detection techniques for high-speed networks can be found in [87,88]). Factors like noise in the audit data, constantly changing traffic profiles, and the large amount of network traffic make it difficult to build a normal traffic profile of a network for the purpose of intrusion detection. The implication is that, short of some fundamental re-design, today's intrusion detection approaches will not be able to adequately protect tomorrow's networks against intrusions and attacks. Therefore, the design methodology of intrusion detection systems needs to closely follow the changes in system and networking technologies.

A perennial problem that prevents the widespread deployment of intrusion detection systems is their inability to suppress false alarms. It has been shown in lab testing [89] that state-of-the-art intrusion detection systems often crash under the burden of


the false alarms that they generate. When actual attacks do occur, either they are missed completely by the intrusion detection system, or the traces left by the intruder and/or the traces indicating the actual attack are lost amongst the large number of false alarms in the event logs. To make matters worse, Axelsson [90] showed that for an intrusion detection system to be effective, the number of false alarms has to be very low; in his paper, he suggested that 1 false alarm in 100,000 events was the minimum requirement for an intrusion detection system to be effective. Therefore, the primary, and probably the most important, challenge that needs to be met is the development of effective strategies to reduce the high rate of false alarms.
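Axelsson's base-rate argument can be made concrete with Bayes' rule; the detection rate and intrusion prevalence below are illustrative numbers, not his exact figures.

```python
# P(intrusion | alarm) via Bayes' rule: with intrusions rare, the false alarm
# rate dominates the denominator.
def bayes_detection(p_intrusion, detection_rate, false_alarm_rate):
    p_alarm = (detection_rate * p_intrusion
               + false_alarm_rate * (1 - p_intrusion))
    return detection_rate * p_intrusion / p_alarm

# Assume 2 intrusive events per 100,000 audited events and a 70% detection rate.
print(round(bayes_detection(2e-5, 0.7, 1e-3), 3))   # ~0.014: 98.6% of alarms are false
print(round(bayes_detection(2e-5, 0.7, 1e-5), 3))   # ~0.583 at the 1-in-100,000 rate
```

With a seemingly respectable 0.1% false alarm rate, fewer than 2% of alarms correspond to real intrusions; only at roughly the 1-in-100,000 rate Axelsson suggests does the majority of alarms become trustworthy.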

Over the years, numerous techniques, models, and full-fledged intrusion detection systems have been proposed and built in the commercial and research sectors. However, there is no globally accepted standard or metric for evaluating an intrusion detection system. Although the Receiver Operating Characteristic (ROC) curve has been widely used to evaluate the accuracy of intrusion detection systems and to analyze the tradeoff between the false positive rate and the detection rate, evaluations based on the ROC curve are often misleading and/or incomplete [91,92]. Recently, several methods have been proposed to address this issue [90–92]. However, most, if not all, of the proposed solutions rely on parameter values (such as the cost associated with each false alarm or missed attack instance) that are difficult to obtain and are subjective to a particular network or system. As a result, such metrics may lack the objectivity required to conduct a fair evaluation of a given system. Therefore, one of the open challenges is the development of a general, systematic methodology and/or a set of metrics that can be used to fairly evaluate intrusion detection systems.

There is also a lack of a standard evaluation dataset that can simulate realistic network environments. While the 1998 and 1999 intrusion detection evaluations from DARPA/MIT Lincoln Labs have been used to evaluate a large number of intrusion detection systems, the methodology used to generate the data, as well as the data itself, has been shown to be inappropriate for simulating actual network environments [93]. Therefore, there is a critical need to build a more appropriate evaluation dataset. The methodology for generating the evaluation dataset should not only simulate realistic network conditions but also be able to generate datasets that have normal traffic interlaced with anomalous traffic.

An important aspect of intrusion detection, which has also been proposed as an evaluation metric [94–96], is the ability of an intrusion detection system to defend itself from attacks. Attacks on intrusion detection systems can take several forms. As a specific example, consider an attacker sending a large volume of non-attack packets that are specially crafted to trigger many alarms within an intrusion detection system, thereby overwhelming the human operator with false positives or crashing the alert-processing or display tools. Axelsson [9], in his 1998 survey of intrusion detection systems, found that a majority of the intrusion detection systems available at that time performed very poorly when it came to defending themselves from attacks. Since then, the ability of intrusion detection systems to defend themselves from attacks has improved only marginally.

Another problem that still poses a major challenge is defining what is ‘‘normal’’ in a network. As mentioned in [97], there is a need for the discovery of ‘‘attack-invariant’’ features. An attack invariant is a feature of the network/system that can always be verified except in the presence of an attack; examples include traffic volume, the number of connections to non-standard ports, etc. In order to define such attack invariants, it is imperative that a better understanding of the nature of anomalies in the network and/or host is gained.

Traditionally, encryption has been a preferred methodology for securing data and preventing malicious users from gaining access to privileged/private information. However, the widespread use of encryption implies that network administrators have a limited view of the network, as traditional intrusion detection systems do not have the ability to decrypt the encrypted packets that they intercept. When an intrusion detection system intercepts an encrypted packet, it typically discards it, which greatly limits the amount of traffic that it is capable of inspecting. Therefore, the challenge for security researchers is the development of security mechanisms that provide data security while not limiting the functions of intrusion detection systems.

An increasing problem in today's corporate networks is the threat posed by insiders, viz., disgruntled employees. In a survey [98] conducted by the United States Secret Service and the CERT of Carnegie Mellon University, 71% of the 500 respondents reported that 29% of the attacks they experienced were caused by insiders. Respondents identified current or former employees and contractors as the second greatest cyber security threat, preceded only by hackers. Configuring an intrusion detection system to detect internal attacks is very difficult; the greatest challenge lies in creating a good rule set for detecting ‘‘internal’’ attacks or anomalies. Different network users require different degrees of access to different services, servers, and systems for their work, making it extremely difficult to define and create user- or system-specific usage profiles. Although there is some existing work in this area (e.g., [99,100]), more research is needed to find practical solutions.

References

[1] E. Millard, Internet attacks increase in number, severity, in: Top Tech News, 2005.

[2] J. Phillips, Hackers' invasion of OU data raises blizzard of questions, in: The Athens News, Athens, OH, 2006.

[3] C. Staff, Hackers: companies encounter rise of cyber extortion, vol. 2006, Computer Crime Research Center, 2005.

[4] M. Williams, Immense network assault takes down Yahoo, in: CNN.COM, 2000.

[5] Computer Security Institute, Federal Bureau of Investigation, in: Proceedings of the 10th Annual Computer Crime and Security Survey, 2005, pp. 1–23.

[6] S. Axelsson, Intrusion Detection Systems: A Survey and Taxonomy, Chalmers University, Technical Report 99-15, March 2000.

[7] J.P. Anderson, Computer security threat monitoring and surveillance, James P. Anderson Co., Fort Washington, PA, USA, Technical Report, April 1980.

[8] D.E. Denning, An intrusion-detection model, IEEE Transactions on Software Engineering 13 (1987) 222–232.

[9] S. Axelsson, Research in intrusion-detection systems: a survey, Department of Computer Engineering, Chalmers University of Technology, Goteborg, Sweden, Technical Report 98-17, December 1998.

[10] S. Kumar, E.H. Spafford, An application of pattern matching in intrusion detection, The COAST Project, Department of Computer Sciences, Purdue University, West Lafayette, IN, USA, Technical Report CSD-TR-94-013, June 17, 1994.

[11] S.E. Smaha, Haystack: An intrusion detection system, in: Proceedings of the IEEE Fourth Aerospace Computer Security Applications Conference, Orlando, FL, 1988, pp. 37–44.

[12] D. Anderson, T. Frivold, A. Valdes, Next-generation Intrusion Detection Expert System (NIDES): A Summary, Computer Science Laboratory, SRI International, Menlo Park, CA 94025, Technical Report SRI-CSL-95-07, May 1995.

[13] D.E. Denning, P.G. Neumann, Requirements and Model for IDES—A Real-time Intrusion Detection System, Computer Science Laboratory, SRI International, Menlo Park, CA 94025-3493, Technical Report # 83F83-01-00, 1985.

[14] T.F. Lunt, A. Tamaru, F. Gilham, R. Jagannathan, C. Jalali, P.G. Neumann, H.S. Javitz, A. Valdes, T.D. Garvey, A Real-time Intrusion Detection Expert System (IDES), Computer Science Laboratory, SRI International, Menlo Park, CA, USA, Final Technical Report, February 1992.

[15] D. Anderson, T. Frivold, A. Tamaru, A. Valdes, Next-generation intrusion detection expert system (NIDES), Software Users Manual, Beta-Update release, Computer Science Laboratory, SRI International, Menlo Park, CA, USA, Technical Report SRI-CSL-95-0, May 1994.

[16] D. Anderson, T.F. Lunt, H. Javitz, A. Tamaru, A. Valdes, Detecting Unusual Program Behavior Using the Statistical Component of the Next-generation Intrusion Detection Expert System (NIDES), Computer Science Laboratory, SRI International, Menlo Park, CA, USA, Technical Report SRI-CSL-95-06, May 1995.

[17] S. Staniford, J.A. Hoagland, J.M. McAlerney, Practical automated detection of stealthy portscans, Journal of Computer Security 10 (2002) 105–136.

[18] M. Roesch, Snort – lightweight intrusion detection for networks, in: Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, 1999, pp. 229–238.

[19] N. Ye, S.M. Emran, Q. Chen, S. Vilbert, Multivariate statistical analysis of audit trails for host-based intrusion detection, IEEE Transactions on Computers 51 (2002) 810–820.

[20] C. Krugel, T. Toth, E. Kirda, Service specific anomaly detection for network intrusion detection, in: Proceedings of the 2002 ACM Symposium on Applied Computing, Madrid, Spain, 2002, pp. 201–208.

[21] R.A. Maxion, F.E. Feather, A case study of Ethernet anomalies in a distributed computing environment, IEEE Transactions on Reliability 39 (1990) 433–443.

[22] W. Lee, D. Xiang, Information theoretic measures for anomaly detection, in: Proceedings of the 2001 IEEE Symposium on Security and Privacy, Washington, DC, USA, 2001, pp. 130–143.

[23] S. Forrest, S.A. Hofmeyr, A. Somayaji, T.A. Longstaff, A sense of self for Unix processes, in: Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, CA, USA, 1996, pp. 120–128.

[24] S.A. Hofmeyr, S. Forrest, A. Somayaji, Intrusion detection using sequences of system calls, Journal of Computer Security 6 (1998) 151–180.

[25] W.W. Cohen, Fast effective rule induction, in: Proceedings of the 12th International Conference on Machine Learning, Tahoe City, CA, 1995, pp. 115–123.

[26] E. Eskin, S.J. Stolfo, W. Lee, Modeling system calls for intrusion detection with dynamic window sizes, in: Proceedings of the DARPA Information Survivability Conference & Exposition II, Anaheim, CA, 2001, pp. 165–175.

[27] C. Warrender, S. Forrest, B. Pearlmutter, Detecting intrusions using system calls: alternative data models, in: Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, USA, 1999, pp. 133–145.

[28] D. Heckerman, A Tutorial on Learning With Bayesian Networks, Microsoft Research, Technical Report MSR-TR-95-06, March 1995.

[29] C. Kruegel, D. Mutz, W. Robertson, F. Valeur, Bayesian event classification for intrusion detection, in: Proceedings


of the 19th Annual Computer Security Applications Con-ference, Las Vegas, NV, 2003.

[30] A. Valdes, K. Skinner, Adaptive model-based monitoringfor cyber attack detection, in: Recent Advances in IntrusionDetection Toulouse, France, 2000, pp. 80–92.

[31] N. Ye, M. Xu, S.M. Emran, Probabilistic networks withundirected links for anomaly detection, in: Proceedings ofthe IEEE Systems, Man, and Cybernetics InformationAssurance and Security Workshop, West Point, NY, 2000.

[32] P.A. Porras, P.G. Neumann, EMERALD: event monitor-ing enabling responses to anomalous live disturbances, in:Proceedings of the 20th NIST-NCSC National InformationSystems Security Conference, Baltimore, MD, USA, 1997,pp. 353–365.

[33] R.A. Calvo, M. Partridge, M.A. Jabri, A comparative studyof principal component analysis techniques, in: Proceedingsof the Ninth Australian Conference on Neural Networks,Brisbane, Qld, Australia, 1998.

[34] H. Hotelling, Analysis of a complex of statistical variablesinto principal components, Journal of Educational Psy-chology 24 (1993) 417–441, 498–520.

[35] W. Wang, R. Battiti, Identifying intrusions in computernetworks with principal component analysis, in: The FirstInternational Conference on Availability, Reliability andSecurity, Vienna, Austria, 2006, pp. 270–279.

[36] M.-L. Shyu, S.-C. Chen, K. Sarinnapakorn, L. Chang, A novel anomaly detection scheme based on principal component classifier, in: Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, Melbourne, FL, USA, 2003, pp. 172–179.

[37] Y. Bouzida, F. Cuppens, N. Cuppens-Boulahia, S. Gombault, Efficient intrusion detection using principal component analysis, in: Proceedings of the 3eme Conference sur la Securite et Architectures Reseaux (SAR), Orlando, FL, USA, 2004.

[38] W. Wang, X. Guan, X. Zhang, A novel intrusion detection method based on principle component analysis in computer security, in: Proceedings of the International Symposium on Neural Networks, Dalian, China, 2004, pp. 657–662.

[39] N. Ye, Y. Zhang, C.M. Borror, Robustness of the Markov-chain model for cyber-attack detection, IEEE Transactions on Reliability 53 (2004) 116–123.

[40] D.-Y. Yeung, Y. Ding, Host-based intrusion detection using dynamic and static behavioral models, Pattern Recognition 36 (2003) 229–243.

[41] M.V. Mahoney, P.K. Chan, PHAD: Packet Header Anomaly Detection for Identifying Hostile Network Traffic, Department of Computer Sciences, Florida Institute of Technology, Melbourne, FL, USA, Technical Report CS-2001-4, April 2001.

[42] M.V. Mahoney, P.K. Chan, Learning Models of Network Traffic for Detecting Novel Attacks, Computer Science Department, Florida Institute of Technology, Technical Report CS-2002-8, August 2002.

[43] M.V. Mahoney, P.K. Chan, Learning nonstationary models of normal network traffic for detecting novel attacks, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002, pp. 376–385.

[44] R. Lippmann, J.W. Haines, D.J. Fried, J. Korba, K. Das, The 1999 DARPA off-line intrusion detection evaluation, Computer Networks 34 (2000) 579–595.

[45] W. Lee, R.A. Nimbalkar, K.K. Yee, S.B. Patil, P.H. Desai, T.T. Tran, S.J. Stolfo, A data mining and CIDF based approach for detecting novel and distributed intrusions, in: Proceedings of the 3rd International Workshop on Recent Advances in Intrusion Detection (RAID 2000), Toulouse, France, 2000, pp. 49–65.

[46] W. Lee, S.J. Stolfo, Data mining approaches for intrusion detection, in: Proceedings of the 7th USENIX Security Symposium (SECURITY-98), Berkeley, CA, USA, 1998, pp. 79–94.

[47] W. Lee, S.J. Stolfo, K.W. Mok, Adaptive intrusion detection: a data mining approach, Artificial Intelligence Review 14 (2000) 533–567.

[48] R. Grossman, Data Mining: Challenges and Opportunities for Data Mining During the Next Decade, 1997.

[49] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, CA, 1993.

[50] W. Lee, S.J. Stolfo, K.W. Mok, A data mining framework for building intrusion detection models, in: Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, 1999, pp. 120–132.

[51] H.H. Hosmer, Security is fuzzy!: applying the fuzzy logic paradigm to the multipolicy paradigm, in: Proceedings of the 1992–1993 Workshop on New Security Paradigms, Little Compton, RI, United States, 1993.

[52] S.M. Bridges, R.B. Vaughn, Fuzzy data mining and genetic algorithms applied to intrusion detection, in: Proceedings of the National Information Systems Security Conference, Baltimore, MD, 2000.

[53] J.E. Dickerson, J.A. Dickerson, Fuzzy network profiling for intrusion detection, in: Proceedings of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), Atlanta, GA, 2000, pp. 301–306.

[54] W. Li, Using Genetic Algorithm for Network Intrusion Detection, C.S.G. Department of Energy, 2004, pp. 1–8.

[55] M.M. Pillai, J.H.P. Eloff, H.S. Venter, An approach to implement a network intrusion detection system using genetic algorithms, in: Proceedings of the 2004 Annual Research Conference of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries, Stellenbosch, Western Cape, South Africa, 2004, pp. 221–228.

[56] J. Gomez, D. Dasgupta, Evolving fuzzy classifiers for intrusion detection, in: IEEE Workshop on Information Assurance, United States Military Academy, NY, 2001.

[57] M. Crosbie, G. Spafford, Applying genetic programming to intrusion detection, in: Working Notes for the AAAI Symposium on Genetic Programming, Cambridge, MA, 1995, pp. 1–8.

[58] A.K. Ghosh, C. Michael, M. Schatz, A real-time intrusion detection system based on learning program behavior, in: Proceedings of the Third International Workshop on Recent Advances in Intrusion Detection, Toulouse, France, 2000, pp. 93–109.

[59] A.K. Ghosh, A. Schwartzbard, A study in using neural networks for anomaly and misuse detection, in: Proceedings of the Eighth USENIX Security Symposium, Washington, DC, 1999, pp. 141–151.

[60] A.K. Ghosh, A. Schwartzbart, M. Schatz, Learning program behavior profiles for intrusion detection, in: Proceedings of the 1st USENIX Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, USA, 1999.

[61] J.L. Elman, Finding structure in time, Cognitive Science 14 (1990) 179–211.

[62] M. Ramadas, S. Ostermann, B. Tjaden, Detecting anomalous network traffic with self-organizing maps, in: Proceedings of the 6th International Symposium on Recent Advances in Intrusion Detection, Pittsburgh, PA, USA, 2003, pp. 36–54.

[63] W. Lee, S.J. Stolfo, P.K. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, J. Zhang, Real time data mining-based intrusion detection, in: Proceedings of the Second DARPA Information Survivability Conference and Exposition, Anaheim, CA, 2001, pp. 85–100.

[64] K.M.C. Tan, R.A. Maxion, Determining the operational limits of an anomaly-based intrusion detector, IEEE Journal on Selected Areas in Communications 21 (2003) 96–110.

[65] S.T. Sarasamma, Q.A. Zhu, J. Huff, Hierarchical Kohonenen net for anomaly detection in network security, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 35 (2005) 302–312.

[66] A.H. Sung, S. Mukkamala, Identifying important features for intrusion detection using support vector machines and neural networks, in: Proceedings of the 2003 Symposium on Applications and the Internet, 2003, pp. 209–216.

[67] L. Portnoy, E. Eskin, S.J. Stolfo, Intrusion detection with unlabeled data using clustering, in: Proceedings of the ACM Workshop on Data Mining Applied to Security, Philadelphia, PA, 2001.

[68] S. Ramaswamy, R. Rastogi, K. Shim, Efficient algorithms for mining outliers from large data sets, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 2000, pp. 427–438.

[69] K. Sequeira, M. Zaki, ADMIT: anomaly-based data mining for intrusions, in: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada, 2002, pp. 386–395.

[70] V. Barnett, T. Lewis, Outliers in Statistical Data, Wiley,1994.

[71] C.C. Aggarwal, P.S. Yu, Outlier detection for high dimensional data, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, 2001, pp. 37–46.

[72] M. Breunig, H.-P. Kriegel, R.T. Ng, J. Sander, LOF: identifying density-based local outliers, in: Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, TX, 2000, pp. 93–104.

[73] E.M. Knorr, R.T. Ng, Algorithms for mining distance-based outliers in large datasets, in: Proceedings of the 24th International Conference on Very Large Data Bases, New York, NY, USA, 1998, pp. 392–403.

[74] P.C. Mahalanobis, On tests and measures of group divergence, Journal of the Asiatic Society of Bengal 26 (1930) 541.

[75] Wikipedia, Mahalanobis Distance, 2006.

[76] V. Hautamaki, I. Karkkainen, P. Franti, Outlier detection using k-nearest neighbour graph, in: Proceedings of the 17th International Conference on Pattern Recognition, Los Alamitos, CA, USA, 2004, pp. 430–433.

[77] Y. Liao, V.R. Vemuri, Use of K-nearest neighbor classifier for intrusion detection, Computers & Security 21 (2002) 439–448.

[78] L. Ertoz, E. Eilertson, A. Lazarevic, P.-N. Tan, V. Kumar, J. Srivastava, P. Dokas, The MINDS - Minnesota intrusion detection system, in: Next Generation Data Mining, MIT Press, Boston, 2004.

[79] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of the ACM SIGMOD Conference on Management of Data, Washington, DC, 1993, pp. 207–216.

[80] J. Hipp, U. Guntzer, G. Nakhaeizadeh, Algorithms for association rule mining - a general survey and comparison, in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, 2000, pp. 58–64.

[81] D. Barbara, J. Couto, S. Jajodia, N. Wu, ADAM: a testbed for exploring the use of data mining in intrusion detection, ACM SIGMOD Record 30 (2001) 15–24 (special section on data mining for intrusion detection and threat analysis).

[82] R. Lippmann, J.W. Haines, D.J. Fried, J. Korba, K. Das, The 1999 DARPA off-line intrusion detection evaluation, Computer Networks: The International Journal of Computer and Telecommunications Networking 34 (2000) 579–595.

[83] E. Tombini, H. Debar, L. Me, M. Ducasse, A serial combination of anomaly and misuse IDSes applied to HTTP traffic, in: Proceedings of the 20th Annual Computer Security Applications Conference, Tucson, AZ, USA, 2004.

[84] J. Zhang, M. Zulkernine, A hybrid network intrusion detection technique using random forests, in: Proceedings of the First International Conference on Availability, Reliability and Security, Vienna University of Technology, 2006, pp. 262–269.

[85] L. Breiman, Random forests, Machine Learning 45 (2001) 5–32.

[86] W. Lee, S.J. Stolfo, P.K. Chan, E. Eskin, W. Fan, M. Miller, S. Hershkop, J. Zhang, Real time data mining-based intrusion detection, in: Proceedings of the Second DARPA Information Survivability Conference and Exposition, Anaheim, CA, USA, 2001, pp. 85–100.

[87] C. Kruegel, F. Valeur, G. Vigna, R. Kemmerer, Stateful intrusion detection for high-speed networks, in: Proceedings of the IEEE Symposium on Security and Privacy, 2002, pp. 285–294.

[88] A. Patcha, J.-M. Park, Detecting denial-of-service attacks with incomplete audit data, in: Proceedings of the 14th International Conference on Computer Communications and Networks, San Diego, CA, USA, 2005, pp. 263–268.

[89] D. Newman, J. Snyder, R. Thayer, Crying wolf: false alarms hide attacks, Network World, 2002.

[90] S. Axelsson, The base-rate fallacy and its implications for the difficulty of intrusion detection, ACM Transactions on Information and System Security 3 (2000) 186–205.

[91] J.E. Gaffney, J.W. Ulvila, Evaluation of intrusion detectors: a decision theory approach, in: Proceedings of the 2001 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 2001, pp. 50–61.

[92] S.J. Stolfo, W. Fan, W. Lee, Cost-based modeling for fraud and intrusion detection: results from the JAM Project, in: Proceedings of the DARPA Information Survivability Conference & Exposition, 2000, pp. 130–144.

[93] J. McHugh, Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory, ACM Transactions on Information and System Security 3 (2000) 262–294.

[94] T. Ptacek, T. Newsham, Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection, Secure Networks, Inc., 1998.

[95] U. Shankar, V. Paxson, Active mapping: resisting NIDS evasion without altering traffic, in: Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, CA, 2003.

[96] K.M.C. Tan, K.S. Killourhy, R.A. Maxion, Undermining an anomaly-based intrusion detection system using common exploits, in: Proceedings of the Fifth International Symposium on Recent Advances in Intrusion Detection, Zurich, Switzerland, 2002, pp. 54–73.

[97] J.M. Estevez-Tapiador, P. Garcia-Teodoro, J.E. Diaz-Verdejo, Anomaly detection methods in wired networks: a survey and taxonomy, Computer Communications 27 (2004) 1569–1584.

[98] M. Keeney, E. Kowalski, D. Cappelli, A. Moore, T. Shimeall, S. Rogers, Insider Threat Study: Computer System Sabotage in Critical Infrastructure Sectors, U.S. Secret Service and Software Engineering Institute, Carnegie Mellon University, 2005, pp. 1–45.

[99] A. Liu, C. Martin, T. Hetherington, S. Matzner, A comparison of system call feature representations for insider threat detection, in: Proceedings of the 6th Annual IEEE Systems, Man and Cybernetics (SMC) Information Assurance Workshop, West Point, NY, 2005, pp. 340–347.

[100] J.S. Park, J. Giordano, Role-based profile analysis for scalable and accurate insider-anomaly detection, in: Proceedings of the 25th IEEE International Performance, Computing, and Communications Conference, Phoenix, AZ, 2006, pp. 463–470.

Animesh Patcha has been a doctoral candidate in the Bradley Department of Electrical and Computer Engineering at Virginia Tech since August 2002. His area of research is computer and network security in wired and wireless networks. Currently, he is working on stochastic intrusion detection techniques under the guidance of Dr. Jung-Min Park in the Laboratory for Advanced Research in Information Assurance and Security.

Prior to coming to Virginia Tech, he received his M.S. degree in Computer Engineering from the Illinois Institute of Technology, Chicago, IL, in May 2002 and his B.E. in Electrical and Electronics Engineering from the Birla Institute of Technology, Mesra, Ranchi, India, in December 1998. From January 1999 to December 2000, he was a software engineer at Zensar Technologies in Pune, India. He is a student member of the IEEE, SIAM, and ASEE, and a global member of the Internet Society.

Jung-Min Park received the B.S. and M.S. degrees, both in electronic engineering, from Yonsei University, Seoul, South Korea, in 1995 and 1997, respectively, and the Ph.D. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2003.

He is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Virginia Polytechnic Institute and State University (Virginia Tech), Blacksburg, VA. From 1997 to 1998, he worked as a cellular systems engineer at Motorola Korea Inc. His current interests are in network security, applied cryptography, and cognitive radio networks. More details about his research interests and publications can be found at http://www.ece.vt.edu/faculty/park.html.

He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computing Machinery (ACM), and the Korean-American Scientists and Engineers Association (KSEA). He was a recipient of a 1998 AT&T Leadership Award.