
Sara Elizabeth Bury

Sentinel

Aberrant Network Behaviour

Indication and Analysis

BSc. Computer Science

22nd March 2007


I certify that the material contained in this dissertation is my own work, and does not contain significant portions of unreferenced or unacknowledged material. I also warrant that the above statement applies to the implementation of the project, and all associated documentation.

Date: 22nd March 2007

Signed:


Abstract

The aim of this project is to explore the use of aberrance detection techniques for network monitoring in a large network environment. This is an important area for research and development as today's networks are expected to function twenty-four hours a day, seven days a week; something which is impossible to guarantee relying only on the vigilance and investigative skill of network operators. This project can be broken down into three main areas: research into current aberrant network detection methods and assessment of their suitability; eliciting the requirements of a large network operator; and the production of a prototype system to illustrate the advantages of an aberrance detection system within a network operations environment. The result would be a system which indicates instances of aberrant behaviour as they occur and provides further information for network operators to aid their workflow and allow them to make an initial classification of the event.


Contents

1 Introduction
  1.1 Project Aims
  1.2 Motivation
    1.2.1 DANTE and GEANT2
  1.3 Report Overview

2 Background and Related Work
  2.1 Related Work
  2.2 Sources of Network Data
    2.2.1 Measurement of Metrics
    2.2.2 Individual Packet Capture
    2.2.3 SNMP
    2.2.4 Network Flow
  2.3 Existing Network Monitoring Solutions
    2.3.1 TCPDump
    2.3.2 Snort
    2.3.3 RRDtool
    2.3.4 Cacti
    2.3.5 Flow-Tools
    2.3.6 NfDump
    2.3.7 NfSen
    2.3.8 Overview
  2.4 NfSen-HW
    2.4.1 Architecture and Organisation
    2.4.2 Holt-Winters Forecasting

3 Design
  3.1 Requirements
    3.1.1 Network Operator's Workflow
    3.1.2 Requirements List
  3.2 Design Decisions
    3.2.1 NfSen-HW
    3.2.2 Java
    3.2.3 MySQL and PHP
    3.2.4 Debian GNU/Linux
  3.3 System Architecture
    3.3.1 NfSen-HW and NfDump
    3.3.2 runSentinel.sh
    3.3.3 Sentinel.jar
    3.3.4 Sentinel Database
    3.3.5 Sentinel Web Interface

4 Implementation
  4.1 Method of Implementation
  4.2 NfSen-HW
  4.3 runSentinel.sh
  4.4 Sentinel.jar
    4.4.1 Implementation Overview
    4.4.2 Problems with XML Parsing
    4.4.3 Database Insertion
  4.5 Sentinel Database
  4.6 Sentinel Web Interface
    4.6.1 Live Update
    4.6.2 Details
    4.6.3 Review

5 System Operation
  5.1 Usage Scenario
    5.1.1 Examining Live Update for Aberrant Behaviour
    5.1.2 Filtering the Results
    5.1.3 Viewing Further Details
    5.1.4 Analysis and Editing Event Details
    5.1.5 Summary

6 Testing and Evaluation
  6.1 Testing
    6.1.1 Defect and Component Testing
    6.1.2 Functional and Integration Testing
  6.2 User Interface Evaluation
  6.3 Evaluation
    6.3.1 Requirements List Review
    6.3.2 Summary and Feedback from DANTE

7 Conclusion
  7.1 Overview
  7.2 Further Work

A Acknowledgements

B Project Proposal

C JavaDoc

D NfDump(1) Manpage

E Holt-Winters Forecasting Examples


List of Figures

1.1 GEANT2 Network Topology
1.2 GEANT2 Global Connectivity
2.1 Section of an RRD exported to XML format
2.2 NfSen Architecture
2.3 NfSen-HW Architecture
3.1 Use Case diagram depicting the diagnosis of a network anomaly
3.2 Overview of Proposed System Architecture
3.3 Sentinel Java UML Diagram
3.4 Sentinel Database Entity Relationship Diagram
3.5 Simple foreign key linking example
3.6 Proposed Live Update Web Interface
3.7 Proposed Details Web Interface
3.8 Proposed Review Web Interface
4.1 Sentinel Java UML Class Diagram
4.2 Sentinel Database UML Diagram
4.3 Aberrant Marking Example
4.4 Subtracting 40 Minutes Example
5.1 Investigation Process Step 1
5.2 Investigation Process Step 2
5.3 Investigation Process Step 3
5.4 Investigation Process Step 4 - Editing
5.5 Investigation Process Step 4 - Inserting
5.6 Sequence Diagram of System Operation
6.1 General Defect Testing Model
6.2 Functional Testing Model
E.1 Aberrant Marking Example
E.2 Subtracting 40 Minutes Example 1
E.3 Subtracting 40 Minutes Example 2
E.4 Subtracting 40 Minutes Example 3


List of Tables

2.1 Consolidation functions within RRDtool for aberrant behaviour detection
3.1 Derived Requirements List for High Level Requirement A
3.2 Derived Requirements List for High Level Requirement B
3.3 Derived Requirements List for High Level Requirement C
3.4 Derived Requirements List for High Level Requirement D
3.5 Derived Requirements List for High Level Requirement E
3.6 Derived Requirements List for High Level Requirement F
3.7 Derived Requirements List for High Level Requirement G
3.8 Sentinel Database Tables
6.1 Sentinel.jar Testing - XML Parsing
6.2 Sentinel.jar Testing - Source and Profile Detection
6.3 Sentinel.jar Testing - Database Connectivity
6.4 runSentinel.sh Testing
6.5 Sentinel UI Functional Testing - Live Update
6.6 Sentinel UI Functional Testing - Details
6.7 Sentinel UI Functional Testing - Review


1 Introduction

1.1 Project Aims

The rationale behind this project was to gain an understanding of current research work surrounding aberrant network behaviour detection and then investigate the challenges faced when creating a system which would detect aberrant behaviour and provide a classification of its type. Leading on from this, the aim is to produce an application which illustrates how the work of a network operator could be aided by indicating instances of aberrant behaviour, providing any relevant information, and performing some kind of classification of the type of anomaly. Such an application should ease a network operator's workflow when diagnosing and fixing network problems by providing necessary details with considerably less manual intervention than might currently be required. It should also provide the facility for instances of aberrant behaviour to be recorded to provide a historical perspective on any future anomalies detected, which should further aid the network operator in their work.

1.2 Motivation

Computer networks play an increasingly important role in today's technological age. The transfer of information between computers has become necessary for many day-to-day activities, and this is especially true of the education and research sector. Universities and research institutes rely on them as communication links between scholars and students across the globe. In many cases the research being done also corresponds directly to the networks themselves: computing and communications research requires high-speed, reliable links between sites in order to accurately test new protocols and technologies. It is important that these networks are monitored carefully to ensure that potential issues are caught and resolved.

People charged with the task of maintaining computer networks face a constant battle to ensure that they do not fail, but failure is not such a black or white issue. Whilst one problem faced might be a network breaking, removing a connection between machines, it is more than likely that regular problems will be less obvious and require investigation to solve. Services on the network may become unusually busy or slow to respond; users might notice lag between transfers being sent and acknowledged. Other issues, namely security problems, might affect network traffic but remain unseen. Users might not notice if their data is being tampered with or observed, but it is up to a network administrator to try and prevent attacks of that kind, and to ensure they are rectified if they occur.


1.2.1 DANTE and GEANT2

DANTE, standing for "Delivery of Advanced Network Technology to Europe", is an organisation part-owned by each of the European National Research and Education Networks (NRENs) which has worked to plan, build and operate pan-European computer networks for advanced research and education since it was established in 1993 [DANTE, 2007]. DANTE has played an important part in the previous four generations of pan-European research network, and was responsible for the initial construction and subsequently the maintenance and management of its current incarnation, GEANT2. This network connects 30 NRENs serving 34 countries, providing network facilities for approximately 30 million research and education users [GEANT2, 2007].

Figure 1.1: GEANT2 Network Topology


Figure 1.2: GEANT2 Global Connectivity

DANTE and its network operations team are responsible for the day-to-day business of running GEANT2, ensuring the network is operating smoothly and that each of its end users are happy with its performance. As can be seen from Figure 1.1 and Figure 1.2, the GEANT2 network is exceptionally large and interacts with multiple research networks around the world. Monitoring a network this size is a very difficult prospect; it is a balance between wanting to know about every network event in order to be sure the network is operating correctly, and only having the time to deal with problems which are being specifically reported by end users. This results in a situation where network anomalies not causing immediate problems for network users are often missed, which potentially causes problems further down the line as the cause of the network event is not dealt with in the first instance. This problem is emphasised by the sheer amount of data being dealt with: any metrics created for monitoring purposes are excessively large and logically cannot be kept for an indefinite amount of time. This leads to circumstances where a network problem has occurred but the data pertaining to it has been deleted simply because the data for that time period has expired.

A network operator in this situation does not have time to spend actively monitoring the network for aberrant behaviour by hand. Most widely used network monitoring solutions will provide an overview of network activity in a graphed format and this can be used to visually identify anomalies. Unfortunately this is not an automated process and requires human interaction to view the graphs at the correct time. Also, on a network the size of GEANT2, an anomaly would have to be quite large in order to show up on a graphed view. Due to this, network events could be missed both through being of a size too small to create a visible footprint on the graphs and by occurring at a time during which a network operator has not checked the monitoring software. In such a case there is not normally any record of the behaviour other than in the graphs; no other historical record is kept of network anomalies and their type. In some cases it is possible to go back and examine the graphs for a given time period, but the data used to create them might have expired, losing the potential for closer analysis, and the graphs themselves often become averaged over time, causing smaller anomalies to be evened out into the normal flow of data.

DANTE and their work with GEANT2 provide an excellent example of why automated aberrant network detection is a necessary area of research. This project aims to use this scenario and, through discussions and liaison with network operators at DANTE, provide a concept network monitoring system which will attempt to provide a solution for the issues highlighted above.

1.3 Report Overview

The remaining sections of this report will be as follows:

Chapter two provides some background information pertaining to this area of research and examines existing work and applications which could aid the design and implementation of the project. Chapter three gives a breakdown and explanation of the major design decisions made and describes the system architecture, interface designs and communication structure. It also provides a list of documented requirements to be met by the finished product. Chapter four describes how the application was implemented and lists important sections of code. Chapter five shows the application in operation, how it would be used by a network operator in their daily work, with a walkthrough of typical usage. Chapter six gives an overview of the testing undertaken, and how successfully the application meets the specified requirements. Chapter seven draws conclusions based on comparisons between the finished project and the initial aims and objectives. It analyses the overall success of the project and indicates where further work could be undertaken.


2 Background and Related Work

2.1 Related Work

There is a lot of research surrounding the area of network anomaly detection, and in every case the first thing that must be defined is what specifically constitutes anomalous or aberrant behaviour on a network. Some researchers have chosen to define aberrant events as any network traffic which has been caused by some malicious intent [Kim et al. 2004], others simply as any large scale event on the network [Wagner & Plattner, 2005]. These definitions seem simultaneously too broad and too specific; malicious network traffic may make up a large part of anomalous traffic on a network, but there are other contributing factors, such as network configuration issues, which are not covered by this, whereas some large scale network events may be planned, or occur as part of general network usage. A more reasonable definition is "circumstances when network operations deviate from normal network behaviour" [Thottan & Ji, 2003, pg 2192]; in essence, when witnessed traffic on the network differs from what might be expected according to prior knowledge of how the network operates. This obviously requires an in-depth knowledge of the network and how it is used. One way of doing this is to create a picture of normal network traffic and actively use that for comparison purposes to judge which traffic is abnormal. Jake Brutlag develops this idea further by stating that if you have an accurate statistical model for a given time series of network traffic data, then you can define aberrant behaviour as "behaviour that does not conform to this model" [2000, p140]. In these cases aberrant or anomalous behaviour is not necessarily of any given type or size, but merely something which would not have normally occurred, and even in relation to the earlier definitions it is appropriate as it can be used to specifically identify malicious traffic or network-wide events. Overall, it means that the identification is not restricted to network events which have been witnessed and identified before, allowing new, undefined network problems to be flagged up. Of course, for this to be a useful definition there must first exist an exact specification of what constitutes normal behaviour, and this is a big focus of much research in this area. Almost all the research papers covered were in agreement that what is required is a statistical analysis of network traffic data [Barford et al. 2002; Thottan & Ji, 2003; Kim et al. 2004; Brutlag, 2000]. The question then is how that statistical analysis is performed, and then from that, how aberrant or non-normal behaviour is identified.

Barford et al. [2002] perform a signal analysis of network traffic data known as wavelet analysis, which organises the data over time into a hierarchy of separate levels known as strata. The differing levels of strata produce information of varying types, from "sophisticated aggregations of the original data" at the lower levels to "fine grained details" at the higher end [p74]. From this they show that these separations of results can indicate different characteristics; for example, lower level strata capture patterns over a long period of time, middle ranges of strata produce information about daily variations, and high levels indicate very short term variation which is, in their opinion, not useful to network anomaly analysis. They make the point that there can never be one single method for detecting network anomalies from this information due to the differing definitions of what a network anomaly is, but they suggest a method for automating the process of identifying "irregularities in the measured data" [p75] which they call a deviation score. The results illustrated that a wavelet analysis of network data is "quite effective" at showing the details of network traffic, both during normal network operation and during anomalous events. The network data used throughout this analysis was both SNMP data from their network devices, mostly activity counts (i.e. numbers of packets transmitted per node), and Network Flow data including more specific protocol level information about end to end packet flows, which together provide a "reasonably solid measurement foundation" [p71]. A comparison was made between the details elicited from Network Flow and SNMP and they found that it is possible to expose anomalies effectively using both. There will be further discussion of Network Flow data and SNMP data later in this chapter.

Thottan & Ji [2003] provide an overview of what they consider to be the most popular network anomaly detection methods: rule based techniques, finite state machines, pattern matching and finally their main focus, statistical analysis, which can be used to "continuously track the behaviour of the network" [p2194] unlike the other approaches, which often require recalibration over time. The statistical analysis is performed using SNMP data collected from network routers, and the method of statistical analysis has been developed based upon the theory of change detection; i.e. having defined a network anomaly as "correlated abrupt changes in network data" [p2195], using that theory to detect changes in network data indicates network anomalies also. They define an abrupt change as "any change in the parameters of a time series that occurs on the order of the sampling period of the measurement" [p2195]. It is the correlated nature of the changes which distinguishes them from the variable nature of normal network operation, but due to the nature of SNMP data from various devices, even data of the same type from separate devices cannot be treated the same way. Each source of data must be tested independently and correlations between devices found. To give a very general overview, abrupt changes are detected by comparing the variation of statistics between two contiguous windows of data using an auto-regressive model. They found that the use of fine-grained network data greatly improved the time taken for detection, and a concern was the possibility of time synchronisation being out between the various network devices being polled. Also, SNMP runs over the UDP network protocol, meaning that there is no guarantee of queries and responses reaching their desired target.
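To make the windowed change-detection idea concrete, the following is a minimal sketch in Java; it is a deliberate simplification, not Thottan & Ji's auto-regressive formulation. It compares the means of two contiguous windows of a metric and flags an abrupt change when they differ by more than a multiple of the pooled standard deviation; the window size and the multiplier k are illustrative parameters.

// Hedged sketch: flag an abrupt change when the means of two adjacent
// windows differ by more than k pooled standard deviations. This is a
// simplification of the change-detection idea, not the paper's method.
public final class WindowChangeDetector {

    static double mean(double[] xs, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) sum += xs[i];
        return sum / (to - from);
    }

    static double variance(double[] xs, int from, int to, double mu) {
        double sum = 0.0;
        for (int i = from; i < to; i++) sum += (xs[i] - mu) * (xs[i] - mu);
        return sum / (to - from);
    }

    /** True if the two adjacent windows ending at index 'end' differ abruptly. */
    static boolean abruptChange(double[] series, int end, int window, double k) {
        int midStart = end - window;        // start of the most recent window
        int oldStart = end - 2 * window;    // start of the preceding window
        double muOld = mean(series, oldStart, midStart);
        double muNew = mean(series, midStart, end);
        double pooled = Math.sqrt((variance(series, oldStart, midStart, muOld)
                                 + variance(series, midStart, end, muNew)) / 2.0);
        return Math.abs(muNew - muOld) > k * Math.max(pooled, 1e-9);
    }
}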

Kim et al. [2004] propose a method for abnormal traffic detection based entirely on Network Flow analysis. They divide the analytical process into two sections: flow header detection and traffic pattern data generation. As a packet is received by their algorithm, its header is checked and the transport protocol determined. From this, further checks can be made on such information as destination/source port number, or the packet/flow size. The traffic patterns can be used to detect further aberrant behaviour; for example, a scanning attack would result in a large flow count per host, but small flow and packet sizes. This is not strictly a statistical analysis of network data, more a record of previous network traffic from a specific host/network in order to produce better knowledge of their use of the network. It suffers from the same pitfalls as most rule-based analysis, for example a regular need for reconfiguring and a lack of ability to detect new and undocumented aberrant events, but it does produce some interesting information regarding particular network anomalies and how they appear as part of Network Flow data. It is also of note that their system suffered problems with false alarms due to the similarities (according to their model) between attack traffic data and normal peer-to-peer communication data which, according to their paper, makes up as much as 50% of current Internet traffic.
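As an illustration of how such a traffic-pattern rule might look in code, the following is a hypothetical simplification, not Kim et al.'s actual algorithm: a host is flagged as a possible scanner when it originates many flows, each carrying very few packets. The thresholds and the HostStats structure are invented for the example.

import java.util.Map;

// Hedged sketch of a scanning heuristic: many flows from one host, very few
// packets per flow. Thresholds are illustrative, not taken from the paper.
public final class ScanHeuristic {

    /** Per-source-host traffic summary (hypothetical structure). */
    public record HostStats(long flowCount, long packetCount) {}

    static boolean looksLikeScan(HostStats s) {
        double packetsPerFlow = s.flowCount() == 0 ? 0.0
                : (double) s.packetCount() / s.flowCount();
        return s.flowCount() > 10_000 && packetsPerFlow < 3.0;
    }

    static void report(Map<String, HostStats> statsBySourceIp) {
        statsBySourceIp.forEach((ip, stats) -> {
            if (looksLikeScan(stats)) {
                System.out.println("possible scan from " + ip + ": " + stats);
            }
        });
    }
}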

Jake Brutlag [2000] describes the statistical model from which aberrant behaviour is determined as having to take into account a number of factors, mostly surrounding seasonal cycles or variations that are considered normal network behaviour; for example, network usage during the day being higher than at night, and higher still Monday to Friday compared to weekends. The model should be able to take this into account, and not mistakenly judge such trends as aberrant instances. It should also be capable of evolving over time with the network as the cycles and trends gradually adapt to new conditions [p140]. His emphasis is on the use of such a model in a real-time monitoring context: complicated statistical modelling is not likely to be understood by the network operators and may have issues performing at an adequate speed. The model is broken down into three sections [p140]:

• An algorithm for predicting the values of a time series one time step into the future.

• A measure of deviation between the predicted values and the observed values.

• A mechanism to decide if and when an observed value or sequence of values is 'too deviant' from the predicted value(s).

His solution is an extension to the Holt-Winters forecasting algorithm, which builds upon exponential smoothing. Exponential smoothing is a simple algorithm for predicting the next value in a time series which works on the premise that the most useful value for predicting the next value is the current value, and that the continued usefulness of earlier values decays exponentially. Aberrant behaviour is then detected through devised confidence bands, a measure of how much deviation is allowed for a specific time within a seasonal cycle. There will be a fuller explanation of the Holt-Winters forecasting algorithm and how it works later in this chapter. Jake Brutlag included this implementation in RRDtool, a data logging and graphing application, and illustrates its use within a web based network monitoring solution called Cricket [Brutlag, 2000b; RRDtool, 2007; Cricket, 2007]. His conclusions are that whilst not an optimal solution, it is flexible, efficient and does effectively detect aberrant behaviour.

This solution appears to be the most complete, if not the most formally specified. The technique used is already at a production level and is being used. The fact that it is incorporated into RRDtool, one of the most commonly used logging and graphing tools available, makes it a very attractive option. There will be a closer examination of this RRDtool/Cricket solution later in this chapter.
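To give a flavour of the underlying idea, the following is a minimal sketch of exponential smoothing with a simple confidence band test. It omits the trend and seasonal components that Holt-Winters adds, and the smoothing constants, band width and sample data are all illustrative values rather than anything taken from Brutlag's implementation.

// Hedged sketch: simple exponential smoothing plus a crude confidence band.
// Holt-Winters extends this with trend and seasonal terms; they are omitted here.
public final class SmoothingSketch {

    public static void main(String[] args) {
        double[] observed = {100, 102, 98, 101, 250, 99, 103};  // fabricated sample data
        double alpha = 0.3;   // smoothing constant for the forecast (illustrative)
        double gamma = 0.3;   // smoothing constant for the deviation estimate (illustrative)
        double delta = 3.0;   // width of the confidence band in deviations (illustrative)

        double forecast = observed[0];
        double deviation = 1.0;

        for (double y : observed) {
            boolean aberrant = Math.abs(y - forecast) > delta * deviation;
            System.out.printf("observed=%.1f forecast=%.1f aberrant=%b%n", y, forecast, aberrant);

            // Update the smoothed deviation and the forecast for the next step.
            deviation = gamma * Math.abs(y - forecast) + (1 - gamma) * deviation;
            forecast  = alpha * y + (1 - alpha) * forecast;
        }
    }
}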

Whilst most of these methods of anomaly detection and analysis have involved basic counting metrics, there has been some investigation into the use of different methods of analysis to create models of traffic flow. One such approach involves deriving the entropy content within traffic data and using that information to decide whether traffic is anomalous or aberrant [Wagner & Plattner, 2005]. Entropy is defined as "a measure of how random a data-set is" [p172] and the process they use to determine entropy for the network traffic data first involves representing the data in a purely binary format, then performing data compression. The resultant size of the compressed data then corresponds directly to the level of entropy present. Their results found many interesting entropy patterns in normal and attack traffic; for example, in regular network traffic the entropy of the source and destination port fields is almost identical, whereas in attack traffic many of the answering flows do not exist, hence source port entropy increases while destination port entropy decreases. They also found that this method of analysis is not greatly affected by the use of sampled network traffic data.
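The compression trick can be illustrated in a few lines of code: the size of the compressed representation of a block of traffic data acts as a rough proxy for its entropy. This is a sketch of the general idea only; it assumes the traffic fields have already been serialised into a byte array, and the sample inputs are fabricated.

import java.util.zip.Deflater;

// Hedged sketch: use compressed size as a rough entropy proxy for a block of
// already-serialised traffic data, as in the compression-based approach above.
public final class EntropyProxy {

    /** Returns compressed size in bytes; larger means a "more random" input. */
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        byte[] chunk = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] repetitive = new byte[4096];              // all zeroes: low entropy
        byte[] random = new byte[4096];
        new java.util.Random(42).nextBytes(random);      // pseudo-random: high entropy
        System.out.println("low-entropy block  -> " + compressedSize(repetitive) + " bytes");
        System.out.println("high-entropy block -> " + compressedSize(random) + " bytes");
    }
}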

2.2 Sources of Network Data

Before looking at how to detect aberrant or anomalous network behaviour it is important to examine possible sources of network traffic data and their strengths and weaknesses. There are four main types of network data available to use, and in this section each will be examined.

2.2.1 Measurement of Metrics

This refers to the method of obtaining network data by measuring certain metrics regarding network performance. An example might be the measurement of packet round trip times and packet loss. This is not something which is necessarily automated; there are command line tools which can give results of this nature, such as ping or traceroute. Findings obtained in this manner would not normally be incorporated into a network monitoring solution but are useful as a secondary source of information during the investigation of a potential network problem. They present useful information about the state of the network at a given time and also how well it is currently operating, but cannot give any indication of the type or nature of network traffic.
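As a trivial illustration of this kind of ad hoc metric gathering, the following is a rough Java sketch only; the target hostname is a hypothetical placeholder, and InetAddress.isReachable uses ICMP or a TCP echo depending on privileges, so the timing is only approximate compared to ping.

import java.net.InetAddress;

// Hedged sketch: crude round-trip-time measurement from Java. Operators would
// normally just use ping/traceroute; this only approximates the same idea.
public final class RttProbe {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "example.org";  // hypothetical target
        InetAddress address = InetAddress.getByName(host);
        long start = System.nanoTime();
        boolean reachable = address.isReachable(2000);            // 2 second timeout
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(host + " reachable=" + reachable + " rtt~" + elapsedMs + "ms");
    }
}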


2.2.2 Individual Packet Capture

This method involves capturing each individual packet as it passes through the network and processing it to find out useful information. Due to its invasive nature it provides highly detailed information about the type and even content of data traversing the network; this is due to its ability to look into the application layer of network packets. Such in-depth network traffic data creates the potential for incredibly accurate and specific analysis of network operation, not just based upon protocols used or source/destination, but also based upon the program or application the packet is being used to update. In a lot of cases this would be the ideal for network traffic analysis and would mean that all kinds of network anomalies can be identified and very accurately classified; unfortunately such high detail comes at a price. Capturing individual packets as they pass through network devices is an incredibly intensive process given the sheer number of packets traversing even a medium sized network operation. In a scenario such as that at DANTE, individual packet capture would be far, far too heavy a load for any available server. Whilst the information would be highly desirable, it would result in such a performance hit on the network itself that it is inappropriate for a passive network monitoring application.

2.2.3 SNMP

SNMP stands for Simple Network Management Protocol [SNMP, 2007] and is an IETF declared Internet standard in the application layer of the TCP/IP five layer model. It is used by network monitoring systems to monitor and manage network connected devices using Management Information Base (MIB) queries. Devices can be polled for numerous different types of information. The first is regarding the state of the device itself: this gives information about load and operational readiness, for example information about how heavily loaded the processor within a router is, which could indicate potential problems with the capability of that specific device, or the possibility of unpredicted network load in that area. It can also produce statistical information about the network data the device is passing, such as numbers of packets transmitted in a certain period of time, which gives another indication of bandwidth and network load. Another capability is providing network management systems with alerts when certain events occur on the device, for example a large number of failed login attempts to its management interface. One other use of this protocol standard is to actually remotely manage the network devices, reconfiguring them for different circumstances, for example blocking particular ports or dropping network interfaces. This isn't something which is specifically connected to network monitoring and aberrant behaviour detection, but such capability would allow a network engineer to react to aberrant events which might have been detected and perhaps find a solution.

Information gathered via this method is quite coarse; there are few specifics about types of packets or the regularity of their throughput other than plain statistical counts and aggregations. This data could be very useful alongside a more in-depth source of network data, but probably is not granular enough to be the sole data source in an aberrant behaviour detection system.

2.2.4 Network Flow

A Network Flow is a record of a unidirectional sequence of packets between two endpoints over a defined period of time that contains certain information with which the flow can be identified. This information consists of seven key fields: source IP address, destination IP address, source port number, destination port number, protocol type, service type and router input interface. After receiving a packet, a flow capable router will examine it for the information to fill these seven fields and, based upon the results, decide whether the packet is part of a pre-existing flow record, or if it is something new. In the case that it is part of an existing flow record, the traffic statistics of that flow record will be increased accordingly; otherwise a new flow record will be created with statistics including the initially received packet. A few standards exist for flow data, the most common being NetFlow, developed by Cisco [NetFlow, 2007] and generally accepted as the industry standard; another is sFlow, an alternative produced more recently. Both produce fairly similar data for analysis and for the purposes of this definition the focus will be on NetFlow as it is currently the more commonly supported.
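The flow-matching logic described above can be pictured as a lookup keyed on those seven fields. The following is a schematic sketch only; real NetFlow implementations also handle timeouts, TCP flags and flow expiry, which are omitted here, and the class and field names are invented for illustration.

import java.util.HashMap;
import java.util.Map;

// Hedged sketch of flow aggregation keyed on the seven NetFlow key fields.
// Flow expiry, timeouts and export are deliberately omitted.
public final class FlowTable {

    /** The seven key fields that identify a flow. */
    public record FlowKey(String srcIp, String dstIp, int srcPort, int dstPort,
                          int protocol, int typeOfService, int inputInterface) {}

    /** Running counters kept per flow record. */
    public static final class FlowStats {
        long packets;
        long bytes;
    }

    private final Map<FlowKey, FlowStats> flows = new HashMap<>();

    /** Update an existing flow record or create a new one for this packet. */
    public void account(FlowKey key, int packetBytes) {
        FlowStats stats = flows.computeIfAbsent(key, k -> new FlowStats());
        stats.packets += 1;
        stats.bytes += packetBytes;
    }
}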

A flow record does not contain any information pertaining to the application layer; it is merely a traffic profiling tool. Flow level data is not as specific as full packet analysis but holds the advantage in large scale, heavily used networks due to its high speed nature. NetFlow recording is nowhere near as intensive as individual packet capture and produces a much smaller dataset for a given series of packets due to the way it aggregates packets into related flows. This can have a big impact on heavily used networks, as the sheer amount of data created by each data source and the processing power required to perform analysis can be prohibitive to producing any kind of useful network usage report, especially when working in real-time. Even with this reduction in the amount of data, without some kind of presentation application NetFlow data can still be difficult to manage, and so in organisations where NetFlow is used, it will most likely be sent to a network monitoring system to produce clear reports about the analysis carried out. NetFlow data can be used to gain an overview of traffic traversing the entire network at a point in time. It holds enough detail to analyse and produce reports of trends in port usage, bandwidth on a packets per second/flows per second/bits per second basis, as well as giving indications of interesting network behaviour. A network can be analysed using NetFlow data and characterised according to how it is normally used; from this it can be seen when network usage is different. This is all based on flow level information but is usually enough to indicate areas of interest.

There are some potential problems with NetFlow as a main source of network data, the first being the common use of packet sampling in order to create the flow records. Even though recording Network Flow is much less intensive and quicker than individual packet capture, it is still too much of an overhead for very large networks, such as in the case of DANTE and their operation of GEANT2. The problem is twofold: firstly not wanting to impact network performance with the analysis load, but also creating large datasets which are impossible to deal with sensibly. In such cases there is nothing to be done but enforce some scheme of packet sampling upon the NetFlow enabled router. By this process, not all packets are examined to record flow data; a given ratio can be set, usually somewhere between the extremes of 1 in 15 and 1 in 1000. A study completed quite recently by Braukhoff et al. [2006] examined the impact of packet sampling on anomaly detection metrics and the results were quite interesting. The investigation used a record of flow data from the outbreak of the Blaster worm in 2003, where the characteristics were well known, and the anomaly detection could be replayed at various levels of sampling to produce results which could be scientifically compared. Firstly they found that packet counts are barely disturbed by packet sampling even as high as 1 in 1000, whereas flow counts are heavily disturbed, causing many identifiable trends to simply vanish. They attribute this to the fact that flows containing only a few packets are sampled with a lower probability than flows containing many packets, hence in a lot of cases the smaller flows disappear. Secondly they examined how volume and feature entropy metrics are affected by packet sampling; their conclusion was that "though we see that packet sampling disturbs entropy metrics (the unsampled value cannot easily be computed from the sampled value as for byte and packet counts), the main traffic pattern is still visible in the sampled trace." [p161]. This is something which would have to be taken into account when deciding upon analysis techniques using NetFlow data.

2.3 Existing Network Monitoring Solutions

2.3.1 TCPDump

TCPdump [TCPdump, 2007] is a commonly used network debugging tool which enables the user to intercept and view individually captured TCP/IP packets that are being transmitted over a network. It is built upon the libpcap [libpcap, 2007] packet capture library and has the capability of writing out the data obtained from captured packets to a formatted text file. This can then be interpreted by a statistical analysis program to produce reports of trends in network usage and to give further information about network traffic traversing the network in question. The program itself contains no form of alert or notification regarding network events, but this can be achieved with the application of some network monitoring solution and an in-depth analysis of the data recorded.

This solution provides an overwhelming amount of data regarding usage of the network and can be very useful in diagnosing network faults. However, as mentioned previously, individual packet capture is a very intensive process and on a network of any great size there would simply not be the resources available to capture, process and store every packet, or even a sampled amount of packets, in this fashion. This is a very useful tool to have when actively working to solve an identified issue, but is not something which should necessarily be used in a passive network monitoring context.

2.3.2 Snort

Snort [Roesch, 1999] is described by its creator as a "Lightweight Intrusion Detection System" which operates in a passive fashion, "providing administrators with enough data to make informed decisions" [p229]. It is based upon the libpcap [libpcap, 2007] packet capture library, like TCPdump, but analyses individually captured packets with the capability to examine the payloads of packets in the application layer, which TCPdump lacks. Again, due to the nature of its operation, it does not scale successfully to be used on larger networks, and its creator states that it is intended to be used on "small, lightly utilized networks" [p229]. Its method of network traffic analysis is a rule-based one and the rules are created by the individual network administrators, tailored to their network. If Snort witnesses some traffic trend which is defined as aberrant according to those rules it will perform a set action, most commonly sending an email to the administrators to alert them to the possibility of some network problem.

Again, this is dependent on individual packet capture, which is not suited to large, heavily used networks, even as stated by the creator of the program. It does contain some form of alert system but, as mentioned previously, rule based systems are not ideal for such analysis as it is very difficult to predict new network trends, either naturally evolving ones or ones caused by new network threats. Such a system might create large amounts of false positives or, in the case of a new style of anomaly, may miss the problem altogether.

2.3.3 RRDtool

RRDtool [RRDtool, 2007] stands for Round Robin Database tool and describes itself as "the industry standard data logging and graphing application". It provides a series of tools capable of creating, updating and manipulating databases of time series data with which to produce graphs for visualising results. Its data storage uses round robin database principles, which means that the database files will never grow to be larger than a custom set size. This is achieved through constantly averaging and generalising the data held within over a set amount of time. This has two results: firstly, that the size needed to record the data for a particular source will always be constant; and secondly, that over time the older results held will lose their granularity as smaller differences will be averaged out. This means it is a good choice for a situation such as at DANTE, as the initial data size is a known quantity, and storing data in this format means it is held in a compact fashion, so potential I/O constraints are minimized. The loss of data granularity is something which can be organised such that only data so old that it is of no direct use is changed past a certain point.


Capabilities

RRDtool provides a way of storing data in a logical and easily readable/updateable format. It provides the facility to generate graphs based upon the data values held within the RRD databases and to hold data in different resolutions depending on user configurable settings, and also contains a form of aberrant behaviour detection based on Holt-Winters forecasting. The result of its aberrant behaviour detection is a boolean result, yes or no, for a particular time period. There is no built in functionality for alerts or available interface to more information; the behaviour is merely flagged when seen. It does not provide any front end interface to these commands, nor is it specifically tailored to data sources such as SNMP queries or Network Flow. The data must be organised and processed into the required format before being input to RRDtool for archiving and graphing. There are quite a few front-ends and extensions available for RRDtool; most notable are Cacti and NfSen, which are discussed later in this chapter.

<rrd>
  <version>0003</version>
  <step>300</step>
  <lastupdate>1169397600</lastupdate>
  <ds>
    <name>flows</name>
  </ds>
  <rra>
    <cf>FAILURES</cf>
    <pdp_per_row>1</pdp_per_row> <!-- 300 seconds -->
    <database>
      <!-- 2007-01-20 16:45:00 GMT / 1169311500 -->
      <row>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>1.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
        <v>0.0000000000e+00</v>
      </row>
    </database>
  </rra>
</rrd>

Figure 2.1: Section of an RRD exported to XML format

Architecture

This is a brief overview of the architecture of an RRDtool database and how it operates, including the aberrant behaviour detection capability added by Jake Brutlag. Firstly, an RRDtool database (from this point on referred to as an RRD) is stored on disk as a binary file specific to the architecture of the machine used to compile the version of RRDtool it was created with. It can be exported to and imported from an XML format, in which it is easy to see the constituent sections [see Figure 2.1], but more importantly so it can be ported between machines with RRDtool compiled for different architectures. This binary format minimizes the time taken for reads and writes performed by the application itself.

RRDtool performs an operation on the RRD known as consolidation, which is essentially a form of archiving based upon user specified rules. Consolidation occurs with every RRD update: as new data is added, older data is consolidated such that the archive maintains a specific size, and such that the overall data result is as the user has defined; older data can be reduced to an average, a minimum, a maximum, et cetera. There can be different consolidation functions per RRD, and internally data using the same consolidation rules is divided into separate Round Robin Archives (RRAs) where the required amount of space is set aside ready for data values to fill it. The example given in the RRDtool documentation is that of a need to store 1000 values at 5 minute intervals. Within the RRA, space for 1000 values will be allocated plus a header of a set size. As data values are updated they are added to the allocated space in a round robin fashion, so newer values would appear to knock older values off the end of the 1000 recorded instances. This is when the consolidation function is used to keep track of the previous data in the way the user has specified.
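The round robin storage idea can be sketched in a few lines: a fixed-size array is written in a circular fashion, so a new value overwrites the oldest one and the archive never grows. This is a simplified illustration of the principle only; a real RRA also applies the chosen consolidation function when reducing several primary data points to one archived row.

// Hedged sketch of a fixed-size round robin archive: new values overwrite the
// oldest slot, so storage never grows. Consolidation of several primary data
// points into one archived row (average, min, max, ...) is omitted.
public final class RoundRobinArchive {

    private final double[] slots;
    private int next;          // index of the slot the next value will overwrite
    private long totalUpdates; // how many values have ever been written

    public RoundRobinArchive(int capacity) {
        this.slots = new double[capacity];
    }

    /** Store a new value, silently discarding the oldest one once full. */
    public void update(double value) {
        slots[next] = value;
        next = (next + 1) % slots.length;
        totalUpdates++;
    }

    /** Average over the values currently held (a simple consolidation example). */
    public double average() {
        long held = Math.min(totalUpdates, slots.length);
        if (held == 0) return Double.NaN;
        double sum = 0.0;
        for (int i = 0; i < held; i++) sum += slots[i];
        return sum / held;
    }
}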

The aberrant behaviour detection functionality within RRDtool is implemented using the Holt-Winters forecasting method, which will be examined more closely later in this chapter. The information is stored through the addition of five consolidation functions, as shown in Table 2.1 [Brutlag, 2000a p143].

HWPREDICT     An array of forecasts computed by the Holt-Winters algorithm, one per primary data point.

SEASONAL      An array of seasonal coefficients with length equal to the seasonal period. For each primary data point the seasonal coefficient that matches the index in the seasonal cycle is updated.

DEVPREDICT    An array of deviation predictions. Essentially copied from the DEVSEASONAL array to preserve a history; it does no processing of its own.

DEVSEASONAL   An array of seasonal deviations. For each primary data point the seasonal deviation that matches the index in the seasonal cycle is updated.

FAILURES      An array of boolean indicators, 1 indicating a failure. Each update removes the oldest value and inserts the new observation. On each update the number of violations is recomputed.

Table 2.1: Consolidation functions within RRDtool for aberrant behaviour detection


When the calculations have been performed and the specific RRAs updated, the FAILURES section is where the actual aberrant behaviour is indicated.
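The FAILURES logic amounts to a sliding window of boolean violation flags, with a point reported as aberrant when the number of violations in the window crosses a threshold. The following is a minimal sketch of that idea only; the window length and threshold in the usage comment are illustrative values, not necessarily RRDtool's defaults.

import java.util.ArrayDeque;
import java.util.Deque;

// Hedged sketch of the FAILURES idea: keep a window of recent confidence-band
// violations and report aberrant behaviour when too many fall inside the window.
public final class FailureWindow {

    private final int windowLength;
    private final int violationThreshold;
    private final Deque<Boolean> window = new ArrayDeque<>();

    public FailureWindow(int windowLength, int violationThreshold) {
        this.windowLength = windowLength;
        this.violationThreshold = violationThreshold;
    }

    /** Record whether the latest observation violated its confidence band. */
    public boolean update(boolean violation) {
        window.addLast(violation);
        if (window.size() > windowLength) {
            window.removeFirst();                  // drop the oldest observation
        }
        long violations = window.stream().filter(v -> v).count();
        return violations >= violationThreshold;   // aberrant behaviour indicated
    }
}

// Example use: new FailureWindow(9, 7).update(bandViolated) after each data point.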

2.3.4 Cacti

Cacti [Cacti, 2007] is described as being the "complete front end to RRDtool", providing a web based framework for aggregating data sources and displaying graphs dependent on user configuration. A lot of the functionality is provided via RRDtool; what Cacti offers above RRDtool alone is, in essence, exactly how it is described: a more easily configurable interface which can be altered to a network administrator's preference to provide a coherent front-end display of available network usage information. It contains support for graphing based upon SNMP queries, and drawing graphs from any data source can be made to utilise the 'create' and 'update' functionality in RRDtool.

It does not have any inherent ability for analysing and processing network traffic data other than that within RRDtool itself; it merely allows the data to be displayed in a coherent fashion so visual analysis can be carried out. This means that aberrant behaviour detection can be performed using RRDtool's Holt-Winters implementation, but the network traffic data must first be processed to present it to RRDtool in a compatible format. Having done this, there is no provided interface to the aberrant behaviour detection results other than a presentation of the data as a graph. This only provides a visible indication of what time the aberrant event occurred and would require a network administrator to conduct further separate research using alternative tools to diagnose the perceived problem.

2.3.5 Flow-Tools

Flow-Tools is "a software package for collecting and processing NetFlow data" created by Mark Fulmer [Flow-Tools, 2007]. It can be used to collect raw Network Flow data from routers/servers and then process it to create reports on network activity. NetFlow that is collected using the flow-capture command will be written to disk in files that cover a user configurable time period, and compression is applied. The files can be configured to expire, either after a set amount of time has passed or when a certain amount of disk space has been used. A rotation process occurs so older files are expired first. The interface to the flow-tools files for querying and analysis purposes is largely at the command line; there are commands to process the data in a way completely configurable by the user, but there are also inbuilt commands to aid searching for set kinds of aberrant behaviour, such as scanning traffic on the network.

Flow-Tools can be configured to produce reports and graphs (through RRDtool) about behaviour it witnesses on the network, but any aberrant or anomalous behaviour detection via this method would be rule based. It could be configured to send the appropriate information to RRDtool to make use of the in-built aberrant behaviour detection, but this is not available as standard. A statistical analysis could also be applied to flow-tools archived files, but again this is something which must be configured by the network administrators individually; there are no in-built capabilities for this.

2.3.6 NfDump

NfDump is a command line application written principally by Peter Haag to collect, process and produce analytical reports of Network Flow data [NfDump, 2007]. It is a key part of the wider NfSen project, which will be mentioned later in this section. It has a built in NetFlow capture daemon, nfcapd, which runs as a background system process collecting the NetFlow data as it is exported from the router. The data is then stored in 5 minute long timeslices in a proprietary format which can be accessed using other NfDump command line tools. It contains the facility for viewing archived NetFlow data corresponding to defined filters; a simplified example taken from the nfdump manpage [Haag, 2005b] would be:

nfdump -r nfcapd.200407110845 inet6 and tcp and (src port > 1024 and dst port 80)

This displays all IPv6 connections on port 80 to any webserver that occurred within the timeframe that the specified nfdump file covered. This filter syntax is capable of examining a range of timestamped nfcapd files and can produce detailed statistics very quickly, for example the top 20 statistics during the two given timeslices in the regular format [Haag, 2005b]:

nfdump -r nfcapd.200407110845:nfcapd.200407110945 -S -n 20

An example of the output format:

Date flow start Duration Proto Src IP Addr:Port Dst IP Addr:Port Pkts Bytes Flows

2004-07-11 08:59:52.338 0.001 UDP 36.249.80.226:3040 -> 92.98.219.116:1431 1 404 1

2004-07-11 09:15:03.422 5.301 TCP 36.249.80.226:4314 -> 92.98.219.116:1222 45 2340 2

NfDump provides a highly flexible and quick interface at the command line to view specific information pertaining to network events. It can be configured to produce graphs via its sister application NfSen, which uses RRDtool as a back end. There is no command line facility for detecting aberrant behaviour other than examining the statistical information by hand or by processing the stored data files using some external statistical analysis program, but as with flow-tools, this is something that a network administrator would have to create and configure using their personal knowledge of the network. One extra possibility provided by NfDump is the use of the command line tool nfprofile to use stored filters (known as profiles within NfDump) to process specified traffic into either an ASCII formatted human readable report, or a binary formatted data file which can be analysed again using the NfDump command line tools. This can be configured to occur as the files are stored, or when the administrator initiates analysis. This could allow data of a particular type/to a particular subnet/from a particular IP to be stored separately from normally stored data to ease analysis.

Overall, NfDump provides a very nice solution for processing and organising collected NetFlow into a format which can be analysed for aberrant behaviour, but it does not provide any kind of aberrant behaviour detection itself. The only restriction on the amount of data which can be held is disk space, and stored data is not held in any compressed way; even so, as files are rotated every 5 minutes and marked with the datestamp they cover, in cases such as DANTE where the amount of NetFlow being stored is too much for the machine to keep longer than two weeks, the older files could easily be automatically deleted.
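The automatic expiry mentioned above is simple to script. The following is a hypothetical sketch: the capture directory path and the two-week cut-off are assumptions made purely for illustration, not values taken from the report; it deletes nfcapd files with a modification time older than 14 days.

import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hedged sketch: delete nfcapd.* files older than 14 days. The directory path
// and retention period are illustrative assumptions, not values from the report.
public final class ExpireOldCaptures {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("/data/nfsen/profiles/live");   // hypothetical capture directory
        Instant cutoff = Instant.now().minus(14, ChronoUnit.DAYS);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "nfcapd.*")) {
            for (Path file : files) {
                FileTime modified = Files.getLastModifiedTime(file);
                if (modified.toInstant().isBefore(cutoff)) {
                    Files.delete(file);                       // expire the old timeslice
                }
            }
        }
    }
}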

2.3.7 NfSen

NfSen, or NetFlow Sensor, is described as "a graphical web based front end for the nfdump netflow tools" [NfDump, 2007]. Combined with NfDump it makes up what the author, Peter Haag, refers to as the NfSen Project. It provides an interface to the NfDump command line tools, as well as illustrating network usage via graphs using data processed and stored by the nfcapd NetFlow capture daemon. Figure 2.2 illustrates exactly how NfDump and NfSen interact [Kiss & Mohacsi, 2006].

Figure 2.2: NfSen Architecture

By default it will produce graphs based on the live NetFlow data being captured, to display current network traffic behaviour over various timeframes. It also offers the ability to define further profiles: specific subsets of data for which you wish to see graphs separate from the live display. The details of these can be configured via the web front end, including the amount of disk space the RRD for that profile will take up, and it makes use of the profiling feature within NfDump. This size per profile configuration means that a reliable estimate of disk space can be obtained before any data is captured, as well as giving the ability to allow more detailed profiles to use more space and hence hold their granularity for a longer period.

NfSen uses NfDump as its back end for capturing and processing the data, which means it is only capable of monitoring network traffic based upon Network Flow. Due to this focus on one source of data, however, the analysis it is capable of is perhaps more detailed than that of other similar monitoring solutions, and the presentation of the results is specifically tailored to this kind of information. It also uses RRDtool as its back end for producing a graphical display, which means it would be possible to harness the aberrant behaviour capabilities of RRDtool's built in Holt-Winters algorithms, though NfSen does not have any inherent solution for displaying such information. As with any system utilising RRDtool for its data storage, disk space is a known quantity, though, as mentioned previously, NfDump does not have any compression facility for its data storage. There is a modular framework for adding plugins to the system; one popular example is PortTracker, which monitors the connections to various ports in a graphical way. There is also the facility for automatic alerting via email according to given rules, but this has to be configured by a network administrator with rules specific to the network in question and its uses.

2.3.8 Overview

There are quite a number of network monitoring solutions available but few, if any, support aberrant behaviour detection and indication. TCPdump and Snort are far too intensive to use in any kind of passive monitoring environment and the data they provide is very specific to the rules used. The analysis they perform is not adaptive and in most cases requires a very good knowledge of the network environment being monitored. A side factor is the legal implications of the data that is produced; for example, with Snort it is possible to view data held in the application layer of packets traversing the network. A network administrator using such a tool to analyse network traffic may inadvertently find network traffic which contains illegal material, and in such a case the law is not entirely clear regarding the administrator's position in having viewed it. Whilst it is important to ensure there are rules and restrictions regarding network use in force, it is not necessarily the place of a network administrator to enforce such legislation, and so, whilst diagnosing network faults, the potential for such inadvertent discoveries might be something to be avoided. This is especially true for administrators in a situation like DANTE's, where their responsibility is simply for the links between separate service providers; it is those institutions, rather than DANTE, whose place it is to enforce such network usage restrictions.


RRDtool is the most promising service, offering the capability of aberrant behaviour detection using the Holt-Winters algorithm based upon supplied data. It is based upon Round Robin principles and hence uses a static amount of disk space to store its data, as well as having the ability to produce graphical representations of any kind of information, provided it is input in the correct format. Unfortunately this is where RRDtool is not enough: it has no capability for analysing or processing data, merely dealing with pre-processed values submitted with the correct flags. Further applications are necessary to give RRDtool its full potential.

Cacti provides a fully functional interface to RRDtool, allowing the ability to create and view graphs of various data sources, even including the facility to carry out SNMP queries. Whilst this solves some of the initial interface problems faced when using RRDtool alone, it still leaves a need for some form of pre-processing of any network data other than SNMP before it can be input to RRDtool. Also, Cacti does not have any interface specifically designed to present the results that might be produced by RRDtool's aberrant behaviour detection, so this would also need to be created. Flow-Tools might be a solution to the need for preparatory processing and analysis: it automatically captures and stores NetFlow data in a compressed format and can be configured to output to RRDtool depending on the data presentation required. This still leaves a need for a front end presentation of the data, both regular and aberrant behaviour related.

Finally there is NfDump and NfSen, two applications which are closely linked. One supplies a NetFlow capturing and storing facility, with processing and profiling capability, the other a web based interface to RRDtool. This is the most complete package overall, within the context of this project. It is lacking in a number of areas, however: there is no inherent ability for NfSen to provide any aberrant behaviour detection or indication, and the web interface only allows the creation of basic data profiles, not the ability to set up or view data being processed by the Holt-Winters algorithm. There is no pre-designed facility for indicating network events when they occur other than by viewing the specific graphs at the right time. This would appear to be the best on offer, but requires more development to be an ideal solution.

2.4 NfSen-HW

NfSen-HW is an extension to NfSen currently being developed as an attempt to make full use of the Holt-Winters aberrant behaviour detection capabilities within RRDtool [NfSen-HW, 2007]. It was initially presented to JRA2, the GEANT2 security team, in September 2006 as a project being undertaken by network administrators at HUNGARNET, the Hungarian research and education network [Kiss & Mohacsi, 2006]. The aim was to aid the work of the Computer Security Incident Response Teams (CSIRT) in their usual work process, “find abnormal behaviour, report and coordinate incidents”, the goal being to “help visually detect abnormal behaviour” [Kiss & Mohacsi, 2006, Slide 2].


Whilst it is at the very cutting edge of development, it does provide a combination of everything previously listed as being a requirement: NfDump for the underlying data processing and profiling, and a customised NfSen interface to create and view data sources, including instances of aberrant behaviour detected using the Holt-Winters functionality within RRDtool.

2.4.1 Architecture and Organisation

The architecture of NfSen-HW is much like the architecture of NfSen, the main differences being the extra processing done as part of RRDtool and the redesign of the front end. As can be seen from a comparison of figures 2.2 and 2.3, there are no alterations to the actual framework of NfDump and NfSen, merely the addition of support for the Holt-Winters forecasting within RRDtool [Kiss & Mohacsi, 2006].

Figure 2.3: NfSen-HW Architecture

The forecasting algorithm reads from and updates the individual RRD files, adding data into the Holt-Winters specific RRAs. When a forecasted value is considered to be too deviant, it is marked within the RRD files such that it is displayed on the web front end during the next scheduled processing event.

The plugin architecture within NfSen is such that perl modules of a particular format can be included as processing scheduled to be run every time an update occurs, every five minutes. These plugins are held within a rigid framework and can provide information to a front end plugin, simply a PHP page included in the front end plugin directory. Using this method, certain extra processing can be performed tailored to a specific network or need. In the case of NfSen-HW, the plugin architecture has not been used to implement the extra processing and changes required to update RRDtool for Holt-Winters forecasting correctly. Gabor Kiss said this is due to the organisation of NfSen's modular structure: in order to have achieved what he has within a plugin, he would have had to repeat large pieces of the underlying code base within the plugin itself. Because of this he chose simply to modify the source code, and has submitted suggestions to Peter Haag as to how the modular framework could be improved (an explanation given during a telephone conference on 24th January 2007 involving myself, Maurizio Molina (Network Engineer, DANTE), Janos Mohacsi and Gabor Kiss, the NfSen-HW developers).

In conclusion, this provides a very useful platform for detecting aberrant behaviour, but it does not fulfil all of the criteria laid down for use within DANTE. In their case the amount of network data available is incredibly large, and even in graphical form it can be too much to take in visually. With NfSen-HW there is no immediate way of indicating network anomalies without an administrator examining the correct graph at the right time. This might not seem like much of an issue initially, but due to the size of the NetFlow data being captured per day they can only keep hold of a certain amount of NetFlow data, and within that there would not be the space to hold unlimited sizes of RRD files for profiles. If an RRD can only ever be a certain size, that size might only amount to one day's worth of aberrant behaviour indications, and hence after 24 hours the indication of aberrant behaviour for that profile is lost.

2.4.2 Holt-Winters Forecasting

There are three separate sections which explain the mathematical process which constitutes Holt-Winters Forecasting, firstly:

Single Exponential Smoothing

This is a simple algorithm for predicting the next data value in a time series and can only be used for predictions in time series where there are no trends in the results. A weighted average is taken of all previous time series values, weighted such that the most recently recorded values are worth the most; logically, the most recent values are the most relevant to any further values. This is achieved by assigning geometrically declining weights to previous values, which decrease by a constant ratio the further back they go. The forecast can be updated using only two pieces of information: the latest observed value and the previously calculated forecast. For this to work successfully it is important to choose the smoothing constant carefully; high values (0.8/0.9) will place a heavy emphasis on the newest values in the time series, whereas low values (0.1/0.2) will stretch the weight further, giving more prominence to values in the past. A smoothing constant value of 1 would result in the forecasted value being equal to the previously observed result.
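In the standard textbook notation (a sketch of the general formulation rather than RRDtool's exact implementation), with smoothing constant 0 < α < 1, observed value y_t and forecast ŷ_t, the update is:

\[ \hat{y}_{t+1} = \alpha y_t + (1 - \alpha)\,\hat{y}_t = \alpha \sum_{k \geq 0} (1 - \alpha)^k\, y_{t-k} \]

where the expanded sum (ignoring the initialisation term) shows the geometrically declining weights described above.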

Holt’s Method

The second section is what is known as Holt's Method: the introduction of the possibility of some trend in the values of a time series, which is then taken into consideration when forecasting the next result. This is done by creating another variable, the slope variable, which keeps track of the direction in which the trend is heading. This variable is also updated using exponential smoothing, hence there are two smoothing constants to choose values for. In the initial case these must be given values, usually in the region of

0.02 < α0, α1 < 0.2

where α0 and α1 are the two smoothing constants.
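Written out in the usual notation (a sketch, with the two constants labelled α and β rather than α0 and α1), the smoothed level a_t, slope b_t and forecast are:

\[ a_t = \alpha y_t + (1 - \alpha)(a_{t-1} + b_{t-1}) \]
\[ b_t = \beta (a_t - a_{t-1}) + (1 - \beta)\, b_{t-1} \]
\[ \hat{y}_{t+1} = a_t + b_t \]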

Holt-Winters Forecasting

The third and most important section is the actual forecasting algorithm. This is an extension to Holt's method which not only takes into account the possibility of some trend in time series values, but also the potential for seasonal variation over different time periods, for example daily, monthly or yearly seasonal traits. The observed time series is broken down into three components, each of which can be calculated to forecast further values:

• The Baseline (or Intercept)

• The Linear Trend (or Slope as it was referred to previously)

• The Seasonal Trend

The results are still calculated using exponential smoothing, but different weighting is applied dependent on which component is involved. In the case of the seasonal trend, since the current point within the season is known, the last known value for the same point in the season can be referenced and given most relevance in calculating a prediction.

Aberrant behaviour detection is then performed using confidence bands. Since a reasonably accurate prediction can be made regarding the next value in a series, it is also possible to define limits between which we are confident the value will fall. In other words, in the case that the actual next value is not exactly the same as the predicted next value, within what limits are we confident that it still follows the current trend and seasonal variations? If the actual value is beyond these limits, either higher or lower, then depending on the magnitude by which the prediction is incorrect, the actual value can be classified as aberrant compared to previous known values.
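As a sketch of the additive formulation described by Brutlag (season length m, smoothing constants α, β, γ and a scaling factor δ; the exact parameter names inside RRDtool may differ), the three components, the forecast and the confidence band are:

\[ \hat{y}_{t+1} = a_t + b_t + c_{t+1-m} \]
\[ a_t = \alpha (y_t - c_{t-m}) + (1 - \alpha)(a_{t-1} + b_{t-1}) \]
\[ b_t = \beta (a_t - a_{t-1}) + (1 - \beta)\, b_{t-1} \]
\[ c_t = \gamma (y_t - a_t) + (1 - \gamma)\, c_{t-m} \]
\[ d_t = \gamma\, |y_t - \hat{y}_t| + (1 - \gamma)\, d_{t-m} \]
\[ \text{confidence band: } \big(\hat{y}_t - \delta\, d_{t-m},\ \hat{y}_t + \delta\, d_{t-m}\big) \]

Here d_t is the smoothed seasonal deviation; an observation falling outside the band counts as a violation, and a point is flagged as a failure when a configurable number of violations occur within a short window of recent observations.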


This is a very simplified explanation of the Holt-Winters forecasting process, with a reduced emphasis on the mathematical formulae involved. It is based upon information given by Jake Brutlag and on Chatfield and Yar's investigation into the practical issues of Holt-Winters forecasting; more detailed explanations of the algorithm can be found in these sources [Brutlag, 2000a; Chatfield & Yar, 1988].


3 Design

3.1 Requirements

3.1.1 Network Operator’s Workflow

The creation of requirements for this project requires an understanding of the situation in which the system will be used. A network operator has a considerable amount of day to day responsibilities other than identifying and rectifying network problems; in some cases detecting issues on the network will be less of a proactive feature of their work and more something which might be triggered by a report from a user of a specific issue they are facing. The result of this is that quite often network issues will go unnoticed and unattended until they become enough of a problem for an end user to complain. In a situation like that of DANTE, the size of the network being monitored and the amount of data that traverses it mean that, even with a network monitoring application showing graphs and trends of network activity, it is very easy to miss a network event which only affects a small portion of the network, or a small number of sources of traffic data. Figure 3.1 gives an example of the actions taken by a network operator when a problem is detected or reported. With their current infrastructure, the majority of that process involves tracking down the problem and then using separate applications and tools to gain a better understanding. There is no facility for easily seeing other affected sites without going through the same process multiple times. In order to discover whether other network operators have previously investigated or dealt with the problems identified, they must access a separate ticketing system and specifically identify the sources and time periods in question. This means that if a problem has already been analysed and explained previously, there is the possibility a second operator may have to go through the same process a second time. Finally, as mentioned previously, due to the amount of data being monitored, network problems can be missed. In a case where someone reports a problem which has been ongoing for longer than a certain period, the original NetFlow data covering the time the event began will probably have been deleted; in DANTE's network monitoring setup the length of that window is two weeks. This would result in all analysis and diagnosis being performed based on the data held in the RRD files and graphs which, due to the Round Robin nature of RRDtool, will become less accurate as time passes.

3.1.2 Requirements list

Based upon this understanding of the situation, a set list of requirements has been derived, each of which should be met for the solution to be considered a success. This list was completed after a series of discussions with a network operator at DANTE.

Figure 3.1: Use Case diagram depicting the diagnosis of a network anomaly

Overall Outcome

There is a simple overall outcome to be achieved by attaining each of the individual requirements, which was the initial starting point for the derivation of more specific needs.

To assist a network operator in the identification and diagnosis of network problems and illustrate how the inclusion of automated aberrant behaviour detection could improve large network monitoring.


High Level Requirements

Working from this overall end aim has produced a short list of high level, slightly more focussed requirements:

A Automatically indicate aberrant network behaviour instances as they occur in a clear, coherent fashion.

B Allow the display of aberrant network behaviour instances to be tailored to the information the operator deems relevant.

C Supply enough information about each aberrant network behaviour instance that a preliminary analysis can be made straight away.

D Indicate possible links between indicated aberrant network behaviour instances.

E Keep a historical record of aberrant network behaviour instances and basic analytical details.

F Provide a flexible interface to past aberrant network behaviour information.

G Provide a means of indicating that aberrant network behaviour instances have been investigated.

Fully Derived Requirements List

Finally, based upon the high level requirements, a fully derived requirements list can be created. These are broken down into separate tables relevant to the high level requirement they satisfy. These numbered requirements will be reviewed at the end of the project as part of the Testing and Evaluation chapter of this report.

A Automatically indicate aberrant network behaviour instances as they occur in a clear, coherent fashion.

A.1 Aberrant network behaviour instances should be displayed together on one page organised by the time they occurred.

A.2 Only the most relevant information for each aberrant behaviour instance should be displayed.

A.3 Aberrant network behaviour instances should be aggregated to display one event per continuously flagged period.

A.4 This display should automatically update as new aberrant behaviour is detected on the network.

A.5 The display should be accessible from machines other than the machine it is installed on.

A.6 Each aberrant network behaviour event should be displayed in an identical style so quick comparisons of information can be made.

Table 3.1: Derived Requirements List for High Level Requirement A


B Allow the display of aberrant network behaviour instances to be tailored to the information the operator deems relevant.

B.1 The information displayed as part of the live update can be filtered to show only instances which match particular conditions.

B.2 The default update should contain information the network operator believes to be the most relevant in the first instance.

Table 3.2: Derived Requirements List for High Level Requirement B

C Supply enough information about each aberrant network behaviour instance that a preliminary analysis can be made straight away.

C.1 Further information should be available for each aberrant network behaviour instance on request.

C.2 This information should include, at the least, a graph of the time frame in question and a brief statistical synopsis for the given period and traffic type.

C.3 This information should be persistent beyond deletion of the actual NetFlow records for that aberrant network behaviour event.

C.4 It should be made obvious if a particular aberrant network behaviour event has been flagged as a false positive when examining further details.

Table 3.3: Derived Requirements List for High Level Requirement C

D Indicate possible links between indicated aberrant network behaviour instances.

D.1 If further information about an aberrant network behaviour event is requested then a display should also be provided of possible associated events.

D.2 Further information pertaining to these associated aberrant network behaviour events should be available on request.

Table 3.4: Derived Requirements List for High Level Requirement D

E Keep a historical record of aberrant network behaviour instances and basic analytical details.

E.1 Detected aberrant network behaviour events should be recorded in some form of persistent database.

E.2 The database should be reliable, quick to query, and scale well to holding potentially very large data sets.

Table 3.5: Derived Requirements List for High Level Requirement E


F Provide a flexible interface to past aberrant network behaviour information.

F.1 It should be possible to view past aberrant network behaviour event details based upon a number of criteria:

F.2 Exact start time and end time.

F.3 Start time somewhere between two given dates and times.

F.4 End time somewhere between two given dates and times.

F.5 Alongside queries based upon the start and end times, results should be chosen according to further specific information: type/source/profile etc.

F.6 When results have been found it should be possible to view further information about an event in the same way it would be possible for a live event.

Table 3.6: Derived Requirements List for High Level Requirement F

G Provide a means of indicating that aberrant network behaviour instances have been investigated.

G.1 Aberrant network behaviour events stored in the system should be able to be flagged as acknowledged when they have been dealt with.

G.2 Aberrant network behaviour events stored in the system should be able to be flagged as a false positive if they have been identified as such.

G.3 Operators who have dealt with a particular aberrant network behaviour event should be able to leave some comment regarding their findings for the benefit of later users.

Table 3.7: Derived Requirements List for High Level Requirement G


3.2 Design Decisions

A brief justification of the tools and systems being used within the Sentinel system design.

3.2.1 NfSen-HW

This system has been chosen to provide a basis for the network traffic data analysis and for the aberrant behaviour detection. This is for a few reasons: firstly, it is the most complete package available in this area of network monitoring. What it provides is a reliable, mathematically proven platform for detecting network anomalies, packaged such that installation and configuration is not an arduous task. Secondly, the network operators at DANTE already have good working experience of NfSen, the version of this software without aberrant behaviour detection. Because of this, the exchange of GEANT2 data for Sentinel development and testing should be more straightforward, as the flows can be transferred as already organised, compatible format files.

3.2.2 Java

Java 1.5 will be used to process the RRD files produced by NfSen-HW for aberrant behaviour marks. This was intended to be done using a Java RRD library, allowing Java to interface directly with the RRD files; some examples of such libraries are compared in the JRA1 Perfsonar wiki [RRD Java Libraries]. Unfortunately, due to the version of RRDtool required for use with NfSen-HW, the libraries will not read the RRD files that are produced by it. The most complete library, JRobin, required the use of a converter on the RRD files first, and whilst JRobin itself would produce the results I required, the converter did not support the version of the RRD files being used and so could not convert them [JRobin, 2006]. Instead a tool within RRDtool will be used, rrdtool dump. This was mentioned earlier in the Background and Related Work section and produces a full XML representation of the contents of the RRD files. Java by default contains very flexible XML parsing libraries, and so once the RRD files have been exported to an XML format it should be possible to read in the appropriate aberrant behaviour results.
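For example, a single RRD file can be exported to XML on the command line as follows (the file name here is purely illustrative):

rrdtool dump DataSource1.rrd > DataSource1.xml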

Java also contains methods and functions for connecting to, querying and altering SQL compliant databases, and this will be used to insert the collated aberrant behaviour events into the database used by the front end. Using Java in this fashion should mean that the finished application is completely portable to any system upon which NfSen-HW has been installed, regardless of architecture, unlike RRDtool. Java is portable to any system or architecture provided it has been installed, and this should mean that the end application will run in any NfSen-HW environment.


3.2.3 MySQL and PHP

The database will be stored using MySQL 5.0 and the web front end written using PHP5 [MySQL, 2007; PHP, 2007]. MySQL is an open source database implementation very widely used in web based applications, and PHP is a server side embedded scripting language which allows processing to be carried out and the results displayed in a webpage. These are two highly flexible and frequently integrated pieces of software which should provide an excellent platform for the aberrant indication history and presentation. They provide all the necessary tools and functions to complete the project in the easiest way possible, allowing complex database queries and functionality within PHP for service and IP address lookups.

3.2.4 Debian GNU/Linux

Debian GNU/Linux will be the operating system platform for Sentinel. This is first and foremost because the installation of NfSen-HW requires a well maintained and compliant Linux distribution, but secondly because of my familiarity with Debian's system architecture and knowledge of Debian's excellent package management system, apt [Debian GNU/Linux, 2007]. This should mean that the installation of certain necessary software, such as Java and PHP, will be a simple process, leaving more time for development and testing. Also, Linux generically comes with a number of useful applications which will be required for this project, the most important being Bash, or ‘Bourne-Again SHell’, and Cron. Bash is the command line interpreter which comes as standard with GNU operating systems [Bash, 2007]. It provides a text based user interface to execute commands but also allows files containing commands, Bash scripts, to be created; this is what will be used to initiate the Sentinel Java process on specific RRD files. Cron, or more specifically Vixie Cron, is a background process or daemon which exists to execute scheduled commands at specific times. Using formatted configuration files known as crontabs, Cron can be set to run a particular command or script at a set point every minute/hour/day/month. Cron will be used to ensure that the runSentinel.sh script is executed every five minutes to correspond with NfSen updates.
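A crontab entry of roughly the following form would achieve this (the path to the script and the log file location are assumptions made for the sake of the example):

*/5 * * * * /home/nfsen-hw/bin/runSentinel.sh >> /var/log/sentinel.log 2>&1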


3.3 System Architecture

As described within the Design Decisions section, the system is made up of many smaller components which interact with NfSen-HW, NfDump and RRDtool.

Figure 3.2: Overview of Proposed System Architecture

Figure 3.2 illustrates how the various sections and systems interface with each other. As can be seen, NfSen-HW is an important part of the back end; Sentinel bases its aberrance detection upon the events identified by RRDtool's Holt-Winters forecasting. The rest of this section gives a more detailed description of what each of the individual components does.

3.3.1 NfSen-HW and NfDump

The operation of these two applications has already been covered by previous chapters of this report, but here is an overview of their use within the wider Sentinel indication system. The Network Flow data from all sources is captured, as shown in Figure 3.2, by individual instances of the nfcapd capture daemon. This is then analysed and organised according to specified profile filters by NfDump. This information is then processed by the front end system, NfSen-HW, and the specific parameters are passed to RRDtool for the creation of RRD files for each source/profile. The Holt-Winters forecasting occurs as part of this process within RRDtool itself, and the resultant aberrance indication data is updated to each individual RRD in RRA sections specific to aberrant behaviour detection. Once this information has been stored within the RRD files, NfSen-HW plays no further part in the aberrance indication process. This occurs once every 5 minutes, as the nfcapd NetFlow data files are rotated, allowing new network traffic data to be analysed.

3.3.2 runSentinel.sh

runSentinel.sh is a Bash script which is executed once every five minutes by Cron. It traverses the directory structure that holds the NfSen-HW RRD files, uses the rrdtool dump command to export them to XML and runs Sentinel.jar on each file to pull out the aberrant behaviour. This is merely a method of ensuring that the aberrant event database is updated every five minutes, the same as the RRD files themselves, which should ensure that no aberrant events are missed.

3.3.3 Sentinel.jar

This is the Java component which is responsible for interpreting the contents of the RRD files and then inserting that information into the Sentinel database. This is done by parsing the XML version of each RRD file created by runSentinel.sh, using Java's inbuilt SAX XML parsing libraries. The default XML handler provided by the library is extended to create an XML handler which only looks for the specific sections of XML that are required to retrieve the aberrant behaviour data. Information about each flagged event is pulled out and placed in an AberrantBehaviour object and, once the XML file has been parsed to pick up every instance of aberrant behaviour, the collection of AberrantBehaviour objects is inserted into the Sentinel database. It makes use of the JDBC libraries within Java for connecting to and manipulating data within databases, in this case using the MySQL connector.


Figure 3.3 depicts a high level UML diagram of the classes within the Sentinel Java component. As can be seen from the diagram, most of the complexity is within the RRDDatabase; the XML parser simply pulls out the relevant information. It should be noted that there are two forms of parse available within Sentinel. The first is the default, a scan for any aberrant behaviour which has been indicated in the five minutes previous to the last updated time. This is the form that will be run by the runSentinel.sh script every 5 minutes, and ensures that only the latest information is pulled into the database as it is updated. The second is a full scan, triggered by a command line argument, which will go through and parse an XML file for every single aberrant event that it contains. This is designed to be run the first time the system is put into operation, to retrieve the backlog of aberrant events into the database for historical purposes.

Figure 3.3: Sentinel Java UML Diagram


3.3.4 Sentinel Database

This MySQL database stores all information about aberrant network events, including their type, source, profile and a basic amount of statistical information. Here is an entity relationship diagram for the database schema:

Figure 3.4: Sentinel Database Entity Relationship Diagram

As you can see, an aberrant event can have one type, profile and source, but each of those could be applicable to many events. Here is an overview of the contents and responsibilities of each table within the database schema.

events Holds information relevant to an aberrant network event, including the type, source and profile via foreign key links to other tables. A start time and end time are held per event, as well as a comment and markers indicating acknowledged and false positive status. Brief statistics are also held, taken from nfdump and a lookup of port/hostname.

types A simple table containing all possible types of network data and an id number for linking purposes.

sources Contains all the sources seen so far with an id number for linking and a description field to store further brief information about each source.

profiles Contains all the profiles seen so far with an id number for linking and a description field to store further brief information about each profile.

Table 3.8: Sentinel Database Tables

This final diagram illustrates how the tables will link together and the connections that will take place using foreign keys.

Figure 3.5: Simple foreign key linking example

These diagrams give a precise description of the contents of the tables and the relationships between them, but it is also important to understand how the data within the tables will be used by the other sections of the Sentinel system.

Firstly, and most importantly, there is the events table, which links together all the relevant information about a particular aberrant event. The table contains a unique event id as its primary key; using an integer and separating this necessity away from the actual held data should mean that indexing of the table is a lot quicker and lookup times are improved. Second to that are two columns relating to the time that the event took place. The way that aberrant behaviour detection is implemented within NfSen-HW and RRDtool means that one particular network event will be flagged within the RRD as a continuing series of 5 minute long segments. Since it is quite obvious from viewing the produced graphs that each individual 5 minute long segment is not an aberrant event in its own right, this design holds single events by storing the start time and the end time of each event; from the RRD this would be the first 5 minute segment in which the aberrant behaviour was indicated, and subsequently the last 5 minute segment in which it was indicated. In the case of a live updated page, the end time would be the last time that aberrant event was seen as active since, without seeing the next segment in time, we cannot predict when a series of aberrant markers is going to end. The events table then holds three foreign keys, linking to tables containing information about the type, source and profile of an aberrant event. Next is a comment field where network operators can comment on an event, leaving messages about any research they have undertaken to solve a problem. After that there are two boolean flags: firstly an acknowledged field, where network operators may mark events as having been dealt with, and secondly a false positive field, which can be used if NfSen-HW has incorrectly identified a period of time as an aberrant event. These fields are present primarily for filtering purposes; when using the system a network operator does not want to be presented with falsely identified events if they have been identified as such. The final two fields within this table are simply text fields containing more detailed information about the flows which were occurring during the identified time frame for the particular network protocol. While the system is being used as a live update, this information will most likely be retrieved from NfDump directly, but once an event has been marked as ended and time has passed without it becoming active again, this information will be stored in the database for two reasons. Firstly, this will speed up the front end considerably: once an event has finished there will be no new flow data added to it, and the information which can be garnered from flow statistics and hostname lookups is not going to change, so removing the need to requery the stored flows should save time. Secondly, in cases such as at DANTE, where the Network Flow data is only held for a restricted amount of time, this will keep at least some basic level of information connected to an event so it can be examined at a later date. If this were not done, at a point in the future when information about a past event was retrieved, the lookup from NfDump could not be performed due to the NetFlow data no longer being present on the system.

The other tables in the database schema are quite similar. The types table contains a numeric primary key and a corresponding network traffic type. NfSen-HW chooses to specify 15 types of network traffic data which do not change throughout the rest of the system; these correspond to ‘flows’, ‘packets’ and finally ‘traffic’. For each of these there are 5 subcategories: firstly all traffic within that classification, then all TCP traffic, all UDP traffic, all ICMP traffic and finally ‘other’, which catches all other kinds of network traffic protocol (for example, PIM or OSPF).

The profiles and sources tables are practically identical other than in content: one contains information regarding the data sources being used, the other information regarding the profiles that have been configured. They both contain a numeric primary key and a name for the source/profile being stored. The final optional field is a description field, a place for further information about a source or profile. This might be used to clarify what a certain source or profile refers to, something which might not be immediately apparent from the short name.
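As a minimal sketch of how such a schema could be created (the table and column names below are illustrative choices rather than Sentinel's exact definitions, and the database and user names are assumptions), the four tables might be set up as follows:

mysql -u sentinel -p sentinel <<'SQL'
CREATE TABLE types (
  type_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name    VARCHAR(32) NOT NULL
);
CREATE TABLE sources (
  source_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(64) NOT NULL,
  description TEXT
);
CREATE TABLE profiles (
  profile_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(64) NOT NULL,
  description TEXT
);
CREATE TABLE events (
  event_id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  start_time     DATETIME NOT NULL,
  end_time       DATETIME NOT NULL,
  type_id        INT UNSIGNED NOT NULL,
  source_id      INT UNSIGNED NOT NULL,
  profile_id     INT UNSIGNED NOT NULL,
  comment        TEXT,
  acknowledged   BOOLEAN NOT NULL DEFAULT 0,
  false_positive BOOLEAN NOT NULL DEFAULT 0,
  flow_stats     TEXT,
  top_flows      TEXT,
  FOREIGN KEY (type_id)    REFERENCES types(type_id),
  FOREIGN KEY (source_id)  REFERENCES sources(source_id),
  FOREIGN KEY (profile_id) REFERENCES profiles(profile_id)
);
SQL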

3.3.5 Sentinel Web Interface

The Sentinel web interface should provide three different views on the same data. The first view is a live update screen showing all the aberrant behaviour which has been identified by the system during a configured amount of time, for example, the last 24 hours. The second is a more detailed view of a specific aberrant event, with further information and details to help identify the source of the problem. The third is an interface to search the database of stored aberrant events based on when they occurred, what kind of traffic was involved and which sources were affected. Each of these interfaces will be discussed in turn with a prototype of the end design.


Live Update

Figure 3.6: Proposed Live Update Web Interface

This interface is designed to be simple and easy to view at a glance. The aberrant events which have occurred within the specified time frame are displayed in a tabular format, containing just the information immediately necessary to gain an initial understanding of what has happened. They are ordered by end time; in other words, the events which were active most recently are near the top. It is possible to further filter the aberrant events displayed, perhaps to group together events affecting a particular data source or traffic type. This is done using the filter interface at the top of the screen; on clicking submit the page is refreshed showing only the data relevant to the options selected. By default on this display any events which have been marked as a false positive, or as acknowledged, will not be displayed. This stems from understanding a network operator's workflow: in most cases, if an event has been dealt with or is being dealt with then it should not be listed in the Live Update as an event requiring attention. It could be necessary for an operator to compare a currently unacknowledged event with other previous events regardless of acknowledged or false positive status; in this case the filters can be temporarily altered through the filter interface to display all events within the given timeframe, regardless of the flags applied.

Details

Figure 3.7: Proposed Details Web Interface

The Details interface is designed to contain as much information about the event as possible in one place. The stored information about the event is first presented in text form at the top of the page; this is the longer form of the information, containing comments, flags and descriptions. The information specific to this event can be edited from this page; further towards the bottom there is a small entry form. This will be auto-completed to contain the information that is currently stored about that event so it may be edited or removed as appropriate. The graph covers a time period relevant to the event, and underneath is a brief synopsis of statistical analysis from NfDump based upon the start and end times and the classification of traffic that was indicated as aberrant. There is also a presentation of the top few flows, with the port numbers and hostnames looked up. This is to present the operator with as much information as possible in one place so they are not required to use separate applications to perform the necessary analysis. Finally on the page is a display of events that have been identified as associated with this one, primarily by when the events occurred. If two events are recorded as starting and ending at identical times, then logically they are going to be related in some way. This gives a network operator a better feeling for how widespread the problem is.

Review

Figure 3.8: Proposed Review Web Interface


The Review section is primarily for accessing data about events that have fallen outside of the Live Update time period. Events can be searched for initially based upon their start and end date/time, and secondly against filters like those on the Live Update page. Further details of events which have been found will be displayed through the same Details page as mentioned previously. Similarly to the Live Update section, the Review page is concerned with presenting the essential pieces of information clearly and concisely; if an operator is interested in a particular event then further information can be found by clicking on it.


4 Implementation

4.1 Method of Implementation

The system was implemented over a number of weeks. Initially the focus was on gaining a full understanding of NfSen-HW and its operation alongside RRDtool, and how to use it. Once this had been achieved the focus moved to the creation of the Java package to parse the XML, using small RRD files dumped to XML format for testing as it was produced. A small difficulty was encountered with the parsing due to the organisation and content of the XML file, and this will be discussed in more depth in the Sentinel.jar section of this chapter. Once the package was operating correctly, the database tables were created and development in Java continued to ensure the correct insertion of data. Next the Bash script runSentinel.sh was created, and then the back end of the system could be put into proper operation. Lastly the web interface was produced, taking data from the Sentinel Database.

The implementation of each of these sections will be discussed in more detail under the headings which follow.

4.2 NfSen-HW

This was not technically implemented as part of the system, but its installation and use did cause some initial problems for development. NfSen-HW is based on the 20060412 snapshot of NfSen, a non-stable version which appears to have some bugs. The biggest problem was with the creation of NfSen-HW instances with previous data; the application would work perfectly if, on creation, each source was specified and there was no earlier data to be imported. Unfortunately, due to the way data was transferred from DANTE, the only data available for use was technically past data from a large number of sources, which needed to be imported before graphs or aberrant detection could be displayed. After a large amount of experimentation it became apparent that if all past data is present at the very moment you create the NfSen-HW instance for the first time, on the initial start up it will go through every stored NetFlow file and create appropriately dated RRD files. If past data is added at a future time, even initiating a rebuild of the RRD files will not allow the creation of graphs based on this new data. This is because when NfSen-HW creates the RRD files it has to give a starting time for the data they contain; any later addition of previous data failed to change the starting date and so any earlier data was ignored.

Another problem was the inability to add new sources of data once an NfSen-HW instance had been initialised. The configuration options can be altered, but on rebuild, even though the correct RRD files would be constructed, no data was added to them. This resulted in graphs claiming to contain data from new sources but never actually displaying any content. The data received from DANTE contained over twenty separate sources of NetFlow data, and was not delivered to any installation of NfSen-HW ‘live’, that is to say, as it was produced by the routers. What was received was a backlog of nfcapd archived files covering the period since the last update, transferred to my machine using rsync over ssh. The only solution to this and the previously mentioned problem was, with every fresh instalment of DANTE NetFlow, to reinstall NfSen-HW and rebuild the RRD files, which was somewhat time consuming. Due to the amount of data being received, a rebuild of RRD files after a reinstall could take over three hours; the size of the data was approximately 20GB per week, with an initial download of 83GB, and now, nearing the end of the project, the space required to hold the NetFlow has surpassed 400GB.

The implications of this were a little further reaching; as the data received from DANTE was never live, it was impossible to test the system using that data in a live situation with aberrant network data events being updated at five minute intervals. Initially I thought that having the RRDs based on the old data would allow me to do a full scan and collect the aberrant network behaviour events which occurred throughout the period the data covered. Unfortunately it appeared that the RRD files only hold the aberrant behaviour markers for 24 hours, after which point they are removed. This caused me to think about how RRDs work: they archive information based upon a number of averages, so over time the results lose their granularity and trends become more vague. In the case of Holt-Winters forecast results, they are held within the RRD structure as a binary 1 or 0 marker. Binary data like this cannot be averaged; a 1 or 0 result makes no sense if it becomes translated into 0.8 at some point in the future, and so it became obvious that RRD files must only hold their Holt-Winters marks for a set period of time. Through discussions with a network operator at DANTE and a telephone conference with Gabor Kiss and Janos Mohacsi, the developers of NfSen-HW, it appeared that Gabor, when creating the system, had never personally set it up without having old RRD files which he wanted to reimport. He then ran a perl script called Holt Winters Reapply to take data from the RRD files, run Holt-Winters forecasting on it, and create new HW capable RRDs for use with his system. Whilst in my case there was old data being imported into NfSen-HW, it was not in RRD format, but in fact NfDump archived NetFlow data. This meant that the entire process of creating RRD files was done via NfSen-HW, and the default time period parameters were hard coded. The solution was to run the Holt Winters Reapply script once any NetFlow data had been incorporated into a new NfSen-HW install, which took more time upon each new NetFlow instalment arriving. Even having done this, the RRD files will only hold their Holt-Winters marks for two weeks. This meant that the historical perspective functionality of my system became all the more critical.

In order to ensure the Sentinel system was working correctly, collecting its information from a live data source, another installation of NfSen-HW was performed, this time running using data exported from a router based in my home on a small scale testing network. Whilst the data size is not nearly as large as that from DANTE, it worked as expected, detecting aberrant network behaviour events of various kinds. It is using this installation that the majority of the development work was performed.

4.3 runSentinel.sh

The Bash script holds everything together by navigating the directory structures and converting each RRD file to its XML equivalent. It then runs the Java XML parser over the XML file with the correct parameters and finally deletes the XML file so as not to interfere with further conversions. To understand the way the script works, an understanding of the directory structure used by NfSen-HW is required.

/home/nfsen-hw/profiles/

This is the root directory in which any RRD data is held. RRD files created using no filters or profiles are stored within a profile known as live. It can generally be accepted that there will be a live profile as part of every installation of NfSen-HW which actively captures NetFlow data, but runSentinel.sh does not make that assumption.

sara@fairlop: profiles$ ls
live/  profile1/  profile2/

Asking for a directory listing of the profiles directory would yield results similar to this, where each of the files listed is actually a directory containing all data related to that profile name. Looking inside a profile directory shows the actual images which RRDtool creates.

sara@fairlop: live$ ls
flows-day.gif     DataSource1/       packets-week.gif   traffic-month.gif
flows-month.gif   DataSource1.rrd    packets-year.gif   traffic-week.gif
flows-week.gif    packets-day.gif    profile.dat        traffic-year.gif
flows-year.gif    packets-month.gif  traffic-day.gif

The two important listings here are DataSource1.rrd, the RRD file containing all the data for this profile and data source, and the directory DataSource1/, which contains all of the nfcapd archived NetFlow files. This is repeated for every named profile directory within /home/nfsen-hw/profiles. The Bash script therefore works by changing directory into /home/nfsen-hw/profiles and reading in the directory listing as a list of files. For every ‘file’ found in profiles, it changes to that directory and reads in a list of all files ending in .rrd. This should give a list of all RRD files, and hence Data Sources, for that Profile. Knowing this, and its current working directory, it then executes the command rrdtool dump on each RRD in turn, creating the XML formatted file, and then runs Sentinel.jar, passing in the correct directory paths as parameters to read the XML file.
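A minimal sketch of this logic follows; the exact paths and the location of the jar are assumptions made for illustration rather than the precise contents of runSentinel.sh:

#!/bin/bash
# Sketch of runSentinel.sh: export every RRD to XML, parse it, then tidy up.
PROFILE_ROOT=/home/nfsen-hw/profiles
SENTINEL_JAR=/home/nfsen-hw/sentinel/Sentinel.jar

cd "$PROFILE_ROOT" || exit 1
for profile in */ ; do
    for rrd in "$profile"*.rrd ; do
        [ -e "$rrd" ] || continue                       # skip profiles with no RRD files
        xml="${rrd%.rrd}.xml"
        rrdtool dump "$rrd" > "$xml"                    # export the RRD to XML
        java -jar "$SENTINEL_JAR" "$PROFILE_ROOT/$xml"  # pull out the aberrant marks
        rm -f "$xml"                                    # remove the XML so later runs start clean
    done
done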

4.4 Sentinel.jar

The Java XML parser was completed mostly to the specification given in the Design chapter. Here is a more specific UML diagram of the component classes.

Figure 4.1: Sentinel Java UML Class Diagram


There is one additional class which was not present in the original design: the AberrantMark class. This is due to some unforeseen problems with parsing the XML files, which will be mentioned in more detail later in this section.

4.4.1 Implementation Overview

Sentinel.jar is executed via a call from the runSentinel.sh Bash script and is passed the appropriate parameters so it knows the location of the RRD file to be processed. When passing the parameters it is important that the full path to the chosen RRD file is given; this is because the RRD files themselves contain no reference to the profile or source they correspond to. Such information can only be retrieved from the directory structure and filename. When Sentinel.jar is run, the first thing that happens is that the passed in parameter is broken down into its component parts and the source and profile name are stored. An instance of RRDDatabase is created and the profile and source are set within it. This is where the information about the RRD file being parsed will be saved, including a list of AberrantBehaviour objects, one for each aberrant network behaviour event that is retrieved. When the XML parsing has finished, the Main driver class gets the Vector of AberrantBehaviour objects from the RRDDatabase using the getAberrantBehaviour() method. This Vector is then iterated through and, based upon each object's start and end time, the information is inserted into the database.

4.4.2 Problems with XML Parsing

Originally I had assumed that parsing the XML file would be as straightforward as looking for the tags within the FAILURES section which were marked as 1.0000000000e+00 rather than 0.0000000000e+00 and retrieving the timestamp for that FAILURES entry. Unfortunately, on closer examination of the RRD structure, the individual entries in the FAILURES section do not contain a timestamp. The only timestamp available within the file is the one marking the instant that it was last updated. In order to solve this problem in a generic and portable way, every aberrant network behaviour marker required its time calculating based upon its place in the file, working backwards from the last entry, which logically is equal to the last update time. An AberrantMark class was created, an instance of which is created whenever an aberrant mark is found; as the file is parsed, every entry which would have occurred with a time update is counted and, when an aberrant marker is located, the number of the row it came from is stored within the AberrantMark object created. When the file has been fully processed the exact number of entries is known and the timestamp for each event can be worked out using simple mathematics. Secondary to this, when an AberrantMark is located, it is important to note which field or fields the mark occurred in. For each update there are multiple types of traffic being graphed, and the type of traffic the aberrant behaviour occurs as part of determines which field the mark occurs in. This number was stored inside the AberrantMark instance for each event and translated back into a human readable name in the Main driver class.
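As a small worked example (assuming the 300 second step that corresponds to NfSen's five minute update interval), if the FAILURES archive contains N rows and the file's last update time is t_last, then the timestamp of row i (counting from 1, so row N is the newest) is:

\[ t_i = t_{\text{last}} - (N - i) \times 300 \]

so a mark found three rows from the end of the archive is dated fifteen minutes before the last update.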


4.4.3 Database Insertion

Sentinel.jar uses the JDBC libraries for connection to the Sentinel MySQL database, but there are some checks made before the data is inserted. The database design is such that each aberrant instance cannot just be inserted into the events table, as any number of aberrant network behaviour marks may be aggregated into one event if they are part of a series with the same type, source and profile. The first information to be updated is the profile and source data; this is so the identifier can be retrieved to be inserted as a foreign key in the events table. A check is made to ensure that the same profile and source are not already present in the database; if not, they are stored and the id numbers saved. Once this has taken place, the actual aberrant network behaviour information can be checked. For every event, the type id is retrieved, and then a query is performed to see whether an entry exists with exactly the same information apart from an end timestamp 5 minutes previous. If this is the case then the end time in the database is updated to the end time of the new aberrant instance, and the rest of the information is left as it was. If there were no prior entries in the table fitting that description then a new entry is created with that information, and so the process continues until there are no more AberrantBehaviour objects left.

4.5 Sentinel Database

The creation of the database was exactly as laid out in the design; here is a more detailed UML diagram of the datatypes and interactions between tables.

Figure 4.2: Sentinel Database UML Diagram


4.6 Sentinel Web Interface

The web interface was also implemented as illustrated in the Design chapter, written in PHP and divided over 3 separate sections. Here is a brief overview of each page and how it was implemented.

4.6.1 Live Update

The Live Update operates initially using a default SQL query. It makes a connection to the Sentinel database and retrieves all events whose end timestamp falls within the last 24 hours. It filters the results so as not to show events which have been acknowledged or marked as a false positive as part of the default view. Second to that there are three smaller queries which get a current list of all profiles, sources and types being used within the system. Along with the option to show acknowledged and false positive events, this information is used to create the filter functionality. Operators can choose certain information they would like to see by ticking checkboxes. When the submit button is clicked, the values that have been selected are submitted back to the same page; the page detects that selections have been made, and the choices are retrieved from the POST array and assembled into appropriate SQL queries. Here it was important to ensure that the SQL logic was correct, using brackets to separate parts of queries. The assembled queries are performed and the results displayed in the same style as the default query would be. Alongside this, the page auto-refreshes every five minutes to ensure the displayed results are as up to date as possible.
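In terms of the illustrative schema sketched earlier (so the table and column names below are assumptions rather than Sentinel's exact definitions), the default query is equivalent to something along the lines of:

mysql -u sentinel -p sentinel -e "
  SELECT e.event_id, e.start_time, e.end_time,
         t.name AS type, s.name AS source, p.name AS profile
    FROM events e
    JOIN types t    ON e.type_id    = t.type_id
    JOIN sources s  ON e.source_id  = s.source_id
    JOIN profiles p ON e.profile_id = p.profile_id
   WHERE e.end_time >= NOW() - INTERVAL 24 HOUR
     AND e.acknowledged = 0
     AND e.false_positive = 0
   ORDER BY e.end_time DESC;"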

4.6.2 Details

The Details page is initialised by a user clicking on an event for more information; this passes the event id via GET to the Details page. For security reasons it is important when using the GET method in this situation to validate the value that has been passed; in Sentinel's case this checks that the value passed is numeric, which removes the ability for malicious users to perform SQL commands upon the Sentinel database. The graph is drawn by sending the appropriate values as part of the GET request to rrdgraph.php, a part of NfSen-HW. The appropriate values are:

• profile name
• ‘:’ separated list of sources
• proto type: ‘any’, ‘TCP’, ‘UDP’, ‘ICMP’, ‘other’
• ‘flows’, ‘packets’ or ‘traffic’
• profile start time - UNIX format
• start time - UNIX format
• end time - UNIX format
• left time of marker - UNIX format; 0 is no marker
• right time of marker - UNIX format; 0 is no marker
• width of graph
• height of graph
• light version (small graphs) - no title or footer
• linear or log y-axis

Using this it is possible to draw a graph for any period, with specific markers depending on the options chosen. The Details page draws graphs starting a number of hours before the actual start time of the aberrant network event. An amount of time is also added to the end time, to give a better view of what was happening at the point the aberrant marker was removed. This is done by either adding one hour or showing all traffic up until the current last update time, whichever is smaller. Having done this, the time period covered by the event itself is marked on the graph and appears highlighted in green.
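To make the above concrete, the following sketch shows the id validation and the construction of the rrdgraph.php request. The parameter names in the query string are placeholders (NfSen-HW defines the real ones), and the variables $event, $sources, $profileStart and $lastUpdate are assumed to have been fetched from the database earlier in the page.

<?php
// Validate the event id passed via GET: anything non-numeric is rejected,
// so it can never reach an SQL query.
if (!isset($_GET['id']) || !is_numeric($_GET['id'])) {
    die('Invalid event id');
}
$eventId = intval($_GET['id']);

// Pad the graphed window: start a few hours early, and extend the end by
// one hour or up to the last RRD update time, whichever is smaller.
$graphStart = $event['start_time'] - 4 * 3600;              // assumed padding
$graphEnd   = min($event['end_time'] + 3600, $lastUpdate);

// Assemble the rrdgraph.php request from the values listed above.
// NOTE: the key names below are placeholders, not NfSen-HW's real ones.
$params = array(
    'profile' => $event['profile_name'],
    'sources' => implode(':', $sources),      // ':' separated source list
    'proto'   => 'any',                       // any / TCP / UDP / ICMP / other
    'what'    => 'flows',                     // flows / packets / traffic
    'pstart'  => $profileStart,               // profile start, UNIX format
    'start'   => $graphStart,                 // UNIX format
    'end'     => $graphEnd,                   // UNIX format
    'lmarker' => $event['start_time'],        // left marker, 0 = no marker
    'rmarker' => $event['end_time'],          // right marker, 0 = no marker
    'width'   => 700,
    'height'  => 250,
    'light'   => 0,                           // 1 = small graph, no title/footer
    'logy'    => 0,                           // linear or log y-axis
);
$graphUrl = 'rrdgraph.php?' . http_build_query($params);
echo '<img src="' . htmlspecialchars($graphUrl) . '" alt="Event graph">';
?>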

The next stage displays some statistical analysis of the flows during that time period, either by retrieving the already-performed nfdump query from the database or, if the event is still ongoing, by directly querying nfdump via PHP's exec() and displaying the results back to the webpage. This was not as straightforward as passing in the start and end times, due to the way Holt-Winters forecasting detects aberrant results. An aberrant mark is placed when the next value in a time series is mathematically "too deviant" from what was expected; because of this, an assessment of aberrance cannot occur at the exact time the network event begins to happen. It is only recognised a certain amount of time afterwards and marked from then onwards. I found that if analysis was performed based upon the exact start and end times, the results would quite often not cover the period of aberrance. To get around this I conducted some experimentation into the average amount of time that passes between the aberrant event starting and the aberrant mark being set. Figure 4.3 illustrates the difference between the start of the aberrant event and the initial marker being placed. I found that in most cases, unless the aberrant event was exceptionally out of the ordinary, if 40 minutes was subtracted from the aberrant event marker's start time then the start of the actual network activity was included in the statistics. Figure 4.4 gives an example of this, and further examples are available in the appendix, section E.
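A small sketch of that adjustment is given below, assuming the statistics are produced by calling nfdump through exec() as described above. The nfdump path, the -M/-R arguments and the choice of statistic are illustrative only; the -t time window format is the one documented in the manpage in Appendix D.

<?php
// Holt-Winters only flags aberrance once the deviation has been observed,
// so the statistics window starts 40 minutes before the stored marker time
// in order to include the onset of the real network activity.
$statsStart = $event['start_time'] - 40 * 60;
$statsEnd   = $event['end_time'];

// nfdump -t expects YYYY/MM/dd.hh:mm:ss-YYYY/MM/dd.hh:mm:ss
$window = date('Y/m/d.H:i:s', $statsStart) . '-' . date('Y/m/d.H:i:s', $statsEnd);

// Illustrative invocation; the real source directories and file range come
// from NfSen-HW's profile layout.
$cmd = '/usr/local/bin/nfdump'
     . ' -M ' . escapeshellarg($sourceDirs)
     . ' -R ' . escapeshellarg($fileRange)
     . ' -t ' . escapeshellarg($window)
     . ' -s record/bytes -n 10';
exec($cmd, $outputLines);
echo '<pre>' . htmlspecialchars(implode("\n", $outputLines)) . '</pre>';
?>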


Figure 4.3: Aberrant Marking Example

Figure 4.4: Subtracting 40 Minutes Example


Depending on the aberrant network event, different filters are applied to the nfdump query to produce the most appropriate results, for example only showing TCP traffic. A second version of the query is also performed to retrieve only the top four flow statistics of that kind, with the result requested in machine-readable format. This produces a similar result, but with no human-readable labels and each entry separated by the pipe symbol. From this I pull out the source and destination IP addresses and port numbers, and these are looked up using two PHP functions, getservbyport() and gethostbyaddr(). The results are then displayed on the page in a similar format to the nfdump output.
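The sketch below illustrates that lookup step on the machine-readable output. The column offsets are assumptions made for the example (the exact layout depends on the nfdump statistic requested; the flow-record pipe layout is documented in Appendix D), and $statLines is assumed to hold the four pipe-separated statistic lines.

<?php
// Hypothetical column positions; adjust to the statistic actually requested.
foreach (array_slice($statLines, 0, 4) as $line) {
    $fields  = explode('|', $line);
    $srcIp   = trim($fields[4]);
    $srcPort = (int) trim($fields[5]);
    $dstIp   = trim($fields[6]);
    $dstPort = (int) trim($fields[7]);

    // Resolve the port to a service name and the address to a hostname.
    $service  = getservbyport($dstPort, 'tcp');   // false if unknown
    $hostname = gethostbyaddr($dstIp);            // returns the address on failure

    printf("%s (%s) -> port %d (%s)<br>\n",
           htmlspecialchars($dstIp),
           htmlspecialchars($hostname),
           $dstPort,
           $service !== false ? htmlspecialchars($service) : 'unknown');
}
?>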

The last two sections of the Details page deal with editing information about a particular stored event and showing potential links between it and other stored events. The details which can be edited are those which are immediately specific to that event: the comment and the acknowledged/false positive flags. Source and profile descriptions are not specific to an event, so they are edited elsewhere. The method of implementation is a simple form which passes the details filled in, via the POST array, to another page where they are inserted into the database. Associated events are displayed in a similar style to the Live Update, but only if they match the strict similarity criteria of starting and ending at the same time as the currently viewed event. Details of these events can be viewed in the same way as from the Live Update page.
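A minimal sketch of both pieces is shown below: the strict similarity query for associated events, and the POST handler that stores the edited flags and comment. Table and column names are again assumptions.

<?php
// Associated events: identical start and end times, excluding the event
// currently being viewed. Column names are assumed.
$sql = sprintf(
    "SELECT * FROM events
      WHERE start_time = '%s' AND end_time = '%s' AND id != %d",
    mysql_real_escape_string($event['start_time']),
    mysql_real_escape_string($event['end_time']),
    $eventId
);
$associated = mysql_query($sql);

// Edit handler (a separate page): flags and comment arrive via POST.
$ack      = isset($_POST['acknowledged'])   ? 1 : 0;
$falsePos = isset($_POST['false_positive']) ? 1 : 0;
$comment  = mysql_real_escape_string($_POST['comment']);
mysql_query(sprintf(
    "UPDATE events SET acknowledged = %d, false_positive = %d, comment = '%s' WHERE id = %d",
    $ack, $falsePos, $comment, $eventId
));
?>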

4.6.3 Review

The Review page is quite straightforward in comparison. Past data can be queried using a form which allows searching by exact start and end time, by start time between two dates, or by end time between two dates. Other filters can be applied, such as a specific source, profile or type. Acknowledged and false positive events can be excluded or included, and the results are displayed, again, similarly to the Live Update page, complete with a link to view further details.
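The sketch below shows one way the three date modes and the extra filters might be turned into a query; the form field and column names are assumptions rather than the actual Sentinel code.

<?php
// Hypothetical form fields: 'mode' selects the date search style.
$clauses = array();
switch ($_POST['mode']) {
    case 'exact':          // exact start and end time
        $clauses[] = sprintf("(start_time = '%s' AND end_time = '%s')",
                             mysql_real_escape_string($_POST['start']),
                             mysql_real_escape_string($_POST['end']));
        break;
    case 'start_between':  // start time between two dates
        $clauses[] = sprintf("(start_time BETWEEN '%s' AND '%s')",
                             mysql_real_escape_string($_POST['from']),
                             mysql_real_escape_string($_POST['to']));
        break;
    case 'end_between':    // end time between two dates
        $clauses[] = sprintf("(end_time BETWEEN '%s' AND '%s')",
                             mysql_real_escape_string($_POST['from']),
                             mysql_real_escape_string($_POST['to']));
        break;
}

// Optional source/profile/type filters, in the same style as Live Update.
$groups = array('source_id' => 'sources', 'profile_id' => 'profiles', 'type_id' => 'types');
foreach ($groups as $column => $field) {
    if (!empty($_POST[$field])) {
        $clauses[] = '(' . $column . ' IN (' . implode(',', array_map('intval', $_POST[$field])) . '))';
    }
}
if (empty($_POST['include_flagged'])) {
    $clauses[] = '(acknowledged = 0 AND false_positive = 0)';
}

$sql = 'SELECT * FROM events WHERE ' . implode(' AND ', $clauses) . ' ORDER BY end_time DESC';
?>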


5 System Operation

To illustrate the system's operation, this chapter contains a walk-through of the workflow as experienced by a network operator and concludes with a comparison of this and the previous process defined in the Design chapter.

5.1 Usage Scenario

A network operator wishes to know whether there are any current problems with external connectivity (i.e. connections to the wider Internet) on the network. There is already a profile set up in NfSen-HW, known as ‘External’, to monitor traffic passing to and from the Internet. The process of investigation follows these steps:

1. Examining Live Update for indications of Aberrant Behaviour.
2. Filtering the results to only show relevant information.
3. Viewing further details of a specific event.
4. Analysing the results and editing the event details to reflect the results.

5.1.1 Examining Live Update for Aberrant Behaviour

Figure 5.1: Investigation Process Step 1


This page indicates all currently or recently active aberrant events as detected by the system. As can be seen, it contains data from both the live profile and the External profile. The view can be tailored to show only the External sources.

5.1.2 Filtering the results

Figure 5.2: Investigation Process Step 2

Only aberrant behaviour involving data being received from or passed to external hosts is now shown in the summary. From this, more information can be requested.

5.1.3 Viewing further Details

The Details page contains more information about the event, including a graph of an appropriate time period with the actual event period marked in green, the top flows ordered by the traffic type (in the case of the screenshot this is any traffic type), and a lookup of the most relevant hostnames and port numbers. From this display an operator could easily identify that the activity here is nothing of concern; then, using the edit section, a comment could be added to this effect.


Figure 5.3: Investigation Process Step 3


5.1.4 Analysis and editing event details

Adding conclusions about the findings is very simple, and this information is then stored in the database for other operators to see.

Figure 5.4: Investigation Process Step 4 - Editing

Figure 5.5: Investigation Process Step 4 - Inserting

5.1.5 Summary

A comparison between this network operator's workflow and the original example given in the Design chapter shows a number of improvements. Firstly, there is a single location for finding aberrant behaviour instances; the operator does not need to view individual graphs of profiles and sources, as the database picks up all the relevant information. This information can be displayed in the manner the operator chooses, so initial assessments of problem scope can be made. Leading on from this, the biggest improvement over the previous process is the ability to see a large amount of relevant information in one place. The Details section provides basic information about the duration and location of the problem, suggests possible causes via NetFlow statistics, and finally indicates likely explanations of the issue by identifying the hostnames and services in use. This is information that the operator would previously have had to find out by hand.


The Details section also provides an assessment of possibly associated events, which can be viewed in more detail. This reduces the amount of time the operator might have required to find out which other areas and services were affected by the event.

Figure 5.6 shows a final sequence diagram depicting the system in operation during the preceding usage scenario.

Figure 5.6: Sequence Diagram of System Operation


6 Testing and Evaluation

6.1 Testing

In order to test the system thoroughly I used a number of different testing strategies, each of which will be covered in detail during this chapter.

6.1.1 Defect and Component Testing

According to Ian Sommerville, "the goal of defect testing is to expose defects in a software system before the system is delivered" [2004 p442]. He provides a graphical example of a general model of the defect testing process [2004 p443].

Figure 6.1: General Defect Testing Model

His suggestion for testing system usage and operational features is to meet the following criteria [2004 p443].

1. All system functions that are accessed through menus should be tested.
2. Combinations of functions that are accessed through the same menu should be tested.
3. Where user input is provided, all functions must be tested with both correct and incorrect input.

Test cases have been identified in order to meet these criteria. For this stage of the testing cycle I have separated out the components to test their individual correctness. Once this testing has been completed there will be further testing to ensure that the integrated system works as it should. The conclusions reached by carrying out this testing will be discussed afterwards.


First, some tests ensure that Sentinel.jar is parsing the XML file and inserting the data into the database correctly. For these tests the Java application is treated as a separate component reading from the XML output of a small RRD file. The results are displayed at the command line rather than inserted into the database, apart from the test cases which involve testing database connectivity and correctness. Each test case is defined by a number, a description of the test, the expected outcome and the result. The tests are broken into separate tables for ease of viewing.

Sentinel.jar - XML Parsing

No. | Test Description | Expected Outcome | Result
100 | Parse XML file for the last update time. | The correct last update time is printed to screen. | PASS
101 | Parse XML file for Aberrant Marks. | Seven Aberrant Marks are found and printed to screen. | PASS
102 | Correctly specify the times that the Aberrant Marks occurred. | The times the Aberrant Marks occurred are correct. | PASS
103 | Parse the XML file for the traffic types in use. | The correct list of traffic types is printed to screen. | PASS
104 | Correctly identify the traffic type in use for each Aberrant Mark. | The traffic type of each Aberrant Mark is correct. | PASS

Table 6.1: Sentinel.jar Testing - XML Parsing

Sentinel.jar - Source and Profile detection

No. | Test Description | Expected Outcome | Result
105 | The path to the RRD file specified as a command line argument can be read in by the system. | The correct path as specified at the command line is printed to screen. | PASS
106 | The path can be broken down into the correct profile and source. | The correct path and source is found and printed to screen. | PASS

Table 6.2: Sentinel.jar Testing - Source and Profile Detection

From the tests specified in Tables 6.1, 6.2 and 6.3 it can be seen that the Java portion of the Sentinel system is working correctly, both in its XML parsing and in its database connectivity. It is able to retrieve the source and profile names from the path it is supplied. It can also retrieve all the necessary information from the RRD files in XML form, including the correct date and time for each Aberrant Mark, which was a concern at the implementation stage. It is capable of querying the database for results already held, and based upon that knowledge can insert or update currently held event information.


Sentinel.jar - Database Connectivity

No. | Test Description | Expected Outcome | Result
107 | Check for the presence of the traffic types in the database. | All of the data types are present in the database. | PASS
108 | Retrieve the ID numbers of each traffic type from the database. | The correct ID numbers and traffic types are printed to screen. | PASS
109 | Check for the presence of the found data sources in the database. | Of the two detected data sources, one is present in the database and one is not. | PASS
110 | Check for the presence of the found data profiles in the database. | Of the two detected data profiles, one is present in the database and one is not. | PASS
111 | Insert the data source not already present into the database. | The data source not present should be inserted into the database. | PASS
112 | Insert the data profile not already present into the database. | The data profile not present should be inserted into the database. | PASS
113 | Retrieve the ID numbers for each of the data sources from the database. | The correct ID number and source should be printed to screen. | PASS
114 | Retrieve the ID numbers for each of the data profiles from the database. | The correct ID number and profile should be printed to screen. | PASS
115 | Check for the existence of an Aberrant Event in the database with the same start time, profile, type and source as each Aberrant Mark found, but with an end time five minutes earlier. | Two of the detected Aberrant Marks have equivalent entries in the database; the rest do not. | PASS
116 | Insert the full details of each detected Aberrant Event which does not have an equivalent Aberrant Event already present in the database. | The five Aberrant Marks without equivalent entries should be inserted into the database. | PASS
117 | Update the end time of each of the equivalent Aberrant Events in the database to be the same as the found Aberrant Mark it matches. | The two Aberrant Events in the database should have their end time updated to be the same as the two Aberrant Marks. | PASS

Table 6.3: Sentinel.jar Testing - Database Connectivity


Next, runSentinel.sh must be tested to ensure its correct operation. This will be carried out using a mock directory structure containing two profiles and two data sources. The script will be modified, firstly so that it does not delete the XML output it creates, and secondly to print the command it would use to execute Sentinel.jar to the screen instead of running it. This will ensure that it is behaving correctly; further testing will ensure that the two components work together in the proper way.

runSentinel.sh

No. | Test Description | Expected Outcome | Result
200 | XML files are created for every valid RRD file within the directory structure. | After the script has been run, appropriate XML files should exist. | PASS
201 | For every XML file created it should create a valid set of arguments to run Sentinel.jar correctly. | The correct command to run Sentinel.jar should be printed to screen as the script meets each XML file to be parsed. | PASS

Table 6.4: runSentinel.sh Testing

As these tests show, runSentinel.sh passes in every case it is possible to test while each component is being dealt with independently.

6.1.2 Functional and Integration Testing

Functional testing, sometimes known as Black Box testing, is, according to Ian Sommerville, "an approach to testing where the tests are derived from the program or component specification"; the system is a Black Box and its behaviour can only be determined by "studying its inputs and the related outputs" [2004 p443].

Figure 6.2: Functional Testing Model


In the case of Sentinel, this is also a form of integration testing, as all the individual components have to work together in order to meet the specification goals. Sommerville provides a graphical example of functional testing which illustrates how to view the system when conducting the tests; this is shown in Figure 6.2. Testing in this area is mostly centred around the use of the user interface, in Sentinel's case the web front end. Inputs will be chosen as test cases and outputs recorded. This section uses a database which contains a certain amount of test data, including two data sources, two profiles and 25 events of multiple traffic types. Some of the events are older than 24 hours, which in the test case is the amount of data to be shown in the Live Update. One of the events is marked as a False Positive, one as Acknowledged, and one as both; all three have a comment stored. Three of the events share the same start and end times. During the test I simulated some aberrant behaviour by pinging a machine on the network at a very high rate for approximately 15 minutes. The tables of test cases are shown over the next three pages.

From the results of the test cases it can be seen that the Sentinel system has integrated successfully and works as it was intended.


Table 6.5: Sentinel UI Functional Testing - Live Update

Sentinel UI - Live Update

No. | Test Description | Expected Outcome | Result
301 | Opening a web browser and loading the Sentinel Live Update page. | Page should display only a set period of Aberrant Events and a complete list of types, sources and profiles, excluding those flagged as Acknowledged or False Positive. | PASS
302 | Filter Aberrant Events for only one data source. Should be tried with every data source listed. | In every case, only Aberrant Events involving that data source should be shown. | PASS
303 | Filter Aberrant Events for only one traffic type. Should be tried with every traffic type listed. | In every case, only Aberrant Events involving that traffic type should be shown. | PASS
304 | Filter to display events which have been flagged as Acknowledged. | The event flagged as Acknowledged should be shown alongside the normal results, but not the event marked as both Acknowledged and False Positive. | PASS
305 | Filter to display events which have been flagged as False Positive. | The event flagged as False Positive should be shown alongside the normal results, but not the event marked as both Acknowledged and False Positive. | PASS
306 | Filter to display events which have been flagged as False Positive or Acknowledged. | The events flagged as False Positive or Acknowledged should be shown alongside normal results, as well as the event flagged as both Acknowledged and False Positive. | PASS
307 | Display every Aberrant Event within the set timeframe by selecting every kind of filter possible at once. | The default list of Aberrant Events should be displayed, plus the Events which had been flagged as Acknowledged or False Positive. | PASS
308 | Leave the Live Update page open for approximately an hour. (During this time, aberrant network traffic will be created.) | Every five minutes the table should refresh. At some point during the hour, the new Aberrant Behaviour should be detected and displayed. | PASS
309 | Click on the Details link of a particular Aberrant Event. | The Details page should be displayed with information pertaining to that event. | PASS


Table 6.6: Sentinel UI Functional Testing - Details

Sentinel UI - Details

No. | Test Description | Expected Outcome | Result
310 | Leading on from test 309, load the Details page by clicking on the Details link from an Aberrant Event. | The Details page should be loaded with information about that event. A graph should be shown which covers the related time period, with the exact times of the event highlighted in green. Statistical details based on the flows should be shown, and a lookup of the top IP addresses and port numbers. | PASS
311 | Check for associated Aberrant Events. | These should be displayed at the bottom of the page. | PASS
312 | Click on the Details link from one of the associated Aberrant Events. | A similar Details page should be loaded with details relevant to the new event. | PASS
313 | Edit the details of an event by changing the acknowledgement status and false positive status to yes and adding/altering a comment. | The UI should redirect to a different page indicating the success of the alteration. The new details should be inserted into the database. | PASS
314 | Return to the Live Update page and filter the results to show events which have been marked as false positive and acknowledged. | The newly edited event should not have been displayed initially but should appear when the filter is applied. | PASS


Table 6.7: Sentinel UI Functional Testing - Review

Sentinel UI - Review

No. | Test Description | Expected Outcome | Result
315 | Load the Review page of the web interface. | No events should be shown, but a filter interface for searching. | PASS
316 | Search for an event with a specific start time and end time. No other filters. | Using a specified start and end time known to be in the database, four events should be found; one from each profile and source. | PASS
317 | Search for the same start and end times as test 316 but filter to only show sourceA. | The two results as shown previously connected to that source should be displayed. | PASS
318 | Search for the same start and end times as test 316 but filter to only show profileA. | The two results as shown previously connected to that profile should be displayed. | PASS
319 | Search for events starting between two dates. | All the events which start in that period should be displayed and no others. | PASS
320 | Search for events ending between two dates. | All the events which end in that period should be displayed and no others. | PASS
321 | Search for events ending between the same dates as test 320, but only those of each listed type individually. | In each case, only the events which concern that network traffic type should be shown. If there are none then there should be nothing displayed. | PASS
322 | Search for events starting between the same dates as test 319, but only those of each listed type individually. | In each case, only the events which concern that network traffic type should be shown. If there are none then there should be nothing displayed. | PASS
323 | Search for events starting and ending at the same times as test 316, but only those of each listed type individually. | In each case, only the events which concern that network traffic type should be shown. If there are none then there should be nothing displayed. | PASS


6.2 User Interface Evaluation

The user interfaces within Sentinel are intended to be simple but functional, and their use should be fairly straightforward. One of the key things kept in mind when designing these interfaces was the situation in which they would be used. When diagnosing a problem, a network operator does not want the information to be spread across multiple pages; the information should be presented in a coherent, clear way in as little time as possible. For this evaluation I shall assess each section of the user interface in turn.

The idea behind the Live Update page was twofold: it should be functional such that an operator could use it on their personal machine, but also such that it could be used in an office environment as a network monitoring tool. The page, if displayed on a larger screen, would give anyone concerned an instant overview of any strange behaviour on the network, which they could then investigate more thoroughly using the same interface at their personal machine. The colour scheme is very basic; colour is not very important as long as the information is clear. The filters are very clear and simple, and it should be fairly obvious how they are intended to be used. The aim was to provide a quick way of narrowing down on a particular problem - on a network the size of GEANT2 there could be a large amount of aberrant behaviour occurring at any one time, and it is important that the operators are able to see exactly what they require.

The Details page was designed with similar aspirations. The page has no use as an office-wide monitoring solution, so there is no requirement for the information to be so stripped down. This page provides more information to the network operator so that they can perform some analysis and hopefully make an initial suggestion as to the cause of the anomaly. There are three pieces of information which are very important. The first is the graph of the time period. This gives an instant view of what was happening on the network as the aberrant event was triggered. As mentioned previously, it displays information about a longer period of time than the actual event lasted; this is to give a better overview, as operators can see what was happening in the build-up to an event and, in the case of events which have ended, what happened at the end to cause the event to finish. The second important section is the statistical analysis. This shows in detail what the graph gives an indication of, broken down into what was causing the most traffic (be that flows, packets or bytes) and what protocols it was using. The IP addresses and port numbers are then looked up to provide an extra level of information. The final important section shows potentially associated events. The aim of this information is to give the operator a better indication of the scope of the problem, and to provide easy links to investigate other problematic areas. It is displayed in the same style as the Live Update page; full details are not required unless the operator is specifically interested in them, in which case the details link can be clicked.

The Details page also provides the ability to edit the details of an event via a form. The current details of the event are automatically filled into the form, so if an operator chooses to change something, they know what the current values are before they start.


It is fairly simplistic, but again was designed with the aim of being quick and simple. An operator does not want to be delayed when performing his work by a complicated interface design.

The Review page is simply an interface to the historical information, a way of building queries for the database. The most important factor was to provide a flexible filter system: operators can search for events either by knowing the exact start and end times, or just by requesting any events which started or finished between two dates. This can then be filtered further, in a similar style to the Live Update page, by selecting different pieces of information which should be present in the results. The style is common across all sections of the interface, so that once the technique of filtering is understood there is no further knowledge required. Results are again presented in the same style as on the Live Update page, with further details available on request.

Overall the user interface serves its purpose; it is clean, clear and simple, which are the most important factors for how it is intended to be used. The design could have been more polished, and the filters organised in a more flexible way, but due to the nature of the data it cannot be known, before the system is run, how many sources and profiles there are. They should be organised into blocks of five per line, in case more than five are listed, which keeps them together in a sensible way and should not overfill the webpage. Other than that, the design is a successful interpretation of the requirements and needs of a large network support office.

6.3 Evaluation

This is broken down into two sections: a comparison of the original derived requirements list and the finished system, and then an overview containing some feedback from the network operator at DANTE who has been my liaison for the project.

6.3.1 Requirements List Review

In order to evaluate the success of the system I am going to go back over each of the derived requirements from the Design chapter, and assess how well each requirement has been met.

A.1 Aberrant network behaviour instances should be displayed together on one page organised by the time they occurred.

This is fully realised in the Live Update section. Aberrant network events are displayed in a tabular format organised by end time, so the events most recently active are displayed nearest the top of the list. The reasoning behind this was so that the Live Update page could be used as an office-wide network monitoring screen, where every event could be seen easily. I feel this has been achieved successfully.

A.2 Only the most relevant information for each aberrant behaviour instance should be displayed.

The interface design was created so that the Live Update page would be merely a list of events with very basic information. I decided that the most important information was the start and end times, since this is what the events are ordered by; then the traffic type, profile and source, as this identifies where on the network the behaviour is occurring; then the two flags, indicating whether the events are acknowledged or marked as false positive. This is not necessarily vital information about the event, but it aids understanding of the interface, as a filter removing events marked with those flags is applied at the start. The last entry per event is a simple link to the Details page where more information can be found. I believe this requirement has been met successfully.

A.3 Aberrant network behaviour instances should be aggregated to display one event per continuously flagged period.

This functionality is provided via the database and the Java component. As an event is added to the database, the database is checked to see if there is an existing event with identical details other than the end time. If so, only the end time is updated rather than a new entry being made. This was of crucial importance to the design, as it makes the potentially large amounts of held data much more manageable and creates the possibility for associated events to be identified very easily.

A.4 This display should automatically update as new aberrant behaviour is detected on the network.

This has also been achieved: the Live Update page automatically refreshes and retrieves any new aberrant network event data when it does so.

A.5 The display should be accessible from machines other than the machine it is installed on.

This requirement is met as the interface is entirely web based; communication with the database occurs over the network, so provided the server can be accessed, the web interface can be too.


A.6 Each aberrant network behaviour event should be displayed in an identical style so quick comparisons of information can be made.

The same information is retrieved for each aberrant network event, and this is displayed as a list in a table. The table is organised by time, so it should be easy to scan the list for the events you are looking for. This style is carried throughout the system and is used on other pages, so once an operator gets used to the layout and information presentation it will aid his work.

B.1 The information displayed as part of the live update can be filtered to show only instances which match particular conditions.

All of the results displayed on the Live Update page can be filtered to show only specific information. This is flexible, so any number of filters for things to be included can be added. The filtered display is only temporary, however; functionality to persist filters across aberrant event display updates might perhaps have been useful. The implemented system does achieve what was stated, but could have been more usable with a slightly different implementation.

B.2 The default update should contain information the network operator believes to be the most relevant in the first instance.

This is connected to the previous requirement. Sentinel is implemented so that events of any traffic type, source and profile are displayed as part of a normal Live Update, but the display is filtered by default not to show any events which have been marked as acknowledged or as a false positive. This was implemented after discussions with both the network operator at DANTE and a network specialist based at Lancaster University; the reasoning behind it is that if an event has been dealt with then it no longer needs to be shown as a current event. If there is another event which is connected to an acknowledged one, then it will be shown as associated from within the Details section. This is simply for speed of viewing on the main update page and has been implemented successfully.

C.1 Further information should be available for each aberrant network behaviour instance on request.

This requirement is met via the Details section; every event displayed as part of the Live Update also supplies a link to view further details if required.


C.2 This information should include, at the least, a graph of the time frame in question and a brief statistical synopsis for the given period and traffic type.

The Details page shows all the required information and also gives further detail by performing a hostname lookup on the top four IP addresses, and a service lookup on their respective source and destination ports.

C.3 This information should be persistent beyond deletion of the actual NetFlow records for that aberrant network behaviour event.

The statistical analysis and service/hostname lookup information is held as a text record as part of each event in the database. The graph, however, is only held for as long as the RRD files are scheduled to last, and over time will lose its accuracy. The statistical analysis is more information than would normally have been available in such a situation, however, and so I feel this requirement has been met.

C.4 It should be made obvious if a particular aberrant network behaviour event has been flagged as a false positive when examining further details.

On the Details page, if the specified event has been marked as a false positive, this is displayed in large writing above the graph. This is so that a network operator does not waste time re-analysing something which has already been assessed and found to be an error.

D.1 If further information about an aberrant network behaviour event is requested then a display should also be provided of possible associated events.

The Details page provides this as a table in a similar style to the Live Update display. The results present in this table do not follow the same restrictions as the Live Update page, and the list includes events which have been marked as acknowledged or false positive. This is because the events are related regardless of whether they have been dealt with or determined to be inaccurate.

D.2 Further information pertaining to these associated aberrant network behaviour events should be available on request.

In the same way as on the Live Update page, the associated events which have been identified provide a link back to the Details page for more information about themselves.

E.1 Detected aberrant network behaviour events should be recorded in some form of persistent database.

This is one of the most basic requirements and it has been achieved admirably; without the existence of a database the system would not function.

E.2 The database should be reliable, quick to query, and scale well to holding potentially very large data sets.

The database server used is MySQL, which is commonly used in industry for much more time-critical applications than the regular five-minute updates that Sentinel works from. The database has been implemented so that there is no data repetition; most searching and querying is done based upon ID numbers, which are the easiest thing to index and generally the quickest information to query upon. The way the database has been designed to hold information about continuous aberrant behaviour marks as one entry, rather than as a series, means that the data sets involved will be considerably smaller, and it increases the ease with which searches based on date/time can be performed.

F.1 It should be possible to view past aberrant network behaviour event details based upon a number of criteria.

This requirement is met via the Review page; the next few requirements specify more detail.

F.2 Exact Start time and End time.

This is possible using the query form on the Review page. Only events which start and end at the exact times specified will be displayed.

F.3 Start time somewhere between two given dates and times.

This is also possible using a different section of the query form on the Review page.


F.4 End time somewhere between two given dates and times.

Lastly, this is also possible, queried in a similar way to F.3.

F.5 Alongside queries based upon the start and end times, results should be chosen according to further specific information; type/source/profile etc.

This functionality is also provided alongside the date-based filters; other options can be selected to show only events which match those details. Ideally it might have been useful to be able to specify queries like "everything but not using profileX" more easily than by ticking every profile apart from profileX, but the capability for queries of that kind still exists, so the requirement is adequately met.

F.6 When results have been found it should be possible to view further information about an event in the same way as would be possible for a live event.

The results displayed on the Review page are in a similar style to the Live Update. Each one has a basic amount of information, but there is a link to access much more detailed results via the Details page.

G.1 Aberrant network behaviour events stored in the system should be able to be flagged as acknowledged when they have been dealt with.

This feature is built into the database and is possible via the Details interface.

G.2 Aberrant network behaviour events stored in the system should be able to be flagged as a false positive if they have been identified as such.

This is also provided in the database design and again is accessed via the Details interface.

G.3 Operators who have dealt with a particular aberrant network behaviour event should be able to leave some comment regarding their findings for the benefit of later users.

This is provided with a comment field in the database, per event. The Details page offers an interface to leave a comment, or to alter a comment that someone else has made. This information is not displayed as part of the Live Update screen, but is shown when more details are requested about a specific event.


6.3.2 Summary and Feedback from DANTE

The overall aim of this project was:

To assist a network operator in the identification and diagnosis of network problems and illustrate how the inclusion of automated aberrant behaviour detection could improve large network monitoring.

From this, certain lower-level requirements were derived, which I have shown to have all been successfully met. However, I wanted to look at the statement again and assess how far this project has gone towards achieving those aims. The Sentinel system is functional and does exactly what it set out to do: it assists a network operator by aiding his workflow and providing all the right information in one place. That, however, is a feature of many network monitoring systems; where Sentinel differs is the automated aberrance detection, and it is that which makes it an interesting prospect. When I had finished the implementation I contacted the network operator at DANTE with whom I have been liaising throughout the project, Maurizio Molina. He very kindly offered to review my project and provide me with feedback about how well it met his requirements. Overall he was very pleased; it is a functional, self-contained project which achieves what it set out to do - the initial requirements were created based upon conversations with him about his workflow and the way he utilised the NfSen data as part of his work. His initial hope had been that I would take NfSen-HW and use it to examine whether it was useful in detecting aberrance, rather than providing a fully functional solution for viewing the aberrant events it detected, so in some ways my project went beyond his expectations. His only criticism was that he would have appreciated more research into NfSen-HW and how successful its aberrance detection was based upon the GEANT2 data; however, implementing what I have has indicated that NfSen-HW does indeed do as it claims, and Holt-Winters forecasting is generally successful, within its limits. More research would have been carried out based upon the data from GEANT2 had it not been for the data source issues present in NfSen-HW.

Most importantly, I believe that the project does indicate the usefulness of aberrant behaviour detection as part of a wider network monitoring strategy for large network providers. It is another source of information when diagnosing problems which has not really been taken advantage of. The prototype I have created could be extremely informative, perhaps with a little more development regarding anomaly detection methods; this is something I will discuss in my conclusion.


7 Conclusion

7.1 Overview

Looking back over the project as a whole, I am extremely happy with the outcome. Firstly, I have gained quite an in-depth understanding of network monitoring techniques, aberrance detection methods, a selection of Linux tools and configuration options, and the experience of dealing with a real-world network operations centre in DANTE. This is all knowledge I have developed whilst working on the Sentinel system, and it has been a very worthwhile learning experience. Secondly, the system that has been developed works as it was intended, and it provides an interesting angle on network monitoring which, according to my research, has not been widely utilised. The feedback I received from Maurizio helped prove to me that the project has been a success and that this is an area where there is still much work to be done. This is something I will come back to as part of the Further Work section.

There are some areas where I think further development could have been undertaken. The database was more or less exactly what was required based on the needs of the data, so if I were to redevelop from scratch I think I would keep the same database design. One area I would look into changing is the method of picking up data from the RRD files. Whilst Java provided a functional, working application, it probably is not the language most suited to the task. Unfortunately I spent so much time researching NfSen, NfDump, NfSen-HW and RRDtool that, when I discovered the Java RRD libraries were not compatible, there was not enough time to change the development plan. Hence the XML parsing solution was implemented, successfully, but with hindsight I would have looked into developing that part of the project using a lower-level language, perhaps Perl, where there is a substantial chance of finding an RRD interface library that works.

The second thing which would be reconsidered is the development of the web interface as an independent entity. It might have been possible, having used a Perl RRD interface, to tie the web interface into NfSen-HW as a plugin. I investigated the possibility of implementing the web interface as a stand-alone front end plugin, that is, one without a back end plugin to go alongside it, but unfortunately it seems that the NfSen-HW snapshot does not recognise front end plugins without a corresponding Perl module back end. I do not think that the Sentinel web interface lost anything by being an independent system, however; it just might have been a nice touch to tie everything together. I also believe that the web interface could have been made more sophisticated. I am not a web developer by choice and the interface, whilst meeting all of the desired criteria, was very plain and simple. Perhaps using some other development language it might have been easier to implement, but PHP met all the needs and satisfied all of the requirements. This is again something which I would examine if I were to either continue the project or start again.

Something that would need to be added in order to move the system from being a prototype to becoming a live service is user authentication and permissions. This would require changes to the database and front end, but shouldn't be too difficult.


It is not something which causes problems for the prototype, and security has been considered in its development, but it would be a required feature in any real-world network operations centre.

One of the biggest problems was in fact the software I was interfacing with, NfSen-HW. It is a project very much in beta development, and it is based upon an old version of an application which, at the time of the snapshot, was still having major bugs ironed out of it. The problems cannot be rectified in NfSen-HW as it stands, due to the amount that has been added to the default codebase, so the development team from Hungarnet would have to start from scratch with the latest version. If I had not encountered so much difficulty with adding the GEANT2 data into NfSen-HW, there would have been more research into the accuracy of the results and the implications for fine tuning. This would have been a highly useful addition to the report, but sadly it was just not possible in the time available.

7.2 Further work

There is a lot of scope for this project to be taken further. What I have produced is an indication of how incredibly useful aberrance detection could be in real-world network monitoring environments, but it bases all of its aberrance detection on one method and one technology. As I identified in the Background and Related Work chapter, there are many different aberrance detection methods being researched, the most interesting of which I believe to be the use of entropy to detect changes in network use. If this project were to be continued I would like to see an investigation into the use of entropy to detect anomalous events, and a comparison of those results with further study regarding the accuracy of NfSen-HW, as mentioned in the previous section. Whilst there is a great amount of further research to be done surrounding this topic, I feel that in order for NfSen-HW to be regarded as a decent platform for development, some time should be given to bringing the project up to speed. The latest versions of NfSen use an entirely different RRD structure compared to the snapshot NfSen-HW is built on, which means that their results are incompatible. The RRD structure in the newer version is more distributed and logical: each traffic type is divided out into its own RRD file rather than everything being stored in one source RRD. Allowing time for development could also mean that Peter Haag has had a chance to implement the suggestions Gabor Kiss made regarding the plugin functionality, and that could result in NfSen-HW being implemented simply as a plugin for NfSen. This would be ideal, as it would allow the version of NfSen in use to always be the most up to date and the least likely to cause problems in development.


A Acknowledgements

The following individuals helped in the development and design of this project:

Maurizio Molina, DANTE Network Operator
For large amounts of help and advice regarding the process and systems in place at DANTE, and for some very delicious Italian food.

Gabor Kiss and Janos Mohacsi
For taking the time to answer my questions regarding NfSen-HW and RRDtool.


B Project Proposal

See the pages following.


C JavaDoc

See the pages following.


D NfDump(1) Manpage

nfdump(1) nfdump(1)

NAME
nfdump - netflow display and analyze program

SYNOPSIS
nfdump [options] [filter]

DESCRIPTION
nfdump is the netflow display and analyzing program of the nfdump tool set. It reads the netflow data from files stored by nfcapd and processes the flows according the options given. The filter syntax is comparable to tcpdump and extended for netflow data. Nfdump can also display many different top N flow and flow element statistics.

OPTIONS
-r inputfile
Read input data from inputfile. Default is read from stdin.

-R expr
Read input from a sequence of files in the same directory. expr may be one of:
/any/dir Read all files in directory dir.
/dir/file Read all files beginning with file.
/dir/file1:file2 Read all files from file1 to file2.

-M exprRead input from multiple directories. expr looks like:/any/path/to/dir1:dir2:dir3 etc. and will be expanded to the direc-tories: /any/path/to/dir1, /any/path/to/dir2 and /any/path/to/dir3Any number of colon separated directories may be given. The files toread are specified by -r or -R and are expected to exist in all thegiven directories. The options -r and -R must not contain anydirectory part when used in conjunction with -M.

-m Sort the netflow records according the date first seen. This optionis usually only useful in conjunction with -M, when netflow recordsare read from different sources, which are not necessarily sorted.

-w outputfileIf specified writes binary netflow records to outputfile ready to beprocessed again with nfdump. The default output is ASCII on stdout.

-f filterfileReads the filter syntax from filterfile. Note: Any filter specifieddirectly on the command line takes precedence over -f.


-t timewinProcess only flows, which fall in the time window timewin, wheretimewin is YYYY/MM/dd.hh:mm:ss[-YYYY/MM/dd.hh:mm:ss]. Any parts ofthe time spec may be omitted e.g YYYY/MM/dd expands toYYYY/MM/dd.00:00:00-YYYY/MM/dd.23:59:59 and processes all flow froma given day. The time window may also be specified as +/- n. In thiscase it is relativ to the beginning or end of all flows. +10 meansthe first 10 seconds of all flows, -10 means the last 10 seconds ofall flows.

-c numLimit number of records to process to the first num flows.

-a Aggregate netflow data. Aggregation is done at connection level.

-A fields[/netmask]Aggregate netflow data using the specified fields, where fields is a, separated list out of srcip dstip srcport dstport. The defaultis using all fields: srcip,dstip,srcport,dstport. An additional net-mask may be given. In that case flows from the same subnets areaggregated. In order to do proper aggregation, the IP version isimportant, for which the mask applies. Therefore the IP protocolversion must be given in the form of: srcip4/24 for IPv4 orsrcip6/64 for IPv6 address aggregation. Apply the protocol versionfor dstip respectively. Only flows of the same IP protocol tcp,udp, icmp etc. are aggregated.

-I Print flow statistics from file specified by -r, or timeslot speci-fied by -R/-M. The printed information corresponds to pre nfdump1.5 nfcapd stat files.

-S Compatibility option with pre 1.4 nfdump. Is equal to -srecord/packets/bytes.

-s statistic[:p][/orderby]
Generate the Top N flow or flow element statistic. statistic can be:
record Statistic about aggregated netflow records.
srcip Statistic about source IP addresses
dstip Statistic about destination IP addresses
ip Statistic about any (source or destination) IP addresses
srcport Statistic about source ports
dstport Statistic about destination ports
port Statistic about any (source or destination) ports
srcas Statistic about source AS numbers
dstas Statistic about destination AS numbers
as Statistic about any (source or destination) AS numbers
inif Statistic about input interface
outif Statistic about output interface
proto Statistic about IP protocols

By adding :p to the statistic name, the resulting statistic is splitted up into transport layer protocols. Default is transport protocol independant statistics.
orderby is optional and specifies the order by which the statistics is ordered and can be flows, packets, bytes, pps, bps or bpp. You may specify more than one orderby which results in the same statistic but ordered differently. If no orderby is given, statistics are ordered by flows. You can specify as many -s flow element statistics on the command line for the same run.
Example: -s srcip -s ip/flows -s dstport/pps/packets/bytes -s record/bytes

-O orderbySpecifies the default orderby for flow element statistics -s, whichapplies when no orderby is given at -s. orderby can be flows, pack-ets, bytes, pps, bps or bpp. Defaults to flows.

-l [+/-]packet_numLimit statistics output to those records above or below thepacket_num limit. packet_num accepts positive or negative numbersfollowed by K , M or G 10E3, 10E6 or 10E9 flows respectively.See also note at -L

-L [+/-]byte_numLimit statistics output to those records above or below the byte_numlimit. byte_num accepts positive or negative numbers followed by K, M or G 10E3, 10E6 or 10E9 bytes respectively. Note: These lim-its only apply to the statistics and aggregated outputs generatedwith -a -s or -S. To filter netflow records by packets and bytes,use the filter syntax ’packets’ and ’bytes’ described below.

-n numDefine the number for the Top N statistics. Defaults to 10. If 0 isspecified the number is unlimited.

-o format
Selects the output format to print flows or flow record statistics (-s record). The following formats are available:
raw Print each file flow record on multiple lines.
line Print each flow on one line. Default format.
long Print each flow on one line with more details
extended Print each flow on one line with even more details.
pipe Machine readable format: Print all fields '|' separated.
fmt:format User defined output format.

For each defined output format except -o fmt:<format> an IPv6 long output format exists: line6, long6 and extended6. See output formats below for more information.

-K keyAnonymize all IP addresses using the CryptoPAn (Cryptography-basedPrefix-preserving Anonymization) module. The key is used to initial-ize the Rijndael cipher. key is either a 32 character string, or a64 hex digit string starting with 0x. Anonymizing takes place afterapplying the flow filter, but before printing the flow or writingthe flow to a file.

See http://www.cc.gatech.edu/computing/Telecomm/cryptopan/ for more information about CryptoPAn.

-q Suppress the header line and the statistics at the bottom.

-z Zero flows. Do not dump flows into the output file, but only thestatistics record.

-Z Check filter syntax and exit. Sets the return value accordingly.

-X Compiles the filer syntax and dumps the filter engine table to stdout. This is for debugging purpose only.

-V Print nfdump version and exit.

-h Print help text on stdout with all options and exit.

RETURN VALUE
Returns
0 No error.
255 Initialization failed.
254 Error in filter syntax.
250 Internal error.

OUTPUT FORMATS
The output format raw prints each flow record on multiple lines, including all information available in the record. This is the most detailed view on a flow.

Other output formats print each flow on a single line. Predefined out-put formats are line, long and extended The output format line is thedefault output format when no format is specified. It limits theimformation to the connection details as well as number of packets,bytes and flows.

The output format long is identical to the format line, and includesadditional information such as TCP flags and Type of Service.

The output format extended is identical to the format long, andincludes additional computed information such as pps, bps and bpp.

Fields:

Date flow start: Start time flow first seen. ISO 8601 format includ-ing miliseconds.

Duration: Duration of the flow in seconds and miliseconds. If flowsare aggregated, duration is the time span over the entire periode oftime from first seen to last seen.

Proto: Protocol used in the connection.

Src IP Addr:Port: Source IP address and source port.


Dst IP Addr:Port: Destination IP address and destination port.

Flags: TCP flags ORed of the connection.

Tos: Type of service.

Packets: The number of packets in this flow. If flows are aggregated, the packets are summed up.

Bytes: The number of bytes in this flow. If flows are aggregated, the bytes are summed up.

pps: The calculated packets per second: number of packets / duration. If flows are aggregated this results in the average pps during this period of time.

bps: The calculated bits per second: 8 * number of bytes / duration. If flows are aggregated this results in the average bps during this period of time.

Bpp: The calculated bytes per packet: number of bytes / number of packets. If flows are aggregated this results in the average bpp during this period of time.

Flows: Number of flows. If flows are only listed, this number is always 1. If flows are aggregated, this shows the number of flows aggregated into one record.

Numbers larger than 1048576 (1024*1024) are scaled to four digits and one decimal digit, including the scaling factor M, G or T, for cleaner output, e.g. 923.4 M.

To make the output more readable, IPv6 addresses are shrunk down to 16 characters: the seven most significant and seven least significant digits, connected with two dots (..), are displayed in all normal output formats. To display the full IPv6 address, use the appropriate long format, which is the format name followed by a 6.

Example: -o line displays an IPv6 address as 2001:23..80:d01e, whereas the format -o line6 displays the IPv6 address in full length, 2001:234:aabb::211:24ff:fe80:d01e. The combination of -o line -6 is equivalent to -o line6.
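
A hedged example combining the IPv6 long format with the inet6 filter
primitive described later in this manpage (the file name is a placeholder):

       # Show IPv6 flows with full-length addresses instead of the
       # shortened 16 character form.
       nfdump -r nfcapd.200407110845 -o long6 'inet6'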

The pipe output format is intended to be read by another program for further processing. Values are separated by a '|'. IP addresses are printed as 4 consecutive 32bit numbers. The output sequence is as follows (an illustrative parsing sketch is given after the field list):

Address family     PF_INET or PF_INET6
Time first seen    UNIX time seconds
msec first seen    Milliseconds first seen
Time last seen     UNIX time seconds
msec last seen     Milliseconds last seen
Protocol           Protocol


Src address        Src address as 4 consecutive 32bit numbers.
Src port           Src port
Dst address        Dst address as 4 consecutive 32bit numbers.
Dst port           Dst port
Src AS             Src AS number
Dst AS             Dst AS number
Input IF           Input interface
Output IF          Output interface
TCP Flags          TCP flags

000001    FIN
000010    SYN
000100    RESET
001000    PUSH
010000    ACK
100000    URGENT
e.g. 6 => SYN + RESET

Tos                Type of Service
Packets            Packets
Bytes              Bytes

For IPv4 addresses only the last 32bit integer is used. All others are set to zero.
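
The pipe format lends itself to simple post-processing. In the sketch below
the field positions ($23 for packets, $24 for bytes) are only inferred from
the sequence listed above and should be verified against the output of the
installed nfdump version before relying on them:

       # Sum packets and bytes over all TCP flows in one capture file.
       nfdump -r nfcapd.200407110845 -q -o pipe 'proto tcp' | \
           awk -F '|' '{ pkts += $23; bytes += $24 } END { print pkts, bytes }'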

The output format fmt:<format> allows you to define your own output format. A format description consists of a single line containing arbitrary strings and format specifiers as described below:

%ts    Start Time - first seen
%te    End Time - last seen
%td    Duration
%pr    Protocol
%sa    Source Address
%da    Destination Address
%sap   Source Address:Port
%dap   Destination Address:Port
%sp    Source Port
%dp    Destination Port
%sas   Source AS
%das   Destination AS
%in    Input Interface num
%out   Output Interface num
%pkt   Packets
%byt   Bytes
%fl    Flows
%flg   TCP Flags
%tos   Tos
%bps   bps - bits per second
%pps   pps - packets per second
%bpp   bpp - bytes per packet

For example the standard output format long can be created as


-o "fmt:%ts %td %pr %sap -> %dap %flg %tos %pkt %byt %fl"

You may also define your own output format and have it compiled into nfdump. See nfdump.c around line 100 for more details.
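
As a further illustration of the fmt: output, the hypothetical format below
prints only addressing and rate information, using specifiers from the list
above (the file name is a placeholder):

       # Per-flow rates only: start time, endpoints, pps, bps and bytes per packet.
       nfdump -r nfcapd.200407110845 -o "fmt:%ts %sap -> %dap %pps %bps %bpp"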

FILTER
The filter syntax is similar to the well known pcap library used by tcpdump. The filter can be specified either on the command line after all options or in a separate file. It can span several lines. Anything after a # is treated as a comment and ignored up to the end of the line. There is virtually no limit on the length of the filter expression. All keywords are case insensitive.

Any filter consists of one or more expressions expr. Any number of expr can be linked together:

expr and expr, expr or expr, not expr and ( expr ).

Expr can be one of the following filter primitives:

protocol version
       inet for IPv4 and inet6 for IPv6.

protocol
       proto <protocol> where protocol can be any known protocol such as
       TCP, UDP, ICMP, ICMP6, GRE, ESP, AH, or a valid protocol number.

IP address
       [SourceDestination] IP <ipaddr> or
       [SourceDestination] HOST <ipaddr> with <ipaddr> as any valid IPv4
       or IPv6 address. SourceDestination may be omitted.

SourceDestination
       defines the IP address to be selected and can be SRC, DST or any
       combination of SRC and|or DST. Omitting SourceDestination is
       equivalent to SRC or DST.

inout
       defines the interface to be selected and can be IN or OUT.

network
       [SourceDestination] NET a.b.c.d m.n.r.s for IPv4 with m.n.r.s as
       netmask.
       [SourceDestination] NET <net>/num with <net> as a valid IPv4 or
       IPv6 network and num as mask bits. The number of mask bits must
       match the appropriate address family, IPv4 or IPv6. Networks may
       be abbreviated, such as 172.16/16, if they are unambiguous.

Port
       [SourceDestination] PORT [comp] num with num as a valid port
       number. If comp is omitted, = is assumed.


Interface
       [inout] IF num with num as an interface number.

Flags
       flags tcpflags with tcpflags as a combination of:
       A    ACK
       S    SYN
       F    FIN
       R    Reset
       P    Push
       U    Urgent
       X    All flags on

The ordering of the flags is not relevant. Flags not mentioned are treated as don't care. In order to get those flows with only the SYN flag set, use the syntax 'flags S and not flags AFRPU'.
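
For example, a hedged invocation using this flag filter (the file name and
record limit are placeholders); SYN-only flows are often the interesting
records when looking for scanning behaviour:

       # Show the first 20 flows carrying only a SYN flag.
       nfdump -r nfcapd.200407110845 -c 20 'flags S and not flags AFRPU'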

TOS Type of service: tos value with value 0..255.

Packets
       packets [comp] num [scale] to specify the packet count in the
       netflow record.

Bytes
       bytes [comp] num [scale] to specify the byte count in the netflow
       record.

Packets per second: Calculated value.
       pps [comp] num [scale] to specify the pps of the flow.

Duration: Calculated value.
       duration [comp] num to specify the duration in milliseconds of the
       flow.

Bits per second: Calculated value.
       bps [comp] num [scale] to specify the bps of the flow.

Bytes per packet: Calculated value.
       bpp [comp] num [scale] to specify the bpp of the flow.

AS
       [SourceDestination] AS num with num as a valid AS number.

scale  Scaling factor. May be k, m or g. The factor is 1024.

comp   The following comparators are supported: =, ==, >, <, EQ, LT, GT.
       If comp is omitted, = is assumed.
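
A hedged filter sketch combining several of these primitives and
comparators (the file name and thresholds are illustrative only):

       # Long-lived, high-volume TCP flows: more than 1 MByte transferred
       # and lasting longer than 60 seconds (duration is in milliseconds).
       nfdump -r nfcapd.200407110845 -o extended \
           'proto tcp and bytes > 1m and duration > 60000'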

EXAMPLES
nfdump -r /and/dir/nfcapd.200407110845 -c 100 'tcp and ( src ip 172.16.17.18 or dst ip 172.16.17.19 )'
       Dumps the first 100 netflow records which match the given filter.

nfdump -R /and/dir/nfcapd.200407110845:nfcapd.200407110945 'host 192.168.1.2'
       Dumps all netflow records of host 192.168.1.2 from July 11, 08:45 - 09:45.

nfdump -M /to/and/dir1:dir2 -R nfcapd.200407110845:nfcapd.200407110945 -S -n 20
       Generates the Top 20 statistics from 08:45 to 09:45 from 3 sources.

nfdump -r /and/dir/nfcapd.200407110845 -S -n 20 -o extended
       Generates the Top 20 statistics, extended output format.

nfdump -r /and/dir/nfcapd.200407110845 -S -n 20 'in if 5 and bps > 10k'
       Generates the Top 20 statistics from flows coming from interface 5.

nfdump -r /and/dir/nfcapd.200407110845 'inet6 and tcp and ( src port > 1024 and dst port 80 )'
       Dumps all port 80 IPv6 connections to any web server.

NOTES
Generating the statistics for data files of a few hundred MB is no problem. However, be careful if you want to create statistics of several GB of data: this may consume a lot of memory and can take a while. Also, anonymizing IP addresses is time consuming and uses a lot of CPU power, which reduces the number of flows per second that can be processed. Therefore anonymizing takes place only when flow records are printed or written to files. Any internal flow processing takes place using the original IP addresses.

SEE ALSO
nfcapd(1), nfprofile(1), nfreplay(1)

BUGS
There is still the famous last bug. Please report them - all the last bugs - back to me.

2005-08-19 nfdump(1)


E Holt-Winters Forecasting Examples

Figure E.1: Aberrant Marking Example



Figure E.2: Subtracting 40 Minutes Example 1

Figure E.3: Subtracting 40 Minutes Example 2

Figure E.4: Subtracting 40 Minutes Example 3


Bibliography

[Working Documents] Available from: <http://www.lancs.ac.uk/~burys1/fyp>

[Barford & Plonka 2001] Barford, P. & Plonka, D. (2001) Characteristics of Network Traffic Flow Anomalies. In: IMW '01: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, San Francisco, California, USA. ACM Press, New York, NY, USA. pp. 69-73.

[Barford et al. 2002] Barford, P., Kline, J., Plonka, D. & Ron, A. (2002) A Signal Analysis of Network Traffic Anomalies. In: IMW '02: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, Marseille, France. ACM Press, New York, NY, USA. pp. 71-82.

[Bash, 2007] Bash (2007) The Bash Reference Manual [Internet]. Available from: <http://www.gnu.org/software/bash/manual/bashref.html> [Accessed 20th February 2007].

[Braukhoff et al. 2006] Braukhoff, D., Tellenbach, B., Wagner, A., May, M. & Lakhina, A. (2006) Impact of Packet Sampling on Anomaly Detection Metrics. In: IMC '06: Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, Rio de Janeiro, Brazil. ACM Press, New York, NY, USA. pp. 159-164.

[Brutlag, 2000a] Brutlag, J. (2000) Aberrant Behaviour Detection in Time Series for Network Monitoring. In: LISA '00: Proceedings of the 14th USENIX Conference on System Administration, New Orleans, Louisiana, USA. USENIX Association, Berkeley, CA, USA. pp. 139-146.

[Brutlag, 2000b] Brutlag, J. (2000) Notes on RRDTOOL Implementation of Aberrant Behavior Detection [Internet], Microsoft WebTV, Mountain View, California, USA. Available from: <http://cricket.sourceforge.net/aberrant/rrd_hw.htm/> [Accessed 20th February 2007].

[Cacti, 2007] Cacti (2007) Cacti - The complete rrd based graphing solution [Internet]. Available from: <http://cacti.net/features.php/> [Accessed 25th February 2007].

[Chatfield & Yar, 1988] Chatfield, C. & Yar, M. (1988) Holt-Winters Forecasting: Some Practical Issues. The Statistician, Vol. 37, No. 2, Special Issue: Statistical Forecasting and Decision-Making, 1988. pp. 129-140.

[Cricket, 2007] Cricket (2007) Cricket [Internet]. Available from: <http://cricket.sourceforge.net/> [Accessed 20th February 2007].

[DANTE, 2007] DANTE (2007) Delivery of Advanced Network Technology to Europe [Internet], Cambridge, UK. Available from: <http://www.dante.net/> [Accessed 20th February 2007].

[Debian GNU/Linux, 2007] Debian GNU/Linux (2007) Debian GNU/Linux [Internet]. Available from: <http://www.debian.org/> [Accessed 21st February 2007].


[Flow-Tools, 2007] Flow-Tools (2007) Flow-Tools - A toolset for working with NetFlow data [Internet]. Available from: <http://www.splintered.net/sw/flow-tools/> [Accessed 23rd February 2007].

[GEANT2, 2007] GEANT2 (2007) GEANT2 [Internet], Cambridge, UK. Available from: <http://www.geant2.net/> [Accessed 20th February 2007].

[Haag, 2005a] Haag, P. (2005) Watch your flows with NfSen and NfDump [Internet]. Presented at the 50th RIPE Meeting, Stockholm, Sweden, May 3rd 2005. Available from: <http://www.ripe.net/ripe/meetings/ripe-50/presentations/ripe50-plenary-tue-nfsen-nfdump.pdf> [Accessed 10th March 2007].

[Haag, 2005b] Haag, P. (2005) NfDump(1) Manpage. Installed with the NfDump application. Available as an appendix of this report [Appendix D].

[JRobin, 2006] JRobin (2006) JRobin - A Java port of RRDtool by Sasa Markovic [Internet]. Available from: <http://www.jrobin.org/index.php/Main_Page> [Accessed 10th March 2007].

[Kim et al. 2004] Kim, M.-S., Kang, H.-J., Hung, S.-C., Chung, S.-H. & Hong, J. W. (2004) A Flow-based Method for Abnormal Network Traffic Detection. In: Proceedings of the IEEE/IFIP Network Operations and Management Symposium, Seoul, April 2004.

[Kiss & Mohacsi, 2006] Kiss, G. & Mohacsi, J. (2006) Anomaly detection for NFSen/nfdump netflow engine - with Holt-Winters algorithm. Presented at the 19th TF-CSIRT Meeting, Espoo, Finland, 21st September 2006. Available from: <http://bakacsin.ki.iif.hu/kissg/project/nfsen-hw/JRA2-meeting-at-Espoo_slides.pdf> [Accessed 10th March 2007].

[Korzyk, 1998] Korzyk, A. D. Sr (1998) A Forecasting Model for Internet Security Attacks. In: NISSC '98: Proceedings of the National Information System Security Conference, Crystal City, Virginia, USA, October 6th-9th 1998.

[libpcap, 2007] libpcap (2007) libpcap - Packet Capture Library [Internet]. Available from: <http://www.tcpdump.org/> [Accessed 21st February 2007].

[MySQL, 2007] MySQL (2007) MySQL - The world's most popular open source database [Internet]. Available from: <http://www.mysql.org/> [Accessed 21st February 2007].

[NetFlow, 2007] NetFlow (2007) Cisco IOS NetFlow [Internet]. Available from: <http://www.cisco.com/go/netflow/> [Accessed 21st February 2007].

[NfDump, 2007] NfDump (2007) NfDump - NetFlow Dump [Internet]. Available from: <http://nfdump.sourceforge.net/> [Accessed 10th March 2007].

[NfSen, 2007] NfSen (2007) NfSen - NetFlow Sensor [Internet]. Available from: <http://nfsen.sourceforge.net/> [Accessed 10th March 2007].

[NfSen-HW, 2007] NfSen-HW (2007) NfSen - Holt-Winters [Internet]. Available from: <http://bakacsin.ki.iif.hu/~kissg/project/nfsen-hw/> [Accessed 10th March 2007].


[PHP, 2007] PHP (2007) PHP: Hypertext Preprocessor [Internet]. Available from: <http://www.php.net> [Accessed 21st February 2007].

[Roesch, 1999] Roesch, M. (1999) Snort - Lightweight Intrusion Detection for Networks. In: LISA '99: Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, USA. USENIX Association, Berkeley, CA, USA. pp. 229-238.

[RRDtool, 2007] RRDtool (2007) RRDtool - logging and graphing [Internet]. Available from: <http://oss.oetiker.ch/rrdtool/> [Accessed 21st February 2007].

[RRD Java Libraries] RRD Java Libraries (2007) RRD Libraries for Java [Internet]. Available from: <http://monstera.man.poznan.pl/wiki/index.php/RRD_Java_libraries> [Accessed 10th March 2007].

[sFlow, 2007] sFlow (2007) sFlow End User Forum [Internet]. Available from: <http://www.sflow.org/index.php> [Accessed 22nd February 2007].

[SNMP, 2007] SNMP (2007) Information about Simple Network Management Protocol and Management Information Base [Internet]. Available from: <http://www.snmplink.org/> [Accessed 22nd February 2007].

[Sommerville, 2004] Sommerville, I. (2004) Software Engineering. Seventh Ed. Harlow, Pearson Education Limited.

[TCPdump, 2007] TCPdump (2007) TCPdump - Network debugging tool [Internet]. Available from: <http://www.tcpdump.org/> [Accessed 21st February 2007].

[Thottan & Ji, 2003] Thottan, M. & Ji, C. (2003) Anomaly Detection in IP Networks. IEEE Transactions on Signal Processing, Vol. 51, No. 8, August 2003. pp. 2191-2204.

[Wagner & Plattner, 2005] Wagner, A. & Plattner, B. (2005) Entropy Based Worm and Anomaly Detection in Fast IP Networks. In: WETICE '05: Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise, Linkoping University, Sweden, June 13-15 2005. IEEE Computer Society, Washington, DC, USA. pp. 172-177.