
SEVENTH FRAMEWORK PROGRAMME
Area ICT-2009.1.4 (Trustworthy ICT)

Visual Analytic Representation of Large Datasets for Enhancing Network Security

D4.2 Visual Correlation Analysis

Contract No. FP7-ICT-257495-VIS-SENSE

Workpackage: WP 4 – Information Visualization
Author: UKON
Version: 1
Date of delivery: M24
Actual Date of Delivery: M24
Dissemination level: Public
Responsible: UKON
Data included from: SYM

The research leading to these results has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n°257495.


The VIS-SENSE Consortium consists of:

Fraunhofer IGD (Project coordinator), Germany
Institut Eurecom, France
Institut Telecom, France
Centre for Research and Technology Hellas, Greece
Symantec Ltd., Ireland
Universität Konstanz, Germany

Contact information:
Dr Jörn Kohlhammer
Fraunhofer IGD
Fraunhoferstraße 5
64283 Darmstadt
Germany

e-mail: [email protected]: +49 6151 155 646

Contents

1 Introduction
  1.1 Motivation
  1.2 Relevant Data Sources
    1.2.1 Network Traffic
    1.2.2 Blacklists
    1.2.3 Intrusion Detection Alerts
    1.2.4 BGP and Traceroute Data
  1.3 Overview of the Deliverable

2 ClockView – Visual Correlation of Network Traffic in Time
  2.1 Data Perspective
  2.2 The ClockView System
    2.2.1 Workflow
    2.2.2 Glyph Visualization
    2.2.3 Network Overview
    2.2.4 Focus
    2.2.5 Subnet View
    2.2.6 Host Matrix
    2.2.7 Parallel Coordinates View
    2.2.8 Details: Port Matrix
  2.3 Case Study
    2.3.1 Monitoring an entire company network
    2.3.2 Integration of external data sources

3 ClockMap – Visual Correlation of Network Traffic for Cross-Level Analysis
  3.1 The ClockMap System
    3.1.1 Clockeye Design for Time-Series Data
    3.1.2 Combining Circular Treemaps with Clockeyes
  3.2 Case Study: Visual Exploration of Network Traffic
  3.3 Discussion

4 VisTracer – Visual Correlation of Network Entities and Time for BGP Routing Analysis
  4.1 Data Infrastructure
    4.1.1 Extracting Routing Anomalies
  4.2 The VisTracer System
    4.2.1 ASN Overview
    4.2.2 Target History Visualization
    4.2.3 Temporal Graph Representation
  4.3 Case Studies
    4.3.1 Visual Analysis Workflow
    4.3.2 Analysis of Suspicious BGP Anomaly
    4.3.3 Link Telecom BGP Hijack

5 Conclusions
  5.1 ClockView
  5.2 ClockMap
  5.3 VisTracer
  5.4 Future Work

Abstract

Network attacks are often correlated by the involved network entities and/or in their timing. Correlation is therefore an important aspect when trying to understand complex attack scenarios or failures of interlinked systems. In this deliverable, we focus on visual analytics methods for correlation in network security. In particular, three prototypes are described, which address different analysis problems. The first prototype, ClockView, is a system for monitoring the traffic of large networks at the host level. It not only helps to identify similar host behavior, but also allows correlation with external information such as blacklists or IDS alerts. The second prototype, ClockMap, uses the same clock representation as the previous prototype, but is able to aggregate traffic information to higher-level network structures (e.g. prefixes). Lastly, the third prototype, VisTracer, focuses on correlating temporal changes of routing paths in the Internet backbone with alerts from anomaly detection algorithms. Core to all these prototypes is a tight integration of automated analysis methods with visualization methods to facilitate visual correlation analysis for network security.


1 Introduction

1.1 Motivation

During the last few years, an increasing number of viruses, trojans, worms and other malware have been circulating on the Internet, infecting more and more computers. It will become an even greater challenge in the future to keep a network safe from all this anomalous traffic. Because hacker scripts can be downloaded and information about specific security weaknesses can be obtained from the Internet, even non-expert users are able to develop their own malware. This creates a virtually unlimited amount of malicious software, which makes it very difficult to secure every machine in a network.

After a computer has been hacked, the cyber criminal can manipulate the machine and cause irreparable damage, including stealing personal data, sending spam or expanding his own botnet. Often the ordinary user does not even realize that his computer has been hacked, which makes it even harder for the network administrator to monitor and secure the network. In the worst case, the malware can spread from this infected computer to other machines in the network, causing widespread damage. Unaware of the threats from the Internet, many computer users pay little attention to security updates or other defense mechanisms.

Besides hacking individual computers, manipulating routing information is another form of misuse. While this kind of attack does not allow the free use of computing resources, it allows the theft of IP addresses in so-called BGP hijacking attacks. In addition to misusing the hijacked addresses for malicious purposes (e.g., sending out spam or attacking commercial servers), the hijacker can also receive network traffic that was not meant for him and extract potentially valuable information from it or cause damage to the communication partners.

This deliverable aims at innovating visual analytics methods for correlation analysis that address the network security problems described above in three ways:

1. Through the ClockView system we demonstrate how thousands of hosts can be visually monitored and correlated with up-to-date security information originating from blacklists or IDS alerts.

2. Through the ClockMap prototype we show how the temporal and quantitative aspects of traffic or alert information can be investigated at different levels – from individual hosts up to high-level prefixes.

3. Through VisTracer we demonstrate the use of visual correlation methods for the analysis of anomaly detection and routing information.

In the remainder of this section, we will briefly discuss the data sources that we used for demonstrating our visual correlation prototypes and give an overview of the content of this deliverable.

1.2 Relevant Data Sources

1.2.1 Network Traffic

While the analysis of network traffic itself (e.g. NetFlows) was not considered in the Network Information Sources of Deliverable 2.1 and in the VIS-SENSE Use Cases and Scenarios of Deliverable 1.2, we realized in WP4 that such traffic data is essential for making use of the information gained from the analysis of the “Internet Threat Landscape” and “Attacks against the Control Plane (BGP)”. Only through a correlation of this information with network traffic can an administrator see the damage that global attackers have caused to his network. For our experiments we thus resorted to using NetFlows that we stored in a database server.

1.2.2 Blacklists

Blacklists are probably the easiest way to use globally collected attacker information to protect a network and assess its integrity. They are commonly updated on a daily basis. A new release not only contains the IP addresses of the latest attackers, but also keeps a historic record of attackers, unless the owners of the IP addresses have carried out a mitigation effort (i.e., they removed the malware and reported this back to the blacklist) or a temporal threshold has passed (e.g., a few days, weeks or months) in which the IP address was not reported again.

1.2.3 Intrusion Detection Alerts

Intrusion detection alerts (e.g. from Snort) are a common information source for network administrators. Once an intrusion is detected in the network, the administrator can use this information not only to close the vulnerability, but also to protect the remaining network from being affected by the same malware. In our concrete case, our prototypes can easily be extended to correlate this information through, for example, a simple IP matching between the attacker identified in the IDS alert and the network traffic.
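The simple IP matching mentioned above can be sketched as follows. This is a minimal illustration, not the prototypes' actual implementation: the dictionary keys `attacker_ip`, `src_ip` and `dst_ip` and the function name are assumptions made for this example.

```python
def correlate_alerts_with_flows(alerts, flows):
    """Map each attacker IP named in an IDS alert to the hosts it exchanged traffic with.

    alerts: iterable of dicts with an "attacker_ip" key (hypothetical layout)
    flows:  iterable of dicts with "src_ip" and "dst_ip" keys (hypothetical layout)
    """
    attacker_ips = {alert["attacker_ip"] for alert in alerts}
    contacts = {}
    for flow in flows:
        # The attacker may appear as sender or as receiver of a flow.
        if flow["src_ip"] in attacker_ips:
            contacts.setdefault(flow["src_ip"], set()).add(flow["dst_ip"])
        if flow["dst_ip"] in attacker_ips:
            contacts.setdefault(flow["dst_ip"], set()).add(flow["src_ip"])
    return contacts
```

Hosts that appear in the result have communicated with an address reported by the IDS and are candidates for closer inspection.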

1.2.4 BGP and Traceroute Data

The VisTracer prototype focuses on visual correlation of BGP and traceroute data. While we heavily rely on traceroute data and open routing information sources for this, as described in Deliverable 2.1, we also make use of the BGP anomaly detection methods detailed in Deliverable 3.2 to filter out interesting events from the abundance of BGP updates.

1.3 Overview of the Deliverable

The rest of the deliverable is structured as follows: Besides describing the three prototypes of this deliverable, Chapters 2 to 4 also cover case studies that show how the new methods lead to insights into the network security problems described in Section 1.1 above.

In detail, Chapter 2 demonstrates how the temporal aspect of network traffic can be used for visual correlation analysis using the ClockView prototype. Next, the ClockMap prototype detailed in Chapter 3 builds upon the previously introduced glyph representation and extends it to enable cross-level analysis (e.g. by comparing traffic of a host to a larger prefix) using a circular TreeMap with semantic zoom. Afterwards, the VisTracer prototype discussed in Chapter 4 uses graph and glyph representations to visually correlate dynamic routing information.

The last chapter concludes our work and gives an outlook on future research on visual analytics for information correlation.

2 ClockView – Visual Correlation of Network Traffic in Time

The content of this chapter was originally published in our paper [10] funded by the VIS-SENSE project. If you wish to cite this content or parts of it, please reference the paper and not this deliverable.

Detecting anomalous traffic in an entire company network is difficult for two reasons. First, since the number of machines in a network grows at a rapid pace, many different hosts have to be monitored over time. Second, the amount of traffic leaving or entering the network grows with the number of new hosts. Thus, there is a need for network security tools that help the administrator to analyze the traffic. This massive amount of data cannot be effectively investigated by sequentially reading textual log files. Researchers and practitioners are aware of this fact and have developed many different tools and concepts in the last few years to apply filtering and visualization methods to this kind of data. The goal is to support the administrator in dealing with this massive amount of data and in exploring anomalous traffic. Besides operationally monitoring real-time traffic to supervise a network, forensic analysis becomes an important aspect to reveal attack patterns and develop defense mechanisms against future attacks through diversifying malware aimed at circumventing traditional defense mechanisms. We thus believe that scalable visual support for forensic analysis tasks can complement currently used automated detection mechanisms for a more holistic view of emerging threats in IP networks.

In this chapter, we introduce the ClockView system with its different visualization techniques. The core contribution of this chapter is the scalable visual pattern detection tool, which is capable of showing the temporal activity of thousands of hosts at once. This visualization builds upon the structural properties of IP addresses belonging to subnets and a global prefix and therefore describes a two-level hierarchical data structure. Every visual item (host) shows temporal activity (traffic) as a small 24-hour clock.

Note that while showing the tool on real data, we only use anonymized NetFlow data to guarantee confidentiality towards our users.


The rest of this chapter is structured as follows. To understand the data gathering and preprocessing, Section 2.1 explains certain aspects of traffic flows and describes the underlying data structure of the ClockView tool. The software itself is explained in detail in Section 2.2. A short evaluation of the implementation is provided through a case study in Section 2.3.

2.1 Data Perspective

The tool described in this chapter uses NetFlows¹ as its data source. NetFlows are located one layer above packet captures and are collected primarily on routers and switches [11]. A single flow can contain information about multiple packets passing through the router. Information about packets with identical source and destination IP addresses, the same protocol and the same ports within a certain timeframe is summarized into one flow.
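To illustrate how packets are summarized into flows, the following minimal sketch groups packets by source and destination address, protocol, ports and a time window. It is a simplification under stated assumptions: real NetFlow exporters use activity timeouts rather than the fixed windows used here, and the `Packet` type, the function name and the 300-second default are hypothetical choices for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    timestamp: float  # seconds since the epoch
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int
    size: int  # bytes


def aggregate_flows(packets, window_seconds=300):
    """Summarize packets sharing the same key and time window into one flow record."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # Fixed time window index; an assumption replacing real flow timeouts.
        window = int(p.timestamp // window_seconds)
        key = (p.src_ip, p.dst_ip, p.protocol, p.src_port, p.dst_port, window)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)
```

Each resulting record carries only packet and byte counters, which is why a flow table is much smaller than the packet capture it summarizes.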

A server with different software solutions, such as flow-tools², was installed to collect the data. Since the flows are exported in real time over UDP, they are stored to RAM every five minutes to avoid a possible data loss at this early stage. In the network used for our research there are about 300 million NetFlows on a normal business day.

This massive amount of data has to be transformed into a file format suitable for a fast import into the PostgreSQL database. Comma-separated text files are a good choice for this. Due to hardware limitations it was only possible to import the data once every day. The import rate could be increased by upgrading the hardware and investing more time to speed up the preprocessing. For each daily import a new table is created in the database to improve the build speed of the indices.
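A daily import of this kind could be scripted roughly as follows. The sketch only generates the SQL statements and does not connect to a database. The table naming scheme `netflows_YYYYMMDD`, the column list and the index are assumptions for illustration, since the deliverable does not describe the actual schema; `COPY ... WITH (FORMAT csv)` is standard PostgreSQL syntax.

```python
from datetime import date


def daily_import_statements(day: date, csv_path: str) -> list[str]:
    """Build the SQL for one daily import: new table, bulk load, then index."""
    table = f"netflows_{day:%Y%m%d}"  # hypothetical naming scheme, one table per day
    return [
        # Hypothetical column layout for a flow record.
        f"CREATE TABLE {table} (ts timestamp, src_ip inet, dst_ip inet, "
        "protocol smallint, src_port integer, dst_port integer, bytes bigint);",
        # Bulk-load the comma-separated file in a single statement.
        f"COPY {table} FROM '{csv_path}' WITH (FORMAT csv);",
        # Building the index after loading avoids per-row index maintenance.
        f"CREATE INDEX {table}_src_idx ON {table} (src_ip);",
    ]
```

Because each day gets its own table, every index is built once over a bounded amount of data, which fits the motivation given above. A production script would validate `csv_path` instead of interpolating it directly.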

To use the data for scientific purposes and to maintain the privacy of the network users, it was necessary to anonymize the data before importing it into the database. This step can be omitted for operational usage, which improves the performance. To keep the developed visualization tool interactive even with very complex queries, several aggregated views were created, which are stored in a separate table. These views provide an interface through which the data needed for certain queries can be accessed very quickly, which dramatically improves the performance of predefined queries.


Figure 2.1: Analysis workflow with ClockView

2.2 The ClockView System

To detect abnormal traffic over time, automated algorithms and visualizations must be combined to profit most from the computational power of the computer and the visual perception of the human. A scalable visualization helps the analyst to understand the outcome of the algorithms and to interpret the results efficiently. In order to detect conspicuous traffic changes over time and to investigate the findings in more detail, the tool ClockView was developed. ClockView is able to adequately display suspicious traffic patterns on an hourly basis or to compare the traffic volume over many days. In the current version, ClockView supports the following use cases:

1. Detecting Suspicious Traffic: The analyst monitors a whole network and tries to detect suspicious traffic over time concerning the whole network or single hosts. Different filtering options help to find an interesting pattern. After selecting a host, the user receives additional information such as connections to other hosts or the geo-location.

2. Forensic Analysis: The analyst combines historical data with the most recent data in order to detect irregularities. The deviation in traffic over time is displayed in the same way as the overview.

3. Data Fusion: The analyst extends the data with other sources. These additional sources will support the investigation with further insightful information.

4. Feedback Loop: The analyst can save and use his knowledge gained from previous investigations for future analysis.

¹ http://www.cisco.com/go/netflow
² http://www.splintered.net/sw/flow-tools/

2.2.1 Workflow

Due to the large amount of data available, Shneiderman’s information seeking mantra “Overview first, zoom and filter, then details-on-demand” [13] was kept in mind while developing the workflow (Figure 2.1) of ClockView. ClockView’s visualizations for exploratively analyzing and monitoring the network traffic can be grouped into three categories. First, the Network Overview and the Subnet View (Figure 2.3) with different glyphs and layout options provide the network analyst with an overview of the internal network. Internal hosts can be selected to further zoom into the data. Additionally, interesting external IP addresses can be chosen from pregenerated lists of blacklisted or scanning hosts. In the second category (the Focus), the Host Matrix and the Parallel Coordinates View (Figure 2.5) can be used to investigate the traffic of the selected host. In these visualizations a second host can be selected to enable the third category. This third category consists of the Port Matrix (Figure 2.6), which provides the user with information about the whole traffic between these two hosts as details-on-demand.

Two additional feedback loops allow the user to carry out an iterative analysis. The first feedback loop is realized by a global filtering system based on the concept of Dynamic Querying. Different ports, protocols and traffic types (incoming, outgoing or both) can be chosen. Filters selected in one representation are also applied to all other visualizations and therefore allow the user to easily refocus on the other views. The second feedback loop is realized through a pattern management interface. The term pattern refers to a general condition that a host or the connection between two hosts matches, for example all machines with traffic on port 22 (SSH). The user is visually supported in defining such patterns to find similar hosts. For ease of use, a built-in database query template can be used to generate a pattern based on the currently selected filters and hosts. Since the most common patterns can already be expressed by the global filtering system, this option is better suited for advanced users who want to modify these database queries to express more complex patterns or build arbitrary queries to the database itself. With the pattern management, it is also possible to integrate external data sources, like blacklists (e.g. DShield³), data collected from honeypots, or alarms of an intrusion detection system (e.g. Snort⁴).
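The notion of a pattern as a condition on hosts or connections can be illustrated with a small in-memory sketch. ClockView itself expresses patterns as database queries; the predicate-based form below, together with the names `port_pattern` and `hosts_matching` and the flow dictionary keys, is an assumption chosen to keep the example self-contained.

```python
def port_pattern(port):
    """Pattern: the flow uses the given port on either side."""
    return lambda flow: port in (flow["src_port"], flow["dst_port"])


def hosts_matching(flows, pattern):
    """Return all hosts that take part in at least one flow matching the pattern."""
    hosts = set()
    for flow in flows:
        if pattern(flow):
            hosts.update((flow["src_ip"], flow["dst_ip"]))
    return hosts
```

For the example from the text, `hosts_matching(flows, port_pattern(22))` would yield all machines with SSH traffic in the given flows.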

To keep track of the current selection of hosts and filters, they are shown in the upper left. Additionally, further information about the IP address can be found there. This includes the current hostname, the country according to a geolocation database and some statistics, such as the total number of NetFlows and the number of connections to distinct other hosts.

2.2.2 Glyph Visualization

To get a global picture of the servers and workstations used in the network, it is useful to visually encode each host individually in the Network Overview. The hosts are represented in such a way that the user can easily notice whether a specific machine behaves more like a server with 24 hours of traffic or like a client with traffic only during working hours. Therefore, we want to show all internal hosts with their traffic at a granularity of one hour for a timespan of one day. For this purpose we need to display up to 65536 (256 × 256 possible IP addresses for a /16 network) time series, each with 24 data values (one per hour). This leads to a maximum of 1572864 data points. We implemented four different versions (Figure 2.2) to visualize one time series.
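The data volume stated above, and the per-host time series behind each glyph, can be reproduced with a short sketch. The input layout (tuples of host, timestamp and byte count) and the use of UTC for the hour of day are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

HOSTS_IN_SLASH_16 = 256 * 256  # 65536 possible addresses in a /16 network
HOURS_PER_DAY = 24
MAX_DATA_POINTS = HOSTS_IN_SLASH_16 * HOURS_PER_DAY  # 1572864


def hourly_series(flows):
    """Sum the traffic of each host into 24 hourly bins.

    flows: iterable of (host, unix_timestamp, n_bytes) tuples (hypothetical layout)
    """
    series = defaultdict(lambda: [0] * HOURS_PER_DAY)
    for host, timestamp, n_bytes in flows:
        # Hour of day in UTC; a deployment would use the network's local time zone.
        hour = datetime.fromtimestamp(timestamp, tz=timezone.utc).hour
        series[host][hour] += n_bytes
    return dict(series)
```

Each list of 24 values corresponds to one time series that a single glyph has to encode.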

The first is a simple line chart arranging the hours on the x-axis and the amount of traffic on the y-axis, which is a well-known visualization technique and therefore easy to understand. Changes within one time series can be detected well. However, given this large number of time series, it is not possible to assign at least 1 pixel per data value in width for the line chart on a normal screen resolution. The comparison between different time series is difficult, because the line charts are mostly far apart from each other and not all of them can be aligned side by side.

The second representation is a bar chart, where the traffic is doubly encoded by the height of the bars and by a color scale ranging from white (low traffic) to red (high traffic). Regarding the width, the same space problem occurs as with the line chart. However, bar charts are easier to compare because of the use of color. The main problem of the line and bar charts is that they rely on position: the farther apart they are, the harder they are to compare.

Based on the above, we used a more space-filling pixel-oriented visualization [9] and encoded the amount of traffic only with color. In this third representation every hour is represented by a pixel/rectangle. Hours without traffic remain in the background color, so that the clear cut between hours with traffic and hours without traffic can be perceived. The rectangles of one time series are arranged line-wise in a 4 times 6 matrix. Even though they are separated by spacing, hosts with irregular activity can hardly be distinguished. The comparison between time series is easier than with the line and bar charts, since more space is available for every data value and the amount of traffic is no longer represented by position.

Figure 2.2: The same time series in four different representations: (1) line chart, (2) colored bar chart, (3) pixel matrix, (4) glyph in the style of a clock

³ DShield Blacklist, http://www.dshield.org
⁴ Snort, http://www.snort.org

The fourth representation is a glyph in the style of a clock, as seen in Figure 3.1. Each circular glyph is subdivided into 24 segments, each of them showing the traffic of one hour encoded with color. 0:00 o’clock is at the top, 6:00 o’clock at the right side, 12:00 o’clock at the bottom and 18:00 o’clock at the left side. As a clock metaphor is used here, this segmentation is more intuitive than the segmentation into rectangles, even if the clock is transformed from 12 to 24 hours. The natural order of time is also better preserved, since there are no line breaks between the data points. The segments representing time are not only at the same position for every host, but also have the same orientation. Corresponding hours of different hosts are displayed in parallel and can thus be recognized as a group at a glance. Since the circular shape already separates the glyphs, no additional spacing has to be added. Because of this, the glyph is more space-efficient on smaller screen resolutions.
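The geometry of such a 24-hour clock glyph can be sketched as follows. The sketch assumes that the segment of hour h starts at the h:00 mark and runs clockwise, and that screen coordinates have the y-axis pointing downwards; the function names are hypothetical.

```python
import math


def segment_angles(hour):
    """Start and end angle of an hour segment, in degrees clockwise from the top."""
    return (hour * 15.0, (hour + 1) * 15.0)  # 360 degrees / 24 segments = 15 degrees


def hour_mark(hour, cx=0.0, cy=0.0, radius=1.0):
    """Screen position of the h:00 mark on the glyph outline (y grows downwards)."""
    theta = math.radians(hour * 360 / 24)
    return (cx + radius * math.sin(theta), cy - radius * math.cos(theta))
```

With these conventions, hour 0 lies at the top, hour 6 at the right, hour 12 at the bottom and hour 18 at the left, matching the description above. Because every glyph uses the same angles, the same hour points in the same direction for all hosts.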

The amount of traffic is represented by a fixed diverging color scale from blue (negative, only used for comparison, showing a decrease in traffic) over white (0) to red (positive). Due to the fixed color scale, hosts remain comparable across different days. Otherwise a host with the same amount of traffic on different days could be perceived quite differently. However, a drawback of this design choice is that the exact value a color represents is not visible.
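A fixed diverging scale of this kind might look like the sketch below. The linear interpolation and the clipping at a fixed maximum `vmax` are assumptions; the deliverable does not state which interpolation or normalization ClockView uses.

```python
def diverging_color(value, vmax):
    """Map a value to an (R, G, B) color on a fixed blue-white-red scale.

    vmax is a fixed positive maximum shared by all hosts and days (assumed),
    so that the same value always yields the same color.
    """
    t = max(-1.0, min(1.0, value / vmax))  # clip to the fixed range [-1, 1]
    fade = round(255 * (1 - abs(t)))       # 255 at zero, 0 at the extremes
    if t >= 0:
        return (255, fade, fade)  # white towards red for positive values
    return (fade, fade, 255)      # white towards blue for negative values
```

Since `vmax` does not depend on the data of the displayed day, two glyphs from different days remain directly comparable, at the cost of saturating values beyond the fixed range.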

For the reasons mentioned above, we consider the glyph the most appropriate way of displaying the large number of time series, although the other representations have their advantages, too.

2.2.3 Network Overview

Different layout options are available to arrange the glyphs in the Network Overview. In the first layout the glyphs are represented in a matrix. The subnets (in this case the 3rd byte of the IP address) are arranged on the y-axis and the individual hosts (4th byte of the IP address) are arranged on the x-axis. This positioning was chosen because subnets without traffic can optionally be removed from the matrix to better fit the available screen resolution, since computer screens are mostly oriented horizontally. With this layout the user can not only directly locate a certain IP address of interest, but also see trends occurring within a subnet (horizontal) or on the same 4th byte of the IP address (vertical). In the second layout the glyphs are arranged recursively. The first dimension is the 3rd byte of the IP address and the second dimension is the 4th byte of the IP address. Since this layout minimizes the distance between IP addresses in the same subnet, trends within a subnet can be spotted more easily. The third layout arranges the glyphs in order of their total traffic, either within the whole network or subnet-wise. The top talkers of the whole network or within a subnet can be easily spotted here. This layout is more space-efficient, because gaps from hosts without traffic are discarded. Unfortunately, there is no direct mapping of an IP address to the position on the screen.

Figure 2.3: Graphical user interface of ClockView: (1) Host Information, (2) Subnet View, (3) Color Legend, (4) Network Overview, (5) Options, (6) Global Filters and (7) Patterns
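The first (matrix) layout can be sketched as a mapping from IP addresses to grid cells. The function name and the `hide_empty_subnets` flag are hypothetical; the sketch only covers the matrix layout and assumes dotted-decimal IPv4 addresses of one /16 network.

```python
def matrix_positions(active_ips, hide_empty_subnets=True):
    """Map each IP address to a (column, row) cell of the glyph matrix.

    Column = 4th byte (host), row = 3rd byte (subnet). If hide_empty_subnets
    is set, rows of subnets without any active host are removed.
    """
    octets = {ip: [int(part) for part in ip.split(".")] for ip in active_ips}
    subnets = sorted({o[2] for o in octets.values()})
    if hide_empty_subnets:
        # Compact the rows so that only subnets with traffic take up space.
        row_of = {subnet: row for row, subnet in enumerate(subnets)}
    else:
        row_of = {subnet: subnet for subnet in range(256)}
    return {ip: (o[3], row_of[o[2]]) for ip, o in octets.items()}
```

Removing empty rows shrinks the matrix vertically while the 256 host columns stay fixed, which suits horizontally oriented screens.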

By default, the traffic of the current day is mapped to the glyph, but the user has two additional options to change the information that is represented by the glyph. The first option, the coefficient of variation of the previous days (3, 5 or 7 days), gives a clue about how stable the traffic is. To find anomalous traffic, we combined these two measures. The second option is therefore computed for every hour as follows:

\[ \frac{x - \mathit{mean}}{\mathit{stddev} + 1} \]

where x is the value of the current day, mean is the average and stddev the standard deviation of the previous days. To avoid a division by zero, an alpha value of 1 is added to the standard deviation. The result is a value that indicates the relative change in comparison to the normal behaviour on the previous days. Additional options can be used for filtering. They display the absolute traffic of the currently selected day only for those data values (hours) that either had no traffic or exceed the maximum traffic of the previous days.
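Both glyph options can be computed per hour as in the following sketch. It uses the population standard deviation from Python's `statistics` module, which is an assumption, since the text does not say whether the population or the sample formula is meant; the function names are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(history):
    """Standard deviation relative to the mean of the previous days (0 if the mean is 0)."""
    m = mean(history)
    return pstdev(history) / m if m else 0.0


def deviation_score(x, history):
    """Relative change of today's value x against the previous days.

    Implements (x - mean) / (stddev + 1); the added 1 avoids a division by zero
    when the traffic was identical on all previous days.
    """
    return (x - mean(history)) / (pstdev(history) + 1)
```

For a host whose traffic in a given hour was 10 on each of the previous three days, a value of 15 today gives a score of (15 − 10) / (0 + 1) = 5, while unchanged traffic gives 0. Applying `deviation_score` to each of the 24 hourly values yields the data for one glyph.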

For easier navigation, a tooltip with the IP address of the host appears on mouse-over of the corresponding glyph. Zooming allows the user not only to adapt the size of the visualization to the screen resolution or personal preferences, but also makes it easier to click on a glyph. When a glyph is selected, the visualization is updated and all connections within the internal network to the chosen machine are shown by lines connecting the glyphs. Optionally, each communication line between two machines in the network can be displayed on top of the visualization. These lines form an internal graph of the network. Connections with traffic below a user-defined threshold can be discarded. The user is able to define the transparency of the lines. If the internal graph is visible, the connections of the selected host are additionally highlighted (Figure 2.4).

Several filtering methods can be applied to the visualization, for example the global filters for port, protocol and traffic. They are located on the top right of the screen and are represented in different ways. A table contains the amount of traffic and a checkbox for each used source and destination port. By selecting a checkbox, the different views are constrained to show only the traffic on these specific ports. Furthermore, the user can decide to see only the activity of the hosts on a single protocol by choosing the corresponding entry in a dropdown list (e.g. ICMP, TCP, UDP). To reduce the number of visible hosts, the analyst is able to focus only on those machines having incoming as

Figure 2.4: Internal graph on top of the Network Overview. Communication lines of the selected glyph are additionally highlighted.

well as outgoing traffic. IP addresses which receive traffic but are not assigned will be discarded. The filters affect not only the glyphs, but also the internal graph on top of the view.

In addition, hosts can be filtered by choosing one or more of the predefined patterns in the list on the lower right. Each of them can be selected positively or negatively to show the machines which either match or do not match the given characteristics. The list of manually named patterns displays the percentage of the currently visible hosts that match the specified properties. Navigation in the list is additionally improved by a Visual Scent [18] in the form of a bar chart (Frame 7 in Figure 2.3).

2.2.4 Focus

The second group of visualizations is available once the user has selected a glyph from the Network Overview and focuses on the traffic of this specific device. The Subnet View (Frame 2 in Figure 2.3) on the left side of the screen provides the user with a rough overview of the connections to that host. While also visible from the Network

Figure 2.5: Two Focus visualizations replacing the Network Overview in the middle of the screen. (1) Host Matrix (2) Parallel Coordinates View

Overview and the Port Matrix, the Subnet View links these visualizations to each other. The Host Matrix view replaces the overview visualization in the middle of the screen and displays the activity between the previously selected host and its counterparts in more detail. The filtering options can be applied to this representation as well. Besides the global options, a custom one is available to filter according to an IP address range. This functionality is implemented within the Parallel Coordinates View (Frame 2 in Figure 2.5). Applied filters affect not only the Host Matrix, but also the Subnet View and the selectable port and IP address range filters. For example, if the user decides to constrain the views to a certain IP address range, the statistics in the port table are updated: only the ports and the amount of traffic within the chosen IP address range are shown. Selections made in one view are reflected in the other views using Linking & Brushing.

2.2.5 Subnet View

The Subnet View displays every host communicating with the previously selected glyph (Frame 2 in Figure 2.3). This glyph is located in the center of the visualization and is connected with its counterparts via lines. These connection lines are bundled using the Edge Bundling technique [6] according to the first byte of the hosts’ IP addresses. The approach was inspired by the network security tool FloVis [16] to improve the layout of similar IP addresses. The counterparts themselves are equally distributed on an additional outer ring around the glyph and are each represented by a small colored circle. The color encodes the amount of traffic between the two communicating hosts using the same fixed color scale as in the Network Overview. For further information, a mouse-over on any host provides the analyst with the exact IP address, the hostname and the country according to the geolocation database. A further host can be selected for the Port Matrix view by clicking on the corresponding circle. This second host is additionally highlighted, even if it was chosen in another visualization. The Subnet View has scalability problems when too many connections result in many items placed on the outer ring. This problem can be mitigated by applying filters to reduce the number of counterparts. Nevertheless, the user is provided with a rough overview of the distribution of the communicating hosts.

2.2.6 Host Matrix

The Host Matrix is the most detailed visualization of the Focus category. While the glyphs in the Network Overview and the Subnet View represent the traffic at a granularity of 1 hour, the Host Matrix uses a granularity of 1 minute. Visualizations at a finer

granularity make only limited sense, since a NetFlow record itself covers one or more packets and its timestamp is therefore not exact. In the Host Matrix, the traffic of the chosen device is subdivided by IP address into multiple time series. An additional one shows the whole traffic of the chosen device. The time series are arranged as Small Multiples in a matrix.

With only one dimension in this matrix, the time series can be ordered by their IP address, their country or the total amount of traffic. Because of the potentially high number of hosts, pagination is used to preserve the overview. However, in a typical workflow the user would first apply some filters to reduce the number of available and interesting hosts, and therefore pages. The filters also affect the traffic shown by the time series, since patterns are available in the Network Overview to filter out certain hosts. Every time series is represented by a pixel matrix visualization [9], where each pixel displays one minute of the day. For this purpose, 60 minutes are shown in one row, resulting in a total of 24 rows for one day. The color of the pixels encodes the amount of traffic. Since the granularity is finer in this view, the color scale represents different values than in the Network Overview. Regular time patterns in particular can be easily recognized. Zooming allows the user to reveal more details. In this view it is not only possible to choose an IP address for the Port Matrix, but also to switch the visualizations of the Focus group directly to another chosen device. As in the Subnet View, a selected host is highlighted in the Host Matrix.
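The 60 x 24 pixel matrix layout can be sketched as follows. This is only an illustration: the input format (pairs of minute-of-day and byte count) and the function name are assumptions, not part of ClockView.

```python
def pixel_matrix(flows, rows=24, cols=60):
    """Bin per-minute traffic into a grid with one row per hour.

    flows: iterable of (minute_of_day, byte_count) pairs (assumed format).
    Returns a rows x cols list of lists; each cell is one pixel (one minute).
    """
    grid = [[0] * cols for _ in range(rows)]
    for minute, nbytes in flows:
        if 0 <= minute < rows * cols:
            # Row = hour of the day, column = minute within that hour.
            grid[minute // cols][minute % cols] += nbytes
    return grid
```

Each cell value would then be mapped to a color; a host that communicates at a fixed interval produces evenly spaced colored pixels in every row.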

2.2.7 Parallel Coordinates View

The Parallel Coordinates View uses Parallel Coordinates [8] to display all connected hosts. Each of the 4 bytes of an IP address is shown on one axis, and each axis holds 256 data points. A line connecting one data point on each of the four axes represents a single host. The amount of traffic is represented on every data point by the same fixed color scale as used in the Network Overview and also appears as a tooltip on every data point. This visualization is intended to give the user insight into the structure of the communicating machines. The second use of this view is to filter by an IP address range. Since every data point represents one structural part of an IP address, this can be achieved by clicking on one or more of the data points.
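As a rough sketch of this mapping, each IPv4 address can be split into four axis values, and a selection of data points then acts as a filter. The function names and the representation of a selection as a dictionary from axis index to selected byte values are hypothetical.

```python
def ip_to_axes(ip):
    """Split a dotted IPv4 address into its four byte values (one per axis)."""
    return tuple(int(part) for part in ip.split("."))


def filter_by_selection(hosts, selection):
    """Keep hosts whose bytes match every axis that has selected data points.

    selection: dict mapping axis index (0-3) to a set of selected values.
    Axes without a selection do not constrain the result.
    """
    result = []
    for ip in hosts:
        octets = ip_to_axes(ip)
        if all(octets[axis] in values for axis, values in selection.items()):
            result.append(ip)
    return result
```

Selecting the value 10 on the first axis and 2 on the third axis would, for example, keep `10.0.2.15` and `10.1.2.3` but discard `192.168.0.1`.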

2.2.8 Details: Port Matrix

The Port Matrix (Figure 2.6) shows the detailed activity between two machines. As in the Host Matrix, a single time series is represented by a pixel matrix with the same color scale. The traffic is subdivided according to the ports used. A variation of Dimensional

Stacking was applied to align the Small Multiples of time series. The outer dimensions are defined by the port combination, the inner dimensions represent the time in the pixel matrix. In the outer matrix, the ports of the first host are arranged on the y-axis, the ports of the second host on the x-axis. Additionally, the sum of all time series of each row is shown on the left side of the row and the sum of all time series of each column is shown on top of the column. Each sum represents the whole traffic on the corresponding port to reveal possible time patterns that would otherwise be distributed over many single pixel matrices. The aggregated traffic between the two machines can be seen in the upper left corner of the matrix. On mouse-over, a tooltip appears with additional information such as the official IANA port assignments⁵ (e.g. SSH for port 22) or information about known malware using a certain port. By clicking on one of the pixel matrices, the user directly selects the corresponding port filter and can therefore easily refocus on this port (or port combination) in the preceding views.
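The aggregation behind the outer matrix can be sketched as below. The flow tuple format and all names are assumptions for illustration; only the structure (one time series per port combination plus row, column and overall sums) follows the description above.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60


def port_matrix(flows):
    """Build per-port-pair time series and their marginal sums.

    flows: iterable of (port_host_a, port_host_b, minute_of_day, byte_count).
    Returns (cells, row_sums, col_sums, total), each holding per-minute series.
    """
    cells = defaultdict(lambda: [0] * MINUTES_PER_DAY)     # (port_a, port_b)
    row_sums = defaultdict(lambda: [0] * MINUTES_PER_DAY)  # per port of host A
    col_sums = defaultdict(lambda: [0] * MINUTES_PER_DAY)  # per port of host B
    total = [0] * MINUTES_PER_DAY                          # upper left corner
    for port_a, port_b, minute, nbytes in flows:
        cells[(port_a, port_b)][minute] += nbytes
        row_sums[port_a][minute] += nbytes
        col_sums[port_b][minute] += nbytes
        total[minute] += nbytes
    return cells, row_sums, col_sums, total
```

Each of the returned series can be rendered as its own 60 x 24 pixel matrix, so a periodic pattern that is spread over many port combinations still shows up in the corresponding row or column sum.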

2.3 Case Study

To evaluate the software for operational usage, the case study covers different use cases for monitoring a whole class B company network and detecting anomalous traffic. The tool was therefore used in different ways to exemplify the variety of possibilities it provides.

2.3.1 Monitoring an entire company network

This case study describes a typical workflow for monitoring one day of network activity. After selecting the desired day in the database, we switch to the Network Overview visualization to start the analysis. In this view, described in more detail in section 2.2.3, the daily activity of the entire company network is visible at a single glance (Figure 2.3). Top talkers as well as different patterns are easy to spot and, with some background knowledge of the network structure, easy to interpret, for example the continuously cluttered red clocks within a certain subnet range (Figure 2.7). This fissured pattern is caused by assigning a dynamic IP address to each computer that connects to the internet via the wireless LAN. As a result, a single glyph, although representing a unique IP address, can display the traffic of many different computers. Most of these computers belong to students using the wireless LAN connection on an irregular basis.

⁵ http://www.iana.org/assignments/port-numbers

Figure 2.6: Port Matrix

Figure 2.7: Typical glyph pattern of hosts with dynamic IP addresses using the wireless LAN connection in the university

Figure 2.8: Subpart of the Network Overview showing a relative increase of traffic compared to the previous days

With some background knowledge, most of the patterns can be easily explained. To spot abnormal behavior without additional knowledge about the network, the tool provides helpful interactive features. As an example, we select the item “Change 5 days” in the drop-down list next to the label named “glyph”. This option compares the traffic of the current day with that of the five previous days and displays the result in the glyph. As expected, most of the glyphs are colored white, signaling nearly no change, except for a partial red pattern in one single subnet (Figure 2.8). To take a closer look at the individual glyphs, we enlarge the visual representations by zooming into this exact area. With the additional space for each circle, the traffic distribution over time becomes more obvious: the amount of traffic rises mainly in the second half of the day. It seems that some new machines have been added to the network, causing extra traffic. This is suspicious because the monitored day was a Sunday, when there is no regular daily work in the university. After investigation, we discovered that the corresponding subnet of the university is assigned to the VPN connections. A computer connecting to the university from an external network gets an IP address in this specific subnet. With this additional information, the suspicious pattern can be explained as a common occurrence.

2.3.2 Integration of external data sources

Since additional knowledge is often crucial to detect anomalous behaviour, the traffic of the university’s network is matched against different blacklists. For this purpose we define different patterns. The condition of each pattern requires internal hosts to have traffic with at least one of the IP addresses on the corresponding blacklist.

The first blacklist we choose is a very general one from DShield covering all kinds of threats. Interestingly, nearly every computer (about 98 per cent) has some kind of activity with at least one blacklisted IP address. Fortunately, the only ones without activity are the hosts in the so-called demilitarized zone. Since this list is very

general and a match often results only from a scan, the findings can be improved by defining a certain threshold of minimum traffic in the pattern.
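A blacklist pattern with a minimum-traffic threshold could look like the following sketch. The flow format, the function name and the default threshold are assumptions; the deliverable does not describe how patterns are stored or evaluated internally.

```python
def match_blacklist(flows, blacklist, min_bytes=0):
    """Return internal hosts with traffic to at least one blacklisted address.

    flows: iterable of (internal_ip, external_ip, byte_count) (assumed format).
    blacklist: set of blacklisted external IP addresses.
    min_bytes: minimum total traffic to blacklisted addresses, used to
               suppress matches that stem only from scans.
    """
    traffic = {}
    for internal_ip, external_ip, nbytes in flows:
        if external_ip in blacklist:
            traffic[internal_ip] = traffic.get(internal_ip, 0) + nbytes
    return {host for host, total in traffic.items() if total >= min_bytes}
```

With `min_bytes=0` every host that touched a blacklisted address matches, which corresponds to the very broad result reported for the general list; raising the threshold narrows the match to hosts with substantial traffic.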

Figure 2.9: Pixel matrices of a hacked host communicating with its Command & Control server on two consecutive days in a 60min x 24h pixel matrix

The second blacklist we choose is a more specific blacklist of known ZeuS Command & Control⁶ servers. Once a computer is infected with the ZeuS trojan, it becomes part of a botnet and communicates with its Command & Control server on a regular basis. The matching with the blacklist reveals that there is one computer within the university’s network with activity to one of the blacklisted IP addresses on several consecutive days. The inspection in the focus & detail visualizations, as shown in Figure 2.9, confirms that the traffic to the specific server does occur on a regular basis. In the pixel matrix, two regular communication patterns can actually be distinguished between the infiltrated host and its master server. The first shows activity about every 5 minutes and the second about every 20 minutes. Because of the additional knowledge gathered by using the blacklist, we think that this host has been hacked and should be checked manually. This example shows the usefulness of additional information and the integration of external data sources. Without this knowledge, this specific host could not have been found with only the data generated from the NetFlows, since the host has shown no extraordinary traffic besides the periodic communication with its master. Since periodic communication by itself is by no means a sign of an intrusion (e.g. a mail program checking for newly arrived mails every 5 minutes), it also cannot be used in general to detect anomalous traffic.

⁶ abuse.ch ZeuS Tracker, https://zeustracker.abuse.ch

3 ClockMap – Visual Correlation of Network Traffic for Cross-Level Analysis

The content of this chapter was originally published in our paper [4] funded by the VIS-SENSE project. If you wish to cite this content or parts of it, please reference the paper and not this deliverable.

Many real-world datasets contain an intrinsic hierarchy, which can provide important information to the analyst. In network security, for example, such a hierarchy is often given through the network definitions encoded in prefixes of IP addresses. Especially for the analysis of network traffic of large computer networks, it is important to monitor the network usage to detect anomalies or to understand the behavior at different levels of detail. On the one hand, there is the need to gain an overview of the current situation. On the other hand, obtaining details and more information is crucial to understand such overall trends and eventually identify the underlying cause. To provide an integrated overview and detailed time-series information within a single visualization, we propose a visualization technique, called ClockMap, which uses circular treemaps as the layout algorithm for a large number of temporal glyphs representing data values of a time-series. In particular, we apply this idea to a clock-based glyph inspired by the work of [10], which we call clockeye. The advantage of this circular design is that we can smoothly switch between different levels of the hierarchy and either show aggregated overview data for a subnet, or show all individual time-series as glyphs.

The main contribution of this chapter is the novel combination of clock-based glyphs with circular treemaps. Although there are major drawbacks of such treemaps, we show in a case study that their integration as a layout algorithm for the placement of circular glyphs is quite effective and can successfully be applied to network security data.

The remainder of this chapter is organized as follows. In Section 3.1 we describe our proposed visualization technique, in Section 3.2 we provide a case study, and in Section 3.3 we discuss the technique.

3.1 The ClockMap System

In the following, we describe our novel visualization, called ClockMap, which is based on the combination of temporal glyphs, called clockeyes, and a circular treemap layout.

3.1.1 Clockeye Design for Time-Series Data

The basic idea of clockeyes is to make use of the metaphor of a classic clock. A circle is subdivided into sectors, each sector representing a time span of one hour. When 24 slices are used, we have a 24-hour clock as seen in Figure 3.1. In this example, there was no data from 00:00 to 06:00 and from 23:00 to 24:00, which results in a noticeable empty area in the representation. This can be very helpful to find specific patterns without data or with zero data values. At one point between 06:00 and 07:00, the time-series seems to start, having high peaks between 08:00 and 09:00 and between 10:00 and 11:00. Afterwards there is a downward trend until 24:00.
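The sector geometry of such a glyph can be sketched in a few lines. The sketch assumes that angles are measured in degrees clockwise from the 12 o'clock position and that missing data is encoded as `None`; both conventions, and the function name, are illustrative assumptions rather than the ClockMap implementation.

```python
def clockeye_sectors(values):
    """Compute one sector per time step of a clock-like glyph.

    values: sequence of data values (None = no data), e.g. 24 hourly values.
    Returns a list of (start_deg, end_deg, value) tuples, with angles measured
    clockwise from the top of the circle (assumed convention).
    """
    step = 360.0 / len(values)
    return [(i * step, (i + 1) * step, v) for i, v in enumerate(values)]
```

For a 24-hour series each sector spans 15 degrees, so the hour from 06:00 to 07:00 covers 90 to 105 degrees. A renderer would skip sectors whose value is `None`, which produces the empty area described above.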

When many clockeyes are plotted in a dense area, it is important that they can be separated from each other intuitively, without the need for an additional border in between. Circular shapes are very suitable for this purpose, because they are perceived as separate items pre-attentively. However, if many have the same color values, this task can become difficult in dense areas. To visually improve the perception of the compactness and further emphasize the borders, we applied circular shading, which seems to be an improvement according to our experiments. This generally led to darker colors; we therefore decided to use an intense yellow to red color mapping from ColorBrewer [3] to counterbalance this effect. The inner black circle can be used for additional meta labels or to indicate highlighting with color.

3.1.2 Combining Circular Treemaps with Clockeyes

As discussed in Deliverable 1.1, there are visualization techniques dealing with hierarchical data and others, e.g., glyphs, displaying temporal or multi-dimensional information. Especially in computer networks, the combination helps to understand temporal dependencies in different substructures of the network. With ClockMap we use circular treemaps in combination with clockeyes. The circular treemap itself is often less powerful than rectangular layouts; however, in combination with clockeyes it seems to be a promising use case. To make further use of the implicit characteristics of the layout algorithm, we implemented ClockMap on top of a zoomable user interface, which enables infinite zooming and panning. Each level of the hierarchy can show the aggregated values for all underlying children to provide the user with a high-level overview

Figure 3.1: Visual representation of a single clockeye showing a time-series of 24 hours. Each one-hour sector is colored by its data value. Circular shading is applied to emphasize the borders of the glyph. (Figure labels: 24:00 | 00:00, 06:00, 12:00, 18:00; "Time-Series with 24 Hours"; "No Data".)

as seen in Figure 3.2. While zooming into the aggregated areas, more details and eventually each host, represented as a small clockeye, become visible. Through this semantic zooming, the scalability of the overall approach is improved, because fewer visual objects need to be drawn to the canvas when zooming out. Even with thousands of leaf nodes the visualization can be explored interactively. During exploration of real datasets it became obvious that in some cases very prominent nodes need to be removed or moved to another group. To facilitate this, we integrated edit operations to add hierarchies, remove nodes or place them freely into other circles or outside the main circle. After each modification the weights are changed accordingly and the layout is recalculated automatically. To search for specific attributes of the nodes, a search field is integrated into ClockMap. The black inner circles of matching nodes are highlighted to guide the user

to the relevant nodes.
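The aggregation of host time-series into subnet glyphs and the switch between the two levels can be sketched as follows. The /24 grouping, the zoom threshold value and all function names are assumptions made for illustration; the actual ClockMap hierarchy and zoom handling are not specified here.

```python
import ipaddress


def aggregate_by_subnet(host_series, prefix_len=24):
    """Sum the time-series of all hosts that share a subnet prefix.

    host_series: dict mapping an IPv4 address string to a list of values.
    Returns a dict mapping the subnet (e.g. '10.0.1.0/24') to the summed series.
    """
    subnets = {}
    for ip, series in host_series.items():
        net = str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))
        current = subnets.get(net, [0] * len(series))
        subnets[net] = [a + b for a, b in zip(current, series)]
    return subnets


def visible_glyphs(host_series, zoom, threshold=2.0):
    """Semantic zoom: aggregated subnets below the threshold, hosts above it."""
    if zoom < threshold:
        return aggregate_by_subnet(host_series)
    return dict(host_series)
```

Drawing only the aggregated glyphs when zoomed out is what keeps the number of visual objects small, while the individual host series only need to be rendered once the user zooms past the threshold.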

Figure 3.2: A circular treemap is used to lay out hundreds of clockeyes into groups based on their hierarchy. The rectangle illustrates the visualization when the user zooms out.

3.2 Case Study: Visual Exploration of Network Traffic

Network operators of large networks use NetFlow data to analyze attacks and network usage. These datasets do not contain payload information, but do contain communication flows between hosts. We used an anonymized 24-hour dataset with about 200 million NetFlow records collected at the core routers. The data is stored in a database and visually explored with ClockMap. The visual analysis focuses only on the records describing the outgoing traffic of all 6048 hosts belonging to our /16 IPv4 address block which were active on that particular day. Figure 3.3 shows the upper part of the visualization. The analyst is interested in the highlighted subnet, because it has three hours (visible as deep red sectors) in which much more traffic is transferred than usual. The total traffic originating from this subnet was 94.4 GiB. The tooltips show that at most times of the day the transferred volume ranges only from ten to a few hundred megabytes. The analyst selects this /24 subnet node and zooms in. The visual representation of this particular subnet clockeye changes to show all hosts belonging to it (shown as the highlighted circle in Figure 3.3). This immediately shows that there is indeed a single host responsible for most of the traffic. It is up to the analyst to decide whether such nightly data transfers of an individual host in that particular subnet are legitimate or not. However, the visualization clearly shows that, compared to the other hosts in this group, this is indeed uncommon behavior. Figure 3.2 shows another very prominent pattern, which can be spotted in the ClockMap visualization. The subnet (which is shaped like a pac-man) reveals a strange time-series pattern: there was no traffic at all during night hours. This looks suspicious to the analyst. Zooming into this subnet reveals more details in Figure 3.4. This form of details on demand is implemented using semantic zooming. After a user-defined zooming threshold, the time-series for all underlying hosts become visible instead of the previously shown aggregated subnets. Such a pattern could be a network outage or indicate a broken switch in the building where the physical machines are located. However, in this case the pattern is legitimate, because the subnet is known to be a wireless network subnet, which is not in use during night time.

3.3 Discussion

The layout of glyphs is often determined by coordinate systems or matrix layouts. [10] use a matrix representation to position IP addresses in a meaningful way. Compared to such matrix layouts, ClockMap has several advantages. Matrix representations cannot convey the hierarchy in an intuitive way. The circular treemap layout instead makes the hierarchy obvious, because it is visualized through nested circles. Another advantage

Figure 3.3: ClockMap visualization showing the network traffic of a large number of subnets. Each glyph circle represents a 24-hour time-series of either a subnet or an IP address, depending on the semantic zoom level. They are laid out according to a circular treemap algorithm. Color is mapped to the amount of network traffic in bytes.

is that the aspect ratio does not change in ClockMap. We use circles, which can be further explored interactively with techniques like zooming and panning. The integration of semantic zooming helps to smoothly switch between general overviews and detailed time-series analysis. Both approaches are overlap-free, while the free arrangement in ClockMap results in a tighter packing of the glyphs and thus makes the approach slightly more scalable. In addition, the tight packing better supports the user in visually comparing the shapes and color distributions of neighboring hosts in one branch of the displayed tree. Consequently, outliers with a different behavior in the group can be spotted pre-attentively. The clockeye glyph has the advantage of using a common real-world metaphor. Everyone knows how to read a clock, which helps the user to identify particular hour values within the time-series. Visualizing time-oriented

Figure 3.4: Underlying hosts of a very prominent subnet outlier having no night time traffic.

data effectively is important and non-trivial, and has led to a large variety of different visualization techniques. A systematic overview can be found in [2]. However, it is even harder to visualize hundreds of different time-series simultaneously. Clockeye glyphs are very compact, and general trends or patterns can be distinguished even on a very small scale. This helps to provide a scalable way to represent hundreds of time-series, and even more when grouped within a hierarchy.

There are also drawbacks of our visualization technique, which are implicit by design. Circular treemaps are not space-filling. This means that, at least compared to rectangular treemaps, space is wasted. However, compared to a matrix representation, this is not necessarily the case, because nodes are packed tightly together while still conveying the hierarchy information. The ordering within a group of the circular layout is also challenging and non-intuitive. This drawback can be overcome to a certain degree by interaction and tooltips. While comparison of shape and color distribution in circular layouts is effective, the comparison of the area of the circles is not. Additionally, the higher level circles only approximately match the aggregated size of their descendants.

Consequently, the visualization is probably less precise with respect to these attributes. Clockeyes use color to represent the data values, which makes it hard to compare the values precisely; length-encoded glyphs would be better in this respect. The basic design idea of clockeyes uses a clock metaphor. Obviously, this metaphor can no longer be applied if an arbitrary time-series length is used. This means that a clockeye glyph is best suited for 12-hour or 24-hour time-series. Other time-series lengths are less intuitive, but still possible from a technical point of view.

4 VisTracer – Visual Correlation of Network Entities and Time for BGP Routing Analysis

The content of this chapter was originally published in our paper [5] funded by the VIS-SENSE project. If you wish to cite this content or parts of it, please reference the paper and not this deliverable.

Routing is a fundamental concept in the Internet. Correct path announcements are important to reach the correct destination servers. Despite its importance and the severe consequences of routing issues, the responsible Border Gateway Protocol (BGP) is quite vulnerable. Announcing malicious routing paths can be used to hijack IP blocks. As a result, the attacker can conduct malicious activities from legitimate IP addresses. Distribution of vast amounts of spam is a scenario where the misuse of legitimate IP prefixes helps the attackers to circumvent widely used IP-based blacklists. The focus of this work is the large-scale analysis and exploration of routing anomalies for IP addresses starting to send spam in the Internet. This is achieved by actively tracking and measuring the traceroutes to the origin IP addresses over longer periods of time to eventually monitor possibly malicious path changes. Because of the vast amount of trace data with their changing underlying BGP routes, it is not helpful to just visualize the raw data. Instead, it is important to first identify anomalies algorithmically. The tight integration of visual displays can then be used to get an overview for quick ad-hoc analysis, to identify noteworthy events and to differentiate them from false positives. The proposed visualizations help to gain deep insights and visually explore the events within their context of historic and related anomalous traceroutes. Furthermore, the analysts can push their findings back to the system. This feedback can then be used to further improve the underlying anomaly detection algorithms.

The three main contributions of this chapter are:

1. a visual analytics tool called VisTracer to analyze large-scale traceroute data,

2. the integration into our large-scale automatic analysis system and

3. novel glyph- and graph-based summary visualizations for traceroutes.


Additionally, we present an in-depth discussion of recent case studies for suspicious routing anomalies with respect to spam activities.

The remainder of this chapter is organized as follows. In Section 4.1 we describe the data infrastructure and the anomaly detection of our analysis approach. In Section 4.2 we present the proposed visual analytics tool, and in Section 4.3 we discuss real case studies to evaluate the system.

4.1 Data Infrastructure

Manipulating the Internet routing infrastructure to hijack a block of IP addresses involves modifying the route taken by data packets so that they reach the physical network of the attacker. A system called Spamtracer [17] has been developed to monitor the routes towards malicious hosts by performing traceroute measurements repeatedly for a certain period of time. IP-level routes are translated into AS-level routes using live BGP feeds. The motivation for monitoring data plane routes towards specific hosts involved in spam campaigns is to collect the route taken by data packets to reach these hosts as soon as spam is received from them. By performing multiple measurements on consecutive days for a certain period of time, typically one week, routes towards a given host or network can be compared and analyzed in depth to find evidence of a possible manipulation of the routing infrastructure by an attacker.

This system is based on a linear data flow where a feed of IP addresses to monitor is given as input and a series of enriched traceroute paths is produced as output, from which abnormal patterns can be uncovered. The incoming feed of IP addresses is retrieved from Symantec.cloud¹ spamtraps. This data is enriched with IP traceroutes. A customized version of the classic traceroute function is implemented and takes advantage of ICMP, UDP and TCP packets to increase the likelihood that the probes reach the hosts. Due to the many artifacts that can be found in IP-level traces, we also build the AS-level routes. The IP-to-AS mapping is performed using live and distributed BGP feeds from RouteViews² to obtain mappings that are as accurate and complete as possible. Additionally, information about the different hosts, AS owners, IP networks and geolocations is collected.

¹ http://www.symanteccloud.com/
² http://www.routeviews.org/
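The translation of IP-level traces into AS-level routes can be illustrated with a small longest-prefix-match lookup. This is only a minimal sketch and not the Spamtracer implementation: the routing table `ROUTES`, the function names and the use of documentation prefixes and private-use AS numbers are assumptions made for the example, whereas the real system derives its mapping from live BGP feeds.

```python
import ipaddress

# Hypothetical routing table: prefix -> origin AS. Documentation prefixes
# and private-use AS numbers are used here, not real routing data.
ROUTES = {
    "198.51.100.0/24": 64500,
    "198.51.100.128/25": 64501,
    "203.0.113.0/24": 64502,
}


def ip_to_as(ip, routes=ROUTES):
    """Return the origin AS of the most specific prefix covering ip, or None."""
    addr = ipaddress.ip_address(ip)
    best_net, best_as = None, None
    for prefix, asn in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_as = net, asn
    return best_as


def as_level_path(ip_hops, routes=ROUTES):
    """Translate an IP-level traceroute into an AS-level path.

    Hops that cannot be mapped (e.g. None for a non-responding router) are
    skipped, and consecutive hops in the same AS are collapsed into one entry.
    """
    path = []
    for hop in ip_hops:
        asn = ip_to_as(hop, routes) if hop is not None else None
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path
```

Collapsing consecutive hops of the same AS is one simple way to reduce the IP-level artifacts mentioned above; a production system would also need to handle unmapped hops more carefully.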


4.1.1 Extracting Routing Anomalies

Based on the collected data, we analyze the routes to uncover abnormal routing changes and classify them as benign or malicious. Routing anomalies are extracted independently for every monitored IP address. Two approaches are used: the first focuses on extracting routing anomalies from BGP hijacking scenarios, while the second searches for suspicious patterns based on different metrics.

To identify malicious BGP hijacks, we start from existing scenarios of BGP hijacking [7] for which we know the resulting routing anomalies. However, it has to be considered that such routing anomalies can also result from benign BGP routing practices, e.g., multi-homing of customer ASes by ISPs, or from non-malicious incidents due to misconfiguration or operational errors.

Prefix Ownership Conflicts occur when a block of IP addresses appears in the Internet routing infrastructure as originated by multiple ASes. This routing behavior can be the result of a hijacker advertising someone else's IP space in order to attract traffic to or originate traffic from that IP space. Advertising the same prefix is a possible way of BGP hijacking if the IP prefix is already advertised by a different AS. This technique creates a routing anomaly referred to as Multiple Origin AS (MOAS). Announcing a slightly different prefix, which can be more specific (i.e., longer) or less specific (i.e., shorter), can also be used for tampering with the ownership of a given IP prefix. In this case, we refer to the anomaly as a Sub Multiple Origin AS (subMOAS).

BGP AS Path Anomalies occur when the location of a network in the Internet AS topology changes. As a result of a BGP hijack, it is likely that the sequence of ASes traversed from two different points will change. Significant changes in the BGP AS paths should be investigated to determine if they are indeed benign or if they result from a malicious manipulation of the routing infrastructure. The Next-Hop AS anomaly can be observed with a certain number of different next-hop ASes, i.e., ASes next to the origin AS in an AS path, for a given origin AS and BGP collector. A Complete AS Path anomaly consists of observing a significant change in the AS paths for a given origin AS and BGP collector.
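The two Prefix Ownership Conflicts can be made concrete with a short sketch. It assumes that announcements are available as hypothetical `(prefix, origin_as)` pairs and uses a simplified subMOAS criterion (a more specific prefix whose set of origin ASes differs from that of a covering prefix); the function name and the input format are illustrative and not taken from Spamtracer.

```python
import ipaddress
from collections import defaultdict


def prefix_ownership_conflicts(announcements):
    """Find MOAS and subMOAS conflicts in (prefix, origin_as) announcements.

    Returns (moas, submoas):
      moas    - {prefix: set of origin ASes} for prefixes with several origins
      submoas - list of (more_specific, less_specific) prefix pairs whose
                origin AS sets differ (simplified criterion, an assumption)
    """
    origins = defaultdict(set)
    for prefix, asn in announcements:
        origins[ipaddress.ip_network(prefix)].add(asn)

    # MOAS: the very same prefix is originated by more than one AS.
    moas = {str(p): ases for p, ases in origins.items() if len(ases) > 1}

    # subMOAS: a longer prefix nested in a shorter one, with different origins.
    submoas = []
    nets = sorted(origins, key=lambda n: n.prefixlen)
    for i, short in enumerate(nets):
        for long in nets[i + 1:]:
            if (long.version == short.version
                    and long.prefixlen > short.prefixlen
                    and long.subnet_of(short)
                    and origins[long] != origins[short]):
                submoas.append((str(long), str(short)))
    return moas, submoas
```

As noted above, a conflict found this way is only a candidate: multi-homing or a provider-customer relationship between the ASes involved can produce the same pattern.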

The second approach searches for suspicious patterns in traceroutes based mostly on metrics already used in previous works [19, 15]. Traceroute Destination Anomalies refer to suspicious values in features related to traceroute metadata. The Host/AS reachability anomaly describes the case in which a destination host or AS towards a given IP address is reachable (unreachable) for a certain number of days during the monitoring period, suddenly becomes unreachable (reachable), and remains so until the end of the monitoring period. This reachability anomaly can result from a major routing change which causes the destination host or AS to become (un)reachable. The hop count, or the length of a traceroute path, is the value of the last TTL for which a reply to our probe IP packets has been received. The hop count anomaly is the consequence of a significant and sudden change in the hop count. This situation suggests that an important routing change occurred and permanently changed the route taken by packets to reach the destination network.

Traceroute Path Anomalies refer to suspicious changes in the sequence of hops traversed by traceroute paths to a given destination host. Using the different features collected for IP/AS hops, we can consider a traceroute not only as a sequence of IP addresses or ASes, but also as a sequence of countries, domain names, RIRs, etc. These alternate paths are leveraged in the detection of suspicious traceroute paths. The AS-level Path Anomaly consists of observing a significant change in the AS-level paths towards a given IP address. Country-level Path Anomalies are observed by extracting traceroute paths towards a given host that exhibit significant discrepancies in the sequence of traversed countries. This assumes that the countries traversed to reach a given destination from a given source are likely to remain constant even if routing changes occur at the IP or AS levels.
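The two Traceroute Destination Anomalies can be sketched as simple checks over one value per daily traceroute. The thresholds `min_stable` and `threshold` and the function names are assumptions chosen for illustration; the source does not state which values or criteria Spamtracer uses.

```python
def reachability_anomaly(reached, min_stable=2):
    """Detect a single, lasting reachability flip.

    reached: list of booleans, one per daily traceroute, in time order.
    Returns the index of the day on which reachability flips if the state
    was stable for at least min_stable days, flips exactly once and then
    stays flipped until the end of the monitoring period; otherwise None.
    """
    flips = [i for i in range(1, len(reached)) if reached[i] != reached[i - 1]]
    if len(flips) != 1:
        return None
    day = flips[0]
    return day if day >= min_stable else None


def hop_count_anomaly(hop_counts, threshold=3):
    """Return the index of the first day whose hop count differs from the
    previous day's by at least threshold hops, or None if there is none."""
    for i in range(1, len(hop_counts)):
        if abs(hop_counts[i] - hop_counts[i - 1]) >= threshold:
            return i
    return None
```

For example, a host that is reachable on days 0 to 2 and unreachable afterwards yields a reachability anomaly on day 3, while a host whose reachability flips back and forth does not.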

4.2 The VisTracer System

The continuously growing Spamtracer database can be accessed by the analyst using our visual exploration tool called VisTracer. The graphical user interface is built to satisfy the needs of experienced analysts by providing an overview linked to more detailed visualizations, which helps to solve the different analysis tasks. The individual views can be placed according to the user's preference or adjusted to the working environment, which is important when the tool is used in multi-display environments.

The general workflow of VisTracer is inspired by Shneiderman's information seeking mantra of having the overview first and then focusing on certain areas of interest to retrieve additional details [13]. The overall graphical user interface is shown in Figure 4.1. The left panel (1) provides a tabular anomaly view with all anomalies that occurred. To investigate specific cases, a filter box is integrated for quick ad-hoc queries. Using different constraints (2) for anomaly types and subtypes, the user can focus on the different classes and combinations of anomalies. Based on the given constraints, the ASN Overview (3) provides an overview of all anomalies using a visual representation. Findings can be stored in the database using the feedback panel (4), which can be used to annotate anomalies and comment on findings to make them accessible for other analysts. The right panel (5) provides tabular access to all destination targets with their traceroutes. Selecting entries in any of the tables will update the loaded visualizations for further investigation. A zoomable geographic map (6) is included to visually present the currently selected AS path. The Visual Traceroute Summary (7) is a compact visual representation, while the target graph visualization (8) can be used to get an in-depth overview of the temporal connections based on a graph-based approach.

Figure 4.1: Graphical user interface of the VisTracer visual analytics tool.

4.2.1 ASN Overview

The main starting point for an exploratory analysis is to monitor different ASes and the occurring anomalies over time. Therefore, a zoomable matrix layout has been chosen as the basis for the visual marks shown in Figure 4.1 (3). The x-axis encodes the time and the y-axis the different destination ASes of traceroutes. By default, the ASes are ordered according to the total number of anomalies, while other sorting algorithms might be more appropriate for finding common patterns and correlations between different ASes. Because multiple anomalies of different types can occur at specific points in time, rectangular glyphs are used to encode this additional information. Glyphs have the advantage of showing multiple data dimensions in a space-efficient, compact way. Each glyph has a fixed size and consists of several colored vertical stripes. Each colored stripe encodes one type of anomaly. The stripe width is proportional to the number of daily anomalies for the respective event type. We decided to choose this additional size encoding to emphasize the most prominent anomaly types in the overview, especially when they spread over longer periods of time. The stripe's color encoding is based on a qualitative color scale provided by Colorbrewer³ and helps to visually distinguish between the different kinds of anomalies. As a result, ASes with recurring anomalies can be detected through the characteristic colored patterns they produce. To further focus on the "hot spots" with lots of anomalies, opacity is used to encode the overall number of anomalies that occurred. Figure 4.2 (a) shows a closeup of such a single anomaly glyph.

Figure 4.2: Three glyphs used in the visualizations.

An AS-based normalization is used to avoid artificially promoting heavily used large ASes. Suspicious ASes can be further investigated through double clicking on the different rectangles, which updates the different views and tables to provide more details on demand.
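The glyph encoding described above can be sketched as a small layout computation for one AS on one day. The exact normalization used in VisTracer is not specified in this text, so the sketch assumes that opacity is the day's total divided by the busiest day of the same AS; `glyph_layout`, `glyph_width` and `as_max_total` are hypothetical names.

```python
def glyph_layout(daily_counts, glyph_width=20.0, as_max_total=None):
    """Compute stripe widths and opacity for one anomaly glyph.

    daily_counts: {anomaly_type: count} for one AS on one day.
    as_max_total: largest daily anomaly total seen for this AS (assumed
                  AS-based normalization); defaults to this day's total.
    Returns (stripes, opacity), where stripes is a list of
    (anomaly_type, width) pairs whose widths sum to glyph_width.
    """
    total = sum(daily_counts.values())
    if total == 0:
        return [], 0.0
    # Stripe width is proportional to the share of each anomaly type.
    stripes = [
        (kind, glyph_width * count / total)
        for kind, count in sorted(daily_counts.items())
        if count > 0
    ]
    peak = as_max_total if as_max_total else total
    opacity = min(1.0, total / peak)
    return stripes, opacity
```

Normalizing per AS rather than globally means that a small AS with an unusually busy day is drawn as prominently as a large AS on its busiest day, which matches the stated goal of not promoting heavily used large ASes.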

4.2.2 Target History Visualization

Traceroutes to the same destination can be investigated in the Target History Visualization. The main idea of this visualization is to provide a visual traceroute summary that shows hop usage variances of single traceroutes to the same target. Therefore, the x-axis encodes the individual hops and the y-axis the traceroutes on the different days. Whenever a hop is used within a traceroute, a small glyph is placed accordingly. This rectangular glyph encodes the country code of the hop with a small label and a colored bar. With the help of this colored bar, connections within the same country can be spotted preattentively. The main color of the glyph reflects whether the traceroute was complete (green) or incomplete (gray). This prominent feature is visible at first sight because it is considered of high importance. Additionally, brightness is used to encode the latency of the individual hops. A closeup of this glyph can be seen in Figure 4.2 (b). At the end of each traceroute row, a small anomaly container is placed. The container represents the four main types of anomalies with equally sized rectangles. These rectangles are further divided into smaller rectangles representing the subtypes. Whenever a type/subtype combination is found in a traceroute, the corresponding rectangle is colored. Thus, anomalies lasting for a longer period can be easily detected as a recurring pattern over many traceroutes. Suspicious traceroutes with lots of anomalies show several colored rectangles and are therefore easy to spot. Examining the anomalies in combination with the used hops and the completeness of the traceroutes over time can lead to relevant findings and helps the analyst to understand the traceroutes. This visualization is especially effective for getting an overview of the hops being used in the different traceroutes.

³ http://colorbrewer2.org/

4.2.3 Temporal Graph Representation

The previous visualization does not focus on following the exact routes used or on identifying the most common route in the correct order. To support this task, an additional graph visualization is provided. The graph layout is extended with an additional glyph encoding to show routing changes over time. The nodes represent the different hops, and the edges show the connections between them. The width of an edge depends on the number of traces using this exact connection. The nodes are visualized by circular glyphs with equally sized slices and small flags reflecting the country of the hop, as can be seen in Figure 4.2 (c). Because of their aspect ratio, the circular glyphs can be directly integrated into the graph nodes without wasting additional space for this temporal information or requiring disturbing and more time-consuming animation. The number of slices depends on the number of traceroutes shown in the graph. When a hop was used in a traceroute, the respective slice is colored; otherwise it is not displayed at all. The color depends on whether the traceroute reaches its destination or not. This encoding supports the analyst in detecting the main route (i.e., based on the path's width), the usage of hops (i.e., the proportion of colored slices), the reachability of the destination (i.e., the hue of the colored slices) and the temporal development of the route (i.e., the partition of the slices). Additionally, the geographic location of the corresponding country can be taken into account in the graph's layout to highlight possible route flapping between different countries. To focus on the main route, we additionally propose an Enhanced Baseline Layout, which displays the most common path at the bottom. Hops that are not part of the baseline are arranged in a force-directed way above the baseline.
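The data behind this graph encoding can be derived from the traceroutes with a few aggregations, sketched below. The sketch assumes each traceroute is given as a list of hop identifiers plus a flag saying whether it reached its destination, and it interprets "most common path" as the most frequently observed complete hop sequence; both are assumptions, as are all function names.

```python
from collections import Counter


def edge_weights(traceroutes):
    """Count how many traceroutes use each directed hop-to-hop connection."""
    weights = Counter()
    for path in traceroutes:
        # set() ensures an edge is counted at most once per traceroute.
        weights.update(set(zip(path, path[1:])))
    return weights


def node_slices(traceroutes, reached):
    """For each hop, build one slice per traceroute.

    A slice is "reached" or "failed" (the traceroute's outcome) if the hop
    was used in that traceroute, and None (slice not drawn) otherwise.
    """
    hops = {hop for path in traceroutes for hop in path}
    return {
        hop: [
            ("reached" if ok else "failed") if hop in path else None
            for path, ok in zip(traceroutes, reached)
        ]
        for hop in hops
    }


def baseline_path(traceroutes):
    """Return the most frequently observed complete path (assumed baseline)."""
    return list(Counter(map(tuple, traceroutes)).most_common(1)[0][0])
```

The edge weights map directly to edge widths, the per-hop slice lists to the circular node glyphs, and the baseline path to the bottom row of the Enhanced Baseline Layout.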

Combining the different views or looking at them individually supports the user in the different analysis tasks. To evaluate the tool's effectiveness, the following section describes the analyst's workflow and how the visualizations help.

4.3 Case Studies

In this section we describe how suspicious routing events are identified and how the VisTracer framework reflects this workflow in order to assist the analyst. We also present two case studies of routing events identified as suspicious using the developed visualization tool.

4.3.1 Visual Analysis Workflow

The steps involved in the analysis of the network traces collected by Spamtracer are depicted in Figure 4.3. The figure also shows where in the workflow the visualizations can assist the analyst in examining the data. The analysis is based on (i) automatically extracting routing anomalies from the traces as described in Section 4.1, (ii) selecting the monitored hosts having a meaningful set of anomalies, and (iii) investigating cases using all the collected data to identify the suspicious cases. The result of the investigation of a case is finally reported back to the database.

VisTracer supports the Selection of Candidate Suspicious Cases by providing a graphical user interface to filter for anomalies which match a given set of constraints on the type, the number and the time of appearance of the anomalies. These correspond to the most likely suspicious cases. This step is associated with the ASN Overview Visualization, which allows the analyst to define the constraints on the anomalies and then explore the resulting set of targets aggregated at the AS level. The Investigation of Candidate Suspicious Cases consists of examining the suspicious cases with the help of the collected traces as well as some external routing information services to determine if a case is benign or if it results from a malicious BGP hijack. When investigating a case, the Graph and Target History Visualization as well as the traceroute hop list provide the analyst with all the data available to determine whether the routing anomalies observed reflect a malicious routing behavior. To communicate and further make use of the findings, the tool also focuses on Reporting of Investigation Results. The feedback loop embedded in VisTracer allows the analyst to share the result of the investigation with other analysts.

[Figure 4.3 diagram. Boxes: Spamtracer Database; Routing Anomalies Extraction; Selection of Candidate Suspicious Cases; Investigation of Candidate Suspicious Cases; Report of Investigation Result. Supporting views: ASN Overview; Graph Visualization and Target History Visualization; User Feedback. Steps are labelled (i) to (iv).]

Figure 4.3: Overview of Visual Analysis Workflow.

The Spamtracer data set used to produce the two case studies contains traceroutes collected from April 2011 until the end of August 2011. In total, 848,916 data plane routes were collected towards 239,907 IP addresses and 5,912 ASes. After the routing anomalies were extracted from the traces, 41,430 destination IP addresses were found to have at least one anomaly. Given the high number of cases exhibiting at least one anomaly, we decided to focus on cases having the following combinations of anomalies:

• BGP Origin & BGP or Traceroute Path anomalies: Select cases exhibiting a Prefix Ownership Conflict with a significant change in the BGP or Traceroute AS path.

• BGP Origin & Traceroute Destination anomalies: Select cases exhibiting a Prefix Ownership Conflict with either an IP/AS reachability change or a significant data plane route length change.

• Traceroute Destination & BGP or Traceroute Path anomalies: Select cases exhibiting a significant change in the BGP or Traceroute AS Path with an IP/AS reachability change or a significant data plane route length change.


We have thus applied these filters in the Traceroute Anomalies panel of VisTracer to focus our analysis on these cases.
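The three combinations listed above amount to requiring at least two of three anomaly groups (BGP Origin, Path, Traceroute Destination) for a monitored host. A minimal sketch of such a filter follows; the category labels, function names and the per-host input format are assumptions for illustration and do not reflect how the VisTracer panel is implemented.

```python
# Hypothetical category labels for the anomaly groups used in the filters.
BGP_ORIGIN = "bgp_origin"
BGP_PATH = "bgp_path"
TR_PATH = "traceroute_path"
TR_DEST = "traceroute_destination"


def is_candidate(categories):
    """Return True if a host's anomaly categories match one of the three
    combinations: origin+path, origin+destination or destination+path."""
    origin = BGP_ORIGIN in categories
    path = BGP_PATH in categories or TR_PATH in categories
    dest = TR_DEST in categories
    return (origin and path) or (origin and dest) or (dest and path)


def select_candidates(cases):
    """cases: {target_ip: set of anomaly categories}. Returns sorted targets."""
    return sorted(target for target, cats in cases.items() if is_candidate(cats))
```

Note that a host with only BGP and Traceroute Path anomalies is not selected, since both belong to the same group in the combinations above.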

4.3.2 Analysis of Suspicious BGP Anomaly

The first case study presents the visual analysis of a network whose traffic was apparently hijacked by another AS. We show how such a case can be uncovered and investigated using the visualizations and other information provided by VisTracer.

From the ASN Overview visualization, one particular case caught our attention, which can be seen in Figure 4.4. Two ASes appeared to share several anomalies, which occurred on the same day. The visualization allows the analyst to extract such time correlations between anomalies in different ASes thanks to the ASN and time dimensions. Looking at the anomalies extracted for the two ASes reveals (i) a Traceroute Destination Anomaly (related to the destination AS reachability), (ii) Traceroute Path Anomalies, (iii) BGP Path Anomalies (AS Path Deviation) and (iv) a BGP Origin Anomaly (related to a subMOAS conflict).

Figure 4.4: Closeup of the ASN Overview showing two nearly identical anomaly distributions for two different ASNs at the same point in time.

We can make use of the Target History Visualization to have a first view of the traceroute paths and the uncovered routing anomalies. Figure 4.5 shows the set of IP hops traversed by traceroutes from the vantage point in France to the destination host throughout the monitoring period. From this visualization we can say that there is a noticeable change in the set of traversed IP hops between the third and the fourth traceroute. The six routing anomalies uncovered for these traceroutes on the fourth day confirm that a major routing change occurred. In this case, a change in the origin AS of the destination IP prefix occurred at the same time as a change in the sequence of ASes traversed both in the traceroutes and in the BGP AS paths. The BGP Origin Anomaly, in the third column, has been marked as benign (green) by Spamtracer, because the two conflicting ASes were found to have a provider-customer relationship.

Figure 4.5: Target History Visualization of case study 1. The visualization shows the significant difference in the ASes traversed between the third and fourth day. The routing anomalies observed on the fourth day are also shown.

To further investigate the case, we make use of the Graph Visualization, which is presented in Figure 4.6 for the same monitored host. The Graph Visualization allows the analyst to look at the IP-, AS- or Country-level traceroute paths, i.e., the sequence of IP hops, ASes or countries traversed. While the AS-level graph is particularly well suited to investigate abnormal changes in inter-domain routing, the IP- or Country-level graphs can also be leveraged to investigate routing anomalies; the levels are complementary. It is thus interesting to start from the high-level view of the Country-level graph and go down the levels to analyze specific parts of the routes in more detail. In the present case we decide to make use of the AS-level graph to compare the sequence of traversed ASes before and after the change of origin AS. The origin and destination AS before the change belongs to a backbone ISP, which advertises an aggregated IP prefix including the destination IP prefix. The unreachability of the destination AS after the change can be observed on day four and correlated with the Traceroute Destination Anomaly seen on the same day in the Target History Visualization. Moreover, the last AS that could be reached by traceroutes also appears in the collected BGP AS paths as the next-hop AS, i.e., as the direct upstream provider, of the new origin AS. This provider-customer relationship could not be officially explained. Hijacking a network can actually be performed by advertising it with a correct origin AS and by putting the attacking AS as the next-hop AS.

Figure 4.6: The Graph Visualization shows the significant difference in the ASes traversed between the third and fourth day.

After the investigation, it turns out that the next-hop AS belongs to a company providing DDoS mitigation as a service by sinkholing the attacking traffic of their customers. The analysis suggests that either the security company redirected the traffic of their customer's AS because they were under attack, or the security company may sometimes act as an ISP for some companies' ASes to easily protect them from undesired traffic. Given the fact that the security company advertised the route in BGP for at least three days, we believe that it actually acted as an ISP for its customer.

Although we have detected abnormal routing changes regarding this network, it is quite difficult to validate these anomalies as a real hijack case without feedback from the owner of the network.

4.3.3 Link Telecom BGP Hijack

This second case study presents the visual analysis of a validated BGP hijack performed by a spammer to send spam from the stolen IP address space. The hijacking spammer phenomenon has already been observed in [12, 7] and consists of spammers taking control of unused IP address space in order to send spam from clean, non-blacklisted IP addresses.

From the ASN Overview (Figure 4.7), AS31733 caught our attention, because many diverse routing anomalies occurred within a limited period of time. Moreover, several anomalies occurred on the same day, which reinforced the idea that a major routing change occurred at that time for this AS. The uncovered anomalies related to AS31733 include (i) Traceroute Destination Anomalies (related to the destination host and AS reachability), (ii) Traceroute Path Anomalies and (iii) BGP AS Path Anomalies (AS Path Deviation).

Figure 4.7: The ASN Overview of AS31733 reveals many different anomalies over a longer period of time.

Figure 4.8 presents the Target History Visualization of a monitored host within AS31733 exhibiting a combination of Traceroute Destination Anomalies, Traceroute Path Anomalies and BGP AS Path Anomalies. It shows the set of ASes traversed by traceroutes from the vantage point in France to AS31733 throughout the monitoring period. We can clearly see that the set of traversed ASes changes significantly. By looking at the anomalies extracted for that case, we can also see that all anomalies were observed on a particular day, i.e., just after the change in the traceroute path. The set of IP hosts traversed by the traceroutes shows the exact same behavior. From these observations we can say that the location of the monitored AS in the Internet AS topology changed significantly.

Figure 4.8: The Target History Visualization shows the significant difference in the set of ASes traversed between the fourth and fifth day. The routing anomalies observed are also shown.

Figure 4.9 presents the Graph Visualization of the same monitored host within AS31733. This visualization shows the sequence of IP hops, ASes or countries traversed by the traceroutes. In this case, looking at the Country-level paths would show that packets always seem to go through the US to go from a source in France to a destination in Russia. While this routing behavior can be considered abnormal, we also know that some big ISPs, i.e., backbone ISPs, are spread across continents and may introduce US hops in a European route. If we now look at the AS-level graph, we can see that the US ISPs Level-3 (AS3356) and Internap (AS12182) both appear in the routes. Besides being a backbone ISP, Level-3 also appears in every traceroute during the monitoring period. However, Internap only appears in the first traceroute, before the routing change. To get more details about the traceroute going through AS12182 Internap, we can have a look at the IP-level graph. The graph reveals that the first traceroute goes through two routers of AS12182 apparently located in the US and then directly ends in AS31733, apparently located in Russia. This suggests that the destination host currently using an IP of AS31733 is likely located in the US instead of Russia. Furthermore, the visualization also shows that the destination host and AS could not be reached from the fifth day until the end of the monitoring period. This observation is corroborated by the Traceroute Destination Anomalies (related to the host/AS reachability) uncovered on the fifth day. All this suggests that the routing change observed caused the destination host and AS to become unreachable.

After the investigation, it turns out that on August 20th 2011 the network administrator of the Russian telecommunication company "Link Telecom", to which AS31733 belongs, complained on the North American Network Operators' Group (NANOG) mailing list that his network had been hijacked by a spammer [1]. On both August 25th and August 29th 2011, changes were observed in the traceroutes and BGP routes towards AS31733. These changes were the result of the owner regaining control over his network. In this case, the aggregation in the ASN Overview of the routing anomalies extracted for the individual monitored hosts within their AS actually uncovered the pattern of several diverse and temporally close routing anomalies.

Figure 4.9: The Graph Visualization shows the significant difference in the sequence of ASes traversed between the first and second day. It also highlights the unreachability of the destination AS after the routing change occurred.

This hijack case is further described in [14]. Although the prefix appeared to be announced by the correct origin AS, i.e., AS31733, it was routed via a US ISP called Internap (AS12182). During the period in which the network was under the control of the spammer, spam messages were received by Symantec.cloud honeypots. The hijack lasted for five months, from April 2011 until August 2011, and is a validated case of a hijacking spammer that managed to steal someone else's IP space and send spam from it.


5 Conclusions

The scope of this deliverable was to demonstrate how visual analytics methods can be used for visually correlating network security information. In short, the following descriptions summarize our core contributions.

1. Glyphs allow a scalable representation of the temporal behavior of many network hosts, as demonstrated in the ClockView system, and allow correlation with blacklist or IDS alerts through filters or detailed views.

2. To enable multi-level analysis of the temporal aspect of network traffic, the ClockMap system allows analysts to visually correlate entities at different levels (e.g. hosts with high-level prefixes).

3. VisTracer demonstrates how complexly structured network routing information can be analyzed in a visual analytics system. While relying on anomaly detection methods on the automatic side, a combination of glyph and graph representations was the means to enable visual correlation analysis.

These contributions are discussed in depth in the following three sections.

5.1 ClockView

In Chapter 2 we presented the network security tool ClockView. This tool displays the daily activity of a whole company network in a scalable glyph-based visualization. To provide this scalability, every single IP address is broken down into its subnet and host identifier to fit perfectly in a matrix layout. Each device is then represented by a round glyph subdivided into 24 different areas. This 24-hour clock metaphor shows the activity of a whole day by color coding each slice according to the amount of traffic at the corresponding hour. After detecting something suspicious in the overview, the analyst is able to dig deeper by investigating certain areas in a more detailed view. Different ordering, layout and analysis algorithms, like the traffic change of a single host over many days, support the user in exploring the dataset. If interesting findings are made, the user is able to save this discovery with its characteristics in a pattern to


enable a feedback loop. This pattern can then be applied to all the other visualizations to filter the views for the previously detected findings. Additionally, each predefined pattern can be used with other datasets to quickly scan the network at different times.
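The matrix placement and the 24-slice glyph described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ClockView's code: it assumes a /16 (class B) network in which the third octet is used as the subnet (matrix row) and the fourth octet as the host (matrix column), and it assumes flow records are given as `(ip, hour, bytes)` tuples. All function names are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def matrix_position(ip: str) -> Tuple[int, int]:
    """Return (row, column) for an IPv4 address in a /16 network.

    Assumption: row = third octet (subnet), column = fourth octet (host).
    """
    octets = ipaddress.IPv4Address(ip).packed
    return octets[2], octets[3]


def hourly_glyphs(flows: Iterable[Tuple[str, int, int]]) -> Dict[str, List[int]]:
    """Sum traffic per host into 24 hourly bins, one bin per glyph slice."""
    glyphs: Dict[str, List[int]] = defaultdict(lambda: [0] * 24)
    for ip, hour, nbytes in flows:
        glyphs[ip][hour % 24] += nbytes
    return dict(glyphs)


def slice_intensities(bins: List[int]) -> List[float]:
    """Scale hourly byte counts to [0, 1] so they can be mapped to a color scale."""
    peak = max(bins)
    return [value / peak if peak else 0.0 for value in bins]
```

Each host then has a fixed cell in the matrix and a list of 24 intensities that a renderer could map to the colors of the clock slices. Normalizing per host is only one possible choice; a global or logarithmic scale may be preferable when hosts with very different traffic volumes must be compared.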

The tool was tested with real anonymized NetFlows of a whole class B IP network to assure its operational suitability. The use cases described in section 2.3 verify the applicability of the software in a real environment with actual traffic data. This distinguishes ClockView from other tools developed only for research purposes.

To further push the operational usage, our future plans are to design a user study to improve the usability of the software and therefore simplify the explorative and analytical tasks of a network administrator. Furthermore, we intend to support more historical views to gain different perspectives on the data, since these are often crucial to detect anomalous traffic. Another issue we want to address is the generation of patterns that support better constraints in terms of time. We also plan to implement an adequate similarity measure between time series. Once this is completed, hosts in the Network Overview could, for example, be laid out by a clustering algorithm based on this similarity measure.

This prototype was published at VizSec 2011 [10].

5.2 ClockMap

In Chapter 3 we described a novel visualization technique called ClockMap for hierarchical time-series data. The technique combines a circular nested treemap layout with a circular glyph representation for time-series data and appears to be effective for comparative tasks on large amounts of hierarchically structured time-series data. When being used in combination with circular glyphs, the shape preserving property of circular nested treemaps seems to outweigh the known disadvantages of such treemap variants and facilitates comparative tasks within and across hierarchy levels.
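To make the idea of hierarchical time-series data concrete, the following sketch rolls per-host hourly series up to network prefixes, so that a prefix-level glyph can be compared with the host glyphs it contains. It is an illustration only: summing the series is an assumed aggregation, the function name `aggregate_to_prefix` is hypothetical, and each series is assumed to have exactly 24 hourly values.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def aggregate_to_prefix(
    host_series: Dict[str, List[int]], prefix_len: int
) -> Dict[str, List[int]]:
    """Sum 24-value hourly series of hosts into their enclosing prefix."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0] * 24)
    for ip, series in host_series.items():
        # strict=False lets ip_network mask the host bits of the address.
        prefix = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        bucket = totals[str(prefix)]
        for hour, value in enumerate(series):
            bucket[hour] += value
    return dict(totals)
```

Calling the function with `prefix_len=24` and then with `prefix_len=16` yields two levels of a hierarchy whose nodes all carry the same kind of 24-value series, which is the structure a nested layout with temporal glyphs needs.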

Since preliminary results of our experiments with the tool on network traffic data were promising, our next steps will be to generalize the basic idea of ClockMap in such a way that it can be applied to a wider range of datasets originating from different application fields. Furthermore, we plan to formally evaluate the effectiveness of the visualization in a user study and seek feedback from expert users. From such a study we expect to be able to judge which specific tasks of analysts can be improved with respect to both precision and performance when using the novel ClockMap representation.

This prototype was published at EuroVis 2012 [4].


5.3 VisTracer

In Chapter 4 we described a novel visual analytics tool called VisTracer to investigate routing anomalies and BGP hijacks with respect to spamming activities with the help of a large-scale traceroute collection system. To achieve this, we designed VisTracer according to the analysts’ needs to assist the workflow of analyzing the large-scale dataset. To be flexible enough for different tasks and to address a variety of analysis questions, several linked views and visualizations have been integrated. The effectiveness and usefulness of the tool for network security analysts was shown using two case studies.

In the future we will integrate different additions to further improve the usability of the tool. Regular usage of VisTracer by our analysts will also show which additional views should be integrated. To improve scalability of the graph representation, further layout improvements will be made to reduce possible clutter of traceroutes with very complex connections and to incorporate missing hops in the layout.

This prototype is accepted for publication at VizSec 2012 [5].

5.4 Future Work

We see a lot of potential to use the techniques of this deliverable in areas other than network security, since visual analytics for information correlation can also play an important role in domains such as business, finance, bioinformatics or movement analysis, to name just a few. Common to these fields is an abundance of information and the need for human judgment to take into account not only the modeled complexity, but also the background knowledge of the experts.


Bibliography

[1] Prefix hijacking by michael lindsay via internap. http://mailman.nanog.org/pipermail/nanog/2011-August/039381.html, August 2011.

[2] W. Aigner, S. Miksch, H. Schumann, and C. Tominski. Visualization of Time-Oriented Data. Human-Computer Interaction. Springer Verlag, 1st edition, 2011.

[3] C. A. Brewer. Colorbrewer - Color Advice for Maps.

[4] F. Fischer, J. Fuchs, and F. Mansmann. ClockMap: Enhancing Circular Treemaps with Temporal Glyphs for Time-Series Data. In M. Meyer and T. Weinkauf, editors, Proceedings of the Eurographics Conference on Visualization (EuroVis 2012 Short Papers), pages 97–101, Vienna, Austria, 2012.

[5] F. Fischer, J. Fuchs, P.-A. Vervier, F. Mansmann, and O. Thonnard. VisTracer: A Visual Analytics Tool to Investigate Routing Anomalies in Traceroutes. In Proceedings of the VizSec Symposium on Visualization for Cyber Security. ACM, 2012.

[6] D. Holten. Hierarchical edge bundles: Visualization of adjacency relations in hierarchical data. IEEE Transactions on Visualization and Computer Graphics, 12:741–748, September 2006.

[7] X. Hu and Z. M. Mao. Accurate Real-Time Identification of IP Prefix Hijacking. In Proceedings of the 2007 IEEE Symposium on Security and Privacy, SP ’07, pages 3–17, Washington, DC, USA, 2007. IEEE Computer Society.

[8] A. Inselberg and B. Dimsdale. Parallel coordinates: a tool for visualizing multi-dimensional geometry. In Proceedings of the 1st conference on Visualization ’90, VIS ’90, pages 361–378, Los Alamitos, CA, USA, 1990. IEEE Computer Society Press.

[9] D. A. Keim. Designing Pixel-oriented Visualization Techniques: Theory and Applications. IEEE Transactions on Visualization and Computer Graphics (TVCG), 6(1):59–78, January–March 2000.


[10] C. Kintzel, J. Fuchs, and F. Mansmann. Monitoring Large IP Spaces with ClockView. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, VizSec ’11, pages 2:1–2:10, New York, NY, USA, 2011. ACM.

[11] R. Marty. Applied security visualization. Addison-Wesley, 2008.

[12] A. Ramachandran and N. Feamster. Understanding the network-level behavior of spammers. In SIGCOMM ’06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications, pages 291–302, New York, NY, USA, 2006. ACM.

[13] B. Shneiderman. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In Proceedings 1996 IEEE Symposium on Visual Languages, pages 336–343. IEEE Computer Society, 1996.

[14] Symantec Corporation. Symantec Internet Security Threat Report. http://www.symantec.com/threatreport/, April 2012.

[15] M. Tahara, N. Tateishi, T. Oimatsu, and S. Majima. A Method to Detect Prefix Hijacking by Using Ping Tests. In APNOMS ’08: Proceedings of the 11th Asia-Pacific Symposium on Network Operations and Management, pages 390–398, Berlin, Heidelberg, 2008. Springer-Verlag.

[16] T. Taylor, D. Paterson, J. Glanfield, C. Gates, S. Brooks, and J. McHugh. Flovis: Flow visualization system. Conference For Homeland Security, Cybersecurity Applications & Technology, 0:186–198, 2009.

[17] P.-A. Vervier and O. Thonnard. Spamtracer: Using Traceroute To Tracking Fly-By Spammers (under review). In The 8th International Conference on emerging Networking EXperiments and Technologies, CoNEXT ’12, Nice, France, 2012. ACM.

[18] W. Willett, J. Heer, and M. Agrawala. Scented widgets: Improving navigation cues with embedded visualizations. IEEE Transactions on Visualization and Computer Graphics, 13:1129–1136, 2007.

[19] C. Zheng, L. Ji, D. Pei, J. Wang, and P. Francis. A light-weight distributed scheme for detecting IP prefix hijacks in real-time. In Proceedings of the 2007 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’07, pages 277–288, New York, NY, USA, 2007. ACM.
