Graph-based Event Correlation for Network Security Defense
by Patrick Neise
B.S. in Electrical Engineering, May 1999, The University of Texas at Austin
M.A. in Information Technology Management, June 2010, Webster University
M.S. in Information Security Engineering, June 2017, SANS Technical Institute
A Praxis submitted to
The Faculty of
The School of Engineering and Applied Science
of The George Washington University
in partial fulfillment of the requirements
for the degree of Doctor of Engineering
May 20, 2018
Praxis directed by
Thomas F. Bersson
Adjunct Professor of Engineering and Applied Science
The School of Engineering and Applied Science of The George Washington University certifies that
Patrick Neise has passed the Final Examination for the degree of Doctor of Engineering as of March
23, 2018. This is the final and approved form of the Praxis.
Graph-based Event Correlation for Network Security Defense
Patrick Neise
Praxis Research Committee:
Thomas F. Bersson, Adjunct Professor of Engineering and Applied Science, Praxis Director
Amirhossein Etemadi, Assistant Professor of Engineering and Applied Science, Committee Member
Ebrahim Malalla, Visiting Associate Professor of Engineering and Applied Science, Committee Member
Johannes Ullrich, Senior Instructor; Dean of Research, SANS Technical Institute, Committee Member
© Copyright 2018 by Patrick Neise
All rights reserved
Dedication
This praxis is dedicated to my wife and children. Thank you for your support and enduring the
long nights and missed weekends when I was unavailable. Stephanie, Addison, and Austin, thank
you for being there as I completed this important educational and career milestone.
Acknowledgements
I would like to acknowledge my advisors Drs. Bersson and Young for their support and feed-
back throughout the process. I would also like to thank Dr. Ullrich for his guidance and technical
advice during the research and Mr. Roy Luongo for participation as the third-party attacker during
the simulation phase of the research.
Abstract of Praxis
Graph-based Event Correlation for Network Security Defense
Organizations of all types and their computer networks are constantly under threat of attack. While
the overall detection time of these attacks is getting shorter, the average detection time of weeks
to months allows the attacker ample time to potentially cause damage to the organization. Current
detection methods are primarily signature based and typically rely on analyzing the available data
sources in isolation. Any analysis of how the individual data sources relate to each other is usually
a manual process, and most likely occurs as a forensic endeavor after the attack has been identified
via other means. The use of graph theory and the graph databases built to support its application
can provide a repeatable and automated analysis of the data sources and their relationships.
By aggregating the individual data sources into a graph database based on a model that supports
the data types and relationships, database queries can extract information relevant to the detection
of attack behavior within the network. The work in this Praxis shows how the graph model and
database queries reduce the overall time to detection of a successful attack by enabling defenders
to better understand how the data elements, and what they represent, are related.
Keywords: graph model, graph database, network intrusion, network security, event correlation
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Abstract of Praxis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Chapter 1 – Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 The Threat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 State of Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 A New Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Practical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Organization of Chapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Chapter 2 – Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Data Breaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Intrusion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Collect the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Defend the Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Chapter 3 – Problem Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1 Modular, Scalable, Distributed . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Container Based . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Centralized Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4 Data Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4.1 Bro Network Security Monitor . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4.2 Suricata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.3 osquery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4.4 Additional Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.5 Events to Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.6 The Graph Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.7 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.7.1 Bro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.7.2 Suricata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.7.3 osquery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.7.4 Complete Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Chapter 4 – Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Attack Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.2 Indicators of Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 Graph-Based Event Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3 Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.1 Collecting Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3.1.1 Kafka and Zookeeper . . . . . . . . . . . . . . . . . . . . . . . 32
4.3.1.2 Fluentd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1.3 Bro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1.4 Suricata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3.1.5 osquery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.2 Extracting, Transforming, and Loading Events . . . . . . . . . . . . . . . 39
4.3.3 Storing Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.4 Querying the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Chapter 5 – Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.1 Background Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.1.2 Vulnerable Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2.1 Scripted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2.2 Blind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3 Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3.1 Baseline Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3.2 False Positives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3.3 Scripted Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3.4 Blind Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Chapter 6 – Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.3 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.4 Practical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
List of Figures
Figure 3.1 – Complete Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Figure 3.2 – Bro Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Figure 3.3 – Suricata Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 3.4 – osquery Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Figure 4.1 – FireEye Cyber Attack Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Figure 4.2 – Software Architecture Overview . . . . . . . . . . . . . . . . . . . . . . . . . 31
Figure 4.3 – Example osquery results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Figure 4.4 – Example ArangoDB vertex document . . . . . . . . . . . . . . . . . . . . . . 41
Figure 4.5 – Example ArangoDB edge document . . . . . . . . . . . . . . . . . . . . . . . 42
Figure 5.1 – Simulation Environment Overview . . . . . . . . . . . . . . . . . . . . . . . . 45
Figure 5.2 – AQL Query of External Domains and IP Addresses . . . . . . . . . . . . . . . 51
Figure 5.3 – AQL Query of Inbound Network Connections and Domains . . . . . . . . . . . 52
Figure 5.4 – Suricata Alert from ArangoDB . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Figure 5.5 – AQL Query for Suricata Alert Analysis . . . . . . . . . . . . . . . . . . . . . . 54
Figure 5.6 – Visualization of Query Results . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Figure 5.7 – Scripted Attacks - NIDS Alerts Over Time . . . . . . . . . . . . . . . . . . . . 59
Figure 5.8 – Scripted Attacks - AQL Alerts Summary . . . . . . . . . . . . . . . . . . . . . 60
Figure 5.9 – Scripted Attacks - HTTP Service Query . . . . . . . . . . . . . . . . . . . . . 61
Figure 5.10 –Scripted Attacks - Web Server Outbound Query . . . . . . . . . . . . . . . . . 63
Figure 5.11 –Scripted Attacks - Web Server Outbound Results . . . . . . . . . . . . . . . . . 63
Figure 5.12 –Scripted Attacks - Command Query . . . . . . . . . . . . . . . . . . . . . . . 64
Figure 5.13 –Blind Attacks - Alerts Over Time . . . . . . . . . . . . . . . . . . . . . . . . . 66
Figure 5.14 –Blind Attacks - Alerts for 10.10.10.11 Query . . . . . . . . . . . . . . . . . . . 70
Figure 5.15 –Blind Attacks - Alerts for 10.10.10.11 Results . . . . . . . . . . . . . . . . . . 70
Figure 5.16 –Blind Attacks - sshuser Query . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 5.17 –Blind Attacks - sshuser Logins Query . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 5.18 –Blind Attacks - sshuser Logins Query Results . . . . . . . . . . . . . . . . . . 72
List of Tables
Table 4.1 – Bro conn.log data fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 4.2 – Bro http.log data fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 5.1 – Summary of Network Activity During Simulation . . . . . . . . . . . . . . . . . 46
Table 5.2 – Summary of Outbound Activity Prior to Attack Events . . . . . . . . . . . . . . 51
Table 5.3 – Summary of Inbound Activity Prior to Attack Events . . . . . . . . . . . . . . . 53
Table 5.4 – Connection Data Associated with Alert Under Investigation . . . . . . . . . . . 56
Table 5.5 – HTTP Session Data Associated With Alert Under Investigation . . . . . . . . . 56
Table 5.6 – Scripted Attack - Alerts Summary Query Results . . . . . . . . . . . . . . . . . 61
Table 5.7 – Scripted Attack - HTTP Service Results . . . . . . . . . . . . . . . . . . . . . . 62
Table 5.8 – Scripted Attack - Command Line Query Results . . . . . . . . . . . . . . . . . 64
Table 5.9 – Blind Attack - Alerts Summary Query Results . . . . . . . . . . . . . . . . . . . 67
Table 5.10 –Blind Attack - Command Line Query Results for 10.10.10.10 . . . . . . . . . . 68
Table 5.11 –Blind Attack - Command Line Query Results for 10.10.10.11 . . . . . . . . . . 69
Table 5.12 –Blind Attack - Command Line Query Results for 10.10.10.12 . . . . . . . . . . 69
Table 5.13 –Blind Attack - sshuser Query Results . . . . . . . . . . . . . . . . . . . . . . . 71
Table 6.1 – Average Breach Detection Times . . . . . . . . . . . . . . . . . . . . . . . . . . 75
List of Acronyms
API Application Programming Interface
AQL Arango Query Language
DNS Domain Name System
ETL Extract, Transform, Load
EWS Early Warning System
FTP File Transfer Protocol
HIDS Host Intrusion Detection System
HTTP Hypertext Transfer Protocol
ICMP Internet Control Message Protocol
IDS Intrusion Detection System
IP Internet Protocol
IPS Intrusion Prevention System
JSON Javascript Object Notation
NIDS Network Intrusion Detection System
NIST National Institute of Standards and Technology
NSM Network Security Monitor
PCAP Packet Capture
PII Personally Identifiable Information
SANS SysAdmin, Audit, Network, Security
SIEM Security Incident and Event Management
SMB Server Message Block
SMTP Simple Mail Transfer Protocol
SQL Structured Query Language
SSH Secure Shell
SSL Secure Sockets Layer
TCP Transmission Control Protocol
UDP User Datagram Protocol
UID Unique Identifier
URI Uniform Resource Identifier
URL Uniform Resource Locator
USB Universal Serial Bus
VM Virtual Machine
VPN Virtual Private Network
Chapter 1 – Introduction
1.1 The Threat
Attacks against networks span the spectrum of industries and include goals such as denial of
service, ransom for retrieval of sensitive information, and theft of intellectual property and personal
information. Attackers range from hobbyists trying to learn or to claim bragging rights, to
organized crime syndicates seeking monetary gain, to state-backed organizations conducting
intelligence-gathering operations. The tactics, techniques, and procedures range
from free and open source tools using exploits against known vulnerabilities to custom developed
toolchains using previously unreported software vulnerabilities.
The range of targets, attackers, and tools requires network defenders to detect and respond to a
complex and growing combination of threats against the network. The attack life cycle typically
follows the stages of reconnaissance, initial exploitation, gaining persistence, lateral movement,
and attaining the objective, producing numerous network and host-based artifacts for detection of
malicious activity. Identification and recognition of these artifacts, and how they relate to each
other, allows the defender to separate regular activity from that generated by an attacker. To reduce
the total time between initial compromise and remediation, or dwell time, defenders require the
ability to rapidly identify malicious indicators and the supporting related information from multiple,
unconnected sources.
1.2 State of Practice
Although the average dwell time has decreased based on recent analysis (Brumfield 2017), there
are still many cases where adversaries persist within a network for months if not years. With the
current state of network intrusion detection focused primarily on signature or rule-based alerting
of individual events, network defenders are required to correlate events from many data sources
manually.
The current state of event correlation for security relies primarily on Structured Query Language
(SQL) or full text-based searches against vast data stores of event logs and alerts. Products such
as Splunk (“SIEM, AIOps, Application Management, Log Management, Machine Learning, and
Compliance” 2017) and the Elastic stack (“Powering Data Search, Log Analysis, Analytics” 2017)
provide defenders with the capability to collect all of the relevant security information into a single
location from which they can search and perform event correlation. Having these systems in place
gives the defender the ability to gain insight into the status of the network and possible malicious
activity.
What these products do not provide is an easily queryable model that represents how all of the
data sources are related. By placing data into relational database tables or a centralized full-text
database, it is typically up to the defender to create queries that define how the data are related
and the desired results. While this method can prove useful, it requires extensive knowledge of the
underlying data structure from each of the sources, an understanding of how those sources relate to
each other, and how to write queries that can merge the appropriate sources to gain insight into the
activity within the network.
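The burden described above can be made concrete with a small sketch. In a relational store, correlating even two sources demands that the defender already know each schema and the join keys that tie the tables together. The table names, columns, and values below are invented for illustration and do not come from any specific product:

```python
import sqlite3

# In-memory database standing in for a SIEM's relational store.
# All table and column names here are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE alerts (src_ip TEXT, dst_ip TEXT, signature TEXT);
    CREATE TABLE conn_log (src_ip TEXT, dst_ip TEXT, dst_port INTEGER, bytes INTEGER);
    INSERT INTO alerts VALUES ('10.0.0.5', '203.0.113.7', 'suspicious outbound beacon');
    INSERT INTO conn_log VALUES ('10.0.0.5', '203.0.113.7', 443, 1204);
    INSERT INTO conn_log VALUES ('10.0.0.6', '198.51.100.2', 80, 9001);
""")

# Correlating the two sources requires knowing, in advance, that
# src_ip/dst_ip are the shared keys between the tables.
rows = db.execute("""
    SELECT a.signature, c.dst_port, c.bytes
    FROM alerts a
    JOIN conn_log c ON a.src_ip = c.src_ip AND a.dst_ip = c.dst_ip
""").fetchall()
print(rows)  # [('suspicious outbound beacon', 443, 1204)]
```

Every additional data source multiplies the join logic the defender must carry in their head, which is the complexity this Praxis seeks to remove.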
The complexity presented by the current tooling, although useful, can result in increased time
to detection or missed indicators of compromise within the network. This complication can also
reduce the ability of a defender to respond to an identified incident due to data storage and query
methods. To empower defenders to locate and remediate an incident rapidly, a new approach to
store and query the large volumes of data in a contextual model is needed.
1.3 A New Model
To address the problem of long adversary dwell times within organizations’ networks, this
Praxis proposes that the use of graph databases to correlate data will provide network defenders
the ability to analyze the relationships between individual data sets resulting in higher detection
rates and shorter dwell times.
Specific objectives of the research were to 1) assemble current methods of data collection within
a typical network structure, 2) analyze the relationships between individual data sources within the
context of detecting malicious activity, 3) identify a graph model that supports detection of adver-
sarial network activity, and 4) evaluate the effectiveness of a graph model in reducing time to detection
of malicious network activity.
Graph databases treat the relationships between data nodes with the same level of importance
as the data nodes themselves (Robinson, Webber, and Eifrem 2015). The storing of the data about
the connections between nodes as an entity within the database removes the need to conduct com-
plicated SQL JOIN queries and creates a contextual representation of the data stored directly in the
database. Storing the data and its relationships in a graph model allows the user to traverse the graph
with queries that move through the model based on how the nodes are related to each other.
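As a minimal sketch of that traversal idea (the node identifiers and relationship names here are invented, not taken from the Praxis's model), a query can simply walk stored relationships outward from a starting node, with no join keys to specify:

```python
# Each edge is (source, relationship, target); in a graph database these
# are first-class records rather than values matched at query time.
edges = [
    ("host:10.0.0.5", "OPENED", "conn:abc123"),
    ("conn:abc123", "CONNECTED_TO", "host:203.0.113.7"),
    ("host:203.0.113.7", "RESOLVES_TO", "domain:evil.example"),
]

def traverse(start, depth):
    """Return every node reachable from `start` within `depth` hops."""
    frontier, seen = {start}, set()
    for _ in range(depth):
        frontier = {dst for src, _, dst in edges if src in frontier} - seen
        seen |= frontier
    return seen

# One-hop and three-hop neighborhoods of the internal host, found by
# following relationships directly instead of joining tables.
print(traverse("host:10.0.0.5", 1))  # {'conn:abc123'}
print(traverse("host:10.0.0.5", 3))
```

Graph databases implement this walk natively and at scale; the point of the sketch is only that the relationship, once stored, does the work a JOIN clause would otherwise have to encode.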
The graph model for the types of data required for the defense of a network provides the frame-
work to define the information contained within each of the data sources, how those data sources are
related, and supporting information about those relationships. A fully developed graph model pro-
vides the level of understanding required to perform extraction, transformation, and loading (ETL)
of the individual data sources, typically available in a wide variety of formats, for insertion into
the graph databases. Additionally, the graph model provides the ability to generate queries of the
database in a manner that is representative of how the data is related.
The research in this praxis demonstrates that the use of graph databases in network defense can
reduce the dwell time of adversaries within a network by providing defenders with a contextual
correlation of the data sources typically available in a network. The graph model seeks to provide
a holistic view of the data that allows the defender to locate and understand how individual events
within their logs are directly related to each other. A distributed, modular, and scalable architec-
ture supports the collection and ETL of individual data sources into the graph database within a
test network to capture representative activity during normal operations. Simulated attacks against
the network demonstrated the ability of the model to rapidly identify potential malicious activity
resulting in lower dwell time for the adversary.
The graph model and software architecture produced as a result of the research in this praxis
can be applied to existing networks of varying scale and structure. Implementation of the sensors
and graph correlation system discussed in the following chapters would provide network defenders
within any industry with the capability to gain a more informed understanding of the activity within
the network and reduce the time required to detect and act against the potential malicious activity.
1.4 Practical Application
As a scalable, modular, and distributed system, the software architecture and graph data model
developed in this Praxis supports network defenders in more rapid detection of potential adversarial
activity. By centralizing data from multiple sensors and sources into a unified graph model with a
single user interface, network defenders’ workflow is improved while reducing the training burden
typically associated with the diverse tool chains used in network defense. Additionally, through
the use of a container-based architecture supported by open source software, the deployment and
lifecycle costs of the system are minimized. Finally, by reducing the time to detection of adversary
activity within the network, organizations could avoid the cost of a data breach, which in 2017
averaged nearly $4 million per breach.
Engineering managers in the areas of Risk Management, Technology Management, Information
Management, and Enterprise Information Assurance may apply the techniques, methodology, and
technologies discussed in this Praxis. By employing the correlation system, which is based on
open source software, the reduced cost of implementation and training would make the decision to
implement the system more palatable for security risk managers concerned with cost versus benefit
of new defensive technologies. For technology and information managers, the modular, scalable,
and distributed design of the system provides the flexibility to grow and modify the system to best
suit the existing and future architecture needs of the individual organization. Finally, as the goal of
the research is focused on better defending networks, managers in enterprise information assurance
can implement the correlation system within their organization to support more rapid detection of
adversarial activity, thereby better protecting their organizations’ sensitive data.
1.5 Organization of Chapters
This praxis is organized into chapters. Chapter 2 discusses the graph, intrusion detection, and
security related literature reviewed to support the development of the graph model and software
architecture. Chapter 3 discusses the foundation of the graph data model and the data sources and
architecture needed to implement the event correlation system. Chapter 4 describes the process of
collecting, transforming, and storing events from the data sources into a graph database. Chapter
5 covers the execution and analysis of attack scenarios in a simulation environment for validation
of the graph model. Finally, Chapter 6 summarizes the results of the research and provides recom-
mended areas of future research.
Chapter 2 – Literature Review
Topics of investigation for this research included industry trends in data breaches, methods of
intrusion detection, graph databases and their uses, and techniques for the collection and aggrega-
tion of network and host-based data for computer network defense. Sources of research included
books, peer-reviewed journal publications, industry technical reports, and online documentation of
software to support developing the implemented architecture.
2.1 Data Breaches
Although the methods and objectives of attackers change over time, organizations in every
industry are victims of attacks (Brumfield 2017) against their networks. While the costs of these
breaches globally dropped 10% in the last year, the average cost per breach is still $3.62 million
(Ponemon Institute 2017). One factor in reducing the average cost per breach is the reduction in
time to detect the breach.
The average time to detection varies based on the organization conducting the analysis. While
Verizon, the Ponemon Institute, and the SANS Institute agree that defenders are getting better at
detecting breaches, the average detection time reported varies from 6 hours to months (Brumfield
2017; Ponemon Institute 2017; SANS Analyst Program 2017).
Perpetrators of the attacks include insider threats, outsiders, state-affiliated actors, and orga-
nized crime organizations using malware, phishing, stolen credentials, social engineering, and
physical access to breach targeted networks (Brumfield 2017). The wide range of threat vectors
and techniques require that network defenders and analysts be able to identify the malicious activity
of varying capabilities and intents from within all of the event data, both good and bad, available
for analysis.
As a result of the growing threat to networks within all major industry verticals, “a substantial
industry has arisen focused on research and development of software products to monitor and scan
data networks, and detect events indicative of potential attacks” (Reed et al. 2014).
2.2 Intrusion Detection
“Intrusion detection is the process of identifying malicious activity targeted to computing and
network resources” (Pharate et al. 2015), and intrusion detection systems may be host or network
based. The host-based intrusion detection system (HIDS) is a “single computer specific intrusion
detection system which monitors the security of that system or computer from internal and external
attacks” (Pharate et al. 2015) while the network-based intrusion detection system “monitors network
traffic and analyzes the passing traffic for attacks” (Pharate et al. 2015). As a complement to current
intrusion detection systems, new research in the area of early warning systems (EWS) seeks to take
a proactive approach to alert correlation from multiple sensors (Ramaki and Atani 2016).
As networks have grown, the deployment of IDS solutions has also evolved to evaluate larger
amounts of data. Collaborative IDS solutions collect information and communicate with each other
in a centralized, decentralized, or distributed architecture (Vasilomanolakis et al. 2015). The large
number of options available for intrusion detection systems has resulted in the creation of new research
areas, including the evaluation of IDS effectiveness (Milenkoski et al. 2015).
The inclusion of network artifacts such as NetFlow and full packet captures (PCAP) in addition to
host-based logs and intrusion detection alerts can further enhance the ability to detect and analyze
malicious activity. The use of network-based artifacts can add context to traditional host artifacts,
provide a baseline of network activity, and support identification of anomalous traffic (Bromiley 2017). While
NetFlow and PCAP were not directly utilized in this research, the Bro network security monitor
provides output similar to NetFlow and the ability to collect PCAP.
Other research seeks to improve current intrusion detection systems through a learning model
based on system events and their dependencies (Friedberg et al. 2015; Garcia-Teodoro et al. 2009).
These approaches seek to apply statistical methods and machine learning through event correlation
to support detection of anomalies in support of intrusion detection.
As the number of IDS solutions and other sensors deployed to networks grows, the volume of
alerts and event data created by the sensors also grows. Collecting and aggregating the data into a
central location for analysis creates the need for an architecture capable of handling the data volume.
2.3 Collect the Data
With sensors from multiple software vendors installed on hosts running various operating sys-
tems, a flexible platform for event and alert collection is required. To support the goal of collecting
data from multiple sensors located throughout the network, the collection agent should support
cross-platform operation and the ability to parse data in multiple formats.
Several open source software offerings meet the desired requirements for centralized logging
of alerts and events. The Beats family of shippers include agents to monitor and transfer log files,
metrics, network data, Windows Event Logs, and audit data to Elasticsearch for storage or Logstash
for parsing and forwarding to a storage backend (“Beats: Data Shippers for Elasticsearch” 2017).
While the Beats shippers are fully integrated into the Elastic software stack, forwarding events to
Logstash provides the ability to parse the event data for ultimate storage in various databases or
messaging frameworks.
Similarly, Fluentd is an open source project that aims to provide a unified logging layer that
is reliable, scalable, and extensible (“What is Fluentd?” 2017). Through a plugin architecture,
Fluentd provides the capability to ingest, parse, transform, and forward log and event data from
multiple data sources to multiple data outputs concurrently.
While Beats and Fluentd support the collection requirements for multiple data sources, they
do not completely satisfy the desire for extensibility in the consumption of data. Both products
support forwarding data to multiple outputs. However, changes in the data consumer also require
configuration changes in Beats or Fluentd. The inclusion of a centralized messaging broker in the
logging architecture provides the desired flexibility in consumption of event and alert data.
Two open-source message brokers that meet the goals of a centralized logging platform are
RabbitMQ (“RabbitMQ - Messaging that just works” 2017) and Kafka (“Apache Kafka - Intro”
2017). Both products support distributed deployment in clusters for high availability and through-
put, essential for reliable processing of high volumes of data in modern enterprise networks. Both
products also offer cross-language development of message producers and consumers, allowing for
flexibility in the development of the logging architecture. The publisher/subscriber architecture pro-
vided by a message broker allows the addition and modification of data sources and data consumers
without perturbing the rest of the architecture.
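The decoupling a broker provides can be sketched in miniature. This toy in-process broker is only a stand-in for the persistence, partitioning, and clustering that Kafka or RabbitMQ supply; the topic name and event fields are invented:

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish/subscribe broker: producers publish to a named topic
    without knowing which, or how many, consumers are subscribed."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)

broker = MiniBroker()
received = []

# Two independent consumers of the same event stream; a third could be
# added later without any change to the producer side.
broker.subscribe("sensor_events", lambda e: received.append(("etl", e)))
broker.subscribe("sensor_events", lambda e: received.append(("archive", e)))

broker.publish("sensor_events", {"uid": "abc123", "proto": "tcp"})
print(received)
```

This is the property the text describes: sensors publish once, and new consumers (an ETL pipeline, an archiver, an analytics job) attach to the stream without any reconfiguration of the data sources.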
2.4 Graphs
While graph theory itself is nearly 300 years old, the concept of graph databases has only emerged
in the last two decades. Graph databases leverage “complex and dynamic relationships in highly
connected data to generate insight” (Robinson, Webber, and Eifrem 2015) to understand the rela-
tionships between the data elements. A common modeling paradigm for data and relationships for
storage in a graph database is the labeled property graph. A labeled property graph:
• Contains nodes and relationships;
• Nodes contain properties (key-value pairs);
• Nodes can be labeled with one or more labels;
• Relationships are named and directed, and always have a start and end node;
• Relationships can also contain properties (Robinson, Webber, and Eifrem 2015).
One of the first, and most popular, databases to utilize the labeled property graph is Neo4j. As a
full-featured graph database, Neo4j provides scalability, graph transactions, graph analytics, visu-
alization support, and application program interfaces (APIs) (“The Neo4j Graph Platform” 2017).
As an alternative to a pure graph database, multi-model databases such as ArangoDB, Couchbase, and OrientDB utilize a combination of key-value, document, graph, geospatial, and relational
models to store data within a single database. A native multi-model database can use multiple data
models within a single core and provides a single query language to access the data (ArangoDB
2016). As an example, ArangoDB provides document, key-value, and graph model data stores
within a single database. The document store can contain complex nested documents while the
graph store connects individual documents based on their relationships.
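The pairing of a document store and a graph store can be sketched with a toy example (the collection names and records are invented, and the code does not use ArangoDB's API): nested documents live in named collections, while an edge list connects them for traversal.

```python
# Toy multi-model store: nested documents keyed by "collection/key", plus an
# edge list connecting them, mirroring ArangoDB's document/graph pairing.
# All names and records here are illustrative.
documents = {
    "hosts/web01": {"ip": "10.0.0.5", "os": {"name": "Linux", "version": "4.15"}},
    "files/f1": {"sha256": "ab12cd", "mime": "application/x-dosexec"},
}
edges = [{"_from": "hosts/web01", "_to": "files/f1", "label": "DOWNLOADED"}]

def neighbors(doc_key):
    """Follow outbound edges from one document to the connected documents."""
    return [documents[e["_to"]] for e in edges if e["_from"] == doc_key]

downloaded = neighbors("hosts/web01")
```

The document side keeps the deeply nested host and file details intact, while the edge side records only the relationship, so a traversal returns full documents.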
Although the use of graphs and graph theory appears in recent intrusion detection and security-related research, there is little research on the direct application of graph or multi-model databases to the correlation of security and metric-related events from multiple data sources within a network. There are examples of graph-based alert correlation (Fredj 2015), graph-based creation of IDS rules from HTTP logs (Djanali et al. 2015), graphs for monitoring user authentication events (Kent, Liebrock, and Neil 2015), and the use of clustering and nearest neighbor for evaluation of IDS alerts (Shapoorifard and Shamsinejad 2017). The previous research demonstrates the capability of approaching the security problem as a graph, but focuses on specific indicators rather than modeling all of the available data as a graph for storage and query within a database.
2.5 Defend the Network
Development of a graph model for the network alert and event data requires an understanding
of which data elements are important for network defense and how the individual events relate
to each other. Additionally, understanding how and where this research fits into the workflow of
network defenders (Reed et al. 2014) provides insight into the construction of the graph model and
the queries for extraction of relevant information from the database.
Although focused on the implementation of specific defensive measures, the Critical Security
Controls (Critical Security Controls for Effective Cyber Defense 2015) and Special Publication
800-53 (Security and Privacy Controls for Information Systems and Organizations 2017) provide
valuable insight into data sources and their relationships. The framework provided by the National
Institute of Standards and Technology (NIST) lays out the functional categories of network defense
as Identify, Protect, Detect, Respond, and Recover (NIST 2014). Using the NIST framework as a
guide, this research focused on improving the defender workflow in the detection phase to reduce
time to detection while providing the information necessary to respond and recover from the attack.
Chapter 3 – Problem Solution
The research in this Praxis focused on developing the software, architecture, and graph-based
data model to support reducing the time for adversary activity detection through improving the
workflow of network defenders. Sections 3.1, 3.2, and 3.3 discuss the goals and implementation of a
software architecture capable of being implemented in current enterprise networks of all industries.
Section 3.4 discusses the data sources that contribute to the graph data model and identifies possible
future data sources for inclusion into the graph model. Sections 3.5 and 3.6 cover the graph database
and the insertion of events from the data sources into the database. The development of the graph
model based on the available data sources is discussed in Section 3.7.
3.1 Modular, Scalable, Distributed
To support networks of various sizes and architectures while providing for future growth, the
event correlation system developed as a result of the research in this praxis is modular, scalable,
and distributed. Designing the architecture in this manner provides for a system capable of be-
ing integrated into existing networks and adapting to an organization’s specific needs and network
architecture. Modularity offers the ability to add or modify input sources, change the data trans-
formation process to meet business needs, and add additional queries and alerting methods. The
scalability of the system allows for on-demand increases in the processing of the modular event
streams to meet the current level of observed network activity. The distributed cluster architecture
provides for redundancy by splitting core functionality across numerous physical servers while also
delivering performance gains through the distribution of workload across multiple instances.
3.2 Container Based
The use of container images supports the above goals of modularity, scalability, and a dis-
tributed architecture. A container image is a lightweight bundle of the software and libraries re-
quired to run executable code (“What is a Container” 2017). When combined with an orchestrator
to manage the deployment, starting, and monitoring of the images, a container based platform in-
troduces techniques and capabilities typically unavailable in traditional physical or virtual machine
based environments.
Running a single process within each container supports the goal of achieving modularity within the system. In the development of the system, each container focuses on a single task: for example, one container transfers network intrusion detection system (NIDS) alerts across the network while a second transforms and loads the alerts into a database. When adequately architected, this separation provides the ability to modify or add the functionality of individual components independent of the rest of the system.
With each function running in its own container, scalability is achieved by starting more instances of the service required to handle the additional workload. If the number of alerts being processed by the transform function exceeds the capacity of the running container, the orchestrator can be used to start additional, identical containers to split the workload. The startup speed of containers over virtual machines highlights another advantage: because containers only run a single process and utilize the resources of the host machine, they do not require the traditional boot process of a physical or virtual machine, resulting in start times of a few seconds.
Through the use of an orchestrator such as Docker Swarm (“Swarm mode key concepts” 2017)
distribution of the system occurs across many physical or virtual hosts that are viewed as a single
platform to run the containers. By distributing across several hosts, the system becomes resilient
to failures in a single host as there are multiple instances of core functionality deployed across the
platform. The distributed system also provides increased performance by running multiple instances in a clustered environment that share the workload of core functionality such as message distribution. For example, running Apache Kafka (“Apache Kafka” 2017) in
a clustered architecture across multiple hosts ensures that messages are replicated across all of the
Kafka instances while allowing clients to connect to any of the instances to produce or consume
messages.
3.3 Centralized Logging
A core component of any event correlation system is the ability to collect and aggregate the
logs and alerts centrally from the sensors, hosts, and applications within a network. The volume
of data produced from various sources within a network can provide invaluable insight into the
activity within the network. The challenge becomes the management and distribution of the logs in
a manner that supports analysis of the events as a connected system.
Through the use of a logging pipeline, events produced by multiple sources in various formats
are collected, standardized, and transferred into a single repository for consumption. To support
the logging pipeline in this research, Apache Kafka (“Apache Kafka - Uses” 2017) provides for
producers and consumers of event information. Apache Kafka contains the libraries and interfaces
needed to develop an integrated production or consumption capability into the components of the
logging pipeline. For example, by adding a plugin to the Bro network security framework, the logs
produced by Bro can be automatically sent to the Kafka cluster in a format for consumption by other
services. Where this direct integration with Kafka is not possible, the addition of Fluentd (“What is
Fluentd?” 2017) as a log transport mechanism provides the ability to add many different sources
into the centralized logging framework. With no direct Kafka plugin for the intrusion detection
system Suricata, Fluentd is used to collect, transform, and transport the alerts into Kafka.
The production of messages from all sources within the network results in Kafka topics grouped
by the nature of the information source. Collection of logs from all instances of Bro under a single
topic allows for consumption of all Bro events by an individual consumer. This architecture supports
the overall goals of modularity and scalability by enabling the addition of similar sensors into the
network while still collecting all of the events into the same location for consumption. Adding a
new type of data source only requires the addition of a Kafka or Fluentd supported plugin to add the
events to a new topic ready for consumption. Additionally, running Kafka in a distributed cluster
supports the resilience and performance requirements necessary in large environments.
With event data from all sources centrally located within the Kafka cluster, consumers can retrieve messages from the available topics for processing. Kafka tracks which consumers are pulling from each topic and the retrieval of individual messages. This tracking allows multiple consumers to read from the same topic, providing increased throughput when processing the same stream, or numerous different output streams to pull from the same topic. The flexibility offered
in this architecture supports the goals of scalability and modularity by allowing numerous identical
consumers to pull from the same topic to keep up with message rates while also enabling new
consumers to pull from existing topics to support additional functionality.
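The consumer behavior described above can be sketched with a toy single-partition topic (illustrative Python only; Kafka's real consumer groups additionally balance partitions across members): consumers sharing a group split the stream through a shared offset, while a new group independently re-reads from the beginning.

```python
class Topic:
    """Toy topic with per-group offsets: consumers in the same group share
    one offset (splitting the stream), while a new group starts at zero."""
    def __init__(self):
        self.messages = []
        self.offsets = {}  # group name -> index of next message to read

    def produce(self, msg):
        self.messages.append(msg)

    def consume(self, group):
        offset = self.offsets.get(group, 0)
        if offset >= len(self.messages):
            return None  # nothing new for this group
        self.offsets[group] = offset + 1
        return self.messages[offset]

topic = Topic()
for i in range(4):
    topic.produce(f"alert-{i}")

# Two consumers in the "etl" group split the stream between them...
first = topic.consume("etl")
second = topic.consume("etl")
# ...while a new "metrics" group independently starts at the beginning.
fresh = topic.consume("metrics")
```

The "etl" group consumers receive successive messages (shared workload), while "metrics" receives the full stream again, matching the identical-consumer and new-consumer cases described above.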
3.4 Data Sources
The framework provided by a centralized logging system enables the collection of the vast ma-
jority of data sources typically found within modern networks. Additionally, the modularity of the
previously discussed architecture allows the addition of new security or performance related data
sources as they are added to the network. As this research is concerned with the presence of mali-
cious activity within a network, the data sources will focus on information relevant to identifying
potential adversarial activity.
3.4.1 Bro Network Security Monitor
For an analyst to gain insight and understanding into the activity occurring within a network,
they must be able to determine information about which hosts are communicating over which pro-
tocols and the type and volume of data being communicated (Shostack 2014). The Bro Network
Security Monitor is a potent tool that provides the capability to monitor network activity.
At the core, Bro is a framework of network protocol decoders and event handlers enabled by
a scripting language, also named Bro. The network protocol decoders understand the structure of
many of the protocols used within modern networks including HTTP, DNS, SMTP, FTP, SMB,
and others. By following the structure of the protocols, Bro can identify specific protocol traffic
even when it is not using standard ports such as TCP port 80 for HTTP traffic. The knowledge of
network protocols also enables Bro to extract all relevant information about the session including
files transmitted across the connection.
To support the correlation of connection, protocol, and file information, Bro uses a unique identifier (UID) associated with each recorded connection. The UID appears in each of the log files
produced by Bro to enable the analyst to determine information about the connection itself such as
duration, IP addresses, ports, and packets transferred. For example, using the UID from the con-
nection, the analyst can then query the HTTP log for a web connection and determine information
about the underlying HTTP session within the connection including the User-Agent header, the re-
quest URI, and the HTTP Request verb. Finally, the connection UID can be used to query the files
log to determine which files the HTTP session contained.
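The UID-based pivot described above can be sketched in Python against invented log entries (the field names follow Bro's conn, http, and files logs, but the records themselves are fabricated for illustration):

```python
# Illustrative Bro log entries; field names follow the conn.log, http.log,
# and files.log schemas, while the values are invented for this example.
conn_log = [{"uid": "CHhAvVGS1", "id.orig_h": "10.0.0.5",
             "id.resp_h": "93.184.216.34", "id.resp_p": 80, "duration": 0.42}]
http_log = [{"uid": "CHhAvVGS1", "method": "GET",
             "uri": "/download/a.exe", "user_agent": "Mozilla/5.0"}]
files_log = [{"conn_uids": ["CHhAvVGS1"], "fuid": "FakKe1",
              "mime_type": "application/x-dosexec"}]

def session_for(uid):
    """Pivot on a connection UID across the three logs, as an analyst would."""
    conn = next(c for c in conn_log if c["uid"] == uid)
    http = [h for h in http_log if h["uid"] == uid]
    files = [f for f in files_log if uid in f["conn_uids"]]
    return {"connection": conn, "http": http, "files": files}

result = session_for("CHhAvVGS1")
```

One UID ties together the connection endpoints, the HTTP session details, and the files observed, which is the correlation the graph model later makes queryable.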
By default, Bro stores the logs for connections, protocols, and files within individual files, which simplifies ingesting the information into a logging pipeline. However, as previously discussed, this
research utilized a plugin for the Bro framework (“Metron - Logging Bro Output to Kafka” 2017)
that transmits the log entries to the Kafka cluster as they occur. The plugin streams the log entries to the nodes in the Kafka cluster in a load-balanced manner, as Kafka separates the storage of the topic messages into partitions to maximize available throughput and processing.
3.4.2 Suricata
While Bro can provide detailed information about the connections occurring within the net-
work and detection of activity across multiple network flows, it may not be the ideal means to
detect malicious activity based on signatures. When paired with a rules-based IDS such as Suricata,
the combination of flow and signature-based detection becomes a powerful mechanism to detect
adversary activity.
Where Bro succeeds in having a detailed understanding at the protocol level, Suricata and sim-
ilar tools such as Snort look for signatures at the byte level as network traffic passes through the
sensor. By examining traffic at the byte level, detection of specific anomalies within the traffic is
accomplished by writing rules to match the pattern of the anomaly at the byte level. Rules can
support the detection of known malware variants, policy violations, and post-exploitation activities
such as adding a user account over the network (Shostack 2014).
For both Suricata and Snort, rules can be obtained from a third-party provider such as Emerging
Threats (“Emerging Threats” 2017) or Talos (“Talos - Author of the Official Snort Rule Sets” 2017).
Both providers offer free rule sets in addition to subscription-based updates for the most recently
identified network threats. The third-party rule sets provide a significant foundation to detect mali-
cious activity known to occur in the wild. In addition to the provided rules, an analyst can quickly
write or modify existing rules to meet the specific needs of the monitored network. For example,
rules can be created to monitor for sensitive information such as personally identifiable information
(PII) or corporate secrets exiting the network.
The fundamental components of any rule are the protocol of concern, the source and destination IP address and port, the action taken, and the signature itself. The use of variables in the rule for internal and external networks allows the analyst to create rules based on the direction of the monitored network traffic, providing for finer control and tuning of alerts. For example, based on the nature of operations within the network, outbound SSH connections may be part of normal day-to-day operations while the analyst still wants to be alerted to any inbound SSH connections.
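A rule expressing that inbound-SSH policy, along with a minimal parser for the header components named above, might look like the following sketch (the rule text and sid are invented for this example):

```python
# An illustrative Suricata rule alerting on inbound SSH, using the standard
# $HOME_NET / $EXTERNAL_NET direction variables. Rule text and sid invented.
rule = ('alert tcp $EXTERNAL_NET any -> $HOME_NET 22 '
        '(msg:"Inbound SSH connection"; sid:1000001; rev:1;)')

def parse_header(rule_text):
    """Split the rule header (everything before the options parentheses)
    into its fundamental components."""
    header = rule_text.split("(", 1)[0].split()
    keys = ["action", "proto", "src", "src_port", "direction", "dst", "dst_port"]
    return dict(zip(keys, header))

header = parse_header(rule)
```

The direction variables mean one rule text fires only on traffic entering the monitored network, leaving routine outbound SSH unalerted.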
With the third-party and custom rules in place, Suricata will log alerts for any network traffic
that matches a signature rule. Traditionally the alerts are evaluated in an alert management console
such as Sguil (“Sguil - Open Source Network Security Monitoring” 2014) or forwarded to a SIEM
such as Splunk (“SIEM, AIOps, Application Management, Log Management, Machine Learning,
and Compliance” 2017). As previously discussed, while these methods improve the ability for an
analyst to evaluate activity within the network, much of the correlation and subsequent investigation
is a manual process. By adding the components of a Suricata alert to the graph model, they are
already correlated to other events such as the components of the Bro logs, simplifying the work of
the analyst to develop a full picture of the activity under investigation and providing a means for
automated queries against the graph database.
Although there is currently no plugin for Suricata to automatically send alerts into the Kafka
cluster, the addition of Fluentd (“What is Fluentd?” 2017) and Fluent Bit (“Fluent Bit” 2017) into
the centralized logging framework provides the needed functionality. The role of Fluentd in the
centralized logging framework is to receive inputs from various sources, transform the messages
from the sources, and forward the messages to the Kafka cluster for consumption by other services.
Fluent Bit simply monitors the alert log generated by Suricata and transmits the alerts as they occur
to the Fluentd instance. This architecture allows deployment of multiple instances of Suricata in the
network and ensures alerts from all instances are forwarded to the Kafka cluster.
3.4.3 osquery
The combination of Bro and Suricata provides for detailed insight into the network-based ac-
tivity but does not address the need to understand what activity is occurring on the hosts and servers
within the network. A host-based monitoring agent is required to capture and understand the events
that occur at the host level. While there are many host-based solutions available, the open source of-
fering osquery (“osquery Docs” 2017) from Facebook provides the detail, flexibility, and scalability
desired to add host-based information to the graph model.
As the software that runs on the target host machine, osquery provides the ability to obtain
extremely detailed and valuable information about the host via automated or manual queries. At its
core, osquery essentially turns the host into a series of queryable relational database tables. SQL
queries run against the host provide information such as running processes, open network ports, file
integrity changes, installed software, and connected USB devices.
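Because osquery's tables are queried with SQL, the concept can be illustrated with an in-memory SQLite table standing in for osquery's processes table (the column names follow the osquery schema, but the rows and the query are invented for this sketch):

```python
import sqlite3

# Mimic osquery's "host as queryable tables" model with an in-memory SQLite
# table shaped like a slice of osquery's processes table. Rows are invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processes (pid INTEGER, name TEXT, path TEXT)")
db.executemany("INSERT INTO processes VALUES (?, ?, ?)",
               [(412, "sshd", "/usr/sbin/sshd"),
                (977, "nc", "/tmp/nc")])

# A query an analyst might schedule: processes executing out of /tmp.
suspicious = db.execute(
    "SELECT pid, name FROM processes WHERE path LIKE '/tmp/%'").fetchall()
```

The same SQL shape applies whether the query runs once by hand or on a schedule, which is what makes osquery results straightforward to ship into a logging pipeline.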
Facebook itself (“osquery Docs” 2017) and other organizations in the security community
(“Trail of Bits” 2017) provide users of osquery with queries or packs of queries to run on every
major operating system. While some of the queries are limited to running on specific operating
systems, much of the core functional queries run on Mac, Windows, and Linux operating systems.
Another benefit of osquery related to the research is the inclusion of a Kafka Producer as part of the default installation (“osquery Docs” 2017). A few modifications and additions to the osquery
configuration will send all of the query results to the appropriate Kafka topic in the cluster. The flex-
ible configuration minimizes the number of programs that must run on each of the monitored hosts
while also allowing for rapid addition of new hosts into the logging system. A standard configura-
tion can be deployed to every host in the monitored environment with configuration management
tools such as Ansible, Puppet, or Chef.
3.4.4 Additional Sources
The use of Bro, Suricata, and osquery in the environment provides a significant amount of
detail for network intrusion detection, network security monitoring, and host-based analysis. While
many enterprise networks include other tools such as vulnerability scanners, host-based intrusion
detection, and network scanners, the scope of this research is limited to the selected tools to maintain
the complexity of system development within the time requirements of the research phase.
However, the previously discussed goal of modularity in the system provides for the relatively
simple integration of the additional data sources into the system and the data model. For example,
the vulnerability reports for hosts provided by the Nessus Vulnerability Scanner (“Nessus Profes-
sional Vulnerability Scanner” 2017) could be forwarded to the Kafka cluster and processed for
insertion into the graph database.
Addition of new data sources would also provide for further queries of the database to gain
additional insight into the activity within the network. The structure of the data model and the
functionality of the query language provides seamless integration of the new data sources into the
workflow of the analyst. Adding existing and future data sources into the system offers several
opportunities for future research and development.
3.5 Events to Graph Model
With the events from each data source now stored in individual topics in the Kafka cluster, the
events are transformed and loaded into the graph database. The use of unique containers to provide
the extraction, transformation, and loading of event data by topic supports the goals of modularity
and scalability. This architecture provides the ability to modify the transform and load process for
a given topic or to add processing of new topics from additional data sources. Additionally, the
structure of the transform and load process combined with the scalable nature of Kafka Consumers
allows for multiple instances of the processing container to pull events from the same topic to
increase the processing throughput for the given topic.
Each of the processing containers provides three core functions: a Kafka Consumer to retrieve
events, the transformation of the event data, and loading of the transformed data into ArangoDB.
A Python script running within the container provides all of the required functionality. Python
modules for the Kafka Consumer (Oh 2016) and the ArangoDB Client (Powers and Arthur 2016)
are used to simplify the extraction and loading functionality of the process. The modules provide
a standard interface to Kafka and ArangoDB with support for error handling and load balancing
within each cluster.
The Python code for the transformation of the event data depends on the content of the individual topics. Each of the data sources produces events in its own format, with potentially different variable names for the same data point. For example, Bro refers to the source IP address as ‘id.orig_h’ while Suricata uses ‘src_ip’ as the variable name for the originating IP address of the connection.
The code also ensures that the event data is transformed to conform to the graph data model by
creating the appropriate vertices and edges for loading into ArangoDB.
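A minimal version of such a transform, normalizing the differing IP field names and emitting vertices and an edge for loading, might look like the following sketch (the field and collection names are assumptions for illustration, not the exact code used in the research):

```python
# Per-source field maps normalize the originating/responding IP names
# (Bro's id.orig_h/id.resp_h vs. Suricata's src_ip/dest_ip); other names
# here are illustrative.
FIELD_MAP = {"bro": ("id.orig_h", "id.resp_h"),
             "suricata": ("src_ip", "dest_ip")}

def to_graph(source, event):
    """Transform one raw event into IP vertices plus a connecting edge."""
    src_field, dst_field = FIELD_MAP[source]
    src, dst = event[src_field], event[dst_field]
    vertices = [{"_key": src, "collection": "ip"},
                {"_key": dst, "collection": "ip"}]
    edge = {"_from": f"ip/{src}", "_to": f"ip/{dst}", "source": source}
    return vertices, edge

vertices, edge = to_graph("suricata", {"src_ip": "10.0.0.5",
                                       "dest_ip": "8.8.8.8"})
```

Because the normalization lives in one per-topic function, adding a new data source means registering one more field map rather than changing the loader.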
3.6 The Graph Database
As the central point of the correlation system, the database receives the transformed event
data and provides the query language and interface to analyze the correlated events. While there
are several graph database technologies available (“DB-Engines Ranking - popularity ranking of
graph DBMS” 2017), this research utilized ArangoDB (“ArangoDB - highly available multi-model
NoSQL database” 2017) to provide the storage and querying of the event data.
ArangoDB is a multi-model database that provides document store, key/value store, and graph
store in one database. The multi-model database allows for storage of rich data sets and the rela-
tionships between the individual data items. The document store maintains individual documents
within collections and supports storing deeply nested data items. The ability to store nested data
items within a single document allows for the combining of data from separate sources into a single
representation of the related data. The graph store maintains entries in vertices and edges based on
the data model. By storing the data in a graph model, the queries can traverse the graph edges based
on the relationships between the vertices.
In meeting the scalable and distributed goals of the system, ArangoDB supports deployment
in a clustered architecture. The clustered architecture provides for distribution of the data across
multiple nodes resulting in improved performance of reads and writes through sharding while im-
proving data resiliency through replication. A three node cluster similar to that used for Kafka is
employed in the system by running multiple instances of ArangoDB within separate containers.
To query the database ArangoDB provides the ArangoDB Query Language (AQL). AQL is sim-
ilar in structure to the Structured Query Language (SQL) used in relational databases (“ArangoDB
- highly available multi-model NoSQL database” 2017). AQL operates against the multi-model
database allowing for the ability to query from a document store, a graph store, and a key/value
store within the same query. The flexibility of AQL allows for queries that traverse the graph store
based on relationships between vertices and return results from deeply nested vertices stored as
documents within the document store.
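As a sketch of such a query, the following Python function composes an AQL graph traversal as a string (the graph and collection names are assumptions for this example, and executing the query would of course require a running ArangoDB instance):

```python
# Compose an AQL traversal that walks outbound edges from an IP vertex and
# returns only the file documents reached. "events", "files", and "ip" are
# illustrative names, not ones mandated by ArangoDB.
def files_for_ip(ip_key):
    query = (
        "FOR v IN 1..2 OUTBOUND @start GRAPH 'events' "
        "FILTER IS_SAME_COLLECTION('files', v) "
        "RETURN v"
    )
    bind_vars = {"start": f"ip/{ip_key}"}
    return query, bind_vars

query, bind_vars = files_for_ip("10.0.0.5")
```

Binding the start vertex as a parameter rather than interpolating it keeps the query reusable and avoids injection issues, mirroring the parameterized style AQL clients support.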
Another benefit of choosing ArangoDB as the database for the correlation system is the Foxx
Microservice Framework (“Foxx at a glance” 2017) provided by ArangoDB. Foxx provides “a
JavaScript framework for writing data-centric HTTP microservices that run directly inside of
ArangoDB.” (“ArangoDB - highly available multi-model NoSQL database” 2017) The benefits
Foxx provides include standardized data access and storage, reduced network overhead due to
running logic within the database itself, and the ability to restrict access to sensitive data within the
database.
By creating Foxx services within the database, complex queries can be run against the in-
memory data from within the database itself while only requiring the client to conduct a single
HTTP request to the database. Additionally, Foxx services can be developed to run periodically and
provide results to a client. This structure is well suited for the goal of the research to identify patterns
of potentially malicious activity, query the database for matching events, and provide the results
of the query to the analyst automatically. With the alert queries in place, additional investigative
queries can be created as Foxx services to be accessed on demand by the analyst without requiring
the analyst to understand the AQL syntax.
3.7 The Model
To leverage the power of ArangoDB and AQL, the individual data sources must be transformed
into a graph model that accounts for the relationships between the data elements. As previously
discussed, the data sources forward events to Kafka for consumption by Python services that con-
duct the ETL of the events into ArangoDB. The transformation logic within the Python service is
specific to each data source’s Kafka topic. This structure allows for modification and additions to
the logic for each topic as well as additions of entirely new topics for inclusion into the data model.
Creation of a graph model for the components of an individual data source is a straightforward
process as the relationships between them are relatively intuitive. For example, an alert from Suri-
cata contains the alert content, the source IP, and the destination IP which translates to a three-node
graph with two edges to connect the components. Full insight into the network activity comes from
creating the edges that relate components of individual data sources to each other. The completed
graph model, displayed in Figure 3.1, provides the ability to create queries that traverse the entire
graph by analyzing the relationships of nodes from the different data sources as a single data set.
Figure 3.1: Complete Graph Model
3.7.1 Bro
A default installation of Bro, as implemented in the research, provides logs for connection,
DNS, HTTP, FTP, SSH, files, and other common network protocols and services. To support the
internal correlation of the individual logs, Bro uses a unique identifier (UID) for the individual con-
nections. The connection log serves as the central reference point for the remaining logs created
by Bro. For example, the HTTP log contains a field for the UID that correlates to an entry in the
connection log that contains specific information about the connection such as source and destina-
tion IP address while the HTTP log contains relevant information about the HTTP session such as
URL visited and the browser user agent. For the files transferred as part of a connection, there is an
additional UID for each file observed as part of a service. If multiple files were transferred as part
of an HTTP session, each of the files is an entry in the files log and the HTTP log entry contains a field with a list of the file UIDs.
Figure 3.2: Bro Graph Model
Figure 3.2 displays the high-level graph model for the data provided in the Bro logs. The two
‘IP’ nodes account for the source and destination IPs of the connection. The ‘connection’ node con-
tains all of the information contained within the conn.log from Bro. A single ‘connection’ node may
connect to multiple ‘service’ nodes if multiple services are present within the connection. The ‘ser-
vice’ node is a generic placeholder for the information about the specific service contained within
the connection while the ‘file’ and ‘cert’ nodes contain information about the files or SSL certifi-
cates transferred within the service as part of the connection. Each file and certificate transferred
within the service will appear as a separate node within the database.
This structure provides the ability to traverse the graph and identify all of the connections,
services, and files associated with a particular IP address. Similarly, the graph could be queried to
locate all of the IPs associated with the single instance of a particular file to determine how many
hosts downloaded the suspected file. As a single data source in the graph database, the Bro logs
provide useful insight into the nature of network traffic. When combined with the additional data
sources the analyst will be able to gain a deeper understanding of the activity within the network.
3.7.2 Suricata
To provide network intrusion detection capabilities, a default installation of Suricata is used as
a sensor to monitor network traffic. Suricata provides alerts for any observed network traffic that
matches a defined set of rules. While custom rules can be created based on the needs of a particular
organization, this research uses publicly available rules to monitor for malicious network activity.
As shown in Figure 3.3, the graph model for Suricata alerts is relatively simple. Consisting of
three nodes, the model captures the IP addresses associated with an alert. The ‘alert’ node contains
all of the relevant information associated with the alert from Suricata.
Figure 3.3: Suricata Graph Model
Using the IP address as a common node between the Bro and Suricata models, the overall data
model begins to come together. With both data sources in the graph database, an analyst can now
query for the IP address associated with a particular Suricata alert and determine which files were
associated with the alert from the Bro logs. The combination of Bro and Suricata in the graph model
provide the analyst with insight into network activity and the hosts involved but do little to provide
an understanding of the activity occurring on the host itself.
3.7.3 osquery
To enrich the network-based portions of the graph model, information about each of the hosts
within the network is added through the use of osquery on the hosts. osquery is an open source
product released by Facebook that provides “an operating system instrumentation framework for
Windows, OS X (macOS), Linux, and FreeBSD.” (“osquery Docs” 2017) osquery provides the
ability to run scheduled queries against the host for information such as operating system version,
installed software, and connected USB devices. For more time-sensitive information that may be missed
between scheduled queries, such as process and networking events, osquery provides an eventing
framework to capture these events as they occur on the host.
Although osquery allows for querying of detailed information from the host, this research fo-
cused on the components shown in Figure 3.4. The elements of the data model provided by osquery
focus on locating potential malicious activity on the host while connecting the host data to the
network-based data through the IP address associated with the host.
Figure 3.4: osquery Graph Model
With osquery installed on hosts within the network, the configured daemon runs the scheduled queries
and tracks event-based data. The results of the queries are pushed to a single topic within the Kafka
cluster for transformation and loading into ArangoDB.
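This transformation step can be sketched as follows, assuming a simplified osquery differential result as it appears in the daemon's result log; the collection and keying choices are illustrative, not the exact implementation used in the research.

```python
def osquery_result_to_graph(result: dict, host_ip: str) -> dict:
    """Map one osquery differential result onto the graph model,
    keyed by the host's IP address so the host vertex joins the
    network-based data. Collection names are illustrative."""
    host_key = host_ip.replace(".", "_")
    row_key = f'{result["name"]}-{result["unixTime"]}'
    vertices = [
        {"_collection": "ips", "_key": host_key, "address": host_ip},
        # One vertex per result row, carrying the queried columns.
        {"_collection": result["name"], "_key": row_key, **result["columns"]},
    ]
    edges = [
        {"_from": f"ips/{host_key}",
         "_to": f'{result["name"]}/{row_key}',
         "action": result.get("action", "snapshot")},
    ]
    return {"vertices": vertices, "edges": edges}
```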
3.7.4 Complete Model
With the network and host-based components of the data model completed, the overall data
model for the research represents a view of the individual data components connected by their
relationships to each other. Figure 3.1 provides an overview of the full graph data model used for
the research to identify malicious network activity through the use of crafted AQL queries of the
database.
As all of the data sources provide the IP addresses of the hosts involved, the IP address is
the vertex of the graph that connects each of the individual data models. As an example, the graph
could be traversed from a NIDS alert provided by Suricata to determine
which users were logged into the destination host of the alert as provided by osquery and the files
transferred during the connection as provided by Bro.
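A traversal of this kind might be expressed in AQL roughly as follows, shown as the query string a client would submit to ArangoDB; the collection and edge-collection names are assumptions drawn from the model, not the exact schema of the research database.

```python
# Hypothetical AQL: starting from a Suricata alert, walk to the
# destination IP vertex and on to connected osquery and Bro vertices.
# 'alerts', 'alerted', 'has_user', and 'transferred' are illustrative.
AQL_ALERT_CONTEXT = """
FOR alert IN alerts
  FILTER alert.signature == @signature
  FOR ip IN 1..1 OUTBOUND alert alerted
    FOR item IN 1..2 OUTBOUND ip has_user, transferred
      RETURN { alert: alert.signature, host: ip.address, item: item }
"""

def bind_signature(signature: str) -> dict:
    """Package the query and bind variables as an ArangoDB client
    would accept them."""
    return {"query": AQL_ALERT_CONTEXT, "bind_vars": {"signature": signature}}
```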
Chapter 4 – Research
4.1 Overview
Although attacker skill sets can range from a low-level “script-kiddie” to well-funded nation
states (Sanders and Smith 2014), the overall methodology employed during an attack follows a
typical pattern or lifecycle of activity. Additionally, the objective of an attack can vary based on
the skill level and intent of the attacker and the nature of the target network. A “script-kiddie” may use
freely available tools and publicly known vulnerabilities to deface a website solely for the sense of
accomplishment while a fully funded nation-state team of attackers with custom tools and privately
developed exploits may attack and persist within a corporate network for exfiltration of proprietary
information.
At a high level, regardless of the skill level or intent of the attacker, the majority of attacks will
create events or artifacts in the network or endpoints of the targeted environment. Timely collection
and analysis of attacker-generated events can lead to locating malicious activity in hours rather than the
weeks or months observed in recent high profile attacks (Brumfield 2017). While there has been
significant research and commercial application of event collection and aggregation, correlation of
the events into a cohesive model is typically left to the analyst and their understanding of the system.
This research demonstrates the correlation of events from network and endpoint based data
sources into a graph model to support early detection of malicious network activity. By collecting
and transforming events from individual sources within the target system, a graph-based represen-
tation of activity can be constructed and queried based on the data and the relationships between
the individual data sources. As discussed in Chapter 3, the graph model used in the research is con-
structed from network and endpoint data sources to capture and represent artifacts that are typically
used by network defenders and incident responders to detect malicious activity.
4.1.1 Attack Lifecycle
To build a context-based graph model to detect malicious activity, the typical lifecycle of an
attack must first be understood. Published models for the attack lifecycle vary from a simple
four-step cycle (Shakarian, Shakarian, and Ruef, n.d., p. 134) to more detailed, military-focused
linear kill chain models (Yadav and Rao 2016). This research leveraged the model provided
in the FireEye M-Trends 2017 report to frame the context of an attack. The lifecycle provided by
FireEye, as shown in Figure 4.1 (FireEye 2017), represents the lifecycle as a combination of linear
events and a repeating cycle.
Figure 4.1: FireEye Cyber Attack Lifecycle. Reprinted from M-Trends 2017 - A View From the Frontlines, by FireEye. Copyright 2017 by FireEye. Reprinted with permission.
As discussed in Chapter 1, each stage of the lifecycle may produce artifacts related to the ac-
tivity of an attacker. Representing the attack lifecycle in this manner efficiently captures the overall
structure of most objective-based attacks. At a high level, the first linear portion of the attack
represents the reconnaissance, initial exploitation, and persistence phases. During the
reconnaissance phase the attacker may directly scan the network for open network ports and
available services or conduct out-of-band open source research without interacting with the target
network. Initial exploitation provides the attacker with access to the first compromised host within
the network. Exploitation methods may include but are not limited to, password guessing, social
engineering, or remote exploitation of a software vulnerability. Depending on the nature of the ini-
tial exploitation, the attacker’s access may be lost due to instability of the exploit or the need to have
a user logged in to the host to maintain access. To maintain permanent access to the compromised
host the attacker carries out steps, such as installing a remote access trojan, in the persistence phase
to ensure repeatable access to the host. The cyclic part of the lifecycle captures the fact that once
inside a network, an adversary will repeatedly conduct additional reconnaissance and exploitation
within the target network to reach the ultimate objective. As the first step in the cycle, privilege
escalation to an administrative account may be required if the initial exploitation resulted in the
attacker gaining access to the targeted host as an unprivileged user. The remainder of the cycle de-
picts the steps carried out by the attacker to gain access to additional hosts within the target network
and follows the same structure as the initial exploitation. The cycle of lateral movement continues
until the attacker achieves the desired objective. Mission accomplishment for the attacker can vary from extraction of sensitive information
to destruction of critical data assets and all manner of malicious activity in between.
Each phase of the attack lifecycle presents an opportunity to potentially collect events or arti-
facts that result from the attacker’s activity during that phase. Using the above model as a frame-
work, this research focused on the relevant events to capture, how the individual events are related to
each other, and how to query the resulting graph database for detection of malicious activity. While
some phases provide more opportunity to capture events related to attacker activity, each phase has
the potential to create artifacts that help put the entire picture of an attack together.
4.1.2 Indicators of Attack
Using the attack lifecycle as a guide and the event data available from the network and endpoint
sensors, specific events relevant to each stage of the lifecycle can be extracted from the event data.
Some actions such as network scanning during reconnaissance may require aggregation of many
entries while others, such as an IDS alert, provide context with a single entry.
During the initial recon phase of an attack, much of the activity may be accomplished out-
side the detection capability of the sensors. Passive events such as organization research, open
source reconnaissance conducted against outside entities, and other information-gathering steps are
not observable. However, active events against the target system such as network port scanning and
vulnerability scanning will be captured by the sensors. For example, Bro collects information on
every attempted connection to ports being monitored, and repeated attempts from a single source
over a period can be aggregated to identify scanning activity. Additionally, the user-agent string uti-
lized by many open source tools can be determined by Bro and Suricata as an indicator of scanning
activity.
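The aggregation described above reduces to a threshold on distinct destination ports per source within a time window. The sketch below uses field names from Bro's JSON conn.log output; the threshold and window values are illustrative, not tuned parameters from the research.

```python
from collections import defaultdict

def find_scanners(conn_entries, port_threshold=100, window=60.0):
    """Flag source IPs that touch at least `port_threshold` distinct
    destination ports within `window` seconds. `conn_entries` are
    dicts shaped like Bro's JSON conn.log (ts, id.orig_h, id.resp_p)."""
    by_source = defaultdict(list)
    for entry in conn_entries:
        by_source[entry["id.orig_h"]].append((entry["ts"], entry["id.resp_p"]))
    scanners = set()
    for source, attempts in by_source.items():
        attempts.sort()  # order attempts by timestamp
        for i, (start_ts, _) in enumerate(attempts):
            ports = {p for ts, p in attempts[i:] if ts - start_ts <= window}
            if len(ports) >= port_threshold:
                scanners.add(source)
                break
    return scanners
```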
Event detection during the initial compromise phase can be the most difficult. Network
intrusion detection systems like Snort and Suricata rely on rules based on previous detection
and analysis of exploitation events. As such, previously unseen or obfuscated exploits may pass
through the sensor undetected. Similarly, for host-based detection mechanisms, a signature is typ-
ically required to identify the exploit code as it is executed on the system. Based on the dynamic
nature of rule-based signature detection, organizations such as Talos (“Talos - Author of the Of-
ficial Snort Rule Sets” 2017) and Emerging Threats (“Emerging Threats” 2017) provide free and
subscription-based updates to rules for Snort and Suricata.
Depending on the techniques used in the ‘establish foothold’ and ‘maintain persistence’ phases, events
may be captured by the network and endpoint sensors. Activities such as creating a new user
or installing a new service on the host would be reported by osquery during regularly scheduled
queries. The creation of a service that also opens a listening port for external connections would
be reported by osquery while connections to the newly opened port would be reported by Bro.
Similarly, if the installed service/malware beacons out to an external host, the running process
would be reported by osquery while the outbound connection would be reported by Bro and possibly
Suricata if the outbound connection matches an existing rule.
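As a minimal sketch of this kind of cross-source correlation, the function below joins rows shaped like osquery's listening_ports table with Bro conn.log entries to surface connections into a newly opened service; the matching logic is an illustration, not the research's query.

```python
def correlate_listener_connections(listening_ports, conn_entries):
    """Return Bro connections whose responder matches a port reported
    as listening by osquery. Listener rows bound to the wildcard
    address 0.0.0.0 match any responding host."""
    listeners = {(row["address"], int(row["port"])) for row in listening_ports}
    hits = []
    for conn in conn_entries:
        port = conn["id.resp_p"]
        if (conn["id.resp_h"], port) in listeners or ("0.0.0.0", port) in listeners:
            hits.append(conn)
    return hits
```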
As privilege escalation typically occurs within the confines of the targeted host, this activity
will be detected and reported via osquery. Additions or modifications to user accounts,
changes to the running kernel, and execution of new processes will be captured through appropriate
configuration of osquery running on the host.
During the ‘internal recon’ and ‘lateral movement’ phases, the attacker is using an already
compromised host within the network to conduct network and vulnerability scanning of the internal
network and launch new exploits against additional hosts. This cycle continues until the attacker
gains access to the hosts or accounts necessary to carry out the objective of the attack. As with
many of the other phases, this activity would be present in the endpoint and network-based event
data from the sensors. However, the network sensor must be positioned to capture all internal traffic
in addition to the external network traffic through the use of a monitoring port on the internal switch.
Due to the widely varying nature of attacker objectives, the final phase of accomplishing the
mission can be difficult to programmatically identify with the event data used in this research.
Attack objectives range from simple website defacement and denial-of-service attacks to ransomware,
exfiltration of sensitive or proprietary data, and all manner of malicious intent. While the
combination of Bro, Suricata, and osquery could identify many of the indicators of the attack objectives, the
customization and configuration of the sensors for this phase is beyond the scope of this research.
4.2 Graph-Based Event Correlation
Using current industry tools such as Splunk, the Elastic stack, and other commercially available
security information and event management (SIEM) tools, events from each of the attack lifecy-
cle phases are already being collected and stored by most organizations seeking to defend a net-
work. The nature of these tools, however, typically leaves correlation of the individual events from
the phases to the capabilities and knowledge of the network defenders. The current tooling for
event collection excels at ingesting information from multiple sources and storing the events into a
database for analysis through crafted queries and alerts to identify malicious behavior.
While current methods have proven successful at ultimately identifying attacker activities within
a network, the average time to detect the activity can still be on the order of months, leaving the
attacker with ample time to carry out their objectives. Although some of the extended time can be
attributed to the increased capabilities of attackers to hide activity from the defenders, manual cor-
relation of suspected events to identify malicious activity is time-consuming and requires detailed
understanding of the events and their relationships.
To simplify the process of identifying potentially malicious activity and reduce the overall de-
tection time, this research uses a graph database for storage and analysis of the event data. Using
the model depicted in Figure 3.1, events are collected from Bro, Suricata, and osquery for trans-
formation and loading into the graph database ArangoDB. With the events and their relationships
populated in the database in real-time, queries written in the Arango Query Language (AQL) are used
to identify malicious activity related to various stages of the attack lifecycle.
4.3 Software Architecture
All of the components of the research architecture run in Docker containers, with Docker
Compose providing the orchestration for starting services and managing volumes and
inter-service network connectivity. Figure 4.2 provides an overview of the software architecture for the
collection, transformation, and storage of event data into the graph database.
Figure 4.2: Software Architecture Overview
The containers for Bro and Suricata run on a Raspberry Pi to monitor all of the traffic within the
network and provide network security monitoring and network intrusion capabilities. For endpoint
activity monitoring, osquery is installed and configured on each of the monitored hosts within the
network. The remainder of Figure 4.2 represents the collection, transformation, and storage func-
tions. The Kafka/Zookeeper cluster provides the central messaging functionality of receiving events
from the data sources and providing the events for consumption by the ETL services. The instance
of Fluentd simply provides for transport of alerts from Suricata into the Kafka cluster as Suricata
does not have a native Kafka client to transmit the alerts. The ETL services retrieve messages from
the Kafka cluster, transform the message contents into the appropriate vertices and edges based on
the graph model, and insert the data into ArangoDB. Finally, ArangoDB provides the storage and
query functions of the observed and transformed events.
4.3.1 Collecting Events
With the focus of the research on reducing the time to detect malicious network activity, the
architecture must support real-time collection and storage of events from the data sources. Central-
ized collection requires that each data source provide events to a single storage platform through
the interface provided by the storage platform. Complicating this requirement is the fact that each
data source stores event data in a different format and provides various interfaces for extraction of
event data to outside entities.
To address the collection requirement, a cluster of Kafka instances supported by Zookeeper as
shown in the center of Figure 4.2 serves as the central messaging fabric of the architecture. The use
of Kafka Producers for each of the data sources provides a common interface to the Kafka cluster
for transport and serialization of the event data into the Kafka cluster for storage and consumption
by the ETL services.
4.3.1.1 Kafka and Zookeeper
From the Kafka documentation, as a distributed streaming platform Kafka provides the ability
to
• publish and subscribe to streams of records similar to a message queue;
• store streams of records in a fault-tolerant way;
• process streams of records as they occur (“Apache Kafka - Intro” 2017)
The capabilities of Kafka combined with the provided Producer and Consumer APIs provide
the necessary framework to collect, store, and consume the event data from each of the data sources.
Additionally, this architecture ensures that additional data sources can be added to the architecture
by merely including the Kafka Producer for the new data source.
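In practice a producer for a new source reduces to serializing its events and publishing them to a topic. The sketch below defines a shared JSON serializer; the commented lines show how it might be wired into the kafka-python client, with broker addresses and the topic name as illustrative assumptions.

```python
import json

def serialize_event(event: dict) -> bytes:
    """Serialize an event to JSON bytes so every data source shares
    the same wire format on its Kafka topic."""
    return json.dumps(event, sort_keys=True).encode("utf-8")

# Publishing with the kafka-python client might then look like:
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(
#       bootstrap_servers=["kafka1:9092", "kafka2:9092", "kafka3:9092"],
#       value_serializer=serialize_event)
#   producer.send("bro_raw", {"ts": 1512345678.0, "id.orig_h": "10.0.0.5"})
```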
For the research architecture, Kafka is deployed as a three node cluster with each instance
running in a separate Docker container. A clustered deployment provides for scalability and fault
tolerance. Scalability is achieved through the partitioning of each topic across the nodes. Each
data source sends events to a single topic and Kafka handles the partitioning of the topic across
the nodes. By partitioning the topic, multiple Producers and Consumers can handle messages for a
single topic in parallel and Kafka ensures messages are received and transmitted in sequence. Fault
tolerance is achieved through replication of the partitions across all nodes within the cluster. If a
single Kafka node fails, the Producers and Consumers using that node will shift to one of the two
operational nodes without losing messages.
The Zookeeper cluster (“Apache Zookeeper” 2017) in the architecture manages the configuration
of the Kafka cluster and topics. Zookeeper will track members of the cluster and their status,
the configuration of topics within the Kafka cluster, election of the controller in the Kafka cluster,
etc. Similar to the Kafka cluster, Zookeeper is deployed as a three node cluster with each instance
in a separate Docker container to provide fault tolerance should one of the instances fail.
4.3.1.2 Fluentd
The instance of Fluentd, as shown between Suricata and Kafka in Figure 4.2, in the architecture
provides merely a transport mechanism for data sources that do not support the Kafka Producer
API either natively or through the use of a plugin. Fluentd provides “collecting, filtering, buffering,
and outputting logs across multiple sources and destinations” (“What is Fluentd?” 2017). In the
research architecture, Fluentd monitors the alerts log file produced by Suricata and sends the alerts
to the Kafka cluster with the Kafka output plugin.
The inclusion of Fluentd in the architecture ensures new data sources that do not natively
support the Kafka Producer API can be added. Fluentd supports input plugins for many different
data sources and formats, requiring only simple configuration changes to Fluentd and the Kafka
output plugin to forward events into a topic within the Kafka cluster.
4.3.1.3 Bro
As a network analysis framework, Bro provides the ability to analyze network connections,
the protocols used as part of the connection, and the data contained in the protocol. (“The Bro
Project” 2014) While monitoring network traffic via a network tap or switch monitoring port, Bro
will process every network connection and the underlying protocol data. The output logs from Bro
include a log of the connections and individual logs for the protocols Bro observed in the connection.
Although Figure 4.2 displays a single instance of Bro, the architecture supports multiple instances
of Bro monitoring different segments of the network.
The connection log contains information about all IP, TCP, UDP, and ICMP connections ob-
served in the network. Table 4.1 provides a summary of field names and description of the data
available in the connection log produced by Bro. The data contained in the log entry is used to
create the vertices and edges of the graph model as shown in Figure 3.1 and populate the edge
and vertex properties.
Table 4.1: Bro conn.log data fields
Field       Type       Description
ts          time       Timestamp of first packet
uid         string     Unique ID of the connection
id.orig_h   address    Originating endpoint's IP address
id.orig_p   port       Originating endpoint's TCP/UDP port
id.resp_h   address    Responding endpoint's IP address
id.resp_p   port       Responding endpoint's TCP/UDP port
proto       protocol   Transport layer protocol of connection
service     string     Detected application protocol
duration    interval   Connection length
Bro also provides logs for each of the application protocols observed as part of a given connec-
tion. As a single example, a summary of the content for the HTTP log is shown in Table 4.2. The
content of the HTTP log is representative of the level of information available in the other service
logs including SSH, DNS, FTP, etc. In the graph model from Figure 3.1, the content of the service
vertex is populated by the information contained in the individual log entry.
Table 4.2: Bro http.log data fields
Field        Type     Description
ts           time     Timestamp of first HTTP request
method       string   HTTP request verb
host         string   Value of the Host header
uri          string   URI used in the request
user_agent   string   Value of the User-Agent header
orig_fuids   vector   Ordered vector of file unique IDs from the originator
The ‘uid’ field in the HTTP log corresponds to the unique entry in the connection log, which
allows for the creation of the edge between the connection and service in the graph model. Similarly,
the ‘orig_fuids’ field corresponds to the unique identifiers of any files transferred from the originator
to the responder and provides for creating the relationship between the HTTP session and any
associated files transferred during the session.
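This linkage can be sketched as a join on the shared identifiers that emits edge documents for the graph; the collection names and edge labels are illustrative, not the exact schema used in the research.

```python
def link_service_and_files(conn_entries, http_entries):
    """Create connection->service and service->file edges using the
    'uid' shared by conn.log and http.log and the 'orig_fuids' list."""
    known_uids = {conn["uid"] for conn in conn_entries}
    edges = []
    for http in http_entries:
        if http["uid"] not in known_uids:
            continue  # no matching connection entry observed
        edges.append({"_from": f'connections/{http["uid"]}',
                      "_to": f'http/{http["uid"]}',
                      "label": "used_service"})
        for fuid in http.get("orig_fuids", []):
            edges.append({"_from": f'http/{http["uid"]}',
                          "_to": f"files/{fuid}",
                          "label": "transferred"})
    return edges
```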
A default installation, as used in the research with minor configuration changes, produces logs
for the connections, associated services, and files transferred. By default, the logs are stored in a
tab-separated file that is easily parsed by tools included as part of the Bro installation. To support
the sending of log entries to Kafka, a configuration change for Bro stores the log entries in JavaScript
Object Notation (JSON) format.
With the logs stored in JSON format, an open-source plugin for Bro (“Metron - Logging Bro
Output to Kafka” 2017) provides the functionality required to send the entries created by Bro for
each of the logs to a Kafka cluster. The configuration of the plugin requires the address and port of
the Kafka brokers, the Kafka topic name, and the names of which Bro logs to send to Kafka.
After starting Bro to monitor the network traffic, the plugin will attempt to connect to the Kafka
cluster to begin forwarding log entries. Although there are three brokers in the cluster, the Bro
plugin only requires initial access to one of the brokers. After the initial connection with the broker,
the plugin will receive the address of the remaining brokers and available partition information for
the topic. With access to all three brokers in the cluster, the plugin can send log entries to each of the
brokers in parallel providing for increased throughput. If the initial broker is unavailable, the plugin
will continue to attempt connection while maintaining a list of unsent log entries for transmission
when a connection is established.
As with all of the components of the architecture, the Bro instance is built and run as a Docker
container. The build process consists of installing dependencies, installing Bro and the Kafka plu-
gin, and copying configuration scripts to support sending logs to the Kafka cluster. The completed
container image will run on any host with the Docker daemon installed within the network, monitor
the visible network traffic, and forward log entries to the Kafka cluster.
4.3.1.4 Suricata
To provide network intrusion detection system (IDS) capabilities within the architecture, Suricata
monitors the same network connection as the Bro sensor. Suricata is a free and open source
engine “capable of real-time intrusion detection, inline intrusion prevention, network security
monitoring and offline pcap processing” (“Suricata Open Source IDS / IPS / NSM Engine” 2017).
This research focuses on the IDS capabilities provided by Suricata. Just as with Bro, although
Figure 4.2 shows a single instance of Suricata, multiple instances of Suricata may be deployed to
monitor additional network segments.
As an IDS, Suricata uses signature-based rules to detect potentially malicious activity. From
the Suricata documentation, a signature consists of:
• The action, that determines what happens when the signature matches;
• The header, defining the protocol, IP addresses, ports and direction of the rule;
• The rule options, defining the specifics of the rule (“Suricata User Guide” 2016)
Actions include pass, drop, reject, and alert. The pass action will stop examining the packet and
skip the remaining rules. The drop action is only applicable when Suricata is operating in inline (IPS) mode
and will silently discard the packet. A rejected packet sends a reset to both the sender and receiver
to stop the connection. Finally, the alert action will generate a Suricata alert for the offending packet
that matches the rule signature.
The header section provides control over the protocol to be examined and the nature of the
connection. For example, the signature can be limited to only evaluate transmission control protocol
(TCP) traffic that originates from outside the monitored network address space from any port that
is destined for the specific IP address of a web server in the internal network on TCP ports 80 and
443. The granular control provided by the header allows for tuning of the signatures to both improve
speed of analysis and minimize false alerts due to excessive and incorrect signature matches.
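The header structure described above can be made concrete by splitting a rule at its option section; the rule shown is a hypothetical example written for illustration, not one of the distributed rule sets.

```python
# Hypothetical signature: alert on TCP traffic from outside the
# monitored network to internal web servers on ports 80 and 443.
EXAMPLE_RULE = ('alert tcp $EXTERNAL_NET any -> $HOME_NET [80,443] '
                '(msg:"Example web probe"; sid:1000001; rev:1;)')

def parse_header(rule: str) -> dict:
    """Split a rule header into action, protocol, source address and
    port, direction, and destination address and port."""
    header = rule.split("(", 1)[0].strip()
    action, proto, src, src_port, direction, dst, dst_port = header.split()
    return {"action": action, "proto": proto,
            "src": src, "src_port": src_port, "direction": direction,
            "dst": dst, "dst_port": dst_port}
```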
The rule options contain the detailed information of the signature to match against packets
under investigation. Rule options utilize a combination of keywords to control the actions Suri-
cata takes when a packet matches a signature, the specific content to match in the payload, and
application-specific keywords depending on the content under investigation. A detailed discussion
of rule options and signature creation is beyond the scope of this research. Further information can
be found in the Suricata documentation (“Suricata User Guide” 2016).
Creation of custom rules provides the ability to monitor for signatures specific to an organiza-
tion. Custom rules can be used to monitor for the exfiltration of sensitive data out of the network
during the ‘mission accomplishment’ phase of the attack or to identify newly discovered exploita-
tion vectors by internal analysis. However, management of signatures for existing and newly dis-
covered public exploits is beyond the capacity of most organizations. To support management of the
near-daily discovery of new attack vectors, several organizations provide free and paid subscriptions
to newly created rules to keep the rule signatures within Suricata up to date.
An open source Perl script, pulledpork.pl (Cummings and Shirk 2017), is available to manage
the updating of rules for Suricata and Snort. Running the script ensures new rules are downloaded
from the specified third-party providers and adds, deletes, or modifies the installed rules as
applicable. By running the script on a scheduled basis, the installed Suricata rules are kept up to date
against emerging threat signatures.
The software architecture of the Suricata service consists of three Docker containers:
• The Suricata application
• A ‘helper’ container to run the pulledpork.pl script
• A container running Fluentbit to forward alerts to Fluentd
The Suricata container image build process consists of installing dependencies, the application
itself, and copying configuration files tailored to the environment. The configuration files manage
logging, enabling of rule sets, and additional runtime options for Suricata such as the definition of
internal and external network segments.
The PulledPork helper container image contains the dependencies and pulledpork.pl script.
Running the PulledPork container updates the rule sets in the Suricata container and restarts the
Suricata container to enable the newly updated rule set. A configurable cron job in the PulledPork
container manages the automatic periodic running of the pulledpork.pl script to ensure frequent
updates of the installed rule set.
There is no direct integration or plugin for Suricata that supports the transmission of alert events
to the Kafka cluster. An additional helper container running Fluentbit forwards
alerts to Fluentd, which in turn sends the events to the Kafka cluster. The Fluentbit container simply
monitors the alerts log file created by Suricata and forwards the events to the Fluentd instance.
Similar Fluentbit containers could be used for additional data sources that do not support direct
integration with the Kafka cluster.
The data contained with a Suricata alert includes the source and destination IP address and port
of the connection, the category of the matching rule, the name of the specific rule, the protocol of
the connection, and the timestamp of the observed packet. The contents of the alert are used to
create the edges and vertices of the graph model as depicted in Figure 3.1 which connects the alert
data to the other sources via the source and destination IP addresses of the associated endpoints.
4.3.1.5 osquery
Within the architecture, monitoring of endpoint activity is accomplished with osquery. As
shown in Figure 4.2, osquery is installed on all of the monitored hosts within the network. osquery
is an open source product provided by Facebook to gain insight into the activity and configura-
tion of hosts within a network. By exposing “an operating system as a high-performance relational
database” (“osquery Docs” 2017), osquery provides a Structured Query Language (SQL) interface
to system analytics and monitoring of the endpoint. osquery supports the Windows, OS X, Linux,
and FreeBSD operating systems, providing coverage for the majority of endpoints typically present
within an enterprise.
Access to the osquery interface is provided by an interactive command-line shell (osqueryi) or
through a monitoring daemon (osqueryd) that runs scheduled queries. The interface provided by
osqueryi is useful for conducting ad-hoc queries or testing new queries for inclusion with osqueryd.
The configuration of osqueryd supports running multiple groups of queries, or packs, on a schedule
while forwarding the query results to a log aggregator. The research architecture uses the native
functionality of osquery to forward scheduled query results to the Kafka cluster.
To effectively monitor the network, osquery should be deployed to every endpoint. While there
are several methods to deploy osquery throughout the enterprise, the research architecture utilizes
Ansible to manage the installation and configuration of osquery on the target hosts within the net-
work. As an automation tool, Ansible “can configure systems, deploy software, and orchestrate
more advanced IT tasks” (“Ansible Documentation” 2017). Through the use of Ansible playbooks,
osquery is installed on the endpoint, configured to run selected queries and forward results to Kafka,
and started as a service on the endpoint.
To support detection of potentially malicious activity at the host level, osquery is configured to
provide information relevant to intrusion detection and incident response. For baseline configuration
status of each host, queries provide the installed operating system, information about the running
kernel, the state of all installed network adapters, and general system information including installed
central processing units (CPUs) and available memory. Figure 4.3 provides an example query and
the results for the operating system version. Monitoring of interactive events such as user login,
the connection of universal serial bus (USB) devices, the execution of processes, and the status of
listening network ports provides situational awareness of the endpoint activity.
osquery> SELECT name, version, major, minor, patch FROM os_version;
+--------+----------------------------+-------+-------+-------+
| name   | version                    | major | minor | patch |
+--------+----------------------------+-------+-------+-------+
| Ubuntu | 16.04.3 LTS (Xenial Xerus) | 16    | 4     | 0     |
+--------+----------------------------+-------+-------+-------+
Figure 4.3: Example osquery results
Queries may be customized to meet the needs of the organization. However, Facebook pro-
vides groups of preformatted queries in packs (“Packs” 2017) to support analytic monitoring and
adversary activity detection. Adding a pack to the configuration runs all of the included queries at
the specified frequency. For this research, the Facebook-provided packs are used as a starting point
to create a custom pack for inclusion on each of the endpoints to control the scope of content and
ensure that the query results support the graph data model.
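For illustration, a trimmed custom pack might schedule a few of the queries described above. The query selection and intervals below are assumptions for illustration rather than the actual research pack; the table and column names come from the standard osquery schema.

```json
{
  "queries": {
    "os_version": {
      "query": "SELECT name, version, major, minor, patch FROM os_version;",
      "interval": 3600
    },
    "listening_ports": {
      "query": "SELECT pid, port, protocol, address FROM listening_ports;",
      "interval": 300
    },
    "logged_in_users": {
      "query": "SELECT user, host, time FROM logged_in_users;",
      "interval": 300
    }
  }
}
```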
In addition to regularly scheduled queries, osquery supports event-driven queries for reporting
of time-sensitive information that may be missed between consecutively scheduled queries. Events
including process starting and stopping, user login and logout, and changes to file contents can occur
between scheduled queries and would therefore not appear in the query results. The event-driven
framework of osquery follows a publisher/subscriber model to write events to a queue as they occur.
The recorded events are then available to be returned at query time.
4.3.2 Extracting, Transforming, and Loading Events
With all of the data sources now forwarding events into separate topics in the Kafka cluster,
the events can be extracted from Kafka, transformed to match the graph data model, and loaded
into ArangoDB. The ETL process for each data source occurs in individual Docker containers,
represented in Figure 4.2 for each data source, which contain a Python script written for the specific
source. Each container is built with the Python runtime, the script to handle the ETL process, and
any Python libraries required by the script.
To support extraction of events from Kafka, the kafka-python (Oh 2016) library provides the
functionality necessary to create a Kafka Consumer. By providing the Kafka Consumer with the
address of the Kafka broker in the cluster and the name of the topic to consume, the Kafka Consumer
will continuously poll the cluster to check for new messages. To ensure that no messages are missed
due to an error with the Consumer, Kafka keeps track of which messages the Consumer retrieves.
Additionally, through the use of topic partitioning and message offset tracking, multiple Consumers
can retrieve messages from the same topic. Running multiple instances of the Consumer against a
single topic increases the processing throughput of the ETL process.
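A minimal sketch of such a Consumer using the kafka-python API is shown below. The broker address, topic, and group names are placeholders, and the generator wrapper is illustrative rather than the research script itself.

```python
import json

def decode(raw):
    """Convert a raw Kafka message payload (bytes) into an event dict."""
    return json.loads(raw.decode("utf-8"))

def run_consumer(broker, topic, group):
    """Continuously poll the given topic and yield decoded events.

    Requires kafka-python (pip install kafka-python); defined here for
    illustration and not invoked. Because offsets are tracked per
    group_id, running additional instances with the same group splits a
    partitioned topic between them, increasing ETL throughput."""
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        topic,                        # e.g. a per-source topic name (assumed)
        bootstrap_servers=[broker],   # e.g. "kafka:9092" (assumed)
        group_id=group,               # Kafka tracks retrieved offsets per group
    )
    for message in consumer:          # blocks, polling the cluster for new messages
        yield decode(message.value)
```
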
Transformation of the consumed event data into components of the graph data model requires
processing specific to the event and the role of the data in the graph model. As each of the data
sources provides multiple types of events, the Python script for a particular topic applies trans-
formation logic specific to the event data contents. The flexibility provided by the transformation
logic allows for the addition of new event types or modification of existing transformations as the
graph data model grows or changes. New events may be added from existing data sources with
corresponding additions to the transformation logic. However, new data sources would require the
addition of another Docker container with a Python script for the resulting new Kafka topic.
As a result of the transformation process, the Python script creates objects for the edges and ver-
tices of the graph data model for insertion into ArangoDB. The python-arango library (Powers and
Arthur 2016) provides the necessary APIs for the ETL script to interact with ArangoDB. Through
the python-arango API, the ETL script manages the creation and updating of vertices and edges in
the associated collections within the database based on the event data and the graph data model.
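A minimal sketch of this transform-and-load step for one event type is shown below, using the os_version event of Figure 4.4. The helper names are illustrative, not the research code, and the load function only indicates where python-arango would perform the insertions.

```python
OS_COLLECTION = "os"            # vertex collection for operating systems
EDGE_COLLECTION = "running_os"  # edge collection linking hosts to OS nodes

def os_vertex(row):
    """Build the OS vertex for an osquery os_version result.

    The _key is derived only from fields that identify the version, so
    every host reporting the same version maps onto one node (the key
    format mirrors the example document in Figure 4.4)."""
    doc = dict(row)
    doc["_key"] = "{version}{major}{minor}{patch}".format(**row)
    doc["_id"] = "{}/{}".format(OS_COLLECTION, doc["_key"])
    return doc

def running_os_edge(host_uuid, vertex, first_seen):
    """Connect a host vertex to the operating system vertex it reported."""
    return {
        "_from": "hosts/{}".format(host_uuid),
        "_to": vertex["_id"],
        "first_seen": first_seen,
    }

def load(db, vertex, edge):
    """Insert the documents via python-arango; not invoked here, and the
    exact client API varies by library version."""
    db.collection(OS_COLLECTION).insert(vertex, overwrite=True)
    db.collection(EDGE_COLLECTION).insert(edge)

# Transform one decoded event (field values taken from Figure 4.4).
row = {"version": "16.04.3 LTS (Xenial Xerus)",
       "major": "16", "minor": "4", "patch": "0",
       "codename": "xenial", "platform_like": "debian"}
vertex = os_vertex(row)
edge = running_os_edge("63864D56-E55E-5B91-B4A8-2EE4E7C16F61",
                       vertex, "Thu Jan 4 18:23:40 2018 UTC")
```

Because a second host reporting the same version produces an identical `_key`, the insert resolves to the same vertex, which is what collapses shared attributes into single nodes in the graph.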
4.3.3 Storing Events
ArangoDB provides for the storage of the transformed event data in a highly available, cluster
architecture. Although represented as a single node in Figure 4.2, the ArangoDB cluster is com-
prised of multiple containers to provide redundant storage and increased throughput of event data.
As a multi-model database (“ArangoDB - highly available multi-model NoSQL database” 2017),
ArangoDB supports key/value, document, and graph stores. Each of the stores provides advantages
for storage and querying of data records. This research focuses on the document and graph stores
for aggregation of records, modeling, and querying.
The document store maintains records in collections, similar to tables in a traditional relational
database. Records are stored as JSON documents that support deeply nested data fields comprised
of mixed data types. Storing records in JSON format provides flexibility in the data structure by
allowing for addition and modification of data fields without requiring schema migrations typically
associated with relational databases.
To store the vertices created during the ETL process, collections are created for each of the
node data types in the graph data model. For example, there are separate collections for external IP
addresses, internal IP addresses, operating systems, user accounts, etc. An example document from
the operating system collection is shown in Figure 4.4. Each document has a unique value for the
‘_key’ field, and proper selection of the value in the schema allows for the creation of a single node
where multiple endpoints possess the same value. In the example below, the ‘_key’ field is selected
to provide a single node for the operating system version when multiple endpoints are running the
same version. The ‘_id’ field is simply a combination of the collection name and the value of the
‘_key’ field, and the ‘_rev’ field is internally generated by ArangoDB for version tracking of the
document. The remaining fields are created from the results of the ETL process for the event from
osquery.
{
  "_key": "16.04.3 LTS (Xenial Xerus)1640",
  "_id": "os/16.04.3 LTS (Xenial Xerus)1640",
  "_rev": "_WKHo8M2--E",
  "first_seen": "Thu Jan 4 18:23:40 2018 UTC",
  "last_seen": "Thu Jan 4 18:23:40 2018 UTC",
  "build": "",
  "codename": "xenial",
  "major": "16",
  "minor": "4",
  "patch": "0",
  "platform_like": "debian",
  "version": "16.04.3 LTS (Xenial Xerus)"
}
Figure 4.4: Example ArangoDB vertex document
Collections in the document store are also used for the edges, or relationships, of the data model.
The documents in an edge collection consist of fields for the ‘_id’ of the source and destination
nodes of the relationship in addition to fields for properties of the relationship. Continuing with
the previous example, Figure 4.5 displays the edge document that connects a host node with the
associated operating system node. The ‘_key’, ‘_id’, and ‘_rev’ fields serve the same role as in the
vertex document. The ‘_from’ and ‘_to’ fields hold the ‘_id’ of the associated vertex documents.
{
  "_key": "3618158",
  "_id": "running_os/3618158",
  "_from": "hosts/63864D56-E55E-5B91-B4A8-2EE4E7C16F61",
  "_to": "os/16.04.3 LTS (Xenial Xerus)1640",
  "_rev": "_WKHo8OG---",
  "first_seen": "Thu Jan 4 18:23:40 2018 UTC"
}
Figure 4.5: Example ArangoDB edge document
The graph store is essentially a combination of the vertex collections and their associated edge
collections. With the edge and vertex collections populated with events from the data sources, the
graph store contains the transformed events in accordance with the data model of Figure 3.1. The
graph store provides the advantage of allowing for graph queries based on the relationships between
nodes. Queries for the shortest path between nodes, traversing the graph, and pattern matching
become available when storing data within a graph.
4.3.4 Querying the Database
The Arango Query Language (AQL) included with ArangoDB provides a unified query in-
terface to the key/value, document, and graph data stores. The unified query interface allows for
queries that pull from any combination of data stores to retrieve the desired results. The syntax of
AQL, similar to the SQL syntax of relational databases, provides additional capability and flexibility
to retrieve data from the database, and ultimately gain insight into the activity within the network.
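As an illustration of that flavor of query, the sketch below counts outbound connections per external IP address. The collection and attribute names here are assumptions for illustration, not the actual research schema.

```aql
FOR ip IN external_ip                      // vertex collection (assumed name)
    LET outbound = LENGTH(
        FOR c IN conn                      // connection documents (assumed name)
            FILTER c.dst_ip == ip.address
            RETURN 1
    )
    SORT outbound DESC
    LIMIT 10
    RETURN { ip: ip.address, outbound: outbound }
```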
With the graph data model represented by documents within the edge and vertex collections
of ArangoDB, graph and document-based queries can be constructed to locate potential malicious
activity based on the data from the sensors. Queries can range from simple matching queries run
against a single document collection to complex traversal queries run against the entire graph store.
Queries run against a single collection can be used to locate a particular indicator in the data
such as a file hash or website address or to aggregate statistics such as bytes transferred by hosts
within the network. While single collection queries can provide useful information about the activity
in the network, the power and basis of this research come from the ability to query the entire graph
and connect data from multiple sources. By collecting and transforming events from all of the data
sources into a graph model, AQL queries can be constructed to determine which user was logged
into a particular host when a specific website was visited and which files were transferred during
the connection.
Using the attack lifecycle as a guide, AQL queries can be constructed to identify activity typi-
cally associated with attacks against a network. A collection of targeted queries, run on a scheduled
basis, could assist network defenders in rapidly identifying potential malicious activity within
the network. After the possible activity has been detected by the automated queries, the interactive
AQL function provides incident responders the ability to conduct further investigation into the na-
ture of the activity. Expanded AQL queries can localize the source of the incident or determine the
spread of activity throughout the network.
Chapter 5 – Simulation
Evaluation of the graph data model and the software architecture developed in this praxis oc-
curred through execution of attacks against a simulation network. Section 5.1 discusses the im-
plementation and architecture of the simulation environment to include the vulnerable targets for
the attacks. Attacks against the simulation environment consisted of a series of scripted attacks by
the researcher and blind attacks conducted by an independent third-party. Section 5.2 discusses the
scripted and blind attacks. The analysis of the attacks as supported by the graph correlation system
developed during this research is covered in Section 5.3.
5.1 Simulation Environment
Evaluation of the graph-based event correlation methodology required the collection of network
and endpoint data from a representative network environment subject to attacks against endpoints
within the environment. For this research, the simulation environment was based on a home network
consisting of various network-connected devices and multiple users. The device types included
physical desktop computers and laptops, phones and tablets, virtual machines, smart home devices,
and internet-enabled media devices. All of the endpoints in the simulation environment belong to
a single network segment that is protected by a firewall from connections originating outside the
network.
To provide an attack surface for the simulation events, multiple instances of a pre-configured
virtual machine were hosted within the environment. The virtual machines (“Basic Pentesting: 1”
2017) include software and configuration vulnerabilities representative of issues found in typical
network environments based on poor software patching and configuration management policies.
The use of vulnerable virtual machines within the environment increases the likelihood of successful
attacks during the simulation events and provides for limiting the scope of the attacks to prevent
disclosure of sensitive personal information contained on the other endpoints in the environment.
While reducing the scope of the attacks to the vulnerable virtual machines limits the realism of the
scenario, the detection capabilities demonstrated in the research are not impacted by the reduced
scope of the simulation environment.
Access to the simulation environment for the scripted and blind attacks was provided through
a virtual private network (VPN) connection provided by the edge router. The VPN creates an en-
crypted connection between the attacker’s computer and a virtual network that is separate from
the simulation network. The separate virtual network provided a means to simulate attack traffic
originating from the Internet while not exposing the simulation network directly to the Internet.
Additionally, to further simulate a realistic network environment the firewall limits access from
endpoints in the VPN to the web server running on one of the virtual machines within the simulation
network. This network configuration ensured attackers had limited external visibility into
the simulation network and required exploitation of the externally facing web server before gaining
access to the internal simulation network and the remaining vulnerable virtual machines.
An additional network segment is provided by the router to contain the processing and storage
components of the graph-based correlation architecture. The correlation network contains the Bro
and Suricata sensors, the Fluentd and Kafka messaging nodes, the event processing containers, and
the ArangoDB cluster. The firewall limits access to the correlation architecture to the simulation
network and the ports required for transmission of events and logs into the correlation network.
Figure 5.1 provides an overview of the network structure of the simulation environment.
Figure 5.1: Simulation Environment Overview
5.1.1 Background Activity
In a typical network environment, the ability to successfully detect malicious activity is complicated
by the volume of normal, benign activity within the network. For the simulation environment,
the background activity was provided by out-of-scope targets and users within the simulation
network. By monitoring and collecting all of the network activity within the simulation environ-
ment, a more realistic scenario is created for the detection of the malicious activity associated with
the attack events.
Failure to collect sufficient background activity from the benign endpoints in the network would
result in only the collection of attacker based activity, reducing the validity of the methods described
in the research in a representative network environment. To ensure sufficient volume of background
activity, the simulation network traffic was monitored and collected over several days before the
commencement of attack events. Table 5.1 provides a summary count of metrics related to the
observed network activity during the simulation. The volume of observed traffic before the attack
events ensured that the attack events would contribute only a small percentage of the overall
observed activity within the simulation network, and therefore be more representative
of a typical network environment.
Table 5.1: Summary of Network Activity During Simulation
Metric                         Count
Unique connections             1,506,870
Unique domains requested       20,823
Unique external IP addresses   25,760
Files transferred              1,324,770
The background activity consists of network traffic associated with typical multi-user environ-
ments. The activities include web browsing, media streaming, smart home device communication,
machine to machine communication over secure shell (SSH), and interaction with other Internet-
based services such as GitHub. Additionally, to include the vulnerable VMs in the background
activity, connections over HTTP, FTP, and SSH occurred periodically to simulate normal network
activity involving the vulnerable VMs.
5.1.2 Vulnerable Targets
As this research focused on the detection vice protection aspect of network security, virtual
machines with known, exploitable vulnerabilities served as the targets of attack for the simulation
events. The use of VMs with known vulnerabilities increased the likelihood of success during
the scripted and blind attack scenarios in the simulation events. During the scripted events, the
attacks targeted the specific vulnerable applications and configurations to support the validation of
the graph-based correlation to detect malicious activity. For the blind attack events, the attacker
attempted to exploit the VMs with no prior knowledge of the installed software or configuration.
As a baseline configuration, the ‘Basic Pentesting’ virtual machine from VulnHub (“Basic
Pentesting: 1” 2017) served as the starting point for the target VMs in the environment. The
baseline VM includes:
• A vulnerable FTP server
• An installation of WordPress with default administrative credentials
• Accounts with passwords vulnerable to cracking
• An SSH server with a default configuration
To provide target diversity and allow for lateral movement by the attacker, multiple instances
of the baseline vulnerable VM were configured to provide only one of the vulnerable services. The
first instance of the VM served as the web server exposed to the attacker network with the remaining
vulnerable services disabled. Similarly, additional individual instances of the baseline VM provided
FTP and SSH services with additional user accounts added for diversity within the environment.
Finally, each of the vulnerable virtual machines included an installation of osquery configured
to support the data collection requirements of the graph data model. The configuration contained
the queries required to run periodically on the VM and forward the results of the queries to the
Kafka cluster in the monitoring network for processing and insertion into ArangoDB.
5.2 Attacks
To evaluate the effectiveness of the graph-based correlation system to quickly identify poten-
tially malicious activity, a series of attacks conducted against the vulnerable machines simulated
attacker activity within the simulation environment. Attacks were carried out in two separate sce-
narios designed to meet different goals for the analysis of the attack events. During the scripted
attacks, the attacker possessed full knowledge of the simulation environment to include network
architecture and existing vulnerabilities. For the blind attacks, the third-party attacker was only
provided the IP address of the externally facing web server and did not know the internal network
structure or the configuration of the target VMs.
The first series of attacks focused on validating the ability of the graph-based approach to detect
malicious activity based on prior knowledge of the activities conducted by the attacker. The scripted
attacks targeted the known vulnerabilities in the VMs in a pre-defined series of events. Analysis of
the processed logs, alerts, and events generated by the scripted attacks demonstrated the visual anal-
ysis and database query capabilities of the graph-model to support detection of potentially malicious
activity.
The second series of attack events conducted by an independent third-party served to validate
the ability of the research to detect attack activity faster than the reported current industry averages.
The activities conducted during the blind attacks occurred without the researcher’s prior knowledge,
supporting an independent assessment of the ability of the graph-based model to rapidly detect malicious activity. To compare
results of the analysis with the actual activities conducted during the blind attacks, the third-party
attacker provided the steps taken during the event after the completion of the analysis.
5.2.1 Scripted
With prior knowledge of the network configuration and the vulnerabilities present in the target
VMs, the scripted attacks emulated adversary activity through:
• Reconnaissance of the external facing web server
• Exploitation of the web server using the default web admin credentials
• Uploading a remote access trojan to the web server
• Obtaining and cracking password hashes
• Logging in as the system administrator with the cracked password
The above activities provided opportunities for data collection via Bro and Suricata at the net-
work level and osquery at the endpoints. Additionally, the activities from the scripted scenario
provided for detection of malicious activity from signature-based events such as alerts from Suricata
and behavior-based events such as monitoring servers for the initiation of outbound network
connections.
The scripted scenario only targeted the vulnerable web server to prevent potential overlap with
the activities of the blind attacks. Due to the relatively limited attack surface presented by the
scope of the target VMs, limiting the activities of the scripted attacks maximized the potential
unknown actions carried out by the third-party attacker during the blind attacks. Providing a
sizeable unknown threat space created a more representative environment for detection of potentially
malicious behavior in the data set.
5.2.2 Blind
Validation of the graph-correlation system’s time to detect potentially malicious activity oc-
curred through opening access to the simulation environment to a third-party attacker. The third-
party attacker gained access to the simulation environment through a VPN to provide a secure
connection to the environment while allowing the attack traffic to appear as originating from out-
side the simulation environment. The VPN credentials provided to the third-party attacker only
granted access to the VPN. Access to the simulation environment from the VPN was limited to the
externally facing web server to simulate an exposed web server that resides on the internal network.
The blind attacks conducted by the third party occurred without the researcher’s prior knowledge
of the timing or techniques employed by the attacker, simulating the complexities of activity
detection in a real-world network environment. Due to the simulation environment containing personal endpoints in
addition to the vulnerable VMs, a scoping document provided to the third-party attacker defined
which endpoints in the simulation environment were open to attacks. Although the exploitation
activities were limited to the attack targets, the entire simulation environment was open to network
scanning.
5.3 Detection
Rapid detection and localization of attacker activity are paramount when protective defense
measures fail. The longer an attacker remains undetected, the more likely they are to persist within
the network and carry out their objectives. Initial detection can be based on intrusion detection
sensors such as the Suricata NIDS deployed in support of this research. However, many attackers
possess the capability to bypass traditional signature-based detection mechanisms. Behavior-based
detection methods, however, can be used to detect the nature of an attacker’s movement through the
network.
To effectively employ behavior-based detection methods, network defenders must understand
the structure and role of endpoints in the network as well as ‘normal’ network behavior. Examples
include understanding normal levels of network activity for each endpoint or knowing that server
endpoints should rarely, if ever, initiate outbound network connections.
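The second example could be implemented as a scheduled AQL query against the graph. The sketch below is illustrative only: the collection names and the role attribute used to tag servers are assumptions, not the research schema.

```aql
// Flag connections that originate from endpoints designated as servers,
// which should rarely initiate outbound traffic.
FOR host IN hosts
    FILTER host.role == "server"           // role tagging is assumed
    FOR c IN conn
        FILTER c.src_ip == host.ip
        RETURN { server: host.ip, dst: c.dst_ip, ts: c.ts }
```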
Each stage of the simulation (baseline collection, scripted attacks, and blind attacks) served
to demonstrate various aspects of the ability of graph-based correlation to help defenders detect
attacker activity faster than current industry averages of weeks to months (Brumfield 2017; SANS
Analyst Program 2017; Ponemon Institute 2017).
5.3.1 Baseline Activity
As previously discussed, the simulation environment recorded and processed network and end-
point data for several days before the attack events. The baseline recording period served to:
• Verify proper operation of the system over an extended time period
• Provide background ‘white noise’ for simulation of real-world network activity
• Generate data for evaluation of ‘normal’ network activity
Table 5.1 provides high-level metrics for the volume of network traffic and key data points re-
lated to aspects of the traffic. While these metrics provide a generalized assessment of traffic volume
across the network, the metrics lack the granularity necessary for understanding the behavior within
the network.
A more detailed understanding of the network traffic is obtained by leveraging the graph-model
and the Arango Query Language (AQL). Figure 5.2 displays the AQL query for counting the out-
bound connections to external IP addresses and the number of unique domains associated with the
external IP address.
Figure 5.2: AQL Query of External Domains and IP Addresses
The query in Figure 5.2 utilizes the edge and vertex collections associated with external IP
addresses, domains, and connections (Line 1) to loop over each external IP address and count the
number of inbound and outbound connections (Lines 4-6). The remainder of the query (Lines 7-13)
loops through the domains collection to count the number of unique domains associated with
the external IP address and format the results of the query. Table 5.2 provides the top ten results of
the query sorted by the number of outbound connections from the network.
Table 5.2: Summary of Outbound Activity Prior to Attack Events
Unique Domains   IP Address        Outbound Connections
75               172.217.12.238    19340
7                104.154.127.47    16406
1                185.132.79.54     8481
16               172.217.12.227    8376
4                93.184.216.34     7271
1                74.6.105.9        5093
1445             192.33.31.192     4920
5                172.217.12.228    4772
55               172.217.9.206     4737
3                162.125.6.3       2938
From the single query, an understanding of the relationship between domains, external IP addresses,
and connection activity becomes apparent. The results of the query provide insight into
which external IP addresses receive the most traffic from within the network as well as the number
of unique domains associated with each IP address. Larger values for Unique Domains indicate
that the IP address may be associated with a content delivery network that provides files for multiple
websites.
Similarly, an understanding of inbound traffic to the network is obtained with minor
modifications to the previous query. As shown in Figure 5.3, a change to Line 9 sorts the list by
inbound connections and a change to Line 10 returns the actual domain name vice counting the
number of unique domains.
Figure 5.3: AQL Query of Inbound Network Connections and Domains
The results of the query, provided in Table 5.3, demonstrate that the majority of the inbound
network connections are from the domain ‘mtalk.google.com’. The connections to the domain are
associated with the Google Hangouts application, and as such are expected and normal network
traffic based on the user base. The results also demonstrate the use of multiple IP addresses to serve
a single unique domain.
5.3.2 False Positives
While monitoring the simulation environment for baseline activity, several alerts reported by
Suricata indicated potential malicious activity. Analysis of these Suricata alerts from the real-world
traffic served as the first opportunity to validate the ability of the graph-model to detect
and localize malicious activity. As the alerts were based on network traffic to and from personal
endpoints within the simulation environment, there were no osquery results for those endpoints in
the database. Therefore, the following analysis includes only edges and vertices from the processing
Table 5.3: Summary of Inbound Activity Prior to Attack Events
Domain               IP Address        Outbound Connections   Inbound Connections
mtalk.google.com     173.194.68.188    63                     117
mtalk.google.com     209.85.201.188    64                     80
ntp-g7g.amazon.com   207.171.178.6     923                    78
mtalk.google.com     173.194.205.188   44                     69
mtalk.google.com     173.194.175.188   35                     51
mtalk.google.com     209.85.232.188    47                     39
mtalk.google.com     173.194.208.188   9                      23
mtalk.google.com     173.194.204.188   46                     22
mtalk.google.com     74.125.192.188    47                     18
mtalk.google.com     173.194.66.188    48                     17
of Bro logs and Suricata events.
Correlation of the attack activity requires a starting point. Initial detection of attack activity may
come from a signature-based sensor such as Suricata or through identification of activity deemed
an outlier from normal baseline activity. Figure 5.4, as captured from the ArangoDB query
interface, provides the details of an alert observed during the baseline data capture.
Figure 5.4: Suricata Alert from ArangoDB
The alert indicates attempted exploitation of Internet Explorer via remote code execution against
the internal endpoint with IP address 10.10.10.107 from the external IP address 45.79.86.91.
While the alert provides an indicator of potentially malicious activity, it does not provide an analyst
with confirmation of successful exploitation or the scope of the attacker’s activity.
Using the alert as a starting point, the AQL query displayed in Figure 5.5 searches the graph
database for connections to the alert based on the graph data model. To begin the search, the starting
node is limited to the specific alert under investigation (Lines 1-2). From the starting node, the query
then searches for the external IP address in the database that matches the source of the alert (Lines
4-5). To limit the number of returned connection to those that occurred during the time frame of the
of alert the results are filtered to connections occurring within one minute of the identified alert at a
search depth of two connections from the external IP address (Lines 6-10).
Figure 5.5: AQL Query for Suricata Alert Analysis
The resulting graph visualization based on the query is shown in Figure 5.6. The visualization
results, returned from the database in 25ms, provide an understanding of the activity associated with
the potential attacker originating from the external IP address of the alert. Figure 5.6 provides
a representation of the components of the graph model from Figure 3.1 based on the data returned from the
query in Figure 5.5.
Figure 5.6: Visualization of Query Results
The analyst can quickly interpret several items of interest from the resulting graph visualization
of the AQL query during the two-minute time window of the alert:
• The external IP address ‘45.79.86.91’ was the source of three total alerts as represented by
the nodes labeled ‘3173714’, ‘3095134’, and ‘3095131’
• The external IP address communicated with two internal IP addresses as represented by the
‘conn’ labeled nodes with connections between the internal and external IP addresses
• There were six connections containing HTTP traffic between the external IP address and the
target of the alert as represented by the ‘http’ nodes connected to the ‘conn’ nodes
• The domain names associated with the external IP address as represented by the domain
names connected to the external IP address node
While the visualization provides context and understanding of the activity surrounding the alert,
it lacks the information to support a detailed analysis for incident validation and response. By
modifying the return statement in Line 15 of the query in Figure 5.5, the query returns detailed,
tabular information about the connections and HTTP sessions represented in the graph visualization.
Relevant data fields of query results for the connections and HTTP sessions during the potential
attack are shown in Table 5.4 and Table 5.5 respectively.
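The manual queries described above lend themselves to parameterization. As a hedged illustration only (the collection name ‘hosts’, the graph name ‘network’, and the field names ‘ip’ and ‘ts’ are assumptions, not the exact names used in this research), a time-bounded traversal similar in spirit to the query in Figure 5.5 could be built and bound to variables in Python:

```python
from datetime import datetime, timedelta

def build_alert_context_query(alert_src_ip, alert_time, window_minutes=1, depth=2):
    """Build a parameterized AQL traversal query and its bind variables.

    Hypothetical sketch only: collection, graph, and field names are
    illustrative assumptions, not the exact names from this research.
    """
    start = alert_time - timedelta(minutes=window_minutes)
    stop = alert_time + timedelta(minutes=window_minutes)
    query = """
    FOR host IN hosts
      FILTER host.ip == @src_ip
      FOR v, e IN 1..@depth ANY host GRAPH 'network'
        FILTER v.ts >= @start AND v.ts <= @stop
        RETURN {node: v, edge: e}
    """
    bind_vars = {
        "src_ip": alert_src_ip,
        "depth": depth,
        "start": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "stop": stop.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return query, bind_vars

query, bind_vars = build_alert_context_query(
    "45.79.86.91", datetime(2018, 1, 18, 20, 34, 19))
```

With a driver such as python-arango, the pair could then be executed via db.aql.execute(query, bind_vars=bind_vars); binding values rather than concatenating strings keeps the query reusable for the automation discussed later in this section.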
Table 5.4: Connection Data Associated with Alert Under Investigation
ts                    orig_bytes  resp_bytes  dst_ip       src_ip        src_port  dst_port
2018-01-18T20:34:19Z  4131        161033      45.79.86.91  10.10.10.107  62177     80
2018-01-18T22:12:35Z  859         1348        45.79.86.91  10.10.10.103  41864     80
2018-01-18T20:34:19Z  5846        326179      45.79.86.91  10.10.10.107  62179     80
2018-01-18T20:34:19Z  5370        267315      45.79.86.91  10.10.10.107  62174     80
2018-01-18T20:34:19Z  4213        244863      45.79.86.91  10.10.10.107  62175     80
2018-01-18T20:34:19Z  3696        283894      45.79.86.91  10.10.10.107  62178     80
2018-01-18T20:34:19Z  5771        311461      45.79.86.91  10.10.10.107  62173     80
Table 5.5: HTTP Session Data Associated With Alert Under Investigation
ts                    HTTP_method  uri                           dst_ip       src_ip
2018-01-18T20:34:19Z  GET          /preview/main.css             45.79.86.91  10.10.10.107
2018-01-18T22:12:35Z  GET          /favicon.ico                  45.79.86.91  10.10.10.103
2018-01-18T20:34:19Z  GET          /css/responsive.css           45.79.86.91  10.10.10.107
2018-01-18T20:34:19Z  GET          /css/jquery-ui.min.css        45.79.86.91  10.10.10.107
2018-01-18T20:34:20Z  GET          /img/home/holiday2-thumb.jpg  45.79.86.91  10.10.10.107
2018-01-18T20:34:20Z  GET          /js/bootstrap.js              45.79.86.91  10.10.10.107
2018-01-18T20:34:19Z  GET          /css/main.css                 45.79.86.91  10.10.10.107
The query results in Table 5.4 provide a summary of the HTTP connections between the external
IP address and the internal IP address, including the time of the connection, the amount of data sent
from the internal IP to the external IP, and the amount of data returned from the external IP to
the internal IP. Because the amount of data in the responses is much larger than the data sent
from the originating internal IP address, the connections follow the expected pattern of normal
HTTP traffic between an internal client and an external web server. Similarly, Table 5.5 provides
the names of the files requested by the internal client from the external web server and shows no
immediate indicators of malicious activity.
At this point, the analyst has sufficient information to support responding to the potential malicious
activity, including the IP addresses of the hosts involved, the nature of the HTTP connections
between the hosts, and the domain names associated with the potential attacker. Through manual
construction and running of three queries from the ArangoDB-provided interface, each of which
took approximately 25 ms to return results, the analyst can conduct a deep and focused analysis to
determine the nature and severity of the potential attack.
Although outside the scope of this research, the information from the query results provided the
necessary data points to quickly locate the specific network packets associated with the alert in the
full packet capture logs collected by Bro. With the timestamps, IP addresses, ports, and uniform
resource identifiers (URIs) from the query results, the contents of the network packets were inspected
and determined to be benign.
While the alert under investigation was a false positive, the investigation provided an opportunity
to validate the correlation of events from multiple sources into a graph model. Moving from
identification of the alert to understanding the nature of the traffic and collecting data for the
potential incident response required less than five minutes with the analyst utilizing manually
constructed AQL queries. From a practical perspective, although the previously discussed manual
process would provide an improvement over current workflow practices, automation of selected
queries would provide for a system that scales to meet the demand of typical enterprise networks.
Additionally, the data points collected as a result of the queries allowed for a targeted investigation
of the full packet capture to determine the exact contents of all communications between the external
IP address and the internal host in less than ten minutes.
5.3.3 Scripted Attacks
Validation of the collection, transformation, and storage of events from multiple data sources
into the graph-model occurred via execution of a scripted attack. The attacker targeted the vul-
nerable web server from the VPN network with the goals of successful exploitation and privilege
escalation. Due to the limited attack surface provided in the simulation environment, the scripted
attacks were limited to one of the three vulnerable targets to maximize unique opportunities for
adversarial activity detection during the blind attack events.
Although limited in scope, the activities conducted during the scripted attack followed the pattern
of reconnaissance, exploitation, and privilege escalation associated with real-world network
intrusions. The specific activities conducted during the scripted attack included:
intrusions. The specific activities conducted during the scripted attack included:
• Service discovery with nmap
• WordPress vulnerability scanning with nikto
• WordPress password bruteforce attack
• Uploading a malicious WordPress plugin
• Extraction and cracking of passwords
• Privilege escalation with cracked password
Detection of the initial reconnaissance activity occurred through monitoring the alerts generated
by the Suricata NIDS over time. Figure 5.7 illustrates the high volume of alerts generated by the
scanning activity of the attacker against the web server in the simulation environment. The results in
Figure 5.7 show the count of alerts by the source IP of the alert over a 24-hour period. Approximately
12,000 alerts were observed from the source IP of the attacker within less than an hour.
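The aggregation behind a view like Figure 5.7 is straightforward to express. The sketch below counts alerts per source IP per hour in plain Python; the alert record layout (‘ts’ and ‘src_ip’ fields) and the sample values are illustrative assumptions:

```python
from collections import Counter
from datetime import datetime

def alerts_per_source_per_hour(alerts):
    """Count NIDS alerts per (source IP, hour) bucket.

    Hypothetical sketch of the aggregation behind an alerts-over-time view;
    field names are assumptions for illustration.
    """
    counts = Counter()
    for alert in alerts:
        ts = datetime.strptime(alert["ts"], "%Y-%m-%dT%H:%M:%SZ")
        bucket = ts.replace(minute=0, second=0)  # truncate to the hour
        counts[(alert["src_ip"], bucket.isoformat())] += 1
    return counts

sample = [
    {"ts": "2018-01-29T16:02:38Z", "src_ip": "10.0.8.2"},
    {"ts": "2018-01-29T16:04:34Z", "src_ip": "10.0.8.2"},
    {"ts": "2018-01-29T17:10:00Z", "src_ip": "10.10.10.10"},
]
counts = alerts_per_source_per_hour(sample)
```

In practice the same grouping would be pushed down into the database or the visualization layer; a spike of roughly 12,000 alerts in a single bucket, as observed here, stands out immediately.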
Figure 5.7: Scripted Attacks - NIDS Alerts Over Time
While monitoring alerts over time does not leverage the capabilities of the graph-model, the
results provide a starting point for further analysis. The visualization in Figure 5.7 localizes
the attacker IP address and the time frame of the service discovery and vulnerability scanning. With
the IP address and time frame information available, the AQL query shown in Figure 5.8 provides
context for the nature of the observed alerts.
Figure 5.8: Scripted Attacks - AQL Alerts Summary
Based on the results from Figure 5.7 for alerts over time, the ‘LET’ statements in Figure 5.8
set the variables for the attacker and target IP addresses and the start and stop times for the window
of potentially malicious activity. The use of ‘LET’ statements demonstrates the ability to parameterize
AQL queries for future automation. The remainder of the query loops through all of the alerts in
the database, filters the results to the attacker and target IPs and to alerts that occurred during
the time frame of interest, and counts the number of occurrences of each alert name. While this
query sorts by the count of observed alerts and returns only the first ten results, the full query
response could be returned for a more detailed view of the alert activity
associated with the attacker.
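The filter-and-count logic of the query is equivalent to the following Python sketch, mirroring AQL's ‘COLLECT ... WITH COUNT’, ‘SORT’, and ‘LIMIT’ semantics. Field names and sample records are illustrative assumptions; ISO 8601 timestamps compare correctly as plain strings:

```python
from collections import Counter

def summarize_alerts(alerts, attacker, target, start, stop, limit=10):
    """Count alert names for an attacker/target pair within [start, stop]."""
    names = (a["name"] for a in alerts
             if a["src_ip"] == attacker and a["dst_ip"] == target
             and start <= a["ts"] <= stop)
    return Counter(names).most_common(limit)

alerts = [
    {"name": "ET SCAN Nmap User-Agent", "src_ip": "10.0.8.2",
     "dst_ip": "10.10.10.10", "ts": "2018-02-15T18:01:00Z"},
    {"name": "ET SCAN Nmap User-Agent", "src_ip": "10.0.8.2",
     "dst_ip": "10.10.10.10", "ts": "2018-02-15T18:02:00Z"},
    {"name": "ET WEB SERVER Poison Null Byte", "src_ip": "10.0.8.2",
     "dst_ip": "10.10.10.10", "ts": "2018-02-15T18:03:00Z"},
    # Outside the window of interest, so excluded from the summary.
    {"name": "ET WEB SERVER Poison Null Byte", "src_ip": "10.0.8.2",
     "dst_ip": "10.10.10.10", "ts": "2018-02-15T23:00:00Z"},
]
summary = summarize_alerts(alerts, "10.0.8.2", "10.10.10.10",
                           "2018-02-15T18:00:00Z", "2018-02-15T19:00:00Z")
```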
Table 5.6 displays the results of the query and provides the analyst with an overview of the
alerts. The majority of the alerts are associated with scanning activity, based on the high number
of remote file inclusion and exploitation attempts reported by Suricata's ‘ET WEB SERVER’ rule
set and the appearance of the user-agent string of Nmap, an open source network scanner.
With the knowledge that the attacker is possibly targeting the WordPress installation on the web
server, the analyst can leverage the capability of the graph-model to gain further understanding of
the details of the attacker activity. The AQL query displayed in Figure 5.9 builds on the information
from previous results to traverse the graph and provide the analyst with the specific HTTP methods,
Table 5.6: Scripted Attack - Alerts Summary Query Results
Alert Name                                                                 Count
ET WEB SERVER PHP Possible http Remote File Inclusion Attempt               4684
ET WEB SERVER PHP Generic Remote File Include Attempt (HTTP)                4684
ET WEB SERVER Script tag in URI Possible Cross Site Scripting Attempt        530
ET WEB SPECIFIC APPS Mambo Exploit                                           204
ET WEB SERVER Poison Null Byte                                               160
ET WEB SERVER Exploit Suspected PHP Injection Attack (cmd=)                  132
ET WEB SERVER Possible CVE-2014-6271 Attempt                                 122
ET WEB SERVER Possible CVE-2014-6271 Attempt in Headers                      122
ET WEB SPECIFIC APPS Generic phpbb arbitrary command attempt                 118
ET SCAN Nmap Scripting Engine User-Agent Detected (Nmap Scripting Engine)     48
URIs, and the timestamps of the events.
Figure 5.9: Scripted Attacks - HTTP Service Query
The ‘WITH’ statement in Figure 5.9 ensures that the ‘has service’ edge collection and the
‘services’ vertex collection from the graph data model are included as part of the graph traversal
query. Similar to the query in Figure 5.8, the ‘LET’ statements set the variables for attacker IP,
target IP, and time frame of activity. The first ‘FOR’ loop filters all of the connections in the
database based on the attacker and target IPs observed during the time frame of interest. The next
‘FOR’ loop, nested inside the first loop, uses each of the filtered connection nodes to traverse the
graph and return the ‘service’ vertex nodes associated with each of the ‘connection’ vertex nodes
based on the ‘has service’ relationship as discussed in the graph model of Figure 3.2.
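In graph terms, the nested loops walk one hop from each filtered connection vertex across a ‘has service’ edge to a ‘service’ vertex. A minimal in-memory sketch of that traversal follows; the document shapes mimic ArangoDB's ‘_id’/‘_from’/‘_to’ convention, and the sample values are hypothetical (in the research, ArangoDB performs this traversal itself):

```python
def services_for_connections(connections, has_service_edges, services):
    """One-hop traversal from connection vertices to service vertices.

    Illustrative in-memory sketch of the graph traversal; document shapes
    follow ArangoDB's _id/_from/_to convention.
    """
    services_by_id = {s["_id"]: s for s in services}
    found = []
    for conn in connections:
        for edge in has_service_edges:
            if edge["_from"] == conn["_id"]:
                found.append(services_by_id[edge["_to"]])
    return found

connections = [{"_id": "connections/c1"}]
edges = [{"_from": "connections/c1", "_to": "services/s1"}]
services = [{"_id": "services/s1", "method": "POST",
             "uri": "/secret/wp-login.php", "ts": "2018-01-29T16:02:38Z"}]
result = services_for_connections(connections, edges, services)
```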
Analysis of the query results displayed in Table 5.7 provides the analyst with a timeline of the
attacker’s remote interactions with the web server. From the timeline, the analyst can determine
that the attacker accessed the administration console of the WordPress installation, uploaded and
installed a plugin to the server, and then accessed the plugin. The first three entries in Table 5.7
indicate successful login to the WordPress installation by an administrative user. The remaining
entries in Table 5.7 indicate the uploading of a WordPress plugin named ‘BPXqPCLBWi.php’ into
the ‘bpzNxrgxDf’ folder in the WordPress instance. The use of random alphanumeric file and folder
names is indicative of the attacker utilizing the Metasploit attack framework (Kennedy et al. 2011)
to conduct the attacks.
Table 5.7: Scripted Attack - HTTP Service Results
Time                  Method  URI
2018-01-29T16:02:38Z  POST    /secret/wp-login.php
2018-01-29T16:03:43Z  POST    /secret/wp-admin/admin-ajax.php
2018-01-29T16:04:33Z  GET     /secret
2018-01-29T16:04:34Z  POST    /secret/wp-admin/update.php?action=upload-plugin
2018-01-29T16:04:34Z  POST    /secret/wp-login.php
2018-01-29T16:04:34Z  GET     /secret/wp-admin/plugin-install.php?tab=upload
2018-01-29T16:04:36Z  GET     /secret/wp-content/plugins/bpzNxrgxDf/BPXqPCLBWi.php
The analyst now has sufficient evidence of successful exploitation of the web server via a potentially
malicious WordPress plugin. To confirm the exploitation, the connection activity of the web server
is examined for outbound network connections to the attacker. As web servers should not initiate
connections to Internet clients, any observed outbound connections are indicative of a compromised
server. The AQL query and results for outbound connections from the web server during the time
frame of the attack are displayed in Figure 5.10 and Figure 5.11, confirming that the web server
made an outbound connection to the attacker at the same time the malicious plugin was accessed.
The structure of the query in Figure 5.10 is similar to that of Figure 5.8 and Figure 5.9, the main
difference being that the source and destination variables are switched to the ‘target’ and ‘attacker’
respectively to identify connections originating from the web server.
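This reversal can also be phrased as a simple policy check: any connection whose source is a monitored server is suspicious. A hedged sketch of that check, where the server list and record fields are illustrative assumptions:

```python
# Flag connections that originate from hosts that should never initiate
# outbound traffic, such as the web server. Field names are illustrative.
SERVER_IPS = {"10.10.10.10"}

def outbound_from_servers(connections):
    """Return connections whose source IP is a monitored server."""
    return [c for c in connections if c["src_ip"] in SERVER_IPS]

sample = [
    # Normal inbound client request to the web server.
    {"src_ip": "10.0.8.2", "dst_ip": "10.10.10.10", "dst_port": 80},
    # An outbound callback, e.g. a Metasploit-style connection on port 4444.
    {"src_ip": "10.10.10.10", "dst_ip": "10.0.8.2", "dst_port": 4444},
]
flagged = outbound_from_servers(sample)
```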
Figure 5.10: Scripted Attacks - Web Server Outbound Query
Figure 5.11: Scripted Attacks - Web Server Outbound Results
The query results from Figure 5.11 indicate a single outbound connection from the web server
to the attacker on port 4444, also indicative of the Metasploit framework (Kennedy et al. 2011), as
port 4444 is the default port that Metasploit utilizes to listen for inbound connections from targets.
With the time of the outbound callback from the previous query, the analyst can now traverse
the graph to determine the actions taken by the attacker on the targeted server. The query as shown
in Figure 5.12 uses the same structure as the previous query in Figure 5.9 to traverse the graph.
The ‘WITH’ statement includes the required edge and vertex collections necessary to traverse from
the IP address node to the ‘process’ nodes associated with the IP address as shown in Figure 3.1.
The remainder of the query is again a pair of nested ‘FOR’ loops that return the timestamp and
commands run on the internal host.
Figure 5.12: Scripted Attacks - Command Query
From the results of the query, shown in Table 5.8, the analyst can see the commands run on
the compromised web server after exploitation. The commands show that the attacker spawned an
interactive shell, read the contents of the /etc/passwd and /etc/shadow files, then logged into the
privileged marlinspike user account. Based on the time difference between reading the password
files and logging into the marlinspike account, the attacker appears to have successfully cracked the
password of the marlinspike account.
Table 5.8: Scripted Attack - Command Line Query Results
Time                  Command
2018-01-29T16:05:56Z  sh -c /bin/sh
2018-01-29T16:05:56Z  /bin/sh
2018-01-29T16:05:56Z  python -c import pty;pty.spawn("/bin/sh")
2018-01-29T16:06:26Z  /bin/sh
2018-01-29T16:06:26Z  id
2018-01-29T16:06:26Z  ls -al /etc/passwd
2018-01-29T16:06:26Z  python -c import pty;pty.spawn("/bin/sh")
2018-01-29T16:07:26Z  ls -al /etc/shadow
2018-01-29T16:07:56Z  cat /etc/shadow
2018-01-29T16:07:56Z  cat /etc/passwd
2018-01-29T16:11:57Z  su - marlinspike
From reconnaissance to privilege escalation, the entire attack took approximately 10 minutes.
The analysis of the attack scenario, conducted manually, required under an hour to complete. The
analysis identified the attacker, the target, the method of exploitation, and actions taken by the
attacker on the compromised web server. The graph correlation system provided the analyst with
the ability to query the database based on the data elements and their relationships instead of having to
query multiple sources for individual pieces of information and then determine how the separate
results are related. From the analysis, a defender has the information required to take corrective
action to remove the attacker’s access and prevent further attacks via the same vector.
With the information collected in less than one hour of analysis, the defender knows the actions
needed to respond and recover from the attack:
• Change the admin password for the WordPress installation based on the observed WordPress
administrator account login and upload of a malicious plugin
• Change the password for the marlinspike account based on the attacker switching to the mar-
linspike user account
• Change the passwords for any remaining accounts on the web server based on the attacker
extracting password hashes and successfully logging in as the marlinspike user
• Institute firewall rules that prevent initiation of outbound connections from the web server
based on observing the web server initiating an outbound connection to the attacker
• Remove the malicious plugin from the WordPress installation based on the observed installa-
tion of a plugin resulting in compromise of the web server
From a practical standpoint, although outside the scope of this research, effective tuning of the
NIDS to minimize false alerts would further improve the speed of analysts attempting to identify
potentially malicious activity. While tuning of the Suricata instance in the simulation environment
reduced false positives that required investigation by the analyst, the volume of monitored network
traffic in the simulation environment was a small fraction of the traffic typically observed in a large-
scale enterprise network.
While the scripted attack analysis validated the collection, transformation, storage, and analysis
of events in a graph model, the analysis was predicated on knowledge of the attacker's actions. To
validate the practical application of the research's ability to improve the workflow of the analyst
and reduce the time to detection, a series of blind attacks conducted by an independent third party
is analyzed in the next section.
5.3.4 Blind Attacks
Analysis of the independent third-party attacks occurred with no prior knowledge of the time frame
or the actions taken by the attacker. The attacker received access to the simulation environment
through the VPN with the goal of compromising all three of the vulnerable targets in the simulation
environment.
Initial detection of attacker activity occurred through periodic analysis of observed alerts from
the NIDS over time. By grouping the source IP address of the alerts over a time interval, the analyst
can quickly identify high volumes of alerts and the offending IP addresses as shown in Figure 5.13.
The results indicate an unusually high volume of alerts from the external IP address 10.0.8.2 and
alerts from 10.10.10.10, the IP address of the externally facing web server.
Figure 5.13: Blind Attacks - Alerts Over Time
By adjusting the start and stop times in the AQL query from Figure 5.8, the analyst obtains the
alert summary results displayed in Table 5.9. The results, similar to those from the scripted attack
scenario, are indicative of web application vulnerability scanning. The similarity is expected due
to the web application being the only service exposed to external IP addresses and the pattern of
attackers following the attack lifecycle and conducting reconnaissance before attempting exploita-
tion.
Table 5.9: Blind Attack - Alerts Summary Query Results
Alert Name                                                                   Count
ET WEB SERVER PHP Possible http Remote File Inclusion Attempt                 2342
ET WEB SERVER PHP Generic Remote File Include Attempt (HTTP)                  2342
ET WEB SERVER Script tag in URI Possible Cross Site Scripting Attempt          264
ET WEB SPECIFIC APPS Mambo Exploit                                             102
ET WEB SERVER Poison Null Byte                                                  80
ET WEB SERVER Exploit Suspected PHP Injection Attack (cmd=)                     66
ET WEB SERVER Possible CVE-2014-6271 Attempt in Headers                         61
ET WEB SERVER Possible CVE-2014-6271 Attempt                                    61
ET WEB SPECIFIC APPS Generic phpbb arbitrary command attempt                    59
ET WEB SERVER /system32/ in Uri - Possible Protected Directory Access Attempt   23
Due to the limited attack surface presented by the externally facing web server, the analysis of
HTTP traffic and connections between the attacker and the web server during the blind attacks follows
the pattern observed during the scripted attacks. The results are similar to those displayed in Figure
5.9, Table 5.7, Figure 5.10, and Figure 5.11 from the scripted attack scenario.
Determining what actions the blind attacker conducted after compromising the web server only
requires modifying the start time of the query from Figure 5.12. The results of the query, displayed
in Table 5.10, reveal the extent of the blind attacker’s actions after compromising the web server.
From the commands run on the web server the analyst can determine that the attacker carried
out the following actions after compromising the web server:
• Extracted encrypted password information based on the ‘cat /etc/passwd’ and ‘cat /etc/shadow’
entries
• Escalated privileges to the marlinspike user account based on the ‘su - marlinspike’ entry
• Port scanned the entire simulation network based on the ‘nmap 10.10.10.0/24’ entry
• Conducted version scans of two hosts in the simulation network based on the ‘nmap -sV 10.10.10.11’ and
‘nmap -sV 10.10.10.12’ entries
• Connected to FTP server on 10.10.10.11 with telnet based on the ‘telnet 10.10.10.11 21’ entry
Table 5.10: Blind Attack - Command Line Query Results for 10.10.10.10
Time                  Command
2018-02-15T18:25:57Z  sh -c /bin/sh
2018-02-15T18:25:57Z  id
2018-02-15T18:25:57Z  /bin/sh
2018-02-15T18:26:27Z  /bin/sh /usr/bin/which python
2018-02-15T18:26:27Z  python -c import pty; pty.spawn("/bin/bash")
2018-02-15T18:26:57Z  cat /etc/shadow
2018-02-15T18:26:57Z  cat /etc/passwd
2018-02-15T18:28:27Z  su - marlinspike
2018-02-15T18:28:57Z  /bin/sh /usr/bin/which nmap
2018-02-15T18:29:27Z  nmap 10.10.10.0/24
2018-02-15T18:29:57Z  nmap -sV 10.10.10.12
2018-02-15T18:29:57Z  nmap -sV 10.10.10.11
2018-02-15T18:41:58Z  sh -c /bin/sh
2018-02-15T18:41:58Z  /bin/sh
2018-02-15T18:42:28Z  python -c import pty; pty.spawn("/bin/bash")
2018-02-15T18:42:28Z  /bin/bash
2018-02-15T18:42:28Z  su - marlinspike
2018-02-15T18:44:59Z  telnet 10.10.10.11 21
2018-02-15T18:48:59Z  sh -c /bin/sh
2018-02-15T18:48:59Z  /bin/sh
2018-02-15T18:49:29Z  /bin/bash
2018-02-15T18:49:29Z  python -c import pty; pty.spawn("/bin/bash")
2018-02-15T18:49:29Z  su - marlinspike
2018-02-15T18:49:59Z  ssh sshuser@10.10.10.12
• Connected to 10.10.10.12 with ssh using the sshuser account based on the
‘ssh sshuser@10.10.10.12’ entry
Determining which commands the attacker executed on 10.10.10.11 and 10.10.10.12 required
only modification of the target field in the query in Figure 5.12 to traverse the graph. The results for
the two servers are shown in Table 5.11 and Table 5.12.
From the query results, the analyst confirms that the attacker interacted with both servers and
viewed the contents of the password files and a file named ‘secret’. Determining how the attacker
connected to the two servers required further investigation into the commands run on the web server
and querying the graph.
Table 5.11: Blind Attack - Command Line Query Results for 10.10.10.11

Time                  Command
2018-02-15T18:39:44Z  /bin/sh
2018-02-15T18:39:44Z  sh -c /bin/sh;/sbin/sh
2018-02-15T18:41:14Z  /bin/sh
2018-02-15T18:41:14Z  sh -c /bin/sh;/sbin/sh
2018-02-15T18:45:14Z  sh -c /bin/sh;/sbin/sh
2018-02-15T18:45:14Z  /bin/sh
2018-02-15T18:46:44Z  id
2018-02-15T18:47:14Z  cat /etc/passwd
2018-02-15T18:47:14Z  cat /etc/shadow
2018-02-15T18:47:44Z  ls
2018-02-15T18:48:14Z  ls
2018-02-15T18:48:44Z  ls

Table 5.12: Blind Attack - Command Line Query Results for 10.10.10.12

Time                  Command
2018-02-15T18:50:16Z  ls --color=auto
2018-02-15T18:50:16Z  cat secret
2018-02-15T18:50:46Z  ls --color=auto
2018-02-15T18:50:46Z  cat /etc/passwd
2018-02-15T18:50:46Z  cat /etc/shadow

For the 10.10.10.11 server, based on the results in Table 5.10, the attacker initiated a telnet
connection from 10.10.10.10 to 10.10.10.11 on port 21. As port 21 is typically associated with file transfer
protocol (FTP) activity, this connection required further investigation. Running the AQL query in
Figure 5.14 identified the alert in Figure 5.15, confirming the exploitation attempt of 10.10.10.11 on
port 21 from 10.10.10.10 during the time frame of the attack. The name and classification fields of
the alert identify the activity as exploiting a backdoor in the ProFTPD software (“ProFTPd 1.3.3c -
Compromised Source Backdoor Remote Code Execution” 2018) installed on the 10.10.10.11 server.
Figure 5.14: Blind Attacks - Alerts for 10.10.10.11 Query
Figure 5.15: Blind Attacks - Alerts for 10.10.10.11 Results
Modifying the query in Figure 5.14 to search for alerts between 10.10.10.10 and 10.10.10.12
returned no results, indicating that either the attacker used an exploit undetected by the NIDS or the
attacker logged into 10.10.10.12 with valid user credentials. Based on the commands run on each
of the hosts in Tables 5.10, 5.11, and 5.12, the attacker viewed the password files for each of the
servers and therefore could have cracked passwords for users on those servers.
From the results in Table 5.10, the attacker attempted a secure shell (SSH) connection from
10.10.10.10 to 10.10.10.12. The AQL query in Figure 5.16 traverses the graph to identify the
hostname and IP address of any server where the ’sshuser’ account exists. From the results in
Table 5.13, the ’sshuser’ account exists on the 10.10.10.11 and 10.10.10.12 servers. By extracting
the password hash for the ’sshuser’ from the 10.10.10.11 server, the attacker could have cracked
the password for the account and used valid credentials to complete the SSH connection to the
10.10.10.12 server.
Figure 5.16: Blind Attacks - sshuser Query
Table 5.13: Blind Attack - sshuser Query Results
Hostname  IP Address
attack2   10.10.10.12
attack2   fe80::d787:b9b1:3d39:bff0
attack2   fe80::ee4e:bbe0:ae3f:de1a
attack1   10.10.10.11
attack1   fe80::369f:2a6d:ccb2:48a7
attack1   fe80::eb2b:88a9:c1d1:6517
Using the ’sshuser’ account and the time frame of the suspected login, the AQL query in Fig-
ure 5.17 traverses the graph to return any login events for the ’sshuser’ on the 10.10.10.12 server.
The query results in Figure 5.18 confirm an SSH login from 10.10.10.10 to 10.10.10.12 with the
’sshuser’ account during the time of the attack.
Figure 5.17: Blind Attacks - sshuser Logins Query
Figure 5.18: Blind Attacks - sshuser Logins Query Results
From reconnaissance to successful exploitation of the three servers, the entire attack took
approximately 30 minutes. Full analysis of the attacker's actions required under two hours to
complete with manual query construction and reuse. The analysis identified the attacker, the targets, the
method of exploitation against each target, and actions taken by the attacker on the compromised
servers. From the analysis, a defender has the information required to take corrective action to
remove the attacker’s access and prevent further attacks via the same vectors.
With the information collected during the analysis, the defender knows the actions needed to
respond and recover from the attack:
• Change the admin password for the WordPress installation based on the observed WordPress
administrator account login and upload of a malicious plugin
• Change the password for the marlinspike account based on the attacker switching to the mar-
linspike user account
• Change the password for the sshuser account based on the attacker completing SSH connec-
tion to 10.10.10.12
• Change the passwords for any remaining accounts on the servers based on the attacker
extracting password hashes and successfully logging in as the marlinspike and sshuser accounts
• Institute firewall rules that prevent initiation of outbound connections from the web server
based on observing the web server initiating an outbound connection to the attacker
• Remove the malicious plugin from the WordPress installation based on the observed installa-
tion of a plugin resulting in compromise of the web server
• Patch the vulnerable version of ProFTPD on the 10.10.10.11 server based on the confirmation
of successful exploitation
In summary, the above analysis demonstrated the capability of the research to collect, trans-
form, and store network events from multiple sources into a graph-model and integrate the query
capability of the graph-model into the analyst workflow. In less than two hours the analyst detected
the source and method of the attacks against multiple servers and obtained the needed information
to respond and recover from the attacks. Through manual query construction and analysis, the
defender understands the IP addresses of the involved hosts, the extent of the accounts compromised
as part of the attack, the vulnerable software running, and the command history from
each of the servers compromised during the time frame of the attack. The results obtained from the
queries provide the analyst with the information necessary to take corrective action to respond and
recover from the successful attack.
Chapter 6 – Conclusions
6.1 Results
This research demonstrated the application of an architecture that collects security-related events
from multiple sources and transforms the events for storage in a graph database for integration into
the workflow of network defenders. Through the use of open source technologies, the development
of the architecture and supporting software aggregate events from a network intrusion detection
system (NIDS), a network security monitoring system (NSM), and telemetry from hosts within the
network into a central messaging system. The collected events are then transformed and stored in a
graph database for querying by a network defender.
With the architecture in place, a series of scripted and blind attacks was conducted against hosts
within the network to generate events related to attacker activity within a targeted network. Analysis
of the collected network data from the perspective of a network defender demonstrated the capability
of the architecture and the graph-model to rapidly extract relevant information from the collected
events and determine the actions taken by the attacker and the necessary response and recovery
steps in response to the attack.
Based on the average breach detection times reported by multiple sources in Table 6.1 (Brumfield
2017; SANS Analyst Program 2017; Ponemon Institute 2017), the observed time to detection
in this research of 1-2 hours demonstrates a measurable improvement over current industry trends.
As demonstrated in the analysis of the attack scenarios, the graph-model provides an analyst with
the ability to traverse the graph from an observed alert reported by the NIDS to retrieve commands
run by the targeted host as reported by osquery. By transforming events from multiple sources into
a context-based graph, traversing the graph based on the relationships between the data elements
removes the need for an analyst to manually determine the relationships and query multiple sources,
reducing the time to identify and confirm malicious activity.
6.2 Limitations
While the analysis of the scripted and blind attack scenarios validated the hypothesis, there are
limitations in the results of the research. The different sources of average breach detection times,
Table 6.1: Average Breach Detection Times
Source                              Average Detection Time
Ponemon Institute                   191 days
Verizon 2017 DBIR                   weeks to months
2017 SANS Incident Response Survey  6-24 hours
This research                       1-2 hours
the limited scale of the simulation environment, and the limited diversity of specific attacks limit
the evaluation of the efficacy of the research to reduce average detection time during a breach.
The simulation environment contained over 30 desktops, laptops, mobile devices, and virtual
machines. Typical enterprise networks, the envisioned environments that would benefit from the
research, contain hundreds to thousands of endpoints. While the architecture discussed in this
research could scale to meet the demand of such large networks, the data collected and analyzed
from the simulation network is a small fraction of that typically seen in enterprise networks.
The scripted and blind attacks provided sufficient data to validate the graph-model approach
to detecting malicious network activity. However, the attack surface presented by the vulnerable
targets limited the number of potential attacks available to the attackers. The diversity of hardware,
software, and configuration typically present in an enterprise network provides a much larger attack
surface and opportunity for exploitation.
Finally, the reported average breach detection time ranged from 6 hours to months. While
the large range may be due to varying data sources and analysis methods used by organizations
conducting the analysis, this research assumed an approximation of the average breach detection
time to be on the order of days. Additionally, without access to the actual events from the analyzed
real-world breaches, there is no direct comparison of the detection methods used during the real-
world breaches and the methods discussed in this research.
6.3 Future Research
This research demonstrated the network defense capabilities of an architecture that collects,
transforms, and stores events from multiple sources into a graph-model to improve the workflow of
network defenders. Future research should be directed against:
• Larger Data Sets - Enhancing the architecture to support enterprise-network-level data collection
and analysis to further validate the capabilities provided by graph-based network defense
and testing the architecture in a typical enterprise network deployment or a research honeynet.
• Additional Data Sources - Inclusion of additional data sources such as vulnerability scan-
ners, network scanners, threat intelligence, host-based intrusion detection, and additional
queries for osquery to enrich the graph-model and provide more detailed information for
network defenders.
• Model Improvements - Modify and improve the graph-model data structure based on existing
and new data sources to provide additional context and improved graph query performance.
• Automation - Parameterize and automate graph database queries to minimize the operator
interaction required for detection and localization of attacker activity.
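The automation item above can be sketched as a parameterized AQL traversal that an operator, or a scheduled job, runs by supplying only a starting host and a time window. The graph name, collection prefix, and edge attribute below are illustrative assumptions rather than the exact schema used in this research; the commented execution step shows how python-arango (cited in the references) would run such a query.

```python
# Hedged sketch: a reusable, parameterized AQL traversal. The names
# 'network_events', 'hosts/...', and 'timestamp' are assumed for illustration.

def build_alert_query(max_depth: int = 3):
    """Return an AQL traversal template and a bind-variable builder."""
    query = """
    FOR v, e, p IN 1..@depth OUTBOUND @start_host GRAPH 'network_events'
        FILTER e.timestamp >= @since
        RETURN {host: v._key, edge: e}
    """
    def bind_vars(start_host: str, since: str):
        # Only these two values change between runs; the query text is fixed.
        return {"depth": max_depth, "start_host": start_host, "since": since}
    return query, bind_vars

# Executing against a running ArangoDB instance would use python-arango:
#   from arango import ArangoClient
#   db = ArangoClient(hosts="http://localhost:8529").db("events")
#   query, bind = build_alert_query()
#   cursor = db.aql.execute(query, bind_vars=bind("hosts/10.0.0.5", "2018-01-01"))
```

Because the query text never changes, it can be stored once and scheduled, which is exactly the reduction in operator interaction the bullet describes.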
6.4 Practical Application
With organizations in all industry sectors facing the threat of attack, implementation of the
event correlation system discussed in this praxis would provide network defenders with a capability
that could improve their workflow and reduce the time to detect and localize attacker activity. By
reducing the time available to an attacker in the network, defenders greatly reduce the risk of the
attacker moving laterally through the network and reaching their objective.
As the components of the architecture developed in this praxis are all open source software
offerings, the initial cost to implement the concepts in this research is minimal compared to
the average cost of a successful network breach. However, organizations must also consider the
potential maintenance and training costs associated with any new technology, such as licensing
for enterprise support of the open source software and ensuring defenders fully understand how
to utilize new tools and techniques for network defense.
As discussed in Chapter 1, by striving to meet the design goals of modularity, scalability, and a
distributed architecture, the event correlation system can be integrated into networks of varying size,
structure, and purpose. The container-first approach in the software architecture permits organiza-
tions to add or modify individual components within the architecture while not requiring updates
to the entire system. Additionally, the use of containers allows organizations to host the system in
any environment capable of running containers to include bare metal servers, virtual machines, or
any of the major cloud providers such as Google, Microsoft, and Amazon. The scalability of the
system also ensures that networks of any size can implement the correlation system to meet the
higher data throughput requirements. Although the simulation events occurred in a relatively small
home network environment, the cluster architecture provided by Kafka and ArangoDB combined
with the ability to run multiple instances of the same container ensures the system can grow to meet
an organization's current and future monitoring needs.
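The claim that running multiple instances of the same container increases throughput rests on Kafka's consumer-group model: each partition of a topic is consumed by exactly one member of a group, so adding replicas splits the partitions among them. A minimal Python model of that assignment (a simplified round-robin; in reality Kafka's group coordinator performs the assignment) illustrates the effect:

```python
# Simplified model of Kafka consumer-group scaling: N identical consumer
# replicas divide a topic's partitions roughly evenly. Consumer ids and
# partition counts here are illustrative, not taken from the research setup.

def assign_partitions(partitions, consumers):
    """Round-robin partition ids across consumer ids (toy model of the
    real assignment done by the Kafka group coordinator)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Six partitions across three replicas of the same container image:
# each replica processes two partitions, tripling aggregate throughput.
print(assign_partitions(range(6), ["replica-0", "replica-1", "replica-2"]))
```

Scaling down works the same way in reverse: when a replica leaves the group, its partitions are reassigned to the remaining members.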
Although this research used Bro, Suricata, and osquery as data sources for the model, the mod-
ularity provided by the architecture allows organizations to use any data sources that are currently
available in the network with minimal updates to the processing functions. By using existing data
sources, the organization can minimize the impact of bringing the new system online while retaining
the capability to add new data sources as the needs of the organization change.
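One way to realize the "minimal updates to the processing functions" property is a parser registry: each data source contributes a single function that maps its raw events into a common graph-model record, and adding a source never touches existing parsers. The input field names below follow Bro's conn log and Suricata's EVE JSON, but the output record shape is an illustrative assumption:

```python
# Hedged sketch of modular data-source processing: one registered parser per
# source, all emitting a common record. The output keys (src/dst/type) are
# assumed for illustration, not the graph schema used in this research.

PARSERS = {}

def register(source):
    """Decorator that registers a parser for a named data source."""
    def wrap(fn):
        PARSERS[source] = fn
        return fn
    return wrap

@register("bro")
def parse_bro(event):
    # Bro conn-log field names.
    return {"src": event["id.orig_h"], "dst": event["id.resp_h"], "type": "conn"}

@register("suricata")
def parse_suricata(event):
    # Suricata EVE JSON field names.
    return {"src": event["src_ip"], "dst": event["dest_ip"], "type": "alert"}

def to_graph_record(source, event):
    # Supporting a new data source only requires registering one new parser.
    return PARSERS[source](event)
```

Under this design, bringing a new sensor online is a one-function change, which is what keeps the impact on the running system minimal.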
Finally, while the research demonstrated the capability of graph-based event correlation, the
future research areas discussed in Section 6.3 would provide a system more capable of assisting
network defenders. Additional data sources and updates to the graph model will provide defenders
with a more holistic view of the activity occurring within the network while parameterization and
automation of AQL queries will further improve defender workflow and reduce time to detection of
malicious activity.
References
“Apache Kafka.” 2017. Apache. Accessed December 10, 2017. https://kafka.apache.org.
“Apache Kafka - Uses.” 2017. Apache. Accessed December 10, 2017. https://kafka.apache.org/uses.
“Apache Kafka - Intro.” 2017. Apache Kafka. Accessed December 29, 2017. https://kafka.apache.org/intro.
“Metron - Logging Bro Output to Kafka.” 2017. Apache Metron. Accessed December 8, 2017. https://metron.apache.org/current-book/metron-sensors/bro-plugin-kafka/index.html.
ArangoDB. 2016. What is multi-model database and why use it? Technical report. ArangoDB, December.
“ArangoDB - highly available multi-model NoSQL database.” 2017. ArangoDB. Accessed December 8, 2017. https://www.arangodb.com/.
“Foxx at a glance.” 2017. ArangoDB. Accessed December 8, 2017. https://docs.arangodb.com/3.3/Manual/Foxx/AtAGlance.html.
Bromiley, Matt. 2017. Enhance Your Investigations with Network Data. Technical report. SANS Institute, October.
Brumfield, J. 2017. 2017 Data Breach Investigations Report. Technical report. Verizon Enterprise.
Critical Security Controls for Effective Cyber Defense. 2015. Technical report. The Center for Internet Security, October.
Cummings, JJ, and Michael Shirk. 2017. “shirkdog/pulledpork.” Github. https://github.com/shirkdog/pulledpork.
“DB-Engines Ranking - popularity ranking of graph DBMS.” 2017. DB-Engines. Accessed December 23, 2017. https://db-engines.com/en/ranking/graph+dbms.
Djanali, Supeno, Baskoro Pratomo, Hudan Studiawan, R Anggoro, and T.C. Henning. 2015. “Coro: Graph-based automatic intrusion detection system signature generator for evoting protection.” 81 (November): 535–546.
“Swarm mode key concepts.” 2017. Docker. Accessed December 10, 2017. https://docs.docker.com/engine/swarm/key-concepts/.
“What is a Container.” 2017. Docker. Accessed December 10, 2017. https://www.docker.com/what-container.
“Beats: Data Shippers for Elasticsearch.” 2017. elastic. Accessed December 1, 2017. https://www.elastic.co/products/beats.
“Powering Data Search, Log Analysis, Analytics.” 2017. elastic. Accessed December 1, 2017. https://www.elastic.co/products.
“Emerging Threats.” 2017. Emerging Threats. Accessed December 8, 2017. http://doc.emergingthreats.net/bin/view/Main/WebHome.
“Packs.” 2017. Facebook. https://osquery.io/schema/packs/.
FireEye. 2017. M-Trends 2017 - A View From the Frontlines. Technical report. FireEye.
“What is Fluentd?” 2017. Fluentd Project. Accessed December 12, 2017. https://www.fluentd.org/architecture.
Fredj, Ouissem Ben. 2015. “A realistic graph-based alert correlation system.” Security and Communication Networks 8 (15): 2477–2493. ISSN: 1939-0122. doi:10.1002/sec.1190. http://dx.doi.org/10.1002/sec.1190.
Friedberg, Ivo, Florian Skopik, Giuseppe Settanni, and Roman Fiedler. 2015. “Combating advanced persistent threats: From network event correlation to incident detection.” Computers & Security 48:35–57. ISSN: 0167-4048. doi:10.1016/j.cose.2014.09.006. http://www.sciencedirect.com/science/article/pii/S0167404814001461.
Garcia-Teodoro, P., J. Diaz-Verdejo, G. Macia-Fernandez, and E. Vazquez. 2009. “Anomaly-based network intrusion detection: Techniques, systems and challenges.” Computers & Security 28 (1): 18–28. ISSN: 0167-4048. doi:10.1016/j.cose.2008.08.003. http://www.sciencedirect.com/science/article/pii/S0167404808000692.
Kennedy, David, Jim O’Gorman, Devon Kearns, and Mati Aharoni. 2011. Metasploit: The Penetration Tester’s Guide. 1st. San Francisco, CA, USA: No Starch Press. ISBN: 159327288X, 9781593272883.
Kent, Alexander D., Lorie M. Liebrock, and Joshua C. Neil. 2015. “Authentication graphs: Analyzing user behavior within an enterprise network.” Computers and Security 48:150–166. ISSN: 0167-4048. doi:10.1016/j.cose.2014.09.001. http://www.sciencedirect.com/science/article/pii/S0167404814001321.
Milenkoski, Aleksandar, Marco Vieira, Samuel Kounev, Alberto Avritzer, and Bryan D. Payne. 2015. “Evaluating Computer Intrusion Detection Systems: A Survey of Common Practices.” ACM Comput. Surv. (New York, NY, USA) 48, no. 1 (September): 12:1–12:41. ISSN: 0360-0300. doi:10.1145/2808691. http://doi.acm.org/10.1145/2808691.
“The Neo4j Graph Platform.” 2017. Neo4j, Inc. Accessed December 1, 2017. https://neo4j.com/.
NIST. 2014. Framework for Improving Critical Infrastructure Cybersecurity. Technical report. National Institute of Standards and Technology, February. https://www.nist.gov/framework.
Oh, Joohwan. 2016. “python-arango.” Accessed December 7, 2017. http://python-driver-for-arangodb.readthedocs.io/en/master/index.html.
“Suricata User Guide.” 2016. Open Information Security Foundation. http://suricata.readthedocs.io/en/latest/index.html.
“Suricata Open Source IDS / IPS / NSM Engine.” 2017. Open Information Security Foundation. Accessed December 30, 2017. https://suricata-ids.org/.
“osquery Docs.” 2017. Accessed December 23, 2017. https://osquery.readthedocs.io/en/stable/.
Pharate, Abhishek, Harsha Bhat, Vaibhav Shilimkar, and Nalini Mhetre. 2015. “Classification of Intrusion Detection System.” International Journal of Computer Applications 118, no. 7 (May): 23–26.
“RabbitMQ - Messaging that just works.” 2017. Pivotal. Accessed December 1, 2017. https://www.rabbitmq.com/.
Ponemon Institute. 2017. 2017 Cost of Data Breach Study. Technical report. Ponemon Institute.
Powers, Dana, and David Arthur. 2016. “kafka-python.” Accessed December 7, 2017. http://kafka-python.readthedocs.io/en/master/.
“ProFTPd 1.3.3c - Compromised Source Backdoor Remote Code Execution.” 2018. https://www.exploit-db.com/exploits/15662/.
Ramaki, Ali Ahmadian, and Reza Ebrahimi Atani. 2016. “A survey of IT early warning systems: architectures, challenges, and solutions.” Security and Communication Networks 9 (17): 4751–4776. ISSN: 1939-0122. doi:10.1002/sec.1647. http://dx.doi.org/10.1002/sec.1647.
“Ansible Documentation.” 2017. Red Hat. http://docs.ansible.com/ansible/latest/index.html.
Reed, Theodore, Robert G. Abbott, Benjamin Anderson, Kevin Nauer, and Chris Forsythe. 2014. “Simulation of Workflow and Threat Characteristics for Cyber Security Incident Response Teams.” Proceedings of the Human Factors and Ergonomics Society Annual Meeting 58 (1): 427–431. doi:10.1177/1541931214581089. https://doi.org/10.1177/1541931214581089.
Robinson, Ian, Jim Webber, and Emil Eifrem. 2015. Graph Databases. O’Reilly.
Sanders, Chris, and Jason Smith. 2014. Applied Network Security Monitoring. Syngress.
SANS Analyst Program. 2017. 2017 SANS Incident Response Survey. Technical report. SANS Institute.
Security and Privacy Controls for Information Systems and Organizations. 2017. Technical report. National Institute of Standards and Technology, August.
Shakarian, Paulo, Jana Shakarian, and Andrew Ruef. n.d. Introduction to Cyber-Warfare: A Multidisciplinary Approach. Syngress.
Shapoorifard, Hossein, and Pirooz Shamsinejad. 2017. “A Novel Cluster-based Intrusion Detection Approach Integrating Multiple Learning Techniques.” International Journal of Computer Applications (New York, USA) 166, no. 3 (May): 13–16. ISSN: 0975-8887. doi:10.5120/ijca2017913948. http://www.ijcaonline.org/archives/volume166/number3/27649-2017913948.
Shostack, Adam. 2014. Threat Modeling: Designing for Security. Wiley.
“Talos - Author of the Official Snort Rule Sets.” 2017. Snort. Accessed December 8, 2017. https://www.snort.org/talos.
“SIEM, AIOps, Application Management, Log Management, Machine Learning, and Compliance.” 2017. Splunk. Accessed December 1, 2017. https://www.splunk.com.
“Sguil - Open Source Network Security Monitoring.” 2014. Sguil. Accessed December 8, 2017. http://bammv.github.io/sguil/index.html.
“Nessus Professional Vulnerability Scanner.” 2017. tenable. Accessed December 7, 2017. https://www.tenable.com/products/nessus/nessus-professional.
“Apache Zookeeper.” 2017. The Apache Software Foundation. Accessed December 29, 2017. https://zookeeper.apache.org.
“The Bro Project.” 2014. The Bro Project. Accessed December 26, 2017. https://www.bro.org/.
“Trail of Bits.” 2017. Trail of Bits. Accessed December 7, 2017. https://www.trailofbits.com/.
“Fluent Bit.” 2017. Treasure Data. Accessed December 7, 2017. https://fluentbit.io.
Vasilomanolakis, Emmanouil, Shankar Karuppayah, Max Muhlhauser, and Mathias Fischer. 2015. “Taxonomy and Survey of Collaborative Intrusion Detection.” ACM Comput. Surv. (New York, NY, USA) 47, no. 4 (May): 55:1–55:33. ISSN: 0360-0300. doi:10.1145/2716260. http://doi.acm.org/10.1145/2716260.
“Basic Pentesting: 1.” 2017. VulnHub. https://www.vulnhub.com/entry/basic-pentesting-1,216/.
Yadav, Tarun, and Arvind Mallari Rao. 2016. “Technical Aspects of Cyber Kill Chain.” CoRR abs/1606.03184. arXiv: 1606.03184. http://arxiv.org/abs/1606.03184.