Graph-based Event Correlation for Network Security Defense
by Patrick Neise
B.S. in Electrical Engineering, May 1999, The University of Texas at Austin
M.A. in Information Technology Management, June 2010, Webster University
M.S. in Information Security Engineering, June 2017, SANS Technical Institute
A Praxis submitted to
The Faculty of
The School of Engineering and Applied Science
of The George Washington University
in partial fulfillment of the requirements
for the degree of Doctor of Engineering
May 20, 2018
Praxis directed by
Thomas F. Bersson
Adjunct Professor of Engineering and Applied Science
The School of Engineering and Applied Science of The George Washington University certifies that
Patrick Neise has passed the Final Examination for the degree of Doctor of Engineering as of March
23, 2018. This is the final and approved form of the Praxis.
Graph-based Event Correlation for Network Security Defense
Patrick Neise
Praxis Research Committee:
Thomas F. Bersson, Adjunct Professor of Engineering and Applied Science, Praxis Director
Amirhossein Etemadi, Assistant Professor of Engineering and Applied Science, Committee Member
Ebrahim Malalla, Visiting Associate Professor of Engineering and Applied Science, Committee Member
Johannes Ullrich, Senior Instructor; Dean of Research, SANS Technical Institute, Committee Member
© Copyright 2018 by Patrick Neise
All rights reserved
Dedication
This praxis is dedicated to my wife and children. Thank you for your support and enduring the
long nights and missed weekends when I was unavailable. Stephanie, Addison, and Austin, thank
you for being there as I completed this important educational and career milestone.
Acknowledgements
I would like to acknowledge my advisors Drs. Bersson and Young for their support and feed-
back throughout the process. I would also like to thank Dr. Ullrich for his guidance and technical
advice during the research and Mr. Roy Luongo for participation as the third-party attacker during
the simulation phase of the research.
Abstract of Praxis
Graph-based Event Correlation for Network Security Defense
Organizations of all types and their computer networks are constantly under threat of attack. While
the overall detection time of these attacks is getting shorter, the average detection time of weeks
to months allows the attacker ample time to potentially cause damage to the organization. Current
detection methods are primarily signature based and typically rely on analyzing the available data
sources in isolation. Any analysis of how the individual data sources relate to each other is usually
a manual process, and most likely occurs as a forensic endeavor after the attack has been identified
via other means. The use of graph theory and the graph databases built to support its application
can provide a repeatable and automated analysis of the data sources and their relationships.
By aggregating the individual data sources into a graph database based on a model that supports
the data types and relationships, database queries can extract information relevant to the detection
of attack behavior within the network. The work in this Praxis shows how the graph model and
database queries reduce the overall time to detection of a successful attack by enabling defenders
to better understand how the data elements, and what they represent, are related.
Keywords: graph model, graph database, network intrusion, network security, event correlation
Table of Contents
Dedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Abstract of Praxis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Chapter 1 – Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 The Threat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 State of Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 A New Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Practical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Organization of Chapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Chapter 2 – Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Data Breaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Intrusion Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Collect the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Defend the Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Chapter 3 – Problem Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.1 Modular, Scalable, Distributed . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Container Based . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Centralized Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4 Data Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4.1 Bro Network Security Monitor . . . . . . . . . . . . . . . . . . . . . . . . 14
3.4.2 Suricata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4.3 osquery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4.4 Additional Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.5 Events to Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.6 The Graph Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.7 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.7.1 Bro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.7.2 Suricata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.7.3 osquery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.7.4 Complete Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Chapter 4 – Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.1 Attack Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.2 Indicators of Attack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.2 Graph-Based Event Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3 Software Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.3.1 Collecting Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3.1.1 Kafka and Zookeeper . . . . . . . . . . . . . . . . . . . . . . . 32
4.3.1.2 Fluentd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1.3 Bro . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1.4 Suricata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3.1.5 osquery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.2 Extracting, Transforming, and Loading Events . . . . . . . . . . . . . . . 39
4.3.3 Storing Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.4 Querying the Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Chapter 5 – Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1 Simulation Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.1 Background Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.1.2 Vulnerable Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2.1 Scripted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2.2 Blind . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3 Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3.1 Baseline Activity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3.2 False Positives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3.3 Scripted Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.3.4 Blind Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Chapter 6 – Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.2 Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.3 Future Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.4 Practical Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
List of Figures
Figure 3.1 – Complete Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Figure 3.2 – Bro Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Figure 3.3 – Suricata Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 3.4 – osquery Graph Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Figure 4.1 – FireEye Cyber Attack Lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Figure 4.2 – Software Architecture Overview . . . . . . . . . . . . . . . . . . . . . . . . . 31
Figure 4.3 – Example osquery results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Figure 4.4 – Example ArangoDB vertex document . . . . . . . . . . . . . . . . . . . . . . 41
Figure 4.5 – Example ArangoDB edge document . . . . . . . . . . . . . . . . . . . . . . . 42
Figure 5.1 – Simulation Environment Overview . . . . . . . . . . . . . . . . . . . . . . . . 45
Figure 5.2 – AQL Query of External Domains and IP Addresses . . . . . . . . . . . . . . . 51
Figure 5.3 – AQL Query of Inbound Network Connections and Domains . . . . . . . . . . . 52
Figure 5.4 – Suricata Alert from ArangoDB . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Figure 5.5 – AQL Query for Suricata Alert Analysis . . . . . . . . . . . . . . . . . . . . . . 54
Figure 5.6 – Visualization of Query Results . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Figure 5.7 – Scripted Attacks - NIDS Alerts Over Time . . . . . . . . . . . . . . . . . . . . 59
Figure 5.8 – Scripted Attacks - AQL Alerts Summary . . . . . . . . . . . . . . . . . . . . . 60
Figure 5.9 – Scripted Attacks - HTTP Service Query . . . . . . . . . . . . . . . . . . . . . 61
Figure 5.10 –Scripted Attacks - Web Server Outbound Query . . . . . . . . . . . . . . . . . 63
Figure 5.11 –Scripted Attacks - Web Server Outbound Results . . . . . . . . . . . . . . . . . 63
Figure 5.12 –Scripted Attacks - Command Query . . . . . . . . . . . . . . . . . . . . . . . 64
Figure 5.13 –Blind Attacks - Alerts Over Time . . . . . . . . . . . . . . . . . . . . . . . . . 66
Figure 5.14 –Blind Attacks - Alerts for 10.10.10.11 Query . . . . . . . . . . . . . . . . . . . 70
Figure 5.15 –Blind Attacks - Alerts for 10.10.10.11 Results . . . . . . . . . . . . . . . . . . 70
Figure 5.16 –Blind Attacks - sshuser Query . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 5.17 –Blind Attacks - sshuser Logins Query . . . . . . . . . . . . . . . . . . . . . . . 71
Figure 5.18 –Blind Attacks - sshuser Logins Query Results . . . . . . . . . . . . . . . . . . 72
List of Tables
Table 4.1 – Bro conn.log data fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 4.2 – Bro http.log data fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Table 5.1 – Summary of Network Activity During Simulation . . . . . . . . . . . . . . . . . 46
Table 5.2 – Summary of Outbound Activity Prior to Attack Events . . . . . . . . . . . . . . 51
Table 5.3 – Summary of Inbound Activity Prior to Attack Events . . . . . . . . . . . . . . . 53
Table 5.4 – Connection Data Associated with Alert Under Investigation . . . . . . . . . . . 56
Table 5.5 – HTTP Session Data Associated With Alert Under Investigation . . . . . . . . . 56
Table 5.6 – Scripted Attack - Alerts Summary Query Results . . . . . . . . . . . . . . . . . 61
Table 5.7 – Scripted Attack - HTTP Service Results . . . . . . . . . . . . . . . . . . . . . . 62
Table 5.8 – Scripted Attack - Command Line Query Results . . . . . . . . . . . . . . . . . 64
Table 5.9 – Blind Attack - Alerts Summary Query Results . . . . . . . . . . . . . . . . . . . 67
Table 5.10 –Blind Attack - Command Line Query Results for 10.10.10.10 . . . . . . . . . . 68
Table 5.11 –Blind Attack - Command Line Query Results for 10.10.10.11 . . . . . . . . . . 69
Table 5.12 –Blind Attack - Command Line Query Results for 10.10.10.12 . . . . . . . . . . 69
Table 5.13 –Blind Attack - sshuser Query Results . . . . . . . . . . . . . . . . . . . . . . . 71
Table 6.1 – Average Breach Detection Times . . . . . . . . . . . . . . . . . . . . . . . . . . 75
List of Acronyms
API Application Programming Interface
AQL Arango Query Language
DNS Domain Name System
ETL Extract, Transform, Load
EWS Early Warning System
FTP File Transfer Protocol
HIDS Host Intrusion Detection System
HTTP Hypertext Transfer Protocol
ICMP Internet Control Message Protocol
IDS Intrusion Detection System
IP Internet Protocol
IPS Intrusion Prevention System
JSON Javascript Object Notation
NIDS Network Intrusion Detection System
NIST National Institute of Standards and Technology
NSM Network Security Monitor
PCAP Packet Capture
PII Personally Identifiable Information
SANS SysAdmin, Audit, Network, Security
SIEM Security Incident and Event Management
SMB Server Message Block
SMTP Simple Mail Transfer Protocol
SQL Structured Query Language
SSH Secure Shell
SSL Secure Sockets Layer
TCP Transmission Control Protocol
UDP User Datagram Protocol
UID Unique Identifier
URI Uniform Resource Identifier
URL Uniform Resource Locator
USB Universal Serial Bus
VM Virtual Machine
VPN Virtual Private Network
Chapter 1 – Introduction
1.1 The Threat
Attacks against networks span the spectrum of industries and include goals such as denial of
service, ransom for retrieval of sensitive information, and theft of intellectual property and personal
information. Attackers range from hobbyists trying to learn or to claim bragging rights, to
organized crime syndicates seeking monetary gain, to state-backed organizations conducting
intelligence-gathering operations. The tactics, techniques, and procedures range
from free and open source tools using exploits against known vulnerabilities to custom developed
toolchains using previously unreported software vulnerabilities.
The range of targets, attackers, and tools requires network defenders to detect and respond to a
complex and growing combination of threats against the network. The attack life cycle typically
follows the stages of reconnaissance, initial exploitation, gaining persistence, lateral movement,
and attaining the objective, producing numerous network and host-based artifacts for detection of
malicious activity. Identification and recognition of these artifacts, and how they relate to each
other, allows the defender to separate regular activity from that generated by an attacker. To reduce
the total time between initial compromise and remediation, or dwell time, defenders require the
ability to rapidly identify malicious indicators and the supporting related information from multiple,
unconnected sources.
1.2 State of Practice
Although the average dwell time has decreased based on recent analysis (Brumfield 2017), there
are still many cases where adversaries persist within a network for months if not years. With the
current state of network intrusion detection focused primarily on signature or rule-based alerting
of individual events, network defenders are required to correlate events from many data sources
manually.
The current state of event correlation for security relies primarily on Structured Query Language
(SQL) or full text-based searches against vast data stores of event logs and alerts. Products such
as Splunk (“SIEM, AIOps, Application Management, Log Management, Machine Learning, and
Compliance” 2017) and the Elastic stack (“Powering Data Search, Log Analysis, Analytics” 2017)
provide defenders with the capability to collect all of the relevant security information into a single
location from which they can search and perform event correlation. Having these systems in place
gives the defender the ability to gain insight into the status of the network and possible malicious
activity.
What these products do not provide is an easily queryable model that represents how all of the
data sources are related. By placing data into relational database tables or a centralized full-text
database, it is typically up to the defender to create queries that define how the data are related
and the desired results. While this method can prove useful, it requires extensive knowledge of the
underlying data structure from each of the sources, an understanding of how those sources relate to
each other, and how to write queries that can merge the appropriate sources to gain insight into the
activity within the network.
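The burden described above can be made concrete with a small sketch. In a relational store, correlating even two sources demands that the defender already know each schema and the join keys that tie the tables together. The table names, columns, and values below are invented for illustration and do not come from any specific product:

```python
import sqlite3

# In-memory database standing in for a SIEM's relational store.
# All table and column names here are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE alerts (src_ip TEXT, dst_ip TEXT, signature TEXT);
    CREATE TABLE conn_log (src_ip TEXT, dst_ip TEXT, dst_port INTEGER, bytes INTEGER);
    INSERT INTO alerts VALUES ('10.0.0.5', '203.0.113.7', 'suspicious outbound beacon');
    INSERT INTO conn_log VALUES ('10.0.0.5', '203.0.113.7', 443, 1204);
    INSERT INTO conn_log VALUES ('10.0.0.6', '198.51.100.2', 80, 9001);
""")

# Correlating the two sources requires knowing, in advance, that
# src_ip/dst_ip are the shared keys between the tables.
rows = db.execute("""
    SELECT a.signature, c.dst_port, c.bytes
    FROM alerts a
    JOIN conn_log c ON a.src_ip = c.src_ip AND a.dst_ip = c.dst_ip
""").fetchall()
print(rows)  # [('suspicious outbound beacon', 443, 1204)]
```

Every additional data source multiplies the join logic the defender must carry in their head, which is the complexity this Praxis seeks to remove.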
The complexity presented by the current tooling, although useful, can result in increased time
to detection or missed indicators of compromise within the network. This complication can also
reduce the ability of a defender to respond to an identified incident due to data storage and query
methods. To empower defenders to locate and remediate an incident rapidly, a new approach to
store and query the large volumes of data in a contextual model is needed.
1.3 A New Model
To address the problem of long adversary dwell times within organizations’ networks, this
Praxis proposes that the use of graph databases to correlate data will provide network defenders
the ability to analyze the relationships between individual data sets resulting in higher detection
rates and shorter dwell times.
Specific objectives of the research were to 1) assemble current methods of data collection within
a typical network structure, 2) analyze the relationships between individual data sources within the
context of detecting malicious activity, 3) identify a graph model that supports detection of adver-
sarial network activity, and 4) evaluate the effectiveness of a graph model in reducing time to detection
of malicious network activity.
Graph databases treat the relationships between data nodes with the same level of importance
as the data nodes themselves (Robinson, Webber, and Eifrem 2015). The storing of the data about
the connections between nodes as an entity within the database removes the need to conduct com-
plicated SQL JOIN queries and creates a contextual representation of the data stored directly in the
database. Storing the data and its relationships in a graph model allows the user to traverse the graph
with queries that move through the model based on how the nodes are related to each other.
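As a minimal sketch of that traversal idea (the node identifiers and relationship names here are invented, not taken from the Praxis's model), a query can simply walk stored relationships outward from a starting node, with no join keys to specify:

```python
# Each edge is (source, relationship, target); in a graph database these
# are first-class records rather than values matched at query time.
edges = [
    ("host:10.0.0.5", "OPENED", "conn:abc123"),
    ("conn:abc123", "CONNECTED_TO", "host:203.0.113.7"),
    ("host:203.0.113.7", "RESOLVES_TO", "domain:evil.example"),
]

def traverse(start, depth):
    """Return every node reachable from `start` within `depth` hops."""
    frontier, seen = {start}, set()
    for _ in range(depth):
        frontier = {dst for src, _, dst in edges if src in frontier} - seen
        seen |= frontier
    return seen

# One-hop and three-hop neighborhoods of the internal host, found by
# following relationships directly instead of joining tables.
print(traverse("host:10.0.0.5", 1))  # {'conn:abc123'}
print(traverse("host:10.0.0.5", 3))
```

Graph databases implement this walk natively and at scale; the point of the sketch is only that the relationship, once stored, does the work a JOIN clause would otherwise have to encode.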
The graph model for the types of data required for the defense of a network provides the frame-
work to define the information contained within each of the data sources, how those data sources are
related, and supporting information about those relationships. A fully developed graph model pro-
vides the level of understanding required to perform extraction, transformation, and loading (ETL)
of the individual data sources, typically available in a wide variety of formats, for insertion into
the graph databases. Additionally, the graph model provides the ability to generate queries of the
database in a manner that is representative of how the data is related.
The research in this praxis demonstrates that the use of graph databases in network defense can
reduce the dwell time of adversaries within a network by providing defenders with a contextual
correlation of the data sources typically available in a network. The graph model seeks to provide
a holistic view of the data that allows the defender to locate and understand how individual events
within their logs are directly related to each other. A distributed, modular, and scalable architec-
ture supports the collection and ETL of individual data sources into the graph database within a
test network to capture representative activity during normal operations. Simulated attacks against
the network demonstrated the ability of the model to rapidly identify potential malicious activity
resulting in lower dwell time for the adversary.
The graph model and software architecture produced as a result of the research in this praxis
can be applied to existing networks of varying scale and structure. Implementation of the sensors
and graph correlation system discussed in the following chapters would provide network defenders
within any industry with the capability to gain a more informed understanding of the activity within
the network and reduce the time required to detect and act against the potential malicious activity.
1.4 Practical Application
As a scalable, modular, and distributed system, the software architecture and graph data model
developed in this Praxis supports network defenders in more rapid detection of potential adversarial
activity. By centralizing data from multiple sensors and sources into a unified graph model with a
single user interface, network defenders’ workflow is improved while reducing the training burden
typically associated with the diverse tool chains used in network defense. Additionally, through
the use of a container-based architecture supported by open source software, the deployment and
lifecycle costs of the system are minimized. Finally, by reducing the time to detection of adversary
activity within the network, organizations could avoid the cost of a data breach, which in 2017
averaged nearly $4 million per breach.
Engineering managers in the areas of Risk Management, Technology Management, Information
Management, and Enterprise Information Assurance may apply the techniques, methodology, and
technologies discussed in this Praxis. By employing the correlation system, which is based on
open source software, the reduced cost of implementation and training would make the decision to
implement the system more palatable for security risk managers concerned with cost versus benefit
of new defensive technologies. For technology and information managers, the modular, scalable,
and distributed design of the system provides the flexibility to grow and modify the system to best
suit the existing and future architecture needs of the individual organization. Finally, as the goal of
the research is focused on better defending networks, managers in enterprise information assurance
can implement the correlation system within their organization to support more rapid detection of
adversarial activity, thereby better protecting their organizations’ sensitive data.
1.5 Organization of Chapters
This praxis is organized into chapters. Chapter 2 discusses the graph, intrusion detection, and
security related literature reviewed to support the development of the graph model and software
architecture. Chapter 3 discusses the foundation of the graph data model and the data sources and
architecture needed to implement the event correlation system. Chapter 4 describes the process of
collecting, transforming, and storing events from the data sources into a graph database. Chapter
5 covers the execution and analysis of attack scenarios in a simulation environment for validation
of the graph model. Finally, Chapter 6 summarizes the results of the research and provides recom-
mended areas of future research.
Chapter 2 – Literature Review
Topics of investigation for this research included industry trends in data breaches, methods of
intrusion detection, graph databases and their uses, and techniques for the collection and aggrega-
tion of network and host-based data for computer network defense. Sources of research included
books, peer-reviewed journal publications, industry technical reports, and online documentation of
software to support developing the implemented architecture.
2.1 Data Breaches
Although the methods and objectives of attackers change over time, organizations in every
industry are victims of attacks (Brumfield 2017) against their networks. While the costs of these
breaches globally dropped 10% in the last year, the average cost per breach is still $3.62 million
(Ponemon Institute 2017). One factor in reducing the average cost per breach is the reduction in
time to detect the breach.
The average time to detection varies based on the organization conducting the analysis. While
Verizon, the Ponemon Institute, and the SANS Institute agree that defenders are getting better at
detecting breaches, the average detection time reported varies from 6 hours to months (Brumfield
2017; Ponemon Institute 2017; SANS Analyst Program 2017).
Perpetrators of the attacks include insider threats, outsiders, state-affiliated actors, and orga-
nized crime organizations using malware, phishing, stolen credentials, social engineering, and
physical access to breach targeted networks (Brumfield 2017). The wide range of threat vectors
and techniques require that network defenders and analysts be able to identify the malicious activity
of varying capabilities and intents from within all of the event data, both good and bad, available
for analysis.
As a result of the growing threat to networks within all major industry verticals, “a substantial
industry has arisen focused on research and development of software products to monitor and scan
data networks, and detect events indicative of potential attacks” (Reed et al. 2014).
2.2 Intrusion Detection
“Intrusion detection is the process of identifying malicious activity targeted to computing and
network resources” (Pharate et al. 2015), and intrusion detection systems may be host or network
based. The host-based intrusion detection system (HIDS) is a “single computer specific intrusion
detection system which monitors the security of that system or computer from internal and external
attacks” (Pharate et al. 2015) while the network-based intrusion detection system “monitors network
traffic and analyzes the passing traffic for attacks” (Pharate et al. 2015). As a complement to current
intrusion detection systems, new research in the area of early warning systems (EWS) seeks to take
a proactive approach to alert correlation from multiple sensors (Ramaki and Atani 2016).
As networks have grown, the deployment of IDS solutions has also evolved to evaluate larger
amounts of data. Collaborative IDS solutions collect information and communicate with each other
in a centralized, decentralized, or distributed architecture (Vasilomanolakis et al. 2015). The large
number of options available for intrusion detection systems has resulted in the creation of new research
areas, including the evaluation of IDS effectiveness (Milenkoski et al. 2015).
The inclusion of network artifacts such as NetFlow and full packet captures (PCAP) in addition to
host-based logs and intrusion detection alerts can further enhance the ability to detect and analyze
malicious activity. The use of network-based artifacts can add context to traditional host artifacts,
provide a baseline of network activity, and support identification of anomalous traffic (Bromiley 2017). While
NetFlow and PCAP were not directly utilized in this research, the Bro network security monitor
provides output similar to NetFlow and the ability to collect PCAP.
Other research seeks to improve current intrusion detection systems through a learning model
based on system events and their dependencies (Friedberg et al. 2015; Garcia-Teodoro et al. 2009).
These approaches seek to apply statistical methods and machine learning through event correlation
to support detection of anomalies in support of intrusion detection.
As the number of IDS solutions and other sensors deployed to networks grows, the volume of
alerts and event data created by the sensors also grows. Collecting and aggregating the data into a
central location for analysis creates the need for an architecture capable of handling the data volume.
2.3 Collect the Data
With sensors from multiple software vendors installed on hosts running various operating sys-
tems, a flexible platform for event and alert collection is required. To support the goal of collecting
data from multiple sensors located throughout the network, the collection agent should support
cross-platform operation and the ability to parse data in multiple formats.
Several open source software offerings meet the desired requirements for centralized logging
of alerts and events. The Beats family of shippers include agents to monitor and transfer log files,
metrics, network data, Windows Event Logs, and audit data to Elasticsearch for storage or Logstash
for parsing and forwarding to a storage backend (“Beats: Data Shippers for Elasticsearch” 2017).
While the Beats shippers are fully integrated into the Elastic software stack, forwarding events to
Logstash provides the ability to parse the event data for ultimate storage in various databases or
messaging frameworks.
Similarly, Fluentd is an open source project that aims to provide a unified logging layer that
is reliable, scalable, and extensible (“What is Fluentd?” 2017). Through a plugin architecture,
Fluentd provides the capability to ingest, parse, transform, and forward log and event data from
multiple data sources to multiple data outputs concurrently.
While Beats and Fluentd support the collection requirements for multiple data sources, they
do not completely satisfy the desire for extensibility in the consumption of data. Both products
support forwarding data to multiple outputs. However, changes in the data consumer also require
configuration changes in Beats or Fluentd. The inclusion of a centralized messaging broker in the
logging architecture provides the desired flexibility in consumption of event and alert data.
Two open-source message brokers that meet the goals of a centralized logging platform are
RabbitMQ (“RabbitMQ - Messaging that just works” 2017) and Kafka (“Apache Kafka - Intro”
2017). Both products support distributed deployment in clusters for high availability and through-
put, essential for reliable processing of high volumes of data in modern enterprise networks. Both
products also offer cross-language development of message producers and consumers, allowing for
flexibility in the development of the logging architecture. The publisher/subscriber architecture pro-
vided by a message broker allows the addition and modification of data sources and data consumers
without perturbing the rest of the architecture.
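The decoupling a broker provides can be sketched in miniature. This toy in-process broker is only a stand-in for the persistence, partitioning, and clustering that Kafka or RabbitMQ supply; the topic name and event fields are invented:

```python
from collections import defaultdict

class MiniBroker:
    """Toy publish/subscribe broker: producers publish to a named topic
    without knowing which, or how many, consumers are subscribed."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self._subscribers[topic]:
            handler(event)

broker = MiniBroker()
received = []

# Two independent consumers of the same event stream; a third could be
# added later without any change to the producer side.
broker.subscribe("sensor_events", lambda e: received.append(("etl", e)))
broker.subscribe("sensor_events", lambda e: received.append(("archive", e)))

broker.publish("sensor_events", {"uid": "abc123", "proto": "tcp"})
print(received)
```

This is the property the text describes: sensors publish once, and new consumers (an ETL pipeline, an archiver, an analytics job) attach to the stream without any reconfiguration of the data sources.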
2.4 Graphs
While graph theory itself is nearly 300 years old, the concept of graph databases has only emerged
in the last two decades. Graph databases leverage “complex and dynamic relationships in highly
connected data to generate insight” (Robinson, Webber, and Eifrem 2015) to understand the rela-
tionships between the data elements. A common modeling paradigm for data and relationships for
storage in a graph database is the labeled property graph. A labeled property graph:
• Contains nodes and relationships;
• Nodes contain properties (key-value pairs);
• Nodes can be labeled with one or more labels;
• Relationships are named and directed, and always have a start and end node;
• Relationships can also contain properties (Robinson, Webber, and Eifrem 2015).
One of the first, and most popular, databases to utilize the labeled property graph is Neo4j. As a
full-featured graph database, Neo4j provides scalability, graph transactions, graph analytics, visu-
alization support, and application program interfaces (APIs) (“The Neo4j Graph Platform” 2017).
As an alternative to a pure graph database, multi-model databases such as ArangoDB, Couchbase, and OrientDB utilize a combination of key-value, document, graph, geospatial, and relational
models to store data within a single database. A native multi-model database can use multiple data
models within a single core and provides a single query language to access the data (ArangoDB
2016). As an example, ArangoDB provides document, key-value, and graph model data stores
within a single database. The document store can contain complex nested documents while the
graph store connects individual documents based on their relationships.
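The pairing of a document store and a graph store can be sketched with a toy example (the collection names and records are invented, and the code does not use ArangoDB's API): nested documents live in named collections, while an edge list connects them for traversal.

```python
# Toy multi-model store: nested documents keyed by "collection/key", plus an
# edge list connecting them, mirroring ArangoDB's document/graph pairing.
# All names and records here are illustrative.
documents = {
    "hosts/web01": {"ip": "10.0.0.5", "os": {"name": "Linux", "version": "4.15"}},
    "files/f1": {"sha256": "ab12cd", "mime": "application/x-dosexec"},
}
edges = [{"_from": "hosts/web01", "_to": "files/f1", "label": "DOWNLOADED"}]

def neighbors(doc_key):
    """Follow outbound edges from one document to the connected documents."""
    return [documents[e["_to"]] for e in edges if e["_from"] == doc_key]

downloaded = neighbors("hosts/web01")
```

The document side keeps the deeply nested host and file details intact, while the edge side records only the relationship, so a traversal returns full documents.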
Although the use of graphs and graph theory appears in recent intrusion detection and security-related research, there is little research on the direct application of graph or multi-model databases to the correlation of security and metric-related events from multiple data sources within a network. There are examples of graph-based alert correlation (Fredj 2015), graph-based creation of IDS rules from HTTP logs (Djanali et al. 2015), graphs for monitoring user authentication events (Kent, Liebrock, and Neil 2015), and the use of clustering and nearest neighbor for evaluation of IDS alerts (Shapoorifard and Shamsinejad 2017). The previous research demonstrates the capability of approaching the security problem as a graph, but focuses on specific indicators rather than modeling all of the available data as a graph for storage and query within a database.
2.5 Defend the Network
Development of a graph model for the network alert and event data requires an understanding
of which data elements are important for network defense and how the individual events relate
to each other. Additionally, understanding how and where this research fits into the workflow of
network defenders (Reed et al. 2014) provides insight into the construction of the graph model and
the queries for extraction of relevant information from the database.
Although focused on the implementation of specific defensive measures, the Critical Security
Controls (Critical Security Controls for Effective Cyber Defense 2015) and Special Publication
800-53 (Security and Privacy Controls for Information Systems and Organizations 2017) provide
valuable insight into data sources and their relationships. The framework provided by the National
Institute of Standards and Technology (NIST) lays out the functional categories of network defense
as Identify, Protect, Detect, Respond, and Recover (NIST 2014). Using the NIST framework as a
guide, this research focused on improving the defender workflow in the detection phase to reduce
time to detection while providing the information necessary to respond and recover from the attack.
Chapter 3 – Problem Solution
The research in this Praxis focused on developing the software, architecture, and graph-based
data model to support reducing the time for adversary activity detection through improving the
workflow of network defenders. Sections 3.1, 3.2, and 3.3 discuss the goals and implementation of a
software architecture capable of being implemented in current enterprise networks of all industries.
Section 3.4 discusses the data sources that contribute to the graph data model and identifies possible
future data sources for inclusion into the graph model. Sections 3.5 and 3.6 cover the graph database
and the insertion of events from the data sources into the database. The development of the graph
model based on the available data sources is discussed in Section 3.7.
3.1 Modular, Scalable, Distributed
To support networks of various sizes and architectures while providing for future growth, the
event correlation system developed as a result of the research in this praxis is modular, scalable,
and distributed. Designing the architecture in this manner provides for a system capable of be-
ing integrated into existing networks and adapting to an organization’s specific needs and network
architecture. Modularity offers the ability to add or modify input sources, change the data trans-
formation process to meet business needs, and add additional queries and alerting methods. The
scalability of the system allows for on-demand increases in the processing of the modular event
streams to meet the current level of observed network activity. The distributed cluster architecture
provides for redundancy by splitting core functionality across numerous physical servers while also
delivering performance gains through the distribution of workload across multiple instances.
3.2 Container Based
The use of container images supports the above goals of modularity, scalability, and a dis-
tributed architecture. A container image is a lightweight bundle of the software and libraries re-
quired to run executable code (“What is a Container” 2017). When combined with an orchestrator
to manage the deployment, starting, and monitoring of the images, a container based platform in-
troduces techniques and capabilities typically unavailable in traditional physical or virtual machine
based environments.
Running a single process within each container supports the goal of achieving modularity within the system. In the development of the system, each container focuses on a single task: for example, one container transfers network intrusion detection system (NIDS) alerts across the network while a second transforms and loads the alerts into a database. When adequately architected, this separation provides the ability to modify or add the functionality of individual components independent of the rest of the system.
With each function running in its own container, scalability is achieved by starting more instances of the service required to handle the additional workload. If the number of alerts being processed by the transform function exceeds the capacity of the running container, the orchestrator can be used to start additional, identical containers to split the workload. The startup speed of containers over virtual machines highlights another advantage: because containers only run a single process and utilize the resources of the host machine, they do not require the traditional boot process of a physical or virtual machine, resulting in start times of a few seconds.
Through the use of an orchestrator such as Docker Swarm (“Swarm mode key concepts” 2017)
distribution of the system occurs across many physical or virtual hosts that are viewed as a single
platform to run the containers. By distributing across several hosts, the system becomes resilient
to failures in a single host as there are multiple instances of core functionality deployed across the
platform. The distributed system also provides increased performance by running multiple instances in a clustered environment that share the workload of core functionality such as message distribution. For example, running Apache Kafka (“Apache Kafka” 2017) in
a clustered architecture across multiple hosts ensures that messages are replicated across all of the
Kafka instances while allowing clients to connect to any of the instances to produce or consume
messages.
3.3 Centralized Logging
A core component of any event correlation system is the ability to collect and aggregate the
logs and alerts centrally from the sensors, hosts, and applications within a network. The volume
of data produced from various sources within a network can provide invaluable insight into the
activity within the network. The challenge becomes the management and distribution of the logs in
a manner that supports analysis of the events as a connected system.
Through the use of a logging pipeline, events produced by multiple sources in various formats
are collected, standardized, and transferred into a single repository for consumption. To support
the logging pipeline in this research, Apache Kafka (“Apache Kafka - Uses” 2017) provides for
producers and consumers of event information. Apache Kafka contains the libraries and interfaces
needed to develop an integrated production or consumption capability into the components of the
logging pipeline. For example, by adding a plugin to the Bro network security framework, the logs
produced by Bro can be automatically sent to the Kafka cluster in a format for consumption by other
services. Where this direct integration with Kafka is not possible, the addition of Fluentd (“What is
Fluentd?” 2017) as a log transport mechanism provides the ability to add many different sources
into the centralized logging framework. With no direct Kafka plugin for the intrusion detection
system Suricata, Fluentd is used to collect, transform, and transport the alerts into Kafka.
The production of messages from all sources within the network results in Kafka topics grouped
by the nature of the information source. Collection of logs from all instances of Bro under a single
topic allows for consumption of all Bro events by an individual consumer. This architecture supports
the overall goals of modularity and scalability by enabling the addition of similar sensors into the
network while still collecting all of the events into the same location for consumption. Adding a
new type of data source only requires the addition of a Kafka or Fluentd supported plugin to add the
events to a new topic ready for consumption. Additionally, running Kafka in a distributed cluster
supports the resilience and performance requirements necessary in large environments.
With event data from all sources centrally located within the Kafka cluster, consumers can retrieve messages from the available topics for processing. Kafka tracks which consumers are pulling from each topic and the retrieval of individual messages. This tracking allows multiple consumers to read from the same topic, providing increased throughput when processing the same stream, or numerous different output streams to pull from the same topic. The flexibility offered
in this architecture supports the goals of scalability and modularity by allowing numerous identical
consumers to pull from the same topic to keep up with message rates while also enabling new
consumers to pull from existing topics to support additional functionality.
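The consumer behavior described above can be sketched with a toy single-partition topic (illustrative Python only; Kafka's real consumer groups additionally balance partitions across members): consumers sharing a group split the stream through a shared offset, while a new group independently re-reads from the beginning.

```python
class Topic:
    """Toy topic with per-group offsets: consumers in the same group share
    one offset (splitting the stream), while a new group starts at zero."""
    def __init__(self):
        self.messages = []
        self.offsets = {}  # group name -> index of next message to read

    def produce(self, msg):
        self.messages.append(msg)

    def consume(self, group):
        offset = self.offsets.get(group, 0)
        if offset >= len(self.messages):
            return None  # nothing new for this group
        self.offsets[group] = offset + 1
        return self.messages[offset]

topic = Topic()
for i in range(4):
    topic.produce(f"alert-{i}")

# Two consumers in the "etl" group split the stream between them...
first = topic.consume("etl")
second = topic.consume("etl")
# ...while a new "metrics" group independently starts at the beginning.
fresh = topic.consume("metrics")
```

The "etl" group consumers receive successive messages (shared workload), while "metrics" receives the full stream again, matching the identical-consumer and new-consumer cases described above.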
3.4 Data Sources
The framework provided by a centralized logging system enables the collection of the vast ma-
jority of data sources typically found within modern networks. Additionally, the modularity of the
previously discussed architecture allows the addition of new security or performance related data
sources as they are added to the network. As this research is concerned with the presence of mali-
cious activity within a network, the data sources will focus on information relevant to identifying
potential adversarial activity.
3.4.1 Bro Network Security Monitor
For an analyst to gain insight and understanding into the activity occurring within a network,
they must be able to determine information about which hosts are communicating over which pro-
tocols and the type and volume of data being communicated (Shostack 2014). The Bro Network
Security Monitor is a potent tool that provides the capability to monitor network activity.
At the core, Bro is a framework of network protocol decoders and event handlers enabled by
a scripting language, also named Bro. The network protocol decoders understand the structure of
many of the protocols used within modern networks including HTTP, DNS, SMTP, FTP, SMB,
and others. By following the structure of the protocols, Bro can identify specific protocol traffic
even when it is not using standard ports such as TCP port 80 for HTTP traffic. The knowledge of
network protocols also enables Bro to extract all relevant information about the session including
files transmitted across the connection.
To support the correlation of connection, protocol, and file information, Bro uses a unique identifier (UID) associated with each recorded connection. The UID appears in each of the log files
produced by Bro to enable the analyst to determine information about the connection itself such as
duration, IP addresses, ports, and packets transferred. For example, using the UID from the con-
nection, the analyst can then query the HTTP log for a web connection and determine information
about the underlying HTTP session within the connection including the User-Agent header, the re-
quest URI, and the HTTP Request verb. Finally, the connection UID can be used to query the files
log to determine which files the HTTP session contained.
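The UID-based pivot described above can be sketched in Python against invented log entries (the field names follow Bro's conn, http, and files logs, but the records themselves are fabricated for illustration):

```python
# Illustrative Bro log entries; field names follow the conn.log, http.log,
# and files.log schemas, while the values are invented for this example.
conn_log = [{"uid": "CHhAvVGS1", "id.orig_h": "10.0.0.5",
             "id.resp_h": "93.184.216.34", "id.resp_p": 80, "duration": 0.42}]
http_log = [{"uid": "CHhAvVGS1", "method": "GET",
             "uri": "/download/a.exe", "user_agent": "Mozilla/5.0"}]
files_log = [{"conn_uids": ["CHhAvVGS1"], "fuid": "FakKe1",
              "mime_type": "application/x-dosexec"}]

def session_for(uid):
    """Pivot on a connection UID across the three logs, as an analyst would."""
    conn = next(c for c in conn_log if c["uid"] == uid)
    http = [h for h in http_log if h["uid"] == uid]
    files = [f for f in files_log if uid in f["conn_uids"]]
    return {"connection": conn, "http": http, "files": files}

result = session_for("CHhAvVGS1")
```

One UID ties together the connection endpoints, the HTTP session details, and the files observed, which is the correlation the graph model later makes queryable.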
By default, Bro stores the logs for connections, protocols, and files within individual files, which simplifies ingesting the information into a logging pipeline. However, as previously discussed, this
research utilized a plugin for the Bro framework (“Metron - Logging Bro Output to Kafka” 2017)
that transmits the log entries to the Kafka cluster as they occur. The plugin streams the log entries to the nodes in the Kafka cluster in a load-balanced manner, as Kafka separates the storage of the topic messages into partitions to maximize available throughput and processing.
3.4.2 Suricata
While Bro can provide detailed information about the connections occurring within the net-
work and detection of activity across multiple network flows, it may not be the ideal means to
detect malicious activity based on signatures. When paired with a rules-based IDS such as Suricata,
the combination of flow and signature-based detection becomes a powerful mechanism to detect
adversary activity.
Where Bro succeeds in having a detailed understanding at the protocol level, Suricata and sim-
ilar tools such as Snort look for signatures at the byte level as network traffic passes through the
sensor. By examining traffic at the byte level, detection of specific anomalies within the traffic is
accomplished by writing rules to match the pattern of the anomaly at the byte level. Rules can
support the detection of known malware variants, policy violations, and post-exploitation activities
such as adding a user account over the network (Shostack 2014).
For both Suricata and Snort, rules can be obtained from a third-party provider such as Emerging
Threats (“Emerging Threats” 2017) or Talos (“Talos - Author of the Official Snort Rule Sets” 2017).
Both providers offer free rule sets in addition to subscription-based updates for the most recently
identified network threats. The third-party rule sets provide a significant foundation to detect mali-
cious activity known to occur in the wild. In addition to the provided rules, an analyst can quickly
write or modify existing rules to meet the specific needs of the monitored network. For example,
rules can be created to monitor for sensitive information such as personally identifiable information
(PII) or corporate secrets exiting the network.
The fundamental components of any rule are the protocol of concern, the source and destination IP address and port, the action taken, and the signature itself. The use of variables in the rule for internal and external networks allows the analyst to create rules based on the direction of the monitored network traffic, providing for finer control and tuning of alerts. For example, based on the nature of operations within the network, outbound SSH connections may be part of normal day-to-day operations while the analyst still wants to be alerted to any inbound SSH connections.
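A rule expressing that inbound-SSH policy, along with a minimal parser for the header components named above, might look like the following sketch (the rule text and sid are invented for this example):

```python
# An illustrative Suricata rule alerting on inbound SSH, using the standard
# $HOME_NET / $EXTERNAL_NET direction variables. Rule text and sid invented.
rule = ('alert tcp $EXTERNAL_NET any -> $HOME_NET 22 '
        '(msg:"Inbound SSH connection"; sid:1000001; rev:1;)')

def parse_header(rule_text):
    """Split the rule header (everything before the options parentheses)
    into its fundamental components."""
    header = rule_text.split("(", 1)[0].split()
    keys = ["action", "proto", "src", "src_port", "direction", "dst", "dst_port"]
    return dict(zip(keys, header))

header = parse_header(rule)
```

The direction variables mean one rule text fires only on traffic entering the monitored network, leaving routine outbound SSH unalerted.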
With the third-party and custom rules in place, Suricata will log alerts for any network traffic
that matches a signature rule. Traditionally the alerts are evaluated in an alert management console
such as Sguil (“Sguil - Open Source Network Security Monitoring” 2014) or forwarded to a SIEM
such as Splunk (“SIEM, AIOps, Application Management, Log Management, Machine Learning,
and Compliance” 2017). As previously discussed, while these methods improve the ability for an
analyst to evaluate activity within the network, much of the correlation and subsequent investigation
is a manual process. By adding the components of a Suricata alert to the graph model, they are
already correlated to other events such as the components of the Bro logs, simplifying the work of
the analyst to develop a full picture of the activity under investigation and providing a means for
automated queries against the graph database.
Although there is currently no plugin for Suricata to automatically send alerts into the Kafka
cluster, the addition of Fluentd (“What is Fluentd?” 2017) and Fluent Bit (“Fluent Bit” 2017) into
the centralized logging framework provides the needed functionality. The role of Fluentd in the
centralized logging framework is to receive inputs from various sources, transform the messages
from the sources, and forward the messages to the Kafka cluster for consumption by other services.
Fluent Bit simply monitors the alert log generated by Suricata and transmits the alerts as they occur
to the Fluentd instance. This architecture allows deployment of multiple instances of Suricata in the
network and ensures alerts from all instances are forwarded to the Kafka cluster.
3.4.3 osquery
The combination of Bro and Suricata provides for detailed insight into the network-based ac-
tivity but does not address the need to understand what activity is occurring on the hosts and servers
within the network. A host-based monitoring agent is required to capture and understand the events
that occur at the host level. While there are many host-based solutions available, the open source of-
fering osquery (“osquery Docs” 2017) from Facebook provides the detail, flexibility, and scalability
desired to add host-based information to the graph model.
As the software that runs on the target host machine, osquery provides the ability to obtain
extremely detailed and valuable information about the host via automated or manual queries. At its
core, osquery essentially turns the host into a series of queryable relational database tables. SQL
queries run against the host provide information such as running processes, open network ports, file
integrity changes, installed software, and connected USB devices.
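Because osquery's tables are queried with SQL, the concept can be illustrated with an in-memory SQLite table standing in for osquery's processes table (the column names follow the osquery schema, but the rows and the query are invented for this sketch):

```python
import sqlite3

# Mimic osquery's "host as queryable tables" model with an in-memory SQLite
# table shaped like a slice of osquery's processes table. Rows are invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processes (pid INTEGER, name TEXT, path TEXT)")
db.executemany("INSERT INTO processes VALUES (?, ?, ?)",
               [(412, "sshd", "/usr/sbin/sshd"),
                (977, "nc", "/tmp/nc")])

# A query an analyst might schedule: processes executing out of /tmp.
suspicious = db.execute(
    "SELECT pid, name FROM processes WHERE path LIKE '/tmp/%'").fetchall()
```

The same SQL shape applies whether the query runs once by hand or on a schedule, which is what makes osquery results straightforward to ship into a logging pipeline.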
Facebook itself (“osquery Docs” 2017) and other organizations in the security community
(“Trail of Bits” 2017) provide users of osquery with queries or packs of queries to run on every
major operating system. While some of the queries are limited to running on specific operating
systems, much of the core functional queries run on Mac, Windows, and Linux operating systems.
Another benefit of osquery related to the research is the inclusion of a Kafka Producer as part of the default installation (“osquery Docs” 2017). A few modifications and additions to the osquery
configuration will send all of the query results to the appropriate Kafka topic in the cluster. The flex-
ible configuration minimizes the number of programs that must run on each of the monitored hosts
while also allowing for rapid addition of new hosts into the logging system. A standard configura-
tion can be deployed to every host in the monitored environment with configuration management
tools such as Ansible, Puppet, or Chef.
3.4.4 Additional Sources
The use of Bro, Suricata, and osquery in the environment provides a significant amount of
detail for network intrusion detection, network security monitoring, and host-based analysis. While
many enterprise networks include other tools such as vulnerability scanners, host-based intrusion
detection, and network scanners, the scope of this research is limited to the selected tools to maintain
the complexity of system development within the time requirements of the research phase.
However, the previously discussed goal of modularity in the system provides for the relatively
simple integration of the additional data sources into the system and the data model. For example,
the vulnerability reports for hosts provided by the Nessus Vulnerability Scanner (“Nessus Profes-
sional Vulnerability Scanner” 2017) could be forwarded to the Kafka cluster and processed for
insertion into the graph database.
Addition of new data sources would also provide for further queries of the database to gain
additional insight into the activity within the network. The structure of the data model and the
functionality of the query language provides seamless integration of the new data sources into the
workflow of the analyst. Adding existing and future data sources into the system offers several
opportunities for future research and development.
3.5 Events to Graph Model
With the events from each data source now stored in individual topics in the Kafka cluster, the
events are transformed and loaded into the graph database. The use of unique containers to provide
the extraction, transformation, and loading of event data by topic supports the goals of modularity
and scalability. This architecture provides the ability to modify the transform and load process for
a given topic or to add processing of new topics from additional data sources. Additionally, the
structure of the transform and load process combined with the scalable nature of Kafka Consumers
allows for multiple instances of the processing container to pull events from the same topic to
increase the processing throughput for the given topic.
Each of the processing containers provides three core functions: a Kafka Consumer to retrieve
events, the transformation of the event data, and loading of the transformed data into ArangoDB.
A Python script running within the container provides all of the required functionality. Python
modules for the Kafka Consumer (Oh 2016) and the ArangoDB Client (Powers and Arthur 2016)
are used to simplify the extraction and loading functionality of the process. The modules provide
a standard interface to Kafka and ArangoDB with support for error handling and load balancing
within each cluster.
The Python code for the transformation of the event data depends on the content of the individual topics. Each of the data sources produces events in its own format, with potentially different variable names for the same data point. For example, Bro refers to the source IP address as ‘id.orig_h’ while Suricata uses ‘src_ip’ as the variable name for the originating IP address of the connection.
The code also ensures that the event data is transformed to conform to the graph data model by
creating the appropriate vertices and edges for loading into ArangoDB.
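A minimal version of such a transform, normalizing the differing IP field names and emitting vertices and an edge for loading, might look like the following sketch (the field and collection names are assumptions for illustration, not the exact code used in the research):

```python
# Per-source field maps normalize the originating/responding IP names
# (Bro's id.orig_h/id.resp_h vs. Suricata's src_ip/dest_ip); other names
# here are illustrative.
FIELD_MAP = {"bro": ("id.orig_h", "id.resp_h"),
             "suricata": ("src_ip", "dest_ip")}

def to_graph(source, event):
    """Transform one raw event into IP vertices plus a connecting edge."""
    src_field, dst_field = FIELD_MAP[source]
    src, dst = event[src_field], event[dst_field]
    vertices = [{"_key": src, "collection": "ip"},
                {"_key": dst, "collection": "ip"}]
    edge = {"_from": f"ip/{src}", "_to": f"ip/{dst}", "source": source}
    return vertices, edge

vertices, edge = to_graph("suricata", {"src_ip": "10.0.0.5",
                                       "dest_ip": "8.8.8.8"})
```

Because the normalization lives in one per-topic function, adding a new data source means registering one more field map rather than changing the loader.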
3.6 The Graph Database
As the central point of the correlation system, the database receives the transformed event
data and provides the query language and interface to analyze the correlated events. While there
are several graph database technologies available (“DB-Engines Ranking - popularity ranking of
graph DBMS” 2017), this research utilized ArangoDB (“ArangoDB - highly available multi-model
NoSQL database” 2017) to provide the storage and querying of the event data.
ArangoDB is a multi-model database that provides document store, key/value store, and graph
store in one database. The multi-model database allows for storage of rich data sets and the rela-
tionships between the individual data items. The document store maintains individual documents
within collections and supports storing deeply nested data items. The ability to store nested data
items within a single document allows for the combining of data from separate sources into a single
representation of the related data. The graph store maintains entries in vertices and edges based on
the data model. By storing the data in a graph model, the queries can traverse the graph edges based
on the relationships between the vertices.
In meeting the scalable and distributed goals of the system, ArangoDB supports deployment
in a clustered architecture. The clustered architecture provides for distribution of the data across
multiple nodes resulting in improved performance of reads and writes through sharding while im-
proving data resiliency through replication. A three node cluster similar to that used for Kafka is
employed in the system by running multiple instances of ArangoDB within separate containers.
To query the database ArangoDB provides the ArangoDB Query Language (AQL). AQL is sim-
ilar in structure to the Structured Query Language (SQL) used in relational databases (“ArangoDB
- highly available multi-model NoSQL database” 2017). AQL operates against the multi-model
database allowing for the ability to query from a document store, a graph store, and a key/value
store within the same query. The flexibility of AQL allows for queries that traverse the graph store
based on relationships between vertices and return results from deeply nested vertices stored as
documents within the document store.
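As a sketch of such a query, the following Python function composes an AQL graph traversal as a string (the graph and collection names are assumptions for this example, and executing the query would of course require a running ArangoDB instance):

```python
# Compose an AQL traversal that walks outbound edges from an IP vertex and
# returns only the file documents reached. "events", "files", and "ip" are
# illustrative names, not ones mandated by ArangoDB.
def files_for_ip(ip_key):
    query = (
        "FOR v IN 1..2 OUTBOUND @start GRAPH 'events' "
        "FILTER IS_SAME_COLLECTION('files', v) "
        "RETURN v"
    )
    bind_vars = {"start": f"ip/{ip_key}"}
    return query, bind_vars

query, bind_vars = files_for_ip("10.0.0.5")
```

Binding the start vertex as a parameter rather than interpolating it keeps the query reusable and avoids injection issues, mirroring the parameterized style AQL clients support.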
Another benefit of choosing ArangoDB as the database for the correlation system is the Foxx
Microservice Framework (“Foxx at a glance” 2017) provided by ArangoDB. Foxx provides “a
JavaScript framework for writing data-centric HTTP microservices that run directly inside of
ArangoDB.” (“ArangoDB - highly available multi-model NoSQL database” 2017) The benefits
Foxx provides include standardized data access and storage, reduced network overhead due to
running logic within the database itself, and the ability to restrict access to sensitive data within the
database.
By creating Foxx services within the database, complex queries can be run against the in-
memory data from within the database itself while only requiring the client to conduct a single
HTTP request to the database. Additionally, Foxx services can be developed to run periodically and
provide results to a client. This structure is well suited for the goal of the research to identify patterns
of potentially malicious activity, query the database for matching events, and provide the results
of the query to the analyst automatically. With the alert queries in place, additional investigative
queries can be created as Foxx services to be accessed on demand by the analyst without requiring
the analyst to understand the AQL syntax.
3.7 The Model
To leverage the power of ArangoDB and AQL, the individual data sources must be transformed
into a graph model that accounts for the relationships between the data elements. As previously
discussed, the data sources forward events to Kafka for consumption by Python services that con-
duct the ETL of the events into ArangoDB. The transformation logic within the Python service is
specific to each data source’s Kafka topic. This structure allows for modification and additions to
the logic for each topic as well as additions of entirely new topics for inclusion into the data model.
Creation of a graph model for the components of an individual data source is a straightforward
process as the relationships between them are relatively intuitive. For example, an alert from Suri-
cata contains the alert content, the source IP, and the destination IP which translates to a three-node
graph with two edges to connect the components. Full insight into the network activity comes from
creating the edges that relate components of individual data sources to each other. The completed
graph model, displayed in Figure 3.1, provides the ability to create queries that traverse the entire
graph by analyzing the relationships of nodes from the different data sources as a single data set.
Figure 3.1: Complete Graph Model
3.7.1 Bro
A default installation of Bro, as implemented in the research, provides logs for connection,
DNS, HTTP, FTP, SSH, files, and other common network protocols and services. To support the
internal correlation of the individual logs, Bro uses a unique identifier (UID) for the individual con-
nections. The connection log serves as the central reference point for the remaining logs created
by Bro. For example, the HTTP log contains a field for the UID that correlates to an entry in the
connection log that contains specific information about the connection such as source and destina-
tion IP address while the HTTP log contains relevant information about the HTTP session such as
URL visited and the browser user agent. For the files transferred as part of a connection, there is an
additional UID for each file observed as part of a service. If multiple files were transferred as part
of an HTTP session, each of the files is an entry in the files log and the HTTP log entry contains a field with a list of the file UIDs.
Figure 3.2: Bro Graph Model
Figure 3.2 displays the high-level graph model for the data provided in the Bro logs. The two
‘IP’ nodes account for the source and destination IPs of the connection. The ‘connection’ node con-
tains all of the information contained within the conn.log from Bro. A single ‘connection’ node may
connect to multiple ‘service’ nodes if multiple services are present within the connection. The ‘ser-
vice’ node is a generic placeholder for the information about the specific service contained within
the connection while the ‘file’ and ‘cert’ nodes contain information about the files or SSL certifi-
cates transferred within the service as part of the connection. Each file and certificate transferred
within the service will appear as a separate node within the database.
This structure provides the ability to traverse the graph and identify all of the connections,
services, and files associated with a particular IP address. Similarly, the graph could be queried to
locate all of the IPs associated with the single instance of a particular file to determine how many
hosts downloaded the suspected file. As a single data source in the graph database, the Bro logs
provide useful insight into the nature of network traffic. When combined with the additional data
sources the analyst will be able to gain a deeper understanding of the activity within the network.
3.7.2 Suricata
To provide network intrusion detection capabilities, a default installation of Suricata is used as
a sensor to monitor network traffic. Suricata provides alerts for any observed network traffic that
matches a defined set of rules. While custom rules can be created based on the needs of a particular
organization, this research uses publicly available rules to monitor for malicious network activity.
As shown in Figure 3.3, the graph model for Suricata alerts is relatively simple. Consisting of
three nodes, the model captures the IP addresses associated with an alert. The ‘alert’ node contains
all of the relevant information associated with the alert from Suricata.
Figure 3.3: Suricata Graph Model
Using the IP address as a common node between the Bro and Suricata models, the overall data
model begins to come together. With both data sources in the graph database, an analyst can now
query for the IP address associated with a particular Suricata alert and determine which files were
associated with the alert from the Bro logs. The combination of Bro and Suricata in the graph model
provide the analyst with insight into network activity and the hosts involved but do little to provide
an understanding of the activity occurring on the host itself.
3.7.3 osquery
To enrich the network-based portions of the graph model, information about each of the hosts
within the network is added through the use of osquery on the hosts. osquery is an open source
product released by Facebook that provides “an operating system instrumentation framework for
Windows, OS X (macOS), Linux, and FreeBSD.” (“osquery Docs” 2017) osquery provides the
ability to run scheduled queries against the host for information such as operating system version,
installed software, and connected USB devices. For more time-sensitive information that may be missed
between scheduled queries, such as process and networking events, osquery provides an eventing
framework to capture these events as they occur on the host.
Although osquery allows for querying of detailed information from the host, this research fo-
cused on the components shown in Figure 3.4. The elements of the data model provided by osquery
focus on locating potential malicious activity on the host while connecting the host data to the
network-based data through the IP address associated with the host.
Figure 3.4: osquery Graph Model
With osquery installed on hosts within the network, the configured daemon runs the scheduled queries
and tracks event-based data. The results of the queries are pushed to a single topic within the Kafka
cluster for transformation and loading into ArangoDB.
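This transformation step can be sketched as follows, assuming a simplified osquery differential result as it appears in the daemon's result log; the collection and keying choices are illustrative, not the exact implementation used in the research.

```python
def osquery_result_to_graph(result: dict, host_ip: str) -> dict:
    """Map one osquery differential result onto the graph model,
    keyed by the host's IP address so the host vertex joins the
    network-based data. Collection names are illustrative."""
    host_key = host_ip.replace(".", "_")
    row_key = f'{result["name"]}-{result["unixTime"]}'
    vertices = [
        {"_collection": "ips", "_key": host_key, "address": host_ip},
        # One vertex per result row, carrying the queried columns.
        {"_collection": result["name"], "_key": row_key, **result["columns"]},
    ]
    edges = [
        {"_from": f"ips/{host_key}",
         "_to": f'{result["name"]}/{row_key}',
         "action": result.get("action", "snapshot")},
    ]
    return {"vertices": vertices, "edges": edges}
```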
3.7.4 Complete Model
With the network and host-based components of the data model completed, the overall data
model for the research represents a view of the individual data components connected by their
relationships to each other. Figure 3.1 provides an overview of the full graph data model used for
the research to identify malicious network activity through the use of crafted AQL queries of the
database.
As all of the data sources provide the IP addresses of the hosts involved, the IP address is
the vertex of the graph that connects each of the individual data models. As an example, the graph
could be traversed from a NIDS alert provided by Suricata to determine
which users were logged into the destination host of the alert as provided by osquery and the files
transferred during the connection as provided by Bro.
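A traversal of this kind might be expressed in AQL roughly as follows, shown as the query string a client would submit to ArangoDB; the collection and edge-collection names are assumptions drawn from the model, not the exact schema of the research database.

```python
# Hypothetical AQL: starting from a Suricata alert, walk to the
# destination IP vertex and on to connected osquery and Bro vertices.
# 'alerts', 'alerted', 'has_user', and 'transferred' are illustrative.
AQL_ALERT_CONTEXT = """
FOR alert IN alerts
  FILTER alert.signature == @signature
  FOR ip IN 1..1 OUTBOUND alert alerted
    FOR item IN 1..2 OUTBOUND ip has_user, transferred
      RETURN { alert: alert.signature, host: ip.address, item: item }
"""

def bind_signature(signature: str) -> dict:
    """Package the query and bind variables as an ArangoDB client
    would accept them."""
    return {"query": AQL_ALERT_CONTEXT, "bind_vars": {"signature": signature}}
```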
Chapter 4 – Research
4.1 Overview
Although attacker skill sets can range from a low-level “script-kiddie” to well-funded nation
states (Sanders and Smith 2014), the overall methodology employed during an attack follows a
typical pattern or lifecycle of activity. Additionally, the objective of an attack can vary based on
the skill level and intent of the attacker and the nature of the target network. A “script-kiddie” may use
freely available tools and publicly known vulnerabilities to deface a website solely for the sense of
accomplishment while a fully funded nation-state team of attackers with custom tools and privately
developed exploits may attack and persist within a corporate network for exfiltration of proprietary
information.
At a high level, regardless of the skill level or intent of the attacker, the majority of attacks will
create events or artifacts in the network or endpoints of the targeted environment. Timely collection
and analysis of attacker-generated events can lead to locating malicious activity in hours rather than the
weeks or months observed in recent high profile attacks (Brumfield 2017). While there has been
significant research and commercial application of event collection and aggregation, correlation of
the events into a cohesive model is typically left to the analyst and their understanding of the system.
This research demonstrates the correlation of events from network and endpoint based data
sources into a graph model to support early detection of malicious network activity. By collecting
and transforming events from individual sources within the target system, a graph-based represen-
tation of activity can be constructed and queried based on the data and the relationships between
the individual data sources. As discussed in Chapter 3, the graph model used in the research is con-
structed from network and endpoint data sources to capture and represent artifacts that are typically
used by network defenders and incident responders to detect malicious activity.
4.1.1 Attack Lifecycle
To build a context-based graph model to detect malicious activity, the typical lifecycle of an
attack must first be understood. Published models for the attack lifecycle vary from a simple
four-step cycle (Shakarian, Shakarian, and Ruef, n.d., p. 134) to more detailed, military-focused
linear kill chain models (Yadav and Rao 2016). This research leveraged the model provided
in the FireEye M-Trends 2017 report to frame the context of an attack. The lifecycle provided by
FireEye, as shown in Figure 4.1 (FireEye 2017), represents the lifecycle as a combination of linear
events and a repeating cycle.
Figure 4.1: FireEye Cyber Attack Lifecycle. Reprinted from M-Trends 2017 - A View From the Frontlines, by FireEye. Copyright 2017 by FireEye. Reprinted with permission.
As discussed in Chapter 1, each stage of the lifecycle may produce artifacts related to the ac-
tivity of an attacker. Representing the attack lifecycle in this manner efficiently captures the overall
structure of most objective-based attacks. At a high level, the first linear portion of the attack
represents the reconnaissance, initial exploitation, and persistence phases. During the
reconnaissance phase the attacker may directly scan the network for open network ports and
available services or conduct out-of-band open source research without interacting with the target
network. Initial exploitation provides the attacker with access to the first compromised host within
the network. Exploitation methods may include but are not limited to, password guessing, social
engineering, or remote exploitation of a software vulnerability. Depending on the nature of the ini-
tial exploitation, the attacker’s access may be lost due to instability of the exploit or the need to have
a user logged in to the host to maintain access. To maintain permanent access to the compromised
host the attacker carries out steps, such as installing a remote access trojan, in the persistence phase
to ensure repeatable access to the host. The cyclic part of the lifecycle captures the fact that once
inside a network, an adversary will repeatedly conduct additional reconnaissance and exploitation
within the target network to reach the ultimate objective. As the first step in the cycle, privilege
escalation to an administrative account may be required if the initial exploitation resulted in the
attacker gaining access to the targeted host as an unprivileged user. The remainder of the cycle de-
picts the steps carried out by the attacker to gain access to additional hosts within the target network
and follows the same structure as the initial exploitation. The cycle of lateral movement continues
until the attacker achieves the desired objective. Mission accomplishment for the attacker can vary from extraction of sensitive information
to destruction of critical data assets and all manner of malicious activity in between.
Each phase of the attack lifecycle presents an opportunity to potentially collect events or arti-
facts that result from the attacker’s activity during that phase. Using the above model as a frame-
work, this research focused on the relevant events to capture, how the individual events are related to
each other, and how to query the resulting graph database for detection of malicious activity. While
some phases provide more opportunity to capture events related to attacker activity, each phase has
the potential to create artifacts that help put the entire picture of an attack together.
4.1.2 Indicators of Attack
Using the attack lifecycle as a guide and the event data available from the network and endpoint
sensors, specific events relevant to each stage of the lifecycle can be extracted from the event data.
Some actions such as network scanning during reconnaissance may require aggregation of many
entries while others, such as an IDS alert, provide context with a single entry.
During the initial recon phase of an attack, much of the activity may be accomplished out-
side the detection capability of the sensors. Passive events such as organization research, open
source reconnaissance conducted against outside entities, and other information-gathering steps are
not observable. However, active events against the target system such as network port scanning and
vulnerability scanning will be captured by the sensors. For example, Bro collects information on
every attempted connection to ports being monitored, and repeated attempts from a single source
over a period can be aggregated to identify scanning activity. Additionally, the user-agent string uti-
lized by many open source tools can be determined by Bro and Suricata as an indicator of scanning
activity.
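The aggregation described above reduces to a threshold on distinct destination ports per source within a time window. The sketch below uses field names from Bro's JSON conn.log output; the threshold and window values are illustrative, not tuned parameters from the research.

```python
from collections import defaultdict

def find_scanners(conn_entries, port_threshold=100, window=60.0):
    """Flag source IPs that touch at least `port_threshold` distinct
    destination ports within `window` seconds. `conn_entries` are
    dicts shaped like Bro's JSON conn.log (ts, id.orig_h, id.resp_p)."""
    by_source = defaultdict(list)
    for entry in conn_entries:
        by_source[entry["id.orig_h"]].append((entry["ts"], entry["id.resp_p"]))
    scanners = set()
    for source, attempts in by_source.items():
        attempts.sort()  # order attempts by timestamp
        for i, (start_ts, _) in enumerate(attempts):
            ports = {p for ts, p in attempts[i:] if ts - start_ts <= window}
            if len(ports) >= port_threshold:
                scanners.add(source)
                break
    return scanners
```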
Event detection during the initial compromise phase can be the most difficult. Network
intrusion detection systems like Snort and Suricata rely on rules based on previous detection
and analysis of exploitation events. As such, previously unseen or obfuscated exploits may pass
through the sensor undetected. Similarly, for host-based detection mechanisms, a signature is typ-
ically required to identify the exploit code as it is executed on the system. Based on the dynamic
nature of rule-based signature detection, organizations such as Talos (“Talos - Author of the Of-
ficial Snort Rule Sets” 2017) and Emerging Threats (“Emerging Threats” 2017) provide free and
subscription-based updates to rules for Snort and Suricata.
Depending on the techniques used in the ‘establish foothold’ and ‘maintain persistence’ phases, events
may be captured by the network and endpoint sensors. Activities such as creating a new user
or installing a new service on the host would be reported by osquery during regularly scheduled
queries. The creation of a service that also opens a listening port for external connections would
be reported by osquery while connections to the newly opened port would be reported by Bro.
Similarly, if the installed service/malware beacons out to an external host, the running process
would be reported by osquery while the outbound connection would be reported by Bro and possibly
Suricata if the outbound connection matches an existing rule.
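As a minimal sketch of this kind of cross-source correlation, the function below joins rows shaped like osquery's listening_ports table with Bro conn.log entries to surface connections into a newly opened service; the matching logic is an illustration, not the research's query.

```python
def correlate_listener_connections(listening_ports, conn_entries):
    """Return Bro connections whose responder matches a port reported
    as listening by osquery. Listener rows bound to the wildcard
    address 0.0.0.0 match any responding host."""
    listeners = {(row["address"], int(row["port"])) for row in listening_ports}
    hits = []
    for conn in conn_entries:
        port = conn["id.resp_p"]
        if (conn["id.resp_h"], port) in listeners or ("0.0.0.0", port) in listeners:
            hits.append(conn)
    return hits
```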
As privilege escalation typically occurs within the confines of the targeted host, this activity
will be detected and reported via osquery. Additions or modifications to user accounts,
changes to the running kernel, and execution of new processes will be captured through appropriate
configuration of osquery running on the host.
During the ‘internal recon’ and ‘lateral movement’ phases, the attacker is using an already
compromised host within the network to conduct network and vulnerability scanning of the internal
network and launch new exploits against additional hosts. This cycle continues until the attacker
gains access to the hosts or accounts necessary to carry out the objective of the attack. As with
many of the other phases, this activity would be present in the endpoint and network-based event
data from the sensors. However, the network sensor must be positioned to capture all internal traffic
in addition to the external network traffic through the use of a monitoring port on the internal switch.
Due to the widely varying nature of attacker objectives, the final phase of accomplishing the
mission can be difficult to programmatically identify with the event data used in this research.
Attack objectives range from simple website defacement and denial-of-service attacks to ransomware,
exfiltration of sensitive or proprietary data, and all manner of malicious intent. While the
combination of Bro, Suricata, and osquery could identify many of the indicators of the attack objectives, the
customization and configuration of the sensors for this phase is beyond the scope of this research.
4.2 Graph-Based Event Correlation
Using current industry tools such as Splunk, the Elastic stack, and other commercially available
security information and event management (SIEM) tools, events from each of the attack lifecy-
cle phases are already being collected and stored by most organizations seeking to defend a net-
work. The nature of these tools, however, typically leaves correlation of the individual events from
the phases to the capabilities and knowledge of the network defenders. The current tooling for
event collection excels at ingesting information from multiple sources and storing the events into a
database for analysis through crafted queries and alerts to identify malicious behavior.
While current methods have proven successful at ultimately identifying attacker activities within
a network, the average time to detect the activity can still be on the order of months, leaving the
attacker with ample time to carry out their objectives. Although some of the extended time can be
attributed to the increased capabilities of attackers to hide activity from the defenders, manual cor-
relation of suspected events to identify malicious activity is time-consuming and requires detailed
understanding of the events and their relationships.
To simplify the process of identifying potentially malicious activity and reduce the overall de-
tection time, this research uses a graph database for storage and analysis of the event data. Using
the model depicted in Figure 3.1, events are collected from Bro, Suricata, and osquery for trans-
formation and loading into the graph database ArangoDB. With the events and their relationships
populated in the database in real-time, queries written in the Arango Query Language (AQL) are used
to identify malicious activity related to various stages of the attack lifecycle.
4.3 Software Architecture
All of the components of the research architecture run in Docker containers, with Docker
Compose providing the orchestration for starting services and managing volumes and
inter-service network connectivity. Figure 4.2 provides an overview of the software architecture for the
collection, transformation, and storage of event data into the graph database.
Figure 4.2: Software Architecture Overview
The containers for Bro and Suricata run on a Raspberry Pi to monitor all of the traffic within the
network and provide network security monitoring and network intrusion capabilities. For endpoint
activity monitoring, osquery is installed and configured on each of the monitored hosts within the
network. The remainder of Figure 4.2 represents the collection, transformation, and storage func-
tions. The Kafka/Zookeeper cluster provides the central messaging functionality of receiving events
from the data sources and providing the events for consumption by the ETL services. The instance
of Fluentd simply provides for transport of alerts from Suricata into the Kafka cluster as Suricata
does not have a native Kafka client to transmit the alerts. The ETL services retrieve messages from
the Kafka cluster, transform the message contents into the appropriate vertices and edges based on
the graph model, and insert the data into ArangoDB. Finally, ArangoDB provides the storage and
query functions of the observed and transformed events.
4.3.1 Collecting Events
With the focus of the research on reducing the time to detect malicious network activity, the
architecture must support real-time collection and storage of events from the data sources. Central-
ized collection requires that each data source provide events to a single storage platform through
the interface provided by the storage platform. Complicating this requirement is the fact that each
data source stores event data in a different format and provides various interfaces for extraction of
event data to outside entities.
To address the collection requirement, a cluster of Kafka instances supported by Zookeeper as
shown in the center of Figure 4.2 serves as the central messaging fabric of the architecture. The use
of Kafka Producers for each of the data sources provides a common interface to the Kafka cluster
for transport and serialization of the event data into the Kafka cluster for storage and consumption
by the ETL services.
4.3.1.1 Kafka and Zookeeper
From the Kafka documentation, as a distributed streaming platform Kafka provides the ability
to
• publish and subscribe to streams of records similar to a message queue;
• store streams of records in a fault-tolerant way;
• process streams of records as they occur (“Apache Kafka - Intro” 2017)
The capabilities of Kafka combined with the provided Producer and Consumer APIs provide
the necessary framework to collect, store, and consume the event data from each of the data sources.
Additionally, this architecture ensures that additional data sources can be added to the architecture
by merely including the Kafka Producer for the new data source.
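In practice a producer for a new source reduces to serializing its events and publishing them to a topic. The sketch below defines a shared JSON serializer; the commented lines show how it might be wired into the kafka-python client, with broker addresses and the topic name as illustrative assumptions.

```python
import json

def serialize_event(event: dict) -> bytes:
    """Serialize an event to JSON bytes so every data source shares
    the same wire format on its Kafka topic."""
    return json.dumps(event, sort_keys=True).encode("utf-8")

# Publishing with the kafka-python client might then look like:
#
#   from kafka import KafkaProducer
#   producer = KafkaProducer(
#       bootstrap_servers=["kafka1:9092", "kafka2:9092", "kafka3:9092"],
#       value_serializer=serialize_event)
#   producer.send("bro_raw", {"ts": 1512345678.0, "id.orig_h": "10.0.0.5"})
```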
For the research architecture, Kafka is deployed as a three node cluster with each instance
running in a separate Docker container. A clustered deployment provides for scalability and fault
tolerance. Scalability is achieved through the partitioning of each topic across the nodes. Each
data source sends events to a single topic and Kafka handles the partitioning of the topic across
the nodes. By partitioning the topic, multiple Producers and Consumers can handle messages for a
single topic in parallel and Kafka ensures messages are received and transmitted in sequence. Fault
tolerance is achieved through replication of the partitions across all nodes within the cluster. If a
single Kafka node fails, the Producers and Consumers using that node will shift to one of the two
operational nodes without losing messages.
The Zookeeper cluster (“Apache Zookeeper” 2017) in the architecture manages the configuration
of the Kafka cluster and topics. Zookeeper will track members of the cluster and their status,
the configuration of topics within the Kafka cluster, election of the controller in the Kafka cluster,
etc. Similar to the Kafka cluster, Zookeeper is deployed as a three node cluster with each instance
in a separate Docker container to provide fault tolerance should one of the instances fail.
4.3.1.2 Fluentd
The instance of Fluentd, as shown between Suricata and Kafka in Figure 4.2, in the architecture
provides merely a transport mechanism for data sources that do not support the Kafka Producer
API either natively or through the use of a plugin. Fluentd provides “collecting, filtering, buffering,
and outputting logs across multiple sources and destinations” (“What is Fluentd?” 2017). In the
research architecture, Fluentd monitors the alerts log file produced by Suricata and sends the alerts
to the Kafka cluster with the Kafka output plugin.
The inclusion of Fluentd in the architecture ensures new data sources that do not natively
support the Kafka Producer API can be added. Fluentd supports input plugins for many different
data sources and formats, requiring only simple configuration changes to Fluentd and the Kafka
output plugin to forward events into a topic within the Kafka cluster.
4.3.1.3 Bro
As a network analysis framework, Bro provides the ability to analyze network connections,
the protocols used as part of the connection, and the data contained in the protocol. (“The Bro
Project” 2014) While monitoring network traffic via a network tap or switch monitoring port, Bro
will process every network connection and the underlying protocol data. The output logs from Bro
include a log of the connections and individual logs for the protocols Bro observed in the connection.
Although Figure 4.2 displays a single instance of Bro, the architecture supports multiple instances
of Bro monitoring different segments of the network.
The connection log contains information about all IP, TCP, UDP, and ICMP connections ob-
served in the network. Table 4.1 provides a summary of field names and description of the data
available in the connection log produced by Bro. The data contained in the log entry is used to
create the vertices and edges of the graph model as shown in Figure 3.1 and populate the edge
and vertex properties.
Table 4.1: Bro conn.log data fields
Field       Type       Description
ts          time       Timestamp of first packet
uid         string     Unique ID of the connection
id.orig_h   address    Originating endpoint's IP address
id.orig_p   port       Originating endpoint's TCP/UDP port
id.resp_h   address    Responding endpoint's IP address
id.resp_p   port       Responding endpoint's TCP/UDP port
proto       protocol   Transport layer protocol of connection
service     string     Detected application protocol
duration    interval   Connection length
Bro also provides logs for each of the application protocols observed as part of a given connec-
tion. As a single example, a summary of the content for the HTTP log is shown in Table 4.2. The
content of the HTTP log is representative of the level of information available in the other service
logs including SSH, DNS, FTP, etc. In the graph model from Figure 3.1, the content of the service
vertex is populated by the information contained in the individual log entry.
Table 4.2: Bro http.log data fields
Field        Type     Description
ts           time     Timestamp of first HTTP request
method       string   HTTP request verb
host         string   Value of the Host header
uri          string   URI used in the request
user_agent   string   Value of the User-Agent header
orig_fuids   vector   Ordered vector of file unique IDs from the originator
The ‘uid’ field in the HTTP log corresponds to the unique entry in the connection log, which
allows for the creation of the edge between the connection and service in the graph model. Similarly,
the ‘orig_fuids’ field corresponds to the unique identifiers of any files transferred from the originator
to the responder and provides for creating the relationship between the HTTP session and any
associated files transferred during the session.
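This linkage can be sketched as a join on the shared identifiers that emits edge documents for the graph; the collection names and edge labels are illustrative, not the exact schema used in the research.

```python
def link_service_and_files(conn_entries, http_entries):
    """Create connection->service and service->file edges using the
    'uid' shared by conn.log and http.log and the 'orig_fuids' list."""
    known_uids = {conn["uid"] for conn in conn_entries}
    edges = []
    for http in http_entries:
        if http["uid"] not in known_uids:
            continue  # no matching connection entry observed
        edges.append({"_from": f'connections/{http["uid"]}',
                      "_to": f'http/{http["uid"]}',
                      "label": "used_service"})
        for fuid in http.get("orig_fuids", []):
            edges.append({"_from": f'http/{http["uid"]}',
                          "_to": f"files/{fuid}",
                          "label": "transferred"})
    return edges
```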
A default installation, as used in the research with minor configuration changes, produces logs
for the connections, associated services, and files transferred. By default, the logs are stored in a
tab-separated file that is easily parsed by tools included as part of the Bro installation. To support
the sending of log entries to Kafka, a configuration change for Bro stores the log entries in JavaScript
Object Notation (JSON) format.
With the logs stored in JSON format, an open-source plugin for Bro (“Metron - Logging Bro
Output to Kafka” 2017) provides the functionality required to send the entries created by Bro for
each of the logs to a Kafka cluster. The configuration of the plugin requires the address and port of
the Kafka brokers, the Kafka topic name, and the names of which Bro logs to send to Kafka.
After starting Bro to monitor the network traffic, the plugin will attempt to connect to the Kafka
cluster to begin forwarding log entries. Although there are three brokers in the cluster, the Bro
plugin only requires initial access to one of the brokers. After the initial connection with the broker,
the plugin will receive the address of the remaining brokers and available partition information for
the topic. With access to all three brokers in the cluster, the plugin can send log entries to each of the
brokers in parallel providing for increased throughput. If the initial broker is unavailable, the plugin
will continue to attempt connection while maintaining a list of unsent log entries for transmission
when a connection is established.
As with all of the components of the architecture, the Bro instance is built and run as a Docker
container. The build process consists of installing dependencies, installing Bro and the Kafka plu-
gin, and copying configuration scripts to support sending logs to the Kafka cluster. The completed
container image will run on any host with the Docker daemon installed within the network, monitor
the visible network traffic, and forward log entries to the Kafka cluster.
4.3.1.4 Suricata
To provide network intrusion detection system (IDS) capabilities within the architecture, Suricata
monitors the same network connection as the Bro sensor. Suricata is a free and open source
engine “capable of real-time intrusion detection, inline intrusion prevention, network security
monitoring and offline pcap processing” (“Suricata Open Source IDS / IPS / NSM Engine” 2017).
This research focuses on the IDS capabilities provided by Suricata. Just as with Bro, although
Figure 4.2 shows a single instance of Suricata, multiple instances of Suricata may be deployed to
monitor additional network segments.
As an IDS, Suricata uses signature-based rules to detect potentially malicious activity. From
the Suricata documentation, a signature consists of:
• The action, that determines what happens when the signature matches;
• The header, defining the protocol, IP addresses, ports and direction of the rule;
• The rule options, defining the specifics of the rule (“Suricata User Guide” 2016)
Actions include pass, drop, reject, and alert. The pass action will stop examining the packet and
skip the remaining rules. The drop action is only applicable when Suricata is operating in inline (IPS) mode
and will silently discard the packet. A rejected packet sends a reset to both the sender and receiver
to stop the connection. Finally, the alert action will generate a Suricata alert for the offending packet
that matches the rule signature.
The header section provides control over the protocol to be examined and the nature of the
connection. For example, the signature can be limited to only evaluate transmission control protocol
(TCP) traffic that originates from outside the monitored network address space from any port that
is destined for the specific IP address of a web server in the internal network on TCP ports 80 and
443. The granular control provided by the header allows for tuning of the signatures to both improve
speed of analysis and minimize false alerts due to excessive and incorrect signature matches.
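The header structure described above can be made concrete by splitting a rule at its option section; the rule shown is a hypothetical example written for illustration, not one of the distributed rule sets.

```python
# Hypothetical signature: alert on TCP traffic from outside the
# monitored network to internal web servers on ports 80 and 443.
EXAMPLE_RULE = ('alert tcp $EXTERNAL_NET any -> $HOME_NET [80,443] '
                '(msg:"Example web probe"; sid:1000001; rev:1;)')

def parse_header(rule: str) -> dict:
    """Split a rule header into action, protocol, source address and
    port, direction, and destination address and port."""
    header = rule.split("(", 1)[0].strip()
    action, proto, src, src_port, direction, dst, dst_port = header.split()
    return {"action": action, "proto": proto,
            "src": src, "src_port": src_port, "direction": direction,
            "dst": dst, "dst_port": dst_port}
```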
The rule options contain the detailed information of the signature to match against packets
under investigation. Rule options utilize a combination of keywords to control the actions Suri-
cata takes when a packet matches a signature, the specific content to match in the payload, and
application-specific keywords depending on the content under investigation. A detailed discussion
of rule options and signature creation is beyond the scope of this research. Further information can
be found in the Suricata documentation (“Suricata User Guide” 2016).
Creation of custom rules provides the ability to monitor for signatures specific to an organiza-
tion. Custom rules can be used to monitor for the exfiltration of sensitive data out of the network
during the ‘mission accomplishment’ phase of the attack or to identify newly discovered exploita-
tion vectors by internal analysis. However, management of signatures for existing and newly dis-
covered public exploits is beyond the capacity of most organizations. To support management of the
near-daily discovery of new attack vectors, several organizations provide free and paid subscriptions
to newly created rules to keep the rule signatures within Suricata up to date.
An open source Perl script, pulledpork.pl (Cummings and Shirk 2017), is available to manage
the updating of rules for Suricata and Snort. Running the script ensures new rules are downloaded
from the specified third-party providers and adds, deletes, or modifies the installed rules as
applicable. By running the script on a scheduled basis, the installed Suricata rules are kept up to date
against emerging threat signatures.
The software architecture of the Suricata service consists of three Docker containers:
• The Suricata application
• A ‘helper’ container to run the pulledpork.pl script
• A container running Fluentbit to forward alerts to Fluentd
The Suricata container image build process consists of installing dependencies, the application
itself, and copying configuration files tailored to the environment. The configuration files manage
logging, enabling of rule sets, and additional runtime options for Suricata such as the definition of
internal and external network segments.
The PulledPork helper container image contains the dependencies and pulledpork.pl script.
Running the PulledPork container updates the rule sets in the Suricata container and restarts the
Suricata container to enable the newly updated rule set. A configurable cron job in the PulledPork
container manages the automatic periodic running of the pulledpork.pl script to ensure frequent
updates of the installed rule set.
There is no direct integration or plugin for Suricata that supports the transmission of alert events
to the Kafka cluster. An additional helper container running Fluentbit forwards
alerts to Fluentd, which in turn sends the events to the Kafka cluster. The Fluentbit container simply
monitors the alerts log file created by Suricata and forwards the events to the Fluentd instance.
Similar Fluentbit containers could be used for additional data sources that do not support direct
integration with the Kafka cluster.
The data contained with a Suricata alert includes the source and destination IP address and port
of the connection, the category of the matching rule, the name of the specific rule, the protocol of
the connection, and the timestamp of the observed packet. The contents of the alert are used to
create the edges and vertices of the graph model as depicted in Figure 3.1 which connects the alert
data to the other sources via the source and destination IP addresses of the associated endpoints.
4.3.1.5 osquery
Within the architecture, monitoring of endpoint activity is accomplished with osquery. As
shown in Figure 4.2, osquery is installed on all of the monitored hosts within the network. osquery
is an open source product provided by Facebook to gain insight into the activity and configura-
tion of hosts within a network. By exposing “an operating system as a high-performance relational
database” (“osquery Docs” 2017), osquery provides a Structured Query Language (SQL) interface
to system analytics and monitoring of the endpoint. osquery supports the Windows, OS X, Linux,
and FreeBSD operating systems, providing coverage for the majority of endpoints typically present
within an enterprise.
Access to the osquery interface is provided by an interactive command-line shell (osqueryi) or
through a monitoring daemon (osqueryd) that runs scheduled queries. The interface provided by
osqueryi is useful for conducting ad-hoc queries or testing new queries for inclusion with osqueryd.
The configuration of osqueryd supports running multiple groups of queries, or packs, on a schedule
while forwarding the query results to a log aggregator. The research architecture uses the native
functionality of osquery to forward scheduled query results to the Kafka cluster.
To effectively monitor the network, osquery should be deployed to every endpoint. While there
are several methods to deploy osquery throughout the enterprise, the research architecture utilizes
Ansible to manage the installation and configuration of osquery on the target hosts within the net-
work. As an automation tool, Ansible “can configure systems, deploy software, and orchestrate
more advanced IT tasks” (“Ansible Documentation” 2017). Through the use of Ansible playbooks,
osquery is installed on the endpoint, configured to run selected queries and forward results to Kafka,
and started as a service on the endpoint.
To support detection of potentially malicious activity at the host level, osquery is configured to
provide information relevant to intrusion detection and incident response. For baseline configuration
status of each host, queries provide the installed operating system, information about the running
kernel, the state of all installed network adapters, and general system information including installed
central processing units (CPUs) and available memory. Figure 4.3 provides an example query and
the results for the operating system version. Monitoring of interactive events such as user login,
the connection of universal serial bus (USB) devices, the execution of processes, and the status of
listening network ports provides situational awareness of the endpoint activity.
osquery> SELECT name, version, major, minor, patch FROM os_version;
+--------+----------------------------+-------+-------+-------+
| name   | version                    | major | minor | patch |
+--------+----------------------------+-------+-------+-------+
| Ubuntu | 16.04.3 LTS (Xenial Xerus) | 16    | 4     | 0     |
+--------+----------------------------+-------+-------+-------+
Figure 4.3: Example osquery results
Queries may be customized to meet the needs of the organization. However, Facebook pro-
vides groups of preformatted queries in packs (“Packs” 2017) to support analytic monitoring and
adversary activity detection. Adding a pack to the configuration runs all of the included queries at
the specified frequency. For this research, the Facebook-provided packs are used as a starting point
to create a custom pack for inclusion on each of the endpoints to control the scope of content and
ensure that the query results support the graph data model.
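For illustration, a trimmed custom pack might schedule a few of the queries described above. The query selection and intervals below are assumptions for illustration rather than the actual research pack; the table and column names come from the standard osquery schema.

```json
{
  "queries": {
    "os_version": {
      "query": "SELECT name, version, major, minor, patch FROM os_version;",
      "interval": 3600
    },
    "listening_ports": {
      "query": "SELECT pid, port, protocol, address FROM listening_ports;",
      "interval": 300
    },
    "logged_in_users": {
      "query": "SELECT user, host, time FROM logged_in_users;",
      "interval": 300
    }
  }
}
```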
In addition to regularly scheduled queries, osquery supports event-driven queries for reporting
of time-sensitive information that may be missed between consecutively scheduled queries. Events
including process starting and stopping, user login and logout, and changes to file contents can occur
between scheduled queries and would therefore not appear in the query results. The event-driven
framework of osquery follows a publisher/subscriber model to write events to a queue as they occur.
The recorded events are then available to be returned at query time.
4.3.2 Extracting, Transforming, and Loading Events
With all of the data sources now forwarding events into separate topics in the Kafka cluster,
the events can be extracted from Kafka, transformed to match the graph data model, and loaded
into ArangoDB. The ETL process for each data source occurs in individual Docker containers,
represented in Figure 4.2 for each data source, which contain a Python script written for the specific
source. Each container is built with the Python runtime, the script to handle the ETL process, and
any Python libraries required by the script.
To support extraction of events from Kafka, the kafka-python (Oh 2016) library provides the
functionality necessary to create a Kafka Consumer. By providing the Kafka Consumer with the
address of the Kafka broker in the cluster and the name of the topic to consume, the Kafka Consumer
will continuously poll the cluster to check for new messages. To ensure that no messages are missed
due to an error with the Consumer, Kafka keeps track of which messages the Consumer retrieves.
Additionally, through the use of topic partitioning and message offset tracking, multiple Consumers
can retrieve messages from the same topic. Running multiple instances of the Consumer against a
single topic increases the processing throughput of the ETL process.
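A minimal sketch of such a Consumer using the kafka-python API is shown below. The broker address, topic, and group names are placeholders, and the generator wrapper is illustrative rather than the research script itself.

```python
import json

def decode(raw):
    """Convert a raw Kafka message payload (bytes) into an event dict."""
    return json.loads(raw.decode("utf-8"))

def run_consumer(broker, topic, group):
    """Continuously poll the given topic and yield decoded events.

    Requires kafka-python (pip install kafka-python); defined here for
    illustration and not invoked. Because offsets are tracked per
    group_id, running additional instances with the same group splits a
    partitioned topic between them, increasing ETL throughput."""
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        topic,                        # e.g. a per-source topic name (assumed)
        bootstrap_servers=[broker],   # e.g. "kafka:9092" (assumed)
        group_id=group,               # Kafka tracks retrieved offsets per group
    )
    for message in consumer:          # blocks, polling the cluster for new messages
        yield decode(message.value)
```
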
Transformation of the consumed event data into components of the graph data model requires
processing specific to the event and the role of the data in the graph model. As each of the data
sources provides multiple types of events, the Python script for a particular topic applies trans-
formation logic specific to the event data contents. The flexibility provided by the transformation
logic allows for the addition of new event types or modification of existing transformations as the
graph data model grows or changes. New events may be added from existing data sources with
corresponding additions to the transformation logic. However, new data sources would require the
addition of another Docker container with a Python script for the resulting new Kafka topic.
As a result of the transformation process, the Python script creates objects for the edges and ver-
tices of the graph data model for insertion into ArangoDB. The python-arango library (Powers and
Arthur 2016) provides the necessary APIs for the ETL script to interact with ArangoDB. Through
the python-arango API, the ETL script manages the creation and updating of vertices and edges in
the associated collections within the database based on the event data and the graph data model.
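A minimal sketch of this transform-and-load step for one event type is shown below, using the os_version event of Figure 4.4. The helper names are illustrative, not the research code, and the load function only indicates where python-arango would perform the insertions.

```python
OS_COLLECTION = "os"            # vertex collection for operating systems
EDGE_COLLECTION = "running_os"  # edge collection linking hosts to OS nodes

def os_vertex(row):
    """Build the OS vertex for an osquery os_version result.

    The _key is derived only from fields that identify the version, so
    every host reporting the same version maps onto one node (the key
    format mirrors the example document in Figure 4.4)."""
    doc = dict(row)
    doc["_key"] = "{version}{major}{minor}{patch}".format(**row)
    doc["_id"] = "{}/{}".format(OS_COLLECTION, doc["_key"])
    return doc

def running_os_edge(host_uuid, vertex, first_seen):
    """Connect a host vertex to the operating system vertex it reported."""
    return {
        "_from": "hosts/{}".format(host_uuid),
        "_to": vertex["_id"],
        "first_seen": first_seen,
    }

def load(db, vertex, edge):
    """Insert the documents via python-arango; not invoked here, and the
    exact client API varies by library version."""
    db.collection(OS_COLLECTION).insert(vertex, overwrite=True)
    db.collection(EDGE_COLLECTION).insert(edge)

# Transform one decoded event (field values taken from Figure 4.4).
row = {"version": "16.04.3 LTS (Xenial Xerus)",
       "major": "16", "minor": "4", "patch": "0",
       "codename": "xenial", "platform_like": "debian"}
vertex = os_vertex(row)
edge = running_os_edge("63864D56-E55E-5B91-B4A8-2EE4E7C16F61",
                       vertex, "Thu Jan 4 18:23:40 2018 UTC")
```

Because a second host reporting the same version produces an identical `_key`, the insert resolves to the same vertex, which is what collapses shared attributes into single nodes in the graph.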
4.3.3 Storing Events
ArangoDB provides for the storage of the transformed event data in a highly available, cluster
architecture. Although represented as a single node in Figure 4.2, the ArangoDB cluster is com-
prised of multiple containers to provide redundant storage and increased throughput of event data.
As a multi-model database (“ArangoDB - highly available multi-model NoSQL database” 2017),
ArangoDB supports key/value, document, and graph stores. Each of the stores provides advantages
for storage and querying of data records. This research focuses on the document and graph stores
for aggregation of records, modeling, and querying.
The document store maintains records in collections, similar to tables in a traditional relational
database. Records are stored as JSON documents that support deeply nested data fields comprised
of mixed data types. Storing records in JSON format provides flexibility in the data structure by
allowing for addition and modification of data fields without requiring schema migrations typically
associated with relational databases.
To store the vertices created during the ETL process, collections are created for each of the
node data types in the graph data model. For example, there are separate collections for external IP
addresses, internal IP addresses, operating systems, user accounts, etc. An example document from
the operating system collection is shown in Figure 4.4. Each document has a unique value for the
‘_key’ field, and proper selection of the value in the schema allows for the creation of a single node
where multiple endpoints possess the same value. In the example below, the ‘_key’ field is selected
to provide a single node for the operating system version when multiple endpoints are running the
same version. The ‘_id’ field is simply a combination of the collection name and the value of the
‘_key’ field, and the ‘_rev’ field is internally generated by ArangoDB for version tracking of the
document. The remaining fields are created from the results of the ETL process for the event from
osquery.
{
  "_key": "16.04.3 LTS (Xenial Xerus)1640",
  "_id": "os/16.04.3 LTS (Xenial Xerus)1640",
  "_rev": "_WKHo8M2--E",
  "first_seen": "Thu Jan 4 18:23:40 2018 UTC",
  "last_seen": "Thu Jan 4 18:23:40 2018 UTC",
  "build": "",
  "codename": "xenial",
  "major": "16",
  "minor": "4",
  "patch": "0",
  "platform_like": "debian",
  "version": "16.04.3 LTS (Xenial Xerus)"
}
Figure 4.4: Example ArangoDB vertex document
Collections in the document store are also used for the edges, or relationships, of the data model.
The documents in an edge collection consist of fields for the ‘_id’ of the source and destination
nodes of the relationship in addition to fields for properties of the relationship. Continuing with
the previous example, Figure 4.5 displays the edge document that connects a host node with the
associated operating system node. The ‘_key’, ‘_id’, and ‘_rev’ fields serve the same role as in the
vertex document. The ‘_from’ and ‘_to’ fields hold the ‘_id’ of the associated vertex documents.
{
  "_key": "3618158",
  "_id": "running_os/3618158",
  "_from": "hosts/63864D56-E55E-5B91-B4A8-2EE4E7C16F61",
  "_to": "os/16.04.3 LTS (Xenial Xerus)1640",
  "_rev": "_WKHo8OG---",
  "first_seen": "Thu Jan 4 18:23:40 2018 UTC"
}
Figure 4.5: Example ArangoDB edge document
The graph store is essentially a combination of the vertex collections and their associated edge
collections. With the edge and vertex collections populated with events from the data sources, the
graph store contains the transformed events in accordance with the data model of Figure 3.1. The
graph store provides the advantage of allowing for graph queries based on the relationships between
nodes. Queries for the shortest path between nodes, traversing the graph, and pattern matching
become available when storing data within a graph.
4.3.4 Querying the Database
The Arango Query Language (AQL) included with ArangoDB provides a unified query in-
terface to the key/value, document, and graph data stores. The unified query interface allows for
queries that pull from any combination of data stores to retrieve the desired results. The syntax of
AQL, similar to the SQL syntax of relational databases, provides additional capability and flexibility
to retrieve data from the database, and ultimately gain insight into the activity within the network.
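As an illustration of that flavor of query, the sketch below counts outbound connections per external IP address. The collection and attribute names here are assumptions for illustration, not the actual research schema.

```aql
FOR ip IN external_ip                      // vertex collection (assumed name)
    LET outbound = LENGTH(
        FOR c IN conn                      // connection documents (assumed name)
            FILTER c.dst_ip == ip.address
            RETURN 1
    )
    SORT outbound DESC
    LIMIT 10
    RETURN { ip: ip.address, outbound: outbound }
```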
With the graph data model represented by documents within the edge and vertex collections
of ArangoDB, graph and document-based queries can be constructed to locate potential malicious
activity based on the data from the sensors. Queries can range from simple matching queries run
against a single document collection to complex traversal queries run against the entire graph store.
Queries run against a single collection can be used to locate a particular indicator in the data
such as a file hash or website address or to aggregate statistics such as bytes transferred by hosts
within the network. While single collection queries can provide useful information about the activity
in the network, the power and basis of this research come from the ability to query the entire graph
and connect data from multiple sources. By collecting and transforming events from all of the data
sources into a graph model, AQL queries can be constructed to determine which user was logged
into a particular host when a specific website was visited and which files were transferred during
the connection.
Using the attack lifecycle as a guide, AQL queries can be constructed to identify activity typi-
cally associated with attacks against a network. A collection of targeted queries, run on a scheduled
basis, could assist network defenders in rapidly identifying potential malicious activity within
the network. After the possible activity has been detected by the automated queries, the interactive
AQL function provides incident responders the ability to conduct further investigation into the na-
ture of the activity. Expanded AQL queries can localize the source of the incident or determine the
spread of activity throughout the network.
Chapter 5 – Simulation
Evaluation of the graph data model and the software architecture developed in this praxis oc-
curred through execution of attacks against a simulation network. Section 5.1 discusses the im-
plementation and architecture of the simulation environment to include the vulnerable targets for
the attacks. Attacks against the simulation environment consisted of a series of scripted attacks by
the researcher and blind attacks conducted by an independent third-party. Section 5.2 discusses the
scripted and blind attacks. The analysis of the attacks as supported by the graph correlation system
developed during this research is covered in Section 5.3.
5.1 Simulation Environment
Evaluation of the graph-based event correlation methodology required the collection of network
and endpoint data from a representative network environment subject to attacks against endpoints
within the environment. For this research, the simulation environment was based on a home network
consisting of various network-connected devices and multiple users. The device types included
physical desktop computers and laptops, phones and tablets, virtual machines, smart home devices,
and internet-enabled media devices. All of the endpoints in the simulation environment belong to
a single network segment that is protected by a firewall from connections originating outside the
network.
To provide an attack surface for the simulation events, multiple instances of a pre-configured
virtual machine were hosted within the environment. The virtual machines (“Basic Pentesting: 1”
2017) include software and configuration vulnerabilities representative of issues found in typical
network environments based on poor software patching and configuration management policies.
The use of vulnerable virtual machines within the environment increases the likelihood of successful
attacks during the simulation events and provides for limiting the scope of the attacks to prevent
disclosure of sensitive personal information contained on the other endpoints in the environment.
While reducing the scope of the attacks to the vulnerable virtual machines limits the realism of the
scenario, the detection capabilities demonstrated in the research are not impacted by the reduced
scope of the simulation environment.
Access to the simulation environment for the scripted and blind attacks was provided through
a virtual private network (VPN) connection provided by the edge router. The VPN creates an en-
crypted connection between the attacker’s computer and a virtual network that is separate from
the simulation network. The separate virtual network provided a means to simulate attack traffic
originating from the Internet while not exposing the simulation network directly to the Internet.
Additionally, to further simulate a realistic network environment the firewall limits access from
endpoints in the VPN to the web server running on one of the virtual machines within the simulation
network. This network configuration ensured attackers had limited external visibility into
the simulation network and required exploitation of the externally facing web server before gaining
access to the internal simulation network and the remaining vulnerable virtual machines.
An additional network segment is provided by the router to contain the processing and storage
components of the graph-based correlation architecture. The correlation network contains the Bro
and Suricata sensors, the Fluentd and Kafka messaging nodes, the event processing containers, and
the ArangoDB cluster. The firewall limits access to the correlation architecture to the simulation
network and the ports required for transmission of events and logs into the correlation network.
Figure 5.1 provides an overview of the network structure of the simulation environment.
Figure 5.1: Simulation Environment Overview
5.1.1 Background Activity
In a typical network environment, the ability to successfully detect malicious activity is complicated
by the volume of normal, benign activity within the network. For the simulation environment,
the background activity was provided by out-of-scope targets and users within the simulation
network. By monitoring and collecting all of the network activity within the simulation environ-
ment, a more realistic scenario is created for the detection of the malicious activity associated with
the attack events.
Failure to collect sufficient background activity from the benign endpoints in the network would
result in only the collection of attacker based activity, reducing the validity of the methods described
in the research in a representative network environment. To ensure sufficient volume of background
activity, the simulation network traffic was monitored and collected over several days before the
commencement of attack events. Table 5.1 provides a summary count of metrics related to the
observed network activity during the simulation. The volume of observed traffic before the attack
events ensured that the attack events would contribute only a small percentage of the overall
observed activity within the simulation network, and therefore be more representative
of a typical network environment.
Table 5.1: Summary of Network Activity During Simulation
Metric                         Count
Unique connections             1,506,870
Unique domains requested       20,823
Unique external IP addresses   25,760
Files transferred              1,324,770
The background activity consists of network traffic associated with typical multi-user environ-
ments. The activities include web browsing, media streaming, smart home device communication,
machine to machine communication over secure shell (SSH), and interaction with other Internet-
based services such as GitHub. Additionally, to include the vulnerable VMs in the background
activity, connections over HTTP, FTP, and SSH occurred periodically to simulate normal network
activity involving the vulnerable VMs.
5.1.2 Vulnerable Targets
As this research focused on the detection vice protection aspect of network security, virtual
machines with known, exploitable vulnerabilities served as the targets of attack for the simulation
events. The use of VMs with known vulnerabilities increased the likelihood of success during
the scripted and blind attack scenarios in the simulation events. During the scripted events, the
attacks targeted the specific vulnerable applications and configurations to support the validation of
the graph-based correlation to detect malicious activity. For the blind attack events, the attacker
attempted to exploit the VMs with no prior knowledge of the installed software or configuration.
As a baseline configuration, the ‘Basic Pentesting’ virtual machine from VulnHub (“Basic
Pentesting: 1” 2017) served as the starting point for the target VMs in the environment. The
baseline VM includes:
• A vulnerable FTP server
• An installation of WordPress with default administrative credentials
• Accounts with passwords vulnerable to cracking
• An SSH server with a default configuration
To provide target diversity and allow for lateral movement by the attacker, multiple instances
of the baseline vulnerable VM were configured to provide only one of the vulnerable services. The
first instance of the VM served as the web server exposed to the attacker network with the remaining
vulnerable services disabled. Similarly, additional individual instances of the baseline VM provided
FTP and SSH services with additional user accounts added for diversity within the environment.
Finally, each of the vulnerable virtual machines included an installation of osquery configured
to support the data collection requirements of the graph data model. The configuration contained
the queries required to run periodically on the VM and forward the results of the queries to the
Kafka cluster in the monitoring network for processing and insertion into ArangoDB.
5.2 Attacks
To evaluate the effectiveness of the graph-based correlation system to quickly identify poten-
tially malicious activity, a series of attacks conducted against the vulnerable machines simulated
attacker activity within the simulation environment. Attacks were carried out in two separate sce-
narios designed to meet different goals for the analysis of the attack events. During the scripted
attacks, the attacker possessed full knowledge of the simulation environment to include network
architecture and existing vulnerabilities. For the blind attacks, the third-party attacker was only
provided the IP address of the externally facing web server and did not know the internal network
structure or the configuration of the target VMs.
The first series of attacks focused on validating the ability of the graph-based approach to detect
malicious activity based on prior knowledge of the activities conducted by the attacker. The scripted
attacks targeted the known vulnerabilities in the VMs in a pre-defined series of events. Analysis of
the processed logs, alerts, and events generated by the scripted attacks demonstrated the visual anal-
ysis and database query capabilities of the graph-model to support detection of potentially malicious
activity.
The second series of attack events conducted by an independent third-party served to validate
the ability of the research to detect attack activity faster than the reported current industry averages.
The activities conducted during the blind attacks occurred without the researcher’s prior knowledge,
supporting an independent assessment of the ability of the graph-based model to rapidly detect malicious activity. To compare
results of the analysis with the actual activities conducted during the blind attacks, the third-party
attacker provided the steps taken during the event after the completion of the analysis.
5.2.1 Scripted
With prior knowledge of the network configuration and the vulnerabilities present in the target
VMs, the scripted attacks emulated adversary activity through:
• Reconnaissance of the external facing web server
• Exploitation of the web server using the default web admin credentials
• Uploading a remote access trojan to the web server
• Obtaining and cracking password hashes
• Logging in as the system administrator with the cracked password
The above activities provided opportunities for data collection via Bro and Suricata at the net-
work level and osquery at the endpoints. Additionally, the activities from the scripted scenario
provided for detection of malicious activity from signature-based events such as alerts from Suricata
and behavior-based events such as monitoring servers for the initiation of outbound network
connections.
The scripted scenario only targeted the vulnerable web server to prevent potential overlap with
the activities of the blind attacks. Due to the relatively limited attack surface presented by the
scope of the target VMs, limiting the activities of the scripted attacks maximized the potential
unknown actions carried out by the third-party attacker during the blind attacks. Providing a
sizeable unknown threat space created a more representative environment for detection of potentially
malicious behavior in the data set.
5.2.2 Blind
Validation of the graph-correlation system’s time to detect potentially malicious activity oc-
curred through opening access to the simulation environment to a third-party attacker. The third-
party attacker gained access to the simulation environment through a VPN to provide a secure
connection to the environment while allowing the attack traffic to appear as originating from out-
side the simulation environment. The VPN credentials provided to the third-party attacker only
granted access to the VPN. Access to the simulation environment from the VPN was limited to the
externally facing web server to simulate an exposed web server that resides on the internal network.
The blind attacks conducted by the third party occurred without the researcher’s prior knowledge
of the timing or techniques employed by the attacker, simulating the complexities of activity
detection in a real-world network environment. Due to the simulation environment containing personal endpoints in
addition to the vulnerable VMs, a scoping document provided to the third-party attacker defined
which endpoints in the simulation environment were open to attacks. Although the exploitation
activities were limited to the attack targets, the entire simulation environment was open to network
scanning.
5.3 Detection
Rapid detection and localization of attacker activity are paramount when protective defense
measures fail. The longer an attacker remains undetected, the more likely they are to persist within
the network and carry out their objectives. Initial detection can be based on intrusion detection
sensors such as the Suricata NIDS deployed in support of this research. However, many attackers
possess the capability to bypass traditional signature-based detection mechanisms. Behavior-based
detection methods, however, can be used to detect the nature of an attacker’s movement through the
network.
To effectively employ behavior-based detection methods, network defenders must understand
the structure and role of endpoints in the network as well as ‘normal’ network behavior. Examples
include understanding normal levels of network activity for each endpoint or knowing that server
endpoints should rarely, if ever, initiate outbound network connections.
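The second example could be implemented as a scheduled AQL query against the graph. The sketch below is illustrative only: the collection names and the role attribute used to tag servers are assumptions, not the research schema.

```aql
// Flag connections that originate from endpoints designated as servers,
// which should rarely initiate outbound traffic.
FOR host IN hosts
    FILTER host.role == "server"           // role tagging is assumed
    FOR c IN conn
        FILTER c.src_ip == host.ip
        RETURN { server: host.ip, dst: c.dst_ip, ts: c.ts }
```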
Each stage of the simulation (baseline collection, scripted attacks, and blind attacks) served
to demonstrate various aspects of the ability of graph-based correlation to help defenders detect
attacker activity faster than current industry averages of weeks to months (Brumfield 2017; SANS
Analyst Program 2017; Ponemon Institute 2017).
5.3.1 Baseline Activity
As previously discussed, the simulation environment recorded and processed network and end-
point data for several days before the attack events. The baseline recording period served to:
• Verify proper operation of the system over an extended time period
• Provide background ‘white noise’ for simulation of real-world network activity
• Generate data for evaluation of ‘normal’ network activity
Table 5.1 provides high-level metrics for the volume of network traffic and key data points re-
lated to aspects of the traffic. While these metrics provide a generalized assessment of traffic volume
across the network, the metrics lack the granularity necessary for understanding the behavior within
the network.
A more detailed understanding of the network traffic is obtained by leveraging the graph-model
and the Arango Query Language (AQL). Figure 5.2 displays the AQL query for counting the out-
bound connections to external IP addresses and the number of unique domains associated with the
external IP address.
Figure 5.2: AQL Query of External Domains and IP Addresses
The query in Figure 5.2 utilizes the edge and vertex collections associated with external IP
addresses, domains, and connections (Line 1) to loop over each external IP address and count the
number of inbound and outbound connections (Lines 4-6). The remainder of the query (Lines 7-13)
loops through the domains collection to count the number of unique domains associated with
the external IP address and format the results of the query. Table 5.2 provides the top ten results of
the query sorted by the number of outbound connections from the network.
Table 5.2: Summary of Outbound Activity Prior to Attack Events
Unique Domains   IP Address        Outbound Connections
75               172.217.12.238    19340
7                104.154.127.47    16406
1                185.132.79.54     8481
16               172.217.12.227    8376
4                93.184.216.34     7271
1                74.6.105.9        5093
1445             192.33.31.192     4920
5                172.217.12.228    4772
55               172.217.9.206     4737
3                162.125.6.3       2938
From the single query, an understanding of the relationship between domains, external IP addresses,
and connection activity becomes apparent. The results of the query provide insight into
which external IP addresses receive the most traffic from within the network as well as the number
of unique domains associated with each IP address. Larger values for Unique Domains indicate
that the IP address may be associated with a content delivery network that provides files for multiple
websites.
Similarly, an understanding of inbound traffic to the network is obtained with minor
modifications to the previous query. As shown in Figure 5.3, a change to Line 9 sorts the list by
inbound connections and a change to Line 10 returns the actual domain name vice counting the
number of unique domains.
Figure 5.3: AQL Query of Inbound Network Connections and Domains
The results of the query, provided in Table 5.3, demonstrate that the majority of the inbound
network connections are from the domain ‘mtalk.google.com’. The connections to the domain are
associated with the Google Hangouts application, and as such are expected and normal network
traffic based on the user base. The results also demonstrate the use of multiple IP addresses to serve
a single unique domain.
5.3.2 False Positives
While monitoring the simulation environment for baseline activity, several alerts reported by
Suricata indicated potential malicious activity. Analysis of these Suricata alerts from the real-world
traffic served as the first opportunity to validate the ability of the graph-model to detect
and localize malicious activity. As the alerts were based on network traffic to and from personal
endpoints within the simulation environment, there were no osquery results for those endpoints in
the database. Therefore, the following analysis includes only edges and vertices from the processing
Table 5.3: Summary of Inbound Activity Prior to Attack Events
Domain               IP Address        Outbound Connections   Inbound Connections
mtalk.google.com     173.194.68.188    63                     117
mtalk.google.com     209.85.201.188    64                     80
ntp-g7g.amazon.com   207.171.178.6     923                    78
mtalk.google.com     173.194.205.188   44                     69
mtalk.google.com     173.194.175.188   35                     51
mtalk.google.com     209.85.232.188    47                     39
mtalk.google.com     173.194.208.188   9                      23
mtalk.google.com     173.194.204.188   46                     22
mtalk.google.com     74.125.192.188    47                     18
mtalk.google.com     173.194.66.188    48                     17
of Bro logs and Suricata events.
Correlation of the attack activity requires a starting point. Initial detection of attack activity may
come from a signature-based sensor such as Suricata or through identification of activity deemed
an outlier from normal baseline activity. Figure 5.4, as captured from the ArangoDB query
interface, provides the details of an alert observed during the baseline data capture.
Figure 5.4: Suricata Alert from ArangoDB
The alert indicates attempted exploitation of Internet Explorer via remote code execution against
the internal endpoint with IP address 10.10.10.107 from the external IP address 45.79.86.91.
While the alert provides an indicator of potentially malicious activity, it does not provide an analyst
with confirmation of successful exploitation or the scope of the attacker’s activity.
Using the alert as a starting point, the AQL query displayed in Figure 5.5 searches the graph
database for connections to the alert based on the graph data model. To begin the search, the starting
node is limited to the specific alert under investigation (Lines 1-2). From the starting node, the query
then searches for the external IP address in the database that matches the source of the alert (Lines
4-5). To limit the number of returned connection to those that occurred during the time frame of the
of alert the results are filtered to connections occurring within one minute of the identified alert at a
search depth of two connections from the external IP address (Lines 6-10).
Figure 5.5: AQL Query for Suricata Alert Analysis
The resulting graph visualization based on the query is shown in Figure 5.6. The visualization
results, returned from the database in 25ms, provide an understanding of the activity associated with
the potential attacker originating from the external IP address of the alert. Figure 5.6 provides
a representation of the components of the graph model from Figure 3.1 based on the data returned from the
query in Figure 5.5.
Figure 5.6: Visualization of Query Results
The analyst can quickly interpret several items of interest from the resulting graph visualization
of the AQL query during the two-minute time window of the alert:
• The external IP address ‘45.79.86.91’ was the source of three total alerts as represented by
the nodes labeled ‘3173714’, ‘3095134’, and ‘3095131’
• The external IP address communicated with two internal IP addresses as represented by the
‘conn’ labeled nodes with connections between the internal and external IP addresses
• There were six connections containing HTTP traffic between the external IP address and the
target of the alert as represented by the ‘http’ nodes connected to the ‘conn’ nodes
• The domain names associated with the external IP address as represented by the domain
names connected to the external IP address node
While the visualization provides context and understanding of the activity surrounding the alert,
it lacks the information to support a detailed analysis for incident validation and response. By
modifying the return statement in Line 15 of the query in Figure 5.5, the query returns detailed,
tabular information about the connections and HTTP sessions represented in the graph visualization.
Relevant data fields of query results for the connections and HTTP sessions during the potential
attack are shown in Table 5.4 and Table 5.5 respectively.
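The manual queries described above lend themselves to parameterization. As a hedged illustration only (the collection name ‘hosts’, the graph name ‘network’, and the field names ‘ip’ and ‘ts’ are assumptions, not the exact names used in this research), a time-bounded traversal similar in spirit to the query in Figure 5.5 could be built and bound to variables in Python:

```python
from datetime import datetime, timedelta

def build_alert_context_query(alert_src_ip, alert_time, window_minutes=1, depth=2):
    """Build a parameterized AQL traversal query and its bind variables.

    Hypothetical sketch only: collection, graph, and field names are
    illustrative assumptions, not the exact names from this research.
    """
    start = alert_time - timedelta(minutes=window_minutes)
    stop = alert_time + timedelta(minutes=window_minutes)
    query = """
    FOR host IN hosts
      FILTER host.ip == @src_ip
      FOR v, e IN 1..@depth ANY host GRAPH 'network'
        FILTER v.ts >= @start AND v.ts <= @stop
        RETURN {node: v, edge: e}
    """
    bind_vars = {
        "src_ip": alert_src_ip,
        "depth": depth,
        "start": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "stop": stop.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return query, bind_vars

query, bind_vars = build_alert_context_query(
    "45.79.86.91", datetime(2018, 1, 18, 20, 34, 19))
```

With a driver such as python-arango, the pair could then be executed via db.aql.execute(query, bind_vars=bind_vars); binding values rather than concatenating strings keeps the query reusable for the automation discussed later in this section.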
Table 5.4: Connection Data Associated with Alert Under Investigation
ts                    orig_bytes  resp_bytes  dst_ip       src_ip        src_port  dst_port
2018-01-18T20:34:19Z  4131        161033      45.79.86.91  10.10.10.107  62177     80
2018-01-18T22:12:35Z  859         1348        45.79.86.91  10.10.10.103  41864     80
2018-01-18T20:34:19Z  5846        326179      45.79.86.91  10.10.10.107  62179     80
2018-01-18T20:34:19Z  5370        267315      45.79.86.91  10.10.10.107  62174     80
2018-01-18T20:34:19Z  4213        244863      45.79.86.91  10.10.10.107  62175     80
2018-01-18T20:34:19Z  3696        283894      45.79.86.91  10.10.10.107  62178     80
2018-01-18T20:34:19Z  5771        311461      45.79.86.91  10.10.10.107  62173     80
Table 5.5: HTTP Session Data Associated With Alert Under Investigation
ts                    HTTP_method  uri                           dst_ip       src_ip
2018-01-18T20:34:19Z  GET          /preview/main.css             45.79.86.91  10.10.10.107
2018-01-18T22:12:35Z  GET          /favicon.ico                  45.79.86.91  10.10.10.103
2018-01-18T20:34:19Z  GET          /css/responsive.css           45.79.86.91  10.10.10.107
2018-01-18T20:34:19Z  GET          /css/jquery-ui.min.css        45.79.86.91  10.10.10.107
2018-01-18T20:34:20Z  GET          /img/home/holiday2-thumb.jpg  45.79.86.91  10.10.10.107
2018-01-18T20:34:20Z  GET          /js/bootstrap.js              45.79.86.91  10.10.10.107
2018-01-18T20:34:19Z  GET          /css/main.css                 45.79.86.91  10.10.10.107
The query results in Table 5.4 provide a summary of the HTTP connections between the external
IP address and the internal IP address, including the time of the connection, the amount of data sent
from the internal IP to the external IP, and the amount of data returned from the external IP to
the internal IP. Because the amount of data in the responses is much larger than the data sent
from the originating internal IP address, the connections follow the expected pattern of normal
HTTP traffic between an internal client and an external web server. Similarly, Table 5.5 provides
the names of the files requested by the internal client from the external web server and shows no
immediate indicators of malicious activity.
At this point, the analyst has sufficient information to support responding to the potential malicious
activity, including the IP addresses of the hosts involved, the nature of the HTTP connections
between the hosts, and the domain names associated with the potential attacker. Through manual
construction and running of three queries from the ArangoDB-provided interface, each of which
took approximately 25 ms to return results, the analyst can conduct a deep and focused analysis to
determine the nature and severity of the potential attack.
Although outside the scope of this research, the information from the query results provided the
necessary data points to quickly locate the specific network packets associated with the alert in the
full packet capture logs collected by Bro. With the timestamps, IP addresses, ports, and uniform
resource identifiers (URIs) from the query results, the contents of the network packets were inspected
and determined to be benign.
While the alert under investigation was a false positive, the investigation provided an opportunity
to validate the correlation of events from multiple sources into a graph model. Moving from
identification of the alert to understanding the nature of the traffic and collecting data for the
potential incident response required less than five minutes with the analyst utilizing manually
constructed AQL queries. From a practical perspective, although the previously discussed manual
process would provide an improvement over current workflow practices, automation of selected
queries would provide for a system that scales to meet the demand of typical enterprise networks.
Additionally, the data points collected as a result of the queries allowed for a targeted investigation
of the full packet capture to determine the exact contents of all communications between the external
IP address and the internal host in less than ten minutes.
5.3.3 Scripted Attacks
Validation of the collection, transformation, and storage of events from multiple data sources
into the graph-model occurred via execution of a scripted attack. The attacker targeted the vul-
nerable web server from the VPN network with the goals of successful exploitation and privilege
escalation. Due to the limited attack surface provided in the simulation environment, the scripted
attacks were limited to one of the three vulnerable targets to maximize unique opportunities for
adversarial activity detection during the blind attack events.
Although limited in scope, the activities conducted during the scripted attack followed the pattern
of reconnaissance, exploitation, and privilege escalation associated with real-world network
intrusions. The specific activities conducted during the scripted attack included:
intrusions. The specific activities conducted during the scripted attack included:
• Service discovery with nmap
• WordPress vulnerability scanning with nikto
• WordPress password bruteforce attack
• Uploading a malicious WordPress plugin
• Extraction and cracking of passwords
• Privilege escalation with cracked password
Detection of the initial reconnaissance activity occurred through monitoring the alerts generated
by the Suricata NIDS over time. Figure 5.7 illustrates the high volume of alerts generated by the
scanning activity of the attacker against the web server in the simulation environment. The results in
Figure 5.7 show the count of alerts by the source IP of the alert over a 24-hour period. Approximately
12,000 alerts were observed from the source IP of the attacker within less than an hour.
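The aggregation behind a view like Figure 5.7 is straightforward to express. The sketch below counts alerts per source IP per hour in plain Python; the alert record layout (‘ts’ and ‘src_ip’ fields) and the sample values are illustrative assumptions:

```python
from collections import Counter
from datetime import datetime

def alerts_per_source_per_hour(alerts):
    """Count NIDS alerts per (source IP, hour) bucket.

    Hypothetical sketch of the aggregation behind an alerts-over-time view;
    field names are assumptions for illustration.
    """
    counts = Counter()
    for alert in alerts:
        ts = datetime.strptime(alert["ts"], "%Y-%m-%dT%H:%M:%SZ")
        bucket = ts.replace(minute=0, second=0)  # truncate to the hour
        counts[(alert["src_ip"], bucket.isoformat())] += 1
    return counts

sample = [
    {"ts": "2018-01-29T16:02:38Z", "src_ip": "10.0.8.2"},
    {"ts": "2018-01-29T16:04:34Z", "src_ip": "10.0.8.2"},
    {"ts": "2018-01-29T17:10:00Z", "src_ip": "10.10.10.10"},
]
counts = alerts_per_source_per_hour(sample)
```

In practice the same grouping would be pushed down into the database or the visualization layer; a spike of roughly 12,000 alerts in a single bucket, as observed here, stands out immediately.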
Figure 5.7: Scripted Attacks - NIDS Alerts Over Time
While monitoring alerts over time does not leverage the capabilities of the graph-model, the
results provide a starting point for further analysis. The visualization in Figure 5.7 localizes
the attacker IP address and the time frame of the service discovery and vulnerability scanning. With
the IP address and time frame information available, the AQL query shown in Figure 5.8 provides
context for the nature of the observed alerts.
Figure 5.8: Scripted Attacks - AQL Alerts Summary
Based on the results from Figure 5.7 for alerts over time, the ‘LET’ statements in Figure 5.8
set the variables for the attacker and target IP addresses and the start and stop times for the window
of potentially malicious activity. The use of ‘LET’ statements demonstrates the ability to parameterize
AQL queries for future automation. The remainder of the query loops through all of the alerts in
the database, filters the results to the attacker and target IPs and to alerts that occurred during
the time frame of interest, and counts the number of occurrences of each alert name. While this
query sorts by the count of observed alerts and returns only the first ten results, the full query
response could be returned for a more detailed view of the alert activity
associated with the attacker.
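The filter-and-count logic of the query is equivalent to the following Python sketch, mirroring AQL's ‘COLLECT ... WITH COUNT’, ‘SORT’, and ‘LIMIT’ semantics. Field names and sample records are illustrative assumptions; ISO 8601 timestamps compare correctly as plain strings:

```python
from collections import Counter

def summarize_alerts(alerts, attacker, target, start, stop, limit=10):
    """Count alert names for an attacker/target pair within [start, stop]."""
    names = (a["name"] for a in alerts
             if a["src_ip"] == attacker and a["dst_ip"] == target
             and start <= a["ts"] <= stop)
    return Counter(names).most_common(limit)

alerts = [
    {"name": "ET SCAN Nmap User-Agent", "src_ip": "10.0.8.2",
     "dst_ip": "10.10.10.10", "ts": "2018-02-15T18:01:00Z"},
    {"name": "ET SCAN Nmap User-Agent", "src_ip": "10.0.8.2",
     "dst_ip": "10.10.10.10", "ts": "2018-02-15T18:02:00Z"},
    {"name": "ET WEB SERVER Poison Null Byte", "src_ip": "10.0.8.2",
     "dst_ip": "10.10.10.10", "ts": "2018-02-15T18:03:00Z"},
    # Outside the window of interest, so excluded from the summary.
    {"name": "ET WEB SERVER Poison Null Byte", "src_ip": "10.0.8.2",
     "dst_ip": "10.10.10.10", "ts": "2018-02-15T23:00:00Z"},
]
summary = summarize_alerts(alerts, "10.0.8.2", "10.10.10.10",
                           "2018-02-15T18:00:00Z", "2018-02-15T19:00:00Z")
```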
Table 5.6 displays the results of the query and provides the analyst with an overview of the
alerts. The majority of the alerts are associated with scanning activity, based on the high number
of remote file inclusion and exploitation attempts reported by Suricata's ‘ET WEB SERVER’ rule
set and the appearance of the user-agent string of Nmap, an open source network scanner.
With the knowledge that the attacker is possibly targeting the WordPress installation on the web
server, the analyst can leverage the capability of the graph-model to gain further understanding of
the details of the attacker activity. The AQL query displayed in Figure 5.9 builds on the information
from previous results to traverse the graph and provide the analyst with the specific HTTP methods,
Table 5.6: Scripted Attack - Alerts Summary Query Results
Alert Name                                                                 Count
ET WEB SERVER PHP Possible http Remote File Inclusion Attempt               4684
ET WEB SERVER PHP Generic Remote File Include Attempt (HTTP)                4684
ET WEB SERVER Script tag in URI Possible Cross Site Scripting Attempt        530
ET WEB SPECIFIC APPS Mambo Exploit                                           204
ET WEB SERVER Poison Null Byte                                               160
ET WEB SERVER Exploit Suspected PHP Injection Attack (cmd=)                  132
ET WEB SERVER Possible CVE-2014-6271 Attempt                                 122
ET WEB SERVER Possible CVE-2014-6271 Attempt in Headers                      122
ET WEB SPECIFIC APPS Generic phpbb arbitrary command attempt                 118
ET SCAN Nmap Scripting Engine User-Agent Detected (Nmap Scripting Engine)     48
URIs, and the timestamps of the events.
Figure 5.9: Scripted Attacks - HTTP Service Query
The ‘WITH’ statement in Figure 5.9 ensures that the ‘has service’ edge collection and the
‘services’ vertex collection from the graph data model are included as part of the graph traversal
query. Similar to the query in Figure 5.8, the ‘LET’ statements set the variables for attacker IP,
target IP, and time frame of activity. The first ‘FOR’ loop filters all of the connections in the
database based on the attacker and target IPs observed during the time frame of interest. The next
‘FOR’ loop, nested inside the first loop, uses each of the filtered connection nodes to traverse the
graph and return the ‘service’ vertex nodes associated with each of the ‘connection’ vertex nodes
based on the ‘has service’ relationship as discussed in the graph model of Figure 3.2.
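In graph terms, the nested loops walk one hop from each filtered connection vertex across a ‘has service’ edge to a ‘service’ vertex. A minimal in-memory sketch of that traversal follows; the document shapes mimic ArangoDB's ‘_id’/‘_from’/‘_to’ convention, and the sample values are hypothetical (in the research, ArangoDB performs this traversal itself):

```python
def services_for_connections(connections, has_service_edges, services):
    """One-hop traversal from connection vertices to service vertices.

    Illustrative in-memory sketch of the graph traversal; document shapes
    follow ArangoDB's _id/_from/_to convention.
    """
    services_by_id = {s["_id"]: s for s in services}
    found = []
    for conn in connections:
        for edge in has_service_edges:
            if edge["_from"] == conn["_id"]:
                found.append(services_by_id[edge["_to"]])
    return found

connections = [{"_id": "connections/c1"}]
edges = [{"_from": "connections/c1", "_to": "services/s1"}]
services = [{"_id": "services/s1", "method": "POST",
             "uri": "/secret/wp-login.php", "ts": "2018-01-29T16:02:38Z"}]
result = services_for_connections(connections, edges, services)
```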
Analysis of the query results displayed in Table 5.7 provides the analyst with a timeline of the
attacker’s remote interactions with the web server. From the timeline, the analyst can determine
that the attacker accessed the administration console of the WordPress installation, uploaded and
installed a plugin to the server, and then accessed the plugin. The first three entries in Table 5.7
indicate successful login to the WordPress installation by an administrative user. The remaining
entries in Table 5.7 indicate the uploading of a WordPress plugin named ‘BPXqPCLBWi.php’ into
the ‘bpzNxrgxDf’ folder in the WordPress instance. The use of random alphanumeric file and folder
names is indicative of the attacker utilizing the Metasploit attack framework (Kennedy et al. 2011)
to conduct the attacks.
Table 5.7: Scripted Attack - HTTP Service Results
Time                  Method  URI
2018-01-29T16:02:38Z  POST    /secret/wp-login.php
2018-01-29T16:03:43Z  POST    /secret/wp-admin/admin-ajax.php
2018-01-29T16:04:33Z  GET     /secret
2018-01-29T16:04:34Z  POST    /secret/wp-admin/update.php?action=upload-plugin
2018-01-29T16:04:34Z  POST    /secret/wp-login.php
2018-01-29T16:04:34Z  GET     /secret/wp-admin/plugin-install.php?tab=upload
2018-01-29T16:04:36Z  GET     /secret/wp-content/plugins/bpzNxrgxDf/BPXqPCLBWi.php
The analyst now has sufficient evidence of successful exploitation of the web server via a potentially
malicious WordPress plugin. To confirm the exploitation, the connection activity of the web server
is examined for outbound network connections to the attacker. As web servers should not initiate
connections to Internet clients, any observed outbound connections are indicative of a compromised
server. The AQL query and results for outbound connections from the web server during the time
frame of the attack are displayed in Figure 5.10 and Figure 5.11, confirming that the web server
made an outbound connection to the attacker at the same time the malicious plugin was accessed.
The structure of the query in Figure 5.10 is similar to that of Figure 5.8 and Figure 5.9, the main
difference being that the source and destination variables are switched to the ‘target’ and ‘attacker’
respectively to identify connections originating from the web server.
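This reversal can also be phrased as a simple policy check: any connection whose source is a monitored server is suspicious. A hedged sketch of that check, where the server list and record fields are illustrative assumptions:

```python
# Flag connections that originate from hosts that should never initiate
# outbound traffic, such as the web server. Field names are illustrative.
SERVER_IPS = {"10.10.10.10"}

def outbound_from_servers(connections):
    """Return connections whose source IP is a monitored server."""
    return [c for c in connections if c["src_ip"] in SERVER_IPS]

sample = [
    # Normal inbound client request to the web server.
    {"src_ip": "10.0.8.2", "dst_ip": "10.10.10.10", "dst_port": 80},
    # An outbound callback, e.g. a Metasploit-style connection on port 4444.
    {"src_ip": "10.10.10.10", "dst_ip": "10.0.8.2", "dst_port": 4444},
]
flagged = outbound_from_servers(sample)
```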
Figure 5.10: Scripted Attacks - Web Server Outbound Query
Figure 5.11: Scripted Attacks - Web Server Outbound Results
The query results from Figure 5.11 indicate a single outbound connection from the web server
to the attacker on port 4444, also indicative of the Metasploit framework (Kennedy et al. 2011), as
port 4444 is the default port that Metasploit utilizes to listen for inbound connections from targets.
With the time of the outbound callback from the previous query, the analyst can now traverse
the graph to determine the actions taken by the attacker on the targeted server. The query as shown
in Figure 5.12 uses the same structure as the previous query in Figure 5.9 to traverse the graph.
The ‘WITH’ statement includes the required edge and vertex collections necessary to traverse from
the IP address node to the ‘process’ nodes associated with the IP address as shown in Figure 3.1.
The remainder of the query is again a pair of nested ‘FOR’ loops that return the timestamp and
commands run on the internal host.
Figure 5.12: Scripted Attacks - Command Query
From the results of the query, shown in Table 5.8, the analyst can see the commands run on
the compromised web server after exploitation. The commands show that the attacker spawned an
interactive shell, read the contents of the /etc/passwd and /etc/shadow files, then logged into the
privileged marlinspike user account. Based on the time difference between reading the password
files and logging into the marlinspike account, the attacker appears to have successfully cracked the
password of the marlinspike account.
Table 5.8: Scripted Attack - Command Line Query Results
Time                  Command
2018-01-29T16:05:56Z  sh -c /bin/sh
2018-01-29T16:05:56Z  /bin/sh
2018-01-29T16:05:56Z  python -c import pty;pty.spawn("/bin/sh")
2018-01-29T16:06:26Z  /bin/sh
2018-01-29T16:06:26Z  id
2018-01-29T16:06:26Z  ls -al /etc/passwd
2018-01-29T16:06:26Z  python -c import pty;pty.spawn("/bin/sh")
2018-01-29T16:07:26Z  ls -al /etc/shadow
2018-01-29T16:07:56Z  cat /etc/shadow
2018-01-29T16:07:56Z  cat /etc/passwd
2018-01-29T16:11:57Z  su - marlinspike
From reconnaissance to privilege escalation, the entire attack took approximately 10 minutes.
The analysis of the attack scenario, conducted manually, required under an hour to complete. The
analysis identified the attacker, the target, the method of exploitation, and actions taken by the
attacker on the compromised web server. The graph correlation system provided the analyst with
the ability to query the database based on the data elements and their relationships instead of having to
query multiple sources for individual pieces of information and then determine how the separate
results are related. From the analysis, a defender has the information required to take corrective
action to remove the attacker’s access and prevent further attacks via the same vector.
With the information collected in less than one hour of analysis, the defender knows the actions
needed to respond and recover from the attack:
• Change the admin password for the WordPress installation based on the observed WordPress
administrator account login and upload of a malicious plugin
• Change the password for the marlinspike account based on the attacker switching to the mar-
linspike user account
• Change the passwords for any remaining accounts on the web server based on the attacker
extracting password hashes and successfully logging in as the marlinspike user
• Institute firewall rules that prevent initiation of outbound connections from the web server
based on observing the web server initiating an outbound connection to the attacker
• Remove the malicious plugin from the WordPress installation based on the observed installa-
tion of a plugin resulting in compromise of the web server
From a practical standpoint, although outside the scope of this research, effective tuning of the
NIDS to minimize false alerts would further improve the speed of analysts attempting to identify
potentially malicious activity. While tuning of the Suricata instance in the simulation environment
reduced false positives that required investigation by the analyst, the volume of monitored network
traffic in the simulation environment was a small fraction of the traffic typically observed in a large-
scale enterprise network.
While the scripted attack analysis validated the collection, transformation, storage, and analysis
of events in a graph model, the analysis was predicated on knowledge of the attacker's actions. To
validate the practical application of the research's ability to improve the workflow of the analyst
and reduce the time to detection, a series of blind attacks conducted by an independent third party
is analyzed in the next section.
5.3.4 Blind Attacks
Analysis of the independent third-party attacks occurred with no prior knowledge of the time frame
or the actions taken by the attacker. The attacker received access to the simulation environment
through the VPN with the goal of compromising all three of the vulnerable targets in the simulation
environment.
Initial detection of attacker activity occurred through periodic analysis of observed alerts from
the NIDS over time. By grouping the source IP address of the alerts over a time interval, the analyst
can quickly identify high volumes of alerts and the offending IP addresses as shown in Figure 5.13.
The results indicate an unusually high volume of alerts from the external IP address 10.0.8.2 and
alerts from 10.10.10.10, the IP address of the externally facing web server.
Figure 5.13: Blind Attacks - Alerts Over Time
By adjusting the start and stop times in the AQL query from Figure 5.8, the analyst obtains the
alert summary results displayed in Table 5.9. The results, similar to those from the scripted attack
scenario, are indicative of web application vulnerability scanning. The similarity is expected due
to the web application being the only service exposed to external IP addresses and the pattern of
attackers following the attack lifecycle and conducting reconnaissance before attempting exploita-
tion.
Table 5.9: Blind Attack - Alerts Summary Query Results
Alert Name                                                                   Count
ET WEB SERVER PHP Possible http Remote File Inclusion Attempt                 2342
ET WEB SERVER PHP Generic Remote File Include Attempt (HTTP)                  2342
ET WEB SERVER Script tag in URI Possible Cross Site Scripting Attempt          264
ET WEB SPECIFIC APPS Mambo Exploit                                             102
ET WEB SERVER Poison Null Byte                                                  80
ET WEB SERVER Exploit Suspected PHP Injection Attack (cmd=)                     66
ET WEB SERVER Possible CVE-2014-6271 Attempt in Headers                         61
ET WEB SERVER Possible CVE-2014-6271 Attempt                                    61
ET WEB SPECIFIC APPS Generic phpbb arbitrary command attempt                    59
ET WEB SERVER /system32/ in Uri - Possible Protected Directory Access Attempt   23
Due to the limited attack surface presented by the externally facing web server, the analysis of
HTTP traffic and connections between the attacker and the web server during the blind attacks follows
the pattern observed during the scripted attacks. The results are similar to those displayed in Figure
5.9, Table 5.7, Figure 5.10, and Figure 5.11 from the scripted attack scenario.
Determining what actions the blind attacker conducted after compromising the web server only
requires modifying the start time of the query from Figure 5.12. The results of the query, displayed
in Table 5.10, reveal the extent of the blind attacker’s actions after compromising the web server.
From the commands run on the web server the analyst can determine that the attacker carried
out the following actions after compromising the web server:
• Extracted encrypted password information based on the ‘cat /etc/passwd’ and ‘cat /etc/shadow’
entries
• Escalated privileges to the marlinspike user account based on the ‘su - marlinspike’ entry
• Port scanned the entire simulation network based on the ‘nmap 10.10.10.0/24’ entry
• Conducted version scans of two hosts in the simulation network based on the ‘nmap -sV 10.10.10.11’ and
‘nmap -sV 10.10.10.12’ entries
• Connected to FTP server on 10.10.10.11 with telnet based on the ‘telnet 10.10.10.11 21’ entry
Table 5.10: Blind Attack - Command Line Query Results for 10.10.10.10
Time                  Command
2018-02-15T18:25:57Z  sh -c /bin/sh
2018-02-15T18:25:57Z  id
2018-02-15T18:25:57Z  /bin/sh
2018-02-15T18:26:27Z  /bin/sh /usr/bin/which python
2018-02-15T18:26:27Z  python -c import pty; pty.spawn("/bin/bash")
2018-02-15T18:26:57Z  cat /etc/shadow
2018-02-15T18:26:57Z  cat /etc/passwd
2018-02-15T18:28:27Z  su - marlinspike
2018-02-15T18:28:57Z  /bin/sh /usr/bin/which nmap
2018-02-15T18:29:27Z  nmap 10.10.10.0/24
2018-02-15T18:29:57Z  nmap -sV 10.10.10.12
2018-02-15T18:29:57Z  nmap -sV 10.10.10.11
2018-02-15T18:41:58Z  sh -c /bin/sh
2018-02-15T18:41:58Z  /bin/sh
2018-02-15T18:42:28Z  python -c import pty; pty.spawn("/bin/bash")
2018-02-15T18:42:28Z  /bin/bash
2018-02-15T18:42:28Z  su - marlinspike
2018-02-15T18:44:59Z  telnet 10.10.10.11 21
2018-02-15T18:48:59Z  sh -c /bin/sh
2018-02-15T18:48:59Z  /bin/sh
2018-02-15T18:49:29Z  /bin/bash
2018-02-15T18:49:29Z  python -c import pty; pty.spawn("/bin/bash")
2018-02-15T18:49:29Z  su - marlinspike
2018-02-15T18:49:59Z  ssh sshuser@10.10.10.12
• Connected to 10.10.10.12 with ssh using the sshuser account based on the
‘ssh sshuser@10.10.10.12’ entry
Determining which commands the attacker executed on 10.10.10.11 and 10.10.10.12 required
only modification of the target field in the query in Figure 5.12 to traverse the graph. The results for
the two servers are shown in Table 5.11 and Table 5.12.
From the query results, the analyst confirms that the attacker interacted with both servers and
viewed the contents of the password files and a file named ‘secret’. Determining how the attacker
connected to the two servers required further investigation into the commands run on the web server
and querying the graph.
Table 5.11: Blind Attack - Command Line Query Results for 10.10.10.11

Time                  Command
2018-02-15T18:39:44Z  /bin/sh
2018-02-15T18:39:44Z  sh -c /bin/sh;/sbin/sh
2018-02-15T18:41:14Z  /bin/sh
2018-02-15T18:41:14Z  sh -c /bin/sh;/sbin/sh
2018-02-15T18:45:14Z  sh -c /bin/sh;/sbin/sh
2018-02-15T18:45:14Z  /bin/sh
2018-02-15T18:46:44Z  id
2018-02-15T18:47:14Z  cat /etc/passwd
2018-02-15T18:47:14Z  cat /etc/shadow
2018-02-15T18:47:44Z  ls
2018-02-15T18:48:14Z  ls
2018-02-15T18:48:44Z  ls

Table 5.12: Blind Attack - Command Line Query Results for 10.10.10.12

Time                  Command
2018-02-15T18:50:16Z  ls --color=auto
2018-02-15T18:50:16Z  cat secret
2018-02-15T18:50:46Z  ls --color=auto
2018-02-15T18:50:46Z  cat /etc/passwd
2018-02-15T18:50:46Z  cat /etc/shadow

For the 10.10.10.11 server, based on the results in Table 5.10, the attacker initiated a telnet
connection from 10.10.10.10 to 10.10.10.11 on port 21. As port 21 is typically associated with file transfer
protocol (FTP) activity, this connection required further investigation. Running the AQL query in
Figure 5.14 identified the alert in Figure 5.15, confirming the exploitation attempt of 10.10.10.11 on
port 21 from 10.10.10.10 during the time frame of the attack. The name and classification fields of
the alert identify the activity as exploiting a backdoor in the ProFTPD software (“ProFTPd 1.3.3c -
Compromised Source Backdoor Remote Code Execution” 2018) installed on the 10.10.10.11 server.
Figure 5.14: Blind Attacks - Alerts for 10.10.10.11 Query
Figure 5.15: Blind Attacks - Alerts for 10.10.10.11 Results
Modifying the query in Figure 5.14 to search for alerts between 10.10.10.10 and 10.10.10.12
returned no results, indicating that either the attacker used an exploit undetected by the NIDS or the
attacker logged into 10.10.10.12 with valid user credentials. Based on the commands run on each
of the hosts in Tables 5.10, 5.11, and 5.12, the attacker viewed the password files for each of the
servers and therefore could have cracked passwords for users on those servers.
From the results in Table 5.10, the attacker attempted a secure shell (SSH) connection from
10.10.10.10 to 10.10.10.12. The AQL query in Figure 5.16 traverses the graph to identify the
hostname and IP address of any server where the ’sshuser’ account exists. From the results in
Table 5.13, the ’sshuser’ account exists on the 10.10.10.11 and 10.10.10.12 servers. By extracting
the password hash for the ’sshuser’ from the 10.10.10.11 server, the attacker could have cracked
the password for the account and used valid credentials to complete the SSH connection to the
10.10.10.12 server.
Figure 5.16: Blind Attacks - sshuser Query
Table 5.13: Blind Attack - sshuser Query Results
Hostname  IP Address
attack2   10.10.10.12
attack2   fe80::d787:b9b1:3d39:bff0
attack2   fe80::ee4e:bbe0:ae3f:de1a
attack1   10.10.10.11
attack1   fe80::369f:2a6d:ccb2:48a7
attack1   fe80::eb2b:88a9:c1d1:6517
Using the ’sshuser’ account and the time frame of the suspected login, the AQL query in Fig-
ure 5.17 traverses the graph to return any login events for the ’sshuser’ on the 10.10.10.12 server.
The query results in Figure 5.18 confirm an SSH login from 10.10.10.10 to 10.10.10.12 with the
’sshuser’ account during the time of the attack.
Figure 5.17: Blind Attacks - sshuser Logins Query
Figure 5.18: Blind Attacks - sshuser Logins Query Results
From reconnaissance to successful exploitation of the three servers, the entire attack took
approximately 30 minutes. Full analysis of the attacker's actions required under two hours to
complete with manual query construction and reuse. The analysis identified the attacker, the targets, the
method of exploitation against each target, and actions taken by the attacker on the compromised
servers. From the analysis, a defender has the information required to take corrective action to
remove the attacker’s access and prevent further attacks via the same vectors.
With the information collected during the analysis, the defender knows the actions needed to
respond and recover from the attack:
• Change the admin password for the WordPress installation based on the observed WordPress
administrator account login and upload of a malicious plugin
• Change the password for the marlinspike account based on the attacker switching to the mar-
linspike user account
• Change the password for the sshuser account based on the attacker completing SSH connec-
tion to 10.10.10.12
• Change the passwords for any remaining accounts on the servers based on the attacker
extracting password hashes and successfully logging in as the marlinspike and sshuser accounts
• Institute firewall rules that prevent initiation of outbound connections from the web server
based on observing the web server initiating an outbound connection to the attacker
• Remove the malicious plugin from the WordPress installation based on the observed installa-
tion of a plugin resulting in compromise of the web server
• Patch the vulnerable version of ProFTPD on the 10.10.10.11 server based on the confirmation
of successful exploitation
In summary, the above analysis demonstrated the capability of the research to collect, trans-
form, and store network events from multiple sources into a graph-model and integrate the query
capability of the graph-model into the analyst workflow. In less than two hours the analyst detected
the source and method of the attacks against multiple servers and obtained the needed information
to respond and recover from the attacks. Through manual query construction and analysis, the
defender understands the IP addresses of the involved hosts, the extent of the accounts compromised
as part of the attack, the vulnerable software running, and the command history from
each of the servers compromised during the time frame of the attack. The results obtained from the
queries provide the analyst with the information necessary to take corrective action to respond and
recover from the successful attack.
Chapter 6 – Conclusions
6.1 Results
This research demonstrated the application of an architecture that collects security-related events
from multiple sources and transforms the events for storage in a graph database for integration into
the workflow of network defenders. Through the use of open source technologies, the development
of the architecture and supporting software aggregate events from a network intrusion detection
system (NIDS), a network security monitoring system (NSM), and telemetry from hosts within the
network into a central messaging system. The collected events are then transformed and stored in a
graph database for querying by a network defender.
With the architecture in place, a series of scripted and blind attacks was conducted against hosts
within the network to generate events related to attacker activity within a targeted network. Analysis
of the collected network data from the perspective of a network defender demonstrated the capability
of the architecture and the graph-model to rapidly extract relevant information from the collected
events and determine the actions taken by the attacker and the necessary response and recovery
steps in response to the attack.
Based on the average breach detection times reported by multiple sources in Table 6.1 (Brumfield
2017; SANS Analyst Program 2017; Ponemon Institute 2017), the observed time to detection
in this research of 1-2 hours demonstrates a measurable improvement over current industry trends.
As demonstrated in the analysis of the attack scenarios, the graph-model provides an analyst with
the ability to traverse the graph from an observed alert reported by the NIDS to retrieve commands
run by the targeted host as reported by osquery. By transforming events from multiple sources into
a context-based graph, traversing the graph based on the relationships between the data elements
removes the need for an analyst to manually determine the relationships and query multiple sources,
reducing the time to identify and confirm malicious activity.
6.2 Limitations
While the analysis of the scripted and blind attack scenarios validated the hypothesis, there are
limitations in the results of the research. The different sources of average breach detection times,
Table 6.1: Average Breach Detection Times
Source                              Average Detection Time
Ponemon Institute                   191 days
Verizon 2017 DBIR                   weeks to months
2017 SANS Incident Response Survey  6-24 hours
This research                       1-2 hours
the limited scale of the simulation environment, and the limited diversity of specific attacks limit
the evaluation of the efficacy of the research to reduce average detection time during a breach.
The simulation environment contained over 30 desktops, laptops, mobile devices, and virtual
machines. Typical enterprise networks, the envisioned environments that would benefit from the
research, contain hundreds to thousands of endpoints. While the architecture discussed in this
research could scale to meet the demand of such large networks, the data collected and analyzed
from the simulation network is a small fraction of that typically seen in enterprise networks.
The scripted and blind attacks provided sufficient data to validate the graph-model approach
to detecting malicious network activity. However, the attack surface presented by the vulnerable
targets limited the number of potential attacks available to the attackers. The diversity of hardware,
software, and configuration typically present in an enterprise network provides a much larger attack
surface and opportunity for exploitation.
Finally, the reported average breach detection time ranged from 6 hours to months. While
the large range may be due to varying data sources and analysis methods used by organizations
conducting the analysis, this research assumed an approximation of the average breach detection
time to be on the order of days. Additionally, without access to the actual events from the analyzed
real-world breaches, there is no direct comparison of the detection methods used during the real-
world breaches and the methods discussed in this research.
6.3 Future Research
This research demonstrated the network defense capabilities of an architecture that collects,
transforms, and stores events from multiple sources into a graph-model to improve the workflow of
network defenders. Future research should be directed against:
• Larger Data Sets - Enhancing the architecture to support enterprise-network-level data collection
and analysis to further validate the capabilities provided by graph-based network defense
and testing the architecture in a typical enterprise network deployment or a research honeynet.
• Additional Data Sources - Inclusion of additional data sources such as vulnerability scan-
ners, network scanners, threat intelligence, host-based intrusion detection, and additional
queries for osquery to enrich the graph-model and provide more detailed information for
network defenders.
• Model Improvements - Modify and improve the graph-model data structure based on existing
and new data sources to provide additional context and improved graph query performance.
• Automation - Parameterize and automate graph database queries to minimize the operator
interaction required for detection and localization of attacker activity.
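The automation item above can be sketched as a parameterized AQL traversal that an operator, or a scheduled job, runs by supplying only a starting host and a time window. The graph name, collection prefix, and edge attribute below are illustrative assumptions rather than the exact schema used in this research; the commented execution step shows how python-arango (cited in the references) would run such a query.

```python
# Hedged sketch: a reusable, parameterized AQL traversal. The names
# 'network_events', 'hosts/...', and 'timestamp' are assumed for illustration.

def build_alert_query(max_depth: int = 3):
    """Return an AQL traversal template and a bind-variable builder."""
    query = """
    FOR v, e, p IN 1..@depth OUTBOUND @start_host GRAPH 'network_events'
        FILTER e.timestamp >= @since
        RETURN {host: v._key, edge: e}
    """
    def bind_vars(start_host: str, since: str):
        # Only these two values change between runs; the query text is fixed.
        return {"depth": max_depth, "start_host": start_host, "since": since}
    return query, bind_vars

# Executing against a running ArangoDB instance would use python-arango:
#   from arango import ArangoClient
#   db = ArangoClient(hosts="http://localhost:8529").db("events")
#   query, bind = build_alert_query()
#   cursor = db.aql.execute(query, bind_vars=bind("hosts/10.0.0.5", "2018-01-01"))
```

Because the query text never changes, it can be stored once and scheduled, which is exactly the reduction in operator interaction the bullet describes.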
6.4 Practical Application
With organizations in all industry sectors facing the threat of attack, implementation of the
event correlation system discussed in this praxis would provide network defenders with a capability
that could improve their workflow and reduce the time to detect and localize attacker activity. By
reducing the time available to an attacker in the network, defenders greatly reduce the risk of the
attacker moving laterally through the network and reaching their objective.
As the components of the architecture developed in this praxis are all open source software
offerings, the initial cost to implement the concepts in this research is minimal compared to
the average cost of a successful network breach. However, organizations must also consider the
potential maintenance and training costs associated with any new technology, such as licensing
for enterprise support of the open source software and ensuring defenders fully understand how
to utilize new tools and techniques for network defense.
As discussed in Chapter 1, by striving to meet the design goals of modularity, scalability, and a
distributed architecture, the event correlation system can be integrated into networks of varying size,
structure, and purpose. The container-first approach in the software architecture permits organiza-
tions to add or modify individual components within the architecture while not requiring updates
to the entire system. Additionally, the use of containers allows organizations to host the system in
any environment capable of running containers to include bare metal servers, virtual machines, or
any of the major cloud providers such as Google, Microsoft, and Amazon. The scalability of the
system also ensures that networks of any size can implement the correlation system to meet the
higher data throughput requirements. Although the simulation events occurred in a relatively small
home network environment, the cluster architecture provided by Kafka and ArangoDB combined
with the ability to run multiple instances of the same container ensures the system can grow to meet
an organization's current and future monitoring needs.
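The claim that running multiple instances of the same container increases throughput rests on Kafka's consumer-group model: each partition of a topic is consumed by exactly one member of a group, so adding replicas splits the partitions among them. A minimal Python model of that assignment (a simplified round-robin; in reality Kafka's group coordinator performs the assignment) illustrates the effect:

```python
# Simplified model of Kafka consumer-group scaling: N identical consumer
# replicas divide a topic's partitions roughly evenly. Consumer ids and
# partition counts here are illustrative, not taken from the research setup.

def assign_partitions(partitions, consumers):
    """Round-robin partition ids across consumer ids (toy model of the
    real assignment done by the Kafka group coordinator)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# Six partitions across three replicas of the same container image:
# each replica processes two partitions, tripling aggregate throughput.
print(assign_partitions(range(6), ["replica-0", "replica-1", "replica-2"]))
```

Scaling down works the same way in reverse: when a replica leaves the group, its partitions are reassigned to the remaining members.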
Although this research used Bro, Suricata, and osquery as data sources for the model, the mod-
ularity provided by the architecture allows organizations to use any data sources that are currently
available in the network with minimal updates to the processing functions. By using existing data
sources, the organization can minimize the impact of bringing the new system online while retaining
the capability to add new data sources as the needs of the organization change.
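One way to realize the "minimal updates to the processing functions" property is a parser registry: each data source contributes a single function that maps its raw events into a common graph-model record, and adding a source never touches existing parsers. The input field names below follow Bro's conn log and Suricata's EVE JSON, but the output record shape is an illustrative assumption:

```python
# Hedged sketch of modular data-source processing: one registered parser per
# source, all emitting a common record. The output keys (src/dst/type) are
# assumed for illustration, not the graph schema used in this research.

PARSERS = {}

def register(source):
    """Decorator that registers a parser for a named data source."""
    def wrap(fn):
        PARSERS[source] = fn
        return fn
    return wrap

@register("bro")
def parse_bro(event):
    # Bro conn-log field names.
    return {"src": event["id.orig_h"], "dst": event["id.resp_h"], "type": "conn"}

@register("suricata")
def parse_suricata(event):
    # Suricata EVE JSON field names.
    return {"src": event["src_ip"], "dst": event["dest_ip"], "type": "alert"}

def to_graph_record(source, event):
    # Supporting a new data source only requires registering one new parser.
    return PARSERS[source](event)
```

Under this design, bringing a new sensor online is a one-function change, which is what keeps the impact on the running system minimal.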
Finally, while the research demonstrated the capability of graph-based event correlation, the
future research areas discussed in Section 6.3 would provide a system more capable of assisting
network defenders. Additional data sources and updates to the graph model will provide defenders
with a more holistic view of the activity occurring within the network while parameterization and
automation of AQL queries will further improve defender workflow and reduce time to detection of
malicious activity.
References
“Apache Kafka.” 2017. Apache. Accessed December 10, 2017. https://kafka.apache.org.
“Apache Kafka - Uses.” 2017. Apache. Accessed December 10, 2017. https://kafka.apache.org/uses.
“Apache Kafka - Intro.” 2017. Apache Kafka. Accessed December 29, 2017. https://kafka.apache.org/intro.
“Metron - Logging Bro Output to Kafka.” 2017. Apache Metron. Accessed December 8, 2017. https://metron.apache.org/current-book/metron-sensors/bro-plugin-kafka/index.html.
ArangoDB. 2016. What is multi-model database and why use it? Technical report. ArangoDB, December.
“ArangoDB - highly available multi-model NoSQL database.” 2017. ArangoDB. Accessed December 8, 2017. https://www.arangodb.com/.
“Foxx at a glance.” 2017. ArangoDB. Accessed December 8, 2017. https://docs.arangodb.com/3.3/Manual/Foxx/AtAGlance.html.
Bromiley, Matt. 2017. Enhance Your Investigations with Network Data. Technical report. SANS Institute, October.
Brumfield, J. 2017. 2017 Data Breach Investigations Report. Technical report. Verizon Enterprise.
Critical Security Controls for Effective Cyber Defense. 2015. Technical report. The Center for Internet Security, October.
Cummings, JJ, and Michael Shirk. 2017. “shirkdog/pulledpork.” Github. https://github.com/shirkdog/pulledpork.
“DB-Engines Ranking - popularity ranking of graph DBMS.” 2017. DB-Engines. Accessed December 23, 2017. https://db-engines.com/en/ranking/graph+dbms.
Djanali, Supeno, Baskoro Pratomo, Hudan Studiawan, R Anggoro, and T.C. Henning. 2015. “Coro: Graph-based automatic intrusion detection system signature generator for evoting protection.” 81 (November): 535–546.
“Swarm mode key concepts.” 2017. Docker. Accessed December 10, 2017. https://docs.docker.com/engine/swarm/key-concepts/.
“What is a Container.” 2017. Docker. Accessed December 10, 2017. https://www.docker.com/what-container.
“Beats: Data Shippers for Elasticsearch.” 2017. elastic. Accessed December 1, 2017. https://www.elastic.co/products/beats.
“Powering Data Search, Log Analysis, Analytics.” 2017. elastic. Accessed December 1, 2017. https://www.elastic.co/products.
“Emerging Threats.” 2017. Emerging Threats. Accessed December 8, 2017. http://doc.emergingthreats.net/bin/view/Main/WebHome.
“Packs.” 2017. Facebook. https://osquery.io/schema/packs/.
FireEye. 2017. M-Trends 2017 - A View From the Frontlines. Technical report. FireEye.
“What is Fluentd?” 2017. Fluentd Project. Accessed December 12, 2017. https://www.fluentd.org/architecture.
Fredj, Ouissem Ben. 2015. “A realistic graph-based alert correlation system.” Security and Communication Networks 8 (15): 2477–2493. ISSN: 1939-0122. doi:10.1002/sec.1190. http://dx.doi.org/10.1002/sec.1190.
Friedberg, Ivo, Florian Skopik, Giuseppe Settanni, and Roman Fiedler. 2015. “Combating advanced persistent threats: From network event correlation to incident detection.” Computers & Security 48:35–57. ISSN: 0167-4048. doi:10.1016/j.cose.2014.09.006. http://www.sciencedirect.com/science/article/pii/S0167404814001461.
Garcia-Teodoro, P., J. Diaz-Verdejo, G. Macia-Fernandez, and E. Vazquez. 2009. “Anomaly-based network intrusion detection: Techniques, systems and challenges.” Computers & Security 28 (1): 18–28. ISSN: 0167-4048. doi:10.1016/j.cose.2008.08.003. http://www.sciencedirect.com/science/article/pii/S0167404808000692.
Kennedy, David, Jim O’Gorman, Devon Kearns, and Mati Aharoni. 2011. Metasploit: The Penetration Tester’s Guide. 1st. San Francisco, CA, USA: No Starch Press. ISBN: 159327288X, 9781593272883.
Kent, Alexander D., Lorie M. Liebrock, and Joshua C. Neil. 2015. “Authentication graphs: Analyzing user behavior within an enterprise network.” Computers and Security 48:150–166. ISSN: 0167-4048. doi:10.1016/j.cose.2014.09.001. http://www.sciencedirect.com/science/article/pii/S0167404814001321.
Milenkoski, Aleksandar, Marco Vieira, Samuel Kounev, Alberto Avritzer, and Bryan D. Payne. 2015. “Evaluating Computer Intrusion Detection Systems: A Survey of Common Practices.” ACM Comput. Surv. (New York, NY, USA) 48, no. 1 (September): 12:1–12:41. ISSN: 0360-0300. doi:10.1145/2808691. http://doi.acm.org/10.1145/2808691.
“The Neo4j Graph Platform.” 2017. Neo4j, Inc. Accessed December 1, 2017. https://neo4j.com/.
NIST. 2014. Framework for Improving Critical Infrastructure Cybersecurity. Technical report. National Institute of Standards and Technology, February. https://www.nist.gov/framework.
Oh, Joohwan. 2016. “python-arango.” Accessed December 7, 2017. http://python-driver-for-arangodb.readthedocs.io/en/master/index.html.
“Suricata User Guide.” 2016. Open Information Security Foundation. http://suricata.readthedocs.io/en/latest/index.html.
“Suricata Open Source IDS / IPS / NSM Engine.” 2017. Open Information Security Foundation. Accessed December 30, 2017. https://suricata-ids.org/.
“osquery Docs.” 2017. Accessed December 23, 2017. https://osquery.readthedocs.io/en/stable/.
Pharate, Abhishek, Harsha Bhat, Vaibhav Shilimkar, and Nalini Mhetre. 2015. “Classification of Intrusion Detection System.” International Journal of Computer Applications 118, no. 7 (May): 23–26.
“RabbitMQ - Messaging that just works.” 2017. Pivotal. Accessed December 1, 2017. https://www.rabbitmq.com/.
Ponemon Institute. 2017. 2017 Cost of Data Breach Study. Technical report. Ponemon Institute.
Powers, Dana, and David Arthur. 2016. “kafka-python.” Accessed December 7, 2017. http://kafka-python.readthedocs.io/en/master/.
“ProFTPd 1.3.3c - Compromised Source Backdoor Remote Code Execution.” 2018. https://www.exploit-db.com/exploits/15662/.
Ramaki, Ali Ahmadian, and Reza Ebrahimi Atani. 2016. “A survey of IT early warning systems: architectures, challenges, and solutions.” Security and Communication Networks 9 (17): 4751–4776. ISSN: 1939-0122. doi:10.1002/sec.1647. http://dx.doi.org/10.1002/sec.1647.
“Ansible Documentation.” 2017. Red Hat. http://docs.ansible.com/ansible/latest/index.html.
Reed, Theodore, Robert G. Abbott, Benjamin Anderson, Kevin Nauer, and Chris Forsythe. 2014. “Simulation of Workflow and Threat Characteristics for Cyber Security Incident Response Teams.” Proceedings of the Human Factors and Ergonomics Society Annual Meeting 58 (1): 427–431. doi:10.1177/1541931214581089. https://doi.org/10.1177/1541931214581089.
Robinson, Ian, Jim Webber, and Emil Eifrem. 2015. Graph Databases. O’Reilly.
Sanders, Chris, and Jason Smith. 2014. Applied Network Security Monitoring. Syngress.
SANS Analyst Program. 2017. 2017 SANS Incident Response Survey. Technical report. SANS Institute.
Security and Privacy Controls for Information Systems and Organizations. 2017. Technical report. National Institute of Standards and Technology, August.
Shakarian, Paulo, Jana Shakarian, and Andrew Ruef. n.d. Introduction to Cyber-Warfare: A Multidisciplinary Approach. Syngress.
Shapoorifard, Hossein, and Pirooz Shamsinejad. 2017. “A Novel Cluster-based Intrusion Detection Approach Integrating Multiple Learning Techniques.” International Journal of Computer Applications (New York, USA) 166, no. 3 (May): 13–16. ISSN: 0975-8887. doi:10.5120/ijca2017913948. http://www.ijcaonline.org/archives/volume166/number3/27649-2017913948.
Shostack, Adam. 2014. Threat Modeling: Designing for Security. Wiley.
“Talos - Author of the Official Snort Rule Sets.” 2017. Snort. Accessed December 8, 2017. https://www.snort.org/talos.
“SIEM, AIOps, Application Management, Log Management, Machine Learning, and Compliance.” 2017. Splunk. Accessed December 1, 2017. https://www.splunk.com.
“Sguil - Open Source Network Security Monitoring.” 2014. Sguil. Accessed December 8, 2017. http://bammv.github.io/sguil/index.html.
“Nessus Professional Vulnerability Scanner.” 2017. tenable. Accessed December 7, 2017. https://www.tenable.com/products/nessus/nessus-professional.
“Apache Zookeeper.” 2017. The Apache Software Foundation. Accessed December 29, 2017. https://zookeeper.apache.org.
“The Bro Project.” 2014. The Bro Project. Accessed December 26, 2017. https://www.bro.org/.
“Trail of Bits.” 2017. Trail of Bits. Accessed December 7, 2017. https://www.trailofbits.com/.
“Fluent Bit.” 2017. Treasure Data. Accessed December 7, 2017. https://fluentbit.io.
Vasilomanolakis, Emmanouil, Shankar Karuppayah, Max Muhlhauser, and Mathias Fischer. 2015. “Taxonomy and Survey of Collaborative Intrusion Detection.” ACM Comput. Surv. (New York, NY, USA) 47, no. 4 (May): 55:1–55:33. ISSN: 0360-0300. doi:10.1145/2716260. http://doi.acm.org/10.1145/2716260.
“Basic Pentesting: 1.” 2017. VulnHub. https://www.vulnhub.com/entry/basic-pentesting-1,216/.
Yadav, Tarun, and Arvind Mallari Rao. 2016. “Technical Aspects of Cyber Kill Chain.” CoRR abs/1606.03184. arXiv: 1606.03184. http://arxiv.org/abs/1606.03184.