
The Pennsylvania State University

The Graduate School

College of Information Sciences and Technology

USING BAYESIAN NETWORKS FOR ENTERPRISE NETWORK

SECURITY ANALYSIS

A Dissertation in

Information Sciences and Technology

by

Xiaoyan Sun

© 2016 Xiaoyan Sun

Submitted in Partial Fulfillment

of the Requirements

for the Degree of

Doctor of Philosophy

August 2016

The dissertation of Xiaoyan Sun was reviewed and approved* by the following:

Peng Liu

Professor of Information Sciences and Technology

Dissertation Advisor, Chair of Committee

John Yen

Professor of Information Sciences and Technology

Dinghao Wu

Assistant Professor of Information Sciences and Technology

George Kesidis

Professor of Computer Science and Engineering

Professor of Electrical Engineering

Andrea Tapia

Associate Professor of Information Sciences and Technology

Director of Graduate Programs, College of Information Sciences and Technology

*Signatures are on file in the Graduate School.


Abstract

Achieving complete and accurate cyber situation awareness (SA) is crucial for security analysts to make the right decisions. A large number of algorithms and tools have been developed to aid cyber security analysis, such as vulnerability analysis, intrusion detection, and network and system monitoring and recovery. Although these algorithms and tools have eased security analysts' work to some extent, their knowledge bases are usually isolated from each other. It is very challenging for security analysts to combine these knowledge bases and generate a holistic understanding of the enterprise network's real situation.

To address the above problem, this paper takes the following approach. 1) Based on existing theories of situation awareness, a Situation Knowledge Reference Model (SKRM) is constructed to integrate data, information, algorithms/tools, and human knowledge into a whole stack. SKRM serves as an umbrella model that enables effective analysis of complex cyber security problems. 2) Bayesian Networks are employed to incorporate and fuse information from different knowledge bases. Because of the overwhelming number of alerts and the high false alert rates, digging out the real facts is difficult. In addition, security analysis is usually bound up with a number of uncertainties. Hence, Bayesian Networks are an effective approach to leveraging the collected evidence and eliminating the uncertainties.

With SKRM as the guidance, two independent security problems are identified: the stealthy bridge problem in the cloud and the zero-day attack path problem. This paper demonstrates how these problems can be analyzed and addressed by constructing proper Bayesian Networks on top of different layers of SKRM.

First, the stealthy bridge problem. Enterprise network islands in the cloud are expected to be absolutely isolated from each other except for some public services. However, current virtualization mechanisms cannot ensure such perfect isolation. Some "stealthy bridges" may be created that break the isolation, due to virtual machine image sharing and virtual machine co-residency. This paper proposes building a cloud-level attack graph to capture the potential attacks enabled by stealthy bridges and to reveal possible hidden attack paths that are missed by individual enterprise network attack graphs. Based on the cloud-level attack graph, a cross-layer Bayesian network is constructed to infer the existence of stealthy bridges given supporting evidence from other intrusion steps.

Second, the zero-day attack path problem. A zero-day attack path is a multi-step attack path that includes one or more zero-day exploits. This paper proposes a probabilistic approach to identifying zero-day attack paths. An object instance graph is first established to capture the intrusion propagation. A Bayesian network is then built to compute the probabilities of object instances being infected. Connected through dependency relations, the instances with high infection probabilities form a path, which is viewed as a zero-day attack path.


Contents

List of Figures viii

List of Tables x

List of Symbols xi

Acknowledgments xii

Chapter 1: Introduction 1
  1.1 Cyber Situation Awareness 1
  1.2 Two Identified Problems 3
  1.3 A Common Tool: Bayesian Networks 5

Chapter 2: SKRM: Where Security Techniques Talk to Each Other 8
  2.1 Basic Concepts of Situation Awareness 8
  2.2 A Model of Cyber Situation Knowledge Abstraction: the Application of SA to the Cyber Field 10
  2.3 SKRM Framework 13
    2.3.1 Why do we need SKRM? 13
    2.3.2 What is the main structure of SKRM? 14
    2.3.3 How can SKRM enable cyber situation awareness? 17
  2.4 Case Study 18
    2.4.1 Implementation 18
    2.4.2 Capability 1: Mission Asset Identification and Classification 20
    2.4.3 Capability 2: Mission Damage and Impact Assessment 23
      2.4.3.1 The System Object Dependency Graph 27
      2.4.3.2 Mission-Task-Asset Map 31
      2.4.3.3 MTA-based Bayesian Networks 33
  2.5 Related Work 39
  2.6 Discussion 40
  2.7 Conclusion 41

Chapter 3: Inferring the Stealthy Bridges between Enterprise Network Islands in Cloud Using Cross-Layer Bayesian Networks 42
  3.1 Introduction 42
  3.2 Cloud-level Attack Graph Model 46
    3.2.1 Logical Attack Graph 47
    3.2.2 Cloud-level Attack Graph 49
  3.3 Bayesian Networks 53
  3.4 Cross-layer Bayesian Networks 55
    3.4.1 Identify the Uncertainties 56
  3.5 Implementation 62
    3.5.1 Cloud-level Attack Graph Generation 62
    3.5.2 Construction of Bayesian Networks 64
  3.6 Experiment 65
    3.6.1 Attack Scenario 65
    3.6.2 Experiment Result 68
      3.6.2.1 Experiment 3.1: Probability Inferring 69
      3.6.2.2 Experiment 3.2: Impact of False Alerts 73
      3.6.2.3 Experiment 3.3: Impact of Evidence Confidence Value 73
      3.6.2.4 Experiment 3.4: Impact of Evidence Input Order 75
      3.6.2.5 Experiment 3.5: Mitigate Impact of False Alerts by Tuning Evidence Confidence Value 76
      3.6.2.6 Experiment 3.6: Complexity 77
  3.7 Related Work 78
  3.8 Conclusion and Discussion 80

Chapter 4: ZePro: Probabilistic Identification of Zero-day Attack Paths 82
  4.1 Introduction 82
  4.2 Rationales and Models 86
    4.2.1 System Object Dependency Graph 86
    4.2.2 Why use Bayesian Network? 88
    4.2.3 Problems of Constructing BN based on SODG 90
    4.2.4 Object Instance Graph 91
  4.3 Instance-graph-based Bayesian Networks 95
    4.3.1 The Infection Propagation Models 95
    4.3.2 Evidence Incorporation 97
  4.4 System Design 99
  4.5 Implementation 101
  4.6 Experiments 104
    4.6.1 Attack Scenario 104
    4.6.2 Experiment Results 106
      4.6.2.1 Correctness 106
      4.6.2.2 Size of Instance Graph and Zero-day Attack Paths 112
      4.6.2.3 Influence of Evidence 116
      4.6.2.4 Influence of False Alerts 117
      4.6.2.5 Sensitivity Analysis and Influence of τ and ρ 118
      4.6.2.6 Complexity and Scalability 119
  4.7 Related Work 121
  4.8 Limitation and Conclusion 122

Chapter 5: Conclusion 123

Bibliography 126


List of Figures

2.1 A Model of Cyber Situation Knowledge Abstraction [12] 11
2.2 The Situation Knowledge Reference Model (SKRM) [11] 15
2.3 The Testbed Network and Attack Scenario [12] 19
2.4 Mission Asset Identification and Classification [12] 21
2.5 The Dependency Attack Graph [12] 22
2.6 Mission Damage and Impact Assessment [12] 24
2.7 The SODG as the Construct between Attack and Mission [13] 27
2.8 An example SODG built from the simplified system call log [13] 30
2.9 Mission-Task-Asset Map [13] 32
2.10 An Example of Benign Mission Dependency Graph [13] 35
2.11 An Example of Tainted Mission Dependency Graph [13] 36
2.12 An Example of MTA-based BN [13] 38

3.1 The Stealthy Bridges between Enterprise Network Islands in Cloud [14] 43
3.2 A Portion of an Example Logical Attack Graph [14] 48
3.3 Features of the Public Cloud Structure [14] 50
3.4 An Example Cloud-level Attack Graph Model [14] 51
3.5 A Portion of Bayesian Network with associated CPT [14] 54
3.6 A Portion of Bayesian Network with AAN node [14] 58
3.7 The Evidence-Confidence Pair [14] 61
3.8 The Attack Scenario [14] 66
3.9 The Cross-Layer Bayesian Network Constructed for the Attack Scenario [14] 72
3.10 Time Used for BN Compilation 79
3.11 Memory Used for BN Compilation 80

4.1 An SODG. An SODG generated by parsing an example set of simplified system call log. The label on each edge shows the time associated with the corresponding system call. 87
4.2 An Example Bayesian Network. 89
4.3 An Instance Graph. An instance graph generated by parsing the same set of simplified system call log as in Figure 4.1a. The label on each edge shows the time associated with the corresponding system call operation. The dotted rectangle and ellipse are new instances of already existing objects. The solid edges and the dotted edges respectively denote the contact dependencies and the state transition dependencies. 94
4.4 The Infection Propagation Models. 95
4.5 Local Observation Model. 98
4.6 System Design. 107
4.7 Attack Scenario. 107
4.8 The Zero-day Attack Path in the Form of an Instance Graph for Experiment 4.1. 109
4.9 The Zero-day Attack Path in the Form of an Instance Graph for Experiment 4.2. 110
4.10 The Object-level Zero-day Attack Path in Experiment 4.1. 114
4.11 The Object-level Zero-day Attack Path in Experiment 4.2. 115


List of Tables

2.1 System Call Dependency Rules 29
2.2 CPT of Mission 1 in Figure 2.12 [13] 37
2.3 Modified CPT of Mission 1 in Figure 2.12 [13] 38

3.1 CPT for Node Evidence [14] 62
3.2 A Sample Set of Interaction Rules [14] 63
3.3 Network Deployment [14] 70
3.4 Collected Evidence Corresponding to Attack Steps [14] 71
3.5 Results of Experiment 3.1 [14] 73
3.6 Results of Experiment 3.2 [14] 74
3.7 Results of Experiment 3.3 [14] 74
3.8 Results of Experiment 3.4 75
3.9 Results of Experiment 3.5 76
3.10 Size of Bayesian Networks 78

4.1 CPT for Node p2 in Figure 4.2 90
4.2 CPT for Node sink_{j+1} 96
4.3 The Impact of Pruning the Instance Graphs 111
4.4 The Collected Evidence 112
4.5 The Influence of Evidence in Experiment 4.1 116
4.6 The Influence of Evidence in Experiment 4.2 116
4.7 The Influence of False Alerts 118


List of Symbols

SA Situation Awareness

SKRM Situation Knowledge Reference Model

AAN Attacker Action Node

AC Access Complexity

AMI Amazon Machine Image

BN Bayesian Network

CVSS Common Vulnerability Scoring System

CPT Conditional Probability Table

IDS Intrusion Detection System

OS Operating System

SODG System Object Dependency Graph

VMI Virtual Machine Image


Acknowledgments

My first thanks go to my doctoral advisor, Prof. Peng Liu, for his endless support

and help throughout my entire PhD study. He spent countless hours meeting with

me for every detail of the projects and papers. With his brilliance, creativeness,

insights, diligence and patience, he guides and inspires me to discover and tackle

research problems in the wonderland of cyber security. He is my role model. What

I learned from him will benefit my entire life.

In addition, I want to express my sincere thanks to my doctoral committee

members, Prof. John Yen, Prof. Dinghao Wu and Prof. George Kesidis. They

are all amazing and successful professors, and also the sources of strong support,

prompt feedback and invaluable comments for the work presented in this paper.

I also would like to thank my collaborator, Dr. Anoop Singhal at National

Institute of Standards and Technology (NIST). His insightful comments in numerous

discussions are always important inputs to my work.

I also want to thank my labmates and friends here at Pennsylvania State

University. We share similar dreams, goals, interests, and experience. Their help

and support are always there whenever needed.

Finally, I feel very grateful to my parents, my husband and my son. They


are the ones with warmest words, unconditional understanding, and sometimes

surprises. Life with them is so gorgeous.


Chapter 1 | Introduction

1.1 Cyber Situation Awareness

To better secure a network, human decision makers should clearly know and understand what is going on in the network. This is basically what we call cyber situation awareness (cyber SA). Humans play the key role in cyber SA because only humans can be "aware". Technologies for cyber security have made remarkable progress in the past decades. Many algorithms and tools have been developed for vulnerability analysis, attack detection, damage and impact assessment, system recovery, and so on. These technologies have significantly enhanced human analysts' cyber situation awareness and facilitated their network security management. The attack graph is one typical example. By combining the vulnerabilities in a network, attack graph tools can automatically generate potential attack paths. Through the generated attack paths, security analysts can clearly see how attackers may exploit the network. Without attack graphs, it is very difficult for them to construct reasonable attack scenarios for even a small network just by reading vulnerability scan results, let alone for a large-scale enterprise network with hundreds to thousands


of hosts. In addition, due to the information asymmetry between security defenders and attackers, defenders have to deploy a number of security sensors to monitor the enterprise's IT infrastructure. The main responsibility of security analysts is to go through all types of reports from these security sensors and generate a holistic understanding of the enterprise network's real situation. Although these algorithms, tools, and security sensors have greatly eased the analysts' work in some aspects, they usually have different knowledge bases, and these knowledge bases are isolated from each other. It is very challenging for security analysts to combine the isolated information to reveal the real facts and achieve correct situation awareness, especially when the amount of information is overwhelming.

To enhance human analysts' situation awareness in cyber space, some existing theories of situation awareness are applied to the cyber security field. A new reference model called SKRM (Situation Knowledge Reference Model) is established in Chapter 2. SKRM integrates cyber knowledge from different perspectives by coupling data, information, algorithms and tools, and human knowledge, to enhance cyber analysts' situation awareness. It mainly contains four abstraction layers of cyber situation knowledge: the Workflow Layer, the App/Service Layer, the Operating System Layer, and the Instruction Layer. These four layers are generated by abstracting isolated situation knowledge from different perspectives of the network. In addition to these four layers, the attack graph is also an essential part of SKRM. It is not a specific layer in the stack, but rather an interconnection technique between the App/Service Layer and the Operating System Layer. An attack graph can generate potential attack paths in the network by analyzing the vulnerabilities in applications and services. These attack paths reveal which hosts are likely to be compromised. The lower-level system


objects related to these hosts can then be scrutinized.

1.2 Two Identified Problems

The SKRM model is not simply a mapping of situation knowledge from different spaces onto the four abstraction layers. It integrates data, information, algorithms and tools, and human knowledge into a whole stack. Each abstraction layer generates a graph that covers the entire enterprise network. In addition, each abstraction layer views the same network from a different perspective and at a different granularity. Most importantly, each abstraction layer leverages the currently available algorithms, tools, and techniques in its corresponding area to extract the most critical and useful information to present to human security analysts. Hence, SKRM serves as an umbrella model that can enable solutions to different security problems. In this paper, two independent problems are identified on different layers of SKRM: the stealthy bridge problem in the cloud and the zero-day attack path problem.

Stealthy Bridge Problem in Cloud. Many enterprises have already migrated into the cloud, replacing their physical servers (web server, mail server, etc.) with virtual machines. A public cloud can provide virtual infrastructures to many enterprises. Except for some public services, these enterprise networks are expected to be absolutely isolated from each other: connections from the outside network to the protected internal network should be prohibited. However, current virtualization mechanisms cannot ensure such perfect isolation. Some "stealthy bridges" can be created between the isolated enterprise network islands by exploiting vulnerabilities arising from virtual machine image sharing and virtual machine co-residency.


Stealthy bridges are stealthy information tunnels existing between disparate networks in the cloud, through which information (data, commands, etc.) can be acquired, transmitted, or exchanged maliciously. These stealthy bridges are inherently unknown or hard to detect: they either exploit unknown vulnerabilities or cannot be easily distinguished from authorized activities by security sensors. For example, side-channel attacks extract information by passively observing the activities of resources shared by the attacker and the target virtual machine (e.g. CPU, cache), without interfering with the normal operation of the target virtual machine. Similarly, the activity of logging into an instance by leveraging intentionally left credentials (passwords, public keys, etc.) also hides among authorized user activities.

Stealthy bridges are usually used to construct a multi-step attack and facilitate subsequent intrusion steps across enterprise network islands in the cloud. By taking advantage of stealthy bridges, attackers can carry malicious activities from one enterprise network into another. The stealthy bridges per se are difficult to detect, but the intrusion steps before and after their construction may trigger some abnormal activities. Human administrators or security sensors such as an IDS can notice such abnormal activities and raise corresponding alerts, which can be collected as evidence of an attack happening. However, due to the overwhelming number of alerts and the high false alert rates, human analysts cannot easily achieve accurate situation awareness. They may not even be aware of the existence of such stealthy bridges, let alone precisely locate and analyze them. Therefore, a solution is needed that saves human analysts from the sea of alerts and infers the existence of stealthy bridges.

Zero-day Attack Path Problem. Zero-day attacks continue to challenge enterprise network security defense. They are usually enabled by unknown vulnerabilities. The information asymmetry between what the attacker knows and what the defender knows makes individual zero-day exploits extremely hard to detect; detecting zero-day attack paths is therefore more feasible than detecting individual zero-day exploits. Since the current enterprise network is usually protected by intrusion detection systems and firewalls, it is very hard for attackers to directly break into the final target. Instead, attackers may use some stepping stones. For example, attackers taking a workstation as the attack goal may first compromise the web server and the file server as intermediate steps. This is known as a multi-step attack. A zero-day attack path is formed when a multi-step attack contains one or more zero-day exploits. Previous work such as alert correlation and attack graphs offers potential solutions for generating attack paths, but cannot reveal the zero-day segments in those paths. Patrol [41] is an effective system for detecting zero-day attack paths, but it relies on a strong assumption to distinguish real zero-day attack paths from suspicious ones: that extensive pre-knowledge about common features of known exploitations can be extracted at the OS level to help recognize future unknown exploitations. Therefore, a new solution that does not depend on such a strong assumption is needed for zero-day attack path identification.

1.3 A Common Tool: Bayesian Networks

The SKRM model identifies the abstraction layers needed to generate a correct and accurate "big picture" for enhancing human analysts' SA. However, even when alerts from different security sensors are presented to human analysts, digging out the real facts is still difficult. In addition, human analysts may face a number of uncertainties during near-real-time security analysis. For example: has the attacker launched the attack? If he launched it, did he succeed in compromising the host? How confident are we in a certain alert? Obviously, a powerful tool is needed to aid near-real-time security analysis by leveraging the collected evidence and eliminating the uncertainties. The Bayesian Network is exactly such a tool.

A Bayesian network (BN) is a probabilistic graphical model representing cause-and-effect relations. For example, it is able to show the probabilistic causal relationships between a disease and the corresponding symptoms. By taking evidence as input, a BN can calculate the probabilities of events of interest. For instance, in the stealthy bridge problem, a properly constructed BN is able to infer the probability of a stealthy bridge existing on a certain host.
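As a minimal sketch of this kind of inference (the structure and all probabilities below are invented for illustration, not the CPTs used later in this dissertation), consider a three-node chain: a stealthy bridge B enables an intrusion step S, which in turn may trigger an IDS alert E. Exact inference by enumeration then updates the belief in B once the alert is observed:

```python
# Tiny hand-built BN: stealthy bridge (B) -> intrusion step (S) -> IDS alert (E).
# All CPT entries are illustrative numbers.
p_b = {True: 0.05, False: 0.95}                       # prior on a stealthy bridge
p_s = {True:  {True: 0.8,  False: 0.2},               # P(S | B)
       False: {True: 0.05, False: 0.95}}
p_e = {True:  {True: 0.9,  False: 0.1},               # P(E | S)
       False: {True: 0.1,  False: 0.9}}

def posterior_bridge(alert_observed: bool) -> float:
    """P(B = true | E = alert_observed), by brute-force enumeration over S."""
    weights = {}
    for b in (True, False):
        weights[b] = sum(p_b[b] * p_s[b][s] * p_e[s][alert_observed]
                         for s in (True, False))
    return weights[True] / (weights[True] + weights[False])

print(round(posterior_bridge(True), 3))   # observing the alert raises 0.05 to ~0.218
```

The evidence from the surrounding intrusion step roughly quadruples the belief in the stealthy bridge, which is exactly the pattern of reasoning the cross-layer BN of Chapter 3 performs at scale.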

Bayesian Networks gain much more power when combined with the SKRM model. In SKRM, each abstraction layer views the same network from a different perspective and at a different granularity, so each layer can serve as complementary support to the others. The same attack may therefore cause different intrusion symptoms on different layers. For example, at the workflow layer, the symptom could be abnormal business behavior, such as a noticeable financial loss; at the operating system layer, the intrusion symptom could be modified system files, compromised services, etc. When building Bayesian Networks based on the SKRM model, the intrusion symptoms from one layer can serve as evidence for another layer.
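The fusion of symptoms from two layers can be sketched with a small naive BN. The numbers are invented, and the sketch assumes the symptoms are conditionally independent given the compromise, which is a simplification of the models developed later:

```python
# Evidence fusion across SKRM layers: a workflow-layer symptom and an
# OS-layer symptom are both children of a "host compromised" cause node.
# All probabilities are illustrative.
p_compromise = 0.1
p_symptom = {                                   # P(symptom observed | compromised?)
    "financial_loss":   {True: 0.6, False: 0.05},   # workflow layer
    "modified_sysfile": {True: 0.7, False: 0.02},   # operating system layer
}

def posterior_compromise(observed):
    """P(compromised | all listed symptoms observed), by Bayes' rule."""
    like = {c: (p_compromise if c else 1 - p_compromise) for c in (True, False)}
    for s in observed:
        for c in (True, False):
            like[c] *= p_symptom[s][c]
    return like[True] / (like[True] + like[False])

print(round(posterior_compromise(["financial_loss"]), 3))                      # ~0.571
print(round(posterior_compromise(["financial_loss", "modified_sysfile"]), 3))  # ~0.979
```

A symptom from one layer alone leaves substantial uncertainty; corroborating evidence from a second layer pushes the posterior close to certainty, which is the benefit of building the BN on top of SKRM.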

Therefore, the two problems identified in the above section can be solved by

constructing proper Bayesian Networks on top of different layers of SKRM.

First, the stealthy bridge problem can be studied by combining attack graph

and the operating system layer in SKRM. A cloud-level attack graph can be built to


capture the potential attacks enabled by stealthy bridges and reveal possible hidden

attack paths that are previously missed by individual enterprise network attack

graphs. Based on the cloud-level attack graph, a cross-layer Bayesian network is

constructed to infer the existence of stealthy bridges given supporting evidence

from other intrusion steps.

Second, the zero-day attack path problem is addressed on the operating system

layer of SKRM. An object instance graph is first built from system calls to capture

the intrusion propagation. To further reveal the zero-day attack paths hiding in

the instance graph, the proposed ZePro system constructs an instance-graph-based

Bayesian network. By leveraging intrusion evidence, the Bayesian network can

quantitatively compute the probabilities of object instances being infected. The

object instances with high infection probabilities reveal themselves and form the

zero-day attack paths.
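The final path-extraction step can be sketched as follows. The object names, probabilities, and threshold below are all hypothetical, standing in for the BN output described above:

```python
# Given per-object infection probabilities computed by a BN over an object
# instance graph (numbers invented), keep dependency edges whose endpoints
# are both likely infected; the surviving edges form the candidate path.
INFECTION_THRESHOLD = 0.7          # an assumed cutoff, not the dissertation's value

dependencies = [                   # (source object instance, dependent instance)
    ("socket:attacker", "process:httpd"),
    ("process:httpd", "file:/etc/passwd"),
    ("process:httpd", "file:/tmp/dropper"),
    ("file:/tmp/dropper", "process:backdoor"),
    ("process:logrotate", "file:/var/log/syslog"),
]
infection_prob = {
    "socket:attacker": 0.95, "process:httpd": 0.9,  "file:/etc/passwd": 0.2,
    "file:/tmp/dropper": 0.85, "process:backdoor": 0.8,
    "process:logrotate": 0.1,  "file:/var/log/syslog": 0.15,
}

def zero_day_path_edges():
    """Keep only dependency edges whose endpoints are both above the threshold."""
    return [(u, v) for u, v in dependencies
            if infection_prob[u] >= INFECTION_THRESHOLD
            and infection_prob[v] >= INFECTION_THRESHOLD]

for u, v in zero_day_path_edges():
    print(f"{u} -> {v}")
```

In this toy run the benign logrotate activity drops out, and the surviving chain from the attacker's socket through the dropped file to the backdoor process is the candidate zero-day attack path.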

The rest of this paper is organized as follows. Chapter 2 briefly introduces the SKRM model. Chapter 3 presents the stealthy bridge problem in the cloud and a cross-layer Bayesian Network for inferring the existence of stealthy bridges. Chapter 4 focuses on the ZePro system for detecting zero-day attack paths at the operating system level. Chapter 5 concludes the whole paper.


Chapter 2 | SKRM: Where Security Techniques Talk to Each Other

In this chapter, Section 2.1 first introduces some key concepts of situation awareness, and Section 2.2 discusses how to apply SA to the cyber field. Based on that, the SKRM model is proposed in Section 2.3.

2.1 Basic Concepts of Situation Awareness

There have been a number of definitions of situation awareness. The earliest definitions are mostly related to the aircraft domain and are presented in the reviews by Dominguez [1] and Fracker [2]. Endsley [3] provides a formal definition of SA in dynamic environments: "situation awareness is the perception of the elements of the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future." From this definition, Endsley basically views situation awareness as containing three levels: perception, comprehension, and projection. Salerno et al. [4] slightly modify the above definition and define SA as "situation awareness is the perception ... and the projection of their status in order to enable decision superiority." Salerno's definition implies the importance of situation awareness to the decision process. McGuinness and Foy [5] add a fourth level to Endsley's definition, named resolution, which tries to identify the best path to follow to achieve the desired state change to the current situation. Resolution does not directly make decisions for humans regarding what should be done, but provides the available options and the corresponding impact of these options on the environment. To help understand the four levels of SA, we use the analogy made by McGuinness and Foy to explain them: perception represents "What are the current facts?" Comprehension means, "What is actually going on?" Projection asks, "What is most likely to happen if ...?" And Resolution means, "What exactly shall I do?"

Alberts et al. [6] provide another definition of situation awareness, which "describes the awareness of a situation that exists in part or all of the battle space at a particular point in time". For situation, they identify three main components: missions and constraints on missions, capabilities and intentions of relevant forces, and key attributes of the environment. For awareness, they say "awareness exists in the cognitive domain" and awareness is "the result of a complex interaction between prior knowledge and current perceptions of reality". This definition emphasizes the role of cognition in awareness and uncovers the fact that awareness is not just the perception of reality, but also includes prior knowledge as a crucial factor. This explains why experienced analysts usually gain situation awareness more rapidly and accurately than novice analysts. In fact, all the above definitions consider time a basic element of SA. Decision makers rely on previous experience and prior knowledge to stay aware of the changing environment, make decisions, and perform actions. As in the OODA (Observe, Orient, Decide, Act) loop [7],


decisions and actions provide feedback to the environment again and a new cycle

will start. Therefore, time is an essential element of SA.

2.2 A Model of Cyber Situation Knowledge Abstraction: the Application of SA to the Cyber Field

Researchers from different communities have established various reference models or frameworks for situation awareness. Salerno et al. [4] construct a situation awareness framework based on the Joint Directors of Laboratories (JDL) data fusion model [8] and Endsley's model of SA in dynamic decision making [3]. Using the same definition of SA as in [5], Tadda and Salerno [9] propose a situation awareness reference model and provide clear definitions of concepts such as entity, object, group, event, activity, etc. Both works demonstrate how to apply the established models to different domains.

The focus of this chapter is not to establish a reference model for situation awareness, but to find a way to enhance human analysts' SA by applying existing SA theories to the cyber security field. Therefore, a model of cyber Situation Knowledge Abstraction is constructed based on the work by Tadda and Salerno [9] and by Endsley [10], as shown in Figure 2.1. The key part of this model is an embedded sub-model we propose: the Situation Knowledge Reference Model (SKRM). Simply put, SKRM is a model that integrates cyber knowledge from different perspectives, coupling data, information, algorithms and tools, and human knowledge to enhance cyber analysts' situation awareness. The following paragraphs will first explain the cyber SA model, and then justify why and how to establish SKRM.

In the cyber Situation Knowledge Abstraction model in Figure 2.1, cyber

(Figure 2.1 depicts the automation system, which takes data, information, sensors, and tools and algorithms as input and contains the SKRM graph stack: the Workflow Layer, App/Service Layer, Operating System Layer, and Instruction Layer, interconnected by a dependency attack graph. The output of SKRM, together with data, information, the system interface, and the real world, serves as the human analysts' information source for the four levels of cyber situation awareness: perception, comprehension/damage assessment, projection/impact assessment, and resolution/security measure options and consequences.)

Figure 2.1: A Model of Cyber Situation Knowledge Abstraction [12]

situation awareness consists of four levels: perception, comprehension, projection,

and resolution. The basic idea of this model is: taking input from data, information,

tools and algorithms, and intelligence of human experts from different areas, SKRM

enables the four levels of situation awareness. On the other hand, the output of


SKRM, as well as data, information, system interfaces, and the real world, all serve as

human analysts’ information sources for cyber SA.

The perception level is different from the one in Tadda and Salerno's model in [9]: other than data and information, the real world and the system interface are explicitly included as the information sources of SA [3] [10] that are perceived by human analysts. The system interface is directly related to the effectiveness of human cognition of system knowledge. A well-designed interface can present information and knowledge in an intuitive way and facilitate interactive analysis. In addition, information from the real world is directly perceived by human analysts without being processed through automation systems. Such information influences human analysts' SA in some way, good or bad, although the “some way” is out of our research scope. For example, a piece of news regarding a recently popular attack pattern may prompt security analysts to relate it to similar symptoms found in their own network. Or colleagues' talk about a recent financial abnormality may implicitly confirm security analysts' inference that a computer has been compromised.

In terms of cyber security, levels 2 and 3 are mainly about impact assessment, which includes two parts [15]: assessment of current impact, i.e., damage assessment, and assessment of future impact, which mainly involves vulnerability analysis and threat assessment. The resolution level [5] is included in the model due to its importance for cyber security analysis: human analysts have a variety of security measures for security management, either confronting attacks by network hardening, or recovering from the damage caused by attacks. These security measures have different consequences for network security. Thus human decision makers can choose the best option, or at least the one they believe to be best, based on the available security measures and their corresponding consequences.


2.3 SKRM Framework

To better present the SKRM framework, three questions should be answered: 1) Why do we need SKRM? 2) What is the main structure of SKRM? 3) How can SKRM enable cyber situation awareness?

2.3.1 Why do we need SKRM?

We need SKRM for several reasons. First, there is isolation between different knowledge bases. Cyber security has made significant advancements in a variety of areas, but these areas rarely “talk” to each other. When it comes to cyber SA, we have experts from different areas working on the same topic, but they cannot effectively communicate with each other. For example, system experts know exactly which file is stolen or modified, but they hardly know how this can impact the business level. On the other hand, business managers can rapidly notice a suspicious financial loss, but they won't relate it to a disallowed system call parameter inside the operating system. This is one reason for constructing SKRM: we need a model that integrates knowledge from different areas to break the isolation between them.

Second, there is isolation between techniques and humans. Human intelligence is the most powerful and valuable resource, and it needs to be well utilized in security analysis. Many microscopic tools, algorithms, and techniques have been developed for specific purposes, but few macroscopic models or frameworks are provided to synthesize the functions of these techniques, reduce the complexity of security problems, and ease the cognition of human analysts. Therefore, we need to couple the available techniques to enhance cyber SA and construct a bridge between techniques and human analysts.


2.3.2 What is the main structure of SKRM?

Similar to the work by Tadda and Salerno [9], the key to constructing SKRM is to identify relevant activities of interest. In terms of cyber SA, the activities of interest are mainly attacks, which may be associated with items ranging from business level processes, to network level applications and services, to operating system level entities, and finally to the lowest physical level devices (memory cells, disk sectors, registers, etc.). Based on this, the SKRM model is constructed, as shown in Figure 2.2.

The SKRM model seamlessly integrates four abstraction layers of cyber situation knowledge: the Workflow Layer, App/Service Layer, Operating System Layer, and Instruction Layer. As the layer goes down, information is presented at a finer granularity in terms of technical details. These four layers are abstracted by categorizing isolated situation knowledge from different perspectives of the network. Experts with expertise in different layers can communicate with each other on the same platform provided by SKRM.

The Workflow Layer is the most human-understandable layer; it mainly captures the mission or business processes within an organization or enterprise. Organizations take workflow management as the main technology for performing business processes [16]. A workflow typically consists of a number of tasks that are essential for fulfilling a business process. Usually an organization keeps consistent and reliable workflows for its daily business. Attackers injecting malicious tasks or modifying data will cause abnormal behaviors in the workflow. Therefore, the Workflow Layer can enable cyber SA at the business level. The workflow in this layer can be generated in two ways: either manually defined by business managers, or extracted from logs with workflow mining techniques [17,18].
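To make the log-based approach concrete, the core idea can be sketched as follows. This is a minimal sketch, not the mining algorithms of [17,18]: it assumes a simplified event-log format of (case id, task) pairs ordered by time within a case, and it only recovers the direct-follows relation between tasks, which real workflow mining algorithms refine into control and data dependencies.

```python
from collections import defaultdict

def mine_direct_follows(event_log):
    """Build a direct-follows graph from an event log.

    event_log: list of (case_id, task) pairs, ordered by time within a case.
    Returns {task: set of tasks that directly follow it in some case}.
    """
    traces = defaultdict(list)
    for case_id, task in event_log:
        traces[case_id].append(task)
    follows = defaultdict(set)
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            follows[a].add(b)
    return dict(follows)

# Hypothetical log: two cases of the same business process.
log = [
    ("c1", "t1"), ("c1", "t2"), ("c1", "t3"), ("c1", "t5"),
    ("c2", "t1"), ("c2", "t2"), ("c2", "t4"), ("c2", "t6"),
]
graph = mine_direct_follows(log)
print(graph["t2"])  # tasks observed right after t2 in some case
```

A mined relation of this kind also gives a simple deviation check: a task pair observed at run time but absent from the mined relation would flag the kind of abnormal workflow behavior mentioned above.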

(Figure 2.2 shows the SKRM graph stack: in the Workflow Layer, nodes are tasks connected by control and data dependencies along executed and unexecuted paths; in the App/Service Layer, nodes are applications or services linked by service dependencies and network connections; in the Operating System Layer, nodes are system objects such as files, processes, and sockets linked by seven types of dependencies; in the Instruction Layer, nodes are registers, memory cells, or instructions linked by data and control dependencies; and a dependency attack graph of primitive fact, rule, and derived fact nodes interconnects the App/Service and Operating System Layers, extending from host to network.)

Figure 2.2: The Situation Knowledge Reference Model (SKRM) [11]

The functioning of a business process relies on a variety of applications and services. A workflow can be divided into block tasks [19], each of which is actually a sub-workflow containing a set of atomic tasks. Therefore, the execution of a workflow depends on the execution of tasks, which in turn relies on the corresponding application software. These applications further depend on a set of services, such as the web service, DNS service, etc. Therefore, the App/Service Layer is incorporated into SKRM to capture the applications and services required for workflow execution, as well as the dependency relationships between them. Service discovery and dependency analysis techniques [20] can be applied to the App/Service Layer.
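A service dependency graph of the kind shown in Figure 2.2 can be represented as a simple adjacency structure. This is an illustrative sketch under the assumption that client-to-server flows between services, identified as (host, port, protocol) triples, have already been observed by a discovery tool; the host names and port numbers below are hypothetical.

```python
def build_service_dependencies(flows):
    """Build a service dependency graph from observed flows.

    flows: iterable of (client_service, server_service) pairs, where each
    service is identified by a (host, port, protocol) triple.
    Returns {service: set of services it depends on}.
    """
    deps = {}
    for client, server in flows:
        deps.setdefault(client, set()).add(server)
    return deps

# Hypothetical flows: the web server queries the database and DNS services.
flows = [
    (("webServer", 80, "tcp"), ("dbServer", 3306, "tcp")),
    (("webServer", 80, "tcp"), ("dnsServer", 53, "udp")),
]
deps = build_service_dependencies(flows)
print(deps[("webServer", 80, "tcp")])
```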

Attackers compromise a network by exploiting security holes in applications and services. These attacks leave traces inside the operating system, such as deleted logs, prohibited accesses to password files, or abnormal system call patterns. All these operating system objects, i.e., processes and files, as well as the dependency relationships between them, are included in the Operating System (OS) Layer. The OS Layer usually adopts techniques such as system level taint tracking [21] and intrusion recovery [22].
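The essence of system level taint tracking can be sketched as a reachability computation over the OS-object dependency graph. This is a simplified sketch, not the technique of [21]: it assumes that dependency edges (for example, a process writing a file, or a file being read by a process) have already been extracted from system call logs, and the object names below are hypothetical.

```python
from collections import deque

def propagate_taint(edges, seeds):
    """Forward taint tracking over an OS-object dependency graph.

    edges: list of (src, dst) dependencies derived from system call logs.
    seeds: initially tainted objects (e.g., an attacker-controlled process).
    Returns the set of all objects reachable from the seeds.
    """
    out = {}
    for src, dst in edges:
        out.setdefault(src, []).append(dst)
    tainted, frontier = set(seeds), deque(seeds)
    while frontier:
        obj = frontier.popleft()
        for nxt in out.get(obj, ()):
            if nxt not in tainted:
                tainted.add(nxt)
                frontier.append(nxt)
    return tainted

# Hypothetical dependencies: a compromised sshd session writes a key file,
# which a later login session reads.
edges = [
    ("proc:/usr/sbin/sshd", "file:/root/.ssh/authorized_keys"),
    ("file:/root/.ssh/authorized_keys", "proc:/usr/bin/ssh-session"),
]
print(propagate_taint(edges, ["proc:/usr/sbin/sshd"]))
```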

The Instruction Layer can identify intrusions missed at the operating system layer, and assist taint analysis and attack recovery at the instruction level. It maps the entities and relationships of the OS Layer to memory cells, disk sectors, registers, kernel address space, and other devices. Techniques for intrusion harm analysis [23], including taint tracking and intrusion recovery, are often involved at the instruction layer.

The Attack Graph is not a specific layer in this stack, but rather an interconnection technique between the App/Service Layer and the Operating System Layer. By analyzing the vulnerabilities existing in the applications and services, the attack graph can generate potential attack paths for the entire network. Through the attack paths, security analysts will know which hosts are in the most danger and need to be further scrutinized. Moreover, the corresponding system objects related to the vulnerable services or applications will be highlighted.
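The path generation step can be illustrated with a toy enumeration routine. Note that this is only a sketch: MulVAL-style dependency attack graphs have AND/OR semantics over rule and fact nodes, whereas this routine simply enumerates the simple paths of a directed graph. The numeric node IDs loosely mirror the case study of Section 2.4 and are illustrative.

```python
def enumerate_attack_paths(graph, entry, goal):
    """Enumerate all simple (cycle-free) paths from entry to goal in a
    directed attack graph given as {node: [successor nodes]}."""
    paths, stack = [], [(entry, [entry])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # avoid revisiting nodes on this path
                stack.append((nxt, path + [nxt]))
    return paths

# Toy graph whose node numbers loosely mirror the dependency AG of Section 2.4.
ag = {23: [14], 14: [6, 11], 11: [9], 9: [6], 6: [4], 4: [1]}
for p in enumerate_attack_paths(ag, 23, 1):
    print(p)
```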


2.3.3 How can SKRM enable cyber situation awareness?

The SKRM model is not simply a mapping of situation knowledge in different areas to the above abstraction layers. It is in fact an integration of data, information, algorithms and tools, and human knowledge through cross-layer interaction. It interconnects the perception level elements to elevate awareness to the comprehension, projection, and resolution levels. The SKRM model has the following characteristics that enable the four levels of situation awareness:

1) Each abstraction layer generates a graph that covers the entire enterprise

network. This ensures completeness of the overall network environment awareness.

2) Each abstraction layer views the same network from a different perspective and at a different granularity. These perspectives complement, assist and confirm each other for more accurate situation awareness.

3) Each abstraction layer leverages currently available algorithms, tools, and techniques in its corresponding area to extract the most critical and useful information to present to human security analysts. Such techniques include but are not limited to workflow mining and attack recovery, service discovery and dependency analysis, system level taint tracking and recovery, and instruction level intrusion harm analysis. Newly developed algorithms, tools, or techniques can also be incorporated into SKRM to elevate its capability.

4) Cross-layer analysis is the “soul” of SKRM. SKRM captures cross-layer relationships by mapping, translating, bridging semantic gaps, and utilizing existing techniques such as the attack graph. Performing top-down, bottom-up, and U-shaped cross-layer analysis can enhance the comprehension, projection and resolution levels of security analysts' SA. For example, when a business level abnormality such as a financial loss is noticed, top-down analysis can find the damage caused by attackers in each abstraction layer: which service is compromised, which system file is deleted, or which memory cell is tainted, etc. This is an instance of damage assessment, corresponding to comprehension level SA. On the other hand, if an IDS alert is raised from the operating system layer, a bottom-up analysis will find out how the attack could impact the business level in the future. This can be viewed as an example of impact assessment or threat assessment, corresponding to projection level SA. If options of security measures and their corresponding impact are obtained through either bottom-up or U-shaped analysis, resolution level SA is achieved.
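The top-down analysis just described can be sketched as a traversal of cross-layer mapping edges. This is a minimal illustration, assuming the mappings between adjacent layers have already been established; the layer labels and node names below are hypothetical placeholders.

```python
def top_down_diagnosis(cross_layer_edges, symptoms):
    """Walk cross-layer mapping edges downward from high-level symptoms.

    cross_layer_edges: maps a (layer, node) pair to the (layer, node) pairs
    it is linked to at lower abstraction layers.
    symptoms: starting (layer, node) pairs, e.g., a suspicious workflow task.
    Returns all implicated nodes grouped by layer.
    """
    implicated = {}
    frontier = list(symptoms)
    seen = set(frontier)
    while frontier:
        layer, node = frontier.pop()
        implicated.setdefault(layer, set()).add(node)
        for nxt in cross_layer_edges.get((layer, node), ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return implicated

# Hypothetical mappings: a workflow task maps to a service, which maps to
# the OS objects supporting it.
edges = {
    ("workflow", "t2"): [("app/service", "sshd@webServer")],
    ("app/service", "sshd@webServer"): [
        ("os", "proc:/usr/sbin/sshd"),
        ("os", "file:/etc/passwd"),
    ],
}
print(top_down_diagnosis(edges, [("workflow", "t2")]))
```

A bottom-up analysis is the same traversal run over the reversed edges, starting from a low-level alert instead of a business-level symptom.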

2.4 Case Study

A case study is presented to demonstrate that the SKRM graph stack is useful for enabling capabilities toward holistic perception and comprehension. It also illustrates the practical generation of the SKRM graph stack for performing cross-layer analysis.

2.4.1 Implementation

To illustrate the application of the SKRM framework to cyber security analysis, we implement a web-shop in our test-bed, which uses a business scenario similar to the one described in [16]. To observe the network under cyber-attack, we further implement a 3-step attack scenario as in [50,51] with different vulnerability choices (CVE-2008-0166, OpenSSL brute-force key guessing; NFS mount misconfiguration; CVE-2009-2692, bypassing mmap_min_addr). The test-bed business and attack scenario is shown in Figure 2.3.

(Figure 2.3 shows the test-bed network: an attacker on the Internet, with http and ssh access through the DMZ firewall, brute-forces the Web Server (httpd, sshd) running the e-commerce travel agency, which interacts with hotel, car rental, and bank services; behind the Intranet firewall, the NFS Server (nfsd, mountd, sshd) provides shared binaries/files via NFS mount and a Database Server (mysqld) supports the shop; behind the inside firewall, a Trojan horse reaches the Financial Workstation (sshd) holding financial confidentials.)

Figure 2.3: The Testbed Network and Attack Scenario [12]

In addition, we also deploy intrusion detectors and auditing tools in our web-

shop test-bed, such as the Nessus server to scan for the vulnerability and machine

information of all the hosts, the MulVAL reasoning engin to generate the attack

graph, Snort and Ntop to detect intrusions and monitor the network tra�c, and

strace to intercept and log system calls. We leverage these situation knowledge

collectors to acquire real data for further cross-layer security diagnosis.


2.4.2 Capability 1: Mission Asset Identification and Classification

Usually an obvious intrusion symptom in an enterprise is a business level financial loss. The responsibility of security analysts is to reason over such symptoms so as to identify the exact intrusion root and all the infected mission assets, for better protection and recovery. That is, the capability of mission asset identification and classification is required. As shown in Figure 2.4, top-down cross-layer SKRM diagnosis enables this capability.

Generally, mission asset identification and prioritization aims at the identification and classification of host-switch level, application level and OS-object level mission critical assets into classes such as “polluted”, “clean but in danger”, and “clean and safe”. For example, the business managers of the web-shop found the profit much lower than expected. Through analysis on the Workflow Layer (Figure 2.2), the security analysts suspected that non-member attackers cheated by getting service from the web-shop via the member service path P2. According to the control dependence relations in the workflow, they found that task t2 is responsible for changing the execution path from P1 to P2 (step 1). So they tracked down the cross-layer edges between the Workflow Layer and the App/Service Layer, with particular inspection of task t2 (step 2). Such cross-layer edges revealed the critical host-switch level mission assets involved in transactions about t2: the Web Server, NFS Server and Workstation. Hence, as the most probable attack goals, these assets were tagged as “clean but in danger”. The analysts further tracked down the cross-layer edges between the App/Service Layer and the OS Layer (step 3), and found that there were four possible attack paths in the dependency AG (Figure 2.5): {23,

(Figure 2.4 illustrates the five-step top-down diagnosis: starting from the financial loss symptom at the Workflow Layer, step 1 finds that t2 changed the execution path from the non-member service path P1 to the member service path P2; steps 2 and 3 traverse cross-layer edges downward to tag the host-switch level assets (Web Server, NFS Server, Workstation) and the application level assets involved in the attack paths as “clean but in danger”; step 4 tags the mapped OS-object level assets likewise; step 5 updates these assets to “polluted” once the “repeating” dependency pattern on the OS Layer graph is mapped to a vulnerability exploitation in the dependency AG, and forward inter-host dependency/taint tracking propagates the pollution.)

Figure 2.4: Mission Asset Identification and Classification [12]

14, 6, 4, 1, 16, 14, 11, 9, 6, 4, 1, 16, 14, 6, 4, 1 and 23, 14, 11, 9, 6, 4, 1. The four paths all lead to the compromise of Web Server, NFS Server, and Workstation, but exploit vulnerabilities of different applications/services. Figure 2.5 differentiates the paths with red, blue, purple and green colors respectively. All the application level mission assets involved in the four attack paths were regarded as "clean but in danger": tikiwiki and sshd for the Web Server, samba and unfsd for the NFS Server, and the Linux kernel (2.6.27) for the Workstation.



Figure 2.5: The Dependency Attack Graph [12]

The analysts continued to track down the cross-layer edges from the dependency AG to the OS Layer, and identified fine-grained OS-object level mission assets, including the process /usr/sbin/sshd and the files /root/.ssh/authorized_keys, /etc/passwd, and /etc/ssh/ssh_host_rsa_key for the Web Server (step 4). These objects were considered "clean but in danger". The mapping between the "repeating" dependency pattern on the OS Layer graph and Node 14 in the dependency AG confirmed the exploitation of CVE-2008-0166. Therefore, the above-mentioned OS objects related to this vulnerability on the Web Server could be determined to be "polluted". Further forward dependency tracking on the dependency graph discovered that a file named /mnt/wunderbar_emporium.tar.gz was created and thus "polluted" on the Web Server (step 5). Inter-host OS dependency tracking helped reveal the propagation of such pollution: the file sharing directory /export on the NFS Server was "polluted"; the files or directories named /home/workstation/workstation_attack/wunderbar_emporium, /mnt/wunderbar_emporium.tar.gz, and /home/workstation on the Workstation were all "polluted". In a similar way, the memory cells or disk sectors at the Instruction Layer corresponding to these system objects could also be classified into these categories.

Through reverse tracking to the upper layers, the status of the Web Server and its service sshd, the NFS Server and its services unfsd and mountd, and the Workstation and its service sshd were all updated from "clean but in danger" to "polluted". In short, through such top-down cross-layer SKRM-based analysis, mission assets at the host-switch level, application/service level and OS-object level could all be identified and further classified into the classes "polluted", "clean but in danger" and "clean and safe".

2.4.3 Capability 2: Mission Damage and Impact Assessment

Defending missions in cyber space from various attacks continues to be a challenge. An effective attack can lead to great loss in the confidentiality, integrity, or availability of the missions, and can even cause some to abort in extreme cases [90]. When an attack happens, one major concern to the security administrators is how the attack could possibly impact related missions. Specifically, they may ask questions such as: 1) How likely is a mission affected? 2) To what extent is the mission influenced? Which tasks are already tainted, and which are untouched? Continuous efforts have been made to construct high-level models that aid mission impact analysis, but concrete methods that achieve accurate quantitative assessment are rare. Jakobson [90] constructs an impact dependency graph (IDG) for mission situation assessment. Nevertheless, that paper does not specify a detailed method for generating the dependencies in the IDG. The impact assessment provided


Figure 2.6: Mission Damage and Impact Assessment [12]

by the IDG is not sufficiently precise.

Security monitoring systems, such as Snort, Tripwire, anti-virus tools, etc., are effective at providing intrusion alerts, but they do not reveal the exact damage and impact. As shown in Figure 2.6, the U-shape cross-layer SKRM-enabled analysis helps to achieve a comprehensive damage and impact assessment.

The scenario begins with a normal status for the web-shop business, but Snort suddenly gives an alert indicating a brute force attack on the Web Server (sshd). The security analyst would like to investigate the Web Server and starts by inspecting (scanning) its application and service information (step 1). Downward traversing the cross-layer edges between the App/Service Layer and the OS Layer reveals the repeated pattern of accessing sshd-related processes and files, confirming the occurrence of Node 14 (indicating a successful exploit) in the dependency AG (step 2). Further along the process, inter-host dependency tracking at the OS Layer identifies the intrusion taint seeds: the file named /mnt/wunderbar_emporium.tar.gz on the Web Server, the directory named /export on the NFS Server, and the files or directories named /mnt/wunderbar_emporium.tar.gz, /home/workstation/workstation_attack/wunderbar_emporium and /home/workstation on the Workstation (step 3). Using these as input, downward traversing the cross-layer edges between the OS Layer and the Instruction Layer helps to identify the tainted memory and disk units (step 4). Forward inter-host taint tracking at the Instruction Layer then locates the fine-grained impacts on the victim hosts (step 5). At this point, the OS-level and Instruction-level damage has been identified: the above files and directories were all infected and performing malicious actions at the OS Layer, and their memory or disk space was therefore tainted at the Instruction Layer.

This triggered the analyst to perform another round of bottom-up analysis to comprehend the damage at the other layers. The analyst tracks upward along the cross-layer edges between the OS Layer and the dependency AG, and determines the attack path (steps 6 and 7). The attack path, combined with the abnormal behavior at the OS Layer, leads the analyst to the missing piece, the attacker's intrusion intent: the financial membership information under the directory named /home/workstation on the Workstation is the evidence of the root cause of the damage. The mappings between the dependency AG and the App/Service Layer show the specific pre-conditions of the exploits (step 8). The vulnerabilities and inappropriate configurations at the App/Service Layer allowed the damage to be caused. Finally, the analyst tracks upward along the cross-layer relationships between the App/Service Layer and the Workflow Layer (step 9), and finds that task t2 was compromised, so the web-shop's service path was changed from the non-member service path {t1, t2, t3, t4, t6, t7} to the member service path {t1, t2, t5, t6, t7} at the Workflow Layer, enabling significant financial damage to occur.

In short, SKRM enables a U-shape cross-layer analysis, as illustrated in Figure 2.6, to assess systematic damage and its impact from multi-layer semantics.

The rest of this section introduces a concrete approach for mission impact assessment. The approach is to 1) build a System Object Dependency Graph (SODG) so that the intrusion propagation process is captured at the system object level; 2) construct a Mission-Task-Asset (MTA) map to associate the missions and their composing tasks with the corresponding assets, namely the system objects such as processes, files, etc., where the MTA map is naturally connected to the SODG through shared system objects; and 3) establish a Bayesian network based on the MTA map and the SODG to leverage the collected intrusion evidence and infer the probabilities of events of interest, such as a system object or a mission task being tainted.

The approach is proposed on the basis of the following supporting rationales. First, the SODG is a proper construct connecting the attack and the missions, as shown in Figure 2.7. From the attack side, an attack's impact on the operating systems can be reflected on the SODG. System objects that are manipulated directly or indirectly by attackers have the possibility of being tainted. From the mission side, a mission is fulfilled through a sequence of operations on system objects. These operations are caught by the SODG. As a result, the impact of an attack on the missions can be evaluated by leveraging the SODG as the intermediate bridge.

Second, the SODG is able to capture the intrusion propagation process, which is critical for correct mission impact assessment. An attack's impact on a mission may not be explicit when the two have no common associated assets. The attack-associated assets refer to the system objects that are directly related to the attack activities (e.g., a modified file in a Tripwire [60] alert), while the mission-associated assets refer to the system objects that are involved in the mission commitment.

Figure 2.7: The SODG as the Construct between Attack and Mission [13] 1

The mission-associated assets do not always share the same system objects with the attack-associated assets, but they can still be affected by the latter through intrusion propagation. In this case, the SODG can be employed for tracking the intrusion propagation and assessing the missions that are indirectly affected by the attack-associated assets.

Third, a Bayesian network is able to leverage intrusion evidence to perform probabilistic inference towards events of interest. The evidence can be collected from a variety of information sources, including system logs, security sensors such as Snort [54] and Tcpdump [78], and even human experts.

2.4.3.1 The System Object Dependency Graph

In essence, a mission can be decomposed into a set of tasks, which are then committed through a number of operating system operations via system calls, such as read, write, execve, fork, kill, etc. These system calls operate on system objects like processes, files, and sockets. For instance, the system call read can read from a file and fork creates a copy of a process. An intrusion usually begins with one or more tainted system objects that are directly or indirectly manipulated by attackers. For example, an execution file containing a Trojan horse may have been installed on a host; a service may have been compromised with a rootkit program and have started sending sensitive data back to the attackers' machine; or some critical data that influences the control flow could have been corrupted so that the execution paths of a mission workflow are changed. In subsequent system calls, these intrusion-originating system objects will interact with other innocent objects and get them tainted. This is an intrusion propagation process. In this way, the intrusion can propagate across a number of systems inside a network. Among all the system objects tainted via intrusion propagation, some could be the mission-associated ones, so that the related tasks will get impacted as well.

1 The SODG is used to show how the intrusion can propagate from the attack-associated assets to the mission-associated assets. Readers are not expected to understand the details inside the nodes of the SODG.

Given the system call log, a System Object Dependency Graph (SODG) can be constructed to capture the intrusion propagation process [41]. Each system call is first parsed into three elements: a source object, a sink object, and a dependency relation between them. This work applies similar rules, shown in Table 2.1, as in [21, 22, 41] for system call parsing. When constructing the SODG, the parsed objects become nodes and the dependency relations become edges. For example, a read system call can be parsed into a process object p, a file object f, and a dependency relation f→p, meaning that p depends on f.

Figure 2.8b shows an example SODG built from the simplified system call log in Figure 2.8a. Processes, files, and sockets are represented with rectangles, ellipses, and diamonds respectively. A process is often uniquely identified by the process


Table 2.1: System Call Dependency Rules

Dependency        System calls
process→file      write, pwrite64, rename, mkdir, fchmod, chmod, fchownat, etc.
file→process      stat64, read, pread64, execve, etc.
process→process   vfork, fork, kill, etc.
process→socket    write, pwrite64, send, sendmsg, etc.
socket→process    read, pread64, recv, recvmsg, etc.
socket→socket     sendmsg, recvmsg, etc.

PID pid and the parent process PID ppid, and thus can be denoted with a tuple (pid:ppid). Similarly, a file and a socket can be denoted with the tuples (inode:path) and (addr:port).

The SODG construction process for Figure 2.8b is as follows. First, the system call clone is parsed into a dependency (6149:6148)→(6558:6149). The dependency becomes an edge between the two processes. Second, the system call write forms a dependency between a process and a socket: (6558:6149)→(192.168.101.5:22). The dependency becomes an edge between the process and the socket. Third, the system call read indicates that the process then reads a file, and thus creates a dependency (19859:/proc/6558/)→(6558:6149). Finally, the process writes back to the same file, and forms a dependency (19859:/proc/6558/)←(6558:6149).
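As a minimal sketch of this construction (using the four parsed dependencies above; the tuple encoding and function names are illustrative assumptions, not the authors' implementation), the parsed system calls can be turned into an SODG edge list as follows:

```python
# Minimal sketch: turning parsed system calls into SODG edges.
# Each parsed call yields (source, sink, time); an edge src -> sink
# means "sink depends on source", e.g. a read gives file -> process.
parsed_calls = [
    (("proc", "6149:6148"), ("proc", "6558:6149"), "t1"),          # clone
    (("proc", "6558:6149"), ("sock", "192.168.101.5:22"), "t2"),   # write
    (("file", "19859:/proc/6558/"), ("proc", "6558:6149"), "t3"),  # read
    (("proc", "6558:6149"), ("file", "19859:/proc/6558/"), "t4"),  # write
]

def build_sodg(calls):
    """Build adjacency lists: node -> list of (successor, time)."""
    sodg = {}
    for src, sink, t in calls:
        sodg.setdefault(src, []).append((sink, t))
        sodg.setdefault(sink, [])  # make sure sinks appear as nodes too
    return sodg

sodg = build_sodg(parsed_calls)
for node, succs in sodg.items():
    for succ, t in succs:
        print(f"{node[1]} -> {succ[1]} at {t}")
```

The resulting graph has the four nodes of Figure 2.8b, with one timestamped edge per parsed system call.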

After the SODG is constructed, forward and backward tracking can be performed to identify the potentially tainted objects. Since an attack can often cause security sensors to raise alerts, the system objects involved in these alerts can be used as the trigger points that start the tracking process. For example, if Tripwire raises an


(a) Simplified system call log:

syscall:clone time:t1 pid:6149 ppid:6148 pcmd:bash cpid:6558 cppid:6149 cpcmd:bash
syscall:write time:t2 pid:6558 ppid:6149 pcmd:sshd ftype:SOCK addr:192.168.101.5 port:22
syscall:read time:t3 pid:6558 ppid:6149 pcmd:mount ftype:REG path:/proc/6558/ inode:19859
syscall:write time:t4 pid:6558 ppid:6149 pcmd:sshd ftype:REG path:/proc/6558/ inode:19859

(b) SODG: nodes (6149:6148), (6558:6149), (192.168.101.5:22), and (19859:/proc/6558/), with edges labeled t1 through t4.

Figure 2.8: An example SODG built from the simplified system call log [13]

alert that a file is modified abnormally, then the file can be used as a trigger point. On the SODG, the file is marked as tainted. Starting from this file, forward and backward tracking can be performed to generate an intrusion propagation path [41]. The objects on this path are very likely to be tainted.
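The tracking step can be illustrated with a small sketch: assuming each SODG edge carries a timestamp, forward tracking from a trigger point follows only edges whose times do not decrease along the path. The toy graph, names, and policy below are simplified assumptions for illustration, not the tracking algorithm of [41]:

```python
# Minimal sketch of forward tracking on an SODG from a trigger point.
# Edges are (source, sink, time); an object reached from the trigger
# through time-increasing edges is potentially tainted.
EDGES = [
    ("fileA", "procP", 1),   # procP reads fileA (fileA is the trigger)
    ("procP", "fileB", 2),   # procP writes fileB
    ("procP", "sockS", 3),   # procP sends data over sockS
    ("fileC", "procP", 4),   # a later input; fileC itself stays untainted
]

def forward_track(trigger, edges):
    """Return objects reachable from `trigger` via time-increasing edges."""
    tainted = {trigger: 0}  # object -> earliest time it became tainted
    for src, sink, t in sorted(edges, key=lambda e: e[2]):
        if src in tainted and t >= tainted[src]:
            tainted.setdefault(sink, t)
    return set(tainted)

print(forward_track("fileA", EDGES))  # taints procP, fileB, and sockS
```

Backward tracking is symmetric: it follows edges in reverse with non-increasing timestamps to find the possible origins of a tainted object.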


2.4.3.2 Mission-Task-Asset Map

Constructing the Mission-Task-Asset (MTA) map relates the system objects to the tasks and missions. An intuitive solution is to decompose the missions into tasks, and further associate the tasks with system objects. However, this top-down decomposition approach requires prior knowledge of a mission workflow. In cases where attackers are able to insert malicious tasks into the workflow, these inserted tasks could be missed by the MTA map.

In this work, we propose a bottom-up extraction approach that extracts the tasks from the SODG, and then relates the tasks to specific missions, as shown in Figure 2.9. Since the SODG captures what actually happens in the network, extraction from the SODG accurately reflects which tasks are actually committed. Considering the manageable number of missions and tasks an enterprise network deals with, relating tasks to missions is not a real issue. The key difficulty lies in how to extract tasks from the SODG, given its daunting size. However, the extraction is ensured to be feasible by the following principles.

First, a mission task can be viewed as an instantiation of several services that have dependency relations. In enterprise networks, the normal function of a service may depend on one or more other services. These services and applications often interact and work together to accomplish specific tasks. For example, a user's login request requires web service from a web server, which further relies on an authentication service to verify the user's legitimacy. The authentication will then depend on the database service to access the users' account information. In this example, a single task "user login" can be viewed as the instantiation of the combined web service, authentication service, and database service. Therefore,

2 Again, readers are not expected to understand the details inside the nodes of the SODG.


Figure 2.9: Mission-Task-Asset Map [13] 2

if such dependency relations among services can be discovered and represented with specific graphs, then a task can be viewed as the instantiation of a service dependency graph.

Second, through service discovery, the service dependency graphs (SDGs) can be established at the system object level. Service discovery has been studied intensively in previous work [91–94]. Dai [95] proposed to infer the service dependencies by identifying OS-level causal paths. Therefore, the service dependencies can be represented with OS-level dependency graphs, such as the SODGs. Each service dependency graph has a pattern that can be used to identify the corresponding SDG. The patterns could be defined from the perspective of both text and graph topology. For example, a file node with the name config and an out-degree of n can be one feature of a specific pattern, indicating that the file config is accessed n times. Since servers in an enterprise network often fulfill routine responsibilities, the common patterns can be extracted to form an SDG pattern repository.

Third, the system assets can be linked to tasks automatically by matching the SODG against the SDG patterns. Although the SODG is usually not human-readable, it can be annotated with specific SDGs through pattern matching. For example, if the pattern for the combined web service, authentication service, and database service appears in the SODG several times, then, as instantiations of these services, several "user login" tasks can be linked to the system objects involved in these patterns.
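To make the pattern idea concrete, the sketch below checks one text-plus-topology feature (a node whose name contains config with a given out-degree) against a toy SODG. The graph, node naming scheme, and function are invented for illustration and are not the actual repository format:

```python
# Toy SODG as adjacency lists: node -> set of successor nodes.
sodg = {
    "file:config": {"proc:web", "proc:auth", "proc:db"},
    "proc:web":    {"sock:80"},
    "proc:auth":   {"proc:db"},
    "proc:db":     set(),
    "sock:80":     set(),
}

def matches_feature(graph, name, out_degree):
    """True if some node contains `name` and has the given out-degree."""
    return any(
        name in node and len(succs) == out_degree
        for node, succs in graph.items()
    )

# The config file here feeds three services, so out-degree 3 matches.
print(matches_feature(sodg, "config", 3))
print(matches_feature(sodg, "config", 5))
```

A full matcher would combine many such features (names plus subgraph topology) and report each matched region as one task instantiation.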

2.4.3.3 MTA-based Bayesian Networks

To perform probabilistic mission impact assessment, Bayesian networks can be constructed based on the established MTA maps. A Bayesian network (BN) is a type of directed acyclic graph that can be used to model cause-and-effect relations. In a BN, the nodes represent the variables of interest, and the edges represent the causality relations between nodes. The strength of such causality relations can be specified with conditional probability tables (CPTs). When evidence is provided, a properly constructed BN can infer the probabilities of the variables of interest.
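As a toy illustration of such inference (the structure and CPT values below are invented for this sketch, not taken from the dissertation), consider a BN in which a tainted object O influences both a sensor alert A and a task status T; enumerating over O yields the posterior probability that T is tainted given an alert:

```python
# Minimal BN sketch: O (object tainted) is a parent of both
# A (sensor alert raised) and T (task tainted).
# True means tainted/raised; the CPT numbers are illustrative only.
P_O = {True: 0.1, False: 0.9}                    # prior P(O)
P_A = {True:  {True: 0.9,  False: 0.1},          # P(A | O)
       False: {True: 0.05, False: 0.95}}
P_T = {True:  {True: 0.8,  False: 0.2},          # P(T | O)
       False: {True: 0.01, False: 0.99}}

def posterior_T_given_A(a_obs):
    """P(T = true | A = a_obs), computed by enumerating over O."""
    num = den = 0.0
    for o in (True, False):
        joint = P_O[o] * P_A[o][a_obs]   # P(O = o, A = a_obs)
        den += joint
        num += joint * P_T[o][True]      # add P(O = o, A = a_obs, T = true)
    return num / den

print(round(posterior_T_given_A(True), 3))
```

Observing the alert raises the belief that the task is tainted well above its unconditional level, which is exactly the kind of evidence-driven update the MTA-based BN performs at scale.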

In this section, we propose to construct an MTA-based BN, whose input is the intrusion evidence collected from various security sensors, and whose output is the probabilities of security events of interest, such as a system object or a task being tainted. The graphical nature of the MTA map enables and facilitates the construction of the MTA-based BN. With the CPTs specified and the evidence incorporated, the MTA-based BN is able to infer the probabilities of tasks and missions being tainted, and thus evaluate the impact of attacks on the missions of interest.

To build the MTA-based BN, the dependency relations existing in the MTA

map need to be well modeled. Each MTA map implies certain dependency relations

among the missions, tasks, and system objects. Such dependency relations can

be represented with certain mission dependency graphs by interpreting the MTA

maps. In the mission dependency graph, the status of a mission depends on the

status of the composing tasks, while the status of a task depends on the status of

the relevant system objects. We provide two example mission dependency graphs

based on the same MTA map to illustrate how the dependency relations can be

interpreted.

Figure 2.10 is an example of a benign mission dependency graph obtained by interpreting

an MTA map. In this graph, a mission is composed of several tasks. For each

mission to be benign, all of its composing tasks should be benign. In addition, all

the tasks should be committed in the correct sequence. Similarly, each task is also

composed of several system level operations. To ensure the task is benign, the

related system objects should be benign and the operations should be performed in

the right sequence. Therefore, all of the parent nodes have the “AND” relation for

the child node to be true. In Figure 2.10, Node 5 “Task 1 is benign” should have

4 preconditions satisfied in order to be true: Node 1, F1 is benign; Node 2, P1 is

benign; Node 3, F2 is benign; Node 4, “Process P1 reads File F1” happens before

“Process P1 writes File F2”, meaning that the read operation is executed before

the write operation. In this example, in order for Node 5 to become true, all the

relevant system objects must be benign and all the system operations must be performed

in the right sequence. The relationship between these conditions (Node 1 to 4) is

“AND”.

[Figure content: 1: F1 is benign; 2: P1 is benign; 3: F2 is benign; 4: F1 -> P1 is before P1 -> F2; 5: Task 1 is benign; 6: P1 is benign; 7: F2 is benign; 8: Task 2 is benign; 9: Task 1 is before Task 2; 10: Mission 1 is benign; parent nodes are joined by AND.]

Figure 2.10: An Example of Benign Mission Dependency Graph [13]

Figure 2.11 is an example of a tainted mission dependency graph obtained by interpreting

the same MTA map as in Figure 2.10. In this graph, if any of the system objects are

tainted or the system operations are not performed in the right order, the associated

task can be marked as tainted. Similarly, if any of the tasks are tainted or not

committed in the correct sequence, the associated mission is tainted. Therefore, all

the parent nodes have the “OR” relation for the child node to be true, meaning

any of the preconditions being true could make the post-condition effective. For

example, even if only F1 in Node 1 is tainted while F2 and P1 are still benign, Task

1 will get tainted, which will further impact Mission 1.
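The AND semantics of the benign graph and the OR semantics of the tainted graph can be captured with a small deterministic sketch; the function and variable names below are our own illustration, not taken from the dissertation:

```python
# AND semantics (benign graph, Figure 2.10) vs. OR semantics
# (tainted graph, Figure 2.11) of a mission dependency graph.

def task_tainted(objects_tainted, order_violated):
    """OR relation: any tainted object or a violated operation order
    taints the task."""
    return any(objects_tainted) or order_violated

def mission_tainted(tasks_tainted, task_order_violated):
    """OR relation at the mission level."""
    return any(tasks_tainted) or task_order_violated

def mission_benign(tasks_benign, task_order_correct):
    """AND relation: all tasks benign and committed in order."""
    return all(tasks_benign) and task_order_correct

# Only F1 is tainted; every operation occurs in the right order.
t1 = task_tainted([True, False, False], order_violated=False)  # F1, P1, F2
t2 = task_tainted([False, False], order_violated=False)        # P1, F2
print(mission_tainted([t1, t2], task_order_violated=False))    # -> True
```

A single tainted object (F1) propagates through Task 1 to the mission, mirroring the OR relation described above.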


[Figure content: 1: F1 is tainted; 2: P1 is tainted; 3: F2 is tainted; 4: F1 -> P1 is NOT before P1 -> F2; 5: Task 1 is tainted; 6: P1 is tainted; 7: F2 is tainted; 8: Task 2 is tainted; 9: Task 1 is NOT before Task 2; 10: Mission 1 is tainted; parent nodes are joined by OR.]

Figure 2.11: An Example of Tainted Mission Dependency Graph [13]

To model the above “AND” and “OR” relations, an MTA-based BN can be

constructed as shown in Figure 2.12. Instead of specifying the taint status of

objects, tasks, and missions in the nodes directly, the MTA-based BN specifies the

states in the CPT tables. For example, the CPT table for Mission 1 in Figure 2.12

is shown in Table 2.2. In this table, Mission 1, Task 1, and Task 2 have possible

states of “tainted” and “not tainted”. The operation sequence “Task 1 is before

Task 2” in Node 9 has the states of “true” and “false”. Other potential states,

such as “clear but in danger” or “not sure”, could also be assigned to system

objects depending on specific situations.

In addition, the numbers in Table 2.2 model the “AND” and “OR” relations.


Table 2.2: CPT of Mission 1 in Figure 2.12 [13]

                        Task 1 = Tainted                      Task 1 = Untainted
             Task 2 = Tainted   Task 2 = Untainted   Task 2 = Tainted   Task 2 = Untainted
Mission 1    C = T    C = F     C = T    C = F       C = T    C = F     C = T    C = F
Tainted      1        1         1        1           1        1         1        0
Untainted    0        0         0        0           0        0         0        1

Note: C represents the condition “Task 1 is committed before Task 2”

For example, for “Mission 1 = not tainted” to have a probability of 1, all three

conditions “Task 1 is tainted”, “Task 2 is tainted”, and “Task 1 is before Task 2”

must be false. If any of these three conditions is true, the probability

for “Mission 1 = tainted” will become 1. If the three conditions have different

impact on the mission’s taint status, the numbers in the CPT table can be modified

accordingly to reflect such a difference. For example, in Table 2.3, “Task 1 is tainted”

has greater impact on missions than the other two conditions. When “Task 1 is

tainted”, the probability of the mission being tainted is at least 0.9, no matter

whether the other conditions are true or false. When Task 1 is not tainted, the probability

for the mission being tainted is very low, even if Task 2 is tainted or the operation

sequence is incorrect. The CPT table can also be modified to accommodate other

noise factors that cannot be completely taken into consideration. For example, in

Table 2.3, even if all three conditions are true, the probability of Mission 1

being tainted may not be 1, but a number very close to 1, such as 0.99.

After the BN is constructed, the taint status of system objects is input into the BN

as evidence. The BN then computes the probabilities of missions being infected

based on the given evidence.
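As a concrete sketch of this computation, the following brute-force marginalization uses the CPT values of Table 2.3 for the Mission 1 node; the prior probabilities assigned to Task 1, Task 2, and the ordering condition C are illustrative assumptions, not values from the dissertation:

```python
# Forward inference for Mission 1 using the Table 2.3 CPT.
# Keys are (task1, task2, c): True means "tainted" for the tasks
# and "true" for condition C ("Task 1 is committed before Task 2").

from itertools import product

CPT = {
    (True,  True,  True):  0.99, (True,  True,  False): 0.9,
    (True,  False, True):  0.9,  (True,  False, False): 0.9,
    (False, True,  True):  0.2,  (False, True,  False): 0.2,
    (False, False, True):  0.2,  (False, False, False): 0.01,
}

def p_mission_tainted(p_t1, p_t2, p_c):
    """Marginalize Mission 1's taint probability over its three parents."""
    total = 0.0
    for t1, t2, c in product([True, False], repeat=3):
        weight = ((p_t1 if t1 else 1 - p_t1)
                  * (p_t2 if t2 else 1 - p_t2)
                  * (p_c if c else 1 - p_c))
        total += weight * CPT[(t1, t2, c)]
    return total

print(round(p_mission_tainted(0.5, 0.5, 0.5), 4))  # uniform priors -> 0.5375
print(round(p_mission_tainted(1.0, 0.5, 0.5), 4))  # Task 1 tainted  -> 0.9225
```

Setting a parent to probability 1 plays the role of entering evidence: observing Task 1 as tainted raises the mission's taint probability from about 0.54 to about 0.92.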


[Figure content: nodes 1: F1; 2: P1; 3: F2; 4: F1 -> P1 is before P1 -> F2; 5: Task 1; 6: P1; 7: F2; 8: Task 2; 9: Task 1 is before Task 2; 10: Mission 1.]

Figure 2.12: An Example of MTA-based BN [13]

Table 2.3: Modified CPT of Mission 1 in Figure 2.12 [13]

                        Task 1 = Tainted                      Task 1 = Untainted
             Task 2 = Tainted   Task 2 = Untainted   Task 2 = Tainted   Task 2 = Untainted
Mission 1    C = T    C = F     C = T    C = F       C = T    C = F     C = T    C = F
Tainted      0.99     0.9       0.9      0.9         0.2      0.2       0.2      0.01
Untainted    0.01     0.1       0.1      0.1         0.8      0.8       0.8      0.99

Note: C represents the condition “Task 1 is committed before Task 2”


2.5 Related Work

Mission Impact Assessment. Some high level frameworks and models have

been established in recent studies to enable qualitative evaluation of cyber

attacks’ impact on missions. Alberts et al. [97] proposed a Mission Assurance

Analysis Protocol (MAAP) to determine how the current conditions can affect

a project. Watters et al. [98] proposed a Risk-to-Mission Assessment Process to

map the network nodes to the business objectives. Musman et al. [96] clarified the

cyber mission impact assessment framework and related the business processes to

technology capacities. Dai et al. [11] proposed a Situation Knowledge Reference

Model (SKRM) that enables capabilities such as asset classification, mission damage

and impact assessment. [90] is one of the few works that explore quantitative mission

impact assessment. It presented an impact-oriented cyber attack model, where an

attack has an impact factor and the asset is measured with operational capacity.

The assets’ operational capacity will be affected by the attack’s impact factor. The

paper then briefly introduced the impact dependency graph (IDG), but didn’t

provide details for the construction method.

Bayesian Network. Bayesian networks have been employed in a number of

studies for cyber security defense. [55] presented a BN modeling approach which

modeled three uncertainty types in the security analysis process. The BN was

constructed on top of the logical attack graphs [50,51]. [14] proposed to construct a

cross-layer Bayesian network to infer stealthy bridges existing between the enterprise

network islands in the cloud. [99] described a mission-impact-based approach to correlate

the security alarms collected from different sensors using Bayesian networks. An

incident rank tree was built to calculate the rank of each security alert, which


combines the incident’s impact towards the mission, and the success probability of

the activity reported in the alert. Our work also applies Bayesian networks, but

targets a different problem.

2.6 Discussion

From the case study above, we identify that SKRM-enabled analytics can exceed the

reach of intrusion detection and attack graph analysis, through inter-compartment

awareness and cross-layer analysis (top-down, bottom-up, U-shape, etc.). SKRM

actually has the potential to enable other capabilities. For example, attack path

determination and attack intent identification were also involved in the above

U-shape cross-layer diagnosis. These potential capabilities will be explored in

future work, including but not limited to:

1) U-shape cross-layer diagnosis may help us understand the adversary activity,

including the attack path determination and attack intent identification.

2) Bottom-up cross-layer analysis may help evaluate mission impact.

3) Cross-layer Bayesian networks could be constructed to reason about uncer-

tainty.

4) Top-down cross-layer analysis may help us construct mission asset map based

on asset classification.

5) Comprehensive analysis may help us simulate di�erent strategic mitigation

plans.

6) Comprehensive analysis may provide insights for intrusion recovery.

7) Knowledge representation could be enabled for cognitive engineering.

In addition to these potentials, the current SKRM and SKRM-enabled analytics

have some limitations. Although some tools have been developed to generate parts


of the SKRM graph stack, the current version of SKRM is still semi-automatic,

providing computer-aided, human-centric cyber SA. Additional work is still required

to evaluate the utility of SKRM at the scale of a real enterprise and in more complex

scenarios. Our future work will focus on addressing such limitations.

2.7 Conclusion

To achieve cyber situation awareness, the role of human cyber analysts should be

considered explicitly in the design of security tools, algorithms, and techniques,

such as attack graphs. Therefore, based on existing theories of situation awareness, a

cyber Situation Knowledge Abstraction model and an embedded SKRM model are

constructed to enhance the coupling of current techniques to situation awareness

to enable security analysts’ effective analysis of complex cyber-security problems.

Current and future work is to demonstrate the potential capabilities of SKRM

model for enabling cyber situation awareness.


Chapter 3 | Inferring the Stealthy Bridges between Enterprise Network Islands in Cloud Using Cross-Layer Bayesian Networks

3.1 Introduction

Enterprises have begun to move parts of their networks (such as web server, mail

server, etc.) from traditional infrastructure into cloud computing environments.

Cloud providers such as Amazon Elastic Compute Cloud (EC2) [33], Rackspace [34],

and Microsoft’s Azure cloud platform [35] provide virtual servers that can be rented

on demand by users. This paradigm enables cloud customers to acquire computing

resources with high efficiency, low cost, and great flexibility. However, it also

introduces some security issues that are yet to be solved.



Figure 3.1: The Stealthy Bridges between Enterprise Network Islands in Cloud [14]

A public cloud can provide virtual infrastructures to many enterprises. Except

for some public services, enterprise networks are expected to be like isolated islands

in the cloud: connections from the outside network to the protected internal

network should be prohibited. Consequently, an attack path that shows the multi-

step exploitation sequence in an enterprise network should also be confined inside

this island. However, as enterprise networks migrate into the cloud and replace

traditional physical hosts with virtual machines, some “stealthy bridges” could

be created between the isolated enterprise network islands, as shown in Fig. 3.1.

Moreover, with the stealthy bridges, the attack path confined inside an enterprise

network is able to traverse to another enterprise network in the cloud.

The creation of such “stealthy bridges” is enabled by two unique features of the

public cloud. First, cloud users are allowed to create and share virtual machine

images (VMIs) with other users. In addition, cloud providers provide VMIs with


pre-configured software, saving users the effort of installing the software from scratch.

These VMIs provided by both cloud providers and users form a large repository. For

convenience, users can take a VMI directly from the repository and instantiate it

with ease. The instance virtual machine inherits all the security characteristics from

the parent image, such as the security configurations and vulnerabilities. Therefore,

if a user instantiates a malicious VMI, it’s like moving the attacker’s machine

directly into the internal enterprise network, without triggering the Intrusion

Detection Systems (IDSs) or the firewall. In this case, a “stealthy bridge” can be

created via security holes such as backdoors. For example, in Amazon EC2, if an

attacker intentionally leaves his public key unremoved when publishing an AMI

(Amazon Machine Image), the attacker can later log in to the running instances

of this AMI with his own private key.

Second, virtual machines owned by different tenants may co-reside on the same

physical host machine. To achieve high efficiency, customer workloads are

multiplexed onto a single physical machine utilizing virtualization. Virtual machines

on the same host may belong to unrelated users, or even rivals. Thus co-resident

virtual machines are expected to be absolutely isolated from each other. However,

current virtualization mechanisms cannot ensure perfect isolation. The co-residency

relationship can still enable security problems such as information leakage,

performance interference [36], or even co-resident virtual machine crashing. Previous

work [37] has shown that it is possible to identify on which physical host a target

virtual machine is likely to reside, and then intentionally place an attacker virtual

machine onto the same host in Amazon EC2. Once the co-residency is achieved, a

“stealthy bridge” can be further established, such as a side-channel for passively

observing the activities of the target machine to extract information for credential


recovering [38], or a covert-channel for actively sending information from the target

machine [40].

Stealthy bridges are stealthy information tunnels existing between disparate

networks in the cloud that are unknown to security sensors and should have been

forbidden. Stealthy bridges are developed mainly by exploiting vulnerabilities that

are unknown to vulnerability scanners. Isolated enterprise network islands are

connected via these stealthy tunnels, through which information (data, commands,

etc.) can be acquired, transmitted or exchanged maliciously. Therefore stealthy

bridges pose very severe threats to the security of the public cloud. However, the

stealthy bridges are inherently unknown or hard to detect: they either exploit

unknown vulnerabilities, or cannot be easily distinguished from authorized activities

by security sensors. For example, side-channel attacks extract information by

passively observing the activities of resources shared by the attacker and the target

virtual machine (e.g., CPU, cache), without interfering with the normal running of

the target virtual machine. Similarly, the activity of logging into an instance by

leveraging intentionally left credentials (passwords, public keys, etc.) also hides in

the authorized user activities.

The stealthy bridges can be used to construct a multi-step attack and facilitate

subsequent intrusion steps across enterprise network islands in cloud. By taking

advantage of the stealthy bridges, attackers can carry out malicious activities

from one enterprise network to another. The stealthy bridges per se are difficult to

detect, but the intrusion steps before and after the construction of stealthy bridges

may trigger some abnormal activities. Human administrators or security sensors

like IDS could notice such abnormal activities and raise corresponding alerts, which

can be collected as the evidence of an attack happening. (In our trust model, we assume cloud providers are fully trusted by cloud customers. In addition to security alerts generated at the cloud level, such as alerts from hypervisors or cache monitors, cloud providers also have the privilege of accessing alerts generated by customers’ virtual machines.) So our approach has two


insights: 1) It is quite straightforward to build a cloud-level attack graph to capture

the potential attacks enabled by stealthy bridges. 2) To leverage the evidence

collected from other intrusion steps, we construct a cross-layer Bayesian Network

(BN) to infer the existence of stealthy bridges. Based on the inference, security

analysts will know where stealthy bridges are most likely to exist and need to be

further scrutinized.

The main contributions of this chapter are as follows:

First, a cloud-level attack graph is built by crafting new interaction rules in

MulVAL [50], an attack graph generation tool. The cloud-level attack graph can

capture the potential attacks enabled by stealthy bridges and reveal possible hidden

attack paths that are previously missed by individual enterprise network attack

graphs.

Second, based on the cloud-level attack graph, a cross-layer Bayesian network

is constructed by identifying four types of uncertainties. The cross-layer Bayesian

network is able to infer the existence of stealthy bridges given supporting evidence

from other intrusion steps.

3.2 Cloud-level Attack Graph Model

A Bayesian network is a probabilistic graphical model that is applicable for real-time

security analysis. Prior to the construction of a Bayesian Network, an attack graph

should be built to reflect the attacks enabled by stealthy bridges.



3.2.1 Logical Attack Graph

An attack graph is a valuable tool for network vulnerability analysis. Current

network defenders should not only understand how attackers could exploit a specific

vulnerability to compromise one single host, but also clearly know how the security

holes can be combined together for achieving an attack goal. An attack graph is

powerful for dealing with the combination of security holes. Taking vulnerabilities

existing in a network as the input, an attack graph generation tool can produce the possible attack

paths for a network. An attack path shows a sequence of potential exploitations to

specific attack goals. For instance, an attacker may first exploit a vulnerability on

Web Server to obtain the root privilege, and then further compromise Database

Server through the acquired privilege. A variety of attack graphs have been

developed for vulnerability analysis, mainly including state enumeration attack

graphs [44–46] and dependency attack graphs [47–49]. The tool MulVAL employed

in this chapter is able to generate the logical attack graph, which is a type of

dependency attack graph.

Fig. 3.2 shows part of an exemplar logical attack graph. There are two types of

nodes in a logical attack graph: derivation nodes (also called rule nodes, represented

with ellipses), and fact nodes. The fact nodes could be further classified into

primitive fact nodes (in rectangles), and derived fact nodes (in diamonds). Primitive

fact nodes are typically objective conditions of the network, including network

connectivity, host configuration, and vulnerability information. Derived fact nodes

represent the facts inferred from logical derivation. Derivation nodes represent the

interaction rules used for derivation. The directed edges in this graph represent

the causality relationship between nodes.

[Figure content: 26: networkServiceInfo(webServer,openssl,tcp,22,_); 27: vulExists(webServer,’CVE-2008-0166’,openssl,remoteExploit,privEscalation); 23: netAccess(webServer,tcp,22); 22: Rule(remote exploit of a server program); 14: execCode(webServer,root).]

Figure 3.2: A Portion of an Example Logical Attack Graph [14]

In a logical dependency attack graph, one or more fact nodes could serve as the

preconditions of a derivation node and cause it to take effect. One or more

derivation nodes could further cause a derived

fact node to become true. Each derivation node represents the application of an

interaction rule given in [51] that yields the derived fact.

For example, in Fig. 3.2, Nodes 26 and 27 (primitive fact nodes) and Node 23 (derived

fact node) are three fact nodes. They represent three preconditions respectively:

Node 23, the attacker has access to the Web Server; Node 26, Web Server provides

OpenSSL service; Node 27, OpenSSL has a vulnerability, CVE-2008-0166. With the

three preconditions satisfied simultaneously, the rule of Node 22 (derivation node)

can take e�ect, meaning the remote exploit of a server program could happen. This

derivation rule can further cause Node 14 (derived fact node) to be valid, meaning

the attacker can execute code on the Web Server.
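MulVAL itself expresses interaction rules in Datalog; purely as an illustration, the derivation of Node 14 from Nodes 23, 26, and 27 can be mimicked with a toy forward-chaining step in Python (the fact-tuple encoding is our own simplification, not MulVAL's input format):

```python
# Toy forward-chaining step mirroring the Fig. 3.2 derivation:
# netAccess + networkServiceInfo + vulExists  =>  execCode.

facts = {
    ("netAccess", "webServer", "tcp", 22),
    ("networkServiceInfo", "webServer", "openssl", "tcp", 22),
    ("vulExists", "webServer", "CVE-2008-0166", "openssl",
     "remoteExploit", "privEscalation"),
}

def remote_exploit_rule(facts):
    """If all three preconditions hold for a host, derive root code execution."""
    derived = set()
    for (_, host, _vuln, prog, _, _) in [f for f in facts if f[0] == "vulExists"]:
        if (("netAccess", host, "tcp", 22) in facts
                and ("networkServiceInfo", host, prog, "tcp", 22) in facts):
            derived.add(("execCode", host, "root"))
    return derived

print(remote_exploit_rule(facts))  # -> {('execCode', 'webServer', 'root')}
```

If any precondition fact is missing, no new fact is derived, which is exactly how an unsatisfied derivation node behaves in the logical attack graph.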


3.2.2 Cloud-level Attack Graph

In the cloud, each enterprise network can scan its own virtual machines for existing

vulnerabilities and then generate an attack graph. The individual attack graph

shows how attackers could exploit certain vulnerabilities and conduct a sequence

of attack steps inside the enterprise network. However, such individual attack

graphs are confined to the enterprise networks without considering the potential

threats from the cloud environment. The existence of stealthy bridges could activate the

prerequisites of some attacks that were previously impossible in a traditional network

environment and thus enable new attack paths. These attack paths are easily

missed by individual attack graphs. For example, in Fig. 4.7, without assuming

the stealthy bridge existing between enterprise A and B, the individual attack

graph for enterprise B can be incomplete or even not established due to a lack of

exploitable vulnerabilities. Therefore, a cloud-level attack graph needs to be built

to incorporate the existence of stealthy bridges in the cloud. By considering the

attack preconditions enabled by stealthy bridges, the cloud-level attack graph can

reveal hidden potential attack paths that are missed by individual attack graphs.

The cloud-level attack graph should be modeled based on the cloud structure.

Due to the VMI sharing feature and the co-residency feature of the cloud, a public cloud

has the following structural characteristics. First, virtual machines can be created

by instantiating VMIs. Therefore virtual machines residing on different hosts may

actually be instances of the same VMI. In other words, they could have the same

VMI parents. Second, virtual machines belonging to one enterprise network may be

assigned to a number of different physical hosts that are shared by other enterprise

networks. That is, the virtual machines employed by different enterprise networks

are likely to reside on the same host.

[Figure content: Host 1 runs Hypervisor 1 hosting vm11, vm12, ..., vm1i; Host 2 runs Hypervisor 2 hosting vm21, ..., vm2j, vm2k. Some of these virtual machines may be instantiated from the same virtual machine image; others may belong to the same enterprise network.]

Figure 3.3: Features of the Public Cloud Structure [14]

As shown in Fig. 3.3, the vm11 on host 1

and vm2j on host 2 may be instances of the same VMI, while vm12 and vm2k could

belong to the same enterprise network. Third, the real enterprise network could be

a hybrid of a cloud network and a traditional network. For example, the servers

of an enterprise network could be implemented in the cloud, while the personal

computers and workstations could be in the traditional network infrastructure.

Due to the above characteristics of cloud structure, the model for the cloud-level

attack graph should have the following corresponding characteristics.

1) The cloud-level attack graph is a cross-layer graph that is composed of three

layers: virtual machine layer, VMI layer, and host layer, as shown in Fig. 3.4.

2) The virtual machine layer is the major layer in the attack graph stack.

This layer reflects the causality relationship between vulnerabilities existing inside

the virtual machines and the potential exploits towards these vulnerabilities. If

stealthy bridges do not exist, the attack graph generated in this layer is scattered:

each enterprise network has an individual attack graph that is isolated from

others.

[Figure content: a stack of three layers: the VM layer (enterprise networks A, B, C, and D), the VMI layer (image v1), and the host layer (host h1).]

Figure 3.4: An Example Cloud-level Attack Graph Model [14]

The individual attack graphs can be the same as the ones generated by

cloud customers themselves through scanning the virtual machines for known

vulnerabilities. However, if stealthy bridges exist on the other two layers, the

isolated attack graph could be connected, or even experience dramatic changes:

some hidden potential attack paths will be revealed and the original attack graph

is enriched. For example, in Fig. 3.4, without the stealthy bridge on h1, attack

paths in enterprise network C will be missing or incomplete because no exploitable

vulnerability is available as the entry point for attack.

3) The VMI layer mainly captures the stealthy bridges and corresponding attacks

caused by VMI sharing. Since virtual machines in different enterprise networks may

be instantiated from the same parent VMI, they could inherit the same security

issues from the parent image, such as software vulnerabilities, malware, or backdoors,

etc. Evidence from [52] shows that 98% of Windows VMIs and 58% of Linux VMIs

in Amazon EC2 contain software with critical vulnerabilities. Much of the

software on these VMIs is more than two years old. Since cloud customers take

full responsibility for securing their virtual machines, many of these vulnerabilities


remain unpatched and thus pose great risks to cloud. Once a vulnerability or an

attack type is identified in the parent VMI, the attack graph for all the children

virtual machine instances may be affected: a precondition node could be activated,

or a new interaction rule should be constructed in the attack graph generation tool.

The incorporation of the VMI layer provides another benefit to the subsequent

Bayesian network analysis. It enables the interaction between the virtual machine

layer and the VMI layer. On one hand, the probability of a vulnerability existence

on a VMI will a�ect the probability of the vulnerability existence on its children

instance virtual machines. On the other hand, if new evidence is found regarding

the vulnerability existence on the children instances, the probability change will

in turn influence the parent VMI. If the same evidence is observed on multiple

instances of the VMI, this VMI is very likely to be problematic.
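This two-way influence can be sketched as a sequential Bayesian update of the belief that a VMI is vulnerable, given matching observations on its instance VMs; the prior and likelihood values below are illustrative assumptions, not estimates from the dissertation:

```python
# Repeated observations on instance VMs sharpen the belief that the
# parent VMI is problematic (illustrative prior/likelihood values).

def update_vmi_belief(prior, observations,
                      p_obs_if_vuln=0.7, p_obs_if_clean=0.05):
    """Sequential Bayes update of P(VMI vulnerable).

    `observations` is a list of booleans: True if the suspicious
    behavior was seen on that instance VM.
    """
    p = prior
    for seen in observations:
        like_v = p_obs_if_vuln if seen else 1 - p_obs_if_vuln
        like_c = p_obs_if_clean if seen else 1 - p_obs_if_clean
        p = p * like_v / (p * like_v + (1 - p) * like_c)
    return p

belief = update_vmi_belief(prior=0.1, observations=[True, True, True])
print(round(belief, 3))  # -> 0.997
```

Starting from a weak prior of 0.1, three matching observations on independent instances push the belief that the parent VMI is vulnerable above 0.99, matching the intuition stated above.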

4) The host layer is able to reason about exploits of stealthy bridges caused by virtual

machine co-residency. Exploits on this layer could lead to further penetrations on

the virtual machine layer. In addition, this layer actually captures all attacks that

could happen on the host level, including those on pure physical hosts with no

virtual machines. Hence it provides a good interface to hybrid enterprise networks

that are implemented with partial cloud and partial traditional infrastructures.

The potential attack paths identified on the cloud part could possibly extend to

traditional infrastructures if all prerequisites for the remote exploits are satisfied,

such as network access being allowed, and exploitable vulnerabilities existing, etc.

As in Fig. 3.4, the attack graph for enterprise C extends from virtual machine layer

to host layer.


3.3 Bayesian Networks

As stated in [24] by Judea Pearl, the study of Bayesian networks was motivated

by “attempts to devise a computational model for humans’ inferential reasoning,

namely, the mechanism by which people integrate data from various sources and

generate a coherent interpretation of the data.” This motivation well describes the

main function and potential applications of Bayesian networks.

A Bayesian network (BN) is a probabilistic graphical model representing cause

and effect relations. For example, it is able to show the probabilistic causal

relationships between a disease and the corresponding symptoms. Formally, a

Bayesian network is a Directed Acyclic Graph (DAG) that contains a set of nodes

and directed edges. The nodes represent random variables of interest and the

directed edges represent the causal influence among the variables. The strength

of such influence is represented with a conditional probability table (CPT). For

example, Fig. 3.5 shows a portion of a BN constructed directly from the attack

graph in Fig. 3.2 by removing the rule Node 22. Node 14 can be associated with

the CPT as shown. This CPT means that if all of the preconditions of Node 14 are

satisfied, the probability of Node 14 being true is 0.9. Node 14 is false in all other

cases.
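A brute-force enumeration over this small BN illustrates the diagnostic direction of inference as well; the parent priors below are illustrative assumptions, while the CPT follows the text (0.9 when all parents are true, 0 otherwise):

```python
# Diagnostic inference on the Fig. 3.5 fragment by enumeration.
# Parent priors are assumed for illustration only.

from itertools import product

PRIORS = {"26": 0.9, "27": 0.3, "23": 0.7}

def posterior_27(evidence_14):
    """P(Node 27 = True | Node 14 = evidence_14)."""
    joint_num = joint_den = 0.0
    for n26, n27, n23 in product([True, False], repeat=3):
        w = 1.0
        for name, val in zip(("26", "27", "23"), (n26, n27, n23)):
            w *= PRIORS[name] if val else 1 - PRIORS[name]
        p14 = 0.9 if (n26 and n27 and n23) else 0.0
        w *= p14 if evidence_14 else 1 - p14
        joint_den += w
        if n27:
            joint_num += w
    return joint_num / joint_den

print(posterior_27(True))             # -> 1.0
print(round(posterior_27(False), 3))  # -> 0.157
```

Observing the exploit (Node 14 true) makes the vulnerability certain, since the CPT forbids the exploit without it; observing its absence lowers the vulnerability belief from the 0.3 prior to about 0.157.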

Pearl summarized the properties of Bayesian networks in [25]. After that,

Bayesian networks have been studied widely by researchers. Heckerman et al.

describe a Bayesian approach in [26] for learning Bayesian networks from a combi-

nation of prior knowledge and statistical data. Bayesian networks are applied to

many fields of study, such as biology, artificial intelligence, and computer science,

to name a few.

[Figure content: a BN fragment with fact nodes 26_networkServiceInfo, 27_vulExists, and 23_netAccess as parents of 14_execCode, together with the CPT of Node 14: P(14 = True) = 0.9 when Nodes 26, 27, and 23 are all true, and 0 otherwise.]

Figure 3.5: A Portion of Bayesian Network with associated CPT [14]

Friedman et al. use Bayesian networks to describe the interactions

between genes and describe a method for recovering gene interactions using tools from learning Bayesian networks [27]. Jansen et al. provide another study for using Bayesian networks to predict protein-protein interactions genome-wide in yeast [28]. Charniak explains Bayesian networks in a way that is easy to understand for AI researchers with a limited grounding in probability theory [29].

Bayesian networks have recently been applied to the field of cyber security. One main direction is using Bayesian networks for network security metrics. Frigault et al. propose to measure network security using Bayesian networks [30] and one of their variants, dynamic Bayesian networks [31]. A dynamic Bayesian network is able to incorporate time information into the inference process.

Due to Bayesian networks' graphical property, many studies propose to combine Bayesian networks and attack graphs for security analysis. Liu and Man [32] use Bayesian networks to perform network vulnerability assessment by modeling potential attack paths in a so-called "Bayesian attack graph". [55] is another work that analyzes which hosts are likely to be compromised based on known vulnerabilities and observed alerts. Our work targets a different environment, the cloud, and takes a reverse strategy by using a BN to infer the stealthy bridges, which are unknown in nature. In the future, the inference of stealthy bridges can be further extended to identify zero-day attack paths in the cloud, as in [41] for traditional networks.

Bayesian networks have also been applied to intrusion detection. The main approaches employed in current intrusion detection systems (IDSs) are misuse-based and anomaly-based detection. Anomaly detection can detect previously unknown attacks, but suffers from a high false alarm rate due to misclassification of normal and abnormal behaviors. Kruegel et al. propose a new event classification scheme based on Bayesian networks [67]. The results show that the accuracy of classification is greatly improved by using Bayesian networks.

3.4 Cross-layer Bayesian Networks

A Bayesian network can be used to compute the probabilities of variables of interest. It is especially powerful for diagnosis and prediction analysis. For example, in diagnosis analysis, given the symptoms being observed, a BN can calculate the probability of the causing fact, represented as Pr(cause | symptom = True). In prediction analysis, given the causing fact, a BN predicts the probability of the corresponding symptoms showing up, Pr(symptom | cause = True). In the cybersecurity field, similar diagnosis and prediction analysis can be performed, such as calculating the probability of an exploitation happening if related IDS alerts are observed, Pr(exploitation | IDS alert = True), or the probability of the IDS raising an alert if an exploitation has already happened, Pr(IDS alert | exploitation = True). This chapter mainly carries out a diagnosis analysis that computes the probability of stealthy bridge existence by collecting evidence from other intrusion steps. Diagnosis analysis is a kind of "backward" computation. In the cause-and-symptom model, concrete evidence about the symptom changes the posterior probability of the cause through Pr(cause | symptom = True). More intuitively, as more evidence is collected regarding the symptom, the probability of the cause will move closer to reality if the BN is constructed properly.
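The "backward" computation described above is an application of Bayes' rule. A minimal sketch in Python; the prior and sensor rates are illustrative assumptions, not values from the constructed BN:

```python
# Diagnosis as a "backward" Bayes computation: Pr(cause | symptom = True).
# The prior and the sensor rates below are illustrative assumptions.

def diagnose(p_cause, p_symptom_given_cause, p_symptom_given_no_cause):
    """Posterior probability of the cause after the symptom is observed."""
    joint_cause = p_cause * p_symptom_given_cause              # cause and symptom
    joint_no_cause = (1 - p_cause) * p_symptom_given_no_cause  # symptom without cause
    return joint_cause / (joint_cause + joint_no_cause)

# Pr(exploitation | IDS alert = True): assumed prior Pr(exploitation) = 0.1;
# the IDS alerts on 90% of real exploitations and on 5% of benign activity.
posterior = diagnose(0.1, 0.9, 0.05)
print(round(posterior, 3))  # 0.667 -- the alert raises 0.1 to roughly 2/3
```

Prediction analysis runs the same model "forward": given the cause, the CPT entry Pr(symptom | cause = True) is read off directly.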

3.4.1 Identify the Uncertainties

Inferring the existence of stealthy bridges requires real-time evidence to be collected and analyzed. A BN has the capability, which attack graphs lack, of performing such real-time security analysis. Attack graphs correlate vulnerabilities and potential exploits in different machines and enable deterministic reasoning. For example, if all the preconditions of an attack are satisfied, the attacker should be able to launch the attack. However, in real-time security analysis, there is a range of uncertainties associated with this attack that cannot be reflected in an attack graph. For example, has the attacker chosen to launch the attack? If he launched it, did he succeed in compromising the host? Are the Snort [54] alerts raised on this host related to the attack? Should we be more confident if we got other alerts from other hosts in this network? Such uncertainty aspects should be taken into account when performing real-time security analysis. A BN is a valuable tool for capturing these uncertainties.

One non-trivial difficulty in constructing a well-functioning BN is to identify and model the uncertainty types existing in the attack procedure. In this chapter, we mainly consider four types of uncertainties related to cloud security.

Uncertainty of stealthy bridges existence. The presence of known vulnerabilities is usually deterministic due to the availability of vulnerability scanners. After scanning a virtual machine or a physical host, a vulnerability scanner such as Nessus [56] is able to tell whether a known vulnerability exists or not.² However, due to their unknown or hard-to-detect nature, effective scanners for stealthy bridges are rare. Therefore, the existence of stealthy bridges is itself a type of uncertainty. In this chapter, to enable the construction of a complete attack graph, stealthy bridges are hypothesized to exist when corresponding conditions are met. For example, if two virtual machines co-reside on the same physical host and one of them has been compromised by the attacker, the attack graph is generated under the hypothesis that a stealthy bridge can be created between these two virtual machines. This is enforced by crafting a new interaction rule in MulVAL as follows:

interaction rule(
  (stealthyBridgeExists(Vm_1, Vm_2, Host, stealthyBridge_id) :-
    execCode(Vm_1, _user),
    ResideOn(Vm_1, Host),
    ResideOn(Vm_2, Host)),
  rule_desc('A stealthy bridge could be built between virtual machines
  co-residing on the same host after one virtual machine is compromised')).

Afterwards, the BN constructed based on the attack graph will infer the probability of this hypothesis being true.

² The assumption here is that a capable vulnerability scanner is able to scan out all the known vulnerabilities.

[Figure content: the BN of Fig. 3.5 with an AAN added as an extra parent of Node 14; the CPT becomes Pr(Node 14 = True | Nodes 26, 27, 23 and AAN all True) = 0.9, and 0 otherwise.]

Figure 3.6: A Portion of Bayesian Network with AAN node [14]

Uncertainty of attacker action. Uncertainty of attacker action was first identified by [55]. Even if all the prerequisites for an attack are satisfied, the attack may not happen because attackers may not take action. Therefore, a kind of Attack Action Node (AAN) is added to the BN to model attackers' actions. An AAN is introduced as an additional parent node for the attack. For example, the BN shown in Fig. 3.5 is changed to Fig. 3.6 after adding an AAN node. Correspondingly, the CPT is modified as in Fig. 3.6. This means "attacker taking action" is another prerequisite to be satisfied for the attack to happen.

An AAN node is not added for every attack. AANs are needed only for important attacks, such as the very first intrusion steps in a multi-step attack, or attacks that require attackers' action. Since an AAN represents the primitive fact of whether an attacker takes action and has no parent nodes, a prior probability distribution should be assigned to an AAN to indicate the likelihood of an attack. The posterior probability of the AAN will change as more evidence is collected.

Uncertainty of exploitation success. Uncertainty of exploitation success goes to the question "did the attacker succeed in this step?". Even if all the prerequisites are satisfied and the attacker indeed launches the attack, the attack is not guaranteed to succeed. The success likelihood of an attack mainly depends on the exploit difficulty of the vulnerabilities involved. For some vulnerabilities, usable exploit code is already publicly available, while for others the exploit is still in the proof-of-concept stage and no successful exploit has been demonstrated. Therefore, the exploit difficulty of a vulnerability can be used to derive the CPT of an exploitation. For example, if the exploit difficulty for the vulnerability in Fig. 3.5 is very high, the probability for Node 14 when all parent nodes are true could be assigned a very low value, such as 0.3. If a public exploit for this vulnerability later becomes available, the probability for Node 14 may be changed to a higher value accordingly. The National Vulnerability Database (NVD) [57] maintains a CVSS [58] scoring system for all CVE [59] vulnerabilities. In CVSS, Access Complexity (AC) is a metric that describes the exploit complexity of a vulnerability using the values "high", "medium", and "low". Hence the AC metric can be employed to derive CPTs of exploitations and model the uncertainty of exploitation success.
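An AC-to-CPT mapping of this kind might be sketched as follows. The 0.3 value for "high" matches the example above; the other numeric values and the bonus for public exploit code are assumptions for illustration, not values prescribed by CVSS or by this chapter:

```python
# Sketch: derive the CPT entry "exploit succeeds given all preconditions hold"
# from the CVSS v2 Access Complexity (AC) metric.
# The mapping values below are illustrative assumptions.

AC_TO_SUCCESS_PROB = {"high": 0.3, "medium": 0.6, "low": 0.9}

def exploitation_cpt(access_complexity, public_exploit_available=False):
    """Return Pr(exploit succeeds | all parent nodes true)."""
    p = AC_TO_SUCCESS_PROB[access_complexity.lower()]
    if public_exploit_available:  # working exploit code raises the odds
        p = min(0.95, p + 0.2)
    return p

print(exploitation_cpt("high"))                                 # 0.3
print(exploitation_cpt("high", public_exploit_available=True))  # 0.5
```

In the Fig. 3.5 example, publication of exploit code would thus raise Node 14's CPT entry from 0.3 to a higher value, exactly the revision described in the text.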

Uncertainty of evidence. Evidence is the key factor for a BN to function. In a BN, uncertainties are indicated with the probabilities of related nodes. Each node describes a real or hypothetical event, such as "attacker can execute code on the web server" or "a stealthy bridge exists between virtual machines A and B". Evidence is collected to reduce uncertainty and calculate the probabilities of these events. According to the uncertainty types mentioned above, evidence is likewise classified into three types: evidence for stealthy bridges existence, evidence for attacker action, and evidence for exploitation success. Whenever a piece of evidence is observed, it is assigned to one of these types to support the corresponding event. This is done by adding evidence nodes as children of the event nodes that carry uncertainty. For example, an IDS alert about a large number of login attempts can be regarded as evidence of attacker action, showing that an attacker could have tried to launch an attack. This evidence is then added as a child node of an AAN, as exemplified in Fig. 3.7. As another example, the alert "system log is deleted" given by Tripwire [60] can be a child of the node "attacker can execute code", showing that an exploit has been successfully completed.

However, evidence per se contains uncertainty, and the uncertainty is twofold. First, the support that evidence lends to an event is uncertain. By analogy, a symptom of coughing cannot completely prove the presence of lung disease. In the above examples, can the multiple login attempts testify that attackers have launched the attack? How likely is it that attackers have succeeded in compromising the host if a system log deletion is observed? Second, evidence from security sensors is not 100% accurate. IDS systems such as Snort, Tripwire, etc. suffer considerably from high false alert rates. For example, an event may trigger an IDS to raise an alert while actually no attack happens; in this case, the alert is a false positive. The reverse case is a false negative, that is, when an IDS should have raised an alarm but doesn't. Therefore, we propose to model the uncertainty of evidence with an Evidence-Confidence (EC) pair as shown in Fig. 3.7. The EC pair has two nodes, an Evidence node and an Evidence Confidence Node (ECN). An ECN is assigned as the parent of an Evidence node to model the confidence level of the evidence. If the confidence level is high, the child evidence node will have a larger impact on other


Figure 3.7: The Evidence-Confidence Pair [14]

nodes. Otherwise, the evidence will have a lower impact on other nodes. An example CPT associated with the evidence node is given in Table 3.1. Whenever new evidence is observed, an EC pair is attached to the supported node. A node can have several EC pairs attached to it if multiple instances of evidence are observed. With ECN nodes, security experts can easily tune the confidence levels of evidence based on their domain knowledge and experience. This greatly enhances the flexibility and accuracy of BN analysis.

Table 3.1: CPT for Node Evidence [14]

              |            AAN = True              |            AAN = False
  ECN         | VeryHigh  High  Medium  Low   None | VeryHigh  High  Medium  Low   None
  Evd = True  |   0.95    0.8    0.6    0.55  0.5  |   0.05    0.2    0.4    0.45  0.5
  Evd = False |   0.05    0.2    0.4    0.45  0.5  |   0.95    0.8    0.6    0.55  0.5
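To see how the ECN tunes the impact of evidence, the posterior of an AAN can be computed by hand from the Table 3.1 CPT; the uniform AAN prior of 0.5 is an assumption for illustration:

```python
# Posterior of an AAN given one observed piece of evidence, using the
# Evidence-node CPT from Table 3.1. The AAN prior (0.5) is an assumption.

# Pr(Evidence = True | AAN, ECN), read off Table 3.1
P_EVD = {
    True:  {"VeryHigh": 0.95, "High": 0.8, "Medium": 0.6, "Low": 0.55, "None": 0.5},
    False: {"VeryHigh": 0.05, "High": 0.2, "Medium": 0.4, "Low": 0.45, "None": 0.5},
}

def posterior_aan(ecn_level, prior=0.5):
    """Pr(AAN = True | Evidence = True, ECN fixed at ecn_level)."""
    num = prior * P_EVD[True][ecn_level]
    den = num + (1 - prior) * P_EVD[False][ecn_level]
    return num / den

for level in ("VeryHigh", "Medium", "None"):
    print(level, round(posterior_aan(level), 2))
# VeryHigh 0.95 -- high-confidence evidence moves the AAN strongly
# Medium   0.6
# None     0.5  -- zero-confidence evidence leaves the prior unchanged
```

This mirrors the intent of the EC pair: lowering the ECN level pulls the Evidence CPT toward 0.5/0.5, so the observation carries less weight in inference.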

3.5 Implementation

3.5.1 Cloud-level Attack Graph Generation

This chapter uses MulVAL [51] as the attack graph generation tool. To construct a cloud-level attack graph, new primitive fact nodes and interaction rules have to be crafted in MulVAL for the VMI layer and the host layer to model the existence of stealthy bridges. Each virtual machine has an ID tuple (Vm_id, VMI_id, H_id) associated with it, which represents the IDs of the virtual machine itself, the VMI it was derived from, and the host it resides on. The VMI layer mainly focuses on modeling VMI vulnerability inheritance and the VMI backdoor problem. The host layer mainly focuses on modeling the virtual machine co-residency problem. Table 3.2 provides a sample set of newly crafted interaction rules that are incorporated into MulVAL for cloud-level attack graph generation.


Table 3.2: A Sample Set of Interaction Rules [14]

/*** Model the Virtual Machine Image Vulnerability Inheritance ***/
primitive(IsInstance(Vm_id, VMI_id)).
primitive(ImageVulExists(VMI_id, vulID, _program, _range, _consequence)).
derived(VulExists(Vm_id, vulID, _program, _range, _consequence)).

% remove vulExists from the primitive fact set
primitive(vulExists(_host, _vulID, _program, _range, _consequence)).

interaction rule(
  (VulExists(Vm_id, vulID, _program, _range, _consequence) :-
    ImageVulExists(VMI_id, vulID, _program, _range, _consequence),
    IsInstance(Vm_id, VMI_id)),
  rule_desc('A virtual machine instance inherits the vulnerability from the parent VMI')).

/*** Model the Virtual Machine Image Backdoor Problem ***/
primitive(IsThirdPartyImage(VMI_id)).
derived(ImageVulExists(VMI_id, stealthyBridge_id, _, _remoteExploit, privEscalation)).

interaction rule(
  (ImageVulExists(VMI_id, stealthyBridge_id, _, _remoteExploit, privEscalation) :-
    IsThirdPartyImage(VMI_id)),
  rule_desc('A third party VMI could contain a stealthy bridge')).

interaction rule(
  (execCode(Vm_id, Perm) :-
    VulExists(Vm_id, stealthyBridge_id, _, _, privEscalation),
    netAccess(H, _Protocol, _Port)),
  rule_desc('remoteExploit of a stealthy bridge')).

/*** Model the Virtual Machine Co-residency Problem ***/
primitive(ResideOn(VM_id, H_id)).
derived(stealthyBridgeExists(Vm_1, Vm_2, H_id, stealthyBridge_id)).

interaction rule(
  (stealthyBridgeExists(Vm_1, Vm_2, Host, stealthyBridge_id) :-
    execCode(Vm_1, _user),
    ResideOn(Vm_1, Host),
    ResideOn(Vm_2, Host)),
  rule_desc('A stealthy bridge could be built between virtual machines co-residing on the same host after one virtual machine is compromised')).

interaction rule(
  (execCode(Vm_2, _user) :-
    stealthyBridgeExists(Vm_1, Vm_2, Host, stealthyBridge_id)),
  rule_desc('A stealthy bridge could lead to privilege escalation on the victim machine')).

interaction rule(
  (canAccessHost(Vm_2) :-
    logInService(Vm_2, Protocol, Port),
    stealthyBridgeExists(Vm_1, Vm_2, Host, stealthyBridge_id)),
  rule_desc('Access a host through a log-in service by obtaining authentication information through stealthy bridges')).
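The primitive facts consumed by rules like these can be generated mechanically from a VM inventory keyed by the (Vm_id, VMI_id, H_id) tuple. A small sketch; the inventory format is hypothetical, while the predicate names follow Table 3.2:

```python
# Sketch: emit MulVAL primitive facts from a VM inventory, where each virtual
# machine carries the ID tuple (Vm_id, VMI_id, H_id) described in the text.
# The inventory dictionary format is a hypothetical choice for illustration.

vms = [
    {"vm": "Aws", "vmi": "VMI_1", "host": "H1", "third_party_image": True},
    {"vm": "Bws", "vmi": "VMI_2", "host": "H2", "third_party_image": False},
    {"vm": "Cws", "vmi": "VMI_3", "host": "H2", "third_party_image": False},
]

def mulval_facts(inventory):
    """Return MulVAL primitive facts (as strings) for the given inventory."""
    facts = []
    seen_images = set()
    for m in inventory:
        facts.append(f"IsInstance({m['vm']}, {m['vmi']}).")
        facts.append(f"ResideOn({m['vm']}, {m['host']}).")
        # Flag each third-party image once, enabling the backdoor rule above.
        if m["third_party_image"] and m["vmi"] not in seen_images:
            facts.append(f"IsThirdPartyImage({m['vmi']}).")
            seen_images.add(m["vmi"])
    return facts

print("\n".join(mulval_facts(vms)))
```

Feeding such facts to MulVAL lets the co-residency rule fire for Bws and Cws (both on H2) once either one is compromised.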


3.5.2 Construction of Bayesian Networks

Deriving a Bayesian network from a cross-layer attack graph consists of four major steps: removing rule nodes from the attack graph, adding new nodes, determining prior probabilities, and constructing CPTs.

Removing rule nodes of the attack graph. In an attack graph, the rule nodes express how postconditions are derived from preconditions. The derivation is deterministic and contains no uncertainty. Therefore, these rule nodes have no effect on the reasoning process and can be removed when constructing the BN. To remove a rule node, its preconditions are connected directly to its postconditions. For example, in Fig. 3.2, Nodes 26, 27, and 23 are connected directly to Node 14 by removing Node 22.
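This collapsing step can be sketched as a small graph transformation; the dictionary-of-edge-sets representation is an illustrative choice, not the dissertation's implementation:

```python
# Sketch: collapse rule nodes when deriving a BN structure from an attack
# graph. Each rule node's parents (preconditions) are wired directly to its
# children (postconditions), and the rule node itself is dropped.

def remove_rule_nodes(edges, rule_nodes):
    """edges: {node: set(children)}. Return a new edge dict without rule nodes."""
    out = {n: set(c) for n, c in edges.items() if n not in rule_nodes}
    for r in rule_nodes:
        parents = [n for n, c in edges.items() if r in c]
        children = edges.get(r, set())
        for p in parents:
            out[p].discard(r)                                  # drop edge into rule node
            out[p] |= {c for c in children if c not in rule_nodes}  # bypass it
    return out

# Nodes 26, 27, and 23 feed rule Node 22, which derives Node 14 (cf. Fig. 3.2).
attack_graph = {26: {22}, 27: {22}, 23: {22}, 22: {14}, 14: set()}
print(remove_rule_nodes(attack_graph, {22}))
# {26: {14}, 27: {14}, 23: {14}, 14: set()}
```

A chain of adjacent rule nodes would need the bypass applied repeatedly; a single pass suffices here because rule nodes never feed other rule nodes directly in this example.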

Adding new nodes. New nodes are added to capture the uncertainty of attacker action and the uncertainty of evidence. To capture the uncertainty of attacker action, each step has a separate AAN node as a parent, rather than sharing the same AAN among multiple steps. The AAN node thus models attacker action at the granularity of attack steps and reflects the actual attack paths. To model the uncertainty of evidence, whenever new evidence is observed, an EC pair is constructed and attached to the supported node carrying uncertainty.

Determining prior probabilities. Prior probability distributions should be determined for all root nodes that have no parents, such as the vulnerability existence nodes, the network access nodes, and the AAN nodes.


Constructing CPTs. Some CPTs can be determined according to a standard, such as the AC metric in the CVSS scoring system. The AC metric describes the exploit complexity of vulnerabilities and thus can be used to derive the CPTs for the corresponding exploitations. Other CPTs may involve security experts' domain knowledge and experience. For example, VMIs from a trusted third party may have a lower probability of containing security holes such as backdoors, while those created and shared by individual cloud users may have a higher probability.

The constructed BN should be robust against small changes in prior probabilities and CPTs. To ensure such robustness, we use SamIam [65] for sensitivity analysis when constructing and debugging the BN. By specifying the requirements for the probability of a node of interest, SamIam checks the associated CPTs and provides suggestions on feasible changes. For example, if we want to change P(N5 = True) from 0.34 to 0.2, SamIam provides two suggestions: either changing P(N5 = True | N2 = True, N3 = True) from 0.9 to <= 0.43, or changing P(N3 = True | N1 = True) from 0.3 to <= 0.125.
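The kind of check SamIam automates can be approximated by hand. Assuming, purely for illustration, that N5 is true only when both parents are true and that Pr(N2 = True, N3 = True) = 0.378 so the baseline matches P(N5 = True) = 0.34 (this joint probability is not given in the dissertation):

```python
# A by-hand version of the sensitivity check that SamIam automates:
# how does the query P(N5 = True) respond to one CPT entry?
# Assumption: N5 is true only when N2 and N3 are both true, with
# Pr(N2 = True, N3 = True) = 0.378 chosen to match the 0.34 baseline.

P_PARENTS_BOTH_TRUE = 0.378  # assumed joint probability of the parents

def p_n5_true(cpt_entry):
    """P(N5 = True) as a function of P(N5 = True | N2 = True, N3 = True)."""
    return cpt_entry * P_PARENTS_BOTH_TRUE

print(round(p_n5_true(0.9), 2))   # baseline: 0.34
print(round(p_n5_true(0.43), 3))  # with SamIam's suggested change: <= 0.2
```

The query probability is linear in the CPT entry here, which is why SamIam can report a single feasible threshold for each suggested change.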

3.6 Experiment

3.6.1 Attack Scenario

Fig. 3.8 shows the network structure of our attack scenario. There are three major enterprise networks: A, B, and C. A and B are implemented entirely within the cloud, while C is implemented partially in the cloud and partially as a traditional network (the


Figure 3.8: The Attack Scenario [14]

servers are located in the cloud and the workstations are in a traditional network). The attack consists of several steps conducted by the attacker Mallory.

Step 1, Mallory first publishes a VMI that provides a web service in the cloud. This VMI is malicious in that it contains a security hole that Mallory knows how to exploit. For example, this security hole could be an SSH user authentication key (a public key placed in .ssh/authorized_keys) that is intentionally left in the VMI by Mallory. The leftover key creates a backdoor that allows Mallory to log in to any instance derived from this malicious VMI using his own private key. The security hole could also be a vulnerability that is not yet publicly known. To make the attack scenario more generic, we choose CVE-2007-2446 [61], a vulnerability in Samba 3.0.0 [62], as the one embedded in the malicious VMI, but treat it as unknown for the purpose of simulation.

Step 2, the malicious VMI is then adopted and instantiated as a web server by an innocent user from A. Mallory now wants to compromise the live instances, but he needs to know which instances are derived from his malicious VMI. [52] provides three possible ways of machine fingerprinting: ssh matching, service matching, and web matching. Through ssh key matching, Mallory finds the right instance in A and completes the exploitation of CVE-2007-2446 [61].

Step 3, enterprise network B provides web services to a limited number of customers, including A. With the root privilege acquired on A's web server, Mallory is able to access B's web server, exploit one of its vulnerabilities, CVE-2007-5423 [63] in the application tikiwiki 1.9.8 [64], and create a reverse shell.

Step 4, Mallory notices that enterprises B and C have a special relationship: their web servers are implemented with virtual machines co-residing on the same host. C is a start-up company that has valuable information stored on its CEO's workstation. Mallory therefore leverages the co-residency relationship of the web servers and launches a side-channel attack against C's web server to extract its password, obtaining user privilege through the attack. Mallory also establishes a covert channel between the co-resident virtual machines for convenient information exchange.

Step 5, the NFS server in C has a directory that is shared by all the servers and workstations inside the company. Normally C's web server should not have write permission to this shared directory, but due to a configuration error in the NFS export table, the web server is given write permission. Therefore, if Mallory can upload a Trojan horse to the shared directory, other innocent users may download the Trojan horse from this directory and install it. Hence Mallory crafts a Trojan horse management_tool.deb and uploads it into the shared NFS directory through the web server.


Step 6, the innocent CEO of C downloads management_tool.deb and installs it. Mallory then exploits the Trojan horse and creates an unsolicited connection back to his own machine.

Step 7, Mallory's VMI is also adopted by several other enterprise networks, so Mallory compromises their instances using the same method as in Step 2.

In this scenario, two stealthy bridges are established:³ one from the Internet to enterprise network A through exploiting an unknown vulnerability, and the other between enterprise networks B and C by leveraging virtual machine co-residency. The attack path crosses three enterprise networks that reside in the same cloud and extends into C's traditional network.

3.6.2 Experiment Result

The purpose of our experiment is to check whether the BN-based tool is able to infer the existence of stealthy bridges given the evidence. The Bayesian network has two inputs: the network deployment (network connection, host configuration, vulnerability information, etc.) and the evidence. The output of the BN is the probability of specific events, such as the probability of stealthy bridges being established, or the probability of a web server being compromised. We view the attacker's sequence of attack steps as the ground truth. To evaluate the effectiveness of the constructed BN, we compare the output of the BN against this ground truth. For example, given the ground truth that a stealthy bridge has been established, we check the corresponding probability provided by the BN to see whether the result is convincing.

³ The enterprise networks in Step 7 are not key players, so we do not analyze the stealthy bridges established in this step, but still use the raised alerts as evidence.


For the attack scenario illustrated in Fig. 3.8, the cross-layer BN is constructed as in Fig. 3.9. By taking into account the existence of stealthy bridges, the cloud-level attack graph has the capability of revealing potential hidden attack paths. Therefore, the constructed BN also inherits the revealed hidden paths from the cloud-level attack graph. For example, the white part in Fig. 3.9 shows the hidden paths enabled by the stealthy bridge between enterprise networks B and C. These paths would be missed by individual attack graphs if the stealthy bridge were not considered. The inputs for this BN are the network deployment shown in Table 3.3⁴ and the collected evidence shown in Table 3.4. Evidence is collected against the attack steps described in our attack scenario; not all attack steps have corresponding observed evidence.

We conducted six sets of simulation experiments, each with a specific purpose. For simplicity, we assume all attack steps are completed instantly, with no time delay. The ground truth in our attack scenario is that one stealthy bridge, between the attacker and enterprise A, is established in attack step 2, and the other, between B and C, is established in step 4. By taking evidence in a certain order as input, the BN generates a corresponding sequence of probabilities for the events of interest. The probabilities are compared with the ground truth to evaluate the performance of the BN.

3.6.2.1 Experiment 3.1: Probability Inferring

In experiment 3.1, we assume all the evidence is observed in the order of the corresponding attack steps. We are interested in four events: a stealthy bridge

⁴ Aws, Bws, Cws, Cnfs, and Cworkstation denote A's web server, B's web server, C's web server, C's NFS server, and C's workstation, respectively.


Table 3.3: Network Deployment [14]

Node Deployed Facts

N1 IsThirdPartyImage(VMI)

N2 IsInstance(Aws, VMI)

N4 netAccess(Aws,_protocol,_port)

N17 netServiceInfo(Bws,tikiwiki,http,80,_)

N19 ResideOn(Bws,H)

N20 ResideOn(Cws,H)

N21 hacl(Cws,Cnfs,nfsProtocol,nfsPort)

N27 nfsExport(Cnfs,’/export’,write,Cws)

N30 nfsMountd(CworkStation,’/mnt/share’, Cnfs,’/export’,read)

N32 VulExists(CworkStation,’CVE-2009-2692’,kernel,localExploit,privEscalation)

N41 IsInstance(Dws,VMI)

N43 netAccess(Dws,_protocol,_port)

exists on enterprise A's web server (N5), the attacker can execute arbitrary code on A's web server (N8), a stealthy bridge exists on the host that B's web server resides on (N22), and the attacker can execute arbitrary code on C's web server (N25). N8 and N25 respectively imply that the stealthy bridges in N5 and N22 have been successfully established. Table 3.5 shows the results of experiment 3.1 given supporting evidence with the corresponding confidence values. The results indicate that the probability of stealthy bridge existence is initially very low, and increases as more evidence is collected. For example, Pr(N5 = True) increases from 34% with no evidence observed to 88.95% with all evidence presented. This means that a stealthy bridge is very likely to exist on enterprise A's web server after enough evidence is collected.


Table 3.4: Collected Evidence Corresponding to Attack Steps [14]

Node  Step  Collected Evidence
N9    2     Wireshark shows multiple suspicious connections established
N11   2     IDS shows a malicious packet detected
N13   2     Wireshark "follow tcp stream" shows a back telnet connection is instructed to open
N23   4     Cache monitor observes abnormal cache activities
N34   5     Tripwire shows several file modifications toward management_tool.deb
N37   6     IDS shows Trojan horse installation
N39   6     Wireshark "follow tcp stream" finds plain text in a supposedly encrypted connection
N47   7     Wireshark shows a back telnet connection is instructed to open
N49   7     IDS shows a malicious packet detected

The first stealthy bridge in our attack scenario is established in attack step 2, and the corresponding pieces of evidence are N9, N11, and N13. Pr(N8 = True) is 95.77% after all the evidence from step 2 is observed, but Pr(N5 = True) is only 74.64%. This means that although the BN is almost sure that A's web server has been compromised, it does not have the same confidence in attributing the exploitation to the stealthy bridge, which is caused by the unknown vulnerability inherited from a VMI. Pr(N5 = True) increases to 88.95% only after evidence N47 and N49 from other enterprise networks is observed for attack step 7. This means that if the same alerts appear in other instances of the same VMI, the VMI is very likely to contain the related unknown vulnerability.

The second stealthy bridge is established in step 4, and the corresponding

[Figure content: the cross-layer BN spanning the VMI, virtual machine, and host layers, with nodes N1 through N50 (e.g., N5_Vul_StealthyBridge, N8_execCode_Aws, N22_StealthyBridge_Exists_Bws_Cws_H, N25_execCode_Cws) plus their attached AAN and Evidence/ECN pair nodes.]

Figure 3.9: The Cross-Layer Bayesian Network Constructed for the Attack Scenario [14]

evidence is N23. Pr(N22 = True) is 57.45% after evidence N9 to N23 is collected. The number seems low. However, considering the unusual difficulty of leveraging a co-residency relationship, this low probability should still be treated with great attention. After all evidence is observed, Pr(N22 = True) increases from 13.91% to 73.29%, which may require security experts to carefully scrutinize the virtual machine isolation status on the related host.


Table 3.5: Results of experiment 3.1 [14]

Events     No evidence  N9      N11     N13     N23     N34       N37     N39       N47       N49
(conf.)    -            Medium  High    High    High    VeryHigh  High    VeryHigh  VeryHigh  VeryHigh
N5=True    34%          34%     51.54%  74.64%  75.22%  75.22%    75.41%  75.5%     86.07%    88.95%
N8=True    20.25%       22.96%  54.38%  95.77%  96.81%  96.81%    97.14%  97.31%    98.14%    98.37%
N22=True   13.91%       14.32%  19.03%  25.23%  57.45%  57.45%    67.67%  73.04%    73.24%    73.29%
N25=True   17.52%       17.89%  22.13%  27.71%  56.7%   56.7%     68.11%  74.1%     74.27%    74.32%

3.6.2.2 Experiment 3.2: Impact of False Alerts

Experiment 3.2 tests the influence of false alerts on the BN. In this experiment, we assume evidence N11 is a false alert generated by the IDS. We perform the same analysis as in experiment 3.1 and compare the results. Table 3.6 shows that when only 3 pieces of evidence (N9, N11, and N13) are observed, the probability of the related event is greatly affected by the false alert. For instance, Pr(N5 = True) is 74.64% when N11 is correct, and 53.9% when N11 is a false alert. But Pr(N8 = True) is not greatly influenced by N11 because it is not closely related to the false alert. When all evidence is input into the BN, the influence of false alerts on related events is reduced to an acceptable level. This shows that a BN can provide relatively correct answers by combining the overall evidence set.

3.6.2.3 Experiment 3.3: Impact of Evidence Confidence Value

Since security experts may change their confidence in evidence based on new knowledge and observations, experiment 3.3 tests the influence of


Table 3.6: Results of experiment 3.2 [14]

Events   with 3 pieces of evidence     with all evidence
         N11=True    N11=False         N11=True    N11=False
N5       74.64%      53.9%             88.95%      79.59%
N8       95.77%      58.6%             98.37%      79.07%
N22      25.23%      19.66%            73.29%      68.62%
N25      27.71%      22.7%             74.32%      70.24%

Table 3.7: Results of Experiment 3.3 [14]

Events   with 3 pieces of evidence        with all evidence
         N14=VeryHigh    N14=Low          N14=VeryHigh    N14=Low
N5       74.64%          54.29%           88.95%          79.82%
N8       95.77%          59.30%           98.37%          79.54%
N22      25.23%          19.77%           73.29%          68.73%
N25      27.71%          22.79%           74.32%          70.34%

the evidence confidence value on the BN. This experiment generates results similar to experiment 3.2, as shown in Table 3.7. When evidence is scarce, changing the confidence value from VeryHigh to Low has a larger influence on related events than when evidence is sufficient.


Table 3.8: Results of experiment 3.4

Events     No evidence  N9      N11     N13       N47       N23     N34       N49       N37     N39
(conf.)    -            Medium  High    VeryHigh  VeryHigh  High    VeryHigh  VeryHigh  High    VeryHigh
N5=True    34%          34%     51.54%  74.64%    85.51%    85.89%  85.89%    88.8%     88.9%   88.95%
N8=True    20.25%       22.96%  54.38%  95.77%    97.07%    97.8%   97.8%     98.06%    98.27%  98.37%
N22=True   13.91%       14.32%  19.03%  25.23%    25.43%    57.7%   57.7%     57.77%    67.96%  73.29%
N25=True   17.52%       17.89%  22.13%  27.71%    27.89%    56.93%  56.93%    56.99%    68.37%  74.32%

3.6.2.4 Experiment 3.4: Impact of Evidence Input Order

In experiment 3.4, we test the effect of evidence input order on the BN analysis result (we assume the evidence is fed into the BN immediately after it is collected). We bring forward the evidence N47 and N49 from step 7 and insert them before N23 and N37 respectively. The results in Table 3.8 show that when all the evidence from N9 to N39 is fed into the BN, the final calculated probabilities are the same. This means that, given the same set of evidence, the BN will generate the same result regardless of the input order of the evidence. However, this does not imply that the input order of evidence is unimportant for real-time security analysis. For example, in both Table 3.5 and Table 3.8, N23 is the crucial evidence for determining Pr(N22 = True). If N23 is collected at an early stage of the attack, the relatively high value of Pr(N22 = True) generated by the BN may alert network defenders to check the involved virtual machines and hosts. As a result, the potential damage and loss to the victim enterprise network could possibly be mitigated or even stopped. Therefore, promptly collecting and feeding the evidence into the BN is vital for real-time security analysis.
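The order-invariance observed in experiment 3.4 is a basic property of Bayesian updating: with conditionally independent evidence, the posterior is proportional to the prior times the product of the evidence likelihoods, and multiplication is commutative. A minimal sketch with hypothetical likelihood numbers (not the CPTs used in the experiments):

```python
from itertools import permutations

def posterior(prior, evidence):
    """Sequentially update P(infected) with conditionally independent
    evidence items, each a pair (P(e | infected), P(e | uninfected))."""
    p = prior
    for l_inf, l_clean in evidence:
        num = l_inf * p
        p = num / (num + l_clean * (1 - p))
    return p

# Three hypothetical sensors with different likelihood strengths.
evidence = [(0.9, 0.2), (0.8, 0.3), (0.95, 0.1)]

# Feeding the same evidence in any order yields the same final posterior.
results = [posterior(0.1, perm) for perm in permutations(evidence)]
assert max(results) - min(results) < 1e-9
```

Intermediate posteriors do differ across orders, which is exactly why collecting crucial evidence such as N23 early matters for real-time defense even though the final answer is order-independent.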


Table 3.9: Results of experiment 3.5

Events   N12=VeryHigh             N12=Medium               N12=Low
         N11=True    N11=False    N11=True    N11=False    N11=True    N11=False
N5       76.49%      34.00%       71.12%      65.31%       69.96%      67.09%
N8       99.08%      22.96%       89.47%      79.01%       87.38%      82.25%
N22      25.73%      14.32%       24.29%      22.73%       23.98%      23.21%
N25      28.16%      17.89%       26.86%      25.46%       26.58%      25.89%

3.6.2.5 Experiment 3.5: Mitigating the Impact of False Alerts by Tuning the Evidence Confidence Value

As evaluated in experiment 3.2, the ratio of false alerts in the overall evidence set is an important factor determining the impact of false alerts. However, in real security analysis, the ratio of false alerts is usually not a parameter that can be adjusted. In most cases, it is determined by the deployed security sensors and will not change significantly. For example, if an enterprise network deploys an IDS that suffers from high false alert rates, the ratio of false alerts in the overall evidence set will also be relatively high. The ratio will generally remain unchanged unless the security sensor is replaced. Hence, given such a relatively stable ratio, it is important to find another way to mitigate the impact of false alerts. Tuning the evidence confidence value is one solution.

In experiment 3.5, we again assume evidence N11 is a false alert generated by the IDS and that only 3 pieces of evidence (N9, N11, and N13) are observed (so that the influence of the confidence value on the impact of false alerts is more evident).


Table 3.9 shows the computed probabilities when the confidence value (specified in N12) for false alert N11 is “VeryHigh”, “Medium”, and “Low” respectively. When the confidence value is “VeryHigh”, the false alert has a great impact on the final results (e.g., Pr(N5 = True) is 76.49% when N11 is “True”, and 34.00% when N11 is “False”). When the confidence value for false alert N11 is “Low”, the false alert has little impact on the final results (e.g., the values of Pr(N5 = True) are very close: 69.96% when N11 is “True”, and 67.09% when N11 is “False”). Therefore, the impact of false alerts can be mitigated by tuning the corresponding confidence value for the evidence. In practical applications, if a security sensor suffers from high false alert rates, the evidence generated by that sensor should be given a relatively low confidence value. Similarly, evidence generated by security sensors with low false alert rates should be given a relatively high confidence value. In this way, the impact of false alerts can be mitigated in BN analysis.
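One probabilistic reading of the confidence value, used here purely for illustration, is the strength of the evidence node's CPT: a low-confidence alert has likelihoods close to uniform, so it barely moves the posterior, while a high-confidence alert dominates it. The numbers below are hypothetical, not the CPTs from the experiments:

```python
def update(prior, p_alert_given_event, p_alert_given_no_event):
    """One Bayesian update: P(event | alert) via Bayes' rule."""
    num = p_alert_given_event * prior
    return num / (num + p_alert_given_no_event * (1 - prior))

prior = 0.3
# Confidence encoded as how far the alert's likelihoods are from uniform (0.5, 0.5).
confidence = {"VeryHigh": (0.95, 0.05), "Medium": (0.70, 0.30), "Low": (0.55, 0.45)}

for level, (l_true, l_false) in confidence.items():
    print(level, round(update(prior, l_true, l_false), 3))
```

With these toy numbers, a Low-confidence false alert shifts the prior from 0.30 to about 0.34, while a VeryHigh-confidence one pushes it past 0.89, mirroring the trend in Table 3.9.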

3.6.2.6 Experiment 3.6: Complexity

Since the BN is constructed on the basis of an attack graph, the size of the BN mainly depends on the size of the attack graph. According to Theorem 2 in [50], the logical attack graph for a network with N machines has a size of at most O(N^2). As we apply the logical attack graph to the cloud, we consider both virtual machines and physical hosts and regard them as normal hosts having special connections between each other. For a cloud with n virtual machines and m physical hosts, the corresponding attack graph has a size of at most O((n + m)^2). Considering that n >> m in a normal cloud, the size should be at most O(n^2).

To further investigate the inference costs for BNs, we constructed 11 Bayesian


Table 3.10: Size of Bayesian Networks

BN          1    2    3     4     5      6      7      8      9      10     11
# of nodes  39   49   520   745   1069   1589   2068   2588   3082   5150   10300
# of edges  37   48   668   968   1244   1912   2545   3213   3854   6399   12798

networks of different sizes (Table 3.10) in SamIam. For most exact inference algorithms, the complexity of inference is mainly determined by the treewidth of the network. Nevertheless, determining the treewidth is itself difficult. While we cannot explore all different tree structures and inference algorithms in this limited space, we provide the compilation costs for the BNs we constructed, as shown in Fig. 3.10 and Fig. 3.11, to give readers a sense of the time and memory cost. The experiment was conducted in SamIam, with recursive conditioning adopted as the inference algorithm.

3.7 Related Work

We explore the literature for the following topics that are related to this chapter.

VMI sharing. [66] explores a variety of attacks that leverage virtual machine image sharing in Amazon EC2. Researchers were able to extract highly sensitive information from publicly available VMIs. The analysis revealed that 30% of the 1100 analyzed AMIs (Amazon Machine Images) at the time of the analysis contained public keys that serve as backdoors for the AMI publishers. The backdoor problem is not limited to AMIs created by individuals, but also affects those from


Figure 3.10: Time Used for BN Compilation

well-known open-source projects and companies.

Co-Residency. The security issues caused by virtual machine co-residency have attracted researchers’ attention recently. [43] pointed out that the shared resource environment of the cloud introduces security issues that are fundamentally new and unique to the cloud. [37] shows how attackers can identify on which host a target virtual machine is likely to reside in Amazon EC2, and then place a malicious virtual machine onto the same host through a number of instantiation attempts. Such co-residency can be used for further malicious activities, such as launching side-channel attacks to extract information from a target virtual machine [38]. [42] takes the opposite perspective and proposes to detect co-residency via side-channel analysis. [36] demonstrates a new class of attacks called resource-freeing attacks (RFAs), which leverage the performance interference of co-resident


Figure 3.11: Memory Used for BN Compilation

virtual machines. [40] presents a traffic analysis attack that can initiate a covert channel and confirm co-residency with a target virtual machine instance. [39] also considers attacks on the hypervisor and proposes to eliminate the hypervisor attack surface through a new system design.

3.8 Conclusion and Discussion

This chapter identifies the problem of stealthy bridges between isolated enterprise

networks in the public cloud. To infer the existence of stealthy bridges, we propose a

two-step approach. A cloud-level attack graph is first built to capture the potential

attacks enabled by stealthy bridges. Based on the attack graph, a cross-layer


Bayesian network is constructed by identifying uncertainty types existing in attacks

exploiting stealthy bridges. The experiments show that the cross-layer Bayesian

network is able to infer the existence of stealthy bridges given supporting evidence

from other intrusion steps. However, one challenge posed by cloud environments needs further effort. Since the structure of the cloud is very dynamic, generating the cloud-level attack graph from scratch whenever a change happens is expensive and time-consuming. Therefore, an incremental algorithm needs to be developed to address frequent changes such as virtual machines turning on and off, configuration changes, etc.


Chapter 4 | ZePro: Probabilistic Identification of Zero-day Attack Paths

4.1 Introduction

Defending against zero-day attacks is one of the most fundamentally challenging

security problems yet to be solved. Zero-day attacks are usually enabled by

unknown vulnerabilities. The information asymmetry between what the attacker

knows and what the defender knows makes zero-day exploits extremely hard to

detect. Signature-based detection assumes that a signature is already extracted

from detected exploits. Anomaly detection [68–70] may detect zero-day exploits,

but this solution has to cope with high false positive rates.

Considering the extreme difficulty of detecting individual zero-day exploits, a substantially more feasible strategy is to identify zero-day attack paths. In the real world, to achieve the attack goal, attack campaigns rely on a chain of attack actions, which forms an attack path. Each attack chain is a partial order of exploits, and


each exploit targets a particular vulnerability. A zero-day attack path is a multi-step attack path that includes one or more zero-day exploits. A key insight in dealing with zero-day attack paths is to analyze the chaining effect. Typically, it is not very likely for a zero-day attack chain to be 100% zero-day, namely having every exploit in the chain be a zero-day exploit. Hence, defenders can assume that 1) the non-zero-day exploits in the chain are detectable; 2) these detectable exploits have certain chaining relationships with the zero-day exploits in the chain. As a result, connecting the detected non-zero-day segments through a path is an effective

Both alert correlation [71, 72] and attack graphs [47, 48, 50, 51] are possible

solutions for generating potential attack paths, but they are limited in revealing the

zero-day ones. They both can identify the non-zero-day segments (i.e., “islands”)

of a zero-day attack path; however, none of them can automatically bridge all

segments into a meaningful path and reveal the zero-day segments, especially when

different segments may belong to totally irrelevant attack paths.

To address these limitations, Dai et al. proposed a system called Patrol [41]

to identify real zero-day attack paths from a large set of suspicious intrusion

propagation paths generated through tracking dependencies between OS-level

objects. The set of suspicious dependency paths is usually huge and may even suffer from a serious path explosion problem. A root cause of such explosion is that dependencies introduced by legitimate activities and dependencies introduced by zero-day attacks are often tangled together. Hence, Patrol made the assumption that extensive pre-knowledge is available to distinguish real zero-day attack paths

from suspicious ones: common features or attack patterns of known exploitations

can be extracted at the OS-level to help recognize future unknown exploitations


if similar features appear again. However, this assumption is too strong in that 1) acquiring such pre-knowledge is quite difficult: it is a very ad hoc, effort-consuming process that relies heavily on the availability of the history of known vulnerability exploitations. Even if the history is available, investigating and crafting the common features at the OS level for all types of exploitations requires an immeasurable amount of effort from human analysts or even the whole community; 2) future zero-day exploits do not necessarily share similar attack patterns with previously known exploitations.

Therefore, in this chapter, we propose a probabilistic approach to identify the

zero-day attack paths. Our approach is to 1) establish an object instance graph to

capture the intrusion propagation, where an instance of an object is a “version” of

the object with a specific timestamp; 2) build a Bayesian network (BN) based on the

instance graph to leverage the intrusion evidence collected from various information

sources. Intrusion evidence can be the abnormal system and network activities

that are noticed by human admins or security sensors such as Intrusion Detection

Systems (IDSs). With the evidence, the instance-graph-based BN can quantitatively

compute the probabilities of object instances being infected. Connected through

dependency relations, the instances with high infection probabilities form a path,

which can be viewed as a zero-day attack path. Such paths are of manageable size

as the instance-graph-based BN can significantly narrow down the set of suspicious

objects.

Our new insights are as follows. First, due to path explosion, deterministic

dependency analysis is not adequate and will fall short. Innovative ways are

required to help separate the dependency paths introduced by legitimate activities from those introduced by zero-day attacks. Second, through Bayesian


networks, a key difference between the two types of dependency paths becomes

visible. In a Bayesian network, a dependency path becomes a causality path

associated with the probabilities of system objects being infected. Typically the

infection probabilities for system objects involved in a zero-day dependency path are

substantially higher than the infection probabilities of objects involved in legitimate

paths. Therefore, our approach does not require any pre-knowledge to distinguish

the real zero-day attack paths from the legitimate ones.

This approach is supported by the following rationales. First, a BN is able to capture cause-and-effect relations, and thus can be used to model the infection propagation among instances of different system objects: the cause is an already infected instance of one object, while the effect is the infection of an innocent instance of another object. We name this cause-and-effect relation a type of infection causality, which is formed due to the information flow between the two objects in a system call operation. Second, an instance graph can reflect the infection propagation process by capturing the dependencies among instances of different system objects. Third, a BN can be constructed on top of the instance graph because the two couple well with each other: the dependencies among instances of different system objects can be directly interpreted into infection causalities in the BN. The BN’s graphical nature makes it fit well with an instance graph.

The significance of our approach is as follows:

1) Our approach is systematic because Bayesian networks can incorporate

literally all kinds of knowledge the defender has about the zero-day attack paths.

The knowledge includes but is not limited to alerts generated by security sensors

such as IDS and Tripwire, reports provided by vulnerability scanners, system logs,

or even human inputs.


2) Our approach does not rely on particular assumptions or preconditions.

Therefore, it is applicable to almost all kinds of enterprise networks.

3) Our approach is elastic. Whenever new knowledge is gained about zero-day attacks, such new knowledge can be incorporated and the effectiveness of our approach enhanced. Whenever erroneous knowledge is identified, our approach can easily get rid of its negative effects.

4) The tool we built is automated. Today’s security analysis relies largely on

the manual work of human security analysts. Our automated tool can significantly

save security analysts’ time and address the human resource challenge.

To summarize, we made the following contributions.

• To the best of our knowledge, this work is the first probabilistic approach

towards zero-day attack path identification.

• We proposed constructing a Bayesian network at the system object level by introducing the object instance graph.

• We have designed and implemented a system prototype named ZePro, which

can effectively and automatically identify zero-day attack paths.

4.2 Rationales and Models

4.2.1 System Object Dependency Graph

This work classifies OS-level entities in UNIX-like systems into three types of objects:

processes, files and sockets. The operating system performs a set of operations


t1: process A reads file 1
t2: process A creates process B
t3: process A creates process C
t4: process B writes file 2
t5: process C writes file 1
t6: process B reads file 3

(a) Simplified System Call Log in Time-order

(b) SODG

Figure 4.1: An SODG, generated by parsing an example simplified system call log. The label on each edge shows the time associated with the corresponding system call.

towards these objects via system calls such as read, write, etc. For instance, a

process can read from a file as input, and then write to a socket. Such interactions

among system objects enable intrusions to propagate from one object to another.

Generally an intrusion starts with one or several seed objects that are created

directly or indirectly by attackers. The intrusion seeds can be processes such as

compromised service programs, or files such as viruses, or corrupted data, etc. As

the intrusion seeds interact with other system objects via system call operations,

the innocent objects can get infected. We call this process infection propagation.


Therefore the intrusion will propagate throughout the system, or even propagate

to the network through socket communications.

To capture the intrusion propagation, previous work [21, 22] has explored

constructing system level dependency graphs by parsing system call traces. This

type of dependency graph is known as the System Object Dependency Graph (SODG).

Each system call is interpreted into three parts: a source object, a sink object, and a

dependency relation between them. The objects and the dependencies respectively

become nodes and directed edges in SODGs. For example, a process reading a file

in the system call read indicates that the process (sink) depends on the file (source).

The dependency is denoted as file→process. Rules similar to those in Table 2.1, as used in previous work [21, 22], can be adopted to generate such dependencies. Figure 4.1b is an example SODG generated by parsing the simplified system call log shown in Figure 4.1a.
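The parsing idea just described can be sketched as a tiny routine that maps each simplified system call to a timestamped source→sink edge. The rule set below covers only the three verbs appearing in Figure 4.1a, not the full rule set of Table 2.1:

```python
# Direction of the dependency for each verb: 'reads' makes the process
# depend on the file (file -> process); writes/creates flow the other way.
RULES = {
    "reads":   lambda subj, obj: (obj, subj),
    "writes":  lambda subj, obj: (subj, obj),
    "creates": lambda subj, obj: (subj, obj),
}

def build_sodg(log):
    """Parse (time, subject, verb, object) records into SODG edges."""
    return [(*RULES[verb](subj, obj), t) for t, subj, verb, obj in log]

log = [
    ("t1", "process A", "reads",   "file 1"),
    ("t2", "process A", "creates", "process B"),
    ("t3", "process A", "creates", "process C"),
    ("t4", "process B", "writes",  "file 2"),
    ("t5", "process C", "writes",  "file 1"),
    ("t6", "process B", "reads",   "file 3"),
]
edges = build_sodg(log)
# edges[0] is ('file 1', 'process A', 't1'): process A depends on file 1.
```

Running this on the log of Figure 4.1a yields exactly the six directed edges of the SODG in Figure 4.1b, including the t5 edge that closes the cycle among file 1, process A, and process C.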

4.2.2 Why use Bayesian Network?

The BN is a probabilistic graphical model that represents cause-and-effect relations. It is formally defined as a Directed Acyclic Graph (DAG) that contains a set of nodes and directed edges, where a node denotes a variable of interest, and an edge denotes the causality relation between two nodes. The strength of

and an edge denotes the causality relations between two nodes. The strength of

such causality relation is indicated using a conditional probability table (CPT).

Figure 4.2 shows an example BN. Table 4.1 shows the CPT associated with p2. Given

p1 is true, the probability of p2 being true is 0.9, which can be represented with

P (p2 = T |p1 = T ) = 0.9. Similarly, the probability of p4 can be determined by

the states of p2 and p3 according to a CPT at p4. BN is able to incorporate the


Figure 4.2: An Example Bayesian Network.

collected evidence by updating the posterior probabilities of variables of interest.

For example, after evidence p2 = T is observed, it can be incorporated by computing

probability P (p1 = T |p2 = T ).
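With the CPT of Table 4.1 and an assumed prior P(p1 = T) = 0.5 (the prior is not given in the text and is chosen here only for illustration), this update is a one-line application of Bayes' rule:

```python
# CPT from Table 4.1: P(p2=T | p1=T) = 0.9, P(p2=T | p1=F) = 0.01
p_p2_given_p1T = 0.9
p_p2_given_p1F = 0.01
p_p1 = 0.5  # assumed prior on p1=T, for illustration only

# Bayes' rule: P(p1=T | p2=T)
num = p_p2_given_p1T * p_p1
posterior = num / (num + p_p2_given_p1F * (1 - p_p1))
print(round(posterior, 3))  # 0.989: observing p2=T makes p1=T very likely
```

The same mechanism, applied over many nodes at once by an inference engine, is what lets the BN propagate observed evidence through the whole graph.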

The BN is applied on top of the system-level dependency graph for the following benefits. First, the BN is an effective tool to incorporate intrusion evidence from a variety of information sources. Alerts generated by different security sensors are usually isolated from each other. As a unified platform, the BN is able to leverage these alerts as attack evidence to aid the security analysis. Second, the BN can quantitatively compute the probabilities of objects being infected. The inferred probabilities are the key guidance for identifying zero-day attack paths. By focusing only on the objects with high infection probabilities, the set of suspicious objects can be significantly narrowed down. The zero-day attack paths formed by the high-probability objects through dependency relations are thus of manageable size.
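The narrowing step can be sketched as a simple filter over the inferred probabilities; the object names and the threshold below are made up for illustration:

```python
def suspicious_instances(infection_probs, threshold=0.8):
    """Keep only object instances whose inferred infection probability
    exceeds the threshold; connected via dependency relations, these
    instances form the candidate zero-day attack path."""
    return {obj for obj, p in infection_probs.items() if p > threshold}

inferred = {
    "process A#1": 0.95,   # hypothetical BN outputs
    "file 1#2":    0.90,
    "file 2#1":    0.12,
    "process B#2": 0.88,
}
print(sorted(suspicious_instances(inferred)))
# ['file 1#2', 'process A#1', 'process B#2']
```

Only three of the four instances survive the cut, illustrating how thresholding shrinks the suspicious set before the path is assembled.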


Table 4.1: CPT for Node p2 in Figure 4.2

        p1=T    p1=F
p2=T    0.9     0.01
p2=F    0.1     0.99

4.2.3 Problems of Constructing a BN based on the SODG

The SODG has the potential to serve as the base of BN construction. For one thing, the BN has the capability of capturing cause-and-effect relations in infection propagation. For another, the SODG reflects the dependency relations among system objects. Such dependencies imply infection causalities and can be leveraged to construct them in the BN. For example, the dependency process A→file 1 in an SODG can be interpreted as an infection causality relation in the BN: file 1 is likely to be infected if process A is already infected. In such a way, an SODG-based BN can be constructed by directly taking the structural topology of the SODG.

However, several drawbacks of the SODG prevent it from serving as the base of a BN. First, an SODG without time labels cannot reflect the correct information flow according to the time order of system call operations. This is a problem because the time labels cannot be preserved when constructing BNs based on SODGs. Lack of time information will cause incorrect causality inference in SODG-based BNs. For example, without the time labels, the dependencies in Figure 4.1b indicate infection causality relations among file 3, process B and file 2, meaning that if file 3 is infected, process B and file 2 are likely to be infected by file 3. Nevertheless, the time information shows that the system call operation “process B reads file 3” happens at time t6, which is after the operation “process B writes file 2” at time t4. This implies that the status of file 3 has no direct influence on the

status of file 2.

Second, the SODG contains cycles among nodes. For instance, file 1, process A

and process C in Figure 4.1b form a cycle. By directly adopting the topology of the SODG, the SODG-based BN inevitably inherits its cycles. However, a BN is an acyclic probabilistic graphical model that does not allow any cycles.

Third, a node in an SODG can end up having too many parent nodes, which renders the CPT assignment difficult and even impractical in the SODG-based BN. For example, if process B in Figure 4.1b continuously reads hundreds of files (which is normal in a practical operating system), it will get hundreds of file nodes as its parents. In the corresponding SODG-based BN, if each file node has two possible states, “infected” and “uninfected”, and the total number of parent file nodes is denoted as n, then the CPT at process B has to assign 2^n entries in order to specify the infection causality of the parent file nodes to process B. This is impractical when n is very large.

Therefore, in this chapter we propose a new type of dependency graph, the

object instance graph, to address the above problems.

4.2.4 Object Instance Graph

In the object instance graph, each node is not an object, but an instance of the

object with a specific timestamp. Different instances are different “versions” of the same object at different time points, and can thus have different infection status.

Definition 1. Object Instance Graph

If the system call trace in a time window T[t_begin, t_end] is denoted as Σ_T and the set of system objects (mainly processes, files or sockets) involved in Σ_T is denoted as O_T, then the object instance graph is a directed graph G_T(V, E), where:

• V is the set of nodes, initialized to the empty set ∅;

• E is the set of directed edges, initialized to the empty set ∅;

• If a system call syscall ∈ Σ_T is parsed into two system object instances src_i, sink_j, i, j ≥ 1, and a dependency relation dep_c: src_i→sink_j (according to the dependency rules in Table 2.1), where src_i is the i-th instance of system object src ∈ O_T, and sink_j is the j-th instance of system object sink ∈ O_T, then V = V ∪ {src_i, sink_j}, E = E ∪ {dep_c}. The timestamps for syscall, dep_c, src_i, and sink_j are respectively denoted as t_syscall, t_dep_c, t_src_i, and t_sink_j. The t_dep_c inherits t_syscall from syscall. The indexes i and j are determined before adding src_i and sink_j into V by:

  – For ∀ src_m, sink_n ∈ V, m, n ≥ 1, let i_max and j_max respectively be the maximum indexes of instances for objects src and sink;

  – If ∃ src_k ∈ V, k ≥ 1, then i = i_max, and t_src_i stays the same; otherwise, i = 1, and t_src_i is updated to t_syscall;

  – If ∃ sink_z ∈ V, z ≥ 1, then j = j_max + 1; otherwise, j = 1. In both cases t_sink_j is updated to t_syscall. If j ≥ 2, then E = E ∪ {dep_s: sink_{j−1}→sink_j}.

• If a→b ∈ E and b→c ∈ E, then c transitively depends on a.

According to Definition 1, for a src object, a new instance is created only when no instance of src exists in the instance graph. For a sink object, however, a new instance is created whenever a src→sink dependency appears. The underlying insight is that the status of the src object will not be altered by src→sink, while the status of sink will be influenced. Hence a new instance for an object should be created when the object has the possibility of being affected. A dependency dep_c is added between the most recent instance of src and the newly created instance of sink. We name dep_c a contact dependency because it is generated by the contact between two different objects through a system call operation.

In addition, when a new instance is created for an object, a new dependency relation dep_s is also added between the most recent instance and the new instance of the same object. This is necessary and reasonable because the status of the new instance can be influenced by the status of the most recent instance. We name dep_s a state transition dependency because it is caused by the state transition between different instances of the same system object.
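Definition 1, together with the dep_c and dep_s rules above, can be sketched as a short routine. This is a simplified illustration that tracks instance indexes but omits the per-node timestamps of the full definition:

```python
def build_instance_graph(deps):
    """Build an object instance graph from time-ordered (t, src, sink) pairs.

    Nodes are (object, index) instances. The latest src instance is reused
    (created if absent); every dependency creates a fresh sink instance,
    linked to its predecessor by a state transition edge (dep_s)."""
    latest = {}   # object -> index of its most recent instance
    edges = []
    for t, src, sink in deps:
        latest.setdefault(src, 1)            # reuse or create the src instance
        j = latest.get(sink, 0) + 1          # sink always gets a new instance
        if j >= 2:                           # dep_s between consecutive versions
            edges.append(((sink, j - 1), (sink, j), t, "state"))
        latest[sink] = j
        edges.append(((src, latest[src]), (sink, j), t, "contact"))
    return edges

# The log of Figure 4.1a, pre-parsed into src -> sink dependencies.
deps = [
    ("t1", "file 1", "process A"),
    ("t2", "process A", "process B"),
    ("t3", "process A", "process C"),
    ("t4", "process B", "file 2"),
    ("t5", "process C", "file 1"),
    ("t6", "file 3", "process B"),
]
edges = build_instance_graph(deps)
# t5 creates ('file 1', 2), breaking the file 1 / process A / process C cycle.
```

Each sink instance ends up with at most two parents (its previous version and one src instance), which is exactly the property that keeps the CPTs of the later BN small.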

The instance graph can well tackle the problems the SODG has for constructing BNs, as illustrated by Figure 4.3, an instance graph created for the same simplified system call log as in Figure 4.1a. First, the instance graph is able to reflect correct information flows by encoding time information through the creation of object instances. For example, instead of parsing the system call at time t6 directly into file 3→process B, Figure 4.3 parses it into file 3 instance 1→process B instance 2. Compared to Figure 4.1b, in which file 3 has indirect infection causality on file 2 through process B, the instance graph in Figure 4.3 indicates that file 3 can only infect instance 2 of process B but no previous instances. Hence in this graph file 3 does not have infection causality on file 2.

Second, instance graphs can break the cycles contained in SODGs. Again, in Figure 4.3, the system call at time t5 is parsed into process C instance 1→file 1


Figure 4.3: An Instance Graph. An instance graph generated by parsing the same simplified system call log as in Figure 4.1a. The label on each edge shows the time associated with the corresponding system call operation. The dotted rectangle and ellipse are new instances of already existing objects. The solid edges and the dotted edges respectively denote the contact dependencies and the state transition dependencies.

instance 2, rather than process Cæfile 1 as in Figure 4.1b. Therefore, instead of

pointing back to file 1, the edge from process C is directed to a new instance of file

1. As a result, the cycle formed by file 1, process A and process C is broken.

Third, the mechanism of creating new sink instances for a relation src→sink

prevents the nodes in instance graphs from getting too many parents. For example,

process B instance 2 in Figure 4.3 has two parents: process B instance 1 and

file 3 instance 1. If process B appears again as the sink object in later src→sink

dependencies, new instances of process B will be created instead of directly adding

src as the parent to process B instance 2. Therefore, a node in an instance graph

only has two parents at most: one is the previous instance of the same object; the other is an instance of a different object that the node depends on.


Figure 4.4: The Infection Propagation Models. [diagram: a BN fragment in which node sinkj+1 has the two parents sinkj and srci]

4.3 Instance-graph-based Bayesian Networks

To build a BN based on an instance graph and compute probabilities for the variables of interest, two steps are required. First, a CPT has to be specified for each node by constructing proper infection propagation models. Second, evidence from different information sources has to be incorporated into the BN for subsequent probability inference.

4.3.1 The Infection Propagation Models

In instance-graph-based BNs, each object instance has two possible states, “infected”

and “uninfected”. The strength of the infection causalities among the instances has

to be specified in corresponding CPTs. Our infection propagation models in this

work deal with two types of infection causalities, contact infection causalities and

state transition infection causalities, which correspond to the contact dependencies

and state transition dependencies in instance graphs.

Contact Infection Causality Model. This model captures the infection


Table 4.2: CPT for Node sinkj+1

                        sinkj=Infected                      sinkj=Uninfected
                   srci=Infected  srci=Uninfected    srci=Infected  srci=Uninfected
sinkj+1=Infected        1               1                  τ               ρ
sinkj+1=Uninfected      0               0                1 − τ           1 − ρ

propagation between instances of two different objects. Figure 4.4 shows a portion of the BN constructed when a dependency src→sink occurs. Table 4.2 is the CPT for sinkj+1. When sinkj is uninfected, the probability of sinkj+1 being infected depends on the infection status of srci, a contact infection rate τ, and an intrinsic infection rate ρ, where 0 ≤ τ, ρ ≤ 1.

The intrinsic infection rate ρ decides how likely sinkj+1 is to get infected given that srci is uninfected. In this case, since srci is not the infection source of sinkj+1, an infection of sinkj+1 must be caused by other factors. So ρ can be determined by the prior probability of an object being infected, which is usually a very small constant number.

The contact infection rate τ determines how likely sinkj+1 is to get infected when srci is infected. The value of τ determines to what extent the infection can be propagated within the range of an instance graph. In one extreme case, where τ = 1, all the object instances will get contaminated as long as they have contact with the infected objects. In the other extreme case, where τ = 0, the infection will be confined inside the infected object and will not propagate to any other contacting object instances. Our system allows security experts to tune the value of τ based on their knowledge and experience.


Since a large number of system call traces with ground truths are often unavailable, it is currently very unlikely that the parameters τ and ρ can be learned using statistical techniques. Hence, for now these parameters have to be assigned by security experts. Security experts can assign parameters in batch mode or provide different parameters for specific nodes based on their knowledge. We will evaluate the impact of τ and ρ in Section 4.6. Bayesian network training and parameter learning are beyond the scope of this chapter and will be investigated in future work.

State Transition Infection Causality Model. This model captures the infection propagation between instances of the same object. We follow one rule to model this type of causality: an object will never return to the state of “uninfected” from the state of “infected”.¹ That is, once an instance of an object gets infected, all future instances of this object will remain in the infected state, regardless of the infection status of other contacting object instances. This rule is enforced in the CPT exemplified in Table 4.2. If sinkj is infected, the infection probability of sinkj+1 stays at 1, no matter whether srci is infected or not. If sinkj is uninfected, the infection probability of sinkj+1 is decided by the infection status of srci according to the contact infection causality model.
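Taken together, the two models determine every entry of Table 4.2. The following Python sketch (ours, not part of the prototype, which specifies CPTs in a .net file) builds this CPT for given τ and ρ:

```python
def sink_cpt(tau, rho):
    """Build the CPT of Table 4.2 for node sink_{j+1}.

    Keys are (state of sink_j, state of src_i); values are
    P(sink_{j+1} = Infected | parents). tau is the contact infection
    rate and rho the intrinsic infection rate, 0 <= tau, rho <= 1.
    """
    cpt = {}
    for sink_j in ("Infected", "Uninfected"):
        for src_i in ("Infected", "Uninfected"):
            if sink_j == "Infected":
                # State transition rule: an infected object never
                # returns to "uninfected", regardless of src_i.
                p = 1.0
            elif src_i == "Infected":
                # Contact rule: src_i propagates infection with rate tau.
                p = tau
            else:
                # src_i is not the source; only intrinsic causes remain.
                p = rho
            cpt[(sink_j, src_i)] = p
    return cpt

# The values used in experiment 4.1: tau = 0.9, rho = 0.0001.
cpt = sink_cpt(0.9, 0.0001)
```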

4.3.2 Evidence Incorporation

The BN is able to incorporate security alerts from a variety of information sources as evidence of attack occurrence. Numerous ways have been developed to capture intrusion symptoms, which can be caused by attacks exploiting both known

¹This rule is formulated based on the assumption that no intrusion recovery operations are performed and that attackers only conduct malicious activities.

CPT at node Observation

                    Actual=Infected   Actual=Uninfected
Observation=True         0.9               0.15
Observation=False        0.1               0.85

Figure 4.5: Local Observation Model. (Here 0.15 is the sensor's false positive rate and 0.1 its false negative rate.)

vulnerabilities and zero-day vulnerabilities. A tool such as Wireshark [77] can notice a back telnet connection that is instructed to open; an IDS such as Snort [54] may recognize a malicious packet; a packet analyzer such as tcpdump [78] can capture suspicious network traffic; etc. In addition, human security admins can also manually check the system or network logs to discover other abnormal activities that cannot be captured by security sensors. As more correct evidence is fed into the BN, the identified zero-day attack paths get closer to the real facts.

In this work, we adopt two ways to incorporate evidence. First, add evidence

directly on a node by providing the infection state of the instance. If human

security experts have scrutinized an object and proven that it is infected

at a specific time, they can feed the evidence to the instance-graph-based BN by

directly changing the infection status of the corresponding instance into infected.

Second, leverage the local observation model (LOM) [55] to model the uncertainty


about observations. Human security admins or security sensors may notice suspicious activities that imply attack occurrence. Nonetheless, these observations often suffer from false rates. As shown in Figure 4.5, an observation node can be added as the direct child node of an object instance. The implicit causality relation is that the actual state of the instance can affect the observation to be made. If the observation comes from security alerts, the CPT inherently indicates the false rates of the security sensors. For example, P(Observation = True | Actual = Uninfected) shows the false positive rate and P(Observation = False | Actual = Infected) indicates the false negative rate.
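For a single LOM node, the effect of an observation on the belief about an instance can be worked out directly with Bayes' rule. The following Python sketch (ours; the prototype delegates inference to SamIam) uses the example rates from Figure 4.5:

```python
def posterior_infected(prior, observation, fpr=0.15, fnr=0.1):
    """P(Actual = Infected | Observation) under the LOM CPT of Figure 4.5.

    fpr = P(Observation = True | Uninfected), the false positive rate;
    fnr = P(Observation = False | Infected), the false negative rate.
    """
    if observation:   # Observation = True
        lik_inf, lik_uninf = 1.0 - fnr, fpr
    else:             # Observation = False
        lik_inf, lik_uninf = fnr, 1.0 - fpr
    joint_inf = lik_inf * prior
    return joint_inf / (joint_inf + lik_uninf * (1.0 - prior))

# A positive sensor alert raises a 10% prior belief to 40%:
p = posterior_infected(0.1, True)  # ≈ 0.4
```

In the full BN, of course, such evidence is propagated to all other instances rather than updating one node in isolation.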

4.4 System Design

Figure 4.6 shows the overall system design, which includes six components.

System call auditing and filtering. System call auditing is performed against all

running processes and should preserve sufficient OS-aware information. Subsequent

system call reconstruction can thus accurately identify the processes and files by

their process IDs or file descriptors. The filtering process basically prunes system

calls that involve redundant and very likely innocent objects, such as the dynamic

linked library files or some dummy objects. We conduct system call auditing at run time on each host in the enterprise network.

System call parsing and dependency extraction. The collected system call traces

are then sent to a central machine for off-line analysis, where the dependency

relations between system objects are extracted according to Table 2.1.

Graph generation. The extracted dependencies are then analyzed line by line for


Algorithm 1 Algorithm of Object Instance Graph Generation
Require: set D of system object dependencies
Ensure: the instance graph G(V, E)
 1: for each dep: src→sink ∈ D do
 2:   look up the most recent instance srck of src, sinkz of sink in V
 3:   if sinkz ∉ V then
 4:     create new instance sink1
 5:     V ← V ∪ {sink1}
 6:     if srck ∉ V then
 7:       create new instance src1
 8:       V ← V ∪ {src1}
 9:       E ← E ∪ {src1→sink1}
10:     else
11:       E ← E ∪ {srck→sink1}
12:     end if
13:   end if
14:   if sinkz ∈ V then
15:     create new instance sinkz+1
16:     V ← V ∪ {sinkz+1}
17:     E ← E ∪ {sinkz→sinkz+1}
18:     if srck ∉ V then
19:       create new instance src1
20:       V ← V ∪ {src1}
21:       E ← E ∪ {src1→sinkz+1}
22:     else
23:       E ← E ∪ {srck→sinkz+1}
24:     end if
25:   end if
26: end for

graph generation. The generated graph can be either host-wide or network-wide,

depending on the analysis scope. A network-wide instance graph can be constructed

by concatenating individual host-wide instance graphs through instances of the

communicating sockets. Algorithm 1 is the base algorithm for instance graph

generation, which is designed according to the logic in Definition 1.
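To make the instance-creation logic of Algorithm 1 concrete, here is a compact Python rendering (the data structures and names are ours; the prototype itself is written in gawk):

```python
def build_instance_graph(deps):
    """Build an object instance graph from ordered src->sink dependencies.

    Nodes are (object, instance_number) tuples. Each dependency adds a
    contact edge from the most recent src instance to the sink instance,
    and a state transition edge whenever a new sink instance is created,
    mirroring Algorithm 1.
    """
    latest = {}            # object -> number of its most recent instance
    V, E = set(), set()

    def current(obj):
        # Look up the most recent instance of obj, creating instance 1
        # if the object has not been seen yet.
        if obj not in latest:
            latest[obj] = 1
            V.add((obj, 1))
        return (obj, latest[obj])

    for src, sink in deps:
        src_node = current(src)
        if sink not in latest:
            sink_node = current(sink)          # sink instance 1
        else:
            prev = (sink, latest[sink])
            latest[sink] += 1
            sink_node = (sink, latest[sink])   # new sink instance
            V.add(sink_node)
            E.add((prev, sink_node))           # state transition dependency
        E.add((src_node, sink_node))           # contact dependency
    return V, E

# Parsing the simplified log of Figure 4.1a this way yields, e.g.,
# "process B instance 2" with exactly two parents.
```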

BN construction. The BN is constructed by taking the topology of an instance

graph. The instances and dependencies in an instance graph become nodes and


edges in the BN. Basically the nodes and the associated CPTs are specified in a .net file, which is one file type that can carry the instance-graph-based BN.

Evidence incorporation and probability inference. Evidence is incorporated by

either providing the infection state of the object instance directly, or constructing a local observation model (LOM) for the instance. After probability inference,

each node in the instance graph receives a probability.

Zero-day Attack Paths Identification. To reveal the zero-day attack paths from

the mess of instance graphs, the nodes with high probabilities are to be preserved,

while the links between them should not be broken. We implemented Algorithm 2 on the basis of the depth-first search (DFS) algorithm [82] to tag each node in the instance graph as either possessing a high probability itself, or having both an ancestor and a descendant with high probabilities. The tagged nodes are the ones that actually propagate the infection through the network, and thus should be preserved in the final graph. Our system allows a probability threshold to be tuned for recognizing high-probability nodes. For example, if the threshold is set at 80%, only instances that have infection probabilities of 80% or higher will be recognized as high-probability nodes.
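The tagging in Algorithm 2 can also be read as a reachability test: a node is preserved if its own probability reaches the threshold, or if some high-probability node is reachable from it both upward (ancestors) and downward (descendants). A Python sketch of this reading (names are ours, not the prototype's):

```python
def zero_day_path(V, E, prob, threshold=0.8):
    """Extract the zero-day attack path subgraph G_z(V_z, E_z).

    A node is kept if its infection probability reaches the threshold,
    or if a high-probability node is reachable both through its
    ancestors and through its descendants, so that links between
    suspicious nodes are not broken.
    """
    children = {v: [] for v in V}
    parents = {v: [] for v in V}
    for u, w in E:
        children[u].append(w)
        parents[w].append(u)

    high = {v for v in V if prob.get(v, 0.0) >= threshold}

    def reaches_high(start, nbrs):
        # Depth-first search from start along nbrs (parents or children).
        stack, seen = list(nbrs[start]), set()
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            if v in high:
                return True
            stack.extend(nbrs[v])
        return False

    Vz = {v for v in V
          if v in high
          or (reaches_high(v, parents) and reaches_high(v, children))}
    Ez = {(u, w) for (u, w) in E if u in Vz and w in Vz}
    return Vz, Ez
```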

4.5 Implementation

The whole system includes online system call auditing and off-line data analysis. System call auditing is implemented with a loadable kernel module. For the off-line data analysis, our prototype is implemented with approximately 2915 lines

of gawk code that constructs a .net file for the instance-graph-based BN and a


Algorithm 2 Algorithm of Zero-day Attack Paths Identification
Require: the instance graph G(V, E), a vertex v ∈ V
Ensure: the zero-day attack path Gz(Vz, Ez)
 1: function DFS(G, v, direction)
 2:   set v as visited
 3:   if direction = ancestor then
 4:     set nextv as parent of v that nextv→v ∈ E
 5:     set flag as has_high_probability_ancestor
 6:   else if direction = descendant then
 7:     set nextv as child of v that v→nextv ∈ E
 8:     set flag as has_high_probability_descendant
 9:   end if
10:   for all nextv of v do
11:     if nextv is not labeled as visited then
12:       if the probability prob[nextv] ≥ threshold or nextv is marked as flag then
13:         set find_high_probability as True
14:       else
15:         DFS(G, nextv, direction)
16:       end if
17:     end if
18:     if find_high_probability is True then
19:       mark v as flag
20:     end if
21:   end for
22: end function
23: for all v ∈ V do
24:   DFS(G, v, ancestor)
25:   DFS(G, v, descendant)
26: end for
27: for all v ∈ V do
28:   if prob[v] ≥ threshold or (v is marked as has_high_probability_ancestor and v is marked as has_high_probability_descendant) then
29:     Vz ← Vz ∪ v
30:   end if
31: end for
32: for all e: v→w ∈ E do
33:   if v ∈ Vz and w ∈ Vz then
34:     Ez ← Ez ∪ e
35:   end if
36: end for


dot-compatible file for visualizing the zero-day attack paths in Graphviz [83], and

145 lines of Java code for probability inference, leveraging the API provided by the

BN tool SamIam [65].

An instance graph can be very large due to the introduction of instances.

Therefore, in addition to system call filtering, we also develop several ways to prune the instance graphs without impeding the reflection of the major infection propagation process.

One helpful way is to ignore the repeated dependencies. It is common that the same dependency happens between two system objects a number of times, even through different system call operations. For example, process A may write file 1 several times. In such cases, each time the write operation occurs, a new instance of file 1 is created and a new dependency is added between the most recent instance of process A and the new instance of file 1. If the status of process A is not affected by any other system objects during this time period, the infection status of file 1 will not change either. Hence the new instances of file 1 and the related new dependencies become redundant information in understanding the infection propagation. Therefore, a repeated src→sink dependency can be ignored if the src object has not been influenced by other objects since the last time that the same src→sink dependency appeared.
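A minimal Python sketch of this pruning rule (ours, not the prototype's gawk code) over an ordered list of (src, sink) dependencies:

```python
def prune_repeated(deps):
    """Drop a repeated src->sink dependency when src has not been
    influenced by any other object since the same dependency last
    appeared, so the skipped instances carry no new infection status.
    """
    pruned = []
    seen = {}              # (src, sink) -> index at which it was last kept
    last_written = {}      # object -> index of the dep that last wrote it
    for i, (src, sink) in enumerate(deps):
        key = (src, sink)
        if key in seen and last_written.get(src, -1) < seen[key]:
            continue       # redundant repetition of the same dependency
        pruned.append((src, sink))
        seen[key] = i
        last_written[sink] = i
    return pruned

# E.g. consecutive "A writes f1" operations collapse into one dependency,
# but a repetition after A is itself written ("B -> A") is kept.
```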

Another way to simplify an instance graph is to ignore the root instances whose original objects have never appeared as the sink object in a src→sink dependency during the time period being analyzed. For instance, file 3 in Figure 4.3 only appears as the src object in the dependencies parsed from the system call log in Figure 4.1a, so file 3 instance 1 can be ignored in the simplified instance graph. Such instances are not influenced by other objects in the specified time window, and thus are not manipulated by attackers either. Hence ignoring these root instances does not break any routes of the intrusion sequence and will not hinder the understanding of infection propagation. This method is helpful for situations such as a process reading a large number of configuration or header files.

A third way to prune an instance graph is to ignore some repeated mutual dependencies, in which two objects keep affecting each other through creating new instances. One situation is that a process frequently sends messages to and receives messages from a socket. For example, in one of our experiments, 107 new instances are created respectively for the process (pid:6706, pcmd:sshd) and the socket (ip:192.168.101.5, port:22) due to their interaction. Since no other objects are involved during this procedure, the infection status of these two objects stays the same through all the new instances. Thus a simplified instance graph can preserve the very first and last dependencies while neglecting the middle ones. Another situation is that a process frequently takes input from a file and then writes the output to it again after some operations. The middle repeated mutual dependencies can be ignored in a similar way.

4.6 Experiments

4.6.1 Attack Scenario

To demonstrate the merits of our system and compare experiment results with

Patrol [41], we implemented a similar attack scenario as in Patrol. We built a

test-bed network and launched a three-step attack against it. Figure 4.7 illustrates

the attack scenario. Step 1, the attacker exploits vulnerability CVE-2008-0166 [75]


to gain root privilege on SSH Server through a brute-force key guessing attack. Step

2, since the export table on NFS Server is not set up appropriately, the attacker can

upload a malicious executable file to a public directory on NFS. The malicious file

contains a Trojan-horse that can exploit a vulnerability on a specific workstation.

The public directory is shared among all the hosts in the test-bed network so that a

workstation may access and download this malicious file. Step 3, once the malicious

file is mounted and installed on the workstation, the attacker is able to execute

arbitrary code on the workstation.

To verify the effectiveness of our approach, we conducted two major sets of experiments by providing different vulnerabilities in step 3. In experiment 4.1, the

malicious file contains a Trojan-horse that exploits CVE-2009-2692 [73] existing in

the Linux kernel of workstation 3. CVE-2009-2692 is a vulnerability that allows local

users to gain privileges by triggering a NULL pointer dereference. In experiment

4.2, the malicious file contains another Trojan-horse leveraging CVE-2011-4089 [74]

on workstation 4. This vulnerability allows local users to execute arbitrary code by

precreating a temporary directory. Our goal is to test whether ZePro can reveal

both of the attack paths enabled by different vulnerabilities.

Since zero-day exploits are not readily available, we emulate zero-day vulnerabilities with known vulnerabilities. For example, we treat CVE-2009-2692 and

CVE-2011-4089 as zero-day vulnerabilities by assuming the current time is Dec

31, 2008. In addition, the configuration error on NFS is also viewed as a special

type of unknown vulnerability because it is ruled out by vulnerability scanners like

Nessus [56]. The strategy of emulation also brings another benefit: the information for these “known zero-day” vulnerabilities is available to verify the correctness of our experiment results.


To capture the intrusion evidence for subsequent BN probability inference,

we deployed security sensors in the test-bed, such as firewalls, Snort, Tripwire,

Wireshark, Ntop [84] and Nessus. For sensors that need configuration, we tailored

their rules or policy files to match our hosts.

4.6.2 Experiment Results

While simultaneously logging the system calls on each host and collecting the

security alerts, we conducted the described three-step attacks. In experiment 4.1,

after analyzing a total of 143120 system calls generated by three hosts, we

constructed an instance-graph-based BN with 1853 nodes and 2249 edges. Since

experiment 4.2 just differs from experiment 4.1 in attack step 3, in experiment 4.2

we only analyzed 54998 system calls generated by workstation 4. The constructed

BN contains 911 nodes and 1214 edges. The evidence shown in Table 4.4 is collected

and fed into the two BNs respectively. We will present evaluation results for both

experiments in terms of correctness, size of zero-day attack paths and influence

of evidence. For other metrics, we only discuss experiment 4.1 because the two

experiments share similar evaluation conclusions.

4.6.2.1 Correctness

Given the evidence, Figure 4.8 and Figure 4.9 respectively illustrate the identified

zero-day attack paths for experiments 4.1 and 4.2 in the form of instance graphs. The

processes, files, and sockets are denoted with rectangles, ellipses, and diamonds

respectively. For both experiments, the intrinsic infection rate ρ is set to 0.0001,


Figure 4.6: System Design. [diagram: the six system components (System Call Auditing and Filtering; System Call Parsing and Dependency Extraction; Graph Generation; BN Construction; Evidence Incorporation and Probability Inference; Zero-day Attack Path Identification), linked by their interim inputs and outputs (System Call Traces; Dependencies; Instance Graphs; Instance-graph-based BN; Instance Graphs with Probabilities; Zero-day Attack Paths)]

Figure 4.7: Attack Scenario. [diagram: an attacker on the Internet penetrates the test-bed network, which contains DMZ, Intranet, and Inside zones separated by a DMZ Firewall, an Intranet Firewall, and an Inside Firewall; hosts include SSH Server, Web Server, Email Server, Database Server, NFS Server, and Workstations 1 to 4; the attack steps are brute-force key guessing, NFS mount, and Trojan-horse download]

and the probability threshold for recognizing high-probability nodes is 80%. The contact infection rates τ for experiments 4.1 and 4.2 are 0.9 and 0.8 respectively. We mark the evidence with red color and the nodes that are verified to be malicious with grey color. Figure 4.8 shows how the malicious file is uploaded from SSH Server to NFS Server, and then gets executed on workstation 3. Figure 4.9 captures the process of renaming /tmp/evil into /tmp/ls, and leveraging /tmp/ls for further malicious activities such as adding an unauthorized root-privilege account into /etc/passwd and /etc/shadow. Therefore, Figure 4.8 and Figure 4.9 testify to the effectiveness of our approach for revealing actual zero-day attack paths.

It is worth noting that although no evidence is provided on NFS Server in experiment 4.1, the identified attack path can still demonstrate how NFS Server

contributes to the overall intrusion propagation: the file workstation_attack.tar.gz

is uploaded from SSH Server to the /exports directory on NFS Server, and then

downloaded to /mnt on workstation 3. More importantly, the identified path can

expose key objects that are related to the exploits of zero-day vulnerabilities. For

example, the identified system objects on NFS Server can alert system admins to possible configuration errors, because SSH Server should not have the privilege of writing to the /exports directory. As another example, the object PAGE0: memory(0-4096) on workstation 3 is also exposed as highly suspicious on the identified attack path. Page-zero is actually what triggers the null pointer dereference and enables attackers to gain privilege on workstation 3. Exposing the page-zero object can help system admins further diagnose how the intrusion happens and propagates.

An additional merit of our approach is that the instance-graph-based BN can

clearly show the state transitions of an object using instances. By matching the


Figure 4.8: The zero-day Attack Path in the Form of an Instance Graph for Experiment 4.1. [graph: instance nodes spanning SSH Server, NFS Server, and Workstation 3, including the sshd and scp processes and the Snort brute-force alert on SSH Server; the unfsd process and /exports/workstation_attack.tar.gz on NFS Server; and, on Workstation 3, the mount, tar, exploit.sh, wunderbar_emporium, cc/cc1/as, pulseaudio, and useradd processes, the object PAGE0: memory(0-4096), and files such as /mnt/workstation_attack.tar.gz, /etc/passwd, and /etc/shadow]

[Figure: instance-graph node labels for Workstation 4 omitted. Annotations in the figure mark four attack steps: rename /tmp/evil into /tmp/ls; compile the malicious executable; add an unauthorized root-privilege account into /etc/passwd and /etc/shadow; add the virus file.]

Figure 4.9: The Zero-day Attack Path in the Form of an Instance Graph for Experiment 4.2.


Table 4.3: The Impact of Pruning the Instance Graphs

                                                 SSH Server      NFS Server      Workstation 3
                                               before   after   before  after   before   after
number of syscalls in raw data trace            82133           14944           46043
size of raw data trace (MB)                      13.8             2.3             7.9
number of extracted object dependencies         10310           11535           17516
number of objects                                 349              20             544
number of instances (nodes) in instance graph   10447     745   11544     39    17849    1069
number of dependencies (edges) in instance graph 20186    968   19863     37    34549    1244
number of contact dependencies                   9888     372    8329      8    17033     508
number of state transition dependencies         10298     596   11534     29    17516     736
average time for graph generation (s)              14      11       6      5       13      11
.net file size (KB)                              2000     123    2200      8     3600     180

instances and dependencies back to the system call traces, it can even find out the exact system call that causes the state change of the object. For example, the node x2086.4:(6763:6719:tar) in Figure 4.8 represents the fourth instance of the process (pid:6763, pcmd:tar). Previous instances of the process are considered innocent because of their low infection probabilities. The process becomes highly suspicious only after a dependency occurs between node x2082.2:(/home/user/test-bed/workstation_attack.tar.gz:1384576) and node x2086.4. Matching the dependency back to the system call traces reveals that the state change of the process is caused by "syscall:read, start:827189, end:827230, pid:6763, ppid:6719, pcmd:tar, ftype:REG, pathname:/home/user/test-bed/workstation_attack.tar.gz, inode:1384576", a system call indicating that the process reads a suspicious file.

Table 4.4: The Collected Evidence

Exp ID   Evidence  Host            Description
Exp 1    E1        SSH Server      Snort messages "potential SSH brute force attack"
         E2        Workstation 3   Tripwire reports "/virus is added"
         E3        Workstation 3   Tripwire reports "/etc/passwd is modified"
         E4        Workstation 3   Tripwire reports "/etc/shadow is modified"
Exp 2    E5        Workstation 4   Tripwire reports "/symlinkattack.o is added"
         E6        Workstation 4   Tripwire reports "/virus is added"
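The quoted system call record is a flat list of key:value fields, so matching a dependency back to a record reduces to parsing such a line into a structured form. The sketch below shows one way to do that; the field names come from the record quoted above, but the parser itself is an illustrative assumption, not ZePro's actual implementation.

```python
# Illustrative parser for the flat "key:value, key:value" syscall records
# quoted in the text. This is a sketch, not ZePro's actual code.
def parse_syscall_record(record):
    fields = {}
    for part in record.split(","):
        # partition on the first ":" so path values containing no further
        # colons (e.g. pathname:/home/user/...) survive intact
        key, _, value = part.strip().partition(":")
        fields[key] = value
    return fields

record = ("syscall:read, start:827189, end:827230, pid:6763, ppid:6719, "
          "pcmd:tar, ftype:REG, "
          "pathname:/home/user/test-bed/workstation_attack.tar.gz, "
          "inode:1384576")
parsed = parse_syscall_record(record)
```

Given the record above, `parsed["pid"]` is "6763" and `parsed["pathname"]` is the suspicious tarball's path, which is exactly the information needed to tie the state-changing dependency back to a concrete read.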

4.6.2.2 Size of Instance Graph and Zero-day Attack Paths

We also evaluated the size of the instance graphs and the effectiveness of our pruning techniques for reducing the number of instances. Table 4.3 summarizes the impact of pruning the instance graphs for each host in experiment 4.1. It shows that the total number of instances is reduced from 39840 to 1853. On average each object has 2.03 instances, which is quite acceptable. To further gain an object-level comprehension of zero-day attack paths, ZePro also supports converting instance graphs into system object dependency graphs (SODGs) by merging all the instances belonging to the same object into one node. Zero-day attack paths in SODG form contain only objects and can be used for verification when details regarding instances are not needed. Figure 4.10 and Figure 4.11 are respectively the SODG forms of the zero-day attack paths in Figure 4.8 and Figure 4.9.
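The instance-to-SODG conversion can be sketched as follows. Instance identifiers like x2524.4 split at the dot into an object id (x2524) and an instance number, so merging amounts to collapsing edges onto object ids; the edge representation here is a hypothetical simplification, not ZePro's real data structure.

```python
# Sketch of collapsing an instance graph into an SODG: all instances of the
# same object ("x2524.1", "x2524.4", ...) merge into one node "x2524".
# The edge list format is an assumption for illustration.
def to_sodg(instance_edges):
    obj = lambda inst: inst.split(".")[0]   # "x2086.4" -> "x2086"
    sodg_edges = set()
    for src, dst in instance_edges:
        s, d = obj(src), obj(dst)
        if s != d:                          # drop intra-object state transitions
            sodg_edges.add((s, d))
    return sodg_edges

edges = [("x2082.2", "x2086.4"),   # file infects tar process
         ("x2086.3", "x2086.4"),   # state transition within one process
         ("x2086.4", "x2102.1")]   # tar process writes an extracted file
sodg = to_sodg(edges)
```

Note that state-transition edges between instances of the same object vanish in the merge, which is why the SODG form is so much smaller than the instance graph.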

The experiment results have demonstrated that our system ZePro substantially

outperforms Patrol. Without any pre-knowledge towards known vulnerability

exploits and OS-level exploitation features (which are mandatory information for

Patrol to work), Zepro generates much better results than Patrol. In experiment 4.1,

the zero-day attack path identified by Patrol contains 175 objects, while the path by

our system is composed of only 77 objects (Figure 4.10). Considering that the total

number of objects involved in original instance graph is only 913, the 56% reduction

of path size is substantial. In experiment 4.2, the size of zero-day attack paths

revealed by Patrol and ZePro are very close: the path by Patrol has 60 nodes and

the path by ZePro has 61 nodes (Figure 4.11). This is because the objects involved

in these paths are already the smallest set of suspicious objects to constitute the

paths. Further reduction of objects will hurt the completeness of revealed zero-day

attack paths. More importantly, when the extensive pre-knowledge is not available

(which is usual), ZePro remains as e�ective, but Patrol will result in a large number

of suspicious intrusion propagation paths and is incapable of recognizing real attack

paths hiding in these candidates. For example, in Patrol’s dataset where SSH server

takes a workload of 1 request per 5 seconds, a 15-minute system call log generates

180 candidate paths that tangle with the real zero-day attack paths.
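The aggregate figures quoted in this section follow directly from the per-host numbers in Table 4.3 and the path sizes above; the arithmetic can be checked in a few lines.

```python
# Arithmetic check of the pruning and path-size figures quoted in the text,
# using the per-host numbers from Table 4.3 (SSH Server, NFS Server,
# Workstation 3) and the Patrol/ZePro path sizes from experiment 4.1.
instances_before = 10447 + 11544 + 17849   # instances prior to pruning
instances_after = 745 + 39 + 1069          # instances after pruning
objects_total = 349 + 20 + 544             # objects across the three hosts

avg_instances_per_object = instances_after / objects_total  # ~2.03
path_reduction = (175 - 77) / 175          # Patrol path vs ZePro path, ~56%
```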


[Figure: object-level dependency graph spanning Workstation 3, the NFS Server, and the SSH Server; node labels omitted.]

Figure 4.10: The Object-level Zero-day Attack Path in Experiment 4.1.

[Figure: object-level dependency graph for Workstation 4; node labels omitted.]

Figure 4.11: The Object-level Zero-day Attack Path in Experiment 4.2.

Table 4.5: The Influence of Evidence in Experiment 4.1
(representative instances on the SSH Server, NFS Server, and Workstation 3)

Evidence  x4.1     x10.1    x253.3   x1007.1  x1017.1  x2006.2  x2083.1  x2108.1  x2311.32
No Evi.   0.56%    0.51%    0.57%    0.51%    0.54%    0.54%    0.51%    0.51%    1.21%
E1        63.76%   57.38%   79.13%   57.38%   46.54%   41.92%   37.75%   24.89%   26.93%
E2        63.76%   57.38%   79.13%   57.38%   46.94%   42.58%   38.34%   27.04%   30.09%
E3        86.82%   78.14%   80.76%   84.50%   75.63%   81.26%   79.56%   75.56%   81.55%
E4        86.84%   78.16%   80.77%   84.53%   75.65%   81.30%   79.59%   75.60%   81.66%

Table 4.6: The Influence of Evidence in Experiment 4.2
(representative instances on Workstation 4)

Evidence  x2078.1  x2079.3  x2265.26  x2273.2  x2148.2
No Evi.   0.05%    0.75%    1.51%     0.93%    0.91%
E5        64.01%   74.43%   54.63%    34.95%   34.94%
E6        79.82%   93.63%   98.82%    65.63%   68.82%

4.6.2.3 Influence of Evidence

In both experiments, we choose a number of nodes in Figure 4.8 and Figure 4.9 as representative instances of interest. Table 4.5 and Table 4.6 respectively show how the infection probabilities of these instances change after each piece of evidence is fed into the BN. We assume the evidence is observed in the order of the attack sequence. In Table 4.5, the results show that when no evidence is available, the infection probabilities for all nodes are very low. When E1 is added, only a few instances on the SSH Server receive probabilities higher than 60%. After E2 is observed, the infection probabilities for instances on Workstation 3 increase, but still not by much. As E3 and E4 arrive, 5 of the 9 representative instances on all three hosts become highly suspicious. Table 4.6 reflects similar probability inference results in experiment 4.2: the infection probabilities of the representative instances increase as E5 and E6 are added. Therefore, the evidence makes the instances on the actual attack paths emerge gradually from the "sea" of instances in the instance graph. However, it is also possible that the arrival of some evidence decreases the probabilities of certain instances, so that these instances are removed from the final path. In short, as more evidence is collected, the revealed zero-day attack paths move closer to the actual facts.
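The effect shown in the tables — a low prior probability rising sharply as corroborating evidence arrives — can be illustrated in miniature with Bayes' rule on a single node. The prior and sensor rates below are made-up numbers for illustration, not parameters of the BN used in the experiments.

```python
# Toy single-node illustration of evidence raising an infection probability.
# The prior (0.005) and the sensor's true/false positive rates are made-up
# numbers, not ZePro's actual BN parameters.
def posterior(prior, p_alert_given_infected, p_alert_given_clean):
    joint_infected = prior * p_alert_given_infected
    joint_clean = (1 - prior) * p_alert_given_clean
    return joint_infected / (joint_infected + joint_clean)

p = 0.005                     # low prior, like the "No Evi." rows
p = posterior(p, 0.9, 0.01)   # first alert: still well below certainty
p = posterior(p, 0.9, 0.01)   # a second, corroborating alert pushes p above 0.9
```

One alert alone leaves the node ambiguous; a second independent alert makes it highly suspicious, mirroring how the representative instances only cross the high-probability threshold after E3 and E4 arrive.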

4.6.2.4 Influence of False Alerts

We assume that E4 is a false alarm generated by Tripwire and evaluate its influence on the BN output. Table 4.7 shows that when E1 is the only other piece of evidence, the observation of E4 greatly influences the probabilities of some instances on Workstation 3. However, when other evidence is fed into the BN, the influence of E4 decreases. For instance, given just E1, the infection probability of x2006.2 is 97.78% when E4 is true, but should be 29.96% if E4 is a false alert. Nonetheless, if all other evidence is already input into the BN, the infection probability of x2006.2 only changes from 81.3% to 81.13% if E4 becomes a false alert. Therefore, the impact of false alerts is reduced as more evidence is collected.


Table 4.7: The Influence of False Alerts

Evidence               x4.1    x10.1   x253.3  x1007.1  x1017.1  x2006.2  x2083.1  x2108.1  x2311.32
Only E1   E4=True      98.46%  88.62%  81.59%  98.20%   88.30%   97.78%   97.67%   90.23%   94.44%
          E4=False     56.33%  50.70%  78.60%  48.65%   37.60%   29.96%   24.92%   10.89%   12.48%
All Evid. E4=True      86.84%  78.16%  80.77%  84.53%   75.65%   81.30%   79.59%   75.60%   81.66%
          E4=False     86.74%  78.06%  80.76%  84.41%   75.54%   81.13%   79.42%   75.39%   81.38%

4.6.2.5 Sensitivity Analysis and Influence of τ and ρ

We also performed sensitivity analysis and evaluated the impact of the contact infection rate τ and the intrinsic infection rate ρ by tuning these numbers. ρ is usually set to a very low value, so our experiment results are not very sensitive to the value of ρ. Since τ decides how likely sink_j is to get infected given that src_i is infected in a src_i → sink_j dependency, the value of τ will definitely influence the probabilities produced by the BN. If a node is marked as infected, other nodes that are directly or indirectly connected to this node should expect higher infection probabilities when τ is bigger. Our experiments show that adjusting τ within a small range (e.g. changing it from 0.9 to 0.8) does not influence the output probabilities much, but a major adjustment of τ (e.g. changing it from 0.9 to 0.5) can largely affect the probabilities. However, we still argue that although τ influences the produced infection probabilities, it will not greatly affect the identification of zero-day attack paths. Our rationale is that the probability threshold for recognizing high-probability nodes on zero-day attack paths can be adjusted according to the value of τ. For example, when τ is a small number such as 50%, even nodes that have infection probabilities of only around 40% to 60% should be considered highly suspicious, because it is hard for an instance to get infected with such a low contact infection rate.
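The roles of the contact infection rate (τ) and the intrinsic infection rate (ρ) can be sketched as a conditional probability entry for a single src → sink dependency. The exact CPT layout of ZePro's BN is not given in the text, so the shape below is an assumption for illustration.

```python
# Sketch of a contact-infection CPT entry for one src -> sink dependency.
# tau: probability the sink gets infected given the source is infected.
# rho: intrinsic infection rate, i.e. the (small) chance the sink becomes
# infected even without an infected source. Layout is an assumption.
def sink_cpt(tau, rho):
    return {
        "src_infected": tau,   # P(sink infected | src infected)
        "src_clean": rho,      # P(sink infected | src clean)
    }

cpt = sink_cpt(tau=0.9, rho=0.001)
```

Lowering τ lowers every downstream posterior roughly in proportion, which is why the recognition threshold for "highly suspicious" nodes can be rescaled along with τ rather than kept fixed.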

As mentioned before, due to constraints on data and ground truth, it is possible but currently very difficult to automatically learn the parameters τ and ρ using statistical techniques. Parameter learning and Bayesian network training are beyond the scope of this chapter and will be investigated in future work.

4.6.2.6 Complexity and Scalability

We evaluated the time cost for off-line data analysis, which includes the time for instance-graph-based BN generation, BN probability inference, and zero-day attack path identification. The time cost for probability inference depends on the algorithm employed in SamIam. The time complexity can be O(|V|^2) for both instance-graph-based BN generation and zero-day attack path identification, because the DFS algorithm is applied to every node in the instance graph. For our experiments, which conduct the off-line analysis on a host with a 2.4 GHz Intel Core 2 Duo processor and 4 GB of RAM, Table 4.3 shows the time required for constructing the instance-graph-based BN for each host; the total time of BN construction comes to around 27 seconds. For a BN with approximately 1854 nodes, assuming that the evidence is already fed into the BN and the algorithm used is recursive conditioning, the average time cost is 1.57 seconds for BN compilation and probability inference, and 59 seconds for zero-day attack path identification. Combining all the required time together, the average data analysis speed is 280 KB/s, which is quite reasonable. The average memory used for compiling the BN is 4.32 MB. As for the run-time performance overhead, the overall system slow-down caused by the system call logging component is around 15% to 20% according to measurements with UnixBench and kernel compilation.

The scalability of the approach proposed in this chapter is ensured by the following aspects. First, the time window for collecting system call logs for analysis can be adjusted. For example, individual systems can collect system calls and send the logs to a central machine for analysis every 30 or 40 minutes. In our experiments, a 40-minute system call log generates a BN with 1854 nodes. A smaller time window usually generates a smaller BN, but not always: the BN size mainly depends on the actual behavior recorded in the system call logs and cannot be estimated deterministically. Second, although an enterprise network may contain a large number of hosts, the instance graphs generated by the individual hosts are not necessarily connected to each other. An actual network-wide instance graph often consists of one or several isolated instance graphs, which also limits the size of the individual BNs. Third, both instance graph generation and zero-day attack path identification can be conducted with parallel computing. Taking the current experiment results for estimation, if an enterprise network contains 10000 hosts and an analysis cluster with 512 processors is available, the time for instance graph generation and zero-day attack path identification could be 2.93 minutes and 6.3 minutes respectively. In addition, intensive research has been conducted on the scalability of BN compilation and probability inference [86,87]. A scalable parallel implementation using junction trees has been developed for exact inference in BNs [88]. The recursive conditioning algorithm [89] we employed in this work even offers a smooth trade-off between time and space, which further enhances the scalability of BN inference.
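The 2.93-minute estimate for instance graph generation can be reproduced with a back-of-the-envelope calculation. The per-host cost of about 9 seconds used below is back-solved from the quoted figure (it is consistent in magnitude with the per-host generation times in Table 4.3) and should be treated as an assumption.

```python
# Back-of-the-envelope check of the parallel-scaling estimate in the text:
# 10000 hosts analyzed on a 512-processor cluster. The ~9 s per-host
# graph-generation cost is an assumption back-solved from the quoted figure.
def parallel_minutes(n_hosts, per_host_seconds, n_processors):
    # embarrassingly parallel: total work divided evenly across processors
    return n_hosts * per_host_seconds / n_processors / 60

graph_gen = parallel_minutes(10000, 9, 512)   # ~2.93 minutes
```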


4.7 Related Work

The work most closely related to ours is the Patrol system designed by Dai et al. [41]. It touches the zero-day attack path problem at the operating system level. Our work also aims at addressing the zero-day attack path problem, but our approach is substantially different from Patrol in several aspects. First, Patrol relies on extensive pre-knowledge regarding known vulnerability exploitations to distinguish zero-day attack paths from the huge number of candidate paths. However, such pre-knowledge is extremely difficult to acquire and may not be useful when zero-day exploits do not share common OS-level features with previous exploits. Instead, our approach does not require any pre-knowledge and reveals the zero-day attack paths solely based on the collected intrusion evidence. Second, Patrol only conducts qualitative analysis and treats every object on the identified paths as having the same malicious status. Compared to Patrol, our approach quantifies the infection status of each system object with probabilities. By focusing only on system objects with relatively high probabilities, the set of suspicious objects can be significantly narrowed down and the size of the revealed zero-day attack path is relatively small. Third, Patrol performs reachability analysis through tracking and thus generates a huge candidate pool of zero-day attack paths. In contrast, our system does not conduct tracking, but relies on the computed probabilities. The paths containing highly suspicious objects reveal themselves automatically. The dependency paths introduced by legitimate activities and the dependency paths introduced by zero-day attacks are therefore separated with ease.

Other related work includes system call dependency tracking and zero-day attack identification. System call dependency tracking was first proposed in [21] to help the understanding of intrusion sequences. It was then applied for alert correlation in [71,72]. Instead of directly correlating these alerts, our system takes the alerts as evidence and quantitatively computes the infection probabilities of system objects. The study in [85] empirically reveals zero-day attacks by identifying executable files that are linked to exploits of known vulnerabilities: a zero-day attack is identified if a malicious executable is found before the corresponding vulnerability is disclosed. Attack graphs have been employed to measure the security risks caused by zero-day attacks [79-81]. Nevertheless, the metric simply counts the number of unknown vulnerabilities required for compromising an asset, rather than detecting zero-day exploits that actually occurred. Our system takes an approach that is quite different from the above work.

4.8 Limitation and Conclusion

The current system still has some limitations. For example, when some attack activities evade the system calls (difficult, but possible), or the attack time span is much longer than the analyzed time period, the constructed instance graphs may not reflect the complete zero-day attack paths. In such cases, our system can only reveal parts of the paths.

In conclusion, this chapter proposes to use Bayesian networks to identify zero-day attack paths. For this purpose, an object instance graph is built to serve as the basis of the Bayesian network. By incorporating the intrusion evidence and computing the probabilities of objects being infected, the implemented system ZePro can successfully reveal the zero-day attack paths.


Chapter 5 | Conclusion

Achieving cyber situation awareness is the key prerequisite for human decision makers to make right decisions. In the cyber security field, a number of tools, algorithms, and techniques have been developed to monitor and protect enterprise networks. These tools and techniques are able to generate information and alerts that help with the human administrators' analysis, but they operate on different knowledge bases, which are usually isolated from each other. It is very difficult for human administrators to combine the information from different knowledge bases to generate a holistic understanding of the networks' real situation. Therefore, to achieve correct cyber situation awareness, a Situation Knowledge Reference Model (SKRM) is constructed to couple the current techniques and enable security analysts' effective analysis of complex cyber-security problems.

SKRM identifies the situation knowledge from different areas, but it is not just a mapping of knowledge to four abstraction layers. In SKRM, each abstraction layer generates a graph that covers the entire enterprise network and views the same network from a different perspective and at a different granularity. In addition, each abstraction layer leverages currently available algorithms, tools, and techniques in its corresponding area to extract the most critical and useful information to present to human security analysts. SKRM actually integrates data, information, algorithms and tools, and human knowledge into a whole stack. Hence, SKRM serves as an umbrella model that enables solutions to different cyber security problems. In this dissertation, two independent problems are identified in different layers of SKRM: the stealthy bridge problem in the cloud and the zero-day attack path problem.

With the abstraction layers from SKRM, the Bayesian network is employed to incorporate information and dig out the real facts. The Bayesian network has two capabilities. First, it is able to leverage relevant evidence to infer the facts. Second, it is able to reduce the uncertainties faced by human analysts in security analysis. As more evidence is collected, the analysts get closer to the real facts.

The Bayesian network gains more power when combined with the SKRM model. In SKRM, each abstraction layer represents a different perspective, and each layer can serve as complementary support to the other layers. The same attack may therefore cause different intrusion symptoms on different layers. For example, at the workflow layer, the symptom could be abnormal business behavior, such as noticeable financial loss. At the operating system layer, however, the intrusion symptoms could be modified system files, compromised services, etc. When building Bayesian networks based on the SKRM model, the intrusion symptoms from one layer can serve as evidence to another layer, and the layers can confirm each other.

Therefore, this dissertation demonstrates how the two identified security problems can be addressed by constructing proper Bayesian networks on top of different layers of SKRM.

First, the stealthy bridge problem is investigated by combining the operating system layer and the attack graph in SKRM. Chapter 3 identifies the problem of stealthy bridges between isolated enterprise networks in the public cloud. To infer the existence of stealthy bridges, a two-step approach is proposed. A cloud-level attack graph is first built to capture the potential attacks enabled by stealthy bridges. Based on the attack graph, a cross-layer Bayesian network is constructed by identifying the uncertainty types existing in attacks that exploit stealthy bridges. The experiments show that the cross-layer Bayesian network is able to infer the existence of stealthy bridges given supporting evidence from other intrusion steps.

Second, the zero-day attack path problem is identified and addressed at the operating system layer. Chapter 4 introduces the ZePro system, which is able to identify zero-day attack paths at the OS level. It first constructs an object instance graph to capture the intrusion propagation, and then establishes a Bayesian network on top of the instance graph to leverage the evidence collected from security sensors. The Bayesian network computes the infection probabilities of object instances. By connecting the instances with high probabilities, the zero-day attack paths are formed and revealed. Two sets of experiments were conducted to demonstrate the effectiveness and performance of the ZePro system.

To sum up, the Bayesian network is a powerful tool for cyber security analysis. When combined with SKRM, it has even more potential. For complex security problems, SKRM can inspire the problem identification and serve as guidance for solution development, while the Bayesian network is able to incorporate information from the relevant abstraction layers of SKRM to reveal the real facts. This can significantly enhance human analysts' situation awareness of the enterprise networks' security status.


Xiaoyan Sun

RESEARCH INTERESTS
• Enterprise-level Network/Distributed System Security, Cloud Security, Cyber Situational Awareness
• Information Flow Tracking, Vulnerability Analysis, Uncertainty Analysis, Bayesian Networks
• Vehicular Ad hoc Network (VANET), Intelligent Transportation System (ITS)

EDUCATIONAL BACKGROUND

The Pennsylvania State University, August 2011 - May 2016
Ph.D., Information Sciences and Technology

The Pennsylvania State University, August 2010 - August 2011
Ph.D. Student, Civil Engineering

University of Science and Technology of China (USTC), September 2007 - June 2010
Master of Engineering, College of Information Science and Technology

Shandong Normal University, September 2003 - June 2007
Bachelor, Electrical and Information Engineering

PUBLICATIONS

1. Xiaoyan Sun, Anoop Singhal, Peng Liu, “Who Touched My Mission: Towards Probabilistic Mission Impact Assessment”, SafeConfig: Automated Decision Making for Active Cyber Defense (Collocated with ACM CCS 2015), Denver, Colorado, USA, 2015.

2. Xiaoyan Sun, Jun Dai, Anoop Singhal, Peng Liu, “Enterprise-level Cyber Situation Awareness”, In P. Liu, S. Jajodia, and C. Wang (Eds.), Recent Advances in Cyber Situation Awareness, Springer, Dec. 2016, forthcoming. Book Chapter. To Appear.

3. Xiaoyan Sun, Jun Dai, Anoop Singhal, Peng Liu, “Inferring the Stealthy Bridges between Enterprise Network Islands in Cloud Using Cross-Layer Bayesian Networks”, 10th International Conference on Security and Privacy in Communication Networks (SecureComm), Beijing, China, 2014. Springer International Publishing. (Best Paper Award Nomination.)

4. Jun Dai, Xiaoyan Sun, Peng Liu, “Patrol: Revealing Zero-day Attack Paths through Network-wide System Object Dependencies”, 18th European Symposium on Research in Computer Security (ESORICS), RHUL, Egham, U.K., Springer Berlin Heidelberg, 2013. (Acceptance ratio: 17.8%)

5. Xiaoyan Sun, Jun Dai, Peng Liu, “SKRM: Where Techniques Talk to Each Other”, IEEE International Multi-Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), San Diego, USA, 2013. Short Paper.

6. Jun Dai, Xiaoyan Sun, Peng Liu, Nicklaus Giacobe, “Gaining Big Picture Awareness through an Interconnected Cross-layer Situation Knowledge Reference Model”, 2012 ASE International Conference on Cyber Security, Washington DC, USA, 2012. (Acceptance ratio: 9.6%)

7. Xiaoyan Sun, Yuanlu Bao, Wei Lu, Jun Dai, Zhe Wang, “A Study on Performance of Inter-Vehicle Communications in Bidirectional Traffic Streams”, International Conference on Future Networks (ICFN), Sanya, China, 2010.

8. Xiaoyan Sun, Yuanlu Bao, Jun Dai, Wei Lu, Zhe Wang, “Performance Analysis of Inter-vehicle Communications in Multilane Dynamic Traffic Streams”, IEEE Vehicular Networking Conference (VNC), Tokyo, Japan, 2009.

9. Wei Lu, Yuanlu Bao, Xiaoyan Sun, Zhe Wang, “Performance Evaluation of Inter-vehicle Communication in a Unidirectional Dynamic Traffic Flow with Shockwave”, International Workshop on Communication Technologies for Vehicles, Oct 2009.

10. Xiaoyan Sun, Ping Huang, “Class Teaching of Electronic Circuits Based on Multisim”, Modern Electronics Technique, Issue 24, 2006. Journal Paper. In Chinese.

11. Yanjun Liu, Zhen’an Liu, Xiaoyan Sun, “C Programming Language Practice Tutorial”, China Machine Press, ISBN: 7111250532, 9787111250531, 2009. Book. In Chinese.

12. Xiaoyan Sun, “A Study on Performances of Bidirectional and Multilane Inter-vehicle Communication Networks”, Thesis for Master’s Degree, 2010.

13. Xiaoyan Sun, “Tire Pressure Monitoring System”, Thesis for Bachelor’s Degree, 2007.