


Dynamic Risk Measurement and Mitigation for Proactive Security Configuration Management

Mohammad Salim Ahmed†, Ehab Al-Shaer‡, Latifur Khan†

†Department of Computer Science, The University of Texas at Dallas

‡School of Computer Science, Telecommunications and Information Systems, DePaul University

[email protected], [email protected], [email protected]

December 7, 2007

Abstract

The factors on which security depends are dynamic in nature. These include the emergence of new vulnerabilities and threats, policy structure, and network traffic. Due to the dynamic nature of these factors, objectively identifying and measuring security metrics is a major challenge. However, such an evaluation can significantly help security professionals in decision making, such as choosing between alternative security architectures, designing security countermeasures, and systematically modifying security configurations to improve security. Moreover, this evaluation must be done dynamically to handle real-time changes in the threat toward the network.

In this paper, we propose a security metric framework that objectively quantifies the most significant security risk factors: existing vulnerabilities, the historical vulnerability trend of remotely accessible services, the prediction of potential vulnerabilities for any general network service along with their estimated severity, and finally the propagation of an attack within the network. We have implemented this framework as a user-friendly tool called Risk based prOactive seCurity cOnfiguration maNAger (ROCONA) and show how this tool simplifies security configuration management using risk measurement and mitigation. We also combine all the components into one single metric and present validation experiments using real-life vulnerability data from the National Vulnerability Database (NVD), showing high accuracy and confidence of the proposed metrics.

1 Introduction

The purpose of a network is to provide services by combining multiple systems. Individually evaluating each of these systems is necessary to measure the overall security of a network. Risk evaluation of individual services can help identify services posing higher risk, thereby allowing extra attention to them. It can also aid in enhancing security by making changes to those services as part of the network security policy or access options. If the risk of a system or a service can be quantified in such a way, then existing aggregation methods can be used to evaluate the security of the entire network.

Our proposed new framework and user-friendly implementation for network security policy evaluation can quantitatively measure the security of a network based on two critical risk aspects: the risk of having a successful attack and the risk of this attack being propagated within the network. As can be seen in Figure 1, we have modeled our framework as a combination of two parts. The first part measures the security level of the services within the network based on vulnerability analysis. The analysis considers the presence of existing vulnerabilities, the dormant risk based on the previous history of vulnerabilities, and potential future risk. In the second part, the exposure of services to the outside network through filtering policies, the degree of penetration or impact of successful attacks against the network, and the risk due to traffic destined for unused address space are measured from a network policy perspective.


Figure 1: Network risk measurement framework

Service risk components, the exposure factor, and the unused address space exposure together give us the threat likelihood, while the attack propagation provides us with the risk impact to the network of interest. When the risk impact is combined with the cost of the damage, we get the contribution of network policies to the total risk. Finally, all these components are combined into a single metric called the Policy Evaluation Metric (PEM) to portray the overall security level.

The effectiveness of a security metric depends on the security measurement techniques and tools that enable network administrators to analyze and evaluate network security. Our proposed tool, based on our framework, can help in comparing security policies with each other to determine which policy is more secure. It is also possible to judge the effect of a change to the policy by comparing the security metrics before and after the change. The comparison can be done in aggregated form (e.g., weighted average) or component-wise, based on user needs. Our proposed framework and its implementation are also an important step toward adaptive security systems, in which networks can be evaluated and automatically hardened accordingly by constantly monitoring changes in the network and service vulnerabilities. The tool, therefore, simplifies periodic risk measurement. We used the Java programming language to implement these metrics in one graphical user interface, ROCONA. Also, using the National Vulnerability Database published by NIST [13], we performed an extensive evaluation of our proposed model. The results show a high level of accuracy in the evaluation.

Our framework is more advanced than existing tools and previous research works, as they only scan or analyze a given network to find allowed traffic patterns and the existence of vulnerabilities in the services; they do not predict the future state of the security of the network, nor do they consider policy immunity or spurious risk. Works that do try to predict future vulnerabilities, like [3], are limited to predicting software bugs and require inside knowledge of the studied software. Our prediction model is general and can work using only publicly available data.

The organization of the paper is as follows. First, we discuss related work in Sect. 7. Second, we discuss the service risk analysis part of our security metric in Sect. 2. Third, the risk associated with network policies is discussed in detail in Sect. 3. Fourth, we present our strategy of combining all the scores in Sect. 4. Fifth, we present our implementation of the framework in Sect. 5. Sixth, we present our experiments, results, and discussion of the results in Sect. 6. Finally, we present our conclusions and some directions for future work in Sect. 8.


2 Network Service Risk Analysis

In this section, we describe and discuss in detail the calculation method of our vulnerability analysis for service risk measurement. This analysis comprises the Existing Vulnerability Measure, the Historical Vulnerability Measure, and the Probabilistic Vulnerability Measure.

2.1 Existing Vulnerability Measure

Existing vulnerabilities are important because services in the network are sometimes left unpatched, or no known patches are available. Also, when a vulnerability is discovered, it takes time before a patch is introduced for it. During that time, the network and services are vulnerable to outside attack. The Existing Vulnerability Measure (EVM) measures this risk. EVM has been studied and formalized in our previous work [1].

We use the exponential average to quantify the worst case scenario, so that the score is always at least as great as the maximum vulnerability value in the data. Let $EV(A)$ be the set of vulnerabilities that currently exist in system $A$, and $SS(v)$ be the severity score of a vulnerability $v$. We divide $EV(A)$ into two sets: $EV_S(A)$ containing the vulnerabilities with existing solutions, and $EV_U(A)$ containing those without existing solutions. Mathematically,

$$EVM(A) = \alpha_1 \ln \sum_{v_i \in EV_S(A)} e^{SS(v_i)} + \alpha_2 \ln \sum_{v_j \in EV_U(A)} e^{SS(v_j)} \qquad (1)$$

Here, the weight factors $\alpha_1$ and $\alpha_2$ model the difference in security risks posed by the two classes of vulnerabilities. Details can be found in [1]. However, finding the risk to fully patched services based on historical trend and future prediction is one of the major challenges to which we contribute in this paper.
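As a concrete illustration, the following is a minimal Python sketch of Eqn. 1. The weight values and severity scores here are illustrative assumptions, not values prescribed by the paper.

```python
import math

def evm(patched_severities, unpatched_severities, alpha1=0.4, alpha2=0.6):
    """Existing Vulnerability Measure (Eqn. 1): exponential averages of the
    severity scores, weighted by whether a solution exists (EV_S vs. EV_U)."""
    def exp_avg(scores):
        # ln of the sum of e^SS; at least as large as the maximum score
        return math.log(sum(math.exp(s) for s in scores)) if scores else 0.0
    return alpha1 * exp_avg(patched_severities) + alpha2 * exp_avg(unpatched_severities)

# Example: two vulnerabilities with available patches, one without.
print(evm([7.5, 5.0], [9.3]))
```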

2.2 Historical and Probabilistic Vulnerability Measure

These two measures try to quantify risk from the frequency and severity trends observable in the NVD database.

2.2.1 Historical Vulnerability Measure

Using the vulnerability history of a service, the Historical Vulnerability Measure (HVM) measures how vulnerability prone a given service has been in the past. Considering both the frequency and recency of the vulnerabilities, we combine the severity scores of past vulnerabilities so that a service with a high frequency of vulnerabilities in the near past has a high HVM.

We divide the set of vulnerabilities $HV(S)$ of service $S$ into three groups: $HV_H(S)$, $HV_M(S)$, and $HV_L(S)$ for vulnerabilities that pose high, medium, and low risks to the system. In evaluating a service, the vulnerabilities discovered a long time ago should carry smaller weight, because with time these are analyzed, understood, and patched. Our analysis of vulnerabilities also found service vulnerability to depend less on past vulnerabilities than on more recent ones, with a nonlinear relationship. Hence, we apply an exponential decay function of the age of the vulnerability. In computing the HVM of individual services, we sum up the decayed scores in each class and take their weighted sum. Finally, we take the natural logarithm to bring the value to a more manageable magnitude. Before taking the logarithm, we add 1 to the sum so that a sum of 0 will not result in $-\infty$, and the result is always positive. The equation for the HVM of service $S$, $HVM(S)$, is as follows:

$$HVM(S) = \ln\left(1 + \sum_{X \in \{H,M,L\}} w_X \sum_{v_i \in HV_X(S)} SS(v_i)\, e^{-\beta\, Age(v_i)}\right) \qquad (2)$$


Here the parameter $\beta$ controls how fast the factor decays with age. If the set of such services in a system $A$ is $SERVICES(A)$, then the aggregated historical vulnerability measure of system $A$, $AHVM(A)$, is calculated as

$$AHVM(A) = \ln \sum_{s_i \in SERVICES(A)} e^{HVM(s_i)} \qquad (3)$$

This equation is designed to be dominated by the highest HVM of the services exposed by the policy. We take the exponential average of all the HVMs so that the score will be at least equal to the highest HVM and will increase with the HVMs of the other services. If the arithmetic average were taken, the risk of the most vulnerable services would be undermined. Our formalization using the exponential average is validated through our conducted experiments.
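The following Python sketch illustrates Eqns. 2 and 3. The class weights, the $\beta$ value, and the sample vulnerabilities are illustrative assumptions ($\beta \approx 0.0126$ per day corresponds to the decay factor falling to 0.1 in about half a year, the best-performing setting in Sect. 6.2).

```python
import math

WEIGHTS = {"H": 1.0, "M": 0.6, "L": 0.3}  # assumed class weights w_X

def hvm(vulns, beta=0.0126):
    """Eqn. 2. vulns: list of (severity_class, severity_score, age_in_days)."""
    total = sum(WEIGHTS[c] * ss * math.exp(-beta * age) for c, ss, age in vulns)
    return math.log(1.0 + total)

def ahvm(services):
    """Eqn. 3: exponential average, dominated by the riskiest service."""
    return math.log(sum(math.exp(hvm(v)) for v in services.values()))

services = {
    "httpd": [("H", 9.0, 30), ("M", 5.0, 400)],
    "sshd":  [("L", 2.1, 700)],
}
print(ahvm(services))
```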

2.2.2 Probabilistic Vulnerability Measure

The Probabilistic Vulnerability Measure (PVM) combines the probability of a vulnerability being discovered in the next period of time with the expected severity of that vulnerability, to give an indication of the risk faced by the network in the future.

Using the vulnerability history of a service, we can calculate the probability of at least one new vulnerability being published in a given period of time. From the vulnerability history, we can also compute the expected severity of the vulnerabilities exposed in the next period of time.

We define a new measure, Expected Risk (ER), for a service as the product of the probability of at least one new vulnerability affecting the service in the next period of time and the expected severity of the vulnerabilities. We can compare two policies using this measure: a higher value indicates a higher chance of becoming vulnerable in the near future.

For each service, we construct the list of interarrival times between the previous vulnerabilities affecting the service. Then we compute the probability that the interarrival time is less than or equal to a given period of time, $T$. Let $P_{s_i}$ be the probability that $d_{s_i}$, the number of days before the next vulnerability of the service $s_i$ is exposed, is less than or equal to a given time interval $T$, i.e.,

$$P_{s_i} = P(d_{s_i} \le T) \qquad (4)$$

To compute the expected severity, we first build the probability distribution of the severities of the vulnerabilities that affected the service in the past. Let $X$ be the random variable corresponding to the severities, and $x_1, x_2, \ldots$ be the values taken by $X$. Then the expectation of $X$ is given by Eqn. 5 [10]:

$$E[X] = \sum_{i=1}^{\infty} x_i P(x_i) \qquad (5)$$

where $P(x_i)$ is the probability that a vulnerability with severity $x_i$ occurs. Finally, we can define the expected risk, ER, of a service $s_i$ as in Eqn. 6:

$$ER(s_i) = P_{s_i} \times E[X_{s_i}] \qquad (6)$$

If $S$ is the set of services exposed by the policy, we can combine the expected risks of all the services exposed by the policy to compute the Predicted Vulnerability Measure of the policy as in Eqn. 7:

$$PVM(S) = \ln \sum_{s_i \in S} e^{ER(s_i)} \qquad (7)$$

To combine the expected risks into one single measure, we use the exponential average method, as for $AHVM$, so that the PVM of a policy is at least as high as the highest expected risk in the policy.
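A minimal Python sketch of Eqns. 5-7 follows, assuming the per-service probability $P_{s_i}$ has already been estimated (Sect. 2.2.2 discusses how) and that the expectation is taken over the empirical severity distribution. All input values are illustrative.

```python
import math

def expected_risk(p_new_vuln, past_severities):
    """Eqn. 6: ER = P(d <= T) * E[X], with E[X] estimated from past severities."""
    e_x = sum(past_severities) / len(past_severities)  # empirical expectation (Eqn. 5)
    return p_new_vuln * e_x

def pvm(expected_risks):
    """Eqn. 7: exponential average, so PVM >= the highest expected risk."""
    return math.log(sum(math.exp(er) for er in expected_risks))

ers = [expected_risk(0.8, [7.5, 9.0]),   # frequently hit service
       expected_risk(0.2, [3.1, 4.0])]   # quieter service
print(pvm(ers))
```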


To calculate $PVM$, the first method we analyzed is the exponential distribution. Interarrival times usually follow an exponential distribution [10, 11]. To evaluate Eqn. 4 for a given service, we can fit the interarrival times to an exponential distribution [7] and find the required probability from the Cumulative Distribution Function (CDF). If $\lambda$ is the rate parameter (the reciprocal of the mean interarrival time) for service $s_i$, then the interarrival times $d_{s_i}$ of service $s_i$ will be distributed exponentially with the following CDF:

$$P_{s_i} = P(d_{s_i} \le T) = F_{d_{s_i}}(T) = 1 - e^{-\lambda T} \qquad (8)$$

The next method we analyzed is the empirical distribution. We can model the distribution of the interarrival times of the data using an empirical distribution, where the frequency distribution is used to construct a Cumulative Distribution Function (CDF).

Let $f_i(x)$ be the number of times the value $x$ occurs in the interarrival time data of service $s_i$. Then the empirical CDF of the interarrival time $d_{s_i}$ will be:

$$P_{s_i} = P(d_{s_i} \le T) = F_{d_{s_i}}(T) = \frac{\sum_{x \le T} f_i(x)}{\sum_x f_i(x)} \qquad (9)$$

Finally, exponential smoothing, a time series analysis technique, was also used to identify the underlying trend of the data and find the probabilities.
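As an illustration, here is a Python sketch of the exponential (Eqn. 8) and empirical (Eqn. 9) estimates of $P_{s_i}$ from interarrival-time data; the sample gaps are made up.

```python
import math
from statistics import mean

def p_exponential(interarrival_days, T):
    """Eqn. 8: exponential CDF with rate = 1 / mean interarrival time."""
    lam = 1.0 / mean(interarrival_days)
    return 1.0 - math.exp(-lam * T)

def p_empirical(interarrival_days, T):
    """Eqn. 9: fraction of observed interarrival times <= T."""
    return sum(1 for d in interarrival_days if d <= T) / len(interarrival_days)

gaps = [12, 45, 30, 7, 90, 21]  # days between published vulnerabilities
print(p_exponential(gaps, T=90), p_empirical(gaps, T=90))
```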

2.2.3 EVM, HVM and PVM based risk mitigation

As can be seen from our definition of EVM, it indicates whether there are existing vulnerabilities present in the system of interest. The vulnerabilities that may be present in the system can be divided into two categories: one category has known solutions, i.e., patches available, and the other does not. To minimize the risk present in the system, ROCONA (1) finds out which vulnerabilities have solutions and alerts the user with the list of patches available for installation; (2) lists services having a risk value higher than the user-defined threshold, with the options to (i) block the service completely, (ii) limit the traffic toward the service by inserting firewall rules, (iii) minimize traffic by inserting new rules in the IDS, (iv) place the service in the DMZ area, or (v) handle it manually; and (3) recalculates EVM to show the score for the updated settings of the system. As for HVM, it gives us the historical profile of a software product, and the PVM score gives us the latent risk toward the system. Naturally, we would like to use software that does not have a long history of vulnerabilities; software with low HVM and PVM scores providing the same service should be preferred. Our implemented tool performs the following steps to mitigate the HVM and PVM risk: (1) calculate the HVM and PVM scores of the services; (2) compare the scores to the user-defined threshold values; (3) if the scores exceed the threshold, strengthen layered protection with options just like in the case of EVM; (4) recalculate the HVM and PVM scores of the services; (5) if the scores still exceed the threshold, show recommendations to the system administrator. Recommendations include (i) isolating the service using virtual LANs, (ii) increasing the weight (i.e., importance) of alerts originating from these services even if false alarms increase, and (iii) replacing the service-providing software with a different one. Administrators can use our tool to measure the EVM, HVM, and PVM scores of similar service-providing software from the NVD database and choose the best possible solution. For all of them, however, a cost-benefit analysis must precede such decision making.
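The mitigation loop for HVM and PVM can be sketched as follows; the score, hardening, and recommendation functions are hypothetical hooks standing in for the ROCONA actions listed above.

```python
# A minimal sketch of the HVM/PVM mitigation workflow described above.
# score_fn, harden_fn, and recommend_fn are hypothetical placeholders.
def mitigate(services, score_fn, threshold, harden_fn, recommend_fn):
    for svc in services:
        if score_fn(svc) > threshold:       # step 2: risk exceeds user threshold
            harden_fn(svc)                  # step 3: firewall/IDS rules, DMZ, ...
            if score_fn(svc) > threshold:   # step 4: recompute after hardening
                recommend_fn(svc)           # step 5: VLAN isolation, alert
                                            # weighting, or software replacement

scores = {"ftpd": 8.2}
mitigate(["ftpd"], lambda s: scores[s], 5.0,
         lambda s: print(f"harden {s}"),
         lambda s: print(f"recommend alternatives for {s}"))
```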

3 Network Policy Risk Analysis

The network policies determine the exposure of the network to the outside world as well as the extensiveness of an attack on the network (i.e., how widespread the attack is). The Exposure Factor (EF) mentioned previously contributes to this risk analysis as well. Aside from EF, the Attack Propagation (AP) and the Exposure of Unused Address Spaces (ENAS) are the other two factors that we have analyzed to quantify risk with respect to network security policies.


3.1 Attack Propagation Metric

The degree to which a policy allows an attack to spread within the network is given by the Attack Propagation (AP). The Attack Propagation measure assesses how difficult it is for an attacker to propagate an attack through a network, using service vulnerabilities as well as security policy vulnerabilities.

3.1.1 The Attack Immunity of a Service

For the purpose of these measures, we define a general network having $N$ hosts and protected by $k$ firewalls. Each node is denoted by $d$, and $D$ is the set of nodes running services. The set of services is $S_d$, and $p_s$ is the vulnerability score for service $s$ in $S_d$; $p_s$ has the range $(0, 1)$.

For our analysis, we define a measure, $I_s$, that assesses the attack immunity of a given service $s$ to vulnerabilities based on that service's EVM and HVM. $I_s$ is calculated directly from the combined vulnerability measure $p_s$ of a service $s$ as:

$$I_s = -\ln(p_s) \qquad (10)$$

$I_s$ has a range of $[0, \infty)$ and will be used to measure the ease with which an attacker can propagate an attack from one host to another using service $s$. Thus, if host $a$ can connect to host $b$ using service $s$ exclusively, then $I_s$ measures the immunity of host $b$ to an attack initiating from host $a$. In the case of connection using multiple services, we define a measure of combined attack immunity. Before giving its definition, we need one more definition: we define $S_{mn}$ as the set of services that host $m$ can use to connect to host $n$. Now, assuming that the combined vulnerability measure is independent across different services, we can calculate a combined vulnerability measure $p_{s_{mn}}$. Finally, using this measure, we can define the combined attack immunity $I_{s_{mn}}$ as:

$$I_{s_{mn}} = -\ln(p_{s_{mn}}) \times PL \qquad (11)$$

Here $PL$ is the protection level. For protection using a firewall, $PL$ is 1; for protection using an IDS, $PL$ is a value between 0 and 1 equal to the false negative rate of the IDS.

3.1.2 The Service Connectivity Graph

To extract a measure of security for a given network, we map the network of interest to a Service Connectivity Graph (SCG). The SCG is a directed graph that represents each host in the network by a vertex and represents each member of $S_{mn}$ by a directed edge from $m$ to $n$ for each pair $m, n \in N$. Each arc between two hosts is labeled with the corresponding attack immunity measure, or the combined attack immunity measure if the arc represents more than one service. Considering that the number of services running on a host cannot exceed the number of ports (and is practically much lower than that), it is straightforward to see that the time complexity of this algorithm is $O(n^2)$. We assume that $S_{mn}$ is calculated off-line before the algorithm is run. A path through the network having a low combined attack immunity value represents a security issue.
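A Python sketch of SCG construction follows. The paper does not spell out how $p_{s_{mn}}$ is computed from the per-service measures, so the independence-based combination below ($p_{mn} = 1 - \prod_s (1 - p_s)$) is one plausible reading, and all data values are illustrative.

```python
import math

def combined_immunity(service_ps, protection_level=1.0):
    """Eqn. 11, under an assumed independence combination of the p_s values."""
    p_mn = 1.0 - math.prod(1.0 - p for p in service_ps)
    return -math.log(p_mn) * protection_level

def build_scg(reachable_services, ps):
    """Directed edges (m, n) labeled with the combined attack immunity."""
    return {(m, n): combined_immunity([ps[s] for s in services])
            for (m, n), services in reachable_services.items()}

ps = {"httpd": 0.3, "sshd": 0.05}        # per-service vulnerability measures
edges = {("a", "b"): ["httpd", "sshd"]}  # host a reaches b via these services
print(build_scg(edges, ps))
```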

3.1.3 Calculating the Attack Propagation

For each node $d$ in $D$, we want to find how difficult it is for an attacker to compromise all the hosts within reach from $d$. To do that, we build a minimum spanning tree for $d$'s segment of the SCG. The weight of this minimum spanning tree represents how vulnerable this segment is to an attack: the more vulnerable the system is, the higher this weight will be. There are several algorithms for finding a minimum spanning tree in a directed graph, like [6], running in $O(n^2)$ time in the worst case. We define the Service Breach Effect ($SBE_d$) to be the weight of the tree rooted at $d$


in $D$; it denotes the damage possible through $d$. We use the following equation to calculate $SBE_d$:

$$SBE_d = \sum_{n \in N} (1 - I_{s_{dn}}) \times Cost_n \qquad (12)$$

Here $Cost_n$ indicates the cost of damage when host $n$ is compromised, and $N$ is the set of hosts present in the spanning tree rooted at host $d$. Finally, the attack propagation metric of the network is:

$$AP(D) = \sum_{d \in D} P(d) \times SBE_d \qquad (13)$$

where $P(d)$ denotes the probability of the existence of a vulnerability in host $d$. If the service $s_d$ present in host $d$ is patched, then this probability can be derived directly from the expected risk $ER(s_d)$ defined previously in Eqn. 6. If the service is unpatched and there are existing vulnerabilities in service $s_d$ of host $d$, then the value of $P(d)$ is 1. The equation is formulated such that it provides us with the expected cost resulting from attack propagation within the network.
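The following sketch evaluates Eqns. 12 and 13 once the spanning trees have been computed (e.g., with a directed-MST algorithm such as [6]); the immunities, costs, and probabilities here are illustrative.

```python
def sbe(tree_hosts, immunity, cost):
    """Eqn. 12: Service Breach Effect of the tree rooted at some host d."""
    return sum((1.0 - immunity[n]) * cost[n] for n in tree_hosts)

def attack_propagation(p_vuln, sbe_of_root):
    """Eqn. 13: AP(D) = sum over service hosts d of P(d) * SBE_d."""
    return sum(p_vuln[d] * s for d, s in sbe_of_root.items())

immunity = {"b": 0.7, "c": 0.4}   # combined attack immunities from the root
cost = {"b": 10.0, "c": 25.0}     # damage cost per compromised host
sbe_a = sbe(["b", "c"], immunity, cost)
print(attack_propagation({"a": 0.8}, {"a": sbe_a}))
```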

3.2 Exposure of Unused Address Spaces (ENAS)

Policies should not allow spurious traffic, i.e., traffic destined to unused IP addresses or port numbers, to flow inside the network, because this spurious traffic has the potential to consume bandwidth and cause DDoS attacks¹. In our previous work [2, 8], we show how to automatically identify the rules in the security policy that allow spurious traffic; once we identify the possible spurious flows, we can readily compute the spurious residual risk for the policy.

Although spurious traffic may not exploit vulnerabilities, it can potentially cause flooding and DDoS attacks and therefore poses a security threat toward the network. To accurately estimate the risk of the spurious traffic, we must consider how much spurious traffic can reach the internal network, the ratio of the spurious traffic to the total capacity, and the average available bandwidth used in each internal subnet. Assuming that the maximum per-flow bandwidth allowed by the firewall policer is $M_{l_i}$ Mbps for link $l_i$ of capacity $C_{l_i}$, and $F_{l_i}$ is the set of all spurious flows passing through link $l_i$, we can estimate the Residual Capacity (RC) for link $l_i$ as follows:

$$RC(l_i) = 1 - \frac{\sum_{f_j \in F_{l_i}} M_{l_i}}{C_{l_i}} \qquad (14)$$

The residual capacity has a range of $[0, 1]$. We can now calculate the Spurious Risk (SPR) of host $d$ and then sum this measure over all the hosts in network $A$ to get the SPR for the entire network:

$$SPR(A) = \sum_{d \in N} \left(c(d) \times \max_{l_i \in L} \,(1 - RC(l_i))\right) \qquad (15)$$

where $c(d)$ is the weight (i.e., importance or cost) associated with host $d$ in the network, with a range of $[0, 1]$, and $L$ is the set of links connected to host $d$. We take the minimum of all the residual capacities associated with host $d$ (i.e., the maximum of $1 - RC$) to reflect the maximum amount of spurious traffic entering the network, thereby measuring the worst case scenario. This is the spurious residual risk after considering the use of per-flow traffic policing as a counter-measure and assuming that each host within the network has a different cost value associated with it.
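A short Python sketch of Eqns. 14 and 15; the link capacities, per-flow caps, and host weights are illustrative.

```python
def residual_capacity(n_spurious_flows, per_flow_cap_mbps, capacity_mbps):
    """Eqn. 14: RC = 1 - (spurious flows * per-flow cap) / link capacity."""
    return 1.0 - (n_spurious_flows * per_flow_cap_mbps) / capacity_mbps

def spurious_risk(hosts):
    """Eqn. 15. hosts: list of (weight c(d), [RC of each link touching d])."""
    return sum(c * max(1.0 - rc for rc in link_rcs) for c, link_rcs in hosts)

rc1 = residual_capacity(20, per_flow_cap_mbps=1.0, capacity_mbps=100.0)
rc2 = residual_capacity(5, 1.0, 100.0)
print(spurious_risk([(0.9, [rc1, rc2]), (0.3, [rc2])]))
```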

3.2.1 AP and SPR based risk mitigation

AP indicates the penetrability of the network. Therefore, if this metric has a high value, the network should be partitioned to minimize communication within the network, and with it attack propagation. The possible risk mitigation measures that can be taken include:

¹We exclude spurious traffic that might be forwarded only to Honeynets or sandboxes for analysis purposes.


Figure 2: ROCONA Risk Gauges
Figure 3: ROCONA Configuration

(1) network redesign (introducing virtual LANs); (2) rearranging the network so that machines having equivalent risk are in the same partition; (3) increasing the number of enforcement points (firewalls, IDSs); (4) increasing the security around the hot spots in the network; and (5) strengthening MAC sensitivity labels, DAC file permission sets, access control lists, roles, or user profiles. In the case of SPR, a high score indicates that the firewall rules and policies require fine tuning. When the score of this measure rises beyond the threshold level, ROCONA adds additional rules to the firewall. The allowed IP addresses and ports need to be checked thoroughly so that no IP address or port is allowed unintentionally.

4 Policy Evaluation Metric

Although the main goal of this research is to devise new factors for measuring network security, we show here that these factors can be used as a vector or a scalar element for evaluating or comparing policies. For a policy $A$, we can combine $EVM(A)$, $AHVM(A)$, $PVM(A)$, $AP(A)$, and $SPR(A)$ into one total vulnerability measure of the policy, $TVM(A)$, as a vector, as in Eqn. 16.

$$TVM(A) = \begin{bmatrix} EVM(A) & AHVM(A) & PVM(A) & AP(A) & SPR(A) \end{bmatrix}^T \qquad (16)$$

We define the Policy Evaluation Metric of the system $A$, $PEM(A)$, as in Eqn. 17.

$$PEM(A) = 10 \begin{bmatrix} e^{-\gamma_1 EVM(A)} & e^{-\gamma_2 AHVM(A)} & e^{-\gamma_3 PVM(A)} & e^{-\gamma_4 (10 - AP(A))} & e^{-\gamma_5 SPR(A)} \end{bmatrix}^T \qquad (17)$$

This will assign the value of $[10\ 10\ 10\ 10\ 10]^T$ to a system with a TVM of $[0\ 0\ 0\ 0\ 0]^T$. The component values assigned by this equation are monotonically decreasing functions of the components of the total vulnerability measure of the system. The parameters $\gamma_1, \gamma_2, \gamma_3, \gamma_4$, and $\gamma_5$ provide control over how fast the components of the Policy Evaluation Metric decrease with the risk factors. The PEM can be converted from a vector value to a scalar value by a suitable transformation, such as taking the norm or using weighted averaging. For comparing two policies, we can compare their PEM vectors component-wise or convert the metrics to scalar values and compare the scalars. Intuitively, one way to combine these factors is to choose the maximum risk factor as the dominating one (e.g., vulnerability risk vs. spurious risk) and then measure the impact of this risk using the policy resistance measure. Although we advocate generating a combined metric, we also believe that this combination framework should be customizable to accommodate user preferences such as benefit sensitivity or mission criticality, which, however, is beyond the scope of this paper.
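A sketch of Eqn. 17 in Python, using the $\gamma$ value from Sect. 6.4 ($\ln 2 / 10 \approx 0.0693$, chosen there so that a TVM component of 10 maps to a PEM component of 5); the AP component uses $(10 - AP)$ exactly as printed in Eqn. 17, and the sample inputs are illustrative.

```python
import math

GAMMA = math.log(2) / 10  # ~0.06931472, the value used in Sect. 6.4

def pem(evm, ahvm, pvm, ap, spr, g=GAMMA):
    """Eqn. 17: each component decays exponentially with its risk factor."""
    return [10 * math.exp(-g * x) for x in (evm, ahvm, pvm, 10 - ap, spr)]

print(pem(evm=2.0, ahvm=5.0, pvm=3.0, ap=6.0, spr=1.0))
```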


Total number of entries: 23542
Total number of valid entries: 23309
Total number of rejected entries: 233
Total number of distinct products: 10375
Total number of versions of the products: 147640
Total number of distinct vendors: 6229
Earliest entry: 10/01/1988
Latest entry: 04/05/2007

Table 1: Summary statistics of the NVD vulnerability database.

5 ROCONA Tool Implementation

To simplify network risk measurement and mitigation, we have implemented a tool called Risk based prOactive seCurity cOnfiguration maNAger (ROCONA). We used the Java programming language for this implementation. The tool can run as a daemon process and therefore provide system administrators with periodic risk updates. The risk scores for each of the five components already described are presented as risk gauges, with each risk shown as a value between 0 and 100.

Users can set up different profiles for each run of the risk measurement. They can also configure different options, such as the parameters for EVM, HVM, and PVM, as well as the topology and vulnerability database files to consider.

All these configuration options are shown in Figure 3. After the tool completes its run, it provides the system administrator with the details of the measurement process, which can be seen in the Details tab. Using all the gauges, ROCONA also provides risk mitigation strategies (i.e., which service needs to be patched, how rigorous firewall policies should be, etc., as already described with the individual measures) in the Risk Mitigation tab. All this information can be stored to compare with a different configuration (i.e., different settings of a profile) at a later time. The tool will be made available for evaluation on the web in the near future, when it becomes a stable release.

6 Experimentation and Evaluation

We conducted extensive experiments to evaluate our metric using real vulnerability data. In contrast to other similar research works that present studies on a few specific systems and products, we experimented using publicly available vulnerability databases. We evaluated and tested our metric, both component-wise and as a whole, on a large number of services and randomly generated policies. In our evaluation process, we divided the data into training sets and test sets. In the following sections, we describe our experiments and present their results.

6.1 Vulnerability Database Used In the Experiments

In our experiments, we used the National Vulnerability Database (NVD) published by the National Institute of Standards and Technology (NIST), available at http://nvd.nist.gov/download.cfm. The NVD provides a rich array of information that makes it the vulnerability database of choice. First of all, all the vulnerabilities are stored using standard CVE (Common Vulnerabilities and Exposures) names (http://cve.mitre.org/). For each vulnerability, the NVD provides the products and versions affected, descriptions, impacts, cross-references, solutions, loss types, vulnerability types, the severity class and score, etc. We used the database snapshot updated on 04/05/2007. We present some summary statistics about this NVD database snapshot in Table 1.



Figure 4: (a) Accuracy of the HVM for different values of β and minimum number of vulnerabilities. (b) Results of ER validation: training data set size vs. accuracy for different test data set sizes. (c) Results of ER validation: test data set size vs. accuracy for different training data set sizes.

For each vulnerability in the database, the NVD provides CVSS scores [19] in the range 1 to 10. The severity score is calculated using the Common Vulnerability Scoring System (CVSS), which provides a base score depending on several factors, such as impact, access complexity, and required authentication level.

6.2 Validation of HVM

We conducted an experiment to evaluate the HVM score according to Eqn. 2, with the hypothesis that if service A has a higher HVM than service B, then in the next period of time, service A will display a higher number of vulnerabilities than B.

In our experiment, we used vulnerability data up to 06/30/2006 to compute the HVM of the services and used the rest of the data to validate the result. We varied β so that the decay function falls to 0.1 in 0.5, 1, 1.5, and 2 years, respectively, and observed the best accuracy in the first case. We first chose services with at least 10 vulnerabilities in their lifetimes and gradually increased this lower limit to 100, observing that the accuracy increases with the lower limit. As expected of a historical measure, better results were found when more history was available for the services, with a maximum observed accuracy of 83.33%. The graph in Fig. 4(a) presents the results of this experiment.
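The β values behind this sweep follow directly from solving $e^{-\beta t} = 0.1$ for each target horizon $t$; a quick check in Python (day counts are approximate):

```python
import math

# Solve e^(-beta * t) = 0.1 for beta, with t expressed in days.
for years in (0.5, 1.0, 1.5, 2.0):
    beta = math.log(10) / (years * 365)
    print(f"decay to 0.1 in {years} yr -> beta = {beta:.5f} per day")
```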

6.3 Validation of Expected Risk (ER)

The experiment for the validation of the Expected Risk, ER, was divided into a number of parts. First, we conducted experiments to evaluate the different ways of calculating the probability in Eqn. 4, comparing the accuracies obtained by the exponential distribution, the empirical distribution, and the time series analysis method. Our general approach was to partition the data into training and test data sets, compute the quantities of interest from the training data sets, and validate the computed quantities using the test data sets. Here, we obtained the most accurate and stable results using the exponential CDF.

The data used in the experiment for the exponential CDF were the interarrival times of the vulnerability exposures for the services in the database. We varied the lengths of the training and test data sets. We only considered those services that had at least 10 distinct vulnerability release dates in the 48-month training period. For the Expected Severity, we used a similar approach. For evaluating the expected risk, we combined the data sets for the probability calculation methods and the data sets of the expected severity.



Figure 5: (a) Training data set size vs. accuracy for the exponential CDF method for probability calculation, test set size = 12 months. (b) Prediction interval parameter (T) vs. accuracy for the exponential CDF method for probability calculation, test set size = 12 months. (c) Training data set size vs. accuracy for the Expected Severity calculation for different values of the test data set size.

In the experiment for the exponential CDF, we constructed an exponential distribution for the interarrival time data and computed Eqn. 4 using the formula in Eqn. 8. For each training set, we varied the value of T and ran validation for each value of T with the test data set. Fig. 5(a) presents the accuracy of the computed probabilities using the exponential CDF method for different training data set sizes and different values of the prediction interval parameter T. In Fig. 5(b), we present the accuracies on the Y axis against the prediction interval parameter T on the X axis. For the test data set size of 12 months, we observed the highest accuracy of 78.62%, with the 95% confidence interval being [72.37, 84.87], for a training data set size of 42 months and a prediction interval of 90 days. We present the results of the experiment for the Expected Severity in Fig. 5(c). The maximum accuracy obtained in this experiment was 98.94% for a training data set size of 3 months and a test data set size of 9 months. The results of the Expected Risk experiment are presented in Figs. 4(b) and 4(c). For the Expected Risk, we observed the best accuracy of 70.77% for a training data set size of 24 months and a test data set size of 12 months, with a 95% confidence interval of [64.40, 77.14].

It can be observed in Fig. 5(b) that the accuracy of the model increases with increasing values of the prediction interval parameter T. This implies that this method is not sensitive to the volume of training data available to it. From Fig. 5(c), it is readily observable that the accuracy of the Expected Severity does not depend on the test data set size. It increases quite sharply with decreasing training data set size for small values of the training data set size. This means that the expectation calculated from the most recent data is actually the best model for the expected severity in the test data.

6.4 Validation of PEM

To validate the PEM, we need to evaluate policies using our proposed metric in Eqn. 17. In the absence of another comparable measure of policy security, we used the following hypothesis: if policy A has a better PEM than policy B based on training period data, then policy A will have fewer vulnerabilities than policy B in the test period. We assume that the EVM component of the policies will be 0, as any existing vulnerability can be removed.

In generating the data set, we chose the reference date separating the training and test periods to be 10/16/2005. We used the training data set for ER using the exponential CDF method. In the experiment, we generated a set of random policies, and for each policy we evaluated Eqn. 17. We chose $\gamma_1 = \gamma_2 = \gamma_3 = \gamma_4 = \gamma_5 = 0.06931472$ so that the PEM for a TVM of $[10\ 10\ 10\ 10\ 10]^T$ is $[5\ 5\ 5\ 5\ 5]^T$. Then, for each pair of policies A and B, we evaluated the hypothesis that if $PEM(A) \ge PEM(B)$, then the number of vulnerabilities for policy A should be less than or equal to the number of vulnerabilities for policy B.



Figure 6: (a) Number of policies vs. accuracy for PEM for different values of the number of services in each policy. (b) Number of services per policy vs. accuracy for PEM for different values of the number of policies.


Figure 7: The average running times for different network sizes and %connectivity

In our experiment, we varied the number of policies from 50 to 250 in steps of 25. In generating the policies, we varied the number of services per policy from 2 to 20 in increments of 2.

We present the results obtained by the experiment in Figs. 6(a) and 6(b). We can observe from the graph in Fig. 6(a) that the accuracy of the model is not at all sensitive to variation in the total number of policies. However, the accuracy does vary with the number of services per policy: it decreases with an increasing number of services per policy. This trend is more clearly illustrated in Fig. 6(b), where the negative correlation is clearly visible.

6.5 Running Time Evaluation Of Attack Propagation Metric

To assess the feasibility of calculating the AP metric under different conditions, we ran a Matlab implementation of the algorithm for different network sizes, as well as different levels of network connectivity percentages (the average percentage of the network directly reachable from a host). The machine used to run the simulation had a 1.9 GHz P-IV processor with 768 MB RAM. The results are shown in Figure 7. For each network size, we generated several random networks with the same %connectivity value. The running time was then calculated for several hosts within each network, and the average running time per host was used in Figure 7. As can be seen from the figure, the quadratic growth of the running time with network size was not noticeably dependent on the network %connectivity in our implementation. The highest running time per host, for a network of 320 nodes, was very reasonable at less than 5 seconds. Thus, the algorithm scales gracefully in practice and is feasible to run for a wide range of network sizes.


7 Related Work

Many organizational standards have evolved to evaluate the security of an organization; the National Security Agency's (NSA) INFOSEC Evaluation Methodology (IEM) can be mentioned as an example [16]. Some professional organizations perform vulnerability analysis, and there are vulnerability assessment tools, including Nessus [21], Retina, and others [22]. These tools try to find vulnerabilities from the configuration of the concerned network. However, all these approaches are usually limited to providing a report telling what should be done to keep the organization secure, without considering the vulnerability history of the deployed services, policy structure, or network traffic.

The area of security policy evaluation and verification has seen significant research. A. Atzeni et al. discuss the importance of security metrics in [5]. Evaluations of VPN and firewall security policies include [2, 8, 9]. The attack graph [4, 15] is another technique that has recently gained interest for assessing the risks associated with network exploits, but the modeling and analysis in this approach is highly complex and costly. Mehta et al. rank the states of an attack graph in [20]. Sahinoglu et al. propose a framework in [17, 18] for calculating existing risk. They consider present vulnerabilities in terms of threat, represented as the probability of exploiting a vulnerability, and also the lack of counter-measures. However, their work does not give any prediction of the future risk associated with the system, nor do they consider the policy resistance of firewalls and IDSs.

The attack surface of a network is another strategy that has been used to measure security. Manadhata et al. in [12] find the attack surface from the attackability of a system. In [14], Pamula et al. propose a security metric based on the weakest adversary (i.e., the least amount of effort required to make an attack successful). In [3], Alhazmi et al. present their results on the prediction of vulnerabilities. They argue that vulnerabilities are essentially defects in released software, and they use past data and track records of the development phase to predict the number of vulnerabilities.

However, all these works do not represent the total picture, as they predominantly try to find existing risk and do not address how risky the system will be in the near future or how policy structure or network traffic would impact security. Their analysis regarding security policies cannot be regarded as complete, and their evaluation also lacks flexibility. In this respect, our technique is simpler, computationally less expensive, and more suitable for our goal.

A preliminary investigation of measuring the existing vulnerability and some historical trends was presented in our previous work, published as a short paper [1]. That work was still limited in analysis and scope, as many of the evaluation measures discussed in this paper were left unexplored.

8 Conclusions

As network security is of utmost importance for an organization, a unified policy evaluation metric can be highly effective in assessing the protection of the current policy and justifying consequent decisions to strengthen security. In this paper, we present a proactive approach to quantitatively evaluating security policies by identifying, formulating, and validating several important factors that greatly affect the security of a network. Our experiments validate our hypothesis that if a service has a highly vulnerability-prone history, then there is a higher probability that the service will become vulnerable again in the near future. These metrics also indicate how the internal firewall policies affect the security of the network as a whole. These metrics are useful not only for administrators to evaluate policy and network changes and take timely and judicious decisions, but also for enabling adaptive security systems based on vulnerability and network changes.

Our experiments provide very promising results regarding our metric. Our vulnerability prediction model proved to be up to 78% accurate, while the accuracy level of our historical vulnerability measurement was 83.33%, based on real-life data from the National Vulnerability Database (NVD).


The accuracies obtained in these experiments vindicate our claims about the components of our metric and about the metric as a whole. Combining all the measures into a single metric and performing experiments based on this single metric also gives us an idea of its effectiveness.

Acknowledgments

The authors would like to thank Muhammad Abedin and Syeda Nessa of The University of Texas at Dallas and Mohamed Taibah of DePaul University for their help with the formalization and experiments that made this work possible.

References

[1] M. Abedin, S. Nessa, E. Al-Shaer, and L. Khan. Vulnerability analysis for evaluating quality of protection of security policies. In 2nd ACM CCS Workshop on Quality of Protection, Alexandria, Virginia, October 2006.

[2] E. Al-Shaer and H. Hamed. Discovery of policy anomalies in distributed firewalls. In Proceedings of IEEE INFOCOM'04, March 2004.

[3] O. H. Alhazmi and Y. K. Malaiya. Prediction capabilities of vulnerability discovery models. In Proc. Reliability and Maintainability Symposium, pages 86-91, January 2006.

[4] P. Ammann, D. Wijesekera, and S. Kaushik. Scalable, graph-based network vulnerability analysis. In CCS '02: Proceedings of the 9th ACM Conference on Computer and Communications Security, pages 217-224, New York, NY, USA, 2002. ACM Press.

[5] A. Atzeni and A. Lioy. Why to adopt a security metric? A little survey. In QoP-2005: Quality of Protection Workshop, September 2005.

[6] F. Bock. An algorithm to construct a minimum directed spanning tree in a directed network. In Developments in Operations Research, pages 29-44. Gordon and Breach, 1971.

[7] A. Gut. An Intermediate Course in Probability. Springer-Verlag New York, Inc., 1995.

[8] H. Hamed, E. Al-Shaer, and W. Marrero. Modeling and verification of IPSec and VPN security policies. In Proceedings of IEEE ICNP'2005, November 2005.

[9] S. Kamara, S. Fahmy, E. Schultz, F. Kerschbaum, and M. Frantzen. Analysis of vulnerabilities in internet firewalls. Computers and Security, 22(3):214-232, April 2003.

[10] A. M. Law and W. D. Kelton. Simulation Modeling and Analysis. McGraw-Hill Book Company, 2000.

[11] S. C. Lee and L. B. Davis. Learning from experience: Operating system vulnerability trends. IT Professional, 5(1), January/February 2003.

[12] P. Manadhata and J. Wing. An attack surface metric. In First Workshop on Security Metrics, Vancouver, BC, August 2006.

[13] National Institute of Standards and Technology (NIST). National Vulnerability Database. http://nvd.nist.gov.

[14] J. Pamula, P. Ammann, S. Jajodia, and V. Swarup. A weakest-adversary security metric for network configuration security analysis. In ACM 2nd Workshop on Quality of Protection 2006, Alexandria, VA, October 2006.

[15] C. Phillips and L. P. Swiler. A graph-based system for network-vulnerability analysis. In NSPW '98: Proceedings of the 1998 Workshop on New Security Paradigms, pages 71-79, New York, NY, USA, 1998. ACM Press.

[16] R. Rogers, E. Fuller, G. Miles, M. Hoagberg, T. Schack, T. Dykstra, and B. Cunningham. Network Security Evaluation Using the NSA IEM. Syngress Publishing, Inc., first edition, August 2005.

[17] M. Sahinoglu. Quantitative risk assessment for dependent vulnerabilities. In Reliability and Maintainability Symposium, June 2005.

[18] M. Sahinoglu. Security meter: A practical decision-tree model to quantify risk. In IEEE Security and Privacy, June 2005.

[19] M. Schiffman. A complete guide to the Common Vulnerability Scoring System (CVSS). http://www.first.org/cvss/cvss-guide.html, June 2005.

[20] V. Mehta, C. Bartzis, H. Zhu, E. Clarke, and J. Wing. Ranking attack graphs. In Recent Advances in Intrusion Detection 2006, Hamburg, Germany, September 2006.

[21] Nessus vulnerability scanner. http://www.nessus.org/, January 2007.

[22] 10 network security assessment tools you can't live without. http://www.windowsitpro.com/Article/ArticleID/47648/47648.html?Ad=1.
