

    Hu et al. EURASIP Journal on Information Security 2014, 2014:17
    http://jis.eurasipjournals.com/content/2014/1/17

    RESEARCH Open Access

    MUSE: asset risk scoring in enterprise network with mutually reinforced reputation propagation

    Xin Hu1*, Ting Wang1, Marc Ph Stoecklin2, Douglas L Schales1, Jiyong Jang1 and Reiner Sailer1

    Abstract

    Cyber security attacks are becoming ever more frequent and sophisticated. Enterprises often deploy several security protection mechanisms, such as anti-virus software, intrusion detection/prevention systems, and firewalls, to protect their critical assets against emerging threats. Unfortunately, these protection systems are typically 'noisy', e.g., regularly generating thousands of alerts every day. Plagued by false positives and irrelevant events, it is often neither practical nor cost-effective to analyze and respond to every single alert. The main challenges faced by enterprises are to extract important information from the plethora of alerts and to infer potential risks to their critical assets. A better understanding of risks will facilitate effective resource allocation and prioritization of further investigation. In this paper, we present MUSE, a system that analyzes a large number of alerts and derives risk scores by correlating diverse entities in an enterprise network. Instead of considering a risk as an isolated and static property pertaining only to individual users or devices, MUSE exploits a novel mutual reinforcement principle and models the dynamics of risk based on the interdependent relationships among multiple entities. We apply MUSE on real-world network traces and alerts from a large enterprise network consisting of more than 10,000 nodes and 100,000 edges. To scale up to such large graphical models, we formulate the algorithm using a distributed memory abstraction model that allows efficient in-memory parallel computations on large clusters. We implement MUSE on Apache Spark and demonstrate its efficacy in risk assessment and flexibility in incorporating a wide variety of datasets.

    Keywords: Risk scoring; Enterprise network; Reputation propagation

    Introduction

    Mitigating and defending against ever more frequent and sophisticated cyber attacks are often top priorities for enterprises. To this end, a plethora of detection and prevention solutions have been developed and deployed, including anti-virus software, intrusion detection/prevention systems (IDS/IPS), blacklists, firewalls, and so on. With these state-of-the-art technologies capturing various types of security threats, one would expect that they are very effective in detecting and preventing attacks. In reality, however, the effectiveness of these systems often falls short. The increasingly diversified types of cyber attacks, coupled with the increasing

    *Correspondence: [email protected]
    1Security Research Department, IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, USA
    Full list of author information is available at the end of the article

    collection of applications, hardware configurations, and network equipment, have made the enterprise environment extremely 'noisy'. For example, IDS/IPS systems regularly generate over 10,000 alerts every day. The majority of them turn out to be false positives. Even true alerts are often triggered by low-level threats such as brute-force password guessing and SQL injection attempts. Although the suspicious nature of these events warrants the reports by IPS/IDS systems, their excessive volume often only makes the situation noisier.

    Digging into the haystack of alerts to find clues to actual

    threats is a daunting task that is very expensive, if not impossible, through manual inspection. As a result, most enterprises practically only have the resources to investigate a very small fraction of the alerts raised by IPS/IDS systems. The vast majority of the others are stored in a database merely for forensic purposes and inspected only after significant incidents have been discovered or critical assets have

    © 2014 Hu et al.; licensee Springer. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.



    already been severely damaged, e.g., by security breaches and data leakage. However, even the small fraction of true positive alerts (e.g., device compromise, virus infection on a user's computer) is often too voluminous to become security analysts' top priority. In addition, these alerts are often considered low-level threats that pose far less risk to the enterprise compared with more severe attacks, such as server compromise or sensitive data leakage. Evidently, a more effective solution is to better understand and rank the potential risks associated with these alerts so that analysts can effectively prioritize and allocate resources for alert investigation.

    In a typical enterprise environment, as shown in

    Figure 1, there are different sets of entities: servers, devices, users, credentials, and (high-value) assets (e.g., databases, business processes). The connections between these entities represent their intuitive relationships; for example, a user may own multiple devices, a device may connect to different types of servers and internal databases, a device may be associated with multiple user accounts with different credentials, etc. We note that the reputation of these entities provides valuable indicators of their corresponding risks and is an important factor in ranking the various security incidents associated with these entities, e.g., IPS/IDS alerts and behavior anomalies. More importantly, an entity's reputation and the risk it may produce are not restricted to each individual entity. In fact, multiple entities are often tied together in a mutually reinforcing relationship, with their reputations closely interdependent with each other.

    In this work, we develop MUSE (Mutually-reinforced

    Unsupervised Scoring for Enterprise risk), a risk analysis framework that analyzes a large number of security alerts

    Figure 1 Entities in a typical enterprise network.

    and computes the reputation of diverse entities based on various domain knowledge and interactions among these entities. Specifically, MUSE models the interactions with composite, multi-level bipartite graphs, where each pair of entity types (e.g., a user and a device) constitutes one bipartite graph. MUSE then applies an iterative propagation algorithm on the graphical model to exploit the mutual reinforcement relationship between the connected entities and derive their reputation and risk scores simultaneously. Finally, with the refined risk scores, MUSE is able to provide useful information such as a ranking of low-reputation entities and potential risks to critical assets, allowing security analysts to make an informed decision as to how resources can be prioritized for further investigation. MUSE also provides greater visibility into the set of alerts that are responsible for an entity's low reputation, offering insights into the root cause of cyber attacks.

    The main contributions of this work include: 1) a mutual

    reinforcement framework to analyze the reputation and the risk of diverse entities in an enterprise network, 2) a scalable propagation algorithm to exploit the networking structures and identify potentially risky entities that may be overlooked by a discrete risk score, 3) a highly flexible system that can incorporate data sources in multiple domains, 4) an implementation of MUSE that takes advantage of recent advances in distributed in-memory cluster computing frameworks and scales to very large graphical models, and 5) evaluations with real network traces from a large enterprise to verify the efficiency and efficacy of MUSE.

    Risk and reputation in a multi-entity environment

    In the following sections, we will present the design and the architecture of MUSE. We will first formulate the problem of risk and reputation assessment in an enterprise network and then discuss specific domain knowledge and intuition that are crucial for solving the problem.

    Problem formulation

    In a typical enterprise environment, there are multiple sets of connected entities. Specifically, we consider five distinct types of entities, as depicted in Figure 1: users, devices, credentials, high-value assets, and

    external servers. These entities are often related in a pairwise many-to-many fashion. For example, a device can access multiple external servers. A user may own several devices, e.g., laptops and workstations, while one device (e.g., a server cluster) can be used by multiple users.

    We model the interconnection between entities as a

    composite bipartite graph G = (V, E), schematically shown in Figure 2. In G, vertices V = {U, D, C, A, S} represent the various entities. Edges E = {MDS, MDU, MDA, MUC, MDC} of the bipartite graphs represent their


    Figure 2 Network of interactions among multiple entities.

    relationships, where MDS is the |D|-by-|S| matrix containing all the pairwise edges, i.e., MDS(i, j) > 0 if there is an edge between device di and external server sj. The value of MDS(i, j) denotes the edge weight derived from the characteristics of the relationship, such as the number of connections, the number of bytes transmitted, duration, and so on. Similarly, MDU, MDA, MUC, and MDC are the matrices of the pairwise edges representing the associations of their respective entities.

    Next, we define the risk and the reputation of the different

    entities more precisely. We treat each entity as a random variable X with a binary class label X = {xR, xNR} assigned to it. Here, xR is a risky (or bad) label, and xNR is a non-risky (or good) label. A probability distribution P is defined over this binary class, where P(xR) is the probability of being risky and P(xNR) is the probability of being non-risky. By definition, the sum of P(xR) and P(xNR) is 1. We use this probabilistic definition because it encompasses a natural mapping between P(xNR) and the general concept of reputation, i.e., an entity with a high probability of being good (or non-risky) is expected to have a high reputation. In addition, it allows different types of entities to incorporate specific domain knowledge into the reputation computation according to their respective characteristics, e.g.,

    • External server reputation ps = Ps(xNR) indicates the likelihood that a server is benign rather than malicious and infecting the clients connecting to it. Note that a low reputation ps means a high probability of being malicious.

    • Device reputation pd = Pd(xNR) reflects the likelihood that a device is clean; a low pd indicates that the device may have been infected or compromised and thus be under the control of adversaries.

    • User reputation pu = Pu(xNR) reflects how suspiciously a user behaves; a low reputation corresponds to suspicious activities, e.g., unauthorized access to sensitive data.

    • Credential reputation pc = Pc(xNR) relates to the probability that a credential has been leaked to adversaries, which would make any servers associated with the credential vulnerable; a low pc indicates a likely leak.

    • High-value asset reputation pa = Pa(xNR) reflects the asset's probability of being at risk; for instance, a confidential database being accessed by unauthorized users, exfiltration of sensitive data, etc.

    As there is a natural correlation between reputation and risk (e.g., less reputable entities generally pose higher risks), we define an entity's risk as P(xR) = 1 − P(xNR), weighted by the importance of the entity, such that a high-value asset will experience a large increase in its risk score even with a small decline in its reputation. With these definitions, the goal of MUSE is to aggregate large amounts of security alerts, determine the reputation of each entity by exploiting their structural relationships in the connectivity graph, and finally output a ranked list of risky entities for further investigation. In the next section, we describe the mutual reinforcement principle [1] that underlies MUSE.
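To make the importance-weighted risk definition concrete, here is a minimal sketch; the entity names, reputations, and importance weights are hypothetical illustrations, not values from the paper's dataset:

```python
def risk_score(reputation, importance):
    # An entity's risk: P(xR) = 1 - P(xNR), weighted by its importance.
    return importance * (1.0 - reputation)

# Hypothetical entities: (name, reputation P(xNR), importance weight)
entities = [
    ("hr-database",   0.90, 10.0),  # high-value asset, small reputation dip
    ("intern-laptop", 0.60,  1.0),  # low-value device, poor reputation
]
ranked = sorted(entities, key=lambda e: risk_score(e[1], e[2]), reverse=True)
# Despite its much better reputation, the high-value asset ranks first.
```

This illustrates the point in the text: a small decline in a high-value asset's reputation outweighs a large decline in a low-value device's reputation.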

    Mutual reinforcement principle

    The key observation of MUSE is that entities' reputation and risk are not separate; instead, they are closely correlated and interdependent. Through interactions with each other, an entity's reputation can impact the risk associated with its neighbors, and at the same time, the entity's risk can be influenced by the reputation of its neighbors. For example, a device is likely to be of low reputation 1) if the servers it frequently visits are suspicious or malicious, e.g., botnet C&C, phishing, or malware sites, 2) if the users using the device have bad reputations, and 3) if the credentials used to log into the device have a high risk of being compromised, leaked, or even used by an unauthorized user. Similarly, a credential's risk of being exposed will increase if it has been used by a less reputable user and/or on a device that exhibits suspicious behavior patterns. Along the same lines, a user will have a low reputation if she owns several low-reputation devices and credentials. Last but not least, a high-value asset or the sensitive data stored in internal databases are likely to be under significant risk, e.g., of data exfiltration, if they have been accessed by multiple low-reputation devices that also connect to external malicious servers. We describe these mutually dependent relationships more formally in our multi-layer mutual reinforcement framework, using the following set


    of equations governing the server reputation ps, device reputation pd, user reputation pu, credential reputation pc, and high-value asset reputation pa:

        pd ∝ ωds Σ_{d∼s} mds ps + ωdu Σ_{d∼u} mdu pu + ωdc Σ_{d∼c} mdc pc

        pu ∝ ωdu Σ_{d∼u} mdu pd + ωuc Σ_{u∼c} muc pc

        pc ∝ ωuc Σ_{u∼c} muc pu + ωdc Σ_{d∼c} mdc pd

        pa ∝ ωda Σ_{d∼a} mda pd + ωua Σ_{u∼a} mua pu + ωca Σ_{c∼a} mca pc,

    where d∼s, d∼u, etc. denote edges connecting device d with server s, device d with user u, etc., ∝ means 'proportional to', ωij indicates the weights associated with edges and reputation types, and mij is the corresponding value in the connectivity matrices. Next, we exploit this mutual reinforcement principle in the bipartite graph network to simultaneously estimate the reputation and the risk using the propagation algorithm described in the next section.

    Reputation propagation algorithm

    Specifically, we employ the principle of belief propagation (BP) [2] on the large composite bipartite graph G to exploit the link structure and efficiently compute the reputations of all entities. Belief propagation is an iterative message passing algorithm on general graphs and has been widely used to solve many graph inference problems [3], such as social network analysis [1], fraud detection [4], and computer vision [5].

    BP is typically used for computing the marginal distribution (or so-called 'hidden' distribution) of the nodes in the graph, based on the prior knowledge (or 'observed' distribution) about the nodes and about their neighbors. In our case, the algorithm infers the probabilistic distribution of an entity's reputation in the graph based on two sources of information: 1) the prior knowledge about the entity itself and 2) information about the neighboring entities and the relationships between them. The inference is accomplished by iteratively passing messages between all pairs of entities ni and nj. Let mi,j denote the 'message' sent from i to j. The message represents i's influence on j's reputation. One could view it as if i, with a certain probability of being risky, passes some 'risk' to j. Additionally, the prior knowledge about i (e.g., the importance of an asset or a user's anomalous behavior) is expressed by the node potential function φ(i), which plays a role in determining the magnitude of the influence passed from i to j. In detail, edge ei,j is associated with message mi,j (and mj,i if the message passing is bi-directional). The outgoing message from i to neighbor j is updated at each iteration based on the incoming messages from i's other neighbors and the node potential function φ(i) as follows:

        mi,j(xj) ← Σ_{xi∈{xR,xNR}} φi(xi) ψij(xi, xj) Π_{k∈N(i)\j} mk,i(xi)    (1)

    where N(i) is the set of i's neighbors, and ψi,j is the edge potential, a transformation function defined on the edge between i and j that converts a node's incoming messages into its outgoing messages. The edge potential also controls how much influence can be passed to the receiving nodes, depending on the properties of the connections between i and j (e.g., the number of connections and the volume of traffic). ψ(xi, xj) is typically set according to the transition matrix shown in Table 1, which indicates that a low-reputation entity (e.g., a less reputable user) is more likely to be associated with low-reputation neighbors (e.g., compromised devices). The algorithm runs iteratively and stops when the entire network has converged with some threshold T, i.e., the change of any mi,j is smaller than T, or a maximum number of iterations has been reached. Convergence is not theoretically guaranteed for general graphs; however, the algorithm often does converge for real-world graphs in practice. At the end of the propagation procedure, each entity's reputation (i.e., marginal probability distribution) is determined by the converged messages mi,j and the node potential function (i.e., prior distribution):

        p(xi) = k φi(xi) Π_{j∈N(i)} mj,i(xi);  xi ∈ {xR, xNR}    (2)

    where k is the normalization constant.
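To make Equations 1 and 2 concrete, here is a small, self-contained sum-product BP sketch over an undirected graph with the two-state risk model and the symmetric edge potential of Table 1. The entity names, priors, and the ε value are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

# States: index 0 = risky (xR), index 1 = non-risky (xNR).
EPS = 0.1  # the epsilon of the Table 1 edge potential (illustrative value)
PSI = np.array([[0.5 + EPS, 0.5 - EPS],
                [0.5 - EPS, 0.5 + EPS]])  # like states reinforce each other

def belief_propagation(phi, edges, iters=50, tol=1e-6):
    """Sum-product BP: phi maps node -> prior np.array([P(xR), P(xNR)]);
    edges is a list of undirected (i, j) pairs. Returns node beliefs."""
    nbrs = {n: set() for n in phi}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    # One message per directed edge, initialized uniform.
    msg = {(i, j): np.ones(2) for i in phi for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msg:
            # Product of i's prior and messages from i's neighbors other than j
            prod = phi[i].copy()
            for k in nbrs[i]:
                if k != j:
                    prod = prod * msg[(k, i)]
            m = PSI @ prod             # marginalize over x_i (Equation 1)
            new[(i, j)] = m / m.sum()  # normalize for numerical stability
        delta = max(np.abs(new[e] - msg[e]).max() for e in msg)
        msg = new
        if delta < tol:                # converged within threshold T
            break
    # Final beliefs (Equation 2): prior times all incoming messages, normalized
    beliefs = {}
    for n in phi:
        b = phi[n] * np.prod([msg[(k, n)] for k in nbrs[n]], axis=0)
        beliefs[n] = b / b.sum()
    return beliefs

# A device flagged by many alerts drags down the reputation of its neighbors.
priors = {"device": np.array([0.95, 0.05]),   # strong risky prior
          "server": np.array([0.50, 0.50]),   # no prior information
          "user":   np.array([0.50, 0.50])}
beliefs = belief_propagation(priors, [("device", "server"), ("device", "user")])
```

After convergence, the server and user with uninformative priors acquire a risky belief above 0.5 purely through their connection to the risky device, which is the propagation effect the text describes.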

    Incorporating domain knowledge

    One of the major challenges in adopting a BP algorithm is to properly determine its parameters, particularly the node potential and the edge potential functions. In this section, we briefly discuss how we leverage the data sources available in a typical enterprise network and incorporate specific domain knowledge (unique to each entity type) to infer the parameters.

    Characteristics of external servers

    We develop an intrusion detection system that leverages several external blacklists to inspect all the HTTP traffic. It flags the different types of suspicious web servers to which internal devices try to connect, which allows us to

    Table 1 Edge potential function

        ψ(xi, xj)     xi = xNR    xi = xR
        xj = xNR      0.5 + ε     0.5 − ε
        xj = xR       0.5 − ε     0.5 + ε

  • Hu et al. EURASIP Journal on Information Security 2014, 2014:17 Page 5 of 9http://jis.eurasipjournals.com/content/2014/1/17

    assign the node potential function according to the maliciousness of the external servers. Specifically, we classify suspicious servers into the following five types:

    • Spam websites: servers that are flagged by external spam blacklists like Spamhaus, SpamCop, etc.

    • Malware websites: servers that host malicious software, including viruses, spyware, ransomware, and other unwanted programs that may infect the client machine.

    • Phishing websites: servers that purport to be popular sites, such as bank sites, social networks, online payment, or IT administration sites, in order to lure unsuspecting users into disclosing their sensitive information, e.g., user names, passwords, and credit card details. Recently, attackers have started to employ more targeted spear phishing attacks, which use specific information about the target to increase the probability of success. Because of its potential to cause severe damage, we assign a high risky value to this category's node potential.

    • Exploit websites: servers that host exploit toolkits, such as Blackhole and Flashpack, which are designed to exploit vulnerabilities of the victims' web browsers and install malware on victims' machines.

    • Botnet C&C servers: servers that bot programs connect to in order to receive command instructions, update the bot programs, or exfiltrate confidential information. If an internal device makes an attempt to connect to any known botnet C&C server, the device is likely to be compromised. In addition to blacklists (e.g., Zeus Tracker), we also design models to detect fast-fluxing and domain-name-generation botnets based on their distinct DNS request patterns.

    Using this categorization of suspicious servers, we determine the initial node potential values according to the severity of their categories. We assign (φ(xR), φ(xNR)) = (0.95, 0.05) for the high-risk types, such as botnet and exploit servers. For the medium-risk (phishing and malware) and low-risk (spam) types, we assign (0.75, 0.25) and (0.6, 0.4), respectively.
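The tiered assignment above reduces to a simple lookup table. A sketch follows; the category keys are illustrative labels rather than identifiers from the paper's system:

```python
# Node potentials (phi(xR), phi(xNR)) per suspicious-server category,
# mirroring the severity tiers described above.
SERVER_POTENTIAL = {
    "botnet_cc": (0.95, 0.05),  # high risk
    "exploit":   (0.95, 0.05),  # high risk
    "phishing":  (0.75, 0.25),  # medium risk
    "malware":   (0.75, 0.25),  # medium risk
    "spam":      (0.60, 0.40),  # low risk
}

def server_node_potential(category):
    # Servers matching no suspicious category get an uninformative prior.
    return SERVER_POTENTIAL.get(category, (0.5, 0.5))
```

Each pair sums to 1, so the node potential remains a valid distribution over {xR, xNR}.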

    Characteristics of internal entities D, U, C, A

    For internal entities, e.g., devices, users, credentials, and assets, rich information can be obtained from the internal asset management systems and IPS/IDS systems. The available information includes a device's status (e.g., OS version, patch level), device behavior anomalies (e.g., scanning), suspicious user activities (e.g., illegal accesses to sensitive data, multiple failed login attempts), and credential anomalies (e.g., unauthorized accesses). For instance, from the IPS system deployed in our enterprise network, we are able to collect over 500 different alert types, most of which are various attack vectors such as SYN port scans, remote memory corruption attempts, brute-force logons, XSS, SQL injection, etc. Based on this information, we adjust the node potential for the internal entities by assigning a severity score (1 to 3 for low-, medium-, and high-risk alerts) to each type of suspicious activity exemplified above and summing up the severities of all suspicious activities associated with an entity i to get the total severity Si. Since an entity i may be flagged multiple times for the same or different types of suspicious behaviors, to avoid being overshadowed by a few outliers, we transform the aggregated severity score using the sigmoid function

        Pi = 1 / (1 + exp(−Si))

    The node potential for i is then calculated as

    (φ(xR), φ(xNR)) = (Pi, 1 − Pi). The key benefit of using a sigmoid function is that if no alerts have been reported for an entity (i.e., Si = 0), its initial node potential will automatically be set to (0.5, 0.5), i.e., equal probability of being risky and non-risky, implying that no prior information exists for the particular entity.

    Although the parameters in MUSE require some level of

    manual tuning by domain experts, it is valuable to security analysts in several respects. First, the output of MUSE is a ranking of high-risk entities, whose absolute risk values are less important. As long as the parameters are assigned based on a reasonable estimation of the severities of the different types of alerts, MUSE will be able to derive a ranking of entities based on their potential risks, thus providing useful information to help analysts prioritize their investigations. Second, MUSE offers the flexibility to incorporate diverse types of entities and thus can be easily adapted to a wide variety of other domains. Finally, it is possible to automatically learn the appropriate parameter values through machine learning techniques, provided that properly labeled training sets are available. We leave this as our plan for future exploration.
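The severity aggregation for internal entities can be sketched as follows. The alert-type-to-severity mapping here is a hypothetical excerpt (the paper assigns scores of 1 to 3 across over 500 alert types):

```python
import math

# Hypothetical severity scores (1 = low, 2 = medium, 3 = high) for a few alert types
SEVERITY = {"syn_port_scan": 1, "bruteforce_logon": 2,
            "sql_injection": 2, "remote_memory_corruption": 3}

def node_potential(alerts):
    """Return (phi(xR), phi(xNR)) for an entity given its list of alerts."""
    S = sum(SEVERITY[a] for a in alerts)  # total severity S_i
    p_risky = 1.0 / (1.0 + math.exp(-S))  # sigmoid bounds the effect of outliers
    return (p_risky, 1.0 - p_risky)

# With no alerts, S = 0 and the prior is the uninformative (0.5, 0.5).
```

The sigmoid saturates as severities accumulate, so an entity flagged many times cannot push its prior arbitrarily far and overwhelm the propagation step.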

    Scaling up the propagation algorithm to big data

    Another major challenge of applying a BP algorithm in a large enterprise network is scalability. Even though BP itself is a computationally efficient algorithm whose running time scales quadratically with the number of edges in the graph, for large enterprises with hundreds of thousands of nodes and edges, the computation cost can become significant. To make MUSE practical for large-scale graphical models, we observe that the main computation in belief propagation is localized, i.e., message passing is performed only between a specific node and its neighbors. This means that the computation can be efficiently parallelized and distributed to a cluster of machines.

    One of the most prominent parallel programming

    paradigms is MapReduce, popularized by its open-source


    implementation, Apache Hadoop [6]. The MapReduce framework consists of the map and the reduce stages, which are chained together to perform complex operations in a distributed and fault-tolerant fashion. However, MapReduce is notoriously inefficient for iterative algorithms, where intermediate results are reused across multiple rounds of computation. Due to the lack of an abstraction for leveraging distributed memory, the only way to reuse data across two MapReduce jobs is to persist them to an external storage system, e.g., HDFS, and load them back via another map job. This incurs substantial overheads due to disk I/O and data synchronization, which can dominate the execution times. Unfortunately, the BP algorithm underlying MUSE is a typical example of iterative computation, where the same set of operations, i.e., the message update in Equation 1, is repeatedly applied to multiple data items. As a result, instead of MapReduce, we leverage a new cluster computing abstraction called Resilient Distributed Datasets (RDDs) [7] that achieves orders-of-magnitude performance improvements for iterative algorithms over existing parallel computing frameworks.

    RDDs are parallel data structures that are created

    through deterministic operations on data in stable storage or through transformations from other RDDs. Typical transformations include map, filter, join, reduce, etc. The major benefit of RDDs is that they allow users to explicitly specify which intermediate results (in the form of RDDs) they want to reuse in future operations. Keeping those persistent RDDs in memory eliminates unnecessary and expensive disk I/O and data replication across iterations, thus making them ideal for iterative algorithms. To map the BP algorithm onto RDDs, our key observation is that the message update process in Equation 1 can be more efficiently represented by RDDs on a line graph induced from the original graph, which represents the adjacencies between the edges of the original graph. Formally, given a directed graph G = (V, E), a directed line graph or derived graph L(G) is a graph such that:

    • each vertex of L(G) represents an edge of G. We use the following notation to denote vertices and edges in G and L(G): let i, j ∈ V denote two vertices in the original graph G; we use (i, j) to represent the edge in G and the corresponding vertex in L(G);

    • two vertices of L(G) are adjacent if and only if their corresponding edges share a common endpoint ('are incident') in G and they form a directed path of length two. In other words, for two vertices (i, j) and (m, n) in L(G), there is an edge from (i, j) to (m, n) if and only if j = m in the original G.
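A direct (if naive, O(|E|²)) construction of this line graph might look like the following sketch. The optional backtracking exclusion mirrors the N(i)\j term in Equation 1 and is an implementation choice layered on top of the definition above:

```python
def line_graph(edges, exclude_backtrack=True):
    """Build the directed line graph L(G) of a directed graph G.
    Each directed edge (i, j) of G becomes a vertex of L(G); there is an
    edge (i, j) -> (m, n) whenever j == m, i.e., the two edges chain into
    a directed path.  Optionally skip (i, j) -> (j, i), matching the
    N(i)\\j exclusion in the BP message update."""
    verts = sorted(set(edges))
    succ = {v: [] for v in verts}
    for (i, j) in verts:
        for (m, n) in verts:
            if j == m and not (exclude_backtrack and n == i):
                succ[(i, j)].append((m, n))
    return succ
```

For a chain user -> device -> server, the vertex ("user", "device") of L(G) has the single successor ("device", "server"), which is exactly the path a BP message follows.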

    Figure 3 shows the conversion of an original graph G to its directed line graph. Since an edge (i, j) ∈ E in the original graph G corresponds to a node in L(G), the message

    passing process in Equation 1 is essentially an iterative updating process of a node in L(G) based on all of this node's adjacent nodes. In each iteration, each node in L(G) sends a message (or influence) mi,j to all of its neighbors and, at the same time, updates its own message based on the messages it received from its neighbors. This can be easily described in RDDs as follows:

    Algorithm 1 Message passing algorithm using RDDs

    # Load line graph L(G) as an RDD of (srcNode, dstNode) pairs
    links = RDD.textFile(graphFile).map(split).persist()

    # Load the initial node potential functions of the original graph G
    # as (node, phi_i) pairs
    potentials = RDD.textFile(potentialFile).map(split).persist()

    # Initialize the RDD of messages as (node, m_{i,j}) pairs
    messages = ...

    for iteration in xrange(ITERATIONS):
        # Build an RDD of (dstNode, m_src) pairs with the messages
        # sent by all nodes to dstNode
        MsgContrib = links.join(messages).map(
            lambda (srcNode, (dstNode, m_src)): (dstNode, m_src))

        # Multiply all incoming messages, grouped by dstNode
        AggContrib = MsgContrib.reduceByKey(lambda m1, m2: m1 * m2)

        # Compute the updated messages for the next iteration
        messages = potentials.join(AggContrib).map(
            lambda (dstNode, (phi_i, m_agg)): (dstNode, phi_i * psi_ij * m_agg))

    # After the iterations, compute the final beliefs according to Eq. 2
    belief = potentials.join(messages).mapValues(
        lambda (phi_i, m_agg): k * phi_i * m_agg)

    # ... and save to external storage
    belief.saveAsTextFile("Beliefs.txt")
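For readers without a Spark deployment, the same iteration can be sketched with plain Python dictionaries. This is a toy, non-distributed rendering of the loop in Algorithm 1; the function name, the constant edge potential psi, and the toy inputs are our own illustration, not part of MUSE:

```python
def propagate(line_edges, phi, psi, messages, iterations):
    """Toy message-passing loop over the line graph L(G).

    line_edges : edges of L(G), as ((i, j), (m, n)) pairs
    phi        : node potential phi_i for each node i of G (assumed values)
    psi        : edge potential psi_{i,j}, a single constant for brevity
    messages   : dict mapping each L(G) vertex (i, j) to its message m_{i,j}
    """
    for _ in range(iterations):
        # aggregate: multiply all messages arriving at each destination edge
        agg = {}
        for src, dst in line_edges:
            agg[dst] = agg.get(dst, 1.0) * messages[src]
        # update: m_{i,j} <- phi_i * psi_{i,j} * product of incoming messages
        messages = {(i, j): phi[i] * psi * agg.get((i, j), 1.0)
                    for (i, j) in messages}
    return messages

# Tiny chain 1 -> 2 -> 3: L(G) has the single edge (1,2) -> (2,3)
msgs = propagate(line_edges=[((1, 2), (2, 3))],
                 phi={1: 0.5, 2: 0.5},
                 psi=1.0,
                 messages={(1, 2): 1.0, (2, 3): 1.0},
                 iterations=2)
print(msgs)  # {(1, 2): 0.5, (2, 3): 0.25}
```

The dictionary joins and the keyed product mirror the join/reduceByKey steps of the RDD version; what Spark adds is partitioning these key-value collections across a cluster and keeping them pinned in memory between iterations.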

    The above algorithm leads to the RDD lineage graph in Figure 4. On each iteration, the algorithm creates a new messages dataset based on the aggregated contributions AggContrib and the messages from the previous iteration, as well as the static links and potentials datasets that are persisted

    Hu et al. EURASIP Journal on Information Security 2014, 2014:17, Page 7 of 9. http://jis.eurasipjournals.com/content/2014/1/17

    Figure 3 Converting a directed graph G to a directed line graph.

    in memory. Keeping these static datasets and intermediate messages in memory, readily available in subsequent iterations, avoids unnecessary I/O overhead and thus significantly speeds up the computation.

    Evaluation

    Datasets

    We evaluated MUSE with datasets collected from multiple data sources in the entire US North IBM network. The data sources included DNS messages from local DNS servers, network flows from edge routers, proxy logs, IPS alerts, and HTTP session headers (for categorization of websites). The size of the raw data was about 200 GB per day, and the average data event rates are summarized in Table 2.

    Experiment results

    We first evaluated MUSE using data collected over a 1-week period. The resulting graph consists of 11,790 nodes and 44,624 edges. The numbers of the different entity types^b are shown in Table 3.

    Figure 4 Lineage graph for the BP algorithm.

    We applied MUSE to the graph, and our algorithm converged at the 5th iteration. We manually inspected the top-ranked entities (i.e., those with higher P(x_R)) in each entity type and were able to confirm that they were all suspicious or malicious entities, including infected devices, suspicious users, etc. Here, we show one example of user reputation among our findings. We selected the five most risky users based on the output of MUSE. Figure 5 shows their risk values at each iteration. Note that all the users started with a neutral score (0.5, 0.5), meaning that these users had not been flagged for anomalous behaviors. However, due to their interactions with low-reputation neighbors, their associated risks increased. Further investigation showed that user332755 owned five devices which made 56 connections to spam websites, 4 connections to malware websites, and 2 connections to exploit websites during our monitoring period. user332755 inherited low reputation from his neighbors, including the user’s devices, causing his risk to quickly rise to the top.

    We also measured the running time of MUSE at each iteration against different graph sizes in terms of the number of edges. Experiments were performed on a server blade with a 2.00-GHz Intel(R) Xeon(R) CPU and 500 GB of memory (the memory usage of MUSE was less than 1 GB). The experiment results are shown in Figure 6. From the figure, one can see that BP is efficient in handling small- to medium-sized graphical models. Even for 1 week’s worth of traffic data, MUSE is able to finish each iteration in less than 1 min. However, we also notice that the running

    Table 2 Average traffic rate for IBM US North network

    Data type         Data rate
    Firewall logs     950 M/day
    DNS messages      1,350 M/day
    Proxy logs        490 M/day
    IPS/IDS events    4 M/day
    Overall           2.5 billion events/day


    Table 3 Number of different entities for one week of data

    Server                                          Device    User     Assets
    Spam    Malware   Phishing   Exploit   Botnet
    5,500   124       10         26        16      3,823     2,191    100

    time increases quadratically with the number of edges, which can become a bottleneck for handling large graphs, as we will demonstrate in the next section.

    Scalability of MUSE using RDDs

    To compare the scalability of the non-parallelized BP algorithm with that of the distributed version using the RDD abstraction, we collected half a month’s worth of network traffic to stress test MUSE. The resulting graph consists of 24,706 nodes and 123,380 edges. The numbers of the different entities are listed in Table 4. As a baseline benchmark, we ran the non-distributed version of MUSE against this large dataset for five iterations on the same server blade. The overall experiment took 1,641 s to finish, and each iteration on average required 328 s.

    We implemented a distributed version of MUSE on Apache Spark [8], which is the open source implementation of RDDs. We deployed Spark on our blade center with three blade servers. We varied the number of CPU cores available to the Spark framework from 10 to 30 and submitted the same workload. Figure 7 illustrates the comparison results. From the figure, we can see that MUSE is able to leverage RDDs’ in-memory cluster computing framework to achieve a 10× to 15× speedup. For instance, with 30 CPU cores, MUSE is able to complete five iterations in 104 s, with each iteration requiring less than 20 s. Although the algorithm does not scale linearly with the number of cores, due to fixed I/O and communication overhead, the results demonstrate that with

    Figure 5 Risk scores of the top five risky users at the end of each iteration.

    Figure 6 Running time for each iteration with varying sizes of graphs.

    a moderate hardware configuration, MUSE is scalable and practical for large enterprise networks.

    Related work

    With cyber threats rapidly evolving towards large-scale and multi-channel attacks, security has become crucial for organizations of all types and sizes. Many traditional intrusion detection and anomaly detection methods focus on a single entity and apply rule-based approaches [9]; they are often too noisy to be useful in practice [10]. Our work is inspired by prominent research in the social network area that uses link structure to infer knowledge about network properties. Previous work demonstrated that social structure is valuable for finding authoritative nodes [11], inferring individual identities [12], combating web spam [13], and detecting securities fraud [14]. Among various graph mining algorithms, the belief propagation algorithm [2] has been successfully applied in many domains, e.g., detecting fraud [4], accounting irregularities [15], and malicious software [3]. For example, NetProbe [4] applied a BP algorithm to the eBay user graph to identify subgraphs of fraudsters and accomplices.

    Conclusion

    In this paper, we proposed MUSE, a framework to systematically quantify and rank risks in an enterprise network. MUSE aggregates alerts generated by traditional IPS/IDS on multiple data sources and leverages the link structure

    Table 4 Number of different entities for half a month of data

    Server                                          Device    User     Assets
    Spam    Malware   Phishing   Exploit   Botnet
    8,924   171       103        33        22      10,809    4,527    116


    Figure 7 Comparison of running time between the non-distributed MUSE and the distributed version implemented using RDDs.

    among entities to infer their reputation. The key advantage of MUSE is that it derives the risk of each entity not only by considering the entity’s own characteristics but also by incorporating the influence of its neighbors. This allows MUSE to pinpoint a high-risk entity based on its interactions with low-reputation neighbors, even if the entity itself appears benign. By providing risk rankings, MUSE helps security analysts make informed decisions on the allocation of resources and the prioritization of further investigation, so that proper defense mechanisms can be developed at an early stage. We have implemented and tested MUSE on real-world traces collected from a large enterprise network, demonstrating the efficacy and scalability of MUSE.

    Endnotes

    a. 10×-100× speedup as compared to Hadoop [8].
    b. Due to privacy issues, we were not able to include authentication logs to incorporate user credentials in our experiments.

    Competing interests

    The authors declare that they have no competing interests.

    Acknowledgements

    This research was sponsored by the U.S. Army Research Laboratory and the U.K. Ministry of Defense and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Army Research Laboratory, the U.S. Government, the U.K. Ministry of Defense, or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright notation hereon.

    Author details

    1 Security Research Department, IBM T.J. Watson Research Center, 1101 Kitchawan Rd, Yorktown Heights, NY, USA. 2 IBM Zurich Research Lab, Saumerstrasse 4, 8803 Ruschlikon, Switzerland.

    Received: 11 August 2014 Accepted: 28 October 2014

    References

    1. J Bian, Y Liu, D Zhou, E Agichtein, H Zha, Learning to recognize reliable users and content in social media with coupled mutual reinforcement, in Proceedings of WWW ’09 (ACM, New York, 2009)
    2. JS Yedidia, WT Freeman, Y Weiss, Understanding belief propagation and its generalizations, in Exploring Artificial Intelligence in the New Millennium, vol. 8 (Morgan Kaufmann Publishers Inc., San Francisco, 2003), pp. 236–239
    3. DH Chau, C Nachenberg, J Wilhelm, A Wright, C Faloutsos, Polonium: tera-scale graph mining and inference for malware detection, in SIAM International Conference on Data Mining (2011)
    4. S Pandit, DH Chau, S Wang, C Faloutsos, NetProbe: a fast and scalable system for fraud detection in online auction networks, in International Conference on World Wide Web (ACM, New York, 2007)
    5. PF Felzenszwalb, DP Huttenlocher, Efficient belief propagation for early vision. Int. J. Comput. Vis. 70, 41–54 (2006)
    6. Apache Software Foundation, Apache Hadoop. http://hadoop.apache.org/
    7. M Zaharia, M Chowdhury, T Das, A Dave, J Ma, M McCauley, MJ Franklin, S Shenker, I Stoica, Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing, in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (USENIX Association, San Jose, 2012)
    8. Apache Software Foundation, Apache Spark: lightning-fast cluster computing. https://spark.apache.org/
    9. P Garcia-Teodoro, J Diaz-Verdejo, G Maciá-Fernández, E Vázquez, Anomaly-based network intrusion detection: techniques, systems and challenges. Comput. Secur. 28, 18–28 (2009)
    10. J Viega, The Myths of Security (O’Reilly Media, Inc., Sebastopol, 2009)
    11. JM Kleinberg, Authoritative sources in a hyperlinked environment. J. ACM 46(5), 604–632 (1999)
    12. S Hill, F Provost, The myth of the double-blind review? Author identification using only citations. ACM SIGKDD Explorations Newsl. 5(2), 179–184 (2003)
    13. Z Gyöngyi, H Garcia-Molina, J Pedersen, Combating web spam with TrustRank, in International Conference on Very Large Data Bases (VLDB Endowment, Toronto, 2004)
    14. J Neville, O Simsek, D Jensen, J Komoroske, K Palmer, H Goldberg, Using relational knowledge discovery to prevent securities fraud, in ACM Conference on Knowledge Discovery and Data Mining (ACM, New York, 2005)
    15. M McGlohon, S Bay, MG Anderle, DM Steier, C Faloutsos, SNARE: a link analytic system for graph labeling and risk detection, in ACM Conference on Knowledge Discovery and Data Mining (ACM, New York, 2009)

    doi:10.1186/s13635-014-0017-1
    Cite this article as: Hu et al.: MUSE: asset risk scoring in enterprise network with mutually reinforced reputation propagation. EURASIP Journal on Information Security 2014, 2014:17.
