Anomaly Detection in Online Social Networks
Abstract
Anomalies in online social networks can signify irregular, and often illegal, behaviour. Detection of such anomalies has been used to identify malicious individuals, including spammers, sexual predators, and online fraudsters. In this paper we survey existing computational techniques for detecting anomalies in online social networks. We characterise anomalies as being either static or dynamic, and as being labelled or unlabelled, and survey methods for detecting these different types of anomalies. We suggest that the detection of anomalies in online social networks is composed of two sub-processes: the selection and calculation of network features, and the classification of observations from this feature space. In addition, this paper provides an overview of the types of problems that anomaly detection can address and identifies key areas for future research.
Keywords: Anomaly Detection, Link Mining, Link Analysis, Social Network Analysis, Online Social Networks
1. Introduction
Anomalies arise in online social networks as a consequence of particular individuals, or groups of individuals, making sudden changes in their patterns of interaction, or interacting in a manner that markedly differs from their peers. The impacts of this anomalous behaviour can be observed in the resulting network structure. For example, fraudulent individuals in an online auction system may collaborate to boost their reputation. Because these individuals have a heightened level of interaction, they tend to form highly interconnected subregions within the network (Pandit et al., 2007). In order to detect this type of behaviour, the structure of a network can be examined and compared to an assumed or derived model of normal, non-collaborative interaction. Regions of the network whose structure differs from that expected under the normal model can then be classified as anomalies (also known as outliers, exceptions, abnormalities, etc.).
In recent times, the rise of online social networks and the digitisation of many forms of communication have meant that online social networks have become an important part of social network analysis (SNA). This includes research into the detection of anomalies in social networks, for which numerous methods have now been developed. This development has occurred over a wide range of problem domains, with anomaly detection being applied to the detection of important and influential network participants (e.g. Shetty and Adibi, 2005; Malm and Bichler, 2011; Cheng and Dickinson, 2013), clandestine organisational structures (e.g. Shetty and Adibi, 2005; Krebs, 2002; Reid et al., 2005), and fraudulent and predatory activity (e.g. Phua et al., 2010; Fire et al., 2012; Chau et al., 2006; Akoglu et al., 2013; Pandit et al., 2007).
Since anomaly detection is coming to play an increasingly important role in SNA, the purpose of this paper is to survey existing techniques, and to outline the types of challenges that can be addressed. To our knowledge this survey represents the first attempt to examine anomaly detection with a specific focus on social networks. The contributions of this paper are as follows:
• provide an overview of existing challenges in a range of problem domains associated with online social networks that can be addressed using anomaly detection
• provide an overview of existing techniques for anomaly detection, and the manner in which these have been applied to social network analysis
• explore future challenges for online social networks, and the role that anomaly detection can play
• outline key areas where future research can improve the use of anomaly detection techniques in SNA
In drafting this review we did not set out to consider particular problem domains. Rather, we aimed to identify tools specifically designed for the detection of anomalies, regardless of the particular social networks they were designed to analyse. However, as we conducted our survey we found that relevant work was predominantly published in the area of computer science, and consequently, many of the applications of anomaly detection that we encountered were focused on anomalies in online systems. Therefore, unless specifically stated otherwise, the term social network will be used throughout this paper to mean an online social network.
Within the social sciences literature, we found a number of papers focusing on the concept of network change (see for example McCulloh and Carley, 2011; Arney and Arney, 2013; Tambayong, 2014), which attempts to characterise the evolution of social networks. We see anomaly detection as being a subset of change detection: anomaly detection could be used to identify change points where an evolving social network undergoes a rapid change, whereas a network that evolves in a consistent fashion over an extended period of time is unlikely to be deemed anomalous. We have therefore elected to limit the scope of our review to those methods that deal specifically with anomaly detection.

Preprint submitted to Social Networks, May 7, 2014
2. Related Work
Previous reviews of anomaly detection have provided an overview of the general, non-network-based problem, describing the use of various algorithms and the particular types of problems to which these algorithms are most suited (Hodge and Austin, 2004; Chandola et al., 2009; Markou and Singh, 2003a,b). A workshop on the detection of network-based anomalies was also held at ACM 2013 (Akoglu and Faloutsos, 2013). The most recent review of general anomaly detection (Chandola et al., 2009) expands on previous works to define six categories of anomaly detection techniques: classification (supervised learning), clustering, nearest neighbour, statistical, information theoretic, and spectral analysis.
As well as categorising anomaly detection techniques, previous reviews describe a number of challenges for anomaly detection, mainly associated with the problem of defining normal behaviour, particularly in the face of evolving systems, or systems where anomalies result from malicious activities (Chandola et al., 2009; Hodge and Austin, 2004). In particular, Chandola et al. (2009) note that the development of general solutions to anomaly detection remains a significant challenge and that novel methods are often developed to solve specific problems, accommodating the specific requirements of these problems and the specific representation of the underlying systems. As discussed in Section 7, this has also been the case for some methods focused on anomaly detection in social networks.
In addition to the major reviews of anomaly detection described above, other works have considered anomaly detection as part of methodological surveys for particular problem domains. For example, methods for performing anomaly detection have been discussed as part of more general reviews in areas of fraud detection (Bolton and Hand, 2002; Phua et al., 2010), network intrusion (Jyothsna et al., 2011; Gogoi et al., 2011; Patcha and Park, 2007), and the deployment of wireless sensor networks (Zhang et al., 2010; Janakiram et al., 2006). While significant overlap exists between the analysis of computer and sensor networks and social networks, there are also a number of differences that must be taken into account. In particular, social networks are typically composed of many inter-connected communities, which has important consequences for the distribution of node degree, and the transitivity of the network (Newman and Park, 2003). Moreover, anomaly detection in both sensor and computer networks is typically required to occur online in (soft) real-time, and while this constraint may also apply in some SNA scenarios, it is not typically required. In addition, anomaly detection in sensor networks generally requires algorithms that reduce network traffic and have a low computational complexity (Zhang et al., 2010).
3. Problem domains for the application of anomaly detection in social networks
Anomalies in social networks are often representative of illegal and unwanted behaviour. The recent explosion of social media and online social systems means that many social networks have become key targets for malicious individuals attempting to illegally profit from, or otherwise cause harm to, the users of these systems.
Many users of online social systems such as Facebook, Google+, Twitter, etc. are regularly subjected to a barrage of spam and otherwise offensive material (Shrivastava et al., 2008; Fire et al., 2012; Akoglu et al., 2010; Hassanzadeh et al., 2012). Moreover, the relative anonymity and the unsupervised nature of interaction in many online systems provides a means for sexual predators to engage with young, vulnerable individuals (Fire et al., 2012). Since the perpetrators of these behaviours often display patterns of interaction that are quite different from regular users, they can be identified through the application of anomaly detection techniques. For example, sexual predators often interact with a set of individuals who are otherwise unconnected, leading to the formation of star-like structures (Fire et al., 2012). These types of structures can be identified by examining a range of network features (Akoglu et al., 2010; Shrivastava et al., 2008; Hassanzadeh et al., 2012), or through the use of trained classifiers (Fire et al., 2012).
Online retailers and online auctions have also become a key target for malicious individuals. By subverting the reputation systems of online auction systems, fraudsters are able to masquerade as honest users, fooling buyers into paying for expensive goods that are never delivered. This process is facilitated by the use of Sybil attacks (the use of multiple fake accounts) and through collaboration between fraudulent individuals to artificially boost reputation to a point where honest buyers are willing to participate in large transactions (Chau et al., 2006; Pandit et al., 2007). In many online stores, opinion spam, in the form of fake product reviews, is used in an attempt to distort consumers' perceptions of product quality and to influence buyer behaviour (Akoglu et al., 2013). Again, the malicious individuals who engage in these types of behaviour often form anomalous structures within the network, as their patterns of interaction can be quite different from those of regular users.
In addition to the social networks supported by dedicated online systems, mining of the social networks induced by mobile phone communications, financial transactions, etc. can also be used to identify illegal activities. Detection of anomalies in these types of networks has previously been used to identify organised criminal behaviour, including insurance fraud (Subelj et al., 2011), and terrorist activities (Reid et al., 2005; Krebs, 2002). Given the highly detrimental impact of these types of behaviour, anomaly detection in social networks can be seen as an extremely important component in the growing tool-box for performing SNA.
Outside of criminal or malicious behaviour, anomaly detection has also been used to detect important and influential individuals (Shetty and Adibi, 2005), individuals fulfilling particular roles within a community (Welser et al., 2011), levels of community participation (Bird et al., 2008), and unusual patterns in email traffic (Eberle and Holder, 2007).
4. Definitions
Anomalies are typically defined in terms of deviation from some expected behaviour. A recent review of general, non-network-based anomaly detection defined anomalies as “patterns in data that do not conform to a well defined notion of normal behaviour” (Chandola et al., 2009). Another recent review defines anomalies as “an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data” (Barnett and Lewis, 1994, cited in Hodge and Austin, 2004). We note two important aspects of these definitions.
First, the definitions given above highlight the importance of defining expected behaviour, but are somewhat imprecise in terms of exactly how the anomaly deviates from this expectation. This imprecision stems from the fact that the magnitude of any deviation will depend on the specific problem domain (Chandola et al., 2009).

Second, these definitions presuppose an understanding of what exactly should be observed in order to differentiate normal and anomalous behaviour. In order to detect real-life behaviour of interest we must observe the underlying system through a suitable set of features, and determining which features will provide the greatest separation of normal and anomalous behaviour is in itself a key challenge in anomaly detection (Chandola et al., 2009). This can be especially difficult in adversarial problem domains, as the behaviour of anomalous entities may change over time in direct response to the detection methods employed.
For the detection of anomalies in social networks, we are concerned with the pattern of interactions between the individuals in the network. Thus, following Chandola et al. (2009) and Hodge and Austin (2004), we define network anomalies simply as patterns of interaction that significantly differ from the norm. As is the case for general anomaly detection, this definition is highly imprecise, and for a given domain, the specific meaning of ‘pattern of interaction’ and ‘significantly differ’ will reflect the particular behaviour of interest. For example, one analysis of emails between employees of the Enron corporation considered ‘patterns of interaction’ to simply be the number of emails sent by an individual over a given period of time (Priebe et al., 2005). For this particular study, this metric was deemed to be a suitable feature for identifying the real-life anomalous behaviour of interest. However, for a different study, concerned with a different aspect of employee behaviour, a different metric may be considered in order to capture the appropriate ‘patterns of interaction’.
In this paper, we focus on the detection of anomalies resulting from patterns of interaction that differ from normal behaviour within the same social network. However, for completeness, we note here that another form of anomaly exists, termed horizontal anomalies (Gao et al., 2013). Horizontal anomalies occur when the characteristics or behaviours of an entity vary depending on the source of the data (Gao et al., 2013; Papadimitriou et al., 2010). For example, a user of social media may have a similar set of friends or followers across a number of platforms (e.g. Twitter, Facebook, etc.), but on one particular platform (Google+ for example) may have a markedly different set of acquaintances. Methods for detecting this type of anomaly are beyond the scope of this paper. However, given the highly diverse nature of the systems from which social network data is currently being extracted, we believe that the detection of horizontal anomalies will become extremely important in the near future, and suggest that further research in this area will be extremely fruitful.
5. Characterisation of anomalies
In analysing social networks it is the interactions between individuals that form the main object of study. Focus is given to the manner in which interactions between pairs of individuals influence interactions with and between other individuals in the system, and the relationship between these interactions and the attributes of the individuals involved (see Brandes et al. (2013) and Getoor and Diehl (2005) for discussion of network analysis in general). This differentiates anomaly detection in social networks from traditional, non-network-based analysis, where individuals' attributes are assumed to follow some population-level distribution that ignores interactions and the resulting relationships between individuals and their peers.
Typically, social networks are represented using a graph, with vertices (nodes) representing individuals (or groups, companies, etc.) and edges (links, arcs, etc.) between the vertices representing interactions. In some instances, nodes may be partitioned into multiple types, and such networks can be represented using bipartite (or tripartite, etc.) graphs.
Depending on the type of analysis being performed, networks can be characterised as being either static or dynamic, and as being labelled or unlabelled. Obviously, dynamic networks change over time, with changes occurring in the pattern of interactions (who interacts with whom), or in the nature of these interactions (how X interacts with Y). Labelled networks include information regarding various attributes of both the individuals and their interactions.
It could be argued that all social networks are dynamic; however, it is often useful to analyse social networks as if they were static. As an example, consider the Enron email data set (analysed in Priebe et al. (2005); Akoglu et al. (2010); Eberle and Holder (2007, 2009)), which consists of a dynamic network representing the emails sent between Enron employees between 1998 and 2002. This network changes over time as different groups of individuals send and receive emails at different times. If employee A sends five emails to employee B, a dynamic representation of the network would track these five emails as links between A and B occurring only in the appropriate time-steps. If the properties of each email are considered (e.g. length, subject, etc.) then the network is treated as being labelled. By considering a dynamic representation of the network, Priebe et al. (2005) were able to detect anomalous time-steps where a particular individual suddenly increased the number of emails they sent relative to previous time-steps.
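The temporal idea behind this kind of analysis can be sketched as follows. This is a much-simplified caricature (a trailing-window spike test, not the scan statistics actually used by Priebe et al.), with hypothetical data and an arbitrary threshold:

```python
# Sketch: flag time-steps where an individual's email count spikes
# relative to a trailing window. The data and the factor-of-three
# threshold are hypothetical; this is a simplified stand-in for the
# scan statistics of Priebe et al. (2005).

def flag_spikes(counts, window=5, factor=3.0):
    """Return indices t where counts[t] exceeds `factor` times the
    mean of the preceding `window` time-steps."""
    anomalies = []
    for t in range(window, len(counts)):
        baseline = sum(counts[t - window:t]) / window
        if baseline > 0 and counts[t] > factor * baseline:
            anomalies.append(t)
    return anomalies

emails_per_week = [4, 5, 3, 6, 4, 5, 40, 6, 5, 4]  # hypothetical counts
print(flag_spikes(emails_per_week))  # [6]: the 40-email week stands out
```

The key point is that the anomaly is defined relative to the individual's own past behaviour (a dynamic anomaly), not relative to the rest of the network.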
In contrast to a dynamic representation of the Enron network, if consideration is given to the total set of emails sent by each employee over the full time period, the time at which each email is sent is no longer deemed important and the network can be treated as static. In reducing the network from a dynamic to a static representation, multiple interactions occurring at different points in time may be combined in some way, and represented by a single link. This link may be labelled with the number of interactions it represents, and with some aggregation of the properties of these interactions (e.g. mean email length), or it may be that only the fact that an interaction took place is important, and the network can be treated as being unlabelled. For example, a static, unlabelled representation of the Enron email network was used in the analysis by Akoglu et al. (2010), where the formation of an anomalous star-like structure was identified, indicative of a single individual (Ken Lay, a former CEO) sending emails to large numbers of individuals who were not otherwise connected (i.e. they did not send emails to one another). Thus, so long as the information is actually available, a network can be easily represented in a number of different ways, depending on the requirements of the analysis.
Exactly how a social network is chosen to be represented will depend, of course, on the type of anomalies to be detected. As with the networks themselves, the anomalies that occur in social networks can also be characterised as being dynamic or static, and as labelled or unlabelled. A dynamic anomaly occurs with respect to previous network behaviour, while a static anomaly occurs with respect to the remainder of the network. Labelled anomalies relate to both network structure and vertex or edge attributes, while unlabelled anomalies relate only to network structure (see Akoglu and Faloutsos (2013)).
In addition to the characterisation as dynamic or static and labelled or unlabelled, anomalies can be further characterised as occurring with respect to a global or local context, and as occurring across a particular network unit, which we refer to as the minimal anomalous unit. A global or local context simply describes whether an anomaly occurs relative to the entire network, or relative only to its close neighbours. For example, an individual's income may be globally insignificant (many people have a similar income); however, if the incomes of their friends (defined by their links within the network) are all significantly higher, they may be considered a local anomaly (Gao et al., 2010; Ji et al., 2012). The minimal anomalous unit refers to the network structure that we consider to be anomalous. If we are interested in changes made by individuals, a sudden change in the medium of communication for example, we might consider a single vertex to be the minimal anomalous unit. However, if we are interested in larger groups of individuals, perhaps a group of fraudulent individuals collaborating to boost their reputation in an online auction system (Pandit et al., 2007), the minimal anomalous unit of interest scales to that of a sub-network. In this situation, the behaviour of any given vertex may not be anomalous, but taken together as a group, the pattern of communication may be anomalous with respect to other groups of vertices in the network. Throughout this review, we will consider how the different approaches to anomaly detection relate to each of these four characteristics.
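The distinction between global and local context can be illustrated with a small sketch. The graph, the incomes, and the factor-of-three threshold below are all hypothetical, loosely following the income example above:

```python
# Sketch: a vertex is flagged as a *local* anomaly when its attribute
# deviates strongly from those of its neighbours, even if it is globally
# unremarkable. Graph, incomes, and threshold are hypothetical.

def local_anomalies(graph, attr, factor=3.0):
    """Flag vertices whose attribute differs from the mean of their
    neighbours' attributes by more than `factor` in either direction."""
    flagged = []
    for v, nbrs in graph.items():
        if not nbrs:
            continue
        nbr_mean = sum(attr[u] for u in nbrs) / len(nbrs)
        if attr[v] * factor < nbr_mean or nbr_mean * factor < attr[v]:
            flagged.append(v)
    return flagged

# 'a' has a globally ordinary income but much richer friends.
graph = {'a': ['b', 'c', 'd'], 'b': ['a', 'c', 'd'],
         'c': ['a', 'b', 'd'], 'd': ['a', 'b', 'c']}
income = {'a': 50_000, 'b': 300_000, 'c': 310_000, 'd': 290_000}
print(local_anomalies(graph, income))  # ['a']: a local, not global, anomaly
```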
6. Methods for anomaly detection
In Section 5, we defined four characteristics that can be used to categorise anomalies: static or dynamic, labelled or unlabelled, local or global context, and the minimal anomalous unit. In this section we outline various approaches for detecting these different types of anomalies. In practice we found that characterisation as static or dynamic and labelled or unlabelled best differentiates the various approaches, thus the following section is organised according to characterisation along these two axes.

Note that we have considered here only methods that have previously been used to analyse social networks. Network-based anomaly detection has been investigated in a number of additional problem domains including intrusion detection (García-Teodoro et al., 2009; Gogoi et al., 2011; Jyothsna et al., 2011), traffic modelling (Shekhar et al., 2001) and gene regulation (Kim and Gelenbe, 2009; Kim et al., 2012). However, since we are primarily interested in social networks we have not included these studies in our review.
6.1. Static unlabelled anomalies
Static, unlabelled anomalies occur when the behaviour of an individual or group of individuals leads to the formation of unusual network structures. Because the labels on edges and vertices are not considered, any information regarding the type of interaction, its duration, the age of the individuals involved, etc. is ignored. Only the fact that the interaction occurred is significant. Thus, in order to detect anomalous behaviour, assumptions must be made regarding the probability that a given pair of individuals will interact.
As an example, Shrivastava et al. (2008) noted that email spam and viral marketing material are typically sent from a single malicious individual to many targets. Since the targets are selected at random, they are unlikely to be connected independently of the malicious individual, and the resulting network structure will form a star. Based on this observation, the number of triangles in each ego-net¹ is used to label the subject individual as being either malicious or innocent, with a low triangle count indicating a malicious individual. Extensions to this basic approach have also been used to detect groups of malicious individuals acting in collaboration (Shrivastava et al., 2008).
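The triangle-count heuristic can be sketched as follows. The graph representation and the example data are illustrative assumptions, not the actual algorithm of Shrivastava et al.:

```python
# Sketch: count triangles in a vertex's ego-net; a count of zero despite
# many neighbours suggests a star-like, potentially anomalous structure.
# The graph (a dict of adjacency sets) and data are hypothetical.
from itertools import combinations

def ego_triangles(graph, v):
    """Number of edges among v's neighbours, i.e. triangles through v."""
    nbrs = graph[v]
    return sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])

# A spammer 's' mails targets 1..4 who are otherwise unconnected;
# honest users 'u', 'w', 'x' form a clustered region.
graph = {
    's': {1, 2, 3, 4}, 1: {'s'}, 2: {'s'}, 3: {'s'}, 4: {'s'},
    'u': {'w', 'x'}, 'w': {'u', 'x'}, 'x': {'u', 'w'},
}
print(ego_triangles(graph, 's'))  # 0 -> star-like, suspicious
print(ego_triangles(graph, 'u'))  # 1 -> clustered, normal
```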
While Shrivastava et al. (2008) considered star-like structures to be indicative of malicious behaviour, Akoglu et al. (2010) demonstrated that both near-stars and near-cliques may be indicative of anomalous behaviour in social networks. Cliques and stars form the extremal values of a power-law relationship between the number of vertices Nv in an ego-net and the number of edges Ne (Ne ∝ Nv^α); thus, in order to detect anomalies, a power-law curve can be fitted to the network, and the residuals analysed for significant deviance from the expected relationship. This idea has been applied to numerous properties of the ego-net (Akoglu et al., 2010; Hassanzadeh et al., 2012), and in a comparison of various combinations of features, Hassanzadeh et al. (2012) showed that a power-law relationship between edge count and the average betweenness centrality (see Hassanzadeh et al. (2012) for definitions) of an individual's ego-net was most useful in differentiating between normal and anomalous (near-clique or near-star) ego-nets.
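A minimal version of this fit-and-residual idea can be sketched as follows. The ego-net counts and the residual threshold are hypothetical, and the published method scores deviation from the fitted curve more robustly than a raw log-residual:

```python
# Sketch: fit Ne ~ C * Nv^alpha across ego-nets in log-log space and
# flag ego-nets whose edge count deviates strongly from the fit.
# Data points (Nv, Ne) and the threshold are hypothetical.
import math

def fit_power_law(points):
    """Least-squares fit of log(Ne) = log(C) + alpha * log(Nv)."""
    xs = [math.log(nv) for nv, _ in points]
    ys = [math.log(ne) for _, ne in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return alpha, my - alpha * mx  # alpha, log(C)

def residual(point, alpha, logc):
    nv, ne = point
    return math.log(ne) - (logc + alpha * math.log(nv))

pts = [(3, 3), (5, 6), (8, 10), (10, 13), (20, 25), (10, 45)]
alpha, logc = fit_power_law(pts)
flagged = [p for p in pts if abs(residual(p, alpha, logc)) > 0.8]
print(flagged)  # [(10, 45)]: the near-clique ego-net stands out
```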
Another approach for detecting static, unlabelled anomalies is the use of signal processing techniques. The application of signal processing stems from work on graph partitioning and community detection (Miller et al., 2010b; Newman, 2006), and treats anomalous subgraphs as a signal embedded in a larger graph, considered to be background noise. The problem of detecting this anomalous subgraph is formulated in terms of a hypothesis test (Miller et al., 2010b,a, 2011a) with

H0: the graph is noise (no anomalous subgraph exists)
H1: the graph is signal + noise (an anomalous subgraph is embedded in the graph)

The null hypothesis assumes that the network is generated by some stochastic process describing the probability that any pair of nodes will be connected by an edge. Anomalous subgraphs are generated by different underlying processes, resulting in a density of edges that differs from that expected under the null model (Miller et al., 2010b). Clearly, the degree to which a signal processing approach can be successfully applied depends on the identification of suitable null models describing the background graph. The most basic model of random graphs, the Erdős–Rényi model, treats the possibility of an edge between two vertices as an independent Bernoulli trial, so that any given edge exists with probability p. However, graphs generated using this process exhibit a Poisson distribution of vertex degrees, which has been shown to be a poor approximation of many real-world networks. Typically, real social networks exhibit a much fatter tail than the Poisson distribution (Newman et al., 2001; Menges et al., 2008). More sophisticated models, such as the recursive-matrix model (R-MAT), are able to generate more realistic graphs with some degree of clustering and non-Poisson degree distributions (Chakrabarti et al., 2004; Newman et al., 2001; Menges et al., 2008). However, the selection of a suitable model remains a challenge for signal processing approaches, and further research is required in this area.

¹An ego-net is defined as the subgraph induced by consideration of a subject and their immediate neighbours.
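As a minimal illustration of testing a candidate subgraph against an Erdős–Rényi null model, one can compute a binomial tail probability for the number of edges observed among a candidate vertex set. The edge probability and counts below are hypothetical, and this is far simpler than the spectral methods of Miller et al.:

```python
# Sketch: is a candidate subgraph denser than an Erdos-Renyi null model
# (edge probability p) predicts? Under H0, the edge count among n_pairs
# vertex pairs is Binomial(n_pairs, p); a tiny tail probability favours
# H1. The numbers used here are hypothetical.
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Background graph with p = 0.05; candidate subgraph of 10 vertices
# containing 20 of the 45 possible edges.
n_pairs = comb(10, 2)                # 45 possible edges
p_value = binom_tail(20, n_pairs, 0.05)
print(p_value < 1e-6)                # True: far denser than the null predicts
```

The same skeleton applies to richer null models: only the distribution of the edge count under H0 changes.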
6.2. Static labelled anomalies
By considering vertex and edge labels in addition to the network structure, a context can be defined in which typical structures may be considered anomalous. Conversely, the particular combination of edge and vertex labels within the context defined by a given network structure (say an ego-net or community) may also be considered anomalous.

In addition to the detection of cliques and stars using unlabelled properties, Akoglu et al. (2010) also considered edge labels to detect ‘heavy’ ego-nets, where the sum over a particular label is disproportionately high relative to the number of edges. For example, in analysing donations to US presidential candidates, the Democratic National Committee were found to have donated a substantial amount of money, but this money was spread over only a few candidates; thus the ego-net formed is considered ‘heavy’. In contrast, the Kerry Committee received a large number of donations, but each donation was for a small amount. In this example, a bipartite network is considered (with candidates and donors as the two vertex types), and the ego-net provides a context in which the received or donated amounts may be considered anomalous. Outside this structure, a series of small or large donations may be perfectly normal.
Taking a similar view of context, the detection of community-dependent anomalies has been investigated by dividing the network into communities based on individuals' patterns of interactions, and then considering the attributes of the members of each community (Gao et al., 2010; Ji et al., 2012; Gupta et al., 2012; Muller et al., 2013). Attribute values which would be deemed normal across the entire network may appear as anomalies within the more restricted context imposed by the community setting. For example, Gao et al. (2010) describe a situation where the patterns of interaction between a particular group of individuals mark them as a community within a larger network; however, one member of this community has a far lower income than their peers. Globally, this individual's income is completely normal, but within the context provided by the community their income is seen as an anomaly.
In the realm of spam detection, static labelled anomalies have been used to identify opinion spam in online product reviews (Akoglu et al., 2013; Chau et al., 2006; Pandit et al., 2007). These researchers have applied belief propagation to anomaly detection, whereby ‘hidden’ labels assigned to vertices and edges are updated in an iterative manner, conditioned on the labels observed in the system of interest.
As an example of belief propagation, the FraudEagle algorithm (Akoglu et al., 2013) uses a bipartite graph to represent reviews of products in an online retail store, with users forming one set of vertices, and products forming the other. Edges between users and products represent product reviews and are signed according to the overall rating given in the review. The aim of FraudEagle is to assign a hidden label to each node, with users labelled as either honest or fraudulent, and products as good or bad. These labels are assigned based on the observed pattern of reviews. It is assumed that normal, honest reviewers will typically give good products positive reviews and bad products negative reviews, while fraudulent reviewers are assumed to regularly do the opposite. This assumption is encoded as a matrix describing the probability of a user with a hidden label of either honest or fraudulent giving a positive or negative review of a product whose hidden label designates it as either good or bad. By iteratively propagating labels through the network according to these assumed probabilities (i.e. by applying loopy belief propagation), the system is able to determine those users who can be considered as opinion spammers.
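The propagation idea can be illustrated with a toy sketch. Everything below (the miniature review graph, the compatibility probabilities, the seeded prior and the simplified node-belief update) is an illustrative assumption, not the published FraudEagle algorithm or its parameters:

```python
# Toy sketch of iterative label propagation on a signed bipartite review
# graph, in the spirit of FraudEagle. All values here are assumptions.

# reviews[(user, product)] = +1 (positive) or -1 (negative)
reviews = {
    ("u1", "p1"): +1, ("u2", "p1"): +1,  # honest users rate the good product up
    ("u3", "p1"): -1,                    # the fraudster attacks it
    ("u1", "p2"): -1, ("u2", "p2"): -1,  # honest users rate the bad product down
    ("u3", "p2"): +1,                    # the fraudster promotes it
}

# P(review sign | user label, product label): honest users tend to rate
# good products up and bad products down; fraudsters do the opposite.
eps = 0.1
compat = {
    (+1, "honest", "good"): 1 - eps, (-1, "honest", "good"): eps,
    (+1, "honest", "bad"): eps,      (-1, "honest", "bad"): 1 - eps,
    (+1, "fraud", "good"): eps,      (-1, "fraud", "good"): 1 - eps,
    (+1, "fraud", "bad"): 1 - eps,   (-1, "fraud", "bad"): eps,
}

users = sorted({u for u, _ in reviews})
products = sorted({p for _, p in reviews})

# A prior breaks the honest/good vs fraud/bad symmetry: u1 is assumed
# to be a known-honest user.
prior = {u: {"honest": 0.5, "fraud": 0.5} for u in users}
prior["u1"] = {"honest": 0.9, "fraud": 0.1}

user_belief = {u: dict(prior[u]) for u in users}
prod_belief = {p: {"good": 0.5, "bad": 0.5} for p in products}

def normalise(b):
    total = sum(b.values())
    return {k: v / total for k, v in b.items()}

for _ in range(10):  # iterate until the hidden labels stabilise
    for u in users:  # update each user from current product beliefs
        b = dict(prior[u])
        for (uu, p), sign in reviews.items():
            if uu == u:
                for ul in b:
                    b[ul] *= sum(compat[(sign, ul, pl)] * prod_belief[p][pl]
                                 for pl in ("good", "bad"))
        user_belief[u] = normalise(b)
    for p in products:  # update each product from current user beliefs
        b = {"good": 1.0, "bad": 1.0}
        for (u, pp), sign in reviews.items():
            if pp == p:
                for pl in b:
                    b[pl] *= sum(compat[(sign, ul, pl)] * user_belief[u][ul]
                                 for ul in ("honest", "fraud"))
        prod_belief[p] = normalise(b)

spammers = [u for u in users if user_belief[u]["fraud"] > 0.5]
print(spammers)
```

With these assumptions the contrarian reviewer u3 converges to a high fraud belief; the published algorithm performs loopy belief propagation over edge messages rather than this direct node-belief update.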
While not strictly applied to social networks, TrustRank (Gyongyi et al., 2004) could also be considered an example of a belief propagation approach to anomaly detection. In TrustRank, trustworthy pages are assumed to be unlikely to link to spam pages. Given an initially labelled set of trustworthy pages, trust is propagated through page out-links, so that pages within k steps of the initially trusted pages are also labelled as trustworthy. By ranking pages according to the propagated trust score, spam pages, representing anomalies, can be detected.
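A minimal sketch of this propagation, on an assumed toy web graph with a single trusted seed (the graph, decay factor and step count are illustrative assumptions):

```python
# Sketch of the TrustRank idea: trust seeded on known-good pages is
# propagated through out-links with decay; pages receiving no trust are
# candidate spam. The web graph and parameters are assumptions.
outlinks = {
    "good1": ["good2", "news"],
    "good2": ["news"],
    "news": ["blog"],
    "blog": [],
    "spam1": ["spam2"],  # spam pages link only among themselves
    "spam2": ["spam1"],
}
trusted_seeds = {"good1"}
decay = 0.85

trust = {page: 0.0 for page in outlinks}
for seed in trusted_seeds:
    trust[seed] = 1.0

# Propagate for k steps: each page splits its trust among its
# out-links, attenuated by the decay factor; seeds stay fully trusted.
for _ in range(4):  # k = 4 steps
    new_trust = {page: (1.0 if page in trusted_seeds else 0.0)
                 for page in outlinks}
    for page, links in outlinks.items():
        if not links:
            continue
        share = decay * trust[page] / len(links)
        for target in links:
            new_trust[target] += share
    trust = new_trust

# Pages with (near-)zero propagated trust are candidate spam.
suspects = sorted(p for p, t in trust.items() if t < 1e-9)
print(suspects)
```

Because no trusted page links into the spam cluster, the spam pages receive no trust and surface at the bottom of the ranking.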

Another approach to the detection of static labelled anomalies is the application of information theory, which provides a set of measures for quantifying the amount of information described by some event or observation. The most commonly used measure, entropy, describes the number of bits required to encode the information associated with a given event. Thus, given a set of observations O, the global entropy H(O) can be thought of as a measure of the degree of randomness or heterogeneity. A set containing a large number of different observations will require more bits to store the associated information and will therefore have a higher global entropy. In contrast, a totally homogeneous set will require relatively few bits to store the associated information.
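The global entropy H(O) of a set of observations can be computed directly from the empirical frequencies; the two example sets below are illustrative:

```python
import math
from collections import Counter

def entropy(observations):
    """Shannon entropy H(O) in bits: the average number of bits needed
    to encode one observation drawn from this set."""
    counts = Counter(observations)
    n = len(observations)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

homogeneous = ["a"] * 8           # totally homogeneous set
mixed = ["a", "b", "c", "d"] * 2  # heterogeneous set of four symbols

print(entropy(homogeneous))  # 0.0 bits: nothing to encode
print(entropy(mixed))        # 2.0 bits: four equally likely symbols
```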
In terms of anomaly detection, entropy can be used to identify those subgraphs that, if removed from the network, would lead to a significant change in entropy. These subgraphs may represent important communities or individuals (Shetty and Adibi, 2005; Tsugawa et al., 2010; Serin and Balcisoy, 2012), or may represent a normal pattern of behaviour that can then be used as a point of comparison, with those subgraphs that deviate from this norm considered to be anomalous (Noble and Cook, 2003; Eberle and Holder, 2007, 2009). In comparing subgraphs, an information theoretic approach considers a global context, using a very different representation of the network than that used in the other approaches described in this paper. In applying an information theoretic approach, interactions are represented as subgraphs, and values typically represented as labels on edges or vertices are themselves represented as a vertex in this subgraph.
Information theoretic approaches have been shown to be capable of detecting anomalous interactions based on complex structures and flows of information. For example, a version of this method was applied to the Enron email data set (Eberle and Holder, 2009), resulting in detection of a single instance of an email being sent from a director to a non-management employee, and then being forwarded to another non-management employee. Within the Enron data set many emails are sent from directors to non-management employees and between pairs of non-management employees, but this was the only instance involving a forwarded email from a director to a non-management employee. This degree of sensitivity is useful in problem domains consisting of adversarial systems where individuals actively seek to hide illegal or otherwise unwanted behaviour, and deviations from the norm are likely to be small (Eberle and Holder, 2009).
6.3. Dynamic unlabelled anomalies
Dynamic unlabelled anomalies arise when patterns of interaction change over time, such that the structure of a network in one time-step differs markedly from that in previous time-steps. As with static unlabelled anomalies, there are numerous graph properties that we may consider, each of which can be used to generate time-series that can be analysed using traditional tools. Indeed, we could go so far as to use a static analysis (e.g. the parameter of a fitted power-law relationship) as the subject of a time-series and consider how this value changes over time. The difficulty lies, of course, in selecting an appropriate feature, or set of features, that adequately captures the real-world behaviour of interest.
Within the literature we found examples of dynamic unlabelled network analysis using a variety of methods, including scan statistics (Priebe et al., 2005; Neil et al., 2011; Cheng and Dickinson, 2013; Marchette, 2012; Park et al., 2009; McCulloh and Carley, 2011), Bayesian inference (Heard et al., 2010), auto-regressive moving average (ARMA) models (Pincombe, 2005), and link prediction (Huang and Zeng, 2006). The use of these methods has mostly focused on individual nodes, or ego-nets, considering the node degree or the size of the ego-net.
Link prediction attempts to predict future interactions between individuals based on previous interactions. In order to detect anomalies, a link prediction algorithm is run, and for each pair of individuals in the network, the likelihood of interaction occurring in the next time-step is calculated. This predicted likelihood is then compared to the observed interactions, and those observed interactions having a low predicted likelihood are deemed anomalous (Huang and Zeng, 2006).
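A minimal sketch of this scheme, using a simple common-neighbours score as the likelihood proxy (the toy graphs, scoring function and threshold are assumptions, not the predictor used by Huang and Zeng):

```python
# Link-prediction-based anomaly detection: score each pair on the past
# network, then flag observed new interactions with low predicted
# likelihood. Graphs and threshold are illustrative assumptions.
past_edges = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")}
new_edges = {("a", "d"), ("a", "e")}  # interactions observed at time t+1

def neighbours(node, edges):
    return ({v for u, v in edges if u == node}
            | {u for u, v in edges if v == node})

def common_neighbour_score(u, v, edges):
    # Likelihood proxy: number of shared neighbours in the past network.
    return len(neighbours(u, edges) & neighbours(v, edges))

threshold = 1  # scores below this are deemed anomalous
anomalous = sorted(e for e in new_edges
                   if common_neighbour_score(*e, past_edges) < threshold)
print(anomalous)
```

The pair (a, d) shares two past neighbours and is unsurprising; the interaction with the previously unseen node e has no support in the past network and is flagged.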
Using scan statistics, analysis is performed by averaging a time-series over a moving window [t − τ, t] and comparing this average to the current time-step. If the difference is larger than some predefined threshold, the time-step is considered to be anomalous (Priebe et al., 2005; Neil et al., 2011; Cheng and Dickinson, 2013; Marchette, 2012; Park et al., 2009). The use of ARMA models involves fitting the model to a time series and considering the residuals for this fit (see Pincombe (2005)). Those residuals above a specified cutoff are deemed to represent anomalous time-steps.
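The moving-window scan statistic can be sketched in a few lines; the series (standing in for some graph feature, e.g. a node's degree per time-step), window length and threshold below are illustrative assumptions:

```python
# Scan-statistic sketch: average a feature time-series over a moving
# window of length tau and flag time-steps whose value deviates from
# that window average by more than a threshold.
series = [10, 11, 9, 10, 12, 11, 10, 25, 11, 10]  # spike at t = 7
tau, threshold = 4, 5.0

anomalous_steps = []
for t in range(tau, len(series)):
    window_mean = sum(series[t - tau:t]) / tau
    if abs(series[t] - window_mean) > threshold:
        anomalous_steps.append(t)
print(anomalous_steps)
```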
In Bayesian analysis, the generated time series is assumed to follow some distribution (i.e. each time-step represents a draw from the assumed distribution). Given an initial estimate of the distribution parameters, a series of updates can be performed in an iterative manner. For each time step, the prior distribution is updated by considering the value at that time step and calculating a posterior distribution that incorporates the information from all of the previous time steps. Using this posterior distribution, a p-value is calculated, giving the probability that the value at the next time step was drawn from this distribution. If this p-value is below a significance threshold, the value can be considered anomalous. The posterior distribution is then treated as the prior for the next update (Heard et al., 2010).
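A sketch of this update cycle, using a conjugate Normal model with known observation variance for simplicity (the distributional choice, the series and the significance threshold are assumptions, not those of Heard et al.):

```python
# Bayesian sequential anomaly detection sketch: each value is scored by
# a two-sided p-value under the current posterior predictive, then the
# posterior for the mean is updated and becomes the next prior.
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

obs_var = 1.0            # assumed known observation variance
mu, var = 0.0, 10.0      # vague Normal prior on the mean
series = [0.2, -0.3, 0.1, 0.4, 6.0, 0.0]  # jump at t = 4
alpha = 0.01             # significance threshold

anomalies = []
for t, x in enumerate(series):
    # p-value of x under the posterior predictive N(mu, var + obs_var)
    sd = math.sqrt(var + obs_var)
    p = 2.0 * min(normal_cdf(x, mu, sd), 1.0 - normal_cdf(x, mu, sd))
    if p < alpha:
        anomalies.append(t)
    # Conjugate update: the posterior becomes the prior for the next step.
    post_var = 1.0 / (1.0 / var + 1.0 / obs_var)
    mu = post_var * (mu / var + x / obs_var)
    var = post_var

print(anomalies)
```

After a few unremarkable observations the posterior tightens around zero, so the jump at t = 4 receives a tiny p-value and is flagged.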

The use of scan statistics is the most prominent form of analysis we found in the literature, and extensions to the basic form have been suggested that allow correlated changes in graph properties to be detected (Cheng and Dickinson, 2013). This enables inter-related vertices or subgraphs to be identified, and may be of benefit in detecting clandestine organisations. Another extension is the application to hypergraphs (Park et al., 2009), which are a generalisation of graphs allowing hyperedges that can connect more than two vertices. Such graphs can be used to represent interactions that involve more than two individuals, such as meetings, group emails and text messages, and co-authorship on research papers (Park et al., 2009; Silva and Willett, 2008).
Other researchers have considered larger structures such as maximal cliques (Chen et al., 2012) and coronets2 (Ji et al., 2013), with a view to detecting anomalous evolution of a system at a neighbourhood scale. If we consider only the pattern of interactions between individuals, a maximal clique can evolve in only six different ways: by growing, shrinking, merging, splitting, appearing or vanishing (Chen et al., 2012). Similarly, a coronet can evolve only through the insertion or deletion of a node, or the insertion or deletion of a number of edges. If normal behaviour is assumed to result in stable, non-evolving neighbourhoods, then any neighbourhood that undergoes one of these transformations can be considered to be anomalous. For less stable systems, this idea could be extended to include consideration of the number of neighbourhoods that change over a time-step, or the magnitude of this change. In such scenarios, scan statistics, Bayesian inference or any other form of time-series analysis may be employed in order to detect anomalous time-steps.
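The stable-neighbourhood assumption can be sketched by comparing each node's neighbourhood across two snapshots and flagging any that change (the snapshots are illustrative; enumerating maximal-clique merges and splits, as in Chen et al. (2012), is more involved and is omitted here):

```python
# Neighbourhood-stability sketch: under the assumption that normal
# behaviour yields stable neighbourhoods, any node whose neighbourhood
# grows, shrinks, appears or vanishes between snapshots is flagged.
def neighbourhoods(edges):
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

t0 = neighbourhoods([("a", "b"), ("b", "c"), ("a", "c"), ("d", "e")])
t1 = neighbourhoods([("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"),
                     ("d", "f")])  # d gains a neighbour; f appears

changed = sorted(n for n in set(t0) | set(t1)
                 if t0.get(n, set()) != t1.get(n, set()))
print(changed)
```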

In addition to the approaches discussed above, there is a large body of work that focuses on anomaly detection in time-series in general (see Cheboli (2010); Chandola (2009) for recent reviews). We believe that these existing methods could also be used to detect dynamic anomalies in social networks, assuming that suitable features can be identified for generating the required time-series. Moreover, substantial work has been undertaken in the field of change detection in social networks (e.g. McCulloh and Carley (2011); Arney and Arney (2013); Tambayong (2014)), and we believe that there is considerable room for cross-over between these two fields.
6.4. Dynamic labelled anomalies
In conducting our survey, we found dynamic labelled anomalies to be the least represented in the literature. A recent paper describes the application of a signal processing approach to anomaly detection in dynamic, labelled networks (Miller et al., 2013); however, this was the only example we found.
Extending the basic signal processing approach used for static, unlabelled anomalies (Miller et al., 2010b,a), detection of dynamic labelled anomalies assumes that the probability of an edge occurring between any two nodes is a function of a linear combination of node attributes (Miller et al., 2013). Coefficients for this linear
2 A coronet is a concept of neighbourhood, where the 'closeness' to a subject is defined in terms of the number of interactions that occur between individuals, rather than simple connectedness. See (Ji et al., 2013) for a formal definition. Note that the number of interactions between two individuals is often expressed as a weight, which could be considered a label on the edge. However, since multiple interactions can also be represented by multiple edges, we do not consider this to be a true label.
combination are fitted to the network of interest. The dynamic nature of the network is handled by considering the network structure in discrete time-steps, and treating each time-step as for a static network. Comparing the expected graph to the observed graph in each time-step gives a residual matrix that can be integrated over a specified time period using a filter (see Miller et al. (2013, 2011b) for details). Note that this method assumes the number of vertices remains constant over the specified time-period, and that only the edges change.
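The residual idea can be sketched as follows; the node attributes, the fitted coefficient, the logistic link and the toy snapshots are all illustrative assumptions rather than the formulation of Miller et al.:

```python
# Residual-matrix sketch: expected edge probabilities come from a
# linear combination of node attributes; per-time-step residuals
# (observed minus expected) are accumulated over a window, and the
# pair with the largest accumulated residual is the most anomalous.
import math

attrs = {0: 1.0, 1: 0.5, 2: 0.0, 3: 0.2}  # one attribute per node
coef = 1.5                                 # fitted coefficient (assumed)

def edge_prob(u, v):
    # Logistic link on the linear combination of the two nodes' attributes.
    z = coef * (attrs[u] + attrs[v]) - 2.0
    return 1.0 / (1.0 + math.exp(-z))

# Observed edges per time-step (vertex set fixed, only edges change).
snapshots = [
    {(0, 1)},
    {(0, 1), (2, 3)},  # (2, 3) is unlikely under the attribute model
    {(0, 1), (2, 3)},
]

pairs = [(u, v) for u in attrs for v in attrs if u < v]
residual = {pair: 0.0 for pair in pairs}
for snap in snapshots:  # integrate residuals over the window
    for pair in pairs:
        observed = 1.0 if pair in snap else 0.0
        residual[pair] += observed - edge_prob(*pair)

most_anomalous = max(residual, key=residual.get)
print(most_anomalous)
```

The recurring edge between the two low-attribute nodes accumulates the largest positive residual, because the attribute model assigns it a low expected probability.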

In addition to the signal processing approach described above, we suggest that those methods used for detecting dynamic unlabelled anomalies could be adapted for the detection of dynamic labelled anomalies. We argue that this type of development will be extremely beneficial, as including the additional information provided by node and edge labels is likely to significantly improve anomaly detection, providing additional dimensions for separating normal and anomalous behaviour. For example, a recent study demonstrated that link prediction can be significantly improved by including topic information from users' tweets (Rowe et al., 2012). If the algorithm used to predict links is improved, any anomaly detection using this algorithm will also be improved.
As with dynamic unlabelled anomalies, we also suggest that existing methods for time-series analysis can be applied to the detection of dynamic labelled anomalies. A number of methods for detecting anomalies within discrete sequences are discussed in (Chandola et al., 2012), and we believe that many of these methods could be applied to the detection of dynamic labelled anomalies in social networks.
7. Discussion
In this paper we have surveyed a small but growing number of solutions to detecting anomalies in online social networks. While these approaches differ in their treatment of the network, considering dynamic or static representations and including or ignoring node attributes, they are all based primarily on the analysis of interactions between individuals. It is this basis that differentiates the detection of anomalies in social networks from other forms of anomaly detection.
Note that while anomalies and networks can be characterised as static or dynamic, labelled or unlabelled, etc., the various approaches to anomaly detection can, of course, also be grouped by the types of algorithms employed and by the problem domain they were originally developed to address. In conducting our survey, we found it quite interesting that the partitioning of detection methods by algorithm type very nearly matches the partition induced by the authorship of related literature. During our review, we found very little crossover between the different research groups exploring this area, with each group tending to focus on a single or limited number of approaches. Certainly, we did not find clear lines of development, where a particular approach has evolved through incremental improvements over a large number of iterations. This reflects the young nature of this field, and we anticipate that this state of affairs will change significantly over the next decade.

In considering how the various approaches presented in the literature apply to different types of network anomalies, we found that the methods used to actually detect anomalies, as opposed to calculating network features, are largely independent of the type of anomaly being detected. Once a suitable feature space has been identified and calculated, it is often the case that any number of traditional anomaly detection techniques could be applied. For example, in Section 6.1 (static unlabelled anomalies), we discuss the use of regression to detect anomalous ego-nets displaying an unusual edge count relative to node count (Akoglu et al., 2010). However, rather than applying regression, once the edge-count versus node-count feature space is extracted, an alternative anomaly detection method, such as a nearest neighbour method, could just as easily have been applied. Therefore, we argue that the process of detecting anomalies in social networks is composed of two quite separable sub-processes; namely, the calculation of a suitable feature space, and the detection of anomalies within this space. Moreover, we argue that the selection of a suitable feature space can be further broken down so that, taking a high level view of the problem, the vast majority of the approaches we reviewed can be generalised into the following five steps.
1. Determine the smallest unit affected by the behaviour of interest (node, neighbourhood, community, etc.).
2. Identify the particular properties of this unit that are expected to deviate from the norm.
3. Identify the context in which these deviations are expected to occur.
4. Calculate the properties of interest, extracting a feature space in which the distances between observations can be measured and compared in a meaningful way.
5. Within this space, calculate distances between observations, possibly applying traditional (non-network) tools for anomaly detection (e.g. clustering, nearest neighbour analysis, etc.).
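The five steps can be sketched as a generic pipeline: extract a per-unit feature space, then apply a traditional detector within it. Here the unit is the ego-net, the features are its node and edge counts (echoing the Oddball example of Section 6.1), and the detector is nearest-neighbour distance; the toy graph is an illustrative assumption:

```python
import math

# Ring of six individuals plus a hub "h" interacting with everyone.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"),
         ("f", "a"), ("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"),
         ("h", "e"), ("h", "f")]

nodes = sorted({n for pair in edges for n in pair})
adj = {n: set() for n in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# Steps 1-4: the unit is a node's ego-net, the properties are its node
# and edge counts, and the feature space is (node count, edge count).
features = {}
for n in nodes:
    ego = adj[n] | {n}
    features[n] = (len(ego),
                   sum(1 for u, v in edges if u in ego and v in ego))

# Step 5: measure distances in the feature space; the anomaly score is
# simply the distance to the nearest other observation.
def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

nn_dist = {n: min(dist(features[n], features[m]) for m in nodes if m != n)
           for n in nodes}
anomaly = max(nn_dist, key=nn_dist.get)
print(anomaly)
```

Every ring node maps to the same (4, 5) point while the hub maps to (7, 12), so the hub is isolated in the feature space; swapping the detector in step 5 (e.g. for clustering or regression) leaves steps 1-4 untouched.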
In each of the papers we surveyed we found something akin to the five steps given above. Obviously there is a major focus on step 4, as the naive approach to this step is often computationally expensive. Given the size of many online social networks, and the demand for tools to handle Big Data, we see the development of scalable anomaly detection solutions as being an extremely important area of research. However, we also see the selection of a suitable feature space (steps 1-3) as being equally important.
We believe that our steps 1-3, which encompass the mapping of real-world behaviours of interest to suitable feature spaces, form a key challenge for detecting anomalies in social networks. Few of the papers we reviewed explicitly described how the mapping between the behaviour of interest and the network properties considered was performed. Those few papers that did (e.g. Akoglu et al. (2013); Shrivastava et al. (2008); Friedland (2009); Gao et al. (2010); Ji et al. (2013)) clearly explained the combination of domain knowledge and mathematical reasoning that underpins the use of a particular approach and the examination of particular network features.
For the majority of papers, though, the reasons for considering a particular set of features were unclear, and in many examples the real-world behaviour giving rise to any anomalies detected was discussed only in response to the anomaly actually being found, rather than as a motivation for using a particular approach. We believe that this is problematic, as many of the approaches we reviewed were tested only on a single, or a limited number of, data sets, and may therefore be biased towards the particular anomalies inherent in those data sets. Without explicit reasoning as to why a particular behaviour should be identifiable by a particular anomalous network feature, there is no reason to suspect that this feature will behave in a particular manner across different data sets representing different social networks.
The lack of papers clearly describing the reasons for examining a particular set of features suggests to us that selection of a suitable feature space may be extremely difficult in practice. Many properties of social networks are correlated in some fashion, and it is not clear which combinations of features can be used to capture orthogonal concepts. Some comparison of feature spaces has been undertaken (e.g. Hassanzadeh et al. (2012); McCulloh and Carley (2011)); however, such comparisons have been restricted in the number of features considered and the types of anomalies being detected. We predict that the requirements for anomaly detection in social networks will rapidly advance in the near future, with ever larger volumes of data and increasingly complex behaviours being considered. This may in turn lead to the consideration of increasingly complex feature spaces. Thus the development of any guidelines or heuristics for mapping real-world behaviour to appropriate feature spaces will be of benefit.
One possibility that we see for identifying suitable feature spaces is the application of a shotgun approach. By applying multiple detection techniques across numerous graph properties, it may be possible to identify mechanistic links between the actual behaviours of interest and resulting changes in network properties, and to determine those features that best differentiate anomalous and normal behaviours.
Another major challenge for detecting anomalies in online social networks is the evaluation and comparison of different methods. Clearly, not all algorithms are universally applicable, and many of the existing approaches have been developed with specific problem domains and data formats in mind. Moreover, there is a distinct lack of publicly available data sets with known ground truths. In order to compare existing and novel techniques, it is important to consider how accuracy (recall) and precision are influenced by the scale of the analysis, the density of anomalies, and the magnitude of difference between anomalous and normal observations. However, the range of different data sets required to perform such comparisons is not currently available. Thus many approaches are tested on a single data set, with verification of results performed for the top ranked anomalies using manual investigation. For example, Pandit et al. (2007) analyse transactions in an online auction system, and in order to test their approach results were ranked according to an anomaly score. The top-ranked user accounts were then manually investigated by checking the auction website for complaints made against these accounts. This type of manual investigation is highly time-consuming and highly dependent on the level of reporting by the owners of the target system. In many cases, manual investigation of user accounts in this way is impossible.
Some data sets do exist that are relatively well characterised, including the Enron email data set and, to a lesser extent, the Paraiso communication (synthetic, VAST 2008, www.cs.umd.edu/hcil/VASTchallenge08/). Anomalies detected within these data sets can be easily verified against the known series of events. However these data sets are relatively small, and may only be suitable for a small subset of problem domains. Therefore, in order to address this challenge, we believe that the generation of synthetic data sets is a logical course of action.
A number of methods for generating random social networks have been proposed, ranging from simple models assuming some global probability that a given pair of nodes will be connected, to more complex models considering the dynamics of each individual node (Menges et al., 2008). Between these two extremes lies a broad spectrum of models that have been developed over time and thus reflect our growing understanding of the general properties of social networks (e.g. high transitivity, positive assortativity, presence of community structures). Consequently, early models (reviewed by Chakrabarti and Faloutsos (2006); see also Sallaberry et al. (2013)) capture only a small number of what we now consider to be general properties of social networks (Akoglu and Faloutsos, 2009), while more recent models (e.g. Akoglu and Faloutsos (2009); Sallaberry et al. (2013)) are able to reproduce far more realistic representations of a wide range of social networks. As an example, Akoglu and Faloutsos (2009) describe eleven properties of social networks which they suggest should be reproduced by random network generators, and provide a suitable method for generation that does indeed reproduce these properties. This method is reported to be highly flexible, and can be used to generate weighted or unweighted, directed or undirected, and unipartite or bipartite networks. The method has since been applied in the testing of a novel anomaly detection method, being used to generate a bipartite network representing user reviews of
online products in an e-commerce setting (Akoglu et al., 2013) (a method for generating these types of networks is also described as part of the Berlin SPARQL Benchmark, http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark, last accessed May 2014).
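The simplest class of generator mentioned above, a single global connection probability for every pair of nodes (an Erdős–Rényi G(n, p) model), can be sketched directly; n, p and the seed below are illustrative:

```python
# Baseline synthetic network generator: every pair of nodes is
# connected independently with a single global probability p.
import itertools
import random

def random_graph(n, p, seed=None):
    rng = random.Random(seed)
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]

edges = random_graph(100, 0.05, seed=1)
# The expected edge count is p * n * (n - 1) / 2 = 247.5 here.
print(len(edges))
```

Such a model reproduces none of the richer properties (transitivity, assortativity, communities) discussed above, which is precisely why it serves only as a baseline against the more realistic generators.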
The vast majority of network generation methods proposed to date can be described as phenomenological in nature. While these methods are able to reproduce many of the known properties of social networks, they do so without describing the underlying mechanisms that cause these properties to arise. In contrast, agent-based models (alternatively, individual-based models) such as that proposed by Menges et al. (2008) (see also Glasser and Lindauer (2013); Hummel et al. (2012)) describe the specific actions taken by individuals in the underlying system that lead to the formation of a social network. In this type of simulation, the macro-level properties of the resulting social network can be seen as emergent properties stemming from the micro-level behaviour of the underlying agent-based system. The model does not explicitly set out to reproduce these properties, but rather to represent the behaviour of the individuals involved. In the course of representing this behaviour, the model reproduces a realistic social network. We believe that this approach has great potential for generating realistic data sets and represents an extremely interesting area of research, particularly in adversarial problem domains where anomalous agents could attempt to disguise their behaviour by mimicking 'normal' agents.

In this paper we have surveyed a number of approaches for detecting anomalies in online social networks. We found that these different approaches can be usefully categorised based on characterisation of anomalies as being static or dynamic and labelled or unlabelled. Depending on this characterisation, different features of the network may be examined, and we have suggested that the selection of appropriate features is a highly non-trivial task. We believe that as the requirements of social network analysis continue to grow, the challenges inherent in Big Data analysis will necessitate sophisticated heuristics and algorithms for mapping complex behaviours to network features, and scalable solutions for calculating rich feature spaces which can be subjected to anomaly detection analysis.
8. References
Akoglu, L., Chandy, R., Faloutsos, C., 2013. Opinion fraud detection in online reviews by network effects. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media.
Akoglu, L., Faloutsos, C., 2009. RTG: a recursive realistic graph generator using random typing. Data Mining and Knowledge Discovery 19 (2), 194–209.
Akoglu, L., Faloutsos, C., 2013. Anomaly, event, and fraud detection in large network datasets. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. ACM, pp. 773–774.
Akoglu, L., McGlohon, M., Faloutsos, C., 2010. OddBall: Spotting anomalies in weighted graphs. In: Advances in Knowledge Discovery and Data Mining. Springer, pp. 410–421.
Arney, D. C., Arney, K., 2013. Modeling insurgency, counter-insurgency, and coalition strategies and operations. The Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 10 (1), 57–73.
Barnett, V., Lewis, T., 1994. Outliers in Statistical Data, 3rd Edition. John Wiley & Sons.
Bird, C., Pattison, D., D'Souza, R., Filkov, V., Devanbu, P., 2008. Latent social structure in open source projects. In: Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, pp. 24–35.
Bolton, R. J., Hand, D. J., 2002. Statistical fraud detection: A review. Statistical Science, 235–249.
Brandes, U., Robins, G., McCrannie, A., Wasserman, S., 2013. What is network science? Network Science 1 (1), 1–15.
Chakrabarti, D., Faloutsos, C., 2006. Graph mining: Laws, generators, and algorithms. ACM Computing Surveys (CSUR) 38 (1), 2.
Chakrabarti, D., Zhan, Y., Faloutsos, C., 2004. R-MAT: A recursive model for graph mining. Computer Science Department, 541.
Chandola, V., 2009. Anomaly detection for symbolic sequences and time series data. Ph.D. thesis, University of Minnesota.
Chandola, V., Banerjee, A., Kumar, V., 2009. Anomaly detection: A survey. ACM Computing Surveys (CSUR) 41 (3), 15.
Chandola, V., Banerjee, A., Kumar, V., 2012. Anomaly detection for discrete sequences: A survey. Knowledge and Data Engineering, IEEE Transactions on 24 (5), 823–839.
Chau, D. H., Pandit, S., Faloutsos, C., 2006. Detecting fraudulent personalities in networks of online auctioneers. In: Knowledge Discovery in Databases: PKDD 2006. Springer, pp. 103–114.
Cheboli, D., 2010. Anomaly detection of time series. Ph.D. thesis, University of Minnesota.
Chen, Z., Hendrix, W., Samatova, N. F., 2012. Community-based anomaly detection in evolutionary networks. Journal of Intelligent Information Systems 39 (1), 59–85.
Cheng, A., Dickinson, P., 2013. Using scan-statistical correlations for network change analysis. In: Trends and Applications in Knowledge Discovery and Data Mining. Springer, pp. 1–13.
Eberle, W., Holder, L., 2009. Applying graph-based anomaly detection approaches to the discovery of insider threats. In: Intelligence and Security Informatics, 2009. ISI'09. IEEE International Conference on. IEEE, pp. 206–208.
Eberle, W., Holder, L. B., 2007. Mining for structural anomalies in graph-based data. In: DMIN. pp. 376–389.
Fire, M., Katz, G., Elovici, Y., 2012. Strangers intrusion detection - detecting spammers and fake profiles in social networks based on topology anomalies. Human Journal 1 (1), 26–39.
Friedland, L., 2009. Anomaly detection for inferring social structure.
Gao, J., Du, N., Fan, W., Turaga, D., Parthasarathy, S., Han, J., 2013. A multi-graph spectral framework for mining multi-source anomalies. In: Graph Embedding for Pattern Analysis. Springer, pp. 205–227.
Gao, J., Liang, F., Fan, W., Wang, C., Sun, Y., Han, J., 2010. On community outliers and their efficient detection in information networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 813–822.
García-Teodoro, P., Díaz-Verdejo, J., Maciá-Fernández, G., Vázquez, E., 2009. Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security 28 (1-2), 18–28.
Getoor, L., Diehl, C. P., 2005. Link mining: a survey. ACM SIGKDD Explorations Newsletter 7 (2), 3–12.
Glasser, J., Lindauer, B., 2013. Bridging the gap: A pragmatic approach to generating insider threat data. In: Security and Privacy Workshops (SPW), 2013 IEEE. IEEE, pp. 98–104.
Gogoi, P., Bhattacharyya, D. K., Borah, B., Kalita, J. K., 2011. A survey of outlier detection methods in network anomaly identification. The Computer Journal 54 (4), 570–588.
10
Gupta, M., Gao, J., Sun, Y., Han, J., 2012. Community trend outlier detection using soft temporal pattern mining. Springer, pp. 692–708.
Gyöngyi, Z., Garcia-Molina, H., Pedersen, J., 2004. Combating web spam with TrustRank. In: Proceedings of the Thirtieth international conference on Very large data bases - Volume 30. VLDB Endowment, pp. 576–587.
Hassanzadeh, R., Nayak, R., Stebila, D., 2012. Analyzing the effectiveness of graph metrics for anomaly detection in online social networks. Lecture Notes in Computer Science : Web Information Systems Engineering 7651, 624–630.
Heard, N. A., Weston, D. J., Platanioti, K., Hand, D. J., 2010. Bayesian anomaly detection methods for social networks. The Annals of Applied Statistics 4 (2), 645–662.
Hodge, V. J., Austin, J., 2004. A survey of outlier detection methodologies. Artificial Intelligence Review 22 (2), 85–126.
Huang, Z., Zeng, D. D., 2006. A link prediction approach to anomalous email detection. In: Systems, Man and Cybernetics, 2006. SMC'06. IEEE International Conference on. Vol. 2. IEEE, pp. 1131–1136.
Hummel, A., Kern, H., Kuhne, S., Dohlernd, A., 2012. An agent-based simulation of viral marketing effects in social networks. In: 26th European Simulation and Modelling Conference - ESM'2012. pp. 212–219.
Janakiram, D., Adi Mallikarjuna Reddy, V., Phani Kumar, A. V. U., 2006. Outlier detection in wireless sensor networks using Bayesian belief networks. In: Communication System Software and Middleware, 2006. Comsware 2006. First International Conference on. IEEE, pp. 1–6.
Ji, T., Gao, J., Yang, D., 2012. A Scalable Algorithm for Detecting Community Outliers in Social Networks. Springer, pp. 434–445.
Ji, T., Yang, D., Gao, J., 2013. Incremental Local Evolutionary Outlier Detection for Dynamic Social Networks. Springer, pp. 1–15.
Jyothsna, V., Prasad, V. V. R., Prasad, K. M., 2011. A review of anomaly based intrusion detection systems. International Journal of Computer Applications (0975–8887).
Kim, H., Atalay, R., Gelenbe, E., 2012. G-network modelling based abnormal pathway detection in gene regulatory networks. In: Computer and Information Sciences II. Springer, pp. 257–263.
Kim, H., Gelenbe, E., 2009. Anomaly detection in gene expression via stochastic models of gene regulatory networks. BMC Genomics 10 (Suppl 3), S26.
Krebs, V. E., 2002. Mapping networks of terrorist cells. Connections 24 (3), 43–52.
Malm, A., Bichler, G., 2011. Networks of collaborating criminals: Assessing the structural vulnerability of drug markets. Journal of Research in Crime and Delinquency 48 (2), 271–297.
Marchette, D., 2012. Scan statistics on graphs. Wiley Interdisciplinary Reviews: Computational Statistics 4 (5), 466–473.
Markou, M., Singh, S., 2003a. Novelty detection: a review - part 1: statistical approaches. Signal Processing 83 (12), 2481–2497.
Markou, M., Singh, S., 2003b. Novelty detection: a review - part 2: neural network based approaches. Signal Processing 83 (12), 2499–2521.
McCulloh, I., Carley, K. M., 2011. Detecting change in longitudinal social networks. Tech. rep., DTIC Document.
Menges, F., Mishra, B., Narzisi, G., 2008. Modeling and simulation of e-mail social networks: a new stochastic agent-based approach. In: Proceedings of the 40th Conference on Winter Simulation. Winter Simulation Conference, pp. 2792–2800.
Miller, B., Bliss, N., Wolfe, P. J., 2010a. Subgraph detection using eigenvector l1 norms. In: Advances in Neural Information Processing Systems. pp. 1633–1641.
Miller, B. A., Arcolano, N., Bliss, N. T., 2013. Efficient anomaly detection in dynamic, attributed graphs. In: Intelligence and Security Informatics. pp. 179–184.
Miller, B. A., Beard, M. S., Bliss, N. T., 2011a. Eigenspace analysis for threat detection in social networks. In: Information Fusion (FUSION), 2011 Proceedings of the 14th International Conference on. IEEE, pp. 1–7.
Miller, B. A., Beard, M. S., Bliss, N. T., 2011b. Matched filtering for subgraph detection in dynamic networks. In: Statistical Signal Processing Workshop (SSP), 2011 IEEE. IEEE, pp. 509–512.
Miller, B. A., Bliss, N. T., Wolfe, P. J., 2010b. Toward signal processing theory for graphs and non-Euclidean data. In: Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, pp. 5414–5417.
Müller, E., Sánchez, P. I., Mülle, Y., Böhm, K., 2013. Ranking outlier nodes in subspaces of attributed graphs. In: Data Engineering Workshops (ICDEW), 2013 IEEE 29th International Conference on. IEEE, pp. 216–222.
Neil, J., Storlie, C., Hash, C., Brugh, A., Fisk, M., 2011. Scan statistics for the online detection of locally anomalous subgraphs. Ph.D. thesis, University of New Mexico.
Newman, M. E., 2006. Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74 (3), 036104.
Newman, M. E., Park, J., 2003. Why social networks are different from other types of networks. Physical Review E 68 (3), 036122.
Newman, M. E. J., Strogatz, S. H., Watts, D. J., 2001. Random graphs with arbitrary degree distributions and their applications. Physical Review E 64 (2).
Noble, C. C., Cook, D. J., 2003. Graph-based anomaly detection. In: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 631–636.
Pandit, S., Chau, D. H., Wang, S., Faloutsos, C., 2007. Netprobe: a fast and scalable system for fraud detection in online auction networks. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp. 201–210.
Papadimitriou, P., Dasdan, A., Garcia-Molina, H., 2010. Web graph similarity for anomaly detection. Journal of Internet Services and Applications 1 (1), 19–30.
Park, Y., Priebe, C., Marchette, D., Youssef, A., 2009. Anomaly detection using scan statistics on time series hypergraphs. In: SDM.
Patcha, A., Park, J.-M., 2007. An overview of anomaly detection techniques: Existing solutions and latest technological trends. Computer Networks 51 (12), 3448–3470.
Phua, C., Lee, V., Smith, K., Gayler, R., 2010. A comprehensive survey of data mining-based fraud detection research. arXiv preprint arXiv:1009.6119.
Pincombe, B., 2005. Anomaly detection in time series of graphs using ARMA processes. ASOR Bulletin 24 (4), 2.
Priebe, C. E., Conroy, J. M., Marchette, D. J., Park, Y., 2005. Scan statistics on Enron graphs. Computational & Mathematical Organization Theory 11 (3), 229–247.
Reid, E., Qin, J., Zhou, Y., Lai, G., Sageman, M., Weimann, G., Chen, H., 2005. Collecting and analyzing the presence of terrorists on the web: A case study of jihad websites. In: Intelligence and Security Informatics. Springer, pp. 402–411.
Rowe, M., Stankovic, M., Alani, H., 2012. Who will follow whom? Exploiting semantics for link prediction in attention-information networks. In: The Semantic Web–ISWC 2012. Springer, pp. 476–491.
Sallaberry, A., Zaidi, F., Melançon, G., 2013. Model for generating artificial social networks having community structures with small-world and scale-free properties. Social Network Analysis and Mining 3 (3), 597–609.
Serin, E., Balcisoy, S., 2012. Entropy based sensitivity analysis and visualization of social networks. In: Advances in Social Networks Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on. IEEE, pp. 1099–1104.
Shekhar, S., Lu, C.-t., Zhang, P., 2001. Detecting graph-based spatial outliers: algorithms and applications (a summary of results). In: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 371–376.
Shetty, J., Adibi, J., 2005. Discovering important nodes through graph entropy: the case of Enron email database. In: Proceedings of the 3rd international workshop on Link discovery. ACM, pp. 74–81.
Shrivastava, N., Majumder, A., Rastogi, R., 2008. Mining (social) network graphs to detect random link attacks. In: Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on. IEEE, pp. 486–495.
Silva, J., Willett, R., 2008. Detection of anomalous meetings in a social network. In: Information Sciences and Systems, 2008. CISS 2008. 42nd Annual Conference on. IEEE, pp. 636–641.
Tambayong, L., 2014. Change detection in dynamic political networks: The case of Sudan. In: Theories and Simulations of Complex Social Systems. Springer, pp. 43–59.
Tsugawa, S., Ohsaki, H., Imase, M., 2010. Inferring success of online development communities: Application of graph entropy for quantifying leaders' involvement. In: Information and Telecommunication Technologies (APSITT), 2010 8th Asia-Pacific Symposium on. IEEE, pp. 1–6.
Šubelj, L., Furlan, Š., Bajec, M., 2011. An expert system for detecting automobile insurance fraud using social network analysis. Expert Systems with Applications 38 (1), 1039–1052.
Welser, H. T., Cosley, D., Kossinets, G., Lin, A., Dokshin, F., Gay, G., Smith, M., 2011. Finding social roles in Wikipedia. In: Proceedings of the 2011 iConference. ACM, pp. 122–129.
Zhang, Y., Meratnia, N., Havinga, P., 2010. Outlier detection techniques for wireless sensor networks: A survey. Communications Surveys & Tutorials, IEEE 12 (2), 159–170.