

A Dynamic-Adversarial Mining Approach to the Security of Machine Learning

Tegjyot Singh Sethi a,∗, Mehmed Kantardzic a, Lingyu Lyu a, Jiashun Chen b

a Data Mining Lab, University of Louisville, Louisville, USA
b Huai Hai Institute of Technology, China

Abstract

Operating in a dynamic real world environment requires a forward thinking and adversarial aware design for classifiers, beyond fitting the model to the training data. In such scenarios, it is necessary to make classifiers: a) harder to evade, b) easier to detect changes in the data distribution over time, and c) able to retrain and recover from model degradation. While most work on the security of machine learning has concentrated on the evasion resistance problem (a), there is little work in the areas of reacting to attacks (b and c). Additionally, while streaming data research concentrates on the ability to react to changes in the data distribution, it often takes an adversarial agnostic view of the security problem. This makes such systems vulnerable to adversarial activity aimed at evading the concept drift detection mechanism itself. In this paper, we analyze the security of machine learning from a dynamic and adversarial aware perspective. The existing techniques of Restrictive one class classifier models, Complex learning models and Randomization based ensembles are shown to be myopic, as they approach security as a static task. These methodologies are ill suited for a dynamic environment, as they leak excessive information to an adversary, who can subsequently launch attacks which are indistinguishable from the benign data. Based on empirical vulnerability analysis against a sophisticated adversary, a novel feature importance hiding approach for classifier design is proposed. The proposed design ensures that future attacks on classifiers can be detected and recovered from. The proposed work presents motivation, by serving as a blueprint, for future work in the area of Dynamic-Adversarial mining, which combines lessons learned from Streaming data mining, Adversarial learning and Cybersecurity.

Keywords: Adversarial machine learning, Streaming data, Concept drift, Ensemble, Feature information hiding, Evasion

1. Introduction

The machine learning gold rush has led to a proliferation of predictive models being used at the core of several cybersecurity applications. Machine learning allows for scalable and swift interpolation from large quantities of data, and also provides generalizability to improve the longevity of security mechanisms. However, the increased exuberance in the application of machine learning has led to its security vulnerabilities being overlooked. Recent works on adversarial machine learning (Sethi et al., 2017; Papernot et al., 2017, 2016b; Tramèr et al., 2016) have demonstrated that deployed classification systems are vulnerable to adversarial perturbations at test time, which cause the models

∗ Corresponding author.
Email addresses: [email protected] (Tegjyot Singh Sethi), [email protected] (Mehmed Kantardzic), [email protected] (Lingyu Lyu), [email protected] (Jiashun Chen)

to degrade over time. Such attacks on the integrity of machine learning systems are termed exploratory attacks (Barreno et al., 2006), as they are caused at test time, by an adversary learning and evading the characteristics of the deployed classifier model. An example of such an attack is shown in Figure 1, where an adversary learns the behavior of the spam detection system, and then crafts emails so as to avoid being flagged. Similar attacks have been shown to be possible on computer vision systems (Papernot et al., 2016b), speech recognition systems (Carlini et al., 2016) and also on natural language processing systems (Hosseini et al., 2017).

Owing to the pressing need for secure machine learning systems, there has been considerable recent attention towards improving the resilience of such systems (Papernot et al., 2016c; Wang, 2015; Smutz and Stavrou, 2016). However, the research effort in the security of machine learning is split into two schools of thought: a) Proactive security measures, and b) Reactive security measures.


Figure 1: Illustration of exploratory attacks on a machine learning based spam filtering system.

Proactive security is aimed towards anticipating adversary strategies, and making systems resistant to possible attacks (Hardt et al., 2016; Wang, 2015; Biggio et al., 2010a; Stevens and Lowd, 2013; Xu et al., 2014; Colbaugh and Glass, 2012b; Vorobeychik and Li, 2014). These methods focus on prevention, with the aim of delaying an attack to the maximum extent possible. As shown in Figure 2(a), these defensive measures aim at increasing the attacker's effort δP, which is needed to degrade the target system's performance (evade the system). Work on proactive security views the attack-defense problem as a static one, with the intention of bolstering defenses before deploying the system.

While works on proactive security emphasize the prevention problem, Reactive security focuses on the problem of fixing the system after it is attacked (Barth et al., 2012; Henke et al., 2015). Reactive security emphasizes the detection component of the attack-defense cycle. These approaches aim at being able to swiftly detect attacks and fix the system, with limited human supervision, to make the system available and effective again. As shown in Figure 2(b), these measures aim at reducing the delay δR, which is the time taken by the system to detect degradation and fix (retrain) the model. Works in the area of concept drift detection and adaptation are suitable for reactive security, as they do not make any explicit assumption about the type of change and aim at maintaining high model performance over time (Minku and Yao, 2012; Brzezinski and Stefanowski, 2014; Gama et al., 2014). However, hitherto, work in the domain of concept drift research has followed a domain agnostic approach, where any change to the data distribution is regarded equally, without any adversarial awareness.

The works on Proactive security and Reactive security approach the challenge of security from different perspectives. An analogy to explain the two is: consider the task of securing a precious artifact. Proactive security aims at securing the vault in which the artifact is

(a) Proactive security aims at increasing the time and effort (δP) needed to degrade a model

(b) Reactive security aims at reducing the time taken to detect attacks and fix them (δR)

Figure 2: Goals of Proactive and Reactive Security.

kept: by making thicker walls, by setting up barricades, chains and locks, and by keeping track of suspects who have a history of theft. On the other hand, Reactive security would concentrate its efforts on setting up surveillance and alarm systems, so as to be able to apprehend the thief easily, if at all a larceny is attempted. There is a need for a holistic approach, which combines the benefits of both these classes of security measures.

Since the adversary is in a never-ending arms race with the defender, a dynamic and adversarial aware approach to security is needed. The need for a holistic solution is illustrated using the example of a 2D synthetic dataset in Figure 3(a). A classifier model C is trained on the space of Legitimate class training samples (blue). The model C is seen to be restrictive, as it allows only a narrow margin of samples to be admitted into the system as benign samples. This strategy is considered to provide better security from a static perspective, as it would require an adversary to craft samples in a small and exact range of feature values. However, looking at the problem from a dynamic perspective, it is seen that the space of possible attacks is now highly overlapped with the training data. As such, once an attack starts, the adversary has enough information about the feature space to become indistinguishable from benign traffic entering the system. These attacks are harder to detect and also harder to recover from, due to the inseparability of samples. On the other hand, the simple design of Figure 3(b) (simple as it reduces the number of features used in the model C) has a larger space of possible attacks, making it easier to evade. However, the increased space means that an adversary evading C will not be able to completely reverse engineer the location and characteristics of the Legitimate training samples, due to the added uncertainty provided by the large space of possible attacks.


(a) Impact of using a restrictive defender model.

(b) Impact of using a simple defender model.

Figure 3: Impact of using a restrictive defender model (top) and that of using a more generalized defender model (bottom). In the case of restrictive models, attacks are harder to carry out, but will lead to inseparability from benign traffic.

The illustrative examples of Figure 3(a) and Figure 3(b) demonstrate the need to develop countermeasures from a Dynamic-Adversarial perspective. In such an environment, delaying the onset of attacks, detecting attacks and recovering from them are all equally important for the continued usage of the classification system. In this paper, the impact of classifier design strategies on the severity of attacks launched at test time is presented. The existing ideas of Randomness (Alabdulmohsin et al., 2014; Colbaugh and Glass, 2012a) and Complex learning (Rndic and Laskov, 2014; Wang, 2015; Biggio et al., 2010b), for securing classifiers, are evaluated from a dynamic perspective. Also, the ability to take preemptive measures, to ensure the defender's leverage in a dynamic environment, is evaluated. In particular, the ability to hide feature importance is demonstrated, as a proactive measure to enable efficient reactive security. To the best of our knowledge, this is the first work which combines both adversarial and dynamic aspects of the security of machine learning. The main contributions of this paper are as follows:

• Analyzing the impact of classifier design strategies, in the training phase, on the dynamic capabilities of an adversary, at test time.

• Quantifying the impact of attack severity using the Adversarial Margin Density (AMD) metric, for determining the ability to detect and recover from attacks.

• Analyzing vulnerabilities of popularly used Restrictive, Complex learning and Randomization based approaches, in a dynamic environment.

• Proposing a counter-intuitive and novel feature hiding approach, to ensure long term defender leverage, for better attack detection and model retrain-ability. This proposed approach serves as a blueprint for future works in the domain of Dynamic-Adversarial mining.

• Providing background and motivation for future work, for a new interdisciplinary area of study, Dynamic-Adversarial mining, which combines Streaming data, Machine learning, Cybersecurity and Adversarial learning, for a holistic approach to the security of machine learning.

The rest of the paper is organized as follows: Section 2 presents a survey of work in the area of security of machine learning. Section 3 presents the analysis methodology for evaluating effects of different classifier design strategies on the adversarial activities at test time. Section 4 presents experimental evaluation of popular classifier design strategies in the domain of security: Restrictive one class classifiers, Complex learning based ensemble methods and Randomization based classifiers are evaluated. Based on the analysis, a novel feature hiding approach is proposed and evaluated in Section 5. Ideas and motivation for future work in the domain of Dynamic-Adversarial mining are presented in Section 6. Conclusion and avenues for further extensions of the work are presented in Section 7.

2. Related work on security of machine learning

In this section, related work in the domain of machine learning security is discussed. Section 2.1 presents work in the area of adversarial manipulation at test time, which affects the performance of machine learning based systems. Proactive/Static security measures, developed in the domain of adversarial machine learning, are discussed in Section 2.2. Methodologies which advocate a reactive/dynamic approach to dealing with attacks are discussed in Section 2.3. The need for a holistic approach, combining dynamic and adversarial thinking, is emphasized.


Figure 4: Illustration of AP attacks. (Left - Right): The defender's model from its training data. The Exploration phase depicting the seed (blue) and the anchor point samples (purple). The Exploitation attack phase samples (red) generated based on the anchor points. (Sethi et al., 2017)

2.1. Exploratory attacks on classification based systems

Machine learning systems deployed in the real world are vulnerable to exploratory attacks, which aim to degrade the learned model over time. Exploratory attacks are launched by an adversary, by first learning about the characteristics of the system through carefully crafted probes, and then morphing suspicious samples to cause evasion at test time (Biggio et al., 2014a; Tramèr et al., 2016; Biggio et al., 2013; Papernot et al., 2016a; Biggio et al., 2014b; Sethi et al., 2017). Such attacks are commonplace and difficult to avoid, as they rely on the same black box access to the system that a benign user is entitled to. These attacks were shown to affect a wide variety of classifier types in (Papernot et al., 2016a), and were also shown to be potent against ML-as-a-service platforms (such as Amazon AWS Machine Learning¹ and Google Cloud Platform²), which provide APIs for accessing predictive analytics as a service (Sethi and Kantardzic, 2017a).

One such black box, data driven, and algorithmic attack strategy was proposed in (Sethi et al., 2017). The work in (Sethi et al., 2017) proposed the Anchor Points (AP) attack strategy, which uses an exploration-exploitation framework to perform evasion on classifiers, as shown in Figure 4. In these attacks, the adversary starts with exploration of the deployed classifier model (green), by using crafted probing samples. These samples are submitted to the system, via remote API calls, and the resulting feedback of Accept/Reject is observed. The AP attack strategy then leverages the samples which lead to successful evasion, to generate additional attack samples (red) via exploitation. These attack samples will cause the performance of the defender to drop, ultimately leaving it unusable. The AP

¹ https://aws.amazon.com/machine-learning/
² cloud.google.com/machine-learning

attack strategy provides a domain independent, classifier independent, and simple approach to simulate exploratory attacks and to evaluate the impact of various security design decisions on the performance of the classifier. In this paper, the AP attack strategy will be extended and used as a tool for assessing and comparing the efficacy of various security mechanisms.
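For concreteness, the sketch below outlines the exploration-exploitation probing loop described above, in simplified form. It assumes only black box access through a hypothetical black_box_predict function standing in for remote API calls, and uses random uniform probes and Gaussian perturbations as placeholder heuristics; it illustrates the attack pattern, and is not the exact AP algorithm of (Sethi et al., 2017).

```python
import numpy as np

def ap_attack_sketch(black_box_predict, n_features, n_probes=500, n_attacks=200, seed=None):
    """Simplified exploration-exploitation evasion sketch (not the exact AP algorithm).

    black_box_predict: callable mapping a (n_samples, n_features) array to labels,
    where 1 means the defender accepts the sample as Legitimate.
    """
    rng = np.random.default_rng(seed)

    # Exploration: submit random probes and keep those accepted as Legitimate (anchor points).
    probes = rng.uniform(0, 1, size=(n_probes, n_features))
    anchors = probes[black_box_predict(probes) == 1]
    if len(anchors) == 0:
        raise RuntimeError("No probe evaded the classifier; increase n_probes.")

    # Exploitation: perturb anchor points slightly to mass-produce candidate attack samples.
    picks = anchors[rng.integers(0, len(anchors), size=n_attacks)]
    attacks = np.clip(picks + rng.normal(0, 0.05, size=picks.shape), 0, 1)
    return attacks[black_box_predict(attacks) == 1]      # the successful evasions
```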

2.2. Proactive/Static approaches for security of machine learning

Works on proactive security approach the problem from a static perspective. Robustness is incorporated into the model's design, when it is trained from the observed data, and the system is assumed to be resilient against future evasion attacks. These techniques are popular in the adversarial machine learning research domain, and the available work can be broadly classified into the following two categories: a) Complex learning methods and b) Disinformation methods. These categories of defenses are discussed in the following sections.

2.2.1. Security based on learning of complex models

These methods increase the complexity of the learned model, to make it harder for an attacker to effectively reverse engineer it with a limited number of probes. The complexity is increased either by using difficult to mimic features in the classification model (Rndic and Laskov, 2014), or by balancing weights across features, such that the resulting classification models are robust (Rndic and Laskov, 2014; Wang, 2015; Biggio et al., 2010b). Such robust classifiers require the adversary to reverse engineer and mimic a large number of features, thereby increasing its effort and cost to successfully evade the classifier. A practical challenge that these methods face is the balancing of complexity with overfitting, as overfitting can lead to poor generalization performance and can also make the system vulnerable to training time (causative) attacks (Barreno et al., 2010).

The complexity of models can be increased systematically, by distributing weights among informative features, to make a classifier robust (Rndic and Laskov, 2014). In (Kołcz and Teo, 2009), feature reweighing was used, weighting every feature inversely based on its importance. This made the classification boundary dependent on a larger number of features, thereby requiring attackers to spend more effort in trying to evade the classifier. In (Zhang et al., 2015), the task of selecting a reduced feature subset in an adversarial environment was presented. Classification security was introduced as a regularization term in the learning objective function of the defender's model. This term was


optimized along with the classifier's generalization capability during the feature selection process, making it more secure against evasion attacks on reduced feature spaces. Robustness to random feature deletion at test time, caused by noise or by adversarial reverse engineering, was introduced in (Globerson and Roweis, 2006). It was shown that the proposed methodology was resistant to changes caused by random feature deletion and also, to a reasonable extent, those caused by deletion based on feature importance. In (Colbaugh and Glass, 2012b) the predictive defense algorithm was proposed, which adds regularization terms in the min-max formulation of the objective function, to model adversary actions in aggressively reduced feature spaces. The objective function optimized in (Colbaugh and Glass, 2012b) is given by Equation 1.

\[
\min_{w}\ \max_{a}\ \; -\alpha \|a\|^{3} + \beta \|w\|^{3} + \sum_{i} \mathrm{loss}\!\left(y_i,\ w^{T}(x_i + a)\right) \tag{1}
\]

where the loss function represents the misclassification rate and w represents the learned model of the defender. The attacker attempts to circumvent the defender by using a linear transform vector a ∈ R^d. The terms α and β denote the regularization imposed on attackers and defenders, respectively. The term α embodies the effort of an attacker, which increases based on the distance from existing samples. The term β denotes the defender's need to avoid overfitting to the data. By incorporating both regularization terms into the optimization function, this method makes it harder for an adversary to apply a simple linear transformation on a few features of malicious samples, to get them classified as Legitimate by the defender model. This game theoretic formulation was shown to outperform a naive Bayes classifier, used as a gold standard, for the task of spam classification with a reduced feature space.
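To make the roles of the terms in Equation 1 concrete, the sketch below evaluates the inner objective for a fixed defender weight vector w and a candidate attack transform a. The hinge loss is used here as an assumed stand-in for the unspecified misclassification loss, and the norm exponents are taken as printed in Equation 1; the full min-max optimization itself is not implemented.

```python
import numpy as np

def predictive_defense_objective(w, a, X, y, alpha, beta):
    """Inner objective of Equation 1 for a fixed defender w and attack transform a.

    Assumptions (not from the original formulation): labels y in {-1, +1}, and
    the hinge loss as a surrogate for the misclassification loss term.
    """
    margins = y * ((X + a) @ w)                       # y_i * w^T (x_i + a)
    loss = np.maximum(0.0, 1.0 - margins).sum()       # surrogate for the loss term
    return -alpha * np.linalg.norm(a) ** 3 + beta * np.linalg.norm(w) ** 3 + loss
```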

While the above methods rely on adding robustness to the training of a single classifier, it was shown in (Biggio et al., 2008, 2010a,b) that combining classifiers trained on different subsets of the data is more suitable to the task of security. Multiple Classifier Systems (MCS) (Wozniak et al., 2014) allow classifiers trained on different subsets of the feature space to be combined in a natural way, to provide an overall robust classification prediction. In (Biggio et al., 2010a), instance bagging and random subspace methods were analyzed, as two MCS approaches, for securing against evasion attacks. It was found that both approaches make the evasion problem harder for the adversary. In particular, random subspace models are more suitable when a

very small number of features exhibit high discriminating capability, whereas bagging works better when the weights are already evenly distributed among the features. These methods essentially have the effect of averaging over the feature weights, as models trained on different views of the problem space are aggregated to present the final result (Biggio et al., 2010b). Heterogeneous combinations of one class and binary classifiers, to produce an evasion resistant model, have also been shown to balance high accuracy with improved security, in the works of (Biggio et al., 2015).

2.2.2. Security based on disinformation

These methods rely on the concept of 'security by obscurity', and are aimed at confusing the adversary about the internal state of the system (Biggio et al., 2014a). The inability of the adversary to accurately ascertain the internal state would then result in increased delay/effort for performing the attack. Popular principles for applying disinformation based security are: Randomization (Huang et al., 2011), Use of honeypots (Rowe et al., 2007), Information hiding (Abramson, 2015) and Obfuscation (Barreno et al., 2006). An example obfuscation technique, used in the case of a classifier filter, is one where the classifier outputs the correct prediction with a probability of 0.8, in an attempt to sacrifice accuracy to provide security. Ideally, hiding all information about the feature space and classification model could result in complete security. However, this goes against Kerckhoffs' principle of information security (Kerckhoffs, 1883; Mrdovic and Perunicic, 2008), which states that security should not rely overly on obscurity and that one should always assume that the adversary knows the system.

When disinformation relies on obfuscation, the amount of information made available to the adversary is not limited, but obtaining the information is made harder (Xu et al., 2014). Randomization is a popular approach to ensure that the information presented is garbled. A simple strategy for introducing randomization into the classification process was proposed in (Barreno et al., 2006), where it was suggested that randomness be introduced in placing the boundary of the classifier. This was made possible by using the confidence estimate given by a probabilistic classifier, to randomly pick the class labels. While this increases the evasion effort necessary, it leads to a drop in accuracy in non adversarial environments. A detailed analysis of this tradeoff was presented in (Alabdulmohsin et al., 2014), where a family of robust SVM models was learned and used to increase reverse engineering effort.

The work in (Alabdulmohsin et al., 2014) showed


that choosing a single classifier is a suboptimal strategy, and that drawing classifiers from a distribution with large variance can improve resistance to reverse engineering, at little cost to the generalization performance of the model. This idea has also been used in multiple classifier systems, where a random classifier is selected at any given time, from a bag of trained classifiers, to perform the classification. The moving target approach of (Colbaugh and Glass, 2012a) divides features into K subsets and trains one classifier on each of these subsets. One of these classifiers is chosen, according to a scheduling policy, to perform the classification of an incoming sample x. It was shown that using a random uniform scheduling policy provides the best protection against an active adversary. This was formally proven in (Vorobeychik and Li, 2014), where it was shown that the optimal randomization strategy is to select classifiers randomly with equal probabilities. In the case of targeted attacks, selecting uniformly was shown to be the best strategy, whereas no randomization was the optimal strategy in the face of indiscriminate attacks. Although the ideas of (Colbaugh and Glass, 2012a) and (Vorobeychik and Li, 2014) are applicable to multiple classifier systems, experimental results were shown only on a set of two separately trained classifiers. Obtaining multiple sets of classifiers with high accuracy was not discussed. The work of (Biggio et al., 2008) uses the multiple classifier system formulation, maintaining a large set of classifier models and performing randomization by assigning a different weighting scheme to the individual classifiers' predictions at every iteration. Experimentation was performed on the Spamassassin dataset, by pre-selecting 100 weighting schemes and allowing the classifier to choose any one of them with equal probability, over different iterations. This resulted in increased hardness of evasion of the classifier under simulated attacks.
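The moving target idea described above can be sketched as follows, assuming a uniform random scheduling policy over K models trained on disjoint feature subsets; the class and parameter names are illustrative, and this is not the implementation of (Colbaugh and Glass, 2012a).

```python
import numpy as np
from sklearn.svm import LinearSVC

class MovingTargetClassifier:
    """K models on disjoint feature subsets; uniform random selection per query (illustrative)."""

    def __init__(self, n_subsets=2, seed=None):
        self.n_subsets = n_subsets
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        features = self.rng.permutation(X.shape[1])
        self.subsets_ = np.array_split(features, self.n_subsets)
        self.models_ = [LinearSVC().fit(X[:, s], y) for s in self.subsets_]
        return self

    def predict(self, X):
        # A randomly scheduled constituent model answers each incoming sample.
        preds = np.empty(X.shape[0], dtype=int)
        for i, x in enumerate(X):
            k = self.rng.integers(self.n_subsets)
            preds[i] = self.models_[k].predict(x[self.subsets_[k]].reshape(1, -1))[0]
        return preds
```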

While randomization is a popular method for incorporating obscurity against evasion attacks, other methods have also been suggested (Biggio et al., 2014a). Limiting the feedback, or providing incorrect feedback, to combat probing attacks has been suggested in (Barreno et al., 2006). Use of honeypots is another promising direction, wherein systems are designed for the sole purpose of being attacked, thereby providing information about attackers (Rowe et al., 2007). Use of social honeypots was suggested in (Lee et al., 2010), as a means of collecting adversarial samples about social spam. Bots which behave as humans are deployed in the social cyber-space, specifically tested on Myspace and Twitter data in (Lee et al., 2010), where they collect information about potential spammers.

2.3. Reactive/Dynamic approaches for security of machine learning

Adversarial learning is a cyclic process, as described in Figure 5 and in (Stein et al., 2011). Attacks are a question of when and not if. Proactive measures of security delay the occurrence of attacks, but eventually every system is vulnerable to attack. These measures do not provide any direction or solution for dealing with attacks, and for fixing the system for future use after it has been attacked. They are static one shot solutions, which drop the ball as soon as a system is compromised. Reactive systems, on the other hand, can deal with attacks after their onset and aim to fix the system as soon as possible. Work in (Barth et al., 2012) shows that, in the absence of prior information about adversaries, a reactive system performs as well as a proactive security setting, and is a suggested security infrastructure for industrial scale systems. Reactive systems can also ensure that attacks are recognized and alerted, so that appropriate countermeasures can be taken.

Reactive methods have primarily been studied in the domain of streaming data mining, where changes to the data, called concept drift, are detected and the system is adapted to maintain performance over time (Zliobaite, 2010). In these techniques, the system is agnostic to the cause of change, and treats any degradation in its performance in a similar way. As such, they are reactive but not necessarily adversary aware (Dalvi et al., 2004). The work in (Gama et al., 2014) summarizes the major frameworks, methodologies and algorithms developed for handling concept drift. A trigger based concept drift handling approach (Ditzler et al., 2015) consists of two important components: a) a drift detection module, and b) a drift adaptation module. The drift detection module is responsible for detecting changes in the data and signaling further evaluation or adaptation based on the detected changes. The adaptation module then launches retraining of the system, by collecting new labeled data and updating the model. Ensemble techniques are a popular choice for drift handling systems, as they allow modular retraining of the system (Sethi et al., 2016). Ensemble methods have been proposed for drift detection (Kuncheva, 2008) and for learning with concept drift (Brzezinski and Stefanowski, 2014; Minku and Yao, 2012).
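A minimal sketch of the trigger based scheme described above is given below: a drift detection module flags drift when the windowed error rate rises well above a baseline, and an adaptation module then retrains the model on recent labeled data. The window size and threshold are illustrative assumptions, not values taken from the cited works.

```python
from collections import deque

class TriggerBasedDriftHandler:
    """Toy drift detection + adaptation loop (illustrative, not a specific published algorithm)."""

    def __init__(self, model, window=200, threshold=1.5):
        self.model = model
        self.window = window
        self.threshold = threshold            # allowed ratio of current error over baseline error
        self.recent = deque(maxlen=window)    # recent labeled stream samples (x, y)
        self.baseline_error = None

    def update(self, x, y_true):
        y_pred = self.model.predict([x])[0]
        self.recent.append((x, y_true))
        error = sum(self.model.predict([xi])[0] != yi for xi, yi in self.recent) / len(self.recent)

        if self.baseline_error is None and len(self.recent) == self.window:
            self.baseline_error = max(error, 1e-6)        # detection module: establish baseline
        elif self.baseline_error and error > self.threshold * self.baseline_error:
            X, y = zip(*self.recent)                      # adaptation module:
            self.model.fit(list(X), list(y))              # retrain on recent labeled data
            self.baseline_error = None                    # reset baseline after adaptation
        return y_pred
```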

At the confluence of concept drift and adversarial learning are the approaches of adversarial drift (Kantchelian et al., 2013) and adversarial active learning (Miller et al., 2014). In (Kantchelian et al., 2013), concept drift caused as a result of attacks was discussed. The need for a responsive system, with the integration of traditional blacklists and whitelists,


Figure 5: The Attack-Defense cycle between adversaries and system designers (i.e., defenders).

alongside an ensemble of classifiers trained for every class of malicious activity, was proposed. Isolation of malicious campaigns and techniques to ensure zero training error, while still maintaining generalization, were extensions suggested for traditional classification systems when used in an adversarial environment. The vulnerabilities of the labeling process were analyzed in (Miller et al., 2014), which introduces the concept of adversarial active learning. The Security oriented Active Learning Test bed (SALT) was proposed in (Miller et al., 2014), to ensure effective drift management, human integration and querying strategies, in an adversarial environment. Use of concept drift tracking within malware families was analyzed in (Singh et al., 2012), to show the temporal nature of adversarial samples. Metafeatures (higher order, difficult to evade) were suggested, to detect malicious activity. Use of an ensemble of classifiers to detect spam emails was suggested in (Chinavle et al., 2009), where the mutual agreement of pairs of classifiers in the ensemble was tracked and concept drift was detected if the agreement drops. In the event of a drift detection, poorly performing pairs of classifiers are selected to be retrained. An extension of this work was proposed in (Smutz and Stavrou, 2016), where the entire ensemble's disagreement score distribution was tracked. A sudden increase in the overall disagreement was used to indicate an attack on the system, without using external labeled data. Here, feature bagging (Bryll et al., 2003) was found to be an effective ensemble strategy to detect evasion attacks, such as mimicry and reverse mimicry attacks, on the task of classifying malicious pdf documents. These works provide initial ideas for using machine learning in an adversarial environment, where the data is streaming and the attack-defense is a cyclic never-ending process (Figure 5).
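A rough sketch of the unlabeled disagreement tracking idea attributed above to (Chinavle et al., 2009) and (Smutz and Stavrou, 2016) is shown below: the fraction of incoming samples on which ensemble members disagree is monitored, and a sharp rise relative to a clean reference batch is treated as a possible attack signal. The alert factor and function names are illustrative assumptions, not taken from those works.

```python
import numpy as np

def disagreement_rate(ensemble, X):
    """Fraction of samples on which the ensemble members do not all agree."""
    votes = np.stack([m.predict(X) for m in ensemble])        # shape: (n_models, n_samples)
    return float(np.mean(votes.min(axis=0) != votes.max(axis=0)))

def attack_suspected(ensemble, reference_batch, incoming_batch, factor=2.0):
    """Alert when disagreement on new data is much higher than on clean reference data."""
    baseline = disagreement_rate(ensemble, reference_batch)
    current = disagreement_rate(ensemble, incoming_batch)
    return current > factor * max(baseline, 1e-6)
```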

The reactive approaches have been well studied towards an online never ending learning scheme. However, the current methods for reactive security do not explicitly consider an adversarial environment, even though a dynamic environment is considered.

In adversarial domains, concept drift (attacks) is a function of the deployed model itself, as reverse engineering is based on what model is learned in the first place (Barreno et al., 2006). This information can be used to design better suited reactive systems in adversarial domains. A combination of proactive and reactive approaches is necessary when dealing with such domains, as decisions made in the design phase could ultimately make future steps down the pipeline easier. Further research in this area will need a comprehensive look at the learning process, with adversarial effects considered at every step, towards an adversarial aware dynamic learning system. This will require newer metrics of measuring system performance, beyond accuracy and f-measure, towards metrics such as recovery time and data separability (Mthembu and Marwala, 2008), which are necessary for subsequent cycles of the process. In this paper, we look at the Dynamic-Adversarial nature of the security of machine learning. To this end, we analyze the effect of various design strategies on the long term security of a system.

3. Adversarial Uncertainty - On the ability to react to attacks in Dynamic-Adversarial environments

Understanding the impact of attacks and defense in a cyclic environment, where a system is attacked and needs to recover from it periodically, requires a new perspective on the problem and the metrics used for evaluating the different design strategies. The metric of attack evasion rate (Lowd and Meek, 2005; Sethi et al., 2017) is sufficient for evaluating the robustness of models to the onset of exploratory attacks, but it does not provide intuition about the effectiveness of strategies which are designed for reacting to attacks in a dynamic environment. In this case, we need a metric which can convey information about the ability to react to attacks and continue operations, following the onset of attacks. In this section, the idea of Adversarial Uncertainty is introduced, to understand the impact of different classifier design strategies on the ability to react to attacks, once they start to impact a system's performance. The motivation behind adversarial uncertainty is presented in Section 3.1. The impact of the defender's classifier design on adversarial uncertainty is presented using an illustrative example of binary feature valued data in Section 3.2. A heuristic method to measure adversarial uncertainty is presented in Section 3.3, by means of introducing the Adversarial Margin Density (AMD) measure. The ability of adversaries to utilize the available probe-able information, from the defender's black box model, in order to launch high confidence attacks, is


demonstrated in Section 3.4 and Section 3.4.1. This is done to analyze the impact of a determined adversary, and to be able to design secure classifiers to counter them.

3.1. Motivation

The defender extracts information about the predictive problem by analyzing and learning from the training data. The adversary, on the other hand, obtains its information by probing the deployed black box model of the defender. As such, there is a gap between the information held by the two players. This is not a significant concern when the adversary is aiming only to evade the deployed model, as a complete understanding of the feature space is often not needed to evade most robust classifier models. However, this information deficit/gap is important when evaluating the impact an adversary can have on the reactive capabilities of a defender. Adversarial uncertainty refers to the uncertainty on the part of the adversary, due to the unavailability of the original training data.

To understand the impact of adversarial uncertainty, consider the following toy example: a 2-dimensional binary dataset, where the sample L(X1=1, X2=1) represents the Legitimate class training sample and M(X1=0, X2=0) represents the Malicious class training sample. A model trained as C : X1 ∨ X2 provides generalization capabilities, by allowing (0,1) and (1,0) to be considered as legitimate samples at test time. In this case, an adversary looking to evade C can pick one of the following three attack samples at test time: (1,0), (0,1), (1,1). While any of these samples will lead to a successful evasion on the adversary's part, only one is truly devastating for a reactive system. The adversarial sample (1,1) will make the defender's model ineffective, as it completely mimics the training sample L. However, in this case, the probability of selecting this sample is 1/3, as an adversary is not certain about the exact impact of the features X1 and X2 on C. This uncertainty on the part of the adversary is referred to as adversarial uncertainty. In this example, adversarial uncertainty will enable the defender to recover from attacks 2/3 of the time, as the attack sample (1,0) can be thwarted by an updated model C′ : X2, and the sample (0,1) can be thwarted by the model C′ : X1.
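The probabilities in this toy example can be verified by direct enumeration, as in the short check below (the model C : X1 ∨ X2 and the training sample L = (1,1) are exactly as described above).

```python
from itertools import product

C = lambda x1, x2: x1 or x2                                  # defender's model C : X1 v X2
evading = [s for s in product([0, 1], repeat=2) if C(*s)]    # samples classified as Legitimate

print(evading)                        # [(0, 1), (1, 0), (1, 1)]
print(1 / len(evading))               # probability the adversary picks L = (1, 1): 1/3

# 2/3 of the evading samples can still be thwarted by a retrained model:
recoverable = [s for s in evading if s != (1, 1)]
print(len(recoverable) / len(evading))   # 0.666... -> defender recovers 2/3 of the time
```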

The attack sample (1,1) is an example of a data nullification attack (Kantchelian et al., 2013). Data nullification leaves the defender unable to use the same training data to continue operating in a dynamic environment. An illustration of data nullification attacks on numerical data spaces is shown in Figure 6. In this figure, the attack samples generated overlap with the original space of legitimate data samples.

Figure 6: Illustration of data nullification attacks on the space of legitimate samples (blue).

As such, the defender will be unable to train a high quality classification model which can effectively separate the two classes of samples, while using the existing set of features. Recovering from data nullification attacks will require the defender to generate additional features and collect new training data, both of which are time consuming and expensive operations. Also, such attacks will be harder to detect via unsupervised techniques (Sethi and Kantardzic, 2017b), because the legitimate and the attack samples have a high degree of similarity and proximity in the feature space. It is therefore necessary to ward off data nullification attacks, to be able to effectively detect and recover from attacks. Adversarial uncertainty provides a heuristic indication of the inability of the adversary to deduce with confidence the exact impact of the various features on the prediction problem. A high adversarial uncertainty will lead to a lower probability of a data nullification attack.

From a dynamic data perspective, the ability to detect attacks and to retrain the classifier are important characteristics of the system. This is possible only if the original legitimate training data is not corrupted by an adversary at test time (i.e., data nullification attacks are prevented). Data nullification attacks are possible if an adversary is able to simultaneously and successfully reverse engineer the impact of the entire feature set on the defender's classification model. This is a result of an adversary's confidence in the impact of the different features on the prediction task, which it obtains via exploration of the deployed black box model. A highly confident adversary can not only evade the deployed classifier, but can also avoid detection by unsupervised distribution tracking methodologies (Sethi and Kantardzic, 2017b). Thus, it is essential to evaluate the


impact of the various defender strategies on their ability to ensure that they do not leak excessive information to an adversary (i.e., their ability to ensure high adversarial uncertainty).

3.2. Impact of classifier design on adversarial uncertainty

Adversarial activity and capabilities are a function of the deployed black box classifier, which is being evaded. The adversary's perception of the prediction space is directed by the feedback on probes submitted to the defender's model. As such, the defender's classifier design has a certain degree of control in defining the range of attacks that can be launched against it. Classifier design strategies range from restrictive one class classifiers (Salem et al., 2008; Onoda and Kiuchi, 2012), to robust majority voted classifiers (Globerson and Roweis, 2006; Liu et al., 2012), and randomization based feature bagged ensembles (Colbaugh and Glass, 2012a; Biggio et al., 2008). From an adversarial perspective, the selection of classifier learning strategies can have a significant impact on adversarial uncertainty, and on the hardness of evasion. The impact of popular classifier design strategies, based on research directions in the security of machine learning, is illustrated here using an example of an N-dimensional binary feature space. We consider the training dataset to be comprised of one Legitimate class sample L(1,1,...(N features),1) and one Malicious class sample M(0,0,...(N features),0). As such, the features 1 to N are all informative in discriminating between the two classes of samples. The effective usage of this orthogonal information leads to the various design strategies, as shown in Table 1. Since this is meant to be an illustrative example, we consider the impact of these designs against a random probing based attack strategy, where the attacker tries different permutations of the N binary variables, to understand the behavior of the black box classifier.

The Simple model strategy of Table 1 represents a classifier design which emphasizes feature reduction in its learning phase. An example of such a classifier is a C4.5 decision tree (Quinlan, 1993) or a linear SVM with an L1-regularization penalty (Chang and Lin, 2011), which gravitate towards simpler model representations. A representative model in this case is given by C : Xi = 1, as the learned classification rule. The probability that a randomly generated N-dimensional probe sample will evade C is 0.5, as the prediction space is divided into two halves on this feature Xi. The adversarial certainty (Table 1) refers to the confidence that an adversary has about the defender's training data, given that an attack sample is successful in evading C. In the

case of the simple model, the adversary is fairly uncertain, as the training data sample L could be any one of the 2^(N−1) probe samples which provide successful evasion.

The Complex models rely on aggregating feature information, to make the learned models more complex (i.e., have more coefficients and important features). The one class classifier strategy is a complex model and comes in two flavors: a restrictive boundary around the legitimate training data samples, and a restrictive boundary around the malicious training data samples. In the former case, the model is the hardest to evade, as an adversary will need to simultaneously evade all N features to gain access to the system. This provides additional security from attack onset, by reducing the evasion probability to 1 in 2^N. However, it also leads to an adversarial certainty of 100%, implying that any successful attack will lead to data nullification and the subsequent inability of the defender to recover from such attacks. Thus, as opposed to common belief in cybersecurity (Onoda and Kiuchi, 2012), a more restrictive classifier could be a bad strategy, when considering a dynamic and adversarial environment. The one class model on the set of malicious training data represents the other end of the spectrum, where a boundary is drawn around the malicious class samples. This model makes evasion easier, but ensures high adversarial uncertainty. This is also undesirable, as evasion can be performed by changing any of the N features. It could however benefit defenders facing multiple disjoint adversaries (Kantchelian et al., 2013; Sculley et al., 2011).

A popular design strategy in adversarial domains focuses on integrating multiple orthogonal feature space information to make a complex model, which is robust to changes in a subset of the features (Biggio et al., 2008, 2010a,b). A feature bagging ensemble, which uses majority voting on its component models' predictions, is a popular choice in this category. This model is resilient to attacks which affect only a few features at any given time. Most robust learning strategies were evaluated against a targeted exploration attack, where the adversary starts with a predetermined attack sample and aims to minimally alter it, so as to evade the classifier C. By measuring adversarial cost in terms of the number of features which need evasion, the efficacy of feature bagged models was demonstrated (Lowd and Meek, 2005), as robust models by design will require a majority of the features to be modified for a successful evasion. This reasoning is not valid for the case of an indiscriminate exploratory attack, where an adversary is interested in generating any sample which evades C, without any predefined set of attack samples


Table 1: Impact of classifier design on evasion probability and adversarial certainty.

Classifier Model (C) | Model representation | Evasion probability | Adversarial Certainty
--- | --- | --- | ---
Simple model (e.g., C4.5 Decision tree) | X_i = 1 | 2^(N−1) / 2^N = 1/2 | 1 / 2^(N−1)
Complex model - One class Legitimate | ∧_{i∈N} X_i = 1 | 1 / 2^N | 1/1 = 1
Complex model - One class Malicious | ∨_{i∈N} X_i = 1 | (2^N − 1) / 2^N | 1 / (2^N − 1)
Complex model - Feature bagged robust model (e.g., random subspace majority voted ensemble) | ∨_{s⊂N} ∧_{i∈s} X_i = 1 | Σ_{i=N/2+1}^{N} C(N,i) / 2^N = 1/2 * | 1 / 2^(N−1)
Randomized model - Feature bagged robust model with majority voting and random selection | ∧_{i∈s} X_i = 1; s ⊂ N | 1 / 2^(N/2) | 1 / 2^(N/2)

* Σ_{i=N/2+1}^{N} C(N,i) = (Σ_{i=0}^{N} C(N,i)) / 2 = 2^N / 2 = 2^(N−1), if N is even.

(Sethi et al., 2017). In this setting, the impact of a feature bagged ensemble is similar to that of a simple model, as seen from the evasion probability and adversarial certainty computations of Table 1.
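The first three rows of Table 1 can be verified by brute force enumeration over all 2^N probes for a small N, as in the sketch below (N = 4 is an arbitrary illustrative choice). The evasion probability is the fraction of probes accepted as Legitimate, and the adversarial certainty is the chance that a randomly chosen evading probe coincides with the Legitimate training sample L = (1,...,1).

```python
from itertools import product

N = 4                                                      # small, illustrative choice
probes = list(product([0, 1], repeat=N))                   # all 2^N possible probe samples

designs = {
    "Simple model (X_i = 1)":              lambda x: x[0] == 1,
    "One class Legitimate (AND over X_i)": lambda x: all(v == 1 for v in x),
    "One class Malicious (OR over X_i)":   lambda x: any(v == 1 for v in x),
}

for name, model in designs.items():
    evading = [x for x in probes if model(x)]              # probes accepted as Legitimate
    print(name,
          "| evasion probability:", len(evading) / 2 ** N, # matches column 3 of Table 1
          "| adversarial certainty:", 1 / len(evading))    # chance an evading probe equals L
```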

An additional design strategy relies on randomness to provide protection. By training models on multiple different subspaces of the data and then randomly choosing one of the models to provide the prediction at any given time, these models aim to mislead attackers by obscuring the feedback from the black box model C. Although this strategy has a higher perceived evasion resistance (Table 1), security through obscurity has been shown to be ineffective when faced with indiscriminate probing based attacks (Vorobeychik and Li, 2014).

3.3. Using Adversarial Margin Density (AMD) to approximate adversarial uncertainty

The use of adversarial certainty in Table 1 was illustrated by using a random probing attack on a binary feature space. Here, the idea of adversarial uncertainty is expanded, and a methodology for its computation and usage is presented, for empirical evaluation.

Adversarial uncertainty is essentially the uncertainty on the adversary's part, due to the non availability of the original training dataset. For the task of classification, this uncertainty refers to the inability of the adversary to successfully reverse engineer the impact of all the important features on the prediction task. We measure this uncertainty using the notion of Adversarial Margin Density (AMD), as defined below. The concept of margin density was first introduced in (Sethi and Kantardzic, 2015, 2017b), where it was used to measure the uncertainty in data samples, as a method to detect drifts from unlabeled data. It was shown that robust classifiers define regions of uncertainty (called margins or

blindspots), where samples fall as a result of partial feature space disagreement. We extend this notion to the domain of adversarial evaluation, by using it to define adversarial uncertainty. An attacker that impacts only a minimal set of features, just sufficient to evade the defender's classifier C, will lead to a large margin density. This is due to the increased disagreement between orthogonal models trained from the original training dataset. Conversely, an attacker with a low margin density is indicative of one who was successful in evading a large set of informative features. To this end, the notion of Adversarial Margin Density (AMD) is defined here, to capture adversarial uncertainty on the set of evaded features.

Definition 3.1. Adversarial Margin Density (AMD): The expected fraction of successful attack samples (attack samples classified by the defender as Legitimate) which fall within the defender's margin (i.e., the region of critical uncertainty), defined by a robust classifier (one that distributes feature weights) on the training data.

The definition of AMD highlights the following two concepts: a) the AMD is computed only on the attack samples which successfully evade the classifier C, and b) we measure only the region of critical uncertainty (margin/blindspots), predefined for a classifier trained on the original training data. The AMD measures the uncertainty by quantifying the attack samples which evade the classifier C, but do not have high certainty about the entire feature space. This is demonstrated in Figure 7, where the adversarial margin is given by the region of space where the defender C is evaded, but the feature X2 is still not successfully reverse engineered by the adversary. This causes the attacks to fall within blindspots of a robust classifier (i.e., one that distributes


Figure 7: Illustration of the adversarial margin. It is given by the region of space which leads to evasion of the defender's classifier C, but does not lead to evasion of all the informative features.

feature weights). For an attack to have a low uncertainty, it will have to avoid falling in the blindspots, which is possible only if an attacker can successfully evade a significant portion of the feature space (given by critical uncertainty) simultaneously.

The idea behind the AMD is presented in Definition 3.1. An implementation methodology which can be used to compute the AMD is presented next. A random subspace ensemble is considered for defining the robust classifier over the training dataset, as this classifier is general in its design and does not make assumptions about the usage of any specific classifier type (Ho, 1998). The AMD is computed using Equation 2.

\[
AMD = \frac{\sum_{x} S_E(x)}{|x|}; \quad \forall x \in X_{Attack} : C(x) \text{ is Legitimate}
\]
where,
\[
S_E(x) =
\begin{cases}
1, & \text{if } \left|\, p_E(y_{Legitimate} \mid x) - p_E(y_{Malicious} \mid x) \,\right| \le \theta_{Margin} \\
0, & \text{otherwise}
\end{cases}
\tag{2}
\]

The AMD is measured only on the attack samples (X_Attack) which successfully evade the defender's classifier C at test time. This set of samples is evaluated to determine if they fall within the region of critical uncertainty given by the parameter θ_Margin. By adjusting this parameter, the sensitivity of measurements and the tolerance for adversarial activity can be tuned. A value of 0.5 is typically considered effective for most scenarios (Sethi and Kantardzic, 2017b). Samples falling in this region of high disagreement (given by parameter θ_Margin) are considered to be cases where there is critical uncertainty between the constituent models of

the ensemble. Since the ensemble is trained as a feature bagged model, this disagreement is due to feature value distributions which differ from the benign training samples, caused by adversarial uncertainty. The ensemble E is taken to be a random subspace ensemble. In the experimentation here, we consider a feature bagged ensemble of 50 base Linear-SVM models, each with 50% of the total features, randomly picked. In Equation 2, pE(y|x) is obtained via majority voting on the constituent models, and represents the disagreement score for each sample x. The AMD is always a ratio between 0 and 1, and a higher value indicates a high adversarial uncertainty. By extension, a higher AMD also indicates an increased ease of unsupervised detection, and subsequent recovery from attacks.
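A possible implementation of Equation 2, following the setup described above, is sketched below: a random subspace ensemble of 50 linear SVMs, each trained on a randomly chosen 50% of the features, with θ_Margin = 0.5. Vote fractions are used in place of calibrated probabilities, consistent with the majority voting description; the helper names are illustrative, not taken from the original work.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_margin_ensemble(X_train, y_train, n_models=50, subspace=0.5, seed=None):
    """Random subspace ensemble used to define the defender's margin (blindspots)."""
    rng = np.random.default_rng(seed)
    n_feat = X_train.shape[1]
    k = max(1, int(subspace * n_feat))
    ensemble = []
    for _ in range(n_models):
        feats = rng.choice(n_feat, size=k, replace=False)
        ensemble.append((feats, LinearSVC().fit(X_train[:, feats], y_train)))
    return ensemble

def adversarial_margin_density(ensemble, defender, X_attack, theta=0.5, legit_label=1):
    """Equation 2: fraction of successfully evading attack samples that fall in the margin."""
    evading = X_attack[defender.predict(X_attack) == legit_label]   # samples that evade C
    if len(evading) == 0:
        return 0.0
    votes = np.stack([m.predict(evading[:, feats]) for feats, m in ensemble])
    p_legit = np.mean(votes == legit_label, axis=0)       # voting-based p_E(y_Legitimate | x)
    in_margin = np.abs(p_legit - (1 - p_legit)) <= theta  # |p_Legit - p_Malicious| <= theta_Margin
    return float(np.mean(in_margin))
```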

3.4. Evaluating dynamic effects of evasion by a sophisticated adversary

To fully understand the capabilities of an adversary, it is necessary to assume that we are facing a strong adversary, who uses all information and tools at its disposal. To this end, we extend the Anchor Points (AP) attack framework of (Sethi et al., 2017), to simulate a high impact sophisticated adversary, which utilizes its probing ability to generate a high confidence attack on the defender's model. While the framework of (Sethi et al., 2017) was developed as a proof of concept, to demonstrate the vulnerability of classifiers to evasion attacks, the proposed attack in this section focuses on the ability of an attacker to not only evade but also cause long term damage, by preventing attacks from being detected or recovered from. Attacks of high adversarial certainty are simulated, which are capable of using probed data to fall outside the margins of the defender. This extension of the AP framework will allow for testing against a sophisticated adversary, and will enable us to better understand the severity of exploratory attacks possible at test time. Since this extension maintains the original black box assumptions about the defender's model, it provides for analyzing the impact of severe exploratory attacks which an adversary could carry out against a classifier.

The proposed extension to the AP framework is depicted in Figure 8. Specifically, the filter strategy is developed as an extension to the Anchor Points attacks (AP) of (Sethi et al., 2017). The anchor points exploration samples obtained are first sent to a filter phase, where the low confidence samples are eliminated, before the exploitation phase starts. This filtering reduces the size of the exploration set, to the samples for which the adversary has high confidence that they do not fall in the defender's margins.


Figure 8: AP based evasion attack framework, with a high confidence filter phase.

This approach is called the Anchor Points - High Confidence (AP-HC) strategy, and it is developed as a wrapper over the AP framework, making it easy to extend to other data driven attack strategies as well. The AP-HC strategy simulates adversaries who are capable of utilizing all the probe-able information, to launch attacks which generate a low Adversarial Margin Density (AMD) on the defender's part.

In the filtering phase of Figure 8, the adversary relies on stitching together information made available by the defender's black box, to launch attacks which evade a large number of features simultaneously. The adversary does so by integrating information learned in the exploration phase, about the impact of different subsets of features on the prediction outcome. It then filters out samples which are good for evasion, but which only result in evading a small subset of features. This results in an attack exploitation which provides high adversarial certainty. The idea is illustrated in Figure 9, on a 2D synthetic dataset. In this example, there are two sets of features, X1 and X2, both of which are informative for the classification task, as X1 > 0.5 or X2 > 0.5 both lead to high quality defender models. Consider the defender's model to be 'X1 > 0.5 ∨ X2 > 0.5', which leads to an adversarial uncertainty of 1/3, since the training data could be in any of the 3 quadrants where samples are perceived as Legitimate. A naive adversary using the AP framework can launch an attack using samples where at least one of the two features X1 or X2 is greater than 0.5. However, an adversary using the high confidence filter attack will combine this orthogonal information and launch attack samples only if both conditions are satisfied (i.e., X1 > 0.5 and X2 > 0.5). In doing so, the adversary learns the orthogonal information about the prediction landscape and aggregates this information to avoid detection by the defender.

The high confidence filtering strategy is illustrated in Figure 9, where the initial explored samples for the AP attacks (b) are filtered using the aforementioned information aggregation technique, to generate a high confidence set of exploration samples (c). The adversary does so by using the exploration samples in b) to train orthogonal models, admitting only those samples which have high consensus (low uncertainty) based on the trained models. The filtered exploration samples are then used in the exploitation phase, and as can be seen in Figure 9, these attacks have an AMD of 0, as none of the attack samples fall inside the margin of the classifier.

In the analysis presented in Table 1, it was shown that complex feature bagged ensemble models have a low adversarial certainty (1/2^(N-1), for an N dimensional binary feature space). This made them more secure in dynamic environments, when compared to the restrictive one class classifier model, where the adversarial certainty was 1. However, similar to a one class classifier, the robust ensemble model also makes a majority of the information available to be probed by a patient adversary, as demonstrated in Figure 9. This information is not directly available (as in the case of a one class classifier, where evasion leads to an adversarial uncertainty of 0), and does not directly lead an adversary to the space of the training data. However, the AP-HC filtering step allows the adversary to stitch together the information made available by such ensemble models, to launch a potent attack which could leave the defender helpless. A generic approach to extend the intuition of Figure 9 to high dimensional spaces is presented in the following section.

3.4.1. The Anchor Points - High Confidence (AP-HC) approach

The AP-HC attack strategy is presented in Algorithm 1. The algorithm receives the exploration samples from the AP framework, and then uses the filtering approach of Figure 9 to clear out samples of low confidence. The adversary does so by training a robust classifier, such as a random subspace ensemble, on the probed exploration samples. The trained model is then used to identify samples of low confidence, as these are the ones which have low consensus among the feature bagged ensemble's models. The resulting set of filtered exploration samples D_Explore-HC, from Algorithm 1, is then used in the exploitation phase of the AP framework, to launch the attacks. This methodology is implemented completely on the adversary's side, while maintaining the same black box assumption about the defender, as presented by the AP framework in (Sethi et al., 2017).


Figure 9: Reducing adversarial uncertainty via filtering, in anchor points attacks. After the exploration phase in b), the adversary trains models on individual feature subspaces (X1 and X2). It then aggregates this information to clean out samples in low confidence areas (C:X1, C:X2). The final set of filtered high confidence samples is shown in d).

Algorithm 1: AP based high confidence filter attacks (AP-HC).

Input: Exploration samples from the AP framework D_Explore, filter threshold θ_Adversary_confidence
Output: High confidence exploration samples D_Explore-HC

1: E ← Train a random subspace ensemble on D_Explore
2: D_Explore-HC ← ∅
3: for sample in D_Explore do
4:     if |p_E(y_Legitimate|sample) − p_E(y_Malicious|sample)| ≥ θ_Adversary_confidence then
5:         D_Explore-HC ← D_Explore-HC ∪ {sample}
6: return D_Explore-HC

As such, it serves as a methodology for a thorough analysis of adversarial capabilities, to better design secure machine learning frameworks. The purpose of the AP-HC attack approach is to highlight the effects of making excessive information available to the adversary, and the resulting space of possible attacks which could be launched by it, using purely data driven methodologies.

The parameter θ_Adversary_confidence controls the filtering operation in Algorithm 1. Since an adversary is interested only in high confidence attack samples, we consider a high confidence threshold of θ_Adversary_confidence=0.8. A higher confidence threshold allows for more stringent filtering of samples. This heuristic filtering approach will be used to demonstrate the capability of an adversary to evade detection, by gaining more certainty about the location of the training data. This approach will be used to highlight innate vulnerabilities in seemingly secure designs, such as the robust feature bagged ensemble strategy, and will be used for a thorough analysis of the impact of classifier design on adversarial capability.
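For illustration, the filtering step of Algorithm 1 can be sketched as follows, under the assumption that the adversary has already collected the exploration samples D_explore and the black box's Legitimate(0)/Malicious(1) feedback y_explore on them; the ensemble construction mirrors the setup of Section 4.1, and all names are illustrative rather than the original implementation.

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC

def ap_hc_filter(D_explore, y_explore, theta_confidence=0.8, n_models=50):
    # Adversary-side random subspace ensemble, trained on the probed
    # exploration samples and the black box's 0/1 feedback on them.
    E = BaggingClassifier(
        estimator=LinearSVC(C=10.0),
        n_estimators=n_models,
        max_features=0.5,
        bootstrap=False,
    )
    E.fit(D_explore, y_explore)

    # Majority-vote consensus of the constituent models on each sample.
    votes_malicious = np.mean(
        [est.predict(D_explore[:, feats]) for est, feats in
         zip(E.estimators_, E.estimators_features_)],
        axis=0,
    )
    p_legitimate = 1.0 - votes_malicious

    # Keep only samples with high consensus (low adversarial uncertainty);
    # the exploitation phase of the AP framework is then run on this set.
    keep = np.abs(p_legitimate - votes_malicious) >= theta_confidence
    return D_explore[keep]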

4. Experimental evaluation and analysis

In this section, the impact of different classifier design strategies on adversarial capabilities is evaluated. Moving beyond accuracy, the effects of attacks are analyzed from a dynamic-adversarial perspective. The idea of adversarial margin density is considered, to account for detect-ability of and retrain-ability after attacks. The adversary is considered to be capable of using machine learning to meet its needs. In Section 4.1, the protocol and setup of the experiments performed in this section is presented. The effect of using a restrictive one class model for the defender is presented in Section 4.2. The impact of using a robust feature bagged ensemble and that of using randomization based models is presented in Section 4.3 and Section 4.4, respectively. Discussion and analysis are presented in Section 4.5.

4.1. Experimental setup and protocol

The basic Anchor Points framework of (Sethi et al., 2017) is considered for generating attacks on the defender's black box model, with the modifications presented in Section 3.4.1. The parameters of the AP attacks are taken without any modifications, to ensure consistent analysis and extension of the attack paradigms. The defender is considered to be a black box by the adversary, with no information about its internal workings available. The only interaction the attacker has with the defender is by means of submitting probe samples and receiving tacit Accept/Reject feedback on them.
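A minimal sketch of this black box interaction is given below; the wrapper class and its names are illustrative, and assume that class 0 denotes the Legitimate class, as in the rest of the paper.

import numpy as np

class BlackBoxDefender:
    # Wraps a trained defender model so that external users (including the
    # simulated adversary) see only Accept/Reject feedback on probes.
    LEGITIMATE = 0   # class 0 is treated as Legitimate throughout

    def __init__(self, trained_model):
        self._model = trained_model   # internals are never exposed to probes

    def query(self, probe):
        label = self._model.predict(np.asarray(probe).reshape(1, -1))[0]
        return "Accept" if label == self.LEGITIMATE else "Reject"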



The adversarial margin density proposed in Section 3.3 is used for evaluating the uncertainty of the attacks. A θ_Margin=0.5 is considered for measuring the AMD (as motivated by the analysis in (Sethi and Kantardzic, 2017b)). A random subspace model with 50 Linear SVMs (regularization constant c=1, and 50% of the features in each model) is taken for the AMD computation. Also, the effects of filtering by an adversary are evaluated in this section, with θ_Adversary_confidence=0.8. A random subspace model with 50 Linear SVMs (regularization constant c=10, and 50% of the features in each model) is chosen for the filtering task. A high regularization constant ensures that the models do not overfit to the exploration samples, as the adversary's goal is not to fit the model to the explored samples, but to learn from them about the region of high confidence. This also makes it more robust to black box feedback noise and stray probes.

A description of the datasets used for evaluation is presented in Table 2. The synthetic dataset is a 10 dimensional dataset with two classes. The Legitimate class is normally distributed with µ=0.75 and σ=0.05, and the Malicious class is centered at µ=0.25 with σ=0.05, across all 10 dimensions. This dataset provides for controlled experimentation and analysis, by exhibiting 10 informative features. The CAPTCHA dataset is taken from (DSouza, 2014), and it represents the task of classifying mouse movement data of humans and bots, for the task of behavioral authentication. The phishing dataset is taken from (Lichman, 2013), and it represents characteristics of malicious and benign web pages. The digits dataset (Lichman, 2013) was taken to represent a standard classification task. The multidimensional dataset was reduced to a binary class problem, with the classes 1 and 7 taken for the Digits17 dataset and the classes 0 and 8 taken for the Digits08 dataset, respectively. In all datasets, the class 0 was considered to represent the Legitimate class and 1 was taken as the Malicious class. For Digits17, the class 7 is considered to be Legitimate, and for Digits08, class 0 is considered Legitimate. The data was normalized to the range of [0,1] using min-max normalization, and the features were reduced to a numeric/binary type. The records were shuffled, to eliminate stray effects of concept drift within them. All experiments are repeated 30 times and average values are reported. The experiments are performed using Python and the scikit-learn machine learning library (Pedregosa et al., 2011).
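A minimal sketch of this preprocessing and repetition protocol is shown below; run_experiment is a placeholder for a single attack/defense evaluation and is not part of the original code.

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import shuffle

def prepare(X, y, seed):
    # Min-max normalize features to [0, 1] and shuffle the records to
    # remove any ordering (concept drift) effects, as described above.
    X = MinMaxScaler().fit_transform(X)
    return shuffle(X, y, random_state=seed)

def average_over_runs(X, y, run_experiment, n_runs=30):
    # run_experiment(X, y, seed) is a placeholder returning the metrics of
    # interest (e.g., EAR and AMD) for one attack/defense evaluation.
    results = [run_experiment(*prepare(X, y, seed), seed) for seed in range(n_runs)]
    return np.mean(results, axis=0)   # averages over the 30 repetitions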

Table 2: Description of datasets used for experimentation.

Dataset      #Instances   #Features
Synthetic    500          10
CAPTCHA      1886         26
Phishing     11055        46
Digits08     1499         16
Digits17     1557         16

4.2. Analyzing effects of using a restrictive one-class classifier for the defender's model

In adversarial domains, restricting the space of samples classified as Legitimate is considered an effective defense strategy (Onoda and Kiuchi, 2012; Biggio et al., 2015). By tightly restricting which data points qualify as legitimate, the ability of random probing based attacks is significantly reduced. This is a consequence of the reduced probe-able feature space area, as shown in Figure 10, where a one class classifier is trained on the set of legitimate training data points. The feedback obtained from the defender's model, by probing the feature space, is shown. The significantly smaller area of the blue feedback (Figure 10) is what makes one class classifiers harder to probe and reverse engineer. However, the one class classifier has limitations which make it unsuitable for an adversarial domain. Firstly, a one class classifier sacrifices generalization ability, and it is not always possible to train an accurate model over the available training data (Biggio et al., 2015). Secondly, as seen in our analysis in Table 1, a one class classifier could cause high adversarial certainty, making attack detection and recovery difficult. Most works on the security of classifiers advocate the inclusion of all orthogonal information, to make the system more restrictive and therefore more secure (Onoda and Kiuchi, 2012; Biggio et al., 2015; Papernot et al., 2016c; Rndic and Laskov, 2014; Wang, 2015; Biggio et al., 2010a). However, these works approach the security of the system from a static perspective. Evaluating a one class classifier provides valuable insights into the inefficacy of using restrictive models in a dynamic adversarial environment. While recent works present various novel ideas for integrating feature information (Biggio et al., 2015; Wozniak et al., 2014), a one-class classifier serves as an ideal representative approach, in which all information across all features is used in securing the system.

In a dynamic environment, it is necessary to maintain adversarial uncertainty, so as to ensure that attacks can be recovered from. A one-class classifier, being overly restrictive, leads an attacker directly to the space of the legitimate training data.


Figure 10: Illustration of the prediction landscape of a one-class classifier. The smaller area of the legitimate samples indicates resilience against probing based attacks.

Although this classifier design makes the adversary expend greater effort to evade the system, once it is evaded, the attacks are indistinguishable from the benign traffic entering the system. This is illustrated for a 2D synthetic dataset in Figure 11. Here, the anchor points attack is used to generate attacks on two different classifier designs: a) a one class classifier on the legitimate training data (SVM with parameters: ν=0.1, RBF kernel and γ=0.1), and b) a two class linear classifier model (Linear SVM with L2-regularization, c=1). The attack used 20 samples for exploration (B_Explore) and generated 40 attack samples in each case (Sethi and Kantardzic, 2017a). It is seen in Figure 11 that the attack leads to a large number of samples occupying the same region as that of the legitimate samples, in the case of the restrictive one class defender model (Figure 11a)). This will cause problems in a dynamic environment, as retraining to thwart attacks is close to impossible in this case. Also, detecting such attacks is difficult, due to the increased similarity with the benign input. In the case of the two class model (Figure 11b)), a larger data space is perceived as legitimate, due to the generalization provided by these models, leading to attacks which are farther from the training data space. This illustrates the ability of classifier model designs to influence the severity of attacks on the system, and to cause higher adversarial uncertainty.

Experimental evaluation on the synthetic 2D dataset, and two additional datasets from Table 2, is presented in Table 3. Here, the metric of Effective Attack Rate (EAR) is taken to measure the vulnerability of the defender's model to anchor points attacks. EAR measures the percentage of attack samples which are incorrectly classified by the defender's classifier model (Sethi et al., 2017). Additionally, the following two metrics are introduced, to measure the effectiveness of attacks in a dynamic environment - Data Leakage and the Adversarial Margin Density (AMD). Adversarial margin density was introduced in Section 3.3, as a measure for approximating adversarial uncertainty over the defender's training data.

(a) Use of a one class classifier by the defender causes a reduced attack rate, but leads to increased training data corruption and leakage.

(b) Two class classifiers ensure less data leakage, but make evasion easier.

Figure 11: Illustration of AP attacks on a synthetic 2D dataset, using a restrictive one class classifier and a generalized two class classifier for the defender's model.

Data leakage is introduced here as an ad-hoc metric for evaluating the loss of private data, in a one class classification setting. Data leakage is measured by developing a one class classifier on the space of legitimate training samples, and then measuring the number of attack samples which are incorrectly classified by this classifier. The data leak is used to measure the proximity of the attack samples to the original space of legitimate training samples, with a large value indicating the ability of the adversary to closely mimic the legitimate samples. A one class SVM model (parameters: ν=0.1, RBF kernel, γ=0.1) is used to measure the data leakage metric. The metrics were computed for the cases of the one class and the two class defender models, as shown in Table 3.
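As an illustrative sketch, the data leakage metric can be computed as the fraction of evading attack samples accepted by a one class SVM trained on the legitimate samples; the function and argument names are illustrative.

import numpy as np
from sklearn.svm import OneClassSVM

def data_leakage(X_legitimate_train, X_attack_evading, nu=0.1, gamma=0.1):
    # One-class envelope of the legitimate training samples, with the same
    # parameters as the one class model described above.
    envelope = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    envelope.fit(X_legitimate_train)
    # OneClassSVM returns +1 for inliers; an evading attack sample that is
    # an inlier mimics the legitimate data closely enough to count as leaked.
    inside = envelope.predict(X_attack_evading) == 1
    return float(np.mean(inside))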

The Effective Attack Rate (EAR) is significantly lower for the one class classifier (∆=0.46 on average, from Table 3). This is a result of the stricter criterion for inclusion into the legitimate space, as imposed by the complex and restrictive classifier boundary. However, this comes at the cost of an increased possibility of data leakage and lower adversarial uncertainty. From Table 3, it can be seen that the data leak increases sharply for the one class classifier (∆=0.49, on average), as the restrictive nature of the classifier leads the adversary to the training data samples.


Table 3: Results of experimentation on one-class and two-class defender models. Training Accuracy, Effective Attack Rate (EAR), Data Leakage (DL) and Adversarial Margin Density (AMD) are presented for comparison.

Dataset                   Defender's Model        Training Accuracy   EAR    Data Leak   AMD
Synthetic (2 features)    Restrictive one class   97.4                0.56   0.67        0
                          Robust two class        100                 0.89   0           0.85
Synthetic (10 features)   Restrictive one class   97.6                0.04   0.67        0
                          Robust two class        100                 0.99   0.1         0.28
CAPTCHA                   Restrictive one class   96.4                0.88   0.23        0
                          Robust two class        100                 0.99   0.01        0.14

This causes problems with retraining, loss of privacy, loss of clean training data, and issues with unsupervised attack detection. The data leak metric provides a heuristic way to measure the severity of data nullification attacks. However, the data leak metric is difficult to compute for high dimensional datasets, due to the inability to train an accurate one class classifier, which is needed to measure the data leakage. For this reason, the other datasets of Table 2 are not used for the analysis in this section. Also, these datasets provided low training accuracy when using a one class classifier, making it unsuitable as a choice for the defender's model. Going forward, the adversarial margin density metric will be used to indicate the strategic advantage of the defender over the adversary, in detecting attacks and relearning from them.

The adversarial margin density (AMD) is more general in its applicability, when compared to the data leak measure, and provides a way to indirectly measure adversarial uncertainty and the severity of attacks in dynamic environments. From Table 3, it is seen that the AMD is 0 for all 3 datasets, when using a one-class defender model. This is due to the increased confidence in the location of the training data, once the restrictive model is evaded. Although the one class classifier is secure, in that it ensures a low EAR, it leaves the defender helpless once an attack starts. As such, designing for a dynamic environment requires forethought on the defender's part. The experimentation in this section was presented to illustrate the ill effects of using an overly restrictive model for the defender, and the need to reevaluate the notion of security which relies on generating complex learning models.

4.3. Analyzing effects of using a robust feature-bagged ensemble for the defender's model

Robust models, which advocate complex models involving a large number of informative features, are considered to be effective in safeguarding against targeted exploratory attacks (Papernot et al., 2016c; Rndic and Laskov, 2014; Wang, 2015; Biggio et al., 2010a,b; Wozniak et al., 2014; Stevens and Lowd, 2013). In these attacks, the adversary starts with a predefined set of attack samples, and intends to minimally modify them so as to evade the defender. Robust models require that a majority of the feature values be mimicked by the adversary, increasing the cost and effort needed to carry out attacks. However, in the case of indiscriminate exploratory attacks (Sethi et al., 2017), this same line of reasoning is not valid, as an attacker is interested in any sample that causes evasion. In these attacks, the ease of evasion is given by the effective space of prediction which is recognized as Legitimate by the defender. This is because an adversary launching indiscriminate attacks does so by probing the black box model to find samples which would be classified as legitimate. A robust model, such as a Linear SVM with L2-regularization, incorporates information conveyed by a majority of the features from the training dataset into the learned model. It does so to make the model robust to stray changes, and also to generate wide generalization margins, for better test time predictive performance. In an adversarial environment, the effective area of legitimate samples conveyed by a robust model is similar to that of a simple model (which aims at reducing the number of features in the model), as shown in Figure 12. In Figure 12, a non robust linear SVM (with L1-regularization) is considered alongside a robust linear SVM (with L2-regularization). The effective area of legitimate samples (blue) is the same in both cases. This makes the two strategies equivalent in terms of securing against indiscriminate probing based evasion attacks.

The equivalence of robust and non robust models, against indiscriminate probing attacks, is further analyzed by evaluating on 5 datasets in Table 4. The Anchor Points attacks (AP) are performed on two sets of classifiers: a) a robust classifier - a random subspace ensemble with 50 linear SVMs (L1-regularized, each with 50% of the features randomly selected), and b) a non robust classifier - a single linear SVM (with L1-regularization).


Figure 12: Illustration of the prediction landscape using simple and robust models, on 2D synthetic data. Left - Initial training data of the defender. Center - L1-regularized linear SVM model for the defender (non robust). Right - L2-regularized linear SVM model for the defender (robust).

The effect of these strategies on the adversarial outcome is presented in Table 4. The Effective Attack Rate (EAR) is seen to be similar for the two cases (∆=0.01, on average). This demonstrates the equivalence in effects of the two strategies, when it comes to securing against probing based attacks.

Based on the analysis on a binary feature space in Table 1, it was deduced that simple models behave similarly to robust models when attacks are of an indiscriminate nature, as both result in the same adversarial certainty of 1/2^(N-1). From the same analysis, it was deduced that robust models are better than one-class classifiers at maintaining adversarial uncertainty. The intuition behind this was that the increased generalization of robust models will create uncertainty regarding the exact location of the training data, and the impact of various feature subspaces on the prediction task. However, this intuition relied on the assumption of a naive adversary, whose primary aim is only to evade the system. The proposed high confidence filtering attack strategy (AP-HC) of Section 3.4 provides a way to simulate a sophisticated adversary, who is capable of utilizing all the probe-able information to launch attacks of high certainty. The result of using this attack strategy on the robust classifier model is presented in Table 4. It is seen that the AMD for the robust classifier reduces significantly when faced with the high confidence attack. In the case of the CAPTCHA dataset and the Synthetic dataset, the adversarial activity will go totally unnoticed, while in the case of the other datasets, a significant drop in the AMD is observed (∆=0.38, on average over all datasets). An important observation comes from the fact that the filtering operation was performed completely on the adversary's side, without any help or information from the defender's model. The adversary starts off with the goal of admitting only the most confident exploration samples, and in doing so, it makes it difficult for the defender to detect or stop it.

Table 4: Results of Anchor Points (AP) attacks and Anchor Points - High Confidence (AP-HC) attacks, on the Effective Attack Rate (EAR) and the Adversarial Margin Density (AMD).

Dataset     Training Accuracy   EAR (AP)   EAR (AP-HC)   AMD (AP)   AMD (AP-HC)
Synthetic   100                 0.98       0.999         0.28       0.01
CAPTCHA     100                 0.996      0.999         0.19       0.002
Phishing    93.1                0.978      0.999         0.62       0.19
Digits08    97.1                0.928      0.999         0.73       0.22
Digits17    99.5                0.965      0.999         0.66       0.16

The effectiveness of the high confidence attacks is due to the availability of all the information to be probed by a sophisticated adversary. The majority voting scheme of the robust ensemble is reverse engineered by stitching together orthogonal subsets of feature information, to generate attacks which closely mimic the legitimate samples. As such, the robust ensemble methodologies behave similarly to one-class classifiers, by being vulnerable to low adversarial uncertainty. Both design approaches rely on incorporating the maximum extracted information from the training data, thereby conveying excessive information to an adversary and equipping it with a thorough understanding of the impact of the different features on the classification task.

4.4. Analyzing effects of using randomization in the defender's model

Robustness via complex learning methodologies aims at increasing the adversary's effort, by making it reverse engineer a large set of features, as seen in the case of one-class classifiers and the feature bagged ensemble. The other advocated approach to dealing with adversarial activity relies on obfuscation via randomization. By randomizing the feedback presented to the attacker, these methodologies aim to mislead adversarial learning from probes, leading to less effective attacks. Randomness can be introduced into multiple classifier systems, particularly the ones using feature bagging. This is done by training multiple models on subsets of features and then using any one of these models to provide the prediction at any given time (Wozniak et al., 2014; Biggio et al., 2010a; Colbaugh and Glass, 2012b). The idea relies on confusing the adversary by constantly presenting it with a moving target. In this section, the impact of randomization is analyzed, when used with a feature bagged ensemble, against indiscriminate evasion attacks.

Randomization is introduced into the defender's model by extending the random subspace ensemble of Section 4.3, to generate feedback based on the posterior probability for a given sample X.


Figure 13: Prediction landscape for the randomized feature bagging ensemble. Blindspots are perceived to be obscured, while high confidence spaces remain consistent across repeated probing.

The use of a random subspace ensemble allows the randomness to be caused by disagreement between orthogonal feature information. Given a sample X, the defender computes the confidence on it, based on majority voting of its component models. This confidence is then used as a probability of prediction, to generate the class label for X. As an example, consider a sample X for which the classifier C predicts with 0.8 probability to be in the Legitimate class. A standard threshold of 0.5, based on majority voting, will cause the feedback on X to be of class Legitimate. To introduce randomness, the defender's model will instead sample a number in the range of [0,1] and return the feedback Legitimate only if the random number is >0.8. As such, randomness is introduced into the classification scheme, while still maintaining the essential predictive properties of the classifier. Here, the adversary may not be aware of the internal scoring or the randomness of the black box model, as it still only experiences the defender as a black box model providing Legitimate/Malicious feedback on the submitted probe X. The perceived space of the adversary is shown in Figure 13. The regions of uncertainty are heavily influenced by randomness, due to the disagreement between models trained on the two features.
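A minimal sketch of this randomized feedback mechanism is given below, under the reading that the Legitimate label is returned with a probability tied to the ensemble's voting confidence; ensemble_confidence is an assumed helper returning p(Legitimate|x) by majority voting, and is not part of the original framework.

import numpy as np

rng = np.random.default_rng()

def randomized_feedback(x, ensemble_confidence):
    # ensemble_confidence(x) is assumed to return p(Legitimate | x), i.e.,
    # the fraction of constituent models voting Legitimate for the probe x.
    p_legitimate = ensemble_confidence(x)
    # Return Legitimate with probability tied to the voting confidence, so
    # that high confidence regions stay consistent across repeated probes
    # while blindspot regions produce noisy, inconsistent feedback.
    return "Legitimate" if rng.random() < p_legitimate else "Malicious"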

The effects of randomness are demonstrated on a synthetic 2D dataset in Figure 14a), where the anchor points (AP) attack is used. The misleading feedback from the defender causes the exploration phase to be corrupted, due to the naive assumption on the adversary's part about the veracity of the defender's feedback. This is presented in Figure 15 for the 5 datasets. An average drop of 22.9% in the Effective Attack Rate (EAR) is seen.

The analysis of a naive adversary assumes that the defender always returns the correct feedback on the submitted probes.

(a) Naive adversary

(b) Randomization aware adversary

Figure 14: Anchor Points attacks against a randomized defender. Top - Naive adversary disregards randomness. Bottom - Adversary with confidence filtering; repeated probing of exploration samples is used to weed out samples with inconsistent feedback.

This was seen to result in the attacker being misled, as seen in the EAR of the naive attacker in Figure 14b). However, an adversary can become aware of the randomness, by submitting the same probe multiple times and observing different responses to it. Such an adversary can account for randomness in designing its attacks. A simple heuristic strategy is simulated here, to understand the behavior of such an adversary. In the exploration phase, the adversary makes repeated submissions of every exploration point, to understand the defender's confidence in the sample. A sample which is closer to the training data will generate feedback with more consistency, while samples falling in blindspots will be more random in their feedback (Figure 13). Using this intuition, the anchor points (AP) attack strategy is modified, to account for smart adversaries capable of dealing with randomness. In the exploration phase, the adversary submits each sample N_Retries (taken as 5 in the experiments here) times, and accepts the probe as belonging to the Legitimate class only if it returns the same feedback for all N_Retries times. All other samples are assumed to belong to the Malicious class. By cleaning out samples of low confidence, the blindspot exploration samples are removed, making the exploitation phase more potent, as demonstrated in Figure 14b).
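A minimal sketch of this repeated probing heuristic is given below; query_defender is a placeholder for the black box probe interface and is not part of the original framework.

def filter_by_consistency(exploration_samples, query_defender, n_retries=5):
    # query_defender(x) is a placeholder for the black box probe, returning
    # 'Legitimate' or 'Malicious' feedback (possibly randomized).
    consistent = []
    for x in exploration_samples:
        feedback = [query_defender(x) for _ in range(n_retries)]
        # Keep only samples answered Legitimate on every repeated probe;
        # samples falling in blindspots give inconsistent feedback and are
        # treated as Malicious by the randomization-aware adversary.
        if all(f == "Legitimate" for f in feedback):
            consistent.append(x)
    return consistent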


Figure 15: Comparison of Effective Attack Rate (EAR) for non robust models, robust models, randomization based models, and randomization models with adversaries capable of filtering low confidence samples.

Applying the filtering step allows for simulating an adversary capable of dealing with randomness. The results for such an adversary are shown in Figure 15, where an increase of 16.7% in the EAR is seen for such attackers. After this filtering, the attacks are similar to those on a robust model, with only a 5.1% difference on average. This demonstrates the ineffectiveness of the randomization approaches in providing security against probing based attacks. There is an increased onus on the adversary to use more probes to validate the exploration samples, but if a possible adversary makes this additional investment, incentivized by a high EAR, the randomization approach fails.

4.5. Why informative models are ill suited for adversarial environments?

The evaluation of classifier design from the perspective of adversarial uncertainty provides new insights into their vulnerabilities, when applied in a dynamic adversarial environment. The model design strategies of one-class classifiers, robust feature bagging models and randomized feature bagging models are all susceptible to attacks which can leave detection and retraining intractable. All these methodologies try to include the maximum information from the training data into the learned model, in order to maintain an information advantage over the adversary. This is seen to be the general trend in the adversarial machine learning research community (Papernot et al., 2016c; Biggio et al., 2010a, 2014a), where including more information in the models is believed to make them more secure. However, incorporating more information into the learned model means that the adversary will be able to reverse engineer and learn more of this information from the deployed model, via probing. In the case of a one-class classifier, which advocates using maximal information to develop a tight boundary around the training data, it was seen that an attacker is led straight to the space of the legitimate data (Section 4.2). In the cases of Robustness and Randomization, the available information is made harder to mimic/extract. In robustness, more features need to be mimicked to gain access, while randomization aims to mislead the adversary's learning. It is seen that both fail against a determined adversary, who given time/resources can learn and leverage all the available information.

Robust models are also seen to fail at detection and recovery, when faced with an adversary who is capable of generating high confidence attacks. This also stems from the increased availability of probe-able information, to be used by an adversary. These defense strategies rely on unrealistic assumptions about the adversary, to ensure safety. It is generally assumed that if an attack is too expensive (i.e., requires many probes or the modification of many features), the adversary will give up. While this assumption is the basis of security against targeted evasion attacks, it does not hold in the case of indiscriminate exploratory attacks. This section aimed to highlight some of the issues with incorporating excessive information into the defender's model. In an adversarial domain, the attack is a function of the deployed model. A highly informative model will lead to a highly confident attack. It is necessary to reevaluate this central idea of overly informative defender models, and to analyze the effects of creating an information gap between the defender and the attacker, for improving security.

5. Ensuring long term dynamic security by hiding feature importance

The approaches of Complex learning and Randomization rely on including the maximum information from the training dataset in the learned model. As such, they are vulnerable to information leakage via adversarial probing and reverse engineering. The experimentation in Section 4 demonstrated that a classifier which exposes excessive information about feature importance is susceptible to high confidence attacks. The increased confidence of attacks leads to issues with adversarial detect-ability, data leakage and retrain-ability. As such, the impact and ability of hiding the importance of some of the features, to ensure that the model is shielded from total reverse engineering, is evaluated here. This approach presents a counter-intuitive idea, as it advocates reducing the amount of information incorporated into the deployed model.


5.1. Motivation for feature importance hiding

The notion behind feature importance hiding relies on eliminating a few of the important features from the classification training phase, so as to intentionally misrepresent their importance to an external adversary. By eliminating these features from the classification model, the adversary is made to believe that they are not important for the prediction task. No amount of probing will help the adversary to ascertain the importance of the hidden features, as their importance and informativeness is shielded from the classification task. Although this does not provide any direct benefit against the onset of evasion attacks, it helps in maintaining adversarial uncertainty. An attacker will not be able to completely reverse engineer the importance of all features, no matter the resources/time used, as they are not available in the black box model.

The idea of feature importance hiding is illustrated in Figure 16. The prediction landscapes of the following classifiers are depicted: a) a restrictive classifier aggregating all feature information (X1 ∧ X2), b) a randomization based classifier which picks its output based on either feature's prediction (X1 ∨ X2), and c) the hidden feature classifier, where feature X2's influence is hidden. The choice among these designs affects the adversarial uncertainty at test time. Based on the experimentation in Section 4.2 and Section 4.4, we see that restrictive models (a)) and randomization models (b)) are ineffective, as an intelligent adversary can reverse engineer them to generate high confidence attacks. The robust model of Figure 16 a) reduces the effective space of legitimate samples, but in doing so it leaks information about the importance of features X1 and X2. The randomized model of Figure 16 b) aims to mislead the adversarial probing, but is vulnerable to repeated probing attacks, as demonstrated in Section 4.4. In the case of the hidden feature importance design of Figure 16 c), an adversary sees a misrepresented view of the prediction landscape. The adversary is led to believe that only feature X1 is important to the defender's model. Since C2 is kept hidden, the attacker has information about only half the feature space, and no amount of probing will help it understand the importance of feature X2 to the prediction problem. Attacks generated at test time will fall in the blue region to be effective, and when they do so, they have a 50% chance of falling in the blindspot B2. B2 serves as a honeypot in the learned model, which helps capture attacks. An important distinction between the robustness and randomization approaches, in comparison to the hidden classifier approach, is that in the former case the defender expects the adversary to be dissuaded by the increased cost of evasion (in terms of probing budget), while in the latter case, the defender makes it unfeasible for an attacker to probe and obtain the information necessary to generate a high confidence attack.

Figure 16: Prediction landscapes as perceived by probing on the defender models. a): Defender model is given as C1 ∧ C2. b): Defender model given by randomly selecting C1 ∨ C2, to perform prediction. c): Defender model is given by C1 (trained on feature X1), while C2 is kept hidden to detect adversarial activity. Blindspot (Margin) B2 denotes the region of adversarial uncertainty.

It should be noted that the idea of hiding feature importance is not in direct violation of Kerckhoffs's security principle (Kerckhoffs, 1883; Mrdovic and Perunicic, 2008), which states that security should not rely solely on the unavailability of information on the part of the adversary. This is because the defender is not using hidden features, but only misleading the adversary into believing that a few of the features are not important to the prediction task. As in Figure 16, the attacker is aware of features X1 and X2, but infers that only X1 is important to the classification task. All features are included in the classification model, but by intentionally hiding the importance of a subset of the features, the adversary is misled into generating attacks of partial confidence only (i.e., attacks which only partially mimic the legitimate training data).

5.2. Evaluation of effects of feature importance hiding

In order to demonstrate the effect of hiding feature importance, experimentation is presented in Table 5 on a classifier model with 50% of the features intentionally eliminated from the classification process. A random subspace model is chosen as the defender's model (50 Linear SVMs, 50% of the features per model). A random subset of 50% of the features is considered to be eliminated from the classification process. This is done by eliminating the features from the training dataset, and then training the defender's model on the reduced set of features. Any incoming probe is first truncated to the reduced feature set and then evaluated on the model. This makes the feature reduction strategy opaque to external users. The results of the Anchor Points - High Confidence (AP-HC) attacks on this classifier are presented in Table 5. The Effective Attack Rate (EAR) and the Adversarial Margin Density (AMD) are evaluated on this hidden feature classifier design, and also on a baseline classifier which uses all features in its model.
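A minimal sketch of such a hidden feature defender is given below, assuming scikit-learn's BaggingClassifier for the random subspace model and a random half of the features being hidden; the wrapper class and its names are illustrative, not the original implementation.

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import LinearSVC

class HiddenFeatureDefender:
    # Trains the model on a random subset of the features and silently
    # truncates every incoming probe to the same subset, so the hidden
    # features' importance cannot be probed or reverse engineered.
    def __init__(self, hide_fraction=0.5, random_state=None):
        self.hide_fraction = hide_fraction
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        n_features = X.shape[1]
        n_visible = int(round(n_features * (1.0 - self.hide_fraction)))
        self.visible_ = rng.choice(n_features, size=n_visible, replace=False)
        self.hidden_ = np.setdiff1d(np.arange(n_features), self.visible_)
        self.model_ = BaggingClassifier(
            estimator=LinearSVC(C=1.0), n_estimators=50,
            max_features=0.5, bootstrap=False,
        ).fit(X[:, self.visible_], y)
        return self

    def predict(self, X):
        # Probes may contain all features, but only the visible subset
        # influences the prediction; the truncation is opaque externally.
        return self.model_.predict(X[:, self.visible_])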


Table 5: Training Accuracy, Effective Attack Rate (EAR) and Adversarial Margin Density (AMD) under AP-HC attacks, for a defender using all features and one which uses only half of the features.

            Training Accuracy              EAR                            AMD
Dataset     All Feat.    Hidden Feat.      All Feat.    Hidden Feat.      All Feat.    Hidden Feat.
Synthetic   100          100               1            0.99              0.01         0.172
CAPTCHA     100          100               0.99         0.99              0.002        0.075
Phishing    93.1         89.4              0.99         0.99              0.19         0.562
Digits08    97.1         95.1              0.99         0.99              0.22         0.648
Digits17    99.5         94.9              0.99         0.99              0.16         0.517

Table 6: Effective Attack Rate (EAR) of models trained on the Available and on the Hidden features, after an AP-HC attack.

Dataset     EAR (Available Features)   EAR (Hidden Features)
Synthetic   0.99                       0.67
CAPTCHA     0.99                       0.65
Phishing    0.99                       0.47
Digits08    0.99                       0.32
Digits17    0.99                       0.32


It can be seen that the training accuracy is only marginally affected (<5% difference at most) by the reduction of features from the models. This is a result of the presence of orthogonal information in the features of the training data. In the case of a robust ensemble design, which incorporates all the information in the model, the AMD is seen to drop, as the attacks leverage the orthogonal information from the probes to avoid regions of low certainty. However, for the hidden features approach, a significantly higher AMD value is seen in all cases (∆=0.28, on average). This is because no amount of probing and cleansing will help the adversary in determining that the hidden features are important to the classification, and what exact values of these features it needs to mimic. The EAR of all attacks is comparable to that of the robust classifier (Table 5). However, the increased AMD ensures that the defender has the upper hand in the attack-defense cycle, as it can detect attacks and keep the training data clean for future retraining. This makes the hiding of feature importance an effective strategy for providing reactive security against adversarial drift. The notion of hiding feature importance suggests that not all information available at the training phase should be used at the same time, to ensure leverage over the adversary. A high AMD ensures that attacks can be detected and recovered from. The EAR of the two models, a) trained on the available set of features and b) trained on the hidden features, is shown in Table 6. It is seen that, while the model trained on the set of available features is ultimately evaded by an adversary, the hidden features provide a relatively unattacked set of information, which can be leveraged for retraining and continued usage in a dynamic environment.

6. Towards a Dynamic-Adversarial mining approach

The analysis in this paper takes initial steps towards understanding the security of machine learning holistically, combining proactive and reactive security measures. The long term effects of training time design decisions were analyzed. It was shown that restrictive classifier designs can be damaging for long term security, as they lead to data leakage and inseparability after attacks. Based on the analysis, a counter-intuitive strategy was proposed, which advocates shielding of feature importance information.

While existing works on Proactive and on Reactive security have largely been carried out in isolation from each other, there is a need for a combined dynamic view of the problem. Proactive steps can be taken in model development, to make subsequent detection and retraining easier. Similarly, honeypots and randomization can be used effectively to delay the onset of attacks. The study of adversarial drift, as a specific case of the larger field of concept drift, is important to understand how adversarial behavior is affected by the design decisions of the learned model. As adversarial drift is dependent on the defender's model (which the attacker is trying to reverse engineer and evade), it could be possible to train models to mislead the adversary, and not just to improve generalization.

A holistic view of security recognizes that it is a cyclic process, where delaying attacks, detecting them, and then recovering from them are all equally important to meeting the system's security goals.


A dynamic security paradigm would take proactive measures to delay attacks, and to make their subsequent detection and fixing easier as well. As such, metrics like attack delay, difficulty to reverse engineer, detect-ability of classifier vulnerabilities, and recover-ability after attacks are more useful for dynamic security than the traditional metrics of accuracy, precision and recall. Also, dynamic security needs to operate as a never-ending learning scheme, with efficient involvement of the human in the loop, to provide expertise and retraining ability from time to time.

The interdisciplinary field of Dynamic-Adversarial mining (Figure 17) needs to incorporate lessons learned from Machine Learning, Cybersecurity and Stream Data Mining, to rethink the use of machine learning in security applications and to develop systems which are 'secure by design' (Biggio et al., 2014a). Some core ideas under the dynamic adversarial mining scheme are:

• Ability to leave feature space honeypots in the learned classifier at training time, to efficiently detect attacks using unlabeled data at test time.

• Semi automated, self aware algorithms, which can detect abnormal data distribution shifts at test time.

• Maintaining multiple backup models, which can provide predictions when the main model gets attacked.

• Ability to use a stream labeling budget effectively and to distribute the budget appropriately, to detect and fix attacks, thereby managing human involvement in the process.

• Understanding attack vulnerabilities of classifiers from a purely data driven perspective, to effectively test the security measures employed.

This work takes steps towards an integrated approach to machine learning security, under the canopy of Dynamic-Adversarial mining. Our work aims to encourage further awareness of, and interest in, the need for a dynamic security paradigm, by bringing together ideas from the domains of Streaming data and Cybersecurity.

7. Conclusion and future work

Although the naive application of machine learning has found early success in many domains, its abilities and vulnerabilities are still not well understood. To secure machine learning, and to make it usable in a long term dynamic environment, it is necessary to adopt a holistic view in its analysis.

Figure 17: The field of Dynamic Adversarial Mining will derive from the concepts of Machine learning, Stream data mining and Cybersecurity.

In this paper, the shortcomings of popular classifier design strategies are highlighted, as they tend to approach security from a static perspective. Approaches such as restrictive one class classifiers, robust feature bagged ensembles and randomization based ensembles are all found to leak excessive information to the adversary. This excessive information can be leveraged by an intelligent adversary, who can launch attacks such that future detection of and recovery from attacks is made difficult. Such vulnerabilities are introduced into the machine learning model due to the widely accepted philosophy of incorporating excessive information into the defender's model, in the hope of making it more secure. However, a dynamic analysis of this philosophy shows that it is not applicable as a long term solution.

Based on the analysis of the static solutions, a novel Dynamic-Adversarial aware solution was proposed. This methodology intelligently shields a subset of the features from being probed and leaked to the adversary, by hiding their impact on the classification task. This results in adversaries with a misguided sense of confidence, and also enables better attack detection and recovery. The proposed methodology serves as a blueprint for further research in the area of Dynamic-Adversarial mining, where a holistic approach to security will lead to long term benefits for the defender. Future work will concentrate on developing secure methods for retraining classifiers with the newly labeled data. Also, the development of an adversarial aware concept drift handling methodology is an area warranting further research.


References


Abramson, M., 2015. Toward adversarial online learning and the science of deceptive machines. In: 2015 AAAI Fall Symposium Series.

Alabdulmohsin, I. M., Gao, X., Zhang, X., 2014. Adding robustness to support vector machines against adversarial reverse engineering. In: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, pp. 231–240.

Barreno, M., Nelson, B., Joseph, A. D., Tygar, J., 2010. The security of machine learning. Machine Learning 81 (2), 121–148.

Barreno, M., Nelson, B., Sears, R., Joseph, A. D., Tygar, J. D., 2006. Can machine learning be secure? In: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security. ACM, pp. 16–25.

Barth, A., Rubinstein, B. I., Sundararajan, M., Mitchell, J. C., Song, D., Bartlett, P. L., 2012. A learning-based approach to reactive security. IEEE Transactions on Dependable and Secure Computing 9 (4), 482–493.

Biggio, B., Corona, I., He, Z.-M., Chan, P. P., Giacinto, G., Yeung, D. S., Roli, F., 2015. One-and-a-half-class multiple classifier systems for secure learning against evasion attacks at test time. In: Multiple Classifier Systems. Springer, pp. 168–180.

Biggio, B., Corona, I., Maiorca, D., Nelson, B., Srndic, N., Laskov, P., Giacinto, G., Roli, F., 2013. Evasion attacks against machine learning at test time. In: Machine Learning and Knowledge Discovery in Databases. Springer, pp. 387–402.

Biggio, B., Fumera, G., Roli, F., 2008. Adversarial pattern classification using multiple classifiers and randomisation. In: Structural, Syntactic, and Statistical Pattern Recognition. Springer, pp. 500–509.

Biggio, B., Fumera, G., Roli, F., 2010a. Multiple classifier systems for robust classifier design in adversarial environments. International Journal of Machine Learning and Cybernetics 1 (1-4), 27–41.

Biggio, B., Fumera, G., Roli, F., 2010b. Multiple classifier systems under attack. In: Multiple Classifier Systems. Springer, pp. 74–83.

Biggio, B., Fumera, G., Roli, F., 2014a. Pattern recognition systems under attack: Design issues and research challenges. International Journal of Pattern Recognition and Artificial Intelligence 28 (07), 1460002.

Biggio, B., Fumera, G., Roli, F., 2014b. Security evaluation of pattern classifiers under attack. IEEE Transactions on Knowledge and Data Engineering 26 (4), 984–996.

Bryll, R., Gutierrez-Osuna, R., Quek, F., 2003. Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets. Pattern Recognition 36 (6), 1291–1302.

Brzezinski, D., Stefanowski, J., 2014. Reacting to different types of concept drift: The accuracy updated ensemble algorithm. IEEE Transactions on Neural Networks and Learning Systems 25 (1), 81–94.

Carlini, N., Mishra, P., Vaidya, T., Zhang, Y., Sherr, M., Shields, C., Wagner, D., Zhou, W., 2016. Hidden voice commands. In: USENIX Security Symposium. pp. 513–530.

Chang, C.-C., Lin, C.-J., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3), 27.

Chinavle, D., Kolari, P., Oates, T., Finin, T., 2009. Ensembles in adversarial classification for spam. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, pp. 2015–2018.

Colbaugh, R., Glass, K., 2012a. Predictability-oriented defense against adaptive adversaries. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, pp. 2721–2727.

Colbaugh, R., Glass, K., 2012b. Predictive defense against evolving adversaries. In: IEEE International Conference on Intelligence and Security Informatics (ISI). IEEE, pp. 18–23.

Dalvi, N., Domingos, P., Sanghai, S., Verma, D., et al., 2004. Adver-sarial classification. In: Proceedings of the tenth ACM SIGKDDinternational conference on Knowledge discovery and data min-ing. ACM, pp. 99–108.

Ditzler, G., Roveri, M., Alippi, C., Polikar, R., 2015. Learning innonstationary environments: A survey. Computational IntelligenceMagazine 10 (4), 12–25.

DSouza, D. F., 2014. Avatar captcha: telling computers and humansapart via face classification and mouse dynamics.

Gama, J., Zliobaite, I., Bifet, A., Pechenizkiy, M., Bouchachia, A.,2014. A survey on concept drift adaptation. ACM Computing Sur-veys (CSUR) 46 (4), 44.

Globerson, A., Roweis, S., 2006. Nightmare at test time: robust learn-ing by feature deletion. In: Proceedings of the 23rd internationalconference on Machine learning. ACM, pp. 353–360.

Hardt, M., Megiddo, N., Papadimitriou, C., Wootters, M., 2016.Strategic classification. In: Proceedings of the 2016 ACM Confer-ence on Innovations in Theoretical Computer Science. ACM, pp.111–122.

Henke, M., Souto, E., dos Santos, E. M., 2015. Analysis of the evolu-tion of features in classification problems with concept drift: Ap-plication to spam detection. In: IFIP/IEEE International Sympo-sium on Integrated Network Management (IM). IEEE, pp. 874–877.

Ho, T. K., 1998. The random subspace method for constructing deci-sion forests. IEEE Transactions on Pattern Analysis and MachineIntelligence 20 (8), 832–844.

Hosseini, H., Kannan, S., Zhang, B., Poovendran, R., 2017. Deceivinggoogle’s perspective api built for detecting toxic comments. arXivpreprint arXiv:1702.08138.

Huang, L., Joseph, A. D., Nelson, B., Rubinstein, B. I., Tygar, J.,2011. Adversarial machine learning. In: Proceedings of the 4thACM workshop on Security and artificial intelligence. ACM, pp.43–58.

Kantchelian, A., Afroz, S., Huang, L., Islam, A. C., Miller, B.,Tschantz, M. C., Greenstadt, R., Joseph, A. D., Tygar, J., 2013.Approaches to adversarial drift. In: Proceedings of the 2013 ACMworkshop on Artificial intelligence and security. ACM, pp. 99–110.

Kerckhoffs, A., 1883. La cryptographie militaire (Military cryptography). Journal des Sciences Militaires (Journal of Military Science, in French).

Kołcz, A., Teo, C. H., 2009. Feature weighting for improved classifier robustness. In: CEAS 2009: Sixth Conference on Email and Anti-Spam.

Kuncheva, L. I., 2008. Classifier ensembles for detecting concept change in streaming data: Overview and perspectives. In: 2nd Workshop SUEMA. Vol. 2008. pp. 5–10.

Lee, K., Caverlee, J., Webb, S., 2010. Uncovering social spammers: social honeypots + machine learning. In: Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, pp. 435–442.

Lichman, M., 2013. UCI machine learning repository. URL http://archive.ics.uci.edu/ml

Liu, W., Chawla, S., Bailey, J., Leckie, C., Ramamohanarao, K., 2012. An efficient adversarial learning strategy for constructing robust classification boundaries. In: AI 2012: Advances in Artificial Intelligence. Springer, pp. 649–660.

Lowd, D., Meek, C., 2005. Adversarial learning. In: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM, pp. 641–647.

Miller, B., Kantchelian, A., Afroz, S., Bachwani, R., Dauber, E., Huang, L., Tschantz, M. C., Joseph, A. D., Tygar, J. D., 2014. Adversarial active learning. In: Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop. ACM, pp. 3–14.

Minku, L. L., Yao, X., 2012. DDD: A new ensemble approach for dealing with concept drift. IEEE Transactions on Knowledge and Data Engineering 24 (4), 619–633.

Mrdovic, S., Perunicic, B., 2008. Kerckhoffs' principle for intrusion detection. In: The 13th International Telecommunications Network Strategy and Planning Symposium, 2008. IEEE, pp. 1–8.

Mthembu, L., Marwala, T., 2008. A note on the separability index. arXiv preprint arXiv:0812.1107.

Onoda, T., Kiuchi, M., 2012. Analysis of intrusion detection in control system communication based on outlier detection with one-class classifiers. In: Neural Information Processing. Springer, pp. 275–282.

Papernot, N., McDaniel, P., Goodfellow, I., 2016a. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.

Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., Swami, A., 2017. Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, pp. 506–519.

Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., Swami, A., 2016b. The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, pp. 372–387.

Papernot, N., McDaniel, P., Sinha, A., Wellman, M., 2016c. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.

Quinlan, J. R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann.

Rndic, N., Laskov, P., 2014. Practical evasion of a learning-based classifier: A case study. In: IEEE Symposium on Security and Privacy (SP). IEEE, pp. 197–211.

Rowe, N. C., Custy, E. J., Duong, B. T., 2007. Defending cyberspace with fake honeypots. Journal of Computers 2 (2), 25–36.

Salem, M. B., Hershkop, S., Stolfo, S. J., 2008. A survey of insider attack detection research. Insider Attack and Cyber Security, 69–90.

Sculley, D., Otey, M. E., Pohl, M., Spitznagel, B., Hainsworth, J., Zhou, Y., 2011. Detecting adversarial advertisements in the wild. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 274–282.

Sethi, T. S., Kantardzic, M., 2015. Don't pay for validation: Detecting drifts from unlabeled data using margin density. Procedia Computer Science 53, 103–112.

Sethi, T. S., Kantardzic, M., 2017a. Data driven exploratory attacks on black box classifiers in adversarial domains. arXiv preprint arXiv:1703.07909.

Sethi, T. S., Kantardzic, M., 2017b. On the reliable detection of concept drift from streaming unlabeled data. Expert Systems with Applications.

Sethi, T. S., Kantardzic, M., Hu, H., 2016. A grid density based framework for classifying streaming data in the presence of concept drift. Journal of Intelligent Information Systems 46 (1), 179–211.

Sethi, T. S., Kantardzic, M., Ryu, J. W., 2017. Security theater: On the vulnerability of classifiers to exploratory attacks. In: 12th Pacific Asia Workshop on Intelligence and Security Informatics. Springer.

Singh, A., Walenstein, A., Lakhotia, A., 2012. Tracking concept drift in malware families. In: Proceedings of the 5th ACM workshop on Security and artificial intelligence. ACM, pp. 81–92.

Smutz, C., Stavrou, A., 2016. When a tree falls: Using diversity in ensemble classifiers to identify evasion in malware detectors.

Stein, T., Chen, E., Mangla, K., 2011. Facebook immune system. In: Proceedings of the 4th Workshop on Social Network Systems. ACM, p. 8.

Stevens, D., Lowd, D., 2013. On the hardness of evading combinations of linear classifiers. In: Proceedings of the 2013 ACM workshop on Artificial intelligence and security. ACM, pp. 77–86.

Tramèr, F., Zhang, F., Juels, A., Reiter, M. K., Ristenpart, T., 2016. Stealing machine learning models via prediction APIs. In: USENIX Security Symposium.

Vorobeychik, Y., Li, B., 2014. Optimal randomized classification in adversarial settings. In: Proceedings of the 2014 international conference on Autonomous agents and multi-agent systems. International Foundation for Autonomous Agents and Multiagent Systems, pp. 485–492.

Wang, F., 2015. Robust and adversarial data mining.

Wozniak, M., Grana, M., Corchado, E., 2014. A survey of multiple classifier systems as hybrid systems. Information Fusion 16, 3–17.

Xu, J., Guo, P., Zhao, M., Erbacher, R. F., Zhu, M., Liu, P., 2014. Comparing different moving target defense techniques. In: Proceedings of the First ACM Workshop on Moving Target Defense. ACM, pp. 97–107.

Zhang, F., Chan, P. P., Biggio, B., Yeung, D. S., Roli, F., 2015. Adversarial feature selection against evasion attacks.

Zliobaite, I., 2010. Learning under concept drift: an overview. arXiv preprint arXiv:1010.4784.
