# [IEEE 2010 Third International Workshop on Advanced Computational Intelligence (IWACI) - Suzhou, China (2010.08.25-2010.08.27)] Third International Workshop on Advanced Computational Intelligence - Discriminant Support Vector Data Description

Post on 07-Mar-2017

TRANSCRIPT

<ul><li><p>Third International Workshop on Advanced Computational Intelligence, August 25-27, 2010, Suzhou, Jiangsu, China</p>
<p>Discriminant Support Vector Data Description</p>
<p>Zhe Wang and Daqi Gao</p>
<p>Abstract: Support Vector Data Description (SVDD) constructs a minimum hypersphere that encloses the data of the target class in one-class classification. In this paper, we propose a novel Discriminant Support Vector Data Description (DSVDD). The proposed DSVDD replaces the Euclidean distance metric of SVDD with relevant metric learning, which takes the relationships between data into account. By incorporating both positive and negative equivalence information, DSVDD assigns large weights to the relevant features and tightens similar data. More importantly, considering the negative equivalence information introduces prior discriminant knowledge into the algorithm. The experiments show that the proposed DSVDD achieves more accurate classification than the conventional SVDD on all the tested data.</p>
<p>I. INTRODUCTION</p>
<p>In one-class classification, generally only one class, named the target class, is available, which differs from multi-class classification. Support Vector Domain Description (SVDD), a popular one-class classifier, was proposed by Tax and Duin [7], [8], [9]. In the SVDD model, a hypersphere is constructed so that it encloses as many target objects as possible while minimizing the chance of accepting non-target data, named outlier objects. It is well known that the original SVDD model adopts the Euclidean distance metric [7], [8], [9]. An important problem in learning algorithms based on the Euclidean distance metric, however, is the scale of the input variables. 
In the Euclidean case, SVDD treats all the features of the target class data as equivalent in training. As a result, irrelevant features might be considered in training and mislead the data description of the SVDD model into an irrelevant hypersphere. Moreover, SVDD with the Euclidean distance metric cannot take prior relationships among the target data into account.</p>
<p>Zhe Wang and Daqi Gao are with the Department of Computer Science & Engineering, East China University of Science & Technology, Shanghai, China (email: {wangzhe, gaodaqi}@ecust.edu.cn).</p>
<p>This work was supported by the Natural Science Foundations of China under Grant No. 60903091, the High-Tech Development Program of China (863) under Grant No. 2006AA10Z315, and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20090074120003. This work was also supported by the Open Projects Program of National Laboratory of Pattern Recognition and the Fundamental Research Funds for the Central Universities.</p>
<p>978-1-4244-6337-4/10/$26.00 ©2010 IEEE</p>
<p>Relevant metric learning was first developed as Mahalanobis distance metric learning [3], [5], [6]. One of its special algorithms is Relevant Component Analysis (RCA) [6]. RCA is an effective linear transformation for unsupervised learning. It constructs a Mahalanobis distance metric by using the positive equivalence relationship, which is obtained from the covariance matrices of the positive equivalence data. In RCA, the positive equivalence data are selected from the same chunklet. Each chunklet is a set in which the data come from the same class but carry no specific class labels. 
Through the transformation based on a group of chunklets, RCA can assign large weights to relevant features and low weights to irrelevant features [6]. Unfortunately, RCA cannot make use of negative equivalence constraints or discriminant information. To this end, Yeung and Chang [10] extended RCA with both positive and negative equivalence relationships. Specifically, the extended RCA designs the so-called within-chunklet and between-chunklet covariance matrices, so that both positive and negative equivalence constraints can be used.</p>
<p>In this paper, we introduce the extended RCA distance metric [10] into SVDD in place of the original Euclidean distance metric, and thereby propose a novel Discriminant Support Vector Domain Description (DSVDD). In doing so, DSVDD inherits the advantages of the extended RCA. In practice, DSVDD reduces the influence of the input variable scale because it uses the Mahalanobis distance metric from the extended RCA. At the same time, DSVDD can easily incorporate a priori discriminant knowledge because the within-chunklet and between-chunklet covariance matrices consider both the positive and negative equivalence data. To validate the effectiveness of the proposed DSVDD algorithm, we give experimental results on both synthetic and real data sets. The results show that the proposed DSVDD gives a more accurate description than the conventional SVDD for all the tested data.</p>
<p>The rest of this paper is organized as follows. Section II gives the structure of the proposed DSVDD. 
Section III experimentally shows that the proposed DSVDD gives a more accurate description than the conventional SVDD for all the tested target cases. Conclusions and future work are given in Section IV.</p>
<p>II. DISCRIMINANT SUPPORT VECTOR DATA DESCRIPTION (DSVDD)</p>
<p>Suppose that there is a set of one-class training samples $\{x_i\}_{i=1}^{N} \subset \mathbb{R}^n$. SVDD seeks a hypersphere that contains all the samples $\{x_i\}_{i=1}^{N}$ while minimizing the volume of the hypersphere, through the following optimization formulation:</p></li><li><p>
$$\min\; J = R^2 + C\sum_{i=1}^{N}\xi_i \tag{1}$$
$$\text{subject to}\quad (x_i - a)^T M^{-1} (x_i - a) \le R^2 + \xi_i \tag{2}$$
$$\xi_i \ge 0,\quad i = 1 \ldots N \tag{3}$$
</p>
<p>where the parameters $R \in \mathbb{R}$ and $a \in \mathbb{R}^n$ are the radius and the center of the optimized hypersphere, respectively; the regularization parameter $C \in \mathbb{R}$ gives the tradeoff between the volume of the hypersphere and the errors; and the $\xi_i \in \mathbb{R}$ are slack variables. Since SVDD adopts the Euclidean distance metric, the matrix $M \in \mathbb{R}^{n \times n}$ is the identity matrix, with all diagonal elements 1 and all others 0.</p>
<p>It can be seen that SVDD treats all the features of the samples as equivalent. In contrast, the proposed DSVDD framework assigns large weights to the relevant features and small weights to the irrelevant features by using relevant metric learning instead of the Euclidean metric. In the proposed DSVDD framework, we adopt the relevant metric learning defined in [10]. First, the whole sample set $\{x_i\}_{i=1}^{N}$ is divided into chunklets without replacement. Each chunklet is made up of data with the positive equivalence relationship. If $x_i$ and $x_j$ belong to the same chunklet, they should have the same but unknown class label. 
Following [10], we define the within-chunklet covariance matrix $M_1$ and the between-chunklet covariance matrix $M_2$ as follows:</p>
<p>
$$M_1 = \frac{1}{N}\sum_{d=1}^{D}\sum_{j=1}^{n_d}(x_{jd} - \bar{x}_d)(x_{jd} - \bar{x}_d)^T \tag{4}$$
$$M_2 = \frac{1}{N(D-1)}\sum_{d=1}^{D}\sum_{p=1,\,p \neq d}^{D}\sum_{j=1}^{n_d}(x_{jd} - \bar{x}_p)(x_{jd} - \bar{x}_p)^T \tag{5}$$
</p>
<p>where $D$ is the number of chunklets; $n_d$ is the number of samples in the $d$th chunklet; $x_{jd}$ is the $j$th sample of the $d$th chunklet; and $\bar{x}_d$ is the mean of the $d$th chunklet. Since the sample set $\{x_i\}_{i=1}^{N}$ is divided into $D$ chunklets without replacement, $N = \sum_{d=1}^{D} n_d$. $M_1$ carries the positive equivalence information and $M_2$ carries the negative equivalence information. We define the following matrix:</p>
<p>
$$Q = M_1^{-\frac{1}{2}} M_2 M_1^{-\frac{1}{2}} \tag{6}$$
</p>
<p>Then, by taking the matrix $Q$ in place of the matrix $M^{-1}$ in equation (2), the objective function of the proposed DSVDD is obtained. To explore the proposed DSVDD further, the matrix $Q$ can be decomposed into $Q = (M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}})^T (M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}})$. Therefore, equation (2) with $Q$ can be rewritten as follows:</p>
<p>
$$(x_i - a)^T Q (x_i - a) = (x_i - a)^T (M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}})^T (M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}}) (x_i - a)$$
$$= [M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}} (x_i - a)]^T [M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}} (x_i - a)] \le R^2 + \xi_i,\quad i = 1 \ldots N$$
</p>
<p>In this case, each $x_i$ can be viewed as being linearly transformed by $M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}}$. $M_1$ and $M_2$ play a role similar to reducing within-class scatter and increasing between-class scatter in Fisher discriminant analysis, as also demonstrated in [10]. Since between-class information is introduced, we call the proposed method Discriminant Support Vector Data Description (DSVDD).</p>
<p>To optimize the parameters $R$, $a$, $\xi_i$, we construct the Lagrangian function by introducing the Lagrangian multipliers $\alpha_i$, $\gamma_i$ and substituting equations (2), (3), (6) into (1), which gives</p>
<p>
$$L = R^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\left[R^2 + \xi_i - (x_i - a)^T Q (x_i - a)\right] - \sum_{i=1}^{N}\gamma_i \xi_i \tag{7}$$
</p>
<p>where $\alpha_i \ge 0$, $\gamma_i \ge 0$. 
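</p>
<p>The chunklet matrices (4) and (5) and the metric $Q$ can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `sym_power` and `chunklet_metric` are hypothetical, the chunklets are assumed to be given as a list of arrays (one array of shape $n_d \times n$ per chunklet, with $D \ge 2$), $Q$ is assumed to take the form $M_1^{-1/2} M_2 M_1^{-1/2}$ implied by the decomposition in the text, and $M_1$ is assumed to be positive definite so that its inverse square root exists.</p>

```python
import numpy as np


def sym_power(m, p):
    """Raise a symmetric positive-definite matrix to the real power p
    through its eigendecomposition (assumption: m is positive definite)."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * vals ** p) @ vecs.T


def chunklet_metric(chunklets):
    """Return (M1, M2, Q) for a list of (n_d, n) arrays, one per chunklet.

    M1 follows equation (4), M2 follows equation (5), and Q is assumed
    to be M1^(-1/2) M2 M1^(-1/2).
    """
    D = len(chunklets)                       # number of chunklets
    N = sum(len(c) for c in chunklets)       # total number of samples
    means = [c.mean(axis=0) for c in chunklets]
    dim = chunklets[0].shape[1]
    M1 = np.zeros((dim, dim))
    M2 = np.zeros((dim, dim))
    for d, c in enumerate(chunklets):
        diff = c - means[d]                  # deviations from own chunklet mean
        M1 += diff.T @ diff
        for p in range(D):
            if p != d:
                diff = c - means[p]          # deviations from another chunklet mean
                M2 += diff.T @ diff
    M1 /= N
    M2 /= N * (D - 1)
    inv_sqrt = sym_power(M1, -0.5)
    Q = inv_sqrt @ M2 @ inv_sqrt
    return M1, M2, Q


def relevant_distance_sq(x, a, Q):
    """Squared relevant distance (x - a)^T Q (x - a) used in constraint (2)."""
    diff = np.asarray(x) - np.asarray(a)
    return float(diff @ Q @ diff)
```

<p>Because $M_1$ and $M_2$ are symmetric, the resulting $Q$ is symmetric as well, and it equals $W^T W$ for $W = M_2^{1/2} M_1^{-1/2}$, which is the linear transformation described in the text.</p>
<p>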
Setting the partial derivatives of $L$ with respect to $R$, $a$, $\xi_i$ to 0, we get</p>
<p>
$$\frac{\partial L}{\partial R} = 0 \;\Rightarrow\; \sum_{i=1}^{N}\alpha_i = 1 \tag{8}$$
$$\frac{\partial L}{\partial a} = 0 \;\Rightarrow\; a = \sum_{i=1}^{N}\alpha_i x_i \tag{9}$$
$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \gamma_i = C - \alpha_i \tag{10}$$
</p>
<p>Further, we substitute the constraints (8), (9), (10) into the Lagrangian function (7) and obtain the following criterion to be maximized:</p>
<p>
$$\max_{\alpha}\; L = \sum_{i=1}^{N}\alpha_i x_i^T Q x_i - \sum_{i,j}\alpha_i \alpha_j x_i^T Q x_j \tag{11}$$
$$\text{s.t.}\quad 0 \le \alpha_i \le C,\quad i = 1 \ldots N \tag{12}$$
$$Q = M_1^{-\frac{1}{2}} M_2 M_1^{-\frac{1}{2}} \tag{13}$$
</p>
<p>The maximization of equation (11) can be solved by Quadratic Programming (QP) [1].</p>
<p>A test sample $z \in \mathbb{R}^n$ is classified as the target class when the relevant distance $\|z - a\|_Q$ between the sample $z$ and the center $a$ of the hypersphere is smaller than or equal to the radius $R$, i.e.,</p>
<p>
$$\|z - a\|_Q^2 = (z - a)^T Q (z - a) \le R^2 \tag{14}$$
</p>
<p>The radius $R$ can be calculated as the distance from the center $a$ of the hypersphere to a sample on the hypersphere boundary. Mathematically, the radius $R$ is given by</p>
<p>
$$R^2 = (x_i - a)^T Q (x_i - a) \tag{15}$$
</p>
<p>where $x_i$ is a sample from the set of support vectors, i.e., its Lagrangian multiplier satisfies $0 &lt; \alpha_i &lt; C$.</p>
<p>III. EXPERIMENTS</p>
<p>To validate the effectiveness of the proposed DSVDD, we compare it with the original SVDD on both synthetic and UCI data sets [2].</p></li><li><p>[Figure 1: four panels (a)-(d); horizontal axis "Feature 1".]</p>
<p>Fig. 1. The classification boundaries of SVDD and the proposed DSVDD with D = 2, 4, 50, respectively. Sub-figure (a) corresponds to SVDD with the classification result e = [0.1500, 0.0500], f = [0.9444, 0.8500]; sub-figure (b) corresponds to DSVDD with D = 2 and the classification result e = [0.0300, 0.0500], f = [0.9510, 0.9700]; sub-figure (c) corresponds to DSVDD with D = 4 and the classification result e = [0.0300, 0.0500], f = [0.9510, 0.9700]; sub-figure (d) corresponds to DSVDD with D = 50 and the classification result e = [0.0200, 0.0600], f = [0.9423, 0.9800].</p>
<p>TABLE I. THE AVERAGE AUC VALUES AND THEIR CORRESPONDING STANDARD DEVIATIONS OF TEN INDEPENDENT RUNS FOR TAE, WATER AND SONAR. THE LARGER THE VALUE OF THE AUC, THE BETTER THE PERFORMANCE OF THE CORRESPONDING ONE-CLASS CLASSIFIER.</p>
<table>
<tr><th rowspan="2">Class No.</th><th colspan="3">SVDD</th><th colspan="3">DSVDD</th></tr>
<tr><th>Linear</th><th>Poly</th><th>RBF</th><th>Linear</th><th>Poly</th><th>RBF</th></tr>
<tr><th colspan="7">TAE</th></tr>
<tr><td>1</td><td>0.61 ± 0.17</td><td>0.60 ± 0.17</td><td>0.69 ± 0.20</td><td>0.73 ± 0.14</td><td>0.67 ± 0.16</td><td>0.83 ± 0.11</td></tr>
<tr><td>2</td><td>0.45 ± 0.19</td><td>0.47 ± 0.17</td><td>0.54 ± 0.14</td><td>0.48 ± 0.19</td><td>0.50 ± 0.17</td><td>0.53 ± 0.14</td></tr>
<tr><td>3</td><td>0.47 ± 0.17</td><td>0.43 ± 0.17</td><td>0.55 ± 0.15</td><td>0.62 ± 0.15</td><td>0.51 ± 0.19</td><td>0.93 ± 0.10</td></tr>
<tr><td>Total</td><td>0.5100</td><td>0.5000</td><td>0.5933</td><td>0.6134</td><td>0.5575</td><td>0.7641</td></tr>
<tr><th colspan="7">WATER</th></tr>
<tr><td>1</td><td>0.52 ± 0.29</td><td>0.63 ± 0.34</td><td>0.88 ± 0.11</td><td>0.74 ± 0.19</td><td>0.74 ± 0.24</td><td>0.91 ± 0.13</td></tr>
<tr><td>2</td><td>0.81 ± 0.16</td><td>0.65 ± 0.27</td><td>0.89 ± 0.07</td><td>0.91 ± 0.10</td><td>0.68 ± 0.19</td><td>0.96 ± 0.06</td></tr>
<tr><td>Total</td><td>0.6650</td><td>0.6400</td><td>0.8850</td><td>0.8233</td><td>0.7102</td><td>0.9357</td></tr>
<tr><th colspan="7">SONAR</th></tr>
<tr><td>1</td><td>0.53 ± 0.17</td><td>0.61 ± 0.12</td><td>0.63 ± 0.18</td><td>0.64 ± 0.20</td><td>0.63 ± 0.19</td><td>0.72 ± 0.19</td></tr>
<tr><td>2</td><td>0.50 ± 0.25</td><td>0.50 ± 0.19</td><td>0.61 ± 0.22</td><td>0.68 ± 0.19</td><td>0.76 ± 0.18</td><td>0.80 ± 0.16</td></tr>
<tr><td>Total</td><td>0.5180</td><td>0.5549</td><td>0.6202</td><td>0.6588</td><td>0.6962</td><td>0.7622</td></tr>
</table></li><li><p>Both DSVDD and SVDD adopt the linear kernel $k(x_i, x_j) = x_i^T x_j$, the polynomial kernel (Poly) $k(x_i, x_j) = (x_i^T x_j + 1)^p$, and the radial basis function kernel (RBF) $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)$. 
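</p>
<p>The three kernels named above can be written as Gram-matrix functions. The sketch below is only an illustration of the formulas: the function names and the default values `p=2` and `sigma=1.0` are assumptions, since the text does not report the kernel parameters that were used, and the sketch does not show how the kernels are combined with the metric $Q$.</p>

```python
import numpy as np


def linear_kernel(X, Y):
    """k(x_i, x_j) = x_i^T x_j for all rows of X against all rows of Y."""
    return X @ Y.T


def poly_kernel(X, Y, p=2):
    """k(x_i, x_j) = (x_i^T x_j + 1)^p; the degree p=2 is an assumed default."""
    return (X @ Y.T + 1.0) ** p


def rbf_kernel(X, Y, sigma=1.0):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2 / sigma^2); sigma=1.0 is an assumed default."""
    sq_dist = (
        (X ** 2).sum(axis=1)[:, None]
        + (Y ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / sigma ** 2)
```

<p>Each function takes two sample matrices with one sample per row and returns the matrix of pairwise kernel values.</p>
<p>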
All computations were run on a Pentium IV 2.10-GHz processor running Windows XP Professional and the MATLAB environment.</p>
<p>First, we carry out experiments on synthetic data. In this one-class classification problem, we adopt the vectors $e, f \in \mathbb{R}^2$ to measure the performance of the one-class classifier, where $e(1)$ gives the False Negative (FN) rate (the error on the target class), $e(2)$ gives the False Positive (FP) rate (the error on the outlier class), $f(1)$ gives the ratio between the number of correct target predictions and the number of target predictions (the precision), and $f(2)$ gives the ratio between the number of correct target predictions and the number of target samples (the recall).</p>
<p>The synthetic data form a two-dimensional two-class data set, where the target class is generated as a banana-shaped distribution with 100 samples and the outlier class is generated from a normal distribution with mean 1 and standard deviation sqrt(1.5). The target data are uniformly distributed along the banana and are superimposed with a normal distribution. Figure 1 gives the classification boundaries of SVDD and the proposed DSVDD with D = 2, 4, 50 chunklets, respectively. From Figure 1, we find that 1) DSVDD has a significant advantage over SVDD in terms of FN; 2) the performance of DSVDD is not sensitive to the parameter D.</p>
<p>Further, we also report the experimental results of the proposed DSVDD and SVDD on some real data sets: TAE (3 classes/151 samples/5 features) [2], WATER (2 classes/116 samples/38 features) [2] and SONAR (2 classes/208 samples/60 features), which is available at ftp://ftp.cs.cmu.edu/afs/cs/project/connect/bench/. The number D of chunklets in each classification problem is set to the number of classes. 
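</p>
<p>The vectors $e$ and $f$ defined for the synthetic experiment can be computed from raw counts, as in the minimal sketch below. The function name and its arguments are hypothetical, and the number of outlier samples (100) is an assumption: the text states only the 100 target samples, and 100 outliers is the count that reproduces the values reported for Fig. 1.</p>

```python
def error_and_quality(n_target, n_outlier, target_rejected, outlier_accepted):
    """Return (e, f) from raw counts.

    e = [FN rate, FP rate]; f = [precision, recall] on the target class.
    Assumes at least one sample is accepted as target, otherwise the
    precision is undefined.
    """
    true_accepts = n_target - target_rejected
    e = [target_rejected / n_target, outlier_accepted / n_outlier]
    f = [true_accepts / (true_accepts + outlier_accepted),
         true_accepts / n_target]
    return e, f


# SVDD, Fig. 1(a): 15 of 100 targets rejected, 5 of an assumed 100 outliers accepted.
e_svdd, f_svdd = error_and_quality(100, 100, 15, 5)

# DSVDD with D = 50, Fig. 1(d): 2 targets rejected, 6 assumed outliers accepted.
e_dsvdd, f_dsvdd = error_and_quality(100, 100, 2, 6)
```

<p>Under these assumed counts, the first call yields e = [0.15, 0.05] and f = [0.9444, 0.85] after rounding to four decimals, and the second yields e = [0.02, 0.06] and f = [0.9423, 0.98], which agree with the values in the caption of Fig. 1.</p>
<p>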
Here, we adopt the average value of the Area Under the Receiver Operating Characteristic Curve (AUC) as the performance measure for one-class classifiers [4]. A good one-class classifier should have a small FP rate and a high True Positive (TP) rate [7], [8], [9]. Thus, we prefer a classifier with a higher AUC to one with a lower AUC: for a given FP threshold, the TP rate is higher for the first classifier than for the second. Thus, the larger the value of the AUC, the better the corresponding one-class classifier. In our experiments, the value of the AUC lies in the range [0, 1]. Table I gives the average AUC values and the corresponding standard deviations of the proposed DSVDD and SVDD over ten independent runs for the data sets. The best AUC values are denoted in bold. Both DSVDD and SVDD adopt linear, polynomial and radial basis kernels. The label of the target class is indicated in the first column. In each classification, we take one class as the target class and the other classes as the outlier data. From this table, it can be seen that the proposed DSVDD gives significantly better classification than SVDD in all the tested cases. The resu...</p></li></ul>
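<p>The AUC used as the performance measure can be illustrated with a small rank-based sketch. It relies on the standard identity that the AUC equals the probability that a randomly chosen target sample receives a higher score than a randomly chosen outlier sample, with ties counted as one half. The function name is hypothetical, and the choice of score is an assumption: for a hypersphere model, any score that grows as the sample moves toward the center would do, for example $R^2 - (z - a)^T Q (z - a)$.</p>

```python
def auc(target_scores, outlier_scores):
    """AUC as the fraction of (target, outlier) pairs in which the target
    has the higher score; tied pairs count as one half."""
    wins = 0.0
    for t in target_scores:
        for o in outlier_scores:
            if t > o:
                wins += 1.0
            elif t == o:
                wins += 0.5
    return wins / (len(target_scores) * len(outlier_scores))


# Toy scores (hypothetical): one of the four pairs is ranked incorrectly.
toy_auc = auc([0.9, 0.4], [0.5, 0.1])
```

<p>The pairwise loop costs a number of comparisons equal to the product of the two sample counts, which is acceptable for data sets of the size used here; a sort-based computation would be preferable for large sets.</p>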