# Discriminant Support Vector Data Description


Third International Workshop on Advanced Computational Intelligence, August 25-27, 2010, Suzhou, Jiangsu, China

Discriminant Support Vector Data Description

Zhe Wang and Daqi Gao

Abstract-Support Vector Data Description (SVDD) was designed to construct a minimum hypersphere that encloses all the data of the target class in the one-class classification case. In this paper, we propose a novel Discriminant Support Vector Data Description (DSVDD). The proposed DSVDD adopts relevant metric learning instead of the original Euclidean distance metric of SVDD, where relevant metric learning can take the relationship between data into account. By incorporating both the positive and negative equivalence information, the presented DSVDD assigns large weights to the relevant features and tightens the similar data. More importantly, we introduce discriminant prior knowledge into the proposed algorithm by considering the negative equivalence information. The experiments show that the proposed DSVDD achieves more accurate classification performance than the conventional SVDD for all the tested data.

I. INTRODUCTION

In one-class classification, generally only one certain class, named the target class, is available, which differs from multi-class classification. Support Vector Domain Description (SVDD), a popular one-class classifier, was proposed by Tax and Duin [7], [8], [9]. In the SVDD model, a hypersphere is constructed such that it encloses as many target objects as possible while minimizing the chance of accepting non-target data, named the outlier objects. It is well known that the original SVDD model adopts the Euclidean distance metric [7], [8], [9]. However, an important problem in learning algorithms based on the Euclidean distance metric is the scale of the input variables. In the Euclidean case, SVDD treats all the features of the target-class data as equivalent in training. As a result, irrelevant features of the data might be considered in training and mislead the data description of the SVDD model into an irrelevant hypersphere. Simultaneously, the SVDD with the Euclidean distance metric cannot consider the prior relationship among the target data.

Relevant metric learning was first developed as a form of Mahalanobis distance metric learning [3], [5], [6]. One of its special algorithms is Relevant Component Analysis (RCA) [6]. RCA is an effective linear transformation for unsupervised learning. It constructs a Mahalanobis distance metric by using the positive equivalence relationship. The positive

Zhe Wang and Daqi Gao are with the Department of Computer Science & Engineering, East China University of Science & Technology, Shanghai, China (email: {wangzhe, gaodaqi}@ecust.edu.cn).

This work was supported by the Natural Science Foundation of China under Grant No. 60903091, the High-Tech Development Program of China (863) under Grant No. 2006AA10Z315, and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20090074120003. This work was also supported by the Open Projects Program of the National Laboratory of Pattern Recognition and the Fundamental Research Funds for the Central Universities.

978-1-4244-6337-4/10/$26.00 ©2010 IEEE

equivalence relationship is obtained from the covariance matrices of the positive equivalence data. In RCA, the positive equivalence data are selected from the same chunklet. Each chunklet is a set in which the data come from the same class but carry no explicit class labels. Through the transformation based on a group of chunklets, RCA can assign large weights to relevant features and low weights to irrelevant features [6]. Unfortunately, RCA cannot make use of negative equivalence constraints or discriminant information. To this end, Yeung and Chang [10] extended RCA with both positive and negative equivalence relationships. Specifically, the extended RCA was achieved by designing the so-called within-chunklet and between-chunklet covariance matrices. In doing so, both positive and negative equivalence constraints can be used.

In this paper, we introduce the extended RCA distance metric [10], rather than the original Euclidean distance metric, into SVDD and thereby propose a novel Discriminant Support Vector Data Description (DSVDD). In doing so, the presented DSVDD inherits the advantages of the extended RCA. In practice, the proposed DSVDD can reduce the influence of the input variable scale owing to the Mahalanobis distance metric from the extended RCA. Simultaneously, the proposed DSVDD can easily incorporate a priori discriminant knowledge owing to the consideration of both the positive and negative equivalence data through the within-chunklet and between-chunklet covariance matrices. In order to validate the effectiveness of the proposed DSVDD algorithm, we give experimental results on both synthetic and real data sets. The experimental results show that the proposed DSVDD brings a more accurate description than the conventional SVDD for all the tested data.

The rest of this paper is organized as follows. Section II gives the structure of the proposed DSVDD. Section III experimentally shows that the proposed DSVDD brings a more accurate description for all the tested target cases than the conventional SVDD. Following that, both the conclusion and future work are given in Section IV.

II. DISCRIMINANT SUPPORT VECTOR DATA DESCRIPTION (DSVDD)

Suppose that there is a set of one-class training samples $\{x_i\}_{i=1}^{N} \subset \mathbb{R}^n$. SVDD seeks a hypersphere that contains all the samples $\{x_i\}_{i=1}^{N}$ while minimizing the volume of the hypersphere, through the following optimization formulation:

$$\min_{R,\,a,\,\xi} \; J = R^2 + C \sum_{i=1}^{N} \xi_i \tag{1}$$

$$\text{subject to} \quad (x_i - a)^T M^{-1} (x_i - a) \le R^2 + \xi_i, \tag{2}$$

$$\xi_i \ge 0, \quad i = 1 \ldots N \tag{3}$$

where the parameters $R \in \mathbb{R}$ and $a \in \mathbb{R}^n$ are the radius and the center of the optimized hypersphere, respectively; the regularization parameter $C \in \mathbb{R}$ gives the tradeoff between the volume of the hypersphere and the errors; and the $\xi_i \in \mathbb{R}$ are slack variables. Since SVDD adopts the Euclidean distance metric, the matrix $M \in \mathbb{R}^{n \times n}$ is the identity matrix, with all diagonal elements 1 and the others 0.
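As an illustration of the formulation above, note that for fixed $R$ and $a$ the optimal slack is $\xi_i = \max(0, (x_i - a)^T M^{-1}(x_i - a) - R^2)$, so the primal objective (1)-(3) can be evaluated directly for a candidate solution. The following NumPy sketch is our own illustration, not code from the paper; the function name is hypothetical:

```python
import numpy as np

def svdd_primal_objective(X, a, R, C, M=None):
    """Evaluate J = R^2 + C * sum(xi_i) from Eqs. (1)-(3), substituting the
    optimal slack xi_i = max(0, (x_i-a)^T M^{-1} (x_i-a) - R^2).
    M defaults to the identity, i.e. the Euclidean metric of original SVDD."""
    Minv = np.eye(X.shape[1]) if M is None else np.linalg.inv(M)
    # squared metric distance of each sample to the center a
    d2 = np.einsum('ij,jk,ik->i', X - a, Minv, X - a)
    xi = np.maximum(0.0, d2 - R**2)  # slack for samples outside the sphere
    return R**2 + C * xi.sum()
```

For example, a unit sphere at the origin that leaves one sample at distance 3 outside pays $R^2 + C\,(9-1)$.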

It can be found that SVDD views all the features of the samples as equivalent. In contrast, our proposed DSVDD framework assigns large weights to the relevant features and small weights to the irrelevant features by introducing relevant metric learning instead of the Euclidean metric. In the proposed DSVDD framework, we adopt the relevant metric learning defined in [10]. Firstly, the whole sample set $\{x_i\}_{i=1}^{N}$ is divided into chunklets without replacement. Each chunklet is made up of data with the positive equivalence relationship: if $x_i$ and $x_j$ belong to the same chunklet, both should have the same, but unknown, class label. Following [10], we give the so-called within-chunklet covariance matrix $M_1$ and between-chunklet covariance matrix $M_2$ as follows:

$$M_1 = \frac{1}{N} \sum_{d=1}^{D} \sum_{j=1}^{n_d} (x_{jd} - \bar{x}_d)(x_{jd} - \bar{x}_d)^T \tag{4}$$

$$M_2 = \frac{1}{N(D-1)} \sum_{d=1}^{D} \sum_{p=1,\,p \ne d}^{D} \sum_{j=1}^{n_d} (x_{jd} - \bar{x}_p)(x_{jd} - \bar{x}_p)^T \tag{5}$$

where $D$ is the number of chunklets, $n_d$ is the number of samples in the $d$-th chunklet, and $\bar{x}_d$ is the mean of the $d$-th chunklet. Since the sample set $\{x_i\}_{i=1}^{N}$ is divided into $D$ chunklets without replacement, $N = \sum_{d=1}^{D} n_d$. $M_1$ carries the positive equivalence information and $M_2$ carries the negative equivalence information. We define the following matrix:

$$Q = M_1^{-\frac{1}{2}} M_2 M_1^{-\frac{1}{2}} \tag{6}$$
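Equations (4)-(5) can be sketched in NumPy as follows. This is an illustrative implementation, not the authors' code; the function name and the encoding of chunklets as index arrays are our own conventions:

```python
import numpy as np

def chunklet_matrices(X, chunklets):
    """Within-chunklet matrix M1 (Eq. (4)) and between-chunklet matrix M2
    (Eq. (5)). `chunklets` is a list of index arrays that partition the
    rows of X without replacement."""
    N, n = X.shape
    D = len(chunklets)
    means = [X[idx].mean(axis=0) for idx in chunklets]
    M1 = np.zeros((n, n))
    M2 = np.zeros((n, n))
    for d, idx in enumerate(chunklets):
        diff = X[idx] - means[d]          # x_jd - mean of own chunklet
        M1 += diff.T @ diff
        for p in range(D):
            if p == d:
                continue
            diffp = X[idx] - means[p]     # x_jd - mean of other chunklet p
            M2 += diffp.T @ diffp
    return M1 / N, M2 / (N * (D - 1))
```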

Then, by taking the above matrix $Q$ in place of the matrix $M^{-1}$ in equation (2), the objective function of the proposed DSVDD is obtained. To further explore the proposed DSVDD, the matrix $Q$ can be decomposed as $Q = (M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}})^T (M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}})$. Therefore, equation (2) with $Q$ can be rewritten as

$$(x_i - a)^T Q (x_i - a) = \left[ M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}} (x_i - a) \right]^T \left[ M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}} (x_i - a) \right] \le R^2 + \xi_i, \quad i = 1 \ldots N.$$


In this case, each $x_i$ can be viewed as being linearly transformed by $M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}}$. $M_1$ and $M_2$ play roles similar to reducing the within-class scatter and increasing the between-class scatter in Fisher discriminant analysis, as also demonstrated in [10]. Since the between-class information can be introduced in this way, we call the proposed method Discriminant Support Vector Data Description (DSVDD).
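The metric $Q$ of Eq. (6) and the linear transform $M_2^{\frac{1}{2}} M_1^{-\frac{1}{2}}$ can be computed via eigendecomposition, assuming $M_1$ is (regularized to be) positive definite. A minimal sketch with names of our choosing, not taken from the paper:

```python
import numpy as np

def relevant_metric(M1, M2, eps=1e-8):
    """Return Q = M1^{-1/2} M2 M1^{-1/2} (Eq. (6)) and the transform
    T = M2^{1/2} M1^{-1/2}, so that (x-a)^T Q (x-a) = ||T (x-a)||^2.
    Matrix square roots via eigendecomposition; eps guards tiny eigenvalues."""
    def sqrt_and_invsqrt(M):
        w, V = np.linalg.eigh(M)          # symmetric eigendecomposition
        w = np.clip(w, eps, None)
        return (V @ np.diag(np.sqrt(w)) @ V.T,
                V @ np.diag(1.0 / np.sqrt(w)) @ V.T)
    _, M1_invsqrt = sqrt_and_invsqrt(M1)
    M2_sqrt, _ = sqrt_and_invsqrt(M2)
    T = M2_sqrt @ M1_invsqrt
    Q = T.T @ T                           # equals M1^{-1/2} M2 M1^{-1/2}
    return Q, T
```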

In order to optimize the parameters $R$, $a$, and $\xi_i$, we construct the Lagrangian function by introducing the Lagrange multipliers $\alpha_i$, $\gamma_i$ and taking equations (2), (3), and (6) into (1), thus obtaining

$$L = R^2 + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \alpha_i \left[ R^2 + \xi_i - (x_i - a)^T Q (x_i - a) \right] - \sum_{i=1}^{N} \gamma_i \xi_i \tag{7}$$

where $\alpha_i \ge 0$ and $\gamma_i \ge 0$. Setting the partial derivatives of $L$ with respect to $R$, $a$, and $\xi_i$ to 0, we get

$$\frac{\partial L}{\partial R} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i = 1 \tag{8}$$

$$\frac{\partial L}{\partial a} = 0 \;\Rightarrow\; a = \sum_{i=1}^{N} \alpha_i x_i \tag{9}$$

$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \gamma_i = C - \alpha_i \tag{10}$$

Further, taking the constraints (8), (9), and (10) into the Lagrangian function (7), we obtain the maximized criterion as follows:

$$\max_{\alpha} \; L = \sum_{i=1}^{N} \alpha_i \, x_i^T Q x_i - \sum_{i,j=1}^{N} \alpha_i \alpha_j \, x_i^T Q x_j \tag{11}$$

$$\text{s.t.} \quad 0 \le \alpha_i \le C, \quad i = 1 \ldots N \tag{12}$$

$$\sum_{i=1}^{N} \alpha_i = 1 \tag{13}$$

The maximization of equation (11) can be solved through Quadratic Programming (QP) [1].
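Problem (11)-(13) is a small QP over the simplex with box constraints. As a minimal stand-in for a dedicated QP solver, one may hand it to SciPy's SLSQP routine; this generic solver choice and the function name are ours, since the paper only states that QP is used:

```python
import numpy as np
from scipy.optimize import minimize

def solve_dsvdd_dual(X, Q, C=1.0):
    """Maximize Eq. (11) subject to 0 <= alpha_i <= C (Eq. (12)) and
    sum_i alpha_i = 1 (Eq. (13)). Equivalently, minimize the negated
    objective alpha^T K alpha - diag(K) . alpha with K_ij = x_i^T Q x_j."""
    K = X @ Q @ X.T
    N = len(X)
    obj = lambda a: a @ K @ a - a @ np.diag(K)
    cons = ({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},)
    a0 = np.full(N, 1.0 / N)              # feasible uniform start
    res = minimize(obj, a0, bounds=[(0.0, C)] * N, constraints=cons)
    return res.x
```

For two samples with the Euclidean metric, the optimum places equal weight on both, so the center (9) is their midpoint.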

A test sample $z \in \mathbb{R}^n$ is classified as the target class when the relevant distance $\|z - a\|_Q$ between the sample $z$ and the center $a$ of the hypersphere is no larger than the radius $R$, i.e.,

$$\|z - a\|_Q^2 = (z - a)^T Q (z - a) \le R^2 \tag{14}$$

The radius $R$ can be calculated as the distance from the center $a$ of the hypersphere to any sample on the hypersphere boundary. Mathematically, the radius $R$ is given as

$$R^2 = (x_i - a)^T Q (x_i - a) \tag{15}$$

where $x_i$ is a sample from the set of support vectors, i.e., its Lagrange multiplier satisfies $0 < \alpha_i < C$.
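The decision rule (14)-(15) can then be sketched as follows, assuming the multipliers $\alpha_i$ from the dual are available; the function name and tolerance are our own:

```python
import numpy as np

def dsvdd_predict(z, X, alpha, Q, C=1.0, tol=1e-6):
    """Accept z as target iff (z-a)^T Q (z-a) <= R^2 (Eq. (14)), with
    a = sum_i alpha_i x_i (Eq. (9)) and R^2 computed from any support
    vector strictly inside the box, 0 < alpha_i < C (Eq. (15))."""
    a = alpha @ X                                  # center of the hypersphere
    dist2 = lambda x: (x - a) @ Q @ (x - a)        # squared relevant distance
    on_bound = (alpha > tol) & (alpha < C - tol)   # boundary support vectors
    R2 = dist2(X[np.flatnonzero(on_bound)[0]])
    return dist2(z) <= R2
```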

III. EXPERIMENTS

In order to validate the effectiveness of the proposed DSVDD, we compare DSVDD with the original SVDD on both synthetic and UCI data sets [2]. Both DSVDD and SVDD adopt the linear kernel $k(x_i, x_j) = x_i^T x_j$, the polynomial kernel (Poly) $k(x_i, x_j) = (x_i^T x_j + 1)^p$, and the radial


Fig. 1. The classification boundaries of the SVDD and the proposed DSVDD with D = 2, 4, 50, respectively. Sub-figure (a) corresponds to SVDD with the classification result e = [0.1500, 0.0500], f = [0.9444, 0.8500]; sub-figure (b) corresponds to DSVDD with D = 2 and the classification result e = [0.0300, 0.0500], f = [0.9510, 0.9700]; sub-figure (c) corresponds to DSVDD with D = 4 and the classification result e = [0.0300, 0.0500], f = [0.9510, 0.9700]; sub-figure (d) corresponds to DSVDD with D = 50 and the classification result e = [0.0200, 0.0600], f = [0.9423, 0.9800].

TABLE I
THE AVERAGE AUC VALUES AND THEIR CORRESPONDING STANDARD DEVIATIONS OF TEN INDEPENDENT RUNS FOR TAE, WATER AND SONAR. THE LARGER THE VALUE OF THE AUC, THE BETTER THE PERFORMANCE OF THE CORRESPONDING ONE-CLASS CLASSIFIER.

| Class No. | SVDD Linear | SVDD Poly | SVDD RBF | DSVDD Linear | DSVDD Poly | DSVDD RBF |
|---|---|---|---|---|---|---|
| **TAE** | | | | | | |
| 1 | 0.61±0.17 | 0.60±0.17 | 0.69±0.20 | 0.73±0.14 | 0.67±0.16 | **0.83±0.11** |
| 2 | 0.45±0.19 | 0.47±0.17 | **0.54±0.14** | 0.48±0.19 | 0.50±0.17 | 0.53±0.14 |
| 3 | 0.47±0.17 | 0.43±0.17 | 0.55±0.15 | 0.62±0.15 | 0.51±0.19 | **0.93±0.10** |
| Total | 0.5100 | 0.5000 | 0.5933 | 0.6134 | 0.5575 | **0.7641** |
| **WATER** | | | | | | |
| 1 | 0.52±0.29 | 0.63±0.34 | 0.88±0.11 | 0.74±0.19 | 0.74±0.24 | **0.91±0.13** |
| 2 | 0.81±0.16 | 0.65±0.27 | 0.89±0.07 | 0.91±0.10 | 0.68±0.19 | **0.96±0.06** |
| Total | 0.6650 | 0.6400 | 0.8850 | 0.8233 | 0.7102 | **0.9357** |
| **SONAR** | | | | | | |
| 1 | 0.53±0.17 | 0.61±0.12 | 0.63±0.18 | 0.64±0.20 | 0.63±0.19 | **0.7?±0.19** |
| 2 | 0.50±0.25 | 0.50±0.19 | 0.61±0.22 | 0.68±0.19 | 0.76±0.18 | **0.80±0.16** |
| Total | 0.5180 | 0.5549 | 0.6202 | 0.6588 | 0.6962 | **0.7622** |


basis kernel (RBF) $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)$. All computations were run on a Pentium IV 2.10-GHz processor running Windows XP Professional in the MATLAB environment.

First, we implement the experiments on synthetic data. For the one-class classification problem here, we adopt the vectors $e, f \in \mathbb{R}^2$ to measure the performance of the one-class classifier, where $e(1)$ gives the False Negative (FN) rate (the error on the target class), $e(2)$ gives the False Positive (FP) rate (the error on the outlier class), $f(1)$ gives the ratio between the number of correct target predictions and the number of target predictions, and $f(2)$ gives the ratio between the number of correct target predictions and the number of target samples.
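Under a 1 = target / 0 = outlier label convention (our assumption for this sketch), the measures $e$ and $f$ reduce to the four confusion counts:

```python
def ef_measures(y_true, y_pred):
    """e = [FN rate, FP rate]; f = [precision on the target class,
    recall on the target class], as described in the text."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    e = [fn / (tp + fn), fp / (fp + tn)]   # errors on target / outlier class
    f = [tp / (tp + fp), tp / (tp + fn)]   # precision / recall on targets
    return e, f
```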

The synthetic data used here form a two-dimensional two-class data set, where the target class is generated as a banana-shaped distribution with 100 samples and the outlier class is generated from a normal distribution with mean 1 and standard deviation sqrt(1.5). The target data are uniformly distributed along the banana and are superimposed with a normal distribution. Figure 1 gives the classification boundaries of the SVDD and the proposed DSVDD with the number of chunklets D = 2, 4, 50, respectively. From Figure 1, we can find that 1) the DSVDD has a significant advantage over SVDD in terms of FN; 2) the performance of the DSVDD is not sensitive to the parameter D.

Further, we also report the experimental results of the proposed DSVDD and SVDD on the real data sets TAE (3 classes/151 samples/5 features) [2], WATER (2 classes/116 samples/38 features) [2], and SONAR (2 classes/208 samples/60 features), which is available at ftp://ftp.cs.cmu.edu/afs/cs/project/connect/bench/. The number D of chunklets in each classification problem is set to the number of classes. Here, we adopt the average value of the Area Under the Receiver Operating Characteristic Curve (AUC) as the measure criterion for the performance of one-class classifiers [4]. It is known that a good one-class classifier should have a small FP rate and a high True Positive (TP) rate [7], [8], [9]. Thus, we prefer a classifier with higher AUC to one with lower AUC: for a specific FP threshold, the TP is higher for the first classifier than for the second. Hence, the larger the value of the AUC, the better the corresponding one-class classifier. In our experiments, the value of the AUC belongs to the range [0, 1]. Table I gives the average AUC values and their corresponding

standard deviations of the proposed DSVDD and SVDD over ten independent runs on these data sets. The best AUC values are denoted in bold. Both DSVDD and SVDD adopt the linear, polynomial, and radial basis kernels. The label of the target data class is indicated in the first column. In each classification, we take one class as the target class and the other classes as the outlier data. From this table, it can be found that the proposed DSVDD achieves significantly superior classification to SVDD in all the tested cases. The results validate that the discriminant prior knowledge is effectively induced into the proposed DSVDD framework.
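As a reference for the evaluation protocol, AUC can be computed by its rank statistic: the fraction of (target, outlier) pairs ranked correctly, with ties counting one half. A minimal sketch, not the authors' implementation:

```python
def auc_score(y_true, scores):
    """Rank-based AUC: probability that a randomly drawn target sample
    scores higher than a randomly drawn outlier (ties count 1/2).
    y_true: 1 = target, 0 = outlier; scores: larger = more target-like."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For DSVDD, a natural score is the negated relevant distance to the hypersphere center, so that samples deeper inside the sphere rank as more target-like.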

IV. CONCLUSION AND FUTURE WORK

In this paper, we propose a novel Discriminant SVDD named DSVDD. DSVDD adopts relevant metric learning instead of the original Euclidean distance metric learning. In doing so, the proposed DSVDD assigns large weights to the relevant features and tightens the similar data by incorporating both the positive and negative equivalence prior knowledge. The experimental results validate that the proposed DSVDD significantly improves the effectiveness of the one-class classifier. In the future, we plan to extend our work to large-scale classification cases and make further exploration.

REFERENCES

[1] F. Alizadeh and D. Goldfarb, "Second-order cone programming," Mathematical Programming, vol. 95, pp. 3-51, 2003.

[2] A. Asuncion and D. Newman, UCI Machine Learning Repository [http://www.ics.uci.edu/mlearn/MLRepository.html], School of Information and Computer Science, University of California, Irvine, CA, 2007.

[3] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall, "Learning distance functions using equivalence relations," Proceedings of the International Conference on Machine Learning, 2003.

[4] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, 1997.

[5] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov, "Neighbourhood components analysis," Advances in Neural Information Processing Systems, 2005.

[6] N. Shental, T. Hertz, D. Weinshall, and M. Pavel, "Adjustment learning and relevant component analysis," Proceedings of the European Conference on Computer Vision, 2002.

[7] D. Tax and R. P. W. Duin, "Support vector domain description," Pattern Recognition Letters, vol. 20, no. 14, pp. 1191-1199, 1999.

[8] D. Tax and R. P. W. Duin, "Support vector data description," Machine Learning, vol. 54, pp. 45-66, 2004.

[9] D. Tax and P. Juszczak, "Kernel whitening for one-class classification," International Journal of Pattern Recognition and Artificial Intelligence, vol. 17, no. 3, pp. 333-347, 2003.

[10] D. Yeung and H. Chang, "Extending the relevant component analysis algorithm for metric learning using both positive and negative equivalence constraints," Pattern Recognition, vol. 39, no. 5, pp. 1007-1010, 2006.
