
Third International Workshop on Advanced Computational Intelligence, August 25-27, 2010 - Suzhou, Jiangsu, China

    Discriminant Support Vector Data Description

    Zhe Wang and Daqi Gao

Abstract - Support Vector Data Description (SVDD) was designed to construct a minimum hypersphere that encloses all the data of the target class in the one-class classification case. In this paper, we propose a novel Discriminant Support Vector Data Description (DSVDD). The proposed DSVDD adopts relevant metric learning instead of the Euclidean distance metric used in the original SVDD, where the relevant metric can take the relationships between data into account. By incorporating both positive and negative equivalence information, the presented DSVDD assigns large weights to the relevant features and tightens similar data. More importantly, because the negative equivalence information is considered, discriminant prior knowledge is introduced into the proposed algorithm. The experiments show that the proposed DSVDD achieves more accurate classification performance than the conventional SVDD on all the tested data.

    I. INTRODUCTION

One-class classification is the setting in which, in general, only one certain class, named the target class, is available for training, which differs from multi-class classification. Support Vector Data Description (SVDD), one of the most popular one-class classifiers, was proposed by Tax and Duin [7], [8], [9]. In the SVDD model, a hypersphere is constructed so that it encloses as many target objects as possible while minimizing the chance of accepting the non-target data, named the outlier objects. It is well known that the original SVDD model adopts the Euclidean distance metric [7], [8], [9]. An important problem in learning algorithms based on the Euclidean distance metric, however, is the scale of the input variables. In the Euclidean case, SVDD treats all the features of the target class data as equivalent in training. As a result, irrelevant features of the data may be taken into account and mislead the data description of the SVDD model into an irrelevant hypersphere. At the same time, SVDD with the Euclidean distance metric has no ability to consider prior relationships among the target data.

Relevant metric learning was first developed as a form of Mahalanobis distance metric learning [3], [5], [6]. One of its representative algorithms is Relevant Component Analysis (RCA) [6]. RCA is an effective linear transformation for unsupervised learning: it constructs a Mahalanobis distance metric by using positive equivalence relationships.

Zhe Wang and Daqi Gao are with the Department of Computer Science & Engineering, East China University of Science & Technology, Shanghai, China (email: {wangzhe, gaodaqi}@ecust.edu.cn).

This work was supported by the Natural Science Foundation of China under Grant No. 60903091, the High-Tech Development Program of China (863) under Grant No. 2006AA10Z315, and the Specialized Research Fund for the Doctoral Program of Higher Education under Grant No. 20090074120003. This work was also supported by the Open Projects Program of the National Laboratory of Pattern Recognition and the Fundamental Research Funds for the Central Universities.


The positive equivalence relationship is obtained from the covariance matrices of the positive equivalence data. In RCA, the positive equivalence data are selected from the same chunklet, where each chunklet is a set of data that come from the same class but carry no explicit class labels. Through a transformation based on a group of chunklets, RCA can assign large weights to relevant features and low weights to irrelevant features [6]. Unfortunately, RCA cannot make use of negative equivalence constraints or discriminant information. To this end, Yeung and Chang [10] extended RCA with both positive and negative equivalence relationships. Specifically, the extended RCA was achieved by designing the so-called within-chunklet and between-chunklet covariance matrices, so that both positive and negative equivalence constraints can be used.

In this paper, we introduce the extended RCA distance metric [10] rather than the original Euclidean distance metric into SVDD and thereby propose a novel Discriminant Support Vector Data Description (DSVDD). In doing so, the presented DSVDD inherits the advantages of the extended RCA. In practice, the proposed DSVDD can reduce the influence of the input variable scale thanks to the Mahalanobis distance metric taken from the extended RCA. At the same time, the proposed DSVDD can easily incorporate a priori discriminant knowledge, since it considers both the positive and negative equivalence data through the within-chunklet and between-chunklet covariance matrices. In order to validate the effectiveness of the proposed DSVDD algorithm, we report experimental results on both synthetic and real data sets. The experimental results show that the proposed DSVDD yields a more accurate description than the conventional SVDD for all the tested data.

The rest of this paper is organized as follows. Section II gives the structure of the proposed DSVDD. Section III experimentally shows that the proposed DSVDD can bring a more accurate description for all the tested target cases than the conventional SVDD. Following that, both conclusions and future work are given in Section IV.

II. DISCRIMINANT SUPPORT VECTOR DATA DESCRIPTION (DSVDD)

Suppose that there is a set of one-class training samples $\{x_i\}_{i=1}^{N} \subset \mathbb{R}^n$. SVDD seeks a hypersphere that contains all the samples $\{x_i\}_{i=1}^{N}$ while minimizing the volume of the hypersphere, through the following optimization formulation:

$$\min_{R,\,a,\,\xi} \; J = R^2 + C \sum_{i=1}^{N} \xi_i \qquad (1)$$

$$\text{subject to} \quad (x_i - a)^T M^{-1} (x_i - a) \le R^2 + \xi_i \qquad (2)$$

$$\xi_i \ge 0, \quad i = 1 \ldots N \qquad (3)$$

where the parameters $R \in \mathbb{R}$ and $a \in \mathbb{R}^n$ are the radius and the center of the optimized hypersphere, respectively; the regularization parameter $C \in \mathbb{R}$ gives the tradeoff between the volume of the hypersphere and the errors; and the $\xi_i \in \mathbb{R}$ are slack variables. Since SVDD adopts the Euclidean distance metric, the matrix $M \in \mathbb{R}^{n \times n}$ is the identity matrix, with all diagonal elements equal to 1 and the others 0.

It can be seen that SVDD views all the features of the samples as equivalent. In contrast, our proposed DSVDD framework assigns large weights to the relevant features and small weights to the irrelevant features by introducing relevant metric learning in place of the Euclidean metric. In the proposed DSVDD framework, we adopt the relevant metric learning defined in [10]. First, the whole sample set $\{x_i\}_{i=1}^{N}$ is divided into chunklets without replacement. Each chunklet is made up of data with a positive equivalence relationship: if $x_i$ and $x_j$ belong to the same chunklet, they share the same, but unknown, class label. Following [10], we give the so-called within-chunklet covariance matrix $M_1$ and between-chunklet covariance matrix $M_2$ as follows:

$$M_1 = \frac{1}{N} \sum_{d=1}^{D} \sum_{j=1}^{n_d} (x_{jd} - \bar{x}_d)(x_{jd} - \bar{x}_d)^T \qquad (4)$$

$$M_2 = \frac{1}{N(D-1)} \sum_{d=1}^{D} \sum_{p=1,\, p \ne d}^{D} \sum_{j=1}^{n_d} (x_{jd} - \bar{x}_p)(x_{jd} - \bar{x}_p)^T \qquad (5)$$

where $D$ is the number of chunklets, $n_d$ is the number of samples in the $d$th chunklet, and $\bar{x}_d$ is the mean of the $d$th chunklet. Since the sample set $\{x_i\}_{i=1}^{N}$ is divided into $D$ chunklets without replacement, $N = \sum_{d=1}^{D} n_d$. The matrix $M_1$ carries the positive equivalence information and the matrix $M_2$ carries the negative equivalence information.
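The following NumPy sketch (an illustration of equations (4) and (5), not code from the paper; the function name and data layout are our own) computes the within-chunklet and between-chunklet covariance matrices from a list of chunklets.

import numpy as np

def chunklet_covariances(chunklets):
    """M1 (within-chunklet) and M2 (between-chunklet) covariance matrices,
    following Eqs. (4) and (5). chunklets: list of (n_d, n_features) arrays."""
    D = len(chunklets)
    n = chunklets[0].shape[1]
    N = sum(len(c) for c in chunklets)
    means = [c.mean(axis=0) for c in chunklets]
    M1 = np.zeros((n, n))
    M2 = np.zeros((n, n))
    for d, c in enumerate(chunklets):
        diff = c - means[d]            # deviations from the chunklet's own mean
        M1 += diff.T @ diff
        for p in range(D):
            if p != d:
                diff_p = c - means[p]  # deviations from every other chunklet mean
                M2 += diff_p.T @ diff_p
    return M1 / N, M2 / (N * (D - 1))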

We define the following matrix:

$$Q = M_1^{-1/2} M_2 M_1^{-1/2} \qquad (6)$$

Then, by taking the above matrix $Q$ in place of the matrix $M^{-1}$ in equation (2), the objective function of the proposed DSVDD is obtained. To explore the proposed DSVDD further, the matrix $Q$ can be decomposed as $Q = (M_2^{1/2} M_1^{-1/2})^T (M_2^{1/2} M_1^{-1/2})$. Therefore, constraint (2) with $Q$ can be rewritten as

$$(x_i - a)^T Q (x_i - a) = (x_i - a)^T (M_2^{1/2} M_1^{-1/2})^T (M_2^{1/2} M_1^{-1/2}) (x_i - a) = \big[ M_2^{1/2} M_1^{-1/2} (x_i - a) \big]^T \big[ M_2^{1/2} M_1^{-1/2} (x_i - a) \big] \le R^2 + \xi_i, \quad i = 1 \ldots N$$


In this case, each $x_i$ can be viewed as being linearly transformed by $M_2^{1/2} M_1^{-1/2}$. Here $M_1$ and $M_2$ play roles similar to reducing the within-class scatter and increasing the between-class scatter in Fisher discriminant analysis, as also demonstrated in [10]. Since between-class information can thus be introduced, we call the proposed method Discriminant Support Vector Data Description (DSVDD).
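As a rough sketch of this transformation (illustrative only; it assumes $M_1$ is positive definite and relies on SciPy's fractional matrix power), one can form $Q = M_1^{-1/2} M_2 M_1^{-1/2}$ and the equivalent linear map $T = M_2^{1/2} M_1^{-1/2}$ as follows, so that $(x - a)^T Q (x - a) = \|T(x - a)\|^2$.

import numpy as np
from scipy.linalg import fractional_matrix_power

def relevant_metric(M1, M2):
    """Return Q = M1^{-1/2} M2 M1^{-1/2} and T = M2^{1/2} M1^{-1/2}."""
    M1_inv_sqrt = np.real(fractional_matrix_power(M1, -0.5))
    M2_sqrt = np.real(fractional_matrix_power(M2, 0.5))
    Q = M1_inv_sqrt @ M2 @ M1_inv_sqrt
    T = M2_sqrt @ M1_inv_sqrt
    return Q, T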

In order to optimize the parameters $R$, $a$, $\xi_i$, we construct the Lagrangian function by introducing Lagrange multipliers $\alpha_i$, $\gamma_i$ and substituting equations (2), (3), (6) into (1), which gives

$$L = R^2 + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \alpha_i \big[ R^2 + \xi_i - (x_i - a)^T Q (x_i - a) \big] - \sum_{i=1}^{N} \gamma_i \xi_i \qquad (7)$$

where $\alpha_i \ge 0$, $\gamma_i \ge 0$. Setting the partial derivatives of $L$ with respect to $R$, $a$, $\xi_i$ to 0, we get

$$\frac{\partial L}{\partial R} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \alpha_i = 1 \qquad (8)$$

$$\frac{\partial L}{\partial a} = 0 \;\Rightarrow\; a = \sum_{i=1}^{N} \alpha_i x_i \qquad (9)$$

$$\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \gamma_i = C - \alpha_i \qquad (10)$$

Further, substituting the constraints (8), (9), (10) back into the Lagrangian function (7), we obtain the following maximization criterion:

$$\max_{\alpha} \; L = \sum_{i=1}^{N} \alpha_i\, x_i^T Q\, x_i - \sum_{i,j=1}^{N} \alpha_i \alpha_j\, x_i^T Q\, x_j \qquad (11)$$

$$\text{s.t.} \quad \sum_{i=1}^{N} \alpha_i = 1, \qquad 0 \le \alpha_i \le C, \quad i = 1 \ldots N$$

$$Q = M_1^{-1/2} M_2 M_1^{-1/2} \qquad (12)$$

The maximization of equation (11) can be solved through Quadratic Programming (QP) [1].
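For concreteness, the dual (11) can be rewritten in the standard form $\min_{\alpha} \frac{1}{2}\alpha^T P \alpha + q^T \alpha$ expected by off-the-shelf QP solvers. The sketch below is only an illustration (the paper does not specify a solver); it assumes the CVXOPT package and hypothetical inputs X (the training samples), Q (from Eq. (12)) and C.

import numpy as np
from cvxopt import matrix, solvers

def solve_dsvdd_dual(X, Q, C):
    """Solve max_a sum_i a_i K_ii - a^T K a, s.t. sum(a) = 1, 0 <= a_i <= C,
    where K_ij = x_i^T Q x_j (Eq. (11))."""
    N = X.shape[0]
    K = X @ Q @ X.T                        # relevant-metric Gram matrix
    P = matrix(2.0 * K)                    # quadratic term: a^T K a
    q = matrix(-np.diag(K))                # linear term: -sum_i a_i K_ii
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))
    h = matrix(np.hstack([np.zeros(N), C * np.ones(N)]))
    A = matrix(np.ones((1, N)))
    b = matrix(np.ones(1))
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()      # the multipliers alpha_i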

A test sample $z \in \mathbb{R}^n$ is classified as the target class when the relevant distance $\|z - a\|_Q$ between the sample $z$ and the center $a$ of the hypersphere is smaller than or equal to the radius $R$, i.e.,

$$\|z - a\|_Q^2 = (z - a)^T Q (z - a) \le R^2 \qquad (14)$$

The radius $R$ can be calculated as the distance from the center $a$ of the hypersphere to any sample on the hypersphere boundary. Mathematically, the radius $R$ is given by

$$R^2 = (x_i - a)^T Q (x_i - a) \qquad (15)$$

where $x_i$ is a sample from the set of support vectors on the boundary, i.e., its Lagrange multiplier satisfies $0 < \alpha_i < C$.
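Putting the pieces together, a minimal prediction sketch (again illustrative, reusing the hypothetical helpers above) computes the center via Eq. (9), the squared radius via Eq. (15) from a boundary support vector, and applies the decision rule (14).

import numpy as np

def dsvdd_center_radius(X, Q, alpha, C, tol=1e-8):
    """Center a = sum_i alpha_i x_i (Eq. 9); R^2 from a support vector
    with 0 < alpha_i < C (Eq. 15)."""
    a = alpha @ X
    k = np.where((alpha > tol) & (alpha < C - tol))[0][0]
    d = X[k] - a
    return a, float(d @ Q @ d)

def dsvdd_accepts(z, a, Q, R2):
    """True if z is accepted as the target class (Eq. 14)."""
    d = z - a
    return float(d @ Q @ d) <= R2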

    III. EXPERIMENTS

In order to validate the effectiveness of the proposed DSVDD, we compare DSVDD with the original SVDD on both synthetic and UCI data sets [2]. Both DSVDD and SVDD adopt the linear kernel $k(x_i, x_j) = x_i^T x_j$, the polynomial kernel (Poly) $k(x_i, x_j) = (x_i^T x_j + 1)^p$, and the radial basis function kernel (RBF) $k(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)$.
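The three kernels can be written compactly; the sketch below is a plain NumPy illustration (the function names are ours, not from the paper).

import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def poly_kernel(xi, xj, p=2):
    return (xi @ xj + 1.0) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / sigma ** 2)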

Fig. 1. The classification boundaries of the SVDD and the proposed DSVDD with D = 2, 4, 50, respectively. Sub-figure (a) corresponds to SVDD with the classification result e = [0.1500, 0.0500], f = [0.9444, 0.8500]; sub-figure (b) corresponds to DSVDD with D = 2 and the classification result e = [0.0300, 0.0500], f = [0.9510, 0.9700]; sub-figure (c) corresponds to DSVDD with D = 4 and the classification result e = [0.0300, 0.0500], f = [0.9510, 0.9700]; sub-figure (d) corresponds to DSVDD with D = 50 and the classification result e = [0.0200, 0.0600], f = [0.9423, 0.9800].

TABLE I
THE AVERAGE AUC VALUES AND THEIR CORRESPONDING STANDARD DEVIATIONS OF TEN INDEPENDENT RUNS FOR TAE, WATER AND SONAR. THE LARGER THE VALUE OF THE AUC, THE BETTER THE PERFORMANCE OF THE CORRESPONDING ONE-CLASS CLASSIFIER.

Class No.        SVDD                                      DSVDD
             Linear       Poly         RBF           Linear       Poly         RBF
TAE
1            0.61±0.17    0.60±0.17    0.69±0.20     0.73±0.14    0.67±0.16    0.83±0.11
2            0.45±0.19    0.47±0.17    0.54±0.14     0.48±0.19    0.50±0.17    0.53±0.14
3            0.47±0.17    0.43±0.17    0.55±0.15     0.62±0.15    0.51±0.19    0.93±0.10
Total        0.5100       0.5000       0.5933        0.6134       0.5575       0.7641
WATER
1            0.52±0.29    0.63±0.34    0.88±0.11     0.74±0.19    0.74±0.24    0.91±0.13
2            0.81±0.16    0.65±0.27    0.89±0.07     0.91±0.10    0.68±0.19    0.96±0.06
Total        0.6650       0.6400       0.8850        0.8233       0.7102       0.9357
SONAR
1            0.53±0.17    0.61±0.12    0.63±0.18     0.64±0.20    0.63±0.19    0.72±0.19
2            0.50±0.25    0.50±0.19    0.61±0.22     0.68±0.19    0.76±0.18    0.80±0.16
Total        0.5180       0.5549       0.6202        0.6588       0.6962       0.7622


All computations were run on a Pentium IV 2.10-GHz processor running Windows XP Professional and the MATLAB environment.

First, we implement the experiments on synthetic data. For the one-class classification problem here, we adopt the vectors $e, f \in \mathbb{R}^2$ to measure the performance of the one-class classifier, where $e(1)$ gives the False Negative (FN) rate (the error on the target class), $e(2)$ gives the False Positive (FP) rate (the error on the outlier class), $f(1)$ gives the ratio between the number of correct target predictions and the number of target predictions, and $f(2)$ gives the ratio between the number of correct target predictions and the number of target samples.
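In other words, $f(1)$ is the precision and $f(2)$ the recall on the target class. A small illustrative helper (our own naming, not from the paper) that computes $e$ and $f$ from boolean labels and predictions might look as follows.

import numpy as np

def one_class_scores(is_target, predicted_target):
    """e = [FN rate, FP rate]; f = [precision, recall] on the target class."""
    is_target = np.asarray(is_target, dtype=bool)
    predicted_target = np.asarray(predicted_target, dtype=bool)
    fn_rate = np.mean(~predicted_target[is_target])   # rejected target samples
    fp_rate = np.mean(predicted_target[~is_target])   # accepted outlier samples
    tp = np.sum(predicted_target & is_target)
    precision = tp / max(np.sum(predicted_target), 1)
    recall = tp / max(np.sum(is_target), 1)
    return np.array([fn_rate, fp_rate]), np.array([precision, recall])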

The synthetic data consist of a two-dimensional two-class data set, where the target class is generated as a banana-shaped distribution with 100 samples and the outlier class is generated from a normal distribution with mean 1 and standard deviation sqrt(1.5). The target data are uniformly distributed along the banana shape and are superimposed with a normal distribution.
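Data of this flavor can be simulated, for example, as points spread along a circular arc with additive Gaussian noise; the sketch below is only a rough stand-in for the generator used in the paper, and all of its parameters (arc radius, noise level, sample counts) are our own choices.

import numpy as np

def make_banana_data(n_target=100, n_outlier=100, noise=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Target class: samples uniformly spread along an arc, plus Gaussian noise.
    theta = rng.uniform(0.0, np.pi, n_target)
    radius = 5.0
    target = np.column_stack([radius * np.cos(theta),
                              radius * np.sin(theta) - radius / 2.0])
    target += rng.normal(scale=noise, size=target.shape)
    # Outlier class: an isotropic normal distribution.
    outlier = rng.normal(loc=1.0, scale=np.sqrt(1.5), size=(n_outlier, 2))
    return target, outlier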

Figure 1 gives the classification boundaries of the SVDD and of the proposed DSVDD with D = 2, 4, 50, respectively. From Figure 1, we can find that 1) the DSVDD has a significant advantage over SVDD in terms of the FN rate, and 2) the performance of the DSVDD is not sensitive to the parameter D.

Further, we also report the experimental results of the proposed DSVDD and of SVDD on the real data sets TAE (3 classes / 151 samples / 5 features) [2], WATER (2 classes / 116 samples / 38 features) [2] and SONAR (2 classes / 208 samples / 60 features), which is available at ftp://ftp.cs.cmu.edu/afs/cs/project/connect/bench/. The size D of the chunklets in each classification problem is set to the size of the classes. Here, we adopt the average value of the Area Under the Receiver Operating Characteristic Curve (AUC) as the criterion for measuring the performance of one-class classifiers [4]. It is known that a good one-class classifier should have a small FP rate and a high True Positive (TP) rate [7], [8], [9]. Thus, we prefer a classifier with a higher AUC to one with a lower AUC: for a given FP threshold, the TP rate of the first classifier is higher than that of the second. Hence, the larger the value of the AUC, the better the corresponding one-class classifier. In our experiments, the value of the AUC lies in the range [0, 1].
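For reference, the AUC of a hypersphere-based one-class classifier can be computed from a continuous score such as $R^2 - \|z - a\|_Q^2$; the sketch below assumes scikit-learn is available and that the labels and squared relevant distances are supplied by the caller.

import numpy as np
from sklearn.metrics import roc_auc_score

def one_class_auc(is_target, distance_sq, R2):
    """AUC where samples closer to the center (in the relevant metric)
    are scored as more target-like."""
    scores = R2 - np.asarray(distance_sq)
    return roc_auc_score(np.asarray(is_target, dtype=int), scores)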

Table I gives the average AUC values and their corresponding standard deviations of the proposed DSVDD and of SVDD over ten independent runs for these data sets; for each data set, the best total AUC values are obtained by DSVDD. Both DSVDD and SVDD adopt the linear, polynomial and radial basis kernels. The label of the target data class is indicated in the first column; in each classification problem, we take one class as the target class and the other classes as the outlier data. From this table, it can be found that the proposed DSVDD achieves a significantly better classification performance than SVDD in all the tested cases. The results validate that discriminant prior knowledge is indeed introduced into the proposed DSVDD framework.

    IV. CONCLUSION AND FUTURE WORK

In this paper, we propose a novel Discriminant SVDD named DSVDD. DSVDD adopts relevant metric learning instead of the original Euclidean distance metric learning. In doing so, the proposed DSVDD assigns large weights to the relevant features and tightens the similar data by incorporating both positive and negative equivalence prior knowledge. The experimental results validate that the proposed DSVDD significantly improves the effectiveness of the one-class classifier. In the future, we plan to extend our work to large-scale classification cases and to make a further exploration.

    REFERENCES

[1] F. Alizadeh and D. Goldfarb, "Second-order cone programming," Mathematical Programming, vol. 95, pp. 3-51, 2003.

[2] A. Asuncion and D. Newman, UCI Machine Learning Repository [http://www.ics.uci.edu/mlearn/mlrepository.html], School of Information and Computer Science, University of California, Irvine, CA, 2007.

[3] A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall, "Learning distance functions using equivalence relations," Proceedings of the International Conference on Machine Learning, 2003.

[4] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, 1997.

[5] J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov, "Neighbourhood components analysis," Advances in Neural Information Processing Systems, 2005.

[6] N. Shental, T. Hertz, D. Weinshall, and M. Pavel, "Adjustment learning and relevant component analysis," Proceedings of the European Conference on Computer Vision, 2002.

[7] D. Tax and R. P. W. Duin, "Support vector domain description," Pattern Recognition Letters, vol. 20, no. 14, pp. 1191-1199, 1999.

[8] D. Tax and R. P. W. Duin, "Support vector data description," Machine Learning, vol. 54, pp. 45-66, 2004.

[9] D. Tax and P. Juszczak, "Kernel whitening for one-class classification," International Journal of Pattern Recognition and Artificial Intelligence, vol. 17, no. 3, pp. 333-347, 2003.

[10] D. Yeung and H. Chang, "Extending the relevant component analysis algorithm for metric learning using both positive and negative equivalence constraints," Pattern Recognition, vol. 39, no. 5, pp. 1007-1010, 2006.
