

Engineering Applications of Artificial Intelligence 22 (2009) 101–108

doi:10.1016/j.engappai.2008.05.012

Convex sets as prototypes for classifying patterns

Ichigaku Takigawa a,*, Mineichi Kudo b, Atsuyoshi Nakamura b

a Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
b Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido 060-0814, Japan

Article info

Article history:

Received 31 October 2007

Accepted 1 May 2008

Available online 30 November 2008

Keywords:

Pattern classification

Nonparametric method

Convex sets

Convex hulls

Minimum enclosing balls

Set covering

Boundary projection

0952-1976/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +81774 38 3024; fax: +81774 38 3037. E-mail address: [email protected] (I. Takigawa).

Abstract

We propose a general framework for nonparametric classification of multi-dimensional numerical patterns. Given training points for each class, it builds a set cover with convex sets, each of which contains some training points of the class but no points of the other classes. Each convex set thus has an associated class label, and a query point is classified to the class of the convex set for which the projection of the query point onto its boundary is minimal. In this sense, the convex sets of a class are regarded as "prototypes" for that class. We then apply this framework to two special types of convex sets, minimum enclosing balls and convex hulls, giving algorithms for constructing a set cover with them and for computing the projection length onto their boundaries. For convex hulls, we also give a method for implicitly evaluating whether a point is contained in a convex hull, which avoids the computational difficulty of explicitly constructing convex hulls in high-dimensional space.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

The goal of pattern classification is to create a computer program (a classifier) that can automatically and accurately classify incoming observed patterns into one of several predefined classes. Developing computational methods for automatic classification is becoming increasingly necessary, since the accelerated computerization of recent years produces vast amounts of real data to be classified automatically in many practical situations. The number of Internet users and the data traffic on the Internet are rapidly increasing, driving demand for automatic classification techniques both at the user level, to handle spam e-mails and examine data on the PC, and in industrial use, to analyze access logs, buying histories for online shopping, and financial data.

For automatic pattern classification, many methods have been proposed in the fields of machine learning and data mining (Bishop, 2006; Duda et al., 2001); they usually take a data-driven approach, partly incorporating model-driven inference using background knowledge of the problem. We follow the standard supervised setting in this paper: for the given problem, instances with correct class labels, the training samples, are collected beforehand and can be used for building the classifier. We consider patterns described by $d$ numerical measurements (features), which means that the patterns are given as points in the $d$-dimensional real space $\mathbb{R}^d$.


The quality of real data in recent practical situations is often unsatisfactory in a statistical sense. In many cases, the data are collected in a simple automatic manner and are not manually vetted. Hence, direct application of conventional statistical models to these data often results in insufficient performance. Standard parametric models are inconsistent with the nature of such data and for this reason often fail in these cases. Nonparametric methods, on the other hand, are fully data-driven and free from assumptions about the type of probability distribution of the data. The computational power of recent computers enables us to apply nonparametric methods to many practical problems, with successful results from methods such as the nearest neighbor method and the support vector machine, both of which were considered unrealistic several decades ago.

In this paper, we describe a novel framework for nonparametric pattern classification in $\mathbb{R}^d$. Observed data in practical cases are so heterogeneous that they often have no consistent nature shared across the entire dataset. Problems on such data should therefore be solved more locally, using only an appropriate subset of the data, as is also done in conventional mixture modeling (McLachlan and Peel, 2000). The observed patterns in $\mathbb{R}^d$ are mostly continuous and locally preserve similarity of their nature. Given a classification problem, we represent local regions that are likely to contain only points belonging to a single specific class by convex sets, such as the minimum enclosing balls or the convex hulls defined by those points. Covering all training points of each class with a collection of such convex sets gives a nonparametric classifier. Since each convex set has an associated class label, classification of a query point can be made to the class of the convex set such that the projection of the query point onto its boundary is minimal. In this sense, the convex sets of a class are regarded as "prototypes" for that class. After briefly presenting the general idea, we give practical algorithms for two special cases of convex sets: minimum enclosing balls and convex hulls.

We presented a preliminary version of this work in Takigawa et al. (2005), focusing on minimum enclosing balls as the convex sets. The idea originated from the special case of minimum enclosing axis-parallel boxes (Kudo et al., 1996; Takigawa et al., 2004). It is also related to logical analysis of data (LAD) with boxes (Alexe and Hammer, 2006; Eckstein et al., 2002), ball-based combinatorial classifiers (Cannon et al., 2002; Cannon and Cowen, 2004; Priebe et al., 2003; Marchette, 2004; DeVinney, 2003), conventional prototype-based methods such as nearest neighbors and learning vector quantization (Hastie et al., 2001), and classifiers based on explicit computational geometric structure such as the support vector machine (Schölkopf and Smola, 2001).

This paper includes an improved formulation, an interpretation of the general framework, and its application to the other important case of convex hulls with a practical algorithm, as well as the previously proposed algorithms for minimum enclosing balls. Recently, convex hulls spanned by the training points of each class have attracted interest in the machine learning field (Bennett and Bredensteiner, 2000; Zhou et al., 2004; Mavroforakis et al., 2006), since such convex hulls can be used to examine the separability of the given classes and are also strongly associated with the separating hyperplane of the support vector machine. The convex-hull case of our framework can thus also help bring these useful properties of convex hulls to practical classification problems.

2. Convex subclass method

Many conventional nonparametric classifiers in $\mathbb{R}^d$ use, whether explicitly or implicitly, some kind of geometric structure encoding a priori knowledge about the problem. For example, the support vector machine uses a hyperplane (or a halfspace), and the nearest neighbor method uses a Voronoi diagram. A hyperplane is a computational geometric structure that is determined more stably in high-dimensional spaces than classifiers estimated via complicated parametric models. Such structures are surprisingly powerful, as has been proved in many practical applications, and they can be widely used even for nonlinear problems through kernelization or modification of the metric (Schölkopf and Smola, 2001).

Focusing on this kind of geometric intuition, we consider representing the geometric base structure of each class by simple "convex sets". For example, we can use boxes, balls, convex hulls, ellipsoids, halfspaces, and cylinders that contain several points of a specific class but no points of the other classes. Fig. 1 shows the idea of our approach. In the left figure, the white points are covered with two convex sets, $C_1$ and $C_2$, and the black points are covered with one convex set, $C_3$. Each of $C_1$ and $C_2$ represents a kind of "prototype" for the white points, while $C_3$ represents one for the black points. Constructing these convex sets thus corresponds to the learning step. Once we have obtained the convex sets $C_1$–$C_3$, the class label of a query point is assigned using its projection onto the boundaries of $C_1$–$C_3$. In the left figure, for a query point "a" located outside all of the convex sets, the projections are shown as the lines from the point onto each boundary of $C_1$–$C_3$. The point "a" is assigned to the class of the white points, since the projection onto the boundary of $C_1$ is minimal among all. When a query point is located inside some convex set, as is the query point "b", we use the negative of the projection length. For example, the query point "b" is in $C_3$, so the projection onto the boundary of $C_3$ takes the negative of its length, whereas the lengths of the projections from "b" to $C_1$ and from "b" to $C_2$ are both positive. In this case, since only the projection onto the boundary of $C_3$ is negative, that projection is minimal among all, which means the point "b" is assigned to the class of the black points. We call this measure the directed boundary projection; it results in the decision boundary shown in the right figure.

Fig. 1. An illustrative example showing the idea of the convex subclass method.

These general prototypes and the directed boundary projections onto them not only give a stable geometric basis for nonparametric classification, but also naturally define multi-class classification. Although a hyperplane is the simplest way to distinguish two classes, it is unclear whether it extends naturally to three or more classes. Moreover, taking convex hulls as the convex sets, we can intuitively see that this framework gives a large-margin classifier similar to the support vector machine in linearly separable cases, and this property holds in a piecewise-linear manner even in linearly inseparable cases. In addition, as a special case, if we take each single training point in $\mathbb{R}^d$ as a convex set, the subclass method reduces to the 1-nearest-neighbor method.

2.1. Subclass covering for a target class

As a solid foundation for our framework, we give formal definitions of the related notions. Suppose that two finite point sets $O^+, O^- \subset \mathbb{R}^d$ are given as the positive and negative training samples for a target class, respectively. First of all, a convex set $C(X)$ covering a point set $X$ is defined as follows.¹

Definition 1 (Convex-set assignment). Let $C(X)$ be some type of convex set defined uniquely by a point set $X \subset \mathbb{R}^d$ and including $X$. More formally, the function $C$ maps a point set $X$ to a convex set $C(X)$ in $\mathbb{R}^d$ such that $X \subseteq C(X)$.

In typical cases, we use $C(X)$ with $X \subseteq O^+$, which contains some subset $X$ of the positive samples $O^+$ but no points of the negative samples $O^-$. A set $X$ covered by a "locally maximized" convex set $C(X)$, in the following sense, is called a subclass set.

Definition 2 (Subclass set). Given a convex-set assignment $C$, $O^+$, and $O^-$, a subclass set $X$ of $O^+$ against $O^-$ with respect to $C$ is a subset of the positive samples $O^+$, i.e., $X \subseteq O^+$, which satisfies the following condition:

$C(X) \cap O^- = \emptyset$ and $C(X \cup Y) \cap O^- \neq \emptyset$ for any nonempty $Y \subseteq O^+ \setminus X$.

Suppose that the set $C(X)$ contains some subset of the positive samples $O^+$. If the addition of another uncontained point $y \in O^+$ to the set $X$ still satisfies $C(X \cup \{y\}) \cap O^- = \emptyset$, i.e., if the convex set defined by the points $X \cup \{y\}$ does not contain any points of the negative samples $O^-$, then the set $X$ must be expanded to $X \cup \{y\}$. In other words, a set that is "locally maximized" by adding uncontained points, as long as the resultant convex set contains none of the negative samples, is a subclass set.

¹ In addition to the points $X$, we can also use all negative samples in order to define $C(X)$, but we do not consider this case here.

Given $O^+$ and $O^-$, we can often find multiple subclass sets satisfying the condition above. A collection of subclass sets whose union covers all positive samples is called a subclass cover of $O^+$ (against $O^-$).

Definition 3 (Subclass cover). A subclass cover $\mathcal{F}$ for $O^+$ is a collection of subclass sets of $O^+$ such that $O^+ \subseteq \bigcup_{X \in \mathcal{F}} C(X)$, i.e., the collection $\mathcal{F}$ gives a set cover for $O^+$. The largest subclass cover always exists and is unique.

Because of the local maximality of each subclass set, any subclass cover $\mathcal{F}$ satisfies the Sperner condition, $A \not\subseteq B$ for any distinct $A, B \in \mathcal{F}$: if $A \subseteq B$, the points $B \setminus A$ must be added to $A$, and therefore $A$ cannot be a subclass set. In this sense, any subclass cover is irredundant.

In many cases, however, we may find multiple subclass covers for given $O^+$ and $O^-$, and must choose one of them for application to classification, even though the largest one always exists uniquely due to the local maximality of subclass sets. Since the largest subclass cover includes all possible subclass sets, one possibility is to use it; its practical utility is, however, unclear, because the size of the collection can be large. We can hence consider several ways of selecting one subclass cover from among the possible candidates, involving the following computational problems:

(1) Find any one of the subclass covers.
(2) Find the largest subclass cover.
(3) Find any one of the smallest subclass covers.
(4) Find a smallest subclass cover whose element sizes are as balanced as possible.
(5) Enumerate all subclass covers and choose a desirable one according to some predefined criterion.

In this paper, we basically take the first approach, finding any one subclass cover by random search, although we also show a simple algorithm for finding the largest subclass cover, available for minimum enclosing balls. A randomized algorithm is computationally efficient for these problems but unstable for convex sets with many degrees of freedom, such as convex hulls. Later, we also propose an algorithm that stably generates a better subclass cover using the proximity between the samples.

2.2. Classification based on directional boundary projection

Given training samples, the learning algorithm of the subclass method constructs a subclass cover for each class. Once the subclass covers are obtained, the class label of a query point $p$ is given by the class label of the convex set for which the following directional boundary projection of $p$ onto the convex set is minimal.

Definition 4 (Directional boundary projection). Let $\partial C(X)$ denote the boundary of the convex set $C(X)$. Given a query point $p \in \mathbb{R}^d$ and a subclass cover $\mathcal{F}$, the length of the directional boundary projection $d(p, X)$ for each $X \in \mathcal{F}$ is given by

$$d(p, X) = -\mathrm{sgn}(p \in C(X)) \cdot \min_{q \in \partial C(X)} \|p - q\|,$$

where $\|\cdot\|$ denotes an arbitrary norm on $\mathbb{R}^d$, and $\mathrm{sgn}(A) = 1$ if $A$ is true and $\mathrm{sgn}(A) = -1$ otherwise. Note that $d(p, X)$ can take a negative value.
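To make the decision rule concrete, the following minimal Python sketch (ours, not from the paper) implements the assignment above, assuming a hypothetical routine proj(p, X) that returns the signed projection length $d(p, X)$ for the chosen type of convex set; concrete projections for balls and convex hulls are given in Sections 3.3 and 4.2.

import numpy as np

def classify(p, covers, proj):
    """Assign p to the class whose subclass cover attains the minimal
    directional boundary projection d(p, X) over its subclass sets.
    covers: one list of subclass sets per class; proj(p, X) -> d(p, X)."""
    scores = [min(proj(p, X) for X in cover) for cover in covers]
    return int(np.argmin(scores))  # index of the predicted class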

2.3. Relaxed subclass covers

In contrast to the support vector machine with hyperplanes, the proposed framework can always handle linearly inseparable problems without kernelization. It provides, however, "hard" discrimination that allows no training error, which may cause overfitting in noisy cases or in cases with heavily overlapping classes. It is thus worth tolerating a small error in the subclass sets, in a way similar to the soft-margin penalization (regularization) of support vector machines. This allows a subclass set to contain some negative samples to a limited extent. The relaxed condition depends on the type of $C(\cdot)$ or the type of problem; we give the relaxed subclass cover for balls later.

An alternative way to perform "soft" discrimination is to introduce adaptive sampling, such as boosting, into the internal steps of the randomized algorithm for building a subclass cover. This approach was studied in Takigawa et al. (2004) for minimum enclosing axis-parallel boxes. Since it is easily applicable to minimum enclosing balls and convex hulls, we do not discuss it further in this paper.

2.4. Monotonicity and degeneracy of convex-set assignment

As previously mentioned, we can consider various types of convex sets, such as boxes, balls, convex hulls, ellipsoids, halfspaces, and cylinders. We here address two issues common to subclass covers with these types of sets: monotonicity and degeneracy of the convex sets.

When a predefined function $C(\cdot)$ giving convex sets satisfies $X \subseteq Y \Rightarrow C(X) \subseteq C(Y)$, the assignment $C(\cdot)$ is said to be monotonic. For example, minimum enclosing axis-parallel boxes and convex hulls are monotonic, while minimum enclosing balls are nonmonotonic. When a convex-set assignment is nonmonotonic, incremental addition of uncontained points to the subclass set does not always provide "local maximization" in the strict sense. Fig. 2 shows examples of these situations for minimum enclosing balls. In Fig. 2(a), incremental addition of the white points always misses the dashed circle $C(\{1, 2, 3\})$ giving the subclass set $\{1, 2, 3\}$, because each of $C(\{1, 2\})$, $C(\{2, 3\})$, and $C(\{1, 3\})$ contains one black point. In Fig. 2(b), suppose that we added uncontained white points in the indicated numbered order. The minimum enclosing ball $C(\{1, 2, 3\})$ contains no black points, while the ball $C(\{1, 2\})$ contains one black point. Thus the order 1–3 gives the subset $X = \{1, 3\}$, but the corresponding ball $C(X)$ includes $\{1, 2, 3\}$. Since nonmonotonicity can thus cause possible subclass sets to be overlooked, a learning algorithm based on incremental addition must be regarded as an approximate solution in these cases, although the classification performance appears sufficiently good for minimum enclosing balls.

Fig. 2. Examples of the effect of non-monotonicity.


On the other hand, convex hulls are monotonic and do not cause these kinds of problems. However, a convex hull in $\mathbb{R}^d$ is degenerate when the number of contained points is less than $d + 1$, while minimum enclosing balls are always nondegenerate. A degenerate convex set cannot contain any other points in general position; it thus becomes a subclass set whenever any further addition of uncontained points fails to exclude all negative samples. Degenerate convex sets, except those consisting of a single point, are, however, mostly useless for classification purposes. Hence we do not include such sets in the subclass cover: we use only subclass sets whose size is at least $d + 1$ or exactly 1. With these degenerate cases taken care of, convex hulls, being monotonic, can provide a good classifier in practical use.

3. Subclass method based on minimum enclosing balls

As the first realization of our general framework, we here consider the subclass method based on balls. In this section, $C(X)$ denotes the minimum enclosing ball of a point set $X$. The minimum enclosing ball of a set consisting of a single point is defined as the ball centered at that point with radius 0.

3.1. Exact learning algorithm

First, we show a simple polynomial-time exact algorithm that enumerates all subclass sets in the largest subclass cover; it strictly computes the subclass cover satisfying the required condition even in nonmonotonic cases such as balls. It is based on the simple fact that a ball in $\mathbb{R}^d$ is determined by at most $d + 1$ points (Welzl, 1991). The algorithm is hence applicable whenever $C(X)$ is defined by at most $m$ points with $m$ independent of the size $|X|$, which includes minimum enclosing axis-parallel boxes.

Algorithm 1 (Exact algorithm).

(1) Set $\mathcal{H} := \{X \subseteq O^+ : |X| \le d + 1\}$.
(2) Remove from $\mathcal{H}$ every set $X$ such that $C(X)$ contains some negative samples.
(3) Set $\mathcal{F} := \{O^+ \cap C(X) : X \in \mathcal{H}\}$ and eliminate duplicates.
(4) If $A \in \mathcal{F}$ is included in another $B \in \mathcal{F}$, i.e., $A \subseteq B$, remove $A$.

Roughly speaking, this algorithm first enumerates all subsets of size at most $d + 1$ whose balls exclude all negative samples. We then obtain a subclass cover $\mathcal{F}$ as an irreducible collection with respect to the Sperner condition. Thus this algorithm gives the largest subclass cover, i.e., the resultant subclass cover includes all possible subclass sets.

The number of subsets of size at most $d + 1$ is polynomial with respect to the input size $|O^+|$ because $\binom{|O^+|}{m} = |O^+|(|O^+| - 1)\cdots(|O^+| - m + 1)/m!$. Assuming that the other steps require only polynomial time, Algorithm 1 is polynomial-time computable. This is one of its advantages over constrained class covers, which lead to an NP-hard problem (Cannon and Cowen, 2004; Cannon et al., 2002).
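The following Python sketch illustrates Algorithm 1 under stated assumptions: min_enclosing_ball(points) is an assumed helper returning (center, radius) (one possible implementation is sketched in Section 3.5), and the tolerance eps is ours.

import itertools
import numpy as np

def ball_contains_any(c, r, points, eps=1e-9):
    # True if the ball B(c, r) contains any of `points`.
    return np.any(np.linalg.norm(points - c, axis=1) <= r + eps)

def exact_subclass_cover(pos, neg, d):
    candidates = set()
    for m in range(1, d + 2):                       # subsets of size <= d+1
        for idx in itertools.combinations(range(len(pos)), m):
            c, r = min_enclosing_ball(pos[list(idx)])
            if not ball_contains_any(c, r, neg):    # ball excludes negatives
                inside = np.linalg.norm(pos - c, axis=1) <= r + 1e-9
                candidates.add(frozenset(np.flatnonzero(inside)))
    # keep only maximal sets (Sperner condition)
    return [A for A in candidates if not any(A < B for B in candidates)]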

3.2. Randomized incremental learning algorithm

For practical use of the framework, we still need to reduce the computational cost of finding a subclass cover, because Algorithm 1 remains computationally demanding even though it is polynomial-time computable.

Instead, we can use the original idea of incremental addition of uncontained points for building a subclass set. Because of the nonmonotonicity of minimum enclosing balls, this approach may miss some subclass sets and sometimes generates duplicated subclass sets in the obtained subclass cover (the operation $O^+ \cap C(X)$ in step (2)-(c) and step (3) of Algorithm 2 are required to resolve this problem). Nevertheless, subclass covers can be obtained quickly by this algorithm, which works well in most classification problems.

Algorithm 2 (Randomized incremental algorithm).

(1) Set $T := \emptyset$ for storing tested points and $\mathcal{F} := \emptyset$ for the output.
(2) Repeat the following until $O^+ \setminus T = \emptyset$:
(a) Select randomly $x \in O^+ \setminus T$ and set $X := \{x\}$.
(b) For all $x' \in O^+ \setminus \{x\}$ in random order, do the following sequentially: if the point set $X \cup \{x'\}$ can exclude $O^-$, then set $X := X \cup \{x'\}$.
(c) Set $\mathcal{F} := \mathcal{F} \cup \{O^+ \cap C(X)\}$ and $T := T \cup (O^+ \cap C(X))$.
(3) Remove duplicated subclass sets in $\mathcal{F}$, if any.
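A sketch of Algorithm 2 in Python, with two assumed helpers (names ours): excludes(X), true when $C(X)$ contains no negative samples, and covered(X), returning the indices of the positive samples inside $C(X)$, i.e., $O^+ \cap C(X)$.

import numpy as np

def randomized_cover(n_pos, excludes, covered, rng=np.random.default_rng()):
    tested, cover = set(), []
    while len(tested) < n_pos:
        x = int(rng.choice([i for i in range(n_pos) if i not in tested]))
        X = {x}
        for y in rng.permutation([i for i in range(n_pos) if i != x]):
            if excludes(X | {int(y)}):              # incremental addition
                X |= {int(y)}
        S = frozenset(covered(X))                   # O+ ∩ C(X)
        if S not in map(frozenset, cover):          # drop duplicates
            cover.append(set(S))
        tested |= S
    return cover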

3.3. Directional boundary projection onto balls

After learning a subclass cover for each class, classification of a query point is based on the directional boundary projection defined in Definition 4. We describe this projection for minimum enclosing balls. The length of the directed boundary projection of a query point $p$ onto the ball $B(c, r)$ centered at $c$ with radius $r$ is defined by

$$d(p, B(c, r)) := \|p - c\|_2 - r$$

when the 2-norm is used for $\|\cdot\|$. If the point $p$ is in $B(c, r)$, the value of $d(p, B(c, r))$ is negative.

Suppose that the training samples are assigned to one of $m$ classes. The learning algorithm first builds a subclass cover for each class, so that $m$ subclass covers $\mathcal{F}_1, \ldots, \mathcal{F}_m$ are obtained after the learning step. The class label of a query point $p$ is then assigned by

$$\arg\min_{i \in \{1, 2, \ldots, m\}} \; \min_{X \in \mathcal{F}_i} \; d(p, C(X))$$

using the obtained subclass covers $\mathcal{F}_1, \ldots, \mathcal{F}_m$. This allows no training error when the exclusion of negative samples is perfect. Under this quasi-distance, the subclass balls play the role of "prototypes" for the corresponding class.
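For balls, the projection is a one-liner; the sketch below (helper name ours) plugs into the classify routine of Section 2.2 when each subclass set is represented by its (center, radius) pair, e.g., classify(p, covers, lambda p, ball: ball_projection(p, *ball)).

import numpy as np

def ball_projection(p, c, r):
    # Directed boundary projection onto B(c, r): negative when p is inside.
    return np.linalg.norm(np.asarray(p) - np.asarray(c)) - r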

3.4. Relaxed subclass cover for balls

As already mentioned in Section 2.3, perfect exclusion of negative samples often yields overfitting in practical problems. Thus we often need a relaxed version of the exclusion condition that tolerates a small training error. We give the relaxed subclass cover for balls as follows.

For both Algorithms 1 and 2, the exclusion condition $C(X) \cap O^- = \emptyset$ can be relaxed to some extent. In the soft-classification version, for a given parameter $\xi$, the statement "$B(c, r)$ contains no negative samples" is relaxed to the following condition:

$$r = 0 \quad \text{or} \quad \sum_{y \in O^-} \max\!\left(0,\; 1 - \frac{\|y - c\|_2}{r}\right) \le \xi.$$

The summation on the left-hand side of the second condition is the accumulated amount of violation, each term of which is measured by the projection length onto the boundary of the ball from a violating inner point $y \in O^-$.

From the definition, when $\xi = 0$ this coincides with perfect exclusion of the negative samples (i.e., the hard-discrimination version). In addition, we consider a second condition: for a given parameter $\delta$,

$$\delta > \frac{\#\{\text{contained negative samples}\}}{\#\{\text{all negative samples}\}}.$$

This additional condition is sometimes needed to avoid excessive incorporation of negative samples into a subclass ball when the relaxed condition is used. When the numbers of training samples of the classes are unbalanced, or when the variances of the class distributions are very different, this second condition helps to avoid over-relaxation.
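Both relaxed conditions combine into a simple test. A sketch under our naming assumptions, where xi and delta are the parameters $\xi$ and $\delta$ above:

import numpy as np

def relaxed_excludes(c, r, neg, xi, delta):
    """Relaxed exclusion test for the ball B(c, r) against negatives `neg`:
    the accumulated violation must not exceed xi, and the fraction of
    contained negatives must stay below delta."""
    if r == 0:
        return True
    dist = np.linalg.norm(neg - c, axis=1)
    violation = np.maximum(0.0, 1.0 - dist / r).sum()
    return violation <= xi and np.mean(dist <= r) < delta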

3.5. Minimum enclosing ball computation

For implementing Algorithms 1 and 2, an efficient method for computing the minimum enclosing ball of a given point set is required. Computation of the minimum enclosing ball has a long history (Welzl, 1991) and many algorithms have been developed; recently, computation in high-dimensional spaces and for large-scale problems has also been studied. Our implementation is based on the simple algorithm of Gärtner and Schönherr (1999), which works efficiently for $d < 30$. For higher-dimensional problems, we can alternatively use the computational geometric method of Fischer et al. (2003), or the aggregation-function and second-order cone programming-based methods of Zhou et al. (2005).
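The paper itself does not list a ball routine; as a self-contained stand-in for the min_enclosing_ball helper assumed in the sketch of Algorithm 1, the simple Bădoiu–Clarkson core-set iteration below returns an approximate minimum enclosing ball (an exact solver, e.g., Welzl's algorithm, can be substituted in low dimensions).

import numpy as np

def min_enclosing_ball(points, n_iter=1000):
    # Badoiu-Clarkson iteration: repeatedly step toward the farthest point.
    P = np.asarray(points, dtype=float)
    c = P[0].copy()
    for k in range(1, n_iter + 1):
        far = P[np.argmax(np.linalg.norm(P - c, axis=1))]
        c += (far - c) / (k + 1)
    return c, float(np.max(np.linalg.norm(P - c, axis=1)))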

Fig. 3. (a) An example dataset consisting of two classes, (b) the subclass covers of each class obtained by Algorithm 2, including "spiky" convex hulls, (c) the minimum spanning trees of each class for the complete proximity graphs on the positive samples, and (d) the subclass covers of each class obtained by Algorithm 3 using the trees in (c).

4. Subclass method based on convex hulls

As the second and more important case, we here describe the subclass method based on convex hulls. Suppose that the positive samples $O^+$ and the negative samples $O^-$ are given for training. For an $N$-point set $X := \{x_1, x_2, \ldots, x_N\} \subset \mathbb{R}^M$, we identify the set $X$ with the $M \times N$ matrix whose $i$-th column vector is $x_i$. For an $M$-tuple vector $z = (z_1, z_2, \ldots, z_M) \in \mathbb{R}^M$, we write $z \ge 0$ to mean $z_i \ge 0$ for all $i = 1, 2, \ldots, M$. The notation $\cdot^\top$ indicates the transpose of a matrix or vector, and the symbol $\mathbf{1}$ denotes a vector of ones.

Let us consider convex hulls of subsets $X \subseteq O^+$. We denote the convex hull of a point set $X$ by $\mathrm{conv}(X)$, defined as follows:

$$\mathrm{conv}(X) := \{X\lambda \mid \lambda \ge 0,\; \mathbf{1}^\top\lambda = 1\}.$$

In other words, any $y \in \mathrm{conv}(X)$ can be represented as a convex combination of the points $x \in X$, that is, $y = \sum_{i=1}^{N} \lambda_i x_i$ for $X = \{x_1, x_2, \ldots, x_N\}$.

4.1. Learning algorithm

Explicit construction of the convex hull $\mathrm{conv}(X)$ of a given point set $X$ in a high-dimensional space, i.e., determining all facets of $\mathrm{conv}(X)$, is known to be costly in computational geometry. Our problem of finding a subclass cover, however, does not require the convex hulls in explicit form: all the learning algorithm needs is a method for deciding whether a query point $p \in \mathbb{R}^d$ lies in $\mathrm{conv}(X)$ or not. This change of view makes the learning of subclass sets with convex hulls efficiently solvable even in high-dimensional spaces. From the above definition of a convex hull, we first observe the following fact:

$$p \in \mathrm{conv}(X) \iff \{\lambda \mid p = X\lambda,\; \lambda \ge 0,\; \mathbf{1}^\top\lambda = 1\} \neq \emptyset.$$

This problem is exactly the same as finding an initial feasible solution in linear programming, also known as the Phase I problem of the simplex method: decide whether any feasible solution $\lambda$ exists satisfying the linear constraints $p = X\lambda$, $\mathbf{1}^\top\lambda = 1$, $\lambda \ge 0$. It can be solved using the following linear program:

$$\min_{\lambda \ge 0,\; y \ge 0} \left\{ \mathbf{1}^\top y \;\middle|\; \begin{pmatrix} Dp \\ 1 \end{pmatrix} = \begin{pmatrix} DX \\ \mathbf{1}^\top \end{pmatrix} \lambda + y \right\} \qquad (1)$$

starting from the trivial feasible solution $\lambda = 0$ and $y = \binom{Dp}{1}$, where $D$ is the diagonal matrix whose $(i, i)$ element is $\mathrm{sgn}(p_i)$, with $\mathrm{sgn}(p_i) = 1$ if $p_i \ge 0$ and $\mathrm{sgn}(p_i) = -1$ otherwise; $D$ simply transforms the constraint $X\lambda = p$ into the standard form $DX\lambda = Dp$ with $Dp \ge 0$. If the optimum of (1) is nonzero, then we can conclude that $\{\lambda \mid p = X\lambda,\; \lambda \ge 0,\; \mathbf{1}^\top\lambda = 1\} = \emptyset$, which directly means $p \notin \mathrm{conv}(X)$.

Since "a point set $X \cup \{x\}$ can exclude $O^-$" in Algorithm 2 corresponds to $\mathrm{conv}(X \cup \{x\}) \cap O^- = \emptyset$, which in turn is equivalent to $p \notin \mathrm{conv}(X \cup \{x\})$ for all $p \in O^-$, we can also use Algorithm 2 for learning a subclass cover with convex hulls, solving the linear program (1) for each $p \in O^-$.

When we tried this approach on several datasets using Algorithm 2, "spiky" convex hulls were often observed, owing to the fully randomized order in which Algorithm 2 adds positive points. For example, for the data in Fig. 3(a), the subclass cover obtained by Algorithm 2 contains spiky convex hulls, as seen in Fig. 3(b). Since these spiky convex hulls are usually unnecessary for good classification performance, we propose another technique to avoid them. First, as shown in Fig. 3(c), we build the minimum spanning tree of the complete proximity graph on $O^+$, i.e., the complete graph on the node set $O^+$ whose edge weight between nodes $p, q \in O^+$ is the 2-norm distance $\|p - q\|_2$. The point $x \in O^+ \setminus T$ in step (2)-(a) of Algorithm 2 is located at some node of this tree, and we can traverse all the other nodes $x' \in O^+ \setminus \{x\}$ in step (2)-(b) by breadth-first search from node $x$. This order of point addition reflects the closeness of the points: roughly speaking, the procedure adds points $x'$ in order of closeness to the starting node $x$, and hence removes "spiky" convex hulls, as shown in Fig. 3(d). The algorithm for finding a subclass cover is summarized in the following pseudocode.

Algorithm 3 (Tree-order incremental algorithm).

(1) Set $T := \emptyset$ for storing tested points and $\mathcal{F} := \emptyset$ for the output.
(2) Build the minimum spanning tree of the complete proximity graph on $O^+$.
(3) Repeat the following until $O^+ \setminus T = \emptyset$:
(a) Select randomly $x \in O^+ \setminus T$ and set $X := \{x\}$.
(b) Traverse the nodes $x' \in O^+ \setminus \{x\}$ of the minimum spanning tree from node $x$ in breadth-first manner: if $y \notin \mathrm{conv}(X \cup \{x'\})$ for all $y \in O^-$, then set $X := X \cup \{x'\}$.


(c) Set $X := \{x\}$ if $|X| < d + 1$.
(d) Set $\mathcal{F} := \mathcal{F} \cup \{X\}$ and $T := T \cup X$.

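A Python sketch of Algorithm 3, reusing the assumed predicate excludes(X) (true when conv(X) contains no negative samples, e.g., built on in_hull above); scipy provides both the minimum spanning tree and the breadth-first ordering.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree, breadth_first_order

def tree_order_cover(pos, excludes, rng=np.random.default_rng()):
    n, d = pos.shape
    mst = minimum_spanning_tree(cdist(pos, pos))  # complete proximity graph
    tested, cover = set(), []
    while len(tested) < n:
        x = int(rng.choice([i for i in range(n) if i not in tested]))
        X = {x}
        order, _ = breadth_first_order(mst, x, directed=False)
        for y in order[1:]:                       # closest-first traversal
            if excludes(X | {int(y)}):
                X |= {int(y)}
        if len(X) < d + 1:                        # degenerate hull, step (c)
            X = {x}
        cover.append(X)
        tested |= X
    return cover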

4.2. Directional boundary projection onto convex hulls

For classification of a query point after learning a subclass cover with convex hulls, we need to compute the directional boundary projection defined in Definition 4. As mentioned at the beginning of this section, the convex hull of a point set $X$ is defined by

$$\mathrm{conv}(X) := \{X\lambda \mid \lambda \ge 0,\; \mathbf{1}^\top\lambda = 1\},$$

while the complement of $\mathrm{conv}(X)$ in $\mathbb{R}^d$ can be characterized as

$$\mathbb{R}^d \setminus \mathrm{conv}(X) = \{z \mid X^\top y + \alpha\mathbf{1} \ge 0,\; z^\top y + \alpha < 0 \text{ for some } y \text{ with } \|y\|^* = 1 \text{ and } \alpha \in \mathbb{R}\},$$

where $\|\cdot\|^*$ is the dual of an arbitrary norm $\|\cdot\|$ on $\mathbb{R}^d$ (Mangasarian, 1999). The directional boundary projection of a query point $p$ onto $\mathrm{conv}(X)$ is then given directly by the following formula:

$$d(p, X) = \begin{cases} \min_{q \in \mathrm{conv}(X)} \|p - q\| & \text{if } p \notin \mathrm{conv}(X), \\ -\min_{q \in \mathbb{R}^d \setminus \mathrm{conv}(X)} \|p - q\| & \text{if } p \in \mathrm{conv}(X). \end{cases} \qquad (2)$$

The projection of a point $p \notin \mathrm{conv}(X)$ is simply given by

$$\min_{\lambda \ge 0} \{\|p - X\lambda\| \mid \mathbf{1}^\top\lambda = 1\}, \qquad (3)$$

whereas the projection of a point inside a convex hull onto the boundary of the hull is NP-hard for the 2-norm and the $\infty$-norm but solvable by linear programming for the 1-norm (Freund and Orlin, 1985; Gritzmann and Klee, 1993; Briec, 1997).

We use the 1-norm for the directional boundary projection onto convex hulls, both from the outside and from the inside of a hull. The projection of a query point $p \notin \mathrm{conv}(X)$ onto $\partial\,\mathrm{conv}(X)$ (from the outside of the hull), shown in (3), is then given by the following linear program:

$$\min_{\lambda \ge 0,\; y \ge 0} \{\mathbf{1}^\top y \mid p - X\lambda \le y,\; p - X\lambda \ge -y,\; \mathbf{1}^\top\lambda = 1\} \qquad (4)$$


because the absolute value $|a|$ of a real value $a$ can be characterized as the minimum value $t$ satisfying both $-t \le a$ and $a \le t$. Similarly, the projection of a query point $p \in \mathrm{conv}(X)$ onto $\partial\,\mathrm{conv}(X)$ (from the inside of the hull) is given by the following linear program (Mangasarian, 1999; Tuenter, 2002):

$$\min_{i \in \{1, 2, \ldots, d\},\; s = \pm 1} \; \min_{-\mathbf{1} \le y \le \mathbf{1},\; \alpha \in \mathbb{R}} \{p^\top y + \alpha \mid X^\top y + \alpha\mathbf{1} \ge 0,\; y_i = s\}. \qquad (5)$$

To compute the value of (2) for a query point $p$, i.e., the directional boundary projection of $p$ onto $\mathrm{conv}(X)$, the value of (4) is first computed, without determining beforehand whether $p \in \mathrm{conv}(X)$. If the value of (4) is nonzero, it equals the value of (2). Only when the value of (4) is zero is the value of (5) computed, and the value of (2) then equals the negative of the value of (5).
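The outside projection (4) is directly expressible with scipy.optimize.linprog; the sketch below (ours) stacks the variables as $z = (\lambda, y)$ and returns the optimal value $\mathbf{1}^\top y$, which is zero exactly when $p \in \mathrm{conv}(X)$.

import numpy as np
from scipy.optimize import linprog

def hull_projection_outside(p, X):
    """1-norm projection length of p onto conv(X), i.e., LP (4).
    X is (M, N) with points as columns; returns 0 when p is in the hull."""
    M, N = X.shape
    c = np.concatenate([np.zeros(N), np.ones(M)])        # minimize 1ᵀy
    A_ub = np.block([[-X, -np.eye(M)],                   # p - Xλ <= y
                     [ X, -np.eye(M)]])                  # Xλ - p <= y
    b_ub = np.concatenate([-np.asarray(p, float), np.asarray(p, float)])
    A_eq = np.concatenate([np.ones(N), np.zeros(M)])[None, :]   # 1ᵀλ = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (N + M), method="highs")
    return float(res.fun)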

As a result, the class label (out of $m$ classes) of a query point $p$ is assigned by

$$\arg\min_{i \in \{1, 2, \ldots, m\}} \; \min_{X \in \mathcal{F}_i} \; d(p, X)$$

using formula (2) with the obtained subclass covers $\mathcal{F}_1, \ldots, \mathcal{F}_m$.

5. Numerical examples

Fig. 4 shows illustrative examples of subclass cover classification in the two-dimensional case, including, for comparison, the result of another ball-based set-cover algorithm, the class cover catch digraph (CCCD) method (Priebe et al., 2003). The subclass covers with balls shown here were built by Algorithm 1, and those with convex hulls by Algorithm 3. The classification by subclass covering with convex hulls was based on the 1-norm boundary projection. Since the CCCD method proposed a much simpler classification criterion in its original paper, we instead used the directional boundary projection onto the CCCD balls for assigning class labels to query points (results using the original classification procedure on the same dataset were shown in Takigawa et al. (2004)). CCCD balls are centered at one of the positive samples, while subclass balls are free from this restriction.

Fig. 4. Subclass covers and the corresponding decision boundaries (panels: subclass balls, relaxed balls, CCCD balls, and subclass hulls). Relaxed subclass covers with balls of $\xi = 1$, and class covers with the class cover catch digraph method (Priebe et al., 2003), are also shown.


Table 1
Correct classification rate (%) by 10-fold cross-validation (number of correctly classified samples in parentheses).

              Subclass method                          SVM (Gaussian kernel)                        k-NN
              Balls   Balls (ξ = 0.5)       Hulls    C = 1                   C = 100
                      δ = –     δ = 0.1              γ = 0.01   γ = 0.25     γ = 0.01   γ = 0.25    k = 1   k = 5

iris (150)    96.0    90.0      94.0       96.0     88.0       96.0         96.0       94.7        96.0    96.0
              (144)   (135)     (141)      (144)    (133)      (144)        (144)      (142)       (144)   (144)
glass (214)   72.0    63.1      65.0       67.3     50.5       68.7         67.3       61.9        70.1    65.9
              (154)   (135)     (139)      (144)    (108)      (147)        (144)      (136)       (150)   (141)
wine (178)    94.9    89.9      94.9       97.2     97.8       96.6         96.1       97.2        94.9    96.1
              (169)   (160)     (169)      (173)    (174)      (172)        (171)      (173)       (169)   (171)


These subclass sets are locally maximized as long as they contain no points of the other classes; hence, the decision boundaries are characterized as an equilibrium between the maximizations of the individual convex sets. We can see that the relaxed subclass cover is a smaller collection than the strict subclass cover. Moreover, since convex hulls have more degrees of freedom in shape than minimum enclosing balls, the number of subclass sets in their subclass cover is much smaller. These examples suggest the advantage of using computational geometric sets, and also imply that nonparametric learning via the convex subclass method can avoid the instability in parameter estimation and in random initialization that commonly appears in complicated parametric models for linearly inseparable problems.

For a more quantitative evaluation, we also performed 10-fold cross-validation on three higher-dimensional datasets from the UCI machine learning repository (Blake and Merz, 1998): iris (4 features, 3 classes, 150 samples), glass (9 features, 6 classes, 214 samples), and wine (13 features, 3 classes, 178 samples). This small benchmark set was selected because our implementation used an algorithm for minimum enclosing ball computation (Gärtner and Schönherr, 1999) appropriate for less than 30-dimensional data, and because we focused only on numerical patterns, although the subclass covering algorithm for convex hulls is scalable with respect to the number of features (the dimensionality of patterns). Table 1 presents five types of results: (1) the subclass method with balls, (2) the subclass method with relaxed balls, (3) the subclass method with convex hulls, (4) the support vector machine with Gaussian kernel $K(x, y) := \exp(-\gamma\|x - y\|^2)$ and regularization parameter $C$, and (5) the $k$-nearest neighbor method. For computing the subclass cover with balls, Algorithm 2 was used; for that with convex hulls, Algorithm 3 was used. The results of the subclass methods were comparable with those of conventional methods such as the support vector machine and the nearest neighbor method, and better in some cases.

The subclass method without relaxation has no parameters to tune when learning a subclass cover, so the result depends only on the nature of the given data. We may thus gauge how far the given problem is from an easily "separable" case, with the applied type of convex set, by the number of subclass sets generated in a subclass cover. It is difficult to reveal the nature of a high-dimensional dataset because direct visualization is impossible; this information, based on totally separable subsets, can therefore contribute to a better understanding of the nature of high-dimensional data to be classified into several classes. Our framework of the convex subclass method is hence also informative for this kind of exploratory data analysis in practical situations.

6. Conclusion

We proposed a novel nonparametric classification framework, the convex subclass method, with a general formulation. Within this framework, we gave two types of practical algorithms for learning a subclass cover, with minimum enclosing balls and with convex hulls, and also discussed the relaxation of the exclusion condition for "soft" discrimination and several problems arising in the use of nonmonotonic convex sets. For learning a subclass cover with convex hulls, we described a new technique using the minimum spanning tree of the complete proximity graph on the positive samples, which provides practically useful subclass covers. We also gave an implementation of the directional boundary projection onto convex hulls with the 1-norm via linear programming. The nonparametric nature of the subclass method gives stable results without parameter tuning, while providing another type of information, the number of generated subclass sets, for knowing how separable the given problem is, even for high-dimensional data. The evaluation on typical numerical datasets showed that the proposed method is comparable with conventional methods such as the support vector machine and the nearest neighbor method, with better performance in some cases. The convex hulls or minimum enclosing balls defined by the positive and negative samples have been of interest in the geometric understanding of the support vector machine (Bennett and Bredensteiner, 2000; Zhou et al., 2004; Mavroforakis et al., 2006; Vapnik, 1998; Devroye et al., 1996); the subclass method with convex hulls or with minimum enclosing balls can directly utilize these computational geometric structures and can handle linearly inseparable problems without kernelization, providing a different type of large-margin classifier. It can thus also offer a considerable advantage over kernelized linear methods such as the support vector machine.

Acknowledgments

This work is supported by the Grant-in-Aid for Young Scientists(B) 20700134, The Ministry of Education, Culture, Sports, Science,and Technology, Japan.

References

Alexe, G., Hammer, P.L., 2006. Spanned patterns for the logical analysis of data.Discrete Applied Mathematics 154 (7), 1039–1049.


Bennett, K.P., Bredensteiner, E.J., 2000. Duality and geometry in SVM classifiers. In: ICML'00: Proceedings of the 17th International Conference on Machine Learning, San Francisco, CA, USA. Morgan Kaufmann, Los Altos, CA, pp. 57–64.

Bishop, C.M., 2006. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, New York, Secaucus, NJ, USA.

Blake, C., Merz, C., 1998. UCI repository of machine learning databases.

Briec, W., 1997. Minimum distance to the complement of a convex set: Duality result. Journal of Optimization Theory and Applications 93 (2), 301–319.

Cannon, A.H., Cowen, L.J., 2004. Approximation algorithms for the class cover problem. Annals of Mathematics and Artificial Intelligence 40, 215–223.

Cannon, A.H., Ettinger, J.M., Hush, D., Scovel, C., 2002. Machine learning with data dependent hypothesis classes. Journal of Machine Learning Research 2, 335–358.

DeVinney, J.G., 2003. The class cover problem and its application in pattern recognition. Ph.D. Thesis, The Johns Hopkins University.

Devroye, L., Györfi, L., Lugosi, G., 1996. A Probabilistic Theory of Pattern Recognition. Springer, New York.

Duda, R.O., Hart, P.E., Stork, D.G., 2001. Pattern Classification, second ed. Wiley, New York.

Eckstein, J., Hammer, P.L., Liu, Y., Nediak, M., Simeone, B., 2002. The maximum box problem and its application to data analysis. Computational Optimization and Applications 23 (3), 285–298.

Fischer, K., Gärtner, B., Kutz, M., 2003. Fast smallest-enclosing-ball computation in high dimensions. In: Proceedings of the 11th Annual European Symposium on Algorithms (ESA).

Freund, R.M., Orlin, J.B., 1985. On the complexity of four polyhedral set containment problems. Mathematical Programming 33, 139–145.

Gärtner, B., Schönherr, S., 1999. Fast and robust smallest enclosing balls. In: Proceedings of the 7th Annual European Symposium on Algorithms (ESA). Lecture Notes in Computer Science, vol. 1643. Springer, Berlin, pp. 325–338.

Gritzmann, P., Klee, V., 1993. Computational complexity of inner and outer j-radii of polytopes in finite-dimensional normed spaces. Mathematical Programming 59, 163–213.

Hastie, T., Tibshirani, R., Friedman, J.H., 2001. Prototype methods and nearest-neighbors. In: The Elements of Statistical Learning. Springer, New York, pp. 411–435 (Chapter 13).

Kudo, M., Yanagi, S., Shimbo, M., 1996. Construction of class regions by a randomized algorithm: a randomized subclass method. Pattern Recognition 29, 581–588.

Mangasarian, O.L., 1999. Polyhedral boundary projection. SIAM Journal on Optimization 9 (4), 1128–1134.

Marchette, D.J., 2004. Random Graphs for Statistical Pattern Recognition. Wiley, New York.

Mavroforakis, M.E., Sdralis, M., Theodoridis, S., 2006. A novel SVM geometric algorithm based on reduced convex hulls. In: ICPR'06: Proceedings of the 18th International Conference on Pattern Recognition, Washington, DC, USA. IEEE Computer Society Press, Silver Spring, MD, pp. 564–568.

McLachlan, G.J., Peel, D., 2000. Finite Mixture Models. Wiley Series in Probability and Statistics. Wiley-Interscience, New York.

Priebe, C.E., Marchette, D.J., DeVinney, J.G., Socolinsky, D.A., 2003. Classification using class cover catch digraphs. Journal of Classification 20, 3–23.

Schölkopf, B., Smola, A.J., 2001. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA.

Takigawa, I., Abe, N., Shidara, Y., Kudo, M., 2004. The boosted/bagged subclass method. International Journal of Computing Anticipatory Systems 14, 311–320.

Takigawa, I., Kudo, M., Nakamura, A., 2005. The convex subclass method: combinatorial classifier based on a family of convex sets. In: Perner, P., Imiya, A. (Eds.), Machine Learning and Data Mining in Pattern Recognition (MLDM 2005). LNAI, vol. 3587. Springer, Berlin, Heidelberg, pp. 90–99.

Tuenter, H.J.H., 2002. Minimum L1-distance projection onto the boundary of a convex set: simple characterization. Journal of Optimization Theory and Applications 112 (2), 441–445.

Vapnik, V.N., 1998. Statistical Learning Theory. Wiley-Interscience, New York.

Welzl, E., 1991. Smallest enclosing disks (balls and ellipsoids). In: New Results and New Trends in Computer Science. Lecture Notes in Computer Science, vol. 555. Springer, Berlin, pp. 359–370.

Zhou, D., Xiao, B., Zhou, H., 2004. Global geometry of SVM classifiers. Technical Report 30-5-02, Institute of Automation, Chinese Academy of Sciences.

Zhou, G.L., Toh, K.C., Sun, J., 2005. Efficient algorithms for the smallest enclosing ball problem. Computational Optimization and Applications 30 (2), 147–160.