

2010 3rd International Conference on Biomedical Engineering and Informatics (BMEI 2010)

Soft Clustering with CP Matrices

Changqing Xu, Guanghui Xu
Department of Applied Mathematics, Zhejiang A&F University, Hangzhou 311300, China

Wasin So
Department of Mathematics, San Jose State University, San Jose, CA, USA

Abstract—Classical clustering methods such as PCA, ICA, SVM, or, more recently, NMF (nonnegative matrix factorization) and its extension NTF (nonnegative tensor factorization), rest critically on the assumption that the number of groups of data points is already known. In many cases, however, we have no idea whether any patterns exist, or how many there are to recognize. We introduce a novel clustering method based on CPF, the method of completely positive matrix factorization. An example is supplied to illustrate the implementation of a CPF algorithm.

I. INTRODUCTION

Classical clustering methods such as PCA (Principal Component Analysis), ICA (Independent Component Analysis), SVM (Support Vector Machines), or, more recently, NMF (nonnegative matrix factorization) and its various extensions [6,8,13-14], and NTF (nonnegative tensor factorization) [15,18], rely critically on the assumption that the number of groups of data points is already known. But there are many situations where we have no idea whether any patterns exist, or how many there are to recognize. To the best of our knowledge, no deterministic algorithm is available to cluster a massive dataset without this assumption, even though several probabilistic models handle this problem. So our basic problem is: how can we cluster a massive data set without any a priori knowledge about the clustering?

In this short note, we introduce a novel clustering method called CPF, i.e., Completely Positive Factorization. The method is based upon the construction of an affinity (or similarity, or distance) matrix of the dataset. We show that an affinity matrix $A$ is a completely positive matrix in the sense that it can be factorized as $A = BB^T$, where $B$ is an entrywise nonnegative matrix. We also use the CP rank (to be defined later) to estimate the number of clusters. An example is supplied to illustrate the implementation of a CPF algorithm.

The theory of completely positive factorization has been studied for more than half a century. In fact, this kind of factorization has been investigated since 1930, when L. J. Mordell [16] and C. Ko [12] dealt with integral quadratic form decompositions (where the associated matrices and the decompositions are restricted to the ring of integers). In 1962, P. H. Diananda [7] considered the same problem when he came to decompose a nonnegative real quadratic form into a sum of squares of nonnegative linear forms. M. Hall Jr. [10,11] used cone theory to cope with this problem and showed that the convex cone consisting of all CP matrices of a given order $n$ is the dual cone of the cone consisting of all copositive matrices of the same order. J. E. Maxfield and H. Minc [17] were the first to tackle this kind of problem using matrix theory. L. J. Gray and D. G. Wilson [9] related the question to covariance matrices and confirmed a conjecture of Chernic's on covariance in the case of order less than 5. In 1988, A. Berman put forward two basic problems concerning completely positive matrices [3]. From 1991 to 2004, A. Berman, D. Hershkowitz, N. Kogan, N. Shaked-Monderer, and others obtained many interesting results and useful characterizations of CP matrices and their factorizations (see [4] for more detail). The first author [19] presented some feasible necessary and sufficient conditions for DNN matrices to be CP, and fully investigated the case of matrices of order 5.

II. COMPLETELY POSITIVE MATRICES

An $n \times n$ entrywise nonnegative, positive semi-definite matrix $A$ (called doubly nonnegative, denoted $A \in DNN_n$) is called completely positive, or briefly CP, if $A$ can be factorized as

$A = BB^T$   (1)

where $B$ is an $n \times m$ entrywise nonnegative matrix for some positive integer $m$. The smallest possible number of columns $m$ of such a $B$ is called the completely positive index, or CP rank, of $A$, usually denoted by $cprank(A)$. $cprank(A)$ can be regarded as the minimal number of summands in a rank-1 representation of $A$, i.e.,

$A = \sum_{i=1}^{k} b_i b_i^T$   (2)

where $k = cprank(A)$ and $b_1, b_2, \ldots, b_k \in \mathbb{R}^n$ are the columns of $B$.

It is obvious that $cprank(A) \ge rank(A)$ for any given CP matrix $A$ of order $n$, and it is also shown in [9] that $cprank(A) \le n(n+1)/2 - N$, where $N$ is the number of zeros above the main diagonal of $A$. We also have the following result from [4].

Let $A$ be a CP matrix of order $n$.

1) If $n \le 3$ or $rank(A) \le 2$, then $cprank(A) = rank(A)$.
2) For $n \le 4$, $cprank(A) \le n$.
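
As a quick numerical illustration of (1) and (2), the following sketch (plain Python/NumPy, not from the original paper; the matrix sizes are arbitrary choices) builds a CP matrix from a randomly chosen nonnegative factor $B$ and checks that $BB^T$ coincides with the sum of the rank-1 terms $b_i b_i^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 3                      # order of A and number of columns of B (arbitrary choices)
B = rng.random((n, m))           # entrywise nonnegative factor

A = B @ B.T                      # A = B B^T is completely positive by construction, eq. (1)

# Rank-1 representation (2): A = sum_i b_i b_i^T over the columns of B
A_rank1 = sum(np.outer(B[:, i], B[:, i]) for i in range(m))
assert np.allclose(A, A_rank1)

print("rank(A) =", np.linalg.matrix_rank(A))   # at most m, consistent with cprank(A) >= rank(A)
```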


Note that it is possible that $2 < rank(A) < cprank(A) \le n$ when $n = 4$. For example, the matrix

$A = \begin{bmatrix} 6 & 3 & 3 & 0 \\ 3 & 5 & 1 & 3 \\ 3 & 1 & 5 & 3 \\ 0 & 3 & 3 & 6 \end{bmatrix}$

satisfies $3 = rank(A) < cprank(A) = 4$. We refer the reader to [4] for the proof.
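
The claimed properties of this matrix are easy to check numerically. The snippet below (our own check, not part of the paper) verifies that $A$ is entrywise nonnegative and positive semi-definite, i.e. doubly nonnegative, and that its rank is 3; the equality $cprank(A) = 4$ is the part that needs the proof in [4].

```python
import numpy as np

A = np.array([[6, 3, 3, 0],
              [3, 5, 1, 3],
              [3, 1, 5, 3],
              [0, 3, 3, 6]], dtype=float)

assert (A >= 0).all()                          # entrywise nonnegative
assert np.linalg.eigvalsh(A).min() >= -1e-10   # positive semi-definite (up to round-off)
print("rank(A) =", np.linalg.matrix_rank(A))   # prints 3, while cprank(A) = 4 by [4]
```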

For $n \ge 5$, we may have $cprank(A) > n$. But this fact is not needed in this paper, as the reader will see below.

The following properties are fundamental to the study of CP matrices.

1) For any matrix $A$ of order $n \le 4$, $A$ is CP if and only if $A$ is DNN.
2) There is a DNN matrix $A$ of order larger than 4 that is not CP.

III. AFFINITY MATRICES ARE COMPLETELY POSITIVE

Suppose there are $n$ data points (feature vectors) $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$ which are to be clustered into several groups or classes. A common and natural situation is that no a priori clustering information is available. It would seem that all the previous clustering methods become invalid, since all of them are based upon such a hypothesis. In this section, we establish a novel method for clustering in this case.

We first assume that the data points actually come from $k$ clusters, say $C_1, C_2, \ldots, C_k$. For every $i = 1, \ldots, k$ and $j = 1, \ldots, n$, we denote

$P_i(x_j) = \Pr(x_j \in C_i),$

and for every $j = 1, \ldots, n$, denote

$P_j \equiv P(x_j) = [P_1(x_j), P_2(x_j), \ldots, P_k(x_j)]^T,$   (3)

and

$P = [P_1, P_2, \ldots, P_n].$   (4)

Then it is obvious that $P \in \mathbb{R}^{k \times n}$ is an entrywise nonnegative matrix.

Now we define the affinity matrix of the data set $\Omega = \{x_1, x_2, \ldots, x_n\}$ to be the $n \times n$ matrix $A = (a_{ij})$ such that, for any $(i, j)$, $a_{ij}$ is the probability of the two points $x_i, x_j$ belonging to the same cluster. This is called soft clustering, by analogy with the 'hard' case we are all familiar with; the hard case is discussed below as a special case of the soft one. Specifically, soft clustering means that we assign a point $x$ to a cluster $C_r$ if

$\Pr(x \in C_r) = \max\{\Pr(x \in C_j) : j = 1, 2, \ldots, k\},$

where $C_1, C_2, \ldots, C_k$ are all the clusters containing $x_1, x_2, \ldots, x_n$. Usually we refer to this soft clustering method by saying that we cluster the data points by the probability distribution, rather than using the ambiguous term 'soft' clustering. We now have the following result.

Given $n$ points $x_1, x_2, \ldots, x_n \in \mathbb{R}^d$ which have been classified into $k$ clusters by the probability distribution, let $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ be the corresponding affinity matrix defined above. Then $A$ is a CP matrix with $cprank(A) \le k$.

Proof: Suppose that the data set is clustered into $k$ groups, say, $C_1, C_2, \ldots, C_k$. By definition, we have

$a_{ij} = \sum_{r=1}^{k} \Pr(\{x_i, x_j\} \subset C_r).$   (5)

We may assume, without loss of generality, that all the points $x_1, x_2, \ldots, x_n$ are sampled independently from some (one or more) populations. Thus (5) becomes

$a_{ij} = \sum_{s=1}^{k} \Pr(\{x_i, x_j\} \subset C_s) = \sum_{s=1}^{k} \Pr(x_i \in C_s) \cdot \Pr(x_j \in C_s) = \sum_{s=1}^{k} P_s(x_i) \cdot P_s(x_j) = \langle P_i, P_j \rangle.$

Write $P = [P_1, P_2, \ldots, P_n] \in \mathbb{R}^{k \times n}$; then $P$ is entrywise nonnegative (in fact, $P$ is a column stochastic matrix, i.e., the sum of each column of $P$ equals 1), and $A = P^T P$. It turns out that $A$ is CP with

$cprank(A) \le k.$   (6)

Note that here we are in the position of a supervised case: the clusters are given first, so that we can assign each cluster a probability vector of dimension $n$; i.e., the $i$-th row of $P$, denoted by $P_{i\cdot}$, corresponds to the cluster $C_i$, and the affinity matrix $A$ is in fact the Gramian matrix of the vectors $P_1, P_2, \ldots, P_n$.

For the hard case, each point $x$ exclusively belongs to a unique cluster, so $P_i(x_j) = 0$ or $1$ for each pair $i, j$. It follows that $P$ is a (0,1)-matrix, i.e., each entry of $P$ is either 0 or 1. A similar argument shows that $A$ is then a (0,1)-matrix, where $a_{ij} = 1$ if and only if $x_i$ and $x_j$ lie in the same group.
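
To make the construction concrete, here is a small sketch (our own illustration, with made-up membership probabilities) that forms the column-stochastic membership matrix $P$ and the affinity matrix $A = P^T P$, for both a soft and a hard assignment.

```python
import numpy as np

# Soft memberships: column j holds Pr(x_j in C_i) for i = 1..k (illustrative values)
P_soft = np.array([[0.9, 0.8, 0.2, 0.1],
                   [0.1, 0.2, 0.8, 0.9]])        # k = 2 clusters, n = 4 points
assert np.allclose(P_soft.sum(axis=0), 1.0)       # P is column stochastic

A_soft = P_soft.T @ P_soft                        # affinity matrix, CP with cprank(A) <= k
print(np.round(A_soft, 2))

# Hard memberships: P is a (0,1)-matrix, and A becomes a block indicator matrix
P_hard = np.array([[1, 1, 0, 0],
                   [0, 0, 1, 1]], dtype=float)
A_hard = P_hard.T @ P_hard
print(A_hard)                                     # a_ij = 1 iff x_i and x_j share a cluster
```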

So now our focus is: given an affinity matrix $A$ associated with the $n$ data points (this is possible, since we may be supplied with some a priori knowledge about the similarity of these points), how can we find an optimal way to cluster the dataset?

Here 'optimal' means the minimal error probability of the clustering, i.e.,

$\min F(C_1, C_2, \ldots, C_K) = \sum_{k=1}^{K} \Big( \sum_{i \in C_k} \Pr^2(x_i \in C_k) - \Pr^2(x_i \in C_{r_i}) \Big)$

where $K$ is the number of clusters, and $r_i$ is the index satisfying

$C_{r_i} = \arg\max_{C_j} \Pr(x_i \in C_j).$   (7)

Suppose that the clustering $\{C_1, C_2, \ldots, C_K\}$ is associated with the factorization $A = B^T B$. We use $B_{i\cdot}$ to denote the $i$-th row of $B$. Then it is easy to see that

$F = \|B\|_F^2 - \sum_{k=1}^{K} \|B_{k\cdot}\|_\infty^2$   (8)

$\;\;\, = tr(A) - \sum_{k=1}^{K} \|B_{k\cdot}\|_\infty^2$   (9)


since $\|B\|_F^2 = tr(B^T B) = tr(A)$ (note that for any vector $x \in \mathbb{C}^n$, $\|x\|_\infty = \max\{|x_i| : 1 \le i \le n\}$). Thus, minimizing $F$ is equivalent to

$\max \sum_{k=1}^{K} \|B_{k\cdot}\|_\infty^2$   (10)

We can now omit the squares in (10) and obtain

$\max \sum_{k=1}^{K} \max\{B_{kj} : j = 1, 2, \ldots, n\}$   (11)
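
As a sanity check on (8)-(11), the following sketch (ours, with an arbitrary column-stochastic factor $B$) evaluates $F$ both from the definition $\|B\|_F^2 - \sum_k \|B_{k\cdot}\|_\infty^2$ and via $tr(A)$, and also computes the equivalent objective in (11).

```python
import numpy as np

rng = np.random.default_rng(1)
K, n = 3, 6
B = rng.random((K, n))
B /= B.sum(axis=0)                        # make B column stochastic
A = B.T @ B                               # affinity matrix A = B^T B

row_inf = np.abs(B).max(axis=1)           # ||B_k.||_inf for each row k

F_def = np.linalg.norm(B, 'fro')**2 - np.sum(row_inf**2)   # (8)
F_tr  = np.trace(A) - np.sum(row_inf**2)                   # (9)
assert np.isclose(F_def, F_tr)                             # since tr(A) = ||B||_F^2

objective_11 = np.sum(row_inf)            # (11): sum over rows of the largest entry
print(F_def, objective_11)
```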

This is equivalent to the following problem: let $A = (a_{ij}) \in \mathbb{R}^{n \times n}$ be an affinity matrix of a dataset $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$. Then the optimal clustering problem for the given dataset is equivalent to the CP factorization of the affinity matrix, $A = B^T B$, such that $B$ is an $m \times n$ column stochastic matrix, where $m = cprank(A) < n$ is the number of clusters for an optimized clustering.

Proof: We only need to prove that if $A$ is CP with $cprank(A) = m < n$, then $m$ is the number of clusters for an optimized clustering of $X$. Since $a_{ij}$ reflects the similarity (affinity) between the data points $x_i$ and $x_j$, it follows that $a_{ij}$ can also be interpreted as the probability of $x_i$ and $x_j$ belonging to the same cluster. From [4] it is known that $A$ is CP. Thus we may assume that $A = B^T B$ for some $m \times n$ entrywise nonnegative matrix $B$, where $m = cprank(A)$. Now write $B = [\beta_1, \beta_2, \ldots, \beta_n]$, and denote the sum of the $j$-th column of $B$ by $c_j$ for $j = 1, 2, \ldots, n$. It is obvious that $c_j > 0$ for each $j$, since $c_j = 0$ would mean $\beta_j = 0$, which in turn would imply $cprank(A) < m$. Now we set

$D = diag(c_1, c_2, \ldots, c_n)$

and

$\tilde{A} = D^{-1} A D^{-1}.$

Then we have $\tilde{A} = \tilde{B}^T \tilde{B}$ if we write

$\tilde{B} = BD^{-1} = [\tilde{\beta}_1, \tilde{\beta}_2, \ldots, \tilde{\beta}_n].$

It is easy to see that $\tilde{B} \ge 0$ (entrywise nonnegative) since $B \ge 0$, and that

$\tilde{\beta}_j = \frac{1}{c_j}\, \beta_j, \quad \forall j = 1, 2, \ldots, n.$

Thus $\tilde{B}$ is an $m \times n$ column stochastic matrix, and $cprank(\tilde{A}) = cprank(A) = m < n$.
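
The normalization used in the proof is straightforward to carry out numerically. The sketch below (our own, with a random positive $B$) forms $D$, $\tilde{B} = BD^{-1}$ and $\tilde{A} = D^{-1}AD^{-1}$, and checks that $\tilde{B}$ is column stochastic and that $\tilde{A} = \tilde{B}^T\tilde{B}$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 7
B = rng.random((m, n)) + 0.1           # entrywise positive factor, so all column sums c_j > 0
A = B.T @ B

c = B.sum(axis=0)                       # column sums c_1, ..., c_n
D_inv = np.diag(1.0 / c)                # D^{-1}

B_tilde = B @ D_inv                     # B~ = B D^{-1}, column stochastic
A_tilde = D_inv @ A @ D_inv             # A~ = D^{-1} A D^{-1}

assert np.allclose(B_tilde.sum(axis=0), 1.0)
assert np.allclose(A_tilde, B_tilde.T @ B_tilde)
```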

In order to show that $m = c$, where $c$ is the number of clusters for the optimized clustering, we first prove that $m \le c$. Suppose, on the contrary, that there is an optimized clustering with cluster number $c < m$. Then, by the above argument, we know that $A$ has a factorization $A = \tilde{B}^T \tilde{B}$ such that $\tilde{B}$ is a $c \times n$ column stochastic matrix. It follows that $cprank(A) \le c < m$, a contradiction. Thus $m \le c$.

Next we prove $m \ge c$. For convenience, we still write $A = B^T B$, where $B = [\beta_1, \beta_2, \ldots, \beta_n]$ with

$\beta_j = [b_{1j}, b_{2j}, \ldots, b_{mj}]^T, \qquad \sum_{i=1}^{m} b_{ij} = 1.$

We now construct a soft clustering $C_1, C_2, \ldots, C_m$ by defining

$\Pr(x_j \in C_i) = b_{ij}$   (12)

for each $j = 1, 2, \ldots, n$ and $i = 1, 2, \ldots, m$. This clustering is well-defined in the sense that

(1) For each $x_j$, we have

$\sum_{i=1}^{m} \Pr(x_j \in C_i) = 1.$   (13)

(2) Each cluster $C_i$ is nonempty.

(1) is obvious, since it is equivalent to $\sum_{i=1}^{m} b_{ij} = 1$. To prove (2), suppose that there exists a cluster, say $C_k$, which is empty. It follows that the $k$-th row of $B$ is a zero vector. Form the $(m-1) \times n$ nonnegative matrix $B_1$ by deleting the $k$-th row of $B$; then $A = B_1^T B_1$. It turns out that $cprank(A) \le m - 1$, a contradiction. Thus (2) is satisfied.
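
In code, the clustering defined by (12) amounts to reading the soft memberships off the columns of the column-stochastic factor $B$, and taking an argmax per column when a hard assignment is wanted. A minimal sketch (ours, with illustrative values):

```python
import numpy as np

# Column-stochastic factor B: entry b_ij = Pr(x_j in C_i), as in (12) (illustrative values)
B = np.array([[0.7, 0.6, 0.1, 0.2],
              [0.3, 0.4, 0.9, 0.8]])

assert np.allclose(B.sum(axis=0), 1.0)    # condition (13): memberships of each point sum to 1
assert (B.max(axis=1) > 0).all()          # no zero row, so every cluster is nonempty

soft_memberships = B.T                     # row j lists Pr(x_j in C_i) over all clusters i
hard_labels = B.argmax(axis=0)             # hard clustering: most probable cluster per point
print(hard_labels)                         # -> [0 0 1 1]
```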

So far, we have proved that the probabilistic clustering problem can be translated into a CP problem, and that the number of clusters corresponding to an optimized clustering is just the CP-rank of the affinity matrix.

For the computation of the CP-rank and the CP factorization of a CP matrix, we refer the reader to [4]. Here we end the paper with an example illustrating the construction of the probabilistic affinity matrix $A$ from the dataset and the CP factorization of $A$. The example is coded in MATLAB 7.1.

IV. AN EXAMPLE: UNSUPERVISED CLUSTERING WITH CP MATRICES

In this section, we present an example to demonstrate how to use this result to give a CP factorization for a CP matrix, and how to use CP factorizations in unsupervised clustering. We first generate 2-dimensional points $\{x_1, x_2, \ldots, x_N\}$ with $N = 100$, of which 50 points are sampled from a normally distributed population with mean $\mu_1 = 10$ and standard deviation $\sigma_1 = 1$, and the other 50 points are sampled from a second normally distributed population with mean $\mu_2 = -8$ and standard deviation $\sigma_2 = 5$. We then use K-means and our CP method separately to cluster them. Our experiment (coded in MATLAB 7.1) shows that the clustering produced by the CP factorization is better than that produced by PCA. In fact, we also compared the CP clustering with the K-means clustering, and it turns out that the clustering obtained with our CP method is again better than that obtained with K-means.
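
The MATLAB code for this experiment is not included in the paper, so the following is only a rough Python reenactment under stated assumptions: both coordinates of each point are drawn with the quoted mean and standard deviation, the affinity matrix is built with a Gaussian kernel (the paper does not specify this step), and a symmetric-NMF-style multiplicative update stands in for the CP factorization algorithm (the conclusion notes that an NNMF-like algorithm can be used). The kernel width and iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data as described: 50 points around mean 10 (sd 1) and 50 around mean -8 (sd 5), in 2-D
X = np.vstack([rng.normal(10, 1, size=(50, 2)),
               rng.normal(-8, 5, size=(50, 2))])
n, k = X.shape[0], 2

# Affinity matrix (assumption: Gaussian kernel on pairwise squared distances)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
A = np.exp(-sq_dists / (2.0 * sq_dists.mean()))

# Symmetric-NMF-style factorization A ~ H H^T as a stand-in for the CP factorization
H = rng.random((n, k))
for _ in range(500):
    H *= (A @ H) / (H @ (H.T @ H) + 1e-12)    # common multiplicative update rule

labels = H.argmax(axis=1)                      # soft memberships -> hard labels
print(labels[:50].mean(), labels[50:].mean())  # ideally near 0 and 1 (or 1 and 0)
```

A comparison with K-means, as reported in the paper, could be layered on top of the same generated data.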


V. CONCLUSION

In this note, we put forward a new clustering method: clustering by completely positive factorization. We first introduce some basic terminology on completely positive matrices and the cp-rank, and present some basic results about CP factorization. We then introduce the soft clustering method, which is built upon the affinity matrix consisting of the probabilities of each pair of data points lying in the same cluster, and describe the relationship between CP matrices and soft clustering. Finally, we demonstrate the advantage of the CP method by comparing the clustering results of these methods. We should mention that we omit the CP algorithm here, since a similar NNMF algorithm can be used to compute the CP factorization. In future work we will improve the algorithm to make it independent of, and more accurate than, NNMF.

ACKNOWLEDGMENT

This work is supported by the Natural Science Foundation of China (No. 10871230) and the Natural Science Foundation of Zhejiang Province (No. Y607840 and No. Y7080364). The corresponding author is C. Xu. E-mail: [email protected] (C. Xu).

REFERENCES

[1] T. Ando, Completely Positive Matrices, Lecture Notes, The University of Wisconsin, Madison, 1991.

[2] A. Berman and D. Hershkowitz, Combinatorial results on completely positive matrices, Linear Algebra Appl., 95 (1987), 111-125.

[3] A. Berman, Complete positivity, Linear Algebra Appl., 107 (1988), 57-63.

[4] A. Berman and N. Shaked-Monderer, Completely Positive Matrices, World Scientific, 2003.

[5] A. Berman and C. Xu, [0,1] completely positive matrices, Linear Algebra Appl., 399 (2005), 35-51.

[6] M. Catral, L. Han, M. Neumann and R. Plemmons, On reduced rank nonnegative matrix factorizations for symmetric matrices, Linear Algebra Appl., 393 (2004), 107-126.

[7] P. H. Diananda, On nonnegative forms in real variables some or all of which are nonnegative, Proc. Cambridge Philos. Soc., 58 (1962), 17-25.

[8] I. S. Dhillon and S. Sra, Generalized nonnegative matrix approximations with Bregman divergences, Technical report, Computer Sciences, University of Texas at Austin, 2005.

[9] L. J. Gray and D. G. Wilson, Nonnegative factorization of positive semidefinite nonnegative matrices, Linear Algebra Appl., 31 (1980), 119-127.

[10] M. Hall Jr., Combinatorial Theory, 2nd ed., Wiley, New York, 1986.

[11] M. Hall Jr., Surveys in Applied Mathematics, 4 (1958), 35-104.

[12] C. Ko, On the representation of a quadratic form as a sum of squares of linear forms, Quart. J. Math. Oxford, 8 (1937), 81-98.

[13] D. D. Lee and H. S. Seung, Learning the parts of objects by non-negative matrix factorization, Nature, 401 (1999), 788-791.

[14] D. D. Lee and H. S. Seung, Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems, 13 (2001), 556-562.

[15] H. Lee, Y. Kim, A. Cichocki and S. Choi, Nonnegative tensor factorization for continuous EEG classification, International Journal of Neural Systems, 17 (2007), 305-317.

[16] L. J. Mordell, A new Waring's problem with squares of linear forms, Quart. J. Math. Oxford, 1 (1930), 276-288.

[17] J. E. Maxfield and H. Minc, On the matrix equation X'X = A, Proc. Edinburgh Math. Soc., 13 (1962), 125-129.

[18] A. Shashua and T. Hazan, Non-negative tensor factorization with applications to statistics and computer vision, in Proc. of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005.

[19] C. Xu, Completely positive matrices, Linear Algebra Appl., 379 (2004), 319-327.