
Research Article
A Constrained Algorithm Based NMF$_\alpha$ for Image Representation

Chenxue Yang,1 Tao Li,1 Mao Ye,1 Zijian Liu,2,3 and Jiao Bao1

1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 School of Science, Chongqing Jiaotong University, Chongqing 400074, China
3 Department of Mathematics, Hangzhou Normal University, Hangzhou 310036, China

Correspondence should be addressed to Mao Ye; cvlab.uestc@gmail.com

Received 29 March 2014; Revised 2 October 2014; Accepted 3 October 2014; Published 22 October 2014

Academic Editor: Zhan Zhou

Copyright © 2014 Chenxue Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Hindawi Publishing Corporation, Discrete Dynamics in Nature and Society, Volume 2014, Article ID 179129, 12 pages, http://dx.doi.org/10.1155/2014/179129

Nonnegative matrix factorization (NMF) is a useful tool in learning a basic representation of image data. However, its performance and applicability in real scenarios are limited because of the lack of image information. In this paper, we propose a constrained matrix decomposition algorithm for image representation which contains parameters associated with the characteristics of image data sets. In particular, we impose label information as additional hard constraints on the $\alpha$-divergence-NMF unsupervised learning algorithm. The resulting algorithm is derived by using Karush-Kuhn-Tucker (KKT) conditions as well as the projected gradient, and its monotonic local convergence is proved by using auxiliary functions. In addition, we provide a method to select the parameters of our semisupervised matrix decomposition algorithm in the experiments. Compared with the state-of-the-art approaches, our method with the selected parameters has the best classification accuracy on three image data sets.

1. Introduction

Learning an efficient representation of image information is a key problem in machine learning and computer vision. Efficiency of the representation refers to the ability to capture significant information from a high dimensional image space. Such a high dimensional problem is difficult to manipulate and compute; therefore, dimension reduction becomes the crucial method for coping with this problem. Fortunately, matrix factorization is a valid approach to the dimension reduction problem, and it has a long and successful history in image representation [1-3]. Representative matrix factorization methods include principal component analysis (PCA) [4], singular value decomposition (SVD) [5], vector quantization (VQ) [6], and nonnegative matrix factorization (NMF) [7].

Among all techniques for matrix factorization, NMF is distinguished from the others by its use of nonnegativity constraints in learning a basis representation of image data [8], and it has been applied in face recognition [9-11], medical imaging [12, 13], electroencephalogram (EEG) classification for brain-computer interfaces [14], and many other areas. However, NMF is an unsupervised learning algorithm and is inapplicable to learning a basic representation from limited image information. Thus, to make up for this deficiency, extra constraints are implicitly or explicitly incorporated into NMF to derive semisupervised matrix decomposition algorithms. In [15], the authors impose label information as additional hard constraints on NMF based on the squared Euclidean distance and the Kullback-Leibler divergence. Such a representation encodes the data points from the same class identically, using the indicator matrix, in a new representation space, where the obtained part-based representation is more discriminating.

However, none of the semisupervised NMF algorithms mentioned above contain parameters associated with the characteristics of image data sets. In this paper, we introduce the $\alpha$-divergence-NMF algorithm [16], where $\alpha$ is a parameter. We impose the label constraints on the $\alpha$-divergence-NMF algorithm to derive a generic constrained matrix decomposition algorithm which includes some existing algorithms as special cases; one of them is CNMF$_{KL}$ [15] with $\alpha = 1$. Then we obtain the proposed algorithm using the Karush-Kuhn-Tucker (KKT) method as well as the projected gradient method, and we prove its monotonic local convergence using an auxiliary function. Compared with the current semisupervised NMF algorithms, we analyze the classification accuracy for two fixed values of $\alpha$ ($\alpha = 0.5, 2$) on three image data sets.

The CNMF$_\alpha$ algorithm does not work well for a fixed value of $\alpha$. Since the parameter $\alpha$ is associated with the characteristics of a learning machine, the model distribution is more inclusive when $\alpha$ goes to $+\infty$ and more exclusive when $\alpha$ approaches $-\infty$. The selection of the optimal value of $\alpha$ plays a critical role in determining the discriminative basis vectors. In this paper, we provide a method to select the parameter $\alpha$ for our semisupervised CNMF$_\alpha$ algorithm. The variation of $\alpha$ is associated with the characteristics of image data sets. Compared with the algorithms in [15, 16], our algorithm is more complete and systematic.

The rest of the paper is organized as follows. In Section 2, we give a brief overview of the standard NMF algorithm and the constrained NMF algorithm. The detailed algorithms with label constraints and the theoretical proof of their convergence are provided in Sections 3 and 4, respectively. Section 5 presents experimental results that show the advantages of our algorithm. Finally, a conclusion is given in Section 6.

2. Related Work

NMF, proposed by Lee and Seung [7], is considered to provide a part-based representation and has been applied to diverse examples of nonnegative data [17-21], including text data mining, subsystem identification, spectral data analysis, audio and sound processing, and document clustering.

Suppose $X = [x_1, \ldots, x_n] \in R^{m \times n}$ is a set of $n$ training images, where each $x_t$ ($t = 1, \ldots, n$) is a column vector consisting of the $m$ nonnegative pixel values of a training image. NMF finds two nonnegative matrix factors $W \in R^{m \times r}$ and $H \in R^{n \times r}$ that approximate the original image matrix:
$$X \approx WH^T, \quad (1)$$
where the positive integer $r$ is smaller than $m$ or $n$.

NMF uses nonnegativity constraints, which makes the representation purely adapted to an unsupervised setting. It is inapplicable for learning a basis representation from limited image information. To make up for this deficiency, extra constraints such as locality [22], sparseness [9], and orthogonality [23] were implicitly or explicitly incorporated into NMF to identify better local features or provide a sparser representation.

In [15], the authors impose label information as additional hard constraints on the NMF unsupervised learning algorithm to derive a semisupervised matrix decomposition algorithm, which makes the obtained representation more discriminating. The label information is incorporated as follows.

Suppose $X = \{x_i\}_{i=1}^{n}$ is a data set consisting of $n$ training images. Assume the first $s$ images $x_1, \ldots, x_s$ ($s \le n$) carry label information and the remaining $n - s$ images $x_{s+1}, \ldots, x_n$ are unlabeled. Assume there exist $c$ classes and each image from $x_1, \ldots, x_s$ is assigned to one class. Then we have an $s \times c$ indicator matrix $S$, which can be represented as
$$s_{ij} = \begin{cases} 1, & \text{if } x_i \text{ is assigned to the } j\text{th class}, \\ 0, & \text{otherwise}. \end{cases} \quad (2)$$

From the indicator matrix $S$, a label constraint matrix $U$ can be defined as
$$U = \begin{pmatrix} S_{s \times c} & 0 \\ 0 & I_{n-s} \end{pmatrix}, \quad (3)$$
where $I_{n-s}$ denotes an $(n-s) \times (n-s)$ identity matrix. Imposing the label information as an additional hard constraint through an auxiliary matrix $V$, we set $H = UV$. This guarantees $h_i = h_j$ whenever $x_i$ and $x_j$ have the same label. With the label constraints, standard NMF turns into factorizing the large matrix $X$ into the product of three smaller matrices $W$, $U$, and $V$:
$$X \approx W(UV)^T. \quad (4)$$
Such a representation encodes the data points from the same class identically, via the indicator matrix, in the new representation space.
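As an illustration of (2)-(4), the label constraint matrix $U$ can be assembled directly from the labels of the first $s$ samples. The following NumPy sketch is ours (function name and shape conventions are assumptions, not from the paper):

```python
import numpy as np

def label_constraint_matrix(labels, n, c):
    """Build U = [[S, 0], [0, I]] of eq. (3) from the labels of the s labeled samples.

    labels : class indices (0..c-1) of the first s samples
    n      : total number of samples
    c      : number of classes
    """
    s = len(labels)
    S = np.zeros((s, c))
    S[np.arange(s), labels] = 1.0          # indicator matrix S, eq. (2)
    U = np.zeros((n, c + (n - s)))
    U[:s, :c] = S                          # labeled block
    U[s:, c:] = np.eye(n - s)              # identity block for unlabeled samples
    return U

# toy example: 4 labeled samples (2 classes), 3 unlabeled, 7 samples in total
U = label_constraint_matrix(np.array([0, 0, 1, 1]), n=7, c=2)
print(U.shape)  # (7, 5)
```

Since $H = UV$, any two labeled samples of the same class share a row of $U$ and therefore receive identical representations $h_i = h_j$, which is exactly the hard constraint described above.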

3. A Constrained Algorithm Based NMF$_\alpha$

The exact form of the error measure in (1) is as crucial as the nonnegativity constraints for the success of the NMF algorithm in learning a useful representation of image data. In research on NMF, quite a large number of error measures have been investigated, such as Csiszár's f-divergences [24], Amari's $\alpha$-divergence [25], and Bregman divergences [26]. Here we introduce a generic multiplicative updating algorithm [16] which iteratively minimizes the $\alpha$-divergence between $X$ and $WH^T$. We define the $\alpha$-divergence as
$$D_\alpha[X \,\|\, WH^T] = \frac{1}{\alpha(1-\alpha)} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \alpha X_{ij} + (1-\alpha)\,[WH^T]_{ij} - X_{ij}^{\alpha}\,[WH^T]_{ij}^{1-\alpha} \right), \quad (5)$$

where $\alpha$ is a positive parameter. We combine the label constraints of (4) with this measure to derive the following objective function, based on the $\alpha$-divergence between $X$ and $W(UV)^T$:
$$D_\alpha[X \,\|\, W(UV)^T] = \frac{1}{\alpha(1-\alpha)} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \alpha X_{ij} + (1-\alpha)\,[WV^TU^T]_{ij} - X_{ij}^{\alpha}\,[WV^TU^T]_{ij}^{1-\alpha} \right). \quad (6)$$

With the constraints $W_{ij} \ge 0$, $U_{ij} \ge 0$, and $V_{ij} \ge 0$, the minimization of $D_\alpha[X \,\|\, W(UV)^T]$ can be formulated as a constrained minimization problem with inequality constraints. In the following, we present two methods to find a local minimum of (6).


3.1. KKT Method. Let $\lambda_{ij} \ge 0$ and $\mu_{ij} \ge 0$ be the Lagrangian multipliers associated with the constraints $V_{ij} \ge 0$ and $W_{ij} \ge 0$, respectively. The Karush-Kuhn-Tucker conditions require that both the optimality conditions
$$\frac{\partial D_\alpha[X \,\|\, W(UV)^T]}{\partial V_{ij}} = \frac{1}{\alpha}\left( \sum_k [U^T W]_{ki} - \sum_k [U^T W]_{ki} \left( \frac{X_{kj}}{[WV^TU^T]_{kj}} \right)^{\alpha} \right) = \lambda_{ij}, \quad (7)$$
$$\frac{\partial D_\alpha[X \,\|\, W(UV)^T]}{\partial W_{ij}} = \frac{1}{\alpha}\left( \sum_k [V^TU^T]_{jk} - \sum_k [V^TU^T]_{jk} \left( \frac{X_{ik}}{[WV^TU^T]_{ik}} \right)^{\alpha} \right) = \mu_{ij}, \quad (8)$$

and the complementary slackness conditions
$$\lambda_{ij} V_{ij} = 0, \qquad \mu_{ij} W_{ij} = 0 \quad (9)$$
are satisfied. If $V_{ij} = 0$ and $W_{ij} = 0$, then $\lambda_{ij}$ and $\mu_{ij}$ can take any values; at the same time, if $\lambda_{ij} = 0$ and $\mu_{ij} = 0$, then $V_{ij} \ge 0$ and $W_{ij} \ge 0$. Hence we need both $\lambda_{ij} = 0$, $\mu_{ij} = 0$ and $V_{ij} = 0$, $W_{ij} = 0$. It follows from (9) that
$$\lambda_{ij} V_{ij}^{\alpha} = 0, \qquad \mu_{ij} W_{ij}^{\alpha} = 0. \quad (10)$$

Multiplying both sides of (7) and (8) by $V_{ij}$ and $W_{ij}$, respectively, and incorporating (10), we obtain the following updating rules:
$$V_{ij} \longleftarrow V_{ij} \left[ \frac{\sum_k [U^T W]_{ki} \left( X_{kj} / [WV^TU^T]_{kj} \right)^{\alpha}}{\sum_l [U^T W]_{li}} \right]^{1/\alpha}, \quad (11)$$
$$W_{ij} \longleftarrow W_{ij} \left[ \frac{\sum_k [V^TU^T]_{jk} \left( X_{ik} / [WV^TU^T]_{ik} \right)^{\alpha}}{\sum_l [V^TU^T]_{jl}} \right]^{1/\alpha}. \quad (12)$$
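The componentwise rules (11) and (12) vectorize into a few matrix products. The sketch below is our own vectorized reading of one update sweep (the shapes, helper name, and `eps` guard are assumptions, not the authors' reference implementation):

```python
import numpy as np

def cnmf_alpha_step(X, W, U, V, alpha, eps=1e-12):
    """One sweep of the multiplicative updates (11)-(12), vectorized.

    Shapes (our convention): X (m, n), W (m, r), U (n, q), V (q, r),
    so that H = U V is (n, r) and X ~ W (U V)^T.
    """
    # update V, eq. (11)
    Y = W @ V.T @ U.T + eps                          # reconstruction W V^T U^T, (m, n)
    R = (X / Y) ** alpha                             # elementwise (X / [W V^T U^T])^alpha
    num = U.T @ R.T @ W                              # (q, r) numerator
    den = np.outer(U.sum(axis=0), W.sum(axis=0)) + eps
    V = V * (num / den) ** (1.0 / alpha)

    # update W, eq. (12), with the refreshed V
    Y = W @ V.T @ U.T + eps
    R = (X / Y) ** alpha
    H = U @ V                                        # (n, r)
    W = W * ((R @ H) / (H.sum(axis=0) + eps)) ** (1.0 / alpha)
    return W, V
```

By Theorem 1 below, iterating such a sweep drives $D_\alpha$ monotonically down to a local minimum.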

3.2. Projected Gradient Method. Considering the gradient descent algorithm [24, 25], the updating rules for the objective function (6) can also be derived using the projected gradient method [27] and have the form
$$\phi(V_{ij}) \longleftarrow \phi(V_{ij}) + \delta_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^T]}{\partial V_{ij}}, \qquad \phi(W_{ij}) \longleftarrow \phi(W_{ij}) + \gamma_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^T]}{\partial W_{ij}}, \quad (13)$$
where $\phi(\cdot)$ is a suitably chosen function and $\delta_{ij}$ and $\gamma_{ij}$ are two parameters that control the step size of the gradient descent. Then we have
$$V_{ij} \longleftarrow \phi^{-1}\!\left( \phi(V_{ij}) + \delta_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^T]}{\partial V_{ij}} \right), \qquad W_{ij} \longleftarrow \phi^{-1}\!\left( \phi(W_{ij}) + \gamma_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^T]}{\partial W_{ij}} \right). \quad (14)$$

Setting $\phi(\Omega) = \Omega^{\alpha}$, to guarantee that the updating rules (11) and (12) hold we need
$$\delta_{ij} = -\frac{\alpha V_{ij}^{\alpha}}{\sum_k [U^T W]_{ki}}, \qquad \gamma_{ij} = -\frac{\alpha W_{ij}^{\alpha}}{\sum_k [V^TU^T]_{jk}}. \quad (15)$$

From (7) and (8), the updating rules become
$$V_{ij}^{\alpha} \longleftarrow V_{ij}^{\alpha} + \delta_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^T]}{\partial V_{ij}} = V_{ij}^{\alpha}\,\frac{\sum_k [U^T W]_{ki}\left( X_{kj}/[WV^TU^T]_{kj} \right)^{\alpha}}{\sum_l [U^T W]_{li}},$$
$$W_{ij}^{\alpha} \longleftarrow W_{ij}^{\alpha} + \gamma_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^T]}{\partial W_{ij}} = W_{ij}^{\alpha}\,\frac{\sum_k [V^TU^T]_{jk}\left( X_{ik}/[WV^TU^T]_{ik} \right)^{\alpha}}{\sum_l [V^TU^T]_{jl}}, \quad (16)$$

which are the same as the updating rules (11) and (12). We have shown that the algorithm can be derived using the Karush-Kuhn-Tucker conditions and have presented an alternative derivation using the projected gradient. The use of the two methods guarantees the correctness of the algorithm theoretically.

In the following, we give a theorem to guarantee the convergence of the iterations in updates (11) and (12).

Theorem 1. For the objective function (6), $D_\alpha[X \,\|\, W(UV)^T]$ is nonincreasing under the updating rules (11) and (12). The objective function $D_\alpha$ is invariant under these updates if and only if $V$ and $W$ are at a stationary point.

Multiplicative updates for our constrained algorithm based NMF$_\alpha$ are given in (11) and (12). These updates find a local minimum of $D_\alpha[X \,\|\, W(UV)^T]$, which is the final solution of (11) and (12). Note that when $\alpha = 1$, the updates (11) and (12) coincide with the CNMF$_{KL}$ algorithm [15], which is thus a special case of our generic constrained matrix decomposition algorithm. In the following, we give the proof of Theorem 1.

4. Convergence Analysis

To prove Theorem 1, we will make use of an auxiliary function of the kind used in the expectation-maximization algorithm [28, 29].


Definition 2. A function $G(x, x')$ is defined as an auxiliary function for $F(x)$ if the following two conditions are both satisfied:
$$G(x, x) = F(x), \qquad G(x, x') \ge F(x). \quad (17)$$

Lemma 3. Assume that the function $G(x, x')$ is an auxiliary function for $F(x)$; then $F(x)$ is nonincreasing under the update
$$x^{t+1} = \arg\min_x G(x, x^t). \quad (18)$$

Proof. Consider $F(x^{t+1}) \le G(x^{t+1}, x^t) \le G(x^t, x^t) = F(x^t)$.

It can be observed that the equality $F(x^{t+1}) = F(x^t)$ holds only if $x^t$ is a local minimum of $G(x, x^t)$. Iterating the update in (18), we obtain a sequence of estimates that converges to a local minimum $x_{\min} = \arg\min_x F(x)$ of the objective function:
$$F(x_{\min}) \le \cdots \le F(x^{t+1}) \le F(x^t) \le \cdots \le F(x^2) \le F(x^1) \le F(x^0). \quad (19)$$

In the following, we show that the objective function (6) is nonincreasing under the updating rules (11) and (12) by defining appropriate auxiliary functions with respect to $V_{ij}$ and $W_{ij}$.

Lemma 4. The function
$$G(V, \tilde{V}) = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\xi_{ijk}(\tilde{V}) \left( \alpha + (1-\alpha)\,\frac{W_{ik}[V^TU^T]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})} - \left( \frac{W_{ik}[V^TU^T]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})} \right)^{1-\alpha} \right), \quad (20)$$
where $\xi_{ijk}(\tilde{V}) = W_{ik}[\tilde{V}^TU^T]_{kj} / \sum_l W_{il}[\tilde{V}^TU^T]_{lj}$, is an auxiliary function for
$$F(V) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left( \alpha X_{ij} + (1-\alpha)\,[WV^TU^T]_{ij} - X_{ij}^{\alpha}\,[WV^TU^T]_{ij}^{1-\alpha} \right). \quad (21)$$

Proof. Obviously, $G(V, V) = F(V)$. According to the definition of an auxiliary function, we only need to prove $G(V, \tilde{V}) \ge F(V)$. To do this, we use the convex function $f(\cdot)$, defined for positive $\alpha$, to rewrite the $\alpha$-divergence $F(V)$ as
$$F(V) = \sum_{ij} X_{ij}\, f\!\left( \frac{\sum_k W_{ik}[V^TU^T]_{kj}}{X_{ij}} \right), \quad (22)$$
where
$$f(x) = \frac{1}{\alpha(1-\alpha)} \left[ \alpha + (1-\alpha)\,x - x^{1-\alpha} \right]. \quad (23)$$

Note that $\sum_k \xi_{ijk}(\tilde{V}) = 1$ and $\xi_{ijk}(\tilde{V}) \ge 0$ by the definition of $\xi_{ijk}(\tilde{V})$. Applying Jensen's inequality [30] leads to
$$f\!\left( \sum_k W_{ik}[V^TU^T]_{kj} \right) \le \sum_k \xi_{ijk}(\tilde{V})\, f\!\left( \frac{W_{ik}[V^TU^T]_{kj}}{\xi_{ijk}(\tilde{V})} \right). \quad (24)$$

From the above inequality it follows that
$$F(V) = \sum_{ij} X_{ij}\, f\!\left( \frac{\sum_k W_{ik}[V^TU^T]_{kj}}{X_{ij}} \right) \le \sum_{ijk} X_{ij}\,\xi_{ijk}(\tilde{V})\, f\!\left( \frac{W_{ik}[V^TU^T]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})} \right) = G(V, \tilde{V}), \quad (25)$$
which satisfies the condition of an auxiliary function.

Reversing the roles of $V_{ij}$ and $W_{ij}$ in Lemma 4, we define an auxiliary function for the update (12).

Lemma 5. The function
$$G(W, \tilde{W}) = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\eta_{ijk}(\tilde{W}) \left( \alpha + (1-\alpha)\,\frac{W_{ik}[V^TU^T]_{kj}}{X_{ij}\,\eta_{ijk}(\tilde{W})} - \left( \frac{W_{ik}[V^TU^T]_{kj}}{X_{ij}\,\eta_{ijk}(\tilde{W})} \right)^{1-\alpha} \right), \quad (26)$$
where $\eta_{ijk}(\tilde{W}) = \tilde{W}_{ik}[V^TU^T]_{kj} / \sum_l \tilde{W}_{il}[V^TU^T]_{lj}$, is an auxiliary function for
$$F(W) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left( \alpha X_{ij} + (1-\alpha)\,[WV^TU^T]_{ij} - X_{ij}^{\alpha}\,[WV^TU^T]_{ij}^{1-\alpha} \right). \quad (27)$$

This can be proved in the same way as Lemma 4. From Lemmas 4 and 5, we can now demonstrate the convergence claimed in Theorem 1.


Proof. To guarantee the stability of $F(V)$, by Lemma 3 we just need to obtain the minimum of $G(V, \tilde{V})$ with respect to $V_{ij}$. Setting the gradient of (20) to zero gives
$$\frac{\partial G(V, \tilde{V})}{\partial V_{ij}} = \frac{1}{\alpha} \sum_k [U^T W]_{ki} \left( 1 - \left( \frac{W_{ki}[V^TU^T]_{ij}}{X_{kj}\,\xi_{kji}(\tilde{V})} \right)^{-\alpha} \right) = 0. \quad (28)$$
Then it follows that
$$V_{ij} = \tilde{V}_{ij} \left[ \frac{\sum_k [U^T W]_{ki} \left( X_{kj} / \sum_l W_{kl}[\tilde{V}^TU^T]_{lj} \right)^{\alpha}}{\sum_k [U^T W]_{ki}} \right]^{1/\alpha}, \quad (29)$$

which has the form of the updating rule (11). Similarly, to guarantee that the updating rule (12) holds, the minimum of $G(W, \tilde{W})$, determined by setting the gradient of (26) to zero, must exist.

Since $G$ is an auxiliary function, according to Lemma 4, $F$ in (21) is nonincreasing under the update (11). Applying the multiplicative updates (11) and (12), we can find a local minimum of $D_\alpha[X \,\|\, W(UV)^T]$.

5. Experiments

In this section, the CNMF$_\alpha$ algorithm is systematically compared with current constrained NMF algorithms on three image data sets: the ORL Database [31], the Yale Database [32], and the Caltech 101 Database [33]. The details of these three databases are described individually later. First, we introduce the evaluated algorithms.

(i) The constrained nonnegative matrix factorization algorithm in [15] (CNMF), which aims to minimize the F-norm cost.

(ii) The constrained nonnegative matrix factorization algorithm with parameter $\alpha = 0.5$ in this paper, which aims to minimize the Hellinger divergence cost.

(iii) The constrained nonnegative matrix factorization algorithm with parameter $\alpha = 1$ in [15] (CNMF$_{KL}$), which aims to minimize the KL-divergence cost and is the best reported algorithm in image representation.

(iv) The constrained nonnegative matrix factorization algorithm with parameter $\alpha = 2$ in this paper, which aims to minimize the $\chi^2$-divergence cost.

(v) The constrained nonnegative matrix factorization algorithm with parameter $\alpha = \alpha_k$ in this paper, which aims to minimize the $\alpha$-divergence cost, where the parameters $\alpha_k$ are associated with the characteristics of the image database and designed by the presented method. The CNMF$_{KL}$ algorithm is a special case of our CNMF$_{\alpha_k}$ algorithm with $\alpha = \alpha_k = 1$.

We apply these algorithms to a classification problem and evaluate their performance on three image data sets, each containing a number of different categories of images. For each data set, the evaluations are conducted with different numbers of clusters; here the number of clusters $k$ varies from 2 to 10. We randomly choose $k$ categories from one image data set and mix the images of these $k$ categories as the collection $X$. Then, for the semisupervised algorithms, we randomly pick 10 percent of the images from each category in $X$ and record their category numbers as the available label information to obtain the label matrix $U$. For some special data sets the labeling process is different, and we describe the details later.

Suppose a data set has $N$ categories $C_1, C_2, \ldots, C_N$ and the cardinalities of the labeled images are $L_1, L_2, \ldots, L_N$, respectively. Since the label constraint matrix $U$ is composed of the indicator matrix $S$ and the identity matrix $I$, the indicator matrix $S$ plays a critical role in the classification performance for the different categories in $X$. To determine the effectiveness of $S$, we define a measure of the relationship between the cardinalities of the labeled samples and the total samples:
$$r(k) = \frac{\left( L_{\max}(k) - L_{\min}(k) \right) k}{\sum_{i=1}^{k} C_i}, \quad (30)$$

where 119871max(119896) and 119871min(119896) denote the maximum and mini-mum labeled cardinalities of 119896 categories For the fixed clusternumber 119896 in the data set 119903(119896) is different if the number ofsamples in each category is different Then we compute 120572 asfollows

120572119896=

120572119873

119903 (119873)

( |119903 (119896) minus 119903 (119873)|

times (

1003816100381610038161003816100381610038161003816100381610038161003816

119871max (119896) 119896

sum119896

119894=1119862119894

minus

sum119873

119894=1119871119894

sum119873

119894=1119862119894

1003816100381610038161003816100381610038161003816100381610038161003816

+

1003816100381610038161003816100381610038161003816100381610038161003816

119871min (119896) 119896

sum119896

119894=1119862119894

minus

sum119873

119894=1119871119894

sum119873

119894=1119862119894

1003816100381610038161003816100381610038161003816100381610038161003816

)

minus1

)

(31)

where
$$\alpha_N = \left( L_{\max}(N) - L_{\min}(N) \right) \left( \left| L_{\max}(N) - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i} \right| + \left| L_{\min}(N) - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i} \right| \right)^{-1}. \quad (32)$$

The value of $\alpha_k$ computed by (31) is associated with the characteristics of the image data set, since its variation is caused by both the cardinalities of the labeled samples in each category and the total samples. We can obtain both the cardinalities of labeled samples and the total samples in our semisupervised algorithms. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of $\alpha$ varies depending on the data set. It is still an open problem how to select the optimal $\alpha$ [16].
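Equations (30)-(32) translate directly into code. The sketch below is ours (function names and the array convention, with labeled counts `L` and category sizes `C` ordered consistently, are assumptions based on our reading of the reconstructed formulas):

```python
import numpy as np

def r_measure(L, C, k):
    """r(k) of eq. (30) over the first k categories."""
    Lk, Ck = np.asarray(L[:k], float), np.asarray(C[:k], float)
    return (Lk.max() - Lk.min()) * k / Ck.sum()

def alpha_k(L, C, k):
    """alpha_k of eqs. (31)-(32); L and C hold all N categories."""
    L, C = np.asarray(L, float), np.asarray(C, float)
    N = len(C)
    ratio = L.sum() / C.sum()                         # overall labeled fraction
    alpha_N = (L.max() - L.min()) / (abs(L.max() - ratio) + abs(L.min() - ratio))
    Lk, Ck = L[:k], C[:k]
    num = abs(r_measure(L, C, k) - r_measure(L, C, N))
    den = (abs(Lk.max() * k / Ck.sum() - ratio)
           + abs(Lk.min() * k / Ck.sum() - ratio))
    # Note: the paper sets alpha_k by hand (e.g., 0.67 or 0.55; see Sections
    # 5.1 and 5.2) in the degenerate case r(k) = r(N), where num vanishes.
    return alpha_N / r_measure(L, C, N) * num / den
```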

To evaluate the classification performance, we define classification accuracy as the first measure. Our CNMF$_\alpha$ algorithm described in (11) and (12) provides a classification label for each sample, marked $s_{il_i}$. Suppose $X = \{x_i\}_{i=1}^{n}$ is a data set consisting of $n$ training images. For each sample, let $l_i$ be the true class label provided by the image data set. More specifically, if the image $x_i$ is assigned to the $l_i$th class, we evaluate it as a correct label and set $s_{il_i} = 1$; otherwise, it is counted as a false label and we set $s_{il_i} = 0$. Eventually, we compute the percentage of correct labels by defining the accuracy measure as
$$\mathrm{Ac}(s_{il_i}) = \frac{\sum_{i=1}^{n} s_{il_i}}{n}. \quad (33)$$

As the second measure of classification performance, we compute the normalized mutual information, which is used to measure how similar two sets of clusters are. Given two sets of clusters $C_i$ and $C_j$, their normalized mutual information is defined as
$$\mathrm{NMI}(C_i, C_j) = \frac{\sum_{c_i \in C_i,\, c_j \in C_j} p(c_i, c_j) \cdot \log \left( p(c_i, c_j) / \left[ p(c_i)\,p(c_j) \right] \right)}{\max\left( H(C_i), H(C_j) \right)}, \quad (34)$$
which takes values between 0 and 1, where $p(c_i)$ and $p(c_j)$ denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters $c_i$ and $c_j$, respectively, and $p(c_i, c_j)$ denotes the joint probability that this arbitrarily selected image belongs to the cluster $c_i$ as well as $c_j$ at the same time. $H(C_i)$ and $H(C_j)$ are the entropies of $C_i$ and $C_j$, respectively.

Experimental results on each data set are presented as classification accuracy and normalized mutual information in Tables 1 and 2 and the corresponding tables for the other data sets.
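Both measures (33) and (34) can be computed from predicted and true label vectors; the following is a small self-contained sketch with our own helper names:

```python
import numpy as np

def accuracy(pred, truth):
    """Classification accuracy Ac of eq. (33): fraction of correct labels."""
    pred, truth = np.asarray(pred), np.asarray(truth)
    return float(np.mean(pred == truth))

def nmi(pred, truth):
    """Normalized mutual information of eq. (34), normalized by the max entropy."""
    pred, truth = np.asarray(pred), np.asarray(truth)

    def entropy(z):
        p = np.unique(z, return_counts=True)[1] / len(z)
        return float(-(p * np.log(p)).sum())

    mi = 0.0
    for a in np.unique(pred):
        for b in np.unique(truth):
            p_ab = np.mean((pred == a) & (truth == b))   # joint probability
            if p_ab > 0:
                p_a, p_b = np.mean(pred == a), np.mean(truth == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi / max(entropy(pred), entropy(truth))
```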

5.1. ORL Database. The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. The images of some people were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/nonsmiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position, with slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to 32 × 32 pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes in the facial areas are aligned at the same position.

There are 10 images in each category of ORL, so 10 percent is just one image. For the fixed parameters $\alpha$ ($\alpha = 0.5, 1, 2$), we randomly choose two images from each category to provide the label information. Note that identical label cardinalities are meaningless for (30). To obtain a nontrivial $r(k)$, we divide the 40 categories into three groups of 10, 20, and 10 categories: in the first 10 categories we pick 1 image from each category to provide label information, in the next 20 categories we pick 2 images from each category, and in the remaining categories we pick 3 from each. The dividing process is repeated 10 times, and the average classification accuracy obtained is recorded as the final result.

Figure 1 shows the classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection $X$ come from the same group, we set $\alpha_k = 0.67$. Because each category contains the same number of samples, the variation of $\alpha$ is small even though we label different cardinalities of samples. Compared with the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF$_{\alpha_k}$ algorithm gives the best performance, since the selected $\alpha_k$ is suited to the collection $X$. Table 1 summarizes the detailed classification accuracy and the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_k}$. It shows that our CNMF$_{\alpha_k}$ algorithm achieves a 1.92 percent improvement over the best reported CNMF$_{KL}$ algorithm ($\alpha = 1$) [15] in average classification accuracy. For normalized mutual information, the details and the error bars of our constrained algorithms with $\alpha = 0.5$, $\alpha = 2$, and $\alpha_k$ are listed in Table 2. Compared with the best baseline algorithm, CNMF, our CNMF$_{\alpha_k}$ algorithm achieves a 0.54 percent improvement.

5.2. Yale Database. The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person, one per facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database. Each image is linearly stretched to a 1024-dimensional vector in image space.

The Yale Database also has the same number of samples in each category. To obtain an appropriate $\alpha$, we perform labeling similar to that for the ORL Database: we divide the 15 individuals evenly into 3 groups, choose 1 image from each category in the first group, 2 from each category in the second, and 3 from each category in the remaining group. We repeat the process 10 times and record the average classification accuracy as the final result.

Figure 2 shows the classification accuracy and normalized mutual information on the Yale Database. We set $\alpha_k = 0.55$ when $r(k) - r(N) = 0$, which indicates that the samples in the collection $X$ come from just two groups, that is, 15 images chosen from 10 categories in the first and second groups. CNMF$_{\alpha_k}$ achieves an extraordinary performance in all the cases, and CNMF$_{\alpha=0.5}$ follows. This suggests that the constrained nonnegative matrix factorization algorithm has higher classification accuracy when the value of $\alpha$ is close to $\alpha_k$. Compared with the best reported CNMF$_{KL}$ algorithm, CNMF$_{\alpha_k}$ achieves a 2.42 percent improvement in average classification accuracy. For normalized mutual information, CNMF$_{\alpha_k}$ achieves a 7.2 percent improvement over CNMF. The details of classification accuracy and normalized mutual information are provided in Tables 3 and 4, which contain the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_k}$.


Table 1: The comparison of classification accuracy rates Ac($s_{il_i}$) (%) on the ORL Database.

| $k$ | CNMF | CNMF$_{\alpha=1}$ | CNMF$_{\alpha=0.5}$ | CNMF$_{\alpha=2}$ | CNMF$_{\alpha_k}$ |
|-----|------|-------------------|---------------------|-------------------|-------------------|
| 2 | 93.90 | 92.95 | 93.51 ± 0.09 | 92.14 ± 0.18 | 93.89 ± 0.91 |
| 3 | 86.20 | 84.33 | 85.71 ± 0.83 | 82.26 ± 1.23 | 86.21 ± 2.07 |
| 4 | 84.55 | 84.05 | 84.74 ± 0.56 | 83.02 ± 0.47 | 84.78 ± 0.36 |
| 5 | 80.16 | 78.38 | 80.10 ± 0.22 | 80.15 ± 0.18 | 80.16 ± 0.29 |
| 6 | 76.72 | 76.73 | 77.13 ± 0.28 | 77.60 ± 0.19 | 78.09 ± 0.40 |
| 7 | 81.14 | 79.04 | 80.34 ± 0.81 | 78.07 ± 0.88 | 82.15 ± 1.95 |
| 8 | 78.08 | 77.02 | 78.66 ± 0.69 | 78.77 ± 0.94 | 79.45 ± 1.33 |
| 9 | 83.19 | 82.23 | 83.57 ± 0.36 | 81.22 ± 0.21 | 85.19 ± 1.21 |
| 10 | 77.03 | 76.88 | 77.69 ± 0.12 | 77.75 ± 0.19 | 78.96 ± 0.73 |
| Avg | 82.33 | 81.29 | 82.38 | 81.22 | 83.21 |

Table 2: The comparison of normalized mutual information (%) on the ORL Database.

| $k$ | CNMF | CNMF$_{\alpha=1}$ | CNMF$_{\alpha=0.5}$ | CNMF$_{\alpha=2}$ | CNMF$_{\alpha_k}$ |
|-----|------|-------------------|---------------------|-------------------|-------------------|
| 2 | 78.24 | 75.82 | 76.01 ± 0.14 | 73.94 ± 0.11 | 77.67 ± 0.58 |
| 3 | 76.61 | 75.36 | 75.38 ± 0.41 | 74.50 ± 0.53 | 75.92 ± 1.55 |
| 4 | 78.69 | 77.82 | 77.96 ± 0.47 | 77.04 ± 0.54 | 79.27 ± 1.26 |
| 5 | 76.69 | 74.27 | 75.32 ± 0.80 | 72.62 ± 0.87 | 75.35 ± 1.67 |
| 6 | 76.52 | 76.09 | 76.21 ± 0.12 | 76.09 ± 0.67 | 77.60 ± 1.82 |
| 7 | 82.45 | 81.35 | 82.36 ± 0.42 | 81.35 ± 0.26 | 83.08 ± 1.59 |
| 8 | 80.57 | 79.92 | 80.11 ± 0.73 | 79.91 ± 0.68 | 82.49 ± 1.97 |
| 9 | 86.03 | 85.67 | 86.73 ± 0.70 | 85.67 ± 0.37 | 88.05 ± 1.15 |
| 10 | 81.77 | 81.07 | 82.07 ± 0.12 | 81.07 ± 0.12 | 82.97 ± 1.71 |
| Avg | 79.73 | 78.60 | 79.13 | 78.02 | 80.27 |

5.3. Caltech 101 Database. The Caltech 101 Database, created by Caltech University, has images of 101 different object categories. Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using the codewords generated from SIFT features [34]. We obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.

We randomly select $k$ categories from the Faces-Easy category of the Caltech 101 database and convert the images to 32 × 32 gray-scale. The labeling process is repeated 10 times, and the obtained values of $\alpha_k$ computed by (31) are listed in Table 7. The variation of $\alpha$ within the same $k$ categories is great; that is, selecting an appropriate $\alpha$ plays a critical role for a mixture of different categories in one image data set, especially when the number of samples in each category differs. The choice of $\alpha_k$ fully reflects the effectiveness of the indicator matrix $S$.

Figure 3 shows the effect derived from using the proposed algorithm with $\alpha_k$. The upper part of the figure shows the original samples, which contain 26 images; the middle part shows their gray images; and the lower part shows the combination of the basis vectors learned by CNMF$_{\alpha_k}$.

120572=05 CNMF

120572=2 and CNMF

120572119896 The graphical results

of classification performance are shown in Figure 4 The bestperformance in this experiment is achieved when the param-eters120572

119896listed in Table 7were selected In general ourmethod

demonstrates much better effectiveness in classification bychoosing 120572

119896 Comparing to the best reported algorithm

other than our CNMF120572algorithm CNMF

120572119896achieves 24

percent improvement in average classification accuracy andcomparing to the CNMF algorithm [15] other than CNMF

120572

algorithm CNMF120572119896

achieves 877 percent improvement inaverage classification accuracy For normalized mutual infor-mation CNMF

120572119896achieves 229 percent improvement and

consistently outperforms the other algorithms



Table 3: The comparison of classification accuracy rates Ac($s_{il_i}$) (%) on the Yale Database.

| $k$ | CNMF | CNMF$_{\alpha=1}$ | CNMF$_{\alpha=0.5}$ | CNMF$_{\alpha=2}$ | CNMF$_{\alpha_k}$ |
|-----|------|-------------------|---------------------|-------------------|-------------------|
| 2 | 81.64 | 85.23 | 85.93 ± 0.65 | 84.18 ± 0.81 | 86.27 ± 2.46 |
| 3 | 68.48 | 76.91 | 77.14 ± 0.86 | 77.26 ± 0.42 | 79.01 ± 1.09 |
| 4 | 64.25 | 66.70 | 68.11 ± 0.67 | 66.12 ± 0.16 | 69.33 ± 2.23 |
| 5 | 65.82 | 67.71 | 69.14 ± 1.21 | 66.54 ± 1.01 | 70.57 ± 2.22 |
| 6 | 60.12 | 64.26 | 65.62 ± 0.94 | 63.70 ± 0.94 | 66.92 ± 1.41 |
| 7 | 59.25 | 62.47 | 63.97 ± 0.72 | 59.83 ± 0.38 | 65.23 ± 1.77 |
| 8 | 54.40 | 59.26 | 60.12 ± 0.31 | 56.76 ± 0.07 | 60.89 ± 1.28 |
| 9 | 54.85 | 57.30 | 58.08 ± 0.12 | 54.88 ± 0.14 | 58.98 ± 0.62 |
| 10 | 52.68 | 56.20 | 57.56 ± 0.03 | 53.83 ± 0.01 | 57.56 ± 0.73 |
| Avg | 62.39 | 65.89 | 67.30 | 64.79 | 68.31 |

Table 4: The comparison of normalized mutual information (%) on the Yale Database.

| $k$ | CNMF | CNMF$_{\alpha=1}$ | CNMF$_{\alpha=0.5}$ | CNMF$_{\alpha=2}$ | CNMF$_{\alpha_k}$ |
|-----|------|-------------------|---------------------|-------------------|-------------------|
| 2 | 48.99 | 59.66 | 59.73 ± 0.39 | 58.17 ± 0.17 | 61.11 ± 1.82 |
| 3 | 42.90 | 53.52 | 53.58 ± 0.73 | 52.75 ± 0.93 | 54.49 ± 1.74 |
| 4 | 47.80 | 53.60 | 53.66 ± 0.18 | 52.99 ± 0.65 | 54.58 ± 1.64 |
| 5 | 53.99 | 57.49 | 57.56 ± 0.07 | 56.21 ± 0.06 | 58.58 ± 2.32 |
| 6 | 50.63 | 55.88 | 56.15 ± 0.07 | 55.40 ± 0.08 | 57.62 ± 1.09 |
| 7 | 53.03 | 57.60 | 57.88 ± 0.22 | 57.27 ± 0.16 | 59.05 ± 1.89 |
| 8 | 50.83 | 56.52 | 56.93 ± 0.13 | 56.20 ± 0.13 | 58.98 ± 1.46 |
| 9 | 53.08 | 55.30 | 55.98 ± 0.73 | 54.81 ± 0.81 | 56.15 ± 1.20 |
| 10 | 52.84 | 56.73 | 57.43 ± 0.16 | 56.41 ± 0.41 | 58.30 ± 1.11 |
| Avg | 50.45 | 56.26 | 56.54 | 55.58 | 57.65 |

[Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information; both plotted against the number of classes for CNMF, CNMF$_{\alpha=1}$, CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_k}$.]


[Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information; both plotted against the number of classes for CNMF, CNMF$_{\alpha=1}$, CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_k}$.]

[Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) is the original images, (b) is the gray images, and (c) is the basis vectors learned by CNMF$_{\alpha_k}$. Positive values are illustrated with white pixels.]



[Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information; both plotted against the number of classes for CNMF, CNMF$_{\alpha=1}$, CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_k}$.]

Table 5: The comparison of classification accuracy rates Ac($s_{il_i}$) (%) on the Caltech 101 Database.

| $k$ | CNMF | CNMF$_{\alpha=1}$ | CNMF$_{\alpha=0.5}$ | CNMF$_{\alpha=2}$ | CNMF$_{\alpha_k}$ |
|-----|------|-------------------|---------------------|-------------------|-------------------|
| 2 | 66.39 | 78.80 | 78.31 ± 1.69 | 80.10 ± 2.07 | 82.76 ± 3.56 |
| 3 | 60.91 | 69.98 | 68.77 ± 1.57 | 72.94 ± 1.51 | 73.59 ± 3.51 |
| 4 | 50.82 | 57.50 | 58.19 ± 1.98 | 56.28 ± 1.96 | 59.89 ± 2.82 |
| 5 | 50.83 | 53.67 | 53.71 ± 1.02 | 54.12 ± 1.44 | 55.06 ± 2.63 |
| 6 | 50.49 | 55.05 | 55.63 ± 0.89 | 58.04 ± 1.47 | 58.97 ± 1.97 |
| 7 | 46.09 | 51.12 | 50.04 ± 0.11 | 51.73 ± 0.25 | 52.36 ± 1.81 |
| 8 | 44.21 | 50.50 | 51.72 ± 0.51 | 48.37 ± 1.59 | 52.64 ± 1.52 |
| 9 | 41.34 | 46.58 | 47.85 ± 0.28 | 45.87 ± 0.30 | 49.54 ± 1.23 |
| 10 | 40.80 | 46.02 | 46.02 ± 0.05 | 45.76 ± 0.08 | 46.02 ± 0.26 |
| Avg | 50.21 | 56.58 | 56.69 | 57.02 | 58.98 |

Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

| $k$ | CNMF | CNMF$_{\alpha=1}$ | CNMF$_{\alpha=0.5}$ | CNMF$_{\alpha=2}$ | CNMF$_{\alpha_k}$ |
|-----|------|-------------------|---------------------|-------------------|-------------------|
| 2 | 21.05 | 38.17 | 38.20 ± 0.18 | 37.48 ± 0.21 | 40.08 ± 2.19 |
| 3 | 31.62 | 40.63 | 40.68 ± 0.15 | 40.05 ± 0.14 | 41.88 ± 1.24 |
| 4 | 27.60 | 34.36 | 34.41 ± 0.79 | 34.12 ± 0.91 | 36.41 ± 1.17 |
| 5 | 34.46 | 35.82 | 36.00 ± 0.99 | 35.42 ± 1.01 | 37.20 ± 1.78 |
| 6 | 34.85 | 38.97 | 39.16 ± 0.86 | 38.53 ± 1.11 | 39.64 ± 1.83 |
| 7 | 34.88 | 37.67 | 37.86 ± 0.47 | 37.47 ± 0.49 | 39.86 ± 2.58 |
| 8 | 32.63 | 37.43 | 37.71 ± 0.39 | 37.22 ± 0.42 | 37.91 ± 2.18 |
| 9 | 31.20 | 35.97 | 36.21 ± 0.60 | 35.86 ± 0.63 | 37.73 ± 0.67 |
| 10 | 31.21 | 35.71 | 36.06 ± 0.02 | 35.68 ± 0.03 | 37.48 ± 0.22 |
| Avg | 31.05 | 37.19 | 37.37 | 36.87 | 38.69 |


Table 7: The value of $\alpha$ in $k$ categories.

| $k$ | $\alpha_1$ | $\alpha_2$ | $\alpha_3$ | $\alpha_4$ | $\alpha_5$ | $\alpha_6$ | $\alpha_7$ | $\alpha_8$ | $\alpha_9$ | $\alpha_{10}$ |
|-----|------|------|------|------|------|------|------|------|------|------|
| 2 | 2.07 | 2.08 | 2.22 | 2.27 | 2.57 | 3.52 | 3.97 | 3.98 | 4.37 | 8.96 |
| 3 | 0.45 | 0.55 | 0.66 | 1.07 | 1.95 | 2.31 | 2.31 | 3.54 | 4.24 | 7.54 |
| 4 | 0.30 | 0.32 | 0.35 | 0.86 | 0.98 | 1.27 | 1.65 | 2.10 | 3.27 | 5.80 |
| 5 | 0.45 | 0.49 | 0.51 | 0.71 | 0.80 | 1.13 | 2.05 | 2.55 | 2.82 | 3.89 |
| 6 | 0.07 | 0.56 | 0.73 | 1.08 | 1.26 | 1.46 | 2.01 | 2.12 | 3.91 | 8.52 |
| 7 | 0.71 | 0.94 | 1.12 | 1.14 | 1.16 | 1.26 | 1.27 | 1.37 | 1.57 | 8.97 |
| 8 | 0.12 | 0.30 | 0.50 | 0.64 | 0.66 | 0.69 | 0.78 | 0.79 | 0.97 | 1.11 |
| 9 | 0.78 | 0.79 | 2.01 | 0.10 | 0.17 | 0.28 | 0.29 | 0.30 | 0.31 | 0.39 |
| 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

6. Conclusion

In this paper, we present a generic constrained nonnegative matrix factorization algorithm by imposing label information as an additional hard constraint on the $\alpha$-divergence-NMF algorithm. The proposed algorithm can be derived using the Karush-Kuhn-Tucker conditions, and we presented an alternative derivation using the projected gradient. The use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter $\alpha$. Since the $\alpha$-divergence is a parametric discrepancy measure and the parameter $\alpha$ is associated with the characteristics of a learning machine, the model distribution is more inclusive when $\alpha$ goes to $+\infty$ and more exclusive when $\alpha$ approaches $-\infty$. The selection of the optimal value of $\alpha$ plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter $\alpha$ for our semisupervised CNMF$_\alpha$ algorithm; the variation of $\alpha$ is caused by both the cardinalities of labeled samples in each category and the total samples. In the experiments, we apply the fixed parameters $\alpha$ and $\alpha_k$ to analyze the classification accuracy on three image databases. The experimental results demonstrate that the CNMF$_{\alpha_k}$ algorithm has the best classification accuracy. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of $\alpha$ varies depending on the data set. It is still an open problem how to select the optimal $\alpha$ for a specific image data set.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).

References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115-147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111-126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, pp. 4-8030, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I207-I212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495-511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629-632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299-1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433-1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001-1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706-1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and Its Applications, vol. 416, no. 1, pp. 29-47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177-180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373-386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828-1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V621-V624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283-290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, vol. 39, no. 1, pp. 1-38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81-89, ACL Press, 1997.
[30] F. Hansen and G. K. Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229-241, 1982.
[31] ORL Face Database, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[32] Yale Face Database, http://vision.ucsd.edu/content/yale-face-database.
[33] Caltech 101 Database, http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.


NMF algorithms, we analyze the classification accuracy for two fixed values of $\alpha$ ($\alpha = 0.5, 2$) on three image data sets.

The CNMF$_{\alpha}$ algorithm does not work well for a fixed value of $\alpha$. Since the parameter $\alpha$ is associated with the characteristics of a learning machine, the model distribution is more inclusive when $\alpha$ goes to $+\infty$ and more exclusive when $\alpha$ approaches $-\infty$. The selection of the optimal value of $\alpha$ plays a critical role in determining the discriminative basis vectors. In this paper, we provide a method to select the parameter $\alpha$ for our semisupervised CNMF$_{\alpha}$ algorithm; the variation of $\alpha$ is associated with the characteristics of the image data sets. Compared with the algorithms in [15, 16], our algorithm is more complete and systematic.

The rest of the paper is organized as follows. In Section 2, we give a brief overview of the standard NMF algorithm and the constrained NMF algorithm. The detailed algorithms with labeled constraints and the theoretical proof of their convergence are provided in Sections 3 and 4, respectively. Section 5 presents experimental results that show the advantages of our algorithm. Finally, a conclusion is given in Section 6.

2. Related Work

NMF, proposed by Lee and Seung [7], is considered to provide a part-based representation and has been applied to diverse examples of nonnegative data [17–21], including text data mining, subsystem identification, spectral data analysis, audio and sound processing, and document clustering.

Suppose $X = [x_1, \ldots, x_n] \in R^{m \times n}$ is a set of $n$ training images, where each column vector $x_t$ ($t = 1, \ldots, n$) consists of the $m$ nonnegative pixel values of a training image. NMF finds two nonnegative matrix factors $W \in R^{m \times r}$ and $H \in R^{n \times r}$ to approximate the original image matrix:

$$X \approx W H^{T}, \qquad (1)$$

where the positive integer $r$ is smaller than $m$ or $n$.

NMF uses only nonnegative constraints, which makes the representation purely unsupervised; it is inapplicable for learning a basis representation from limited image information. To make up for this deficiency, extra constraints such as locality [22], sparseness [9], and orthogonality [23] were implicitly or explicitly incorporated into NMF to identify better local features or provide a sparser representation.

In [15], the authors impose label information as additional hard constraints on the NMF unsupervised learning algorithm to derive a semisupervised matrix decomposition algorithm, which makes the obtained representation more discriminating. The label information is incorporated as follows.

Suppose $X = \{x_i\}_{i=1}^{n}$ is a data set consisting of $n$ training images. Assume the first $s$ images $x_1, \ldots, x_s$ ($s \le n$) carry label information, while the remaining $n - s$ images $x_{s+1}, \ldots, x_n$ are unlabeled. Assume there exist $c$ classes and each image from $x_1, \ldots, x_s$ is designated one class. Then we have an $s \times c$ indicator matrix $S$ defined by

$$s_{ij} = \begin{cases} 1, & \text{if } x_i \text{ is designated the } j\text{th class}, \\ 0, & \text{otherwise}. \end{cases} \qquad (2)$$

From the indicator matrix $S$, a label constraint matrix $U$ can be defined as

$$U = \begin{pmatrix} S_{s \times c} & 0 \\ 0 & I_{n-s} \end{pmatrix}, \qquad (3)$$

where $I_{n-s}$ denotes an $(n-s) \times (n-s)$ identity matrix.

Imposing the label information as an additional hard constraint by an auxiliary matrix $V$, there is $H = UV$. It verifies that $h_i = h_j$ if $x_i$ and $x_j$ have the same labels. With the label constraints, the standard NMF is transformed into factorizing a large-size matrix $X$ into the product of three small-size matrices $W$, $U$, and $V$:

$$X \approx W (UV)^{T}. \qquad (4)$$

Such a representation encodes the data points from the same class using the indicator matrix in a new representation space.
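To make the construction of (2) and (3) concrete, here is a minimal NumPy sketch; the function name `label_constraint_matrix` and the toy label vector are our illustration, not part of the paper.

```python
import numpy as np

def label_constraint_matrix(labels, n, c):
    """Build the indicator matrix S of (2) and the label constraint
    matrix U of (3).

    labels : class indices (0..c-1) of the first s labeled images;
             the remaining n - s images are unlabeled.
    """
    s = len(labels)
    S = np.zeros((s, c))
    S[np.arange(s), labels] = 1.0      # s_ij = 1 iff x_i is in the jth class
    U = np.zeros((n, c + n - s))
    U[:s, :c] = S                      # labeled block S_{s x c}
    U[s:, c:] = np.eye(n - s)          # identity block I_{n-s}
    return U

# Toy example: n = 5 images, the first s = 3 labeled into c = 2 classes.
U = label_constraint_matrix([0, 1, 0], n=5, c=2)   # U has shape (5, 4)
```

With this $U$, two labeled images of the same class share an identical row of $U$, so $H = UV$ forces $h_i = h_j$ exactly as stated above.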

3. A Constrained Algorithm Based NMF$_\alpha$

The exact form of the error measure in (1) is as crucial as the nonnegative constraints to the success of the NMF algorithm in learning a useful representation of image data. In research on NMF, there have been quite a number of investigations of error measures, such as Csiszar's f-divergences [24], Amari's $\alpha$-divergence [25], and Bregman divergences [26]. Here we introduce a generic multiplicative updating algorithm [16] which iteratively minimizes the $\alpha$-divergence between $X$ and $WH^{T}$. We define the $\alpha$-divergence as

$$D_\alpha[X \,\|\, WH^{T}] = \frac{1}{\alpha(1-\alpha)} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \alpha X_{ij} + (1-\alpha)[WH^{T}]_{ij} - X_{ij}^{\alpha}\, [WH^{T}]_{ij}^{1-\alpha} \right), \qquad (5)$$

where $\alpha$ is a positive parameter. We combine the labeled constraints of (4) to derive the following objective function, based on the $\alpha$-divergence between $X$ and $W(UV)^{T}$:

$$D_\alpha[X \,\|\, W(UV)^{T}] = \frac{1}{\alpha(1-\alpha)} \sum_{i=1}^{m} \sum_{j=1}^{n} \left( \alpha X_{ij} + (1-\alpha)[WV^{T}U^{T}]_{ij} - X_{ij}^{\alpha}\, [WV^{T}U^{T}]_{ij}^{1-\alpha} \right). \qquad (6)$$

With the constraints $W_{ij} \ge 0$, $U_{ij} \ge 0$, and $V_{ij} \ge 0$, the minimization of $D_\alpha[X \,\|\, W(UV)^{T}]$ can be formulated as a constrained minimization problem with inequality constraints. In the following, we present two methods to find a local minimum of (6).
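As a numerical companion to (5) and (6), the following hedged sketch evaluates $D_\alpha[X \| Z]$ for a given reconstruction $Z = W(UV)^{T}$; it is valid for $\alpha \notin \{0, 1\}$, and the small `eps` guard against zero entries is our addition.

```python
import numpy as np

def alpha_divergence(X, Z, alpha, eps=1e-12):
    """D_alpha[X || Z] of (6), with Z = W (U V)^T and alpha not in {0, 1}."""
    X = np.asarray(X, dtype=float) + eps
    Z = np.asarray(Z, dtype=float) + eps
    terms = alpha * X + (1.0 - alpha) * Z - X**alpha * Z**(1.0 - alpha)
    return terms.sum() / (alpha * (1.0 - alpha))
```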


3.1. KKT Method. Let $\lambda_{ij} \ge 0$ and $\mu_{ij} \ge 0$ be the Lagrangian multipliers associated with the constraints $V_{ij} \ge 0$ and $W_{ij} \ge 0$, respectively. The Karush-Kuhn-Tucker conditions require that both the optimality conditions

$$\frac{\partial D_\alpha[X \,\|\, W(UV)^{T}]}{\partial V_{ij}} = \frac{1}{\alpha}\left(\sum_{k}[U^{T}W]_{ki} - \sum_{k}[U^{T}W]_{ki}\left(\frac{X_{kj}}{[WV^{T}U^{T}]_{kj}}\right)^{\alpha}\right) = \lambda_{ij}, \qquad (7)$$

$$\frac{\partial D_\alpha[X \,\|\, W(UV)^{T}]}{\partial W_{ij}} = \frac{1}{\alpha}\left(\sum_{k}[V^{T}U^{T}]_{jk} - \sum_{k}[V^{T}U^{T}]_{jk}\left(\frac{X_{ik}}{[WV^{T}U^{T}]_{ik}}\right)^{\alpha}\right) = \mu_{ij}, \qquad (8)$$

and the complementary slackness conditions

$$\lambda_{ij}V_{ij} = 0, \qquad \mu_{ij}W_{ij} = 0 \qquad (9)$$

are satisfied. If $V_{ij} = 0$ and $W_{ij} = 0$, then $\lambda_{ij}$ and $\mu_{ij}$ can take any values; at the same time, if $\lambda_{ij} = 0$ and $\mu_{ij} = 0$, then $V_{ij} \ge 0$ and $W_{ij} \ge 0$. Hence we need both $\lambda_{ij} = 0$, $\mu_{ij} = 0$ and $V_{ij} = 0$, $W_{ij} = 0$. It follows from (9) that

$$\lambda_{ij}V_{ij}^{\alpha} = 0, \qquad \mu_{ij}W_{ij}^{\alpha} = 0. \qquad (10)$$

We multiply both sides of (7) and (8) by $V_{ij}$ and $W_{ij}$, respectively, incorporate (10), and then obtain the following updating rules:

$$V_{ij} \leftarrow V_{ij}\left[\frac{\sum_{k}[U^{T}W]_{ki}\left(X_{kj}/[WV^{T}U^{T}]_{kj}\right)^{\alpha}}{\sum_{l}[U^{T}W]_{li}}\right]^{1/\alpha}, \qquad (11)$$

$$W_{ij} \leftarrow W_{ij}\left[\frac{\sum_{k}[V^{T}U^{T}]_{jk}\left(X_{ik}/[WV^{T}U^{T}]_{ik}\right)^{\alpha}}{\sum_{l}[V^{T}U^{T}]_{jl}}\right]^{1/\alpha}. \qquad (12)$$
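The updating rules (11) and (12) vectorize naturally through the elementwise ratio $R = (X/Z)^{\alpha}$ with $Z = WV^{T}U^{T}$. The sketch below is one matrix-form reading of them, not the authors' reference implementation; the random initialization, iteration count, and `eps` smoothing are our assumptions.

```python
import numpy as np

def cnmf_alpha(X, U, r, alpha=0.5, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates (11)-(12) for min D_alpha[X || W (U V)^T].

    X : (m, n) nonnegative data matrix, one image per column.
    U : (n, q) label constraint matrix of (3), q = c + n - s.
    """
    m, n = X.shape
    q = U.shape[1]
    rng = np.random.default_rng(seed)
    W = rng.random((m, r)) + eps
    V = rng.random((q, r)) + eps
    for _ in range(n_iter):
        Z = W @ (U @ V).T + eps              # reconstruction W (U V)^T
        R = (X / Z) ** alpha                 # elementwise (X_ij / Z_ij)^alpha
        # V update (11): numerator U^T R^T W, denominator (U^T 1)(1^T W)
        V *= ((U.T @ R.T @ W)
              / (np.outer(U.sum(0), W.sum(0)) + eps)) ** (1.0 / alpha)
        Z = W @ (U @ V).T + eps
        R = (X / Z) ** alpha
        H = U @ V                            # H = U V as in (4)
        # W update (12): numerator R H, denominator the column sums of H
        W *= ((R @ H) / (H.sum(0, keepdims=True) + eps)) ** (1.0 / alpha)
    return W, V
```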

3.2. Projected Gradient Method. Considering the gradient descent algorithm [24, 25], the updating rules for the objective function (6) can also be derived using the projected gradient method [27] and have the form

$$\phi(V_{ij}) \leftarrow \phi(V_{ij}) + \delta_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^{T}]}{\partial V_{ij}}, \qquad \phi(W_{ij}) \leftarrow \phi(W_{ij}) + \gamma_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^{T}]}{\partial W_{ij}}, \qquad (13)$$

where $\phi(\cdot)$ is a suitably chosen function and $\delta_{ij}$ and $\gamma_{ij}$ are two parameters that control the step size of the gradient descent. Then we have

$$V_{ij} \leftarrow \phi^{-1}\!\left( \phi(V_{ij}) + \delta_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^{T}]}{\partial V_{ij}} \right), \qquad W_{ij} \leftarrow \phi^{-1}\!\left( \phi(W_{ij}) + \gamma_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^{T}]}{\partial W_{ij}} \right). \qquad (14)$$

Setting $\phi(\Omega) = \Omega^{\alpha}$, to guarantee that the updating rules (11) and (12) hold we need

$$\delta_{ij} = -\frac{\alpha V_{ij}^{\alpha}}{\sum_{k}[U^{T}W]_{ki}}, \qquad \gamma_{ij} = -\frac{\alpha W_{ij}^{\alpha}}{\sum_{k}[V^{T}U^{T}]_{jk}}. \qquad (15)$$

From (7) and (8), the updating rules become

$$V_{ij}^{\alpha} \leftarrow V_{ij}^{\alpha} + \delta_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^{T}]}{\partial V_{ij}} = V_{ij}^{\alpha}\,\frac{\sum_{k}[U^{T}W]_{ki}\left(X_{kj}/[WV^{T}U^{T}]_{kj}\right)^{\alpha}}{\sum_{l}[U^{T}W]_{li}},$$

$$W_{ij}^{\alpha} \leftarrow W_{ij}^{\alpha} + \gamma_{ij}\,\frac{\partial D_\alpha[X \,\|\, W(UV)^{T}]}{\partial W_{ij}} = W_{ij}^{\alpha}\,\frac{\sum_{k}[V^{T}U^{T}]_{jk}\left(X_{ik}/[WV^{T}U^{T}]_{ik}\right)^{\alpha}}{\sum_{l}[V^{T}U^{T}]_{jl}}, \qquad (16)$$

which are the same as the updating rules (11) and (12). We have shown that the algorithm can be derived using the Karush-Kuhn-Tucker conditions and presented an alternative derivation using the projected gradient. The use of the two methods guarantees the correctness of the algorithm theoretically.

In the following, we give a theorem to guarantee the convergence of the iterations in the updates (11) and (12).

Theorem 1. For the objective function (6), $D_\alpha[X \,\|\, W(UV)^{T}]$ is nonincreasing under the updating rules (11) and (12). The objective function $D_\alpha$ is invariant under these updates if and only if $V$ and $W$ are at a stationary point.

Multiplicative updates for our constrained algorithm based NMF$_\alpha$ are given in (11) and (12). These updates find a local minimum of $D_\alpha[X \,\|\, W(UV)^{T}]$, which is the final solution of (11) and (12). Note that when $\alpha = 1$, the updates (11) and (12) are the same as those of the CNMF$_{\mathrm{KL}}$ algorithm [15], which is a special case included in our generic constrained matrix decomposition algorithm. In the following, we give the proof of Theorem 1.
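Before turning to the formal proof, the monotonicity claimed in Theorem 1 can be sanity-checked numerically with the hypothetical helpers sketched earlier (`label_constraint_matrix`, `cnmf_alpha`, and `alpha_divergence` are our names): re-running the updates for increasing iteration counts from the same initialization, the recorded objective values should form a nonincreasing sequence.

```python
import numpy as np

# Empirical check of Theorem 1: D_alpha does not increase under (11)-(12).
rng = np.random.default_rng(1)
X = rng.random((20, 5)) + 0.1                 # small nonnegative toy data
U = label_constraint_matrix([0, 1, 0], n=5, c=2)
history = []
for t in range(1, 31):                        # same seed => nested iterates
    W, V = cnmf_alpha(X, U, r=3, alpha=0.5, n_iter=t, seed=0)
    history.append(alpha_divergence(X, W @ (U @ V).T, alpha=0.5))
assert all(a >= b - 1e-8 for a, b in zip(history, history[1:]))
```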

4. Convergence Analysis

To prove Theorem 1, we make use of an auxiliary function of the kind used in the expectation-maximization algorithm [28, 29].

Definition 2. A function $G(x, x')$ is defined as an auxiliary function for $F(x)$ if the following two conditions are both satisfied:

$$G(x, x) = F(x), \qquad G(x, x') \ge F(x). \qquad (17)$$

Lemma 3. If $G(x, x')$ is an auxiliary function for $F(x)$, then $F(x)$ is nonincreasing under the update

$$x^{t+1} = \arg\min_{x} G(x, x^{t}). \qquad (18)$$

Proof. Consider $F(x^{t+1}) \le G(x^{t+1}, x^{t}) \le G(x^{t}, x^{t}) = F(x^{t})$.

It can be observed that the equality $F(x^{t+1}) = F(x^{t})$ holds only if $x^{t}$ is a local minimum of $G(x, x^{t})$. Iterating the update in (18), we obtain a sequence of estimates that converges to a local minimum $x_{\min} = \arg\min_{x} F(x)$ of the objective function:

$$F(x_{\min}) \le \cdots \le F(x^{t+1}) \le F(x^{t}) \le \cdots \le F(x^{2}) \le F(x^{1}) \le F(x^{0}). \qquad (19)$$
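Lemma 3 is the standard majorize-minimize argument. A scalar toy example (entirely ours, not from the paper) shows the mechanics: a quadratic $G(x, x') = F(x') + F'(x')(x - x') + \frac{L}{2}(x - x')^2$ with $L \ge \sup F''$ majorizes $F$, is tight at $x'$, and minimizing it can never increase $F$.

```python
# Majorize-minimize toy for Lemma 3 (our illustration).
F  = lambda x: x**4 - 3 * x**2 + x
dF = lambda x: 4 * x**3 - 6 * x + 1
L  = 50.0            # upper bound on F''(x) = 12x^2 - 6 for |x| <= 2
x = 1.5
for _ in range(100):
    x_new = x - dF(x) / L             # closed-form argmin_x G(x, x_t)
    assert F(x_new) <= F(x) + 1e-12   # descent, exactly as in (19)
    x = x_new
```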

In the following, we show that the objective function (6) is nonincreasing under the updating rules (11) and (12) by defining appropriate auxiliary functions with respect to $V_{ij}$ and $W_{ij}$.

Lemma 4. The function

$$G(V, \tilde{V}) = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\xi_{ijk}(\tilde{V}) \left[ \alpha + (1-\alpha)\,\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})} - \left( \frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})} \right)^{1-\alpha} \right], \qquad (20)$$

where $\xi_{ijk}(\tilde{V}) = W_{ik}[\tilde{V}^{T}U^{T}]_{kj} / \sum_{l} W_{il}[\tilde{V}^{T}U^{T}]_{lj}$, is an auxiliary function for

$$F(V) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left( \alpha X_{ij} + (1-\alpha)[WV^{T}U^{T}]_{ij} - X_{ij}^{\alpha}[WV^{T}U^{T}]_{ij}^{1-\alpha} \right). \qquad (21)$$

Proof. Obviously, $G(V, V) = F(V)$. According to the definition of an auxiliary function, we only need to prove $G(V, \tilde{V}) \ge F(V)$. To do this, we use the convex function $f(\cdot)$, for positive $\alpha$, to rewrite the $\alpha$-divergence function $F(V)$ as

$$F(V) = \sum_{ij} X_{ij}\, f\!\left( \frac{\sum_{k} W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}} \right), \qquad (22)$$

where

$$f(x) = \frac{1}{\alpha(1-\alpha)} \left[ \alpha + (1-\alpha)x - x^{1-\alpha} \right]. \qquad (23)$$

Note that $\sum_{k} \xi_{ijk}(\tilde{V}) = 1$ and $\xi_{ijk}(\tilde{V}) \ge 0$ by the definition of $\xi_{ijk}(\tilde{V})$. Applying Jensen's inequality [30] leads to

$$f\!\left( \sum_{k} W_{ik}[V^{T}U^{T}]_{kj} \right) \le \sum_{k} \xi_{ijk}(\tilde{V})\, f\!\left( \frac{W_{ik}[V^{T}U^{T}]_{kj}}{\xi_{ijk}(\tilde{V})} \right). \qquad (24)$$

From the above inequality it follows that

$$F(V) = \sum_{ij} X_{ij}\, f\!\left( \frac{\sum_{k} W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}} \right) \le \sum_{ijk} X_{ij}\,\xi_{ijk}(\tilde{V})\, f\!\left( \frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})} \right) = G(V, \tilde{V}), \qquad (25)$$

which satisfies the conditions of an auxiliary function.

Reversing the roles of $V_{ij}$ and $W_{ij}$ in Lemma 4, we define an auxiliary function for the update (12).

Lemma 5. The function

$$G(W, \tilde{W}) = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\eta_{ijk}(\tilde{W}) \left[ \alpha + (1-\alpha)\,\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\eta_{ijk}(\tilde{W})} - \left( \frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\eta_{ijk}(\tilde{W})} \right)^{1-\alpha} \right], \qquad (26)$$

where $\eta_{ijk}(\tilde{W}) = \tilde{W}_{ik}[V^{T}U^{T}]_{kj} / \sum_{l} \tilde{W}_{il}[V^{T}U^{T}]_{lj}$, is an auxiliary function for

$$F(W) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left( \alpha X_{ij} + (1-\alpha)[WV^{T}U^{T}]_{ij} - X_{ij}^{\alpha}[WV^{T}U^{T}]_{ij}^{1-\alpha} \right). \qquad (27)$$

This can be proved in the same way as Lemma 4. From Lemmas 4 and 5, we can now demonstrate the convergence stated in Theorem 1.


Proof. To guarantee the stability of $F(V)$, by Lemma 3 we just need to obtain the minimum of $G(V, \tilde{V})$ with respect to $V_{ij}$. Setting the gradient of (20) to zero gives

$$\frac{\partial G(V, \tilde{V})}{\partial V_{ij}} = \frac{1}{\alpha}\sum_{k}[U^{T}W]_{ki}\left[1 - \left(\frac{W_{ki}[V^{T}U^{T}]_{ij}}{X_{kj}\,\xi_{kji}(\tilde{V})}\right)^{-\alpha}\right] = 0. \qquad (28)$$

Then it follows that

$$V_{ij} = \tilde{V}_{ij}\left[\frac{\sum_{k}[U^{T}W]_{ki}\left(X_{kj}/\sum_{l}W_{kl}[\tilde{V}^{T}U^{T}]_{lj}\right)^{\alpha}}{\sum_{k}[U^{T}W]_{ki}}\right]^{1/\alpha}, \qquad (29)$$

which is similar in form to the updating rule (11). Similarly, to guarantee that the updating rule (12) holds, the minimum of $G(W, \tilde{W})$, determined by setting the gradient of (26) to zero, must exist.

Since $G$ is an auxiliary function, according to Lemma 4, $F$ in (21) is nonincreasing under the update (11). Applying the multiplicative updates (11) and (12) alternately, we can find a local minimum of $D_\alpha[X \,\|\, W(UV)^{T}]$.

5. Experiments

In this section, the CNMF$_\alpha$ algorithm is systematically compared with current constrained NMF algorithms on three image data sets: the ORL Database [31], the Yale Database [32], and the Caltech 101 Database [33]. The details of these three databases are described individually later. First, we introduce the evaluated algorithms:

(i) the constrained nonnegative matrix factorization algorithm in [15], which minimizes the F-norm cost;
(ii) the constrained nonnegative matrix factorization algorithm with parameter $\alpha = 0.5$ in this paper, which minimizes the Hellinger divergence cost;
(iii) the constrained nonnegative matrix factorization algorithm with parameter $\alpha = 1$ in [15], which minimizes the KL-divergence cost and is the best reported algorithm in image representation;
(iv) the constrained nonnegative matrix factorization algorithm with parameter $\alpha = 2$ in this paper, which minimizes the $\chi^2$-divergence cost;
(v) the constrained nonnegative matrix factorization algorithm with parameter $\alpha = \alpha_k$ in this paper, which minimizes the $\alpha$-divergence cost; the parameters $\alpha_k$ are associated with the characteristics of the image database and are designed by the presented method. The CNMF$_{\mathrm{KL}}$ algorithm is a special case of our CNMF$_{\alpha_k}$ algorithm with $\alpha = \alpha_k = 1$.

We apply these algorithms to a classification problem and evaluate their performance on the three image data sets, each of which contains a number of different categories of images. For each data set, the evaluations are conducted with different numbers of clusters; the number of clusters $k$ varies from 2 to 10. We randomly choose $k$ categories from one image data set and mix the images of these $k$ categories as the collection $X$. Then, for the semisupervised algorithms, we randomly pick 10 percent of the images from each category in $X$ and record their category numbers as the available label information to obtain the label matrix $U$. For some special data sets, the label process is different; we describe the details later.

Suppose a data set has $N$ categories $C_1, C_2, \ldots, C_N$, and the cardinalities of the labeled images are $L_1, L_2, \ldots, L_N$, respectively. Since the label constraint matrix $U$ is composed of the indicator matrix $S$ and the identity matrix $I$, the indicator matrix $S$ plays a critical role in the classification performance for the different categories in $X$. To determine the effectiveness of $S$, we define a measure of the relationship between the cardinalities of the labeled samples and the total samples:

$$r(k) = \frac{L_{\max}(k) - L_{\min}(k)}{\sum_{i=1}^{k} C_i / k}, \qquad (30)$$

where $L_{\max}(k)$ and $L_{\min}(k)$ denote the maximum and minimum labeled cardinalities among the $k$ chosen categories (here $C_i$ also denotes the cardinality of category $C_i$). For a fixed cluster number $k$, $r(k)$ differs whenever the number of samples in each category differs.
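Under the reading of (30) reconstructed above (the spread of labeled counts normalized by the mean category size), a one-line helper suffices; the name `label_spread` is ours.

```python
def label_spread(L, C):
    """r(k) of (30): (L_max - L_min) over the average category size.

    L : labeled cardinalities of the k chosen categories.
    C : total cardinalities of the same k categories.
    """
    k = len(C)
    return (max(L) - min(L)) / (sum(C) / k)

# Example: three categories with 1, 2, 3 labeled images out of 10 each.
r3 = label_spread([1, 2, 3], [10, 10, 10])   # (3 - 1) / 10 = 0.2
```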

Then we compute $\alpha$ as follows:

$$\alpha_k = \frac{\alpha_N}{r(N)} \left( |r(k) - r(N)| \times \left( \left| \frac{L_{\max}(k)\,k}{\sum_{i=1}^{k} C_i} - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i} \right| + \left| \frac{L_{\min}(k)\,k}{\sum_{i=1}^{k} C_i} - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i} \right| \right)^{-1} \right), \qquad (31)$$

where

$$\alpha_N = \left( L_{\max}(N) - L_{\min}(N) \right) \times \left( \left| L_{\max}(N) - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i} \right| + \left| L_{\min}(N) - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i} \right| \right)^{-1}. \qquad (32)$$
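Equations (31) and (32) are reconstructed here from a degraded source, so the sketch below implements one consistent reading of them and should be treated as illustrative only; note also that the experiments below override the formula with a fixed value when $r(k) = r(N)$ (e.g., $\alpha_k = 0.55$ on the Yale Database).

```python
def alpha_k(L_k, C_k, L_N, C_N):
    """alpha_k of (31)-(32) under our reconstructed reading (illustrative).

    L_k, C_k : labeled / total cardinalities of the k chosen categories.
    L_N, C_N : the same quantities over all N categories of the data set.
    """
    k, N = len(C_k), len(C_N)
    frac = sum(L_N) / sum(C_N)                      # overall labeled fraction
    a_N = (max(L_N) - min(L_N)) / (
        abs(max(L_N) - frac) + abs(min(L_N) - frac))                    # (32)
    r_k = (max(L_k) - min(L_k)) / (sum(C_k) / k)                        # (30)
    r_N = (max(L_N) - min(L_N)) / (sum(C_N) / N)
    den = (abs(max(L_k) * k / sum(C_k) - frac)
           + abs(min(L_k) * k / sum(C_k) - frac))
    return (a_N / r_N) * abs(r_k - r_N) / den                           # (31)
```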

The value of $\alpha_k$ computed by (31) is associated with the characteristics of the image data set, since its variation is caused by both the cardinalities of the labeled samples in each category and the total samples. We can obtain both the cardinalities of the labeled samples and the total samples in our semisupervised algorithms. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of $\alpha$ varies depending on the data set; how to select the optimal $\alpha$ is still an open problem [16].

To evaluate the classification performance, we define classification accuracy as the first measure. Our CNMF$_\alpha$ algorithm described in (11) and (12) provides a classification label for each sample, marked $s_{il_i}$. Suppose $X = \{x_i\}_{i=1}^{n}$ is a data set consisting of $n$ training images, and for each sample let $l_i$ be the true class label provided by the image data set. More specifically, if the image $x_i$ is designated the $l_i$th class, we evaluate it as a correct label and set $s_{il_i} = 1$; otherwise it is counted as a false label and $s_{il_i} = 0$. Eventually, we compute the percentage of correct labels by defining the accuracy measure

$$\mathrm{Ac}(s_{il_i}) = \frac{\sum_{i=1}^{n} s_{il_i}}{n}. \qquad (33)$$

As the second measure of classification performance, we compute the normalized mutual information, which measures how similar two sets of clusters are. Given two sets of clusters $C_i$ and $C_j$, their normalized mutual information is defined as

$$\mathrm{NMI}(C_i, C_j) = \frac{\sum_{c_i \in C_i,\, c_j \in C_j} p(c_i, c_j) \cdot \log \left( p(c_i, c_j) / [p(c_i)\,p(c_j)] \right)}{\max\left(H(C_i), H(C_j)\right)}, \qquad (34)$$

which takes values between 0 and 1. Here $p(c_i)$ and $p(c_j)$ denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters $c_i$ and $c_j$, respectively, and $p(c_i, c_j)$ denotes the joint probability that this arbitrarily selected image belongs to both $c_i$ and $c_j$ at the same time; $H(C_i)$ and $H(C_j)$ are the entropies of $C_i$ and $C_j$, respectively.

Experimental results on each data set are presented as classification accuracy and normalized mutual information in Tables 1–6.
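Both measures are straightforward to compute; the sketch below follows (33) and (34) directly (the logarithm base cancels in the NMI ratio, so natural logs are used), with function names of our choosing.

```python
import numpy as np
from collections import Counter

def accuracy(pred, true):
    """Ac of (33): the fraction of samples with a correct label."""
    return float(np.mean(np.asarray(pred) == np.asarray(true)))

def nmi(ci, cj):
    """NMI of (34), normalized by max(H(C_i), H(C_j))."""
    n = len(ci)
    pi, pj, pij = Counter(ci), Counter(cj), Counter(zip(ci, cj))
    mi = sum((c / n) * np.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
             for (a, b), c in pij.items())
    hi = -sum((c / n) * np.log(c / n) for c in pi.values())
    hj = -sum((c / n) * np.log(c / n) for c in pj.values())
    return mi / max(hi, hj)
```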

5.1. ORL Database. The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. The images of some people were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/nonsmiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright frontal position, allowing slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to $32 \times 32$ pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes in the facial areas are aligned at the same position.

There are 10 images for each category in ORL, and 10 percent is just one image. For the fixed parameters $\alpha$ ($\alpha = 0.5, 1, 2$), we randomly choose two images from each category to provide the label information. Note that labeling the same number of images in every category is meaningless for (30), since then $L_{\max}(k) = L_{\min}(k)$ and $r(k) = 0$. To obtain a varying $r(k)$, we divide the 40 categories into 3 groups of 10, 20, and 10 categories: we pick 1 image from each category in the first 10 categories to provide the label information, 2 images from each category in the middle 20 categories, and 3 from each category in the remaining categories. The dividing process is repeated 10 times, and the obtained average classification accuracy is recorded as the final result.

Figure 1 shows the classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection $X$ come from the same group, we set $\alpha_k = 0.67$. Because of the same number of samples in each category, the variation of $\alpha$ is small even though we label different cardinalities of samples. Compared with the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF$_{\alpha_k}$ algorithm gives the best performance, since the selection of $\alpha_k$ is suited to the collection $X$. Table 1 summarizes the detailed classification accuracy and the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_k}$. It shows that our CNMF$_{\alpha_k}$ algorithm achieves a 1.92 percent improvement over the best reported CNMF$_{\mathrm{KL}}$ algorithm ($\alpha = 1$) [15] in average classification accuracy. For normalized mutual information, the details and the error bars of our constrained algorithms with $\alpha = 0.5$, $\alpha = 2$, and $\alpha_k$ are listed in Table 2; compared with the best algorithm, CNMF, our CNMF$_{\alpha_k}$ algorithm achieves a 0.54 percent improvement.

5.2. Yale Database. The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person. Each image is taken with a different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database; each image is linearly stretched to a 1024-dimensional vector in image space.

The Yale Database also has the same number of samples in each category. To obtain an appropriate $\alpha$, we perform label processing similar to that for the ORL Database: divide the 15 individuals evenly into 3 groups, choose 1 image from each category in the first group, 2 from each category in the second, and 3 from each category in the remaining group. We repeat the process 10 times and record the average classification accuracy as the final result.

Figure 2 shows the classification accuracy and normalized mutual information on the Yale Database. We set $\alpha_k = 0.55$ when $r(k) - r(N) = 0$, which indicates that the samples in the collection $X$ come from just two groups, that is, 15 images chosen from 10 categories in the first and second groups. CNMF$_{\alpha_k}$ achieves an extraordinary performance in all cases, and CNMF$_{\alpha=0.5}$ follows. This suggests that the constrained nonnegative matrix factorization algorithm has higher classification accuracy when the value of $\alpha$ is close to $\alpha_k$. Compared with the best reported CNMF$_{\mathrm{KL}}$ algorithm, CNMF$_{\alpha_k}$ achieves a 2.42 percent improvement in average classification accuracy. For normalized mutual information, CNMF$_{\alpha_k}$ achieves a 7.2 percent improvement over CNMF. The details of classification accuracy and normalized mutual information are provided in Tables 3 and 4, which contain the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_k}$.

5.3. Caltech 101 Database. The Caltech 101 Database, created by Caltech University, has images of 101 different object categories.

Table 1: The comparison of classification accuracy rates on the ORL Database (classification accuracy Ac(s_{il_i}) in %).

k      CNMF     CNMF_{α=1}   CNMF_{α=0.5}     CNMF_{α=2}       CNMF_{α_k}
2      93.90    92.95        93.51 ± 0.09     92.14 ± 0.18     93.89 ± 0.91
3      86.20    84.33        85.71 ± 0.83     82.26 ± 1.23     86.21 ± 2.07
4      84.55    84.05        84.74 ± 0.56     83.02 ± 0.47     84.78 ± 0.36
5      80.16    78.38        80.10 ± 0.22     80.15 ± 0.18     80.16 ± 0.29
6      76.72    76.73        77.13 ± 0.28     77.60 ± 0.19     78.09 ± 0.40
7      81.14    79.04        80.34 ± 0.81     78.07 ± 0.88     82.15 ± 1.95
8      78.08    77.02        78.66 ± 0.69     78.77 ± 0.94     79.45 ± 1.33
9      83.19    82.23        83.57 ± 0.36     81.22 ± 0.21     85.19 ± 1.21
10     77.03    76.88        77.69 ± 0.12     77.75 ± 0.19     78.96 ± 0.73
Avg    82.33    81.29        82.38            81.22            83.21

Table 2: The comparison of normalized mutual information on the ORL Database (normalized mutual information in %).

k      CNMF     CNMF_{α=1}   CNMF_{α=0.5}     CNMF_{α=2}       CNMF_{α_k}
2      78.24    75.82        76.01 ± 0.14     73.94 ± 0.11     77.67 ± 0.58
3      76.61    75.36        75.38 ± 0.41     74.50 ± 0.53     75.92 ± 1.55
4      78.69    77.82        77.96 ± 0.47     77.04 ± 0.54     79.27 ± 1.26
5      76.69    74.27        75.32 ± 0.80     72.62 ± 0.87     75.35 ± 1.67
6      76.52    76.09        76.21 ± 0.12     76.09 ± 0.67     77.60 ± 1.82
7      82.45    81.35        82.36 ± 0.42     81.35 ± 0.26     83.08 ± 1.59
8      80.57    79.92        80.11 ± 0.73     79.91 ± 0.68     82.49 ± 1.97
9      86.03    85.67        86.73 ± 0.70     85.67 ± 0.37     88.05 ± 1.15
10     81.77    81.07        82.07 ± 0.12     81.07 ± 0.12     82.97 ± 1.71
Avg    79.73    78.60        79.13            78.02            80.27

Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly $300 \times 300$ pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using codewords generated from SIFT features [34]: we obtain 555,292 SIFT descriptors and generate 500 codewords, and by assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.

We randomly select $k$ categories from the Faces-Easy category in the Caltech 101 database and convert the images to gray scale at $32 \times 32$. The label process is repeated 10 times, and the obtained values of $\alpha_k$ computed by (31) are listed in Table 7. The variation of $\alpha$ within the same $k$ categories is great. That is, selecting an appropriate $\alpha$ plays a critical role for the mixture of different categories in one image data set, especially when the number of samples in each category differs. The choice of $\alpha_k$ fully reflects the effectiveness of the indicator matrix $S$.

Figure 3 shows the effect of using the proposed algorithm with $\alpha_k$. The upper part of the figure shows the original samples, which contain 26 images; the middle part shows their gray images; and the lower part shows the combination of the basis vectors learned by CNMF$_{\alpha_k}$.

The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_k}$. The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters $\alpha_k$ listed in Table 7 are selected. In general, our method demonstrates much better effectiveness in classification by choosing $\alpha_k$. Compared with the best reported algorithm other than our CNMF$_\alpha$ algorithm, CNMF$_{\alpha_k}$ achieves a 2.4 percent improvement in average classification accuracy, and compared with the CNMF algorithm [15], CNMF$_{\alpha_k}$ achieves an 8.77 percent improvement in average classification accuracy. For normalized mutual information, CNMF$_{\alpha_k}$ achieves a 2.29 percent improvement and consistently outperforms the other algorithms.


Table 3: The comparison of classification accuracy rates on the Yale Database (Ac(s_{il_i}) in %).

k      CNMF     CNMF_{α=1}   CNMF_{α=0.5}     CNMF_{α=2}       CNMF_{α_k}
2      81.64    85.23        85.93 ± 0.65     84.18 ± 0.81     86.27 ± 2.46
3      68.48    76.91        77.14 ± 0.86     77.26 ± 0.42     79.01 ± 1.09
4      64.25    66.70        68.11 ± 0.67     66.12 ± 0.16     69.33 ± 2.23
5      65.82    67.71        69.14 ± 1.21     66.54 ± 1.01     70.57 ± 2.22
6      60.12    64.26        65.62 ± 0.94     63.70 ± 0.94     66.92 ± 1.41
7      59.25    62.47        63.97 ± 0.72     59.83 ± 0.38     65.23 ± 1.77
8      54.40    59.26        60.12 ± 0.31     56.76 ± 0.07     60.89 ± 1.28
9      54.85    57.30        58.08 ± 0.12     54.88 ± 0.14     58.98 ± 0.62
10     52.68    56.20        57.56 ± 0.03     53.83 ± 0.01     57.56 ± 0.73
Avg    62.39    65.89        67.30            64.79            68.31

Table 4: The comparison of normalized mutual information on the Yale Database (normalized mutual information in %).

k      CNMF     CNMF_{α=1}   CNMF_{α=0.5}     CNMF_{α=2}       CNMF_{α_k}
2      48.99    59.66        59.73 ± 0.39     58.17 ± 0.17     61.11 ± 1.82
3      42.90    53.52        53.58 ± 0.73     52.75 ± 0.93     54.49 ± 1.74
4      47.80    53.60        53.66 ± 0.18     52.99 ± 0.65     54.58 ± 1.64
5      53.99    57.49        57.56 ± 0.07     56.21 ± 0.06     58.58 ± 2.32
6      50.63    55.88        56.15 ± 0.07     55.40 ± 0.08     57.62 ± 1.09
7      53.03    57.60        57.88 ± 0.22     57.27 ± 0.16     59.05 ± 1.89
8      50.83    56.52        56.93 ± 0.13     56.20 ± 0.13     58.98 ± 1.46
9      53.08    55.30        55.98 ± 0.73     54.81 ± 0.81     56.15 ± 1.20
10     52.84    56.73        57.43 ± 0.16     56.41 ± 0.41     58.30 ± 1.11
Avg    50.45    56.26        56.54            55.58            57.65

[Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the measure against the number of classes (k = 2, ..., 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]


[Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information, plotted against the number of classes for the same five algorithms.]

[Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) is the original images, (b) is the gray images, and (c) is the basis vectors learned by CNMF_{α_k}. Positive values are illustrated with white pixels.]



[Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information, plotted against the number of classes for the same five algorithms.]

Table 5: The comparison of classification accuracy rates on the Caltech 101 Database (Ac(s_{il_i}) in %).

k      CNMF     CNMF_{α=1}   CNMF_{α=0.5}     CNMF_{α=2}       CNMF_{α_k}
2      66.39    78.80        78.31 ± 1.69     80.10 ± 2.07     82.76 ± 3.56
3      60.91    69.98        68.77 ± 1.57     72.94 ± 1.51     73.59 ± 3.51
4      50.82    57.50        58.19 ± 1.98     56.28 ± 1.96     59.89 ± 2.82
5      50.83    53.67        53.71 ± 1.02     54.12 ± 1.44     55.06 ± 2.63
6      50.49    55.05        55.63 ± 0.89     58.04 ± 1.47     58.97 ± 1.97
7      46.09    51.12        50.04 ± 0.11     51.73 ± 0.25     52.36 ± 1.81
8      44.21    50.50        51.72 ± 0.51     48.37 ± 1.59     52.64 ± 1.52
9      41.34    46.58        47.85 ± 0.28     45.87 ± 0.30     49.54 ± 1.23
10     40.80    46.02        46.02 ± 0.05     45.76 ± 0.08     46.02 ± 0.26
Avg    50.21    56.58        56.69            57.02            58.98

Table 6: The comparison of normalized mutual information on the Caltech 101 Database (normalized mutual information in %).

k      CNMF     CNMF_{α=1}   CNMF_{α=0.5}     CNMF_{α=2}       CNMF_{α_k}
2      21.05    38.17        38.20 ± 0.18     37.48 ± 0.21     40.08 ± 2.19
3      31.62    40.63        40.68 ± 0.15     40.05 ± 0.14     41.88 ± 1.24
4      27.60    34.36        34.41 ± 0.79     34.12 ± 0.91     36.41 ± 1.17
5      34.46    35.82        36.00 ± 0.99     35.42 ± 1.01     37.20 ± 1.78
6      34.85    38.97        39.16 ± 0.86     38.53 ± 1.11     39.64 ± 1.83
7      34.88    37.67        37.86 ± 0.47     37.47 ± 0.49     39.86 ± 2.58
8      32.63    37.43        37.71 ± 0.39     37.22 ± 0.42     37.91 ± 2.18
9      31.20    35.97        36.21 ± 0.60     35.86 ± 0.63     37.73 ± 0.67
10     31.21    35.71        36.06 ± 0.02     35.68 ± 0.03     37.48 ± 0.22
Avg    31.05    37.19        37.37            36.87            38.69


Table 7: The value of α in k categories.

k      α_1    α_2    α_3    α_4    α_5    α_6    α_7    α_8    α_9    α_10
2      2.07   2.08   2.22   2.27   2.57   3.52   3.97   3.98   4.37   8.96
3      0.45   0.55   0.66   1.07   1.95   2.31   2.31   3.54   4.24   7.54
4      0.30   0.32   0.35   0.86   0.98   1.27   1.65   2.10   3.27   5.80
5      0.45   0.49   0.51   0.71   0.80   1.13   2.05   2.55   2.82   3.89
6      0.07   0.56   0.73   1.08   1.26   1.46   2.01   2.12   3.91   8.52
7      0.71   0.94   1.12   1.14   1.16   1.26   1.27   1.37   1.57   8.97
8      0.12   0.30   0.50   0.64   0.66   0.69   0.78   0.79   0.97   1.11
9      0.78   0.79   2.01   0.10   0.17   0.28   0.29   0.30   0.31   0.39
10     1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00

6. Conclusion

In this paper, we present a generic constrained nonnegative matrix factorization algorithm by imposing label information as an additional hard constraint on the $\alpha$-divergence-NMF algorithm. The proposed algorithm can be derived using the Karush-Kuhn-Tucker conditions, and we presented an alternative derivation using the projected gradient; the use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter $\alpha$. Since the $\alpha$-divergence is a parametric discrepancy measure and the parameter $\alpha$ is associated with the characteristics of a learning machine, the model distribution is more inclusive when $\alpha$ goes to $+\infty$ and more exclusive when $\alpha$ approaches $-\infty$. The selection of the optimal value of $\alpha$ plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter $\alpha$ for our semisupervised CNMF$_\alpha$ algorithm; the variation of $\alpha$ is caused by both the cardinalities of the labeled samples in each category and the total samples. In the experiments, we apply the fixed parameters $\alpha$ and $\alpha_k$ to analyze the classification accuracy on three image databases, and the experimental results demonstrate that the CNMF$_{\alpha_k}$ algorithm has the best classification accuracy. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of $\alpha$ varies depending on the data set. It is still an open problem how to select the optimal $\alpha$ for a specific image data set.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038, 11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).

References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207–I-212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszar's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621–V-624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. K. Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[32] http://vision.ucsd.edu/content/yale-face-database
[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article A Constrained Algorithm Based NMF for ...downloads.hindawi.com/journals/ddns/2014/179129.pdfResearch Article A Constrained Algorithm Based NMF for Image Representation

Discrete Dynamics in Nature and Society 3

31 KKT Method Let 120582119894119895ge 0 and 120583

119894119895ge 0 be the Lagrangian

multipliers associated with constraints 119881119894119895

ge 0 and 119882119894119895

ge

0 respectively The Karush-Kuhn-Tucker conditions requirethat both the optimality conditions

120597119863120572[119883119882(119880119881)

119879

]

120597119881119894119895

=

1

120572

sum

119896

[119880119879

119882]119896119894

minussum

119896

[119880119879

119882]119896119894

(

119883119896119895

[119882119881119879119880119879]119896119895

)

120572

= 120582119894119895

(7)

120597119863120572[119883119882(119880119881)

119879

]

120597119882119894119895

=

1

120572

sum

119896

[119881119879

119880119879

]119895119896

minussum

119896

[119881119879

119880119879

]119895119896

(

119883119894119896

[119882119881119879119880119879]119894119896

)

120572

= 120583119894119895

(8)

and the complementary slackness conditions

120582119894119895119881119894119895= 0

120583119894119895119882119894119895= 0

(9)

are satisfied If 119881119894119895= 0 and 119882

119894119895= 0 then either 120582

119894119895120583119894119895can

have any values At the same time if 120582119894119895= 0 and 120583

119894119895= 0 then

119881119894119895ge 0 and119882

119894119895ge 0 Hence we need both 120582

119894119895= 0 120583119894119895

= 0 and119881119894119895

= 0 119882119894119895

= 0 It follows from (9) that

120582119894119895119881120572

119894119895= 0

120583119894119895119882120572

119894119895= 0

(10)

We multiply both sides of (7) and (8) by 119881119894119895

and 119882119894119895

respectively and incorporate with (10) and then we obtainthe following updating rules

119881119894119895larr997888 119881

119894119895

[

[

[

sum119896[119880119879

119882]119896119894

(119883119896119895[119882119881

119879

119880119879

]119896119895

)

120572

sum119897[119880119879119882]119897119894

]

]

]

1120572

(11)

119882119894119895larr997888 119882

119894119895

[

[

sum119896[119881119879

119880119879

]119895119896

(119883119894119896[119882119881

119879

119880119879

]119894119896

)

120572

sum119897[119881119879119880119879]119895119897

]

]

1120572

(12)

32 Projected Gradient Method Considering the gradientdescent algorithm [24 25] the updating rules for the objec-tive function (6) can be also derived by using the projectedgradient method [27] and have the form

120601 (119881119894119895) larr997888 120601 (119881

119894119895) + 120575119894119895

120597119863120572[119883119882(119880119881)

119879

]

120597119881119894119895

120601 (119882119894119895) larr997888 120601 (119882

119894119895) + 120574119894119895

120597119863120572[119883119882(119880119881)

119879

]

120597119882119894119895

(13)

where 120601(sdot) is a suitably chosen function and 120575119894119895and 120574119894119895are two

parameters to control the step size of gradient descent Thenwe have

119881119894119895larr997888 120601

minus1

(120601 (119881119894119895) + 120575119894119895

120597119863120572[119883119882(119880119881)

119879

]

120597119881119894119895

)

119882119894119895larr997888 120601

minus1

(120601 (119882119894119895) + 120574119894119895

120597119863120572[119883119882(119880119881)

119879

]

120597119882119894119895

)

(14)

Setting 120601(Ω) = Ω120572 to guarantee that the updating rules (11)

and (12) hold we need

120575119894119895= minus

120572119881120572

119894119895

sum119896[119880119879119882]119896119894

120574119894119895= minus

120572119882120572

119894119895

sum119896[119881119879119880119879]119895119896

(15)

From (7) and (8) the updating rules become

119881120572

119894119895larr997888 119881

120572

119894119895+ 120575119894119895

120597119863120572[119883119882(119880119881)

119879

]

120597119881119894119895

= 119881120572

119894119895

sum119896[119880119879

119882]119896119894

(119883119896119895[119882119881

119879

119880119879

]119896119895

)

120572

sum119897[119880119879119882]119897119894

119882120572

119894119895larr997888 119882

120572

119894119895+ 120574119894119895

120597119863120572[119883119882(119880119881)

119879

]

120597119882119894119895

= 119882120572

119894119895

sum119896[119881119879

119880119879

]119895119896

(119883119894119896[119882119881

119879

119880119879

]119894119896

)

120572

sum119897[119881119879119880119879]119895119897

(16)

which are the same as the updating rules (11) and (12)We have shown that the algorithm can be derived usingKarush-Kuhn-Tucker conditions and presented alternative ofthe algorithm using the projected gradient The use of thetwo methods guarantees the correctness of the algorithmtheoretically

In the following we will give a theorem to guarantee theconvergence of the iterations in updates (11) and (12)

Theorem 1 For the objective function (6) 119863120572[119883119882(119880119881)

119879

]

is nonincreasing under the updating rules (11) and (12) Theobjective function 119863

120572is invariant under these updates if and

only if 119881 and119882 are at a stationary point

Multiplicative updates for our constrained algorithmbased NMF

120572are given in (11) and (12) These updates find

a local minimum of 119863120572[119883119882(119880119881)

119879

] which is the finalsolution of (11) and (12) Note that when 120572 = 1 the updates(11) and (12) are the samewithCNMFKL algorithm [15] whichis a special case included in our generic constraint matrixdecomposition algorithm In the following we will give theproof of Theorem 1

4 Convergence Analysis

To proveTheorem 1 wewillmake use of an auxiliary functionthatwas used in the expectation-maximization algorithm [2829]

4 Discrete Dynamics in Nature and Society

Definition 2 A function 119866(119909 119909) is defined as an auxiliaryfunction for 119865(119909) if the following two conditions are bothsatisfied

119866 (119909 119909) = 119865 (119909) 119866 (119909 119909) ge 119865 (119909) (17)

Lemma 3. Assume that the function G(x, x′) is an auxiliary function for F(x); then F(x) is nonincreasing under the update
\[
x^{t+1} = \arg\min_{x} G(x, x^{t}).
\tag{18}
\]

Proof. Consider F(x^{t+1}) ≤ G(x^{t+1}, x^t) ≤ G(x^t, x^t) = F(x^t).

It can be observed that the equality F(x^{t+1}) = F(x^t) holds only if x^t is a local minimum of G(x, x^t). Iterating the update (18) yields a sequence of estimates that converges to a local minimum x_min = argmin_x F(x) of the objective function:
\[
F(x_{\min}) \le \cdots \le F(x^{t+1}) \le F(x^{t}) \le \cdots \le F(x^{2}) \le F(x^{1}) \le F(x^{0}).
\tag{19}
\]

In the following, we show that the objective function (6) is nonincreasing under the updating rules (11) and (12) by defining appropriate auxiliary functions with respect to V_ij and W_ij.

Lemma 4. The function
\[
G(V, V') = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\xi_{ijk}(V')
\left[\alpha + (1-\alpha)\,\frac{W_{ik}[V^T U^T]_{kj}}{X_{ij}\,\xi_{ijk}(V')}
- \left(\frac{W_{ik}[V^T U^T]_{kj}}{X_{ij}\,\xi_{ijk}(V')}\right)^{1-\alpha}\right],
\tag{20}
\]
where ξ_ijk(V′) = W_ik[V′^T U^T]_kj / Σ_l W_il[V′^T U^T]_lj, is an auxiliary function for
\[
F(V) = \frac{1}{\alpha(1-\alpha)} \sum_{ij}\left[\alpha X_{ij} + (1-\alpha)\,[W V^T U^T]_{ij}
- X_{ij}^{\alpha}\,[W V^T U^T]_{ij}^{1-\alpha}\right].
\tag{21}
\]

Proof. Obviously, G(V, V) = F(V). According to the definition of an auxiliary function, we only need to prove G(V, V′) ≥ F(V). To do this, we use a convex function f(·), valid for positive α, to rewrite the α-divergence objective F(V) as

\[
F(V) = \sum_{ij} X_{ij}\, f\!\left(\frac{\sum_k W_{ik}[V^T U^T]_{kj}}{X_{ij}}\right),
\tag{22}
\]

where
\[
f(x) = \frac{1}{\alpha(1-\alpha)}\left[\alpha + (1-\alpha)\,x - x^{1-\alpha}\right].
\tag{23}
\]
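As a quick sanity check (our own computation, not in the original), f is indeed convex on (0, ∞) and vanishes at x = 1, which is exactly what the Jensen step below relies on:
\[
f'(x) = \frac{1 - x^{-\alpha}}{\alpha}, \qquad
f''(x) = x^{-\alpha-1} > 0, \qquad
f(1) = \frac{\alpha + (1-\alpha) - 1}{\alpha(1-\alpha)} = 0.
\]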

Note that Σ_k ξ_ijk(V′) = 1 and ξ_ijk(V′) ≥ 0 by the definition of ξ_ijk(V′). Applying Jensen's inequality [30] leads to

\[
f\!\left(\sum_k W_{ik}[V^T U^T]_{kj}\right) \le \sum_k \xi_{ijk}(V')\,
f\!\left(\frac{W_{ik}[V^T U^T]_{kj}}{\xi_{ijk}(V')}\right).
\tag{24}
\]

From the above inequality it follows that

\[
F(V) = \sum_{ij} X_{ij}\, f\!\left(\frac{\sum_k W_{ik}[V^T U^T]_{kj}}{X_{ij}}\right)
\le \sum_{ijk} X_{ij}\,\xi_{ijk}(V')\,
f\!\left(\frac{W_{ik}[V^T U^T]_{kj}}{X_{ij}\,\xi_{ijk}(V')}\right)
= G(V, V'),
\tag{25}
\]

which satisfies the conditions of an auxiliary function.

Reversing the roles of V_ij and W_ij in Lemma 4, we can define an auxiliary function for the update (12).

Lemma 5. The function
\[
G(W, W') = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\eta_{ijk}(W')
\left[\alpha + (1-\alpha)\,\frac{W_{ik}[V^T U^T]_{kj}}{X_{ij}\,\eta_{ijk}(W')}
- \left(\frac{W_{ik}[V^T U^T]_{kj}}{X_{ij}\,\eta_{ijk}(W')}\right)^{1-\alpha}\right],
\tag{26}
\]
where η_ijk(W′) = W′_ik[V^T U^T]_kj / Σ_l W′_il[V^T U^T]_lj, is an auxiliary function for
\[
F(W) = \frac{1}{\alpha(1-\alpha)} \sum_{ij}\left[\alpha X_{ij} + (1-\alpha)\,[W V^T U^T]_{ij}
- X_{ij}^{\alpha}\,[W V^T U^T]_{ij}^{1-\alpha}\right].
\tag{27}
\]

This can be proved in the same way as Lemma 4. From Lemmas 4 and 5, we can now prove the convergence stated in Theorem 1.


Proof. To guarantee that F(V) is nonincreasing, by Lemma 3 we just need to obtain the minimum of G(V, V′) with respect to V_ij. Setting the gradient of (20) to zero gives

\[
\frac{\partial G(V, V')}{\partial V_{ij}} = \frac{1}{\alpha} \sum_{k} [U^T W]_{ki}
\left[1 - \left(\frac{W_{ki}[V^T U^T]_{ij}}{X_{kj}\,\xi_{kji}(V')}\right)^{-\alpha}\right] = 0.
\tag{28}
\]

(28)

Then it follows that

\[
V_{ij} = V'_{ij} \left[\frac{\sum_k [U^T W]_{ki}\left(X_{kj} \big/ \sum_l W_{kl}[V'^T U^T]_{lj}\right)^{\alpha}}{\sum_k [U^T W]_{ki}}\right]^{1/\alpha},
\tag{29}
\]

which is similar in form to the updating rule (11). Similarly, to guarantee that the updating rule (12) holds, the minimum of G(W, W′), determined by setting the gradient of (26) to zero, must exist.

Since G is an auxiliary function, by Lemma 4 the objective F in (21) is nonincreasing under the update (11). Applying the multiplicative updates (11) and (12) alternately, we can find a local minimum of D_α[X‖W(UV)^T].

5. Experiments

In this section, the CNMF_α algorithm is systematically compared with current constrained NMF algorithms on three image data sets: the ORL Database [31], the Yale Database [32], and the Caltech 101 Database [33]. The details of the three databases are described individually below. We first introduce the evaluated algorithms.

(i) The constrained nonnegative matrix factorization algorithm (CNMF) in [15], which minimizes the F-norm cost.

(ii) The constrained nonnegative matrix factorization algorithm with parameter α = 0.5 in this paper, which minimizes the Hellinger-divergence cost.

(iii) The constrained nonnegative matrix factorization algorithm with parameter α = 1 in [15] (CNMF_KL), which minimizes the KL-divergence cost and is the best reported algorithm in image representation.

(iv) The constrained nonnegative matrix factorization algorithm with parameter α = 2 in this paper, which minimizes the χ²-divergence cost.

(v) The constrained nonnegative matrix factorization algorithm with parameter α = α_k in this paper, which minimizes the α-divergence cost, where the parameters α_k are associated with the characteristics of the image database and are designed by the presented method. The CNMF_KL algorithm is a special case of our CNMF_{α_k} algorithm with α = α_k = 1.

We apply these algorithms to a classification problem and evaluate their performance on three image data sets, each containing a number of different categories of images. For each data set, the evaluations are conducted with different numbers of clusters; here the number of clusters k varies from 2 to 10. We randomly choose k categories from one image data set and mix the images of these k categories as the collection X. Then, for the semisupervised algorithms, we randomly pick 10 percent of the images from each category in X and record their category numbers as the available label information to obtain the label matrix U (a construction sketch follows below). For some special data sets the label process is different, and we describe the details later.
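As a hedged sketch of how U can be assembled, following the block structure U = [S 0; 0 I] of [15] (S an indicator matrix over the l labeled samples, I an identity block over the n − l unlabeled ones; this structure is recalled in the next paragraph). The function name and argument conventions are ours:

```python
import numpy as np

def label_constraint_matrix(labels, n_classes):
    """Build U from indicator S and identity I (cf. [15]); labels[i] is the
    class index of sample i, or None if sample i is unlabeled."""
    labeled = [i for i, y in enumerate(labels) if y is not None]
    unlabeled = [i for i, y in enumerate(labels) if y is None]
    n, l = len(labels), len(labeled)
    U = np.zeros((n, n_classes + n - l))
    for i in labeled:
        U[i, labels[i]] = 1.0              # indicator block S
    for col, i in enumerate(unlabeled):
        U[i, n_classes + col] = 1.0        # identity block I
    return U
```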

Suppose a data set has N categories C_1, C_2, …, C_N, and the cardinalities of the labeled images are L_1, L_2, …, L_N, respectively. Since the label constraint matrix U is composed of the indicator matrix S and the identity matrix I, the indicator matrix S plays a critical role in the classification performance for different categories in X. To quantify the effectiveness of S, we define a measure of the relationship between the cardinalities of the labeled samples and the total samples:
\[
r(k) = \frac{L_{\max}(k) - L_{\min}(k)}{\left(\sum_{i=1}^{k} C_i\right) / k},
\tag{30}
\]

where L_max(k) and L_min(k) denote the maximum and minimum labeled cardinalities over the k categories. For a fixed cluster number k, r(k) differs when the number of samples per category differs. We then compute α as follows:

\[
\alpha_k = \frac{\alpha_N}{r(N)}\; \left|r(k) - r(N)\right|
\left(\left|\frac{L_{\max}(k)\,k}{\sum_{i=1}^{k} C_i} - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i}\right|
+ \left|\frac{L_{\min}(k)\,k}{\sum_{i=1}^{k} C_i} - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i}\right|\right)^{-1},
\tag{31}
\]

where

\[
\alpha_N = \left(L_{\max}(N) - L_{\min}(N)\right)
\left(\left|L_{\max}(N) - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i}\right|
+ \left|L_{\min}(N) - \frac{\sum_{i=1}^{N} L_i}{\sum_{i=1}^{N} C_i}\right|\right)^{-1}.
\tag{32}
\]

The value of α_k computed by (31) is associated with the characteristics of the image data set, since its variation is driven by both the cardinalities of labeled samples in each category and the total samples. Both quantities are available to our semisupervised algorithms. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies from data set to data set; how to select the optimal α is still an open problem [16].

To evaluate the classification performance, we use classification accuracy as the first measure. Our CNMF_α algorithm, described in (11) and (12), provides a classification label for each sample, marked s_{i l_i}. Suppose X = {x_i}_{i=1}^n is a data set consisting of n training images. For each sample, let l_i be the true class label provided by the image data set. More specifically, if the image x_i is assigned to the l_i-th class, we count it as a correct label and set s_{i l_i} = 1; otherwise it is counted as a false label and we set s_{i l_i} = 0. Finally, we compute the percentage of correct labels, defining the accuracy measure as

\[
\mathrm{Ac}\left(s_{i l_i}\right) = \frac{\sum_{i=1}^{n} s_{i l_i}}{n}.
\tag{33}
\]

As the second measure of classification performance, we compute the normalized mutual information, which measures how similar two sets of clusters are. Given two sets of clusters C_i and C_j, their normalized mutual information is defined as
\[
\mathrm{NMI}(C_i, C_j) = \frac{\sum_{c_i \in C_i,\, c_j \in C_j} p(c_i, c_j)\,\log\!\left[p(c_i, c_j) \big/ \left(p(c_i)\,p(c_j)\right)\right]}{\max\left(H(C_i), H(C_j)\right)},
\tag{34}
\]

which takes values between 0 and 1. Here p(c_i) and p(c_j) denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters c_i and c_j, respectively, and p(c_i, c_j) denotes the joint probability that this arbitrarily selected image belongs to both c_i and c_j at the same time; H(C_i) and H(C_j) are the entropies of C_i and C_j, respectively. Experimental results on each data set are presented as classification accuracy and normalized mutual information in Tables 1 and 2 (both measures are sketched in code below).
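A minimal sketch of both measures (our code, hypothetical names), assuming two label lists of equal length:

```python
import numpy as np
from collections import Counter

def accuracy(pred, truth):
    """Ac from (33): fraction of samples whose predicted label matches l_i."""
    return float(np.mean(np.asarray(pred) == np.asarray(truth)))

def nmi(A, B):
    """NMI from (34) with max-entropy normalization; A, B are label lists."""
    n = len(A)
    pa, pb = Counter(A), Counter(B)
    pab = Counter(zip(A, B))
    mi = sum((c / n) * np.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in pab.items())
    Ha = -sum((c / n) * np.log(c / n) for c in pa.values())
    Hb = -sum((c / n) * np.log(c / n) for c in pb.values())
    return mi / max(Ha, Hb)
```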

5.1. ORL Database. The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. The images of some people were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright frontal position, allowing slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to 32 × 32 pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes of each face are aligned at the same position.

There are 10 images for each category in ORL, so 10 percent is just one image. For the fixed parameters α (α = 0.5, 1, 2), we randomly choose two images from each category to provide the label information. Note that equal label cardinalities make (30) degenerate (r(k) = 0). To obtain a nontrivial r(k), we divide the 40 categories into 3 groups of 10, 20, and 10 categories: in the first 10 categories we pick 1 image from each category to provide label information, in the next 20 categories we pick 2 images from each category, and in the remaining categories we pick 3. The dividing process is repeated 10 times, and the obtained average classification accuracy is recorded as the final result.

Figure 1 shows the classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection X all come from the same group, we set α_k = 0.67. Because each category has the same number of samples, the variation of α is small even though we label different cardinalities of samples. Compared with the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF_{α_k} algorithm gives the best performance, since the selected α_k suits the collection X. Table 1 summarizes the detailed classification accuracy and the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}. It shows that our CNMF_{α_k} algorithm achieves a 1.92 percent improvement in average classification accuracy over the best reported CNMF_KL algorithm (α = 1) [15]. For normalized mutual information, the details and error bars of our constrained algorithms with α = 0.5, α = 2, and α_k are listed in Table 2; compared with the best baseline, CNMF, our CNMF_{α_k} algorithm achieves a 0.54 percent improvement.

5.2. Yale Database. The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person, one per facial expression or configuration: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database. Each image is linearly stretched to a 1024-dimensional vector in image space.

The Yale Database also has the same number of samples in each category. To obtain an appropriate α, we perform label processing similar to that on the ORL Database: divide the 15 individuals evenly into 3 groups, choose 1 image from each category in the first group, 2 from each category in the second, and 3 from each category in the remaining group. We repeat the process 10 times and record the average classification accuracy as the final result.

Figure 2 shows the classification accuracy and normalized mutual information on the Yale Database. We set α_k = 0.55 when r(k) − r(N) = 0, which indicates that the samples in the collection X come from just two groups, that is, 15 images chosen from 10 categories in the first and second groups. CNMF_{α_k} achieves an extraordinary performance in all cases, with CNMF_{α=0.5} next. This suggests that the constrained nonnegative matrix factorization algorithm attains higher classification accuracy when the value of α is close to α_k. Compared with the best reported CNMF_KL algorithm, CNMF_{α_k} achieves a 2.42 percent improvement in average classification accuracy. For normalized mutual information, CNMF_{α_k} achieves a 7.2 percent improvement over CNMF. The details of classification accuracy and normalized mutual information are provided in Tables 3 and 4, which contain the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.

5.3. Caltech 101 Database. The Caltech 101 Database, created at Caltech, has images of 101 different object categories.


Table 1: Comparison of classification accuracy rates Ac(s_{i l_i}) (%) on the ORL Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     93.90   92.95        93.51 ± 0.09    92.14 ± 0.18    93.89 ± 0.91
3     86.20   84.33        85.71 ± 0.83    82.26 ± 1.23    86.21 ± 2.07
4     84.55   84.05        84.74 ± 0.56    83.02 ± 0.47    84.78 ± 0.36
5     80.16   78.38        80.10 ± 0.22    80.15 ± 0.18    80.16 ± 0.29
6     76.72   76.73        77.13 ± 0.28    77.60 ± 0.19    78.09 ± 0.40
7     81.14   79.04        80.34 ± 0.81    78.07 ± 0.88    82.15 ± 1.95
8     78.08   77.02        78.66 ± 0.69    78.77 ± 0.94    79.45 ± 1.33
9     83.19   82.23        83.57 ± 0.36    81.22 ± 0.21    85.19 ± 1.21
10    77.03   76.88        77.69 ± 0.12    77.75 ± 0.19    78.96 ± 0.73
Avg   82.33   81.29        82.38           81.22           83.21

Table 2: Comparison of normalized mutual information (%) on the ORL Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     78.24   75.82        76.01 ± 0.14    73.94 ± 0.11    77.67 ± 0.58
3     76.61   75.36        75.38 ± 0.41    74.50 ± 0.53    75.92 ± 1.55
4     78.69   77.82        77.96 ± 0.47    77.04 ± 0.54    79.27 ± 1.26
5     76.69   74.27        75.32 ± 0.80    72.62 ± 0.87    75.35 ± 1.67
6     76.52   76.09        76.21 ± 0.12    76.09 ± 0.67    77.60 ± 1.82
7     82.45   81.35        82.36 ± 0.42    81.35 ± 0.26    83.08 ± 1.59
8     80.57   79.92        80.11 ± 0.73    79.91 ± 0.68    82.49 ± 1.97
9     86.03   85.67        86.73 ± 0.70    85.67 ± 0.37    88.05 ± 1.15
10    81.77   81.07        82.07 ± 0.12    81.07 ± 0.12    82.97 ± 1.71
Avg   79.73   78.60        79.13           78.02           80.27

Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using codewords generated from SIFT features [34]. We then obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram (a sketch of this pipeline follows).
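A hedged sketch of this bag-of-visual-words preprocessing, assuming OpenCV's SIFT implementation and scikit-learn's KMeans (the paper does not specify its tooling; all names here are ours):

```python
import numpy as np
import cv2                              # assumes opencv-python with SIFT available
from sklearn.cluster import KMeans

def build_histograms(images, n_codewords=500):
    """SIFT descriptors -> k-means codewords -> per-image frequency histogram."""
    sift = cv2.SIFT_create()
    per_image = []
    for img in images:                  # img: 8-bit grayscale array
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None
                         else np.zeros((0, 128), np.float32))
    codebook = KMeans(n_clusters=n_codewords, n_init=4, random_state=0)
    codebook.fit(np.vstack(per_image))  # learn the visual vocabulary
    hists = []
    for desc in per_image:
        h = np.zeros(n_codewords)
        if len(desc):
            words = codebook.predict(desc)          # closest codeword per descriptor
            h = np.bincount(words, minlength=n_codewords).astype(float)
            h /= h.sum()                            # frequency histogram
        hists.append(h)
    return np.array(hists)              # (n_images, 500) image representation
```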

We randomly select k categories from the Faces-Easy category of the Caltech 101 database and convert the images to 32 × 32 gray-scale. The label process is repeated 10 times, and the resulting values of α_k computed by (31) are listed in Table 7.

The variation of α within the same k categories is great. That is, selecting an appropriate α plays a critical role for a mixture of different categories in one image data set, especially when the number of samples per category differs. The choice of α_k can fully reflect the effectiveness of the indicator matrix S.

indicator matrix 119878Figure 3 shows the effect that derived from the using of

proposed algorithm with 120572119896 The upper part of figure is the

original samples which contain 26 images the middle part

is their gray images and the lower is the combination of thebasis vectors learned by CNMF

120572119896

The classification accuracy results and normalized mutual information for the Faces-Easy category of the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}. The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters α_k listed in Table 7 are selected. In general, our method demonstrates much better classification effectiveness by choosing α_k. Compared with the best reported algorithm other than our CNMF_α family, CNMF_{α_k} achieves a 2.40 percent improvement in average classification accuracy, and compared with the CNMF algorithm [15], CNMF_{α_k} achieves an 8.77 percent improvement. For normalized mutual information, CNMF_{α_k} achieves a 2.29 percent improvement and consistently outperforms the other algorithms.

6. Conclusion

In this paper, we presented a generic constrained nonnegative matrix factorization algorithm by imposing label information as an additional hard constraint on the α-divergence-NMF algorithm. The proposed algorithm can be derived using the


Table 3: Comparison of classification accuracy rates Ac(s_{i l_i}) (%) on the Yale Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     81.64   85.23        85.93 ± 0.65    84.18 ± 0.81    86.27 ± 2.46
3     68.48   76.91        77.14 ± 0.86    77.26 ± 0.42    79.01 ± 1.09
4     64.25   66.70        68.11 ± 0.67    66.12 ± 0.16    69.33 ± 2.23
5     65.82   67.71        69.14 ± 1.21    66.54 ± 1.01    70.57 ± 2.22
6     60.12   64.26        65.62 ± 0.94    63.70 ± 0.94    66.92 ± 1.41
7     59.25   62.47        63.97 ± 0.72    59.83 ± 0.38    65.23 ± 1.77
8     54.40   59.26        60.12 ± 0.31    56.76 ± 0.07    60.89 ± 1.28
9     54.85   57.30        58.08 ± 0.12    54.88 ± 0.14    58.98 ± 0.62
10    52.68   56.20        57.56 ± 0.03    53.83 ± 0.01    57.56 ± 0.73
Avg   62.39   65.89        67.30           64.79           68.31

Table 4: Comparison of normalized mutual information (%) on the Yale Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     48.99   59.66        59.73 ± 0.39    58.17 ± 0.17    61.11 ± 1.82
3     42.90   53.52        53.58 ± 0.73    52.75 ± 0.93    54.49 ± 1.74
4     47.80   53.60        53.66 ± 0.18    52.99 ± 0.65    54.58 ± 1.64
5     53.99   57.49        57.56 ± 0.07    56.21 ± 0.06    58.58 ± 2.32
6     50.63   55.88        56.15 ± 0.07    55.40 ± 0.08    57.62 ± 1.09
7     53.03   57.60        57.88 ± 0.22    57.27 ± 0.16    59.05 ± 1.89
8     50.83   56.52        56.93 ± 0.13    56.20 ± 0.13    58.98 ± 1.46
9     53.08   55.30        55.98 ± 0.73    54.81 ± 0.81    56.15 ± 1.20
10    52.84   56.73        57.43 ± 0.16    56.41 ± 0.41    58.30 ± 1.11
Avg   50.45   56.26        56.54           55.58           57.65

Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. [Plots omitted: accuracy/NMI versus number of classes (2-10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]


Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information. [Plots omitted: accuracy/NMI versus number of classes (2-10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]

Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) shows the original images, (b) the gray images, and (c) the basis vectors learned by CNMF_{α_k}; positive values are illustrated with white pixels.

Karush-Kuhn-Tucker (KKT) conditions, and an alternative derivation was presented using the projected gradient method; the use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter α. Since the α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive as α goes to +∞ and more exclusive as α approaches −∞. The selection of the optimal value of α plays a critical role in determining discriminative basis vectors. We provided a method to select the parameter α for our semisupervised CNMF_α algorithm; the variation of α is caused by both the cardinalities of labeled samples in each category and the total samples.


Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information. [Plots omitted: accuracy/NMI versus number of classes (2-10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]

Table 5: Comparison of classification accuracy rates Ac(s_{i l_i}) (%) on the Caltech 101 Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     66.39   78.80        78.31 ± 1.69    80.10 ± 2.07    82.76 ± 3.56
3     60.91   69.98        68.77 ± 1.57    72.94 ± 1.51    73.59 ± 3.51
4     50.82   57.50        58.19 ± 1.98    56.28 ± 1.96    59.89 ± 2.82
5     50.83   53.67        53.71 ± 1.02    54.12 ± 1.44    55.06 ± 2.63
6     50.49   55.05        55.63 ± 0.89    58.04 ± 1.47    58.97 ± 1.97
7     46.09   51.12        50.04 ± 0.11    51.73 ± 0.25    52.36 ± 1.81
8     44.21   50.50        51.72 ± 0.51    48.37 ± 1.59    52.64 ± 1.52
9     41.34   46.58        47.85 ± 0.28    45.87 ± 0.30    49.54 ± 1.23
10    40.80   46.02        46.02 ± 0.05    45.76 ± 0.08    46.02 ± 0.26
Avg   50.21   56.58        56.69           57.02           58.98

Table 6: Comparison of normalized mutual information (%) on the Caltech 101 Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     21.05   38.17        38.20 ± 0.18    37.48 ± 0.21    40.08 ± 2.19
3     31.62   40.63        40.68 ± 0.15    40.05 ± 0.14    41.88 ± 1.24
4     27.60   34.36        34.41 ± 0.79    34.12 ± 0.91    36.41 ± 1.17
5     34.46   35.82        36.00 ± 0.99    35.42 ± 1.01    37.20 ± 1.78
6     34.85   38.97        39.16 ± 0.86    38.53 ± 1.11    39.64 ± 1.83
7     34.88   37.67        37.86 ± 0.47    37.47 ± 0.49    39.86 ± 2.58
8     32.63   37.43        37.71 ± 0.39    37.22 ± 0.42    37.91 ± 2.18
9     31.20   35.97        36.21 ± 0.60    35.86 ± 0.63    37.73 ± 0.67
10    31.21   35.71        36.06 ± 0.02    35.68 ± 0.03    37.48 ± 0.22
Avg   31.05   37.19        37.37           36.87           38.69


Table 7: The values of α_k obtained in k categories.

k    α_1    α_2    α_3    α_4    α_5    α_6    α_7    α_8    α_9    α_10
2    2.07   2.08   2.22   2.27   2.57   3.52   3.97   3.98   4.37   8.96
3    0.45   0.55   0.66   1.07   1.95   2.31   2.31   3.54   4.24   7.54
4    0.30   0.32   0.35   0.86   0.98   1.27   1.65   2.10   3.27   5.80
5    0.45   0.49   0.51   0.71   0.80   1.13   2.05   2.55   2.82   3.89
6    0.07   0.56   0.73   1.08   1.26   1.46   2.01   2.12   3.91   8.52
7    0.71   0.94   1.12   1.14   1.16   1.26   1.27   1.37   1.57   8.97
8    0.12   0.30   0.50   0.64   0.66   0.69   0.78   0.79   0.97   1.11
9    0.78   0.79   2.01   0.10   0.17   0.28   0.29   0.30   0.31   0.39
10   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00

In the experiments, we applied the fixed parameters α and α_k to analyze the classification accuracy on three image databases, and the results demonstrate that the CNMF_{α_k} algorithm has the best classification accuracy. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set; it is still an open problem how to select the optimal α for a specific image data set.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).

References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115-147, 1987.

[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.

[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111-126, 1994.

[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.

[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.

[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.

[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.

[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, pp. 4-8030, 2004.

[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207-I-212, Kauai, Hawaii, USA, December 2001.

[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495-511, 2005.

[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.

[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.

[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629-632, San Diego, Calif, USA, 2001.

[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.

[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299-1311, 2012.

[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433-1440, 2008.

[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001-1004, 2003.

[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706-1718, 2003.

[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and its Applications, vol. 416, no. 1, pp. 29-47, 2006.

[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177-180, New Paltz, NY, USA, 2003.

[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373-386, 2006.

[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.

[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828-1832, June 2008.

[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.

[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621-V-624, Toulouse, France, May 2006.

[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283-290, MIT Press, Cambridge, Mass, USA, 2006.

[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.

[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1-38, 1977.

[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81-89, ACL Press, 1997.

[30] F. Hansen and G. Kjærgård Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229-241, 1982.

[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

[32] http://vision.ucsd.edu/content/yale-face-database

[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101/

[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.


Page 4: Research Article A Constrained Algorithm Based NMF for ...downloads.hindawi.com/journals/ddns/2014/179129.pdfResearch Article A Constrained Algorithm Based NMF for Image Representation

4 Discrete Dynamics in Nature and Society

Definition 2 A function 119866(119909 119909) is defined as an auxiliaryfunction for 119865(119909) if the following two conditions are bothsatisfied

119866 (119909 119909) = 119865 (119909) 119866 (119909 119909) ge 119865 (119909) (17)

Lemma 3 Assume that the function 119866(119909 119909) is an auxiliaryfunction for 119865(119909) then 119865(119909) is nonincreasing under the update

119909119905+1

= argmin119909

119866(119909 119909119905

) (18)

Proof Consider 119865(119909119905+1) le 119866(119909119905+1

119909119905

) le 119866(119909119905

119909119905

) = 119865(119909119905

)

It can be observed that the equality 119865(119909119905+1

) = 119865(119909119905

)

holds only if 119909119905 is a local minimum of 119866(119909 119909) We iterate theupdate in (18) to obtain a sequence of estimates that convergeto a local minimum 119909min = argmin

119909119865(119909) of the objective

function given by

119865 (119909min) le sdot sdot sdot le 119865 (119909119905+1

) le 119865 (119909119905

)

le sdot sdot sdot le 119865 (1199092) le 119865 (119909

1) le 119865 (119909

0)

(19)

In the following we will show that the objective function (6)is nonincreasing under the updating rules (11) and (12) bydefining the appropriate auxiliary functions with respect to119881119894119895and119882

119894119895

Lemma 4 Function

119866(119881 )

=

1

120572 (1 minus 120572)

sum

119894119895119896

119883119894119895120585119894119895119896

()

times

120572 + (1 minus 120572)

119882119894119896[119881119879

119880119879

]119896119895

119883119894119895120585119894119895119896

()

minus (

119882119894119896[119881119879

119880119879

]119896119895

119883119894119895120585119894119895119896()

)

1minus120572

(20)

where 120585119894119895119896() = 119882

119894119896[119879

119880119879

]119896119895sum119897119882119894119897[119879

119880119879

]119897119895 is an auxiliary

function for

119865 (119881) =

1

120572 (1 minus 120572)

timessum

119894119895

120572119883119894119895+ (1 minus 120572) [119882119881

119879

119880119879

]119894119895

minus 119883120572

119894119895[119882119881119879

119880119879

]

1minus120572

119894119895

(21)

Proof Obviously 119866(119881119881) = 119865(119881) According to the defini-tion of auxiliary function we only need to prove 119866(119881 ) ge

119865(119881) To do this we use the convex function 119891(sdot) for positive120572 to rewrite the 120572-divergence function 119865(119881) as

119865 (119881) = sum

119894119895

119883119894119895119891(

sum119896119882119894119896[119881119879

119880119879

]119896119895

119883119894119895

) (22)

where

119891 (119909) =

1

120572 (1 minus 120572)

[120572 minus (1 minus 120572) 119909 minus 1199091minus120572

] (23)

Note that sum119896120585119894119895119896() = 1 and 120585

119894119895119896() ge 0 from the definition

of 120585119894119895119896() Applying Jensens inequality [30] it leads to

119891(sum

119896

119882119894119896[119881119879

119880119879

]119896119895

) le sum

119896

120585119894119895119896

() 119891(

119882119894119896[119881119879

119880119879

]119896119895

120585119894119895119896

()

)

(24)

From the above inequality it follows that

119865 (119881) = sum

119894119895

119883119894119895119891(

sum119896119882119894119896[119881119879

119880119879

]119896119895

119883119894119895

)

le sum

119894119895119896

119883119894119895120585119894119895119896

() 119891(

119882119894119896[119881119879

119880119879

]119896119895

119883119894119895120585119894119895119896

()

)

= 119866 (119881 )

(25)

which satisfies the condition of auxiliary function

Reversing the rules of 119881119894119895and119882

119894119895in Lemma 4 we define

an auxiliary function for the update (12)

Lemma 5 Function

119866(119882 )

=

1

120572 (1 minus 120572)

sum

119894119895119896

119883119894119895120578119894119895119896

()

times

120572 + (1 minus 120572)

119882119894119896[119881119879

119880119879

]119896119895

119883119894119895120578119894119895119896

()

minus (

119882119894119896[119881119879

119880119879

]119896119895

119883119894119895120578119894119895119896

()

)

1minus120572

(26)

where 120578119894119895119896() =

119894119896[119881119879

119880119879

]119896119895sum119897119894119897[119881119879

119880119879

]119897119895 is an auxil-

iary function for

119865 (119882)

=

1

120572 (1 minus 120572)

timessum

119894119895

120572119883119894119895+ (1 minus 120572) [119882119881

119879

119880119879

]119894119895

minus 119883120572

119894119895[119882119881119879

119880119879

]

1minus120572

119894119895

(27)

This can be easily proved in the same way as Lemma 4From Lemmas 4 and 5 now we can demonstrate the conver-gence of Theorem 1

Discrete Dynamics in Nature and Society 5

Proof To guarantee the stability of 119865(119881) from Lemma 3 wejust need to obtain the minimum of 119866(119881 ) with respect to119881119894119895 Set the gradient of (20) to zero there is

120597119866 (119881 )

120597119881119894119895

=

1

120572

sum

119896

[119880119879

119882]119896119894

1 minus (

119882119896119894[119881119879

119880119879

]119894119895

119883119896119895120585119896119895119894

()

)

minus120572

= 0

(28)

Then it follows that

119881119894119895= 119894119895

[

[

[

sum119896[119880119879

119882]119896119894

(119883119896119895sum119897119882119896119897[119879

119880119879

]119897119895

)

120572

sum119896[119880119879119882]119896119894

]

]

]

1120572

(29)

which is similar to the form of the updating rule (11)Similarly to guarantee the updating rule (12) holds theminimum of 119866(119882 ) which can be determined by settingthe gradient of (26) to zero must exist

Since 119866 is an auxiliary function according to Lemma 4119865 in (21) is nonincreasing under the update (11) Multiplyingupdates (11) and (12) we can find a local minimum of119863120572[119883119882(119880119881)

119879

]

5 Experiments

In this section the CNMF120572

algorithm is systematicallycompared with the current constrained NMF algorithmson three image data sets named ORL Database [31] YaleDatabase [32] and Caltech 101 Database [33] The details ofthe above three databases will be described individually laterWe introduce the evaluated algorithms firstly

(i) Constrained nonnegative matrix factorization algo-rithm in [15] aims to minimize the F-norm cost

(ii) Constrained nonnegative matrix factorization algo-rithm with parameter 120572 = 05 in this paper aims tominimize the Hellinger divergence cost

(iii) Constrained nonnegative matrix factorization algo-rithm with parameter 120572 = 1 in [15] aiming at min-imizing the KL-divergence cost is the best reportedalgorithm in image representation

(iv) Constrained nonnegative matrix factorization algo-rithm with parameter 120572 = 2 in this paper aims tominimize the 1205942-divergence cost

(v) Constrained nonnegative matrix factorization algo-rithm with parameter 120572 = 120572

119896in this paper aims to

minimize the 120572-divergence cost where the param-eters 120572

119896are associated with the characteristics of

the image database and designed by the presentedmethod CNMFKL algorithm is a special case of ourCNMF

120572119896algorithm with 120572 = 120572

119896= 1

We apply these algorithms to a problem of classificationand evaluate their performance on three image data setswhich contain a number of different categories of image Foreach date set the evaluations are conducted with differentnumbers of clusters here the number of clusters 119896 varies from2 to 10We randomly choose 119896 categories fromone image dataset and mix the images of these 119896 categories as the collection119883 Then for the semisupervised algorithms we randomlypick up 10 percent images from each category in119883 and recordtheir category number as the available label information toobtain the label matrix119880 For some special data sets the labelprocess is different and we will describe the details later

Suppose a data set has 119873 categories 1198621 1198622 119862

119873 and

the cardinalities of these labeled images are 1198711 1198712 119871119873

respectively Since the label constraint matrix 119880 is composedof the indicatormatrix 119878 and the identity matrix 119868 the indica-tor matrix 119878 plays a critical role in classification performancefor different categories in119883 To determine the effectiveness of119878 we define a measure to represent the relationship betweenthe cardinalities of labeled samples and the total samples

119903 (119896) =

119871max (119896) minus 119871min (119896)

sum119896

119894=1119862119894

119896 (30)

where 119871max(119896) and 119871min(119896) denote the maximum and mini-mum labeled cardinalities of 119896 categories For the fixed clusternumber 119896 in the data set 119903(119896) is different if the number ofsamples in each category is different Then we compute 120572 asfollows

120572119896=

120572119873

119903 (119873)

( |119903 (119896) minus 119903 (119873)|

times (

1003816100381610038161003816100381610038161003816100381610038161003816

119871max (119896) 119896

sum119896

119894=1119862119894

minus

sum119873

119894=1119871119894

sum119873

119894=1119862119894

1003816100381610038161003816100381610038161003816100381610038161003816

+

1003816100381610038161003816100381610038161003816100381610038161003816

119871min (119896) 119896

sum119896

119894=1119862119894

minus

sum119873

119894=1119871119894

sum119873

119894=1119862119894

1003816100381610038161003816100381610038161003816100381610038161003816

)

minus1

)

(31)

where

120572119873= (119871max (119873) minus 119871min (119873))

times (

1003816100381610038161003816100381610038161003816100381610038161003816

119871max (119873) minus

sum119873

119894=1119871119894

sum119873

119894=1119862119894

1003816100381610038161003816100381610038161003816100381610038161003816

+

1003816100381610038161003816100381610038161003816100381610038161003816

119871min (119873) minus

sum119873

119894=1119871119894

sum119873

119894=1119862119894

1003816100381610038161003816100381610038161003816100381610038161003816

)

minus1

(32)

The value of 120572119896computed by (31) is associated with the

characteristics of image data sets since its variation is causedby both the cardinalities of labeled samples in each categoryand the total samples We can obtain both the cardinalitiesof labeled samples and the total samples in our semisuper-vised algorithms However we can not get the cardinalitiesof labeled images exactly in many real-word applicationsMoreover the value of 120572 varies depending on data sets It isstill an open problem how to select the optimal 120572 [16]

To evaluate the classification performance we defineclassification accuracy as the first measure Our CNMF

120572

6 Discrete Dynamics in Nature and Society

algorithm described in (11) and (12) provides a classificationlabel of each sample marked 119904

119894119897119894 Suppose 119883 = 119909

119894119899

119894=1is a

data set which consists of n training images For each samplelet 119897119894be the true class label provided by the image data set

More specifically if the image 119909119894is designated the 119897

119894th class

we evaluate it as a correct label and set 119904119894119897119894

= 1 Otherwiseit is counted as a false label and noted 119904

119894119897119894= 0 Eventually we

compute the percentage of correct labels obtained by definingthe accuracy measure as

Ac (119904119894119897119894) =

sum119899

119894=1119904119894119897119894

119899

(33)

To evaluate the classification performance we carryout computation about the normalized mutual informationwhich is used to measure how similar two sets of clusters areas the second measure Given two data sets of clusters 119862

119894and

119862119895 their normalized mutual information is defined as

NMI (119862119894 119862119895)

=

sum119888119894isin119862119894 119888119895isin119862119895

119901 (119888119894 119888119895) sdot log 119901 (119888

119894 119888119895) [119901 (119888

119894) 119901 (119888119895)]

max (119867 (119862119894) 119867 (119862

119895))

(34)

which takes values between 0 and 1 Where 119901(119888119894) 119901(119888

119895)

denote the probabilities that an image arbitrarily chosen fromthe data set belongs to the clusters 119862

119894and 119862

119895 respectively

and 119901(119888119894 119888119895) denotes the joint probability that this arbitrarily

selected image belongs to the cluster 119862119894as well as 119862

119895at the

same time119867(119862119894) and 119867(119862

119895) are the entropies of 119862

119894and 119862

119895

respectivelyExperimental results on each data set will be presented as

classification accuracy and the normalized mutual informa-tion is in Tables 1 and 2

51 ORL Database The Cambridge ORL Face Database has400 images for 40 different people 10 images per personThe images of some people are taken at different timesvarying lighting slightly facial expressions (openclosed eyessmilingnonsmiling) and facial details (glassesno glasses)All the images are taken against a dark homogeneous back-ground with the subjects in an upright frontal position andslight left-right out-of-plane rotation To locate the facesthe input images are preprocessed They are resized to 32 times

32 pixels with 256 gray levels per pixel and normalized inorientation so that two eyes in the facial areas are aligned atthe same position

There are 10 images for each category in ORL and 10percent is just one image For the fixed parameter 120572 (120572 =

05 1 2) we randomly choose two images from each categoryto provide the label information Note that the same labelis meaningless for (30) To obtain 119903(119896) we divide the 40categories into 3 groups 10 categories 20 categories and 10categories In the first 10 categories pick up 1 image fromeach category to provide the label information pick up 2images from each category in the second 20 categories andpick up 3 from each category in the remaining categoriesThe

dividing process is repeated 10 times and the obtained averageclassification accuracy is recorded as the final result

Figure 1 shows the graphical classification accuracy ratesand normalized mutual information on the ORL DatabaseNote that if the samples in the collection 119883 come fromthe same group we set 120572

119896= 067 Because of the same

number of samples in each category the variation of the120572 is small even though we label different cardinalities ofsamples Compared to the constrained nonnegative matrixfactorization algorithms with fixed parameters our CNMF

120572119896

algorithm gives the best performance since the selection of120572119896is suitable to the collection 119883 Table 1 summarizes the

detailed classification accuracy and error bars of CNMF120572=05

CNMF

120572=2 and CNMF

120572119896 It shows that our CNMF

120572119896algo-

rithm achieves 192 percent improvement compared to thebest reported CNMFKL algorithm (120572 = 1) [15] in averageclassification accuracy For normalized mutual informationthe details and the error bars of our constrained algorithmswith 120572 = 05 120572 = 2 and 120572

119896are listed in Table 2 Comparing

to the best algorithmCNMF ourCNMF120572119896algorithmachieves

054 percent improvement

52 Yale Database The Yale Database consists of 165grayscale images for 15 individuals 11 images per personOne per image is taken from different facial expression orconfiguration center-light wglasses happy left-light wnoglasses normal right-light sad sleepy surprised and winkWe preprocess all the images of Yale Database in the sameway as the ORL Database Each image is linearly stretched toa 1024-dimensional vector in image space

TheYaleDatabase also has the same number of samples ineach category To obtain an appropriate 120572 we do similar labelprocessing to the ORL Database Divide the 15 individualsinto 3 groups averagely choose 1 image from each categoryin the first group choose 2 from each category in the secondand choose 3 from each category in the remaining groupWe repeat the process 10 times and record the averageclassification accuracy as the final result

Figure 2 shows the classification accuracy and normal-ized mutual information on the Yale Database Set 120572

119896=

055 when 119903(119896) minus 119903(119873) = 0 It indicates that the samplesin the collection 119883 just come from two groups that ischoose 15 images from 10 categories in the first and secondgroup CNMF

120572119896achieves an extraordinary performance for

all the cases and CNMF120572=05

follows This suggests that theconstrained nonnegative matrix factorization algorithm hasa higher classification accuracy when the value of 120572 is closeto 120572119896 Comparing to the best reported CNMFKL algorithm

CNMF120572119896

achieves 242 percent improvement in averageclassification accuracy For normalized mutual informationCNMF

120572119896achieves 72 percent improvement compared to

CNMF The details of classification accuracy and normal-ized mutual information are provided in Tables 3 and 4which contain the error bars of CNMF

120572=05 CNMF

120572=2 and

CNMF120572119896

53 Caltech 101 Database The Caltech 101 Database createdby Caltech University has images of 101 different object

Discrete Dynamics in Nature and Society 7

Table 1 The comparison of classification accuracy rates on the ORL Database

119896

Classification accuracy Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 9390 9295 9351 plusmn 009 9214 plusmn 018 9389 plusmn 0913 8620 8433 8571 plusmn 083 8226 plusmn 123 8621 plusmn 2074 8455 8405 8474 plusmn 056 8302 plusmn 047 8478 plusmn 0365 8016 7838 8010 plusmn 022 8015 plusmn 018 8016 plusmn 0296 7672 7673 7713 plusmn 028 7760 plusmn 019 7809 plusmn 0407 8114 7904 8034 plusmn 081 7807 plusmn 088 8215 plusmn 1958 7808 7702 7866 plusmn 069 7877 plusmn 094 7945 plusmn 1339 8319 8223 8357 plusmn 036 8122 plusmn 021 8519 plusmn 12110 7703 7688 7769 plusmn 012 7775 plusmn 019 7896 plusmn 073Avg 8233 8129 8238 8122 8321

Table 2 The comparison of normalized mutual information on the ORL Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 7824 7582 7601 plusmn 014 7394 plusmn 011 7767 plusmn 0583 7661 7536 7538 plusmn 041 7450 plusmn 053 7592 plusmn 1554 7869 7782 7796 plusmn 047 7704 plusmn 054 7927 plusmn 1265 7669 7427 7532 plusmn 080 7262 plusmn 087 7535 plusmn 1676 7652 7609 7621 plusmn 012 7609 plusmn 067 7760 plusmn 1827 8245 8135 8236 plusmn 042 8135 plusmn 026 8308 plusmn 1598 8057 7992 8011 plusmn 073 7991 plusmn 068 8249 plusmn 1979 8603 8567 8673 plusmn 070 8567 plusmn 037 8805 plusmn 11510 8177 8107 8207 plusmn 012 8107 plusmn 012 8297 plusmn 171Avg 7973 7860 7913 7802 8027

categories Each category contains about 31 to 800 imageswith a total of 9144 samples of size roughly 300 times 300 pixelsThis database is particularly challenging for learning a basisrepresentation of image information because the numberof training samples per category is exceedingly small Inour experiment we select the 10 largest categories (3044images in total) except the background category To representthe input images we do the preprocessing by using thecodewords generated by SIFT features [34] Then we obtain555292 SIFT descriptors and generate 500 codewords Byassigning the descriptors to the closest codewords eachimage in Caltech 101 database is represented by a 500-dimensional frequency histogram

We randomly select k categories from the Faces-Easy category in the Caltech 101 database and convert them to gray-scale images of 32 × 32 pixels. The label process is repeated 10 times, and the obtained values of α_k computed by (31) are listed in Table 7. The variation of α across runs on the same k categories is great. That is, selecting an appropriate α plays a critical role for the mixture of different categories in one image data set, especially in the case that the number of samples in each category is different. The choice of α_k can fully reflect the effectiveness of the indicator matrix S.
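A small sketch of this selection rule, following (30)-(32), is given below. The inputs are the per-category labeled cardinalities L_i and category sizes C_i over all N categories, with an index set selecting the k mixed categories; all names are our own. The degenerate case r(k) = r(N), which the text handles by hand (e.g., α_k = 0.55 on the Yale Database), is left to the caller.

    import numpy as np

    def r_measure(L, C):
        # r(k) = (L_max(k) - L_min(k)) / (sum_{i=1}^k C_i / k), eq. (30)
        L, C = np.asarray(L, float), np.asarray(C, float)
        return (L.max() - L.min()) / (C.sum() / len(C))

    def alpha_k(L_all, C_all, idx):
        L_all, C_all = np.asarray(L_all, float), np.asarray(C_all, float)
        Lk, Ck = L_all[idx], C_all[idx]
        k = len(Lk)
        ratio = L_all.sum() / C_all.sum()  # overall labeled fraction, sum L_i / sum C_i
        # alpha_N, eq. (32)
        a_N = (L_all.max() - L_all.min()) / (abs(L_all.max() - ratio)
                                             + abs(L_all.min() - ratio))
        rN, rk = r_measure(L_all, C_all), r_measure(Lk, Ck)
        # alpha_k, eq. (31); degenerate when r(k) = r(N)
        denom = (abs(Lk.max() * k / Ck.sum() - ratio)
                 + abs(Lk.min() * k / Ck.sum() - ratio))
        return (a_N / rN) * abs(rk - rN) / denom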

Figure 3 shows the effect derived from using the proposed algorithm with α_k. The upper part of the figure is the original samples, which contain 26 images; the middle part is their gray images; and the lower part is the combination of the basis vectors learned by CNMF_αk.
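Purely as an illustration of how such a panel can be rendered, the sketch below tiles the columns of a learned basis matrix (reshaped to 32 × 32 gray images) into a 2 × 13 montage like the lower part of Figure 3; the function name and the per-tile normalization are our assumptions, not part of the paper.

    import numpy as np

    def basis_montage(W, rows=2, cols=13, h=32, w=32):
        # W: (h*w, r) nonnegative basis matrix; each column is one basis image.
        canvas = np.zeros((rows * h, cols * w))
        for j in range(min(rows * cols, W.shape[1])):
            r, c = divmod(j, cols)
            tile = W[:, j].reshape(h, w)
            # larger positive values appear as whiter pixels
            canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = tile / (tile.max() + 1e-12)
        return canvas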

The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF_α=0.5, CNMF_α=2, and CNMF_αk. The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters α_k listed in Table 7 are selected. In general, our method demonstrates much better effectiveness in classification by choosing α_k. Compared to the best reported algorithm other than our CNMF_α algorithm, CNMF_αk achieves a 2.4 percent improvement in average classification accuracy, and compared to the CNMF algorithm [15], CNMF_αk achieves an 8.77 percent improvement in average classification accuracy. For normalized mutual information, CNMF_αk achieves a 2.29 percent improvement and consistently outperforms the other algorithms.



Table 3: The comparison of classification accuracy rates Ac(s_il_i) (%) on the Yale Database.

k     CNMF     CNMF_α=1   CNMF_α=0.5      CNMF_α=2        CNMF_αk
2     81.64    85.23      85.93 ± 0.65    84.18 ± 0.81    86.27 ± 2.46
3     68.48    76.91      77.14 ± 0.86    77.26 ± 0.42    79.01 ± 1.09
4     64.25    66.70      68.11 ± 0.67    66.12 ± 0.16    69.33 ± 2.23
5     65.82    67.71      69.14 ± 1.21    66.54 ± 1.01    70.57 ± 2.22
6     60.12    64.26      65.62 ± 0.94    63.70 ± 0.94    66.92 ± 1.41
7     59.25    62.47      63.97 ± 0.72    59.83 ± 0.38    65.23 ± 1.77
8     54.40    59.26      60.12 ± 0.31    56.76 ± 0.07    60.89 ± 1.28
9     54.85    57.30      58.08 ± 0.12    54.88 ± 0.14    58.98 ± 0.62
10    52.68    56.20      57.56 ± 0.03    53.83 ± 0.01    57.56 ± 0.73
Avg   62.39    65.89      67.30           64.79           68.31

Table 4: The comparison of normalized mutual information (%) on the Yale Database.

k     CNMF     CNMF_α=1   CNMF_α=0.5      CNMF_α=2        CNMF_αk
2     48.99    59.66      59.73 ± 0.39    58.17 ± 0.17    61.11 ± 1.82
3     42.90    53.52      53.58 ± 0.73    52.75 ± 0.93    54.49 ± 1.74
4     47.80    53.60      53.66 ± 0.18    52.99 ± 0.65    54.58 ± 1.64
5     53.99    57.49      57.56 ± 0.07    56.21 ± 0.06    58.58 ± 2.32
6     50.63    55.88      56.15 ± 0.07    55.40 ± 0.08    57.62 ± 1.09
7     53.03    57.60      57.88 ± 0.22    57.27 ± 0.16    59.05 ± 1.89
8     50.83    56.52      56.93 ± 0.13    56.20 ± 0.13    58.98 ± 1.46
9     53.08    55.30      55.98 ± 0.73    54.81 ± 0.81    56.15 ± 1.20
10    52.84    56.73      57.43 ± 0.16    56.41 ± 0.41    58.30 ± 1.11
Avg   50.45    56.26      56.54           55.58           57.65

Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information; both plotted against the number of classes (2-10) for CNMF, CNMF_α=1, CNMF_α=0.5, CNMF_α=2, and CNMF_αk.


Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information; both plotted against the number of classes (2-10) for CNMF, CNMF_α=1, CNMF_α=0.5, CNMF_α=2, and CNMF_αk.

Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) is the original images, (b) is the gray images, and (c) is the basis vectors learned by CNMF_αk. Positive values are illustrated with white pixels.



Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information; both plotted against the number of classes (2-10) for CNMF, CNMF_α=1, CNMF_α=0.5, CNMF_α=2, and CNMF_αk.

Table 5: The comparison of classification accuracy rates Ac(s_il_i) (%) on the Caltech 101 Database.

k     CNMF     CNMF_α=1   CNMF_α=0.5      CNMF_α=2        CNMF_αk
2     66.39    78.80      78.31 ± 1.69    80.10 ± 2.07    82.76 ± 3.56
3     60.91    69.98      68.77 ± 1.57    72.94 ± 1.51    73.59 ± 3.51
4     50.82    57.50      58.19 ± 1.98    56.28 ± 1.96    59.89 ± 2.82
5     50.83    53.67      53.71 ± 1.02    54.12 ± 1.44    55.06 ± 2.63
6     50.49    55.05      55.63 ± 0.89    58.04 ± 1.47    58.97 ± 1.97
7     46.09    51.12      50.04 ± 0.11    51.73 ± 0.25    52.36 ± 1.81
8     44.21    50.50      51.72 ± 0.51    48.37 ± 1.59    52.64 ± 1.52
9     41.34    46.58      47.85 ± 0.28    45.87 ± 0.30    49.54 ± 1.23
10    40.80    46.02      46.02 ± 0.05    45.76 ± 0.08    46.02 ± 0.26
Avg   50.21    56.58      56.69           57.02           58.98

Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

k     CNMF     CNMF_α=1   CNMF_α=0.5      CNMF_α=2        CNMF_αk
2     21.05    38.17      38.20 ± 0.18    37.48 ± 0.21    40.08 ± 2.19
3     31.62    40.63      40.68 ± 0.15    40.05 ± 0.14    41.88 ± 1.24
4     27.60    34.36      34.41 ± 0.79    34.12 ± 0.91    36.41 ± 1.17
5     34.46    35.82      36.00 ± 0.99    35.42 ± 1.01    37.20 ± 1.78
6     34.85    38.97      39.16 ± 0.86    38.53 ± 1.11    39.64 ± 1.83
7     34.88    37.67      37.86 ± 0.47    37.47 ± 0.49    39.86 ± 2.58
8     32.63    37.43      37.71 ± 0.39    37.22 ± 0.42    37.91 ± 2.18
9     31.20    35.97      36.21 ± 0.60    35.86 ± 0.63    37.73 ± 0.67
10    31.21    35.71      36.06 ± 0.02    35.68 ± 0.03    37.48 ± 0.22
Avg   31.05    37.19      37.37           36.87           38.69


Table 7: The value of α in k categories.

k     α_1    α_2    α_3    α_4    α_5    α_6    α_7    α_8    α_9    α_10
2     2.07   2.08   2.22   2.27   2.57   3.52   3.97   3.98   4.37   8.96
3     0.45   0.55   0.66   1.07   1.95   2.31   2.31   3.54   4.24   7.54
4     0.30   0.32   0.35   0.86   0.98   1.27   1.65   2.10   3.27   5.80
5     0.45   0.49   0.51   0.71   0.80   1.13   2.05   2.55   2.82   3.89
6     0.07   0.56   0.73   1.08   1.26   1.46   2.01   2.12   3.91   8.52
7     0.71   0.94   1.12   1.14   1.16   1.26   1.27   1.37   1.57   8.97
8     0.12   0.30   0.50   0.64   0.66   0.69   0.78   0.79   0.97   1.11
9     0.78   0.79   2.01   0.10   0.17   0.28   0.29   0.30   0.31   0.39
10    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00

6. Conclusion

In this paper, we present a generic constrained nonnegative matrix factorization algorithm by imposing label information as an additional hard constraint on the α-divergence-NMF algorithm. The proposed algorithm can be derived using Karush-Kuhn-Tucker conditions, and we presented an alternative derivation of the algorithm using the projected gradient. The use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter α. Since α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive when α goes to +∞ and more exclusive when α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is caused by both the cardinalities of labeled samples in each category and the total samples. In the experiments, we apply the fixed parameters α and α_k to analyze the classification accuracy on three image databases. The experimental results have demonstrated that the CNMF_αk algorithm has the best classification accuracy. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set. It is still an open problem how to select the optimal α for a specific image data set.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).

References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115-147, 1987.

[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.

[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111-126, 1994.

[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.

[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.

[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.

[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.

[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, pp. 4-8030, 2004.

[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207-I-212, Kauai, Hawaii, USA, December 2001.

[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495-511, 2005.

[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.

[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.

[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629-632, San Diego, Calif, USA, 2001.

[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.

[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299-1311, 2012.

[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433-1440, 2008.

[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001-1004, 2003.

[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706-1718, 2003.

[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and Its Applications, vol. 416, no. 1, pp. 29-47, 2006.

[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177-180, New Paltz, NY, USA, 2003.

[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373-386, 2006.

[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457-1469, 2004.

[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828-1832, June 2008.

[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.

[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621-V-624, Toulouse, France, May 2006.

[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 8, pp. 283-290, MIT Press, Cambridge, Mass, USA, 2006.

[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.

[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, vol. 39, no. 1, pp. 1-38, 1977.

[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81-89, ACL Press, 1997.

[30] F. Hansen and G. Kjærgård Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229-241, 1982.

[31] ORL face database, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

[32] Yale face database, http://vision.ucsd.edu/content/yale-face-database.

[33] Caltech 101 database, http://www.vision.caltech.edu/Image_Datasets/Caltech101/.

[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.



Figure 2 shows the classification accuracy and normal-ized mutual information on the Yale Database Set 120572

119896=

055 when 119903(119896) minus 119903(119873) = 0 It indicates that the samplesin the collection 119883 just come from two groups that ischoose 15 images from 10 categories in the first and secondgroup CNMF

120572119896achieves an extraordinary performance for

all the cases and CNMF120572=05

follows This suggests that theconstrained nonnegative matrix factorization algorithm hasa higher classification accuracy when the value of 120572 is closeto 120572119896 Comparing to the best reported CNMFKL algorithm

CNMF120572119896

achieves 242 percent improvement in averageclassification accuracy For normalized mutual informationCNMF

120572119896achieves 72 percent improvement compared to

CNMF The details of classification accuracy and normal-ized mutual information are provided in Tables 3 and 4which contain the error bars of CNMF

120572=05 CNMF

120572=2 and

CNMF120572119896

53 Caltech 101 Database The Caltech 101 Database createdby Caltech University has images of 101 different object

Discrete Dynamics in Nature and Society 7

Table 1 The comparison of classification accuracy rates on the ORL Database

119896

Classification accuracy Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 9390 9295 9351 plusmn 009 9214 plusmn 018 9389 plusmn 0913 8620 8433 8571 plusmn 083 8226 plusmn 123 8621 plusmn 2074 8455 8405 8474 plusmn 056 8302 plusmn 047 8478 plusmn 0365 8016 7838 8010 plusmn 022 8015 plusmn 018 8016 plusmn 0296 7672 7673 7713 plusmn 028 7760 plusmn 019 7809 plusmn 0407 8114 7904 8034 plusmn 081 7807 plusmn 088 8215 plusmn 1958 7808 7702 7866 plusmn 069 7877 plusmn 094 7945 plusmn 1339 8319 8223 8357 plusmn 036 8122 plusmn 021 8519 plusmn 12110 7703 7688 7769 plusmn 012 7775 plusmn 019 7896 plusmn 073Avg 8233 8129 8238 8122 8321

Table 2 The comparison of normalized mutual information on the ORL Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 7824 7582 7601 plusmn 014 7394 plusmn 011 7767 plusmn 0583 7661 7536 7538 plusmn 041 7450 plusmn 053 7592 plusmn 1554 7869 7782 7796 plusmn 047 7704 plusmn 054 7927 plusmn 1265 7669 7427 7532 plusmn 080 7262 plusmn 087 7535 plusmn 1676 7652 7609 7621 plusmn 012 7609 plusmn 067 7760 plusmn 1827 8245 8135 8236 plusmn 042 8135 plusmn 026 8308 plusmn 1598 8057 7992 8011 plusmn 073 7991 plusmn 068 8249 plusmn 1979 8603 8567 8673 plusmn 070 8567 plusmn 037 8805 plusmn 11510 8177 8107 8207 plusmn 012 8107 plusmn 012 8297 plusmn 171Avg 7973 7860 7913 7802 8027

categories Each category contains about 31 to 800 imageswith a total of 9144 samples of size roughly 300 times 300 pixelsThis database is particularly challenging for learning a basisrepresentation of image information because the numberof training samples per category is exceedingly small Inour experiment we select the 10 largest categories (3044images in total) except the background category To representthe input images we do the preprocessing by using thecodewords generated by SIFT features [34] Then we obtain555292 SIFT descriptors and generate 500 codewords Byassigning the descriptors to the closest codewords eachimage in Caltech 101 database is represented by a 500-dimensional frequency histogram

We randomly select 119896 categories from Faces-Easy cate-gory in Caltech 101 database and convert them to gray-scaleof 32 times 32 The label process is repeated 10 times and theobtained values of 120572

119896computed by (31) are listed in Table 7

The variation of the 120572 in the same 119896 categories is greatThat isselecting an appropriate 120572 plays a critical role for the mixtureof different categories in one image data set especially in thecase that the number of samples in each category is differentThe choice of 120572

119896can fully reflect the effectiveness of the

indicator matrix 119878Figure 3 shows the effect that derived from the using of

proposed algorithm with 120572119896 The upper part of figure is the

original samples which contain 26 images the middle part

is their gray images and the lower is the combination of thebasis vectors learned by CNMF

120572119896

The classification accuracy results and normalizedmutualinformation for Faces-Easy category in Caltech 101 databaseare detailed in Tables 5 and 6 which contain the error bars ofCNMF

120572=05 CNMF

120572=2 and CNMF

120572119896 The graphical results

of classification performance are shown in Figure 4 The bestperformance in this experiment is achieved when the param-eters120572

119896listed in Table 7were selected In general ourmethod

demonstrates much better effectiveness in classification bychoosing 120572

119896 Comparing to the best reported algorithm

other than our CNMF120572algorithm CNMF

120572119896achieves 24

percent improvement in average classification accuracy andcomparing to the CNMF algorithm [15] other than CNMF

120572

algorithm CNMF120572119896

achieves 877 percent improvement inaverage classification accuracy For normalized mutual infor-mation CNMF

120572119896achieves 229 percent improvement and

consistently outperforms the other algorithms

6 Conclusion

In this paper we present a generic constraint nonnegativematrix factorization algorithm by imposing label informa-tion as additional hard constraint to the 120572-divergence-NMFalgorithm The proposed algorithm can be derived using

8 Discrete Dynamics in Nature and Society

Table 3 The comparison of classification accuracy rates on the Yale Database

119896

Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 8164 8523 8593 plusmn 065 8418 plusmn 081 8627 plusmn 2463 6848 7691 7714 plusmn 086 7726 plusmn 042 7901 plusmn 1094 6425 6670 6811 plusmn 067 6612 plusmn 016 6933 plusmn 2235 6582 6771 6914 plusmn 121 6654 plusmn 101 7057 plusmn 2226 6012 6426 6562 plusmn 094 6370 plusmn 094 6692 plusmn 1417 5925 6247 6397 plusmn 072 5983 plusmn 038 6523 plusmn 1778 5440 5926 6012 plusmn 031 5676 plusmn 007 6089 plusmn 1289 5485 5730 5808 plusmn 012 5488 plusmn 014 5898 plusmn 06210 5268 5620 5756 plusmn 003 5383 plusmn 001 5756 plusmn 073Avg 6239 6589 6730 6479 6831

Table 4 The comparison of normalized mutual information on the Yale Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 4899 5966 5973 plusmn 039 5817 plusmn 017 6111 plusmn 1823 4290 5352 5358 plusmn 073 5275 plusmn 093 5449 plusmn 1744 4780 5360 5366 plusmn 018 5299 plusmn 065 5458 plusmn 1645 5399 5749 5756 plusmn 007 5621 plusmn 006 5858 plusmn 2326 5063 5588 5615 plusmn 007 5540 plusmn 008 5762 plusmn 1097 5303 5760 5788 plusmn 022 5727 plusmn 016 5905 plusmn 1898 5083 5652 5693 plusmn 013 5620 plusmn 013 5898 plusmn 1469 5308 5530 5598 plusmn 073 5481 plusmn 081 5615 plusmn 12010 5284 5673 5743 plusmn 016 5641 plusmn 041 5830 plusmn 111Avg 5045 5626 5654 5558 5765

2 3 4 5 6 7 8 9 10076

078

08

082

084

086

088

09

092

094

Number of classes

Accu

racy

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy rates

2 3 4 5 6 7 8 9 10072

074

076

078

08

082

084

086

088

09

Nor

mal

ized

mut

ual i

nfor

mat

ion

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 1 Classification performance on the ORL Database

Discrete Dynamics in Nature and Society 9

2 3 4 5 6 7 8 9 1005

055

06

065

07

075

08

085

09Ac

cura

cy

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy ratesN

orm

aliz

ed m

utua

l inf

orm

atio

n

2 3 4 5 6 7 8 9 10042

044

046

048

05

052

054

056

058

06

062

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 2 Classification performance on the Yale Database

(a)

(b)

(c)

Figure 3 Sample images used in YaleB Database As shown in the three 2 times 13 montages (a) is the original images (b) is the gray imagesand (c) is the basis vectors learned by CNMF

120572119896 Positive values are illustrated with white pixels

Karush-Kuhn-Tucker conditions and presented alternativeof the algorithm using the projected gradient The use ofthe two methods guarantees the correctness of the algo-rithm theoretically The image representation learned by ouralgorithm contains a parameter 120572 Since 120572-divergence isa parametric discrepancy measure and the parameter 120572 isassociated with the characteristics of a learning machine the

model distribution is more inclusive when 120572 goes to +infin andis more exclusive when 120572 approaches minusinfin The selection ofthe optimal value of 120572 plays a critical role in determiningthe discriminative basis vectors We provide a method toselect the parameters 120572 for our semisupervised CNMF

120572

algorithm The variation of the 120572 is caused by both thecardinalities of labeled samples in each category and the total

10 Discrete Dynamics in Nature and Society

2 3 4 5 6 7 8 9 1004

045

05

055

06

065

07

075

08

085

09

Accu

racy

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy ratesN

orm

aliz

ed m

utua

l inf

orm

atio

n

2 3 4 5 6 7 8 9 1002

025

03

035

04

045

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 4 Classification performance on the Caltech 101 Database

Table 5 The comparison of classification accuracy rates on the Caltech 101 Database

119896

Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 6639 7880 7831 plusmn 169 8010 plusmn 207 8276 plusmn 3563 6091 6998 6877 plusmn 157 7294 plusmn 151 7359 plusmn 3514 5082 5750 5819 plusmn 198 5628 plusmn 196 5989 plusmn 2825 5083 5367 5371 plusmn 102 5412 plusmn 144 5506 plusmn 2636 5049 5505 5563 plusmn 089 5804 plusmn 147 5897 plusmn 1977 4609 5112 5004 plusmn 011 5173 plusmn 025 5236 plusmn 1818 4421 5050 5172 plusmn 051 4837 plusmn 159 5264 plusmn 1529 4134 4658 4785 plusmn 028 4587 plusmn 030 4954 plusmn 12310 4080 4602 4602 plusmn 005 4576 plusmn 008 4602 plusmn 026Avg 5021 5658 5669 5702 5898

Table 6 The comparison of normalized mutual information on the Caltech 101 Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 2105 3817 3820 plusmn 018 3748 plusmn 021 4008 plusmn 2193 3162 4063 4068 plusmn 015 4005 plusmn 014 4188 plusmn 1244 2760 3436 3441 plusmn 079 3412 plusmn 091 3641 plusmn 1175 3446 3582 3600 plusmn 099 3542 plusmn 101 3720 plusmn 1786 3485 3897 3916 plusmn 086 3853 plusmn 111 3964 plusmn 1837 3488 3767 3786 plusmn 047 3747 plusmn 049 3986 plusmn 2588 3263 3743 3771 plusmn 039 3722 plusmn 042 3791 plusmn 2189 3120 3597 3621 plusmn 060 3586 plusmn 063 3773 plusmn 06710 3121 3571 3606 plusmn 002 3568 plusmn 003 3748 plusmn 022Avg 3105 3719 3737 3687 3869

Discrete Dynamics in Nature and Society 11

Table 7 The value of 120572 in 119896 categories

119896 1205721

1205722

1205723

1205724

1205725

1205726

1205727

1205728

1205729

12057210

2 207 208 222 227 257 352 397 398 437 8963 045 055 066 107 195 231 231 354 424 7544 030 032 035 086 098 127 165 210 327 5805 045 049 051 071 080 113 205 255 282 3896 007 056 073 108 126 146 201 212 391 8527 071 094 112 114 116 126 127 137 157 8978 012 030 050 064 066 069 078 079 097 1119 078 079 201 010 017 028 029 030 031 03910 100 100 100 100 100 100 100 100 100 100

samples In the experiments we apply the fixed parameters120572 and 120572

119896to analyze the classification accuracy on three

image databasesThe experimental results have demonstratedthat the CNMF

120572119896algorithm has best classification accuracy

However we can not get the cardinalities of labeled imagesexactly in many real-word applications Moreover the valueof 120572 varies depending on data sets It is still an open problemhow to select the optimal 120572 for a specific image data set

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported in part by the National NaturalScience Foundation of China (61375038) National NaturalScience Foundation of China (11401060) Zhejiang Provin-cial Natural Science Foundation of China (LQ13A010023)Key Scientific and Technological Project of Henan Province(142102210010) and Key Research Project in Science andTechnology of the Education Department Henan Province(14A520028 14A520052)

References

[1] I Biederman ldquoRecognition-by-components a theory of humanimage understandingrdquo Psychological Review vol 94 no 2 pp115ndash147 1987

[2] S Ullman High-Level Vision Object Recognition and VisualCognition MIT Press Cambridge Mass USA 1996

[3] P Paatero and U Tapper ldquoPositive matrix factorization a non-negative factormodel with optimal utilization of error estimatesof data valuesrdquo Environmetrics vol 5 no 2 pp 111ndash126 1994

[4] I T Jolliffe Principle Component Analysis Springer BerlinGermany 1986

[5] R O Duda P E Hart and D G Stork Pattern ClassificationWiley-Interscience New York NY USA 2001

[6] A Gersho and R M Gray Vector Quantization and SignalCompression Kluwer Academic Publishers 1992

[7] D D Lee and H S Seung ldquoLearning the parts of objects bynon-negative matrix factorizationrdquo Nature vol 401 no 6755pp 788ndash791 1999

[8] M Chu F Diele R Plemmons and S Ragni ldquoOptimalitycomputation and interpretation of nonnegative matrix factor-izationsrdquo SIAM Journal on Matrix Analysis pp 4ndash8030 2004

[9] S Z Li X W Hou H J Zhang and Q S Cheng ldquoLearningspatially localized parts-based representationrdquo inProceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition (CVPR rsquo01) pp I207ndashI212 Kauai HawaiiUSA December 2001

[10] Y Wang Y Jia H U Changbo and M Turk ldquoNon-negativematrix factorization framework for face recognitionrdquo Interna-tional Journal of Pattern Recognition and Artificial Intelligencevol 19 no 4 pp 495ndash511 2005

[11] YWang Y Jia CHu andM Turk ldquoFisher non-negativematrixfactorization for learning local featuresrdquo in Proceedings of theAsian Conference on Computer Vision Jeju Island Republic ofKorea 2004

[12] J H Ahn S Kim J H Oh and S Choi ldquoMultiple nonnegative-matrix factorization of dynamic PET imagesrdquo in Proceedings ofthe Asian Conference on Computer Vision 2004

[13] J S Lee D D Lee S Choi and D S Lee ldquoApplication of nonnegative matrix factorization to dynamic positron emissiontomographyrdquo in Proceedings of the International Conference onIndependent Component Analysis and Blind Signal Separationpp 629ndash632 San Diego Calif USA 2001

[14] H Lee A Cichocki and S Choi ldquoNonnegative matrix factor-ization for motor imagery EEG classificationrdquo in Proceedingsof the International Conference on Artificial Neural NetworksSpringer Athens Greece 2006

[15] H Liu Z Wu X Li D Cai and T S Huang ldquoConstrainednonnegative matrix factorization for image representationrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 34 no 7 pp 1299ndash1311 2012

[16] A Cichocki H Lee Y-D Kim and S Choi ldquoNon-negativematrix factorization with 120572-divergencerdquo Pattern RecognitionLetters vol 29 no 9 pp 1433ndash1440 2008

[17] L K Saul F Sha and D D Lee ldquoTatistical signal processingwith nonnegativity constraintsrdquo Proceedings of EuroSpeech vol2 pp 1001ndash1004 2003

[18] P M Kim and B Tidor ldquoSubsystem identification throughdimensionality reduction of large-scale gene expression datardquoGenome Research vol 13 no 7 pp 1706ndash1718 2003

[19] V P Pauca J Piper and R J Plemmons ldquoNonnegative matrixfactorization for spectral data analysisrdquo Linear Algebra and itsApplications vol 416 no 1 pp 29ndash47 2006

[20] P Smaragdis and J C Brown ldquoNon-negative matrix factor-ization for polyphonic music transcriptionrdquo in Proceedings of

12 Discrete Dynamics in Nature and Society

IEEEWorkshop onApplications of Signal Processing to Audio andAcoustics pp 177ndash180 New Paltz NY USA 2003

[21] F Shahnaz M W Berry V P Pauca and R J PlemmonsldquoDocument clustering using nonnegative matrix factorizationrdquoInformation Processing andManagement vol 42 no 2 pp 373ndash386 2006

[22] P O Hoyer ldquoNon-negativematrix factorization with sparsenessconstraintsrdquo Journal of Machine Learning Research vol 5 pp1457ndash1469 2004

[23] S Choi ldquoAlgorithms for orthogonal nonnegative matrix factor-izationrdquo in Proceedings of the International Joint Conference onNeural Networks pp 1828ndash1832 June 2008

[24] A Cichocki R Zdunek and S Amari ldquoCsiszarrsquos divergencesfor nonnegativematrix factorization family of new algorithmsrdquoin Proceedings of the International Conference on IndependentComponent Analysis and Blind Signal Separation CharlestonSC USA 2006

[25] A Cichocki R Zdunek and S-I Amari ldquoNew algorithmsfor non-negative matrix factorization in applications to blindsource separationrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo06) pp V621ndashV624 Toulouse France May 2006

[26] I S Dhillon and S Sra ldquoGeneralized nonnegative matrixapproximations with Bregman divergencesrdquo in Advances inNeural Information Processing Systems 8 pp 283ndash290 MITPress Cambridge Mass USA 2006

[27] A Cichocki S Amari and R Zdunek ldquoExtended SMART algo-rithms for non-negative matrix factorizationrdquo in Proceedings ofthe 8th International Conference on Artificial Intelligence andSoft Computing Zakopane Poland 2006

[28] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[29] L Saul and F Pereira ldquoAggregate and mixed-order Markovmodels for statistical language processingrdquo in Proceedings ofthe 2nd Conference on Empirical Methods in Natural LanguageProcessing C Cardie and R Weischedel Eds pp 81ndash89 ACLPress 1997

[30] F Hansen and G Kjaeligrgard Pedersen ldquoJensenrsquos inequality foroperators and Lownerrsquos theoremrdquoMathematische Annalen vol258 no 3 pp 229ndash241 1982

[31] httpwwwclcamacukresearchdtgattarchivefacedatabasehtml

[32] httpvisionucsdeducontentyale-face-database[33] httpwwwvisioncaltecheduImage DatasetsCaltech101[34] D G Lowe ldquoDistinctive image features from scale-invariant

keypointsrdquo International Journal of Computer Vision vol 60 no2 pp 91ndash110 2004

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article A Constrained Algorithm Based NMF for ...downloads.hindawi.com/journals/ddns/2014/179129.pdfResearch Article A Constrained Algorithm Based NMF for Image Representation

6 Discrete Dynamics in Nature and Society

algorithm described in (11) and (12) provides a classificationlabel of each sample marked 119904

119894119897119894 Suppose 119883 = 119909

119894119899

119894=1is a

data set which consists of n training images For each samplelet 119897119894be the true class label provided by the image data set

More specifically if the image 119909119894is designated the 119897

119894th class

we evaluate it as a correct label and set 119904119894119897119894

= 1 Otherwiseit is counted as a false label and noted 119904

119894119897119894= 0 Eventually we

compute the percentage of correct labels obtained by definingthe accuracy measure as

Ac (119904119894119897119894) =

sum119899

119894=1119904119894119897119894

119899

(33)

To evaluate the classification performance we carryout computation about the normalized mutual informationwhich is used to measure how similar two sets of clusters areas the second measure Given two data sets of clusters 119862

119894and

119862119895 their normalized mutual information is defined as

NMI (119862119894 119862119895)

=

sum119888119894isin119862119894 119888119895isin119862119895

119901 (119888119894 119888119895) sdot log 119901 (119888

119894 119888119895) [119901 (119888

119894) 119901 (119888119895)]

max (119867 (119862119894) 119867 (119862

119895))

(34)

which takes values between 0 and 1 Where 119901(119888119894) 119901(119888

119895)

denote the probabilities that an image arbitrarily chosen fromthe data set belongs to the clusters 119862

119894and 119862

119895 respectively

and 119901(119888119894 119888119895) denotes the joint probability that this arbitrarily

selected image belongs to the cluster 119862119894as well as 119862

119895at the

same time119867(119862119894) and 119867(119862

119895) are the entropies of 119862

119894and 119862

119895

respectivelyExperimental results on each data set will be presented as

classification accuracy and the normalized mutual informa-tion is in Tables 1 and 2

51 ORL Database The Cambridge ORL Face Database has400 images for 40 different people 10 images per personThe images of some people are taken at different timesvarying lighting slightly facial expressions (openclosed eyessmilingnonsmiling) and facial details (glassesno glasses)All the images are taken against a dark homogeneous back-ground with the subjects in an upright frontal position andslight left-right out-of-plane rotation To locate the facesthe input images are preprocessed They are resized to 32 times

32 pixels with 256 gray levels per pixel and normalized inorientation so that two eyes in the facial areas are aligned atthe same position

There are 10 images for each category in ORL and 10percent is just one image For the fixed parameter 120572 (120572 =

05 1 2) we randomly choose two images from each categoryto provide the label information Note that the same labelis meaningless for (30) To obtain 119903(119896) we divide the 40categories into 3 groups 10 categories 20 categories and 10categories In the first 10 categories pick up 1 image fromeach category to provide the label information pick up 2images from each category in the second 20 categories andpick up 3 from each category in the remaining categoriesThe

dividing process is repeated 10 times and the obtained averageclassification accuracy is recorded as the final result

Figure 1 shows the graphical classification accuracy ratesand normalized mutual information on the ORL DatabaseNote that if the samples in the collection 119883 come fromthe same group we set 120572

119896= 067 Because of the same

number of samples in each category the variation of the120572 is small even though we label different cardinalities ofsamples Compared to the constrained nonnegative matrixfactorization algorithms with fixed parameters our CNMF

120572119896

algorithm gives the best performance since the selection of120572119896is suitable to the collection 119883 Table 1 summarizes the

detailed classification accuracy and error bars of CNMF120572=05

CNMF

120572=2 and CNMF

120572119896 It shows that our CNMF

120572119896algo-

rithm achieves 192 percent improvement compared to thebest reported CNMFKL algorithm (120572 = 1) [15] in averageclassification accuracy For normalized mutual informationthe details and the error bars of our constrained algorithmswith 120572 = 05 120572 = 2 and 120572

119896are listed in Table 2 Comparing

to the best algorithmCNMF ourCNMF120572119896algorithmachieves

054 percent improvement

52 Yale Database The Yale Database consists of 165grayscale images for 15 individuals 11 images per personOne per image is taken from different facial expression orconfiguration center-light wglasses happy left-light wnoglasses normal right-light sad sleepy surprised and winkWe preprocess all the images of Yale Database in the sameway as the ORL Database Each image is linearly stretched toa 1024-dimensional vector in image space

TheYaleDatabase also has the same number of samples ineach category To obtain an appropriate 120572 we do similar labelprocessing to the ORL Database Divide the 15 individualsinto 3 groups averagely choose 1 image from each categoryin the first group choose 2 from each category in the secondand choose 3 from each category in the remaining groupWe repeat the process 10 times and record the averageclassification accuracy as the final result

Figure 2 shows the classification accuracy and normal-ized mutual information on the Yale Database Set 120572

119896=

055 when 119903(119896) minus 119903(119873) = 0 It indicates that the samplesin the collection 119883 just come from two groups that ischoose 15 images from 10 categories in the first and secondgroup CNMF

120572119896achieves an extraordinary performance for

all the cases and CNMF120572=05

follows This suggests that theconstrained nonnegative matrix factorization algorithm hasa higher classification accuracy when the value of 120572 is closeto 120572119896 Comparing to the best reported CNMFKL algorithm

CNMF120572119896

achieves 242 percent improvement in averageclassification accuracy For normalized mutual informationCNMF

120572119896achieves 72 percent improvement compared to

CNMF The details of classification accuracy and normal-ized mutual information are provided in Tables 3 and 4which contain the error bars of CNMF

120572=05 CNMF

120572=2 and

CNMF120572119896

53 Caltech 101 Database The Caltech 101 Database createdby Caltech University has images of 101 different object

Discrete Dynamics in Nature and Society 7

Table 1: The comparison of classification accuracy rates Ac (%) on the ORL Database.

k    | CNMF  | CNMF_{α=1} | CNMF_{α=0.5}  | CNMF_{α=2}    | CNMF_{α_k}
2    | 93.90 | 92.95      | 93.51 ± 0.09  | 92.14 ± 0.18  | 93.89 ± 0.91
3    | 86.20 | 84.33      | 85.71 ± 0.83  | 82.26 ± 1.23  | 86.21 ± 2.07
4    | 84.55 | 84.05      | 84.74 ± 0.56  | 83.02 ± 0.47  | 84.78 ± 0.36
5    | 80.16 | 78.38      | 80.10 ± 0.22  | 80.15 ± 0.18  | 80.16 ± 0.29
6    | 76.72 | 76.73      | 77.13 ± 0.28  | 77.60 ± 0.19  | 78.09 ± 0.40
7    | 81.14 | 79.04      | 80.34 ± 0.81  | 78.07 ± 0.88  | 82.15 ± 1.95
8    | 78.08 | 77.02      | 78.66 ± 0.69  | 78.77 ± 0.94  | 79.45 ± 1.33
9    | 83.19 | 82.23      | 83.57 ± 0.36  | 81.22 ± 0.21  | 85.19 ± 1.21
10   | 77.03 | 76.88      | 77.69 ± 0.12  | 77.75 ± 0.19  | 78.96 ± 0.73
Avg  | 82.33 | 81.29      | 82.38         | 81.22         | 83.21

Table 2: The comparison of normalized mutual information (%) on the ORL Database.

k    | CNMF  | CNMF_{α=1} | CNMF_{α=0.5}  | CNMF_{α=2}    | CNMF_{α_k}
2    | 78.24 | 75.82      | 76.01 ± 0.14  | 73.94 ± 0.11  | 77.67 ± 0.58
3    | 76.61 | 75.36      | 75.38 ± 0.41  | 74.50 ± 0.53  | 75.92 ± 1.55
4    | 78.69 | 77.82      | 77.96 ± 0.47  | 77.04 ± 0.54  | 79.27 ± 1.26
5    | 76.69 | 74.27      | 75.32 ± 0.80  | 72.62 ± 0.87  | 75.35 ± 1.67
6    | 76.52 | 76.09      | 76.21 ± 0.12  | 76.09 ± 0.67  | 77.60 ± 1.82
7    | 82.45 | 81.35      | 82.36 ± 0.42  | 81.35 ± 0.26  | 83.08 ± 1.59
8    | 80.57 | 79.92      | 80.11 ± 0.73  | 79.91 ± 0.68  | 82.49 ± 1.97
9    | 86.03 | 85.67      | 86.73 ± 0.70  | 85.67 ± 0.37  | 88.05 ± 1.15
10   | 81.77 | 81.07      | 82.07 ± 0.12  | 81.07 ± 0.12  | 82.97 ± 1.71
Avg  | 79.73 | 78.60      | 79.13         | 78.02         | 80.27

Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using codewords generated from SIFT features [34]. We obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.
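The bag-of-visual-words representation described above can be sketched roughly as follows; the clustering step is our assumption (the paper only states that 500 codewords are generated from the pooled SIFT descriptors), and MiniBatchKMeans stands in for whatever quantizer was actually used:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(all_descriptors, n_codewords=500, seed=0):
    """Quantize pooled SIFT descriptors (N x 128 array) into visual words."""
    return MiniBatchKMeans(n_clusters=n_codewords,
                           random_state=seed).fit(all_descriptors)

def image_histogram(descriptors, codebook):
    """Assign each descriptor of one image to its nearest codeword and
    return the normalized 500-dimensional frequency histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)
```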

We randomly select k categories from the Faces-Easy category in the Caltech 101 database and convert the images to 32 × 32 gray scale. The label process is repeated 10 times, and the obtained values of α_k computed by (31) are listed in Table 7. The variation of α within the same k categories is great; that is, selecting an appropriate α plays a critical role for the mixture of different categories in one image data set, especially when the number of samples per category differs. The choice of α_k can fully reflect the effectiveness of the indicator matrix S.

Figure 3 shows the effect of applying the proposed algorithm with α_k. The upper part of the figure shows the original samples, which contain 26 images; the middle part shows their gray images; and the lower part shows the combination of the basis vectors learned by CNMF_{α_k}.

The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}. The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters α_k listed in Table 7 are selected. In general, our method demonstrates much better effectiveness in classification by choosing α_k. Compared to the best reported algorithm other than our CNMF_α variants, CNMF_{α_k} achieves a 2.4 percent improvement in average classification accuracy, and compared to the CNMF algorithm [15], CNMF_{α_k} achieves an 8.77 percent improvement. For normalized mutual information, CNMF_{α_k} achieves a 2.29 percent improvement and consistently outperforms the other algorithms.

6. Conclusion

In this paper, we present a generic constrained nonnegative matrix factorization algorithm by imposing label information as an additional hard constraint on the α-divergence-NMF algorithm.


Table 3: The comparison of classification accuracy rates Ac (%) on the Yale Database.

k    | CNMF  | CNMF_{α=1} | CNMF_{α=0.5}  | CNMF_{α=2}    | CNMF_{α_k}
2    | 81.64 | 85.23      | 85.93 ± 0.65  | 84.18 ± 0.81  | 86.27 ± 2.46
3    | 68.48 | 76.91      | 77.14 ± 0.86  | 77.26 ± 0.42  | 79.01 ± 1.09
4    | 64.25 | 66.70      | 68.11 ± 0.67  | 66.12 ± 0.16  | 69.33 ± 2.23
5    | 65.82 | 67.71      | 69.14 ± 1.21  | 66.54 ± 1.01  | 70.57 ± 2.22
6    | 60.12 | 64.26      | 65.62 ± 0.94  | 63.70 ± 0.94  | 66.92 ± 1.41
7    | 59.25 | 62.47      | 63.97 ± 0.72  | 59.83 ± 0.38  | 65.23 ± 1.77
8    | 54.40 | 59.26      | 60.12 ± 0.31  | 56.76 ± 0.07  | 60.89 ± 1.28
9    | 54.85 | 57.30      | 58.08 ± 0.12  | 54.88 ± 0.14  | 58.98 ± 0.62
10   | 52.68 | 56.20      | 57.56 ± 0.03  | 53.83 ± 0.01  | 57.56 ± 0.73
Avg  | 62.39 | 65.89      | 67.30         | 64.79         | 68.31

Table 4: The comparison of normalized mutual information (%) on the Yale Database.

k    | CNMF  | CNMF_{α=1} | CNMF_{α=0.5}  | CNMF_{α=2}    | CNMF_{α_k}
2    | 48.99 | 59.66      | 59.73 ± 0.39  | 58.17 ± 0.17  | 61.11 ± 1.82
3    | 42.90 | 53.52      | 53.58 ± 0.73  | 52.75 ± 0.93  | 54.49 ± 1.74
4    | 47.80 | 53.60      | 53.66 ± 0.18  | 52.99 ± 0.65  | 54.58 ± 1.64
5    | 53.99 | 57.49      | 57.56 ± 0.07  | 56.21 ± 0.06  | 58.58 ± 2.32
6    | 50.63 | 55.88      | 56.15 ± 0.07  | 55.40 ± 0.08  | 57.62 ± 1.09
7    | 53.03 | 57.60      | 57.88 ± 0.22  | 57.27 ± 0.16  | 59.05 ± 1.89
8    | 50.83 | 56.52      | 56.93 ± 0.13  | 56.20 ± 0.13  | 58.98 ± 1.46
9    | 53.08 | 55.30      | 55.98 ± 0.73  | 54.81 ± 0.81  | 56.15 ± 1.20
10   | 52.84 | 56.73      | 57.43 ± 0.16  | 56.41 ± 0.41  | 58.30 ± 1.11
Avg  | 50.45 | 56.26      | 56.54         | 55.58         | 57.65

[Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]


[Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]

[Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) shows the original images, (b) their gray images, and (c) the basis vectors learned by CNMF_{α_k}; positive values are illustrated with white pixels.]

The proposed algorithm is derived using the Karush-Kuhn-Tucker conditions, and an alternative derivation is presented using the projected gradient; the use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter α. Since α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution becomes more inclusive as α goes to +∞ and more exclusive as α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is caused by both the cardinalities of labeled samples in each category and the total number of samples.
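Since the update rules are not restated in this section, a minimal sketch of the unconstrained multiplicative updates for α-divergence NMF, after Cichocki et al. [16], may help fix ideas. This is our reconstruction of the unconstrained case only, not the paper's label-constrained algorithm (which additionally routes the coefficient matrix through the indicator matrix S), and the function name is ours:

```python
import numpy as np

def alpha_nmf(X, rank, alpha=0.5, n_iter=200, eps=1e-9, seed=0):
    """Unconstrained alpha-divergence NMF, X ~ W @ H (after [16]).

    Each factor is updated multiplicatively; the 1/alpha exponent is the
    signature of the alpha-divergence family (alpha = 1 recovers the
    Kullback-Leibler updates underlying CNMF_KL)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    ones = np.ones((m, n))
    for _ in range(n_iter):
        R = (X / (W @ H + eps)) ** alpha            # elementwise ratio term
        H *= (W.T @ R / (W.T @ ones + eps)) ** (1.0 / alpha)
        R = (X / (W @ H + eps)) ** alpha
        W *= (R @ H.T / (ones @ H.T + eps)) ** (1.0 / alpha)
    return W, H
```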


[Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]

Table 5: The comparison of classification accuracy rates Ac (%) on the Caltech 101 Database.

k    | CNMF  | CNMF_{α=1} | CNMF_{α=0.5}  | CNMF_{α=2}    | CNMF_{α_k}
2    | 66.39 | 78.80      | 78.31 ± 1.69  | 80.10 ± 2.07  | 82.76 ± 3.56
3    | 60.91 | 69.98      | 68.77 ± 1.57  | 72.94 ± 1.51  | 73.59 ± 3.51
4    | 50.82 | 57.50      | 58.19 ± 1.98  | 56.28 ± 1.96  | 59.89 ± 2.82
5    | 50.83 | 53.67      | 53.71 ± 1.02  | 54.12 ± 1.44  | 55.06 ± 2.63
6    | 50.49 | 55.05      | 55.63 ± 0.89  | 58.04 ± 1.47  | 58.97 ± 1.97
7    | 46.09 | 51.12      | 50.04 ± 0.11  | 51.73 ± 0.25  | 52.36 ± 1.81
8    | 44.21 | 50.50      | 51.72 ± 0.51  | 48.37 ± 1.59  | 52.64 ± 1.52
9    | 41.34 | 46.58      | 47.85 ± 0.28  | 45.87 ± 0.30  | 49.54 ± 1.23
10   | 40.80 | 46.02      | 46.02 ± 0.05  | 45.76 ± 0.08  | 46.02 ± 0.26
Avg  | 50.21 | 56.58      | 56.69         | 57.02         | 58.98

Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

k    | CNMF  | CNMF_{α=1} | CNMF_{α=0.5}  | CNMF_{α=2}    | CNMF_{α_k}
2    | 21.05 | 38.17      | 38.20 ± 0.18  | 37.48 ± 0.21  | 40.08 ± 2.19
3    | 31.62 | 40.63      | 40.68 ± 0.15  | 40.05 ± 0.14  | 41.88 ± 1.24
4    | 27.60 | 34.36      | 34.41 ± 0.79  | 34.12 ± 0.91  | 36.41 ± 1.17
5    | 34.46 | 35.82      | 36.00 ± 0.99  | 35.42 ± 1.01  | 37.20 ± 1.78
6    | 34.85 | 38.97      | 39.16 ± 0.86  | 38.53 ± 1.11  | 39.64 ± 1.83
7    | 34.88 | 37.67      | 37.86 ± 0.47  | 37.47 ± 0.49  | 39.86 ± 2.58
8    | 32.63 | 37.43      | 37.71 ± 0.39  | 37.22 ± 0.42  | 37.91 ± 2.18
9    | 31.20 | 35.97      | 36.21 ± 0.60  | 35.86 ± 0.63  | 37.73 ± 0.67
10   | 31.21 | 35.71      | 36.06 ± 0.02  | 35.68 ± 0.03  | 37.48 ± 0.22
Avg  | 31.05 | 37.19      | 37.37         | 36.87         | 38.69


Table 7: The values of α in k categories.

k  | α_1  | α_2  | α_3  | α_4  | α_5  | α_6  | α_7  | α_8  | α_9  | α_10
2  | 2.07 | 2.08 | 2.22 | 2.27 | 2.57 | 3.52 | 3.97 | 3.98 | 4.37 | 8.96
3  | 0.45 | 0.55 | 0.66 | 1.07 | 1.95 | 2.31 | 2.31 | 3.54 | 4.24 | 7.54
4  | 0.30 | 0.32 | 0.35 | 0.86 | 0.98 | 1.27 | 1.65 | 2.10 | 3.27 | 5.80
5  | 0.45 | 0.49 | 0.51 | 0.71 | 0.80 | 1.13 | 2.05 | 2.55 | 2.82 | 3.89
6  | 0.07 | 0.56 | 0.73 | 1.08 | 1.26 | 1.46 | 2.01 | 2.12 | 3.91 | 8.52
7  | 0.71 | 0.94 | 1.12 | 1.14 | 1.16 | 1.26 | 1.27 | 1.37 | 1.57 | 8.97
8  | 0.12 | 0.30 | 0.50 | 0.64 | 0.66 | 0.69 | 0.78 | 0.79 | 0.97 | 1.11
9  | 0.78 | 0.79 | 2.01 | 0.10 | 0.17 | 0.28 | 0.29 | 0.30 | 0.31 | 0.39
10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00

In the experiments, we apply both fixed parameters α and the selected α_k to analyze the classification accuracy on three image databases. The experimental results demonstrate that the CNMF_{α_k} algorithm has the best classification accuracy. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set. It is still an open problem how to select the optimal α for a specific image data set.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).

References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.

[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.

[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.

[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.

[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.

[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.

[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.

[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, pp. 4–8030, 2004.

[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I207–I212, Kauai, Hawaii, USA, December 2001.

[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.

[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.

[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.

[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.

[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.

[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.

[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.

[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.

[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.

[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and its Applications, vol. 416, no. 1, pp. 29–47, 2006.

[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.

[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.

[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.

[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.

[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.

[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V621–V624, Toulouse, France, May 2006.

[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.

[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.

[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, vol. 39, no. 1, pp. 1–38, 1977.

[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.

[30] F. Hansen and G. Kjærgård Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.

[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

[32] http://vision.ucsd.edu/content/yale-face-database

[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101/

[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.


Page 7: Research Article A Constrained Algorithm Based NMF for ...downloads.hindawi.com/journals/ddns/2014/179129.pdfResearch Article A Constrained Algorithm Based NMF for Image Representation

Discrete Dynamics in Nature and Society 7

Table 1 The comparison of classification accuracy rates on the ORL Database

119896

Classification accuracy Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 9390 9295 9351 plusmn 009 9214 plusmn 018 9389 plusmn 0913 8620 8433 8571 plusmn 083 8226 plusmn 123 8621 plusmn 2074 8455 8405 8474 plusmn 056 8302 plusmn 047 8478 plusmn 0365 8016 7838 8010 plusmn 022 8015 plusmn 018 8016 plusmn 0296 7672 7673 7713 plusmn 028 7760 plusmn 019 7809 plusmn 0407 8114 7904 8034 plusmn 081 7807 plusmn 088 8215 plusmn 1958 7808 7702 7866 plusmn 069 7877 plusmn 094 7945 plusmn 1339 8319 8223 8357 plusmn 036 8122 plusmn 021 8519 plusmn 12110 7703 7688 7769 plusmn 012 7775 plusmn 019 7896 plusmn 073Avg 8233 8129 8238 8122 8321

Table 2 The comparison of normalized mutual information on the ORL Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 7824 7582 7601 plusmn 014 7394 plusmn 011 7767 plusmn 0583 7661 7536 7538 plusmn 041 7450 plusmn 053 7592 plusmn 1554 7869 7782 7796 plusmn 047 7704 plusmn 054 7927 plusmn 1265 7669 7427 7532 plusmn 080 7262 plusmn 087 7535 plusmn 1676 7652 7609 7621 plusmn 012 7609 plusmn 067 7760 plusmn 1827 8245 8135 8236 plusmn 042 8135 plusmn 026 8308 plusmn 1598 8057 7992 8011 plusmn 073 7991 plusmn 068 8249 plusmn 1979 8603 8567 8673 plusmn 070 8567 plusmn 037 8805 plusmn 11510 8177 8107 8207 plusmn 012 8107 plusmn 012 8297 plusmn 171Avg 7973 7860 7913 7802 8027

categories Each category contains about 31 to 800 imageswith a total of 9144 samples of size roughly 300 times 300 pixelsThis database is particularly challenging for learning a basisrepresentation of image information because the numberof training samples per category is exceedingly small Inour experiment we select the 10 largest categories (3044images in total) except the background category To representthe input images we do the preprocessing by using thecodewords generated by SIFT features [34] Then we obtain555292 SIFT descriptors and generate 500 codewords Byassigning the descriptors to the closest codewords eachimage in Caltech 101 database is represented by a 500-dimensional frequency histogram

We randomly select 119896 categories from Faces-Easy cate-gory in Caltech 101 database and convert them to gray-scaleof 32 times 32 The label process is repeated 10 times and theobtained values of 120572

119896computed by (31) are listed in Table 7

The variation of the 120572 in the same 119896 categories is greatThat isselecting an appropriate 120572 plays a critical role for the mixtureof different categories in one image data set especially in thecase that the number of samples in each category is differentThe choice of 120572

119896can fully reflect the effectiveness of the

indicator matrix 119878Figure 3 shows the effect that derived from the using of

proposed algorithm with 120572119896 The upper part of figure is the

original samples which contain 26 images the middle part

is their gray images and the lower is the combination of thebasis vectors learned by CNMF

120572119896

The classification accuracy results and normalizedmutualinformation for Faces-Easy category in Caltech 101 databaseare detailed in Tables 5 and 6 which contain the error bars ofCNMF

120572=05 CNMF

120572=2 and CNMF

120572119896 The graphical results

of classification performance are shown in Figure 4 The bestperformance in this experiment is achieved when the param-eters120572

119896listed in Table 7were selected In general ourmethod

demonstrates much better effectiveness in classification bychoosing 120572

119896 Comparing to the best reported algorithm

other than our CNMF120572algorithm CNMF

120572119896achieves 24

percent improvement in average classification accuracy andcomparing to the CNMF algorithm [15] other than CNMF

120572

algorithm CNMF120572119896

achieves 877 percent improvement inaverage classification accuracy For normalized mutual infor-mation CNMF

120572119896achieves 229 percent improvement and

consistently outperforms the other algorithms

6 Conclusion

In this paper we present a generic constraint nonnegativematrix factorization algorithm by imposing label informa-tion as additional hard constraint to the 120572-divergence-NMFalgorithm The proposed algorithm can be derived using

8 Discrete Dynamics in Nature and Society

Table 3 The comparison of classification accuracy rates on the Yale Database

119896

Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 8164 8523 8593 plusmn 065 8418 plusmn 081 8627 plusmn 2463 6848 7691 7714 plusmn 086 7726 plusmn 042 7901 plusmn 1094 6425 6670 6811 plusmn 067 6612 plusmn 016 6933 plusmn 2235 6582 6771 6914 plusmn 121 6654 plusmn 101 7057 plusmn 2226 6012 6426 6562 plusmn 094 6370 plusmn 094 6692 plusmn 1417 5925 6247 6397 plusmn 072 5983 plusmn 038 6523 plusmn 1778 5440 5926 6012 plusmn 031 5676 plusmn 007 6089 plusmn 1289 5485 5730 5808 plusmn 012 5488 plusmn 014 5898 plusmn 06210 5268 5620 5756 plusmn 003 5383 plusmn 001 5756 plusmn 073Avg 6239 6589 6730 6479 6831

Table 4 The comparison of normalized mutual information on the Yale Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 4899 5966 5973 plusmn 039 5817 plusmn 017 6111 plusmn 1823 4290 5352 5358 plusmn 073 5275 plusmn 093 5449 plusmn 1744 4780 5360 5366 plusmn 018 5299 plusmn 065 5458 plusmn 1645 5399 5749 5756 plusmn 007 5621 plusmn 006 5858 plusmn 2326 5063 5588 5615 plusmn 007 5540 plusmn 008 5762 plusmn 1097 5303 5760 5788 plusmn 022 5727 plusmn 016 5905 plusmn 1898 5083 5652 5693 plusmn 013 5620 plusmn 013 5898 plusmn 1469 5308 5530 5598 plusmn 073 5481 plusmn 081 5615 plusmn 12010 5284 5673 5743 plusmn 016 5641 plusmn 041 5830 plusmn 111Avg 5045 5626 5654 5558 5765

2 3 4 5 6 7 8 9 10076

078

08

082

084

086

088

09

092

094

Number of classes

Accu

racy

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy rates

2 3 4 5 6 7 8 9 10072

074

076

078

08

082

084

086

088

09

Nor

mal

ized

mut

ual i

nfor

mat

ion

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 1 Classification performance on the ORL Database

Discrete Dynamics in Nature and Society 9

2 3 4 5 6 7 8 9 1005

055

06

065

07

075

08

085

09Ac

cura

cy

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy ratesN

orm

aliz

ed m

utua

l inf

orm

atio

n

2 3 4 5 6 7 8 9 10042

044

046

048

05

052

054

056

058

06

062

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 2 Classification performance on the Yale Database

(a)

(b)

(c)

Figure 3 Sample images used in YaleB Database As shown in the three 2 times 13 montages (a) is the original images (b) is the gray imagesand (c) is the basis vectors learned by CNMF

120572119896 Positive values are illustrated with white pixels

Karush-Kuhn-Tucker conditions and presented alternativeof the algorithm using the projected gradient The use ofthe two methods guarantees the correctness of the algo-rithm theoretically The image representation learned by ouralgorithm contains a parameter 120572 Since 120572-divergence isa parametric discrepancy measure and the parameter 120572 isassociated with the characteristics of a learning machine the

model distribution is more inclusive when 120572 goes to +infin andis more exclusive when 120572 approaches minusinfin The selection ofthe optimal value of 120572 plays a critical role in determiningthe discriminative basis vectors We provide a method toselect the parameters 120572 for our semisupervised CNMF

120572

algorithm The variation of the 120572 is caused by both thecardinalities of labeled samples in each category and the total

10 Discrete Dynamics in Nature and Society

2 3 4 5 6 7 8 9 1004

045

05

055

06

065

07

075

08

085

09

Accu

racy

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy ratesN

orm

aliz

ed m

utua

l inf

orm

atio

n

2 3 4 5 6 7 8 9 1002

025

03

035

04

045

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 4 Classification performance on the Caltech 101 Database

Table 5 The comparison of classification accuracy rates on the Caltech 101 Database

119896

Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 6639 7880 7831 plusmn 169 8010 plusmn 207 8276 plusmn 3563 6091 6998 6877 plusmn 157 7294 plusmn 151 7359 plusmn 3514 5082 5750 5819 plusmn 198 5628 plusmn 196 5989 plusmn 2825 5083 5367 5371 plusmn 102 5412 plusmn 144 5506 plusmn 2636 5049 5505 5563 plusmn 089 5804 plusmn 147 5897 plusmn 1977 4609 5112 5004 plusmn 011 5173 plusmn 025 5236 plusmn 1818 4421 5050 5172 plusmn 051 4837 plusmn 159 5264 plusmn 1529 4134 4658 4785 plusmn 028 4587 plusmn 030 4954 plusmn 12310 4080 4602 4602 plusmn 005 4576 plusmn 008 4602 plusmn 026Avg 5021 5658 5669 5702 5898

Table 6 The comparison of normalized mutual information on the Caltech 101 Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 2105 3817 3820 plusmn 018 3748 plusmn 021 4008 plusmn 2193 3162 4063 4068 plusmn 015 4005 plusmn 014 4188 plusmn 1244 2760 3436 3441 plusmn 079 3412 plusmn 091 3641 plusmn 1175 3446 3582 3600 plusmn 099 3542 plusmn 101 3720 plusmn 1786 3485 3897 3916 plusmn 086 3853 plusmn 111 3964 plusmn 1837 3488 3767 3786 plusmn 047 3747 plusmn 049 3986 plusmn 2588 3263 3743 3771 plusmn 039 3722 plusmn 042 3791 plusmn 2189 3120 3597 3621 plusmn 060 3586 plusmn 063 3773 plusmn 06710 3121 3571 3606 plusmn 002 3568 plusmn 003 3748 plusmn 022Avg 3105 3719 3737 3687 3869

Discrete Dynamics in Nature and Society 11

Table 7 The value of 120572 in 119896 categories

119896 1205721

1205722

1205723

1205724

1205725

1205726

1205727

1205728

1205729

12057210

2 207 208 222 227 257 352 397 398 437 8963 045 055 066 107 195 231 231 354 424 7544 030 032 035 086 098 127 165 210 327 5805 045 049 051 071 080 113 205 255 282 3896 007 056 073 108 126 146 201 212 391 8527 071 094 112 114 116 126 127 137 157 8978 012 030 050 064 066 069 078 079 097 1119 078 079 201 010 017 028 029 030 031 03910 100 100 100 100 100 100 100 100 100 100

samples In the experiments we apply the fixed parameters120572 and 120572

119896to analyze the classification accuracy on three

image databasesThe experimental results have demonstratedthat the CNMF

120572119896algorithm has best classification accuracy

However we can not get the cardinalities of labeled imagesexactly in many real-word applications Moreover the valueof 120572 varies depending on data sets It is still an open problemhow to select the optimal 120572 for a specific image data set

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported in part by the National NaturalScience Foundation of China (61375038) National NaturalScience Foundation of China (11401060) Zhejiang Provin-cial Natural Science Foundation of China (LQ13A010023)Key Scientific and Technological Project of Henan Province(142102210010) and Key Research Project in Science andTechnology of the Education Department Henan Province(14A520028 14A520052)

References

[1] I Biederman ldquoRecognition-by-components a theory of humanimage understandingrdquo Psychological Review vol 94 no 2 pp115ndash147 1987

[2] S Ullman High-Level Vision Object Recognition and VisualCognition MIT Press Cambridge Mass USA 1996

[3] P Paatero and U Tapper ldquoPositive matrix factorization a non-negative factormodel with optimal utilization of error estimatesof data valuesrdquo Environmetrics vol 5 no 2 pp 111ndash126 1994

[4] I T Jolliffe Principle Component Analysis Springer BerlinGermany 1986

[5] R O Duda P E Hart and D G Stork Pattern ClassificationWiley-Interscience New York NY USA 2001

[6] A Gersho and R M Gray Vector Quantization and SignalCompression Kluwer Academic Publishers 1992

[7] D D Lee and H S Seung ldquoLearning the parts of objects bynon-negative matrix factorizationrdquo Nature vol 401 no 6755pp 788ndash791 1999

[8] M Chu F Diele R Plemmons and S Ragni ldquoOptimalitycomputation and interpretation of nonnegative matrix factor-izationsrdquo SIAM Journal on Matrix Analysis pp 4ndash8030 2004

[9] S Z Li X W Hou H J Zhang and Q S Cheng ldquoLearningspatially localized parts-based representationrdquo inProceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition (CVPR rsquo01) pp I207ndashI212 Kauai HawaiiUSA December 2001

[10] Y Wang Y Jia H U Changbo and M Turk ldquoNon-negativematrix factorization framework for face recognitionrdquo Interna-tional Journal of Pattern Recognition and Artificial Intelligencevol 19 no 4 pp 495ndash511 2005

[11] YWang Y Jia CHu andM Turk ldquoFisher non-negativematrixfactorization for learning local featuresrdquo in Proceedings of theAsian Conference on Computer Vision Jeju Island Republic ofKorea 2004

[12] J H Ahn S Kim J H Oh and S Choi ldquoMultiple nonnegative-matrix factorization of dynamic PET imagesrdquo in Proceedings ofthe Asian Conference on Computer Vision 2004

[13] J S Lee D D Lee S Choi and D S Lee ldquoApplication of nonnegative matrix factorization to dynamic positron emissiontomographyrdquo in Proceedings of the International Conference onIndependent Component Analysis and Blind Signal Separationpp 629ndash632 San Diego Calif USA 2001

[14] H Lee A Cichocki and S Choi ldquoNonnegative matrix factor-ization for motor imagery EEG classificationrdquo in Proceedingsof the International Conference on Artificial Neural NetworksSpringer Athens Greece 2006

[15] H Liu Z Wu X Li D Cai and T S Huang ldquoConstrainednonnegative matrix factorization for image representationrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 34 no 7 pp 1299ndash1311 2012

[16] A Cichocki H Lee Y-D Kim and S Choi ldquoNon-negativematrix factorization with 120572-divergencerdquo Pattern RecognitionLetters vol 29 no 9 pp 1433ndash1440 2008

[17] L K Saul F Sha and D D Lee ldquoTatistical signal processingwith nonnegativity constraintsrdquo Proceedings of EuroSpeech vol2 pp 1001ndash1004 2003

[18] P M Kim and B Tidor ldquoSubsystem identification throughdimensionality reduction of large-scale gene expression datardquoGenome Research vol 13 no 7 pp 1706ndash1718 2003

[19] V P Pauca J Piper and R J Plemmons ldquoNonnegative matrixfactorization for spectral data analysisrdquo Linear Algebra and itsApplications vol 416 no 1 pp 29ndash47 2006

[20] P Smaragdis and J C Brown ldquoNon-negative matrix factor-ization for polyphonic music transcriptionrdquo in Proceedings of

12 Discrete Dynamics in Nature and Society

IEEEWorkshop onApplications of Signal Processing to Audio andAcoustics pp 177ndash180 New Paltz NY USA 2003

[21] F Shahnaz M W Berry V P Pauca and R J PlemmonsldquoDocument clustering using nonnegative matrix factorizationrdquoInformation Processing andManagement vol 42 no 2 pp 373ndash386 2006

[22] P O Hoyer ldquoNon-negativematrix factorization with sparsenessconstraintsrdquo Journal of Machine Learning Research vol 5 pp1457ndash1469 2004

[23] S Choi ldquoAlgorithms for orthogonal nonnegative matrix factor-izationrdquo in Proceedings of the International Joint Conference onNeural Networks pp 1828ndash1832 June 2008

[24] A Cichocki R Zdunek and S Amari ldquoCsiszarrsquos divergencesfor nonnegativematrix factorization family of new algorithmsrdquoin Proceedings of the International Conference on IndependentComponent Analysis and Blind Signal Separation CharlestonSC USA 2006

[25] A Cichocki R Zdunek and S-I Amari ldquoNew algorithmsfor non-negative matrix factorization in applications to blindsource separationrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo06) pp V621ndashV624 Toulouse France May 2006

[26] I S Dhillon and S Sra ldquoGeneralized nonnegative matrixapproximations with Bregman divergencesrdquo in Advances inNeural Information Processing Systems 8 pp 283ndash290 MITPress Cambridge Mass USA 2006

[27] A Cichocki S Amari and R Zdunek ldquoExtended SMART algo-rithms for non-negative matrix factorizationrdquo in Proceedings ofthe 8th International Conference on Artificial Intelligence andSoft Computing Zakopane Poland 2006

[28] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[29] L Saul and F Pereira ldquoAggregate and mixed-order Markovmodels for statistical language processingrdquo in Proceedings ofthe 2nd Conference on Empirical Methods in Natural LanguageProcessing C Cardie and R Weischedel Eds pp 81ndash89 ACLPress 1997

[30] F Hansen and G Kjaeligrgard Pedersen ldquoJensenrsquos inequality foroperators and Lownerrsquos theoremrdquoMathematische Annalen vol258 no 3 pp 229ndash241 1982

[31] httpwwwclcamacukresearchdtgattarchivefacedatabasehtml

[32] httpvisionucsdeducontentyale-face-database[33] httpwwwvisioncaltecheduImage DatasetsCaltech101[34] D G Lowe ldquoDistinctive image features from scale-invariant

keypointsrdquo International Journal of Computer Vision vol 60 no2 pp 91ndash110 2004

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article A Constrained Algorithm Based NMF for ...downloads.hindawi.com/journals/ddns/2014/179129.pdfResearch Article A Constrained Algorithm Based NMF for Image Representation

8 Discrete Dynamics in Nature and Society

Table 3 The comparison of classification accuracy rates on the Yale Database

119896

Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 8164 8523 8593 plusmn 065 8418 plusmn 081 8627 plusmn 2463 6848 7691 7714 plusmn 086 7726 plusmn 042 7901 plusmn 1094 6425 6670 6811 plusmn 067 6612 plusmn 016 6933 plusmn 2235 6582 6771 6914 plusmn 121 6654 plusmn 101 7057 plusmn 2226 6012 6426 6562 plusmn 094 6370 plusmn 094 6692 plusmn 1417 5925 6247 6397 plusmn 072 5983 plusmn 038 6523 plusmn 1778 5440 5926 6012 plusmn 031 5676 plusmn 007 6089 plusmn 1289 5485 5730 5808 plusmn 012 5488 plusmn 014 5898 plusmn 06210 5268 5620 5756 plusmn 003 5383 plusmn 001 5756 plusmn 073Avg 6239 6589 6730 6479 6831

Table 4 The comparison of normalized mutual information on the Yale Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 4899 5966 5973 plusmn 039 5817 plusmn 017 6111 plusmn 1823 4290 5352 5358 plusmn 073 5275 plusmn 093 5449 plusmn 1744 4780 5360 5366 plusmn 018 5299 plusmn 065 5458 plusmn 1645 5399 5749 5756 plusmn 007 5621 plusmn 006 5858 plusmn 2326 5063 5588 5615 plusmn 007 5540 plusmn 008 5762 plusmn 1097 5303 5760 5788 plusmn 022 5727 plusmn 016 5905 plusmn 1898 5083 5652 5693 plusmn 013 5620 plusmn 013 5898 plusmn 1469 5308 5530 5598 plusmn 073 5481 plusmn 081 5615 plusmn 12010 5284 5673 5743 plusmn 016 5641 plusmn 041 5830 plusmn 111Avg 5045 5626 5654 5558 5765

2 3 4 5 6 7 8 9 10076

078

08

082

084

086

088

09

092

094

Number of classes

Accu

racy

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy rates

2 3 4 5 6 7 8 9 10072

074

076

078

08

082

084

086

088

09

Nor

mal

ized

mut

ual i

nfor

mat

ion

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 1 Classification performance on the ORL Database

Discrete Dynamics in Nature and Society 9

2 3 4 5 6 7 8 9 1005

055

06

065

07

075

08

085

09Ac

cura

cy

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy ratesN

orm

aliz

ed m

utua

l inf

orm

atio

n

2 3 4 5 6 7 8 9 10042

044

046

048

05

052

054

056

058

06

062

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 2 Classification performance on the Yale Database

(a)

(b)

(c)

Figure 3 Sample images used in YaleB Database As shown in the three 2 times 13 montages (a) is the original images (b) is the gray imagesand (c) is the basis vectors learned by CNMF

120572119896 Positive values are illustrated with white pixels

Karush-Kuhn-Tucker conditions and presented alternativeof the algorithm using the projected gradient The use ofthe two methods guarantees the correctness of the algo-rithm theoretically The image representation learned by ouralgorithm contains a parameter 120572 Since 120572-divergence isa parametric discrepancy measure and the parameter 120572 isassociated with the characteristics of a learning machine the

model distribution is more inclusive when 120572 goes to +infin andis more exclusive when 120572 approaches minusinfin The selection ofthe optimal value of 120572 plays a critical role in determiningthe discriminative basis vectors We provide a method toselect the parameters 120572 for our semisupervised CNMF

120572

algorithm The variation of the 120572 is caused by both thecardinalities of labeled samples in each category and the total

10 Discrete Dynamics in Nature and Society

2 3 4 5 6 7 8 9 1004

045

05

055

06

065

07

075

08

085

09

Accu

racy

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy ratesN

orm

aliz

ed m

utua

l inf

orm

atio

n

2 3 4 5 6 7 8 9 1002

025

03

035

04

045

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 4 Classification performance on the Caltech 101 Database

Table 5 The comparison of classification accuracy rates on the Caltech 101 Database

119896

Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 6639 7880 7831 plusmn 169 8010 plusmn 207 8276 plusmn 3563 6091 6998 6877 plusmn 157 7294 plusmn 151 7359 plusmn 3514 5082 5750 5819 plusmn 198 5628 plusmn 196 5989 plusmn 2825 5083 5367 5371 plusmn 102 5412 plusmn 144 5506 plusmn 2636 5049 5505 5563 plusmn 089 5804 plusmn 147 5897 plusmn 1977 4609 5112 5004 plusmn 011 5173 plusmn 025 5236 plusmn 1818 4421 5050 5172 plusmn 051 4837 plusmn 159 5264 plusmn 1529 4134 4658 4785 plusmn 028 4587 plusmn 030 4954 plusmn 12310 4080 4602 4602 plusmn 005 4576 plusmn 008 4602 plusmn 026Avg 5021 5658 5669 5702 5898

Table 6 The comparison of normalized mutual information on the Caltech 101 Database

119896

Normalized mutual information ()CNMF CNMF

120572=1CNMF

120572=05CNMF

120572=2CNMF

120572119896

2 2105 3817 3820 plusmn 018 3748 plusmn 021 4008 plusmn 2193 3162 4063 4068 plusmn 015 4005 plusmn 014 4188 plusmn 1244 2760 3436 3441 plusmn 079 3412 plusmn 091 3641 plusmn 1175 3446 3582 3600 plusmn 099 3542 plusmn 101 3720 plusmn 1786 3485 3897 3916 plusmn 086 3853 plusmn 111 3964 plusmn 1837 3488 3767 3786 plusmn 047 3747 plusmn 049 3986 plusmn 2588 3263 3743 3771 plusmn 039 3722 plusmn 042 3791 plusmn 2189 3120 3597 3621 plusmn 060 3586 plusmn 063 3773 plusmn 06710 3121 3571 3606 plusmn 002 3568 plusmn 003 3748 plusmn 022Avg 3105 3719 3737 3687 3869

Discrete Dynamics in Nature and Society 11

Table 7 The value of 120572 in 119896 categories

119896 1205721

1205722

1205723

1205724

1205725

1205726

1205727

1205728

1205729

12057210

2 207 208 222 227 257 352 397 398 437 8963 045 055 066 107 195 231 231 354 424 7544 030 032 035 086 098 127 165 210 327 5805 045 049 051 071 080 113 205 255 282 3896 007 056 073 108 126 146 201 212 391 8527 071 094 112 114 116 126 127 137 157 8978 012 030 050 064 066 069 078 079 097 1119 078 079 201 010 017 028 029 030 031 03910 100 100 100 100 100 100 100 100 100 100

samples In the experiments we apply the fixed parameters120572 and 120572

119896to analyze the classification accuracy on three

image databasesThe experimental results have demonstratedthat the CNMF

120572119896algorithm has best classification accuracy

However we can not get the cardinalities of labeled imagesexactly in many real-word applications Moreover the valueof 120572 varies depending on data sets It is still an open problemhow to select the optimal 120572 for a specific image data set

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported in part by the National NaturalScience Foundation of China (61375038) National NaturalScience Foundation of China (11401060) Zhejiang Provin-cial Natural Science Foundation of China (LQ13A010023)Key Scientific and Technological Project of Henan Province(142102210010) and Key Research Project in Science andTechnology of the Education Department Henan Province(14A520028 14A520052)

References

[1] I Biederman ldquoRecognition-by-components a theory of humanimage understandingrdquo Psychological Review vol 94 no 2 pp115ndash147 1987

[2] S Ullman High-Level Vision Object Recognition and VisualCognition MIT Press Cambridge Mass USA 1996

[3] P Paatero and U Tapper ldquoPositive matrix factorization a non-negative factormodel with optimal utilization of error estimatesof data valuesrdquo Environmetrics vol 5 no 2 pp 111ndash126 1994

[4] I T Jolliffe Principle Component Analysis Springer BerlinGermany 1986

[5] R O Duda P E Hart and D G Stork Pattern ClassificationWiley-Interscience New York NY USA 2001

[6] A Gersho and R M Gray Vector Quantization and SignalCompression Kluwer Academic Publishers 1992

[7] D D Lee and H S Seung ldquoLearning the parts of objects bynon-negative matrix factorizationrdquo Nature vol 401 no 6755pp 788ndash791 1999

[8] M Chu F Diele R Plemmons and S Ragni ldquoOptimalitycomputation and interpretation of nonnegative matrix factor-izationsrdquo SIAM Journal on Matrix Analysis pp 4ndash8030 2004

[9] S Z Li X W Hou H J Zhang and Q S Cheng ldquoLearningspatially localized parts-based representationrdquo inProceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition (CVPR rsquo01) pp I207ndashI212 Kauai HawaiiUSA December 2001

[10] Y Wang Y Jia H U Changbo and M Turk ldquoNon-negativematrix factorization framework for face recognitionrdquo Interna-tional Journal of Pattern Recognition and Artificial Intelligencevol 19 no 4 pp 495ndash511 2005

[11] YWang Y Jia CHu andM Turk ldquoFisher non-negativematrixfactorization for learning local featuresrdquo in Proceedings of theAsian Conference on Computer Vision Jeju Island Republic ofKorea 2004

[12] J H Ahn S Kim J H Oh and S Choi ldquoMultiple nonnegative-matrix factorization of dynamic PET imagesrdquo in Proceedings ofthe Asian Conference on Computer Vision 2004

[13] J S Lee D D Lee S Choi and D S Lee ldquoApplication of nonnegative matrix factorization to dynamic positron emissiontomographyrdquo in Proceedings of the International Conference onIndependent Component Analysis and Blind Signal Separationpp 629ndash632 San Diego Calif USA 2001

[14] H Lee A Cichocki and S Choi ldquoNonnegative matrix factor-ization for motor imagery EEG classificationrdquo in Proceedingsof the International Conference on Artificial Neural NetworksSpringer Athens Greece 2006

[15] H Liu Z Wu X Li D Cai and T S Huang ldquoConstrainednonnegative matrix factorization for image representationrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 34 no 7 pp 1299ndash1311 2012

[16] A Cichocki H Lee Y-D Kim and S Choi ldquoNon-negativematrix factorization with 120572-divergencerdquo Pattern RecognitionLetters vol 29 no 9 pp 1433ndash1440 2008

[17] L K Saul F Sha and D D Lee ldquoTatistical signal processingwith nonnegativity constraintsrdquo Proceedings of EuroSpeech vol2 pp 1001ndash1004 2003

[18] P M Kim and B Tidor ldquoSubsystem identification throughdimensionality reduction of large-scale gene expression datardquoGenome Research vol 13 no 7 pp 1706ndash1718 2003

[19] V P Pauca J Piper and R J Plemmons ldquoNonnegative matrixfactorization for spectral data analysisrdquo Linear Algebra and itsApplications vol 416 no 1 pp 29ndash47 2006

[20] P Smaragdis and J C Brown ldquoNon-negative matrix factor-ization for polyphonic music transcriptionrdquo in Proceedings of

12 Discrete Dynamics in Nature and Society

IEEEWorkshop onApplications of Signal Processing to Audio andAcoustics pp 177ndash180 New Paltz NY USA 2003

[21] F Shahnaz M W Berry V P Pauca and R J PlemmonsldquoDocument clustering using nonnegative matrix factorizationrdquoInformation Processing andManagement vol 42 no 2 pp 373ndash386 2006

[22] P O Hoyer ldquoNon-negativematrix factorization with sparsenessconstraintsrdquo Journal of Machine Learning Research vol 5 pp1457ndash1469 2004

[23] S Choi ldquoAlgorithms for orthogonal nonnegative matrix factor-izationrdquo in Proceedings of the International Joint Conference onNeural Networks pp 1828ndash1832 June 2008

[24] A Cichocki R Zdunek and S Amari ldquoCsiszarrsquos divergencesfor nonnegativematrix factorization family of new algorithmsrdquoin Proceedings of the International Conference on IndependentComponent Analysis and Blind Signal Separation CharlestonSC USA 2006

[25] A Cichocki R Zdunek and S-I Amari ldquoNew algorithmsfor non-negative matrix factorization in applications to blindsource separationrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo06) pp V621ndashV624 Toulouse France May 2006

[26] I S Dhillon and S Sra ldquoGeneralized nonnegative matrixapproximations with Bregman divergencesrdquo in Advances inNeural Information Processing Systems 8 pp 283ndash290 MITPress Cambridge Mass USA 2006

[27] A Cichocki S Amari and R Zdunek ldquoExtended SMART algo-rithms for non-negative matrix factorizationrdquo in Proceedings ofthe 8th International Conference on Artificial Intelligence andSoft Computing Zakopane Poland 2006

[28] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977

[29] L Saul and F Pereira ldquoAggregate and mixed-order Markovmodels for statistical language processingrdquo in Proceedings ofthe 2nd Conference on Empirical Methods in Natural LanguageProcessing C Cardie and R Weischedel Eds pp 81ndash89 ACLPress 1997

[30] F Hansen and G Kjaeligrgard Pedersen ldquoJensenrsquos inequality foroperators and Lownerrsquos theoremrdquoMathematische Annalen vol258 no 3 pp 229ndash241 1982

[31] httpwwwclcamacukresearchdtgattarchivefacedatabasehtml

[32] httpvisionucsdeducontentyale-face-database[33] httpwwwvisioncaltecheduImage DatasetsCaltech101[34] D G Lowe ldquoDistinctive image features from scale-invariant

keypointsrdquo International Journal of Computer Vision vol 60 no2 pp 91ndash110 2004

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article A Constrained Algorithm Based NMF for ...downloads.hindawi.com/journals/ddns/2014/179129.pdfResearch Article A Constrained Algorithm Based NMF for Image Representation

Discrete Dynamics in Nature and Society 9

2 3 4 5 6 7 8 9 1005

055

06

065

07

075

08

085

09Ac

cura

cy

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy ratesN

orm

aliz

ed m

utua

l inf

orm

atio

n

2 3 4 5 6 7 8 9 10042

044

046

048

05

052

054

056

058

06

062

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 2 Classification performance on the Yale Database

(a)

(b)

(c)

Figure 3 Sample images used in YaleB Database As shown in the three 2 times 13 montages (a) is the original images (b) is the gray imagesand (c) is the basis vectors learned by CNMF

120572119896 Positive values are illustrated with white pixels

Karush-Kuhn-Tucker conditions and presented alternativeof the algorithm using the projected gradient The use ofthe two methods guarantees the correctness of the algo-rithm theoretically The image representation learned by ouralgorithm contains a parameter 120572 Since 120572-divergence isa parametric discrepancy measure and the parameter 120572 isassociated with the characteristics of a learning machine the

model distribution is more inclusive when 120572 goes to +infin andis more exclusive when 120572 approaches minusinfin The selection ofthe optimal value of 120572 plays a critical role in determiningthe discriminative basis vectors We provide a method toselect the parameters 120572 for our semisupervised CNMF

120572

algorithm The variation of the 120572 is caused by both thecardinalities of labeled samples in each category and the total

10 Discrete Dynamics in Nature and Society

2 3 4 5 6 7 8 9 1004

045

05

055

06

065

07

075

08

085

09

Accu

racy

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(a) Classification accuracy ratesN

orm

aliz

ed m

utua

l inf

orm

atio

n

2 3 4 5 6 7 8 9 1002

025

03

035

04

045

Number of classes

CNMFCNMF

120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

(b) Normalized mutual information

Figure 4 Classification performance on the Caltech 101 Database

Table 5 The comparison of classification accuracy rates on the Caltech 101 Database

119896

Ac(119904119894119897119894) ()

CNMF CNMF120572=1

CNMF120572=05

CNMF120572=2

CNMF120572119896

2 6639 7880 7831 plusmn 169 8010 plusmn 207 8276 plusmn 3563 6091 6998 6877 plusmn 157 7294 plusmn 151 7359 plusmn 3514 5082 5750 5819 plusmn 198 5628 plusmn 196 5989 plusmn 2825 5083 5367 5371 plusmn 102 5412 plusmn 144 5506 plusmn 2636 5049 5505 5563 plusmn 089 5804 plusmn 147 5897 plusmn 1977 4609 5112 5004 plusmn 011 5173 plusmn 025 5236 plusmn 1818 4421 5050 5172 plusmn 051 4837 plusmn 159 5264 plusmn 1529 4134 4658 4785 plusmn 028 4587 plusmn 030 4954 plusmn 12310 4080 4602 4602 plusmn 005 4576 plusmn 008 4602 plusmn 026Avg 5021 5658 5669 5702 5898

Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

k    | CNMF  | CNMF_{α=1} | CNMF_{α=0.5} | CNMF_{α=2}   | CNMF_{α_k}
2    | 21.05 | 38.17      | 38.20 ± 0.18 | 37.48 ± 0.21 | 40.08 ± 2.19
3    | 31.62 | 40.63      | 40.68 ± 0.15 | 40.05 ± 0.14 | 41.88 ± 1.24
4    | 27.60 | 34.36      | 34.41 ± 0.79 | 34.12 ± 0.91 | 36.41 ± 1.17
5    | 34.46 | 35.82      | 36.00 ± 0.99 | 35.42 ± 1.01 | 37.20 ± 1.78
6    | 34.85 | 38.97      | 39.16 ± 0.86 | 38.53 ± 1.11 | 39.64 ± 1.83
7    | 34.88 | 37.67      | 37.86 ± 0.47 | 37.47 ± 0.49 | 39.86 ± 2.58
8    | 32.63 | 37.43      | 37.71 ± 0.39 | 37.22 ± 0.42 | 37.91 ± 2.18
9    | 31.20 | 35.97      | 36.21 ± 0.60 | 35.86 ± 0.63 | 37.73 ± 0.67
10   | 31.21 | 35.71      | 36.06 ± 0.02 | 35.68 ± 0.03 | 37.48 ± 0.22
Avg  | 31.05 | 37.19      | 37.37        | 36.87        | 38.69
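Table 6 reports normalized mutual information; a common definition is the following (we assume the standard normalization, which the paper does not restate here, and equivalent variants divide by the mean or the geometric mean of the entropies):

```latex
% NMI between the clustering C and the ground-truth labels L:
% I is the mutual information and H the entropy.
\mathrm{NMI}(C, L) = \frac{I(C; L)}{\max\{H(C),\, H(L)\}}
```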


Table 7: The value of α in k categories.

k  | α_1  | α_2  | α_3  | α_4  | α_5  | α_6  | α_7  | α_8  | α_9  | α_10
2  | 2.07 | 2.08 | 2.22 | 2.27 | 2.57 | 3.52 | 3.97 | 3.98 | 4.37 | 8.96
3  | 0.45 | 0.55 | 0.66 | 1.07 | 1.95 | 2.31 | 2.31 | 3.54 | 4.24 | 7.54
4  | 0.30 | 0.32 | 0.35 | 0.86 | 0.98 | 1.27 | 1.65 | 2.10 | 3.27 | 5.80
5  | 0.45 | 0.49 | 0.51 | 0.71 | 0.80 | 1.13 | 2.05 | 2.55 | 2.82 | 3.89
6  | 0.07 | 0.56 | 0.73 | 1.08 | 1.26 | 1.46 | 2.01 | 2.12 | 3.91 | 8.52
7  | 0.71 | 0.94 | 1.12 | 1.14 | 1.16 | 1.26 | 1.27 | 1.37 | 1.57 | 8.97
8  | 0.12 | 0.30 | 0.50 | 0.64 | 0.66 | 0.69 | 0.78 | 0.79 | 0.97 | 1.11
9  | 0.78 | 0.79 | 2.01 | 0.10 | 0.17 | 0.28 | 0.29 | 0.30 | 0.31 | 0.39
10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00

In the experiments, we applied the fixed parameters α and α_k to analyze the classification accuracy on three image databases. The experimental results have demonstrated that the CNMF_{α_k} algorithm has the best classification accuracy. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies from data set to data set. How to select the optimal α for a specific image data set remains an open problem.
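To make the role of α concrete, the following is a minimal NumPy sketch of the unconstrained multiplicative updates for α-divergence NMF in the style of [16]. It deliberately omits the hard label constraints of our CNMF_α algorithm, the function names are ours, and the grid search at the end is only a generic stand-in for the cardinality-based selection rule discussed above.

```python
# Minimal sketch of unconstrained alpha-divergence NMF (cf. [16]).
# NOT the constrained CNMF_alpha algorithm: the label-indicator
# constraints are omitted, and alpha must be nonzero here.
import numpy as np

def alpha_nmf(X, r, alpha=0.5, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative X (m x n) by W (m x r) @ H (r x n)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # H-update: H <- H * (W^T (X/(WH))^alpha / W^T 1)^(1/alpha)
        R = (X / (W @ H + eps)) ** alpha
        H *= ((W.T @ R) / (W.sum(axis=0)[:, None] + eps)) ** (1.0 / alpha)
        # W-update is symmetric in the other factor.
        R = (X / (W @ H + eps)) ** alpha
        W *= ((R @ H.T) / (H.sum(axis=1)[None, :] + eps)) ** (1.0 / alpha)
    return W, H

# Illustrative alpha selection by a user-supplied validation score
# (a generic grid search, not the paper's cardinality-based rule).
def select_alpha(X, r, score, candidates=(0.5, 1.0, 2.0)):
    return max(candidates, key=lambda a: score(*alpha_nmf(X, r, alpha=a)))
```

As a usage note, `W, H = alpha_nmf(X, r=k)` yields basis vectors (the columns of W) of the kind visualized in Figure 3(c).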

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).

References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.

[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.

[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.

[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.

[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.

[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.

[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.

[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, 2004.

[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I207–I212, Kauai, Hawaii, USA, December 2001.

[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.

[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.

[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.

[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.

[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.

[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.

[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.

[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.

[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.

[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and Its Applications, vol. 416, no. 1, pp. 29–47, 2006.

[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.

[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.

[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.

[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.

[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.

[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V621–V624, Toulouse, France, May 2006.

[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.

[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.

[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, vol. 39, no. 1, pp. 1–38, 1977.

[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.

[30] F. Hansen and G. K. Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.

[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

[32] http://vision.ucsd.edu/content/yale-face-database

[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101/

[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
