Research Article: A Constrained Algorithm Based NMF_α for Image Representation

Chenxue Yang,1 Tao Li,1 Mao Ye,1 Zijian Liu,2,3 and Jiao Bao1
1 School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 School of Science, Chongqing Jiaotong University, Chongqing 400074, China
3 Department of Mathematics, Hangzhou Normal University, Hangzhou 310036, China

Correspondence should be addressed to Mao Ye; cvlabuestc@gmail.com

Received 29 March 2014; Revised 2 October 2014; Accepted 3 October 2014; Published 22 October 2014

Academic Editor: Zhan Zhou

Copyright © 2014 Chenxue Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Nonnegative matrix factorization (NMF) is a useful tool in learning a basic representation of image data. However, its performance and applicability in real scenarios are limited because of the lack of image information. In this paper, we propose a constrained matrix decomposition algorithm for image representation which contains parameters associated with the characteristics of image data sets. Particularly, we impose label information as additional hard constraints on the α-divergence-NMF unsupervised learning algorithm. The resulting algorithm is derived by using the Karush-Kuhn-Tucker (KKT) conditions as well as the projected gradient, and its monotonic local convergence is proved by using auxiliary functions. In addition, we provide a method to select the parameters of our semisupervised matrix decomposition algorithm in the experiments. Compared with the state-of-the-art approaches, our method with the selected parameters has the best classification accuracy on three image data sets.
1. Introduction

Learning an efficient representation of image information is a key problem in machine learning and computer vision. Efficiency of the representation refers to the ability to capture significant information from a high dimensional image space. Such a high dimensional problem is difficult to manipulate and compute; therefore, dimension reduction becomes the crucial method to cope with this problem. Fortunately, matrix factorization is a valid approach to solve the dimension reduction problem, and it has a long and successful history in dealing with image representation [1–3]. Some methods of matrix factorization are principal component analysis (PCA) [4], singular value decomposition (SVD) [5], vector quantization (VQ) [6], and nonnegative matrix factorization (NMF) [7].
Among all techniques for matrix factorization, NMF is distinguished from others by its use of nonnegative constraints in learning a basis representation of image data [8] and has been applied in face recognition [9–11], medical imaging [12, 13], electroencephalogram (EEG) classification for brain computer interfaces [14], and many other areas. However, NMF is an unsupervised learning algorithm and is inapplicable to learning a basic representation from limited image information. Thus, to make up for this deficiency, extra constraints are implicitly or explicitly incorporated into NMF to derive semisupervised matrix decomposition algorithms. In [15], the authors impose label information as additional hard constraints on NMF based on the squared Euclidean distance and the Kullback-Leibler divergence. Such a representation encodes the data points from the same class using the indicator matrix in a new representation space, where the obtained part-based representation is more discriminating.

However, none of the semisupervised NMF algorithms mentioned above contain parameters associated with the characteristics of image data sets. In this paper, we introduce the α-divergence-NMF algorithm [16], where α is a parameter. We impose the labeled constraints on the α-divergence-NMF algorithm to derive a generic constrained matrix decomposition algorithm which includes some existing algorithms as special cases; one of them is CNMFKL [15] with α = 1. Then we obtain the proposed algorithm using the Karush-Kuhn-Tucker (KKT) method as well as the projected gradient method and prove its monotonic local convergence using an auxiliary function. Compared to the current semisupervised
Hindawi Publishing Corporation, Discrete Dynamics in Nature and Society, Volume 2014, Article ID 179129, 12 pages. http://dx.doi.org/10.1155/2014/179129
NMF algorithms, we analyze the classification accuracy for two fixed values of α (α = 0.5, 2) on three image data sets.

The CNMF_α algorithm does not work well for a fixed value of α. Since the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive when α goes to +∞ and more exclusive when α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. In this paper, we provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is associated with the characteristics of image data sets. Compared with the algorithms in [15, 16], our algorithm is more complete and systematic.
The rest of the paper is organized as follows. In Section 2, we give a brief overview of the standard NMF algorithm and the constrained NMF algorithm. The detailed algorithms with labeled constraints and the theoretical proof of the convergence of the algorithms are provided in Sections 3 and 4, respectively. Section 5 presents experimental results showing the advantages of our algorithm. Finally, a conclusion is given in Section 6.
2. Related Work

NMF, proposed by Lee and Seung [7], is considered to provide a part-based representation and has been applied to diverse examples of nonnegative data [17–21], including text data mining, subsystem identification, spectral data analysis, audio and sound processing, and document clustering.
Suppose X = [x_1, …, x_n] ∈ R^(m×n) is a set of n training images, where each column vector x_t (t = 1, …, n) consists of the m nonnegative pixel values of a training image. NMF finds two nonnegative matrix factors W ∈ R^(m×r) and H ∈ R^(n×r) to approximate the original image matrix:

    X ≈ WH^T,    (1)

where the positive integer r is smaller than m or n.

NMF uses nonnegative constraints, which makes the representation purely adapted to an unsupervised way. It is inapplicable to learning a basis representation from limited image information. To make up for this deficiency, extra constraints such as locality [22], sparseness [9], and orthogonality [23] were implicitly or explicitly incorporated into NMF to identify better local features or provide sparser representations.
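For orientation, factorization (1) can be reproduced with the classical Lee-Seung multiplicative updates for the squared Euclidean cost. The sketch below is our illustration, not the authors' code; it follows the paper's convention that H is n × r, so X ≈ WH^T.

```python
import numpy as np

def nmf(X, r, iters=200, eps=1e-9):
    """Basic NMF, X ~ W @ H.T, via Lee-Seung multiplicative updates."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r)) + eps   # m x r basis
    H = rng.random((n, r)) + eps   # n x r coefficients (paper's convention)
    for _ in range(iters):
        H *= (X.T @ W) / (H @ (W.T @ W) + eps)   # update coefficients
        W *= (X @ H) / (W @ (H.T @ H) + eps)     # update basis
    return W, H

X = np.abs(np.random.default_rng(1).random((20, 30)))
W, H = nmf(X, r=5)
err = np.linalg.norm(X - W @ H.T)
```

The updates keep W and H elementwise nonnegative and monotonically reduce the reconstruction error, which is the property the constrained variants below inherit.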
In [15], the authors impose label information as additional hard constraints on the NMF unsupervised learning algorithm to derive a semisupervised matrix decomposition algorithm, which makes the obtained representation more discriminating. The label information is incorporated as follows.

Suppose X = {x_i}_(i=1)^n is a data set which consists of n training images. Assume that the first s images x_1, …, x_s (s ≤ n) carry label information and the remaining n − s images x_(s+1), …, x_n are unlabeled. Assume there exist c classes and each image from x_1, …, x_s is assigned to one class. Then we have an s × c indicator matrix S, which can be represented as

    s_ij = 1 if x_i is assigned to the jth class; 0 otherwise.    (2)

From the indicator matrix S, a label constraint matrix U can be defined as

    U = ( S_(s×c)   0
          0         I_(n−s) ),    (3)

where I_(n−s) denotes an (n − s) × (n − s) identity matrix. Imposing the label information as an additional hard constraint by an auxiliary matrix V, there is H = UV. It verifies that h_i = h_j if x_i and x_j have the same label. With the label constraints, the standard NMF is transformed into factorizing the large-size matrix X into the product of three small-size matrices W, U, and V:

    X ≈ W(UV)^T.    (4)

Such a representation encodes the data points from the same class using the indicator matrix in a new representation space.
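The construction (2)-(3) is mechanical; the following sketch (our illustration, with a hypothetical helper name) builds U for n samples of which the first s are labeled:

```python
import numpy as np

def label_constraint_matrix(labels, n, c):
    """Build U from (2)-(3): `labels` gives the class (0..c-1) of the
    first s labeled samples; the remaining n-s samples are unlabeled."""
    s = len(labels)
    S = np.zeros((s, c))
    S[np.arange(s), labels] = 1.0          # indicator matrix (2)
    U = np.zeros((n, c + n - s))
    U[:s, :c] = S                          # labeled block
    U[s:, c:] = np.eye(n - s)              # identity block for unlabeled
    return U

U = label_constraint_matrix([0, 1, 1, 2], n=6, c=3)
```

Because H = UV, two labeled samples of the same class share a row of U and therefore receive identical representations h_i = h_j, while each unlabeled sample keeps its own free row.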
3. A Constrained Algorithm Based NMF_α
The exact form of the error measure in (1) is as crucial as the nonnegative constraints to the success of the NMF algorithm in learning a useful representation of image data. In the research on NMF, there are quite a large number of investigations of error measures, such as Csiszár's f-divergences [24], Amari's α-divergence [25], and Bregman divergences [26]. Here we introduce a generic multiplicative updating algorithm [16] which iteratively minimizes the α-divergence between X and WH^T. We define the α-divergence as

    D_α[X ‖ WH^T] = (1 / (α(1 − α))) Σ_(i=1)^m Σ_(j=1)^n { αX_ij + (1 − α)[WH^T]_ij − X_ij^α [WH^T]_ij^(1−α) },    (5)
where α is a positive parameter. We combine the labeled constraints (4) with (5) to derive the following objective function, which is based on the α-divergence between X and W(UV)^T:

    D_α[X ‖ W(UV)^T] = (1 / (α(1 − α))) Σ_(i=1)^m Σ_(j=1)^n { αX_ij + (1 − α)[WV^T U^T]_ij − X_ij^α [WV^T U^T]_ij^(1−α) }.    (6)

With the constraints W_ij ≥ 0, U_ij ≥ 0, and V_ij ≥ 0, the minimization of D_α[X ‖ W(UV)^T] can be formulated as a constrained minimization problem with inequality constraints. In the following, we present two methods to find a local minimum of (6).
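Numerically, objective (6) is a one-liner once the reconstruction W(UV)^T is formed; the transcription below is ours, for α ∉ {0, 1}:

```python
import numpy as np

def alpha_divergence(X, R, a):
    """D_a[X || R] from (5)-(6), for a positive reconstruction
    R = W (U V)^T; defined for a != 0 and a != 1."""
    return (a * X + (1 - a) * R - X**a * R**(1 - a)).sum() / (a * (1 - a))

X = np.array([[1.0, 2.0], [0.5, 1.5]])
d_half = alpha_divergence(X, X + 0.3, 0.5)   # Hellinger-like case
d_two = alpha_divergence(X, X + 0.3, 2.0)    # chi-square-like case
```

The divergence is nonnegative and vanishes exactly when the reconstruction equals X, which is what makes it usable as a factorization cost.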
3.1. KKT Method. Let λ_ij ≥ 0 and μ_ij ≥ 0 be the Lagrangian multipliers associated with the constraints V_ij ≥ 0 and W_ij ≥ 0, respectively. The Karush-Kuhn-Tucker conditions require that both the optimality conditions

    ∂D_α[X ‖ W(UV)^T] / ∂V_ij = (1/α) { Σ_k [U^T W]_ki − Σ_k [U^T W]_ki (X_kj / [WV^T U^T]_kj)^α } = λ_ij,    (7)

    ∂D_α[X ‖ W(UV)^T] / ∂W_ij = (1/α) { Σ_k [V^T U^T]_jk − Σ_k [V^T U^T]_jk (X_ik / [WV^T U^T]_ik)^α } = μ_ij,    (8)

and the complementary slackness conditions

    λ_ij V_ij = 0,  μ_ij W_ij = 0    (9)

are satisfied. If V_ij = 0 and W_ij = 0, then λ_ij and μ_ij can have any values. At the same time, if λ_ij = 0 and μ_ij = 0, then V_ij ≥ 0 and W_ij ≥ 0. Hence we need both λ_ij = 0, μ_ij = 0 and V_ij = 0, W_ij = 0. It follows from (9) that

    λ_ij V_ij^α = 0,  μ_ij W_ij^α = 0.    (10)
We multiply both sides of (7) and (8) by V_ij and W_ij, respectively, and incorporate (10); then we obtain the following updating rules:

    V_ij ← V_ij [ ( Σ_k [U^T W]_ki (X_kj / [WV^T U^T]_kj)^α ) / ( Σ_l [U^T W]_li ) ]^(1/α),    (11)

    W_ij ← W_ij [ ( Σ_k [V^T U^T]_jk (X_ik / [WV^T U^T]_ik)^α ) / ( Σ_l [V^T U^T]_jl ) ]^(1/α).    (12)
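As a concrete (unofficial) sketch, the updates (11) and (12) can be vectorized in numpy as below. The compressed sums Σ_k [U^T W]_ki in (11)-(12) are expanded here over both data indices, as the gradient of (6) dictates; the small `eps` guards and the synthetic data are our additions. At α = 1 this reduces to the CNMF-KL update.

```python
import numpy as np

def d_alpha(X, W, U, V, a, eps=1e-12):
    """Objective (6): alpha-divergence between X and W(UV)^T."""
    R = W @ (U @ V).T + eps
    return (a * X + (1 - a) * R - X**a * R**(1 - a)).sum() / (a * (1 - a))

def cnmf_alpha_step(X, W, U, V, a, eps=1e-12):
    """One pass of the multiplicative updates (11)-(12), vectorized
    (our reading of the paper's compressed index notation)."""
    S = (X / (W @ (U @ V).T + eps)) ** a          # (X_ij / [W(UV)^T]_ij)^a
    V *= ((U.T @ S.T @ W + eps) /
          (np.outer(U.sum(0), W.sum(0)) + eps)) ** (1.0 / a)
    H = U @ V
    S = (X / (W @ H.T + eps)) ** a
    W *= ((S @ H + eps) / (H.sum(0) + eps)) ** (1.0 / a)
    return W, V

rng = np.random.default_rng(0)
m, n, r, c, s = 10, 8, 3, 2, 5
X = rng.random((m, n)) + 0.1
W = rng.random((m, r)) + 0.1
V = rng.random((c + n - s, r)) + 0.1
U = np.zeros((n, c + n - s))
U[np.arange(s), rng.integers(0, c, s)] = 1.0      # labeled block S
U[s:, c:] = np.eye(n - s)                         # unlabeled identity block
before = d_alpha(X, W, U, V, 0.5)
for _ in range(30):
    W, V = cnmf_alpha_step(X, W, U, V, 0.5)
after = d_alpha(X, W, U, V, 0.5)
```

Running a few passes on random data lets one check the monotonicity claimed by Theorem 1 below: `after` should not exceed `before`.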
3.2. Projected Gradient Method. Considering the gradient descent algorithm [24, 25], the updating rules for the objective function (6) can also be derived by using the projected gradient method [27] and have the form

    φ(V_ij) ← φ(V_ij) + δ_ij ∂D_α[X ‖ W(UV)^T] / ∂V_ij,
    φ(W_ij) ← φ(W_ij) + γ_ij ∂D_α[X ‖ W(UV)^T] / ∂W_ij,    (13)

where φ(·) is a suitably chosen function and δ_ij and γ_ij are two parameters to control the step size of gradient descent. Then we have

    V_ij ← φ^(−1)( φ(V_ij) + δ_ij ∂D_α[X ‖ W(UV)^T] / ∂V_ij ),
    W_ij ← φ^(−1)( φ(W_ij) + γ_ij ∂D_α[X ‖ W(UV)^T] / ∂W_ij ).    (14)

Setting φ(Ω) = Ω^α, to guarantee that the updating rules (11) and (12) hold we need

    δ_ij = − αV_ij^α / Σ_k [U^T W]_ki,
    γ_ij = − αW_ij^α / Σ_k [V^T U^T]_jk.    (15)

From (7) and (8), the updating rules become

    V_ij^α ← V_ij^α + δ_ij ∂D_α[X ‖ W(UV)^T] / ∂V_ij = V_ij^α ( Σ_k [U^T W]_ki (X_kj / [WV^T U^T]_kj)^α ) / ( Σ_l [U^T W]_li ),

    W_ij^α ← W_ij^α + γ_ij ∂D_α[X ‖ W(UV)^T] / ∂W_ij = W_ij^α ( Σ_k [V^T U^T]_jk (X_ik / [WV^T U^T]_ik)^α ) / ( Σ_l [V^T U^T]_jl ),    (16)
which are the same as the updating rules (11) and (12). We have shown that the algorithm can be derived using the Karush-Kuhn-Tucker conditions and have presented an alternative derivation using the projected gradient. The agreement of the two methods guarantees the correctness of the algorithm theoretically.

In the following, we give a theorem to guarantee the convergence of the iterations in updates (11) and (12).
Theorem 1. For the objective function (6), D_α[X ‖ W(UV)^T] is nonincreasing under the updating rules (11) and (12). The objective function D_α is invariant under these updates if and only if V and W are at a stationary point.

Multiplicative updates for our constrained algorithm based NMF_α are given in (11) and (12). These updates find a local minimum of D_α[X ‖ W(UV)^T], which is the final solution of (11) and (12). Note that when α = 1, the updates (11) and (12) are the same as those of the CNMFKL algorithm [15], which is a special case included in our generic constrained matrix decomposition algorithm. In the following, we give the proof of Theorem 1.
4. Convergence Analysis

To prove Theorem 1, we will make use of an auxiliary function that was used in the expectation-maximization algorithm [28, 29].
Definition 2. A function G(x, x′) is defined as an auxiliary function for F(x) if the following two conditions are both satisfied:

    G(x, x) = F(x),  G(x, x′) ≥ F(x).    (17)

Lemma 3. Assume that the function G(x, x′) is an auxiliary function for F(x); then F(x) is nonincreasing under the update

    x^(t+1) = argmin_x G(x, x^t).    (18)

Proof. Consider F(x^(t+1)) ≤ G(x^(t+1), x^t) ≤ G(x^t, x^t) = F(x^t).

It can be observed that the equality F(x^(t+1)) = F(x^t) holds only if x^t is a local minimum of G(x, x^t). We iterate the update in (18) to obtain a sequence of estimates that converges to a local minimum x_min = argmin_x F(x) of the objective function:

    F(x_min) ≤ ⋯ ≤ F(x^(t+1)) ≤ F(x^t) ≤ ⋯ ≤ F(x^2) ≤ F(x^1) ≤ F(x^0).    (19)
In the following, we show that the objective function (6) is nonincreasing under the updating rules (11) and (12) by defining appropriate auxiliary functions with respect to V_ij and W_ij.
Lemma 4. The function

    G(V, V′) = (1 / (α(1 − α))) Σ_ijk X_ij ξ_ijk(V′) { α + (1 − α) (W_ik [V^T U^T]_kj) / (X_ij ξ_ijk(V′)) − ( (W_ik [V^T U^T]_kj) / (X_ij ξ_ijk(V′)) )^(1−α) },    (20)

where ξ_ijk(V′) = W_ik [V′^T U^T]_kj / Σ_l W_il [V′^T U^T]_lj, is an auxiliary function for

    F(V) = (1 / (α(1 − α))) Σ_ij { αX_ij + (1 − α)[WV^T U^T]_ij − X_ij^α [WV^T U^T]_ij^(1−α) }.    (21)
Proof. Obviously, G(V, V) = F(V). According to the definition of an auxiliary function, we only need to prove G(V, V′) ≥ F(V). To do this, we use the convex function f(·) for positive α to rewrite the α-divergence function F(V) as

    F(V) = Σ_ij X_ij f( ( Σ_k W_ik [V^T U^T]_kj ) / X_ij ),    (22)

where

    f(x) = (1 / (α(1 − α))) [ α + (1 − α)x − x^(1−α) ].    (23)

Note that Σ_k ξ_ijk(V′) = 1 and ξ_ijk(V′) ≥ 0 from the definition of ξ_ijk(V′). Applying Jensen's inequality [30] leads to

    f( Σ_k W_ik [V^T U^T]_kj ) ≤ Σ_k ξ_ijk(V′) f( (W_ik [V^T U^T]_kj) / ξ_ijk(V′) ).    (24)

From the above inequality it follows that

    F(V) = Σ_ij X_ij f( ( Σ_k W_ik [V^T U^T]_kj ) / X_ij )
         ≤ Σ_ijk X_ij ξ_ijk(V′) f( (W_ik [V^T U^T]_kj) / (X_ij ξ_ijk(V′)) )
         = G(V, V′),    (25)

which satisfies the condition of an auxiliary function.
Reversing the roles of V_ij and W_ij in Lemma 4, we define an auxiliary function for the update (12).
Lemma 5. The function

    G(W, W′) = (1 / (α(1 − α))) Σ_ijk X_ij η_ijk(W′) { α + (1 − α) (W_ik [V^T U^T]_kj) / (X_ij η_ijk(W′)) − ( (W_ik [V^T U^T]_kj) / (X_ij η_ijk(W′)) )^(1−α) },    (26)

where η_ijk(W′) = W′_ik [V^T U^T]_kj / Σ_l W′_il [V^T U^T]_lj, is an auxiliary function for

    F(W) = (1 / (α(1 − α))) Σ_ij { αX_ij + (1 − α)[WV^T U^T]_ij − X_ij^α [WV^T U^T]_ij^(1−α) }.    (27)

This can be easily proved in the same way as Lemma 4. From Lemmas 4 and 5, we can now demonstrate the convergence claimed in Theorem 1.
Proof. To guarantee the stability of F(V), from Lemma 3 we just need to obtain the minimum of G(V, V′) with respect to V_ij. Setting the gradient of (20) to zero, there is

    ∂G(V, V′) / ∂V_ij = (1/α) Σ_k [U^T W]_ki { 1 − ( (W_ki [V^T U^T]_ij) / (X_kj ξ_kji(V′)) )^(−α) } = 0.    (28)

Then it follows that

    V_ij = V′_ij [ ( Σ_k [U^T W]_ki ( X_kj / Σ_l W_kl [V′^T U^T]_lj )^α ) / ( Σ_k [U^T W]_ki ) ]^(1/α),    (29)

which is similar to the form of the updating rule (11). Similarly, to guarantee that the updating rule (12) holds, the minimum of G(W, W′), which can be determined by setting the gradient of (26) to zero, must exist.

Since G is an auxiliary function, according to Lemma 4, F in (21) is nonincreasing under the update (11). Alternating updates (11) and (12), we can find a local minimum of D_α[X ‖ W(UV)^T].
5. Experiments

In this section, the CNMF_α algorithm is systematically compared with current constrained NMF algorithms on three image data sets, namely, the ORL Database [31], the Yale Database [32], and the Caltech 101 Database [33]. The details of these three databases will be described individually later. We first introduce the evaluated algorithms:

(i) the constrained nonnegative matrix factorization algorithm (CNMF) in [15], which aims to minimize the F-norm cost;

(ii) the constrained nonnegative matrix factorization algorithm with parameter α = 0.5 in this paper, which aims to minimize the Hellinger divergence cost;

(iii) the constrained nonnegative matrix factorization algorithm with parameter α = 1 in [15], aiming at minimizing the KL-divergence cost, which is the best reported algorithm in image representation;

(iv) the constrained nonnegative matrix factorization algorithm with parameter α = 2 in this paper, which aims to minimize the χ²-divergence cost;

(v) the constrained nonnegative matrix factorization algorithm with parameter α = α_k in this paper, which aims to minimize the α-divergence cost, where the parameters α_k are associated with the characteristics of the image database and designed by the presented method; the CNMFKL algorithm is a special case of our CNMF_{α_k} algorithm with α = α_k = 1.

We apply these algorithms to a classification problem and evaluate their performance on three image data sets which contain a number of different categories of images. For each data set, the evaluations are conducted with different numbers of clusters; here the number of clusters k varies from 2 to 10. We randomly choose k categories from one image data set and mix the images of these k categories as the collection X. Then, for the semisupervised algorithms, we randomly pick 10 percent of the images from each category in X and record their category numbers as the available label information to obtain the label matrix U. For some special data sets the labeling process is different, and we will describe the details later.
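The sampling protocol above (choose k categories, mix their images, label roughly 10 percent per category) can be sketched as follows; `build_trial` is a hypothetical helper name and the toy label array stands in for a real data set:

```python
import numpy as np

def build_trial(labels_all, k, label_frac=0.10, rng=None):
    """Pick k categories at random, mix their images, and mark roughly
    `label_frac` of each category as labeled; labeled samples are listed
    first, matching the ordering that the constraint matrix (3) assumes."""
    rng = rng if rng is not None else np.random.default_rng()
    cats = rng.choice(np.unique(labels_all), size=k, replace=False)
    labeled, unlabeled = [], []
    for c in cats:
        members = rng.permutation(np.flatnonzero(labels_all == c))
        n_lab = max(1, int(label_frac * len(members)))
        labeled.extend(members[:n_lab].tolist())
        unlabeled.extend(members[n_lab:].tolist())
    return np.array(labeled + unlabeled), len(labeled)

labels_all = np.repeat(np.arange(5), 10)   # toy set: 5 categories x 10 images
order, s = build_trial(labels_all, k=3, rng=np.random.default_rng(0))
```

The returned index order is then used to permute the columns of X so that the s labeled samples come first, as (3) requires.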
Suppose a data set has N categories C_1, C_2, …, C_N, and the cardinalities of the labeled images are L_1, L_2, …, L_N, respectively. Since the label constraint matrix U is composed of the indicator matrix S and the identity matrix I, the indicator matrix S plays a critical role in the classification performance for different categories in X. To determine the effectiveness of S, we define a measure of the relationship between the cardinalities of the labeled samples and the total samples:

    r(k) = ( L_max(k) − L_min(k) ) · k / Σ_(i=1)^k C_i,    (30)
where L_max(k) and L_min(k) denote the maximum and minimum labeled cardinalities of the k categories. For a fixed cluster number k in the data set, r(k) is different if the number of samples in each category is different. Then we compute α as follows:

    α_k = (α_N / r(N)) · |r(k) − r(N)| · ( | L_max(k) · k / Σ_(i=1)^k C_i − Σ_(i=1)^N L_i / Σ_(i=1)^N C_i | + | L_min(k) · k / Σ_(i=1)^k C_i − Σ_(i=1)^N L_i / Σ_(i=1)^N C_i | )^(−1),    (31)

where

    α_N = ( L_max(N) − L_min(N) ) · ( | L_max(N) − Σ_(i=1)^N L_i / Σ_(i=1)^N C_i | + | L_min(N) − Σ_(i=1)^N L_i / Σ_(i=1)^N C_i | )^(−1).    (32)

The value of α_k computed by (31) is associated with the characteristics of image data sets, since its variation is caused by both the cardinalities of the labeled samples in each category and the total samples. We can obtain both the cardinalities of the labeled samples and the total samples in our semisupervised algorithms. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set. It is still an open problem how to select the optimal α [16].
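The selection rule (30)-(32) can be transcribed as follows. Note that the printed layout of (30)-(31) is ambiguous in places, so the grouping of terms below is our reading of it, and the degenerate case r(k) = r(N) (where the paper instead falls back to manually chosen values such as 0.67 and 0.55) is not handled.

```python
def r_measure(L, C):
    """r(k) from (30): spread of labeled cardinalities, normalized by the
    mean category size of the k chosen categories."""
    k = len(L)
    return (max(L) - min(L)) * k / sum(C)

def alpha_k(L_k, C_k, L_all, C_all):
    """alpha_k from (31)-(32). L_*, C_*: labeled counts and total sizes,
    for the k chosen categories and for all N categories, respectively."""
    ratio = sum(L_all) / sum(C_all)                   # overall label fraction
    a_N = (max(L_all) - min(L_all)) / (
        abs(max(L_all) - ratio) + abs(min(L_all) - ratio))   # eq. (32)
    r_k, r_N = r_measure(L_k, C_k), r_measure(L_all, C_all)
    k = len(L_k)
    denom = (abs(max(L_k) * k / sum(C_k) - ratio)
             + abs(min(L_k) * k / sum(C_k) - ratio))
    return (a_N / r_N) * abs(r_k - r_N) / denom       # eq. (31)

# Toy setting mimicking the ORL protocol: 40 categories of 10 images,
# labeled with 1, 2, or 3 images depending on the group.
L_all = [1] * 10 + [2] * 20 + [3] * 10
C_all = [10] * 40
a = alpha_k([1] * 5 + [2] * 5, [10] * 10, L_all, C_all)
```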
To evaluate the classification performance, we use classification accuracy as the first measure. Our CNMF_α algorithm described in (11) and (12) provides a classification label for each sample, marked s_{i,l_i}. Suppose X = {x_i}_(i=1)^n is a data set which consists of n training images. For each sample, let l_i be the true class label provided by the image data set. More specifically, if the image x_i is assigned to the l_i-th class, we evaluate it as a correct label and set s_{i,l_i} = 1; otherwise, it is counted as a false label and we set s_{i,l_i} = 0. Eventually, we compute the percentage of correct labels by defining the accuracy measure as

    Ac(s_{i,l_i}) = ( Σ_(i=1)^n s_{i,l_i} ) / n.    (33)
To further evaluate the classification performance, we compute the normalized mutual information, which measures how similar two sets of clusters are, as the second measure. Given two sets of clusters C_i and C_j, their normalized mutual information is defined as

    NMI(C_i, C_j) = ( Σ_(c_i ∈ C_i, c_j ∈ C_j) p(c_i, c_j) · log[ p(c_i, c_j) / (p(c_i) p(c_j)) ] ) / max(H(C_i), H(C_j)),    (34)

which takes values between 0 and 1, where p(c_i) and p(c_j) denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters c_i and c_j, respectively, and p(c_i, c_j) denotes the joint probability that this arbitrarily selected image belongs to the cluster c_i as well as c_j at the same time. H(C_i) and H(C_j) are the entropies of C_i and C_j, respectively.

Experimental results on each data set are presented as classification accuracy and normalized mutual information in Tables 1 and 2.
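Both measures (33) and (34) are easy to implement directly. The sketch below is ours; it uses the natural logarithm (the base is not fixed by (34) and cancels in the ratio) and assumes at least two clusters so the entropies are nonzero:

```python
import numpy as np
from collections import Counter

def accuracy(pred, true):
    """Ac from (33): fraction of samples whose predicted class is correct."""
    return sum(p == t for p, t in zip(pred, true)) / len(true)

def nmi(a, b):
    """Normalized mutual information (34) between two clusterings a, b."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((c / n) * np.log((c / n) / ((pa[i] / n) * (pb[j] / n)))
             for (i, j), c in pab.items())
    ha = -sum((c / n) * np.log(c / n) for c in pa.values())
    hb = -sum((c / n) * np.log(c / n) for c in pb.values())
    return mi / max(ha, hb)

labels = [0, 0, 1, 1, 2, 2]
acc_self = accuracy(labels, labels)   # identical labelings give 1.0
nmi_self = nmi(labels, labels)        # and NMI 1.0
```

NMI is invariant to a permutation of the cluster ids, which is why it is preferred over raw accuracy when cluster labels are arbitrary.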
5.1. ORL Database. The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. The images of some people were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/nonsmiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position with slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to 32 × 32 pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes in the facial area are aligned at the same position.

There are 10 images for each category in ORL, and 10 percent is just one image. For the fixed parameters α (α = 0.5, 1, 2), we randomly choose two images from each category to provide the label information. Note that labeling the same number of images in every category makes (30) meaningless. To obtain a nontrivial r(k), we divide the 40 categories into 3 groups of 10, 20, and 10 categories: in the first 10 categories, we pick 1 image from each category to provide the label information; we pick 2 images from each category in the second 20 categories; and we pick 3 from each category in the remaining categories. The dividing process is repeated 10 times, and the obtained average classification accuracy is recorded as the final result.
Figure 1 shows the graphical classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection X come from the same group, we set α_k = 0.67. Because of the same number of samples in each category, the variation of α is small even though we label different cardinalities of samples. Compared to the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF_{α_k} algorithm gives the best performance, since the selection of α_k is suited to the collection X. Table 1 summarizes the detailed classification accuracy and the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}. It shows that our CNMF_{α_k} algorithm achieves a 1.92 percent improvement over the best reported CNMFKL algorithm (α = 1) [15] in average classification accuracy. For normalized mutual information, the details and the error bars of our constrained algorithms with α = 0.5, α = 2, and α_k are listed in Table 2. Compared to the best baseline algorithm CNMF, our CNMF_{α_k} algorithm achieves a 0.54 percent improvement.
5.2. Yale Database. The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person. Each image is taken under a different facial expression or configuration: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database. Each image is linearly stretched to a 1024-dimensional vector in image space.

The Yale Database also has the same number of samples in each category. To obtain an appropriate α, we perform label processing similar to that for the ORL Database: divide the 15 individuals evenly into 3 groups, choose 1 image from each category in the first group, choose 2 from each category in the second, and choose 3 from each category in the remaining group. We repeat the process 10 times and record the average classification accuracy as the final result.
Figure 2 shows the classification accuracy and normalized mutual information on the Yale Database. We set α_k = 0.55 when r(k) − r(N) = 0, which indicates that the samples in the collection X come from just two groups, that is, 15 images chosen from 10 categories in the first and second groups. CNMF_{α_k} achieves an extraordinary performance in all the cases, and CNMF_{α=0.5} follows. This suggests that the constrained nonnegative matrix factorization algorithm has a higher classification accuracy when the value of α is close to α_k. Compared to the best reported CNMFKL algorithm, CNMF_{α_k} achieves a 2.42 percent improvement in average classification accuracy. For normalized mutual information, CNMF_{α_k} achieves a 7.2 percent improvement compared to CNMF. The details of the classification accuracy and normalized mutual information are provided in Tables 3 and 4, which contain the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.
5.3. Caltech 101 Database. The Caltech 101 Database, created by Caltech University, has images of 101 different object
Discrete Dynamics in Nature and Society 7
Table 1: The comparison of classification accuracy rates Ac (%) on the ORL Database.

k     CNMF     CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     93.90    92.95        93.51 ± 0.09    92.14 ± 0.18    93.89 ± 0.91
3     86.20    84.33        85.71 ± 0.83    82.26 ± 1.23    86.21 ± 2.07
4     84.55    84.05        84.74 ± 0.56    83.02 ± 0.47    84.78 ± 0.36
5     80.16    78.38        80.10 ± 0.22    80.15 ± 0.18    80.16 ± 0.29
6     76.72    76.73        77.13 ± 0.28    77.60 ± 0.19    78.09 ± 0.40
7     81.14    79.04        80.34 ± 0.81    78.07 ± 0.88    82.15 ± 1.95
8     78.08    77.02        78.66 ± 0.69    78.77 ± 0.94    79.45 ± 1.33
9     83.19    82.23        83.57 ± 0.36    81.22 ± 0.21    85.19 ± 1.21
10    77.03    76.88        77.69 ± 0.12    77.75 ± 0.19    78.96 ± 0.73
Avg   82.33    81.29        82.38           81.22           83.21
Table 2: The comparison of normalized mutual information (%) on the ORL Database.

k     CNMF     CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     78.24    75.82        76.01 ± 0.14    73.94 ± 0.11    77.67 ± 0.58
3     76.61    75.36        75.38 ± 0.41    74.50 ± 0.53    75.92 ± 1.55
4     78.69    77.82        77.96 ± 0.47    77.04 ± 0.54    79.27 ± 1.26
5     76.69    74.27        75.32 ± 0.80    72.62 ± 0.87    75.35 ± 1.67
6     76.52    76.09        76.21 ± 0.12    76.09 ± 0.67    77.60 ± 1.82
7     82.45    81.35        82.36 ± 0.42    81.35 ± 0.26    83.08 ± 1.59
8     80.57    79.92        80.11 ± 0.73    79.91 ± 0.68    82.49 ± 1.97
9     86.03    85.67        86.73 ± 0.70    85.67 ± 0.37    88.05 ± 1.15
10    81.77    81.07        82.07 ± 0.12    81.07 ± 0.12    82.97 ± 1.71
Avg   79.73    78.60        79.13           78.02           80.27
categories. Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using the codewords generated from SIFT features [34]. We obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.
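The codeword-assignment step above reduces each image to a frequency histogram over the 500 codewords. A minimal sketch, with random arrays standing in for real SIFT descriptors and k-means centers (both are assumptions for illustration):

```python
import numpy as np

def bow_histogram(descriptors, codewords):
    """Assign each local descriptor to its nearest codeword and return
    the normalized frequency histogram used as the image representation."""
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = ((descriptors**2).sum(1)[:, None]
          + (codewords**2).sum(1)[None, :]
          - 2.0 * descriptors @ codewords.T)
    nearest = d2.argmin(axis=1)                 # index of closest codeword
    hist = np.bincount(nearest, minlength=len(codewords)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
codewords = rng.random((500, 128))      # stand-in for k-means centers of SIFT features
descriptors = rng.random((300, 128))    # stand-in for one image's SIFT descriptors
h = bow_histogram(descriptors, codewords)   # 500-dimensional, sums to 1
```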
We randomly select k categories from the Faces-Easy category in the Caltech 101 database and convert the images to 32 × 32 gray-scale. The label process is repeated 10 times, and the obtained values of α_k computed by (31) are listed in Table 7. The variation of α within the same k categories is great; that is, selecting an appropriate α plays a critical role for the mixture of different categories in one image data set, especially when the number of samples in each category differs. The choice of α_k can fully reflect the effectiveness of the indicator matrix S.

Figure 3 shows the effect of the proposed algorithm with α_k. The upper part of the figure shows the original samples, which contain 26 images; the middle part shows their gray-scale images; and the lower part shows the combination of the basis vectors learned by CNMF_{α_k}.
The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}. The graphical classification results are shown in Figure 4. The best performance in this experiment is achieved when the parameters α_k listed in Table 7 are selected. In general, our method demonstrates much better effectiveness in classification by choosing α_k. Compared with the best reported algorithm other than our CNMF_α algorithm, CNMF_{α_k} achieves a 2.4 percent improvement in average classification accuracy, and compared with the CNMF algorithm [15], it achieves an 8.77 percent improvement. For normalized mutual information, CNMF_{α_k} achieves a 2.29 percent improvement and consistently outperforms the other algorithms.
6. Conclusion

In this paper, we present a generic constrained nonnegative matrix factorization algorithm by imposing label information as an additional hard constraint on the α-divergence-NMF algorithm. The proposed algorithm can be derived using
Table 3: The comparison of classification accuracy rates Ac (%) on the Yale Database.

k     CNMF     CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     81.64    85.23        85.93 ± 0.65    84.18 ± 0.81    86.27 ± 2.46
3     68.48    76.91        77.14 ± 0.86    77.26 ± 0.42    79.01 ± 1.09
4     64.25    66.70        68.11 ± 0.67    66.12 ± 0.16    69.33 ± 2.23
5     65.82    67.71        69.14 ± 1.21    66.54 ± 1.01    70.57 ± 2.22
6     60.12    64.26        65.62 ± 0.94    63.70 ± 0.94    66.92 ± 1.41
7     59.25    62.47        63.97 ± 0.72    59.83 ± 0.38    65.23 ± 1.77
8     54.40    59.26        60.12 ± 0.31    56.76 ± 0.07    60.89 ± 1.28
9     54.85    57.30        58.08 ± 0.12    54.88 ± 0.14    58.98 ± 0.62
10    52.68    56.20        57.56 ± 0.03    53.83 ± 0.01    57.56 ± 0.73
Avg   62.39    65.89        67.30           64.79           68.31
Table 4: The comparison of normalized mutual information (%) on the Yale Database.

k     CNMF     CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     48.99    59.66        59.73 ± 0.39    58.17 ± 0.17    61.11 ± 1.82
3     42.90    53.52        53.58 ± 0.73    52.75 ± 0.93    54.49 ± 1.74
4     47.80    53.60        53.66 ± 0.18    52.99 ± 0.65    54.58 ± 1.64
5     53.99    57.49        57.56 ± 0.07    56.21 ± 0.06    58.58 ± 2.32
6     50.63    55.88        56.15 ± 0.07    55.40 ± 0.08    57.62 ± 1.09
7     53.03    57.60        57.88 ± 0.22    57.27 ± 0.16    59.05 ± 1.89
8     50.83    56.52        56.93 ± 0.13    56.20 ± 0.13    58.98 ± 1.46
9     53.08    55.30        55.98 ± 0.73    54.81 ± 0.81    56.15 ± 1.20
10    52.84    56.73        57.43 ± 0.16    56.41 ± 0.41    58.30 ± 1.11
Avg   50.45    56.26        56.54           55.58           57.65
Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2–10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.
Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2–10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.
Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) is the original images, (b) is the gray images, and (c) is the basis vectors learned by CNMF_{α_k}. Positive values are illustrated with white pixels.
Karush-Kuhn-Tucker conditions, and we have presented an alternative derivation of the algorithm using the projected gradient. The agreement of the two methods theoretically guarantees the correctness of the algorithm. The image representation learned by our algorithm contains a parameter α. Since the α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive when α goes to +∞ and more exclusive when α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is caused both by the cardinalities of the labeled samples in each category and by the total
Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2–10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.
Table 5: The comparison of classification accuracy rates Ac (%) on the Caltech 101 Database.

k     CNMF     CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     66.39    78.80        78.31 ± 1.69    80.10 ± 2.07    82.76 ± 3.56
3     60.91    69.98        68.77 ± 1.57    72.94 ± 1.51    73.59 ± 3.51
4     50.82    57.50        58.19 ± 1.98    56.28 ± 1.96    59.89 ± 2.82
5     50.83    53.67        53.71 ± 1.02    54.12 ± 1.44    55.06 ± 2.63
6     50.49    55.05        55.63 ± 0.89    58.04 ± 1.47    58.97 ± 1.97
7     46.09    51.12        50.04 ± 0.11    51.73 ± 0.25    52.36 ± 1.81
8     44.21    50.50        51.72 ± 0.51    48.37 ± 1.59    52.64 ± 1.52
9     41.34    46.58        47.85 ± 0.28    45.87 ± 0.30    49.54 ± 1.23
10    40.80    46.02        46.02 ± 0.05    45.76 ± 0.08    46.02 ± 0.26
Avg   50.21    56.58        56.69           57.02           58.98
Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

k     CNMF     CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     21.05    38.17        38.20 ± 0.18    37.48 ± 0.21    40.08 ± 2.19
3     31.62    40.63        40.68 ± 0.15    40.05 ± 0.14    41.88 ± 1.24
4     27.60    34.36        34.41 ± 0.79    34.12 ± 0.91    36.41 ± 1.17
5     34.46    35.82        36.00 ± 0.99    35.42 ± 1.01    37.20 ± 1.78
6     34.85    38.97        39.16 ± 0.86    38.53 ± 1.11    39.64 ± 1.83
7     34.88    37.67        37.86 ± 0.47    37.47 ± 0.49    39.86 ± 2.58
8     32.63    37.43        37.71 ± 0.39    37.22 ± 0.42    37.91 ± 2.18
9     31.20    35.97        36.21 ± 0.60    35.86 ± 0.63    37.73 ± 0.67
10    31.21    35.71        36.06 ± 0.02    35.68 ± 0.03    37.48 ± 0.22
Avg   31.05    37.19        37.37           36.87           38.69
Table 7: The values of α in k categories.

k     α_1    α_2    α_3    α_4    α_5    α_6    α_7    α_8    α_9    α_10
2     2.07   2.08   2.22   2.27   2.57   3.52   3.97   3.98   4.37   8.96
3     0.45   0.55   0.66   1.07   1.95   2.31   2.31   3.54   4.24   7.54
4     0.30   0.32   0.35   0.86   0.98   1.27   1.65   2.10   3.27   5.80
5     0.45   0.49   0.51   0.71   0.80   1.13   2.05   2.55   2.82   3.89
6     0.07   0.56   0.73   1.08   1.26   1.46   2.01   2.12   3.91   8.52
7     0.71   0.94   1.12   1.14   1.16   1.26   1.27   1.37   1.57   8.97
8     0.12   0.30   0.50   0.64   0.66   0.69   0.78   0.79   0.97   1.11
9     0.78   0.79   2.01   0.10   0.17   0.28   0.29   0.30   0.31   0.39
10    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
number of samples. In the experiments, we apply both fixed parameters α and the selected α_k to analyze the classification accuracy on three image databases. The experimental results demonstrate that the CNMF_{α_k} algorithm has the best classification accuracy. However, in many real-world applications we cannot obtain the cardinalities of the labeled images exactly. Moreover, the value of α varies from data set to data set. How to select the optimal α for a specific image data set therefore remains an open problem.
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).
References
[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207–I-212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and Its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621–V-624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. K. Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] ORL face database: http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[32] Yale face database: http://vision.ucsd.edu/content/yale-face-database
[33] Caltech 101 database: http://www.vision.caltech.edu/Image_Datasets/Caltech101/
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
NMF algorithms, we analyze the classification accuracy for two fixed values of α (α = 0.5, 2) on three image data sets.

The CNMF_α algorithm does not work well for a fixed value of α. Since the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive when α goes to +∞ and more exclusive when α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. In this paper, we provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is associated with the characteristics of the image data sets. Compared with the algorithms in [15, 16], our algorithm is more complete and systematic.

The rest of the paper is organized as follows. In Section 2, we give a brief overview of the standard NMF algorithm and the constrained NMF algorithm. The detailed algorithms with label constraints and the theoretical proof of their convergence are provided in Sections 3 and 4, respectively. Section 5 presents experimental results that show the advantages of our algorithm. Finally, a conclusion is given in Section 6.
2. Related Work

NMF, proposed by Lee and Seung [7], is considered to provide a parts-based representation and has been applied to diverse examples of nonnegative data [17–21], including text data mining, subsystem identification, spectral data analysis, audio and sound processing, and document clustering.
Suppose X = [x_1, …, x_n] ∈ R^{m×n} is a set of n training images, where each x_t, t = 1, …, n, is a column vector consisting of the m nonnegative pixel values of one training image. NMF seeks two nonnegative matrix factors W ∈ R^{m×r} and H ∈ R^{n×r} that approximate the original image matrix:

\[ X \approx W H^{T}, \tag{1} \]

where the positive integer r is smaller than m and n.

NMF uses only nonnegativity constraints, which makes the representation purely unsupervised, so it cannot exploit the limited label information available when learning a basis representation. To make up for this deficiency, extra constraints such as locality [22], sparseness [9], and orthogonality [23] have been implicitly or explicitly incorporated into NMF to identify better local features or to provide a sparser representation.
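The plain factorization in (1) is available off the shelf; for example, scikit-learn's NMF can be used as below. Note that scikit-learn factorizes X ≈ WH, so its components matrix plays the role of H^T in the paper's notation; the matrix sizes here are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

# 20 nonnegative "images" of dimension m = 100, reduced to r = 5 factors.
rng = np.random.default_rng(0)
X = rng.random((100, 20))          # columns are vectorized images

model = NMF(n_components=5, init="random", random_state=0, max_iter=500)
W = model.fit_transform(X)         # basis vectors, shape (100, 5)
Ht = model.components_             # coefficients, shape (5, 20); H^T in the paper's notation
reconstruction = W @ Ht            # approximates X
```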
In [15], the authors impose label information as additional hard constraints on the NMF unsupervised learning algorithm to derive a semisupervised matrix decomposition algorithm, which makes the obtained representation more discriminating. The label information is incorporated as follows.
Suppose X = {x_i}_{i=1}^{n} is a data set consisting of n training images. Assume that the first s images x_1, …, x_s (s ≤ n) carry label information and the remaining n − s images x_{s+1}, …, x_n are unlabeled. Assume there exist c classes and each image from x_1, …, x_s is assigned to one class. Then we have an s × c indicator matrix S, defined by

\[
s_{ij} =
\begin{cases}
1, & \text{if } x_i \text{ is assigned to the } j\text{th class}, \\
0, & \text{otherwise}.
\end{cases} \tag{2}
\]

From the indicator matrix S, a label constraint matrix U can be defined as

\[
U = \begin{pmatrix} S_{s \times c} & 0 \\ 0 & I_{n-s} \end{pmatrix}, \tag{3}
\]

where I_{n−s} denotes the (n − s) × (n − s) identity matrix. Imposing the label information as an additional hard constraint through an auxiliary matrix V, we set H = UV. This guarantees that h_i = h_j whenever x_i and x_j have the same label. With the label constraints, the standard NMF turns into factorizing the large matrix X into the product of three small matrices W, U, and V:

\[ X \approx W (U V)^{T}. \tag{4} \]

Such a representation encodes the data points from the same class, via the indicator matrix, in a new representation space.
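The construction of S in (2) and U in (3) can be sketched as follows; the helper name and the toy label vector are our own, for illustration only.

```python
import numpy as np

def label_constraint_matrix(labels, n):
    """Build the indicator matrix S of Eq. (2) for the first s labeled
    samples and the block matrix U = [[S, 0], [0, I]] of Eq. (3).
    `labels` holds the class index of each labeled sample."""
    s = len(labels)
    c = int(max(labels)) + 1
    S = np.zeros((s, c))
    S[np.arange(s), labels] = 1.0      # one-hot rows
    U = np.zeros((n, c + n - s))
    U[:s, :c] = S                      # labeled block
    U[s:, c:] = np.eye(n - s)          # identity for unlabeled samples
    return U

# 4 labeled samples (classes 0, 0, 1, 1) out of n = 6 images.
U = label_constraint_matrix([0, 0, 1, 1], n=6)   # shape (6, 4)
# Rows sharing a label are identical, so H = U V forces h_i = h_j.
```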
3. A Constrained Algorithm Based NMF_α

The exact form of the error measure in (1) is as crucial as the nonnegativity constraints to the success of the NMF algorithm in learning a useful representation of image data. Research on NMF has investigated a large number of error measures, such as Csiszár's f-divergences [24], Amari's α-divergence [25], and Bregman divergences [26]. Here we introduce a generic multiplicative updating algorithm [16] which iteratively minimizes the α-divergence between X and WH^T. We define the α-divergence as

\[
D_{\alpha}[X \,\|\, W H^{T}] = \frac{1}{\alpha(1-\alpha)} \sum_{i=1}^{m} \sum_{j=1}^{n} \Big( \alpha X_{ij} + (1-\alpha)\,[W H^{T}]_{ij} - X_{ij}^{\alpha}\,[W H^{T}]_{ij}^{1-\alpha} \Big), \tag{5}
\]

where α is a positive parameter. We combine the label constraints of (4) with this measure to derive the following objective function, based on the α-divergence between X and W(UV)^T:

\[
D_{\alpha}[X \,\|\, W (U V)^{T}] = \frac{1}{\alpha(1-\alpha)} \sum_{i=1}^{m} \sum_{j=1}^{n} \Big( \alpha X_{ij} + (1-\alpha)\,[W V^{T} U^{T}]_{ij} - X_{ij}^{\alpha}\,[W V^{T} U^{T}]_{ij}^{1-\alpha} \Big). \tag{6}
\]

With the constraints W_{ij} ≥ 0, U_{ij} ≥ 0, and V_{ij} ≥ 0, the minimization of D_α[X‖W(UV)^T] can be formulated as a constrained minimization problem with inequality constraints. In the following, we present two methods to find a local minimum of (6).
3.1. KKT Method. Let λ_{ij} ≥ 0 and μ_{ij} ≥ 0 be the Lagrange multipliers associated with the constraints V_{ij} ≥ 0 and W_{ij} ≥ 0, respectively. The Karush-Kuhn-Tucker conditions require that both the optimality conditions

\[
\frac{\partial D_{\alpha}[X \,\|\, W(UV)^{T}]}{\partial V_{ij}} = \frac{1}{\alpha} \left( \sum_{k} [U^{T} W]_{ki} - \sum_{k} [U^{T} W]_{ki} \left( \frac{X_{kj}}{[W V^{T} U^{T}]_{kj}} \right)^{\alpha} \right) = \lambda_{ij}, \tag{7}
\]

\[
\frac{\partial D_{\alpha}[X \,\|\, W(UV)^{T}]}{\partial W_{ij}} = \frac{1}{\alpha} \left( \sum_{k} [V^{T} U^{T}]_{jk} - \sum_{k} [V^{T} U^{T}]_{jk} \left( \frac{X_{ik}}{[W V^{T} U^{T}]_{ik}} \right)^{\alpha} \right) = \mu_{ij}, \tag{8}
\]

and the complementary slackness conditions

\[
\lambda_{ij} V_{ij} = 0, \qquad \mu_{ij} W_{ij} = 0 \tag{9}
\]

are satisfied. If V_{ij} = 0 and W_{ij} = 0, then λ_{ij} and μ_{ij} can take any values; at the same time, if λ_{ij} = 0 and μ_{ij} = 0, then V_{ij} ≥ 0 and W_{ij} ≥ 0. Hence we need both λ_{ij} = 0, μ_{ij} = 0 and V_{ij} = 0, W_{ij} = 0. It follows from (9) that

\[
\lambda_{ij} V_{ij}^{\alpha} = 0, \qquad \mu_{ij} W_{ij}^{\alpha} = 0. \tag{10}
\]

Multiplying both sides of (7) and (8) by V_{ij} and W_{ij}, respectively, and incorporating (10), we obtain the following updating rules:

\[
V_{ij} \leftarrow V_{ij} \left[ \frac{\sum_{k} [U^{T} W]_{ki} \left( X_{kj} / [W V^{T} U^{T}]_{kj} \right)^{\alpha}}{\sum_{l} [U^{T} W]_{li}} \right]^{1/\alpha}, \tag{11}
\]

\[
W_{ij} \leftarrow W_{ij} \left[ \frac{\sum_{k} [V^{T} U^{T}]_{jk} \left( X_{ik} / [W V^{T} U^{T}]_{ik} \right)^{\alpha}}{\sum_{l} [V^{T} U^{T}]_{jl}} \right]^{1/\alpha}. \tag{12}
\]
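The updates (11) and (12) can be sketched in matrix form as below. This is an illustrative NumPy translation, not the authors' code: the index conventions are adapted so that every product is dimensionally consistent, a small eps guards the divisions, and setting U = I reduces the scheme to plain α-NMF without label constraints.

```python
import numpy as np

def cnmf_alpha(X, U, r, alpha=0.5, iters=200, eps=1e-9, seed=0):
    """Matrix-form sketch of the multiplicative updates (11)-(12):
    X (m, n) nonnegative data, U (n, q) label constraint matrix,
    factorization X ~ W (U V)^T with W (m, r) and V (q, r)."""
    m, n = X.shape
    q = U.shape[1]
    rng = np.random.default_rng(seed)
    W = rng.random((m, r)) + eps
    V = rng.random((q, r)) + eps
    for _ in range(iters):
        H = U @ V                               # (n, r), constrained coefficients
        Z = (X / (W @ H.T + eps)) ** alpha      # elementwise (X / W(UV)^T)^alpha
        # V update: ratio terms summed against U and W, column sums below.
        V *= ((U.T @ Z.T @ W) /
              (np.outer(U.sum(axis=0), W.sum(axis=0)) + eps)) ** (1.0 / alpha)
        H = U @ V
        Z = (X / (W @ H.T + eps)) ** alpha
        W *= ((Z @ H) / (H.sum(axis=0) + eps)) ** (1.0 / alpha)
    return W, V

rng = np.random.default_rng(1)
X = rng.random((30, 12)) + 0.1
U = np.eye(12)                  # no labeled samples: U = I gives plain alpha-NMF
W, V = cnmf_alpha(X, U, r=4)
err = np.linalg.norm(X - W @ (U @ V).T) / np.linalg.norm(X)
```

The updates are multiplicative, so nonnegativity of W and V is preserved automatically from a positive initialization.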
3.2. Projected Gradient Method. Considering the gradient descent algorithm [24, 25], the updating rules for the objective function (6) can also be derived using the projected gradient method [27]; they take the form

\[
\phi(V_{ij}) \leftarrow \phi(V_{ij}) + \delta_{ij} \frac{\partial D_{\alpha}[X \,\|\, W(UV)^{T}]}{\partial V_{ij}}, \qquad
\phi(W_{ij}) \leftarrow \phi(W_{ij}) + \gamma_{ij} \frac{\partial D_{\alpha}[X \,\|\, W(UV)^{T}]}{\partial W_{ij}}, \tag{13}
\]

where φ(·) is a suitably chosen function and δ_{ij} and γ_{ij} are two parameters that control the step size of the gradient descent. Then we have

\[
V_{ij} \leftarrow \phi^{-1}\!\left( \phi(V_{ij}) + \delta_{ij} \frac{\partial D_{\alpha}[X \,\|\, W(UV)^{T}]}{\partial V_{ij}} \right), \qquad
W_{ij} \leftarrow \phi^{-1}\!\left( \phi(W_{ij}) + \gamma_{ij} \frac{\partial D_{\alpha}[X \,\|\, W(UV)^{T}]}{\partial W_{ij}} \right). \tag{14}
\]

Setting φ(Ω) = Ω^α, to guarantee that the updating rules (11) and (12) hold we need

\[
\delta_{ij} = -\frac{\alpha V_{ij}^{\alpha}}{\sum_{k} [U^{T} W]_{ki}}, \qquad
\gamma_{ij} = -\frac{\alpha W_{ij}^{\alpha}}{\sum_{k} [V^{T} U^{T}]_{jk}}. \tag{15}
\]

From (7) and (8), the updating rules become

\[
V_{ij}^{\alpha} \leftarrow V_{ij}^{\alpha} + \delta_{ij} \frac{\partial D_{\alpha}[X \,\|\, W(UV)^{T}]}{\partial V_{ij}} = V_{ij}^{\alpha}\, \frac{\sum_{k} [U^{T} W]_{ki} \left( X_{kj} / [W V^{T} U^{T}]_{kj} \right)^{\alpha}}{\sum_{l} [U^{T} W]_{li}},
\]
\[
W_{ij}^{\alpha} \leftarrow W_{ij}^{\alpha} + \gamma_{ij} \frac{\partial D_{\alpha}[X \,\|\, W(UV)^{T}]}{\partial W_{ij}} = W_{ij}^{\alpha}\, \frac{\sum_{k} [V^{T} U^{T}]_{jk} \left( X_{ik} / [W V^{T} U^{T}]_{ik} \right)^{\alpha}}{\sum_{l} [V^{T} U^{T}]_{jl}}, \tag{16}
\]

which are the same as the updating rules (11) and (12). We have thus shown that the algorithm can be derived using the Karush-Kuhn-Tucker conditions and have presented an alternative derivation using the projected gradient. The agreement of the two methods theoretically guarantees the correctness of the algorithm.

In the following, we give a theorem that guarantees the convergence of the iterations in the updates (11) and (12).
Theorem 1. For the objective function (6), D_α[X‖W(UV)^T] is nonincreasing under the updating rules (11) and (12). The objective function D_α is invariant under these updates if and only if V and W are at a stationary point.

Multiplicative updates for our constrained algorithm based on NMF_α are given in (11) and (12). These updates find a local minimum of D_α[X‖W(UV)^T], which is the final solution of (11) and (12). Note that when α = 1, the updates (11) and (12) coincide with the CNMF_KL algorithm [15], which is thus a special case of our generic constrained matrix decomposition algorithm. In the following, we give the proof of Theorem 1.
4. Convergence Analysis

To prove Theorem 1, we will make use of an auxiliary function of the kind used in the expectation-maximization algorithm [28, 29].
Definition 2. A function G(x, x̃) is defined as an auxiliary function for F(x) if the following two conditions are both satisfied:

\[
G(x, x) = F(x), \qquad G(x, \tilde{x}) \ge F(x). \tag{17}
\]
Lemma 3. Assume that the function $G(x, x')$ is an auxiliary function for $F(x)$; then $F(x)$ is nonincreasing under the update

$$x^{t+1} = \arg\min_{x} G\left(x, x^{t}\right). \tag{18}$$

Proof. Consider $F(x^{t+1}) \le G(x^{t+1}, x^{t}) \le G(x^{t}, x^{t}) = F(x^{t})$.
It can be observed that the equality $F(x^{t+1}) = F(x^{t})$ holds only if $x^{t}$ is a local minimum of $G(x, x^{t})$. Iterating the update (18), we obtain a sequence of estimates that converges to a local minimum $x_{\min} = \arg\min_{x} F(x)$ of the objective function:

$$F(x_{\min}) \le \cdots \le F\left(x^{t+1}\right) \le F\left(x^{t}\right) \le \cdots \le F\left(x^{2}\right) \le F\left(x^{1}\right) \le F\left(x^{0}\right). \tag{19}$$
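The descent mechanism of (17)-(19) can be seen on a toy problem. The objective below, its majorizer, and the closed-form argmin are our own illustrative choices (keeping the convex quartic term of $f(x) = x^4 - 2x^2$ and linearizing the concave term), not the paper's auxiliary functions (20) and (26); they only demonstrate that iterating $x^{t+1} = \arg\min_x G(x, x^t)$ drives $f$ monotonically downward.

```python
import math

def f(x):
    # toy nonconvex objective with minimizers at x = +/-1
    return x**4 - 2 * x**2

def G(x, xp):
    # auxiliary function: keep the convex x^4 term and linearize the concave
    # -2x^2 term at xp, so G(x, x) = f(x) and G(x, xp) >= f(x), cf. (17)
    return x**4 - 2 * xp**2 - 4 * xp * (x - xp)

def mm_step(xp):
    # x^{t+1} = argmin_x G(x, xp): solving 4x^3 - 4xp = 0 gives x = xp^(1/3)
    return math.copysign(abs(xp) ** (1.0 / 3.0), xp)

x = 2.0
values = [f(x)]
for _ in range(20):
    x = mm_step(x)
    values.append(f(x))

# the descent chain of (19): F(x^{t+1}) <= F(x^t) at every step
assert all(b <= a + 1e-12 for a, b in zip(values, values[1:]))
print(round(x, 4))  # converges toward the minimizer x = 1
```

The inequality $G(x, x') - f(x) = 2(x - x')^2 \ge 0$ plays the role that Jensen's inequality plays for the auxiliary functions in Lemmas 4 and 5 below.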
In the following, we show that the objective function (6) is nonincreasing under the updating rules (11) and (12) by defining appropriate auxiliary functions with respect to $V_{ij}$ and $W_{ij}$.
Lemma 4. The function

$$G(V, V') = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\xi_{ijk}(V') \left[\alpha + (1-\alpha)\,\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(V')} - \left(\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(V')}\right)^{1-\alpha}\right], \tag{20}$$

where $\xi_{ijk}(V') = W_{ik}[V'^{T}U^{T}]_{kj} \big/ \sum_{l} W_{il}[V'^{T}U^{T}]_{lj}$, is an auxiliary function for

$$F(V) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left\{\alpha X_{ij} + (1-\alpha)\left[WV^{T}U^{T}\right]_{ij} - X_{ij}^{\alpha}\left[WV^{T}U^{T}\right]_{ij}^{1-\alpha}\right\}. \tag{21}$$
Proof. Obviously, $G(V, V) = F(V)$. According to the definition of an auxiliary function, we only need to prove $G(V, V') \ge F(V)$. To do this, we use the convex function $f(\cdot)$ for positive $\alpha$ to rewrite the $\alpha$-divergence function $F(V)$ as

$$F(V) = \sum_{ij} X_{ij}\, f\!\left(\frac{\sum_{k} W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}}\right), \tag{22}$$
where

$$f(x) = \frac{1}{\alpha(1-\alpha)}\left[\alpha + (1-\alpha)\,x - x^{1-\alpha}\right]. \tag{23}$$
Note that $\sum_{k}\xi_{ijk}(V') = 1$ and $\xi_{ijk}(V') \ge 0$ by the definition of $\xi_{ijk}(V')$. Applying Jensen's inequality [30] leads to
$$f\!\left(\sum_{k} W_{ik}[V^{T}U^{T}]_{kj}\right) \le \sum_{k}\xi_{ijk}(V')\, f\!\left(\frac{W_{ik}[V^{T}U^{T}]_{kj}}{\xi_{ijk}(V')}\right). \tag{24}$$
From the above inequality it follows that

$$F(V) = \sum_{ij} X_{ij}\, f\!\left(\frac{\sum_{k} W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}}\right) \le \sum_{ijk} X_{ij}\,\xi_{ijk}(V')\, f\!\left(\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(V')}\right) = G(V, V'), \tag{25}$$

which satisfies the condition of an auxiliary function.
Reversing the roles of $V_{ij}$ and $W_{ij}$ in Lemma 4, we define an auxiliary function for the update (12).
Lemma 5. The function

$$G(W, W') = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\eta_{ijk}(W') \left[\alpha + (1-\alpha)\,\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\eta_{ijk}(W')} - \left(\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\eta_{ijk}(W')}\right)^{1-\alpha}\right], \tag{26}$$

where $\eta_{ijk}(W') = W'_{ik}[V^{T}U^{T}]_{kj} \big/ \sum_{l} W'_{il}[V^{T}U^{T}]_{lj}$, is an auxiliary function for

$$F(W) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left\{\alpha X_{ij} + (1-\alpha)\left[WV^{T}U^{T}\right]_{ij} - X_{ij}^{\alpha}\left[WV^{T}U^{T}\right]_{ij}^{1-\alpha}\right\}. \tag{27}$$
This can be proved in the same way as Lemma 4. From Lemmas 4 and 5, we can now demonstrate the convergence claimed in Theorem 1.
Proof. To guarantee that $F(V)$ is nonincreasing, by Lemma 3 we just need to obtain the minimum of $G(V, V')$ with respect to $V_{ij}$. Setting the gradient of (20) to zero, we have

$$\frac{\partial G(V, V')}{\partial V_{ij}} = \frac{1}{\alpha}\sum_{k}[U^{T}W]_{ki}\left[1 - \left(\frac{W_{ki}[V^{T}U^{T}]_{ij}}{X_{kj}\,\xi_{kji}(V')}\right)^{-\alpha}\right] = 0. \tag{28}$$
Then it follows that

$$V_{ij} = V'_{ij}\left[\frac{\sum_{k}[U^{T}W]_{ki}\left(X_{kj}\big/\sum_{l}W_{kl}[V'^{T}U^{T}]_{lj}\right)^{\alpha}}{\sum_{k}[U^{T}W]_{ki}}\right]^{1/\alpha}, \tag{29}$$
which has the form of the updating rule (11). Similarly, to guarantee that the updating rule (12) holds, the minimum of $G(W, W')$, which can be determined by setting the gradient of (26) to zero, must exist.
Since $G$ is an auxiliary function, according to Lemma 4, $F$ in (21) is nonincreasing under the update (11). Applying updates (11) and (12) alternately, we can find a local minimum of $D_{\alpha}[X \parallel W(UV)^{T}]$.
5. Experiments
In this section, the CNMF$_{\alpha}$ algorithm is systematically compared with current constrained NMF algorithms on three image data sets: the ORL Database [31], the Yale Database [32], and the Caltech 101 Database [33]. The details of these three databases are described individually later. We first introduce the evaluated algorithms.
(i) The constrained nonnegative matrix factorization algorithm (CNMF) in [15], which minimizes the F-norm cost.

(ii) The constrained algorithm of this paper with parameter $\alpha = 0.5$, which minimizes the Hellinger divergence cost.

(iii) The constrained algorithm with parameter $\alpha = 1$ in [15], which minimizes the KL-divergence cost and is the best reported algorithm for image representation.

(iv) The constrained algorithm of this paper with parameter $\alpha = 2$, which minimizes the $\chi^{2}$-divergence cost.

(v) The constrained algorithm of this paper with parameter $\alpha = \alpha_{k}$, which minimizes the $\alpha$-divergence cost, where the parameters $\alpha_{k}$ are associated with the characteristics of the image database and designed by the presented method. The CNMF$_{\mathrm{KL}}$ algorithm is the special case of our CNMF$_{\alpha_{k}}$ algorithm with $\alpha = \alpha_{k} = 1$.
We apply these algorithms to a classification problem and evaluate their performance on three image data sets, each containing a number of different categories of images. For each data set, the evaluations are conducted with different numbers of clusters; here the number of clusters $k$ varies from 2 to 10. We randomly choose $k$ categories from one image data set and mix the images of these $k$ categories as the collection $X$. Then, for the semisupervised algorithms, we randomly pick 10 percent of the images from each category in $X$ and record their category numbers as the available label information to obtain the label matrix $U$. For some special data sets, the labeling process is different, and we describe the details later.
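The label matrix $U$ mentioned above is assembled from an indicator block for the labeled samples and an identity block for the unlabeled ones, following our reading of the construction in [15]. The sketch below builds such a matrix from partial labels; the function name and the convention that every unlabeled sample gets its own identity column are our illustrative assumptions.

```python
import numpy as np

def label_constraint_matrix(labels, n_classes):
    """Build the label constraint matrix U from partially labeled data.

    labels : length-n sequence; labels[i] is the class index of sample i,
             or None when sample i is unlabeled.
    Labeled samples share columns through the indicator block S, and each
    unlabeled sample receives its own identity column, so U has shape
    (n, n_classes + number_of_unlabeled_samples).
    """
    n = len(labels)
    unlabeled = [i for i, y in enumerate(labels) if y is None]
    U = np.zeros((n, n_classes + len(unlabeled)))
    for i, y in enumerate(labels):
        if y is not None:
            U[i, y] = 1.0                     # indicator block S
    for col, i in enumerate(unlabeled):
        U[i, n_classes + col] = 1.0           # identity block I
    return U
```

With this construction, two samples labeled with the same category are forced to share the same row of $UV$, which is how the hard label constraint enters the factorization $X \approx W(UV)^{T}$.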
Suppose a data set has $N$ categories $C_{1}, C_{2}, \ldots, C_{N}$ and the cardinalities of the labeled images are $L_{1}, L_{2}, \ldots, L_{N}$, respectively. Since the label constraint matrix $U$ is composed of the indicator matrix $S$ and the identity matrix $I$, the indicator matrix $S$ plays a critical role in classification performance for the different categories in $X$. To determine the effectiveness of $S$, we define a measure to represent the relationship between the cardinalities of the labeled samples and the total samples:
$$r(k) = \frac{L_{\max}(k) - L_{\min}(k)}{\sum_{i=1}^{k} C_{i}\big/ k}, \tag{30}$$
where $L_{\max}(k)$ and $L_{\min}(k)$ denote the maximum and minimum labeled cardinalities among the $k$ categories. For a fixed cluster number $k$ in the data set, $r(k)$ differs if the number of samples in each category differs. Then we compute $\alpha$ as follows:
$$\alpha_{k} = \frac{\alpha_{N}}{r(N)}\, \left|r(k) - r(N)\right| \left(\left|\frac{L_{\max}(k)\,k}{\sum_{i=1}^{k} C_{i}} - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}}\right| + \left|\frac{L_{\min}(k)\,k}{\sum_{i=1}^{k} C_{i}} - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}}\right|\right)^{-1}, \tag{31}$$
where

$$\alpha_{N} = \left(L_{\max}(N) - L_{\min}(N)\right) \left(\left|L_{\max}(N) - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}}\right| + \left|L_{\min}(N) - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}}\right|\right)^{-1}. \tag{32}$$
The value of $\alpha_{k}$ computed by (31) is associated with the characteristics of the image data sets, since its variation is caused by both the cardinalities of the labeled samples in each category and the total samples. We can obtain both of these quantities in our semisupervised algorithms. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of $\alpha$ varies depending on the data set; how to select the optimal $\alpha$ is still an open problem [16].
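The selection rule (30)-(32) can be written out directly. The sketch below follows our reconstruction of those formulas; the function and variable names are ours, and the degenerate case $r(k) = r(N)$ (handled in the experiments by fixed values such as $\alpha_k = 0.67$) is not treated here.

```python
def select_alpha_k(labels_k, sizes_k, labels_N, sizes_N):
    """Sketch of the parameter selection in (30)-(32).

    labels_k / sizes_k : labeled counts L_i and category sizes C_i for the
    k chosen categories; labels_N / sizes_N : the same quantities for all
    N categories of the data set.
    """
    def r(labels, sizes):
        # (30): labeled-cardinality range over the average category size
        k = len(labels)
        return (max(labels) - min(labels)) / (sum(sizes) / k)

    ratio_N = sum(labels_N) / sum(sizes_N)     # overall labeled fraction
    # (32): normalizing constant alpha_N for the full data set
    alpha_N = (max(labels_N) - min(labels_N)) / (
        abs(max(labels_N) - ratio_N) + abs(min(labels_N) - ratio_N))

    # (31): scale |r(k) - r(N)| by alpha_N / r(N) and the deviation terms
    k = len(labels_k)
    num = abs(r(labels_k, sizes_k) - r(labels_N, sizes_N))
    den = (abs(max(labels_k) * k / sum(sizes_k) - ratio_N)
           + abs(min(labels_k) * k / sum(sizes_k) - ratio_N))
    return alpha_N / r(labels_N, sizes_N) * num / den
```

Changing either the labeled counts or the category sizes changes the returned value, which is exactly the data-set dependence discussed above.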
To evaluate the classification performance, we define classification accuracy as the first measure. Our CNMF$_{\alpha}$
algorithm described in (11) and (12) provides a classification label for each sample, denoted $s_{il_{i}}$. Suppose $X = \{x_{i}\}_{i=1}^{n}$ is a data set consisting of $n$ training images. For each sample, let $l_{i}$ be the true class label provided by the image data set. More specifically, if the image $x_{i}$ is assigned to the $l_{i}$th class, we count it as a correct label and set $s_{il_{i}} = 1$; otherwise, it is counted as a false label and $s_{il_{i}} = 0$. Finally, we compute the percentage of correct labels by defining the accuracy measure as

$$\mathrm{Ac}\left(s_{il_{i}}\right) = \frac{\sum_{i=1}^{n} s_{il_{i}}}{n}. \tag{33}$$
As the second measure of classification performance, we compute the normalized mutual information, which measures how similar two sets of clusters are. Given two sets of clusters $C_{i}$ and $C_{j}$, their normalized mutual information is defined as

$$\mathrm{NMI}\left(C_{i}, C_{j}\right) = \frac{\sum_{c_{i}\in C_{i},\, c_{j}\in C_{j}} p\left(c_{i}, c_{j}\right) \cdot \log \left[ p\left(c_{i}, c_{j}\right) \big/ \left(p\left(c_{i}\right) p\left(c_{j}\right)\right)\right]}{\max\left(H\left(C_{i}\right), H\left(C_{j}\right)\right)}, \tag{34}$$

which takes values between 0 and 1. Here $p(c_{i})$ and $p(c_{j})$ denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters $c_{i}$ and $c_{j}$, respectively, and $p(c_{i}, c_{j})$ denotes the joint probability that this arbitrarily selected image belongs to both clusters at the same time; $H(C_{i})$ and $H(C_{j})$ are the entropies of $C_{i}$ and $C_{j}$, respectively. Experimental results on each data set are presented as classification accuracy and normalized mutual information, as in Tables 1 and 2.
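The two evaluation measures (33) and (34) can be sketched in a few lines. This is a minimal self-contained version for illustration; in practice one would typically call `sklearn.metrics.normalized_mutual_info_score` with `average_method="max"`, which should correspond to the max-entropy normalization in (34).

```python
import math
from collections import Counter

def accuracy(correct_flags):
    # (33): Ac = sum_i s_{i l_i} / n, the fraction of correctly labeled samples
    return sum(correct_flags) / len(correct_flags)

def nmi(labels_a, labels_b):
    """Normalized mutual information, cf. (34): I(A; B) / max(H(A), H(B))."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))
    # mutual information from empirical marginal and joint frequencies
    mi = sum((c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in pab.items())
    entropy = lambda counts: -sum((c / n) * math.log(c / n)
                                  for c in counts.values())
    denom = max(entropy(pa), entropy(pb))
    return mi / denom if denom > 0 else 1.0
```

Note that NMI is invariant to a permutation of the cluster indices: a clustering that exactly mirrors the true partition scores 1 even if the labels are renamed.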
5.1. ORL Database. The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. The images of some people were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position with slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to 32 × 32 pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes in the facial areas are aligned at the same position.
There are 10 images for each category in ORL, and 10 percent is just one image. For the fixed parameters $\alpha$ ($\alpha = 0.5, 1, 2$), we randomly choose two images from each category to provide the label information. Note that identical labeled cardinalities are meaningless for (30). To obtain $r(k)$, we divide the 40 categories into 3 groups of 10, 20, and 10 categories. In the first 10 categories, we pick 1 image from each category to provide the label information; we pick 2 images from each category in the second 20 categories and 3 from each category in the remaining categories. The dividing process is repeated 10 times, and the average classification accuracy obtained is recorded as the final result.
Figure 1 shows the classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection $X$ come from the same group, we set $\alpha_{k} = 0.67$. Because each category has the same number of samples, the variation of $\alpha$ is small even though we label different cardinalities of samples. Compared with the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF$_{\alpha_{k}}$ algorithm gives the best performance, since the selection of $\alpha_{k}$ is suited to the collection $X$. Table 1 summarizes the detailed classification accuracy and the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_{k}}$. It shows that our CNMF$_{\alpha_{k}}$ algorithm achieves a 1.92 percent improvement over the best reported CNMF$_{\mathrm{KL}}$ algorithm ($\alpha = 1$) [15] in average classification accuracy. For normalized mutual information, the details and the error bars of our constrained algorithms with $\alpha = 0.5$, $\alpha = 2$, and $\alpha_{k}$ are listed in Table 2. Compared with the best baseline, CNMF, our CNMF$_{\alpha_{k}}$ algorithm achieves a 0.54 percent improvement.
5.2. Yale Database. The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person. Each image was taken with a different facial expression or configuration: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database. Each image is linearly stretched to a 1024-dimensional vector in image space.
The Yale Database also has the same number of samples in each category. To obtain an appropriate $\alpha$, we perform label processing similar to that for the ORL Database: divide the 15 individuals evenly into 3 groups; choose 1 image from each category in the first group, 2 from each category in the second, and 3 from each category in the remaining group. We repeat the process 10 times and record the average classification accuracy as the final result.
Figure 2 shows the classification accuracy and normalized mutual information on the Yale Database. We set $\alpha_{k} = 0.55$ when $r(k) - r(N) = 0$, which indicates that the samples in the collection $X$ come from just two groups, that is, 15 images chosen from 10 categories in the first and second groups. CNMF$_{\alpha_{k}}$ achieves an extraordinary performance in all cases, and CNMF$_{\alpha=0.5}$ follows. This suggests that the constrained nonnegative matrix factorization algorithm has a higher classification accuracy when the value of $\alpha$ is close to $\alpha_{k}$. Compared with the best reported CNMF$_{\mathrm{KL}}$ algorithm, CNMF$_{\alpha_{k}}$ achieves a 2.42 percent improvement in average classification accuracy. For normalized mutual information, CNMF$_{\alpha_{k}}$ achieves a 7.2 percent improvement over CNMF. The details of the classification accuracy and normalized mutual information are provided in Tables 3 and 4, which contain the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_{k}}$.
5.3. Caltech 101 Database. The Caltech 101 Database, created at Caltech, has images of 101 different object
Table 1: The comparison of classification accuracy rates on the ORL Database.

k   | CNMF  | CNMF_{α=1} | CNMF_{α=0.5} | CNMF_{α=2}   | CNMF_{α_k}
2   | 93.90 | 92.95      | 93.51 ± 0.09 | 92.14 ± 0.18 | 93.89 ± 0.91
3   | 86.20 | 84.33      | 85.71 ± 0.83 | 82.26 ± 1.23 | 86.21 ± 2.07
4   | 84.55 | 84.05      | 84.74 ± 0.56 | 83.02 ± 0.47 | 84.78 ± 0.36
5   | 80.16 | 78.38      | 80.10 ± 0.22 | 80.15 ± 0.18 | 80.16 ± 0.29
6   | 76.72 | 76.73      | 77.13 ± 0.28 | 77.60 ± 0.19 | 78.09 ± 0.40
7   | 81.14 | 79.04      | 80.34 ± 0.81 | 78.07 ± 0.88 | 82.15 ± 1.95
8   | 78.08 | 77.02      | 78.66 ± 0.69 | 78.77 ± 0.94 | 79.45 ± 1.33
9   | 83.19 | 82.23      | 83.57 ± 0.36 | 81.22 ± 0.21 | 85.19 ± 1.21
10  | 77.03 | 76.88      | 77.69 ± 0.12 | 77.75 ± 0.19 | 78.96 ± 0.73
Avg | 82.33 | 81.29      | 82.38        | 81.22        | 83.21

(Classification accuracy Ac(s_{il_i}) in percent.)
Table 2: The comparison of normalized mutual information on the ORL Database.

k   | CNMF  | CNMF_{α=1} | CNMF_{α=0.5} | CNMF_{α=2}   | CNMF_{α_k}
2   | 78.24 | 75.82      | 76.01 ± 0.14 | 73.94 ± 0.11 | 77.67 ± 0.58
3   | 76.61 | 75.36      | 75.38 ± 0.41 | 74.50 ± 0.53 | 75.92 ± 1.55
4   | 78.69 | 77.82      | 77.96 ± 0.47 | 77.04 ± 0.54 | 79.27 ± 1.26
5   | 76.69 | 74.27      | 75.32 ± 0.80 | 72.62 ± 0.87 | 75.35 ± 1.67
6   | 76.52 | 76.09      | 76.21 ± 0.12 | 76.09 ± 0.67 | 77.60 ± 1.82
7   | 82.45 | 81.35      | 82.36 ± 0.42 | 81.35 ± 0.26 | 83.08 ± 1.59
8   | 80.57 | 79.92      | 80.11 ± 0.73 | 79.91 ± 0.68 | 82.49 ± 1.97
9   | 86.03 | 85.67      | 86.73 ± 0.70 | 85.67 ± 0.37 | 88.05 ± 1.15
10  | 81.77 | 81.07      | 82.07 ± 0.12 | 81.07 ± 0.12 | 82.97 ± 1.71
Avg | 79.73 | 78.60      | 79.13        | 78.02        | 80.27

(Normalized mutual information in percent.)
categories. Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using codewords generated from SIFT features [34]: we obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.
We randomly select $k$ categories from the Faces-Easy category in the Caltech 101 database and convert the images to gray-scale of size 32 × 32. The labeling process is repeated 10 times, and the obtained values of $\alpha_{k}$ computed by (31) are listed in Table 7. The variation of $\alpha$ within the same $k$ categories is great; that is, selecting an appropriate $\alpha$ plays a critical role for the mixture of different categories in one image data set, especially when the number of samples in each category differs. The choice of $\alpha_{k}$ can fully reflect the effectiveness of the indicator matrix $S$. Figure 3 shows the effect of using the proposed algorithm with $\alpha_{k}$: the upper part of the figure shows the original samples, which contain 26 images; the middle part shows their gray images; and the lower part shows the combination of the basis vectors learned by CNMF$_{\alpha_{k}}$.
The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_{k}}$. The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters $\alpha_{k}$ listed in Table 7 are selected. In general, our method demonstrates much better classification effectiveness by choosing $\alpha_{k}$. Compared with the best reported algorithm other than our CNMF$_{\alpha}$ algorithm, CNMF$_{\alpha_{k}}$ achieves a 2.4 percent improvement in average classification accuracy, and compared with the CNMF algorithm [15], it achieves an 8.77 percent improvement. For normalized mutual information, CNMF$_{\alpha_{k}}$ achieves a 2.29 percent improvement and consistently outperforms the other algorithms.
6. Conclusion
In this paper, we presented a generic constrained nonnegative matrix factorization algorithm by imposing label information as an additional hard constraint on the $\alpha$-divergence-NMF algorithm. The proposed algorithm can be derived using the
Table 3: The comparison of classification accuracy rates on the Yale Database.

k   | CNMF  | CNMF_{α=1} | CNMF_{α=0.5} | CNMF_{α=2}   | CNMF_{α_k}
2   | 81.64 | 85.23      | 85.93 ± 0.65 | 84.18 ± 0.81 | 86.27 ± 2.46
3   | 68.48 | 76.91      | 77.14 ± 0.86 | 77.26 ± 0.42 | 79.01 ± 1.09
4   | 64.25 | 66.70      | 68.11 ± 0.67 | 66.12 ± 0.16 | 69.33 ± 2.23
5   | 65.82 | 67.71      | 69.14 ± 1.21 | 66.54 ± 1.01 | 70.57 ± 2.22
6   | 60.12 | 64.26      | 65.62 ± 0.94 | 63.70 ± 0.94 | 66.92 ± 1.41
7   | 59.25 | 62.47      | 63.97 ± 0.72 | 59.83 ± 0.38 | 65.23 ± 1.77
8   | 54.40 | 59.26      | 60.12 ± 0.31 | 56.76 ± 0.07 | 60.89 ± 1.28
9   | 54.85 | 57.30      | 58.08 ± 0.12 | 54.88 ± 0.14 | 58.98 ± 0.62
10  | 52.68 | 56.20      | 57.56 ± 0.03 | 53.83 ± 0.01 | 57.56 ± 0.73
Avg | 62.39 | 65.89      | 67.30        | 64.79        | 68.31

(Classification accuracy Ac(s_{il_i}) in percent.)
Table 4: The comparison of normalized mutual information on the Yale Database.

k   | CNMF  | CNMF_{α=1} | CNMF_{α=0.5} | CNMF_{α=2}   | CNMF_{α_k}
2   | 48.99 | 59.66      | 59.73 ± 0.39 | 58.17 ± 0.17 | 61.11 ± 1.82
3   | 42.90 | 53.52      | 53.58 ± 0.73 | 52.75 ± 0.93 | 54.49 ± 1.74
4   | 47.80 | 53.60      | 53.66 ± 0.18 | 52.99 ± 0.65 | 54.58 ± 1.64
5   | 53.99 | 57.49      | 57.56 ± 0.07 | 56.21 ± 0.06 | 58.58 ± 2.32
6   | 50.63 | 55.88      | 56.15 ± 0.07 | 55.40 ± 0.08 | 57.62 ± 1.09
7   | 53.03 | 57.60      | 57.88 ± 0.22 | 57.27 ± 0.16 | 59.05 ± 1.89
8   | 50.83 | 56.52      | 56.93 ± 0.13 | 56.20 ± 0.13 | 58.98 ± 1.46
9   | 53.08 | 55.30      | 55.98 ± 0.73 | 54.81 ± 0.81 | 56.15 ± 1.20
10  | 52.84 | 56.73      | 57.43 ± 0.16 | 56.41 ± 0.41 | 58.30 ± 1.11
Avg | 50.45 | 56.26      | 56.54        | 55.58        | 57.65

(Normalized mutual information in percent.)
[Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the measure against the number of classes (2-10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
[Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the measure against the number of classes (2-10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
[Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) is the original images, (b) is the gray images, and (c) is the basis vectors learned by CNMF_{α_k}. Positive values are illustrated with white pixels.]
Karush-Kuhn-Tucker conditions, and we presented an alternative derivation using the projected gradient. The use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter $\alpha$. Since the $\alpha$-divergence is a parametric discrepancy measure and the parameter $\alpha$ is associated with the characteristics of a learning machine, the model distribution is more inclusive as $\alpha$ goes to $+\infty$ and more exclusive as $\alpha$ approaches $-\infty$. The selection of the optimal value of $\alpha$ plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter $\alpha$ for our semisupervised CNMF$_{\alpha}$ algorithm. The variation of $\alpha$ is caused by both the cardinalities of labeled samples in each category and the total
[Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the measure against the number of classes (2-10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
Table 5: The comparison of classification accuracy rates on the Caltech 101 Database.

k   | CNMF  | CNMF_{α=1} | CNMF_{α=0.5} | CNMF_{α=2}   | CNMF_{α_k}
2   | 66.39 | 78.80      | 78.31 ± 1.69 | 80.10 ± 2.07 | 82.76 ± 3.56
3   | 60.91 | 69.98      | 68.77 ± 1.57 | 72.94 ± 1.51 | 73.59 ± 3.51
4   | 50.82 | 57.50      | 58.19 ± 1.98 | 56.28 ± 1.96 | 59.89 ± 2.82
5   | 50.83 | 53.67      | 53.71 ± 1.02 | 54.12 ± 1.44 | 55.06 ± 2.63
6   | 50.49 | 55.05      | 55.63 ± 0.89 | 58.04 ± 1.47 | 58.97 ± 1.97
7   | 46.09 | 51.12      | 50.04 ± 0.11 | 51.73 ± 0.25 | 52.36 ± 1.81
8   | 44.21 | 50.50      | 51.72 ± 0.51 | 48.37 ± 1.59 | 52.64 ± 1.52
9   | 41.34 | 46.58      | 47.85 ± 0.28 | 45.87 ± 0.30 | 49.54 ± 1.23
10  | 40.80 | 46.02      | 46.02 ± 0.05 | 45.76 ± 0.08 | 46.02 ± 0.26
Avg | 50.21 | 56.58      | 56.69        | 57.02        | 58.98

(Classification accuracy Ac(s_{il_i}) in percent.)
Table 6: The comparison of normalized mutual information on the Caltech 101 Database.

k   | CNMF  | CNMF_{α=1} | CNMF_{α=0.5} | CNMF_{α=2}   | CNMF_{α_k}
2   | 21.05 | 38.17      | 38.20 ± 0.18 | 37.48 ± 0.21 | 40.08 ± 2.19
3   | 31.62 | 40.63      | 40.68 ± 0.15 | 40.05 ± 0.14 | 41.88 ± 1.24
4   | 27.60 | 34.36      | 34.41 ± 0.79 | 34.12 ± 0.91 | 36.41 ± 1.17
5   | 34.46 | 35.82      | 36.00 ± 0.99 | 35.42 ± 1.01 | 37.20 ± 1.78
6   | 34.85 | 38.97      | 39.16 ± 0.86 | 38.53 ± 1.11 | 39.64 ± 1.83
7   | 34.88 | 37.67      | 37.86 ± 0.47 | 37.47 ± 0.49 | 39.86 ± 2.58
8   | 32.63 | 37.43      | 37.71 ± 0.39 | 37.22 ± 0.42 | 37.91 ± 2.18
9   | 31.20 | 35.97      | 36.21 ± 0.60 | 35.86 ± 0.63 | 37.73 ± 0.67
10  | 31.21 | 35.71      | 36.06 ± 0.02 | 35.68 ± 0.03 | 37.48 ± 0.22
Avg | 31.05 | 37.19      | 37.37        | 36.87        | 38.69

(Normalized mutual information in percent.)
Table 7: The value of α in k categories.

k  | α_1  | α_2  | α_3  | α_4  | α_5  | α_6  | α_7  | α_8  | α_9  | α_10
2  | 2.07 | 2.08 | 2.22 | 2.27 | 2.57 | 3.52 | 3.97 | 3.98 | 4.37 | 8.96
3  | 0.45 | 0.55 | 0.66 | 1.07 | 1.95 | 2.31 | 2.31 | 3.54 | 4.24 | 7.54
4  | 0.30 | 0.32 | 0.35 | 0.86 | 0.98 | 1.27 | 1.65 | 2.10 | 3.27 | 5.80
5  | 0.45 | 0.49 | 0.51 | 0.71 | 0.80 | 1.13 | 2.05 | 2.55 | 2.82 | 3.89
6  | 0.07 | 0.56 | 0.73 | 1.08 | 1.26 | 1.46 | 2.01 | 2.12 | 3.91 | 8.52
7  | 0.71 | 0.94 | 1.12 | 1.14 | 1.16 | 1.26 | 1.27 | 1.37 | 1.57 | 8.97
8  | 0.12 | 0.30 | 0.50 | 0.64 | 0.66 | 0.69 | 0.78 | 0.79 | 0.97 | 1.11
9  | 0.78 | 0.79 | 2.01 | 0.10 | 0.17 | 0.28 | 0.29 | 0.30 | 0.31 | 0.39
10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
samples. In the experiments, we applied the fixed parameters $\alpha$ and $\alpha_{k}$ to analyze the classification accuracy on three image databases. The experimental results demonstrate that the CNMF$_{\alpha_{k}}$ algorithm has the best classification accuracy. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of $\alpha$ varies depending on the data set. How to select the optimal $\alpha$ for a specific image data set is still an open problem.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61375038, 11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).
References
[1] I Biederman ldquoRecognition-by-components a theory of humanimage understandingrdquo Psychological Review vol 94 no 2 pp115ndash147 1987
[2] S Ullman High-Level Vision Object Recognition and VisualCognition MIT Press Cambridge Mass USA 1996
[3] P Paatero and U Tapper ldquoPositive matrix factorization a non-negative factormodel with optimal utilization of error estimatesof data valuesrdquo Environmetrics vol 5 no 2 pp 111ndash126 1994
[4] I T Jolliffe Principle Component Analysis Springer BerlinGermany 1986
[5] R O Duda P E Hart and D G Stork Pattern ClassificationWiley-Interscience New York NY USA 2001
[6] A Gersho and R M Gray Vector Quantization and SignalCompression Kluwer Academic Publishers 1992
[7] D D Lee and H S Seung ldquoLearning the parts of objects bynon-negative matrix factorizationrdquo Nature vol 401 no 6755pp 788ndash791 1999
[8] M Chu F Diele R Plemmons and S Ragni ldquoOptimalitycomputation and interpretation of nonnegative matrix factor-izationsrdquo SIAM Journal on Matrix Analysis pp 4ndash8030 2004
[9] S Z Li X W Hou H J Zhang and Q S Cheng ldquoLearningspatially localized parts-based representationrdquo inProceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition (CVPR rsquo01) pp I207ndashI212 Kauai HawaiiUSA December 2001
[10] Y Wang Y Jia H U Changbo and M Turk ldquoNon-negativematrix factorization framework for face recognitionrdquo Interna-tional Journal of Pattern Recognition and Artificial Intelligencevol 19 no 4 pp 495ndash511 2005
[11] YWang Y Jia CHu andM Turk ldquoFisher non-negativematrixfactorization for learning local featuresrdquo in Proceedings of theAsian Conference on Computer Vision Jeju Island Republic ofKorea 2004
[12] J H Ahn S Kim J H Oh and S Choi ldquoMultiple nonnegative-matrix factorization of dynamic PET imagesrdquo in Proceedings ofthe Asian Conference on Computer Vision 2004
[13] J S Lee D D Lee S Choi and D S Lee ldquoApplication of nonnegative matrix factorization to dynamic positron emissiontomographyrdquo in Proceedings of the International Conference onIndependent Component Analysis and Blind Signal Separationpp 629ndash632 San Diego Calif USA 2001
[14] H Lee A Cichocki and S Choi ldquoNonnegative matrix factor-ization for motor imagery EEG classificationrdquo in Proceedingsof the International Conference on Artificial Neural NetworksSpringer Athens Greece 2006
[15] H Liu Z Wu X Li D Cai and T S Huang ldquoConstrainednonnegative matrix factorization for image representationrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 34 no 7 pp 1299ndash1311 2012
[16] A Cichocki H Lee Y-D Kim and S Choi ldquoNon-negativematrix factorization with 120572-divergencerdquo Pattern RecognitionLetters vol 29 no 9 pp 1433ndash1440 2008
[17] L K Saul F Sha and D D Lee ldquoTatistical signal processingwith nonnegativity constraintsrdquo Proceedings of EuroSpeech vol2 pp 1001ndash1004 2003
[18] P M Kim and B Tidor ldquoSubsystem identification throughdimensionality reduction of large-scale gene expression datardquoGenome Research vol 13 no 7 pp 1706ndash1718 2003
[19] V P Pauca J Piper and R J Plemmons ldquoNonnegative matrixfactorization for spectral data analysisrdquo Linear Algebra and itsApplications vol 416 no 1 pp 29ndash47 2006
[20] P Smaragdis and J C Brown ldquoNon-negative matrix factor-ization for polyphonic music transcriptionrdquo in Proceedings of
12 Discrete Dynamics in Nature and Society
the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V621–V624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. Kjærgaard Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] ORL Face Database, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[32] Yale Face Database, http://vision.ucsd.edu/content/yale-face-database.
[33] Caltech 101 Database, http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
3.1. KKT Method. Let $\lambda_{ij} \ge 0$ and $\mu_{ij} \ge 0$ be the Lagrangian multipliers associated with the constraints $V_{ij} \ge 0$ and $W_{ij} \ge 0$, respectively. The Karush-Kuhn-Tucker conditions require that both the optimality conditions

$$\frac{\partial D_{\alpha}\left[X \,\|\, W(UV)^{T}\right]}{\partial V_{ij}} = \frac{1}{\alpha}\left[\sum_{k}\left[U^{T}W\right]_{ki} - \sum_{k}\left[U^{T}W\right]_{ki}\left(\frac{X_{kj}}{\left[WV^{T}U^{T}\right]_{kj}}\right)^{\alpha}\right] = \lambda_{ij}, \tag{7}$$

$$\frac{\partial D_{\alpha}\left[X \,\|\, W(UV)^{T}\right]}{\partial W_{ij}} = \frac{1}{\alpha}\left[\sum_{k}\left[V^{T}U^{T}\right]_{jk} - \sum_{k}\left[V^{T}U^{T}\right]_{jk}\left(\frac{X_{ik}}{\left[WV^{T}U^{T}\right]_{ik}}\right)^{\alpha}\right] = \mu_{ij}, \tag{8}$$

and the complementary slackness conditions

$$\lambda_{ij}V_{ij} = 0, \qquad \mu_{ij}W_{ij} = 0 \tag{9}$$

are satisfied. If $V_{ij} = 0$ and $W_{ij} = 0$, then $\lambda_{ij}$ and $\mu_{ij}$ can take any values. At the same time, if $\lambda_{ij} = 0$ and $\mu_{ij} = 0$, then $V_{ij} \ge 0$ and $W_{ij} \ge 0$. Hence we need both $\lambda_{ij} = 0$, $\mu_{ij} = 0$ and $V_{ij} = 0$, $W_{ij} = 0$. It follows from (9) that

$$\lambda_{ij}V_{ij}^{\alpha} = 0, \qquad \mu_{ij}W_{ij}^{\alpha} = 0. \tag{10}$$

We multiply both sides of (7) and (8) by $V_{ij}$ and $W_{ij}$, respectively, and incorporate (10); we then obtain the following updating rules:

$$V_{ij} \longleftarrow V_{ij}\left[\frac{\sum_{k}\left[U^{T}W\right]_{ki}\left(X_{kj}/\left[WV^{T}U^{T}\right]_{kj}\right)^{\alpha}}{\sum_{l}\left[U^{T}W\right]_{li}}\right]^{1/\alpha}, \tag{11}$$

$$W_{ij} \longleftarrow W_{ij}\left[\frac{\sum_{k}\left[V^{T}U^{T}\right]_{jk}\left(X_{ik}/\left[WV^{T}U^{T}\right]_{ik}\right)^{\alpha}}{\sum_{l}\left[V^{T}U^{T}\right]_{jl}}\right]^{1/\alpha}. \tag{12}$$
3.2. Projected Gradient Method. Considering the gradient descent algorithm [24, 25], the updating rules for the objective function (6) can also be derived by using the projected gradient method [27] and have the form

$$\phi\left(V_{ij}\right) \longleftarrow \phi\left(V_{ij}\right) + \delta_{ij}\frac{\partial D_{\alpha}\left[X \,\|\, W(UV)^{T}\right]}{\partial V_{ij}}, \qquad \phi\left(W_{ij}\right) \longleftarrow \phi\left(W_{ij}\right) + \gamma_{ij}\frac{\partial D_{\alpha}\left[X \,\|\, W(UV)^{T}\right]}{\partial W_{ij}}, \tag{13}$$

where $\phi(\cdot)$ is a suitably chosen function and $\delta_{ij}$ and $\gamma_{ij}$ are two parameters that control the step size of the gradient descent. Then we have

$$V_{ij} \longleftarrow \phi^{-1}\left(\phi\left(V_{ij}\right) + \delta_{ij}\frac{\partial D_{\alpha}\left[X \,\|\, W(UV)^{T}\right]}{\partial V_{ij}}\right), \qquad W_{ij} \longleftarrow \phi^{-1}\left(\phi\left(W_{ij}\right) + \gamma_{ij}\frac{\partial D_{\alpha}\left[X \,\|\, W(UV)^{T}\right]}{\partial W_{ij}}\right). \tag{14}$$

Setting $\phi(\Omega) = \Omega^{\alpha}$, to guarantee that the updating rules (11) and (12) hold we need

$$\delta_{ij} = -\frac{\alpha V_{ij}^{\alpha}}{\sum_{k}\left[U^{T}W\right]_{ki}}, \qquad \gamma_{ij} = -\frac{\alpha W_{ij}^{\alpha}}{\sum_{k}\left[V^{T}U^{T}\right]_{jk}}. \tag{15}$$

From (7) and (8), the updating rules become

$$V_{ij}^{\alpha} \longleftarrow V_{ij}^{\alpha} + \delta_{ij}\frac{\partial D_{\alpha}\left[X \,\|\, W(UV)^{T}\right]}{\partial V_{ij}} = V_{ij}^{\alpha}\,\frac{\sum_{k}\left[U^{T}W\right]_{ki}\left(X_{kj}/\left[WV^{T}U^{T}\right]_{kj}\right)^{\alpha}}{\sum_{l}\left[U^{T}W\right]_{li}},$$

$$W_{ij}^{\alpha} \longleftarrow W_{ij}^{\alpha} + \gamma_{ij}\frac{\partial D_{\alpha}\left[X \,\|\, W(UV)^{T}\right]}{\partial W_{ij}} = W_{ij}^{\alpha}\,\frac{\sum_{k}\left[V^{T}U^{T}\right]_{jk}\left(X_{ik}/\left[WV^{T}U^{T}\right]_{ik}\right)^{\alpha}}{\sum_{l}\left[V^{T}U^{T}\right]_{jl}}, \tag{16}$$

which are the same as the updating rules (11) and (12). We have shown that the algorithm can be derived using the Karush-Kuhn-Tucker conditions and have presented an alternative derivation using the projected gradient. The use of the two methods theoretically guarantees the correctness of the algorithm.
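Read together, the elementwise rules (11), (12), and (16) describe one multiplicative sweep over $V$ and $W$. As an illustration only (the paper gives no code), the following NumPy sketch implements multiplicative $\alpha$-divergence updates for the factorization $X \approx W(UV)^{T}$ in vectorized form; the shapes ($X$: $m \times n$, $U$: $n \times c$, $V$: $c \times r$, $W$: $m \times r$), the random initialization, and the small `eps` safeguard are our assumptions, not the authors' implementation.

```python
import numpy as np

def alpha_divergence(X, Y, alpha, eps=1e-12):
    # D_alpha[X || Y] for alpha not in {0, 1}; cf. the cost in (21)
    X, Y = X + eps, Y + eps
    return np.sum(alpha * X + (1 - alpha) * Y
                  - X**alpha * Y**(1 - alpha)) / (alpha * (1 - alpha))

def cnmf_alpha(X, U, r, alpha=0.5, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative alpha-divergence updates for X ~ W (U V)^T.

    X : (m, n) nonnegative data, U : (n, c) nonnegative label constraint,
    r : number of basis vectors.  A hedged sketch, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    c = U.shape[1]
    W = rng.random((m, r)) + 0.1
    V = rng.random((c, r)) + 0.1
    history = []
    for _ in range(n_iter):
        # V update: vectorized counterpart of rule (11).  Numerator sums
        # U_bq * R_ab * W_ap; the denominator is outer(colsum U, colsum W).
        R = (X / (W @ (U @ V).T + eps)) ** alpha
        V *= ((U.T @ R.T @ W)
              / (np.outer(U.sum(0), W.sum(0)) + eps)) ** (1.0 / alpha)
        # W update: vectorized counterpart of rule (12).
        H = U @ V                      # (n, r) coefficient matrix (UV)
        R = (X / (W @ H.T + eps)) ** alpha
        W *= ((R @ H) / (H.sum(0) + eps)) ** (1.0 / alpha)
        history.append(alpha_divergence(X, W @ H.T, alpha))
    return W, V, history
```

With $\alpha = 1$ the same update code reduces to the KL-based multiplicative rule, matching the remark below that CNMF$_{\mathrm{KL}}$ is the special case $\alpha = 1$ (the divergence evaluation itself is only defined here for $\alpha \notin \{0, 1\}$).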
In the following, we give a theorem to guarantee the convergence of the iterations in updates (11) and (12).

Theorem 1. For the objective function (6), $D_{\alpha}[X \,\|\, W(UV)^{T}]$ is nonincreasing under the updating rules (11) and (12). The objective function $D_{\alpha}$ is invariant under these updates if and only if $V$ and $W$ are at a stationary point.

Multiplicative updates for our constrained algorithm based on NMF$_{\alpha}$ are given in (11) and (12). These updates find a local minimum of $D_{\alpha}[X \,\|\, W(UV)^{T}]$, which is the final solution of (11) and (12). Note that when $\alpha = 1$, the updates (11) and (12) are the same as the CNMF$_{\mathrm{KL}}$ algorithm [15], which is a special case included in our generic constrained matrix decomposition algorithm. In the following, we give the proof of Theorem 1.
4. Convergence Analysis

To prove Theorem 1, we will make use of an auxiliary function of the kind used in the expectation-maximization algorithm [28, 29].

Definition 2. A function $G(x, \tilde{x})$ is defined as an auxiliary function for $F(x)$ if the following two conditions are both satisfied:

$$G(x, x) = F(x), \qquad G(x, \tilde{x}) \ge F(x). \tag{17}$$

Lemma 3. Assume that the function $G(x, \tilde{x})$ is an auxiliary function for $F(x)$; then $F(x)$ is nonincreasing under the update

$$x^{t+1} = \arg\min_{x} G\left(x, x^{t}\right). \tag{18}$$

Proof. Consider $F(x^{t+1}) \le G(x^{t+1}, x^{t}) \le G(x^{t}, x^{t}) = F(x^{t})$.

It can be observed that the equality $F(x^{t+1}) = F(x^{t})$ holds only if $x^{t}$ is a local minimum of $G(x, x^{t})$. Iterating the update (18) yields a sequence of estimates that converges to a local minimum $x_{\min} = \arg\min_{x} F(x)$ of the objective function:

$$F\left(x_{\min}\right) \le \cdots \le F\left(x^{t+1}\right) \le F\left(x^{t}\right) \le \cdots \le F\left(x^{2}\right) \le F\left(x^{1}\right) \le F\left(x^{0}\right). \tag{19}$$
In the following, we show that the objective function (6) is nonincreasing under the updating rules (11) and (12) by defining appropriate auxiliary functions with respect to $V_{ij}$ and $W_{ij}$.
Lemma 4. The function

$$G(V, \tilde{V}) = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\xi_{ijk}(\tilde{V}) \left[\alpha + (1-\alpha)\frac{W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})} - \left(\frac{W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})}\right)^{1-\alpha}\right], \tag{20}$$

where $\xi_{ijk}(\tilde{V}) = W_{ik}[\tilde{V}^{T}U^{T}]_{kj} / \sum_{l} W_{il}[\tilde{V}^{T}U^{T}]_{lj}$, is an auxiliary function for

$$F(V) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left\{\alpha X_{ij} + (1-\alpha)\left[WV^{T}U^{T}\right]_{ij} - X_{ij}^{\alpha}\left[WV^{T}U^{T}\right]_{ij}^{1-\alpha}\right\}. \tag{21}$$
Proof. Obviously, $G(V, V) = F(V)$. According to the definition of an auxiliary function, we only need to prove $G(V, \tilde{V}) \ge F(V)$. To do this, we use the convex function $f(\cdot)$ for positive $\alpha$ to rewrite the $\alpha$-divergence function $F(V)$ as

$$F(V) = \sum_{ij} X_{ij}\, f\!\left(\frac{\sum_{k} W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}}\right), \tag{22}$$

where

$$f(x) = \frac{1}{\alpha(1-\alpha)}\left[\alpha + (1-\alpha)\,x - x^{1-\alpha}\right]. \tag{23}$$

Note that $\sum_{k}\xi_{ijk}(\tilde{V}) = 1$ and $\xi_{ijk}(\tilde{V}) \ge 0$ by the definition of $\xi_{ijk}(\tilde{V})$. Applying Jensen's inequality [30] leads to

$$f\!\left(\frac{\sum_{k} W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}}\right) \le \sum_{k} \xi_{ijk}(\tilde{V})\, f\!\left(\frac{W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})}\right). \tag{24}$$

From the above inequality it follows that

$$F(V) = \sum_{ij} X_{ij}\, f\!\left(\frac{\sum_{k} W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}}\right) \le \sum_{ijk} X_{ij}\,\xi_{ijk}(\tilde{V})\, f\!\left(\frac{W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}\,\xi_{ijk}(\tilde{V})}\right) = G(V, \tilde{V}), \tag{25}$$

which satisfies the condition of an auxiliary function.
Reversing the roles of $V_{ij}$ and $W_{ij}$ in Lemma 4, we define an auxiliary function for the update (12).
Lemma 5. The function

$$G(W, \tilde{W}) = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\eta_{ijk}(\tilde{W}) \left[\alpha + (1-\alpha)\frac{W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}\,\eta_{ijk}(\tilde{W})} - \left(\frac{W_{ik}\left[V^{T}U^{T}\right]_{kj}}{X_{ij}\,\eta_{ijk}(\tilde{W})}\right)^{1-\alpha}\right], \tag{26}$$

where $\eta_{ijk}(\tilde{W}) = \tilde{W}_{ik}[V^{T}U^{T}]_{kj} / \sum_{l} \tilde{W}_{il}[V^{T}U^{T}]_{lj}$, is an auxiliary function for

$$F(W) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left\{\alpha X_{ij} + (1-\alpha)\left[WV^{T}U^{T}\right]_{ij} - X_{ij}^{\alpha}\left[WV^{T}U^{T}\right]_{ij}^{1-\alpha}\right\}. \tag{27}$$

This can easily be proved in the same way as Lemma 4. From Lemmas 4 and 5, we can now demonstrate the convergence claimed in Theorem 1.
Proof. To guarantee the stability of $F(V)$, from Lemma 3 we just need to obtain the minimum of $G(V, \tilde{V})$ with respect to $V_{ij}$. Setting the gradient of (20) to zero gives

$$\frac{\partial G(V, \tilde{V})}{\partial V_{ij}} = \frac{1}{\alpha} \sum_{k}\left[U^{T}W\right]_{ki}\left[1 - \left(\frac{W_{ki}\left[V^{T}U^{T}\right]_{ij}}{X_{kj}\,\xi_{kji}(\tilde{V})}\right)^{-\alpha}\right] = 0. \tag{28}$$

Then it follows that

$$V_{ij} = \tilde{V}_{ij}\left[\frac{\sum_{k}\left[U^{T}W\right]_{ki}\left(X_{kj}/\sum_{l}W_{kl}\left[\tilde{V}^{T}U^{T}\right]_{lj}\right)^{\alpha}}{\sum_{k}\left[U^{T}W\right]_{ki}}\right]^{1/\alpha}, \tag{29}$$

which is similar to the form of the updating rule (11). Similarly, to guarantee that the updating rule (12) holds, the minimum of $G(W, \tilde{W})$, which can be determined by setting the gradient of (26) to zero, must exist.

Since $G$ is an auxiliary function, according to Lemma 4, $F$ in (21) is nonincreasing under the update (11). By iterating the updates (11) and (12), we can find a local minimum of $D_{\alpha}[X \,\|\, W(UV)^{T}]$.
5 Experiments
In this section the CNMF120572
algorithm is systematicallycompared with the current constrained NMF algorithmson three image data sets named ORL Database [31] YaleDatabase [32] and Caltech 101 Database [33] The details ofthe above three databases will be described individually laterWe introduce the evaluated algorithms firstly
(i) Constrained nonnegative matrix factorization algo-rithm in [15] aims to minimize the F-norm cost
(ii) Constrained nonnegative matrix factorization algo-rithm with parameter 120572 = 05 in this paper aims tominimize the Hellinger divergence cost
(iii) Constrained nonnegative matrix factorization algo-rithm with parameter 120572 = 1 in [15] aiming at min-imizing the KL-divergence cost is the best reportedalgorithm in image representation
(iv) Constrained nonnegative matrix factorization algo-rithm with parameter 120572 = 2 in this paper aims tominimize the 1205942-divergence cost
(v) Constrained nonnegative matrix factorization algo-rithm with parameter 120572 = 120572
119896in this paper aims to
minimize the 120572-divergence cost where the param-eters 120572
119896are associated with the characteristics of
the image database and designed by the presentedmethod CNMFKL algorithm is a special case of ourCNMF
120572119896algorithm with 120572 = 120572
119896= 1
We apply these algorithms to a problem of classificationand evaluate their performance on three image data setswhich contain a number of different categories of image Foreach date set the evaluations are conducted with differentnumbers of clusters here the number of clusters 119896 varies from2 to 10We randomly choose 119896 categories fromone image dataset and mix the images of these 119896 categories as the collection119883 Then for the semisupervised algorithms we randomlypick up 10 percent images from each category in119883 and recordtheir category number as the available label information toobtain the label matrix119880 For some special data sets the labelprocess is different and we will describe the details later
Suppose a data set has $N$ categories $C_{1}, C_{2}, \ldots, C_{N}$, and the cardinalities of the labeled images are $L_{1}, L_{2}, \ldots, L_{N}$, respectively. Since the label constraint matrix $U$ is composed of the indicator matrix $S$ and the identity matrix $I$, the indicator matrix $S$ plays a critical role in the classification performance for different categories in $X$. To determine the effectiveness of $S$, we define a measure to represent the relationship between the cardinalities of the labeled samples and the total samples:

$$r(k) = \frac{L_{\max}(k) - L_{\min}(k)}{\sum_{i=1}^{k} C_{i}/k}, \tag{30}$$
where $L_{\max}(k)$ and $L_{\min}(k)$ denote the maximum and minimum labeled cardinalities over the $k$ categories. For a fixed cluster number $k$ in the data set, $r(k)$ differs when the number of samples in each category differs. Then we compute $\alpha$ as follows:

$$\alpha_{k} = \frac{\alpha_{N}}{r(N)} \cdot \left|r(k) - r(N)\right| \cdot \left(\left|\frac{L_{\max}(k)\,k}{\sum_{i=1}^{k} C_{i}} - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}}\right| + \left|\frac{L_{\min}(k)\,k}{\sum_{i=1}^{k} C_{i}} - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}}\right|\right)^{-1}, \tag{31}$$

where

$$\alpha_{N} = \left(L_{\max}(N) - L_{\min}(N)\right) \cdot \left(\left|L_{\max}(N) - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}}\right| + \left|L_{\min}(N) - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}}\right|\right)^{-1}. \tag{32}$$
The value of $\alpha_{k}$ computed by (31) is associated with the characteristics of the image data set, since its variation is caused both by the cardinalities of the labeled samples in each category and by the total samples. We can obtain both the cardinalities of the labeled samples and the total samples in our semisupervised algorithms. However, in many real-world applications we cannot obtain the cardinalities of the labeled images exactly. Moreover, the value of $\alpha$ varies depending on the data set; how to select the optimal $\alpha$ is still an open problem [16].
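The selection rule (30)-(32) is straightforward to script. The sketch below computes $r(k)$, $\alpha_{N}$, and $\alpha_{k}$ from per-category totals $C_{i}$ and labeled counts $L_{i}$; the function names and the handling of the degenerate case $r(k) = r(N)$ (which the experiments below fix by hand, e.g. 0.67 on ORL and 0.55 on Yale) are our assumptions.

```python
def r(L, C):
    """Eq. (30): spread of labeled counts relative to the mean category size."""
    k = len(C)
    return (max(L) - min(L)) / (sum(C) / k)

def alpha_N(L, C):
    """Eq. (32), computed over all N categories."""
    ratio = sum(L) / sum(C)              # overall labeled fraction
    return (max(L) - min(L)) / (abs(max(L) - ratio) + abs(min(L) - ratio))

def alpha_k(L_sub, C_sub, L, C, fallback=1.0):
    """Eq. (31) for a chosen subset of k categories; 'fallback' covers the
    degenerate case r(k) == r(N), which the paper sets by hand."""
    k = len(C_sub)
    ratio = sum(L) / sum(C)
    rk, rN = r(L_sub, C_sub), r(L, C)
    if rk == rN:
        return fallback
    denom = (abs(max(L_sub) * k / sum(C_sub) - ratio)
             + abs(min(L_sub) * k / sum(C_sub) - ratio))
    return (alpha_N(L, C) / rN) * abs(rk - rN) / denom
```

For example, with hypothetical counts `L = [1, 2, 3]` and `C = [10, 20, 30]`, `r(L, C)` evaluates to $(3-1)/(60/3) = 0.1$.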
To evaluate the classification performance, we define the classification accuracy as the first measure. Our CNMF$_{\alpha}$ algorithm described in (11) and (12) provides a classification label, marked $s_{il_{i}}$, for each sample. Suppose $X = \{x_{i}\}_{i=1}^{n}$ is a data set consisting of $n$ training images. For each sample, let $l_{i}$ be the true class label provided by the image data set. More specifically, if the image $x_{i}$ is assigned to the $l_{i}$th class, we count it as a correct label and set $s_{il_{i}} = 1$; otherwise it is counted as a false label and $s_{il_{i}} = 0$. Finally, we compute the percentage of correct labels by defining the accuracy measure as

$$\mathrm{Ac}\left(s_{il_{i}}\right) = \frac{\sum_{i=1}^{n} s_{il_{i}}}{n}. \tag{33}$$
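The accuracy measure (33) amounts to counting correct labels. A minimal sketch (the variable names are ours):

```python
def accuracy(predicted, true_labels):
    """Eq. (33): fraction of samples whose predicted class equals the
    true class, i.e. the mean of the indicators s_{i,l_i}."""
    s = [1 if p == t else 0 for p, t in zip(predicted, true_labels)]
    return sum(s) / len(s)
```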
As the second measure of classification performance, we compute the normalized mutual information, which measures how similar two sets of clusters are. Given two sets of clusters $C_{i}$ and $C_{j}$, their normalized mutual information is defined as

$$\mathrm{NMI}\left(C_{i}, C_{j}\right) = \frac{\sum_{c_{i} \in C_{i},\, c_{j} \in C_{j}} p\left(c_{i}, c_{j}\right) \cdot \log\left[p\left(c_{i}, c_{j}\right) / \left(p\left(c_{i}\right) p\left(c_{j}\right)\right)\right]}{\max\left(H\left(C_{i}\right), H\left(C_{j}\right)\right)}, \tag{34}$$

which takes values between 0 and 1. Here $p(c_{i})$ and $p(c_{j})$ denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters $c_{i}$ and $c_{j}$, respectively, and $p(c_{i}, c_{j})$ denotes the joint probability that this arbitrarily selected image belongs to both $c_{i}$ and $c_{j}$ at the same time. $H(C_{i})$ and $H(C_{j})$ are the entropies of $C_{i}$ and $C_{j}$, respectively.

Experimental results on each data set are presented as classification accuracy and normalized mutual information, as in Tables 1 and 2.
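The quantity (34) can be computed directly from cluster-assignment counts. A self-contained sketch (the naming is ours; the natural logarithm is used, and the choice of base cancels in the ratio):

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Eq. (34): mutual information of two clusterings, normalized by the
    larger of the two cluster entropies."""
    n = len(labels_a)
    pa = Counter(labels_a)                 # cluster sizes in partition A
    pb = Counter(labels_b)                 # cluster sizes in partition B
    pab = Counter(zip(labels_a, labels_b)) # joint co-occurrence counts
    mi = sum((c / n) * log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in pab.items())
    ha = -sum((c / n) * log(c / n) for c in pa.values())
    hb = -sum((c / n) * log(c / n) for c in pb.values())
    return mi / max(ha, hb)
```

Two identical partitions (up to relabeling) give NMI 1, and two independent partitions give NMI 0, matching the stated range of (34).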
5.1. ORL Database. The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. For some subjects, the images were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background, with the subjects in an upright, frontal position and slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to 32 × 32 pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes in the facial areas are aligned at the same position.

There are 10 images for each category in ORL, so 10 percent is just one image. For the fixed parameters $\alpha$ ($\alpha = 0.5, 1, 2$), we randomly choose two images from each category to provide the label information. Note that labeling the same number of images in every category makes (30) vanish and is therefore uninformative. To obtain $r(k)$, we divide the 40 categories into 3 groups of 10, 20, and 10 categories. In the first 10 categories we pick 1 image from each category to provide label information; in the second 20 categories we pick 2 images from each category; and in the remaining categories we pick 3 from each. The dividing process is repeated 10 times, and the average classification accuracy obtained is recorded as the final result.
Figure 1 shows the classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection $X$ come from the same group, we set $\alpha_{k} = 0.67$. Because each category has the same number of samples, the variation of $\alpha$ is small even when we label different cardinalities of samples. Compared with the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF$_{\alpha_{k}}$ algorithm gives the best performance, since the selection of $\alpha_{k}$ is suited to the collection $X$. Table 1 summarizes the detailed classification accuracy and the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_{k}}$. It shows that our CNMF$_{\alpha_{k}}$ algorithm achieves a 1.92 percent improvement in average classification accuracy over the best previously reported CNMF$_{\mathrm{KL}}$ algorithm ($\alpha = 1$) [15]. For normalized mutual information, the details and the error bars of our constrained algorithms with $\alpha = 0.5$, $\alpha = 2$, and $\alpha_{k}$ are listed in Table 2. Compared with the best baseline algorithm, CNMF, our CNMF$_{\alpha_{k}}$ algorithm achieves a 0.54 percent improvement.
5.2. Yale Database. The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person, one per facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database. Each image is linearly stretched to a 1024-dimensional vector in image space.

The Yale Database also has the same number of samples in each category. To obtain an appropriate $\alpha$, we perform label processing similar to that for the ORL Database: divide the 15 individuals into 3 equal groups, choose 1 image from each category in the first group, 2 from each category in the second, and 3 from each category in the remaining group. We repeat the process 10 times and record the average classification accuracy as the final result.

Figure 2 shows the classification accuracy and normalized mutual information on the Yale Database. We set $\alpha_{k} = 0.55$ when $r(k) - r(N) = 0$, which indicates that the samples in the collection $X$ come from just two groups, that is, 15 images chosen from 10 categories in the first and second groups. CNMF$_{\alpha_{k}}$ achieves an extraordinary performance in all cases, with CNMF$_{\alpha=0.5}$ following. This suggests that the constrained nonnegative matrix factorization algorithm has a higher classification accuracy when the value of $\alpha$ is close to $\alpha_{k}$. Compared with the best previously reported CNMF$_{\mathrm{KL}}$ algorithm, CNMF$_{\alpha_{k}}$ achieves a 2.42 percent improvement in average classification accuracy. For normalized mutual information, CNMF$_{\alpha_{k}}$ achieves a 7.2 percent improvement over CNMF. The details of classification accuracy and normalized mutual information are provided in Tables 3 and 4, which contain the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_{k}}$.
5.3. Caltech 101 Database. The Caltech 101 Database, created by Caltech University, has images of 101 different object
Table 1: The comparison of classification accuracy rates on the ORL Database (classification accuracy $\mathrm{Ac}(s_{il_i})$, in %).

k    | CNMF  | CNMF (α=1) | CNMF (α=0.5) | CNMF (α=2)   | CNMF (α_k)
2    | 93.90 | 92.95      | 93.51 ± 0.09 | 92.14 ± 0.18 | 93.89 ± 0.91
3    | 86.20 | 84.33      | 85.71 ± 0.83 | 82.26 ± 1.23 | 86.21 ± 2.07
4    | 84.55 | 84.05      | 84.74 ± 0.56 | 83.02 ± 0.47 | 84.78 ± 0.36
5    | 80.16 | 78.38      | 80.10 ± 0.22 | 80.15 ± 0.18 | 80.16 ± 0.29
6    | 76.72 | 76.73      | 77.13 ± 0.28 | 77.60 ± 0.19 | 78.09 ± 0.40
7    | 81.14 | 79.04      | 80.34 ± 0.81 | 78.07 ± 0.88 | 82.15 ± 1.95
8    | 78.08 | 77.02      | 78.66 ± 0.69 | 78.77 ± 0.94 | 79.45 ± 1.33
9    | 83.19 | 82.23      | 83.57 ± 0.36 | 81.22 ± 0.21 | 85.19 ± 1.21
10   | 77.03 | 76.88      | 77.69 ± 0.12 | 77.75 ± 0.19 | 78.96 ± 0.73
Avg  | 82.33 | 81.29      | 82.38        | 81.22        | 83.21
Table 2: The comparison of normalized mutual information on the ORL Database (in %).

k    | CNMF  | CNMF (α=1) | CNMF (α=0.5) | CNMF (α=2)   | CNMF (α_k)
2    | 78.24 | 75.82      | 76.01 ± 0.14 | 73.94 ± 0.11 | 77.67 ± 0.58
3    | 76.61 | 75.36      | 75.38 ± 0.41 | 74.50 ± 0.53 | 75.92 ± 1.55
4    | 78.69 | 77.82      | 77.96 ± 0.47 | 77.04 ± 0.54 | 79.27 ± 1.26
5    | 76.69 | 74.27      | 75.32 ± 0.80 | 72.62 ± 0.87 | 75.35 ± 1.67
6    | 76.52 | 76.09      | 76.21 ± 0.12 | 76.09 ± 0.67 | 77.60 ± 1.82
7    | 82.45 | 81.35      | 82.36 ± 0.42 | 81.35 ± 0.26 | 83.08 ± 1.59
8    | 80.57 | 79.92      | 80.11 ± 0.73 | 79.91 ± 0.68 | 82.49 ± 1.97
9    | 86.03 | 85.67      | 86.73 ± 0.70 | 85.67 ± 0.37 | 88.05 ± 1.15
10   | 81.77 | 81.07      | 82.07 ± 0.12 | 81.07 ± 0.12 | 82.97 ± 1.71
Avg  | 79.73 | 78.60      | 79.13        | 78.02        | 80.27
categories. Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using codewords generated from SIFT features [34]: we obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.

We randomly select $k$ categories from the Faces-Easy category in the Caltech 101 database and convert the images to 32 × 32 gray-scale. The labeling process is repeated 10 times, and the obtained values of $\alpha_{k}$ computed by (31) are listed in Table 7.
The variation of $\alpha$ within the same $k$ categories is great. That is, selecting an appropriate $\alpha$ plays a critical role for the mixture of different categories in one image data set, especially when the number of samples in each category differs. The choice of $\alpha_{k}$ can fully reflect the effectiveness of the indicator matrix $S$.

Figure 3 shows the effect of the proposed algorithm with $\alpha_{k}$. The upper part of the figure shows the original samples, which contain 26 images; the middle part shows their gray images; and the lower part shows the combination of the basis vectors learned by CNMF$_{\alpha_{k}}$.
The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF$_{\alpha=0.5}$, CNMF$_{\alpha=2}$, and CNMF$_{\alpha_{k}}$. The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters $\alpha_{k}$ listed in Table 7 are selected. In general, our method demonstrates much better effectiveness in classification by choosing $\alpha_{k}$. Compared with the best reported algorithm other than our CNMF$_{\alpha}$ algorithm, CNMF$_{\alpha_{k}}$ achieves a 2.4 percent improvement in average classification accuracy, and compared with the CNMF algorithm [15], CNMF$_{\alpha_{k}}$ achieves an 8.77 percent improvement in average classification accuracy. For normalized mutual information, CNMF$_{\alpha_{k}}$ achieves a 2.29 percent improvement and consistently outperforms the other algorithms.
6. Conclusion

In this paper, we have presented a generic constrained nonnegative matrix factorization algorithm obtained by imposing label information as an additional hard constraint on the $\alpha$-divergence-NMF algorithm. The proposed algorithm can be derived using
Table 3: The comparison of classification accuracy rates on the Yale Database ($\mathrm{Ac}(s_{il_i})$, in %).

k    | CNMF  | CNMF (α=1) | CNMF (α=0.5) | CNMF (α=2)   | CNMF (α_k)
2    | 81.64 | 85.23      | 85.93 ± 0.65 | 84.18 ± 0.81 | 86.27 ± 2.46
3    | 68.48 | 76.91      | 77.14 ± 0.86 | 77.26 ± 0.42 | 79.01 ± 1.09
4    | 64.25 | 66.70      | 68.11 ± 0.67 | 66.12 ± 0.16 | 69.33 ± 2.23
5    | 65.82 | 67.71      | 69.14 ± 1.21 | 66.54 ± 1.01 | 70.57 ± 2.22
6    | 60.12 | 64.26      | 65.62 ± 0.94 | 63.70 ± 0.94 | 66.92 ± 1.41
7    | 59.25 | 62.47      | 63.97 ± 0.72 | 59.83 ± 0.38 | 65.23 ± 1.77
8    | 54.40 | 59.26      | 60.12 ± 0.31 | 56.76 ± 0.07 | 60.89 ± 1.28
9    | 54.85 | 57.30      | 58.08 ± 0.12 | 54.88 ± 0.14 | 58.98 ± 0.62
10   | 52.68 | 56.20      | 57.56 ± 0.03 | 53.83 ± 0.01 | 57.56 ± 0.73
Avg  | 62.39 | 65.89      | 67.30        | 64.79        | 68.31
Table 4: The comparison of normalized mutual information on the Yale Database (in %).

k    | CNMF  | CNMF (α=1) | CNMF (α=0.5) | CNMF (α=2)   | CNMF (α_k)
2    | 48.99 | 59.66      | 59.73 ± 0.39 | 58.17 ± 0.17 | 61.11 ± 1.82
3    | 42.90 | 53.52      | 53.58 ± 0.73 | 52.75 ± 0.93 | 54.49 ± 1.74
4    | 47.80 | 53.60      | 53.66 ± 0.18 | 52.99 ± 0.65 | 54.58 ± 1.64
5    | 53.99 | 57.49      | 57.56 ± 0.07 | 56.21 ± 0.06 | 58.58 ± 2.32
6    | 50.63 | 55.88      | 56.15 ± 0.07 | 55.40 ± 0.08 | 57.62 ± 1.09
7    | 53.03 | 57.60      | 57.88 ± 0.22 | 57.27 ± 0.16 | 59.05 ± 1.89
8    | 50.83 | 56.52      | 56.93 ± 0.13 | 56.20 ± 0.13 | 58.98 ± 1.46
9    | 53.08 | 55.30      | 55.98 ± 0.73 | 54.81 ± 0.81 | 56.15 ± 1.20
10   | 52.84 | 56.73      | 57.43 ± 0.16 | 56.41 ± 0.41 | 58.30 ± 1.11
Avg  | 50.45 | 56.26      | 56.54        | 55.58        | 57.65
Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. [Plots of both measures against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information. [Plots of both measures against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) shows the original images, (b) the gray images, and (c) the basis vectors learned by CNMF_{α_k}; positive values are illustrated with white pixels.
the Karush-Kuhn-Tucker (KKT) conditions, and an alternative form of the algorithm was presented using the projected gradient. The use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter α. Since the α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive as α goes to +∞ and more exclusive as α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is caused by both the cardinalities of labeled samples in each category and the total
Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information. [Plots of both measures against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
Table 5: The comparison of classification accuracy rates, Ac(s_{il_i}) (%), on the Caltech 101 Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     66.39   78.80        78.31 ± 1.69    80.10 ± 2.07    82.76 ± 3.56
3     60.91   69.98        68.77 ± 1.57    72.94 ± 1.51    73.59 ± 3.51
4     50.82   57.50        58.19 ± 1.98    56.28 ± 1.96    59.89 ± 2.82
5     50.83   53.67        53.71 ± 1.02    54.12 ± 1.44    55.06 ± 2.63
6     50.49   55.05        55.63 ± 0.89    58.04 ± 1.47    58.97 ± 1.97
7     46.09   51.12        50.04 ± 0.11    51.73 ± 0.25    52.36 ± 1.81
8     44.21   50.50        51.72 ± 0.51    48.37 ± 1.59    52.64 ± 1.52
9     41.34   46.58        47.85 ± 0.28    45.87 ± 0.30    49.54 ± 1.23
10    40.80   46.02        46.02 ± 0.05    45.76 ± 0.08    46.02 ± 0.26
Avg   50.21   56.58        56.69           57.02           58.98
Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     21.05   38.17        38.20 ± 0.18    37.48 ± 0.21    40.08 ± 2.19
3     31.62   40.63        40.68 ± 0.15    40.05 ± 0.14    41.88 ± 1.24
4     27.60   34.36        34.41 ± 0.79    34.12 ± 0.91    36.41 ± 1.17
5     34.46   35.82        36.00 ± 0.99    35.42 ± 1.01    37.20 ± 1.78
6     34.85   38.97        39.16 ± 0.86    38.53 ± 1.11    39.64 ± 1.83
7     34.88   37.67        37.86 ± 0.47    37.47 ± 0.49    39.86 ± 2.58
8     32.63   37.43        37.71 ± 0.39    37.22 ± 0.42    37.91 ± 2.18
9     31.20   35.97        36.21 ± 0.60    35.86 ± 0.63    37.73 ± 0.67
10    31.21   35.71        36.06 ± 0.02    35.68 ± 0.03    37.48 ± 0.22
Avg   31.05   37.19        37.37           36.87           38.69
Table 7: The value of α in k categories.

k     α_1    α_2    α_3    α_4    α_5    α_6    α_7    α_8    α_9    α_10
2     2.07   2.08   2.22   2.27   2.57   3.52   3.97   3.98   4.37   8.96
3     0.45   0.55   0.66   1.07   1.95   2.31   2.31   3.54   4.24   7.54
4     0.30   0.32   0.35   0.86   0.98   1.27   1.65   2.10   3.27   5.80
5     0.45   0.49   0.51   0.71   0.80   1.13   2.05   2.55   2.82   3.89
6     0.07   0.56   0.73   1.08   1.26   1.46   2.01   2.12   3.91   8.52
7     0.71   0.94   1.12   1.14   1.16   1.26   1.27   1.37   1.57   8.97
8     0.12   0.30   0.50   0.64   0.66   0.69   0.78   0.79   0.97   1.11
9     0.78   0.79   2.01   0.10   0.17   0.28   0.29   0.30   0.31   0.39
10    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
samples. In the experiments, we apply the fixed parameters α and α_k to analyze the classification accuracy on three image databases. The experimental results have demonstrated that the CNMF_{α_k} algorithm has the best classification accuracy. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set. It is still an open problem how to select the optimal α for a specific image data set.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).
References
[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I207–I212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V621–V624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. Kjærgård Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[32] http://vision.ucsd.edu/content/yale-face-database
[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101/
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
Definition 2. A function G(x, x') is defined as an auxiliary function for F(x) if the following two conditions are both satisfied:

\[ G(x, x) = F(x), \qquad G(x, x') \ge F(x). \tag{17} \]

Lemma 3. Assume that the function G(x, x') is an auxiliary function for F(x); then F(x) is nonincreasing under the update

\[ x^{t+1} = \arg\min_{x} G(x, x^{t}). \tag{18} \]

Proof. Consider F(x^{t+1}) ≤ G(x^{t+1}, x^t) ≤ G(x^t, x^t) = F(x^t).

It can be observed that the equality F(x^{t+1}) = F(x^t) holds only if x^t is a local minimum of G(x, x^t). We iterate the update in (18) to obtain a sequence of estimates that converges to a local minimum x_min = argmin_x F(x) of the objective function:

\[ F(x_{\min}) \le \cdots \le F(x^{t+1}) \le F(x^{t}) \le \cdots \le F(x^{2}) \le F(x^{1}) \le F(x^{0}). \tag{19} \]
In the following, we will show that the objective function (6) is nonincreasing under the updating rules (11) and (12) by defining appropriate auxiliary functions with respect to V_ij and W_ij.
Lemma 4. The function

\[
G(V, V') = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\xi_{ijk}(V') \left[ \alpha + (1-\alpha)\,\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(V')} - \left( \frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(V')} \right)^{1-\alpha} \right], \tag{20}
\]

where \(\xi_{ijk}(V') = W_{ik}[V'^{T}U^{T}]_{kj} / \sum_{l} W_{il}[V'^{T}U^{T}]_{lj}\), is an auxiliary function for

\[
F(V) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left( \alpha X_{ij} + (1-\alpha)\,[WV^{T}U^{T}]_{ij} - X_{ij}^{\alpha}\,[WV^{T}U^{T}]_{ij}^{1-\alpha} \right). \tag{21}
\]
Proof. Obviously, G(V, V) = F(V). According to the definition of an auxiliary function, we only need to prove G(V, V') ≥ F(V). To do this, we use the convex function f(·) for positive α to rewrite the α-divergence function F(V) as

\[ F(V) = \sum_{ij} X_{ij}\, f\!\left( \frac{\sum_{k} W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}} \right), \tag{22} \]

where

\[ f(x) = \frac{1}{\alpha(1-\alpha)} \left[ \alpha + (1-\alpha)x - x^{1-\alpha} \right]. \tag{23} \]

Note that \(\sum_{k} \xi_{ijk}(V') = 1\) and \(\xi_{ijk}(V') \ge 0\) from the definition of ξ_ijk(V'). Applying Jensen's inequality [30] leads to

\[ f\!\left( \frac{\sum_{k} W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}} \right) \le \sum_{k} \xi_{ijk}(V')\, f\!\left( \frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(V')} \right). \tag{24} \]

From the above inequality, it follows that

\[ F(V) = \sum_{ij} X_{ij}\, f\!\left( \frac{\sum_{k} W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}} \right) \le \sum_{ijk} X_{ij}\,\xi_{ijk}(V')\, f\!\left( \frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\xi_{ijk}(V')} \right) = G(V, V'), \tag{25} \]

which satisfies the condition of an auxiliary function.
Reversing the roles of V_ij and W_ij in Lemma 4, we define an auxiliary function for the update (12).
Lemma 5. The function

\[
G(W, W') = \frac{1}{\alpha(1-\alpha)} \sum_{ijk} X_{ij}\,\eta_{ijk}(W') \left[ \alpha + (1-\alpha)\,\frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\eta_{ijk}(W')} - \left( \frac{W_{ik}[V^{T}U^{T}]_{kj}}{X_{ij}\,\eta_{ijk}(W')} \right)^{1-\alpha} \right], \tag{26}
\]

where \(\eta_{ijk}(W') = W'_{ik}[V^{T}U^{T}]_{kj} / \sum_{l} W'_{il}[V^{T}U^{T}]_{lj}\), is an auxiliary function for

\[
F(W) = \frac{1}{\alpha(1-\alpha)} \sum_{ij} \left( \alpha X_{ij} + (1-\alpha)\,[WV^{T}U^{T}]_{ij} - X_{ij}^{\alpha}\,[WV^{T}U^{T}]_{ij}^{1-\alpha} \right). \tag{27}
\]

This can be easily proved in the same way as Lemma 4. From Lemmas 4 and 5, we can now demonstrate the convergence stated in Theorem 1.
Proof. To guarantee the stability of F(V), from Lemma 3 we just need to obtain the minimum of G(V, V') with respect to V_ij. Setting the gradient of (20) to zero gives

\[ \frac{\partial G(V, V')}{\partial V_{ij}} = \frac{1}{\alpha} \sum_{k} [U^{T}W]_{ki} \left[ 1 - \left( \frac{W_{ki}[V^{T}U^{T}]_{ij}}{X_{kj}\,\xi_{kji}(V')} \right)^{-\alpha} \right] = 0. \tag{28} \]

Then it follows that

\[ V_{ij} = V'_{ij} \left[ \frac{\sum_{k} [U^{T}W]_{ki} \left( X_{kj} / \sum_{l} W_{kl}[V'^{T}U^{T}]_{lj} \right)^{\alpha}}{\sum_{k} [U^{T}W]_{ki}} \right]^{1/\alpha}, \tag{29} \]

which is similar to the form of the updating rule (11). Similarly, to guarantee that the updating rule (12) holds, the minimum of G(W, W'), which can be determined by setting the gradient of (26) to zero, must exist.

Since G is an auxiliary function, according to Lemma 4, F in (21) is nonincreasing under the update (11). Applying the multiplicative updates (11) and (12) alternately, we can find a local minimum of D_α[X ∥ W(UV)^T].
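This monotonicity can be exercised numerically. The sketch below is a simplified, unconstrained variant (the label matrix U taken as the identity, so the model reduces to X ≈ WH with H playing the role of (UV)^T) of α-divergence multiplicative updates of the same form as (11) and (12); the helper names, sizes, and the epsilon guard are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-12  # numerical guard against division by zero

def alpha_div(X, Y, alpha):
    # D_alpha[X || Y] from (21): sum of alpha*X + (1-alpha)*Y - X^alpha * Y^(1-alpha),
    # scaled by 1/(alpha*(1-alpha)).
    return np.sum(alpha * X + (1 - alpha) * Y
                  - X**alpha * Y**(1 - alpha)) / (alpha * (1 - alpha))

def update(X, W, H, alpha):
    # Multiplicative alpha-NMF updates of the form (29): ratio raised to the
    # power alpha, weighted average, then the 1/alpha root.
    R = (X / (W @ H + eps))**alpha
    H *= ((W.T @ R) / (W.T @ np.ones_like(X) + eps))**(1 / alpha)
    R = (X / (W @ H + eps))**alpha
    W *= ((R @ H.T) / (np.ones_like(X) @ H.T + eps))**(1 / alpha)
    return W, H

X = rng.random((20, 15)) + 0.1   # strictly positive toy data
W = rng.random((20, 4)) + 0.1
H = rng.random((4, 15)) + 0.1
alpha = 0.5                      # the Hellinger-divergence case

costs = [alpha_div(X, W @ H, alpha)]
for _ in range(100):
    W, H = update(X, W, H, alpha)
    costs.append(alpha_div(X, W @ H, alpha))

# The auxiliary-function argument guarantees a nonincreasing objective.
assert all(b <= a + 1e-8 for a, b in zip(costs, costs[1:]))
print(costs[0], costs[-1])
```

The full semisupervised algorithm additionally factors H as (UV)^T with U built from the labels, but the descent mechanism is the same.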
5. Experiments
In this section, the CNMF_α algorithm is systematically compared with the current constrained NMF algorithms on three image data sets: the ORL Database [31], the Yale Database [32], and the Caltech 101 Database [33]. The details of the three databases will be described individually later. We first introduce the evaluated algorithms.
(i) The constrained nonnegative matrix factorization algorithm (CNMF) in [15], which aims to minimize the F-norm cost.
(ii) The constrained nonnegative matrix factorization algorithm with parameter α = 0.5 in this paper, which aims to minimize the Hellinger-divergence cost.
(iii) The constrained nonnegative matrix factorization algorithm with parameter α = 1 in [15], aiming at minimizing the KL-divergence cost; it is the best reported algorithm in image representation.
(iv) The constrained nonnegative matrix factorization algorithm with parameter α = 2 in this paper, which aims to minimize the χ²-divergence cost.
(v) The constrained nonnegative matrix factorization algorithm with parameter α = α_k in this paper, which aims to minimize the α-divergence cost, where the parameters α_k are associated with the characteristics of the image database and designed by the presented method. The CNMF_KL algorithm is a special case of our CNMF_{α_k} algorithm with α = α_k = 1.
We apply these algorithms to a classification problem and evaluate their performance on three image data sets, each of which contains a number of different categories of images. For each data set, the evaluations are conducted with different numbers of clusters; here the number of clusters k varies from 2 to 10. We randomly choose k categories from one image data set and mix the images of these k categories as the collection X. Then, for the semisupervised algorithms, we randomly pick 10 percent of the images from each category in X and record their category numbers as the available label information to obtain the label matrix U. For some special data sets, the label process is different, and we will describe the details later.
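For concreteness, the label matrix U described above can be sketched as follows, assuming the block structure used in [15]: the labeled samples (taken here to come first) contribute a one-hot indicator block S, and the unlabeled samples an identity block. The function name and the sample ordering are assumptions of this sketch.

```python
import numpy as np

def label_constraint_matrix(labels, n_total, n_classes):
    """Build U = [[S, 0], [0, I]], where S[i, c] = 1 iff labeled sample i
    belongs to class c; `labels` covers the first len(labels) samples."""
    l = len(labels)
    S = np.zeros((l, n_classes))
    S[np.arange(l), labels] = 1.0
    U = np.zeros((n_total, n_classes + n_total - l))
    U[:l, :n_classes] = S              # indicator block for labeled samples
    U[l:, n_classes:] = np.eye(n_total - l)  # identity block for the rest
    return U

# Toy example: 3 labeled samples (classes 0, 1, 1) out of 5 samples total.
U = label_constraint_matrix([0, 1, 1], n_total=5, n_classes=2)
print(U.shape)  # (5, 4)
```

With this structure, samples that share a label share a row pattern in U, so they are forced toward the same representation in the factorization.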
Suppose a data set has N categories C_1, C_2, ..., C_N, and the cardinalities of the labeled images are L_1, L_2, ..., L_N, respectively. Since the label constraint matrix U is composed of the indicator matrix S and the identity matrix I, the indicator matrix S plays a critical role in the classification performance for the different categories in X. To quantify the effectiveness of S, we define a measure to represent the relationship between the cardinalities of labeled samples and the total samples:

\[ r(k) = \frac{L_{\max}(k) - L_{\min}(k)}{\sum_{i=1}^{k} C_{i} / k}, \tag{30} \]
where L_max(k) and L_min(k) denote the maximum and minimum labeled cardinalities of the k categories. For a fixed cluster number k in the data set, r(k) differs if the number of samples in each category differs. Then we compute α as follows:

\[
\alpha_{k} = \frac{\alpha_{N}}{r(N)} \, |r(k) - r(N)| \left( \left| \frac{L_{\max}(k)\,k}{\sum_{i=1}^{k} C_{i}} - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}} \right| + \left| \frac{L_{\min}(k)\,k}{\sum_{i=1}^{k} C_{i}} - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}} \right| \right)^{-1}, \tag{31}
\]

where

\[
\alpha_{N} = \left( L_{\max}(N) - L_{\min}(N) \right) \left( \left| L_{\max}(N) - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}} \right| + \left| L_{\min}(N) - \frac{\sum_{i=1}^{N} L_{i}}{\sum_{i=1}^{N} C_{i}} \right| \right)^{-1}. \tag{32}
\]
The value of α_k computed by (31) is associated with the characteristics of image data sets, since its variation is caused by both the cardinalities of labeled samples in each category and the total samples. We can obtain both the cardinalities of labeled samples and the total samples in our semisupervised algorithms. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set. It is still an open problem how to select the optimal α [16].
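The selection rule of (30)-(32) can be sketched in code. The formulas below follow the reconstruction above, so treat them as an assumption where the original typesetting was ambiguous; the function names and toy label counts are illustrative, and degenerate cases (r(N) = 0, as when every category has the same number of labeled images) must be handled separately, as in the experiments where a fixed α_k is used instead.

```python
def r(L, C):
    # Eq. (30): spread of labeled counts, normalized by the average
    # category size sum(C)/k.
    k = len(C)
    return (max(L) - min(L)) / (sum(C) / k)

def alpha_N(L, C):
    # Eq. (32): baseline alpha over all N categories.
    ratio = sum(L) / sum(C)
    return (max(L) - min(L)) / (abs(max(L) - ratio) + abs(min(L) - ratio))

def alpha_k(L_k, C_k, L_N, C_N):
    # Eq. (31); assumes r(N) != 0 and a nonzero denominator.
    k = len(C_k)
    ratio_N = sum(L_N) / sum(C_N)
    denom = (abs(max(L_k) * k / sum(C_k) - ratio_N)
             + abs(min(L_k) * k / sum(C_k) - ratio_N))
    return (alpha_N(L_N, C_N) / r(L_N, C_N)) * abs(r(L_k, C_k) - r(L_N, C_N)) / denom

# Toy example: N = 4 categories with unequal label counts, and a k = 2 subset.
L_N, C_N = [1, 2, 2, 3], [10, 10, 10, 10]
L_k, C_k = [1, 2], [10, 10]
print(alpha_k(L_k, C_k, L_N, C_N))
```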
To evaluate the classification performance, we define the classification accuracy as the first measure. Our CNMF_α
algorithm described in (11) and (12) provides a classification label for each sample, marked s_{il_i}. Suppose X = {x_i}_{i=1}^{n} is a data set consisting of n training images. For each sample, let l_i be the true class label provided by the image data set. More specifically, if the image x_i is designated to the l_i-th class, we evaluate it as a correct label and set s_{il_i} = 1; otherwise, it is counted as a false label and we set s_{il_i} = 0. Eventually, we compute the percentage of correct labels by defining the accuracy measure as

\[ \mathrm{Ac}(s_{il_i}) = \frac{\sum_{i=1}^{n} s_{il_i}}{n}. \tag{33} \]
As the second measure of classification performance, we compute the normalized mutual information, which measures how similar two sets of clusters are. Given two sets of clusters C_i and C_j, their normalized mutual information is defined as

\[ \mathrm{NMI}(C_i, C_j) = \frac{\sum_{c_i \in C_i,\, c_j \in C_j} p(c_i, c_j) \cdot \log \left( p(c_i, c_j) / \left[ p(c_i)\, p(c_j) \right] \right)}{\max\!\left( H(C_i), H(C_j) \right)}, \tag{34} \]

which takes values between 0 and 1. Here p(c_i) and p(c_j) denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters C_i and C_j, respectively, and p(c_i, c_j) denotes the joint probability that this arbitrarily selected image belongs to the cluster C_i as well as C_j at the same time. H(C_i) and H(C_j) are the entropies of C_i and C_j, respectively.

Experimental results on each data set will be presented as classification accuracy and normalized mutual information, as in Tables 1 and 2.
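Both measures can be sketched directly from (33) and (34). The accuracy function below assumes the predicted labels have already been mapped to the true label set; the function names are illustrative.

```python
import numpy as np

def accuracy(true_labels, pred_labels):
    # Eq. (33): fraction of samples whose assigned label is correct.
    true_labels, pred_labels = np.asarray(true_labels), np.asarray(pred_labels)
    return np.mean(true_labels == pred_labels)

def nmi(a, b):
    # Eq. (34): mutual information normalized by max(H(C_i), H(C_j)).
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for ca in np.unique(a):
        for cb in np.unique(b):
            p_ab = np.mean((a == ca) & (b == cb))  # joint probability
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (np.mean(a == ca) * np.mean(b == cb)))
    H = lambda x: -sum(p * np.log(p)
                       for p in [np.mean(x == c) for c in np.unique(x)])
    return mi / max(H(a), H(b))

labels = [0, 0, 1, 1, 2, 2]
print(accuracy(labels, labels), nmi(labels, labels))
```

For identical labelings both measures equal 1 (up to floating-point rounding), matching the stated range of (34).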
5.1. ORL Database. The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. The images of some people were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/nonsmiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background, with the subjects in an upright, frontal position and slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to 32 × 32 pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes in the facial areas are aligned at the same position.
There are 10 images for each category in ORL, and 10 percent is just one image. For the fixed parameters α (α = 0.5, 1, 2), we randomly choose two images from each category to provide the label information. Note that labeling the same number of images in every category makes (30) vanish. To obtain a nonzero r(k), we divide the 40 categories into 3 groups of 10, 20, and 10 categories: in the first 10 categories we pick 1 image from each category to provide the label information, we pick 2 images from each category in the second 20 categories, and we pick 3 from each category in the remaining categories. The dividing process is repeated 10 times, and the obtained average classification accuracy is recorded as the final result.
Figure 1 shows the graphical classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection X come from the same group, we set α_k = 0.67. Because of the same number of samples in each category, the variation of α is small even though we label different cardinalities of samples. Compared to the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF_{α_k} algorithm gives the best performance, since the selection of α_k is suitable for the collection X. Table 1 summarizes the detailed classification accuracy and the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}. It shows that our CNMF_{α_k} algorithm achieves a 1.92 percent improvement in average classification accuracy compared to the best reported CNMF_KL algorithm (α = 1) [15]. For normalized mutual information, the details and the error bars of our constrained algorithms with α = 0.5, α = 2, and α_k are listed in Table 2. Compared to the best algorithm, CNMF, our CNMF_{α_k} algorithm achieves a 0.54 percent improvement.
5.2. Yale Database. The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person. Each image is taken with a different facial expression or configuration: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database. Each image is linearly stretched to a 1024-dimensional vector in image space.
The Yale Database also has the same number of samples in each category. To obtain an appropriate α, we perform label processing similar to that for the ORL Database: divide the 15 individuals into 3 equal groups, choose 1 image from each category in the first group, 2 from each category in the second, and 3 from each category in the remaining group. We repeat the process 10 times and record the average classification accuracy as the final result.
Figure 2 shows the classification accuracy and normalized mutual information on the Yale Database. We set α_k = 0.55 when r(k) − r(N) = 0; this corresponds to the case that the samples in the collection X come from just two groups, that is, 15 images chosen from 10 categories in the first and second groups. CNMF_{α_k} achieves an extraordinary performance in all the cases, and CNMF_{α=0.5} follows. This suggests that the constrained nonnegative matrix factorization algorithm has a higher classification accuracy when the value of α is close to α_k. Compared to the best reported CNMF_KL algorithm, CNMF_{α_k} achieves a 2.42 percent improvement in average classification accuracy. For normalized mutual information, CNMF_{α_k} achieves a 7.2 percent improvement compared to CNMF. The details of the classification accuracy and normalized mutual information are provided in Tables 3 and 4, which contain the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.
5.3. Caltech 101 Database. The Caltech 101 Database, created by Caltech University, has images of 101 different object
Table 1: The comparison of classification accuracy rates, Ac(s_{il_i}) (%), on the ORL Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     93.90   92.95        93.51 ± 0.09    92.14 ± 0.18    93.89 ± 0.91
3     86.20   84.33        85.71 ± 0.83    82.26 ± 1.23    86.21 ± 2.07
4     84.55   84.05        84.74 ± 0.56    83.02 ± 0.47    84.78 ± 0.36
5     80.16   78.38        80.10 ± 0.22    80.15 ± 0.18    80.16 ± 0.29
6     76.72   76.73        77.13 ± 0.28    77.60 ± 0.19    78.09 ± 0.40
7     81.14   79.04        80.34 ± 0.81    78.07 ± 0.88    82.15 ± 1.95
8     78.08   77.02        78.66 ± 0.69    78.77 ± 0.94    79.45 ± 1.33
9     83.19   82.23        83.57 ± 0.36    81.22 ± 0.21    85.19 ± 1.21
10    77.03   76.88        77.69 ± 0.12    77.75 ± 0.19    78.96 ± 0.73
Avg   82.33   81.29        82.38           81.22           83.21
Table 2: The comparison of normalized mutual information (%) on the ORL Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     78.24   75.82        76.01 ± 0.14    73.94 ± 0.11    77.67 ± 0.58
3     76.61   75.36        75.38 ± 0.41    74.50 ± 0.53    75.92 ± 1.55
4     78.69   77.82        77.96 ± 0.47    77.04 ± 0.54    79.27 ± 1.26
5     76.69   74.27        75.32 ± 0.80    72.62 ± 0.87    75.35 ± 1.67
6     76.52   76.09        76.21 ± 0.12    76.09 ± 0.67    77.60 ± 1.82
7     82.45   81.35        82.36 ± 0.42    81.35 ± 0.26    83.08 ± 1.59
8     80.57   79.92        80.11 ± 0.73    79.91 ± 0.68    82.49 ± 1.97
9     86.03   85.67        86.73 ± 0.70    85.67 ± 0.37    88.05 ± 1.15
10    81.77   81.07        82.07 ± 0.12    81.07 ± 0.12    82.97 ± 1.71
Avg   79.73   78.60        79.13           78.02           80.27
categories. Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using the codewords generated by SIFT features [34]. We obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.

We randomly select k categories from the Faces-Easy category in the Caltech 101 database and convert them to 32 × 32 gray-scale images. The label process is repeated 10 times, and the obtained values of α_k computed by (31) are listed in Table 7. The variation of α over the same k categories is great. That is, selecting an appropriate α plays a critical role for the mixture of different categories in one image data set, especially when the number of samples in each category is different. The choice of α_k can fully reflect the effectiveness of the indicator matrix S.

Figure 3 shows the effect derived from using the proposed algorithm with α_k. The upper part of the figure is the original samples, which contain 26 images; the middle part is their gray images; and the lower part is the combination of the basis vectors learned by CNMF_{α_k}.
The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}. The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters α_k listed in Table 7 are selected. In general, our method demonstrates much better effectiveness in classification by choosing α_k. Compared with the best reported algorithm other than our CNMF_α algorithm, CNMF_{α_k} achieves a 2.4 percent improvement in average classification accuracy, and compared with the CNMF algorithm [15], CNMF_{α_k} achieves an 8.77 percent improvement in average classification accuracy. For normalized mutual information, CNMF_{α_k} achieves a 2.29 percent improvement and consistently outperforms the other algorithms.
6. Conclusion

In this paper, we present a generic constrained nonnegative matrix factorization algorithm by imposing label information as an additional hard constraint on the α-divergence-NMF algorithm. The proposed algorithm can be derived using
Table 3: The comparison of classification accuracy rates Ac(s_{il_i}) (%) on the Yale Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     81.64   85.23        85.93 ± 0.65    84.18 ± 0.81    86.27 ± 2.46
3     68.48   76.91        77.14 ± 0.86    77.26 ± 0.42    79.01 ± 1.09
4     64.25   66.70        68.11 ± 0.67    66.12 ± 0.16    69.33 ± 2.23
5     65.82   67.71        69.14 ± 1.21    66.54 ± 1.01    70.57 ± 2.22
6     60.12   64.26        65.62 ± 0.94    63.70 ± 0.94    66.92 ± 1.41
7     59.25   62.47        63.97 ± 0.72    59.83 ± 0.38    65.23 ± 1.77
8     54.40   59.26        60.12 ± 0.31    56.76 ± 0.07    60.89 ± 1.28
9     54.85   57.30        58.08 ± 0.12    54.88 ± 0.14    58.98 ± 0.62
10    52.68   56.20        57.56 ± 0.03    53.83 ± 0.01    57.56 ± 0.73
Avg   62.39   65.89        67.30           64.79           68.31
Table 4: The comparison of normalized mutual information (%) on the Yale Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     48.99   59.66        59.73 ± 0.39    58.17 ± 0.17    61.11 ± 1.82
3     42.90   53.52        53.58 ± 0.73    52.75 ± 0.93    54.49 ± 1.74
4     47.80   53.60        53.66 ± 0.18    52.99 ± 0.65    54.58 ± 1.64
5     53.99   57.49        57.56 ± 0.07    56.21 ± 0.06    58.58 ± 2.32
6     50.63   55.88        56.15 ± 0.07    55.40 ± 0.08    57.62 ± 1.09
7     53.03   57.60        57.88 ± 0.22    57.27 ± 0.16    59.05 ± 1.89
8     50.83   56.52        56.93 ± 0.13    56.20 ± 0.13    58.98 ± 1.46
9     53.08   55.30        55.98 ± 0.73    54.81 ± 0.81    56.15 ± 1.20
10    52.84   56.73        57.43 ± 0.16    56.41 ± 0.41    58.30 ± 1.11
Avg   50.45   56.26        56.54           55.58           57.65
[Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the measure against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
[Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the measure against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
[Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) is the original images, (b) is the gray images, and (c) is the basis vectors learned by CNMF_{α_k}. Positive values are illustrated with white pixels.]
Karush-Kuhn-Tucker conditions, and we presented an alternative derivation of the algorithm using the projected gradient. The use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter α. Since α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive when α goes to +∞ and more exclusive when α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is caused by both the cardinalities of labeled samples in each category and the total
[Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the measure against the number of classes (2 to 10) for CNMF, CNMF_{α=1}, CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}.]
Table 5: The comparison of classification accuracy rates Ac(s_{il_i}) (%) on the Caltech 101 Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     66.39   78.80        78.31 ± 1.69    80.10 ± 2.07    82.76 ± 3.56
3     60.91   69.98        68.77 ± 1.57    72.94 ± 1.51    73.59 ± 3.51
4     50.82   57.50        58.19 ± 1.98    56.28 ± 1.96    59.89 ± 2.82
5     50.83   53.67        53.71 ± 1.02    54.12 ± 1.44    55.06 ± 2.63
6     50.49   55.05        55.63 ± 0.89    58.04 ± 1.47    58.97 ± 1.97
7     46.09   51.12        50.04 ± 0.11    51.73 ± 0.25    52.36 ± 1.81
8     44.21   50.50        51.72 ± 0.51    48.37 ± 1.59    52.64 ± 1.52
9     41.34   46.58        47.85 ± 0.28    45.87 ± 0.30    49.54 ± 1.23
10    40.80   46.02        46.02 ± 0.05    45.76 ± 0.08    46.02 ± 0.26
Avg   50.21   56.58        56.69           57.02           58.98
Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

k     CNMF    CNMF_{α=1}   CNMF_{α=0.5}    CNMF_{α=2}      CNMF_{α_k}
2     21.05   38.17        38.20 ± 0.18    37.48 ± 0.21    40.08 ± 2.19
3     31.62   40.63        40.68 ± 0.15    40.05 ± 0.14    41.88 ± 1.24
4     27.60   34.36        34.41 ± 0.79    34.12 ± 0.91    36.41 ± 1.17
5     34.46   35.82        36.00 ± 0.99    35.42 ± 1.01    37.20 ± 1.78
6     34.85   38.97        39.16 ± 0.86    38.53 ± 1.11    39.64 ± 1.83
7     34.88   37.67        37.86 ± 0.47    37.47 ± 0.49    39.86 ± 2.58
8     32.63   37.43        37.71 ± 0.39    37.22 ± 0.42    37.91 ± 2.18
9     31.20   35.97        36.21 ± 0.60    35.86 ± 0.63    37.73 ± 0.67
10    31.21   35.71        36.06 ± 0.02    35.68 ± 0.03    37.48 ± 0.22
Avg   31.05   37.19        37.37           36.87           38.69
Table 7: The value of α in k categories.

k     α_1    α_2    α_3    α_4    α_5    α_6    α_7    α_8    α_9    α_10
2     2.07   2.08   2.22   2.27   2.57   3.52   3.97   3.98   4.37   8.96
3     0.45   0.55   0.66   1.07   1.95   2.31   2.31   3.54   4.24   7.54
4     0.30   0.32   0.35   0.86   0.98   1.27   1.65   2.10   3.27   5.80
5     0.45   0.49   0.51   0.71   0.80   1.13   2.05   2.55   2.82   3.89
6     0.07   0.56   0.73   1.08   1.26   1.46   2.01   2.12   3.91   8.52
7     0.71   0.94   1.12   1.14   1.16   1.26   1.27   1.37   1.57   8.97
8     0.12   0.30   0.50   0.64   0.66   0.69   0.78   0.79   0.97   1.11
9     0.78   0.79   2.01   0.10   0.17   0.28   0.29   0.30   0.31   0.39
10    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00
samples. In the experiments, we apply the fixed parameters α and α_k to analyze the classification accuracy on three image databases. The experimental results have demonstrated that the CNMF_{α_k} algorithm has the best classification accuracy. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set. It is still an open problem how to select the optimal α for a specific image data set.
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).
References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207–I-212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and Its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621–V-624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. Kjærgård Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[32] http://vision.ucsd.edu/content/yale-face-database
[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101/
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
Proof. To guarantee the stability of F(V), by Lemma 3 we just need to obtain the minimum of G(V, Ṽ) with respect to V_ij. Setting the gradient of (20) to zero gives

\[
\frac{\partial G(V,\tilde V)}{\partial V_{ij}}
=\frac{1}{\alpha}\sum_{k}\left[U^{T}W\right]_{ki}
\left(1-\left(\frac{W_{ki}\left[V^{T}U^{T}\right]_{ij}}{X_{kj}\,\xi_{kji}(\tilde V)}\right)^{-\alpha}\right)=0.
\tag{28}
\]

Then it follows that

\[
V_{ij}=\tilde V_{ij}
\left[\frac{\sum_{k}\left[U^{T}W\right]_{ki}
\left(X_{kj}\Big/\sum_{l}W_{kl}\left[\tilde V^{T}U^{T}\right]_{lj}\right)^{\alpha}}
{\sum_{k}\left[U^{T}W\right]_{ki}}\right]^{1/\alpha},
\tag{29}
\]

which is similar in form to the updating rule (11). Similarly, to guarantee that the updating rule (12) holds, the minimum of G(W, W̃), which can be determined by setting the gradient of (26) to zero, must exist.

Since G is an auxiliary function, by Lemma 4, F in (21) is nonincreasing under the update (11). Applying the multiplicative updates (11) and (12), we can find a local minimum of D_α[X ‖ W(UV)^T].
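The multiplicative update (29) derived above can be sketched in NumPy as follows. This is our illustrative reading of the rule under the factorization X ≈ W(UV)^T, not the authors' code; the function name and epsilon smoothing are our own. With U = I and α = 1 the rule reduces to the classical KL-divergence NMF update of Lee and Seung, which is a convenient sanity check.

```python
import numpy as np

def update_V(X, W, U, V, alpha, eps=1e-9):
    """One multiplicative alpha-divergence update of V in X ~ W (U V)^T.

    Shapes: X (m, n), W (m, r), U (n, c), V (c, r).
    Sketch of update rule (29); with U = I and alpha = 1 this is the
    classical KL-divergence NMF update applied to (U V).
    """
    approx = W @ (U @ V).T + eps             # [W (UV)^T]_kj, shape (m, n)
    ratio = (X / approx) ** alpha            # (X_kj / [W (UV)^T]_kj)^alpha
    num = U.T @ ratio.T @ W                  # (c, r): weighted sums over samples
    den = U.T @ np.ones_like(X).T @ W + eps  # same sums with the ratio replaced by 1
    return V * (num / den) ** (1.0 / alpha)
```

With α = 1 and U = I this step never increases the KL divergence between X and its reconstruction, which mirrors the monotonicity argument of the proof.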
5. Experiments

In this section, the CNMF_α algorithm is systematically compared with the current constrained NMF algorithms on three image data sets: the ORL Database [31], the Yale Database [32], and the Caltech 101 Database [33]. The details of the three databases will be described individually later. We first introduce the evaluated algorithms:

(i) The constrained nonnegative matrix factorization algorithm (CNMF) in [15], which aims to minimize the F-norm cost.
(ii) The constrained nonnegative matrix factorization algorithm with parameter α = 0.5 in this paper, which aims to minimize the Hellinger divergence cost.
(iii) The constrained nonnegative matrix factorization algorithm with parameter α = 1 in [15], aiming at minimizing the KL-divergence cost; this is the best reported algorithm in image representation.
(iv) The constrained nonnegative matrix factorization algorithm with parameter α = 2 in this paper, which aims to minimize the χ²-divergence cost.
(v) The constrained nonnegative matrix factorization algorithm with parameter α = α_k in this paper, which aims to minimize the α-divergence cost, where the parameters α_k are associated with the characteristics of the image database and designed by the presented method. The CNMF_KL algorithm is a special case of our CNMF_{α_k} algorithm with α = α_k = 1.

We apply these algorithms to a classification problem and evaluate their performance on three image data sets, each containing a number of different categories of images. For each data set, the evaluations are conducted with different numbers of clusters; here the number of clusters k varies from 2 to 10. We randomly choose k categories from one image data set and mix the images of these k categories as the collection X. Then, for the semisupervised algorithms, we randomly pick 10 percent of the images from each category in X and record their category numbers as the available label information to obtain the label matrix U. For some special data sets, the label process is different, and we will describe the details later.
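The paper assembles the label constraint matrix U from an indicator matrix S for the labeled samples and an identity block for the unlabeled ones, following CNMF [15]. A minimal sketch of that construction, with our own function and variable names and the usual convention that labeled samples come first:

```python
import numpy as np

def build_label_matrix(labels, n_classes):
    """Build the label constraint matrix U = [[S, 0], [0, I]].

    labels : length-n sequence where labels[i] is the class index of sample i
             if labeled, or None if unlabeled.  Labeled samples are assumed
             to come first, as in CNMF [15].
    """
    labeled = [y for y in labels if y is not None]
    l, n = len(labeled), len(labels)
    assert all(y is not None for y in labels[:l]), "labeled samples must come first"
    S = np.zeros((l, n_classes))            # indicator: S[i, c] = 1 if sample i has label c
    for i, y in enumerate(labeled):
        S[i, y] = 1.0
    U = np.zeros((n, n_classes + n - l))
    U[:l, :n_classes] = S                   # labeled block
    U[l:, n_classes:] = np.eye(n - l)       # identity block for unlabeled samples
    return U
```

Each row of U has exactly one nonzero entry, so labeled samples of the same class are forced to share a representation while unlabeled samples remain free.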
Suppose a data set has N categories C_1, C_2, …, C_N, and the cardinalities of the labeled images are L_1, L_2, …, L_N, respectively. Since the label constraint matrix U is composed of the indicator matrix S and the identity matrix I, the indicator matrix S plays a critical role in the classification performance for different categories in X. To determine the effectiveness of S, we define a measure to represent the relationship between the cardinalities of labeled samples and the total samples:

\[
r(k)=\frac{L_{\max}(k)-L_{\min}(k)}{\sum_{i=1}^{k}C_{i}}\,k,
\tag{30}
\]
where L_max(k) and L_min(k) denote the maximum and minimum labeled cardinalities of the k categories. For a fixed cluster number k in the data set, r(k) is different if the number of samples in each category is different. Then we compute α as follows:

\[
\alpha_{k}=\frac{\alpha_{N}}{r(N)}
\left(\left|r(k)-r(N)\right|
\times\left(\left|\frac{L_{\max}(k)\,k}{\sum_{i=1}^{k}C_{i}}-\frac{\sum_{i=1}^{N}L_{i}}{\sum_{i=1}^{N}C_{i}}\right|
+\left|\frac{L_{\min}(k)\,k}{\sum_{i=1}^{k}C_{i}}-\frac{\sum_{i=1}^{N}L_{i}}{\sum_{i=1}^{N}C_{i}}\right|\right)^{-1}\right),
\tag{31}
\]

where

\[
\alpha_{N}=\left(L_{\max}(N)-L_{\min}(N)\right)
\times\left(\left|L_{\max}(N)-\frac{\sum_{i=1}^{N}L_{i}}{\sum_{i=1}^{N}C_{i}}\right|
+\left|L_{\min}(N)-\frac{\sum_{i=1}^{N}L_{i}}{\sum_{i=1}^{N}C_{i}}\right|\right)^{-1}.
\tag{32}
\]
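Under our reconstruction of (30)–(32), the quantities r(k) and α_k can be computed directly from the labeled cardinalities L_i and category sizes C_i. The sketch below is illustrative only (the formulas were partially garbled in the source), and the degenerate case r(k) = r(N) is handled manually in the paper, e.g., α_k = 0.55 on the Yale Database:

```python
def r_measure(L, C):
    """r(k) from Eq. (30): k * (Lmax - Lmin) / (total number of samples)."""
    k = len(C)
    return k * (max(L) - min(L)) / sum(C)

def alpha_k(Lk, Ck, LN, CN):
    """alpha_k from Eq. (31), with alpha_N from Eq. (32).

    Lk, Ck : labeled cardinalities / sizes of the k mixed categories.
    LN, CN : the same quantities over all N categories of the data set.
    """
    k = len(Ck)
    frac = sum(LN) / sum(CN)   # overall labeled fraction of the data set
    alpha_N = (max(LN) - min(LN)) / (abs(max(LN) - frac) + abs(min(LN) - frac))
    rk, rN = r_measure(Lk, Ck), r_measure(LN, CN)
    num = abs(rk - rN)
    if num == 0:
        # Eq. (31) degenerates; the paper fixes alpha_k by hand in this case.
        raise ValueError("r(k) == r(N): choose alpha_k manually")
    den = (abs(max(Lk) * k / sum(Ck) - frac)
           + abs(min(Lk) * k / sum(Ck) - frac))
    return (alpha_N / rN) * num / den
```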
The value of α_k computed by (31) is associated with the characteristics of image data sets, since its variation is caused by both the cardinalities of labeled samples in each category and the total samples. We can obtain both the cardinalities of labeled samples and the total samples in our semisupervised algorithms. However, we cannot get the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set; it is still an open problem how to select the optimal α [16].

To evaluate the classification performance, we define classification accuracy as the first measure. Our CNMF_α
algorithm, described in (11) and (12), provides a classification label for each sample, marked s_{il_i}. Suppose X = {x_i}_{i=1}^{n} is a data set which consists of n training images. For each sample, let l_i be the true class label provided by the image data set. More specifically, if the image x_i is designated to the l_i-th class, we evaluate it as a correct label and set s_{il_i} = 1; otherwise, it is counted as a false label and noted s_{il_i} = 0. Eventually, we compute the percentage of correct labels by defining the accuracy measure as

\[
\mathrm{Ac}(s_{il_i})=\frac{\sum_{i=1}^{n}s_{il_i}}{n}.
\tag{33}
\]
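The accuracy measure (33) is a simple ratio of correct labels; a minimal sketch, assuming the predicted and true labels are given as equal-length sequences (the mapping of cluster indices to true class labels, which clustering evaluations usually need, is assumed to have been done beforehand):

```python
def accuracy(pred_labels, true_labels):
    """Classification accuracy Ac from Eq. (33): fraction of correct labels.

    s_il = 1 when the predicted label matches the true one, else 0.
    """
    s = [1 if p == t else 0 for p, t in zip(pred_labels, true_labels)]
    return sum(s) / len(s)
```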
As the second measure of classification performance, we compute the normalized mutual information, which is used to measure how similar two sets of clusters are. Given two sets of clusters C_i and C_j, their normalized mutual information is defined as

\[
\mathrm{NMI}(C_{i},C_{j})=
\frac{\sum_{c_{i}\in C_{i},\,c_{j}\in C_{j}} p(c_{i},c_{j})\cdot
\log\bigl(p(c_{i},c_{j})\big/\bigl[p(c_{i})\,p(c_{j})\bigr]\bigr)}
{\max\bigl(H(C_{i}),H(C_{j})\bigr)},
\tag{34}
\]

which takes values between 0 and 1. Here p(c_i) and p(c_j) denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters C_i and C_j, respectively, and p(c_i, c_j) denotes the joint probability that this arbitrarily selected image belongs to the cluster C_i as well as C_j at the same time. H(C_i) and H(C_j) are the entropies of C_i and C_j, respectively.

Experimental results on each data set will be presented as classification accuracy and normalized mutual information, as in Tables 1 and 2.
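Definition (34) translates directly into code when the two clusterings are given as label sequences over the same n samples; the sketch below uses empirical frequencies for the probabilities and the max-entropy normalizer from (34) (function name ours):

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two clusterings, Eq. (34)."""
    n = len(labels_a)
    pa = Counter(labels_a)                     # marginal counts for clustering A
    pb = Counter(labels_b)                     # marginal counts for clustering B
    pab = Counter(zip(labels_a, labels_b))     # joint counts
    mi = 0.0
    for (a, b), c in pab.items():
        p_ab = c / n
        # p_ab * n * n / (pa * pb) equals p(a, b) / (p(a) p(b))
        mi += p_ab * math.log(p_ab * n * n / (pa[a] * pb[b]))
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    denom = max(ha, hb)
    return mi / denom if denom > 0 else 1.0
```

Identical clusterings give NMI = 1 and independent ones give NMI = 0, matching the stated range of (34).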
5.1. ORL Database. The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. The images of some people were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background, with the subjects in an upright, frontal position allowing slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to 32 × 32 pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes in the facial areas are aligned at the same position.

There are 10 images for each category in ORL, and 10 percent is just one image. For the fixed parameters α (α = 0.5, 1, 2), we randomly choose two images from each category to provide the label information. Note that identical labeled cardinalities are meaningless for (30). To obtain r(k), we divide the 40 categories into 3 groups of 10, 20, and 10 categories. In the first 10 categories, we pick 1 image from each category to provide the label information; we pick 2 images from each category in the second 20 categories and pick 3 from each category in the remaining categories. The dividing process is repeated 10 times, and the obtained average classification accuracy is recorded as the final result.
Figure 1 shows the graphical classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection X come from the same group, we set α_k = 0.67. Because each category has the same number of samples, the variation of α is small even though we label different cardinalities of samples. Compared with the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF_{α_k} algorithm gives the best performance, since the selection of α_k is suitable to the collection X. Table 1 summarizes the detailed classification accuracy and the error bars of CNMF_{α=0.5}, CNMF_{α=2}, and CNMF_{α_k}. It shows that our CNMF_{α_k} algorithm achieves a 1.92 percent improvement in average classification accuracy compared with the best reported CNMF_KL algorithm (α = 1) [15]. For normalized mutual information, the details and the error bars of our constrained algorithms with α = 0.5, α = 2, and α_k are listed in Table 2. Compared with the best algorithm, CNMF, our CNMF_{α_k} algorithm achieves a 0.54 percent improvement.
5.2. Yale Database. The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person. Each image is taken with a different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database; each image is linearly stretched to a 1024-dimensional vector in image space.
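The stretching step described above (each 32 × 32 face becoming one 1024-dimensional column of the data matrix X) can be sketched with NumPy as follows; this is a minimal illustration with our own function name, and the resizing to 32 × 32 is assumed to have been done beforehand:

```python
import numpy as np

def images_to_matrix(images):
    """Stack 32x32 gray images into the (1024, n) data matrix X.

    images : iterable of (32, 32) nonnegative arrays (already resized).
    Each image is linearly stretched into one column; the result is
    scaled to [0, 1] since NMF requires nonnegative entries.
    """
    cols = [np.asarray(img, dtype=float).reshape(-1) for img in images]
    X = np.stack(cols, axis=1)             # column i is the i-th image
    return X / max(X.max(), 1e-12)
```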
TheYaleDatabase also has the same number of samples ineach category To obtain an appropriate 120572 we do similar labelprocessing to the ORL Database Divide the 15 individualsinto 3 groups averagely choose 1 image from each categoryin the first group choose 2 from each category in the secondand choose 3 from each category in the remaining groupWe repeat the process 10 times and record the averageclassification accuracy as the final result
Figure 2 shows the classification accuracy and normal-ized mutual information on the Yale Database Set 120572
119896=
055 when 119903(119896) minus 119903(119873) = 0 It indicates that the samplesin the collection 119883 just come from two groups that ischoose 15 images from 10 categories in the first and secondgroup CNMF
120572119896achieves an extraordinary performance for
all the cases and CNMF120572=05
follows This suggests that theconstrained nonnegative matrix factorization algorithm hasa higher classification accuracy when the value of 120572 is closeto 120572119896 Comparing to the best reported CNMFKL algorithm
CNMF120572119896
achieves 242 percent improvement in averageclassification accuracy For normalized mutual informationCNMF
120572119896achieves 72 percent improvement compared to
CNMF The details of classification accuracy and normal-ized mutual information are provided in Tables 3 and 4which contain the error bars of CNMF
120572=05 CNMF
120572=2 and
CNMF120572119896
Discrete Dynamics in Nature and Society

Table 1: The comparison of classification accuracy rates Ac (%) on the ORL Database.

k     CNMF    CNMF_α=1   CNMF_α=0.5     CNMF_α=2       CNMF_αk
2     93.90   92.95      93.51 ± 0.09   92.14 ± 0.18   93.89 ± 0.91
3     86.20   84.33      85.71 ± 0.83   82.26 ± 1.23   86.21 ± 2.07
4     84.55   84.05      84.74 ± 0.56   83.02 ± 0.47   84.78 ± 0.36
5     80.16   78.38      80.10 ± 0.22   80.15 ± 0.18   80.16 ± 0.29
6     76.72   76.73      77.13 ± 0.28   77.60 ± 0.19   78.09 ± 0.40
7     81.14   79.04      80.34 ± 0.81   78.07 ± 0.88   82.15 ± 1.95
8     78.08   77.02      78.66 ± 0.69   78.77 ± 0.94   79.45 ± 1.33
9     83.19   82.23      83.57 ± 0.36   81.22 ± 0.21   85.19 ± 1.21
10    77.03   76.88      77.69 ± 0.12   77.75 ± 0.19   78.96 ± 0.73
Avg   82.33   81.29      82.38          81.22          83.21

Table 2: The comparison of normalized mutual information (%) on the ORL Database.

k     CNMF    CNMF_α=1   CNMF_α=0.5     CNMF_α=2       CNMF_αk
2     78.24   75.82      76.01 ± 0.14   73.94 ± 0.11   77.67 ± 0.58
3     76.61   75.36      75.38 ± 0.41   74.50 ± 0.53   75.92 ± 1.55
4     78.69   77.82      77.96 ± 0.47   77.04 ± 0.54   79.27 ± 1.26
5     76.69   74.27      75.32 ± 0.80   72.62 ± 0.87   75.35 ± 1.67
6     76.52   76.09      76.21 ± 0.12   76.09 ± 0.67   77.60 ± 1.82
7     82.45   81.35      82.36 ± 0.42   81.35 ± 0.26   83.08 ± 1.59
8     80.57   79.92      80.11 ± 0.73   79.91 ± 0.68   82.49 ± 1.97
9     86.03   85.67      86.73 ± 0.70   85.67 ± 0.37   88.05 ± 1.15
10    81.77   81.07      82.07 ± 0.12   81.07 ± 0.12   82.97 ± 1.71
Avg   79.73   78.60      79.13          78.02          80.27

5.3. Caltech 101 Database

The Caltech 101 Database, created by Caltech University, has images of 101 different object categories. Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), except the background category. To represent the input images, we preprocess them using the codewords generated from SIFT features [34]. We then obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.
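The assignment step just described, mapping each SIFT descriptor to its closest codeword and counting, can be sketched as a generic bag-of-visual-words routine; the function name and toy data sizes below are illustrative assumptions, not from the paper.

```python
import numpy as np

def bow_histogram(descriptors, codewords):
    """Assign each local descriptor to its nearest codeword (Euclidean
    distance) and return the frequency histogram over the codebook."""
    # squared distances between every descriptor and every codeword
    d2 = ((descriptors[:, None, :] - codewords[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the closest codeword per descriptor
    hist = np.bincount(nearest, minlength=codewords.shape[0]).astype(float)
    return hist / hist.sum()  # normalize counts to a frequency histogram

# toy example: 60 SIFT-like 128-d descriptors against a 4-word codebook
rng = np.random.default_rng(0)
descriptors = rng.random((60, 128))
codebook = rng.random((4, 128))
h = bow_histogram(descriptors, codebook)
print(h.shape)
```

In the paper's setting the codebook would have 500 entries, giving the 500-dimensional histogram per image.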
We randomly select k categories from the Faces-Easy category in the Caltech 101 database and convert them to gray-scale images of 32 × 32. The labeling process is repeated 10 times, and the obtained values of α_k computed by (31) are listed in Table 7. The variation of α across the same k categories is great; that is, selecting an appropriate α plays a critical role for the mixture of different categories in one image data set, especially when the number of samples in each category differs. The choice of α_k can fully reflect the effectiveness of the indicator matrix S.

Figure 3 shows the effect derived from using the proposed algorithm with α_k. The upper part of the figure is the original samples, which contain 26 images; the middle part is their gray images; and the lower part is the combination of the basis vectors learned by CNMF_αk.

The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF_α=0.5, CNMF_α=2, and CNMF_αk. The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters α_k listed in Table 7 are selected. In general, our method demonstrates much better effectiveness in classification by choosing α_k. Compared to the best reported algorithm other than our CNMF_α algorithms, CNMF_αk achieves 2.4 percent improvement in average classification accuracy, and compared to the CNMF algorithm [15], CNMF_αk achieves 8.77 percent improvement in average classification accuracy. For normalized mutual information, CNMF_αk achieves 2.29 percent improvement and consistently outperforms the other algorithms.
Table 3: The comparison of classification accuracy rates Ac (%) on the Yale Database.

k     CNMF    CNMF_α=1   CNMF_α=0.5     CNMF_α=2       CNMF_αk
2     81.64   85.23      85.93 ± 0.65   84.18 ± 0.81   86.27 ± 2.46
3     68.48   76.91      77.14 ± 0.86   77.26 ± 0.42   79.01 ± 1.09
4     64.25   66.70      68.11 ± 0.67   66.12 ± 0.16   69.33 ± 2.23
5     65.82   67.71      69.14 ± 1.21   66.54 ± 1.01   70.57 ± 2.22
6     60.12   64.26      65.62 ± 0.94   63.70 ± 0.94   66.92 ± 1.41
7     59.25   62.47      63.97 ± 0.72   59.83 ± 0.38   65.23 ± 1.77
8     54.40   59.26      60.12 ± 0.31   56.76 ± 0.07   60.89 ± 1.28
9     54.85   57.30      58.08 ± 0.12   54.88 ± 0.14   58.98 ± 0.62
10    52.68   56.20      57.56 ± 0.03   53.83 ± 0.01   57.56 ± 0.73
Avg   62.39   65.89      67.30          64.79          68.31

Table 4: The comparison of normalized mutual information (%) on the Yale Database.

k     CNMF    CNMF_α=1   CNMF_α=0.5     CNMF_α=2       CNMF_αk
2     48.99   59.66      59.73 ± 0.39   58.17 ± 0.17   61.11 ± 1.82
3     42.90   53.52      53.58 ± 0.73   52.75 ± 0.93   54.49 ± 1.74
4     47.80   53.60      53.66 ± 0.18   52.99 ± 0.65   54.58 ± 1.64
5     53.99   57.49      57.56 ± 0.07   56.21 ± 0.06   58.58 ± 2.32
6     50.63   55.88      56.15 ± 0.07   55.40 ± 0.08   57.62 ± 1.09
7     53.03   57.60      57.88 ± 0.22   57.27 ± 0.16   59.05 ± 1.89
8     50.83   56.52      56.93 ± 0.13   56.20 ± 0.13   58.98 ± 1.46
9     53.08   55.30      55.98 ± 0.73   54.81 ± 0.81   56.15 ± 1.20
10    52.84   56.73      57.43 ± 0.16   56.41 ± 0.41   58.30 ± 1.11
Avg   50.45   56.26      56.54          55.58          57.65

Figure 1: Classification performance on the ORL Database: (a) classification accuracy rates; (b) normalized mutual information, plotted against the number of classes for CNMF, CNMF_α=1, CNMF_α=0.5, CNMF_α=2, and CNMF_αk.

Figure 2: Classification performance on the Yale Database: (a) classification accuracy rates; (b) normalized mutual information, for the same five algorithms.

Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) is the original samples, (b) is their gray images, and (c) is the basis vectors learned by CNMF_αk. Positive values are illustrated with white pixels.

Figure 4: Classification performance on the Caltech 101 Database: (a) classification accuracy rates; (b) normalized mutual information, for the same five algorithms.

Table 5: The comparison of classification accuracy rates Ac (%) on the Caltech 101 Database.

k     CNMF    CNMF_α=1   CNMF_α=0.5     CNMF_α=2       CNMF_αk
2     66.39   78.80      78.31 ± 1.69   80.10 ± 2.07   82.76 ± 3.56
3     60.91   69.98      68.77 ± 1.57   72.94 ± 1.51   73.59 ± 3.51
4     50.82   57.50      58.19 ± 1.98   56.28 ± 1.96   59.89 ± 2.82
5     50.83   53.67      53.71 ± 1.02   54.12 ± 1.44   55.06 ± 2.63
6     50.49   55.05      55.63 ± 0.89   58.04 ± 1.47   58.97 ± 1.97
7     46.09   51.12      50.04 ± 0.11   51.73 ± 0.25   52.36 ± 1.81
8     44.21   50.50      51.72 ± 0.51   48.37 ± 1.59   52.64 ± 1.52
9     41.34   46.58      47.85 ± 0.28   45.87 ± 0.30   49.54 ± 1.23
10    40.80   46.02      46.02 ± 0.05   45.76 ± 0.08   46.02 ± 0.26
Avg   50.21   56.58      56.69          57.02          58.98

Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

k     CNMF    CNMF_α=1   CNMF_α=0.5     CNMF_α=2       CNMF_αk
2     21.05   38.17      38.20 ± 0.18   37.48 ± 0.21   40.08 ± 2.19
3     31.62   40.63      40.68 ± 0.15   40.05 ± 0.14   41.88 ± 1.24
4     27.60   34.36      34.41 ± 0.79   34.12 ± 0.91   36.41 ± 1.17
5     34.46   35.82      36.00 ± 0.99   35.42 ± 1.01   37.20 ± 1.78
6     34.85   38.97      39.16 ± 0.86   38.53 ± 1.11   39.64 ± 1.83
7     34.88   37.67      37.86 ± 0.47   37.47 ± 0.49   39.86 ± 2.58
8     32.63   37.43      37.71 ± 0.39   37.22 ± 0.42   37.91 ± 2.18
9     31.20   35.97      36.21 ± 0.60   35.86 ± 0.63   37.73 ± 0.67
10    31.21   35.71      36.06 ± 0.02   35.68 ± 0.03   37.48 ± 0.22
Avg   31.05   37.19      37.37          36.87          38.69

Table 7: The value of α in k categories (10 repetitions of the labeling process).

k     α_1    α_2    α_3    α_4    α_5    α_6    α_7    α_8    α_9    α_10
2     2.07   2.08   2.22   2.27   2.57   3.52   3.97   3.98   4.37   8.96
3     0.45   0.55   0.66   1.07   1.95   2.31   2.31   3.54   4.24   7.54
4     0.30   0.32   0.35   0.86   0.98   1.27   1.65   2.10   3.27   5.80
5     0.45   0.49   0.51   0.71   0.80   1.13   2.05   2.55   2.82   3.89
6     0.07   0.56   0.73   1.08   1.26   1.46   2.01   2.12   3.91   8.52
7     0.71   0.94   1.12   1.14   1.16   1.26   1.27   1.37   1.57   8.97
8     0.12   0.30   0.50   0.64   0.66   0.69   0.78   0.79   0.97   1.11
9     0.78   0.79   2.01   0.10   0.17   0.28   0.29   0.30   0.31   0.39
10    1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00

6. Conclusion

In this paper, we present a generic constrained nonnegative matrix factorization algorithm obtained by imposing label information as an additional hard constraint on the α-divergence-NMF algorithm. The proposed algorithm can be derived using Karush-Kuhn-Tucker conditions, and we presented an alternative derivation using the projected gradient. The use of the two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter α. Since the α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive when α goes to +∞ and more exclusive when α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is caused by both the cardinalities of labeled samples in each category and the total samples. In the experiments, we apply the fixed parameters α and α_k to analyze the classification accuracy on three image databases. The experimental results demonstrate that the CNMF_αk algorithm has the best classification accuracy. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set. It is still an open problem how to select the optimal α for a specific image data set.
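For reference, the multiplicative updates for plain α-divergence NMF, the unconstrained form introduced in [16], can be sketched as below. This is not the authors' constrained CNMF_α update (11)-(12), which additionally encodes the label constraint; the function name, iteration count, and toy data are illustrative assumptions.

```python
import numpy as np

def alpha_nmf(V, r, alpha=0.5, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates minimizing the alpha-divergence between
    the nonnegative data matrix V and the product W @ H (after [16])."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        R = (V / (W @ H + eps)) ** alpha          # elementwise ratio^alpha
        H *= (W.T @ R / W.sum(axis=0)[:, None]) ** (1.0 / alpha)
        R = (V / (W @ H + eps)) ** alpha
        W *= (R @ H.T / H.sum(axis=1)[None, :]) ** (1.0 / alpha)
    return W, H

V = np.random.default_rng(1).random((20, 15)) + 0.1   # strictly positive toy data
W, H = alpha_nmf(V, r=4, alpha=0.5)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(W.shape, H.shape)
```

Setting alpha = 1 recovers the KL-divergence updates as a limiting case; the updates keep W and H nonnegative because every factor in them is nonnegative.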
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038, 11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).
References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207–I-212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and Its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621–V-624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. Kjærgård Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] ORL Face Database, http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.
[32] Yale Face Database, http://vision.ucsd.edu/content/yale-face-database.
[33] Caltech 101 Database, http://www.vision.caltech.edu/Image_Datasets/Caltech101/.
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
The algorithm described in (11) and (12) provides a classification label for each sample, marked s_{il_i}. Suppose X = {x_i}_{i=1}^{n} is a data set consisting of n training images. For each sample, let l_i be the true class label provided by the image data set. More specifically, if the image x_i is assigned to the l_i-th class, we evaluate it as a correct label and set s_{il_i} = 1; otherwise, it is counted as a false label and we set s_{il_i} = 0. Eventually, we compute the percentage of correct labels by defining the accuracy measure as

    Ac(s_{il_i}) = (∑_{i=1}^{n} s_{il_i}) / n.    (33)
To evaluate the classification performance, we also compute the normalized mutual information, which measures how similar two sets of clusters are, as the second measure. Given two sets of clusters C_i and C_j, their normalized mutual information is defined as

    NMI(C_i, C_j) = (∑_{c_i ∈ C_i, c_j ∈ C_j} p(c_i, c_j) · log[ p(c_i, c_j) / (p(c_i) p(c_j)) ]) / max(H(C_i), H(C_j)),    (34)

which takes values between 0 and 1. Here p(c_i) and p(c_j) denote the probabilities that an image arbitrarily chosen from the data set belongs to the clusters c_i and c_j, respectively, and p(c_i, c_j) denotes the joint probability that this arbitrarily selected image belongs to both c_i and c_j at the same time; H(C_i) and H(C_j) are the entropies of C_i and C_j, respectively.

Experimental results on each data set will be presented as classification accuracy and normalized mutual information.
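Both measures follow directly from (33) and (34). The sketch below uses only the standard library; the function names and toy labelings are illustrative, not from the paper.

```python
import math

def accuracy(s):
    """Eq. (33): fraction of correct labels; s[i] = 1 if sample i was
    assigned its true class l_i, else 0."""
    return sum(s) / len(s)

def nmi(a, b):
    """Eq. (34): mutual information of two labelings a and b, normalized
    by the larger of the two cluster entropies; value lies in [0, 1]."""
    n = len(a)
    def p(labels, c):  # marginal probability of cluster c
        return labels.count(c) / n
    mi = 0.0
    for ca in set(a):
        for cb in set(b):
            p_ab = sum(1 for x, y in zip(a, b) if (x, y) == (ca, cb)) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p(a, ca) * p(b, cb)))
    def entropy(labels):
        return -sum(p(labels, c) * math.log(p(labels, c)) for c in set(labels))
    return mi / max(entropy(a), entropy(b))

print(accuracy([1, 1, 0, 1]))           # 0.75
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: identical partitions up to renaming
```

Note that NMI is invariant to a renaming of the cluster indices, which is why it is the standard second measure alongside raw accuracy.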
5.1. ORL Database

The Cambridge ORL Face Database has 400 images of 40 different people, 10 images per person. The images of some people were taken at different times, with slightly varying lighting, facial expressions (open/closed eyes, smiling/not smiling), and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background, with the subjects in an upright, frontal position with slight left-right out-of-plane rotation. To locate the faces, the input images are preprocessed: they are resized to 32 × 32 pixels with 256 gray levels per pixel and normalized in orientation so that the two eyes in the facial areas are aligned at the same position.
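This preprocessing step can be sketched as stacking each resized 32 × 32 face as one column of the nonnegative data matrix handed to the factorization; the function name and random stand-in faces below are illustrative assumptions.

```python
import numpy as np

def build_data_matrix(images):
    """Flatten each h-by-w grayscale image into one column of the
    nonnegative data matrix X, scaling pixel values to [0, 1]."""
    cols = [img.reshape(-1).astype(float) / 255.0 for img in images]
    return np.stack(cols, axis=1)  # X has shape (h*w, number of images)

# toy stand-in for 5 ORL faces already resized to 32 x 32
rng = np.random.default_rng(0)
faces = [rng.integers(0, 256, size=(32, 32)) for _ in range(5)]
X = build_data_matrix(faces)
print(X.shape)  # (1024, 5)
```

With all 400 ORL images, X would be 1024 × 400, matching the "1024-dimensional vector in image space" used throughout the experiments.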
There are 10 images for each category in ORL, so 10 percent is just one image. For the fixed parameters α (α = 0.5, 1, 2), we randomly choose two images from each category to provide the label information. Note that the same label is meaningless for (30). To obtain r(k), we divide the 40 categories into 3 groups of 10, 20, and 10 categories. In the first 10 categories, we pick 1 image from each category to provide label information; we pick 2 images from each category in the second 20 categories; and we pick 3 from each category in the remaining categories. The dividing process is repeated 10 times, and the obtained average classification accuracy is recorded as the final result.
Figure 1 shows the graphical classification accuracy rates and normalized mutual information on the ORL Database. Note that if the samples in the collection X come from the same group, we set α_k = 0.67. Because each category has the same number of samples, the variation of α is small even though we label different cardinalities of samples. Compared to the constrained nonnegative matrix factorization algorithms with fixed parameters, our CNMF_αk algorithm gives the best performance, since the selection of α_k is suitable for the collection X. Table 1 summarizes the detailed classification accuracy and error bars of CNMF_α=0.5, CNMF_α=2, and CNMF_αk. It shows that our CNMF_αk algorithm achieves 1.92 percent improvement in average classification accuracy compared to the best reported CNMF_KL algorithm (α = 1) [15]. For normalized mutual information, the details and error bars of our constrained algorithms with α = 0.5, α = 2, and α_k are listed in Table 2. Compared to the best algorithm, CNMF, our CNMF_αk algorithm achieves 0.54 percent improvement.
5.2. Yale Database

The Yale Database consists of 165 grayscale images of 15 individuals, 11 images per person. Each image is taken with a different facial expression or configuration: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. We preprocess all the images of the Yale Database in the same way as the ORL Database. Each image is linearly stretched to a 1024-dimensional vector in image space.

The Yale Database also has the same number of samples in each category. To obtain an appropriate α, we apply a labeling process similar to that for the ORL Database: divide the 15 individuals evenly into 3 groups, choose 1 image from each category in the first group, choose 2 from each category in the second, and choose 3 from each category in the remaining group. We repeat the process 10 times and record the average classification accuracy as the final result.
Figure 2 shows the classification accuracy and normalized mutual information on the Yale Database. We set α_k = 0.55 when r(k) − r(N) = 0, which indicates that the samples in the collection X come from just two groups, that is, 15 images chosen from 10 categories in the first and second groups. CNMF_αk achieves an extraordinary performance in all cases, and CNMF_α=0.5 follows. This suggests that the constrained nonnegative matrix factorization algorithm has a higher classification accuracy when the value of α is close to α_k.
Discrete Dynamics in Nature and Society 11
Table 7 The value of 120572 in 119896 categories
119896 1205721
1205722
1205723
1205724
1205725
1205726
1205727
1205728
1205729
12057210
2 207 208 222 227 257 352 397 398 437 8963 045 055 066 107 195 231 231 354 424 7544 030 032 035 086 098 127 165 210 327 5805 045 049 051 071 080 113 205 255 282 3896 007 056 073 108 126 146 201 212 391 8527 071 094 112 114 116 126 127 137 157 8978 012 030 050 064 066 069 078 079 097 1119 078 079 201 010 017 028 029 030 031 03910 100 100 100 100 100 100 100 100 100 100
samples In the experiments we apply the fixed parameters120572 and 120572
119896to analyze the classification accuracy on three
image databasesThe experimental results have demonstratedthat the CNMF
120572119896algorithm has best classification accuracy
However we can not get the cardinalities of labeled imagesexactly in many real-word applications Moreover the valueof 120572 varies depending on data sets It is still an open problemhow to select the optimal 120572 for a specific image data set
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgments
This work was supported in part by the National NaturalScience Foundation of China (61375038) National NaturalScience Foundation of China (11401060) Zhejiang Provin-cial Natural Science Foundation of China (LQ13A010023)Key Scientific and Technological Project of Henan Province(142102210010) and Key Research Project in Science andTechnology of the Education Department Henan Province(14A520028 14A520052)
References
[1] I Biederman ldquoRecognition-by-components a theory of humanimage understandingrdquo Psychological Review vol 94 no 2 pp115ndash147 1987
[2] S Ullman High-Level Vision Object Recognition and VisualCognition MIT Press Cambridge Mass USA 1996
[3] P Paatero and U Tapper ldquoPositive matrix factorization a non-negative factormodel with optimal utilization of error estimatesof data valuesrdquo Environmetrics vol 5 no 2 pp 111ndash126 1994
[4] I T Jolliffe Principle Component Analysis Springer BerlinGermany 1986
[5] R O Duda P E Hart and D G Stork Pattern ClassificationWiley-Interscience New York NY USA 2001
[6] A Gersho and R M Gray Vector Quantization and SignalCompression Kluwer Academic Publishers 1992
[7] D D Lee and H S Seung ldquoLearning the parts of objects bynon-negative matrix factorizationrdquo Nature vol 401 no 6755pp 788ndash791 1999
[8] M Chu F Diele R Plemmons and S Ragni ldquoOptimalitycomputation and interpretation of nonnegative matrix factor-izationsrdquo SIAM Journal on Matrix Analysis pp 4ndash8030 2004
[9] S Z Li X W Hou H J Zhang and Q S Cheng ldquoLearningspatially localized parts-based representationrdquo inProceedings ofthe IEEE Computer Society Conference on Computer Vision andPattern Recognition (CVPR rsquo01) pp I207ndashI212 Kauai HawaiiUSA December 2001
[10] Y Wang Y Jia H U Changbo and M Turk ldquoNon-negativematrix factorization framework for face recognitionrdquo Interna-tional Journal of Pattern Recognition and Artificial Intelligencevol 19 no 4 pp 495ndash511 2005
[11] YWang Y Jia CHu andM Turk ldquoFisher non-negativematrixfactorization for learning local featuresrdquo in Proceedings of theAsian Conference on Computer Vision Jeju Island Republic ofKorea 2004
[12] J H Ahn S Kim J H Oh and S Choi ldquoMultiple nonnegative-matrix factorization of dynamic PET imagesrdquo in Proceedings ofthe Asian Conference on Computer Vision 2004
[13] J S Lee D D Lee S Choi and D S Lee ldquoApplication of nonnegative matrix factorization to dynamic positron emissiontomographyrdquo in Proceedings of the International Conference onIndependent Component Analysis and Blind Signal Separationpp 629ndash632 San Diego Calif USA 2001
[14] H Lee A Cichocki and S Choi ldquoNonnegative matrix factor-ization for motor imagery EEG classificationrdquo in Proceedingsof the International Conference on Artificial Neural NetworksSpringer Athens Greece 2006
[15] H Liu Z Wu X Li D Cai and T S Huang ldquoConstrainednonnegative matrix factorization for image representationrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 34 no 7 pp 1299ndash1311 2012
[16] A Cichocki H Lee Y-D Kim and S Choi ldquoNon-negativematrix factorization with 120572-divergencerdquo Pattern RecognitionLetters vol 29 no 9 pp 1433ndash1440 2008
[17] L K Saul F Sha and D D Lee ldquoTatistical signal processingwith nonnegativity constraintsrdquo Proceedings of EuroSpeech vol2 pp 1001ndash1004 2003
[18] P M Kim and B Tidor ldquoSubsystem identification throughdimensionality reduction of large-scale gene expression datardquoGenome Research vol 13 no 7 pp 1706ndash1718 2003
[19] V P Pauca J Piper and R J Plemmons ldquoNonnegative matrixfactorization for spectral data analysisrdquo Linear Algebra and itsApplications vol 416 no 1 pp 29ndash47 2006
[20] P Smaragdis and J C Brown ldquoNon-negative matrix factor-ization for polyphonic music transcriptionrdquo in Proceedings of
12 Discrete Dynamics in Nature and Society
IEEEWorkshop onApplications of Signal Processing to Audio andAcoustics pp 177ndash180 New Paltz NY USA 2003
[21] F Shahnaz M W Berry V P Pauca and R J PlemmonsldquoDocument clustering using nonnegative matrix factorizationrdquoInformation Processing andManagement vol 42 no 2 pp 373ndash386 2006
[22] P O Hoyer ldquoNon-negativematrix factorization with sparsenessconstraintsrdquo Journal of Machine Learning Research vol 5 pp1457ndash1469 2004
[23] S Choi ldquoAlgorithms for orthogonal nonnegative matrix factor-izationrdquo in Proceedings of the International Joint Conference onNeural Networks pp 1828ndash1832 June 2008
[24] A Cichocki R Zdunek and S Amari ldquoCsiszarrsquos divergencesfor nonnegativematrix factorization family of new algorithmsrdquoin Proceedings of the International Conference on IndependentComponent Analysis and Blind Signal Separation CharlestonSC USA 2006
[25] A Cichocki R Zdunek and S-I Amari ldquoNew algorithmsfor non-negative matrix factorization in applications to blindsource separationrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo06) pp V621ndashV624 Toulouse France May 2006
[26] I S Dhillon and S Sra ldquoGeneralized nonnegative matrixapproximations with Bregman divergencesrdquo in Advances inNeural Information Processing Systems 8 pp 283ndash290 MITPress Cambridge Mass USA 2006
[27] A Cichocki S Amari and R Zdunek ldquoExtended SMART algo-rithms for non-negative matrix factorizationrdquo in Proceedings ofthe 8th International Conference on Artificial Intelligence andSoft Computing Zakopane Poland 2006
[28] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977
[29] L Saul and F Pereira ldquoAggregate and mixed-order Markovmodels for statistical language processingrdquo in Proceedings ofthe 2nd Conference on Empirical Methods in Natural LanguageProcessing C Cardie and R Weischedel Eds pp 81ndash89 ACLPress 1997
[30] F Hansen and G Kjaeligrgard Pedersen ldquoJensenrsquos inequality foroperators and Lownerrsquos theoremrdquoMathematische Annalen vol258 no 3 pp 229ndash241 1982
[31] httpwwwclcamacukresearchdtgattarchivefacedatabasehtml
[32] httpvisionucsdeducontentyale-face-database[33] httpwwwvisioncaltecheduImage DatasetsCaltech101[34] D G Lowe ldquoDistinctive image features from scale-invariant
keypointsrdquo International Journal of Computer Vision vol 60 no2 pp 91ndash110 2004
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Discrete Dynamics in Nature and Society 7
Table 1: The comparison of classification accuracy rates Ac(s_i, l_i) (%) on the ORL Database.

k   | CNMF  | CNMF(α=1) | CNMF(α=0.5)  | CNMF(α=2)    | CNMF(α_k)
2   | 93.90 | 92.95     | 93.51 ± 0.09 | 92.14 ± 0.18 | 93.89 ± 0.91
3   | 86.20 | 84.33     | 85.71 ± 0.83 | 82.26 ± 1.23 | 86.21 ± 2.07
4   | 84.55 | 84.05     | 84.74 ± 0.56 | 83.02 ± 0.47 | 84.78 ± 0.36
5   | 80.16 | 78.38     | 80.10 ± 0.22 | 80.15 ± 0.18 | 80.16 ± 0.29
6   | 76.72 | 76.73     | 77.13 ± 0.28 | 77.60 ± 0.19 | 78.09 ± 0.40
7   | 81.14 | 79.04     | 80.34 ± 0.81 | 78.07 ± 0.88 | 82.15 ± 1.95
8   | 78.08 | 77.02     | 78.66 ± 0.69 | 78.77 ± 0.94 | 79.45 ± 1.33
9   | 83.19 | 82.23     | 83.57 ± 0.36 | 81.22 ± 0.21 | 85.19 ± 1.21
10  | 77.03 | 76.88     | 77.69 ± 0.12 | 77.75 ± 0.19 | 78.96 ± 0.73
Avg | 82.33 | 81.29     | 82.38        | 81.22        | 83.21
Table 2: The comparison of normalized mutual information (%) on the ORL Database.

k   | CNMF  | CNMF(α=1) | CNMF(α=0.5)  | CNMF(α=2)    | CNMF(α_k)
2   | 78.24 | 75.82     | 76.01 ± 0.14 | 73.94 ± 0.11 | 77.67 ± 0.58
3   | 76.61 | 75.36     | 75.38 ± 0.41 | 74.50 ± 0.53 | 75.92 ± 1.55
4   | 78.69 | 77.82     | 77.96 ± 0.47 | 77.04 ± 0.54 | 79.27 ± 1.26
5   | 76.69 | 74.27     | 75.32 ± 0.80 | 72.62 ± 0.87 | 75.35 ± 1.67
6   | 76.52 | 76.09     | 76.21 ± 0.12 | 76.09 ± 0.67 | 77.60 ± 1.82
7   | 82.45 | 81.35     | 82.36 ± 0.42 | 81.35 ± 0.26 | 83.08 ± 1.59
8   | 80.57 | 79.92     | 80.11 ± 0.73 | 79.91 ± 0.68 | 82.49 ± 1.97
9   | 86.03 | 85.67     | 86.73 ± 0.70 | 85.67 ± 0.37 | 88.05 ± 1.15
10  | 81.77 | 81.07     | 82.07 ± 0.12 | 81.07 ± 0.12 | 82.97 ± 1.71
Avg | 79.73 | 78.60     | 79.13        | 78.02        | 80.27
categories. Each category contains about 31 to 800 images, with a total of 9144 samples of size roughly 300 × 300 pixels. This database is particularly challenging for learning a basis representation of image information because the number of training samples per category is exceedingly small. In our experiment, we select the 10 largest categories (3044 images in total), excluding the background category. To represent the input images, we preprocess them using the codewords generated from SIFT features [34]. We obtain 555,292 SIFT descriptors and generate 500 codewords. By assigning the descriptors to the closest codewords, each image in the Caltech 101 database is represented by a 500-dimensional frequency histogram.
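The codeword-assignment step just described can be sketched as follows. This is a minimal illustration that assumes the SIFT descriptors [34] have already been extracted as a NumPy array; `bow_histogram` is a name of our own, and in the actual experiment the codebook would come from clustering the 555,292 descriptors (e.g., with k-means) rather than from random vectors.

```python
import numpy as np

def bow_histogram(descriptors, codewords):
    """Assign each local descriptor to its nearest codeword and return
    the image's normalized frequency histogram over the codebook."""
    # squared Euclidean distance from every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codewords[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)                      # index of the closest codeword
    hist = np.bincount(nearest, minlength=len(codewords)).astype(float)
    return hist / hist.sum()                         # frequency histogram

# toy run: 6 fake 128-dimensional descriptors, a codebook of 3 codewords
rng = np.random.default_rng(0)
h = bow_histogram(rng.normal(size=(6, 128)), rng.normal(size=(3, 128)))
```

With a 500-word codebook, `h` would be the 500-dimensional representation used as input to the factorization.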
We randomly select k categories from the Faces-Easy category in the Caltech 101 database and convert them to gray-scale images of 32 × 32 pixels. The labeling process is repeated 10 times, and the obtained values of α_k computed by (31) are listed in Table 7. The variation of α_k within the same k categories is large; that is, selecting an appropriate α plays a critical role for the mixture of different categories in one image data set, especially when the number of samples in each category differs. The choice of α_k can fully reflect the effectiveness of the indicator matrix S.

Figure 3 shows the effect of using the proposed algorithm with α_k. The upper part of the figure shows the original samples, which contain 26 images; the middle part shows their gray images; and the lower part shows the combination of the basis vectors learned by CNMF(α_k).
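For concreteness, the way label information is imposed as a hard constraint in constrained NMF of this kind can be sketched as follows. This follows the indicator-based construction of [15], on which our formulation builds; `label_constraint_matrix` is our own name, and we assume the labeled samples come first in the data matrix.

```python
import numpy as np

def label_constraint_matrix(labels, n_total, n_classes):
    """Build a hard label-constraint matrix A in the style of [15]:
    labeled samples sharing a class are tied to a single shared
    coefficient row, while each unlabeled sample keeps its own row.
    `labels[i]` is the class of the i-th (labeled) sample."""
    n_labeled = len(labels)
    C = np.zeros((n_labeled, n_classes))             # class indicator block
    C[np.arange(n_labeled), labels] = 1.0
    A = np.zeros((n_total, n_classes + n_total - n_labeled))
    A[:n_labeled, :n_classes] = C                    # labeled block
    A[n_labeled:, n_classes:] = np.eye(n_total - n_labeled)  # unlabeled block
    return A

# 3 labeled samples (classes 0, 1, 0) out of 5 total, 2 classes
A = label_constraint_matrix(np.array([0, 1, 0]), n_total=5, n_classes=2)
```

Parameterizing the coefficients through A forces same-labeled samples onto identical coefficient vectors, which is what makes the label information a hard constraint rather than a soft penalty.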
The classification accuracy results and normalized mutual information for the Faces-Easy category in the Caltech 101 database are detailed in Tables 5 and 6, which contain the error bars of CNMF(α=0.5), CNMF(α=2), and CNMF(α_k). The graphical results of classification performance are shown in Figure 4. The best performance in this experiment is achieved when the parameters α_k listed in Table 7 are selected. In general, our method demonstrates much better classification effectiveness when α_k is chosen. Compared with the best reported algorithm other than our CNMF_α algorithm, CNMF(α_k) achieves a 2.4 percent improvement in average classification accuracy, and compared with the CNMF algorithm [15], CNMF(α_k) achieves an 8.77 percent improvement in average classification accuracy. For normalized mutual information, CNMF(α_k) achieves a 2.29 percent improvement and consistently outperforms the other algorithms.
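The two evaluation measures reported in the tables can be computed as follows. This is a minimal sketch with function names of our own; accuracy uses the standard best one-to-one matching between cluster labels and ground-truth classes.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(true, pred):
    """Accuracy after the best one-to-one matching of cluster ids
    to class ids (Hungarian algorithm on the confusion matrix)."""
    k = int(max(true.max(), pred.max())) + 1
    confusion = np.zeros((k, k), dtype=int)
    for t, p in zip(true, pred):
        confusion[t, p] += 1
    rows, cols = linear_sum_assignment(-confusion)   # maximize agreement
    return confusion[rows, cols].sum() / len(true)

def nmi(true, pred):
    """Normalized mutual information, I(T;P) / sqrt(H(T) H(P))."""
    n, eps = len(true), 1e-12
    joint = np.zeros((true.max() + 1, pred.max() + 1))
    for t, p in zip(true, pred):
        joint[t, p] += 1.0 / n                       # joint distribution
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)    # marginals
    mi = np.sum(joint * np.log((joint + eps) / (np.outer(pt, pp) + eps)))
    h = lambda q: -np.sum(q * np.log(q + eps))       # entropy
    return mi / np.sqrt(h(pt) * h(pp))
```

scikit-learn's `normalized_mutual_info_score` computes the same quantity with a selectable normalization; the geometric-mean normalization above is a common choice in the NMF clustering literature.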
6. Conclusion

In this paper, we present a generic constrained nonnegative matrix factorization algorithm by imposing label information as additional hard constraints on the α-divergence-NMF algorithm. The proposed algorithm can be derived using
Table 3: The comparison of classification accuracy rates Ac(s_i, l_i) (%) on the Yale Database.

k   | CNMF  | CNMF(α=1) | CNMF(α=0.5)  | CNMF(α=2)    | CNMF(α_k)
2   | 81.64 | 85.23     | 85.93 ± 0.65 | 84.18 ± 0.81 | 86.27 ± 2.46
3   | 68.48 | 76.91     | 77.14 ± 0.86 | 77.26 ± 0.42 | 79.01 ± 1.09
4   | 64.25 | 66.70     | 68.11 ± 0.67 | 66.12 ± 0.16 | 69.33 ± 2.23
5   | 65.82 | 67.71     | 69.14 ± 1.21 | 66.54 ± 1.01 | 70.57 ± 2.22
6   | 60.12 | 64.26     | 65.62 ± 0.94 | 63.70 ± 0.94 | 66.92 ± 1.41
7   | 59.25 | 62.47     | 63.97 ± 0.72 | 59.83 ± 0.38 | 65.23 ± 1.77
8   | 54.40 | 59.26     | 60.12 ± 0.31 | 56.76 ± 0.07 | 60.89 ± 1.28
9   | 54.85 | 57.30     | 58.08 ± 0.12 | 54.88 ± 0.14 | 58.98 ± 0.62
10  | 52.68 | 56.20     | 57.56 ± 0.03 | 53.83 ± 0.01 | 57.56 ± 0.73
Avg | 62.39 | 65.89     | 67.30        | 64.79        | 68.31
Table 4: The comparison of normalized mutual information (%) on the Yale Database.

k   | CNMF  | CNMF(α=1) | CNMF(α=0.5)  | CNMF(α=2)    | CNMF(α_k)
2   | 48.99 | 59.66     | 59.73 ± 0.39 | 58.17 ± 0.17 | 61.11 ± 1.82
3   | 42.90 | 53.52     | 53.58 ± 0.73 | 52.75 ± 0.93 | 54.49 ± 1.74
4   | 47.80 | 53.60     | 53.66 ± 0.18 | 52.99 ± 0.65 | 54.58 ± 1.64
5   | 53.99 | 57.49     | 57.56 ± 0.07 | 56.21 ± 0.06 | 58.58 ± 2.32
6   | 50.63 | 55.88     | 56.15 ± 0.07 | 55.40 ± 0.08 | 57.62 ± 1.09
7   | 53.03 | 57.60     | 57.88 ± 0.22 | 57.27 ± 0.16 | 59.05 ± 1.89
8   | 50.83 | 56.52     | 56.93 ± 0.13 | 56.20 ± 0.13 | 58.98 ± 1.46
9   | 53.08 | 55.30     | 55.98 ± 0.73 | 54.81 ± 0.81 | 56.15 ± 1.20
10  | 52.84 | 56.73     | 57.43 ± 0.16 | 56.41 ± 0.41 | 58.30 ± 1.11
Avg | 50.45 | 56.26     | 56.54        | 55.58        | 57.65
[Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. Each panel plots the metric against the number of classes (2–10) for CNMF, CNMF(α=1), CNMF(α=0.5), CNMF(α=2), and CNMF(α_k).]
[Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information. Each panel plots the metric against the number of classes (2–10) for the same five algorithms.]
[Figure 3: Sample images used in the YaleB Database. As shown in the three 2 × 13 montages, (a) shows the original images, (b) the gray images, and (c) the basis vectors learned by CNMF(α_k). Positive values are illustrated with white pixels.]
Karush-Kuhn-Tucker conditions, and we presented an alternative derivation of the algorithm using the projected gradient. The use of the two methods theoretically guarantees the correctness of the algorithm. The image representation learned by our algorithm contains a parameter α. Since the α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution is more inclusive as α goes to +∞ and more exclusive as α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter α for our semisupervised CNMF_α algorithm. The variation of α is caused by both the cardinalities of labeled samples in each category and the total
[Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information. Each panel plots the metric against the number of classes (2–10) for the five algorithms.]
Table 5: The comparison of classification accuracy rates Ac(s_i, l_i) (%) on the Caltech 101 Database.

k   | CNMF  | CNMF(α=1) | CNMF(α=0.5)  | CNMF(α=2)    | CNMF(α_k)
2   | 66.39 | 78.80     | 78.31 ± 1.69 | 80.10 ± 2.07 | 82.76 ± 3.56
3   | 60.91 | 69.98     | 68.77 ± 1.57 | 72.94 ± 1.51 | 73.59 ± 3.51
4   | 50.82 | 57.50     | 58.19 ± 1.98 | 56.28 ± 1.96 | 59.89 ± 2.82
5   | 50.83 | 53.67     | 53.71 ± 1.02 | 54.12 ± 1.44 | 55.06 ± 2.63
6   | 50.49 | 55.05     | 55.63 ± 0.89 | 58.04 ± 1.47 | 58.97 ± 1.97
7   | 46.09 | 51.12     | 50.04 ± 0.11 | 51.73 ± 0.25 | 52.36 ± 1.81
8   | 44.21 | 50.50     | 51.72 ± 0.51 | 48.37 ± 1.59 | 52.64 ± 1.52
9   | 41.34 | 46.58     | 47.85 ± 0.28 | 45.87 ± 0.30 | 49.54 ± 1.23
10  | 40.80 | 46.02     | 46.02 ± 0.05 | 45.76 ± 0.08 | 46.02 ± 0.26
Avg | 50.21 | 56.58     | 56.69        | 57.02        | 58.98
Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

k   | CNMF  | CNMF(α=1) | CNMF(α=0.5)  | CNMF(α=2)    | CNMF(α_k)
2   | 21.05 | 38.17     | 38.20 ± 0.18 | 37.48 ± 0.21 | 40.08 ± 2.19
3   | 31.62 | 40.63     | 40.68 ± 0.15 | 40.05 ± 0.14 | 41.88 ± 1.24
4   | 27.60 | 34.36     | 34.41 ± 0.79 | 34.12 ± 0.91 | 36.41 ± 1.17
5   | 34.46 | 35.82     | 36.00 ± 0.99 | 35.42 ± 1.01 | 37.20 ± 1.78
6   | 34.85 | 38.97     | 39.16 ± 0.86 | 38.53 ± 1.11 | 39.64 ± 1.83
7   | 34.88 | 37.67     | 37.86 ± 0.47 | 37.47 ± 0.49 | 39.86 ± 2.58
8   | 32.63 | 37.43     | 37.71 ± 0.39 | 37.22 ± 0.42 | 37.91 ± 2.18
9   | 31.20 | 35.97     | 36.21 ± 0.60 | 35.86 ± 0.63 | 37.73 ± 0.67
10  | 31.21 | 35.71     | 36.06 ± 0.02 | 35.68 ± 0.03 | 37.48 ± 0.22
Avg | 31.05 | 37.19     | 37.37        | 36.87        | 38.69
Table 7: The value of α in k categories.

k  | α_1  | α_2  | α_3  | α_4  | α_5  | α_6  | α_7  | α_8  | α_9  | α_10
2  | 2.07 | 2.08 | 2.22 | 2.27 | 2.57 | 3.52 | 3.97 | 3.98 | 4.37 | 8.96
3  | 0.45 | 0.55 | 0.66 | 1.07 | 1.95 | 2.31 | 2.31 | 3.54 | 4.24 | 7.54
4  | 0.30 | 0.32 | 0.35 | 0.86 | 0.98 | 1.27 | 1.65 | 2.10 | 3.27 | 5.80
5  | 0.45 | 0.49 | 0.51 | 0.71 | 0.80 | 1.13 | 2.05 | 2.55 | 2.82 | 3.89
6  | 0.07 | 0.56 | 0.73 | 1.08 | 1.26 | 1.46 | 2.01 | 2.12 | 3.91 | 8.52
7  | 0.71 | 0.94 | 1.12 | 1.14 | 1.16 | 1.26 | 1.27 | 1.37 | 1.57 | 8.97
8  | 0.12 | 0.30 | 0.50 | 0.64 | 0.66 | 0.69 | 0.78 | 0.79 | 0.97 | 1.11
9  | 0.78 | 0.79 | 2.01 | 0.10 | 0.17 | 0.28 | 0.29 | 0.30 | 0.31 | 0.39
10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00
samples. In the experiments, we apply the fixed parameters α and α_k to analyze the classification accuracy on three image databases. The experimental results have demonstrated that the CNMF(α_k) algorithm has the best classification accuracy. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies depending on the data set. How to select the optimal α for a specific image data set remains an open problem.
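The multiplicative updates that the conclusion refers to build on α-divergence NMF. As a rough, unconstrained sketch in the multiplicative form of Cichocki et al. [16] (valid for α ≠ 0; the label constraints of this paper are omitted, and `alpha_nmf` is a name of our own):

```python
import numpy as np

def alpha_nmf(V, r, alpha=0.5, n_iter=300, seed=0):
    """Multiplicative updates for NMF under the alpha-divergence
    D_alpha(V || WH), following the form in [16]; unconstrained sketch."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + 0.1
    H = rng.random((r, n)) + 0.1
    eps = 1e-9
    one = np.ones_like(V)
    for _ in range(n_iter):
        R = (V / (W @ H + eps)) ** alpha                 # elementwise ratio^alpha
        H *= ((W.T @ R) / (W.T @ one + eps)) ** (1.0 / alpha)
        R = (V / (W @ H + eps)) ** alpha
        W *= ((R @ H.T) / (one @ H.T + eps)) ** (1.0 / alpha)
    return W, H

# toy check on a nearly rank-2 nonnegative matrix
rng = np.random.default_rng(1)
V = rng.random((10, 2)) @ rng.random((2, 8)) + 0.01
W, H = alpha_nmf(V, r=2)
```

Because both factors are updated multiplicatively from nonnegative initial values, nonnegativity is preserved automatically; the KKT and projected-gradient derivations in the paper serve to justify updates of this type for the constrained objective.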
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).
References

[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, pp. 4–8030, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207–I-212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621–V-624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 8, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. Kjærgard Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[32] http://vision.ucsd.edu/content/yale-face-database
[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101/
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
8 Discrete Dynamics in Nature and Society
Table 3 The comparison of classification accuracy rates on the Yale Database
119896
Ac(119904119894119897119894) ()
CNMF CNMF120572=1
CNMF120572=05
CNMF120572=2
CNMF120572119896
2 8164 8523 8593 plusmn 065 8418 plusmn 081 8627 plusmn 2463 6848 7691 7714 plusmn 086 7726 plusmn 042 7901 plusmn 1094 6425 6670 6811 plusmn 067 6612 plusmn 016 6933 plusmn 2235 6582 6771 6914 plusmn 121 6654 plusmn 101 7057 plusmn 2226 6012 6426 6562 plusmn 094 6370 plusmn 094 6692 plusmn 1417 5925 6247 6397 plusmn 072 5983 plusmn 038 6523 plusmn 1778 5440 5926 6012 plusmn 031 5676 plusmn 007 6089 plusmn 1289 5485 5730 5808 plusmn 012 5488 plusmn 014 5898 plusmn 06210 5268 5620 5756 plusmn 003 5383 plusmn 001 5756 plusmn 073Avg 6239 6589 6730 6479 6831
Table 4 The comparison of normalized mutual information on the Yale Database
119896
Normalized mutual information ()CNMF CNMF
120572=1CNMF
120572=05CNMF
120572=2CNMF
120572119896
2 4899 5966 5973 plusmn 039 5817 plusmn 017 6111 plusmn 1823 4290 5352 5358 plusmn 073 5275 plusmn 093 5449 plusmn 1744 4780 5360 5366 plusmn 018 5299 plusmn 065 5458 plusmn 1645 5399 5749 5756 plusmn 007 5621 plusmn 006 5858 plusmn 2326 5063 5588 5615 plusmn 007 5540 plusmn 008 5762 plusmn 1097 5303 5760 5788 plusmn 022 5727 plusmn 016 5905 plusmn 1898 5083 5652 5693 plusmn 013 5620 plusmn 013 5898 plusmn 1469 5308 5530 5598 plusmn 073 5481 plusmn 081 5615 plusmn 12010 5284 5673 5743 plusmn 016 5641 plusmn 041 5830 plusmn 111Avg 5045 5626 5654 5558 5765
Figure 1: Classification performance on the ORL Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2–10) for CNMF, CNMF (α = 1), CNMF (α = 0.5), CNMF (α = 2), and CNMF (α_k).
Figure 2: Classification performance on the Yale Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2–10) for CNMF, CNMF (α = 1), CNMF (α = 0.5), CNMF (α = 2), and CNMF (α_k).
Figure 3: Sample images from the YaleB Database. The three 2 × 13 montages show (a) the original images, (b) the gray-scale images, and (c) the basis vectors learned by CNMF (α_k); positive values are illustrated with white pixels.
Karush-Kuhn-Tucker conditions and presented an alternative form of the algorithm using the projected gradient. The use of these two methods guarantees the correctness of the algorithm theoretically. The image representation learned by our algorithm contains a parameter α. Since the α-divergence is a parametric discrepancy measure and the parameter α is associated with the characteristics of a learning machine, the model distribution becomes more inclusive as α goes to +∞ and more exclusive as α approaches −∞. The selection of the optimal value of α plays a critical role in determining the discriminative basis vectors. We provide a method to select the parameter α for our semisupervised CNMF (α) algorithm. The variation of α is caused by both the cardinalities of labeled samples in each category and the total
Figure 4: Classification performance on the Caltech 101 Database. (a) Classification accuracy rates; (b) normalized mutual information. Both panels plot the metric against the number of classes (2–10) for CNMF, CNMF (α = 1), CNMF (α = 0.5), CNMF (α = 2), and CNMF (α_k).
Table 5: The comparison of classification accuracy rates Ac(s_i, l_i) (%) on the Caltech 101 Database.

| k   | CNMF  | CNMF (α = 1) | CNMF (α = 0.5) | CNMF (α = 2) | CNMF (α_k)   |
|-----|-------|--------------|----------------|--------------|--------------|
| 2   | 66.39 | 78.80        | 78.31 ± 1.69   | 80.10 ± 2.07 | 82.76 ± 3.56 |
| 3   | 60.91 | 69.98        | 68.77 ± 1.57   | 72.94 ± 1.51 | 73.59 ± 3.51 |
| 4   | 50.82 | 57.50        | 58.19 ± 1.98   | 56.28 ± 1.96 | 59.89 ± 2.82 |
| 5   | 50.83 | 53.67        | 53.71 ± 1.02   | 54.12 ± 1.44 | 55.06 ± 2.63 |
| 6   | 50.49 | 55.05        | 55.63 ± 0.89   | 58.04 ± 1.47 | 58.97 ± 1.97 |
| 7   | 46.09 | 51.12        | 50.04 ± 0.11   | 51.73 ± 0.25 | 52.36 ± 1.81 |
| 8   | 44.21 | 50.50        | 51.72 ± 0.51   | 48.37 ± 1.59 | 52.64 ± 1.52 |
| 9   | 41.34 | 46.58        | 47.85 ± 0.28   | 45.87 ± 0.30 | 49.54 ± 1.23 |
| 10  | 40.80 | 46.02        | 46.02 ± 0.05   | 45.76 ± 0.08 | 46.02 ± 0.26 |
| Avg | 50.21 | 56.58        | 56.69          | 57.02        | 58.98        |
Table 6: The comparison of normalized mutual information (%) on the Caltech 101 Database.

| k   | CNMF  | CNMF (α = 1) | CNMF (α = 0.5) | CNMF (α = 2) | CNMF (α_k)   |
|-----|-------|--------------|----------------|--------------|--------------|
| 2   | 21.05 | 38.17        | 38.20 ± 0.18   | 37.48 ± 0.21 | 40.08 ± 2.19 |
| 3   | 31.62 | 40.63        | 40.68 ± 0.15   | 40.05 ± 0.14 | 41.88 ± 1.24 |
| 4   | 27.60 | 34.36        | 34.41 ± 0.79   | 34.12 ± 0.91 | 36.41 ± 1.17 |
| 5   | 34.46 | 35.82        | 36.00 ± 0.99   | 35.42 ± 1.01 | 37.20 ± 1.78 |
| 6   | 34.85 | 38.97        | 39.16 ± 0.86   | 38.53 ± 1.11 | 39.64 ± 1.83 |
| 7   | 34.88 | 37.67        | 37.86 ± 0.47   | 37.47 ± 0.49 | 39.86 ± 2.58 |
| 8   | 32.63 | 37.43        | 37.71 ± 0.39   | 37.22 ± 0.42 | 37.91 ± 2.18 |
| 9   | 31.20 | 35.97        | 36.21 ± 0.60   | 35.86 ± 0.63 | 37.73 ± 0.67 |
| 10  | 31.21 | 35.71        | 36.06 ± 0.02   | 35.68 ± 0.03 | 37.48 ± 0.22 |
| Avg | 31.05 | 37.19        | 37.37          | 36.87        | 38.69        |
Table 7: The value of α in k categories.

| k  | α_1  | α_2  | α_3  | α_4  | α_5  | α_6  | α_7  | α_8  | α_9  | α_10 |
|----|------|------|------|------|------|------|------|------|------|------|
| 2  | 2.07 | 2.08 | 2.22 | 2.27 | 2.57 | 3.52 | 3.97 | 3.98 | 4.37 | 8.96 |
| 3  | 0.45 | 0.55 | 0.66 | 1.07 | 1.95 | 2.31 | 2.31 | 3.54 | 4.24 | 7.54 |
| 4  | 0.30 | 0.32 | 0.35 | 0.86 | 0.98 | 1.27 | 1.65 | 2.10 | 3.27 | 5.80 |
| 5  | 0.45 | 0.49 | 0.51 | 0.71 | 0.80 | 1.13 | 2.05 | 2.55 | 2.82 | 3.89 |
| 6  | 0.07 | 0.56 | 0.73 | 1.08 | 1.26 | 1.46 | 2.01 | 2.12 | 3.91 | 8.52 |
| 7  | 0.71 | 0.94 | 1.12 | 1.14 | 1.16 | 1.26 | 1.27 | 1.37 | 1.57 | 8.97 |
| 8  | 0.12 | 0.30 | 0.50 | 0.64 | 0.66 | 0.69 | 0.78 | 0.79 | 0.97 | 1.11 |
| 9  | 0.78 | 0.79 | 2.01 | 0.10 | 0.17 | 0.28 | 0.29 | 0.30 | 0.31 | 0.39 |
| 10 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
number of samples. In the experiments, we apply the fixed parameters α and α_k to analyze the classification accuracy on three image databases. The experimental results demonstrate that the CNMF (α_k) algorithm achieves the best classification accuracy. However, we cannot obtain the cardinalities of labeled images exactly in many real-world applications. Moreover, the value of α varies across data sets; how to select the optimal α for a specific image data set remains an open problem.
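For concreteness, the unconstrained α-divergence NMF updates of Cichocki et al. [16], on which the constrained algorithm builds, can be sketched as follows. This sketch omits the label constraints of CNMF (α); the random initialization, iteration count, and smoothing constant are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np

def alpha_nmf(X, r, alpha=0.5, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates minimizing the alpha-divergence
    D_alpha(X || WH), following [16]. For alpha != 0 each update
    raises the elementwise ratio X / (WH) to the power alpha,
    averages it against the fixed factor, and takes the 1/alpha root."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        # update H with W fixed
        R = (X / (W @ H + eps)) ** alpha
        H *= ((W.T @ R) / (W.sum(axis=0)[:, None] + eps)) ** (1.0 / alpha)
        # update W with the new H fixed
        R = (X / (W @ H + eps)) ** alpha
        W *= ((R @ H.T) / (H.sum(axis=1)[None, :] + eps)) ** (1.0 / alpha)
    return W, H
```

At α = 1 these updates reduce to the classical KL-divergence NMF of Lee and Seung [7], which is consistent with CNMF (α = 1) serving as a baseline in the tables above.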
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).
References
[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207–I-212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and Its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621–V-624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B: Methodological, vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. Kjærgård Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[32] http://vision.ucsd.edu/content/yale-face-database
[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101/
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[25] A Cichocki R Zdunek and S-I Amari ldquoNew algorithmsfor non-negative matrix factorization in applications to blindsource separationrdquo in Proceedings of the IEEE InternationalConference on Acoustics Speech and Signal Processing (ICASSPrsquo06) pp V621ndashV624 Toulouse France May 2006
[26] I S Dhillon and S Sra ldquoGeneralized nonnegative matrixapproximations with Bregman divergencesrdquo in Advances inNeural Information Processing Systems 8 pp 283ndash290 MITPress Cambridge Mass USA 2006
[27] A Cichocki S Amari and R Zdunek ldquoExtended SMART algo-rithms for non-negative matrix factorizationrdquo in Proceedings ofthe 8th International Conference on Artificial Intelligence andSoft Computing Zakopane Poland 2006
[28] A P Dempster N M Laird and D B Rubin ldquoMaximumlikelihood from incomplete data via the EM algorithmrdquo Journalof the Royal Statistical Society Series B Methodological vol 39no 1 pp 1ndash38 1977
[29] L Saul and F Pereira ldquoAggregate and mixed-order Markovmodels for statistical language processingrdquo in Proceedings ofthe 2nd Conference on Empirical Methods in Natural LanguageProcessing C Cardie and R Weischedel Eds pp 81ndash89 ACLPress 1997
[30] F Hansen and G Kjaeligrgard Pedersen ldquoJensenrsquos inequality foroperators and Lownerrsquos theoremrdquoMathematische Annalen vol258 no 3 pp 229ndash241 1982
[31] httpwwwclcamacukresearchdtgattarchivefacedatabasehtml
[32] httpvisionucsdeducontentyale-face-database[33] httpwwwvisioncaltecheduImage DatasetsCaltech101[34] D G Lowe ldquoDistinctive image features from scale-invariant
keypointsrdquo International Journal of Computer Vision vol 60 no2 pp 91ndash110 2004
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Discrete Dynamics in Nature and Society 11
Table 7: The value of α in k categories.

k     α_1   α_2   α_3   α_4   α_5   α_6   α_7   α_8   α_9   α_10
2     2.07  2.08  2.22  2.27  2.57  3.52  3.97  3.98  4.37  8.96
3     0.45  0.55  0.66  1.07  1.95  2.31  2.31  3.54  4.24  7.54
4     0.30  0.32  0.35  0.86  0.98  1.27  1.65  2.10  3.27  5.80
5     0.45  0.49  0.51  0.71  0.80  1.13  2.05  2.55  2.82  3.89
6     0.07  0.56  0.73  1.08  1.26  1.46  2.01  2.12  3.91  8.52
7     0.71  0.94  1.12  1.14  1.16  1.26  1.27  1.37  1.57  8.97
8     0.12  0.30  0.50  0.64  0.66  0.69  0.78  0.79  0.97  1.11
9     0.78  0.79  2.01  0.10  0.17  0.28  0.29  0.30  0.31  0.39
10    1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
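For reference, the α-divergence on which the α-NMF objective is built (Cichocki et al. [16]) can be evaluated directly. The following NumPy sketch is illustrative only, not the paper's implementation; the function name and the numerical safeguard `eps` are our own choices.

```python
import numpy as np

def alpha_divergence(X, Y, alpha, eps=1e-12):
    """Alpha-divergence D_alpha(X || Y) between nonnegative arrays.

    For alpha not in {0, 1}:
        D_alpha = 1/(alpha*(alpha-1)) * sum(X^alpha * Y^(1-alpha)
                                            - alpha*X + (alpha-1)*Y).
    The limits alpha -> 1 and alpha -> 0 recover the Kullback-Leibler
    divergences KL(X||Y) and KL(Y||X), respectively.
    """
    X = np.maximum(np.asarray(X, dtype=float), eps)  # guard against log(0)
    Y = np.maximum(np.asarray(Y, dtype=float), eps)
    if np.isclose(alpha, 1.0):                       # limit case: KL(X || Y)
        return float(np.sum(X * np.log(X / Y) - X + Y))
    if np.isclose(alpha, 0.0):                       # limit case: KL(Y || X)
        return float(np.sum(Y * np.log(Y / X) - Y + X))
    term = X**alpha * Y**(1.0 - alpha) - alpha * X + (alpha - 1.0) * Y
    return float(np.sum(term) / (alpha * (alpha - 1.0)))
```

In an NMF setting, `Y` would be the reconstruction `W @ H`; the divergence is zero exactly when the reconstruction matches the data.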
samples. In the experiments, we apply the fixed parameters α and α_k to analyze the classification accuracy on three image databases. The experimental results demonstrate that the CNMF_αk algorithm achieves the best classification accuracy. However, in many real-world applications we cannot obtain the cardinalities of the labeled images exactly. Moreover, the value of α varies depending on the data set; how to select the optimal α for a specific image data set remains an open problem.
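Absent a principled selection rule, one practical workaround for the open problem above is a grid search over candidate α values, scoring each by held-out classification accuracy. The sketch below is a generic illustration, not the paper's procedure: `factorize` stands in for any α-NMF solver and `accuracy` for any classifier evaluation, both supplied by the user as hypothetical placeholders.

```python
def select_alpha(X, labels, candidates, factorize, accuracy):
    """Pick the alpha from `candidates` that maximizes held-out accuracy.

    `factorize(X, alpha)` -> low-dimensional features (e.g., H from X ~ WH);
    `accuracy(features, labels)` -> cross-validated classification accuracy.
    Both callables are user-provided placeholders, not fixed APIs.
    """
    best_alpha, best_score = None, float("-inf")
    for alpha in candidates:
        features = factorize(X, alpha)        # run the factorization at this alpha
        score = accuracy(features, labels)    # evaluate downstream classification
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score
```

The candidate grid could, for instance, be drawn from the per-category ranges reported in Table 7.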
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61375038), the National Natural Science Foundation of China (11401060), the Zhejiang Provincial Natural Science Foundation of China (LQ13A010023), the Key Scientific and Technological Project of Henan Province (142102210010), and the Key Research Project in Science and Technology of the Education Department, Henan Province (14A520028, 14A520052).
References
[1] I. Biederman, "Recognition-by-components: a theory of human image understanding," Psychological Review, vol. 94, no. 2, pp. 115–147, 1987.
[2] S. Ullman, High-Level Vision: Object Recognition and Visual Cognition, MIT Press, Cambridge, Mass, USA, 1996.
[3] P. Paatero and U. Tapper, "Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[4] I. T. Jolliffe, Principal Component Analysis, Springer, Berlin, Germany, 1986.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, USA, 2001.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.
[7] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788–791, 1999.
[8] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," SIAM Journal on Matrix Analysis, 2004.
[9] S. Z. Li, X. W. Hou, H. J. Zhang, and Q. S. Cheng, "Learning spatially localized, parts-based representation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '01), pp. I-207–I-212, Kauai, Hawaii, USA, December 2001.
[10] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Non-negative matrix factorization framework for face recognition," International Journal of Pattern Recognition and Artificial Intelligence, vol. 19, no. 4, pp. 495–511, 2005.
[11] Y. Wang, Y. Jia, C. Hu, and M. Turk, "Fisher non-negative matrix factorization for learning local features," in Proceedings of the Asian Conference on Computer Vision, Jeju Island, Republic of Korea, 2004.
[12] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, 2004.
[13] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee, "Application of nonnegative matrix factorization to dynamic positron emission tomography," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, pp. 629–632, San Diego, Calif, USA, 2001.
[14] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," in Proceedings of the International Conference on Artificial Neural Networks, Springer, Athens, Greece, 2006.
[15] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 7, pp. 1299–1311, 2012.
[16] A. Cichocki, H. Lee, Y.-D. Kim, and S. Choi, "Non-negative matrix factorization with α-divergence," Pattern Recognition Letters, vol. 29, no. 9, pp. 1433–1440, 2008.
[17] L. K. Saul, F. Sha, and D. D. Lee, "Statistical signal processing with nonnegativity constraints," Proceedings of EuroSpeech, vol. 2, pp. 1001–1004, 2003.
[18] P. M. Kim and B. Tidor, "Subsystem identification through dimensionality reduction of large-scale gene expression data," Genome Research, vol. 13, no. 7, pp. 1706–1718, 2003.
[19] V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra and Its Applications, vol. 416, no. 1, pp. 29–47, 2006.
[20] P. Smaragdis and J. C. Brown, "Non-negative matrix factorization for polyphonic music transcription," in Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 177–180, New Paltz, NY, USA, 2003.
[21] F. Shahnaz, M. W. Berry, V. P. Pauca, and R. J. Plemmons, "Document clustering using nonnegative matrix factorization," Information Processing and Management, vol. 42, no. 2, pp. 373–386, 2006.
[22] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," Journal of Machine Learning Research, vol. 5, pp. 1457–1469, 2004.
[23] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in Proceedings of the International Joint Conference on Neural Networks, pp. 1828–1832, June 2008.
[24] A. Cichocki, R. Zdunek, and S. Amari, "Csiszár's divergences for nonnegative matrix factorization: family of new algorithms," in Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation, Charleston, SC, USA, 2006.
[25] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), pp. V-621–V-624, Toulouse, France, May 2006.
[26] I. S. Dhillon and S. Sra, "Generalized nonnegative matrix approximations with Bregman divergences," in Advances in Neural Information Processing Systems 18, pp. 283–290, MIT Press, Cambridge, Mass, USA, 2006.
[27] A. Cichocki, S. Amari, and R. Zdunek, "Extended SMART algorithms for non-negative matrix factorization," in Proceedings of the 8th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland, 2006.
[28] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
[29] L. Saul and F. Pereira, "Aggregate and mixed-order Markov models for statistical language processing," in Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing, C. Cardie and R. Weischedel, Eds., pp. 81–89, ACL Press, 1997.
[30] F. Hansen and G. K. Pedersen, "Jensen's inequality for operators and Löwner's theorem," Mathematische Annalen, vol. 258, no. 3, pp. 229–241, 1982.
[31] http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
[32] http://vision.ucsd.edu/content/yale-face-database
[33] http://www.vision.caltech.edu/Image_Datasets/Caltech101
[34] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.