TRANSCRIPT
An Alternating Direction Algorithm for Structure-enforced
Matrix Factorization
Lijun Xu (Dalian University of Technology)
Supervised by
Bo Yu (DUT) Yin Zhang (Rice University)
March 27, 2013
Outline
• Introduction
• Alternating Direction Method (ADM)
• ADM Extension to SeMF
• Numerical experiments
• Conclusion
Introduction

• Matrix Factorization:
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2,\qquad M\in\mathbb{R}^{m\times n},\ X\in\mathbb{R}^{m\times k},\ Y\in\mathbb{R}^{k\times n}$$
• Various factorizations require different constraints on X and Y:
a) Exact factorizations: LU, QR, SVD, eigendecomposition, etc.
b) Recent approximate factorizations: NMF, K-means, sparse PCA, matrix completion, dictionary learning, etc.
• In practice, many constraints on X and Y impose structural properties such as non-negativity, sparsity, orthogonality, normalization, etc., which allow easy 'projections'.
• Structure-enforced Matrix Factorization (SeMF):
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }X\in\mathcal{X},\ Y\in\mathcal{Y},$$
where $\mathcal{X}$ and $\mathcal{Y}$ are easily projectable sets.
Introduction

• Some examples of easily projectable sets:

Non-negativity: $\mathcal{X}=\{X : X_{ij}\ge 0\}$, with projection
$$\mathcal{P}(X)_{ij}=\begin{cases}X_{ij}, & X_{ij}\ge 0\\ 0, & X_{ij}<0\end{cases}$$

Sparsity: $\mathcal{X}=\{X : \|X_i\|_0\le k_i,\ i=1,2,\dots\}$, with projection
$$\mathcal{P}(X)_{ij}=\begin{cases}X_{ij}, & |X_{ij}|\text{ is among the }k_i\text{ largest absolute values in }X_i\\ 0, & \text{otherwise}\end{cases}$$

Orthogonality: $\mathcal{X}=\{X : X_i\perp X_J,\ i\in I\}$, with projection
$$\mathcal{P}(X)_i=\left(I-X_J(X_J^T X_J)^{-1}X_J^T\right)X_i,\ i\in I;\qquad \mathcal{P}(X)_j=X_j,\ j\in J$$
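The three projections above can be sketched in NumPy as follows; this is a minimal sketch, the function names are ours, and sparsity is assumed to be enforced column-wise:

```python
import numpy as np

def proj_nonneg(X):
    """Project onto {X : X_ij >= 0} by zeroing out negative entries."""
    return np.maximum(X, 0.0)

def proj_column_sparse(X, k):
    """Project each column onto {x : ||x||_0 <= k}: keep its k
    largest-magnitude entries and zero the rest."""
    P = np.zeros_like(X)
    for i in range(X.shape[1]):
        idx = np.argsort(np.abs(X[:, i]))[-k:]  # indices of k largest |entries|
        P[idx, i] = X[idx, i]
    return P

def proj_orthogonal(X, I, J):
    """Make columns indexed by I orthogonal to span(X[:, J]) via
    (I - X_J (X_J^T X_J)^{-1} X_J^T) X_I; columns in J are unchanged."""
    P = X.copy()
    XJ = X[:, J]
    G = np.linalg.pinv(XJ.T @ XJ)               # (X_J^T X_J)^{-1}
    P[:, I] = X[:, I] - XJ @ (G @ (XJ.T @ X[:, I]))
    return P
```

Each operator is cheap (entrywise, a sort per column, or one small linear solve), which is what "easily projectable" means here.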
Introduction
Normalization:
Combinatorial structure:
E.g. 3 groups, each group is sparse.
, 1( )
, 1i i i
i i
X X XX
X X
>= ≤
{ : 1, 1, 2, }iX X i= ≤ =
{ }1 2 : , 1, 2,
r iI I I I iX X X X X i r = = ∈ =
1 1 2 2( ) ( ) ( ) ( )
r rI I IX X X X =
1 zero 2 zeros 1 zero
Introduction
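These last two set types can also be sketched in NumPy; `proj_groups` is our name for the block-wise combinatorial projection, which simply applies a separate projection to each row group:

```python
import numpy as np

def proj_normalize(X):
    """Project each column onto the unit ball {x : ||x||_2 <= 1}:
    rescale columns whose norm exceeds 1, leave the rest unchanged."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.where(norms > 1.0, norms, 1.0)
    return X / scale

def proj_groups(X, groups, projs):
    """Combinatorial structure: apply projection projs[i] to the
    row block X[groups[i], :] independently of the other groups."""
    P = X.copy()
    for idx, proj in zip(groups, projs):
        P[idx, :] = proj(X[idx, :])
    return P
```

Because the groups partition the rows, the overall projection decouples into the per-group projections, exactly as in the formula above.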
Introduction

• Problems with specific structural patterns, under the SeMF model
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }X\in\mathcal{X},\ Y\in\mathcal{Y}:$$
a) Sparse NMF: X non-negative (+ sparse), Y non-negative (+ sparse)
b) Sparse PCA: X sparse, Y column-normalized
c) Dictionary learning for sparse representation: X column-normalized, Y sparse
etc.
Alternating Direction Method

• Classic ADM solves
$$\min_{x,y}\ f(x)+g(y)\quad\text{s.t. }Ax+By=b,\ x\in\mathcal{X},\ y\in\mathcal{Y},$$
where $f,g$ are convex and $\mathcal{X},\mathcal{Y}$ are closed convex sets.
• Augmented Lagrangian:
$$\mathcal{L}_A(x,y,\lambda)=f(x)+g(y)+\lambda^T(Ax+By-b)+\tfrac{\beta}{2}\|Ax+By-b\|^2$$
• ADM: alternately minimize $\mathcal{L}_A$ over $x$ and over $y$, then update the multiplier $\lambda$.
ADM Extension to SeMF

• Original model:
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }X\in\mathcal{X},\ Y\in\mathcal{Y}$$
• Model with splitting variables:
$$\min_{X,Y,U,V}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }X-U=0,\ Y-V=0,\ U\in\mathcal{X},\ V\in\mathcal{Y}$$
The splitting variable U separates $\mathcal{X}$ from X (similarly V for Y); this separation facilitates alternating direction methods.
ADM framework for SeMF

• Augmented Lagrangian:
$$\mathcal{L}_A(X,Y,U,V,\Lambda,\Pi)=\tfrac{1}{2}\|M-XY\|_F^2+\Lambda\bullet(X-U)+\tfrac{\alpha}{2}\|X-U\|_F^2+\Pi\bullet(Y-V)+\tfrac{\beta}{2}\|Y-V\|_F^2,$$
where $\Lambda,\Pi$ are Lagrange multipliers, $(\alpha,\beta)>0$ are penalty parameters, and the product $A\bullet B=\sum_{i,j}a_{ij}b_{ij}$.

Minimize $\mathcal{L}_A$ with respect to $X$, $Y$, $U$, $V$ one at a time while fixing the others, then update $(\Lambda,\Pi)$ after each sweep of such alternating minimization.
ADM framework for SeMF

• Framework:
$$\begin{aligned}
X^{k+1}&\leftarrow \arg\min_X\ \mathcal{L}_A(X,Y^k,U^k,V^k,\Lambda^k,\Pi^k),\\
Y^{k+1}&\leftarrow \arg\min_Y\ \mathcal{L}_A(X^{k+1},Y,U^k,V^k,\Lambda^k,\Pi^k),\\
U^{k+1}&\leftarrow \mathcal{P}_{\mathcal{X}}(X^{k+1}+\Lambda^k/\alpha),\\
V^{k+1}&\leftarrow \mathcal{P}_{\mathcal{Y}}(Y^{k+1}+\Pi^k/\beta),\\
\Lambda^{k+1}&\leftarrow \Lambda^k+\gamma\alpha\,(X^{k+1}-U^{k+1}),\\
\Pi^{k+1}&\leftarrow \Pi^k+\gamma\beta\,(Y^{k+1}-V^{k+1}).
\end{aligned}$$
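The framework above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the function name, fixed penalty parameters, initialization, and the closed-form least-squares solves for the X- and Y-subproblems are our choices; the structure sets enter only through their projection operators.

```python
import numpy as np

def semf_adm(M, k, proj_X, proj_Y, alpha=1.0, beta=1.0, gamma=1.0,
             iters=500, tol=1e-6, seed=0):
    """ADM for min 1/2||M - XY||_F^2 s.t. X in calX, Y in calY,
    via the split model X - U = 0, Y - V = 0, U in calX, V in calY."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.standard_normal((m, k))
    Y = rng.standard_normal((k, n))
    U, V = X.copy(), Y.copy()
    Lam = np.zeros((m, k))
    Pi = np.zeros((k, n))
    f_old = np.inf
    for _ in range(iters):
        # X-step: X (Y Y^T + alpha I) = M Y^T + alpha U - Lam
        A = Y @ Y.T + alpha * np.eye(k)
        X = np.linalg.solve(A.T, (M @ Y.T + alpha * U - Lam).T).T
        # Y-step: (X^T X + beta I) Y = X^T M + beta V - Pi
        Y = np.linalg.solve(X.T @ X + beta * np.eye(k),
                            X.T @ M + beta * V - Pi)
        # U-, V-steps: projections onto the structure sets
        U = proj_X(X + Lam / alpha)
        V = proj_Y(Y + Pi / beta)
        # multiplier updates with step length gamma
        Lam = Lam + gamma * alpha * (X - U)
        Pi = Pi + gamma * beta * (Y - V)
        # stopping test on f^k = ||M - X^k Y^k||_F
        f = np.linalg.norm(M - X @ Y)
        if abs(f_old - f) <= tol * max(f, 1.0):
            break
        f_old = f
    return U, V
```

For sparse NMF, for instance, `proj_X` and `proj_Y` would be the non-negative (and column-sparse) projections from the earlier slides; note the talk updates $(\alpha,\beta)$ adaptively, whereas this sketch keeps them fixed.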
Implementation

• Choice of $\alpha,\beta,\gamma$: for the step length $\gamma\in(0,1.618)$ we set $\gamma=1$; the penalty parameters $(\alpha,\beta)$ are updated adaptively. Motivation: fixed values often cause slow convergence and getting trapped in local minima. Intuition: balance the changes of the three terms $\|M-XY\|$, $\|X-U\|$ and $\|Y-V\|$.
• Stopping criterion: $|f^{k+1}-f^k|\le \text{tol}\cdot f^k$, where $f^k=\|M-X^kY^k\|_F$.
Implementation • An updating strategy:
Implementation

• A simple example: solve
$$\min_{X,Y}\ \tfrac{1}{2}\|A-XY\|_F^2\quad\text{s.t. }\|x_i\|_2=1,\ \|y_i\|_0\le 3,$$
using different initial $\alpha$. Data: X is a random 40×60 matrix with $\|x_i\|_2=1$; Y is a sparse 60×1500 matrix, each column with 3 non-zeros at random locations with random values; A = XY. Initial values $\alpha=[1,\ 0.1]\times 10^{-k}\|A\|^2$, $k=1,\dots,5$.
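The synthetic data for this example can be generated as below. This is a sketch under our assumptions: the seed is arbitrary, and each column of Y is given exactly 3 non-zeros, matching the $\|y_i\|_0\le 3$ constraint:

```python
import numpy as np

rng = np.random.default_rng(0)

# X: random 40x60 matrix with unit-norm columns
X = rng.standard_normal((40, 60))
X /= np.linalg.norm(X, axis=0)

# Y: sparse 60x1500 matrix, each column with 3 non-zeros
# at random locations with random values
Y = np.zeros((60, 1500))
for j in range(1500):
    idx = rng.choice(60, size=3, replace=False)
    Y[idx, j] = rng.standard_normal(3)

A = X @ Y   # exact factorization to be recovered
```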
Numerical Experiments: Dictionary Learning

• Model:
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|x_i\|_2\le 1,\ \|y_j\|_0\le k,\ \forall i,j,$$
where M holds the data samples, X is an overcomplete dictionary matrix, and Y is the sparse representation of M.

• Synthetic experiments (compared with K-SVD): X*: random 20×50, columns normalized; Y*: 3 random non-zeros in each column; M = X*Y* + white Gaussian noise.

• Denote X as the learned dictionary. Measure the distance
$$dist(x_j^*,X)=\min_i\left(1-\left|(x_j^*)^T x_i\right|\right).$$
Numerical Experiments

Test a) Solve with different numbers of samples and compute the percentage of recovered columns: $x_j^*$ is recovered if $dist(x_j^*,X)\le 0.01$, and define $dist(X^*,X)=\text{mean}_j\,dist(x_j^*,X)$. Dictionary size: 20×50, sparsity: 3, noise: 20 dB. In this case (sparsity = 3), SeMF recovers better when the number of samples is small (< 500).

Test b) The smallest number of samples needed to reach 95% recovery of the dictionary, for different sparsity levels. Number of samples: [200:50:2000]; sparsity: [1 2 3 4 5 6]; results averaged over 10 experiments.
Numerical Experiments

Dictionary size: 20×50, noise: 20 dB.

Test c) Recovery with respect to different noise levels: for each SNR, compute the number of recovered atoms; repeat 100 tests, sort the results and average them in groups of 20. SNR = [10 20 30 ] dB.
Numerical Experiments: Test on the Swimmer Dataset

• Swimmer consists of 256 images of size 32×32. Each image is composed of 5 parts drawn from 17 distinct non-overlapping basis images: a centered invariant part called the torso, and four limbs, each in one of 4 positions.
• Goal: extract the non-negative basis images $\{X_1,\dots,X_{17}\}$, with $M\in\mathbb{R}^{1024\times 256}$, $X\in\mathbb{R}^{1024\times 17}$, $Y\in\mathbb{R}^{17\times 256}$.
Different structure enforcing

1. Sparse NMF:
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|y_j\|_0\le 5,\ j=1,\dots,256.$$

2. Sparse NMF with equal non-zero coefficients. Latent property: the 5 parts of a swimmer image share the same coefficient, i.e., each column of the sparse representation Y has 5 equal non-zeros:
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|y_j\|_0\le 5,\ y_{ij}=\text{mean}(\text{nnz}(y_j))\ \text{for }y_{ij}\ne 0,\ \forall j.$$
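The talk does not spell out the projection onto this equal-coefficient set, so the following is one plausible heuristic operator (name and tie-breaking ours): keep each column's 5 largest entries, replace them by their common mean, and clip at zero for non-negativity.

```python
import numpy as np

def proj_equal_nonzeros(Y, k=5):
    """Heuristic projection onto columns with at most k non-zeros
    sharing one common non-negative value: keep each column's k
    largest entries, set them to their mean, clip at zero."""
    P = np.zeros_like(Y)
    for j in range(Y.shape[1]):
        idx = np.argsort(Y[:, j])[-k:]      # k largest entries
        val = max(Y[idx, j].mean(), 0.0)    # common coefficient, >= 0
        P[idx, j] = val
    return P
```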
Numerical Experiments

Results with different structure enforcing: sparse NMF vs. sparse NMF with equal coefficients. The bases are improved, but the parts are still not cleanly separated.
Different structure enforcing

3. Sparse NMF with orthogonality. Sparse NMF cannot clearly extract the central torso, but the torso has potential sparsity and is orthogonal to the 4 limbs (actually all 5 parts are independent, with non-overlapping non-zero supports):
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }x_{17}\perp x_i,\ i=1,\dots,16,\ \|y_j\|_0\le 5,\ \forall j.$$
Numerical Experiments

Results with different structure enforcing: sparse NMF vs. sparse NMF with orthogonal structure. The torso is now identified.
Different structure enforcing

4. Sparse NMF with combinatorial patterns. Divide the rows of Y into 5 groups G1, ..., G5 (4 limbs and 1 torso); in each column, every group contributes exactly 1 non-zero, and the 5 non-zeros are equal:
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|y_{G_i,j}\|_0=1,\ y_{ij}=\text{mean}(\text{nnz}(y_j)),\ i=1,\dots,5.$$
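A heuristic projection for this combinatorial pattern can be sketched as follows (the operator and its name are our assumption, since the talk only states the constraint set): per column, select the largest entry within each row group, then give the selected entries their common mean value, clipped at zero.

```python
import numpy as np

def proj_group_equal(Y, groups):
    """Heuristic projection: per column, pick the largest entry in
    each row group, set the picked entries to the mean of the picked
    values (clipped at zero); all other entries become zero."""
    P = np.zeros_like(Y)
    for j in range(Y.shape[1]):
        picks = [g[np.argmax(Y[g, j])] for g in groups]  # 1 row per group
        val = max(np.mean([Y[i, j] for i in picks]), 0.0)
        for i in picks:
            P[i, j] = val
    return P
```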
Numerical Experiments

Results: sparse NMF enforcing combinatorial patterns yields quite well classified parts.
Numerical Experiments: Test on Face Images

• Goal: return a parts-based representation; the basis elements extract facial features such as eyes, nose and lips.
• Structural property: Y is non-negative, X is sparse and non-negative.
a) L1 sparse NMF (convex relaxation of L0 sparsity): penalize or constrain the L1 norm of X or Y (Hoyer 2004).
b) L0 sparse NMF (more intuitive, non-convex): constrain the L0 norm of X or Y.
Few works address L0 sparse NMF: Non-negative K-SVD (NNK-SVD, 2005), Probabilistic sparse matrix factorization (PSMF, 2004), NMFL0 (2012).
• Model: sparsity enforced on the matrix X:
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|x_i\|_0\le K.$$
• Compared to the NMFL0 algorithm (R. Peharz, F. Pernkopf, 2012): a) with Y fixed, compute X by non-negative least squares (NNLS); b) update Y while maintaining the sparse structure of X (ANLS or multiplicative updates). Difference in subproblems a) and b): SeMF minimizes the augmented Lagrangian, while NMFL0 minimizes the original objective.
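Subproblem a) of the compared algorithm can be sketched with SciPy's NNLS solver; the wrapper name is ours, and this shows only the row-wise NNLS step, without the sparsity selection that NMFL0 applies afterwards:

```python
import numpy as np
from scipy.optimize import nnls

def nnls_update_X(M, Y):
    """With Y fixed, each row x_i of X solves the non-negative
    least-squares problem min ||Y^T x_i^T - M_i^T||, x_i >= 0."""
    X = np.zeros((M.shape[0], Y.shape[0]))
    for i in range(M.shape[0]):
        X[i], _ = nnls(Y.T, M[i])
    return X
```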
Numerical Experiments
• Applied to the ORL dataset (10304×400, 25 basis parts).

(Figure: bases learned by SeMF and by NMFL0 at nnz = 33%, 25%, 10%.)
Numerical Experiments

• Comparison of reconstruction quality and running time: similar quality, but SeMF is faster than NMFL0 in the less sparse cases (more non-zeros).

Note: NMFL0 performs better than Hoyer's method in both SNR and time, as reported in the paper "Sparse nonnegative matrix factorization with l0-constraints" by R. Peharz and F. Pernkopf.
Conclusions

• SeMF can handle many different structures, provided they admit easy projections.
• An ADM approach is applied to the augmented Lagrangian of a split model.
• Dynamically updating the penalty parameters empirically performs well.
• Potential applications to many problems with latent structural properties, to improve solution quality.
• Further work: more experiments and comparisons, non-convex complications, parameter choices, etc.
Thank you!