TRANSCRIPT
An Alternating Direction Algorithm for Structure-enforced
Matrix Factorization
Lijun Xu (Dalian University of Technology)
Supervised by
Bo Yu (DUT) Yin Zhang (Rice University)
March 27, 2013
Outline
• Introduction
• Alternating Direction Method (ADM)
• ADM Extension to SeMF
• Numerical experiments
• Conclusion
Introduction

• Matrix Factorization:
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2,\qquad M\in\mathbb{R}^{m\times n},\ X\in\mathbb{R}^{m\times k},\ Y\in\mathbb{R}^{k\times n}$$
• Various factorizations require different constraints on X and Y:
a) Exact factorizations: LU, QR, SVD, eigendecomposition, etc.
b) Recent approximate factorizations: NMF, K-means, sparse PCA, matrix completion, dictionary learning, etc.
• In practice, many constraints on X and Y impose structural properties such as non-negativity, sparsity, orthogonality, normalization, etc., which allow easy 'projections'.
• Structure-enforced Matrix Factorization (SeMF):
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }X\in\mathcal{X},\ Y\in\mathcal{Y},$$
where $\mathcal{X}$ and $\mathcal{Y}$ are easily projectable sets.
Introduction

• Some examples of easily projectable sets:

Non-negativity: $\mathcal{X}=\{X : X_{ij}\ge 0\}$, with projection
$$\mathcal{P}(X)_{ij}=\begin{cases}X_{ij}, & X_{ij}\ge 0\\ 0, & X_{ij}<0\end{cases}$$

Sparsity: $\mathcal{X}=\{X : \|X_i\|_0\le k_i,\ i=1,2,\dots\}$, with projection
$$\mathcal{P}(X)_{ij}=\begin{cases}X_{ij}, & |X_{ij}|\text{ is among the }k_i\text{ largest absolute values in }X_i\\ 0, & \text{otherwise}\end{cases}$$

Orthogonality: $\mathcal{X}=\{X : X_i\perp X_J,\ i\in I\}$, with projection
$$\mathcal{P}(X)_i=\left(I-X_J(X_J^T X_J)^{-1}X_J^T\right)X_i,\ i\in I;\qquad \mathcal{P}(X)_j=X_j,\ j\in J$$
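The three projections above can be sketched in NumPy as follows; this is a minimal sketch, the function names are ours, and sparsity is assumed to be enforced column-wise:

```python
import numpy as np

def proj_nonneg(X):
    """Project onto {X : X_ij >= 0} by zeroing out negative entries."""
    return np.maximum(X, 0.0)

def proj_column_sparse(X, k):
    """Project each column onto {x : ||x||_0 <= k}: keep its k
    largest-magnitude entries and zero the rest."""
    P = np.zeros_like(X)
    for i in range(X.shape[1]):
        idx = np.argsort(np.abs(X[:, i]))[-k:]  # indices of k largest |entries|
        P[idx, i] = X[idx, i]
    return P

def proj_orthogonal(X, I, J):
    """Make columns indexed by I orthogonal to span(X[:, J]) via
    (I - X_J (X_J^T X_J)^{-1} X_J^T) X_I; columns in J are unchanged."""
    P = X.copy()
    XJ = X[:, J]
    G = np.linalg.pinv(XJ.T @ XJ)               # (X_J^T X_J)^{-1}
    P[:, I] = X[:, I] - XJ @ (G @ (XJ.T @ X[:, I]))
    return P
```

Each operator is cheap (entrywise, a sort per column, or one small linear solve), which is what "easily projectable" means here.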
Introduction
Normalization:
Combinatorial structure:
E.g. 3 groups, each group is sparse.
, 1( )
, 1i i i
i i
X X XX
X X
>= ≤
{ : 1, 1, 2, }iX X i= ≤ =
{ }1 2 : , 1, 2,
r iI I I I iX X X X X i r = = ∈ =
1 1 2 2( ) ( ) ( ) ( )
r rI I IX X X X =
1 zero 2 zeros 1 zero
Introduction
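These last two set types can also be sketched in NumPy; `proj_groups` is our name for the block-wise combinatorial projection, which simply applies a separate projection to each row group:

```python
import numpy as np

def proj_normalize(X):
    """Project each column onto the unit ball {x : ||x||_2 <= 1}:
    rescale columns whose norm exceeds 1, leave the rest unchanged."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.where(norms > 1.0, norms, 1.0)
    return X / scale

def proj_groups(X, groups, projs):
    """Combinatorial structure: apply projection projs[i] to the
    row block X[groups[i], :] independently of the other groups."""
    P = X.copy()
    for idx, proj in zip(groups, projs):
        P[idx, :] = proj(X[idx, :])
    return P
```

Because the groups partition the rows, the overall projection decouples into the per-group projections, exactly as in the formula above.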
Introduction

• Problems with specific structural patterns, under the SeMF model
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }X\in\mathcal{X},\ Y\in\mathcal{Y}:$$
a) Sparse NMF: X non-negative (+ sparse), Y non-negative (+ sparse)
b) Sparse PCA: X sparse, Y column-normalized
c) Dictionary learning for sparse representation: X column-normalized, Y sparse
etc.
Alternating Direction Method

• Classic ADM solves
$$\min_{x,y}\ f(x)+g(y)\quad\text{s.t. }Ax+By=b,\ x\in\mathcal{X},\ y\in\mathcal{Y},$$
where $f,g$ are convex and $\mathcal{X},\mathcal{Y}$ are closed convex sets.
• Augmented Lagrangian:
$$\mathcal{L}_A(x,y,\lambda)=f(x)+g(y)+\lambda^T(Ax+By-b)+\tfrac{\beta}{2}\|Ax+By-b\|^2$$
• ADM: alternately minimize $\mathcal{L}_A$ over $x$ and over $y$, then update the multiplier $\lambda$.
ADM Extension to SeMF

• Original model:
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }X\in\mathcal{X},\ Y\in\mathcal{Y}$$
• Model with splitting variables:
$$\min_{X,Y,U,V}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }X-U=0,\ Y-V=0,\ U\in\mathcal{X},\ V\in\mathcal{Y}$$
The splitting variable U separates $\mathcal{X}$ from X (similarly V for Y); this separation facilitates alternating direction methods.
ADM framework for SeMF

• Augmented Lagrangian:
$$\mathcal{L}_A(X,Y,U,V,\Lambda,\Pi)=\tfrac{1}{2}\|M-XY\|_F^2+\Lambda\bullet(X-U)+\tfrac{\alpha}{2}\|X-U\|_F^2+\Pi\bullet(Y-V)+\tfrac{\beta}{2}\|Y-V\|_F^2,$$
where $\Lambda,\Pi$ are Lagrange multipliers, $(\alpha,\beta)>0$ are penalty parameters, and the product $A\bullet B=\sum_{i,j}a_{ij}b_{ij}$.

Minimize $\mathcal{L}_A$ with respect to $X$, $Y$, $U$, $V$ one at a time while fixing the others, then update $(\Lambda,\Pi)$ after each sweep of such alternating minimization.
ADM framework for SeMF

• Framework:
$$\begin{aligned}
X^{k+1}&\leftarrow \arg\min_X\ \mathcal{L}_A(X,Y^k,U^k,V^k,\Lambda^k,\Pi^k),\\
Y^{k+1}&\leftarrow \arg\min_Y\ \mathcal{L}_A(X^{k+1},Y,U^k,V^k,\Lambda^k,\Pi^k),\\
U^{k+1}&\leftarrow \mathcal{P}_{\mathcal{X}}(X^{k+1}+\Lambda^k/\alpha),\\
V^{k+1}&\leftarrow \mathcal{P}_{\mathcal{Y}}(Y^{k+1}+\Pi^k/\beta),\\
\Lambda^{k+1}&\leftarrow \Lambda^k+\gamma\alpha\,(X^{k+1}-U^{k+1}),\\
\Pi^{k+1}&\leftarrow \Pi^k+\gamma\beta\,(Y^{k+1}-V^{k+1}).
\end{aligned}$$
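The framework above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation: the function name, fixed penalty parameters, initialization, and the closed-form least-squares solves for the X- and Y-subproblems are our choices; the structure sets enter only through their projection operators.

```python
import numpy as np

def semf_adm(M, k, proj_X, proj_Y, alpha=1.0, beta=1.0, gamma=1.0,
             iters=500, tol=1e-6, seed=0):
    """ADM for min 1/2||M - XY||_F^2 s.t. X in calX, Y in calY,
    via the split model X - U = 0, Y - V = 0, U in calX, V in calY."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    X = rng.standard_normal((m, k))
    Y = rng.standard_normal((k, n))
    U, V = X.copy(), Y.copy()
    Lam = np.zeros((m, k))
    Pi = np.zeros((k, n))
    f_old = np.inf
    for _ in range(iters):
        # X-step: X (Y Y^T + alpha I) = M Y^T + alpha U - Lam
        A = Y @ Y.T + alpha * np.eye(k)
        X = np.linalg.solve(A.T, (M @ Y.T + alpha * U - Lam).T).T
        # Y-step: (X^T X + beta I) Y = X^T M + beta V - Pi
        Y = np.linalg.solve(X.T @ X + beta * np.eye(k),
                            X.T @ M + beta * V - Pi)
        # U-, V-steps: projections onto the structure sets
        U = proj_X(X + Lam / alpha)
        V = proj_Y(Y + Pi / beta)
        # multiplier updates with step length gamma
        Lam = Lam + gamma * alpha * (X - U)
        Pi = Pi + gamma * beta * (Y - V)
        # stopping test on f^k = ||M - X^k Y^k||_F
        f = np.linalg.norm(M - X @ Y)
        if abs(f_old - f) <= tol * max(f, 1.0):
            break
        f_old = f
    return U, V
```

For sparse NMF, for instance, `proj_X` and `proj_Y` would be the non-negative (and column-sparse) projections from the earlier slides; note the talk updates $(\alpha,\beta)$ adaptively, whereas this sketch keeps them fixed.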
Implementation

• Choice of $\alpha,\beta,\gamma$: for the step length $\gamma\in(0,1.618)$ we set $\gamma=1$; the penalty parameters $(\alpha,\beta)$ are updated adaptively. Motivation: fixed values often cause slow convergence and getting trapped in local minima. Intuition: balance the changes of the three terms $\|M-XY\|$, $\|X-U\|$ and $\|Y-V\|$.
• Stopping criterion: $|f^{k+1}-f^k|\le \text{tol}\cdot f^k$, where $f^k=\|M-X^kY^k\|_F$.
Implementation • An updating strategy:
Implementation

• A simple example: solve
$$\min_{X,Y}\ \tfrac{1}{2}\|A-XY\|_F^2\quad\text{s.t. }\|x_i\|_2=1,\ \|y_i\|_0\le 3,$$
using different initial $\alpha$. Data: X is a random 40×60 matrix with $\|x_i\|_2=1$; Y is a sparse 60×1500 matrix, each column with 3 non-zeros at random locations with random values; A = XY. Initial values $\alpha=[1,\ 0.1]\times 10^{-k}\|A\|^2$, $k=1,\dots,5$.
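The synthetic data for this example can be generated as below. This is a sketch under our assumptions: the seed is arbitrary, and each column of Y is given exactly 3 non-zeros, matching the $\|y_i\|_0\le 3$ constraint:

```python
import numpy as np

rng = np.random.default_rng(0)

# X: random 40x60 matrix with unit-norm columns
X = rng.standard_normal((40, 60))
X /= np.linalg.norm(X, axis=0)

# Y: sparse 60x1500 matrix, each column with 3 non-zeros
# at random locations with random values
Y = np.zeros((60, 1500))
for j in range(1500):
    idx = rng.choice(60, size=3, replace=False)
    Y[idx, j] = rng.standard_normal(3)

A = X @ Y   # exact factorization to be recovered
```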
Numerical Experiments: Dictionary Learning

• Model:
$$\min_{X,Y}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|x_i\|_2\le 1,\ \|y_j\|_0\le k,\ \forall i,j,$$
where M holds the data samples, X is an overcomplete dictionary matrix, and Y is the sparse representation of M.

• Synthetic experiments (compared with K-SVD): X*: random 20×50, columns normalized; Y*: 3 random non-zeros in each column; M = X*Y* + white Gaussian noise.

• Denote X as the learned dictionary. Measure the distance
$$dist(x_j^*,X)=\min_i\left(1-\left|(x_j^*)^T x_i\right|\right).$$
Numerical Experiments

Test a) Solve with different numbers of samples and compute the percentage of recovered columns: $x_j^*$ is recovered if $dist(x_j^*,X)\le 0.01$, and define $dist(X^*,X)=\text{mean}_j\,dist(x_j^*,X)$. Dictionary size: 20×50, sparsity: 3, noise: 20 dB. In this case (sparsity = 3), SeMF recovers better when the number of samples is small (< 500).

Test b) The smallest number of samples needed to reach 95% recovery of the dictionary, for different sparsity levels. Number of samples: [200:50:2000]; sparsity: [1 2 3 4 5 6]; results averaged over 10 experiments.
Numerical Experiments

Dictionary size: 20×50, noise: 20 dB.

Test c) Recovery with respect to different noise levels: for each SNR, compute the number of recovered atoms; repeat 100 tests, sort the results and average them in groups of 20. SNR = [10 20 30 ] dB.
Numerical Experiments: Test on the Swimmer Dataset

• Swimmer consists of 256 images of size 32×32. Each image is composed of 5 parts drawn from 17 distinct non-overlapping basis images: a centered invariant part called the torso, and four limbs, each in one of 4 positions.
• Goal: extract the non-negative basis images $\{X_1,\dots,X_{17}\}$, with $M\in\mathbb{R}^{1024\times 256}$, $X\in\mathbb{R}^{1024\times 17}$, $Y\in\mathbb{R}^{17\times 256}$.
Different structure enforcing

1. Sparse NMF:
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|y_j\|_0\le 5,\ j=1,\dots,256.$$

2. Sparse NMF with equal non-zero coefficients. Latent property: the 5 parts of a swimmer image share the same coefficient, i.e., each column of the sparse representation Y has 5 equal non-zeros:
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|y_j\|_0\le 5,\ y_{ij}=\text{mean}(\text{nnz}(y_j))\ \text{for }y_{ij}\ne 0,\ \forall j.$$
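The talk does not spell out the projection onto this equal-coefficient set, so the following is one plausible heuristic operator (name and tie-breaking ours): keep each column's 5 largest entries, replace them by their common mean, and clip at zero for non-negativity.

```python
import numpy as np

def proj_equal_nonzeros(Y, k=5):
    """Heuristic projection onto columns with at most k non-zeros
    sharing one common non-negative value: keep each column's k
    largest entries, set them to their mean, clip at zero."""
    P = np.zeros_like(Y)
    for j in range(Y.shape[1]):
        idx = np.argsort(Y[:, j])[-k:]      # k largest entries
        val = max(Y[idx, j].mean(), 0.0)    # common coefficient, >= 0
        P[idx, j] = val
    return P
```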
Numerical Experiments

Results with different structure enforcing: sparse NMF vs. sparse NMF with equal coefficients. The bases are improved, but the parts are still not cleanly separated.
Different structure enforcing

3. Sparse NMF with orthogonality. Sparse NMF cannot clearly extract the central torso, but the torso has potential sparsity and is orthogonal to the 4 limbs (actually all 5 parts are independent, with non-overlapping non-zero supports):
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }x_{17}\perp x_i,\ i=1,\dots,16,\ \|y_j\|_0\le 5,\ \forall j.$$
Numerical Experiments

Results with different structure enforcing: sparse NMF vs. sparse NMF with orthogonal structure. The torso is now identified.
Different structure enforcing

4. Sparse NMF with combinatorial patterns. Divide the rows of Y into 5 groups G1, ..., G5 (4 limbs and 1 torso); in each column, every group contributes exactly 1 non-zero, and the 5 non-zeros are equal:
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|y_{G_i,j}\|_0=1,\ y_{ij}=\text{mean}(\text{nnz}(y_j)),\ i=1,\dots,5.$$
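A heuristic projection for this combinatorial pattern can be sketched as follows (the operator and its name are our assumption, since the talk only states the constraint set): per column, select the largest entry within each row group, then give the selected entries their common mean value, clipped at zero.

```python
import numpy as np

def proj_group_equal(Y, groups):
    """Heuristic projection: per column, pick the largest entry in
    each row group, set the picked entries to the mean of the picked
    values (clipped at zero); all other entries become zero."""
    P = np.zeros_like(Y)
    for j in range(Y.shape[1]):
        picks = [g[np.argmax(Y[g, j])] for g in groups]  # 1 row per group
        val = max(np.mean([Y[i, j] for i in picks]), 0.0)
        for i in picks:
            P[i, j] = val
    return P
```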
Numerical Experiments

Results: sparse NMF enforcing combinatorial patterns yields quite well classified parts.
Numerical Experiments: Test on Face Images

• Goal: return a parts-based representation; the basis elements extract facial features such as eyes, nose and lips.
• Structural property: Y is non-negative, X is sparse and non-negative.
a) L1 sparse NMF (convex relaxation of L0 sparsity): penalize or constrain the L1 norm of X or Y (Hoyer 2004).
b) L0 sparse NMF (more intuitive, non-convex): constrain the L0 norm of X or Y.
Few works address L0 sparse NMF: Non-negative K-SVD (NNK-SVD, 2005), Probabilistic sparse matrix factorization (PSMF, 2004), NMFL0 (2012).
• Model: sparsity enforced on the matrix X:
$$\min_{X\ge 0,\ Y\ge 0}\ \tfrac{1}{2}\|M-XY\|_F^2\quad\text{s.t. }\|x_i\|_0\le K.$$
• Compared to the NMFL0 algorithm (R. Peharz, F. Pernkopf, 2012): a) with Y fixed, compute X by non-negative least squares (NNLS); b) update Y while maintaining the sparse structure of X (ANLS or multiplicative updates). Difference in subproblems a) and b): SeMF minimizes the augmented Lagrangian, while NMFL0 minimizes the original objective.
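Subproblem a) of the compared algorithm can be sketched with SciPy's NNLS solver; the wrapper name is ours, and this shows only the row-wise NNLS step, without the sparsity selection that NMFL0 applies afterwards:

```python
import numpy as np
from scipy.optimize import nnls

def nnls_update_X(M, Y):
    """With Y fixed, each row x_i of X solves the non-negative
    least-squares problem min ||Y^T x_i^T - M_i^T||, x_i >= 0."""
    X = np.zeros((M.shape[0], Y.shape[0]))
    for i in range(M.shape[0]):
        X[i], _ = nnls(Y.T, M[i])
    return X
```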
Numerical Experiments
• Applied to the ORL dataset (10304×400, 25 basis parts).

(Figure: bases learned by SeMF and by NMFL0 at nnz = 33%, 25%, 10%.)
Numerical Experiments

• Comparison of reconstruction quality and running time: similar quality, but SeMF is faster than NMFL0 in the less sparse cases (more non-zeros).

Note: NMFL0 performs better than Hoyer's method in both SNR and time, as reported in the paper "Sparse nonnegative matrix factorization with l0-constraints" by R. Peharz and F. Pernkopf.
Conclusions

• SeMF can handle many different structures, provided they admit easy projections.
• An ADM approach is applied to the augmented Lagrangian of a split model.
• Dynamically updating the penalty parameters empirically performs well.
• Potential applications to many problems with latent structural properties, to improve solution quality.
• Further work: more experiments and comparisons, non-convex complications, parameter choices, etc.
Thank you!