chapter 2 non-negative matrix factorization

30
Chapter 2 Non-Negative Matrix Factorization Part 1: Introduction & computation

Upload: others

Post on 05-Jan-2022

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 2 Non-Negative Matrix Factorization

Chapter 2 Non-Negative Matrix Factorization

Part 1: Introduction & computation

Page 2: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Motivating NMF

2Skillicorn chapter 8; Berry et al. (2007)

Page 3: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Reminder

3

1

0

0

1

1

1

1

0

0

1

1

1

1

0

0

0.8

0.5

0.5

0.6

−0.5

−0.5

1.5

0

0

1.3

0.3

0.5

0.6

−0.3

0.3

0.5

0.6

−0.3

0.3

0.5

U Σ VTA

= +

0.6

0.3

0.3

1.3

0.8

0.8

0.6

0.3

0.3

1.3

0.8

0.8

0.6

0.3

0.3

0.4

−0.3

−0.3

−0.3

0.2

0.2

0.4

−0.3

−0.3

−0.3

0.2

0.2

0.4

−0.3

−0.3

U1Σ1,1V1T U2Σ2,2V2

T

The components of the SVD are not very interpretable

Page 4: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Non-negative factors

4

1

0

0

1

1

1

1

0

0

1

1

1

1

0

0

A

=

1

0

1

1

1

0

1

1

1

0

1

0

0

0

1

1

W H1

0

0

1

0

0

1

0

0

1

0

0

1

0

0

0

0

0

0

1

1

0

0

0

0

1

1

0

0

0

+

W1H1 W2H2

Forcing the factors to be non-negative can, and often will, improve the interpretability of the factorization

Page 5: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

The definition

5

Page 6: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Definition of NMF

6

Given a non-negative matrix and integer k, find non-negative matrices and such thatis minimized.

12 kA �WHk2F

A 2 Rn⇥m+

W 2 Rn⇥k+ H 2 Rk⇥m+

Page 7: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Non-negative rank

• The non-negative rank of matrix A, rank+(A), is the size of the smallest exact non-negative factorization A = WH

• rank(A) ≤ rank+(A) ≤ min{n, m}

7

Page 8: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Some comments• NMF is not unique

• If X is nonnegative and with nonnegative inverse, then WXX–1H is equivalent valid decomposition

• Computing NMF (and non-negative rank) is NP-hard

• This was open until 2008

8

Page 9: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Example of non-uniqueness

9

1

0

0

1

1

1

1

0

0

1

1

1

1

0

0

1

0

0

1

0

0

1

0

0

1

0

0

1

0

0

0

0

0

0

1

1

0

0

0

0

1

1

0

0

0

1

0

0

0.5

0

0

1

0

0

0.5

0

0

1

0

0

0

0

0

0.5

1

1

0

0

0

0.5

1

1

0

0

0

1

0

0

0

0

0

1

0

0

0

0

0

1

0

0

0

0

0

1

1

1

0

0

0

1

1

1

0

0

0

=

=

=

+

+

+

Page 10: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

NMF has no order• The factors in NMF have no inherent order

• The first component is no more important than the second is no more important…

• NMF is not hierarchical

• The factors of rank-(k+1) decomposition can be completely different to those of rank-k decomposition

10

Page 11: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Example

11

1

0

0

1

1

1

1

0

0

1

1

1

1

0

0

0.8

0.5

0.5

0.7 1.5 0.7 1.5 0.7

≈ Rank-1}1

0

0

0

1

1

1

0

1

1

1

0

1

1

1

0= Rank-2}

Page 12: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Interpreting NMF

12

Page 13: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Parts-of-whole

• NMF works over anti-negative semiring

• There is no subtraction

• Each rank-1 component wihi explains a part of the whole

• This can yield to sparse factors

13

Page 14: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

NMF example: faces

14

Row of original

Row of reconstruction

= ×

PCA/SVD

Page 15: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

NMF example: faces

15

Row of original

= ×

NMF

Page 16: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

NMF example: digits

16

A H

NMF factors correspond to patterns and background

Page 17: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Some NMF applications• Text mining (more later)

• Bioinformatics

• Microarray analysis

• Mineral exploration

• Neuroscience

• Image understanding

• Air pollution research

• Weather forecasting

• …

17

Page 18: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Computing NMF

18

Page 19: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

General idea• NMF is not convex, but it is biconvex

• If W is fixed, is convex

• Start from random W and repeat

• Fix W and update H

• Fix H and update W

• until the error doesn’t decrease anymore

19

12 kA �WHk2F

Page 20: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Notes on the general idea• How to create a good random starting point?

• Is the algorithm robust to initial solutions?

• How to update W and H?

• When (and how quickly) has the process converged?

• Fixed number of iterations? Minimum change in error?

20

Page 21: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Alternating least squares

• Without the non-negativity constraint, this is the standard least-squares:

• wi ← argminw ||wH – ai||F

• we can update W ← AH+ and H ← W+A

• X+ is the pseudo-inverse of X which is LS-optimal

• The method is called alternating least-squares (ALS)

• This can introduce negative values

21

Page 22: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Enforcing non-negativity in ALS

• Least-squares optimal update of W (or H) with non-negativity constraints is convex optimization problem

• In theory in P, in practice slow, but subject to much research

• Simple approach: truncate all negative values to 0

• Update W ← [AH+]+

22

Page 23: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

The NMF-ALS algorithm

1. W ← random(n, k)

2. repeat

2.1. H ← [W+A]+

2.2. W ← [AH+]+

3. until convergence

23

Page 24: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

When has there been enough convergence?

• When the error doesn’t change too much

• ||A – W(k)H(k)||F – ||A – W(k+1)H(k+1)||F ≤ ε

• After some number of maximum iterations has been achieved

• Usually, whichever of these two happens first

24

Page 25: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Gradient descent• We can compute the gradient of the error function

(with one factor matrix fixed)

• We can move slightly towards the negative gradient

• How much is the step size and deciding it is a big problem

25

ƒ (H) = 12 kA �WHk2F =

12P

� k�� �Wh�k2F�H�j ƒ (H) = (W

TA)�j � (WTWH)�j

Page 26: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

The NMF gradient descent algorithm

1. W ← random(n, k)

2. H ← random(k, m)

3. repeat

3.1.

3.2.

4. until convergence

26

H H � �H�ƒ�H

W W � �W�ƒ�W

Page 27: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Example

27

l

r

0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.5

1.0

1.5

2.0

2.5

l

r

0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.5

1.0

1.5

2.0

2.5

l

r

0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.5

1.0

1.5

2.0

2.5

●●

l

r

0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.5

1.0

1.5

2.0

2.5

●●

l

r

0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.5

1.0

1.5

2.0

2.5

●●

● ●

l

r

0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.5

1.0

1.5

2.0

2.5

●●

● ●

l

r

0.0 0.5 1.0 1.5 2.0 2.5

0.0

0.5

1.0

1.5

2.0

2.5

●●

● ●

●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

Page 28: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Notes on gradient descent

• Choosing the correct step size is crucial

• Usually the shorter step sizes the closer the solution we are

• Can converge to local minimum

• Wrong step size, and converges very close to the initial solution

28

Page 29: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

The NMF multiplicative updates algorithm

1. W ← random(n, k)

2. H ← random(k, m)

3. repeat

3.1.

3.2.

4. until convergence

29

h�j h�j(WTA)�j

(WTWH)�j+�

��j ��j(AHT )�j

(WHHT )�j+�

Page 30: Chapter 2 Non-Negative Matrix Factorization

DMM, summer 2015 Pauli Miettinen

Notes on multiplicative updates

• Proposed by Lee & Seung (Nature, 1999)

• Equivalent to gradient descent with dynamic step size

• Zeros in initial solutions will never turn into non-zeros; non-zeros will never turn into zeros

• Problems if the correct solution contains zeros

30