SAND2009-2389C 1/17 Coupled Matrix Factorizations using Optimization Daniel M. Dunlavy, Tamara G. Kolda, Evrim Acar Sandia National Laboratories SIAM Conference on Computational Science and Engineering March 4, 2009 Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.


SAND2009-2389C

1/17

Coupled Matrix Factorizations using Optimization

Daniel M. Dunlavy, Tamara G. Kolda, Evrim Acar
Sandia National Laboratories

SIAM Conference on Computational Science and Engineering
March 4, 2009

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

2/17

Motivating Problems

• Data with multiple types of two-way relationships
  – Bibliometric analysis
    • author-document, term-document, author-venue, etc.
    • Can we predict potential co-authors?
  – Movie ratings
    • movie-actor, user-movie, actor-award
    • Can we predict useful movie ratings for other users?
• Consistent dimensionality reduction
• Improved interpretation through non-negativity constraints

3/17

Some Related Work

• Simultaneous factor analysis
  – Gramian matrices [Levin, 1966]
  – Test score covariance matrices over time [Millsap, et al., 1988]
• Simultaneous diagonalization
  – Population differentiation in biology [Thorpe, 1988]
  – Blind source separation [Ziehe et al., 2004]
• Generalized SVD
  – Damped or constrained least squares [Van Loan, 1976]
  – Microarray data analysis [Alter, et al., 2003]
  – Multimicrophone speech filtering [Doclo and Moonen, 2002]
• Simultaneous Non-negative Matrix Factorization
  – Gene clustering in microarray data [Badea, 2007; 2008]
• Tensor decompositions
  – Data mining, chemometrics, neuroscience [Kolda, Acar, Bro, Park, Zhang, Berry, Chen, Martin, CSE09]

Slide callouts (in the order extracted): matrices of same size; matrices of same size; only 2 matrices; slow; at least one common dimension

4/17

Coupled Non-negative Matrix Factorization (CNMF)

• Given
  – document-term matrix
  – document-author matrix
• Solve
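The formulas on this slide did not survive extraction. One plausible reading, consistent with the variable counts reported in the experiments ((m + n + p) r*), is sketched below. The symbols X, Y, A, B, C, the factor of 1/2, and the equal weighting of the two terms are assumptions for illustration, not the slide's own notation.

```latex
\text{Given } X \in \mathbb{R}^{m \times n},\; X \ge 0 \ (\text{document-term}), \qquad
Y \in \mathbb{R}^{m \times p},\; Y \ge 0 \ (\text{document-author}), \qquad r \in \mathbb{N}

\min_{A,\,B,\,C} \; f(A,B,C)
  = \tfrac{1}{2}\,\bigl\| X - A B^{\mathsf T} \bigr\|_F^2
  + \tfrac{1}{2}\,\bigl\| Y - A C^{\mathsf T} \bigr\|_F^2
\quad \text{s.t.} \quad
A \in \mathbb{R}^{m \times r},\;
B \in \mathbb{R}^{n \times r},\;
C \in \mathbb{R}^{p \times r},\;
A, B, C \ge 0
```

Under this reading the shared factor A ties the two factorizations together along the common (document) dimension, and the unknowns number (m + n + p) r.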

5/17

Method: CNMF-ALS

• CNMF-ALS: Alternating Least Squares [Extends Berry, et al., 2006]

linear least squares + simple projection to constraint boundary
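The slide's update formulas were not captured, so the following is only a minimal NumPy sketch of the idea "linear least squares + projection", assuming the model X ≈ A Bᵀ, Y ≈ A Cᵀ with a shared factor A. The function name `cnmf_als`, the symbols, and the random initialization are hypothetical and may differ from the authors' implementation.

```python
import numpy as np

def cnmf_als(X, Y, r, iters=200, seed=0):
    """Coupled NMF via alternating least squares with projection (sketch).

    Assumed model (not taken from the slides): X ~ A @ B.T, Y ~ A @ C.T,
    with A (m x r), B (n x r), C (p x r) all non-negative.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], r))
    XY = np.hstack([X, Y])  # m x (n + p): both data sets share the row dimension
    for _ in range(iters):
        # Fix A: solve A @ Z = X (and = Y) in the least-squares sense,
        # then project onto the non-negative orthant by clipping at zero.
        B = np.maximum(np.linalg.lstsq(A, X, rcond=None)[0].T, 0.0)
        C = np.maximum(np.linalg.lstsq(A, Y, rcond=None)[0].T, 0.0)
        # Fix B and C: one solve against the stacked factor [B; C] updates A.
        BC = np.vstack([B, C])  # (n + p) x r
        A = np.maximum(np.linalg.lstsq(BC, XY.T, rcond=None)[0].T, 0.0)
    return A, B, C
```

Clipping an unconstrained least-squares solution is cheap but is not the same as solving the non-negative least-squares subproblem, which may be one reason this style of method can lose accuracy.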

6/17

Method: CNMF-MULT

• CNMF-MULT: Multiplicative Updates [Badea, 2007; Badea, 2008; extends Lee and Seung, 2001]
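The update rules themselves are missing from the transcript. As a rough illustration, the sketch below applies Lee-Seung-style multiplicative updates to an assumed joint objective ½‖X − A Bᵀ‖² + ½‖Y − A Cᵀ‖²; the names `cnmf_mult` and `cnmf_objective`, the symbols, and the small `eps` safeguard are hypothetical, and the rules may differ in detail from those of Badea.

```python
import numpy as np

def cnmf_objective(X, Y, A, B, C):
    """Assumed joint objective: half the sum of squared Frobenius residuals."""
    return 0.5 * (np.linalg.norm(X - A @ B.T) ** 2
                  + np.linalg.norm(Y - A @ C.T) ** 2)

def cnmf_mult(X, Y, r, iters=500, eps=1e-12, seed=0):
    """Coupled NMF with multiplicative updates (sketch under stated assumptions)."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], r))
    B = rng.random((X.shape[1], r))
    C = rng.random((Y.shape[1], r))
    for _ in range(iters):
        AtA = A.T @ A
        # Each factor is scaled elementwise by (negative gradient part) /
        # (positive gradient part), so non-negative entries stay non-negative.
        B *= (X.T @ A) / (B @ AtA + eps)
        C *= (Y.T @ A) / (C @ AtA + eps)
        # The shared factor A sees both data sets in numerator and denominator.
        A *= (X @ B + Y @ C) / (A @ (B.T @ B + C.T @ C) + eps)
    return A, B, C
```

No step size or projection is needed here, which keeps each iteration simple; the price is that many iterations may be required.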

7/17

Method: CNMF-OPT

• CNMF-OPT: Projective Nonlinear CG with Moré-Thuente line search [Extends Acar, Kolda, and Dunlavy, 2009 and Lin, 2007]
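An all-at-once optimization approach needs the objective and its gradient with respect to every factor simultaneously. The sketch below computes both for an assumed objective ½‖X − A Bᵀ‖² + ½‖Y − A Cᵀ‖² and then runs plain projected gradient descent with Armijo backtracking. This is a simpler stand-in, not the nonlinear CG method with Moré-Thuente line search named on the slide, and all function names and symbols are hypothetical.

```python
import numpy as np

def objective_and_gradients(X, Y, A, B, C):
    """Assumed objective and its gradients with respect to A, B, C."""
    RX = A @ B.T - X  # residual of the first factorization
    RY = A @ C.T - Y  # residual of the second factorization
    f = 0.5 * (np.sum(RX ** 2) + np.sum(RY ** 2))
    return f, (RX @ B + RY @ C, RX.T @ A, RY.T @ A)

def projected_gradient(X, Y, r, iters=300, sigma=1e-4, seed=0):
    """Update all factors at once; project onto >= 0 after each trial step."""
    rng = np.random.default_rng(seed)
    Z = [rng.random((d, r)) for d in (X.shape[0], X.shape[1], Y.shape[1])]
    f, G = objective_and_gradients(X, Y, *Z)
    history = [f]
    for _ in range(iters):
        step = 1.0
        while True:
            Z_new = [np.maximum(z - step * g, 0.0) for z, g in zip(Z, G)]
            f_new, G_new = objective_and_gradients(X, Y, *Z_new)
            # Sufficient decrease measured along the projection arc.
            decrease = sum(np.sum(g * (z - zn)) for g, z, zn in zip(G, Z, Z_new))
            if f_new <= f - sigma * decrease or step < 1e-12:
                break
            step *= 0.5  # backtrack
        if f_new > f:  # no acceptable step was found; stop
            break
        Z, f, G = Z_new, f_new, G_new
        history.append(f)
    return Z, history
```

Replacing the steepest-descent direction with a conjugate-gradient direction and the backtracking loop with a stronger line search would bring this closer to the method described on the slide.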

8/17

Matlab Experiments

Noise:

m     n     p     r*    # var
20    16    12    3     144
40    32    24    3     288
80    64    48    3     576
160   128   96    3     1152
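The "# var" column equals (m + n + p) · r*, which is what one would expect if the unknowns are three factor matrices of sizes m × r*, n × r*, and p × r* (that layout is an assumption, since the model slide's formulas were lost). The short check below reproduces the counts; `synthetic_problem` is a hypothetical noise-free generator for illustration only, not the authors' Matlab test setup.

```python
import numpy as np

# Problem sizes (m, n, p, r*) as listed in the table.
SIZES = [(20, 16, 12, 3), (40, 32, 24, 3), (80, 64, 48, 3), (160, 128, 96, 3)]

def num_variables(m, n, p, r):
    """Entries in three factor matrices of sizes m x r, n x r and p x r."""
    return (m + n + p) * r

def synthetic_problem(m, n, p, r, seed=0):
    """Hypothetical noise-free test data with a shared non-negative factor."""
    rng = np.random.default_rng(seed)
    A, B, C = rng.random((m, r)), rng.random((n, r)), rng.random((p, r))
    return A @ B.T, A @ C.T  # X is m x n, Y is m x p

for size in SIZES:
    print(size, num_variables(*size))
```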

9/17

Results: No noise, r = r*

[Plot] NMF Comparison: cpu time (axis ticks 0.01 to 100) vs. Variables (144, 288, 576, 1152); series: NMF2-ALS, NMF2-MULT, NMF2-OPT

[Plot] NMF Comparison: accuracy (axis ticks 0.00 to 100.00) vs. Variables (144, 288, 576, 1152); series: NMF2-ALS, NMF2-MULT, NMF2-OPT

10/17

Results: No noise, r = r*

[Plot] cpu time (axis ticks 0.01 to 1000) vs. Variables (axis ticks 0 to 1400); series: CNMF-ALS, CNMF-MULT, CNMF-OPT

11/17

Results: No noise, r = r*

12/17

Results: No noise, r = r*+1

[Plot] NMF Comparison: cpu time (axis ticks 0.01 to 1000) vs. Variables (144, 288, 576, 1152); series: NMF2-ALS, NMF2-MULT, NMF2-OPT

[Plot] NMF Comparison: accuracy (axis ticks 0.00 to 100.00) vs. Variables (144, 288, 576, 1152); series: NMF2-ALS, NMF2-MULT, NMF2-OPT

13/17

Results: No noise, r=r*+1

14/17

Results: Noisy data, r=r*+1

15/17

Future Work

• Extending other promising methods to CNMF
  – Block principal pivoting based NMF [Park, et al. 2008]
  – Projected gradient NMF [Lin, 2007]
  – Projected Newton NMF [Kim, et al., 2008]
• CNMF-OPT extensions
  – Sparse data, regularization [Acar, Kolda, and Dunlavy, 2009]
  – Sparsity constraints [Park, et al. 2008]
• Numerical experiments
  – Scale to larger data sets
  – Comparisons on real data sets [Park, et al. 2008]
• Alternate models / problem formulations
  – Coupling matrix and tensor decompositions (CNMF/CNTF)

16/17

Conclusions

• Coupled matrix factorizations
  – Method for computing factorizations consistent along common dimensions in data
• Results
  – CNMF-OPT
    • Fast and accurate
      – Overfactors well and handles noise well
  – CNMF-ALS
    • Fast, but not accurate
      – Overfactoring is a big challenge
  – CNMF-MULT
    • Accurate, but may be too slow (similar to NMF results)
• Future Work
  – Identified several promising paths forward

17/17

Thank You

Coupled Matrix Factorizations using Optimization

Danny Dunlavy


http://www.cs.sandia.gov/~dmdunla