SAND2009-2389C
1/17
Coupled Matrix Factorizations using Optimization
Daniel M. Dunlavy, Tamara G. Kolda, Evrim Acar
Sandia National Laboratories
SIAM Conference on Computational Science and Engineering
March 4, 2009
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
2/17
Motivating Problems
• Data with multiple types of two-way relationships
  – Bibliometric analysis
    • author-document, term-document, author-venue, etc.
    • Can we predict potential co-authors?
  – Movie ratings
    • movie-actor, user-movie, actor-award
    • Can we predict useful movie ratings for other users?
• Consistent dimensionality reduction
• Improved interpretation through non-negativity constraints
3/17
Some Related Work
• Simultaneous factor analysis
  – Gramian matrices [Levin, 1966]
  – Test score covariance matrices over time [Millsap, et al., 1988]
• Simultaneous diagonalization
  – Population differentiation in biology [Thorpe, 1988]
  – Blind source separation [Ziehe et al., 2004]
• Generalized SVD
  – Damped or constrained least squares [Van Loan, 1976]
  – Microarray data analysis [Alter, et al., 2003]
  – Multimicrophone speech filtering [Doclo and Moonen, 2002]
• Simultaneous Non-negative Matrix Factorization
  – Gene clustering in microarray data [Badea, 2007; 2008]
• Tensor decompositions
  – Data mining, chemometrics, neuroscience [Kolda, Acar, Bro, Park, Zhang, Berry, Chen, Martin, CSE09]
Annotations on the slide, in order: matrices of same size; matrices of same size; only 2 matrices; slow; at least one common dimension
5/17
Method: CNMF-ALS
• CNMF-ALS: Alternating Least Squares [Extends Berry, et al., 2006]
linear least squares + simple projection to constraint boundary
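The update formulas for this slide did not survive in the transcript, so the following is only a minimal sketch of the idea named above (a linear least-squares solve followed by projection onto the non-negativity boundary). It assumes a coupled model in which X (m-by-n) is approximated by A B^T and Y (m-by-p) by A C^T, with A shared along the common dimension; that model, the function name `cnmf_als`, and all parameters are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cnmf_als(X, Y, r, iters=100, seed=0):
    """Sketch of coupled NMF by alternating least squares with projection.

    Assumed model (not taken from the slides): X ~ A @ B.T and Y ~ A @ C.T,
    where X is m-by-n, Y is m-by-p, and A (m-by-r) is the shared factor.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    p = Y.shape[1]
    B = rng.random((n, r))
    C = rng.random((p, r))
    XY = np.hstack([X, Y])  # m-by-(n+p): both data sets side by side
    for _ in range(iters):
        # Update A: solve [B; C] @ A.T ~ [X Y].T in the least-squares sense,
        # then project negative entries to zero (the constraint boundary).
        BC = np.vstack([B, C])
        A = np.maximum(np.linalg.lstsq(BC, XY.T, rcond=None)[0].T, 0.0)
        # Update B and C: each is an ordinary least-squares problem given A.
        B = np.maximum(np.linalg.lstsq(A, X, rcond=None)[0].T, 0.0)
        C = np.maximum(np.linalg.lstsq(A, Y, rcond=None)[0].T, 0.0)
    return A, B, C
```

The projection step is what makes each update cheap, but it also means an update is not guaranteed to reduce the fit error.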
6/17
Method: CNMF-MULT
• CNMF-MULT: Multiplicative Updates [Badea, 2007; Badea, 2008; extends Lee and Seung, 2001]
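The multiplicative update rules shown on this slide are missing from the transcript. As a hedged illustration, the sketch below applies Lee-Seung style updates to an assumed coupled objective ||X - A B^T||_F^2 + ||Y - A C^T||_F^2 with a shared factor A. The model, the names `cnmf_mult` and `cnmf_objective`, and the small constant `eps` guarding against division by zero are assumptions, and the rules may differ in detail from those in the cited work.

```python
import numpy as np

def cnmf_objective(X, Y, A, B, C):
    """Assumed coupled objective: sum of squared Frobenius residuals."""
    return np.sum((X - A @ B.T) ** 2) + np.sum((Y - A @ C.T) ** 2)

def cnmf_mult(X, Y, r, iters=200, eps=1e-12, seed=0):
    """Sketch of coupled NMF with multiplicative updates.

    Assumes X (m-by-n) and Y (m-by-p) are non-negative and share the
    factor A, i.e. X ~ A @ B.T and Y ~ A @ C.T.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    p = Y.shape[1]
    A = rng.random((m, r))
    B = rng.random((n, r))
    C = rng.random((p, r))
    for _ in range(iters):
        # A sees both data sets: numerator and denominator sum over X and Y.
        A *= (X @ B + Y @ C) / (A @ (B.T @ B + C.T @ C) + eps)
        # B and C each see only their own data set.
        B *= (X.T @ A) / (B @ (A.T @ A) + eps)
        C *= (Y.T @ A) / (C @ (A.T @ A) + eps)
    return A, B, C
```

Because every update multiplies by a non-negative ratio, the factors stay non-negative without an explicit projection.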
7/17
Method: CNMF-OPT
• CNMF-OPT: Projective Nonlinear CG, Moré-Thuente line search [Extends Acar, Kolda, and Dunlavy, 2009 and Lin, 2007]
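Any gradient-based method of this kind needs the objective value and its gradient with respect to every factor. The slide's formulas are not in the transcript, so the sketch below assumes the objective f = 1/2 ||X - A B^T||_F^2 + 1/2 ||Y - A C^T||_F^2 with a shared factor A. It shows only the function/gradient evaluation and the projection onto the non-negative orthant, not the nonlinear CG iteration or the Moré-Thuente line search. The names, the 1/2 scaling, and the model are assumptions.

```python
import numpy as np

def cnmf_objective_and_gradient(X, Y, A, B, C):
    """Objective value and gradients for the assumed coupled model.

    f(A, B, C) = 0.5 * ||A @ B.T - X||_F^2 + 0.5 * ||A @ C.T - Y||_F^2
    """
    RX = A @ B.T - X          # residual for the first data set
    RY = A @ C.T - Y          # residual for the second data set
    f = 0.5 * (np.sum(RX ** 2) + np.sum(RY ** 2))
    gA = RX @ B + RY @ C      # shared factor: both residuals contribute
    gB = RX.T @ A
    gC = RY.T @ A
    return f, gA, gB, gC

def project_nonnegative(M):
    """Project a matrix onto the non-negative orthant."""
    return np.maximum(M, 0.0)
```

A single projected gradient step on the shared factor would then read `A = project_nonnegative(A - step * gA)` for some step length `step`; a CG method replaces the raw gradient with a conjugate search direction and picks the step by line search.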
8/17
Matlab Experiments
Noise:
m     n     p     r*    # var
20    16    12    3     144
40    32    24    3     288
80    64    48    3     576
160   128   96    3     1152
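The "# var" column equals (m + n + p) · r* in every row, which is what one would expect if the unknowns are three factor matrices of sizes m-by-r*, n-by-r*, and p-by-r*. That reading is an inference from the numbers, not something stated in the transcript. A short check, with the helper name `num_variables` chosen for illustration:

```python
def num_variables(m, n, p, r):
    """Number of unknowns in three factor matrices with r columns each."""
    return (m + n + p) * r

# Problem sizes (m, n, p) from the table above, all with r* = 3.
sizes = [(20, 16, 12), (40, 32, 24), (80, 64, 48), (160, 128, 96)]
counts = [num_variables(m, n, p, 3) for m, n, p in sizes]
print(counts)  # [144, 288, 576, 1152]
```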
9/17
Results: No noise, r = r*
[Figure: NMF Comparison. CPU time (log scale, 0.01 to 100) vs. number of variables (144, 288, 576, 1152) for NMF2-ALS, NMF2-MULT, and NMF2-OPT.]
[Figure: NMF Comparison. Accuracy (0 to 100) vs. number of variables (144, 288, 576, 1152) for NMF2-ALS, NMF2-MULT, and NMF2-OPT.]
10/17
Results: No noise, r = r*
[Figure: CPU time (log scale, 0.01 to 1000) vs. number of variables (0 to 1400) for CNMF-ALS, CNMF-MULT, and CNMF-OPT.]
12/17
Results: No noise, r = r*+1
[Figure: NMF Comparison. CPU time (log scale, 0.01 to 1000) vs. number of variables (144, 288, 576, 1152) for NMF2-ALS, NMF2-MULT, and NMF2-OPT.]
[Figure: NMF Comparison. Accuracy (0 to 100) vs. number of variables (144, 288, 576, 1152) for NMF2-ALS, NMF2-MULT, and NMF2-OPT.]
15/17
Future Work
• Extending other promising methods to CNMF
  – Block principal pivoting based NMF [Park, et al. 2008]
  – Projected gradient NMF [Lin, 2007]
  – Projected Newton NMF [Kim, et al., 2008]
• CNMF-OPT extensions
  – Sparse data, regularization [Acar, Kolda, and Dunlavy, 2009]
  – Sparsity constraints [Park, et al. 2008]
• Numerical experiments
  – Scale to larger data sets
  – Comparisons on real data sets [Park, et al. 2008]
• Alternate models / problem formulations
  – Coupling matrix and tensor decompositions (CNMF/CNTF)
16/17
Conclusions
• Coupled matrix factorizations
  – Method for computing factorizations consistent along common dimensions in data
• Results
  – CNMF-OPT
    • Fast and accurate
      – Overfactors well and handles noise well
  – CNMF-ALS
    • Fast, but not accurate
      – Overfactoring is a big challenge
  – CNMF-MULT
    • Accurate, but may be too slow (similar to NMF results)
• Future Work
  – Identified several promising paths forward
17/17
Thank You
Coupled Matrix Factorizations using Optimization
Danny Dunlavy
http://www.cs.sandia.gov/~dmdunla