volodymyr kuleshovú arun tejasvi chagantyú …kuleshov/papers/aistats2015...tensor factorization...
TRANSCRIPT
Tensor Factorization via Matrix Factorization
Volodymyr Kuleshovú
Arun Tejasvi Chagantyú
Percy Liang
Stanford University
May 11, 2015
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 1 / 28
Introduction: tensor factorization
An application: community detection
Anandkumar, Ge, Hsu, andS. Kakade 2013
a b
c d
? ?
? ?
= + + · · · +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 2 / 28
Introduction: tensor factorization
An application: community detection
Anandkumar, Ge, Hsu, andS. Kakade 2013
a b
c d
? ?
? ?
= + + · · · +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 2 / 28
Introduction: tensor factorization
An application: community detection
Anandkumar, Ge, Hsu, andS. Kakade 2013
a b
c d
? ?
? ?
= + + · · · +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 2 / 28
Introduction: tensor factorization
An application: community detection
Anandkumar, Ge, Hsu, andS. Kakade 2013
a b
c d
? ?
? ?
= + + · · · +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 2 / 28
Introduction: tensor factorization
An application: community detectionAnandkumar, Ge, Hsu, and
S. Kakade 2013
a b
c d
? ?
? ?
= + + · · · +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 2 / 28
Introduction: tensor factorization
Applications of tensor factorizationI Community detection
I Anandkumar, Ge, Hsu, and S. Kakade 2013I Parsing
I Cohen, Satta, and Collins 2013I Knowledge base completion
I Chang et al. 2014I Singh, Rocktaschel, and Riedel 2015
I Topic modellingI Anandkumar, Foster, et al. 2012
I CrowdsourcingI Zhang et al. 2014
I Mixture modelsI Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013
I Bottlenecked modelsI Chaganty and Liang 2014
I . . .Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 3 / 28
Introduction: tensor factorization
What is tensor (CP) factorization?I Tensor analogue of matrix eigen-decomposition.
M =kÿ
i=1fi
i
ui
¢ ui
.
I Goal: Given T with noise, ‘R, recover factors ui
.
= + + · · · +
k
= + + · · · +
+
k
Orth
ogon
al
Non-
ortho
gona
l
= + + · · · + +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 4 / 28
Introduction: tensor factorization
What is tensor (CP) factorization?I Tensor analogue of matrix eigen-decomposition.
T =kÿ
i=1fi
i
ui
¢ ui
¢ui
.
I Goal: Given T with noise, ‘R, recover factors ui
.
= + + · · · +
k
= + + · · · +
+
k
Orth
ogon
al
Non-
ortho
gona
l
= + + · · · + +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 4 / 28
Introduction: tensor factorization
What is tensor (CP) factorization?I Tensor analogue of matrix eigen-decomposition.
‚T =kÿ
i=1fi
i
ui
¢ ui
¢ui
+‘R.
I Goal: Given T with noise, ‘R, recover factors ui
.
= + + · · · +
k
= + + · · · + +
k
Orth
ogon
al
Non-
ortho
gona
l
= + + · · · + +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 4 / 28
Introduction: tensor factorization
What is tensor (CP) factorization?I Tensor analogue of matrix eigen-decomposition.
‚T =kÿ
i=1fi
i
ui
¢ ui
¢ui
+‘R.
I Goal: Given T with noise, ‘R, recover factors ui
.
= + + · · · +
k
= + + · · · + +
k
Orth
ogon
al
Non-
ortho
gona
l
= + + · · · + +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 4 / 28
Introduction: tensor factorization
Existing tensor factorization algorithms
I Tensor power method (Anandkumar, Ge, Hsu, S. M. Kakade, et al.2013)
I Analog of matrix power method.I Sensitive to noise.I Restricted to orthogonal tensors.
I Alternating least squares (Comon, Luciani, and Almeida 2009;Anandkumar, Ge, and Janzamin 2014)
I Sensitive to initialization.
Our approach: reduce to existing fast and robust matrixalgorithms.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 5 / 28
Introduction: tensor factorization
Existing tensor factorization algorithms
I Tensor power method (Anandkumar, Ge, Hsu, S. M. Kakade, et al.2013)
I Analog of matrix power method.I Sensitive to noise.I Restricted to orthogonal tensors.
I Alternating least squares (Comon, Luciani, and Almeida 2009;Anandkumar, Ge, and Janzamin 2014)
I Sensitive to initialization.
Our approach: reduce to existing fast and robust matrixalgorithms.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 5 / 28
Introduction: tensor factorization
Existing tensor factorization algorithms
I Tensor power method (Anandkumar, Ge, Hsu, S. M. Kakade, et al.2013)
I Analog of matrix power method.I Sensitive to noise.I Restricted to orthogonal tensors.
I Alternating least squares (Comon, Luciani, and Almeida 2009;Anandkumar, Ge, and Janzamin 2014)
I Sensitive to initialization.
Our approach: reduce to existing fast and robust matrixalgorithms.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 5 / 28
Orthogonal Tensor factorization
Outline
Introduction: tensor factorization
Orthogonal Tensor factorizationProjections
Non-orthogonal tensor factorization
Related work
Empirical results
Conclusions
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 6 / 28
Orthogonal Tensor factorization
Tensor factorization via single matrix factorization
T = fi1u¢31 + fi2u¢3
2 + fi3u¢33 + ‘R
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 7 / 28
Orthogonal Tensor factorization
Tensor factorization via single matrix factorization
T = u¢31 + u¢3
1 + u¢31
¿
T (I, I, w) = (w€u1)u¢21 + (w€u2)u¢2
2 + (w€u3)u¢23
I Proposal: Eigen-decomposition on the projected matrix.I Return: recovered eigenvectors, u
i
.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 8 / 28
Orthogonal Tensor factorization
Tensor factorization via single matrix factorization
T = u¢31 + u¢3
1 + u¢31
¿
T (I, I, w) = (w€u1)u¢21 + (w€u2)u¢2
2 + (w€u3)u¢23
I Proposal: Eigen-decomposition on the projected matrix.I Return: recovered eigenvectors, u
i
.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 8 / 28
Orthogonal Tensor factorization
Tensor factorization via single matrix factorization
T = u¢31 + u¢3
1 + u¢31
¿
T (I, I, w) = (w€u1)¸ ˚˙ ˝
⁄1
u¢21 + (w€u2)
¸ ˚˙ ˝⁄2
u¢22 + (w€u3)
¸ ˚˙ ˝⁄3
u¢23
I Proposal: Eigen-decomposition on the projected matrix.I Return: recovered eigenvectors, u
i
.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 8 / 28
Orthogonal Tensor factorization
Sensitivity of single matrix projectionI Problem: Eigendecomposition is very sensitive to the eigengap.
error in factors à 1min(di�erence in eigenvalues) .
I Intuition: If two eigenvalues are equal, corresponding eigenvectorsare arbitrary.
= + ++ +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 9 / 28
Orthogonal Tensor factorization
Sensitivity of single matrix projectionI Problem: Eigendecomposition is very sensitive to the eigengap.
error in factors à 1min(di�erence in eigenvalues) .
I Intuition: If two eigenvalues are equal, corresponding eigenvectorsare arbitrary.
= + +
+ +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 9 / 28
Orthogonal Tensor factorization
Sensitivity of single matrix projectionI Problem: Eigendecomposition is very sensitive to the eigengap.
error in factors à 1min(di�erence in eigenvalues) .
I Intuition: If two eigenvalues are equal, corresponding eigenvectorsare arbitrary.
=
+ +
+ +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 9 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)
(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)
(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)
(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)
(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)
(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)
(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)
(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Sensitivity analysis (contd.)(Cardoso 1994)
I Single matrix factorization:
error in factors Ã1
min di�. in eigenvalues .
I Simultaneous matrixfactorization:
error in factors Ã1
min avg. di�. in eigenvalues .
Every coordinate pair needsone good projection (with alarge eigengap).
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 10 / 28
Orthogonal Tensor factorization
Reduction to simultaneous diagonalization
T (I, I, w1)¸ ˚˙ ˝
M1
= (w€1 u1)
¸ ˚˙ ˝⁄11
u1u€1 + (w€
1 u2)¸ ˚˙ ˝
⁄21
u2u€2 + (w€
1 u3)¸ ˚˙ ˝
⁄31
u3u€3
......
......
T (I, I, w¸)¸ ˚˙ ˝M¸
= (w€¸ u1)
¸ ˚˙ ˝⁄1¸
u1u€1 + (w€
¸ u2)¸ ˚˙ ˝
⁄2¸
u2u€2 + (w€
¸ u3)¸ ˚˙ ˝
⁄3¸
u3u€3
I Projections share factors: can be simultaneously diagonalized.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 11 / 28
Orthogonal Tensor factorization
Reduction to simultaneous diagonalization
T (I, I, w1)¸ ˚˙ ˝
M1
= (w€1 u1)
¸ ˚˙ ˝⁄11
u1u€1 + (w€
1 u2)¸ ˚˙ ˝
⁄21
u2u€2 + (w€
1 u3)¸ ˚˙ ˝
⁄31
u3u€3
......
......
T (I, I, w¸)¸ ˚˙ ˝M¸
= (w€¸ u1)
¸ ˚˙ ˝⁄1¸
u1u€1 + (w€
¸ u2)¸ ˚˙ ˝
⁄2¸
u2u€2 + (w€
¸ u3)¸ ˚˙ ˝
⁄3¸
u3u€3
I Projections share factors: can be simultaneously diagonalized.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 11 / 28
Orthogonal Tensor factorization
Reduction to simultaneous diagonalization
T (I, I, w1)¸ ˚˙ ˝
M1
= (w€1 u1)
¸ ˚˙ ˝⁄11
u1u€1 + (w€
1 u2)¸ ˚˙ ˝
⁄21
u2u€2 + (w€
1 u3)¸ ˚˙ ˝
⁄31
u3u€3
......
......
T (I, I, w¸)¸ ˚˙ ˝M¸
= (w€¸ u1)
¸ ˚˙ ˝⁄1¸
u1u€1 + (w€
¸ u2)¸ ˚˙ ˝
⁄2¸
u2u€2 + (w€
¸ u3)¸ ˚˙ ˝
⁄3¸
u3u€3
I Projections share factors: can be simultaneously diagonalized.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 11 / 28
Orthogonal Tensor factorization
Simultaneous diagonalization algorithm
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU:UU
€=I
Lÿ
¸=1o�(U€M¸U) o�(A) =
ÿ
i ”=j
A2ij
.
I Optimize using Jacobi angles (Cardoso and Souloumiac 1996).
I Used widly in the ICA community.I Empirically appears to be generically global convergence.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 12 / 28
Orthogonal Tensor factorization
Simultaneous diagonalization algorithm
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU:UU
€=I
Lÿ
¸=1o�(U€M¸U) o�(A) =
ÿ
i ”=j
A2ij
.
I Optimize using Jacobi angles (Cardoso and Souloumiac 1996).
I Used widly in the ICA community.I Empirically appears to be generically global convergence.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 12 / 28
Orthogonal Tensor factorization
Simultaneous diagonalization algorithm
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU:UU
€=I
Lÿ
¸=1o�(U€M¸U) o�(A) =
ÿ
i ”=j
A2ij
.
I Optimize using Jacobi angles (Cardoso and Souloumiac 1996).
I Used widly in the ICA community.I Empirically appears to be generically global convergence.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 12 / 28
Orthogonal Tensor factorization
Simultaneous diagonalization algorithm
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU:UU
€=I
Lÿ
¸=1o�(U€M¸U) o�(A) =
ÿ
i ”=j
A2ij
.
I Optimize using Jacobi angles (Cardoso and Souloumiac 1996).
I Used widly in the ICA community.I Empirically appears to be generically global convergence.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 12 / 28
Orthogonal Tensor factorization
Simultaneous diagonalization algorithm
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU:UU
€=I
Lÿ
¸=1o�(U€M¸U) o�(A) =
ÿ
i ”=j
A2ij
.
I Optimize using Jacobi angles (Cardoso and Souloumiac 1996).I Used widly in the ICA community.I Empirically appears to be generically global convergence.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 12 / 28
Orthogonal Tensor factorization Projections
Outline
Introduction: tensor factorization
Orthogonal Tensor factorizationProjections
Non-orthogonal tensor factorization
Related work
Empirical results
Conclusions
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 13 / 28
Orthogonal Tensor factorization Projections
Oracle and random projections
I Hypothetically: “oracle”projections along the factorsis good.
I Practically: use projectionsalong random directions.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 14 / 28
Orthogonal Tensor factorization Projections
Oracle and random projections
I Hypothetically: “oracle”projections along the factorsis good.
I Practically: use projectionsalong random directions.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 14 / 28
Orthogonal Tensor factorization Projections
Results: Orthogonal tensor decomposition
‚T =kÿ
i=1fi
i
u¢3i
+ ‘R.
Theorem (Random projections)Pick L = �(k log k) projections randomly from the unit sphere. Then,with high probability,
error in factors Æ O
Q
ccca
ÎfiÎ1fimax
fi2min
+Û
dL
R
dddb ‘
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 15 / 28
Orthogonal Tensor factorization Projections
Results: Orthogonal tensor decomposition
‚T =kÿ
i=1fi
i
u¢3i
+ ‘R.
Theorem (Random projections)Pick L = �(k log k) projections randomly from the unit sphere. Then,with high probability,
error in factors Æ O
Q
ccca
ÎfiÎ1fimax
fi2min
+Û
dL
R
dddb ‘
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 15 / 28
Orthogonal Tensor factorization Projections
Results: Orthogonal tensor decomposition
‚T =kÿ
i=1fi
i
u¢3i
+ ‘R.
Theorem (Random projections)Pick L = �(k log k) projections randomly from the unit sphere. Then,with high probability,
error in factors Æ O
Q
ccca
ÎfiÎ1fimax
fi2min¸ ˚˙ ˝
oracle error
+Û
dL
R
dddb ‘
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 15 / 28
Orthogonal Tensor factorization Projections
Results: Orthogonal tensor decomposition
‚T =kÿ
i=1fi
i
u¢3i
+ ‘R.
Theorem (Random projections)Pick L = �(k log k) projections randomly from the unit sphere. Then,with high probability,
error in factors Æ O
Q
ccca
ÎfiÎ1fimax
fi2min¸ ˚˙ ˝
oracle error
+Û
dL
¸˚˙˝conc. term
R
dddb ‘
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 15 / 28
Orthogonal Tensor factorization Projections
Empirical: Random vs. oracle projections
0 10 20 30 40 50 601umEer of projections
0.000
0.002
0.004
0.006
0.008
0.010
Err
or
2rthogonaO case
Figure: Comparing random vs. oracle projections (d = k = 10, ‘ = 0.05)
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 16 / 28
Non-orthogonal tensor factorization
Outline
Introduction: tensor factorization
Orthogonal Tensor factorizationProjections
Non-orthogonal tensor factorization
Related work
Empirical results
Conclusions
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 17 / 28
Non-orthogonal tensor factorization
Non-orthogonal simultaneous diagonalization
T (I, I, w1)¸ ˚˙ ˝
M1
= (w€1 u1)
¸ ˚˙ ˝⁄11
u1u€1 + (w€
1 u2)¸ ˚˙ ˝
⁄21
u2u€2 + (w€
1 u3)¸ ˚˙ ˝
⁄31
u3u€3
......
......
T (I, I, w¸)¸ ˚˙ ˝M¸
= (w€¸ u1)
¸ ˚˙ ˝⁄1¸
u1u€1 + (w€
¸ u2)¸ ˚˙ ˝
⁄2¸
u2u€2 + (w€
¸ u3)¸ ˚˙ ˝
⁄3¸
u3u€3
I No unique non-orthogonal factorization for a single matrix.
I Ø 2 matrices have a unique non-orthogonal factorization.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 18 / 28
Non-orthogonal tensor factorization
Non-orthogonal simultaneous diagonalization
T (I, I, w1)¸ ˚˙ ˝
M1
= (w€1 u1)
¸ ˚˙ ˝⁄11
u1u€1 + (w€
1 u2)¸ ˚˙ ˝
⁄21
u2u€2 + (w€
1 u3)¸ ˚˙ ˝
⁄31
u3u€3
......
......
T (I, I, w¸)¸ ˚˙ ˝M¸
= (w€¸ u1)
¸ ˚˙ ˝⁄1¸
u1u€1 + (w€
¸ u2)¸ ˚˙ ˝
⁄2¸
u2u€2 + (w€
¸ u3)¸ ˚˙ ˝
⁄3¸
u3u€3
I No unique non-orthogonal factorization for a single matrix.I Ø 2 matrices have a unique non-orthogonal factorization.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 18 / 28
Non-orthogonal tensor factorization
Non-orthogonal simultaneous diagonalization
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU
Lÿ
¸=1o�(U≠1M¸U≠€) o�(A) =
ÿ
i ”=j
A2ij
.
I U are not constrained to be orthogonal.I Optimize using the QR1JD algorithm (Afsari 2006).
I Only guaranteed to have local convergence.I More stable than ALS in practice.
I Sensitivity analysis due to Afsari 2008
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 19 / 28
Non-orthogonal tensor factorization
Non-orthogonal simultaneous diagonalization
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU
Lÿ
¸=1o�(U≠1M¸U≠€) o�(A) =
ÿ
i ”=j
A2ij
.
I U are not constrained to be orthogonal.
I Optimize using the QR1JD algorithm (Afsari 2006).I Only guaranteed to have local convergence.I More stable than ALS in practice.
I Sensitivity analysis due to Afsari 2008
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 19 / 28
Non-orthogonal tensor factorization
Non-orthogonal simultaneous diagonalization
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU
Lÿ
¸=1o�(U≠1M¸U≠€) o�(A) =
ÿ
i ”=j
A2ij
.
I U are not constrained to be orthogonal.I Optimize using the QR1JD algorithm (Afsari 2006).
I Only guaranteed to have local convergence.I More stable than ALS in practice.
I Sensitivity analysis due to Afsari 2008
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 19 / 28
Non-orthogonal tensor factorization
Non-orthogonal simultaneous diagonalization
I Algorithm: Simultaneously diagonalize projected matrices.
‚U = arg minU
Lÿ
¸=1o�(U≠1M¸U≠€) o�(A) =
ÿ
i ”=j
A2ij
.
I U are not constrained to be orthogonal.I Optimize using the QR1JD algorithm (Afsari 2006).
I Only guaranteed to have local convergence.I More stable than ALS in practice.
I Sensitivity analysis due to Afsari 2008
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 19 / 28
Non-orthogonal tensor factorization
Results: Non-orthogonal tensor decomposition
Theorem (Random projections)Pick L = �(k log k) projections randomly from the unit sphere. Then,with high probability,
error in factors Æ O
Q
ccccca
ÎU≠€Î22
1 ≠ µ2
ÎfiÎ1fimax
fi2min
Q
a1 +Û
dL
R
b
R
dddddb‘
where U = [u1| . . . |uk
], µ = max u€i
uj
and ÎU
≠€Î22
1≠µ2 measuresnon-orthogonality.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 20 / 28
Non-orthogonal tensor factorization
Results: Non-orthogonal tensor decomposition
Theorem (Random projections)Pick L = �(k log k) projections randomly from the unit sphere. Then,with high probability,
error in factors Æ O
Q
ccccca
ÎU≠€Î22
1 ≠ µ2
ÎfiÎ1fimax
fi2min
Q
a1 +Û
dL
R
b
¸ ˚˙ ˝ortho. cost
R
dddddb‘
where U = [u1| . . . |uk
], µ = max u€i
uj
and ÎU
≠€Î22
1≠µ2 measuresnon-orthogonality.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 20 / 28
Non-orthogonal tensor factorization
Results: Non-orthogonal tensor decomposition
Theorem (Random projections)Pick L = �(k log k) projections randomly from the unit sphere. Then,with high probability,
error in factors Æ O
Q
ccccca
ÎU≠€Î22
1 ≠ µ2¸ ˚˙ ˝
non≠ortho. cost
ÎfiÎ1fimax
fi2min
Q
a1 +Û
dL
R
b
¸ ˚˙ ˝ortho. cost
R
dddddb‘
where U = [u1| . . . |uk
], µ = max u€i
uj
and ÎU
≠€Î22
1≠µ2 measuresnon-orthogonality.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 20 / 28
Related work
Outline
Introduction: tensor factorization
Orthogonal Tensor factorizationProjections
Non-orthogonal tensor factorization
Related work
Empirical results
Conclusions
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 21 / 28
Related work
Notes and Related work
I Orthogonal tensor methods can factorize non-orthogonal tensorsusing a whitening transformation (Anandkumar, Ge, Hsu,S. M. Kakade, et al. 2013).
I Is a major source of errors itself (Souloumiac 2009).
I Simultaneous diagonalization for tensors proposed by Lathauwer2006.
I Relies on computing the SVD of a d4 ◊ k2 matrix.
I Simultaneous diagonalizations for multiple projections mentionedin Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013.
I No analysis presented.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 22 / 28
Related work
Notes and Related work
I Orthogonal tensor methods can factorize non-orthogonal tensorsusing a whitening transformation (Anandkumar, Ge, Hsu,S. M. Kakade, et al. 2013).
I Is a major source of errors itself (Souloumiac 2009).I Simultaneous diagonalization for tensors proposed by Lathauwer
2006.I Relies on computing the SVD of a d4 ◊ k2 matrix.
I Simultaneous diagonalizations for multiple projections mentionedin Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013.
I No analysis presented.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 22 / 28
Related work
Notes and Related work
I Orthogonal tensor methods can factorize non-orthogonal tensorsusing a whitening transformation (Anandkumar, Ge, Hsu,S. M. Kakade, et al. 2013).
I Is a major source of errors itself (Souloumiac 2009).I Simultaneous diagonalization for tensors proposed by Lathauwer
2006.I Relies on computing the SVD of a d4 ◊ k2 matrix.
I Simultaneous diagonalizations for multiple projections mentionedin Anandkumar, Ge, Hsu, S. M. Kakade, et al. 2013.
I No analysis presented.
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 22 / 28
Empirical results
Outline
Introduction: tensor factorization
Orthogonal Tensor factorizationProjections
Non-orthogonal tensor factorization
Related work
Empirical results
Conclusions
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 23 / 28
Empirical results
Community detectionAnandkumar, Ge, Hsu, and
S. Kakade 2013
a b
c d
= + + · · · +
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 24 / 28
Empirical results
Community detection
0 0.02 0.04 0.06 0.080
0.05
0.1
0.15
Recall
Erro
r
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 25 / 28
Empirical results
Community detection
0 0.02 0.04 0.06 0.080
0.05
0.1
0.15
Recall
Erro
rTPM
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 25 / 28
Empirical results
Community detection
0 0.02 0.04 0.06 0.080
0.05
0.1
0.15
Recall
Erro
rTPMOJD
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 25 / 28
Empirical results
CrowdsourcingZhang et al. 2014
a
b
c
Y
NN
YN
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 26 / 28
Empirical results
CrowdsourcingZhang et al. 2014
web rte birds dogs
80
85
90
95
100
Accu
racy
TPM ALSKuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 27 / 28
Empirical results
CrowdsourcingZhang et al. 2014
web rte birds dogs
80
85
90
95
100
Accu
racy
TPM ALS OJDKuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 27 / 28
Empirical results
CrowdsourcingZhang et al. 2014
web rte birds dogs
80
85
90
95
100
Accu
racy
TPM ALS OJD NOJDKuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 27 / 28
Empirical results
CrowdsourcingZhang et al. 2014
web rte birds dogs
80
85
90
95
100
Accu
racy
TPM ALS OJD NOJD MV+EMKuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 27 / 28
Conclusions
Conclusions
$I Reduce tensor problems to matrix ones with random projections.
I Empirically, competitive with state of the art with support fornon-orthogonal, asymmetric tensors of arbitrary order.
I Open question: is the Jacobi angles algorithm for orthogonalsimultaneous diagaonalization generically globally convergent?
I Github: https://github.com/kuleshov/tensor-factorizationI Codalab: https://www.codalab.org/worksheets/I Thanks! Questions?
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 28 / 28
Conclusions
Conclusions
$I Reduce tensor problems to matrix ones with random projections.I Empirically, competitive with state of the art with support for
non-orthogonal, asymmetric tensors of arbitrary order.
I Open question: is the Jacobi angles algorithm for orthogonalsimultaneous diagaonalization generically globally convergent?
I Github: https://github.com/kuleshov/tensor-factorizationI Codalab: https://www.codalab.org/worksheets/I Thanks! Questions?
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 28 / 28
Conclusions
Conclusions
$I Reduce tensor problems to matrix ones with random projections.I Empirically, competitive with state of the art with support for
non-orthogonal, asymmetric tensors of arbitrary order.I Open question: is the Jacobi angles algorithm for orthogonal
simultaneous diagaonalization generically globally convergent?
I Github: https://github.com/kuleshov/tensor-factorizationI Codalab: https://www.codalab.org/worksheets/I Thanks! Questions?
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 28 / 28
Conclusions
Conclusions
$I Reduce tensor problems to matrix ones with random projections.I Empirically, competitive with state of the art with support for
non-orthogonal, asymmetric tensors of arbitrary order.I Open question: is the Jacobi angles algorithm for orthogonal
simultaneous diagaonalization generically globally convergent?I Github: https://github.com/kuleshov/tensor-factorizationI Codalab: https://www.codalab.org/worksheets/
I Thanks! Questions?
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 28 / 28
Conclusions
Conclusions
$I Reduce tensor problems to matrix ones with random projections.I Empirically, competitive with state of the art with support for
non-orthogonal, asymmetric tensors of arbitrary order.I Open question: is the Jacobi angles algorithm for orthogonal
simultaneous diagaonalization generically globally convergent?I Github: https://github.com/kuleshov/tensor-factorizationI Codalab: https://www.codalab.org/worksheets/I Thanks! Questions?
Kuleshov, Chaganty, Liang (Stanford University) Tensor Factorization May 11, 2015 28 / 28