Computing a Nonnegative Matrix Factorization – Provably
Ankur Moitra, IAS
joint work with Sanjeev Arora, Rong Ge and Ravi Kannan
June 20, 2012
Ankur Moitra (IAS) Provably June 20, 2012
[Figure: an m × n matrix M is written as a product M = AW. The dimension shared by A and W is the inner dimension; the smallest inner dimension for which such a factorization exists is the rank of M. When A and W are both required to be non-negative, the smallest inner dimension is the non-negative rank.]
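Written out, the problem the figure depicts can be stated as below. This is a sketch under assumptions: M is taken to be m × n as the figure's labels suggest, r denotes the inner dimension (the symbol the later slides use), and $\operatorname{rank}_+$ is notation introduced here for the non-negative rank.

```latex
\[
\text{Given } M \in \mathbb{R}_{\ge 0}^{m \times n} \text{ and } r, \text{ find }
A \in \mathbb{R}_{\ge 0}^{m \times r},\; W \in \mathbb{R}_{\ge 0}^{r \times n}
\text{ such that } M = AW .
\]
\[
\operatorname{rank}_+(M) \;=\; \min\{\, r : M = AW,\; A \ge 0,\; W \ge 0 \,\} \;\ge\; \operatorname{rank}(M).
\]
```

The inequality holds because every non-negative factorization is in particular an ordinary factorization.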
Information Retrieval
[Figure: the same factorization M = AW applied to a collection of documents, which make up the m × n matrix M. A is labeled "topics" and W is labeled "representation"; the inner dimension is the number of topics.]
Applications
Statistics and Machine Learning: extract latent relationships in data
image segmentation, text classification, information retrieval, collaborative filtering, ... [Lee, Seung], [Xu et al], [Hofmann], [Kumar et al], [Kleinberg, Sandler]
Combinatorics: extended formulation, log-rank conjecture [Yannakakis], [Lovasz, Saks]
Physical Modeling: interaction of components is additive
visual recognition, environmetrics
Local Search: given A, compute W; given W, compute A; ...
Known to fail on worst-case inputs (stuck in local minima)
Highly sensitive to cost function, regularization, update procedure
Question (theoretical)
Is there an algorithm that (provably) works on all inputs?
Question
Can we give a theoretical explanation for why simple heuristics are so effective?
(What distinguishes a realistic input from an artificial one?)
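The alternating scheme on this slide can be sketched as follows. The slide does not fix an update rule, so the multiplicative updates used here are an assumed choice (one common local-search heuristic), and `local_search_nmf`, `matmul`, `transpose`, `frobenius_error` and the toy matrices are hypothetical names and data introduced for illustration. The sketch comes with no guarantee of reaching a global optimum, which is the failure mode the slide points out.

```python
import random

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frobenius_error(M, A, W):
    """Squared Frobenius norm of M - AW."""
    P = matmul(A, W)
    return sum((m - p) ** 2 for rm, rp in zip(M, P) for m, p in zip(rm, rp))

def local_search_nmf(M, r, iters=200, seed=0, eps=1e-12):
    """Alternate between updating W (A fixed) and A (W fixed).

    Uses multiplicative updates (an assumed choice of update rule);
    a strictly positive random start keeps every entry non-negative.
    """
    rng = random.Random(seed)
    m, n = len(M), len(M[0])
    A = [[rng.uniform(0.1, 1.0) for _ in range(r)] for _ in range(m)]
    W = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(r)]
    start_error = frobenius_error(M, A, W)
    for _ in range(iters):
        # Given A, update W:  W <- W * (A^T M) / (A^T A W)
        At = transpose(A)
        num, den = matmul(At, M), matmul(matmul(At, A), W)
        W = [[W[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n)] for k in range(r)]
        # Given W, update A:  A <- A * (M W^T) / (A W W^T)
        Wt = transpose(W)
        num, den = matmul(M, Wt), matmul(A, matmul(W, Wt))
        A = [[A[i][k] * num[i][k] / (den[i][k] + eps) for k in range(r)] for i in range(m)]
    return A, W, start_error, frobenius_error(M, A, W)

# Toy input with an exact non-negative factorization of inner dimension 2.
A_true = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
W_true = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 3.0, 1.0]]
M = matmul(A_true, W_true)
A, W, start_error, final_error = local_search_nmf(M, r=2)
print(f"squared error: {start_error:.4f} -> {final_error:.6f}")
```

Changing `seed`, `iters` or the update rule changes where the iteration ends up, which is one way to see the sensitivity mentioned on the slide.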
Hardness of NMF
Theorem (Vavasis)
NMF is NP-hard to compute
Hence it is unlikely that there is an exact algorithm that runs in time polynomial in n, m and r
Question
Should we expect r to be large?
What if you gave me a collection of 100 documents, and I told you there are 75 topics?
How quickly can we solve NMF if r is small?
The Worst-Case Complexity of NMF
Theorem (Arora, Ge, Moitra, Kannan)
There is an $(nm)^{O(r^2)}$ time exact algorithm for NMF
Previously, the fastest (provable) algorithm for r = 3 ran in time exponential in n and m
Can we improve the exponential dependence on r?
Theorem (Arora, Ge, Moitra, Kannan)
An exact algorithm for NMF that runs in time $(nm)^{o(r)}$ would yield a sub-exponential time algorithm for 3-SAT
Separability [Donoho, Stodden], Reinterpreted
Each topic has an anchor word, and any document that contains this word is very likely to be (at least partially) about the corresponding topic
e.g. personal finance ← 401k, baseball ← outfield, ...
A document can contain no anchor words, but when one occurs it is a strong indicator
Observation (Blei)
This condition is met by topics found on real data, say, by local search
Separability was introduced to understand when NMF is unique. Is it enough to make NMF easy?
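In matrix terms, the anchor-word condition is usually read as: for every topic (column of the topic matrix A) there is a word (row of A) whose only non-zero entry is in that column. A minimal sketch of a check for this, assuming A is stored word-by-topic as a list of rows; `anchor_words`, `is_separable`, the tolerance `tol` and the example numbers are hypothetical and not taken from the talk.

```python
def anchor_words(A, tol=0.0):
    """Map each topic (column of A) to the rows that are anchor words for it.

    Row i is an anchor word for topic j when A[i][j] is its only
    entry larger than tol.
    """
    r = len(A[0])
    anchors = {j: [] for j in range(r)}
    for i, row in enumerate(A):
        support = [j for j, value in enumerate(row) if value > tol]
        if len(support) == 1:
            anchors[support[0]].append(i)
    return anchors

def is_separable(A, tol=0.0):
    """True when every topic has at least one anchor word."""
    return all(rows for rows in anchor_words(A, tol).values())

# Hypothetical word-by-topic matrix; columns: personal finance, baseball.
words = ["401k", "outfield", "season", "money"]
A = [
    [0.30, 0.00],  # "401k" appears only under personal finance
    [0.00, 0.25],  # "outfield" appears only under baseball
    [0.10, 0.40],  # "season" is shared by both topics
    [0.60, 0.35],  # "money" is shared by both topics
]
print(anchor_words(A))   # {0: [0], 1: [1]}
print(is_separable(A))   # True
```

Raising `tol` above zero is one way to tolerate small non-zero entries when A comes from an estimate instead of an exact model.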
Beyond Worst-Case Analysis
Theorem (Arora, Ge, Kannan, Moitra)
There is a polynomial time exact algorithm for NMF when the topic matrix A is separable
What if documents do not contain many words compared to the dictionary? (e.g. we are given samples from M)
In fact, the above algorithm can be made robust to noise:
Theorem (Arora, Ge, Moitra)
There is a polynomial time algorithm for learning a separable topic matrix A in various probabilistic models, e.g. LDA, CTM
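One reason separability helps, stated here as a standard observation and not as the algorithm behind the theorems: if a row of A has a single non-zero entry, the matching row of M = AW is a scaled copy of a row of W, so the rows of W already appear inside M. The sketch below only checks that identity on a toy example; the matrices and the `matmul` helper are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# Hypothetical separable topic matrix A (rows: words, columns: topics).
# Rows 0 and 1 are anchor words for topics 0 and 1 respectively.
A = [
    [0.5, 0.0],
    [0.0, 0.25],
    [0.25, 0.5],
    [0.25, 0.25],
]
# Hypothetical representation matrix W (rows: topics, columns: documents).
W = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]
M = matmul(A, W)

# An anchor row of A has one non-zero entry, so the matching row of M
# is that entry times the corresponding row of W.
for word, topic in [(0, 0), (1, 1)]:
    scale = A[word][topic]
    assert M[word] == [scale * w for w in W[topic]]
    print(f"row {word} of M = {scale} * row {topic} of W = {M[word]}")
```

The remaining rows of M are non-negative combinations of the anchor rows, which is the structure a separable instance exposes.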
Is NMF Computable?
[Cohen, Rothblum]: Yes (DETOUR)
Variables: entries in A and W (nr + mr total)
Constraints: A, W ≥ 0 and AW = M (degree two)
Running time for a solver is exponential in the number of variables
Question
What is the smallest formulation, measured in the number of variables? Can we use only f(r) variables?
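Spelled out entry by entry, the system of polynomial constraints described above looks as follows. The index conventions are an assumption: M is taken to be m × n, A to be m × r and W to be r × n, which matches the count of nr + mr variables.

```latex
\begin{align*}
\sum_{k=1}^{r} A_{ik} W_{kj} &= M_{ij} && 1 \le i \le m,\; 1 \le j \le n && \text{($mn$ equations of degree two)} \\
A_{ik} &\ge 0 && 1 \le i \le m,\; 1 \le k \le r && \text{($mr$ variables)} \\
W_{kj} &\ge 0 && 1 \le k \le r,\; 1 \le j \le n && \text{($nr$ variables)}
\end{align*}
```

The entries of M are given constants, so the only unknowns are the entries of A and W.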
Semi-algebraic sets: s polynomials, k variables, Boolean function B
$S = \{x_1, x_2, \ldots, x_k \mid B(\mathrm{sgn}(f_1), \mathrm{sgn}(f_2), \ldots, \mathrm{sgn}(f_s)) = \text{``true''}\}$
Question
How many sign patterns arise (as $x_1, x_2, \ldots, x_k$ range over $\mathbb{R}^k$)?
Naive bound: $3^s$ (all of $\{-1, 0, 1\}^s$), [Milnor, Warren]: at most $(ds)^k$, where d is the maximum degree
In fact, best known algorithms (e.g. [Renegar]) for finding a point in S run in $(ds)^{O(k)}$ time
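A toy case makes the gap between the naive count and the true count concrete. The sketch below takes s linear polynomials $f_i(x) = x - a_i$ in a single variable (so k = 1 and d = 1) and enumerates every sign pattern they realize; the function names and the choice of roots are hypothetical, and the one-variable setting is chosen only because the enumeration is exact there.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def sign_patterns_of_linear_polys(roots):
    """Sign patterns of f_i(x) = x - roots[i] as x ranges over the real line.

    The pattern can only change at a root, so it suffices to test each root,
    one point between consecutive roots, and one point beyond each end.
    """
    pts = sorted(set(roots))
    samples = [pts[0] - 1.0] + pts + [pts[-1] + 1.0]
    samples += [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    return {tuple(sign(x - a) for a in roots) for x in samples}

roots = [0.0, 1.0, 2.0, 3.0]  # s = 4 polynomials, k = 1 variable, degree d = 1
patterns = sign_patterns_of_linear_polys(roots)
print(len(patterns), "patterns realized out of", 3 ** len(roots))
```

For these inputs the script reports 9 realized patterns out of the 81 in $\{-1, 0, 1\}^4$. With s distinct roots the count is 2s + 1, which grows linearly in s for fixed k; it also suggests the $(ds)^k$ bound on the slide is meant up to constant factors.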
Is NMF Computable?
[Cohen, Rothblum]: Yes (DETOUR)
Variables: entries in A and W (nr + mr total)
Constraints: A,W ≥ 0 and AW = M (degree two)
Running time for a solver is exponential in the number of variables
Question
What is the smallest formulation, measured in the number ofvariables? Can we use only f (r) variables?
![Page 56: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with](https://reader030.vdocuments.mx/reader030/viewer/2022040115/5e611f7bf987cd49d32e3267/html5/thumbnails/56.jpg)
Reducing the Number of Variables
Easy: If A has full rank, then $f(r) = 2r^2$
[Arora, Ge, Kannan, Moitra]: In general, $f(r) = 2r^2 2^r$, which is constant for $r = O(1)$
[Moitra]: In general, $f(r) = 2r^2$ (using a normal form)
Corollary
There is an $(nm)^{O(r^2)}$ time algorithm for NMF
In fact, any $(nm)^{o(r)}$ time algorithm would yield a $2^{o(n)}$ time algorithm for 3-SAT
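To make the three variable counts concrete, here is a small sketch that evaluates them side by side. The function names and the sample sizes n = 1000, m = 500, r = 5 are illustrative assumptions, not values from the talk.

```python
def naive_variables(n, m, r):
    """All entries of A and W: nr + mr."""
    return n * r + m * r

def agkm_variables(r):
    """The general-case count 2 r^2 2^r, independent of n and m."""
    return 2 * r**2 * 2**r

def normal_form_variables(r):
    """The count 2 r^2 obtained using a normal form."""
    return 2 * r**2

n, m, r = 1000, 500, 5
print(naive_variables(n, m, r))   # 7500
print(agkm_variables(r))          # 1600
print(normal_form_variables(r))   # 50
```

Only the first count grows with the size of M; the other two depend on r alone, which is what makes the exponent in the running time a function of r.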
![Page 58: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with](https://reader030.vdocuments.mx/reader030/viewer/2022040115/5e611f7bf987cd49d32e3267/html5/thumbnails/58.jpg)
Easy Case: A has Full Column Rank
[Diagram, built up over several slides: the columns of A are linearly independent, so the pseudo-inverse $A^+$ satisfies $A^+ A = I_r$. Hence $A^+ M = A^+ A W = W$, and $W = T M'$ for a change of basis T.]
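The pseudo-inverse identity in this easy case can be checked numerically. The sketch below assumes NumPy and uses a random non-negative A, which has full column rank with probability 1; the sizes and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 5, 3

A = rng.random((m, r))   # random non-negative A (full column rank almost surely)
W = rng.random((r, n))
M = A @ W

A_pinv = np.linalg.pinv(A)    # the r x m pseudo-inverse A^+
W_recovered = A_pinv @ M      # A^+ M = A^+ A W = W

print(np.allclose(A_pinv @ A, np.eye(r)))  # A^+ A is the r x r identity
print(np.allclose(W_recovered, W))         # W is a linear image of M
```

Since W is obtained from M by a linear map, each row of W lies in the row space of M, which is what allows W to be described by a small change-of-basis matrix.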
![Page 65: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with](https://reader030.vdocuments.mx/reader030/viewer/2022040115/5e611f7bf987cd49d32e3267/html5/thumbnails/65.jpg)
Putting it Together: $2r^2$ Variables
[Diagram, built up over several slides: from M, form $M'$ and $M''$. The variables are the entries of T and S, which give $W = T M'$ and $A = M'' S$. Are A and W non-negative? Is $(M'' S)(T M') = M$?]
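A hedged sketch of the reduced system follows. It assumes, as one reading of the diagram, that M′ consists of r linearly independent rows of M, that M′′ consists of r linearly independent columns, and that the unknowns are two r x r matrices T and S with W = T M′ and A = M′′ S. Taking the first r rows and columns, and the helper name `check_reduced_system`, are illustrative choices.

```python
import numpy as np

def check_reduced_system(M, M_rows, M_cols, T, S, tol=1e-8):
    """Check the two questions: are A and W non-negative, and is A W = M?"""
    W = T @ M_rows    # r x n
    A = M_cols @ S    # m x r
    nonneg = bool((A >= -tol).all() and (W >= -tol).all())
    return nonneg and bool(np.allclose(A @ W, M, atol=tol))

rng = np.random.default_rng(1)
m, n, r = 6, 5, 3
A_true, W_true = rng.random((m, r)), rng.random((r, n))
M = A_true @ W_true

M_rows = M[:r, :]   # assumed basis for the row space of M
M_cols = M[:, :r]   # assumed basis for the column space of M

# Express a known factorization in the reduced coordinates.
T = W_true @ np.linalg.pinv(M_rows)
S = np.linalg.pinv(M_cols) @ A_true

print(T.size + S.size == 2 * r**2)                     # 2 r^2 unknowns
print(check_reduced_system(M, M_rows, M_cols, T, S))   # True
```

Here T and S are computed from a factorization that is already known, purely to show that one exists in these coordinates; in the actual problem they are the unknowns a solver must find.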
![Page 77: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with](https://reader030.vdocuments.mx/reader030/viewer/2022040115/5e611f7bf987cd49d32e3267/html5/thumbnails/77.jpg)
Local Search: Given A, compute W, compute A, ...
Known to fail on worst-case inputs (stuck in local minima)
Highly sensitive to cost function, regularization, update procedure
Question (theoretical)
Is there an algorithm that (provably) works on all inputs?
Question
Can we give a theoretical explanation for why simple heuristics are so effective?
(What distinguishes a realistic input from an artificial one?)
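The slide does not fix a particular update rule. As one hedged illustration of local search, the sketch below alternates the multiplicative updates of Lee and Seung for the Frobenius-norm objective; this choice, the function name, the iteration count, and the small constant `eps` are assumptions rather than anything stated in the talk.

```python
import numpy as np

def local_search_nmf(M, r, iters=200, seed=0, eps=1e-12):
    """Alternate: given A, update W; given W, update A (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    A = rng.random((m, r)) + eps
    W = rng.random((r, n)) + eps
    errors = []
    for _ in range(iters):
        W *= (A.T @ M) / (A.T @ A @ W + eps)   # given A, update W
        A *= (M @ W.T) / (A @ W @ W.T + eps)   # given W, update A
        errors.append(float(np.linalg.norm(M - A @ W)))
    return A, W, errors

rng = np.random.default_rng(3)
M = rng.random((6, 3)) @ rng.random((3, 5))   # an exactly factorizable toy input
A, W, errors = local_search_nmf(M, r=3)
print(errors[0], errors[-1])   # the error shrinks, but nothing guarantees it reaches zero
```

The updates keep A and W non-negative by construction, yet the result depends on the random starting point, which matches the caveat about local minima.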
![Page 79: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with](https://reader030.vdocuments.mx/reader030/viewer/2022040115/5e611f7bf987cd49d32e3267/html5/thumbnails/79.jpg)
Separable Instances
Recall: For each topic, there is some (anchor) word that only appears in this topic
Observation
Rows of W appear as (scaled) rows of M
Question
How can we identify anchor words?
Removing a row from M strictly changes the convex hull iff it is an anchor word
Hence we can identify all the anchor words via linear programming (can be made robust to noise)
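The convex-hull test can be sketched with one feasibility LP per row. This is a noiseless illustration only, assuming SciPy's `linprog` and rows of M normalized to sum to 1; the toy matrices and function names are made up for the example, and the noise-robust version mentioned in the talk is not shown.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, others):
    """Feasibility LP: is `point` a convex combination of the rows of `others`?"""
    k = others.shape[0]
    # Find lam >= 0 with others.T @ lam = point and sum(lam) = 1.
    A_eq = np.vstack([others.T, np.ones((1, k))])
    b_eq = np.append(point, 1.0)
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return bool(res.success)

def find_anchor_rows(M):
    """Return indices of rows whose removal changes the convex hull of the rows."""
    rows = M / M.sum(axis=1, keepdims=True)   # scale each row to sum to 1
    anchors = []
    for i in range(rows.shape[0]):
        others = np.delete(rows, i, axis=0)
        if not in_convex_hull(rows[i], others):
            anchors.append(i)
    return anchors

# Separable toy instance: rows 0, 1, 2 of A are scaled unit vectors (anchor words).
A = np.array([[2, 0, 0], [0, 1, 0], [0, 0, 3],
              [1, 1, 0], [0, 2, 1], [1, 1, 1]], dtype=float)
W = np.array([[4, 1, 0, 1], [0, 3, 2, 1], [1, 0, 5, 2]], dtype=float)
M = A @ W

print(find_anchor_rows(M))   # [0, 1, 2]
```

Once the anchor rows are known, the corresponding rows of M are scaled copies of the rows of W, so W can be read off directly.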
[Figure, built up over several slides: the factorization M = AW, annotated "brute force: $n^r$"]
![Page 98: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with](https://reader030.vdocuments.mx/reader030/viewer/2022040115/5e611f7bf987cd49d32e3267/html5/thumbnails/98.jpg)
Topic Models
Topic matrix A, generate W stochastically
For each document (column in M = AW) sample N words
e.g. Latent Dirichlet Allocation (LDA), Correlated Topic Model (CTM), Pachinko Allocation Model (PAM), ...
Question
Can we estimate A, given random samples from M?
Yes! [Arora, Ge, Moitra] we give a provable algorithm based on (noise-tolerant) NMF
Advertisement: Sanjeev will talk about this here in July
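The generative process can be sketched as follows, assuming NumPy and an LDA-style choice in which each column of W is drawn from a Dirichlet distribution. The parameter `alpha`, the sizes, and the function name are illustrative assumptions; other models on the slide would generate W differently.

```python
import numpy as np

def sample_documents(A, num_docs, N, alpha=0.5, seed=0):
    """Draw W stochastically, form M = A W, then sample N words per document."""
    rng = np.random.default_rng(seed)
    r = A.shape[1]
    # Each column of W is one document's mixture over the r topics.
    W = rng.dirichlet(alpha * np.ones(r), size=num_docs).T
    M = A @ W
    M = M / M.sum(axis=0, keepdims=True)   # guard against floating-point drift
    # Each column of `counts` holds the word counts of one sampled document.
    counts = np.column_stack(
        [rng.multinomial(N, M[:, j]) for j in range(num_docs)]
    )
    return W, M, counts

rng = np.random.default_rng(42)
num_words, r = 8, 3
A = rng.dirichlet(np.ones(num_words), size=r).T   # each column is a topic's word distribution
W, M, counts = sample_documents(A, num_docs=5, N=100)
print(counts.shape)        # (8, 5)
print(counts.sum(axis=0))  # every document has exactly N = 100 words
```

The estimation question is then to recover A from `counts` alone, without ever seeing M or W exactly.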
![Page 103: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with](https://reader030.vdocuments.mx/reader030/viewer/2022040115/5e611f7bf987cd49d32e3267/html5/thumbnails/103.jpg)
Concluding Remarks
This is just part of a broader agenda:
Question
When is machine learning provably easy?
Often, theoretical models for learning are too hard or focus on mistake bounds (e.g. PAC)
Question
Will this involve a better understanding of real data? A better understanding of popular algorithms in machine learning? Both?
Some interesting problems worth further investigation: Topic Models, Independent Component Analysis, Graphical Models, Deep Learning
![Page 109: Computing a Nonnegative Matrix Factorization …people.csail.mit.edu/moitra/docs/Provably3.pdfComputing a Nonnegative Matrix Factorization { Provably Ankur Moitra, IAS joint work with](https://reader030.vdocuments.mx/reader030/viewer/2022040115/5e611f7bf987cd49d32e3267/html5/thumbnails/109.jpg)
Thanks!