Scaling up LDA
![Page 1: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/1.jpg)
Scaling up LDA
William Cohen
![Page 2: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/2.jpg)
First some pictures…
![Page 3: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/3.jpg)
![Page 4: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/4.jpg)
![Page 5: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/5.jpg)
![Page 6: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/6.jpg)
![Page 7: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/7.jpg)
![Page 8: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/8.jpg)
![Page 9: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/9.jpg)
![Page 10: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/10.jpg)
![Page 11: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/11.jpg)
![Page 12: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/12.jpg)
![Page 13: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/13.jpg)
![Page 14: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/14.jpg)
![Page 15: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/15.jpg)
![Page 16: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/16.jpg)
![Page 17: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/17.jpg)
![Page 18: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/18.jpg)
![Page 19: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/19.jpg)
LDA in way too much detail
William Cohen
![Page 20: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/20.jpg)
Review - LDA
• Latent Dirichlet Allocation with Gibbs
[Figure: plate diagram with labels α, z, w and plates N, M]
• Randomly initialize each z_{m,n}
• Repeat for t=1,…
  • For each doc m, word n
    • Find Pr(z_{m,n}=k | other z’s)
    • Sample z_{m,n} according to that distribution
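For reference, the conditional in the "Find Pr(…)" step is usually written in the standard collapsed-Gibbs form below. The notation is an assumption, not taken from the slides: symmetric priors α and β, vocabulary size V, n_{k|m} for the number of tokens in doc m assigned to topic k, n_{w|k} for the number of times word w is assigned to topic k, and n_{·|k} for the total tokens assigned to topic k. All counts exclude the token being resampled.

```latex
\Pr(z_{m,n} = k \mid \text{other } z\text{'s}, w)
  \;\propto\;
  \left(n_{k|m} + \alpha\right)
  \cdot
  \frac{n_{w|k} + \beta}{n_{\cdot|k} + \beta V}
```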
![Page 21: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/21.jpg)
Way way more detail
![Page 22: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/22.jpg)
More detail
![Page 23: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/23.jpg)
![Page 24: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/24.jpg)
![Page 25: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/25.jpg)
What gets learned…..
![Page 26: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/26.jpg)
In A Math-ier Notation
N[*,k], N[d,k], M[w,k]
N[*,*]=V
![Page 27: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/27.jpg)
for each document d and word position j in d
• z[d,j] = k, a random topic
• N[d,k]++
• W[w,k]++ where w = id of j-th word in d
![Page 28: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/28.jpg)
for each pass t=1,2,…
  for each document d and word position j in d
  • z[d,j] = k, a new random topic
  • update N, W to reflect the new assignment of z:
    • N[d,k]++; N[d,k’]-- where k’ is old z[d,j]
    • W[w,k]++; W[w,k’]-- where w is w[d,j]
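The initialization and update loops above can be sketched in plain Python. This is a minimal sketch, not the lecture's code: the function name `gibbs_lda`, the symmetric priors `alpha` and `beta`, and the extra per-topic total `T` (playing the role of N[*,k]) are assumptions made for illustration, and the "new random topic" is drawn from the standard collapsed-Gibbs conditional.

```python
import random

def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, passes=50, seed=0):
    """docs: list of documents, each a list of word ids in [0, V)."""
    rng = random.Random(seed)
    N = [[0] * K for _ in docs]       # N[d][k]: tokens in doc d assigned to topic k
    W = [[0] * K for _ in range(V)]   # W[w][k]: times word w is assigned to topic k
    T = [0] * K                       # T[k]: total tokens assigned to topic k
    z = []
    # Initialization: a uniformly random topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            N[d][k] += 1; W[w][k] += 1; T[k] += 1
    for _ in range(passes):
        for d, doc in enumerate(docs):
            for j, w in enumerate(doc):
                old = z[d][j]
                # Remove the current assignment so counts reflect "the other z's".
                N[d][old] -= 1; W[w][old] -= 1; T[old] -= 1
                weights = [(N[d][k] + alpha) * (W[w][k] + beta) / (T[k] + beta * V)
                           for k in range(K)]
                new = rng.choices(range(K), weights=weights)[0]
                z[d][j] = new
                N[d][new] += 1; W[w][new] += 1; T[new] += 1
    return z, N, W, T
```

Note that every token costs a loop over all K topics, which is the cost the later slides try to reduce.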
![Page 29: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/29.jpg)
![Page 30: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/30.jpg)
[Figure: unit-height bar divided into segments z=1, z=2, z=3, …; a random point on the bar picks the segment]
![Page 31: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/31.jpg)
JMLR 2009
![Page 32: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/32.jpg)
Observation
• How much does the choice of z depend on the other z’s in the same document?
  – quite a lot
• How much does the choice of z depend on the other z’s elsewhere in the corpus?
  – maybe not so much
  – depends on Pr(w|t) but that changes slowly
• Can we parallelize Gibbs and still get good results?
![Page 33: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/33.jpg)
Question
• Can we parallelize Gibbs sampling?
  – formally, no: every choice of z depends on all the other z’s
  – Gibbs needs to be sequential
    • just like SGD
![Page 34: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/34.jpg)
What if you try and parallelize?
Split the document/term matrix randomly and distribute it to p processors, then run “Approximate Distributed LDA”
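A single-process simulation of the split-and-merge idea might look like the sketch below. It is an illustration under stated assumptions, not the published implementation: the names `ad_lda` and `sample_pass` are hypothetical, the "processors" run one after another in a loop, each works on a stale copy of the global word-topic counts, and the per-processor changes are summed back into the global counts after every pass.

```python
import random

def sample_pass(docs, z, N, W, T, K, V, alpha, beta, rng):
    """One sequential Gibbs sweep over docs, updating the given counts in place."""
    for d, doc in enumerate(docs):
        for j, w in enumerate(doc):
            old = z[d][j]
            N[d][old] -= 1; W[w][old] -= 1; T[old] -= 1
            weights = [(N[d][k] + alpha) * (W[w][k] + beta) / (T[k] + beta * V)
                       for k in range(K)]
            new = rng.choices(range(K), weights=weights)[0]
            z[d][j] = new
            N[d][new] += 1; W[w][new] += 1; T[new] += 1

def ad_lda(docs, K, V, P=2, alpha=0.1, beta=0.01, passes=20, seed=0):
    rng = random.Random(seed)
    order = list(range(len(docs)))
    rng.shuffle(order)                                    # random split of documents
    shards = [[docs[i] for i in order[p::P]] for p in range(P)]
    W = [[0] * K for _ in range(V)]                       # global word-topic counts
    zs, Ns = [], []
    for shard in shards:                                  # random initialization
        z = [[rng.randrange(K) for _ in doc] for doc in shard]
        N = [[0] * K for _ in shard]
        for d, doc in enumerate(shard):
            for w, k in zip(doc, z[d]):
                N[d][k] += 1; W[w][k] += 1
        zs.append(z); Ns.append(N)
    for _ in range(passes):
        deltas = []
        for p in range(P):                                # conceptually in parallel
            Wp = [row[:] for row in W]                    # stale local copy
            Tp = [sum(Wp[w][k] for w in range(V)) for k in range(K)]
            sample_pass(shards[p], zs[p], Ns[p], Wp, Tp, K, V, alpha, beta, rng)
            deltas.append([[Wp[w][k] - W[w][k] for k in range(K)] for w in range(V)])
        for delta in deltas:                              # merge: add every local change
            for w in range(V):
                for k in range(K):
                    W[w][k] += delta[w][k]
    return shards, zs, Ns, W
```

The document-topic counts never need merging because each document lives on exactly one processor; only the word-topic counts are shared, and those are the ones the Observation slide argues change slowly.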
![Page 35: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/35.jpg)
What if you try and parallelize?
D = #docs, W = #word types, K = #topics, N = #words in corpus
![Page 36: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/36.jpg)
![Page 37: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/37.jpg)
![Page 38: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/38.jpg)
![Page 39: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/39.jpg)
![Page 40: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/40.jpg)
[Figure: unit-height bar divided into segments z=1, z=2, z=3, …; a random point on the bar picks the segment]
![Page 41: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/41.jpg)
![Page 42: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/42.jpg)
Running total of P(z=k|…) or P(z<=k)
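Sampling with a running total can be sketched as follows. The function name `sample_topic` is an assumption; the weights are the unnormalized values of P(z=k|…), and `u` is a uniform draw in [0, 1) that plays the role of the random point on the unit-height bar.

```python
from bisect import bisect_right
from itertools import accumulate

def sample_topic(weights, u):
    """Return the index k whose segment of the running total contains u."""
    running = list(accumulate(weights))   # running[k] = weights[0] + ... + weights[k]
    target = u * running[-1]              # scale u instead of normalizing every weight
    k = bisect_right(running, target)     # first k with running[k] > target
    return min(k, len(weights) - 1)       # guard against floating-point round-off
```

Building the running total is what costs O(K) per token; the binary search itself is cheap.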
![Page 43: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/43.jpg)
Discussion….
• Where do you spend your time?
  – sampling the z’s
  – each sampling step involves a loop over all topics
  – this seems wasteful
• even with many topics, words are often only assigned to a few different topics
  – low frequency words appear < K times … and there are lots and lots of them!
  – even frequent words are not in every topic
![Page 44: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/44.jpg)
Discussion….
• What’s the solution?
Idea: come up with approximations to Z at each stage; then you might be able to stop early…
Want Z_i >= Z
![Page 45: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/45.jpg)
Tricks
• How do you compute and maintain the bound?
  – see the paper
• What order do you go in?
  – want to pick large P(k)’s first
  – … so we want large P(k|d) and P(k|w)
  – … so we maintain k’s in sorted order
    • which only change a little bit after each flip, so a bubble-sort will fix up the almost-sorted array
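The bubble-sort fix-up can be sketched like this. The names `bump`, `order`, and `pos` are hypothetical: `order` lists topic ids by descending count, and `pos[k]` records where topic k currently sits in `order`.

```python
def bump(counts, order, pos, k, delta):
    """Apply counts[k] += delta, then restore descending order with adjacent swaps."""
    counts[k] += delta
    i = pos[k]
    # Move toward the front while the left neighbor has a smaller count.
    while i > 0 and counts[order[i - 1]] < counts[order[i]]:
        order[i - 1], order[i] = order[i], order[i - 1]
        pos[order[i - 1]] = i - 1
        pos[order[i]] = i
        i -= 1
    # Move toward the back while the right neighbor has a larger count.
    while i < len(order) - 1 and counts[order[i + 1]] > counts[order[i]]:
        order[i + 1], order[i] = order[i], order[i + 1]
        pos[order[i + 1]] = i + 1
        pos[order[i]] = i
        i += 1
```

Because a flip changes a count by only ±1, the element usually moves zero or a few positions, so the fix-up is far cheaper than re-sorting.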
![Page 46: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/46.jpg)
Results
![Page 47: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/47.jpg)
Results
![Page 48: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/48.jpg)
Results
![Page 49: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/49.jpg)
KDD 09
![Page 50: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/50.jpg)
z=s+r+q
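The formula image for this slide did not survive extraction. Assuming it follows the usual sparse decomposition of the Gibbs conditional (which matches the tic-mark expression α_t β/(βV + n_{·|t}) on the following slides), the three buckets are as below, where n_{t|d} counts tokens in document d assigned to topic t, n_{w|t} counts assignments of word w to topic t, and n_{·|t} is the total for topic t:

```latex
\begin{aligned}
s &= \sum_{t} \frac{\alpha_t\,\beta}{\beta V + n_{\cdot|t}}, \qquad
r  = \sum_{t} \frac{n_{t|d}\,\beta}{\beta V + n_{\cdot|t}}, \qquad
q  = \sum_{t} \frac{(\alpha_t + n_{t|d})\,n_{w|t}}{\beta V + n_{\cdot|t}} \\[4pt]
s + r + q &= \sum_{t} \frac{(\alpha_t + n_{t|d})\,(\beta + n_{w|t})}{\beta V + n_{\cdot|t}}
\end{aligned}
```

The last line is the total unnormalized mass of the conditional, so drawing U uniformly from [0, s+r+q) and locating it in one of the three buckets is equivalent to sampling from the full distribution.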
![Page 51: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/51.jpg)
z=s+r+q
• If U<s:
  • lookup U on line segment with tic-marks at α_1β/(βV + n_{·|1}), α_2β/(βV + n_{·|2}), …
• If s<U<s+r:
  • lookup U on line segment for r
Only need to check t such that n_{t|d}>0
![Page 52: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/52.jpg)
z=s+r+q
• If U<s:
  • lookup U on line segment with tic-marks at α_1β/(βV + n_{·|1}), α_2β/(βV + n_{·|2}), …
• If s<U<s+r:
  • lookup U on line segment for r
• If s+r<U:
  • lookup U on line segment for q
Only need to check t such that n_{w|t}>0
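The three-way lookup can be sketched as below. This is a hedged illustration, not the authors' code: the function names are hypothetical, the bucket formulas are assumed to be the usual sparse decomposition (s from the priors only, r from topics with n_{t|d}>0, q from topics with n_{w|t}>0), and everything is recomputed on each call for clarity, whereas a real implementation would cache s and update r incrementally.

```python
def _walk(items, x):
    """items: iterable of (topic, mass). Return the topic whose segment contains x."""
    last = None
    for t, mass in items:
        last = t
        if x < mass:
            return t
        x -= mass
    return last  # only reached through floating-point round-off at the right edge

def sparse_sample(u, w, doc_counts, word_topic, topic_totals, alpha, beta, V):
    """u: uniform draw in [0, 1).
    doc_counts: dict t -> n_{t|d}, nonzero entries only.
    word_topic[w]: dict t -> n_{w|t}, nonzero entries only.
    topic_totals[t]: n_{.|t}.  alpha: list of per-topic priors."""
    K = len(alpha)
    denom = [beta * V + topic_totals[t] for t in range(K)]
    s_terms = [(t, alpha[t] * beta / denom[t]) for t in range(K)]
    r_terms = [(t, n * beta / denom[t]) for t, n in doc_counts.items()]
    q_terms = [(t, (alpha[t] + doc_counts.get(t, 0)) * n / denom[t])
               for t, n in word_topic[w].items()]
    s = sum(m for _, m in s_terms)
    r = sum(m for _, m in r_terms)
    q = sum(m for _, m in q_terms)
    x = u * (s + r + q)
    if x < s:
        return _walk(s_terms, x)          # smoothing-only bucket: all K topics
    if x < s + r:
        return _walk(r_terms, x - s)      # only topics present in this document
    return _walk(q_terms, x - s - r)      # only topics this word has been assigned to
```

Only the first branch touches all K topics, and it is taken rarely because s is small compared with r and q once counts accumulate.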
![Page 53: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/53.jpg)
z=s+r+q
Only need to check t such that n_{w|t}>0
Only need to check t such that n_{t|d}>0
Only need to check occasionally (< 10% of the time)
![Page 54: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/54.jpg)
z=s+r+q
Need to store n_{w|t} for each word, topic pair …???
Only need to store n_{t|d} for current d
Only need to store (and maintain) total words per topic and α’s, β, V
Trick: count up n_{t|d} for d when you start working on d and update incrementally
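The counting trick is small enough to show directly. In this sketch (helper names are assumptions), the per-document counts are rebuilt from the stored assignments when work on a document begins, then kept current with one decrement and one increment per resampled token, dropping zero entries so that only topics with n_{t|d}>0 remain.

```python
from collections import Counter

def doc_topic_counts(z_d):
    """Build n_{t|d} for one document from its current topic assignments."""
    return Counter(z_d)

def reassign(counts, old, new):
    """Move one token from topic `old` to topic `new`, keeping the map sparse."""
    counts[old] -= 1
    if counts[old] == 0:
        del counts[old]
    counts[new] += 1
```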
![Page 55: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/55.jpg)
z=s+r+q
Need to store n_{w|t} for each word, topic pair …???
1. Precompute, for each t,
Most (>90%) of the time and space is here…
2. Quickly find t’s such that n_{w|t} is large for w
![Page 56: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/56.jpg)
Need to store n_{w|t} for each word, topic pair …???
1. Precompute, for each t,
Most (>90%) of the time and space is here…
2. Quickly find t’s such that n_{w|t} is large for w
• map w to an int array
  • no larger than frequency of w
  • no larger than #topics
• encode (t,n) as a bit vector
  • n in the high-order bits
  • t in the low-order bits
• keep ints sorted in descending order
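The (t, n) encoding can be sketched with plain integer arithmetic. The constant `TOPIC_BITS = 10` (room for up to 1024 topics) and the function names are assumptions chosen for illustration.

```python
TOPIC_BITS = 10                       # assumption: at most 2**10 topics
TOPIC_MASK = (1 << TOPIC_BITS) - 1

def encode(t, n):
    """Pack count n into the high-order bits and topic t into the low-order bits."""
    return (n << TOPIC_BITS) | t

def decode(x):
    """Return (t, n) from a packed int."""
    return x & TOPIC_MASK, x >> TOPIC_BITS

def pack_word(counts):
    """counts: dict t -> n_{w|t}. Return packed ints, largest count first."""
    return sorted((encode(t, n) for t, n in counts.items() if n > 0), reverse=True)
```

Because the count occupies the high-order bits, sorting the packed ints as ordinary integers sorts them by count, so the topics with the largest n_{w|t} are visited first when walking the q bucket.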
![Page 57: Scaling up LDA](https://reader035.vdocuments.mx/reader035/viewer/2022062321/56813cd8550346895da67c52/html5/thumbnails/57.jpg)