Topic Models
pLSI (Hofmann, 1999), LDA (Blei, Ng, Jordan, 2002)
Nathan Sutter
Overview
Introduction
• Topic Modeling
• Review of Relevant Probability Distributions
• LSI
• Evaluating Topic Models
Topic Models
• pLSI
• LDA
• Application to Collaborative Filtering
Nathan Sutter - SCIVI Lab, ICS 2
Introduction
Topic Modeling
• Given some dataset, what topics are persistent in it?
• Can we infer that items within our dataset "belong" to a topic or mixture of topics?
• Example topic groups in datasets:
. Documents
. Images
. Movie preferences per user
. etc...
• Examples of topic models
. SVD / specific applications of SVD (like LSI)
. pLSI
. LDA
• Trivial to apply topic models to collaborative filtering!
. Beware of caveats!
Introduction (cont.)
Review of Relevant Probability Distributions
• Multinomial Distribution
Parameters:
n independent trials,
x_1, ..., x_k, where x_i is the number of times event i occurs, ∑_i x_i = n.
p_1, ..., p_k, where p_i is the probability that event i occurs, ∑_i p_i = 1.

p(x_1, ..., x_k) = (n! / (x_1! ⋯ x_k!)) ∏_i p_i^{x_i}
Toy Example :
Have an urn with 7 balls in it, which are either red, green, blue, or yellow. Out of seven
draws, what is the probability that a red ball is drawn once, a green ball is drawn 4 times,
and a blue ball is drawn 2 times?
Parameters:
7 independent draws,
x_1 = 1, x_2 = 4, x_3 = 2, x_4 = 0
p_1 = p_2 = p_3 = p_4 = 0.25

p(x_1 = 1, x_2 = 4, x_3 = 2, x_4 = 0) = (7! / (1! 4! 2! 0!)) (0.25^1)(0.25^4)(0.25^2)(0.25^0)
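The urn example can be checked numerically. A minimal sketch using only the standard library; the counts and probabilities are the example's own:

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing `counts` over sum(counts) independent draws."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)  # multinomial coefficient n! / (x_1! ... x_k!)
    return coef * prod(p ** c for c, p in zip(counts, probs))

# Urn example: 7 draws, four equally likely colors,
# red once, green 4 times, blue twice, yellow never.
p = multinomial_pmf([1, 4, 2, 0], [0.25] * 4)
print(p)  # ≈ 0.0064
```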
Introduction (cont.)
• Dirichlet Distribution
Gives the probability density of a vector of multinomial probabilities θ, given a pseudo-count α for each event.
Parameters:
θ = (θ_1, ..., θ_k), where θ_i is the probability that event i occurs, ∑_i θ_i = 1.
α = (α_1, ..., α_k), where α_i acts as a pseudo-count of the number of times event i occurs.

Dir(θ|α) = (1 / B(α)) ∏_i θ_i^{α_i − 1}

B(α) = ∏_i Γ(α_i) / Γ(∑_i α_i)
Or, put more intuitively:
. Given a stick of length 1, break the stick into k pieces.
. Allow for variability in the sizes of the pieces.
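The stick-breaking intuition is easy to see by sampling. A small sketch with numpy; the symmetric α and sample count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each Dirichlet sample is one way of splitting a unit-length stick
# into k pieces; alpha controls how even the split tends to be.
alpha = [2.0, 2.0, 2.0, 2.0]          # symmetric pseudo-counts, k = 4
theta = rng.dirichlet(alpha, size=5)  # 5 sampled splits, shape (5, 4)

print(theta)
print(theta.sum(axis=1))  # each row sums to 1: a full stick
```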
Introduction (cont.)
Nice transition to topic modeling:
. We have k topics in our dataset, each claiming a part of the stick.
. We are interested in how much of the stick each topic claims, learned from our training set.
Properties :
. Conjugate to the multinomial distribution.
. Finite-dimensional sufficient statistics.
Introduction (cont.)
LSI
• Map a document to latent space:
X = U Σ V^T, where U, V are orthonormal and Σ holds the ordered singular values of X.
Σ̂ = Σ(1:n, 1:n), so that X ≈ U Σ̂ V^T (ignore singular values beyond dimension n).
V̂ = X^T U Σ̂^{-1} (solving for V)
d̂ = d^T U Σ̂^{-1} (mapping an individual document into latent space)
• Compare that document to other documents:
cos(θ) = (d_1 · d_2) / (||d_1|| ||d_2||)
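The LSI mapping and cosine comparison can be sketched with numpy's SVD; the term-document matrix and the choice of n = 2 latent dimensions are illustrative:

```python
import numpy as np

# Toy term-document matrix X (terms x documents); values are illustrative.
X = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.],
              [0., 1., 2.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
n = 2                                    # keep the top-n singular values
Un, Sn_inv = U[:, :n], np.diag(1.0 / s[:n])

def to_latent(d):
    """Map a term-count vector d into latent space: d^T U Σ̂^{-1}."""
    return d @ Un @ Sn_inv

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

d1, d2 = to_latent(X[:, 0]), to_latent(X[:, 1])
print(cosine(d1, d2))
```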
Evaluating Topic Models
• Calculate perplexity on the test set, given model parameters learned during training.
• Perplexity is monotonically decreasing in the likelihood of the test data.
• A good model assigns high likelihood to held-out documents, and thus low perplexity.

perplexity(D_test) = exp( − ∑_m log p(w_m) / ∑_m N_m )

where N_m is the number of words in test document m.
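Perplexity is a one-liner once per-document log-likelihoods are in hand. A sketch; the held-out scores below are hypothetical numbers, not results from any model:

```python
import math

def perplexity(doc_log_likelihoods, doc_lengths):
    """exp of the negative total log-likelihood per held-out word."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))

# Hypothetical held-out scores: three documents with log p(w_m) and N_m.
ll = [-350.0, -120.5, -210.2]
lengths = [100, 40, 60]
print(perplexity(ll, lengths))
```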
Topic Models
pLSI (Hofmann, 1999)
• Notation
Document Data :
M documents, D = (d_1, ..., d_M)
N words per document, d_i = (w_1, ..., w_N)
K topics, z_1, ..., z_K
Parameters:
P(w|z) : probability of a word given a topic.
P(d|z) : probability of a document given a topic.
P(z) : probability of a topic.
Topic Models (cont.)
• Model Specification

P(d, z, w) = P(d) P(w|z) P(z|d)

P(d, w) = ∑_z P(z) P(d|z) P(w|z)
• Relation to LSI
We can interpret the joint probability P(d, w) as P = U Σ V^T such that:
U = (P(d_i|z_k))_{i,k}
Σ = diag(P(z_k))_k
V = (P(w_j|z_k))_{j,k}
P = U Σ V^T
Topic Models (cont.)
• Fitting the model via an EM algorithm
E Step:
P(z|d, w) = P(z) P(d|z) P(w|z) / ∑_{z'} P(z') P(d|z') P(w|z')
M Step:
P(w|z) ∝ ∑_d n(d, w) P(z|d, w)

P(d|z) ∝ ∑_w n(d, w) P(z|d, w)

P(z) ∝ ∑_{d,w} n(d, w) P(z|d, w)
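The E and M steps above can be sketched in a few lines of numpy. The toy count matrix, corpus sizes, iteration count, and random initialization are all arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word-count matrix n(d, w): 4 documents x 6 words, K = 2 topics.
n_dw = rng.integers(0, 5, size=(4, 6)).astype(float)
D, W, K = n_dw.shape[0], n_dw.shape[1], 2

# Random normalized initializations of P(z), P(d|z), P(w|z).
p_z = np.full(K, 1.0 / K)
p_dz = rng.random((D, K)); p_dz /= p_dz.sum(axis=0)
p_wz = rng.random((W, K)); p_wz /= p_wz.sum(axis=0)

for _ in range(50):
    # E step: P(z|d,w) ∝ P(z) P(d|z) P(w|z), normalized over z.
    post = p_z[None, None, :] * p_dz[:, None, :] * p_wz[None, :, :]
    post /= post.sum(axis=2, keepdims=True)
    # M step: reweight the posteriors by the observed counts n(d,w).
    weighted = n_dw[:, :, None] * post            # shape (D, W, K)
    p_wz = weighted.sum(axis=0); p_wz /= p_wz.sum(axis=0)
    p_dz = weighted.sum(axis=1); p_dz /= p_dz.sum(axis=0)
    p_z = weighted.sum(axis=(0, 1)); p_z /= p_z.sum()

print(p_z)
```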
• Model inference on unseen data
. Recall we are interested in perplexity as a metric for model performance.
. Need to calculate P(w|w_obs) = ∑_z P(w|z) P(z|w_obs).
. Fix P(w|z) from training; re-learn P(z|d, w) on the test set (called folding-in).
Topic Models (cont.)
• Applications
pLSI was tested on two text datasets, MED and LOB, using the bag-of-words assumption.
• Disadvantages
. Assigns zero probability to documents not seen in the training set.
. Folding-in is kind of weird.
. Folding-in, as presented by Hofmann, ignores an intractable normalization constant.
. Correct folding-in requires heavy regularization depending on the test-data split.
. pLSA overfits heavily as K → ∞.
LDA (Blei, Ng, and Jordan, 2002)
• Notation
Document Data :
A vocabulary that is V entries long.
Words w are V-dimensional indicator vectors such that w^u = 1 and w^v = 0 for v ≠ u.
A document is a sequence of N words, w = (w_1, w_2, ..., w_N).
A corpus is a set of M documents, D = {w_1, w_2, ..., w_M}.
Parameters:
α : Dirichlet hyperparameter on θ, acting as pseudo-counts for each topic.
β : β_ij = p(w^j = 1|z^i = 1), the word probabilities for each topic.
θ : individual probabilities of each topic occurring.
Topic Models (cont.)
• Model Specification
We imagine a corpus is generated as follows :
For each document d = 1, ..., M:
Choose N ∼ Poisson(ξ)
Choose θ ∼ Dirichlet(α)
For each of the N words w_n:
Choose a topic z_n ∼ Multinomial(θ)
Choose a word w_n from p(w_n|z_n, β)
Full Joint Probability (for one document):
p(θ, z, w|α, β) = p(θ|α) ∏_{n=1}^{N} p(z_n|θ) p(w_n|z_n, β)
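The generative process above can be simulated directly. A sketch with assumed toy sizes (K = 2 topics, V = 5 vocabulary entries, ξ = 8); β is drawn randomly rather than learned:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: K = 2 topics over a V = 5 word vocabulary.
K, V = 2, 5
alpha = np.full(K, 0.5)              # Dirichlet hyperparameter on θ
beta = rng.dirichlet(np.ones(V), K)  # beta[i, j] = p(word j | topic i)

N = rng.poisson(8) + 1               # document length (kept >= 1)
theta = rng.dirichlet(alpha)         # topic proportions for this document
z = rng.choice(K, size=N, p=theta)   # a topic z_n for each word
w = np.array([rng.choice(V, p=beta[zn]) for zn in z])  # the words w_n

print(theta, z, w, sep="\n")
```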
Topic Models (cont.)
Marginalize out the parameters:

p(w|α, β) = ∫ p(θ|α) ∏_{n=1}^{N} ∑_{z_n} p(z_n|θ) p(w_n|z_n, β) dθ
We're interested in:

arg max_{α,β} L(α, β) = ∑_d log p(w_d|α, β)
Problem : function we are trying to optimize is intractable for exact inference.
• Learning Model Parameters from data
. Use variational EM to approximate α, β (Blei et al., 2001).
. Use a collapsed Gibbs sampler to approximate θ, β (Griffiths & Steyvers, 2002).
Topic Models (cont.)
• Applications
LDA was tested on the AP text dataset, relying on the bag-of-words assumption.
Topic Models (cont.)
Application to Collaborative Filtering
• Dataset and Framework
. Using the EachMovie dataset.
. Each movie rating is converted to either a positive or negative rating.
. Users are analogous to documents; movies are analogous to words.
. Measure Predictive Perplexity.
• pLSA

P(w|w_obs) = ∑_z P(w|z) P(z|w_obs)

• LDA

P(w|w_obs) = ∫ ∑_z P(w|z) P(z|θ) P(θ|w_obs) dθ
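The pLSA prediction rule is a simple mixture and can be sketched in numpy. Here P(w|z) and P(z|w_obs) are random placeholders standing in for quantities that would come from training and folding-in, and the topic/movie counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder quantities: in practice P(w|z) comes from training and
# P(z|w_obs) from folding in a user's observed ratings.
K, W = 3, 5                                # topics, movies
p_wz = rng.dirichlet(np.ones(W), K)        # P(w|z): one distribution per topic
p_z_given_obs = rng.dirichlet(np.ones(K))  # P(z|w_obs) for one user

# pLSA prediction: P(w|w_obs) = sum_z P(w|z) P(z|w_obs)
p_w = p_z_given_obs @ p_wz
print(p_w)  # a distribution over the W movies
```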
Topic Models (cont.)
• Results