Introduction to LDA
Jinyang Gao
Outline
• Bayesian Analysis
• Dirichlet Distribution
• Evolution of Topic Model
• Gibbs Sampling
• Intuition Analysis of Parameter Settings
Bayesian Analysis
• Suppose we have some coins that land FRONT with probability 0.75 on average.
• We throw one coin. How should we estimate the outcome?
• FRONT: 0.75  BACK: 0.25
• Prior Estimation
Bayesian Analysis
• Suppose we throw a coin 100 times and observe that 25 of the throws are FRONT.
• How should we estimate the next throw?
• FRONT: 0.25  BACK: 0.75
• Maximum Likelihood Estimation
Bayesian Analysis
• Can we find a trade-off between the prior and the observation?
• The prior is NOT certain to be some fixed value.
– Change 0.75 to a distribution Beta(u|15, 5).
• Add the observation (5 FRONT, 15 BACK) as a posterior update.
– Beta(u|15, 5) becomes Beta(u|20, 20).
• Calculate the expectation, etc.
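The update above can be sketched in a few lines: a Beta(a, b) prior plus observed flips gives a Beta(a + fronts, b + backs) posterior, whose mean is the estimate. The numbers are the ones from the slide.

```python
# Prior/observation trade-off for the coin example: Beta(15, 5) prior
# (mean 0.75) updated with an observation of 5 FRONT and 15 BACK.

def beta_posterior_mean(a, b, fronts, backs):
    """Posterior mean of a Beta(a, b) prior after observing coin flips."""
    return (a + fronts) / (a + b + fronts + backs)

prior_mean = beta_posterior_mean(15, 5, 0, 0)   # prior only: 0.75
post_mean = beta_posterior_mean(15, 5, 5, 15)   # Beta(20, 20): 0.5
mle = 5 / 20                                    # observation only: 0.25
```

Note how the posterior mean (0.5) sits between the prior estimate (0.75) and the maximum-likelihood estimate (0.25): exactly the trade-off the slide describes.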
Bayesian Analysis
• Key idea:
– Express the uncertainty of the prior estimate as a distribution.
– The distribution converges to a single value as more and more observations arrive.
– Few observations: the prior estimate dominates.
– Many observations: the posterior observation dominates.
– If we have absolute confidence in the prior (a single fixed value), no amount of observation will change the estimate.
Dirichlet Distribution
• Some properties:
• It is just a smoothing method: add a pseudo-count to the observed value of each choice.
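The properties themselves were lost with the slide image; for reference, the standard Dirichlet density, its expectation, and its posterior update (writing α for the parameter vector and n for observed counts) are:

```latex
\mathrm{Dir}(\theta \mid \alpha)
  = \frac{\Gamma\!\left(\sum_{i} \alpha_i\right)}{\prod_{i} \Gamma(\alpha_i)}
    \prod_{i} \theta_i^{\alpha_i - 1},
\qquad
\mathbb{E}[\theta_i] = \frac{\alpha_i}{\sum_{j} \alpha_j},
\qquad
\mathbb{E}[\theta_i \mid n] = \frac{\alpha_i + n_i}{\sum_{j} (\alpha_j + n_j)}
```

The last expression is the smoothing claim in formula form: each choice's count is increased by its pseudo-count α_i before normalizing.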
Dirichlet Distribution
• A Dirichlet distribution with parameter α is EQUAL to the smoothing method that adds α_i to each choice.
• Here EQUAL holds only when we care about the expectation, but that covers most applications!
• Don't be deterred by the definition: it is just Laplace smoothing (when all α_i are set equal) or other smoothing methods, expressed in a Bayesian way!
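The equivalence is easy to check numerically: the Dirichlet posterior expectation is literally "add α_i to each count and renormalize". The counts below are made-up illustration values.

```python
# Dirichlet posterior expectation == additive smoothing.
# With all alphas equal to 1, this is exactly Laplace (add-one) smoothing.

def dirichlet_posterior_mean(alphas, counts):
    """E[theta_i] under Dir(alpha) after observing the given counts."""
    total = sum(alphas) + sum(counts)
    return [(a + n) / total for a, n in zip(alphas, counts)]

counts = [3, 0, 1]                                    # observed picks of 3 choices
laplace = dirichlet_posterior_mean([1, 1, 1], counts) # [4/7, 1/7, 2/7]
```

The unseen second choice still receives probability 1/7 instead of 0, which is the whole point of the smoothing.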
Evolution of Topic Model
• Here we give some solutions, from NAÏVE to LDA:
– K-means (TF-vector version)
– K-means with KL-divergence (language-model version)
– PLSA (fixed topic-frequency prior)
– LDA (based on topic-frequency observation and smoothing)
Evolution of Topic Model
• K-means with TF vectors:
– We begin with the simplest model.
– Just cluster the documents!
– Each document is a vector of term frequencies.
– How to cluster? K-means!
– Each cluster is a topic.
– Each topic is a TF vector.
Evolution of Topic Model
• Problems of K-means with TF vectors:
– High-frequency words have too much influence (IDF, log-TF, and stop-word removal help somewhat).
– Correlations among words are not captured.
– A cluster captures a single word rather than a topic (implement it and you will see).
Evolution of Topic Model
• K-means with KL-divergence:
– A generative model of text.
– Each text is a probability distribution over words.
– Still just clusters the documents.
– K-means (with KL-divergence, not cosine or Euclidean distance).
– Each cluster is a topic.
– Each topic is a distribution over words.
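A minimal sketch of the distance this variant swaps in, assuming documents and clusters are both represented as word-probability dicts; the small `eps` smoothing constant is my addition to avoid log(0) on words one side has never seen.

```python
import math

# KL divergence between two word distributions over a shared vocabulary,
# the distance a KL-based K-means would use in place of Euclidean/cosine.

def kl_divergence(p, q, vocab, eps=1e-9):
    """D_KL(p || q); p and q map word -> probability."""
    total = 0.0
    for w in vocab:
        pw = p.get(w, 0.0) + eps
        qw = q.get(w, 0.0) + eps
        total += pw * math.log(pw / qw)
    return total

p = {"blue": 0.5, "plane": 0.5}   # a document's word distribution
q = {"blue": 0.9, "car": 0.1}     # a cluster's word distribution
vocab = ["blue", "plane", "car"]
d_pq = kl_divergence(p, q, vocab)
```

Note that KL divergence is asymmetric, so which side is the document and which is the cluster is a real modeling choice.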
Evolution of Topic Model
• Problems of K-means with KL-divergence:
– Much better; some topics appear.
– But still not clearly.
– Each document has only one topic?
– It is still just a good clustering method for documents.
Evolution of Topic Model
• PLSA/PLSI:
– Each text is a probability distribution over words.
– Each text is a distribution over topics.
– Probabilistic assignment of topics and words (via EM).
– Each cluster is a topic (but no entire document belongs to a single cluster).
– Each topic is a distribution over words.
Evolution of Topic Model
• Problems of PLSA:
– The first usable topic model in this evolution!
– General words? Context information? See the works of QZ Mei during 2005-2008.
– What about the K in K-means?
– Topics are not all the same size.
– Can two topics with the same distribution be combined?
– Can a large topic break apart?
Evolution of Topic Model
• LDA:
– Gives a prior distribution over topics.
– Moves from maximum likelihood estimation (MLE) to Bayesian analysis in word-to-topic assignments.
– Dirichlet is the easiest way!
– Gives a complete Bayesian analysis.
Evolution of Topic Model
• Analysis of LDA:
– Small topics will disappear (even the most central document of a small topic has a larger probability of being absorbed by a large nearby topic), so K is self-adapting here.
– Smoothing is applied to the topic-word distribution.
What About Short Text?
• Consider the following:
– Lots of documents have only one meaningful word.
– How many words are enough to form a topic?
– ‘blue’ and ‘red’ rarely co-occur in a short text, but “blue plane” or “red car” do.
– ……
Evolution of Topic Model
• These are only some milestones along this evolution line. Small changes may give different results:
– Text weighting
– General words
– Probabilistic clustering
– Hyperparameters
– Context information
– Hierarchy
Evolution of Topic Model
• You SHOULD implement ALL of them if you want to gain a deep understanding of topic models!
– I implemented all of them, on both long and short text, during my undergraduate studies. The code is easy and the data is also easy to obtain.
– Check some topics (and their variation across iterations) and find out why they work well or badly.
– You will learn more about each consideration in model inference, and some of the derivations are not difficult to code.
Evolution of Topic Model
• You should know why some models are RIGHT, rather than merely perform well in experiments. Otherwise you cannot know which model is RIGHT for your own problem (usually some features have changed).
• Study the features of the models, data, and targets carefully. Use Occam’s razor to develop your model.
Gibbs Sampling
• Gibbs sampling:
– Key idea: if all the other parameters are decided, then the decision for a new variable is easy.
– Choose one variable (e.g., one word’s topic).
– Fix all the others.
– Sample (do not optimize) based on the others.
– Loop until convergence.
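The loop above can be sketched as a toy collapsed Gibbs sampler for LDA. The tiny corpus, K, and the hyperparameter values are illustrative choices, not from the slides; only count tables are kept, and each word's topic is resampled with all other assignments held fixed.

```python
import random

# Toy collapsed Gibbs sampler for LDA: choose one word's topic, fix all
# others, sample (not optimize) from the conditional, and loop.
random.seed(0)
docs = [["apple", "banana", "apple"], ["car", "bus", "car"],
        ["apple", "car", "banana", "bus"]]
vocab = sorted({w for d in docs for w in d})
K, alpha, beta, V = 2, 0.5, 0.1, len(vocab)

# Random initialization plus the count tables Gibbs needs.
z = [[random.randrange(K) for _ in d] for d in docs]
n_dk = [[0] * K for _ in docs]                     # topic counts per document
n_kw = [{w: 0 for w in vocab} for _ in range(K)]   # word counts per topic
n_k = [0] * K                                      # total words per topic
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k = z[d][i]
        n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1

for _ in range(200):                               # sampling sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                            # remove this word's assignment
            n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
            # Conditional probability of each topic given all other assignments.
            weights = [(n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                       / (n_k[t] + V * beta) for t in range(K)]
            k = random.choices(range(K), weights=weights)[0]
            z[d][i] = k                            # sample, not optimize
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
```

After the sweeps, `n_kw` holds the (unnormalized) topic-word distributions and `n_dk` the per-document topic mixtures.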
Gibbs Sampling
Please read the paper carefully for the details. It is easy-to-follow material for Gibbs sampling in LDA.
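For reference, the collapsed Gibbs update such derivations arrive at samples word $i$'s topic from (α and β as in the later slides; the $n$'s are counts with word $i$ excluded, $d_i$ is its document, and $V$ the vocabulary size):

```latex
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w})
  \;\propto\;
  \frac{n_{k,-i}^{(w_i)} + \beta}{n_{k,-i}^{(\cdot)} + V\beta}
  \left( n_{d_i,-i}^{(k)} + \alpha \right)
```

The first factor asks how much topic $k$ likes this word; the second, how much this document likes topic $k$.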
Gibbs Sampling
• EM:
– Fix all parameters or settings.
– Compute the best (maximum-likelihood) values for all parameters or settings.
– Change to the new settings.
– Loop until convergence.
Gibbs Sampling
• Neither Gibbs nor EM gives the exact best estimation!
• The exact best estimation would calculate the expectation of each random variable over all possible situations (exponentially many), not the optimized expectation in the current state.
• But so far these are the best we can do.
• In my personal view, neither is simply better or worse than the other.
Parameter Settings
• Think first:
– α: smooths the prior probability of topics.
– β: smooths the probability of words appearing in a topic.
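In terms of the estimated distributions (standard LDA notation; the $n$'s are counts, $K$ the number of topics, $V$ the vocabulary size), the two hyperparameters are exactly additive-smoothing constants:

```latex
\hat{\theta}_{d,k} = \frac{n_d^{(k)} + \alpha}{n_d + K\alpha},
\qquad
\hat{\varphi}_{k,w} = \frac{n_k^{(w)} + \beta}{n_k + V\beta}
```

Here $\hat{\theta}_{d,k}$ is document $d$'s probability of topic $k$, and $\hat{\varphi}_{k,w}$ is topic $k$'s probability of word $w$.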
![Page 32: Introduction to LDA Jinyang Gao. Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter](https://reader036.vdocuments.mx/reader036/viewer/2022062518/56649f4f5503460f94c71b67/html5/thumbnails/32.jpg)
Parameter Settings
• Higher β:
– Higher probability for rare words in a topic. Rare words survive more easily, so there are more words per topic on average.
• Higher α:
– Higher probability for small topics. Small topics survive more easily, so there are more topics in total.
– More topics per document.
![Page 33: Introduction to LDA Jinyang Gao. Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter](https://reader036.vdocuments.mx/reader036/viewer/2022062518/56649f4f5503460f94c71b67/html5/thumbnails/33.jpg)
Parameter Settings
• Multiple interpretations:
– Lower α and β result in more decisive topic associations; the words in a topic should be more similar.
– α: how much topics differ among documents.
– β: how similar the words within a topic are.
– Don’t forget K, the largest number of topics you can have. Self-adaptivity means you won’t suffer from a bad K as in K-means, but you still need to decide how many topics you need.
![Page 34: Introduction to LDA Jinyang Gao. Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter](https://reader036.vdocuments.mx/reader036/viewer/2022062518/56649f4f5503460f94c71b67/html5/thumbnails/34.jpg)
Summary
• Bayesian Analysis: Prior-Observation Trade-off
• Dirichlet Distribution: Smoothing Method
• Topic Model Evolution: Why It Works Well
• Gibbs and EM: Variable Inference Methods
• Parameter Settings: How Many Topics, and How Many Words in a Topic
![Page 35: Introduction to LDA Jinyang Gao. Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter](https://reader036.vdocuments.mx/reader036/viewer/2022062518/56649f4f5503460f94c71b67/html5/thumbnails/35.jpg)
THANKS
Q&A