Topic Modeling using Latent Dirichlet Allocation

Upload: ruth-curtis

Post on 29-Jan-2016



Topic Modeling

• The process of analyzing large collections of documents to discover the latent topics they contain.

• Organizes and structures the documents

• Discovers the different topics that a document contains

• Measures how similar certain documents are


Latent Dirichlet Allocation (LDA)

• It is an unsupervised learning method

• Produces a generative model


Terminology

• Word: w ∈ {1,…,V}, where V is the vocabulary size

• Document: a sequence of N words

• Corpus: a set of M documents

• Topic: z ∈ {1,…,K}, where K is the number of topics
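
As a small illustration of this notation, the sketch below encodes a toy corpus as sequences of integer word ids. The example documents and the helper names `build_vocab` and `encode` are made up for illustration; ids run from 0 to V-1 here because Python is zero-indexed, whereas the slides use 1 to V.

```python
def build_vocab(docs):
    """Map each distinct term to an integer id in 0..V-1."""
    vocab = {}
    for doc in docs:
        for term in doc.split():
            if term not in vocab:
                vocab[term] = len(vocab)
    return vocab


def encode(docs, vocab):
    """Turn each document into its sequence of N word ids."""
    return [[vocab[term] for term in doc.split()] for doc in docs]


# Toy corpus (hypothetical data): M = 2 documents
docs = ["cat dog cat", "stock market stock bank"]
vocab = build_vocab(docs)
corpus = encode(docs, vocab)

V = len(vocab)    # vocabulary size
M = len(corpus)   # number of documents
```

Each inner list is one document, and its length is that document's N.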


Topic

A topic is a set of co-occurring terms, represented in LDA as a probability distribution over the vocabulary


Generative Process

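1. Choose the document length N from a Poisson distribution

2. Choose θ from a Dirichlet distribution (θ is the document's topic weight vector)

3. For each of the N words:
   1. Choose a topic z from the multinomial distribution defined by θ
   2. Choose a word w from the word distribution of topic z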
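
The steps above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the parameter names `alpha` (Dirichlet parameter), `beta` (one word distribution per topic), and `xi` (Poisson mean) follow common LDA notation rather than these slides, and the helper functions are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's method: multiply uniform draws until the product drops to exp(-lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, xi, rng):
    """Sample one document: returns (theta, topic assignments, word ids)."""
    n_topics, vocab_size = len(beta), len(beta[0])
    n_words = sample_poisson(xi, rng)        # step 1: document length N
    theta = sample_dirichlet(alpha, rng)     # step 2: topic weights theta
    topics, words = [], []
    for _ in range(n_words):                 # step 3: one topic and one word per position
        z = rng.choices(range(n_topics), weights=theta)[0]
        w = rng.choices(range(vocab_size), weights=beta[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


rng = random.Random(0)
alpha = [0.5] * 3                                            # K = 3 topics (assumed)
beta = [sample_dirichlet([1.0] * 8, rng) for _ in range(3)]  # V = 8 words (assumed)
theta, topics, words = generate_document(alpha, beta, xi=20, rng=rng)
```

Small values in `alpha` tend to concentrate each document on a few topics; the values used here are arbitrary.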


Learning

• Variational Bayes

• Gibbs Sampling
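
To make the second option concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA. It is illustrative only: the function name `gibbs_lda`, the symmetric priors `alpha` and `eta`, the iteration count, and the toy corpus are all assumptions that do not come from the slides.

```python
import random


def gibbs_lda(corpus, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling; corpus is a list of documents, each a list of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in corpus]             # words in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                              # total words assigned to topic k

    # Random initial topic assignment for every word position.
    assign = []
    for d, doc in enumerate(corpus):
        row = []
        for w in doc:
            k = rng.randrange(n_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(row)

    for _ in range(n_iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic for this word.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + eta)
                    / (topic_total[j] + vocab_size * eta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Point estimates from the final counts.
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(corpus)
    ]
    phi = [
        [(topic_word[k][w] + eta) / (topic_total[k] + vocab_size * eta) for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return theta, phi


# Toy corpus (hypothetical): documents 0 and 2 use words {0, 1}, documents 1 and 3 use {2, 3}.
corpus = [[0, 1, 0, 1, 0], [2, 3, 2, 3, 3], [0, 0, 1, 1, 0], [3, 2, 2, 3, 2]]
theta, phi = gibbs_lda(corpus, n_topics=2, vocab_size=4)
```

Here `theta[d]` estimates the topic weights of document d and `phi[k]` estimates the word distribution of topic k. Practical implementations usually discard early iterations as burn-in and average over several samples, which this sketch omits.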


Applications of LDA

• Collaborative Filtering

• Spam Detection

• Music

• Image


References

D. M. Blei, A. Y. Ng, M. I. Jordan. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993-1022.

D. J. Hu. (2009). Latent Dirichlet Allocation for Text, Images, and Music.