chinese restaurant process

Chinese Restaurant Process, Mohitdeep Singh, ML, April 29th, 2015

Upload: mohitdeep-singh

Post on 15-Jul-2015


Page 1: Chinese Restaurant Process

Chinese Restaurant Process

Mohitdeep Singh

ML , April 29th, 2015

Page 2: Chinese Restaurant Process

Outline

• General Introduction

• Chinese Restaurant Process

• Build up to non-parametric Bayesians

• Chinese Restaurant Franchise (if time)

• Demo

Page 3: Chinese Restaurant Process

Motivation

• Where do you start?

• How do you start?

• Unsupervised learning techniques?

Page 4: Chinese Restaurant Process

The ML story

MLE/MAP estimation

Decision Rule/ Probability Distributions/tree..

Intractable to estimate directly (EM techniques)

Page 5: Chinese Restaurant Process

Machine Learning

Algorithms/DS etc

Reasoning uncertainty

Statistics

Page 6: Chinese Restaurant Process

Bayesian vs Frequentist

Page 7: Chinese Restaurant Process

Bayesian vs Frequentist

                 | Frequentist                                          | Bayesian
Parametric       | Logistic Regression, Fisher Discriminant Analysis .. | Graphical Models…
Non-Parametric   | KNN, kernel approaches, decision trees               | Gaussian Process, Dirichlet Process…

Page 8: Chinese Restaurant Process

Clustering

• Fundamental problem in machine learning

• Where are the clusters?

• How many clusters (parameter k)

Page 9: Chinese Restaurant Process

LDA

w: word, represented by a multinomial random variable

z: topic allocation, represented as a multinomial random variable

Θ: document model, a Dirichlet random variable

α & β: hyper-parameters of the Dirichlet priors

http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

Page 10: Chinese Restaurant Process

Non-Parametric Bayesian

• Fundamental equation:

posterior ∝ prior × likelihood

• If θ is a Euclidean parameter

p(θ|x) ∝ p(θ) p(x|θ)

• Introduce G (stochastic process)

p(G|x) ∝ p(G) p(x|G)

Page 11: Chinese Restaurant Process

Chinese Restaurant Process

• A random process analogous to seating customers in a Chinese restaurant with an infinite number of tables.

• The first person sits at the first table (deterministic).

• The nth person chooses a table according to the following process:

Join existing table i with probability ∝ ni (the number of customers already seated at table i)

Join an empty table with probability ∝ α₀
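The seating rule above can be simulated in a few lines. This is a minimal sketch, not the demo from the talk: the function name `sample_crp` and the parameter `alpha` (standing in for α₀) are illustrative choices.

```python
import random

def sample_crp(num_customers, alpha, seed=None):
    """Seat customers one at a time following the CRP seating rule.

    Returns (assignments, table_sizes): the table index chosen by each
    customer and the final number of customers at each table.
    `alpha` plays the role of the concentration parameter and must be > 0.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[i] = customers currently at table i
    assignments = []
    for _ in range(num_customers):
        # Existing table i has weight n_i; a new table has weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes

assignments, table_sizes = sample_crp(100, alpha=1.0, seed=0)
print(len(table_sizes), "tables occupied:", table_sizes)
```

Larger values of `alpha` tend to open more tables; the first customer always lands at table 0 because the only available weight is the new-table weight.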

Page 12: Chinese Restaurant Process

CRP and clustering

[Figure: tables labeled with parameters ϕ1, ϕ2, ϕ3]

Data points are customers.

Tables are clusters.

Prior: The first person to sit at table k chooses a parameter vector ϕk for that table (P(G)).

Likelihood: Associate the data points with the parameter of their table (P(x|G)).

Posterior: Turn the Bayesian crank. P(G|x).

Page 13: Chinese Restaurant Process

Exchangeability

• As a prior on partitions of the data, the CRP is an exchangeable process.

• Concept introduced by Haag, popularized by de Finetti.

• A sequence is exchangeable if its joint probability function is a symmetric function of its n arguments.

Polya-urn model (more later)

θn | θ1, …, θn-1 ~ (α₀G₀ + Σ_{i=1}^{n-1} δ_{θi}) / (α₀ + n - 1)

Page 14: Chinese Restaurant Process

Polya-urn model

Consider an urn with g green balls and r red balls. Draw a ball at random and note its color. Fix a number a, then return the ball to the urn together with a additional balls of the same color.

Let Xi = 1 if the i-th draw yields a green ball, else 0.

p(1,1,0,1) = [g (g+a) r (g+2a)] / [(g+r) (g+r+a) (g+r+2a) (g+r+3a)]

= p(0,1,1,1)
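The identity above can be checked with exact arithmetic. The following is a small sketch under illustrative assumptions: the helper name `polya_sequence_prob` and the values g = 2, r = 3, a = 1 are not from the slides.

```python
from fractions import Fraction
from itertools import permutations

def polya_sequence_prob(draws, g, r, a):
    """Exact probability of a 0/1 draw sequence from a Polya urn.

    1 means a green ball, 0 a red ball. After each draw the ball goes
    back into the urn together with `a` extra balls of the same color.
    """
    prob = Fraction(1)
    green, red = g, r
    for x in draws:
        total = green + red
        if x == 1:
            prob *= Fraction(green, total)
            green += a
        else:
            prob *= Fraction(red, total)
            red += a
    return prob

g, r, a = 2, 3, 1
# Every ordering of three greens and one red should be equally likely.
orderings = set(permutations((1, 1, 0, 1)))
probs = {seq: polya_sequence_prob(seq, g, r, a) for seq in orderings}
for seq, p in sorted(probs.items()):
    print(seq, p)
```

All four orderings share one probability, which is what exchangeability means here: only the counts of green and red draws matter, not their order.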

Page 15: Chinese Restaurant Process

“The” de Finetti Theorem

Theorem (de Finetti): If x1, x2, … is an infinite exchangeable sequence, then for every n the joint probability distribution p(x1, …, xn) has the form

p(x1, …, xn) = ∫ [ ∏_{i=1}^{n} p(xi | θ) ] dP(θ)

for some probability measure P on θ.

In simple words, any infinite exchangeable sequence of r.v.s can be represented as a mixture of i.i.d. r.v.s.

Page 16: Chinese Restaurant Process

Finite Mixture Models

http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps

Page 17: Chinese Restaurant Process

Stick breaking process

• Infinite sequence of Beta random variables:

βk ~ Beta(1, α₀), k = 1, 2, …

• Define an infinite sequence of mixing proportions as:

- π1 = β1

- πk = βk ∏_{l=1}^{k-1} (1-βl), for k > 1
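The construction above can be sketched numerically by truncating the infinite sequence after a fixed number of breaks. The truncation level and the name `stick_breaking_weights` are assumptions made for illustration only.

```python
import random

def stick_breaking_weights(alpha, num_weights, seed=None):
    """Return the first `num_weights` stick-breaking proportions.

    Each beta_k ~ Beta(1, alpha); pi_k is beta_k times the length of
    stick left over after the first k-1 breaks. Also returns the
    leftover stick length, i.e. the mass of the truncated tail.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0                      # length of the unbroken stick
    for _ in range(num_weights):
        beta_k = rng.betavariate(1.0, alpha)
        weights.append(beta_k * remaining)
        remaining *= 1.0 - beta_k
    return weights, remaining

weights, leftover = stick_breaking_weights(alpha=2.0, num_weights=50, seed=0)
print("mass in first 50 pieces:", sum(weights), "leftover:", leftover)
```

The pieces plus the leftover always add up to the whole stick, so the infinite sequence of proportions sums to 1 in the limit.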

Page 18: Chinese Restaurant Process

π1 = β1

π2 = β2(1-β1)

π3 = β3(1-β2)(1-β1)

…

Page 19: Chinese Restaurant Process

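G = Σ_{k=1}^{∞} πk δ_{ϕk}, with atoms ϕk drawn i.i.d. from G₀

A random measure G constructed this way is a draw from a Dirichlet Process.

For any finite partition (A1, …, Ar) of the sample space, the random vector (G(A1), …, G(Ar)) follows a finite-dimensional Dirichlet distribution, Dir(α₀G₀(A1), …, α₀G₀(Ar)).

G ~ DP(α₀, G₀)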
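To make the random measure G concrete, here is a rough sketch that draws a truncated approximation of G and evaluates it on a three-set partition of the real line. The standard normal base measure, the truncation at 200 atoms, the renormalization step, and all function names are assumptions for illustration.

```python
import random

def sample_truncated_dp(alpha, base_sampler, num_atoms, rng):
    """Approximate draw G = sum_k pi_k * delta(phi_k) from DP(alpha, G0).

    Weights come from stick-breaking, atoms from `base_sampler` (G0).
    The tail mass lost to truncation is folded back by renormalizing.
    """
    atoms, weights, remaining = [], [], 1.0
    for _ in range(num_atoms):
        beta_k = rng.betavariate(1.0, alpha)
        weights.append(beta_k * remaining)
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - beta_k
    total = sum(weights)
    return atoms, [w / total for w in weights]

def measure_of(atoms, weights, indicator):
    """G(A): total weight of the atoms that fall inside the set A."""
    return sum(w for atom, w in zip(atoms, weights) if indicator(atom))

rng = random.Random(0)
atoms, weights = sample_truncated_dp(1.0, lambda r: r.gauss(0.0, 1.0), 200, rng)

# A finite partition (A1, A2, A3) of the real line.
partition = [lambda x: x < -1.0, lambda x: -1.0 <= x < 1.0, lambda x: x >= 1.0]
masses = [measure_of(atoms, weights, in_set) for in_set in partition]
print("G(A1), G(A2), G(A3) =", masses)
```

Repeating the draw with different seeds gives a different vector of masses each time; across many draws that vector is what the finite-dimensional Dirichlet property describes.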

Page 20: Chinese Restaurant Process

Dirichlet Process Mixture Model

http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps

Page 21: Chinese Restaurant Process

Marginalize DPMM to get CRP
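As a worked statement of what the marginalization yields (a standard result, written here with z_n for the cluster label of the n-th data point, n_k for the number of earlier points in cluster k, and K for the number of clusters so far; these symbols are introduced for illustration): integrating out G leaves a predictive rule for cluster labels that matches the CRP seating rule.

```latex
P(z_n = k \mid z_1, \dots, z_{n-1}) = \frac{n_k}{\alpha_0 + n - 1}, \quad k = 1, \dots, K,
\qquad
P(z_n = K + 1 \mid z_1, \dots, z_{n-1}) = \frac{\alpha_0}{\alpha_0 + n - 1}.
```

The K + 1 probabilities sum to 1 because the counts n_k add up to n - 1.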

Page 22: Chinese Restaurant Process

Hierarchical Dirichlet Process

• Multiple groups of data

• Share some common properties

• Cluster shared across multiple groups

Page 23: Chinese Restaurant Process

Naïve attempt

MLE estimation??

Page 24: Chinese Restaurant Process

Hierarchical Bayesian Approach

θ

Page 25: Chinese Restaurant Process

Dirichlet Process Admixture model

• Admixture model: For each document, repeatedly draw the mixing proportions from the prior.

• With a continuous base measure, independent DPs will yield disjoint sets of atoms for different documents.

• If the sets are disjoint => No sharing.

• No sharing => No Chinese restaurant

Admixture Model

Page 26: Chinese Restaurant Process

Hierarchical Dirichlet Process

• The issue is that G₀ is a continuous measure.

• Let G₀ be discrete and random? But how?

Page 27: Chinese Restaurant Process

Hierarchical Dirichlet Process

• The issue is that G₀ is a continuous measure.

• Let G₀ be discrete and random? But how?

Introduce another DP on G₀.

G₀ | γ, H ~ DP(γ, H)

Gj | α₀, G₀ ~ DP(α₀, G₀)

• We just got more Bayesian.
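The two-level construction can be sketched with a finite truncation. This is an approximation under stated assumptions, not the talk's demo: G₀ is truncated to 20 atoms drawn from a standard normal H, and each group measure is sampled by applying the finite-dimensional Dirichlet property to those atoms. All names and parameter values are illustrative.

```python
import random

def stick_breaking(concentration, num_atoms, rng):
    """Truncated, renormalized stick-breaking weights."""
    weights, remaining = [], 1.0
    for _ in range(num_atoms):
        b = rng.betavariate(1.0, concentration)
        weights.append(b * remaining)
        remaining *= 1.0 - b
    total = sum(weights)
    return [w / total for w in weights]

def sample_group_weights(alpha0, global_weights, rng):
    """Weights of G_j ~ DP(alpha0, G0) when G0 has finitely many atoms.

    On a finite atom set the weights are Dirichlet(alpha0 * global_weights),
    sampled here by normalizing independent Gamma draws. The tiny floor
    only guards against a zero shape parameter.
    """
    draws = [rng.gammavariate(max(alpha0 * w, 1e-12), 1.0) for w in global_weights]
    total = sum(draws)
    return [d / total for d in draws]

rng = random.Random(0)
num_atoms = 20
atoms = [rng.gauss(0.0, 1.0) for _ in range(num_atoms)]   # atoms drawn from H
global_weights = stick_breaking(5.0, num_atoms, rng)      # G0 | gamma, H
group_weights = [sample_group_weights(3.0, global_weights, rng) for _ in range(3)]

for j, w in enumerate(group_weights):
    top = max(range(num_atoms), key=lambda k: w[k])
    print(f"group {j}: heaviest atom {atoms[top]:.3f} with weight {w[top]:.3f}")
```

Every group reuses the same list `atoms` with its own weights, which is the sharing that a continuous base measure could not provide.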

Page 28: Chinese Restaurant Process

Hierarchical Dirichlet Mixture Models

http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps

Page 29: Chinese Restaurant Process

Chinese Restaurant Franchise

Global Menu….

Page 30: Chinese Restaurant Process

Recap

• Introduction

• Exchangeability and de Finetti Theorem

• Dirichlet Process

• Hierarchical Dirichlet Process

Page 31: Chinese Restaurant Process

Other metaphors

• Introduction

• Exchangeability and de Finetti Theorem

• Dirichlet Process

• Hierarchical Dirichlet Process

• Nested Chinese Restaurant Process

• Beta Process(Indian Buffet Process)

• Hierarchical Beta Process(The Dependents Diner Process)

• Non Parametric Regression (gaussian process)

• Inference Techniques (MCMC, Variational techniques)

Page 32: Chinese Restaurant Process

DEMO

Page 33: Chinese Restaurant Process

Feature engineering is Machine Learning

• Thanks to big data, the trend is to store everything without giving it much thought a priori.

• Big-data frameworks (like Presto) aid data exploration.

• Let the models do the heavy lifting.

• Let the data reveal the underlying structure, i.e. minimize the assumptions.

• Deep learning is another example, where rich (although black-box) models are used for feature learning.

Page 34: Chinese Restaurant Process

Questions

References:

1) http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps

2) http://mmds.imm.dtu.dk/presentations/whyeteh.pdf

3) Bayesian Non-Parametrics tutorial, MLSS 2013, Tübingen

4) Kevin Murphy, Machine Learning: A Probabilistic Perspective

And many more awesome tutorials available on the internet.