
Chinese Restaurant Process
Mohitdeep Singh

ML, April 29th, 2015

Outline

• General Introduction

• Chinese Restaurant Process

• Build up to non-parametric Bayesians

• Chinese Restaurant Franchise (if time)

• Demo

Motivation

• Where do you start?

• How do you start?

• Unsupervised learning techniques?

The ML story

[Diagram: Machine Learning combines algorithms/data structures with reasoning under uncertainty (statistics). MLE/MAP estimation yields a decision rule, probability distribution, or tree; when direct estimation is intractable, EM-style techniques are used. This sets up the Bayesian vs. Frequentist split.]

Bayesian vs Frequentist

                 Frequentist                               Bayesian
Parametric       Logistic Regression,                      Graphical Models…
                 Fisher Discriminant Analysis…
Non-Parametric   KNN, kernel approaches, decision trees    Gaussian Process, Dirichlet Process…

Clustering

• Fundamental problem in machine learning

• Where are the clusters?

• How many clusters are there (the parameter k)?

LDA

w: word, represented as a multinomial random variable
z: topic allocation, represented as a multinomial random variable
Θ: document model (topic proportions), a Dirichlet random variable
α & β: hyper-parameters

http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation

Non-Parametric Bayesian

• Fundamental equation:

posterior ∝ prior × likelihood

• If θ is a finite-dimensional (Euclidean) parameter:

p(θ|x) ∝ p(θ) p(x|θ)

• Introduce a stochastic process G in place of θ:

p(G|x) ∝ p(G) p(x|G)

Chinese Restaurant Process

• A random process analogous to seating customers in a Chinese restaurant with an infinite number of tables.

• The first person sits at the first table (deterministically).

• The nth person sits at a table according to the following process:

P(join occupied table i) ∝ nᵢ (the number of customers already seated there)

P(start a new, empty table) ∝ α₀
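A minimal simulation sketch of this seating process (Python; the function name, seed, and the choice α₀ = 1.0 are illustrative, not from the talk):

import random

def crp_seating(n_customers, alpha0, seed=0):
    """Simulate table assignments under the Chinese Restaurant Process:
    P(join occupied table i) ∝ n_i, P(start a new table) ∝ alpha0."""
    rng = random.Random(seed)
    counts = []        # counts[i] = customers at table i
    assignments = []   # table index chosen by each customer
    for _ in range(n_customers):
        r = rng.uniform(0, sum(counts) + alpha0)
        acc = 0.0
        for i, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[i] += 1
                assignments.append(i)
                break
        else:                          # r fell in the alpha0 slice
            counts.append(1)           # open a new table
            assignments.append(len(counts) - 1)
    return assignments, counts

tables, sizes = crp_seating(n_customers=20, alpha0=1.0)
print(sizes)   # rich-get-richer: a few big tables, several small ones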

CRP and clustering

[Figure: customers seated at tables with parameters ϕ₁, ϕ₂, ϕ₃]

Data points are customers; tables are clusters.

Prior: the first person to sit at table k chooses a parameter vector ϕₖ for that table (P(G)).

Likelihood: associate each data point with the parameter of its table (P(x|G)).

Posterior: turn the Bayesian crank: P(G|x).
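As a rough end-to-end sketch of this generative view, assuming (purely for illustration) a 1-D Gaussian base measure G₀ = N(0, τ²) for the table parameters and Gaussian noise for the likelihood:

import random

def crp_gaussian_mixture(n, alpha0=1.0, tau=3.0, sigma=1.0, seed=0):
    """Generate data from a CRP mixture: seat each point by the CRP,
    give each new table a parameter phi_k ~ N(0, tau^2) (the prior),
    then emit x ~ N(phi_k, sigma^2) (the likelihood)."""
    rng = random.Random(seed)
    counts, phis, xs, zs = [], [], [], []
    for _ in range(n):
        r = rng.uniform(0, sum(counts) + alpha0)
        acc, k = 0.0, None
        for i, c in enumerate(counts):
            acc += c
            if r < acc:
                k = i
                break
        if k is None:                  # new table: draw its parameter
            counts.append(0)
            phis.append(rng.gauss(0.0, tau))
            k = len(counts) - 1
        counts[k] += 1
        zs.append(k)
        xs.append(rng.gauss(phis[k], sigma))
    return xs, zs, phis

xs, zs, phis = crp_gaussian_mixture(50)
print(len(phis), "clusters, means:", [round(p, 2) for p in phis])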

Exchangeability

• As a prior on partitions of the data, the CRP is an exchangeable process.

• The concept was introduced by Haag and popularized by de Finetti.

• A sequence is exchangeable if its joint probability function is a symmetric function of its n arguments.

Polya urn model (more later)

θₙ | θ₁, …, θₙ₋₁ ~ (α₀G₀ + Σᵢ δθᵢ) / (α₀ + n − 1)

Polya-urn model

Consider an urn with g green balls and r red balls. Draw a ball at random and note its color. Fix a number a; return the ball to the urn along with a additional balls of the same color.

Let Xi = 1 if the i-th draw yields a green ball, and 0 otherwise.

p(1,1,0,1) = [g/(g+r)] · [(g+a)/(g+r+a)] · [r/(g+r+2a)] · [(g+2a)/(g+r+3a)] = p(0,1,1,1)
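This symmetry is easy to check numerically. A small sketch using exact fractions (the function name and the urn parameters are illustrative):

from fractions import Fraction

def seq_prob(seq, g, r, a):
    """Exact probability of a draw sequence (1 = green, 0 = red)
    when each draw adds `a` extra balls of the drawn color."""
    green, red = Fraction(g), Fraction(r)
    p = Fraction(1)
    for x in seq:
        total = green + red
        if x == 1:
            p *= green / total
            green += a
        else:
            p *= red / total
            red += a
    return p

# Same multiset of outcomes => same probability (exchangeability):
print(seq_prob([1, 1, 0, 1], g=2, r=3, a=1))   # 3/70
print(seq_prob([0, 1, 1, 1], g=2, r=3, a=1))   # 3/70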

“The” de Finetti Theorem

Theorem (de Finetti): If x₁, x₂, … are exchangeable, then the joint probability distribution p(x₁, x₂, …) has the form

p(x₁, …, xₙ) = ∫ ∏ᵢ p(xᵢ | G) dP(G)

for some random measure G. In simple words, any exchangeable sequence of random variables can be represented as a mixture of i.i.d. random variables.

Finite Mixture Models

[Figure: finite mixture model, from http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps]

Stick breaking process

• Draw an infinite sequence of Beta random variables:

βₖ ~ Beta(1, α₀)

• Define an infinite sequence of mixing proportions:

- π₁ = β₁

- πₖ = βₖ ∏ₗ₌₁ᵏ⁻¹ (1 − βₗ)

[Stick-breaking diagram: π₁ = β₁, π₂ = β₂(1−β₁), π₃ = β₃(1−β₂)(1−β₁), …]

G = Σₖ πₖ δϕₖ, k = 1…∞, where ϕₖ ~ G₀

G is called a Dirichlet Process.

For any finite partition (A₁, …, Aᵣ) of the sample space, the random vector (G(A₁), …, G(Aᵣ)) follows a finite-dimensional Dirichlet distribution.

G ~ DP(α₀, G₀)
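A truncated stick-breaking sampler makes this concrete. The sketch below approximates a draw G ~ DP(α₀, G₀) with finitely many atoms (the truncation level and the N(0, 1) base measure are illustrative assumptions):

import random

def stick_breaking_dp(alpha0, base_sampler, n_atoms=100, seed=0):
    """Approximate draw G ~ DP(alpha0, G0) via truncated stick-breaking:
    beta_k ~ Beta(1, alpha0), pi_k = beta_k * prod_{l<k}(1 - beta_l),
    atoms phi_k ~ G0."""
    rng = random.Random(seed)
    weights, atoms = [], []
    remaining = 1.0                    # stick length not yet broken off
    for _ in range(n_atoms):
        beta_k = rng.betavariate(1.0, alpha0)
        weights.append(remaining * beta_k)
        atoms.append(base_sampler(rng))
        remaining *= 1.0 - beta_k
    return weights, atoms              # G = Σₖ weights[k] · δ(atoms[k])

# G0 = N(0, 1), purely for illustration:
w, a = stick_breaking_dp(alpha0=2.0, base_sampler=lambda rng: rng.gauss(0, 1))
print(sum(w))   # close to 1; truncation drops only a tiny tail mass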

Dirichlet Process Mixture Model

[Figure: Dirichlet process mixture model, from http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps]

Marginalizing out G in the DPMM recovers the CRP.

Hierarchical Dirichlet Process

• Multiple groups of data

• Share some common properties

• Cluster shared across multiple groups

Naïve attempt

MLE estimation??

Hierarchical Bayesian Approach

[Figure: hierarchical Bayesian model in which the groups are tied together through a shared parameter θ]

Dirichlet Process Admixture model

• Admixture model: for each document, draw its own mixing proportions from the prior.

• A DP with a continuous G₀ yields disjoint sets of atoms for different documents (almost surely).

• Disjoint atom sets ⇒ no sharing.

• No sharing ⇒ no Chinese restaurant.

Admixture Model

Hierarchical Dirichlet Process

• The issue: G₀ is a continuous measure.

• What if G₀ were discrete and random? But how?


• Answer: introduce another DP on G₀:

G₀ | γ, H ~ DP(γ, H)

Gⱼ | α₀, G₀ ~ DP(α₀, G₀)

• We just got more Bayesian.
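A rough two-level stick-breaking sketch shows why this helps: because G₀ is now discrete, the group-level draws Gⱼ reuse its atoms, so clusters are shared across groups (truncation levels, the base measure H = N(0, 1), and all names are illustrative):

import random

def truncated_dp(alpha, atom_sampler, n_atoms, rng):
    """One truncated stick-breaking draw: returns (weights, atoms)."""
    weights, atoms, remaining = [], [], 1.0
    for _ in range(n_atoms):
        b = rng.betavariate(1.0, alpha)
        weights.append(remaining * b)
        atoms.append(atom_sampler())
        remaining *= 1.0 - b
    return weights, atoms

rng = random.Random(0)

# Top level: G0 ~ DP(γ, H) with H = N(0, 1); note G0 is discrete.
g0_w, g0_atoms = truncated_dp(alpha=1.0,
                              atom_sampler=lambda: rng.gauss(0, 1),
                              n_atoms=50, rng=rng)

def draw_from_g0():
    """Sampling the discrete G0 can return the same atom repeatedly."""
    return rng.choices(g0_atoms, weights=g0_w)[0]

# Group level: each Gj ~ DP(α0, G0) builds its sticks on G0's atoms,
# so different groups share clusters, which a continuous G0 cannot give.
for j in range(3):
    gj_w, gj_atoms = truncated_dp(alpha=1.0, atom_sampler=draw_from_g0,
                                  n_atoms=20, rng=rng)
    print("group", j, "atoms:", sorted(set(round(x, 2) for x in gj_atoms))[:4])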

Hierarchical Dirichlet Mixture Models

[Figure: hierarchical Dirichlet mixture models, from http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps]

Chinese Restaurant Franchise

[Figure: each restaurant (group) has its own tables, but every table orders a dish from a shared global menu, the atoms of G₀.]

Recap

• Introduction

• Exchangeability and De-Fenetti Theorem

• Dirichlet Process

• Hierarchical Dirichlet Process

Other metaphors


• Nested Chinese Restaurant Process

• Beta Process (Indian Buffet Process)

• Hierarchical Beta Process (The Dependents Diner Process)

• Non-Parametric Regression (Gaussian Process)

• Inference Techniques (MCMC, Variational techniques)

DEMO

Feature engineering is Machine Learning

• Thanks to big data, the trend is to store everything without giving it much thought a priori.

• Big-data frameworks (like Presto) aid in data exploration.

• Let the models do the heavy lifting.

• Let the data reveal the underlying structure, i.e. minimize the assumptions.

• Deep Learning is another example, where rich (although black-box) models are used in feature learning.

Questions

References:

1) http://www.cs.berkeley.edu/~jordan/nips-tutorial05.ps

2) http://mmds.imm.dtu.dk/presentations/whyeteh.pdf

3) Bayesian Non-Parametrics tutorial, MLSS 2013, Tübingen

4) Kevin Murphy, Machine Learning: A Probabilistic Perspective

And many more excellent tutorials available on the internet.
