2012 MDSP PR12: k-means, Mixture of Gaussians



Page 1

Course Calendar (revised 2012 Dec. 27)

Class  Date           Contents
1      Sep. 26        Course information & course overview
2      Oct. 4         Bayes Estimation
3      Oct. 11        Classical Bayes Estimation - Kalman Filter -
4      Oct. 18        Simulation-based Bayesian Methods
5      Oct. 25        Modern Bayesian Estimation: Particle Filter
6      Nov. 1         HMM (Hidden Markov Model)
-      Nov. 8         No class
7      Nov. 15        Bayesian Decision
8      Nov. 29        Nonparametric Approaches
9      Dec. 6         PCA (Principal Component Analysis)
10     Dec. 13        ICA (Independent Component Analysis)
11     Dec. 20        Applications of PCA and ICA
12     Dec. 27        Clustering: k-means, Mixture of Gaussians and EM
13     Jan. 17        Support Vector Machine
14     Jan. 22 (Tue)  No class

Page 2

Lecture Plan

Clustering:

K-means, Mixtures of Gaussians and EM

1. Introduction

2. K-means Algorithm

3. Mixtures of Gaussians

4. Reformulation of Mixtures of Gaussians

5. EM algorithm

Page 3

1. Introduction

Unsupervised Learning and the Clustering Problem

Given a set of feature vectors without category labels, we want to find groups, or clusters, of the data samples in multi-dimensional space.

We focus on the following two methods:

- K-means algorithm: a simple non-parametric technique.
- (Gaussian) mixture models with EM (Expectation-Maximization):
  - use a mixture of parametric densities such as Gaussians;
  - the optimal model parameters are not given in closed form, because the likelihood equations are highly non-linear and coupled;
  - the expectation-maximization algorithm is effective for determining the optimal parameters.

Page 4

2. K-means Algorithm

The K-means algorithm is a non-statistical approach to clustering data points in a multi-dimensional feature space.

Problem: Partition the dataset into some number $K$ of clusters ($K$ is known).

Notation and aim:
- $x$: D-dimensional random vector
- Dataset of $N$ points: $X := \{x_1, x_2, \ldots, x_N\}$
- Cluster: a group of data points whose mutual distances are small compared with the distances to the points outside the cluster
- Prototype of cluster $k$: $\mu_k$, $k = 1, \ldots, K$
- Aim: find a set of vectors $\{\mu_k\}$, $k = 1, \ldots, K$, such that the sum of the squared distances of each point to its closest vector $\mu_k$ is minimized.

Fig. 1 [Bishop book [1] and its web site]

Page 5

Fig. 1 [Bishop book [1] and its web site]

Page 6

Algorithm

Introduce a binary indicator variable $r_{nk}$ denoting the assignment of data point $x_n$:

$$r_{nk} = \begin{cases} 1 & \text{if } x_n \text{ is assigned to the } k\text{-th cluster} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

Objective function (distortion measure):

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \| x_n - \mu_k \|^2 \tag{2}$$

i.e., the squared distance of each point $x_n$ to its assigned prototype $\mu_k$, summed over all points.

Find both $r_{nk}$ and $\mu_k$ which minimize $J$.

Page 7

Two-stage Optimization

- Choose initial values $\mu_k^{(0)}$ for the prototypes.
- First stage: minimize $J$ with respect to $r_{nk}$ for fixed $\mu_k$.
- Second stage: minimize $J$ with respect to $\mu_k$ for fixed $r_{nk}$.

The two stages are repeated until convergence.

First stage: determination of $r_{nk}$ for given $\mu_k$, $k = 1, \ldots, K$:

$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \| x_n - \mu_j \|^2 \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

That is, we assign $x_n$ to the closest cluster center.

Page 8

Second stage: optimization of $\mu_k$.

Setting the derivative of $J$ with respect to $\mu_k$ to zero,

$$\frac{\partial J}{\partial \mu_k} = -2 \sum_{n=1}^{N} r_{nk} (x_n - \mu_k) = 0,$$

which gives

$$\mu_k = \frac{\sum_n r_{nk} \, x_n}{\sum_n r_{nk}}. \tag{4}$$

The numerator is the sum of the $x_n$ assigned to cluster $k$, and the denominator is the number of points assigned to cluster $k$: Eq. (4) sets $\mu_k$ to the mean vector of all data points assigned to cluster $k$.
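As a concrete illustration of Eqs. (2)-(4), here is a minimal NumPy sketch of the two-stage iteration; the function name, the random initialization from the data, and the convergence test are illustrative choices, not part of the lecture.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimize the distortion J of Eq. (2) by alternating Eqs. (3) and (4)."""
    rng = np.random.default_rng(seed)
    # Initialize the prototypes mu_k^(0) with K randomly chosen data points.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # First stage, Eq. (3): assign each x_n to its closest prototype.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
        r = np.argmin(d2, axis=1)            # index k for which r_nk = 1
        # Second stage, Eq. (4): each mu_k becomes the mean of its assigned points
        # (an empty cluster keeps its previous prototype).
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):          # stop when the prototypes are stable
            break
        mu = new_mu
    return mu, r
```

Each pass can only decrease $J$, since each stage minimizes it over one set of variables with the other held fixed, so the iteration converges, possibly to a local minimum.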

Page 9

Example 1 [Bishop book [1] and its web site]

Fig. 2: k-means iterations starting from initial prototypes $\mu_1^{(0)}$ and $\mu_2^{(0)}$.

Page 10

Fig. 3: Application of the k-means algorithm to color-based image segmentation [Bishop book [1] and its web site]. K-means clustering is applied to the color vectors of the pixels in RGB color space.
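A sketch of how a segmentation in the style of Fig. 3 can be produced with the kmeans() function above; the random image is a stand-in for real data, and K = 4 is an arbitrary choice.

```python
import numpy as np

img = np.random.rand(64, 64, 3)            # stand-in for a real H x W RGB image
pixels = img.reshape(-1, 3)                # N = H*W color vectors in RGB space
mu, labels = kmeans(pixels, K=4)           # cluster the color vectors
segmented = mu[labels].reshape(img.shape)  # paint each pixel with its prototype color
```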

Page 11

3. Mixtures of Gaussians

Limitations of the single-Gaussian pdf model (examples in [Bishop [1]]): a single Gaussian model does not capture multi-modal features of the data (Fig. 4: single Gaussian vs. mixture of Gaussians).

Mixture distribution approach: use a linear combination of basic distributions such as Gaussians.

[Mixture of Gaussians] Consider a superposition of $K$ Gaussians (normal distributions):

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k) \tag{5}$$

where the $\pi_k$ are the mixing coefficients and $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is a mixture component.

Page 12

[Figure: a single Gaussian vs. a mixture of Gaussians]

Page 13

The mixing coefficients $\pi_k$ ($k = 1, \ldots, K$) satisfy the discrete probability requirements:

$$0 \le \pi_k \le 1, \qquad \sum_{k=1}^{K} \pi_k = 1,$$

and $p(x) \ge 0$ with $\int p(x)\, dx = 1$.

$\pi_k$ is the prior probability of selecting the $k$-th mixture component: $\pi_k = p(k)$. From Eq. (5),

$$p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k), \qquad p(x \mid k) = \mathcal{N}(x \mid \mu_k, \Sigma_k). \tag{6}$$

Define the responsibilities by the posterior distribution, i.e., the probability of component $k$ conditioned on $x$:

$$\gamma_k(x) := p(k \mid x) = \frac{p(k)\, p(x \mid k)}{p(x)} = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} \pi_l \, \mathcal{N}(x \mid \mu_l, \Sigma_l)}. \tag{7}$$
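The following sketch evaluates Eqs. (5) and (7) for a one-dimensional mixture; the parameter values are arbitrary examples.

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """1-D Gaussian density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def responsibilities(x, pi, mu, var):
    """Eq. (7): posterior probability p(k | x) of each mixture component."""
    weighted = pi * gauss_pdf(x, mu, var)   # pi_k N(x | mu_k, var_k), one term per k
    return weighted / weighted.sum()        # the denominator is p(x) from Eq. (5)

pi  = np.array([0.5, 0.3, 0.2])   # mixing coefficients, sum to 1
mu  = np.array([-2.0, 0.0, 3.0])
var = np.array([0.5, 1.0, 2.0])
print(responsibilities(1.0, pi, mu, var))  # sums to 1 over the K components
```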

Page 14

Parameters of the mixture of Gaussians (5):

$$\pi := \{\pi_1, \pi_2, \ldots, \pi_K\}, \quad \mu := \{\mu_1, \mu_2, \ldots, \mu_K\}, \quad \Sigma := \{\Sigma_1, \Sigma_2, \ldots, \Sigma_K\}$$

Given observed data $X := \{x_1, x_2, \ldots, x_N\}$, estimate $\pi$, $\mu$, $\Sigma$ by applying the Maximum Likelihood method (* see the Lecture 2 slides for the single-Gaussian case), i.e., maximize the log-likelihood function

$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}. \tag{8}$$

This is too complex to give a closed-form solution, so we go to the EM (Expectation-Maximization) algorithm.
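Eq. (8) can be evaluated numerically; a log-domain implementation (here for the 1-D case, using SciPy's logsumexp) avoids underflow when the component densities are small. This numerical detail is an implementation note, not part of the lecture.

```python
import numpy as np
from scipy.special import logsumexp

def log_likelihood(X, pi, mu, var):
    """Eq. (8) for a 1-D mixture: sum_n ln sum_k pi_k N(x_n | mu_k, var_k)."""
    # log(pi_k N(x_n | mu_k, var_k)) for all n, k: shape (N, K)
    log_terms = (np.log(pi)
                 - 0.5 * np.log(2.0 * np.pi * var)
                 - 0.5 * (X[:, None] - mu) ** 2 / var)
    return logsumexp(log_terms, axis=1).sum()  # ln-sum over k inside, sum over n outside
```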

Page 15

4. Reformulation of Mixtures of Gaussians

Formulation of the mixture of Gaussians in terms of a discrete latent random variable:

- Introduce a K-dimensional binary random variable $z$ in the 1-of-K representation:

$$z := (z_1, z_2, \ldots, z_K)^T, \qquad z_k \in \{0, 1\}, \qquad \sum_{k=1}^{K} z_k = 1, \qquad p(z_k = 1) = \pi_k. \tag{9}$$

- Then

$$p(x) = \sum_{z} p(z)\, p(x \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),$$

an equivalent formulation of the Gaussian mixture with the latent variable $z$ made explicit.
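The latent-variable view of Eq. (9) is also a recipe for sampling: draw $z$ from the categorical distribution with probabilities $\pi_k$, then draw $x$ from the selected component. A short sketch with arbitrary 1-D parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])    # p(z_k = 1) = pi_k
mu = np.array([-2.0, 0.0, 3.0])
sd = np.array([0.7, 1.0, 1.5])

# Represent the 1-of-K variable z by the index k of its nonzero entry.
k = rng.choice(len(pi), size=1000, p=pi)   # z ~ p(z)
x = rng.normal(mu[k], sd[k])               # x ~ p(x | z) = N(mu_k, sd_k^2)
# Marginally, x is distributed as the mixture p(x) of Eq. (5).
```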

Page 16

The conditional probability of $z_k = 1$ given $x$:

$$\gamma(z_k) := p(z_k = 1 \mid x) = \frac{p(z_k = 1)\, p(x \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1)\, p(x \mid z_j = 1)} = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}. \tag{10}$$

Here $\pi_k$ is the prior probability of $z_k = 1$, and $\gamma(z_k)$ is the posterior probability for the observed $x$: the responsibility that component $k$ takes for explaining the observation $x$.

Modeling a data set $X := \{x_1, x_2, \ldots, x_N\}$ using a mixture of Gaussians: assuming $x_1, \ldots, x_N$ are drawn independently from $p(x)$, the log-likelihood function is given by Eq. (8).

Page 17

5. EM Algorithm

With respect to $\mu_k$ and $\Sigma_k$, the conditions that must be satisfied at a maximum of the likelihood function are

$$\frac{\partial}{\partial \mu_k} \ln p(X \mid \pi, \mu, \Sigma) = 0, \qquad \frac{\partial}{\partial \Sigma_k} \ln p(X \mid \pi, \mu, \Sigma) = 0. \tag{11}$$

The solutions are given by

$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad \text{where } N_k := \sum_{n=1}^{N} \gamma(z_{nk}), \tag{12}$$

$$\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T. \tag{13}$$

Here $\gamma(z_{nk})$ is the responsibility of $x_n$ with respect to the $k$-th cluster, given by Eq. (10) with $x = x_n$.

Maximization of $\ln p(X \mid \pi, \mu, \Sigma)$ with respect to $\pi_k$, subject to the constraint $\sum_k \pi_k = 1$, is also solved (see the Appendix):

$$\pi_k = \frac{N_k}{N}. \tag{14}$$

Page 18

The three equations (12)-(14) do not give solutions directly, because $\gamma(z_{nk})$ and $N_k$ contain the unknowns $\pi$, $\mu$, and $\Sigma$ in complex ways.

[EM algorithm for the Gaussian Mixture Model] A simple iterative scheme which alternates the E (Expectation) and M (Maximization) steps:

- E step: evaluate the posterior probabilities (responsibilities) $\gamma(z_{nk})$ using the current parameters.
- M step: re-estimate the parameters $\mu$, $\Sigma$, and $\pi$ using the evaluated $\gamma(z_{nk})$.

[Figure: color illustration of $\gamma(z_{nk})$ in the two-category case]
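A minimal sketch of the E/M alternation for the 1-D case, combining Eq. (10) with Eqs. (12)-(14) (in one dimension Eq. (13) reduces to the scalar variance of Eq. (A.4)); the initialization and fixed iteration count are illustrative, and no convergence test or safeguard against collapsing components is included.

```python
import numpy as np

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a 1-D Gaussian mixture: alternate Eq. (10) with Eqs. (12)-(14)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(X, size=K, replace=False)  # crude initialization from the data
    var = np.full(K, X.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E step, Eq. (10): responsibilities gamma(z_nk), shape (N, K)
        w = pi * np.exp(-0.5 * (X[:, None] - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)
        gamma = w / w.sum(axis=1, keepdims=True)
        # M step: re-estimate the parameters from the current responsibilities
        Nk = gamma.sum(axis=0)                                   # effective counts
        mu = (gamma * X[:, None]).sum(axis=0) / Nk               # Eq. (12)
        var = (gamma * (X[:, None] - mu) ** 2).sum(axis=0) / Nk  # Eq. (13) / (A.4)
        pi = Nk / len(X)                                         # Eq. (14)
    return pi, mu, var
```

Each iteration increases the log-likelihood (8), or leaves it unchanged, which is the standard convergence guarantee of EM.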

Page 19

[Figure: initial configuration with means $\mu_1^{(0)}$ and $\mu_2^{(0)}$]

Page 20

Example 2: EM algorithm [Bishop book [1] and its web site], starting from initial means $\mu_1^{(0)}$ and $\mu_2^{(0)}$.

Page 21

[Figure: comparison of the k-means algorithm and the EM algorithm]

Page 22

References:

[1] C. M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.

[2] R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification", 2nd edition, John Wiley & Sons, 2004.

Page 23

Appendix: Proof for the 1-dimensional case

The log-likelihood is

$$\ln p(X \mid \pi, \mu, \sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma_j^2) \right\}. \tag{A.1}$$

Setting the derivative with respect to a component parameter $\theta_k$ to zero,

$$\frac{\partial}{\partial \theta_k} \ln p(X \mid \pi, \mu, \sigma) = \sum_{n=1}^{N} \frac{\pi_k \, \frac{\partial}{\partial \theta_k} \mathcal{N}(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma_j^2)} = 0. \tag{A.2}$$

When $\theta_k = \mu_k$, using

$$\frac{\partial}{\partial \mu_k} \mathcal{N}(x_n \mid \mu_k, \sigma_k^2) = \mathcal{N}(x_n \mid \mu_k, \sigma_k^2) \, \frac{x_n - \mu_k}{\sigma_k^2}, \tag{A.3}$$

Eq. (A.2) derives Eq. (12).

Page 24

When $\theta_k = \sigma_k$, calculating $\partial \mathcal{N} / \partial \sigma_k$ and substituting it into Eq. (A.2) derives

$$\sigma_k^2 = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)^2, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk}). \tag{A.4}$$

For the maximization of $\ln p(X \mid \pi, \mu, \sigma)$ with respect to $\pi_k$ subject to $\sum_k \pi_k = 1$, the Lagrange multiplier method provides an elegant solution.

Introduce the Lagrangian function

$$L(\pi, \lambda) := \ln p(X \mid \pi, \mu, \sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right). \tag{A.5}$$

Page 25

Stationarity conditions:

$$\frac{\partial L}{\partial \pi_k} = 0, \qquad \frac{\partial L}{\partial \lambda} = 0. \tag{A.6}$$

The first condition gives

$$\frac{\partial L}{\partial \pi_k} = \sum_{n=1}^{N} \frac{\mathcal{N}(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma_j^2)} + \lambda = 0. \tag{A.7}$$

Multiplying both sides by $\pi_k$, we have

$$\sum_{n=1}^{N} \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma_j^2)} + \lambda \pi_k = N_k + \lambda \pi_k = 0,$$

and the summation over $k$ gives

$$\sum_{k=1}^{K} N_k + \lambda \sum_{k=1}^{K} \pi_k = N + \lambda = 0, \qquad \text{i.e., } \lambda = -N. \tag{A.8}$$

Page 26

From (A.7) multiplied by $\pi_k$, with $\lambda = -N$, we then have

$$\pi_k = \frac{1}{N} \sum_{n=1}^{N} \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \sigma_k^2)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma_j^2)} = \frac{N_k}{N}, \tag{A.9}$$

which is Eq. (14).