Tutorial on mixture models (2)

Christian Hennig, University College London

September 2, 2009

1 Overview

Cluster validation, robustness and stability

◮ Potential problems with mixture model-based clustering
  ◮ Outliers
  ◮ Non-normality
  ◮ Instability
  ◮ Interpretation vs. model
◮ Degenerating likelihood - practical implications
◮ Robustness and the “noise component” to deal with outliers
  ◮ Robustness theory
  ◮ The “noise component” approach
  ◮ Mixtures of t-distributions
◮ Cluster validation
  ◮ Cluster validation by visualisation
  ◮ Stability assessment
◮ Gaussian mixture models vs. other clustering methods
  ◮ k-means and the fixed partition model
  ◮ Agglomerative hierarchical methods


Finite mixtures of generalised linear models

◮ Basics
  ◮ The model
  ◮ A linear regression mixture example
  ◮ Identifiability
  ◮ ML estimation and the EM algorithm
  ◮ Model selection
◮ Mixtures of linear models
  ◮ Fit and visualisation
  ◮ Concomitant variables and assignment dependence
  ◮ Mixtures for discrete random effects
◮ Mixtures of generalised linear models


Guide to files

mixtutorialnotes.pdf manuscript

mixturetutorial.R all R code used in the manuscript

cladagex.R R code to get you started with example data

clusterboot.R this may only be needed if fpc is not up to date

trigona.dat, jetfighters.dat datasets used in manuscript

melodypoly.dat, teff.dat, effect.dat more example datasets


2. Potential problems with mixture model-based clustering

Using mclust (Gaussian mixtures) with the aim of clustering.

General attitude: models are not true, and model assumptions are always violated. What does a method do when faced with different situations? Is this desirable, and if not, how can it be dealt with?

The Gaussian distribution defines the “cluster prototype”.

All CA methods are problematic.


2.1 Outliers

[Figure: two scatterplots, za[,2] against za[,1] and zb[,2] against zb[,1].]

Gaussian mixture ML is sensitive to outliers.
Above: mclustBIC solution.
flexmix merges all points.

2.2 Non-normality

[Figure: scatterplot of x2 against x1, and BIC against number of components for the mclust covariance models EII, VII, EEI, VEI, EVI, VVI, EEE, EEV, VEV and VVV.]

2.3 Instability

[Figure: two scatterplots of x2 against x1.]

Sometimes only parts of a solution are stable.
Non-normality is one, but not the only, source of instability.

More reasons for instability:
◮ Gaussian components may not be properly separated
◮ Very small “spurious clusters”
◮ Dataset too small

Instabilities may be tolerated if, for example, density estimation is of interest and not classification.

2.4 Interpretation vs. model

What a “good”/“true” cluster is depends on the application/aim.
A Gaussian subset does not always correspond to a “cluster” (see non-normality, but not only then).

[Figure: three mixture densities plotted against x: 0.5 * dnorm(x, 0, 1) + 0.5 * dnorm(x, 2, 1); 0.5 * dnorm(x, 0, 1) + 0.5 * dnorm(x, 2.5, 1); and 0.05 * dnorm(x, 0, 1) + 0.95 * dnorm(x, 0, 250).]
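As a rough check of how many modes the three two-component densities named on this slide have, the following base-R sketch evaluates each one on a grid and counts local maxima. The helper names `mix_density` and `count_modes` and the grid choices are illustrative assumptions, not part of the tutorial's code.

```r
# Evaluate a univariate Gaussian mixture density on a grid.
# w, mu, sd: component weights, means and standard deviations.
mix_density <- function(x, w, mu, sd) {
  comps <- sapply(seq_along(w), function(j) w[j] * dnorm(x, mu[j], sd[j]))
  rowSums(comps)
}

# Count local maxima of a curve given by its values on an increasing grid.
count_modes <- function(y) {
  d <- diff(y)
  sum(d[-length(d)] > 0 & d[-1] < 0)
}

x1 <- seq(-4, 6, by = 0.01)
modes_a <- count_modes(mix_density(x1, c(0.5, 0.5), c(0, 2), c(1, 1)))
modes_b <- count_modes(mix_density(x1, c(0.5, 0.5), c(0, 2.5), c(1, 1)))

x2 <- seq(-600, 600, by = 0.01)
modes_c <- count_modes(mix_density(x2, c(0.05, 0.95), c(0, 0), c(1, 250)))

print(c(modes_a, modes_b, modes_c))
```

On these grids the first and third densities have a single mode and the second has two, although all three are mixtures of two Gaussians. Whether a Gaussian component should count as a "cluster" therefore cannot be read off the number of components alone.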


An application requiring “gaps” (and some within-cluster homogeneity): species delimitation.

An application not requiring gaps: a “disease cluster” in medicine may have large variation.

“Organisational clustering” requires neither “gaps” nor patterns, but small within-cluster distances.

Gaussian mixture modelling is good for flexible covariance patterns.

Covariance matrices may be restricted depending on the application. (“EEI” produces similar small within-cluster variation.)

[Figure: scatterplot matrix of var 1 to var 5.]

3. Degenerating likelihoods - practical implications

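Consider a sequence $(a_{1m}, \Sigma_{1m})_{m \in \mathbb{N}}$ such that $\lambda_{\min}(\Sigma_{1m}) \to 0$ and $\exists x_i:\ \varphi_{a_{1m},\Sigma_{1m}}(x_i) > c > 0\ \forall m$.

$$\Rightarrow L_n = \sum_{i=1}^{n} \log\left( \sum_{j=1}^{s} \pi_{jm} \varphi_{a_{jm},\Sigma_{jm}}(x_i) \right) \to \infty.$$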

EM may degenerate (but not always).

The problem depends on the covariance matrix model; it does not usually occur for “E..”-models.
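The degeneration can be seen numerically in one dimension. The sketch below is an illustration on made-up data with fixed mixture weights of 0.5 (both are assumptions for the example, not the tutorial's code): one component is placed on a single observation and its standard deviation is shrunk, while a second broad component covers all data.

```r
# Made-up one-dimensional data (illustration only).
x <- seq(-2, 2, by = 0.5)

# Log-likelihood of a two-component mixture with weights 0.5/0.5:
# component 1 sits on the observation x[1] with standard deviation sigma1,
# component 2 is a broad Gaussian covering all the data.
loglik <- function(sigma1) {
  dens <- 0.5 * dnorm(x, mean = x[1], sd = sigma1) +
          0.5 * dnorm(x, mean = mean(x), sd = sd(x))
  sum(log(dens))
}

sigmas <- 10^(-(2:6))
ll <- sapply(sigmas, loglik)
print(data.frame(sigma1 = sigmas, loglik = ll))
```

Each tenfold reduction of `sigma1` raises the log-likelihood by roughly log(10), so the likelihood has no finite maximum unless the scale parameters are bounded away from zero.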


Theoretically, $\lambda_{\min}(\Sigma) \ge c$ or $\frac{\lambda_{\min}(\Sigma_j)}{\lambda_{\max}(\Sigma_k)} \ge c$ prevents degeneration.
But this is not implemented in mclust and flexmix.

mclustBIC discards solutions with non-invertible Σ.
It will choose another covariance matrix model.
(So an outlier changes the covariance matrix model.)

flexmix discards components if $\pi_j < c$ (default c = 0.05) and joins them with other mixture components.
An outlier is joined with other components and destroys their parameters.
Change c with control=list(minprior=0.001), but there is still no proper outlier handling.

Specify a prior for mclustBIC using prior=priorControl():

$$\mu \mid \Sigma \sim \mathcal{N}(\mu_p, \Sigma/\kappa_p), \qquad \Sigma \sim \text{inverseWishart}(\nu_p, \Delta_p).$$

Data-dependent default choices of parameters (non-robust; the overall mean is used, which is affected by outliers; parameters a bit biased; not proper Bayes, no posterior distribution).
Compute the MAP estimator and BIC based on the MAP likelihood.
Improves problems with spurious clusters and degenerating likelihood.

[Figure: scatterplot of zb[,2] against zb[,1], points labelled by cluster number (1, 2 or 3); a single point is labelled 3.]

4. Robustness and the “noise component” to deal with outliers

4.1 Robustness theory

Finite sample addition breakdown point (Donoho and Huber 1983): the smallest possible contamination proportion $\frac{g}{n+g}$ that has to be added to the dataset so that the estimator becomes “arbitrarily bad”.

General robustness results: $\frac{1}{n+1}$ for $\bar{x}_n$ and $s^2$, $\approx \frac{1}{2}$ for the median and the MAD scale.

May theoretically depend on the dataset, but often does not.
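The contrast between a breakdown point of 1/(n+1) and one of about 1/2 can be illustrated with a single added observation. The sample values below are made up for the illustration; only base R functions are used.

```r
x <- c(2.1, 1.9, 2.4, 2.0, 1.8, 2.2, 2.3)   # made-up "clean" sample, n = 7
outliers <- 10^(1:6)                         # one added point, moved further out

# Recompute the four estimators after adding one outlier (g = 1).
estimates <- t(sapply(outliers, function(o) {
  y <- c(x, o)
  c(outlier = o, mean = mean(y), var = var(y),
    median = median(y), mad = mad(y))
}))
print(estimates)
```

The mean and variance follow the single outlier without bound, whereas the median and MAD do not change at all as the outlier moves further away.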


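Definition 1. Let $\mathbf{x}_n = (x_1, \ldots, x_n) \in \mathbb{R}^n$ be a data set. For any $n$ large enough, let $\theta_{n,s}: \mathbf{x}_n \mapsto \theta \in \Theta$ be an estimator, where

$$\Theta = \Big\{(\pi_1, \ldots, \pi_s, a_1, \ldots, a_s, \sigma_1, \ldots, \sigma_s):\ \pi_j \ge 0,\ \sum_{j=1}^{s} \pi_j = 1,\ \sigma_j \ge c > 0\ \forall j\Big\}.$$

For $g > 0$ let

$$Y_g(\mathbf{x}_n) = \{\mathbf{x}_{n+g} \in \mathbb{R}^{n+g}:\ \text{first } n \text{ observations equal } \mathbf{x}_n\}.$$

Then the breakdown point of the estimator $\theta_{\bullet,s}$ is $B(\theta_{\bullet,s}, \mathbf{x}_n) = \min \frac{g}{n+g}$ such that at least one of the following holds for at least one $i \in \{1, \ldots, s\}$:

◮ $\inf_{\mathbf{x}_{n+g} \in Y_g(\mathbf{x}_n)} \pi_i(\mathbf{x}_{n+g}) = 0$ for $i$ with $\pi_i(\mathbf{x}_n) > 0$,
◮ $\sup_{\mathbf{x}_{n+g} \in Y_g(\mathbf{x}_n)} \sigma_i(\mathbf{x}_{n+g}) = \infty$,
◮ $\sup_{\mathbf{x}_{n+g} \in Y_g(\mathbf{x}_n)} |a_i(\mathbf{x}_{n+g})| = \infty$,

where $\pi_i$ etc. denote the corresponding components of $\theta_{\bullet,s}$.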


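Theorem 2. Let $s > 1$ and let

$$\theta_{n,s}(\mathbf{x}_n) = \arg\max_{\theta \in \Theta} L_{n,s}(\theta, \mathbf{x}_n), \qquad L_{n,s}(\theta, \mathbf{x}_n) = \sum_{i=1}^{n} \log\left( \sum_{j=1}^{s} \pi_j \varphi_{a_j,\sigma_j}(x_i) \right),$$

be the ML-estimator. Then, for any $\mathbf{x}_n$ for which $\theta_{n,s}(\mathbf{x}_n)$ is well defined, $B(\theta_{\bullet,s}, \mathbf{x}_n) = \frac{1}{n+1}$.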


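$$L_{n,s}(\eta) = \sum_{i=1}^{n} \log\left( \sum_{j=1}^{s} \pi_j f_{a_j,\sigma_j}(x_i) \right) + \log\left( \sum_{j=1}^{s} \pi_j f_{a_j,\sigma_j}(x_{n+1}) \right).$$

[Figure: plot with horizontal axis from 0 to 15 and vertical axis from 0.00 to 0.25.]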


No breakdown of the original components: $x_{n+1} \to \infty \Rightarrow L_{n,s} \to -\infty$.


$a_s = x_{n+1} \Rightarrow L_{n,s} \ge \text{const}$, thus $B = \frac{1}{n+1}$.
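The two cases in this argument can be compared numerically. The sketch below uses made-up data and hand-picked parameter values (all assumptions for illustration, not ML fits): in case (a) both components stay on the original groups while one added point moves away; in case (b) the last component is centred on the added point.

```r
x <- c(-0.5, 0, 0.3, 0.6, 4.6, 5.0, 5.2, 5.5)  # made-up data, two groups

# Log-likelihood of a univariate Gaussian mixture with given parameters.
mix_loglik <- function(data, w, mu, sd) {
  dens <- sapply(seq_along(w), function(j) w[j] * dnorm(data, mu[j], sd[j]))
  sum(log(rowSums(dens)))
}

outliers <- c(10, 20, 30)  # positions of the added point x_{n+1}

# (a) Components stay on the two original groups.
ll_keep <- sapply(outliers, function(o)
  mix_loglik(c(x, o), w = c(0.5, 0.5), mu = c(0.1, 5.075), sd = c(1, 1)))

# (b) One broad component covers the original data, the other one
#     is centred on the added point (a_s = x_{n+1}).
ll_move <- sapply(outliers, function(o)
  mix_loglik(c(x, o), w = c(0.5, 0.5), mu = c(mean(x), o), sd = c(sd(x), 1)))

print(rbind(outlier = outliers, keep = ll_keep, move = ll_move))
```

The log-likelihood in case (a) decreases without bound as the added point moves out, while in case (b) it stays almost constant, so for a distant enough point the maximiser must give up one of the original components.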


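Estimating $s$ by BIC:

Definition 3. Let $\mathbf{x}_n = (x_1, \ldots, x_n) \in \mathbb{R}^n$ be a data set. Let $\theta_n: \mathbf{x}_n \mapsto \theta \in \Theta^*$ be an estimator, with $\Theta^*$ as $\Theta$ in Definition 1 but with an additional component $s \in \mathbb{N}$. Under the notation of Definition 1, the breakdown point of the estimator $\theta_\bullet$ is $B(\theta_\bullet, \mathbf{x}_n) = \min \frac{g}{n+g}$ such that

$$\inf_{\mathbf{x}_{n+g} \in Y_g(\mathbf{x}_n)} s(\mathbf{x}_{n+g}) < s(\mathbf{x}_n),$$

or at least one of the following holds for at least one $i \in \{1, \ldots, s(\mathbf{x}_n)\}$:

◮ $\inf_{\mathbf{x}_{n+g} \in Y_g(\mathbf{x}_n)} \pi_i(\mathbf{x}_{n+g}) = 0$ for $i$ with $\pi_i(\mathbf{x}_n) > 0$,
◮ $\sup_{\mathbf{x}_{n+g} \in Y_g(\mathbf{x}_n)} \sigma_i(\mathbf{x}_{n+g}) = \infty$,
◮ $\sup_{\mathbf{x}_{n+g} \in Y_g(\mathbf{x}_n)} |a_i(\mathbf{x}_{n+g})| = \infty$,

where $\pi_i$ etc. denote the corresponding components of $\theta_{\bullet,s}$.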


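Theorem 4. Let $\theta_{n,s}$ be the ML-estimator as before, and $\theta_n(\mathbf{x}_n) = (s(\mathbf{x}_n), \theta_{n,s}(\mathbf{x}_n))$, where $s = s(\mathbf{x}_n)$ is the optimal number of components according to the BIC. Then $B(\theta_\bullet, \mathbf{x}_n) \ge \frac{g}{n+g}$ for $\mathbf{x}_n$ such that

$$\min_{r<s} \big[ L_{n,s}(\theta_{n,s}(\mathbf{x}_n), \mathbf{x}_n) - L_{n,r}(\theta_{n,r}(\mathbf{x}_n), \mathbf{x}_n) \big] > f(g),$$

where $f$ is a monotonically increasing positive finite function of $g$.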


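$$\mathrm{BIC}(s) = \sum_{i=1}^{n} \log\left( \sum_{j=1}^{s} \pi_j f_{a_j,\sigma_j}(x_i) \right) + \sum_{i=n+1}^{n+g} \log\left( \sum_{j=1}^{s} \pi_j f_{a_j,\sigma_j}(x_i) \right) - (3s - 1) \log n.$$

[Figure: plot with horizontal axis from 0 to 15 and vertical axis from 0.00 to 0.25.]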

Breakdown by outliers almost impossible!


... but “in-between-liers” may cause trouble.
May still often be unstable.

◮ Fixed s: a single outlier spoils the ML-estimator.
◮ s estimated by BIC: mixture components are added to fit outliers.
◮ Breakdown can occur if components are not well separated.
◮ These do not only hold for p = 1.
◮ If the number of outliers is large, adding components may not do, because too many components are needed.

All these statements require $\sigma_i > c > 0$.
This is not implemented in mclust or flexmix; the degeneration treatment interferes with robustness.
The mclust prior helps, but still $B = \frac{1}{n+1}$.

4.2 The noise component (Banfield and Raftery, 1993)

$$f(x) = \pi_0 \frac{1}{V} + \sum_{j=1}^{s} \pi_j \varphi_{a_j,\Sigma_j}(x),$$

V is fixed during the EM-algorithm (mclustBIC), but an initial $\pi_0$ is needed, and outliers should not affect the initialisation of the Gaussian components.
$B = \frac{1}{n+1}$ still, but practical robustness is much better.

Do it by initialization=list(noise=initnoise).

May draw initial noise points at random.
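To see what the uniform term does, the noise-component density can be evaluated directly. The following is a one-dimensional base-R sketch with hand-picked parameters; taking V as the range of the data is an assumption made for this example, and the function name `noise_posterior` is hypothetical. It does not reproduce what mclust does internally.

```r
# Posterior probability that each point belongs to the noise component
# for f(x) = pi0 / V + sum_j w_j * dnorm(x, mu_j, sd_j).
noise_posterior <- function(x, pi0, V, w, mu, sd) {
  noise <- rep(pi0 / V, length(x))
  gauss <- sapply(seq_along(w), function(j) w[j] * dnorm(x, mu[j], sd[j]))
  gauss <- matrix(gauss, nrow = length(x))
  noise / (noise + rowSums(gauss))
}

x <- c(-0.5, 0, 0.3, 0.6, 4.6, 5.0, 5.2, 5.5, 50)  # made-up data, last point is an outlier
V <- diff(range(x))                                # assumed "volume" of the data region

post <- noise_posterior(x, pi0 = 0.1, V = V,
                        w = c(0.45, 0.45), mu = c(0.1, 5.075), sd = c(1, 1))
print(round(post, 3))
```

With these values the outlier at 50 is assigned to noise with posterior probability close to 1, while the points near the two Gaussian centres have noise probabilities of only a few percent, so the outlier no longer has to be explained by a Gaussian component.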


Better (reproducible): NNclean (Byers and Raftery 1998) in prabclus.
Fits a mixture of transformed Gamma-distributions to the distances to the k-th nearest neighbour, based on a mixture of two homogeneous (uniform) Poisson processes for the data.
The component with the larger mean is “noise”.

Specification of k required.
Isolated groups of fewer than k points may still be regarded as noise.
Decide based on application and size of dataset.

[Figure: histogram of kthNND (distance to the 15th nearest neighbour) with the estimated mixture, and scatterplot of y against x with points labelled 1, 2 or N (noise).]

For the single-outlier data, the single outlier is classified as noise with k = 4.

4.3 Mixtures of t-distributions (McLachlan and Peel 2000)

$$f(x) = \sum_{j=1}^{s} \pi_j t_{\nu,a_j,\Sigma_j}(x).$$

$t_\nu$ arises by generating data from $\mathcal{N}(a, \sigma^2 \nu / H)$, where $H \sim \chi^2_\nu$.

Not in R, and I don’t like it.
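The scale-mixture construction of the t-distribution can be simulated in a few lines. This sketch assumes the standard form of the representation, x | H ~ N(a, sigma^2 * nu / H) with H ~ chi-squared(nu), and compares empirical quantiles with those from `qt`; the values of `nu`, `a`, `sigma` and the sample size are arbitrary choices for the illustration.

```r
set.seed(1)
nu <- 5; a <- 0; sigma <- 1; n <- 1e5

# Draw the mixing variable, then a normal with the corresponding scale.
H <- rchisq(n, df = nu)
x <- rnorm(n, mean = a, sd = sigma * sqrt(nu / H))

probs <- c(0.1, 0.5, 0.9)
emp  <- quantile(x, probs)                 # empirical quantiles
theo <- a + sigma * qt(probs, df = nu)     # location-scale t quantiles
print(rbind(empirical = emp, theoretical = theo))
```

Small values of H produce occasional very widely scattered observations, which is how the heavy tails arise and why a t-component can absorb moderate outliers.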


5. Cluster validation

Check whether the outcome of a clustering method makes sense.
Strategies:

◮ External/subject matter information
◮ Significance tests for structure
◮ Compare different clusterings on same dataset
◮ Validation indexes
◮ Visual inspection
◮ Stability assessment

Some indexes and validation information are provided by cluster.stats in fpc, based on a distance matrix.

$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$ is called the “silhouette width” (Kaufman and Rousseeuw, 1990), where
a(i) is the average distance of $x_i$ to another point of its own cluster, and
b(i) is the average distance to another point of the closest cluster.
This can be averaged clusterwise over points.
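The silhouette width can be computed by hand from a distance matrix, which makes the definition concrete. The following base-R sketch uses made-up one-dimensional data; the function name `silhouette_widths` is hypothetical, and setting s(i) = 0 for singleton clusters is a convention assumed here.

```r
# Silhouette widths from a distance object/matrix D and cluster labels cl.
silhouette_widths <- function(D, cl) {
  D <- as.matrix(D)
  sapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    own[i] <- FALSE                      # exclude the point itself
    if (!any(own)) return(0)             # singleton cluster (assumed convention)
    a <- mean(D[i, own])                 # average distance within own cluster
    others <- setdiff(unique(cl), cl[i])
    b <- min(sapply(others, function(g) mean(D[i, cl == g])))  # closest other cluster
    (b - a) / max(a, b)
  })
}

x  <- c(0, 1, 2, 10, 11, 12)             # made-up data
cl <- c(1, 1, 1, 2, 2, 2)
s  <- silhouette_widths(dist(x), cl)

print(round(s, 4))
print(tapply(s, cl, mean))               # clusterwise average silhouette widths
```

For the point at 1, a(i) = 1 and b(i) = 10, giving s(i) = 0.9. Values near 1 indicate a point much closer to its own cluster than to the next one; negative values indicate the opposite.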


> cs <- cluster.stats(dist(trigonadata), smtrigona$classification)
> cs
$n
[1] 236

$cluster.number
[1] 10

$cluster.size
[1] 35 23 20  4 10  8 13 62 48 13

$diameter
[1] 0.2220615 0.2011110 0.8882174 0.2466013 0.2520631 0.7895725 0.3429880
[8] 0.2464109 0.3996499 0.2268971

$average.distance
[1] 0.10960597 0.10530936 0.42058017 0.14797559 0.11524152 0.49448545
[7] 0.18921780 0.09693295 0.18436365 0.11742765

$median.distance
[1] 0.10757334 0.10478257 0.40924474 0.13831797 0.11075841 0.52140322
[7] 0.19024028 0.09521223 0.18188913 0.11888133

$separation
[1] 0.5889131 0.3425002 0.3425002 0.5002507 0.3354944 0.0897763 0.3193068
[8] 0.1922279 0.1604022 0.0897763

$average.toother
[1] 0.8898844 0.9043505 0.8773002 0.8711378 0.9062254 0.7031898 0.6800758
[8] 0.7083479 0.6734774 0.5841906

$separation.matrix
           [,1]      [,2]      [,3]      [,4]      [,5]      [,6]
 [1,] 0.0000000 0.8214897 0.6121101 0.7355199 0.9163432 0.5889131
 [2,] 0.8214897 0.0000000 0.3425002 0.9149291 0.7642136 0.8000080
 [3,] 0.6121101 0.3425002 0.0000000 0.8453088 0.5350112 0.6501032
 [4,] 0.7355199 0.9149291 0.8453088 0.0000000 0.5467675 0.6286042
 [5,] 0.9163432 0.7642136 0.5350112 0.5467675 0.0000000 0.3354944
 [6,] 0.5889131 0.8000080 0.6501032 0.6286042 0.3354944 0.0000000
 [7,] 0.6446088 0.7271676 0.7943997 0.5002507 0.8125327 0.4271635
 [8,] 0.8023756 0.8801732 0.6508053 0.8341647 0.8896053 0.2331168
 [9,] 0.7274789 0.7901756 0.6227011 0.7106608 0.6295933 0.1891964
[10,] 0.6927242 0.9830951 0.7581647 0.7185608 0.7778999 0.0897763
           [,8]      [,9]     [,10]
 [1,] 0.8023756 0.7274789 0.6927242
 [2,] 0.8801732 0.7901756 0.9830951
 [3,] 0.6508053 0.6227011 0.7581647
 [4,] 0.8341647 0.7106608 0.7185608
 [5,] 0.8896053 0.6295933 0.7778999
 [6,] 0.2331168 0.1891964 0.0897763
 [7,] 0.3685067 0.3193068 0.4601516
 [8,] 0.0000000 0.2245057 0.1922279
 [9,] 0.2245057 0.0000000 0.1604022
[10,] 0.1922279 0.1604022 0.0000000

$average.between
[1] 0.7680693

$average.within
[1] 0.1413954

(...)

$clus.avg.silwidths
         1          2          3          4          5
 0.8616234  0.8269252  0.2727838  0.7614558  0.8142473
         6          7
-0.1954790  0.6117464
         8          9         10
 0.7245092  0.4113044  0.6319789

$avg.silwidth
[1] 0.6147748

(...)

Cluster validation is not about estimating the number of clusters!
The results of such a method still need to be validated.

5.1 Cluster validation by visualisation

Generally use different colours and symbols.
Here: projection methods.

Given: an n × p dataset X.
Find a p × k matrix C (e.g., k = 2) so that Y = XC is optimally “informative”.

Definition 5. The first k projection vectors $c_1, \ldots, c_k$ (defined by the choice of Q and R) are the vectors maximising

$$F_c = \frac{c' Q c}{c' R c}$$

subject to $c_i' R c_j = \delta_{ij}$, where $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ otherwise.

Corollary. The first k projection vectors of X are the eigenvectors of $R^{-1} Q$ corresponding to the k largest eigenvalues.

Definition 6. PCA is defined by Q = Cov(X) and R = $I_p$.
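The corollary translates into a short routine. The sketch below is one possible way to compute the projection vectors in base R, assuming R is positive definite: it uses a Cholesky factor of R to turn the problem into a symmetric eigenproblem, which also enforces the normalisation c_i' R c_j = delta_ij. The function name `projection_vectors` is hypothetical, and the built-in `iris` measurements serve only as stand-in data.

```r
# First k projection vectors for symmetric Q and positive definite R.
projection_vectors <- function(Q, R, k = 2) {
  U <- chol(R)                         # R = t(U) %*% U
  Uinv <- solve(U)
  M <- t(Uinv) %*% Q %*% Uinv          # same eigenvalues as solve(R) %*% Q
  M <- (M + t(M)) / 2                  # remove rounding asymmetry
  e <- eigen(M, symmetric = TRUE)
  C <- Uinv %*% e$vectors[, seq_len(k), drop = FALSE]
  list(C = C, values = e$values[seq_len(k)])
}

X <- as.matrix(iris[, 1:4])            # stand-in data
R <- diag(ncol(X))
pv <- projection_vectors(cov(X), R, k = 2)   # Definition 6: PCA

Y <- X %*% pv$C                        # projected data, n x k
print(pv$values)
print(prcomp(X)$sdev[1:2]^2)           # variances of the first two principal components
```

With Q = Cov(X) and R = I_p the two printed vectors agree, so the routine reproduces PCA. Other choices of Q and R give the other projection methods in this section.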

[Figure: scatterplot of Comp.2 against Comp.1 (principal components), points labelled by cluster (1 to 9, a).]

“Information” = variance. Clusters ignored.

Notation:
Let $x_{i1}, \ldots, x_{in_i}$ be the p-dimensional points of group $i = 1, \ldots, s$, $n = \sum_{i=1}^{s} n_i$.
Let $X_i = (x_{i1}, \ldots, x_{in_i})'$, $i = 1, \ldots, s$, and $X = (X_1', \ldots, X_s')'$. Let

$$m_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_{ij}, \qquad m = \frac{1}{n} \sum_{i=1}^{s} \sum_{j=1}^{n_i} x_{ij},$$

$$U_i = \sum_{j=1}^{n_i} (x_{ij} - m_i)(x_{ij} - m_i)', \qquad U = \sum_{i=1}^{s} U_i,$$

$$S_i = \frac{1}{n_i - 1} U_i, \qquad W = \frac{1}{n - s} U, \qquad B = \frac{1}{n(s-1)} \sum_{i=1}^{s} n_i (m_i - m)(m_i - m)',$$

that is, $S_i$ is the covariance matrix of group i with mean vector $m_i$, W is the pooled within-groups scatter matrix and B is the between-groups scatter matrix.

Definition 7. Discriminant coordinates (DCs, Rao 1952) are defined by Q = B and R = W.

Corollary. Only s − 1 eigenvalues of $W^{-1}B$ are larger than 0. The whole information about the mean differences can be displayed in s − 1 dimensions (cf. Gnanadesikan, 1977).

Use R-function plotcluster in fpc.
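For readers who want to see the computation behind such a plot, the following base-R sketch builds W and B as in the notation above and extracts the discriminant coordinates. It is an illustration only, not the fpc implementation; the built-in `iris` data with its three species stands in for a clustering with s = 3.

```r
X <- as.matrix(iris[, 1:4])    # stand-in data
g <- iris$Species              # stand-in cluster labels, s = 3
n <- nrow(X); p <- ncol(X); s <- nlevels(g)
m <- colMeans(X)

U <- matrix(0, p, p)           # sum of within-group scatter matrices
Bsum <- matrix(0, p, p)        # weighted scatter of group means
for (lev in levels(g)) {
  Xi <- X[g == lev, , drop = FALSE]
  mi <- colMeans(Xi)
  U <- U + crossprod(sweep(Xi, 2, mi))
  Bsum <- Bsum + nrow(Xi) * tcrossprod(mi - m)
}
W <- U / (n - s)
B <- Bsum / (n * (s - 1))

# Eigenvectors of solve(W) %*% B via a symmetric eigenproblem (W = t(Uc) %*% Uc).
Uc <- chol(W)
Uinv <- solve(Uc)
M <- t(Uinv) %*% B %*% Uinv
e <- eigen((M + t(M)) / 2, symmetric = TRUE)

n_positive <- sum(e$values > 1e-8 * e$values[1])
C <- Uinv %*% e$vectors[, seq_len(s - 1), drop = FALSE]
Y <- X %*% C                   # discriminant coordinates, n x (s - 1)
print(e$values)
```

Only s − 1 = 2 eigenvalues are numerically positive here, in line with the corollary, so two coordinates carry all the information about the differences between the three group means.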

[Figure: scatterplot of dc 2 against dc 1 (discriminant coordinates), points labelled by cluster (1 to 9, a).]

More than 3 clusters: cannot see everything in 2-d.

Difficulties with DC:
◮ Separation between cluster means is shown.
◮ All within-cluster covariance matrices are implicitly assumed equal.
◮ More than 3 clusters: cannot see everything in 2-d.
◮ DCs may still be dominated by outliers.

Christian Hennig Tutorial on mixture models (2)

Page 50: Tutorial on mixture models (2) - University College Londonucakche/presentations/cladagtutorial.pdf · Potential problems with mixture model-based clustering Using mclust (Gaussian

[Figure, built up over three slides: three scatterplots of x[,2] against x[,1], titled "Discriminant coordinate p=2>1", "Discriminant coordinate p=2>1 separates means" and "Asymmetric weighted coordinate p=2>1".]

Definition 8 (Hennig 2005). Let

$$B^{*} = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} (x_{1i} - x_{2j})(x_{1i} - x_{2j})',$$

denoting now by $x_{2j}$ all points that are not in cluster 1. ADCs for cluster 1 are defined by $Q = B^{*}$ and $R = S_1$.

Definition 9. Let

$$B^{**} = \frac{1}{n_1 \sum_{j=1}^{n_2} w_j} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} w_j (x_{1i} - x_{2j})(x_{1i} - x_{2j})', \quad \text{where}$$

$$w_j = \min\left(1, \frac{d}{(x_{2j} - m_1)' S_1^{-1} (x_{2j} - m_1)}\right), \quad j = 1, \ldots, n_2, \qquad (1)$$

$d > 0$ being some constant, for example the 0.99-quantile of the $\chi^2_p$-distribution.
AWCs for cluster 1 are defined by $Q = B^{**}$ and $R = S_1$.
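Definition 9 can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the fpc implementation: the function name awc_matrix, the arguments x1 (points of cluster 1) and x2 (all other points) and the simulated data are made up here. The last line assumes that, as for ordinary discriminant coordinates, the projection directions are eigenvectors of $R^{-1}Q = S_1^{-1}B^{**}$.

```r
# Hypothetical helper: weights w_j and matrix B** of Definition 9.
awc_matrix <- function(x1, x2, d = qchisq(0.99, df = ncol(x1))) {
  n1 <- nrow(x1)
  m1 <- colMeans(x1)                 # mean of cluster 1
  S1 <- cov(x1)                      # covariance matrix of cluster 1
  # squared Mahalanobis distance of every non-member to cluster 1
  md <- mahalanobis(x2, center = m1, cov = S1)
  w <- pmin(1, d / md)               # equation (1)
  B <- matrix(0, ncol(x1), ncol(x1))
  for (j in seq_len(nrow(x2))) {
    diffs <- sweep(x1, 2, x2[j, ])   # rows are x_{1i} - x_{2j}
    B <- B + w[j] * crossprod(diffs) # sum_i of outer products, weighted
  }
  list(w = w, B = B / (n1 * sum(w)), S1 = S1)
}

set.seed(1)
x1 <- matrix(rnorm(60), ncol = 2)                    # cluster 1
x2 <- rbind(matrix(rnorm(40, mean = 4), ncol = 2),   # other points
            c(50, 50))                               # one gross outlier
res <- awc_matrix(x1, x2)
round(range(res$w), 4)   # the outlier receives a very small weight

# Assumed projection step: eigenvectors of S1^{-1} B**
dirs <- eigen(solve(res$S1, res$B))$vectors
```

The weight of the outlier is far below those of the other points, which is what keeps it from dominating the projection.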

Motivation for weights: Consider $x_{2j} = m_1 + qv$, where $v$ is a unit vector w.r.t. $S_1$ giving the direction of the deviation of $x_{2j}$ from the mean $m_1$ of cluster 1 and $q > 0$ is the amount of deviation. The contribution of $x_{2j}$ to $B^{**}$ is, for $q$ large enough,

$$\sum_{i=1}^{n_1} \frac{d}{(x_{2j} - m_1)' S_1^{-1} (x_{2j} - m_1)} (x_{1i} - x_{2j})(x_{1i} - x_{2j})' \to n_1 d\, \frac{v v'}{v' S_1^{-1} v} \quad \text{for } q \to \infty.$$
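The limit can be checked numerically. The sketch below is an illustration with simulated data; the direction v, the sample and the helper name contribution are arbitrary choices made here, and the normalising constant of $B^{**}$ is left out, as in the display above.

```r
set.seed(2)
x1 <- matrix(rnorm(100), ncol = 2)   # cluster 1 (simulated)
n1 <- nrow(x1)
m1 <- colMeans(x1)
S1 <- cov(x1)
d  <- qchisq(0.99, df = 2)
v  <- c(1, 1) / sqrt(2)              # direction of deviation (arbitrary)

# Weighted contribution of the single point x_2j = m1 + q v
contribution <- function(q) {
  x2j <- m1 + q * v
  w <- min(1, d / mahalanobis(x2j, m1, S1))
  w * crossprod(sweep(x1, 2, x2j))
}

# Claimed limit n1 * d * v v' / (v' S1^{-1} v)
limit <- n1 * d * tcrossprod(v) / drop(t(v) %*% solve(S1) %*% v)

# Largest entrywise deviation from the limit for growing q
err <- sapply(c(1e1, 1e3, 1e5),
              function(q) max(abs(contribution(q) - limit)))
err
```

The deviation shrinks as q grows, so a point moving arbitrarily far away has a bounded influence.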

[Figures on five consecutive slides: asymmetric weighted coordinates plots (awc 1 against awc 2); points are plotted as their cluster labels (1-9, a).]

[Figure: asymmetric weighted coordinates plot (awc 1 against awc 2) of a dataset with two clusters; points are plotted as their cluster labels (1, 2).]

Small clusters will look “packed”.

Things to keep in mind:
◮ Clusters can still be heterogeneous in other directions.
◮ A cluster may be separated but surrounded. (Check cluster.stats.)
◮ Outliers are influential if they are members of the cluster to plot.

Alternative methods in Hennig (2005), plotcluster.

Also try “grand tour/2-d tour” (Asimov 1985) in Ggobi/rggobi!

5.2 Stability assessment

General principle for stability assessment:
◮ Generate several new datasets out of the original one.
◮ Cluster all these new datasets.
◮ Define a statistic to formalise how similar the new clusterings are to the original one.
◮ If they are very similar, the clustering is stable.

Most clusterings are unstable in one way or another.
Want to know which clusters are stable
⇒ here cluster-wise methodology, clusterboot in package fpc (Hennig 2007).

1. Use the Jaccard coefficient

$$\gamma(C, D) = \frac{|C \cap D|}{|C \cup D|}$$

to measure similarity between two subsets of a set.

2. Repeat B times steps 2-4: resample new data sets from the original one,

3. apply the same clustering method to them.

4. For $C \in \mathcal{C}$ record $m_i = \max_{D \neq C} \gamma(C, D)$, with $D$ ranging over the clusters found in the $i$-th resampled data set.

5. Use $\bar{\gamma} = \frac{1}{B} \sum_{i=1}^{B} m_i$ to assess stability of $C$.

Various methods to resample are possible.
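The steps above can be sketched in base R. This is a simplified illustration, not the code of clusterboot: kmeans stands in for the clustering method, the names jaccard and cluster_stability are invented here, and the data are simulated. Multiple points are discarded from each bootstrap sample, and similarities are computed only on points present in the resampled data set.

```r
jaccard <- function(C, D) length(intersect(C, D)) / length(union(C, D))

# Clusterwise stability by bootstrap; kmeans is only a stand-in method.
cluster_stability <- function(x, k, B = 20) {
  orig <- kmeans(x, centers = k, nstart = 10)$cluster
  m <- matrix(NA_real_, B, k)
  for (i in seq_len(B)) {
    # bootstrap sample of row indices, multiple points discarded
    idx <- unique(sample(nrow(x), replace = TRUE))
    new <- kmeans(x[idx, , drop = FALSE], centers = k, nstart = 10)$cluster
    for (cl in seq_len(k)) {
      # original cluster, restricted to the points that were resampled
      C <- intersect(which(orig == cl), idx)
      # m_i: best Jaccard match among the clusters of the new clustering
      m[i, cl] <- max(sapply(seq_len(k),
                             function(l) jaccard(C, idx[new == l])))
    }
  }
  colMeans(m)   # mean of the m_i for every original cluster
}

set.seed(3)
x <- rbind(matrix(rnorm(100, 0, 0.3), ncol = 2),
           matrix(rnorm(100, 5, 0.3), ncol = 2))
stab <- cluster_stability(x, k = 2)
stab
```

With two clusters this far apart the mean Jaccard values should be close to 1; overlapping clusters give visibly lower values.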

Use two different methods, which can discover different kinds of instability.

Bootstrap method, discarding multiple points.

Replacement by noise: Draw 5%, say, of points and replace them by uniform “noise”.
1. Sphere the dataset to unit covariance matrix.
2. Draw points from $U[-4, 4]^p$.
3. Rotate data back.

Problem with bootstrap: can only increase separation.
Problem with noise: unclear what “realistic” noise would be.
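A possible reading of the replacement-by-noise steps, as a base R sketch. The function replace_by_noise and its arguments are hypothetical, and sphering via the Cholesky factor of the sample covariance matrix is an assumption made here; fpc may do this differently.

```r
replace_by_noise <- function(x, prop = 0.05, range = 4) {
  n <- nrow(x)
  p <- ncol(x)
  ctr <- colMeans(x)
  R <- chol(cov(x))        # cov(x) = t(R) %*% R
  # Sphered data would be (x - ctr) %*% solve(R); noise is drawn on
  # that scale and then mapped back with the inverse transformation.
  nnoise <- ceiling(prop * n)
  idx <- sample(n, nnoise)                              # points to replace
  z <- matrix(runif(nnoise * p, -range, range), ncol = p)
  x[idx, ] <- sweep(z %*% R, 2, ctr, "+")               # rotate back
  list(data = x, replaced = idx)
}

set.seed(5)
x <- matrix(rnorm(200), ncol = 2) %*% matrix(c(2, 1, 0, 1), 2)
res <- replace_by_noise(x)
length(res$replaced)   # 5 of the 100 points were replaced
```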

[Figure: scatterplot of x2 against x1.]

For computing γ for a given original cluster and a cluster in the resampled dataset, use only points that are both in the original dataset and in the resampled one.

In practice, use B = 100 if time allows.
But this needs some patience.

Interpretation:

◮ 0.5 is the minimum v so that for a given partition it is possible for every cluster to find another partition so that the maximum γ is ≤ v.

◮ New partition with r clusters, original one with s > r ⇒ ∃ at least s − r clusters in the original partition for which no γ > v.

Consider clusters with max γ ≤ 0.5 as “dissolved”.
Demand mean γ ≫ 0.5 for stability.
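A toy example of the 0.5 threshold, with clusters given as sets of point indices (the index sets are made up for illustration):

```r
jaccard <- function(C, D) length(intersect(C, D)) / length(union(C, D))

C  <- 1:10              # an original cluster
D1 <- 1:5               # the new partition splits C into two halves
D2 <- 6:10
split_value <- max(jaccard(C, D1), jaccard(C, D2))
split_value             # 0.5: C counts as dissolved

D3 <- 1:9               # a new cluster that misses a single point of C
near_value <- jaccard(C, D3)
near_value              # 0.9
```

Splitting a cluster into two equal halves gives exactly 0.5, which is why values at or below 0.5 are read as dissolution.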

> trigonaboot <- clusterboot(trigonadata, B=20, multipleboot=FALSE, clustermethod=noisemclustCBI, nnk=0, G=1:15)

* Cluster stability assessment *
Cluster method: mclustBIC
Full clustering results are given as parameter result
of the clusterboot object, which also provides
further statistics of the resampling results.
Number of resampling runs: 20

Number of clusters found in data: 10

Clusterwise Jaccard bootstrap (omitting multiple points) mean:
[1] 1.0000000 1.0000000 0.9493590 0.9236111 0.9833333 0.6884722 1.0000000
[8] 0.9955763 0.9820907 0.9156313
dissolved:
[1] 0 0 0 2 0 3 0 0 0 0
recovered:
[1] 20 20 20 18 20 7 20 20 20 19
Clusterwise Jaccard replacement by noise mean:
[1] 1.0000000 1.0000000 0.9034211 0.9687500 1.0000000 0.5488095 1.0000000
[8] 0.9974430 1.0000000 0.9288795
dissolved:
[1] 0 0 0 1 0 13 0 0 0 0
recovered:
[1] 20 20 20 19 20 1 20 20 20 20

[Figure, two panels: scatterplot of x2 against x1; BIC against number of components for the mclust covariance models EII, VII, EEI, VEI, EVI, VVI, EEE, EEV, VEV, VVV.]

(For uniform plus Gaussian dataset)

* Cluster stability assessment *
Cluster method: mclustBIC
Number of resampling runs: 20

Number of clusters found in data: 6

Clusterwise Jaccard bootstrap (omitting multiple points) mean:
[1] 0.78226138 0.90698801 0.93042938 0.08628977 0.81728134 1.00000000
dissolved:
[1] 2 1 1 20 1 0
recovered:
[1] 17 18 18 0 18 20

Clusterwise Jaccard replacement by noise mean:
[1] 0.35669233 0.26304825 0.31162945 0.07258365 0.19932778 1.00000000
dissolved:
[1] 17 20 17 19 20 0
recovered:
[1] 0 0 0 1 0 20

Instabilities can result from
◮ features of the data,
◮ instabilities of the clustering method,
◮ mismatch between the two.

Stable clusters are not necessarily good.
(Fixing s = 1 is always stable.)
Unstable clusters can be tolerated if stability is not the aim.

6. Gaussian mixture models vs. other clustering methods

There is no single best clustering method.
Knowing the characteristics of the methods is needed to decide.
Choosing a clustering method is not equivalent to finding the “true” model.

Advantages of GMBC:
◮ Explicit probability model; delivers $\theta$, $\tau_{ij}$. It is possible to check how data relate to model assumptions.
  Note that all methods make implicit assumptions.
◮ BIC as method to estimate s.
  (Somewhat controversial, but situation is better than for other clustering methods.)
◮ GMBC is good for clusters with different covariance structures.
◮ Mixture setup may incorporate uniform noise or other distributions.

[Figure: three scatterplots of zaa[,2] against zaa[,1].]

Drawbacks of GMBC:
◮ Sometimes too flexible, unstable (particularly when estimating covariance model).
◮ GMBC neither guarantees small within-cluster distances, nor good separation between clusters.
  Non-Gaussian homogeneous data subsets will often be fitted by more than one not well separated Gaussian mixture component.
◮ Depends on EM-initialisation (as does k-means).

6.1 k-means, the fixed partition model and the CEM-algorithm

Definition 10. The k-means clustering of X is defined by

$$E_n(X) = \operatorname*{arg\,min}_{\{C_1, \ldots, C_k\} \text{ partition of } X} \sum_{i=1}^{n} \min_j \|x_i - \bar{x}_j\|_2^2,$$

where $\bar{x}_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$.

Estimation of k/s is not discussed here, but validation indexes are sometimes used.
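The criterion of Definition 10 can be evaluated directly and compared with what R's kmeans reports. A small sketch on simulated data; the helper kmeans_objective is a name introduced here.

```r
# Sum of squared distances of all points to the mean of their own cluster
kmeans_objective <- function(x, cluster) {
  sum(sapply(unique(cluster), function(j) {
    xj <- x[cluster == j, , drop = FALSE]
    sum(sweep(xj, 2, colMeans(xj))^2)
  }))
}

set.seed(4)
x <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(rnorm(100, 6), ncol = 2))
fit <- kmeans(x, centers = 2, nstart = 10)

obj <- kmeans_objective(x, fit$cluster)

# Same quantity in the "min over j" form of the definition:
# squared distance of every point to each fitted cluster mean
d2 <- sapply(1:2, function(j) rowSums(sweep(x, 2, fit$centers[j, ])^2))
obj_min <- sum(apply(d2, 1, min))

c(obj, obj_min, fit$tot.withinss)   # all three agree at a converged solution
```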

Definition 11. “Gaussian fixed partition model”:

$$f(X) = \prod_{i=1}^{n} \varphi_{a_{\kappa(i)},\, \sigma^2 I_p}(x_i),$$

where $\kappa : \{1, \ldots, n\} \mapsto \{1, \ldots, k\}$.

The k means are ML estimators for $a_1, \ldots, a_k$ (but inconsistent!),
assignment of points to $C_1, \ldots, C_k$ is ML for $\kappa$.

Very similar to mclust “EII” model.
Clusters are spherical and not equivariant.
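Why the k-means solution is ML in this model: for fixed $\sigma^2$ the log-likelihood equals $-\frac{np}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_i \|x_i - a_{\kappa(i)}\|^2$, so maximising it over the $a_j$ and $\kappa$ means minimising the k-means criterion. A numerical check on simulated data (fp_loglik and the choice $\sigma^2 = 1$ are assumptions made for this sketch):

```r
# Log-likelihood of the Gaussian fixed partition model
fp_loglik <- function(x, centers, kappa, sigma2) {
  mu <- centers[kappa, , drop = FALSE]   # a_{kappa(i)} for every point
  # spherical Gaussian = product of independent univariate normals
  sum(dnorm(x, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

set.seed(10)
x <- rbind(matrix(rnorm(60), ncol = 2),
           matrix(rnorm(60, 5), ncol = 2))
fit <- kmeans(x, centers = 2, nstart = 10)
sigma2 <- 1

ll <- fp_loglik(x, fit$centers, fit$cluster, sigma2)
# closed form: constant minus within-cluster sum of squares / (2 sigma^2)
closed <- -length(x) / 2 * log(2 * pi * sigma2) -
  fit$tot.withinss / (2 * sigma2)
all.equal(ll, closed)

# swapping the two labels gives a much lower likelihood
ll_swapped <- fp_loglik(x, fit$centers, 3 - fit$cluster, sigma2)
```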

[Figure, two panels: scatterplot of y against x with points plotted as cluster labels (1-3); discriminant coordinates plot (dc 1 against dc 2) with points plotted as cluster labels (1-9, a).]

trigona10 <- clusterboot(trigonadata, B=20, multipleboot=FALSE, clustermethod=kmeansCBI, k=10)

Clusterwise Jaccard bootstrap (omitting multiple points) mean:
[1] 0.1810657 0.8314384 0.6411663 0.7275444 0.5460139 0.5229203 0.4847722
[8] 0.3607726 0.7861804 0.3430862
Clusterwise Jaccard replacement by noise mean:
[1] 0.1656969 0.8172386 0.7362327 0.8139100 0.3887676 0.4390961 0.5134999
[8] 0.3692982 0.9522727 0.2937549

Good about k-means: it’s fast!
Good for small distances to cluster means.
Versions: k-medoids, trimmed k-means.
Fixed partition criteria available for other covariance models.

“CEM”-algorithm in flexmix (Celeux and Govaert 1992): in

$$Q^{(q)} = \sum_{i=1}^{s} \sum_{j=1}^{n} \tau_{ij}^{(q)} \left( \log \pi_i^{(q)} + \log f_i^{(q)}(x_j) \right),$$

replace $\tau_{ij}$ by $1(\kappa(j) = i)$. Faster but biased.

Fixed partition log-likelihood:

$$L_{fp}(X) = \sum_{i=1}^{s} \sum_{j=1}^{n} 1(\kappa(j) = i) \log f_i(x_j).$$

(k-means favours clusters of similar size.)
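A minimal CEM sketch for a one-dimensional Gaussian mixture in base R, to show where the hard assignment enters. This is not the flexmix code: the function cem_1d, the rank-based starting partition and the stopping rule are choices made here for illustration.

```r
cem_1d <- function(x, s = 2, max_iter = 50) {
  # crude starting partition: s groups of equal size along the sorted data
  kappa <- cut(rank(x, ties.method = "first"), s, labels = FALSE)
  for (q in seq_len(max_iter)) {
    f <- factor(kappa, levels = seq_len(s))
    # M-step on the current hard partition
    prop <- as.numeric(table(f)) / length(x)
    mu   <- as.numeric(tapply(x, f, mean))
    sdev <- as.numeric(tapply(x, f, sd))
    # E-step quantities: log pi_i + log f_i(x_j), one column per component
    score <- sapply(seq_len(s), function(i)
      log(prop[i]) + dnorm(x, mu[i], sdev[i], log = TRUE))
    # C-step: tau_ij is replaced by the indicator of the best component
    new <- max.col(score, ties.method = "first")
    # stop at convergence, or before a component would degenerate
    if (all(new == kappa) || min(tabulate(new, s)) < 2) break
    kappa <- new
  }
  # parameters refer to the last partition for which the M-step was run
  list(cluster = kappa, prop = prop, mean = mu, sd = sdev)
}

set.seed(6)
x <- c(rnorm(60, 0, 1), rnorm(40, 6, 1))
fit <- cem_1d(x)
fit$mean
fit$prop
```

Compared with EM, every point contributes to exactly one component in the M-step, which is what makes the procedure faster but biases the parameter estimates.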

6.2 Agglomerative hierarchical methods

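Definition 12. Let $\delta : \mathcal{P}(X) \times \mathcal{P}(X) \mapsto \mathbb{R}_0^+$ be a dissimilarity measure between data subsets. Let $C_n = \{\{x\} : x \in X\}$, $h_n = 0$. For $k = n - 1, \ldots, 1$:

$$(A_k, B_k) = \operatorname*{arg\,min}_{A, B \in C_{k+1}} \delta(A, B), \quad h_k = \delta(A_k, B_k),$$

$$C_k = \{A_k \cup B_k\} \cup C_{k+1} \setminus \{A_k, B_k\}.$$

$C = \bigcup_{k=1}^{n} C_k$ is called

a) Single linkage hierarchy, if $\delta(A, B) = \delta_S(A, B) = \min_{x_i \in A, x_j \in B} d_{ij}$,

b) Complete linkage hierarchy, if $\delta(A, B) = \delta_C(A, B) = \max_{x_i \in A, x_j \in B} d_{ij}$.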
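Definition 12 translates almost literally into a (slow) loop. The sketch below is for illustration only; agglomerate is a name introduced here, the minimum is taken over pairs of distinct clusters, and the merge heights $h_k$ are compared with those of R's hclust on simulated data.

```r
agglomerate <- function(d, linkage = c("single", "complete")) {
  linkage <- match.arg(linkage)
  agg <- if (linkage == "single") min else max
  d <- as.matrix(d)
  clusters <- as.list(seq_len(nrow(d)))   # C_n: all singletons
  heights <- numeric(0)
  while (length(clusters) > 1) {
    best <- Inf
    pair <- c(NA, NA)
    # find the pair (A_k, B_k) of distinct clusters minimising delta
    for (a in seq_len(length(clusters) - 1)) {
      for (b in (a + 1):length(clusters)) {
        del <- agg(d[clusters[[a]], clusters[[b]]])
        if (del < best) {
          best <- del
          pair <- c(a, b)
        }
      }
    }
    heights <- c(heights, best)                       # h_k
    merged <- c(clusters[[pair[1]]], clusters[[pair[2]]])
    clusters <- c(clusters[-pair], list(merged))      # C_k
  }
  heights
}

set.seed(7)
x <- matrix(rnorm(30), ncol = 2)
h_single   <- agglomerate(dist(x), "single")
h_complete <- agglomerate(dist(x), "complete")
all.equal(h_single,   hclust(dist(x), method = "single")$height)
all.equal(h_complete, hclust(dist(x), method = "complete")$height)
```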

Single linkage:

[Figure: cluster dendrogram from hclust (*, "single") on dist(trigonadata), axis "Height"; beside it a discriminant coordinates plot (dc 1 against dc 2) with points plotted as cluster labels (1-9, a-d).]

Get partition by cutting dendrogram at fixed h or s.
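In R, both ways of cutting are available through cutree. A short sketch on simulated data (the data and the height 2 are arbitrary choices for illustration):

```r
set.seed(8)
x <- rbind(matrix(rnorm(40, 0, 0.5), ncol = 2),
           matrix(rnorm(40, 5, 0.5), ncol = 2))
hc <- hclust(dist(x), method = "single")

by_s <- cutree(hc, k = 2)   # fixed number of clusters s
by_h <- cutree(hc, h = 2)   # fixed height h

table(by_s, by_h)           # here both cuts give the same partition
```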

Complete linkage:

[Figure: cluster dendrogram from hclust (*, "complete") on dist(trigonadata), axis "Height"; beside it a discriminant coordinates plot (dc 1 against dc 2) with points plotted as cluster labels (1-9, a).]

Single linkage:
Clusters are merged whenever there is a single small distance between them.
Resulting clusters are separated.
All between-cluster distances are large.

Complete linkage:
Clusters are not merged whenever there is a single large between-cluster distance.
Resulting clusters are homogeneous.
All within-cluster distances are small.
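Both properties can be verified on a cut dendrogram: with single linkage every between-cluster distance exceeds the cutting height, with complete linkage every within-cluster distance stays below it. A sketch with simulated data; check_cut and s = 4 are arbitrary choices made here.

```r
set.seed(9)
x <- matrix(runif(80), ncol = 2)
d <- dist(x)
dm <- as.matrix(d)

check_cut <- function(method, s = 4) {
  hc <- hclust(d, method = method)
  n <- nrow(x)
  # cut halfway between the merge that leaves s clusters and the next one
  h <- mean(hc$height[c(n - s, n - s + 1)])
  cl <- cutree(hc, h = h)
  same <- outer(cl, cl, "==")        # TRUE for pairs in the same cluster
  c(h = h,
    min_between = min(dm[!same]),    # smallest between-cluster distance
    max_within  = max(dm[same]))     # largest within-cluster distance
}

res_single   <- check_cut("single")
res_complete <- check_cut("complete")
res_single     # min_between is larger than h
res_complete   # max_within is smaller than h
```

Neither method controls the other quantity: single linkage clusters can have large within-cluster distances, and complete linkage clusters can lie close to each other.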

[Figure: three scatterplots of zaa[,2] against zaa[,1].]

... both are often too extreme for real situations, but for some applications they are methods of choice.

[Figure: cluster dendrogram from hclust (*, "single") on dist(hdata), axis "Height"; beside it three scatterplots of h2 against h1.]

7. Mixtures of generalised linear models: basics

7.1 The model

$Z = (y, X)$, where $Z = (z_1, \ldots, z_n)^t$,
$y = (y_1, \ldots, y_n)$, $X = (x_1, \ldots, x_n)^t$,
$y_i \in \mathbb{R}$, $x_i \in \mathbb{R}^p$, $i = 1, \ldots, n$.
$X$ fixed. $y_i$ random and independent.

Clustering idea: data belong together if they are characterised by the same way in which $y$ depends on $x$.
Points do not need to be similar to belong together.


\[ h_\theta(y|x) = \sum_{k=1}^{s} \pi_k f_{\beta_k^t x, \sigma_k}(y). \]

In standard GLM notation: $\beta_k^t x = g(\mu_k)$, $\mu_k = E_k(y|x)$.

Usually $\sigma_k^2 = \mathrm{Var}_k(y|x)$, not always needed:
Poisson regression: $f_{\beta^t x}$ is the Poisson($\lambda$)-density,
$\lambda = \exp(\beta^t x) = E(y|x) = \mathrm{Var}(y|x)$, or $g(\mu) = \log(\mu)$.

Some $\beta$-parameters may be equal across components.
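As a rough illustration of the mixture density above, the following base-R sketch evaluates $h_\theta(y|x)$ for Gaussian components with identity link. The function name mix_density and all parameter values are made up for illustration, and the intercept is handled by prepending a 1 to x (an assumption about how x is coded).

```r
# Density of an s-component Gaussian regression mixture at (y, x).
# beta:  (p+1) x s matrix, column k holds beta_k (first row = intercepts)
# sigma: length-s vector of component standard deviations
# prior: length-s vector of mixing proportions pi_k
mix_density <- function(y, x, beta, sigma, prior) {
  mu <- as.vector(crossprod(beta, c(1, x)))   # component means beta_k^t (1, x)
  sum(prior * dnorm(y, mean = mu, sd = sigma))
}

# Illustrative (hypothetical) parameter values, not estimates from the slides
beta  <- cbind(c(2, 0), c(0, 1))
sigma <- c(0.1, 0.05)
prior <- c(0.4, 0.6)
h <- mix_density(y = 2.0, x = 2.5, beta = beta, sigma = sigma, prior = prior)
```

For a Poisson mixture, dnorm would be replaced by dpois with rate exp(mu).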


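More generally:

\[ h_\theta(y|x) = \sum_{k=1}^{s} \pi_k(x, \alpha) f_{\beta_k^t x, \sigma_k}(y), \qquad \pi_k(x, \alpha) = \frac{\exp(\alpha_k^t x)}{\sum_{u=1}^{s} \exp(\alpha_u^t x)}, \]

where $\alpha = (\alpha_1, \ldots, \alpha_s)$, $\alpha_i = (\alpha_{i0}, \ldots, \alpha_{ip})^t \in \mathbb{R}^{p+1}$ is part of $\theta$.

Variables on which $\pi_k(x, \alpha)$ depends ("concomitant variables", flexmix terminology) could differ from those on which $\beta_k^t x$ depends,
but could be the same as well.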
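The multinomial logit form of $\pi_k(x, \alpha)$ can be sketched in base R as follows. The name concomitant_prob and the values of alpha are hypothetical; fixing $\alpha_1 = 0$ as a baseline is a common identifying convention and is assumed here rather than taken from the slides.

```r
# Component probabilities pi_k(x, alpha) under a multinomial logit model.
# alpha: (p+1) x s matrix, column k holds alpha_k (first row = intercepts)
# x:     covariate vector of length p (a 1 is prepended for the intercept)
concomitant_prob <- function(x, alpha) {
  eta <- as.vector(crossprod(alpha, c(1, x)))  # linear predictors alpha_k^t (1, x)
  eta <- eta - max(eta)                        # shift for numerical stability
  exp(eta) / sum(exp(eta))
}

# Illustrative values only; the first column is the baseline alpha_1 = 0
alpha <- cbind(c(0, 0), c(1, -0.5), c(-2, 1))
p <- concomitant_prob(x = 2, alpha = alpha)
```

Subtracting the maximum before exponentiating leaves the probabilities unchanged but avoids overflow for large linear predictors.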


7.2 An example

Clusterwise regression: $f_{\beta_k^t x, \sigma_k}(y) = \varphi_{\beta_k^t x, \sigma_k^2}(y)$,

$g$ identity function, $E(y|x) = \beta_k^t x$.

[Figure: scatterplot of tonedata$tuned against tonedata$stretchratio.]

[Figure: the same scatterplot with points labelled by component (1 to 4), and a plot of AIC, BIC and ICL against the number of components (1 to 5).]

7.3 Identifiability

No two non-equivalent parameter vectors should parameterise the same distribution.
Depends on "equivalence" of parameter vectors.
Mixtures: equivalence allows for permutation of components.
Restrict parameter space to exclude equal parameters for different components and zero proportions.

Identifiability is necessary for consistency.
In practice, it's necessary for interpreting parameters.
Problems arise for dummy variables and near non-identifiability.


Aspects of identifiability for GLM mixtures:
◮ Identifiability of the univariate mixture for given x.
  (Fulfilled for Gaussian, Poisson, not necessarily Binomial.)
◮ The x must allow identification of β (and α).
  s = 1: $(X^t X)^{-1}$ must exist.
  Hennig (2000): for regression mixtures with assignment independence,
  s < h is required, where h is the number of (p − 1)-dimensional hyperplanes needed to cover X.
  Grün and Leisch (2008): same for GLM.
  Assignment dependence (with α): more complicated.

[Figure: two scatterplots of y against x.]

7.4 ML-estimation and the EM-algorithm

Nothing essentially new here.
flexmix uses random initialisation.
Spurious or degenerating solutions for σ → 0.
Use stepFlexmix for optimising over several initialisations.
Use set.seed for making results reproducible.


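E-step: Given $\theta^{(i)}$, for $j = 1, \ldots, n$, $k = 1, \ldots, s$,

\[ \tau_{jk}^{(i+1)} = \frac{\pi_k(x_j, \alpha^{(i)})\, f_{\beta_k^{(i)t} x_j, \sigma_k^{(i)}}(y_j)}{\sum_{u=1}^{s} \pi_u(x_j, \alpha^{(i)})\, f_{\beta_u^{(i)t} x_j, \sigma_u^{(i)}}(y_j)}. \]

M-step: Given $\theta^{(i)}$ through $\tau_{jk}^{(i+1)}$, the expected log-likelihood is

\[ Q(\theta^{(i+1)}|\theta^{(i)}) = Q_1(\beta^{(i+1)}, \sigma^{(i+1)}|\theta^{(i)}) + Q_2(\alpha^{(i+1)}|\theta^{(i)}), \]

\[ Q_1(\beta^{(i+1)}, \sigma^{(i+1)}|\theta^{(i)}) = \sum_{j=1}^{n} \sum_{k=1}^{s} \tau_{jk}^{(i+1)} \log\left(f_{\beta_k^{(i+1)t} x_j, \sigma_k^{(i+1)}}(y_j)\right), \]

\[ Q_2(\alpha^{(i+1)}|\theta^{(i)}) = \sum_{j=1}^{n} \sum_{k=1}^{s} \tau_{jk}^{(i+1)} \log\left(\pi_k(x_j, \alpha^{(i+1)})\right). \]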


Maximise $Q_1$ and $Q_2$ separately.
$Q_1$ gives new $\beta^{(i+1)}, \sigma^{(i+1)}$ from weighted ML (WML; weighted LS for Gaussian regression).

$Q_2$ gives new $\alpha^{(i+1)}$ using WML of multinomial logit models.
Without α-parameters

\[ \pi_k^{(i+1)} = \frac{1}{n} \sum_{j=1}^{n} \tau_{jk}^{(i+1)}, \quad k = 1, \ldots, s. \]
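The E- and M-steps can be sketched in base R for the simplest case: Gaussian clusterwise regression without α-parameters. This is a minimal illustration under those assumptions, not flexmix's implementation; the function name em_regmix, the fixed number of iterations and the simulated data are all hypothetical.

```r
# EM for an s-component mixture of Gaussian linear regressions
# (no concomitant model). X is the design matrix including the intercept column.
em_regmix <- function(y, X, s, iter = 200, seed = 1) {
  set.seed(seed)
  n <- length(y)
  tau <- matrix(runif(n * s), n, s)
  tau <- tau / rowSums(tau)                 # random initial posterior weights
  loglik <- numeric(iter)
  for (i in seq_len(iter)) {
    # M-step: proportions, weighted least squares and weighted ML for sigma
    prior <- colMeans(tau)
    beta  <- sapply(seq_len(s), function(k) lm.wfit(X, y, w = tau[, k])$coefficients)
    mu    <- X %*% beta                     # n x s matrix of fitted means
    sigma <- sqrt(colSums(tau * (y - mu)^2) / colSums(tau))
    # E-step: posterior probabilities tau_jk at the updated parameters
    dens <- sapply(seq_len(s), function(k) prior[k] * dnorm(y, mu[, k], sigma[k]))
    loglik[i] <- sum(log(rowSums(dens)))    # log-likelihood at current parameters
    tau <- dens / rowSums(dens)
  }
  list(prior = prior, beta = beta, sigma = sigma, tau = tau, loglik = loglik)
}

# Simulated data loosely resembling two regression lines (illustration only)
set.seed(42)
x <- runif(200, 1, 3)
z <- rbinom(200, 1, 0.5)
y <- ifelse(z == 1, 2 + 0 * x, 0 + 1 * x) + rnorm(200, sd = 0.05)
fit <- em_regmix(y, cbind(1, x), s = 2)
```

A single random start like this one can end in a poor local optimum, which is why several initialisations are usually compared.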


7.5 Model selection

Can use standard AIC and BIC.
AIC overestimates s asymptotically with nonzero probability.
BIC probably consistent. (No strong theory.)

flexmix also offers ICL, where cluster indicators replace $\tau_{jk}$ in likelihood/BIC (related to CEM algorithm).
ICL often produces smaller s.


flexmix does implicit model selection by bounding $\pi_k$ from below (0.05) in the algorithm.
BIC plot shows s = 5 in tone example, but only 4 components fitted.
Specify minprior in control list.

Good if clusters are supposed to be large,
but messy with outliers
(may want to delete them first).

Generally, many local optima can be found.
Increase nrep in stepFlexmix.


8. Mixtures of linear models

8.1 Fit and visualisation

Tone perception example.


> set.seed(73579)
> tonefm <- stepFlexmix(tuned~stretchratio,data=tonedata,k=1:5,nrep=3)
> plot(tonefm)

> tonefm

Call:
stepFlexmix(tuned ~ stretchratio, data = tonedata, k = 1:5, nrep

  iter converged k k0 logLik    AIC    BIC    ICL
1    2      TRUE 1  1    9.3  -12.7   -3.7   -3.7
2   15      TRUE 2  2  141.1 -268.3 -247.3 -210.9
3   41      TRUE 3  3  154.6 -287.3 -254.2 -189.2
4   55      TRUE 4  4  161.8 -293.6 -248.5 -138.7
5   85      TRUE 4  5  246.0 -462.0 -416.8 -357.6


> tonefm4 <- getModel(tonefm,which="BIC")
> plot(tonefm4)
> summary(tonefm4)

Call:
stepFlexmix(tuned ~ stretchratio, data = tonedata, k = 5, nrep = 3)

       prior size post>0  ratio
Comp.1 0.221   23    123 0.1870
Comp.2 0.333   57    109 0.5229
Comp.3 0.345   57     63 0.9048
Comp.4 0.101   13    144 0.0903

'log Lik.' 246.0216 (df=15)
AIC: -462.0433   BIC: -416.8838
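The reported criteria can be checked by hand from the log-likelihood and the degrees of freedom. The sketch below uses the standard definitions AIC = -2 logLik + 2 df and BIC = -2 logLik + log(n) df; the sample size n = 150 is an assumption (it is not printed in the summary, but it reproduces the reported BIC).

```r
loglik <- 246.0216   # 'log Lik.' from the summary
df     <- 15         # number of free parameters from the summary
n      <- 150        # assumed number of observations (not shown in the output)

aic <- -2 * loglik + 2 * df        # Akaike information criterion
bic <- -2 * loglik + log(n) * df   # Bayesian information criterion
c(AIC = aic, BIC = bic)
```

Both values agree with the printed ones up to rounding of the log-likelihood.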


Try
str(tonefm4)
tonefm4@cluster
parameters(tonefm4)

[Figure: rootogram of posterior probabilities > 1e-04, panels Comp. 1 to Comp. 4.]

> rtonefm4 <- refit(tonefm4)
> summary(rtonefm4)
$Comp.1
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.796465   0.102142 17.5879  < 2e-16 ***
stretchratio 0.086634   0.038654  2.2413  0.02501 *

$Comp.2
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.980572   0.028561 69.3455   <2e-16 ***
stretchratio 0.018846   0.013008  1.4487   0.1474

$Comp.3
              Estimate Std. Error  z value Pr(>|z|)
(Intercept)  0.0035112  0.0039481   0.8894   0.3738
stretchratio 0.9987576  0.0018175 549.5323   <2e-16 ***

$Comp.4
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.086369   0.256562 -0.3366   0.7364
stretchratio  0.987769   0.108148  9.1335   <2e-16 ***
> plot(rtonefm4)

[Figure: plot(rtonefm4), coefficient estimates for (Intercept) and stretchratio, panels Comp. 1 to Comp. 4.]

Confidence intervals are not correct!
(And level unclear, probably 0.95.)


plotregcluster function in mixturetutorial.R

[Figures: four plots of residual against fit, points labelled by component (1 to 4).]

8.2 Concomitant variables and assignment dependence

[Figure: two scatterplots of first_flight against power, points labelled by component (1, 2).]

> jetflexchange <- flexmix(first_flight~power,data=jjet,k=2,concomitant=FLXPmultinom(~ first_flight))

> rjet <- refit(jetflexchange)
> summary(rjet,which="concomitant")
$Comp.2
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  110.180    248.945  0.4426   0.6581
power        -40.055     90.592 -0.4421   0.6584

Probability for component 2:
exp(110.18 - 40.055 * x) / (exp(110.18 - 40.055 * x) + 1)
x = 2.7 => 0.88
x = 2.8 => 0.12
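The two probabilities can be reproduced from the printed estimates with the logistic function. This sketch only plugs the coefficients from the summary into plogis from base R; the function name prob_comp2 is made up for illustration.

```r
a <- 110.180   # concomitant intercept for Comp.2 (from the summary)
b <- -40.055   # concomitant coefficient of power for Comp.2 (from the summary)

# Probability of component 2 as a function of power x (two-component logit)
prob_comp2 <- function(x) plogis(a + b * x)
probs <- round(prob_comp2(c(2.7, 2.8)), 2)
```

The steep drop between x = 2.7 and x = 2.8 reflects the large coefficient in absolute value, which comes with a very large standard error.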


8.3 Mixtures for discrete random effects

Univariate regression with random effect:
$i = 1, \ldots, I$ subjects, $j = 1, \ldots, J$ measurements,
$x_1, \ldots, x_4$ the four ages,
$b_1, \ldots, b_I$ the subject-wise random effects.

\[ y_{ij} = \beta_0 + \beta_1 x_j + b_i + e_{ij}, \quad e_{ij} \sim N(0, \sigma^2), \quad b_i \sim N(0, \sigma_b^2). \]

This is a continuous mixture.
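One way to see the mixture structure, as a sketch assuming the errors $e_{ij}$ are independent of each other and of $b_i$ (an assumption not spelled out on the slide): the joint density of subject $i$'s measurements is the Gaussian regression density averaged over the distribution of the random intercept, with $\varphi_{m, v}$ denoting the normal density with mean $m$ and variance $v$.

```latex
\[
  h(y_{i1}, \ldots, y_{iJ} \mid x_1, \ldots, x_J)
  = \int \prod_{j=1}^{J} \varphi_{\beta_0 + \beta_1 x_j + b,\; \sigma^2}(y_{ij})\;
    \varphi_{0,\; \sigma_b^2}(b)\, db .
\]
```

The normal density of $b$ plays the role of the mixing distribution, in place of finitely many proportions.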

[Figure: distance against age for the first subject.]

> ortholme <- lme(distance~age,random=~1|Subject)
> summary(ortholme)
Linear mixed-effects model fit by REML
 Data: NULL
       AIC      BIC    logLik
  455.0025 465.6563 -223.5013

Random effects:
 Formula: ~1 | Subject
        (Intercept) Residual
StdDev:    2.114724 1.431592

Fixed effects: distance ~ age
                Value Std.Error DF  t-value p-value
(Intercept) 16.761111 0.8023952 80 20.88885       0
age          0.660185 0.0616059 80 10.71626       0


flexmix model:

\[ h_\theta(y|x) = \sum_{k=1}^{s} \pi_k f_{\beta_k^t x, \sigma_k}(y), \]

with slope $\beta_{k1}$ constant over components,
intercept $\beta_{k0} \simeq \beta_0 + b_i$ takes
$s$ different values with probabilities $\pi_1, \ldots, \pi_s$.


set.seed(444444)
orthomixrandom <- stepFlexmix(distance~1|Subject,k=1:4,model=FLXMRglmfix(fixed=~age))

> table(ortho3@cluster,Sex)
   Sex
    Male Female
  1   28      4
  2   36     24
  3    0     16

[Figure: distance against age.]

> summary(ortho3)

Call:
stepFlexmix(distance ~ 1 | Subject, model = FLXMRglmfix(fixed =
    k = 3)

       prior size post>0 ratio
Comp.1 0.322   32     96 0.333
Comp.2 0.529   60     80 0.750
Comp.3 0.149   16     40 0.400

'log Lik.' -217.9577 (df=9)
AIC: 453.9155   BIC: 465.578

> parameters(ortho3)
                     Comp.1     Comp.2     Comp.3
coef.age          0.6601852  0.6601852  0.6601852
coef.(Intercept) 19.0737997 16.3302297 13.2781871
sigma             2.3542300  1.0952854  1.4695694


9. Mixtures of generalised linear models

Fabric faults data,
example for Poisson regression.

Grün and Leisch (2008c) fit
$f_{\beta^t x}$ Poisson($\lambda$)-density with $\lambda = \exp(\beta^t x) = E(y|x) = \mathrm{Var}(y|x)$
and log(Length) as x-variable.
Fix the slope of log(Length) to be equal across components
and s = 2.

[Figure: Faults against Length.]

> set.seed(151515)
> fabricmix <- stepFlexmix(Faults~1,model=FLXMRglmfix(family="poisson",fixed=~log(Length)),data=fabricfault,k=2,nrep=5)
> summary(fabricmix)

       prior size post>0 ratio
Comp.1 0.204    5     32 0.156
Comp.2 0.796   27     32 0.844

'log Lik.' -86.33121 (df=4)
AIC: 180.6624   BIC: 186.5254

> parameters(fabricmix)
                     Comp.1     Comp.2
coef.log(Length)  0.8012432  0.8012432
coef.(Intercept) -2.3778178 -3.1424896
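To read these parameters on the scale of the data, the log link can be inverted by hand. The sketch below plugs the printed coefficients into $\lambda = \exp(\beta^t x)$; the function name expected_faults and the roll length of 500 are illustrative choices, not part of the original analysis.

```r
slope     <- 0.8012432                                  # common slope of log(Length)
intercept <- c(Comp.1 = -2.3778178, Comp.2 = -3.1424896) # component intercepts

# Expected number of faults per component for a roll of length len
expected_faults <- function(len) exp(intercept + slope * log(len))
rates <- expected_faults(500)
```

Because the slope is shared, the ratio of the two component means is the same for every length, namely exp of the difference in intercepts.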


Try Length without log and without fixing the slope:

[Figure: Faults against Length.]

> set.seed(151515)
> fabricmix2 <- stepFlexmix(Faults~Length,model=FLXMRglm(family="poisson"),data=fabricfault,k=2,nrep=5)
> summary(fabricmix2)

       prior size post>0 ratio
Comp.1  0.57   22     31 0.710
Comp.2  0.43   10     31 0.323

'log Lik.' -83.78301 (df=5)
AIC: 177.5660   BIC: 184.8947

> parameters(fabricmix2)
                       Comp.1       Comp.2
coef.(Intercept) 1.8367519820 -0.024776572
coef.Length      0.0003043199  0.003569731


In principle, flexmix works in the same way as for clusterwise regression.
Just need a suitable model. For example, available as well:
Binomial regression with logit link as
model=FLXMRglm(family="binomial").
See insecticide dataset effect.dat, cladagex.R.
GLMs need different residuals for diagnostics, though!
