Likelihood-Based Finite Mixture Models for Ordinal Data. Daniel Fernández, Fundació Sant Joan de Déu, Universitat Politècnica de Catalunya, [email protected]. Seminari del Servei d'Estadística Aplicada & Grup de Recerca Advanced Stochastic Modelling, Universitat Autònoma de Barcelona, Feb 27th, 2020.

Page 1: Likelihood-Based Finite Mixture Models for Ordinal Data · 2020. 2. 27. · 1. Model-based clustering I Model-based clustering: process of clustering via statistical models, typically

Likelihood-Based Finite Mixture Models for Ordinal Data

Daniel Fernández
Fundació Sant Joan de Déu ⇒ Universitat Politècnica de Catalunya

[email protected]

Seminari del Servei d'Estadística Aplicada & Grup de Recerca Advanced Stochastic Modelling
Universitat Autònoma de Barcelona
Feb 27th, 2020

Page 2

Acknowledgments

- Servei d'Estadística Aplicada & Grup de Recerca Advanced Stochastic Modelling, Universitat Autònoma de Barcelona
- School of Mathematics and Statistics at Victoria University of Wellington, New Zealand: Prof. Richard Arnold, Prof. Emer. Shirley Pledger, AProf. Ivy Liu

Page 3

Outline

1 Background.
  - Ordinal data. Motivation.
  - Stereotype model. Definition and standard models.
  - Stereotype model including clustering.

2 Model fitting.

3 Example.
  - Level of depression data.
  - Ordinal data visualization: Spaced mosaic plots and fuzziness.

4 Bayesian inference approach: RJMCMC.

5 Summary.

Page 4

1. Ordinal data

The response variable has an ordinal categorical scale. Ordinal data are widely used in areas such as marketing and the social, medical and ecological sciences.

- Pain scale.
- Likert scale: "strongly disagree", "disagree", "agree", or "strongly agree" in a survey.
- The Braun-Blanquet cover-abundance scale is very common in vegetation analysis.
- The degree of dissimilarity among the different levels of the scale is not necessarily always the same.

Page 5

1. Ordinal Data and Goal

- Data represented as a matrix Y with dimensions n × m (n could be questions, m could be subjects), where

$$ y_{ij} \in \{1, \dots, q\}, \qquad i = 1, \dots, n, \quad j = 1, \dots, m, $$

  with q categories.
- Note: no covariates available.
- For example, questionnaires to assess levels of depression:
  - n = 13 questions (rows).
  - m = 151 individuals (columns).
  - q = 4 categories: 1 to 4, with higher scores indicating higher levels of depression.

Page 6

1. Ordinal Data and Goal

- Goals:
  - Can we group patients/questions together?
  - Which questions or patients tend to be linked with higher values of the ordinal response?

Page 7

1. Motivation

- Minimal research on clustering methods that focus on ordinal data.
- Most current methods are based on mathematical techniques (e.g. distance-based algorithms) ⇒ Neither statistical inference nor model selection.
- Recent work (Fernández et al., 2016): fuzzy biclustering via finite mixture models for ordinal data ⇒ Statistical inference and model selection.

Page 10

1. Motivation

Source: David Sontag, NYU

- Clusters may overlap.
- Some clusters may be "wider" than others.
- Distances can be deceiving!
- Try a probabilistic model:
  - allows overlaps
  - allows clusters of different size
  - allows a soft/fuzzy clustering

Page 11

1. Motivation

Figure panels: Hard clustering; Fuzzy clustering.

Page 12

1. Model-based clustering

- Model-based clustering: the process of clustering via statistical models, typically Finite Mixture Models (FMM).
- Finite mixture models: a way of clustering that reduces dimensionality and identifies patterns related to the heterogeneity of the data (e.g. rows/columns with a similar effect on the response).

Page 13

1. Model-based clustering

red line - what we see

Page 14

1. Model-based clustering

- Our research: model-based clustering for ordinal data, with components within the FMM ⇒ Stereotype model.

Page 15

1. Stereotype Model. Formulation

- Stereotype model (Anderson, J. A., 1984):

$$ \log\left(\frac{P[y_{ij} = k \mid x]}{P[y_{ij} = 1 \mid x]}\right) = \mu_k + (\phi_k \beta') x, \qquad k = 2, \dots, q $$

  These are the q − 1 log odds of category k against category 1, with the first category as the baseline.
- β: the coefficients of the covariates in the linear predictor are assumed to be the same for all categories.
- φk: "score" for the response category k.
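The log odds above determine the category probabilities once the baseline is fixed at µ1 = 0 and φ1 = 0. A minimal Python sketch of that inversion follows; the function name `stereotype_probs`, the scalar `eta` standing for β′x, and all numeric values are illustrative assumptions, not taken from the talk.

```python
import math

def stereotype_probs(mu, phi, eta):
    """Category probabilities P[y = k], k = 1..q, under the stereotype model.

    mu  : intercepts (mu_1, ..., mu_q) with mu_1 = 0
    phi : scores (phi_1, ..., phi_q) with phi_1 = 0 and phi_q = 1
    eta : value of the linear predictor beta'x (hypothetical scalar input)
    """
    # Log odds of each category against the baseline category 1.
    log_odds = [m + p * eta for m, p in zip(mu, phi)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_odds)
    weights = [math.exp(v - top) for v in log_odds]
    total = sum(weights)
    return [w / total for w in weights]

mu = [0.0, 0.5, -0.2, -1.0]   # hypothetical intercepts
phi = [0.0, 0.3, 0.8, 1.0]    # hypothetical ordered scores
probs = stereotype_probs(mu, phi, eta=1.2)
```

Taking `math.log(probs[k] / probs[0])` recovers `mu[k] + phi[k] * eta`, which is the defining equation of the model.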

Page 16

1. Stereotype Model. Formulation

- Stereotype model:

$$ \log\left(\frac{P[y_{ij} = k \mid x]}{P[y_{ij} = 1 \mid x]}\right) = \mu_k + (\phi_k \beta') x, \qquad k = 2, \dots, q $$

- Nothing in the unconstrained stereotype model treats the response as ordinal.
- Including an increasing order constraint (Anderson, J. A., 1984),

$$ 0 = \phi_1 \le \phi_2 \le \dots \le \phi_q = 1, $$

  captures the ordinal nature of the outcomes.
- The model has received more attention since Agresti (2010, Ch. 4) discussed it in his book.

Page 18

1. Stereotype Model. Scores φk Interpretation

The fitted score parameters φk determine the spacing among categories.

Level of depression data: φ1 = 0, φ2 = 0.347, φ3 = 0.853, φ4 = 1.

Page 19

1. Stereotype Model. Scores φk Interpretation

Stereotype model for categories a and b:

$$ \log\left(\frac{P[y_{ij} = a \mid x]}{P[y_{ij} = b \mid x]}\right) = \log\left(\frac{P[y_{ij} = a \mid x] / P[y_{ij} = 1 \mid x]}{P[y_{ij} = b \mid x] / P[y_{ij} = 1 \mid x]}\right) = (\mu_a - \mu_b) + (\phi_a - \phi_b)\beta' x . $$

$$ 0 = \phi_1 \le \dots \le \phi_a \le \dots \le \phi_b \le \dots \le \phi_q = 1 $$

- The larger the difference (φa − φb) in absolute value ⇒ The more the odds of a versus b are influenced by x.

Page 20

1. Stereotype Model. Scores φk Interpretation

- If φa = φb ⇒ the logit between categories a and b is the constant µa − µb
  ⇒ The covariates x do not distinguish between a and b
  ⇒ We could collapse the categories a and b in our data.
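The role of the score difference can be checked numerically. The sketch below evaluates the log odds between two categories, (µa − µb) + (φa − φb)β′x, for several values of the predictor; the helper name `pairwise_log_odds` and all parameter values are hypothetical, with φ2 = φ3 chosen on purpose to show the collapsing case.

```python
def pairwise_log_odds(mu, phi, eta, a, b):
    """Log of P[y = a] / P[y = b] under the stereotype model (1-based a, b).

    eta stands for the linear predictor beta'x (hypothetical scalar input).
    """
    return (mu[a - 1] - mu[b - 1]) + (phi[a - 1] - phi[b - 1]) * eta

mu = [0.0, 0.4, 0.1, -0.6]   # hypothetical intercepts
phi = [0.0, 0.5, 0.5, 1.0]   # hypothetical scores with phi_2 = phi_3

etas = (-2.0, 0.0, 2.0)
# Categories 2 and 3 share a score, so their log odds ignore the predictor.
flat = [pairwise_log_odds(mu, phi, eta, 2, 3) for eta in etas]
# Categories 4 and 1 have the largest score gap, so they react most strongly.
steep = [pairwise_log_odds(mu, phi, eta, 4, 1) for eta in etas]
```

Here `flat` stays at µ2 − µ3 for every value of the predictor, while `steep` changes by one unit per unit change in β′x.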

Page 21

1. Stereotype Model. Software

- Stereotype model:
  - STATA module called SOREG (Lunt, 2001)
  - R package called ordinalgmifs (Archer et al., 2014)
  - R package VGAM (Yee, 2008): it is not able to add the monotonic constraint in the score parameters.
  - R package called clustord (Fernández and Ryan, soon on CRAN) https://github.com/vuw-clustering/clustord

Page 22

1. Stereotype Model. Main effects

- Stereotype model:

$$ \log\left(\frac{P[y_{ij} = k \mid x]}{P[y_{ij} = 1 \mid x]}\right) = \mu_k + \phi_k \beta' x, \qquad k = 2, \dots, q $$

- Build up β′x from the row and column effects on yij (Fernández et al., 2016).

Page 23

1. Stereotype Model. Main effects

- Main effects model:

$$ \log\left(\frac{P[y_{ij} = k]}{P[y_{ij} = 1]}\right) = \mu_k + \phi_k(\alpha_i + \beta_j), \qquad k = 2, \dots, q, \quad i = 1, \dots, n, \quad j = 1, \dots, m $$

- αi: interpreted as the effect of the rows.
- βj: interpreted as the effect of the columns.
- Identifiability constraints:

$$ \sum_i \alpha_i = \sum_j \beta_j = 0, \qquad \mu_1 = 0, \qquad 0 = \phi_1 \le \phi_2 \le \dots \le \phi_q = 1. $$

Page 24

1. Model-based clustering

- The main effects model has 2q + n + m − 5 independent parameters:

$$ \log\left(\frac{P[y_{ij} = k]}{P[y_{ij} = 1]}\right) = \mu_k + \phi_k(\alpha_i + \beta_j), \qquad k = 2, \dots, q, \quad i = 1, \dots, n, \quad j = 1, \dots, m $$

- Avoid the term αi + βj, which overspecifies the data structure ⇒ Clustering via finite mixture models in order to reduce dimensionality (McLachlan, G. and Peel, D., 2000).
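The count 2q + n + m − 5 follows from the identifiability constraints: q − 1 free intercepts, q − 2 free scores, n − 1 row effects and m − 1 column effects. The sketch below spells this out in Python. The second function is an assumption about how the count changes once rows and columns are replaced by R and C clusters (cluster effects plus mixing proportions); it is an illustration, not a formula stated in the talk.

```python
def main_effects_npar(q, n, m):
    """Independent parameters of the main effects stereotype model."""
    n_mu = q - 1      # mu_2..mu_q, since mu_1 = 0
    n_phi = q - 2     # phi_2..phi_{q-1}, since phi_1 = 0 and phi_q = 1
    n_alpha = n - 1   # sum-to-zero constraint on the row effects
    n_beta = m - 1    # sum-to-zero constraint on the column effects
    return n_mu + n_phi + n_alpha + n_beta

def biclustering_npar(q, R, C):
    """Assumed count for mu_k + phi_k(alpha_r + beta_c) with R row and C column clusters."""
    n_effects = (R - 1) + (C - 1)   # cluster effects, one constraint each
    n_mixing = (R - 1) + (C - 1)    # mixing proportions sum to one
    return (q - 1) + (q - 2) + n_effects + n_mixing

full = main_effects_npar(q=4, n=13, m=151)     # 2*4 + 13 + 151 - 5
clustered = biclustering_npar(q=4, R=2, C=3)
```

With q = 4, n = 13 and m = 151 the main effects model needs 167 parameters, while a 2 × 3 biclustering needs 11 under the stated assumption, which is the dimensionality reduction the slide refers to.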

Page 25

1. Model-based clustering - Column clustering

For example, column clustering. We change from the main effects model

$$ \log\left(\frac{P[y_{ij} = k]}{P[y_{ij} = 1]}\right) = \mu_k + \phi_k(\alpha_i + \beta_j), \qquad j = 1, \dots, m $$

to

$$ \log\left(\frac{P[y_{ij} = k \mid j \in c]}{P[y_{ij} = 1 \mid j \in c]}\right) = \mu_k + \phi_k(\alpha_i + \beta_c), \qquad c = 1, \dots, C < m $$

where βc is interpreted as the effect of the column cluster c.

Page 27

1. Model-based clustering. Biclustering

- General formulation of model-based clustering (biclustering):

$$ \log\left(\frac{P[y_{ij} = k \mid i \in r, j \in c]}{P[y_{ij} = 1 \mid i \in r, j \in c]}\right) = \mu_k + \phi_k(\alpha_r + \beta_c), \qquad k = 2, \dots, q $$

- αr: interpreted as the effect of the row cluster r.
- βc: interpreted as the effect of the column cluster c.
- Constraints:

$$ \alpha_1 = \beta_1 = 0 \quad \left(\text{or } \sum_r \alpha_r = \sum_c \beta_c = 0\right) \quad \text{and} \quad 0 = \phi_1 \le \phi_2 \le \dots \le \phi_q = 1. $$

- The formulation is similar to a latent class model.
- Further, αr + βc can be extended to αr + βc + γrc.
- The model provides a simultaneous fuzzy clustering of the rows and columns.

Page 29

1. Model-based clustering - Column clustering

The main effects stereotype model has likelihood

$$ L(\Omega \mid y_{ij}) = \prod_{i=1}^{n} \prod_{j=1}^{m} \prod_{k=1}^{q} \left(P[y_{ij} = k]\right)^{I[y_{ij} = k]} $$

which, under the column clustering model, turns into

$$ L(\Omega \mid y_{ij}) = \prod_{j=1}^{m} \left[ \sum_{c=1}^{C} \kappa_c \prod_{i=1}^{n} \prod_{k=1}^{q} \left(P[y_{ij} = k \mid j \in c]\right)^{I[y_{ij} = k]} \right] $$

where κc is the proportion of columns in column group c.
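The column clustering likelihood can be evaluated directly from its definition. The following Python sketch computes its logarithm, using a log-sum-exp step for the mixture over clusters; the function names, argument layout and the toy data are assumptions made for illustration only.

```python
import math

def stereotype_probs(mu, phi, eta):
    """Category probabilities for linear predictor eta (baseline mu_1 = phi_1 = 0)."""
    log_odds = [m + p * eta for m, p in zip(mu, phi)]
    top = max(log_odds)
    weights = [math.exp(v - top) for v in log_odds]
    total = sum(weights)
    return [w / total for w in weights]

def column_clustering_loglik(y, mu, phi, alpha, beta, kappa):
    """Incomplete-data log-likelihood; y[i][j] in 1..q for row i and column j."""
    n, m, C = len(y), len(y[0]), len(kappa)
    # theta[i][c][k-1] = P[y_ij = k | column j in cluster c]
    theta = [[stereotype_probs(mu, phi, alpha[i] + beta[c]) for c in range(C)]
             for i in range(n)]
    total = 0.0
    for j in range(m):
        # log(kappa_c) + sum_i log P[y_ij | j in c], one term per cluster
        terms = [math.log(kappa[c])
                 + sum(math.log(theta[i][c][y[i][j] - 1]) for i in range(n))
                 for c in range(C)]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total

# Toy data: 2 rows (questions), 3 columns (subjects), q = 4 categories.
y = [[1, 2, 4], [3, 3, 1]]
mu = [0.0, 0.5, -0.2, -1.0]   # hypothetical values
phi = [0.0, 0.3, 0.8, 1.0]
alpha = [0.5, -0.5]
one = column_clustering_loglik(y, mu, phi, alpha, beta=[0.3], kappa=[1.0])
two = column_clustering_loglik(y, mu, phi, alpha, beta=[0.3, 0.3], kappa=[0.4, 0.6])
```

Two clusters with identical effects give the same value as a single cluster, which is a quick sanity check on the mixture sum.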

Page 30

1. Model-based clustering

- Problem: Missing information.
  - We do not know the actual cluster membership of the columns (rows) nor the number of column (row) clusters.

κc is the proportion of columns in column group c.

Page 31

2. Model fitting

Page 32

2. Model fitting

- EM algorithm for finding the ML solution for the parameters of models with missing information (here, the unknown cluster membership of each row and column).
- Information criteria (AIC, BIC, ...) for model selection.
- Comprehensive simulation study (4500 scenarios) testing 12 information criteria (Fernández and Arnold, 2016).
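For a finite mixture, the E-step of EM replaces the unknown memberships with posterior probabilities, and the mixing proportions have a closed-form M-step update. The Python sketch below shows only these two generic pieces for column clustering; it assumes the per-cluster log-densities of each column are already available, and the names `e_step` and `update_kappa` are illustrative. The M-step for µ, φ, α and β needs numerical optimisation and is not shown.

```python
import math

def e_step(log_dens, kappa):
    """Posterior membership probabilities z[j][c] for each column j.

    log_dens[j][c] : log-likelihood of column j's responses under cluster c
    kappa[c]       : current mixing proportion of cluster c
    """
    z = []
    for row in log_dens:
        terms = [math.log(k) + v for k, v in zip(kappa, row)]
        top = max(terms)                       # log-sum-exp for stability
        weights = [math.exp(t - top) for t in terms]
        total = sum(weights)
        z.append([w / total for w in weights])
    return z

def update_kappa(z):
    """M-step for the mixing proportions: average posterior membership."""
    m = len(z)
    return [sum(row[c] for row in z) / m for c in range(len(z[0]))]

# Hypothetical log-densities for 3 columns and 2 clusters.
log_dens = [[-10.0, -12.0], [-8.0, -8.0], [-15.0, -11.0]]
kappa = [0.5, 0.5]
z = e_step(log_dens, kappa)
kappa_new = update_kappa(z)
```

The rows of `z` are the fuzzy memberships: a column whose responses are equally likely under both clusters keeps the prior proportions.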

Page 33

2. Model fitting

Table: Information criteria summary table

Criterion  Definition                                             Proposed for       Depends on
AIC        −2ℓ + 2K                                               Regression models  Number of parameters
AICc       AIC + 2K(K+1)/(n−K−1)                                  Regression models  Number of parameters and sample size
AICu       AICc + n log(n/(n−K−1))                                Regression models  Number of parameters and sample size
CAIC       −2ℓ + K(1 + log(n))                                    Regression models  Number of parameters and sample size
BIC        −2ℓ + K log(n)                                         Regression models  Number of parameters and sample size
AIC3       −2ℓ + 3K                                               Clustering         Number of parameters
CLC        −2ℓ + 2EN(R)                                           Clustering         Entropy
NEC(R)     EN(R)/(ℓ(R) − ℓ(1))                                    Clustering         Entropy
ICL-BIC    BIC + 2EN(R)                                           Clustering         Number of parameters, sample size and entropy
AWE        −2ℓc + 2K(3/2 + log(n))                                Clustering         Number of parameters, sample size and entropy
L          −ℓ − (K/2) ∑ log(nπR/12) − (R/2) log(n/12) − R(K+1)/2  Clustering         Number of parameters, sample size and mixing proportions

Notes: n represents the sample size, K the number of parameters, R the number of clusters, πR the mixing cluster proportion, ℓ the log-likelihood and EN(·) the entropy function.
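The criteria that depend only on the log-likelihood, the number of parameters and the sample size are simple to compute. The Python sketch below implements those rows of the table as written; the function name and the input values are hypothetical, and the entropy-based criteria are left out because they need the fitted membership probabilities.

```python
import math

def information_criteria(loglik, K, n):
    """Criteria from the summary table that need only loglik, K and n."""
    aic = -2.0 * loglik + 2.0 * K
    aicc = aic + 2.0 * K * (K + 1) / (n - K - 1)
    return {
        "AIC": aic,
        "AICc": aicc,
        "AICu": aicc + n * math.log(n / (n - K - 1)),
        "CAIC": -2.0 * loglik + K * (1.0 + math.log(n)),
        "BIC": -2.0 * loglik + K * math.log(n),
        "AIC3": -2.0 * loglik + 3.0 * K,
    }

# Hypothetical fit: log-likelihood -200 with 9 parameters and 100 observations.
ic = information_criteria(loglik=-200.0, K=9, n=100)
```

For every criterion the model with the smallest value is preferred, so comparing the dictionaries of two fitted models is enough to rank them.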

Page 34

2. Model fitting. Simulation study

- Simulated data with a known true number of row clusters.
- General results: percentage of cases in which each criterion determines the true number of row clusters (Fit).

Page 35

2. Model fitting. One-dimensional Clustering

Table: Top 5. Overall results. One-dimensional clustering

Criterion  Overall Results  Scenario 1  Scenario 2  Scenario 3  Scenario 4  Scenario 5
AIC        93.8%            91.4%       97.6%       88.0%       92.9%       99.1%
AICc       89.8%            90.2%       94.8%       74.7%       91.1%       98.2%
AICu       82.4%            79.0%       80.0%       66.7%       88.0%       98.2%
AIC3       67.7%            61.7%       65.6%       56.7%       56.4%       98.2%
BIC        43.7%            41.2%       39.1%       40.0%       39.6%       58.7%

Page 36

2. Model fitting. Biclustering

Table: Top 5. Overall results. Biclustering

Criterion  Overall Results  Scenario 1  Scenario 2  Scenario 3  Scenario 4  Scenario 5
AIC        86.1%            89.2%       82.3%       80.5%       85.5%       92.8%
AICc       85.6%            89.2%       81.5%       80.0%       84.5%       92.8%
AICu       84.2%            84.8%       80.7%       79.3%       83.3%       92.8%
AIC3       71.2%            75.8%       65.5%       64.7%       66.5%       83.3%
BIC        36.5%            34.5%       35.2%       33.5%       32.3%       47.2%

Page 37

2. Model fitting

- Comprehensive simulation study (4500 scenarios) testing 12 information criteria (Fernández and Arnold, 2016) ⇒ AIC is the best-performing criterion in this study.

Page 38

2. Model fitting

- Two possible Bayesian approaches:
  - "Fixed" dimension: Metropolis-Hastings and Gibbs sampler.
  - Variable dimension: Reversible Jump MCMC (RJMCMC; Green, P. J., 1995).
- RJMCMC ⇒ The number of components (dimension) is a parameter.
- Convergence diagnostic: Castelloe and Zimmerman method.

Page 39

2. Model fitting packages

- Model-based clustering for ordinal data:
  - R package called clustord (Fernández and Ryan, soon on CRAN) https://github.com/vuw-clustering/clustord
- Model-based clustering for mixed-type data:
  - R package called clustMD (McParland and Gormley, 2017) https://cran.r-project.org/web/packages/clustMD/clustMD.pdf
- Model-based clustering for Gaussian data:
  - R package called mclust (Scrucca, 2019) https://cran.r-project.org/web/packages/mclust/vignettes/mclust.html

Page 40

3. Example

Level of Depression Data Set

Page 41

3. Example. Level of Depression Dataset

- Patients admitted for deliberate self-harm at the medical departments of 3 major hospitals in Eastern Norway.
- Questionnaire designed to assess the level of depression.
  - 13 questions (rows), 151 patients (columns).
  - Ordinal data: 4 categories, from 1 (lower level) to 4 (higher level).
  - For instance, "Sadness":
      yij = 1: I do not feel sad
      yij = 2: I feel sad most of the time
      yij = 3: I am sad all the time
      yij = 4: I am so sad or unhappy that I can't stand it
- Possible research questions:
  - Can we group patients/questions together?
  - Which questions or patients are similar?
  - Which questions or patients tend to be linked with higher values of the ordinal response?

Page 42

3. Results Model Fitting - EM algorithm

Table: Level of Depression. Model Fitting (1/3)

Model                                     R  C  npar  AIC     AICc    BIC     ICL.BIC
Null effects: µk + φk                     1  1  5     441.63  441.81  460.71  460.71
Row effects: µk + φkαi                    n  1  16    428.81  430.52  489.89  489.89
Column effects: µk + φkβj                 1  m  32    463.85  470.82  586.00  586.00
Main effects: µk + φk(αi + βj)            n  m  43    422.54  421.50  547.67  547.67

Row Clustering
µk + φkαr                                 2  1  7     415.70  416.04  442.42  442.49
                                          3  1  9     419.42  419.97  453.77  470.37
                                          4  1  11    423.36  424.17  465.35  481.86
                                          5  1  13    427.40  428.53  477.02  496.25
                                          6  1  15    430.96  432.46  488.22  488.24
µk + φk(αr + βj)                          2  m  34    431.02  438.92  560.80  572.87
                                          3  n  20    435.91  444.82  573.33  594.32
                                          4  n  22    439.57  449.55  584.62  593.90
                                          5  n  24    443.91  455.03  596.60  599.43
                                          6  n  26    447.69  460.02  608.01  618.21
µk + φk(αr + βj + γrj)                    2  m  61    406.22  423.83  629.06  639.08
                                          3  n  42    424.71  491.57  668.25  776.26
                                          4  n  55    426.25  558.47  680.49  681.49
                                          5  n  68    549.95  585.80  681.88  684.89
                                          6  n  81    531.77  630.58  707.40  717.40

Page 43

3. Results Model Fitting - EM algorithm

Table: Level of Depression. Model Fitting (2/3)

Model                                     R  C  npar  AIC     AICc    BIC     ICL.BIC

Column Clustering
µk + φkβc                                 1  2  7     412.46  412.81  439.18  463.05
                                          1  3  9     418.12  418.67  452.47  482.00
                                          1  4  11    421.90  422.71  463.89  515.37
                                          1  5  13    426.43  427.56  476.06  507.19
                                          1  6  15    429.96  431.46  487.22  547.28
µk + φk(αi + βc)                          n  2  18    410.13  415.81  520.82  526.18
                                          n  3  20    397.28  409.28  561.54  565.73
                                          n  4  22    401.23  413.55  607.22  609.89
                                          n  5  24    412.15  447.29  671.71  675.77
                                          n  6  26    460.91  513.21  770.10  772.98
µk + φk(αi + βc + γic)                    n  2  29    534.06  538.66  664.21  669.38
                                          n  3  42    436.57  439.24  512.92  542.04
                                          n  4  55    440.43  443.66  524.41  549.82
                                          n  5  68    444.03  447.89  535.64  554.73
                                          n  6  81    450.14  454.68  549.38  595.48

Page 44

3. Results Model Fitting - EM algorithm

Table: Level of Depression. Model Fitting (3/3)

Model                                     R  C  npar  AIC     AICc    BIC     ICL.BIC

Biclustering
µk + φk(αr + βc)                          2  2  9     421.76  422.31  456.11  498.31
                                          2  3  11    419.64  420.20  454.00  490.75
                                          2  4  13    425.74  426.88  475.37  549.88
                                          2  5  15    431.31  432.81  488.56  572.19
                                          3  2  11    423.22  424.03  465.20  517.86
                                          3  3  13    476.66  477.79  501.77  526.29
                                          3  4  15    439.87  441.37  497.13  522.80
                                          3  5  17    435.21  437.13  500.10  567.88
                                          4  2  13    482.98  484.11  492.13  532.60
                                          4  3  15    433.70  435.20  490.96  550.30
                                          4  4  17    435.22  437.14  500.11  571.15
                                          4  5  19    464.04  466.44  536.56  568.45
µk + φk(αr + βc + γrc)                    2  2  10    427.97  429.10  477.59  527.43
                                          2  3  13    422.00  422.68  460.17  486.88
                                          2  4  16    434.39  436.09  495.46  520.85
                                          2  5  19    438.61  441.01  511.13  538.56
                                          3  2  13    497.76  498.89  505.27  547.38
                                          3  3  17    433.91  435.84  498.80  540.76
                                          3  4  21    441.89  444.83  522.05  559.23
                                          3  5  25    453.08  457.27  548.50  615.81
                                          4  2  16    445.85  447.55  506.92  528.75
                                          4  3  21    448.82  451.76  528.98  538.18
                                          4  4  26    468.71  473.25  567.95  622.25
                                          4  5  31    530.60  537.12  619.79  648.93

Page 45

3. Results. Model Selection. AIC

- Best AIC model: column clustering model µk + φk(αi + βc) with C = 3 groups of patients (AIC = 397.28, the lowest value across the three model fitting tables).

Page 46

3. Results. Common Visualisation Tools

Figure: Level of Depression: Column Clustering with C=3 patient groups

Page 47

3. Results. Common Visualisation Tools

Figure: Level of Depression C=3: Distribution in each group

The proportions of individuals in the three clusters who had at least one episode of DSH (deliberate self-harm, a predictor of suicide (Hawton et al., 2013)) within 3 months are 3.4%, 16%, and 28%.

Page 48

3. Results. More Visualisation Tools

The fitted score parameters φk determine the spacing among categories.

Level of depression data: φ1 = 0, φ2 = 0.347, φ3 = 0.853, φ4 = 1.

Page 50

3. Spaced Mosaic Plot (Fernández et al., 2015)

- No row (question) or column (individual) groups.
- Overall distribution ⇒ Frequency of each ordinal category.
  - Level 2 ⇒ Most common.
  - Level 4 ⇒ Least common.

Page 51

3. Spaced Mosaic Plot (Fernández et al., 2015)

Figure: Level of depression data: Mosaic plot for the stereotype model with column clustering and C = 3 column (patient) clusters.

Page 52

3. Spaced Mosaic Plot (Fernández et al., 2015)

- Column clusters ⇒ 3 horizontal bands.
- Height of each band ⇒ Proportional to the number of patients per group (C1 = 8.6 + 21.6 + 7.8 + 4.2 = 42.2%).
- Area of each block ⇒ Frequency of the 4 ordinal categories per cluster (e.g. patients in C2 ⇒ strong preference for responding at Level 1).
- Horizontal separation between blocks ⇒ Spacing between adjacent ordinal categories (φ1 = 0, φ2 = 0.347, φ3 = 0.852, φ4 = 1).
- Levels 3 and 4 are very similar: φ4 − φ3 = 1 − 0.852 = 0.148.
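The horizontal gaps in the spaced mosaic plot are the differences between consecutive scores. A short Python sketch of that arithmetic, using the score values quoted on this slide (the variable names are illustrative):

```python
# Fitted scores quoted for the level of depression data (phi_1..phi_4).
phi = [0.0, 0.347, 0.852, 1.0]

# Spacing between adjacent ordinal categories: phi_{k+1} - phi_k.
gaps = [round(b - a, 3) for a, b in zip(phi, phi[1:])]

# Index of the smallest gap; index 2 corresponds to Levels 3 and 4.
closest = min(range(len(gaps)), key=gaps.__getitem__)
```

The three gaps are 0.347, 0.505 and 0.148, so Levels 2 and 3 are the furthest apart and Levels 3 and 4 the closest.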

Page 53

3. Results. More Visualisation Tools. Fuzziness

Figure: Contour plot depicting the fuzzy clustering structure with C = 3 patient clusters. The left panel is unsorted; in the right panel both axes are sorted by patient cluster.

The plotted quantity is the probability that two patients are classified in the same cluster.
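One plausible way to obtain such a probability from a fitted mixture is to combine the posterior membership probabilities of the two patients. The Python sketch below does this under the assumption that the memberships of different patients are independent given the fitted model, so that P(same cluster) = Σc z_jc z_kc; this is an illustration of the idea, not necessarily the computation behind the figure, and the membership values are made up.

```python
def same_cluster_prob(z):
    """Pairwise probability that two patients share a cluster.

    z[j][c] : posterior probability that patient j belongs to cluster c.
    Assumes independent memberships across patients (an assumption).
    """
    m = len(z)
    prob = [[1.0] * m for _ in range(m)]   # a patient always matches itself
    for j in range(m):
        for k in range(j + 1, m):
            p = sum(a * b for a, b in zip(z[j], z[k]))
            prob[j][k] = prob[k][j] = p
    return prob

# Hypothetical posterior memberships for 3 patients and C = 3 clusters.
z = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.3, 0.7]]
co = same_cluster_prob(z)
```

Sorting the rows and columns of `co` by each patient's most probable cluster produces the block structure that a sorted plot would show.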

Page 54

4. Bayesian Inference

Bayesian Inference Approach

Page 55

4. Developing RJMCMC. DAG

Figure: Directed acyclic graph: Hierarchical Stereotype Mixture Model. One-dimensional Clustering. "TrGeometric" refers to a truncated Geometric distribution.

Page 56

4. Developing RJMCMC. Sweep

Table: RJMCMC Moves

Block                  Move Param.  Prop. Constants       Pr(Move)          Move Type
1 Hyperpar.            σ²µ          νσµ = 3, δσµ = 40     1                 M-H
                       σ²α          νσα = 3, δσα = 40     1                 M-H
                       σ²β          νσβ = 3, δσβ = 40     1                 M-H
2 General Parameters   µk           σ²µp = 0.3            1                 M-H
                       φk                                 1                 M-H
                       βj           σ²βp = 0.3            1                 M-H
3 Cluster Parameters   αr           σ²αp = 0.3            pα = 0.35         M-H
                       πr           σ²πp = 0.3            pπ = 0.35         M-H
                       Split        p = 0.3               pS = p ρ/(1 + ρ)  RJ
                       Merge                              pM = p/(1 + ρ)    RJ

Page 57

4. Developing RJMCMC. Split Step

- Split and Merge steps involve αR and πR.
- Steps have to be reversible and keep the constraints

$$ \sum_{r=1}^{R} \alpha_r = 0, \qquad \sum_{r=1}^{R} \pi_r = 1 $$

- Split move:
  1. Draw u1, u2 ∼ U(0, 1) and one r ∈ {1, . . . , R}.
  2. New parameters:

$$ \alpha_r^{(t)} = u_1 \alpha_r^{(t-1)}, \qquad \alpha_{r+1}^{(t)} = (1 - u_1)\alpha_r^{(t-1)} $$

$$ \pi_r^{(t)} = u_2 \pi_r^{(t-1)}, \qquad \pi_{r+1}^{(t)} = (1 - u_2)\pi_r^{(t-1)} $$

  3. Increase R by 1.
  4. Relabel r + 1, . . . , R as r + 2, . . . , R + 1.

Page 59: Likelihood-Based Finite Mixture Models for Ordinal Data · 2020. 2. 27. · 1. Model-based clustering I Model-based clustering: process of clustering via statistical models, typically

4. Developing RJMCMC. Merge Step

- Split and Merge moves involve $\alpha_R$ and $\pi_R$.
- Moves have to be reversible and keep the constraints $\left(\sum_{r=1}^{R} \alpha_r = 0,\ \sum_{r=1}^{R} \pi_r = 1\right)$.
- Merge move:
  1. Draw one random component $r \in \{1, \ldots, R-1\}$.
  2. Select the adjacent component $r+1$.
  3. New parameters:

$$\alpha_r^{(t)} = \alpha_r^{(t-1)} + \alpha_{r+1}^{(t-1)}$$

$$\pi_r^{(t)} = \pi_r^{(t-1)} + \pi_{r+1}^{(t-1)}$$

  4. Reduce $R$ by 1.
  5. Relabel $r+2, \ldots, R$ as $r+1, \ldots, R-1$.
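
A matching sketch of the merge proposal, under the same assumptions as before (NumPy arrays, a hypothetical function name `merge_move`, made-up example values, and no acceptance step):

```python
import numpy as np

def merge_move(alpha, pi, rng):
    """Propose merging a randomly chosen cluster r with its neighbour r+1."""
    R = len(alpha)
    r = rng.integers(R - 1)           # 0-based r in {0, ..., R-2}, so r+1 exists
    # Drop cluster r+1 and store the summed values in position r
    new_alpha = np.delete(alpha, r + 1)
    new_alpha[r] = alpha[r] + alpha[r + 1]
    new_pi = np.delete(pi, r + 1)
    new_pi[r] = pi[r] + pi[r + 1]
    return new_alpha, new_pi, r

rng = np.random.default_rng(1)
alpha = np.array([-1.5, 0.2, 0.3, 1.0])   # sums to 0
pi = np.array([0.1, 0.4, 0.2, 0.3])       # sums to 1
new_alpha, new_pi, r = merge_move(alpha, pi, rng)
```

Summing adjacent components leaves both constraints intact, and a split of the merged cluster with $u_1 = \alpha_r / (\alpha_r + \alpha_{r+1})$ and $u_2 = \pi_r / (\pi_r + \pi_{r+1})$ would recover the original pair, which is what makes the two moves a reversible pair.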

4. Example. Level of depression data set. RJMCMC

Figure: Level of Depression: Dimension (Column) visits

4. Example. Level of depression data set. RJMCMC

Figure: Level of Depression C=3: Distribution in each group

5. Summary. Conclusions

- Clustering rows (columns) for ordinal data allows us to:
  - Describe data with fewer parameters than current methods.
  - Identify similar rows (i.e. questions) and/or similar columns (i.e. subjects).
  - Find an a posteriori classification.
- Likelihood-based stereotype models ⇒ Inferences and model comparison.
- Using the fitted score parameters $\phi_k$ among ordinal categories, dictated by data.
- Data visualisation tools for ordinal clustering data: spaced mosaic plots, fuzziness.
- Model fitting ⇒ EM algorithm (AIC), RJMCMC (Cluster component as a parameter).

References

- Anderson, J. A. (1984). Regression and ordered categorical variables. JRSS, Series B, 46(1):1-30.
- Castelloe, J. and Zimmerman, D. (2002). Convergence assessment for RJMCMC samplers. Technical Report 313, SAS Institute, Cary, North Carolina.
- Fernandez, D., Pledger, S. and Arnold, R. (2014). Introducing spaced mosaic plots. Research Report Series, ISSN: 1174-2011, 14-3, MSOR, VUW.
- Fernandez, D., Arnold, R. and Pledger, S. (2016). Mixture-based clustering for the ordered stereotype model. CSDA, 93:46-75.
- Green, P. J. (1995). Reversible jump MCMC computation and Bayesian model determination. Biometrika, 82:711-732.
- McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley Series in Probability and Statistics.
- Pledger, S. and Arnold, R. (2014). Multivariate methods using mixtures: Correspondence analysis, scaling and pattern-detection. CSDA.
- Stephens, M. (2000). Dealing with label switching in mixture models. JRSS, Series B, 62:795-809.

Thank you

Thank you for listening!

Extra Slides

1. Stereotype Model. Response Probabilities

The stereotype model is also described in terms of the response probabilities

$$P[y_{ij} = k \mid x] = \frac{\exp(\mu_k + \phi_k(\beta' x))}{\sum_{\ell=1}^{q} \exp(\mu_\ell + \phi_\ell(\beta' x))}, \qquad k = 1, \ldots, q,$$

where the probability for the baseline category is defined as

$$P[y_{ij} = 1 \mid x] = 1 - \sum_{\ell=2}^{q} P[y_{ij} = \ell \mid x].$$
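
As a rough illustration of these response probabilities, the sketch below evaluates them for one value of the linear predictor. The function name `stereotype_probs` and all numeric values are made up for the example, and the baseline constraints $\mu_1 = 0$ and $\phi_1 = 0$ are assumed.

```python
import numpy as np

def stereotype_probs(mu, phi, eta):
    """Response probabilities for categories 1..q given eta = beta'x.

    mu and phi are length-q arrays with mu[0] = 0 and phi[0] = 0 (baseline).
    """
    lin = mu + phi * eta
    lin = lin - lin.max()             # subtract the maximum for numerical stability
    w = np.exp(lin)
    return w / w.sum()

mu = np.array([0.0, 0.4, -0.2, 0.1])
phi = np.array([0.0, 0.3, 0.7, 1.0])  # 0 = phi_1 <= ... <= phi_q = 1
eta = 1.2
p = stereotype_probs(mu, phi, eta)
```

The baseline probability `p[0]` equals one minus the sum of the others, and `np.log(p[k] / p[0])` recovers `mu[k] + phi[k] * eta`, the log-odds form of the model.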

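Reparametrization of the score parameters

$$0 = \phi_1 \le \phi_2 \le \cdots \le \phi_q = 1$$

which we transform to

$$-\infty \le \nu_2 \le \nu_3 \le \cdots \le \nu_{q-1} \le \infty \quad \text{where } \nu_k = \operatorname{logit}(\phi_k).$$

The previous expression may be redefined as

$$-\infty \le \nu_2 \le \nu_2 + e^{z_3} \le \cdots \le \nu_{q-2} + e^{z_{q-1}} \le \infty,$$

i.e., $\nu_k = \nu_{k-1} + e^{z_k}$ for $-\infty < z_k < \infty$, $k = 3, \ldots, q-1$.

- The inverse parametrization:

$$\phi_k = \begin{cases} 0 & k = 1 \\ \dfrac{1}{1 + e^{-\nu_2}} & k = 2 \\ \operatorname{expit}\left[\operatorname{logit}(\phi_2) + \sum_{\ell=3}^{k} e^{z_\ell}\right] & k = 3, \ldots, q-1 \\ 1 & k = q \end{cases} \tag{1}$$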
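
A minimal sketch of this inverse map, assuming the unconstrained inputs are a scalar $\nu_2$ and an array holding $z_3, \ldots, z_{q-1}$. The function name `scores_from_unconstrained` and the example values are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def scores_from_unconstrained(nu2, z):
    """Map (nu_2, z_3, ..., z_{q-1}) to ordered scores phi_1, ..., phi_q."""
    # nu_k = nu_{k-1} + exp(z_k), so nu_k = nu_2 + cumulative sum of exp(z)
    nu = nu2 + np.concatenate(([0.0], np.cumsum(np.exp(z))))   # nu_2, ..., nu_{q-1}
    return np.concatenate(([0.0], expit(nu), [1.0]))           # phi_1 = 0, phi_q = 1

phi = scores_from_unconstrained(nu2=-1.0, z=np.array([0.2, -0.5]))   # q = 5
```

Any real-valued inputs give scores that start at 0, end at 1, and increase in between, so an optimiser or sampler can work on the unconstrained scale without checking the ordering.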

Stereotype reformulated as adjacent categories logit

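$$\log\left(\frac{P[y_{ij} = k \mid x]}{P[y_{ij} = k+1 \mid x]}\right) = (\mu_k - \mu_{k+1}) + (\phi_k - \phi_{k+1})\,\delta' x = \eta_k + \vartheta_k\,\delta' x, \qquad k = 1, \ldots, q-1,$$

where

$$\eta_k = \mu_k - \mu_{k+1}, \qquad k = 1, \ldots, q-1,$$

and the relation between $\phi_k$ and $\vartheta_k$ is defined by

$$\vartheta_k = \phi_k - \phi_{k+1}, \qquad k = 1, \ldots, q-1,$$

and

$$\phi_k = \phi_q + \sum_{t=k}^{q-1} \vartheta_t, \qquad k = 1, \ldots, q-1.$$

The adjacent-categories logit model is a particular case of the ordered stereotype model when $\vartheta_k$ is a constant such that $\vartheta_k < 1$ (i.e., the $\phi_k$ are fixed and equally spaced).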
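
As a short worked case of the equal-spacing remark, assume the scores are fixed at $\phi_k = (k-1)/(q-1)$, which satisfies $\phi_1 = 0$ and $\phi_q = 1$. This particular choice is an illustration, not a value taken from the talk.

$$\vartheta_k = \phi_k - \phi_{k+1} = \frac{k-1}{q-1} - \frac{k}{q-1} = -\frac{1}{q-1}, \qquad k = 1, \ldots, q-1,$$

$$\log\left(\frac{P[y_{ij} = k \mid x]}{P[y_{ij} = k+1 \mid x]}\right) = \eta_k - \frac{1}{q-1}\,\delta' x .$$

The slope no longer depends on $k$, which is the adjacent-categories logit form; with $q = 5$, for instance, every $\vartheta_k$ equals $-1/4$.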

Weighted average of the fitted scores

- Fitted response probabilities with the estimated parameters over the $R$ groups and the $q$ categories

$$P[y_{ij} = k \mid i \in r] = \frac{\exp(\mu_k + \phi_k(\alpha_r + \beta_j))}{\sum_{\ell=1}^{q} \exp(\mu_\ell + \phi_\ell(\alpha_r + \beta_j))}$$

$$i = 1, \ldots, n \quad j = 1, \ldots, m \quad k = 1, \ldots, q \quad r = 1, \ldots, R.$$

- Weighted average over the $q$ categories for each row cluster

$$\bar{y}^{\,r}_{ij} = \sum_{k=1}^{q} k \times P[y_{ij} = k \mid i \in r], \qquad i = 1, \ldots, n \quad j = 1, \ldots, m \quad r = 1, \ldots, R.$$

- Weighted average by using the fitted conditional probabilities $\hat{z}_{ir}$

$$\bar{y}_{ij} = \sum_{r=1}^{R} \hat{z}_{ir} \times \bar{y}^{\,r}_{ij}, \qquad i = 1, \ldots, n \quad j = 1, \ldots, m.$$

- Mean of $\bar{y}_{ij}$ over the $m$ columns

$$\bar{y}_{i\cdot} = \frac{1}{m} \sum_{j=1}^{m} \bar{y}_{ij}, \qquad i = 1, \ldots, n.$$
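
The chain of averages on this slide can be sketched with array operations. The function name `fitted_scores`, the array layout, and every numeric value below are assumptions made for the example; `z` stands for the fitted membership probabilities.

```python
import numpy as np

def fitted_scores(mu, phi, alpha, beta, z):
    """Return per-cluster, per-cell and per-row average fitted scores.

    mu, phi: (q,)  alpha: (R,)  beta: (m,)  z: (n, R) membership probabilities.
    """
    # Linear predictor for every (cluster r, column j, category k): shape (R, m, q)
    lin = mu + phi * (alpha[:, None, None] + beta[None, :, None])
    lin = lin - lin.max(axis=2, keepdims=True)
    prob = np.exp(lin)
    prob = prob / prob.sum(axis=2, keepdims=True)
    k = np.arange(1, len(mu) + 1)     # category labels 1..q
    y_r = prob @ k                    # (R, m): average category per cluster and column
    y_ij = z @ y_r                    # (n, m): weighted by membership probabilities
    y_i = y_ij.mean(axis=1)           # (n,): mean over the m columns
    return y_r, y_ij, y_i

mu = np.array([0.0, 0.5, -0.3])
phi = np.array([0.0, 0.4, 1.0])
alpha = np.array([-1.0, 1.0])
beta = np.array([0.5, -0.5])
z = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
y_r, y_ij, y_i = fitted_scores(mu, phi, alpha, beta, z)
```

A row that belongs to one cluster with probability 1 simply inherits that cluster's averages, while a row split evenly between two clusters gets their midpoint.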

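1. Finite Mixtures with Stereotype Model

Example: EM - Row clustering

Define the unknown group membership as latent variables: $Z_{ir} = I[i \in r]$ $(i = 1, \ldots, n,\ r = 1, \ldots, R)$, which satisfy $\sum_{r=1}^{R} Z_{ir} = 1$ and $(Z_{i1}, \ldots, Z_{iR}) \sim \text{Mult}(1; \pi_1, \ldots, \pi_R)$.

E-Step: The indicator latent variables fulfill the following convenient identity: $\prod_{r=1}^{R} a_r^{Z_{ir}} = \sum_{r=1}^{R} a_r Z_{ir}$ for any $a_r \neq 0$.

$$\ell_c(\Omega \mid \{y_{ij}\}, \{Z_{ir}\}) = \sum_{i=1}^{n} \sum_{r=1}^{R} Z_{ir} \log(\pi_r) + \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{q} \sum_{r=1}^{R} Z_{ir}\, I(y_{ij} = k) \log(\theta_{rjk}),$$

where $\Omega$ are the parameters, $\theta_{rjk} = P[y_{ij} = k \mid i \in r]$, and $\hat{Z}_{ir} = E[Z_{ir} \mid \{y_{ij}\}]$.

Applying Bayes' rule at iteration $t$:

$$\hat{Z}_{ir}^{(t)} = E[Z_{ir} \mid \{y_{ij}\}] = P[Z_{ir} = 1 \mid \{y_{ij}\}] = \frac{P[\{y_{ij}\} \mid Z_{ir} = 1]\, P[Z_{ir} = 1]}{\sum_{\ell=1}^{R} P[\{y_{ij}\} \mid Z_{i\ell} = 1]\, P[Z_{i\ell} = 1]} = \frac{\hat{\pi}_r^{(t-1)} \prod_{j=1}^{m} \prod_{k=1}^{q} \left(\hat{\theta}_{rjk}^{(t-1)}\right)^{I(y_{ij} = k)}}{\sum_{\ell=1}^{R} \hat{\pi}_\ell^{(t-1)} \prod_{j=1}^{m} \prod_{k=1}^{q} \left(\hat{\theta}_{\ell jk}^{(t-1)}\right)^{I(y_{ij} = k)}}.$$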
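
The Bayes' rule update can be sketched as follows, working on the log scale to avoid underflow in the products. The function name `e_step`, the array layout of `theta` (clusters by columns by categories), and the toy numbers are assumptions for illustration.

```python
import numpy as np

def e_step(y, pi, theta):
    """Posterior membership probabilities, shape (n, R).

    y: (n, m) integer responses in 1..q
    pi: (R,) mixing proportions
    theta: (R, m, q), theta[r, j, k-1] = P[y_ij = k | i in r]
    """
    m = y.shape[1]
    # Pick log theta[r, j, y_ij - 1] for every r, i, j: shape (R, n, m)
    log_theta = np.log(theta)[:, np.arange(m), y - 1]
    log_lik = log_theta.sum(axis=2).T                 # (n, R): product over columns
    log_post = np.log(pi)[None, :] + log_lik
    log_post = log_post - log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)     # normalise over clusters

theta = np.array([[[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
                  [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6]]])
pi = np.array([0.5, 0.5])
y = np.array([[1, 1], [3, 3], [2, 2]])
Z_hat = e_step(y, pi, theta)
```

In this toy input the first row answers in the lowest category throughout and is assigned almost entirely to the first cluster, while the third row is equally likely under both clusters and gets probability 0.5 for each.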

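1. Finite Mixtures with Stereotype Model

M-Step: Two separate parts, $\pi_r$ and the remaining parameters:

1. MLE for $\pi_r$:

$$\hat{\pi}_r^{(t)} = \frac{1}{n} \sum_{i=1}^{n} E\left[Z_{ir} \mid \{y_{ij}\}, \Omega^{(t-1)}\right] = \frac{1}{n} \sum_{i=1}^{n} \hat{Z}_{ir}^{(t)}, \qquad r = 1, \ldots, R.$$

2. Remaining parameters $\Omega$: numerically maximize the conditional expectation of the complete data log-likelihood $\ell_c$:

$$\hat{\Omega} = \underset{\Omega}{\operatorname{argmax}} \left[ \sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{k=1}^{q} \sum_{r=1}^{R} \hat{Z}_{ir}\, I(y_{ij} = k) \log(\theta_{rjk}) \right].$$

We repeat the two-step iteration of the EM algorithm until convergence.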
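
A small sketch of the two M-step pieces: the closed-form update for the mixing proportions and the weighted log-likelihood that a numerical optimiser would maximise over the remaining parameters. The function names `update_pi` and `q_objective`, the array layout, and the toy values are assumptions, and the optimiser itself is left out.

```python
import numpy as np

def update_pi(Z_hat):
    """Closed-form update: average membership probability per cluster."""
    return Z_hat.mean(axis=0)

def q_objective(theta, y, Z_hat):
    """Weighted log-likelihood term to be maximised over the remaining parameters.

    theta: (R, m, q) response probabilities implied by the current parameters
    y: (n, m) integer responses in 1..q,  Z_hat: (n, R) from the E-step
    """
    m = y.shape[1]
    log_theta = np.log(theta)[:, np.arange(m), y - 1]     # (R, n, m)
    return float((Z_hat.T[:, :, None] * log_theta).sum())

theta = np.array([[[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
                  [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6]]])
y = np.array([[1, 1], [3, 3]])
Z_hat = np.array([[1.0, 0.0], [0.0, 1.0]])
pi_new = update_pi(Z_hat)
obj = q_objective(theta, y, Z_hat)
```

In a full fit, `theta` would be recomputed from the model parameters inside the objective, and the negative of `q_objective` would be handed to a general-purpose minimiser at each iteration.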

4. Developing RJMCMC. Priors

Table: RJMCMC Prior and Hyperparameters

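| Parameter | Prior Distribution | Hyperparameters |
|---|---|---|
| $\sigma^2_\mu$ | InverseGamma$(\nu_{\sigma_\mu}, \delta_{\sigma_\mu})$ | $\nu_{\sigma_\mu} = 3$, $\delta_{\sigma_\mu} = 40$ |
| $\mu_k$ | $N(0, \sigma^2_\mu)$ | |
| $\phi_k$ | Dirichlet$(\lambda_\phi)$ | $\lambda_\phi = 1$ |
| $\sigma^2_\alpha$ | InverseGamma$(\nu_{\sigma_\alpha}, \delta_{\sigma_\alpha})$ | $\nu_{\sigma_\alpha} = 3$, $\delta_{\sigma_\alpha} = 40$ |
| $\alpha_r$ | DegenNormal$(R; 0, \sigma^2_\alpha)$ | |
| $\sigma^2_\beta$ | InverseGamma$(\nu_{\sigma_\beta}, \delta_{\sigma_\beta})$ | $\nu_{\sigma_\beta} = 3$, $\delta_{\sigma_\beta} = 40$ |
| $\beta_j$ | DegenNormal$(m; 0, \sigma^2_\beta)$ | |
| $\gamma_{rj}$ | DegenNormal$(R, m; 0, \sigma^2_\gamma)$ | $\sigma^2_\gamma = 5$ |
| $\pi_r$ | Dirichlet$(\lambda_\pi)$ | $\lambda_\pi = 1$ |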

1. Model-based clustering. Biclustering

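- General formulation of model-based clustering (biclustering):

$$\log\left(\frac{P[y_{ij} = k \mid i \in r, j \in c]}{P[y_{ij} = 1 \mid i \in r, j \in c]}\right) = \mu_k + \phi_k(\alpha_r + \beta_c), \qquad k = 2, \ldots, q$$

- Probability of the data response $y_{rc}$ being equal to the category $k$:

$$P[y_{rc} = k] = \frac{\exp(\mu_k + \phi_k(\alpha_r + \beta_c))}{\sum_{\ell=1}^{q} \exp(\mu_\ell + \phi_\ell(\alpha_r + \beta_c))}$$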