

Overlapping Clustering Models, and One (class) SVM to Bind Them All

Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti

Overlapping clustering model

P = ρ Γ Θ B ΘT Γ

[Figure: block decomposition of the n × n probability matrix P, where Γ is a diagonal matrix of degree parameters γi, the n × K matrix Θ has the cluster membership vectors θTi as rows, B encodes community interconnections, and ρ controls sparsity.]

- The largest element of B is 1, for identifiability.
- For network models, the data is an adjacency matrix: Aij ∼ Bernoulli(Pij).
- We refer to these as "DCMMSB-type models".
- Special cases:
  - ‖θi‖1 = 1: Degree-corrected Mixed Membership Stochastic Blockmodel (DCMMSB) (Airoldi et al., 2008; Jin et al., 2017)
  - ‖θi‖2 = 1: Overlapping Continuous Community Assignment Model (OCCAM) (Zhang et al., 2014)
  - θi binary, possibly with multiple non-zero entries, and Γ the identity: Stochastic Blockmodel with Overlaps (SBMO) (Kaufmann et al., 2016)
- For topic models (Blei et al., 2003), the data is a word-count matrix in which each document is a multinomial sample from the population.
- The word co-occurrence probability matrix matches the form of P in network models after normalizing the rows of the word-topic matrix.
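As a concrete illustration, the generative process above can be sketched in a few lines of NumPy. This is a minimal sketch; the sizes and parameter values are illustrative choices, not from the poster.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, rho = 200, 3, 0.5                     # illustrative sizes, not from the poster

# DCMMSB case: memberships theta_i ~ Dirichlet, so ||theta_i||_1 = 1
Theta = rng.dirichlet(np.ones(K), size=n)
gamma = rng.uniform(0.5, 1.0, size=n)       # degree parameters
B = 0.2 + 0.8 * np.eye(K)                   # community interconnections, max entry 1

# P = rho * Gamma Theta B Theta^T Gamma
P = rho * gamma[:, None] * (Theta @ B @ Theta.T) * gamma[None, :]

# Adjacency matrix: A_ij ~ Bernoulli(P_ij), symmetric, no self-loops
U = rng.random((n, n))
A = np.triu((U < P).astype(int), k=1)
A = A + A.T
```

With rho = 0.5 and gamma, B, Theta all bounded by 1, every entry of P is a valid probability.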

Related work

Most algorithms (Mao et al., 2017a, 2017b; Jin et al., 2017; Panov et al., 2017; Rubin-Delanchy et al., 2017) use a two-step method:

- Find the pure nodes by finding the corners of a simplex.
- Estimate model parameters via regression.

For example, Mao et al. (2017b) show that for MMSB:

- The rows vi of V (the top-K eigenvectors of P) lie on a simplex whose corners are the pure nodes (the set I).
- One can estimate community memberships by regressing vi on V(I, :).

With degree parameters, however, the rows of V no longer lie on a simplex.
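In the ideal (noiseless) simplex setting, the second step of this recipe is a single linear solve once the corners are known. A minimal sketch, using synthetic stand-ins for Θ and V rather than eigenvectors of a real P:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 40, 3

# Ideal simplex: every row of V is a convex combination of K pure rows
Theta = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=n - K)])
V_pure = rng.random((K, K)) + np.eye(K)     # stand-in for V(I, :), the corner rows
V = Theta @ V_pure

I_pure = [0, 1, 2]                          # pure nodes (simplex corners)
# Regress each v_i on V(I, :): Theta_hat = V @ V(I, :)^{-1}
Theta_hat = V @ np.linalg.inv(V[I_pure])
```

In the noiseless case this recovers the memberships exactly, which is why the hard part of the two-step method is finding the corners, not the regression.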

Eliminating the effects of degree parameters

- The rows of V fall on a cone.
- Dividing each row by its ℓ2 norm places the points on the surface of a sphere.
- A hyperplane through the corners separates all the points from the origin.
- This hyperplane can be found with a one-class SVM (Schölkopf et al., 2001); the support vectors are the corners.

Ideal cone problem

- Given a matrix Z known to be of the form Z = MYP ∈ Rn×m, where
  - M ∈ Rn×K≥0 , and no row of M is 0,
  - YP ∈ RK×m corresponds to K (unknown) rows of Z, each scaled to unit ℓ2 norm,
- infer M.

Solving the ideal cone problem with Z = V gives us the pure nodes, from which community memberships can be inferred.

One-class SVM solves the ideal cone problem

Primal:  max b  s.t.  ‖w‖ ≤ 1, wTyi ≥ b, i ∈ I
Dual:    min (1/2) ∑i,j βiβj yTi yj  s.t.  ∑i βi = 1, βi ≥ 0, i ∈ I

- where yi = zi/‖zi‖.
- The one-class SVM works if the following condition is satisfied:

Condition: The matrix YP satisfies (YP YPT)−1 1 > 0.

- Intuition: the SVM supporting hyperplane touches all pure nodes.
- The condition always holds for DCMMSB-type models.
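The dual above is a small quadratic program over the simplex, so on ideal-cone data it can be solved directly with a generic solver. A sketch on synthetic data (corner geometry chosen so the condition above holds; `scipy.optimize` is used here as a generic QP solver, not the poster's implementation):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, K, m = 60, 3, 3

# Ideal cone: Z = M Yp, rows of Yp unit-norm and well separated
Yp = 0.2 * rng.random((K, m)) + np.eye(K)
Yp /= np.linalg.norm(Yp, axis=1, keepdims=True)
M = np.vstack([np.eye(K), rng.dirichlet(2 * np.ones(K), size=n - K)])
Z = M @ Yp
Y = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # y_i = z_i / ||z_i||

# Dual: min 0.5 * sum_ij beta_i beta_j y_i^T y_j  s.t.  sum_i beta_i = 1, beta_i >= 0
G = Y @ Y.T
res = minimize(lambda b: 0.5 * b @ G @ b, np.full(n, 1.0 / n),
               jac=lambda b: G @ b, method="SLSQP",
               bounds=[(0.0, None)] * n,
               constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}])
beta = res.x
support = np.argsort(beta)[-K:]                    # support vectors = the K pure nodes
```

The nonzero dual weights concentrate on the rows of Z that are the pure nodes (indices 0..K-1 in this construction), matching the intuition that the supporting hyperplane touches exactly the corners.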

Empirical cone problem

- Given the empirical matrix Ŷ with rows ẑTi /‖ẑi‖,
  - with Z = MYP ∈ Rn×m, and
  - maxi ‖eTi (Ŷ − Y)‖ ≤ ε,
- infer M̂.

The one-class SVM still works if the following condition is satisfied:

Condition: The matrix YP satisfies (YP YPT)−1 1 ≥ η1 for some constant η > 0.

- η can be bounded for DCMMSB-type models and topic models.

Algorithm: SVM-cone

- Normalize the rows of Ẑ by their ℓ2 norm.
- Run a one-class SVM to get the supporting hyperplane.
- Cluster all points close to this hyperplane.
- Pick one point from each cluster to get the near-corner set C.
- M̂ = Ẑ ŶTC (ŶC ŶTC)−1.
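The last line of the algorithm is an ordinary least-squares recovery, and in the noiseless case it is exact. A minimal sketch, with the near-corner set assumed already found (e.g., by the one-class SVM step):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, m = 50, 3, 5

# Noiseless data: Z = M Yp with unit-norm rows of Yp; nodes 0..K-1 are pure
Yp = rng.random((K, m)) + 2 * np.eye(K, m)
Yp /= np.linalg.norm(Yp, axis=1, keepdims=True)
M = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=n - K)])
Z = M @ Yp

C = [0, 1, 2]                                      # near-corner set from the SVM step
Yc = Z[C] / np.linalg.norm(Z[C], axis=1, keepdims=True)

# M_hat = Z Yc^T (Yc Yc^T)^{-1}: exact recovery when the data is noiseless
M_hat = Z @ Yc.T @ np.linalg.inv(Yc @ Yc.T)
```

Here Yc coincides with Yp, so M_hat = M Yp YpT (Yp YpT)−1 = M exactly; the consistency results below control how much this degrades under noise.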

Proof sketch for the empirical cone problem

- Step 1: Show that the SVM solution of the empirical cone is nearly ideal, i.e., (ŵ, b̂) ≈ (w, b).
- Step 2: Show that the true corners of the cone are close to the supporting hyperplane.
- Step 3: Also, all points close to the support vectors are nearly corners.
- Step 4: Clustering the points close to the support vectors yields exactly one cluster per corner.

Per-node consistency

- If ‖yi − ŷi‖ ≤ ε → 0 for all i, SVM-cone recovers M consistently with the ground truth (‖mi − m̂i‖ → 0).
- This holds with high probability for both DCMMSB-type models and topic models.

Per-node consistency guarantee for DCMMSB (informal)

If θi ∼ Dirichlet(α) and α0 = αT1, then under some conditions on α and Γ, w.h.p.,

‖eTi (Θ − Θ̂Π)‖ = Õ( γmax K^2.5 min{K^2, κ(P)^2} ν^2 (1 + α0)^2 / (γmin^5 η λ∗(B) √(ρn)) )

- λ∗(B): the smallest singular value of B; controls the separation between clusters.
- ν: controls how balanced the entries of α are.
- Similar per-node consistency guarantees hold for all DCMMSB-type models, and per-word guarantees for topic models.

Simulation experiments

- We compare with GeoNMF (Mao et al., 2017), OCCAM (Zhang et al., 2014), and SAAC (Kaufmann et al., 2016).

[Figure, panels (a)-(d): (a) varying degree heterogeneity on DCMMSB; (b) varying sparsity on DCMMSB; (c) varying sparsity on OCCAM; (d) varying sparsity on SBMO.]

Real-world network results

- Evaluation metric: averaged Spearman rank correlation coefficient (RC) between Θ(:, a), a ∈ [K], and Θ̂(:, σ(a)), where σ is a permutation of [K]:

RCavg(Θ̂, Θ) = (1/K) maxσ ∑i=1..K RC(Θ̂(:, i), Θ(:, σ(i))) ∈ [−1, 1].
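For small K, this metric can be computed by brute force over the K! column permutations; a sketch using `scipy.stats.spearmanr`, on synthetic data that only serves to exercise the metric:

```python
import numpy as np
from itertools import permutations
from scipy.stats import spearmanr

def rc_avg(Theta_hat, Theta):
    """Averaged Spearman rank correlation, maximized over column permutations."""
    K = Theta.shape[1]
    best = -1.0
    for sigma in permutations(range(K)):
        rc = np.mean([spearmanr(Theta_hat[:, i], Theta[:, sigma[i]])[0]
                      for i in range(K)])
        best = max(best, rc)
    return best

rng = np.random.default_rng(4)
Theta = rng.dirichlet(np.ones(3), size=100)
shuffled = Theta[:, [2, 0, 1]]              # same memberships, columns permuted
score = rc_avg(shuffled, Theta)             # 1.0 once the permutation is matched
```

Maximizing over σ makes the metric invariant to the arbitrary labeling of communities, which is why a column-shuffled copy of Θ scores a perfect 1.0.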

- Co-authorship networks (assortative):

[Figure: (a) rank correlation (RCavg) and (b) running time (log scale, seconds) on DBLP1-DBLP5 for SVM-cone, GeoNMF, SVI, BSNMF, OCCAM, and SAAC.]

- Author-paper bipartite networks (dissortative):

[Figure: (a) rank correlation (RCavg) and (b) running time (log scale, seconds) on DBLP1-DBLP5 for the same methods.]

Topic model results

- ℓ1 reconstruction error and running time (log scale) on semi-synthetic data with the number of documents set to 60,000.

[Figure: ℓ1 reconstruction error and running time (seconds) on NIPS, NYT, Pubmed, and 20NG for RecoverL2, TSVD, GDM, and SVM-cone.]

{xmao@cs, purna.sarkar@austin, deepay@}.utexas.edu