TRANSCRIPT
Overlapping Clustering Models, and One (class) SVM to Bind Them All
Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti
Overlapping clustering model
The model:

    P = ρ Γ Θ B ΘT Γ ∈ Rn×n

where
- ρ: sparsity parameter
- Γ = diag(γ1, . . . , γn): degree parameters
- Θ ∈ Rn×K: membership matrix, with rows θTi
- B ∈ RK×K: community interconnection matrix
- Largest element of B is 1 for identifiability
- For network models, the data is an adjacency matrix: Aij ∼ Bernoulli(Pij)
- We refer to these as "DCMMSB-type models"
- Special cases:
  - ‖θi‖1 = 1: Degree-corrected Mixed Membership Stochastic Blockmodel (DCMMSB) (Airoldi et al., 2008; Jin et al., 2017)
  - ‖θi‖2 = 1: Overlapping Continuous Community Assignment Model (OCCAM) (Zhang et al., 2014)
  - θi binary, possibly with multiple non-zero entries, and Γ the identity: Stochastic Blockmodel with Overlaps (SBMO) (Kaufmann et al., 2016)
- For topic models (Blei et al., 2003), the data is a word-count matrix for documents drawn as multinomial samples from the population
- The word co-occurrence probability matrix matches the form of P in network models after normalizing the rows of the word-topic matrix
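As a concrete illustration, the generative model above can be sketched in a few lines (a minimal NumPy sketch with illustrative parameter values, not the authors' code; here ‖θi‖1 = 1, i.e., the DCMMSB case):

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, rho = 100, 3, 0.5                        # illustrative sizes

Theta = rng.dirichlet(np.ones(K), size=n)      # memberships: rows sum to 1 (DCMMSB)
gamma = rng.uniform(0.5, 1.0, size=n)          # degree parameters
Gamma = np.diag(gamma)
B = 0.3 * np.ones((K, K)) + 0.7 * np.eye(K)    # largest element is 1 (identifiability)

P = rho * Gamma @ Theta @ B @ Theta.T @ Gamma  # connection probabilities
A = (rng.random((n, n)) < P).astype(int)       # A_ij ~ Bernoulli(P_ij)
A = np.triu(A, 1)
A = A + A.T                                    # symmetric adjacency, no self-loops
```

Because B is symmetric and Γ is diagonal, P is symmetric, and with γi ≤ 1 and ρ = 0.5 all entries are valid probabilities.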
Related work
Most algorithms (Mao et al., 2017a, 2017b; Jin et al., 2017; Panov et al., 2017; Rubin-Delanchy et al., 2017) use a two-step method:
- Find the pure nodes by finding the corners of a simplex
- Estimate model parameters via regression

For example, Mao et al. (2017b) show that for MMSB:
- The rows vi of V (the top-K eigenvectors of P) lie on a simplex whose corners are the pure nodes (the set I)
- Community memberships can be estimated by regressing vi on V(I, :)

With degree parameters, however, the rows of V do not lie on a simplex.
Eliminating the effects of degree parameters
- The rows of V fall on a cone
- Dividing each row by its `2 norm places the points on the surface of a sphere
- A hyperplane through the corners separates all the points from the origin
- The hyperplane can be found with a one-class SVM (Schölkopf et al., 2001); its support vectors are the corners
Ideal cone problem
Given a matrix Z known to be of the form Z = MYP ∈ Rn×m, where
- M ∈ Rn×K≥0, with no row of M equal to 0, and
- YP ∈ RK×m corresponds to K (unknown) rows of Z, each scaled to unit `2 norm,
infer M.

Solving the ideal cone problem with Z = V gives us the pure nodes, from which community memberships can be inferred.
One-class SVM solves the ideal cone problem

    Primal: max b    s.t. ‖w‖ ≤ 1, wTyi ≥ b, i ∈ I
    Dual:   min (1/2) Σi,j βiβj yTi yj    s.t. Σi βi = 1, βi ≥ 0, i ∈ I

where yi = zi/‖zi‖. The one-class SVM works if the following condition is satisfied:

Condition. The matrix YP satisfies (YPYTP)−1 1 > 0.

- Intuition: the SVM support-vector plane touches all pure nodes
- The condition always holds for DCMMSB-type models
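The dual above is a small quadratic program over the simplex, so it can be solved directly; a minimal sketch (an illustrative helper assuming SciPy's SLSQP solver, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

def one_class_svm_dual(Y):
    """Solve  min (1/2) beta^T (Y Y^T) beta  s.t.  sum(beta) = 1, beta >= 0.

    Rows of Y are the unit-norm points y_i; for an ideal cone the
    support vectors (beta_i > 0) are exactly the corners.
    """
    n = Y.shape[0]
    G = Y @ Y.T  # Gram matrix with entries y_i^T y_j
    res = minimize(
        lambda b: 0.5 * b @ G @ b,
        x0=np.full(n, 1.0 / n),
        jac=lambda b: G @ b,
        bounds=[(0.0, None)] * n,
        constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}],
        method="SLSQP",
    )
    beta = res.x
    w = Y.T @ beta
    w = w / np.linalg.norm(w)   # primal normal direction, ||w|| = 1
    b = (Y @ w).min()           # margin b = min_i w^T y_i
    return beta, w, b
```

On a toy cone with corners e1, e2 and one interior point, the solver puts all its weight on the two corners, matching the intuition that the supporting hyperplane touches only pure nodes.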
Empirical cone problem
Given the empirical matrix Ŷ with rows ẑTi/‖ẑi‖, where
- Z = MYP ∈ Rn×m, and
- maxi ‖eTi(Ŷ − Y)‖ ≤ ε,
infer M.

The one-class SVM still works if the following condition is satisfied:

Condition. The matrix YP satisfies (YPYTP)−1 1 ≥ η1 for some constant η > 0.

- η can be bounded for DCMMSB-type models and topic models
Algorithm: SVM-cone
1. Normalize the rows of Z by their `2 norm
2. Run a one-class SVM to get the supporting hyperplane
3. Cluster all points close to this hyperplane
4. Pick one point from each cluster to get the near-corner set C
5. Set M̂ = ZYTC (YCYTC)−1
Proof sketch for the empirical cone problem
- Step 1: Show that the SVM solution for the empirical cone is nearly ideal, i.e., (ŵ, b̂) ≈ (w, b)
- Step 2: Show that the true corners of the cone are close to the supporting hyperplane
- Step 3: Show that, conversely, all points close to the support vectors are nearly corners
- Step 4: Clustering the points that are close to the support vectors yields exactly one cluster for each corner
Per-node consistency
- If ‖ŷi − yi‖ ≤ ε → 0 for all i, SVM-cone recovers M consistently with the ground truth (‖m̂i − mi‖ → 0)
- This holds with high probability for both DCMMSB-type models and topic models
Per-node consistency guarantee for DCMMSB (informal)
If θi ∼ Dirichlet(α) and α0 = αT1, then under some conditions on α and Γ, w.h.p.,

    ‖eTi(Θ̂ − ΘΠ)‖ = O( γmax K2.5 min{K2, (κ(P))2} ν2 (1 + α0)2 / (γ5min η λ∗(B) √(ρn)) )

where Π is a permutation matrix, and
- λ∗(B): smallest singular value of B, which controls the separation between clusters
- ν: controls how balanced the entries of α are

Similar per-node consistency guarantees hold for all DCMMSB-type models, and per-word guarantees for topic models.
Simulation experiments
- Compared with GeoNMF (Mao et al., 2017), OCCAM (Zhang et al., 2014), and SAAC (Kaufmann et al., 2016)

[Figure: (a) Varying degree heterogeneity on DCMMSB. (b) Varying sparsity on DCMMSB. (c) Varying sparsity on OCCAM. (d) Varying sparsity on SBMO.]
Real-world network results
- Evaluation metric: averaged Spearman rank correlation coefficient (RC) between Θ̂(:, a), a ∈ [K], and Θ(:, σ(a)), where σ is a permutation of [K]:

    RCavg(Θ̂, Θ) = (1/K) maxσ Σi=1..K RC(Θ̂(:, i), Θ(:, σ(i))) ∈ [−1, 1]
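For small K this metric can be computed by brute force over all K! permutations; a minimal sketch (a hypothetical helper, assuming SciPy's spearmanr):

```python
import itertools

import numpy as np
from scipy.stats import spearmanr

def rc_avg(Theta_hat, Theta):
    """RC_avg: average Spearman RC over columns, maximized over permutations sigma."""
    K = Theta.shape[1]
    best = -1.0
    for sigma in itertools.permutations(range(K)):
        total = sum(
            spearmanr(Theta_hat[:, i], Theta[:, sigma[i]])[0]  # [0] = correlation
            for i in range(K)
        )
        best = max(best, total / K)
    return best
```

Since Spearman correlation is invariant to monotone transforms, a column-permuted, monotonically distorted copy of Θ still scores a perfect 1.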
- Co-authorship networks (assortative)

[Figure: RCavg and running time (log scale, in seconds) on DBLP1–DBLP5 for SVM-cone, GeoNMF, SVI, BSNMF, OCCAM, and SAAC. (a) Rank correlation. (b) Running time.]
- Author-paper bipartite networks (dissortative)

[Figure: RCavg and running time (log scale, in seconds) on DBLP1–DBLP5 for SVM-cone, GeoNMF, SVI, BSNMF, OCCAM, and SAAC. (a) Rank correlation. (b) Running time.]
Topic model results
- `1 reconstruction error and running time (log scale) for semi-synthetic data with the number of documents set to 60,000

[Figure: `1 reconstruction error and running time (in seconds) on NIPS, NYT, Pubmed, and 20NG for Recover-L2, TSVD, GDM, and SVM-cone.]
{xmao@cs, purna.sarkar@austin, deepay@}.utexas.edu