

Overlapping Clustering Models, and One (class) SVM to Bind Them All

Xueyu Mao, Purnamrita Sarkar, Deepayan Chakrabarti

Overlapping clustering model

P = ρ Γ Θ B ΘT Γ

[Figure: block decomposition of the n × n probability matrix P, where Γ is a diagonal matrix of degree parameters γi, the n × K matrix Θ has the cluster membership vectors θTi as rows, B encodes community interconnections, and ρ controls sparsity.]

- The largest element of B is 1, for identifiability.
- For network models, the data is an adjacency matrix: Aij ∼ Bernoulli(Pij).
- We refer to these as "DCMMSB-type models".
- Special cases:
  - ‖θi‖1 = 1: Degree-corrected Mixed Membership Stochastic Blockmodel (DCMMSB) (Airoldi et al., 2008; Jin et al., 2017)
  - ‖θi‖2 = 1: Overlapping Continuous Community Assignment Model (OCCAM) (Zhang et al., 2014)
  - θi binary, possibly with multiple non-zero entries, and Γ the identity: Stochastic Blockmodel with Overlaps (SBMO) (Kaufmann et al., 2016)
- For topic models (Blei et al., 2003), the data is a word-count matrix in which each document is a multinomial sample from the population.
- The word co-occurrence probability matrix matches the form of P in network models after normalizing the rows of the word-topic matrix.
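As a concrete illustration, the generative process above can be sketched in a few lines of NumPy. This is a minimal sketch; the sizes and parameter values are illustrative choices, not from the poster.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, rho = 200, 3, 0.5                     # illustrative sizes, not from the poster

# DCMMSB case: memberships theta_i ~ Dirichlet, so ||theta_i||_1 = 1
Theta = rng.dirichlet(np.ones(K), size=n)
gamma = rng.uniform(0.5, 1.0, size=n)       # degree parameters
B = 0.2 + 0.8 * np.eye(K)                   # community interconnections, max entry 1

# P = rho * Gamma Theta B Theta^T Gamma
P = rho * gamma[:, None] * (Theta @ B @ Theta.T) * gamma[None, :]

# Adjacency matrix: A_ij ~ Bernoulli(P_ij), symmetric, no self-loops
U = rng.random((n, n))
A = np.triu((U < P).astype(int), k=1)
A = A + A.T
```

With rho = 0.5 and gamma, B, Theta all bounded by 1, every entry of P is a valid probability.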

Related work

Most algorithms (Mao et al., 2017a, 2017b; Jin et al., 2017; Panov et al., 2017; Rubin-Delanchy et al., 2017) use a two-step method:

- Find the pure nodes by finding the corners of a simplex.
- Estimate model parameters via regression.

For example, Mao et al. (2017b) show that for MMSB:

- The rows vi of V (the top-K eigenvectors of P) lie on a simplex whose corners are the pure nodes (the set I).
- One can estimate community memberships by regressing vi on V(I, :).

With degree parameters, however, the rows of V no longer lie on a simplex.
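In the ideal (noiseless) simplex setting, the second step of this recipe is a single linear solve once the corners are known. A minimal sketch, using synthetic stand-ins for Θ and V rather than eigenvectors of a real P:

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 40, 3

# Ideal simplex: every row of V is a convex combination of K pure rows
Theta = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=n - K)])
V_pure = rng.random((K, K)) + np.eye(K)     # stand-in for V(I, :), the corner rows
V = Theta @ V_pure

I_pure = [0, 1, 2]                          # pure nodes (simplex corners)
# Regress each v_i on V(I, :): Theta_hat = V @ V(I, :)^{-1}
Theta_hat = V @ np.linalg.inv(V[I_pure])
```

In the noiseless case this recovers the memberships exactly, which is why the hard part of the two-step method is finding the corners, not the regression.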

Eliminating the effects of degree parameters

- The rows of V fall on a cone.
- Dividing each row by its ℓ2 norm places the points on the surface of a sphere.
- A hyperplane through the corners separates all the points from the origin.
- This hyperplane can be found with a one-class SVM (Schölkopf et al., 2001); the support vectors are the corners.

Ideal cone problem

- Given a matrix Z known to be of the form Z = MYP ∈ Rn×m, where
  - M ∈ Rn×K≥0 , and no row of M is 0,
  - YP ∈ RK×m corresponds to K (unknown) rows of Z, each scaled to unit ℓ2 norm,
- infer M.

Solving the ideal cone problem with Z = V gives us the pure nodes, from which community memberships can be inferred.

One-class SVM solves the ideal cone problem

Primal:  max b  s.t.  ‖w‖ ≤ 1, wTyi ≥ b, i ∈ I
Dual:    min (1/2) ∑i,j βiβj yTi yj  s.t.  ∑i βi = 1, βi ≥ 0, i ∈ I

- where yi = zi/‖zi‖.
- The one-class SVM works if the following condition is satisfied:

Condition: The matrix YP satisfies (YP YPT)−1 1 > 0.

- Intuition: the SVM supporting hyperplane touches all pure nodes.
- The condition always holds for DCMMSB-type models.
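The dual above is a small quadratic program over the simplex, so on ideal-cone data it can be solved directly with a generic solver. A sketch on synthetic data (corner geometry chosen so the condition above holds; `scipy.optimize` is used here as a generic QP solver, not the poster's implementation):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, K, m = 60, 3, 3

# Ideal cone: Z = M Yp, rows of Yp unit-norm and well separated
Yp = 0.2 * rng.random((K, m)) + np.eye(K)
Yp /= np.linalg.norm(Yp, axis=1, keepdims=True)
M = np.vstack([np.eye(K), rng.dirichlet(2 * np.ones(K), size=n - K)])
Z = M @ Yp
Y = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # y_i = z_i / ||z_i||

# Dual: min 0.5 * sum_ij beta_i beta_j y_i^T y_j  s.t.  sum_i beta_i = 1, beta_i >= 0
G = Y @ Y.T
res = minimize(lambda b: 0.5 * b @ G @ b, np.full(n, 1.0 / n),
               jac=lambda b: G @ b, method="SLSQP",
               bounds=[(0.0, None)] * n,
               constraints=[{"type": "eq", "fun": lambda b: b.sum() - 1.0}])
beta = res.x
support = np.argsort(beta)[-K:]                    # support vectors = the K pure nodes
```

The nonzero dual weights concentrate on the rows of Z that are the pure nodes (indices 0..K-1 in this construction), matching the intuition that the supporting hyperplane touches exactly the corners.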

Empirical cone problem

- Given the empirical matrix Ŷ with rows ẑTi /‖ẑi‖,
  - with Z = MYP ∈ Rn×m, and
  - maxi ‖eTi (Ŷ − Y)‖ ≤ ε,
- infer M̂.

The one-class SVM still works if the following condition is satisfied:

Condition: The matrix YP satisfies (YP YPT)−1 1 ≥ η1 for some constant η > 0.

- η can be bounded for DCMMSB-type models and topic models.

Algorithm: SVM-cone

- Normalize the rows of Ẑ by their ℓ2 norm.
- Run a one-class SVM to get the supporting hyperplane.
- Cluster all points close to this hyperplane.
- Pick one point from each cluster to get the near-corner set C.
- M̂ = Ẑ ŶTC (ŶC ŶTC)−1.
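The last line of the algorithm is an ordinary least-squares recovery, and in the noiseless case it is exact. A minimal sketch, with the near-corner set assumed already found (e.g., by the one-class SVM step):

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, m = 50, 3, 5

# Noiseless data: Z = M Yp with unit-norm rows of Yp; nodes 0..K-1 are pure
Yp = rng.random((K, m)) + 2 * np.eye(K, m)
Yp /= np.linalg.norm(Yp, axis=1, keepdims=True)
M = np.vstack([np.eye(K), rng.dirichlet(np.ones(K), size=n - K)])
Z = M @ Yp

C = [0, 1, 2]                                      # near-corner set from the SVM step
Yc = Z[C] / np.linalg.norm(Z[C], axis=1, keepdims=True)

# M_hat = Z Yc^T (Yc Yc^T)^{-1}: exact recovery when the data is noiseless
M_hat = Z @ Yc.T @ np.linalg.inv(Yc @ Yc.T)
```

Here Yc coincides with Yp, so M_hat = M Yp YpT (Yp YpT)−1 = M exactly; the consistency results below control how much this degrades under noise.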

Proof sketch for the empirical cone problem

- Step 1: Show that the SVM solution of the empirical cone is nearly ideal, i.e., (ŵ, b̂) ≈ (w, b).
- Step 2: Show that the true corners of the cone are close to the supporting hyperplane.
- Step 3: Also, all points close to the support vectors are nearly corners.
- Step 4: Clustering the points close to the support vectors yields exactly one cluster per corner.

Per-node consistency

- If ‖yi − ŷi‖ ≤ ε → 0 for all i, SVM-cone recovers M consistently with the ground truth (‖mi − m̂i‖ → 0).
- This holds with high probability for both DCMMSB-type models and topic models.

Per-node consistency guarantee for DCMMSB (informal)

If θi ∼ Dirichlet(α) and α0 = αT1, then under some conditions on α and Γ, w.h.p.,

‖eTi (Θ − Θ̂Π)‖ = Õ( γmax K^2.5 min{K^2, κ(P)^2} ν^2 (1 + α0)^2 / (γmin^5 η λ∗(B) √(ρn)) )

- λ∗(B): the smallest singular value of B; controls the separation between clusters.
- ν: controls how balanced the entries of α are.
- Similar per-node consistency guarantees hold for all DCMMSB-type models, and per-word guarantees for topic models.

Simulation experiments

- We compare with GeoNMF (Mao et al., 2017), OCCAM (Zhang et al., 2014), and SAAC (Kaufmann et al., 2016).

[Figure, panels (a)-(d): (a) varying degree heterogeneity on DCMMSB; (b) varying sparsity on DCMMSB; (c) varying sparsity on OCCAM; (d) varying sparsity on SBMO.]

Real-world network results

- Evaluation metric: averaged Spearman rank correlation coefficient (RC) between Θ(:, a), a ∈ [K], and Θ̂(:, σ(a)), where σ is a permutation of [K]:

RCavg(Θ̂, Θ) = (1/K) maxσ ∑i=1..K RC(Θ̂(:, i), Θ(:, σ(i))) ∈ [−1, 1].
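For small K, this metric can be computed by brute force over the K! column permutations; a sketch using `scipy.stats.spearmanr`, on synthetic data that only serves to exercise the metric:

```python
import numpy as np
from itertools import permutations
from scipy.stats import spearmanr

def rc_avg(Theta_hat, Theta):
    """Averaged Spearman rank correlation, maximized over column permutations."""
    K = Theta.shape[1]
    best = -1.0
    for sigma in permutations(range(K)):
        rc = np.mean([spearmanr(Theta_hat[:, i], Theta[:, sigma[i]])[0]
                      for i in range(K)])
        best = max(best, rc)
    return best

rng = np.random.default_rng(4)
Theta = rng.dirichlet(np.ones(3), size=100)
shuffled = Theta[:, [2, 0, 1]]              # same memberships, columns permuted
score = rc_avg(shuffled, Theta)             # 1.0 once the permutation is matched
```

Maximizing over σ makes the metric invariant to the arbitrary labeling of communities, which is why a column-shuffled copy of Θ scores a perfect 1.0.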

- Co-authorship networks (assortative):

[Figure: (a) rank correlation (RCavg) and (b) running time (log scale, seconds) on DBLP1-DBLP5 for SVM-cone, GeoNMF, SVI, BSNMF, OCCAM, and SAAC.]

- Author-paper bipartite networks (dissortative):

[Figure: (a) rank correlation (RCavg) and (b) running time (log scale, seconds) on DBLP1-DBLP5 for the same methods.]

Topic model results

- ℓ1 reconstruction error and running time (log scale) on semi-synthetic data with the number of documents set to 60,000.

[Figure: ℓ1 reconstruction error and running time (seconds) on NIPS, NYT, Pubmed, and 20NG for RecoverL2, TSVD, GDM, and SVM-cone.]

{xmao@cs, purna.sarkar@austin, deepay@}.utexas.edu