vertex neighborhoods, low conductance cuts, and good seeds for local community methods
DESCRIPTION
My talk from KDD2012 about vertex neighborhoods and low conductance cuts. See the paper here: http://arxiv.org/abs/1112.0031 and http://dl.acm.org/citation.cfm?id=2339628
TRANSCRIPT
Vertex Neighborhoods, Low Conductance Cuts, and Good Seeds for Local Community Methods
DAVID F. GLEICH PURDUE
C. SESHADHRI SANDIA - LIVERMORE
KDD2012 David Gleich · Purdue
Neighborhoods are good communities
Vertex neighborhoods are good conductance communities
A vertex neighborhood with conductance at most (4-4𝜅)/(3-2𝜅) exists, where 𝜅 is the clustering coefficient and the graph has a heavy-tailed degree distribution.
A vertex neighborhood is a “good” conductance community in a graph with a heavy-tailed degree distribution and large clustering coefficient.
Our contributions
1. The previous theorem and its proof. This shows that good communities are expected and easy to find in modern networks with heavy-tailed degrees and large clustering.
2. An empirical evaluation of neighborhood communities that shows vertex neighborhoods are the “backbone” of the network community profile.
Formal background for the theorem
1. Vertex neighborhoods
2. Low conductance cuts
3. Clustering coefficients
Vertex neighborhoods
The set of a vertex and all its neighbors. Also called an “egonet”.
Prior research on egonets of social networks from the “structural holes” perspective [Burt95, Kleinberg08].
Used for anomaly detection [Akoglu10], community seeds [Huang11, Schaeffer11], overlapping communities [Schaeffer07, Rees10].
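As a small illustration of the definition, here is a sketch that extracts an egonet from an adjacency-list graph. The dictionary representation and the function name `egonet` are assumptions made for this example, not something taken from the talk.

```python
def egonet(adj, v):
    """Return the vertex neighborhood of v: v together with all its neighbors.

    `adj` is assumed to map each vertex to an iterable of its neighbors
    in an undirected graph.
    """
    return {v} | set(adj[v])


# Hypothetical toy graph: a triangle 0-1-2 with a path 0-3-4 attached.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3]}
print(egonet(adj, 0))
print(egonet(adj, 4))
```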
Conductance communities
Conductance is one of the most important community scores [Schaeffer07].
The conductance of a set of vertices is the ratio of edges leaving to total edges:
$\phi(S) = \frac{\text{cut}(S)}{\min(\text{vol}(S), \text{vol}(\bar{S}))} = \frac{\text{(edges leaving the set)}}{\text{(total edges in the set)}}$
Equivalently, it’s the probability that a random edge leaves the set.
Small conductance ⇔ Good community
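The conductance definition can be sketched directly in code. This is a minimal version assuming an undirected simple graph stored as a dictionary of neighbor lists; the function name and the toy graph are illustrative assumptions. With the numbers on the slide (cut 7, volumes 33 and 11), the same formula gives 7/min(33, 11) = 7/11.

```python
def conductance(adj, S):
    """Conductance of vertex set S: cut(S) / min(vol(S), vol(complement of S)).

    `adj` is assumed to map each vertex to a list of neighbors in an
    undirected simple graph (each edge appears in both endpoint lists).
    """
    S = set(S)
    # Edges with exactly one endpoint in S.
    cut = sum(1 for u in S for w in adj[u] if w not in S)
    # Volume is the sum of degrees.
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(ns) for ns in adj.values()) - vol_S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(conductance(two_triangles, {0, 1, 2}))  # one cut edge over volume 7
```

Taking the minimum of the two volumes makes the score symmetric, so a set and its complement receive the same conductance.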
Example: cut(S) = 7, vol(S) = 33, vol($\bar{S}$) = 11, so $\phi(S) = 7/11$.
Clustering coefficients
Global clustering coefficient = $\frac{\text{number of closed wedges}}{\text{number of wedges}}$
Probability that a random wedge is closed
[Figure labels: wedge, center of wedge, closed wedge.]
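The wedge-counting definition translates into a short brute-force computation. The sketch below assumes an undirected simple graph as a dictionary of neighbor lists; the name `global_clustering` is an assumption for this example, and the enumeration is quadratic in the degree, so it is only meant for small graphs.

```python
from itertools import combinations


def global_clustering(adj):
    """Fraction of wedges that are closed.

    A wedge is a pair of distinct neighbors (a, b) of a center vertex v;
    it is closed when a and b are themselves adjacent.
    """
    nbrs = {v: set(ns) for v, ns in adj.items()}
    wedges = 0
    closed = 0
    for v, ns in nbrs.items():
        for a, b in combinations(ns, 2):
            wedges += 1
            if b in nbrs[a]:
                closed += 1
    return closed / wedges if wedges else 0.0


# Hypothetical toy graph: triangle 0-1-2 with a pendant vertex 3 attached to 0.
triangle_with_tail = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
print(global_clustering(triangle_with_tail))  # 3 closed wedges out of 5
```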
Simple version of theorem
If global clustering coefficient = 1, then the graph is a disjoint union of cliques. Vertex neighborhoods are optimal communities!
Theorem
Condition: Let graph G have clustering coefficient 𝜅 and have vertex degrees bounded by a power-law function with exponent 𝛾 less than 3.
Theorem: Then there exists a vertex neighborhood with conductance at most $4(1-\kappa)/(3-2\kappa)$.
[Figure: log probability against log degree, with the bounding curves $\alpha_1 n/d^\gamma$ and $\alpha_2 n/d^\gamma$.]
Proof Sketch
1) Large clustering coefficient ⇒ many wedges are closed
2) Heavy-tailed degree distribution ⇒ a few vertices have a very large degree
3) Large degree ⇒ O(d²) wedges ⇒ “most” of the wedges
Thus, there must exist a vertex with a high edge density ⇒ “good” conductance
Use the probabilistic method to formalize
[Figure: CDF of number of wedges against degree, with degree on a log scale from $10^0$ to $10^4$.]
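Step 3 of the sketch rests on the fact that a vertex of degree d is the center of d(d-1)/2 wedges, so a few high-degree vertices hold most of them. The following sketch computes the cumulative wedge fraction by degree from a degree sequence; the function name and the toy degree sequence are assumptions for illustration, not data from the talk.

```python
def wedge_cdf_by_degree(degrees):
    """Return sorted (degree, fraction of wedges centered at vertices of at most that degree)."""
    wedges_at = {}
    for d in degrees:
        # A vertex of degree d is the center of d choose 2 wedges.
        wedges_at[d] = wedges_at.get(d, 0) + d * (d - 1) // 2
    total = sum(wedges_at.values())
    cdf, running = [], 0
    for d in sorted(wedges_at):
        running += wedges_at[d]
        cdf.append((d, running / total if total else 0.0))
    return cdf


# Hypothetical heavy-tailed degree sequence: many leaves, a few mid-degree
# vertices, and a single hub of degree 100.
toy_degrees = [1] * 90 + [5] * 9 + [100]
for degree, fraction in wedge_cdf_by_degree(toy_degrees):
    print(degree, round(fraction, 4))
```

In this toy sequence the single hub accounts for 4950 of the 5040 wedges, which is the kind of concentration the proof sketch relies on.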
Confession! The theory is weak
$\phi(S) \le 4(1-\kappa)/(3-2\kappa)$
Collaboration networks: 𝜅 ~ [0.1 – 0.5]
Social networks: 𝜅 ~ [0.05 – 0.1]
Tech. networks: 𝜅 ~ [0.005 – 0.05]
Graph             Verts    Edges     Avg. Deg.  Max Deg.  𝜅      C̄
ca-AstroPh        17903    196972    22.0       504       0.318  0.633
email-Enron       33696    180811    10.7       1383      0.085  0.509
cond-mat-2005     36458    171735    9.4        278       0.243  0.657
arxiv             86376    517563    12.0       1253      0.560  0.678
dblp              226413   716460    6.3        238       0.383  0.635
hollywood-2009    1069126  56306653  105.3      11467     0.310  0.766
fb-Penn94         41536    1362220   65.6       4410      0.098  0.212
fb-A-oneyear      1138557  4404989   7.7        695       0.038  0.060
fb-A              3097165  23667394  15.3       4915      0.048  0.097
soc-LiveJournal1  4843953  42845684  17.7       20333     0.118  0.274
oregon2-010526    11461    32730     5.7        2432      0.037  0.352
p2p-Gnutella25    22663    54693     4.8        66        0.005  0.005
as-22july06       22963    48436     4.2        2390      0.011  0.230
itdk0304          190914   607610    6.4        1071      0.061  0.158
This bound is useless unless 𝜅 ≥ 1/2
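To see why the bound says nothing below 𝜅 = 1/2, it helps to evaluate it: conductance never exceeds 1, and 4(1-𝜅)/(3-2𝜅) only drops below 1 once 𝜅 passes 1/2. The sketch below does this for a few graphs, taking the first of the two clustering values listed for each graph in the table as 𝜅 (an assumption about the column order); the function name is also an assumption.

```python
def neighborhood_conductance_bound(kappa):
    """Upper bound 4(1 - kappa) / (3 - 2 kappa) from the theorem."""
    return 4 * (1 - kappa) / (3 - 2 * kappa)


# Clustering values copied from the table; treating them as kappa is an assumption.
examples = [("arxiv", 0.560), ("ca-AstroPh", 0.318), ("fb-A", 0.048)]
for name, kappa in examples:
    bound = neighborhood_conductance_bound(kappa)
    verdict = "informative" if bound < 1 else "trivial (at least 1)"
    print(f"{name}: kappa={kappa}, bound={bound:.3f}, {verdict}")
```

Of these three, only arxiv has 𝜅 above 1/2, so it is the only one for which the bound falls below 1.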
We view this theory as “intuition for the truth”
Empirical Evaluation using Network Community Profiles
large global clustering coefficients and large mean clustering coefficients.
Social networks. The nodes are people again, and the edges are either explicit “friend” relationships (fb-Penn94 [36], fb-A [39], soc-LiveJournal [4]) or observed network activity over edges in a one-year span (fb-A-oneyear [39]).
Technological networks. The nodes act in a communication network either as agents (p2p-Gnutella25 [27]) or as routers (oregon2 [23], as-22july06 [29], itdk0304 [10]). The edges are observed communications between the nodes.
Web graphs. The nodes are web-pages, and the edges are symmetrized links between the pages [25].
6. EMPIRICAL NEIGHBORHOOD COMMUNITIES
6.1 Computation
We first show that we can adapt any procedure to compute all local clustering coefficients to compute the conductance scores for each neighborhood in the graph. Most of the work to compute a local clustering coefficient is performed when finding the number of triangles at the vertex. We can express the number of triangles with v as:
$\text{edges}(N_1(v) \setminus \{v\})/2$
because each edge among v’s neighbors produces a triangle (recall that the edges function double-counts). Note also that $\text{edges}(N_1(v) \setminus \{v\})/2 = \text{edges}(N_1(v))/2 - d_v$. Then $\text{cut}(N_1(v)) = \text{vol}(N_1(v)) - \text{edges}(N_1(v))$. And so, given the number of triangles, we can compute the cut given the volume of the neighborhood as well. This is easy to do with any graph structure that explicitly stores the degrees.
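The identities in Section 6.1 can be sketched as follows: count the triangles at each vertex, recover the double-counted internal edges as 2·triangles + 2·d_v, and subtract from the neighborhood volume to get the cut. This is a minimal sketch assuming an undirected simple graph as a dictionary of neighbor lists; the function name and the set-intersection triangle count are assumptions, not the authors' implementation.

```python
def neighborhood_conductances(adj):
    """Map each vertex v to (size, conductance) of its neighborhood N_1(v)."""
    nbrs = {v: set(ns) for v, ns in adj.items()}
    deg = {v: len(ns) for v, ns in nbrs.items()}
    total_vol = sum(deg.values())
    scores = {}
    for v, ns in nbrs.items():
        # Triangles at v = edges among v's neighbors; the sum sees each twice.
        triangles = sum(len(nbrs[u] & ns) for u in ns) // 2
        # edges(N_1(v)) double-counts: neighbor-neighbor edges plus v's own edges.
        edges_inside = 2 * triangles + 2 * deg[v]
        vol = deg[v] + sum(deg[u] for u in ns)
        cut = vol - edges_inside
        denom = min(vol, total_vol - vol)
        phi = cut / denom if denom > 0 else float("inf")
        scores[v] = (len(ns) + 1, phi)
    return scores


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
for v, (size, phi) in sorted(neighborhood_conductances(two_triangles).items()):
    print(v, size, round(phi, 3))
```

The only graph-wide quantities needed beyond the triangle counts are the degrees, which matches the remark that any structure storing degrees explicitly suffices.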
6.2 Quality of neighborhood communities
We use Leskovec et al.’s [24] network community plot to show the information on all neighborhood communities simultaneously. These plots will help us understand if the neighborhood communities are high quality (low conductance), and how they compare to other community detection methods. Given the conductance scores from all the neighborhood communities and their size in terms of number of vertices, we first identify the best community at each size. The network community plot shows the relationship between best community conductance and community size on a log-log scale. Leskovec et al. found that these plots had a characteristic shape for modern information networks: an initial sharp decrease until the community size is between 100 and 1000, then a considerable rise in the conductance scores for larger communities. In our case, neighborhood communities cannot be any larger than the maximum degree plus one, and so we mark this point on the figures. We always look at the smaller side of the cut, so no community can be larger than half the vertices of the graph. We also mark this location on the plots. Each subsequent figure in this paper utilizes this size-vs-conductance plot, and we will continually layer information from new methods above results from old methods. The results are information-dense plots that need slightly more study than would be ideal; however, we point out the salient features in each plot in the text. Note also that we deliberately attempt to preserve the axis limits across figures to promote comparisons. However, some figures have different axis limits to exhibit the range of data.
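The step "identify the best community at each size" amounts to a minimum per size over (size, conductance) pairs. A minimal sketch follows; the function name and the sample pairs are assumptions for illustration.

```python
def community_profile(scores):
    """Return sorted (size, best conductance) pairs: the minimum conductance at each size."""
    best = {}
    for size, phi in scores:
        if size not in best or phi < best[size]:
            best[size] = phi
    return sorted(best.items())


# Hypothetical (size, conductance) pairs, e.g. one per vertex neighborhood.
sample = [(3, 0.5), (3, 0.2), (10, 0.05), (4, 0.9)]
print(community_profile(sample))
```

Plotting these pairs on log-log axes would give the size-vs-conductance plot described in the text.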
[Figure 2 panels, left and right in each row: web-Google, itdk0304; fb-A-oneyear, arxiv; soc-LiveJournal1, ca-AstroPh. Horizontal axis: number of vertices in cluster, log scale from $10^0$ to $10^5$. Vertical axis: log scale, $10^{-4}$ to $10^0$ on most panels. Marked locations: maxdeg and verts/2.]
Figure 2: The best neighborhood community conductance at each size (black) and the Fiedler community (red). (Note the axis limits on ca-AstroPh).
First, we show these network community plots for six of the networks in Figure 2. These figures are representative of the best and worst of our results. Plots for other graphs are available on the website given in the introduction.
The three graphs on the left show cases where a neighborhood community is, or is near, the best Fiedler community (the red circle). The three graphs on the right highlight instances where the Fiedler community is much better than any neighborhood community. We find it mildly surprising that these neighborhood communities can be as good as the Fiedler community. The structure of the plot for both fb-A-oneyear and soc-LiveJournal1 is instructive. Neighborhoods of the highest degree vertices are not community-like, suggesting that these nodes are somehow exceptional. In fact, by inspection of these communities, many of them are nearly a star graph. However, a few of the large degree nodes define strikingly good communities (these are sets with a few hundred vertices with conductance scores of around $10^{-2}$). This evidence concurs with the intuition from Theorem 4.6.
6.3 Comparison to PPR communities
Note that these plots show the same shape as observed by Leskovec et al. [24]. Consequently, in the next set of figures, and in the remainder of the empirical investigation,
[Plot axes: community size (horizontal) against minimum conductance for any community of the given size (vertical).]
Canonical shape found by Leskovec, Lang, Dasgupta, and Mahoney. Holds for a variety of approximations to conductance.
Empirical Evaluation using Network Community Profiles
[Plot axes: neighborhood size (degree + 1) on the horizontal axis against minimum conductance for any neighborhood of the given size on the vertical axis.]
“Egonet community profile” shows the same shape, 3 secs to compute.
1.1M verts, 4M edges
The Fiedler community computed from the normalized Laplacian is a neighborhood!
Facebook data from Wilson et al. 2009
Not just one graph
arXiv – 86k verts, 500k edges
soc-LiveJournal – 5M verts, 42M edges
15 more graphs available: www.cs.purdue.edu/~dgleich/codes/neighborhoods
Filling in the Network Community Profile
Minimum conductance for any community
neighborhood of the given size
We are missing a region of the NCP when we just look at neighborhoods
Community Size (Degree + 1)
Personalized PageRank Communities [Andersen06]
To find the canonical NCP structure, Leskovec et al. used a personalized PageRank based community finder. These start with a single vertex seed, and then expand the community based on the solution of a personalized PageRank problem. The resulting community satisfies a local Cheeger inequality. This needs to run thousands of times for an NCP.
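The seed-and-expand procedure on this slide can be sketched as follows. This is a minimal illustration, not the code used for the talk: it assumes the "push" style approximation of personalized PageRank followed by a sweep over vertices ordered by PageRank divided by degree. The function names, the dictionary-of-sets graph representation, and the parameter values `alpha` and `eps` are all assumptions made for the example.

```python
from collections import deque

def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank from one seed via push steps.

    adj maps each vertex to the set of its neighbors (undirected, simple).
    The seed is assumed to have at least one neighbor.
    """
    p = {}            # approximate PageRank mass
    r = {seed: 1.0}   # residual mass not yet distributed
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du, ru = len(adj[u]), r.get(u, 0.0)
        if ru < eps * du:
            continue  # stale queue entry, nothing to push
        # Keep alpha of the residual, leave half the rest in place (lazy walk),
        # and spread the other half evenly over the neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2
        share = (1 - alpha) * ru / (2 * du)
        for w in adj[u]:
            old = r.get(w, 0.0)
            r[w] = old + share
            if old < eps * len(adj[w]) <= r[w]:
                queue.append(w)  # w just crossed its push threshold
        if r[u] >= eps * du:
            queue.append(u)
    return p

def sweep_cut(adj, p):
    """Best-conductance prefix of the vertices sorted by p[v] / degree(v)."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    order = sorted(p, key=lambda v: p[v] / len(adj[v]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, v in enumerate(order, start=1):
        inside = sum(1 for w in adj[v] if w in in_set)
        in_set.add(v)
        vol += len(adj[v])
        cut += len(adj[v]) - 2 * inside  # new boundary edges minus absorbed ones
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, k
    return set(order[:best_k]), best_phi
```

Calling `sweep_cut(adj, approximate_ppr(adj, seed))` once per seed gives one candidate community; building a profile this way repeats that call for many seeds and parameter settings, which is where the running time on the slide comes from.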
Filling in the Network Community Profile
[Plot residue: two log-log panels of conductance (10^-4 to 10^0) against community size (10^0 to 10^5), with maxdeg marked.]
large global clustering coefficients and large mean clustering coefficients.
Social networks. The nodes are people again, and the edges are either explicit “friend” relationships (fb-Penn94 [36], fb-A [39], soc-LiveJournal [4]) or observed network activity over edges in a one-year span (fb-A-oneyear [39]).
Technological networks. The nodes act in a communication network either as agents (p2p-Gnutella25 [27]) or as routers (oregon2 [23], as-22july06 [29], itdk0304 [10]). The edges are observed communications between the nodes.
Web graphs. The nodes are web-pages, and the edges are symmetrized links between the pages [25].
6. EMPIRICAL NEIGHBORHOOD COMMUNITIES
6.1 Computation
We first show that we can adapt any procedure to compute all local clustering coefficients to compute the conductance scores for each neighborhood in the graph. Most of the work to compute a local clustering coefficient is performed when finding the number of triangles at the vertex. We can express the number of triangles with $v$ as:
$\text{edges}(N_1(v) \setminus \{v\})/2$
because each edge among $v$'s neighbors produces a triangle (recall that the edges function double-counts). Note also that $\text{edges}(N_1(v) \setminus \{v\})/2 = \text{edges}(N_1(v))/2 - d_v$. Then $\text{cut}(N_1(v)) = \text{vol}(N_1(v)) - \text{edges}(N_1(v))$. And so, given the number of triangles, we can compute the cut given the volume of the neighborhood as well. This is easy to do with any graph structure that explicitly stores the degrees.
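The identities above translate almost directly into code. The sketch below is illustrative only: the function name and the dictionary-of-sets graph representation are assumptions, and conductance is taken to be the cut divided by the smaller of the two volumes, in line with looking at the smaller side of the cut.

```python
def neighborhood_conductances(adj):
    """Conductance of the closed neighborhood N_1(v) for every vertex v.

    adj maps each vertex to the set of its neighbors (undirected, simple).
    Vertices whose neighborhood leaves no volume on one side are skipped.
    """
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    scores = {}
    for v, nbrs in adj.items():
        d_v = len(nbrs)
        # Triangles at v: each edge among v's neighbors is seen from both ends.
        triangles = sum(len(adj[u] & nbrs) for u in nbrs) // 2
        # vol(N_1(v)): sum of degrees over v and its neighbors.
        vol = d_v + sum(len(adj[u]) for u in nbrs)
        # edges(N_1(v)) double-counts, so edges(N_1(v))/2 = triangles + d_v.
        edges_inside = 2 * (triangles + d_v)
        cut = vol - edges_inside
        denom = min(vol, total_vol - vol)
        if denom > 0:
            scores[v] = cut / denom
    return scores
```

Only the triangle count needs more than a degree lookup, which is why any routine that already computes local clustering coefficients can be reused here.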
6.2 Quality of neighborhood communities
We use Leskovec et al.'s [24] network community plot to show the information on all neighborhood communities simultaneously. These plots will help us understand if the neighborhood communities are high quality (low conductance), and how they compare to other community detection methods. Given the conductance scores from all the neighborhood communities and their size in terms of number of vertices, we first identify the best community at each size. The network community plot shows the relationship between best community conductance and community size on a log-log scale. Leskovec et al. found that these plots had a characteristic shape for modern information networks: an initial sharp decrease until the community size is between 100 and 1000, then a considerable rise in the conductance scores for larger communities. In our case, neighborhood communities cannot be any larger than the maximum degree plus one, and so we mark this point on the figures. We always look at the smaller side of the cut, so no community can be larger than half the vertices of the graph. We also mark this location on the plots. Each subsequent figure in this paper utilizes this size-vs-conductance plot, and we will continually layer information from new methods above results from old methods. The results are information-dense plots that need slightly more study than would be ideal; however, we point out the salient features in each plot in the text. Note also that we deliberately attempt to preserve the axis limits across figures to promote comparisons. However, some figures have different axis limits to exhibit the range of data.
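The "best community at each size" step can be sketched as below. This is a minimal illustration with a hypothetical function name; it assumes the communities arrive as (number of vertices, conductance) pairs and leaves the log-log plotting to whatever plotting tool is at hand.

```python
def network_community_profile(communities):
    """Lowest conductance observed at each community size.

    communities is an iterable of (num_vertices, conductance) pairs.
    Returns a list of (size, best_conductance) pairs sorted by size.
    """
    best = {}
    for size, phi in communities:
        if size not in best or phi < best[size]:
            best[size] = phi
    return sorted(best.items())
```

Layering a new method onto an existing plot then amounts to computing a second profile from that method's communities and drawing it over the first.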
Minimum conductance for
any community of the given size
7807 seconds
This region fills when using the PPR method (like now!)
Community Size
Am I a good seed? Locally Minimal Communities
“My conductance is the best locally.”
$\phi(N(v)) \le \phi(N(w))$ for all $w$ adjacent to $v$
In Zachary’s Karate Club network, there are four locally minimal communities, the two leaders and two peripheral nodes.
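A locally minimal seed test can be sketched as follows. This is an illustration under assumptions, not the talk's code: the function names are hypothetical, conductance uses the smaller-volume denominator, and ties between adjacent vertices are counted as locally minimal.

```python
def closed_neighborhood_conductance(adj, v):
    """Conductance of the set made of v and all of its neighbors."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    members = adj[v] | {v}
    vol = sum(len(adj[u]) for u in members)
    cut = sum(1 for u in members for w in adj[u] if w not in members)
    denom = min(vol, total_vol - vol)
    return cut / denom if denom > 0 else float("inf")

def locally_minimal_seeds(adj):
    """Vertices whose neighborhood conductance is no worse than any neighbor's."""
    phi = {v: closed_neighborhood_conductance(adj, v) for v in adj}
    return [v for v in adj if all(phi[v] <= phi[w] for w in adj[v])]
```

On two triangles joined by a single edge, the two bridge endpoints are not locally minimal, because their neighborhoods reach into the other triangle, while the four remaining vertices are.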
Locally minimal communities capture extremal neighborhoods
[Plot residue: two log-log panels of conductance (10^-4 to 10^0) against community size (10^0 to 10^5), with maxdeg marked.]
Red dots are conductance and size of a locally minimal community
Usually about 1% of # of vertices.
The red circles – the best local mins – find the extremes in the egonet profile.
Community Size
Filling in the NCP: Growing locally minimal communities
[Plot residue: two log-log panels of conductance (10^-4 to 10^0) against community size (10^0 to 10^5), with maxdeg marked.]
Growing only locally minimal communities: 283 seconds vs. 7807 seconds
Legend: Full NCP, Locally min NCP, Original Egonet
Community Size
Filling in the NCP: Growing locally minimal communities
[Plot residue: two log-log panels of conductance (10^-4 to 10^0) against community size (10^0 to 10^5), with maxdeg and verts/2 marked.]
Growing only locally minimal communities: 143 seconds vs. 2211 seconds
Legend: Full NCP, Locally min NCP, Original Egonets
arXiv: 86k verts, 500k edges
Community Size
Recap
A theorem relating clustering, heavy-tailed degrees, and low-conductance cuts of vertex neighborhoods.
Empirical evaluation of vertex neighborhoods.
More on k-cores in the paper.
⇒ Many communities are easy to find!
⇒ Explains success of community detection?
Acknowledgements
David supported by NSF CAREER award 1149756-CCF. Sesh supported by the Sandia LDRD program (project 158477) and the applied mathematics program at the Dept. of Energy.
Code and results available online
www.cs.purdue.edu/~dgleich/codes/neighborhoods
Two words on computing
Can be done by just counting the triangles at each node. Linear complexity in |E| in a power-law graph. It’s possible to do this in MapReduce too.
Filling in the NCP: Growing k-cores
[Plot residue: two log-log panels of conductance (10^-4 to 10^0) against community size (10^0 to 10^5), with maxdeg and verts/2 marked.]
Growing only locally minimal communities and k-cores: 143 seconds vs. 2211 seconds
Legend: Full NCP, Locally min NCP, Original Egonets, PPR grown k-cores, k-cores
arXiv: 86k verts, 500k edges
Community Size
Clustering coefficients
Global clustering coefficient: $\kappa = \frac{\text{number of closed wedges}}{\text{number of wedges}}$, the probability that a random wedge is closed
Local clustering coefficient: $C_v = \frac{\text{number of closed wedges centered at } v}{\text{number of wedges centered at } v}$
[Diagram labels: wedge, center of wedge, closed wedge]
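Both ratios can be computed by counting wedges at each vertex. The sketch below is illustrative: the function name and the dictionary-of-sets graph representation are assumptions, and a vertex with fewer than two neighbors is given a local coefficient of 0 by convention.

```python
def clustering_coefficients(adj):
    """Return (global coefficient, {vertex: local coefficient}).

    adj maps each vertex to the set of its neighbors (undirected, simple).
    """
    closed_total = wedges_total = 0
    local = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        wedges = d * (d - 1) // 2  # pairs of neighbors, i.e. wedges centered at v
        # A wedge centered at v is closed when its two endpoints are adjacent.
        closed = sum(len(adj[u] & nbrs) for u in nbrs) // 2
        local[v] = closed / wedges if wedges else 0.0
        closed_total += closed
        wedges_total += wedges
    kappa = closed_total / wedges_total if wedges_total else 0.0
    return kappa, local
```

The global value is a ratio of totals over all wedges in the graph, so it generally differs from the mean of the local values.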