higher-order organization of complex networks
TRANSCRIPT
![Page 1: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/1.jpg)
Higher-order organization of complex networks
David F. Gleich, Purdue University
Joint work with Austin Benson and Jure Leskovec, Stanford. Supported by NSF CAREER CCF-1149756, IIS-1422918, DARPA SIMPLEX.
[Figure: network drawings with labeled nodes]
PCMI2016 David Gleich · Purdue 1
Code & Data: snap.stanford.edu/higher-order and github.com/arbenson/higher-order-organization-julia
![Page 2: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/2.jpg)
Network analysis has two important observations about real-world networks
Real-world networks have modular organization.
Edge-based clustering and community detection sometimes expose this structure.
Control widgets are over-expressed in complex networks.
We can expose this with motif or graphlet analysis.
Milo et al., Science, 2002. Co-author network
![Page 3: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/3.jpg)
Nodes and edges are not the fundamental units of these networks.
Why should we look for structure in terms of them?
![Page 4: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/4.jpg)
Idea: Find clusters
![Page 5: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/5.jpg)
Idea: Find clusters of motifs
![Page 6: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/6.jpg)
In practice, motifs organize real-world networks amazingly well and recover aquatic layers in food webs
Micronutrient sources
Benthic fishes
Benthic macroinvertebrates
Pelagic fishes and benthic prey
http://marinebio.org/oceans/marine-zones/
We don’t know how to find this structure based on edge partitioning.
![Page 7: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/7.jpg)
Aside: How did we arrive at this idea and come to look at this problem?
• Research is a journey.
![Page 8: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/8.jpg)
We can do motif-based clustering by generalizing spectral clustering
Spectral clustering is a classic technique to partition graphs by looking at eigenvectors.
M. Fiedler, 1973, Algebraic connectivity of graphs
[Figure panels: Graph, Laplacian, Eigenvector]
![Page 9: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/9.jpg)
Spectral clustering works based on conductance
There are many ways to measure the quality of a set of nodes of a graph to gauge how they partition the graph.
$\text{cut}(S) = 7$, $\text{cut}(\bar{S}) = 7$
$|S| = 15$, $|\bar{S}| = 20$
$\text{vol}(S) = 85$, $\text{vol}(\bar{S}) = 151$
$\text{ncut}(S) = 7/85 + 7/151 = 0.1287$
$\text{cut sparsity}(S) = 7/15 = 0.4667$
$\phi(S) = \text{cond}(S) = 7/85 = 0.0824$
$\phi(S) = \text{cut}(S) / \min(\text{vol}(S), \text{vol}(\bar{S}))$
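The three quality scores on this slide can be checked with a few lines of Python. This is only a sketch: the function names and the dict-of-neighbor-sets graph format are assumptions made for illustration, not part of the talk.

```python
def cut_and_volumes(adj, S):
    """Return (cut(S), vol(S), vol(S-bar)) for an undirected simple graph.

    adj maps each node to the set of its neighbors (assumed input format).
    """
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)   # edges leaving S
    vol_S = sum(len(adj[u]) for u in S)                     # edge end points in S
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    return cut, vol_S, vol_rest


def conductance(cut, vol_S, vol_rest):
    return cut / min(vol_S, vol_rest)


def normalized_cut(cut, vol_S, vol_rest):
    return cut / vol_S + cut / vol_rest


def cut_sparsity(cut, size_S, size_rest):
    return cut / min(size_S, size_rest)


# Numbers from the slide: cut = 7, |S| = 15, |S-bar| = 20, vol(S) = 85, vol(S-bar) = 151
print(round(conductance(7, 85, 151), 4))     # 0.0824
print(round(normalized_cut(7, 85, 151), 4))  # 0.1287
print(round(cut_sparsity(7, 15, 20), 4))     # 0.4667
```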
![Page 10: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/10.jpg)
Conductance sets in graphs
Conductance is one of the most important quality scores [Schaeffer07], used in Markov chain theory, bioinformatics, vision, etc.
PCMI: Nelson showed how you can use this to get heavy hitters in turnstile algorithms!
The conductance of a set of vertices is the ratio of edges leaving the set to total edges in the set:
$\phi(S) = \dfrac{\text{cut}(S)}{\min(\text{vol}(S), \text{vol}(\bar{S}))}$, that is, (edges leaving the set) / (total edges in the set)
Equivalently, it's the probability that a random edge leaves the set.
Small conductance ⇔ good set
Example: $\text{cut}(S) = 7$, $\text{vol}(S) = 33$, $\text{vol}(\bar{S}) = 11$, so $\phi(S) = 7/11$
![Page 11: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/11.jpg)
Spectral clustering has theoretical guarantees
Cheeger Inequality
Finding the best conductance set is NP-hard.
• Cheeger realized the eigenvalues of the Laplacian provided a bound in manifolds.
• Alon and Milman independently realized the same thing for a graph!
J. Cheeger, 1970, A lower bound on the smallest eigenvalue of the Laplacian
N. Alon, V. Milman, 1985, λ1, isoperimetric inequalities for graphs, and superconcentrators
$\phi_*^2 / 2 \le \lambda_2 \le 2\phi_*$
Eigenvalues of the Laplacian: $0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n \le 2$
$\phi_*$ = conductance of the set of smallest conductance
![Page 12: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/12.jpg)
The sweep cut algorithm realizes the guarantee
We can find a set S that achieves the Cheeger bound.
1. Compute the eigenvector associated with λ2.
2. Sort the vertices by their values in the eigenvector: σ1, σ2, …, σn.
3. Let Sk = {σ1, …, σk} and compute the conductance of each Sk: φk = φ(Sk).
4. Pick the minimum φm of the φk.
M. Mihail, 1989, Conductance and convergence of Markov chains
F. C. Graham, 1992, Spectral Graph Theory.
$\phi_m \le 4\sqrt{\phi_*}$
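Steps 2 to 4 of the sweep cut can be sketched in plain Python as follows. The sketch assumes the eigenvector values are already available as a dict `scores`, that the graph is a simple undirected graph with no isolated nodes, and that `adj` maps each node to its neighbor set; all of these names are illustrative assumptions.

```python
def sweep_cut(adj, scores):
    """Sort nodes by score and return the prefix set of smallest conductance.

    Returns (best_set, conductances), where conductances[k-1] = phi(S_k).
    """
    order = sorted(adj, key=lambda u: scores[u])
    total_vol = sum(len(adj[u]) for u in adj)
    in_set = set()
    cut = 0
    vol = 0
    best_phi, best_k = float("inf"), 0
    conductances = []
    for k, u in enumerate(order[:-1], start=1):  # the full vertex set is skipped
        # Edges from u into S stop being cut edges; u's other edges become cut edges.
        inside = sum(1 for v in adj[u] if v in in_set)
        cut += len(adj[u]) - 2 * inside
        vol += len(adj[u])
        in_set.add(u)
        phi = cut / min(vol, total_vol - vol)
        conductances.append(phi)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return order[:best_k], conductances


# Two triangles joined by the single edge 2-3, with hand-picked scores.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = {0: -1.0, 1: -0.9, 2: -0.5, 3: 0.5, 4: 0.9, 5: 1.0}
best, phis = sweep_cut(adj, scores)
print(best)  # [0, 1, 2]
```

Updating the cut and volume incrementally keeps the whole sweep linear in the number of edges after the sort, instead of recomputing each conductance from scratch.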
![Page 13: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/13.jpg)
The sweep cut visualized
[Plot: conductance $\phi_i$ of each sweep set $S_i$, for $i$ from 0 to 40, on a vertical axis from 0 to 1]
$\phi(S) = \dfrac{\text{cut}(S)}{\min(\text{vol}(S), \text{vol}(\bar{S}))}$
![Page 14: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/14.jpg)
Demo…
![Page 15: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/15.jpg)
That's spectral clustering: 40+ years of ideas and successful applications
• Fast algorithms that avoid eigenvectors (Graclus from Dhillon et al. 2007)
• Local algorithms for seeded detection (Spielman & Teng 2004; Andersen, Chung, Lang 2006). PCMI: Kimon gave a talk about this yesterday!
• Overlapping algorithms
• Embeddings
• And more!
![Page 16: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/16.jpg)
But current problems are much richer than when spectral clustering was designed
Spectral clustering is theoretically justified for undirected, simple graphs.
Many datasets are directed, weighted, signed, colored, layered, ...
R. Milo, 2002, Science
[Figure: signed regulatory motifs. X causes Y to be expressed (+); Z represses Y (-).]
![Page 17: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/17.jpg)
Our contributions
1. A generalized conductance metric for motifs
2. A new spectral clustering algorithm to minimize the generalized conductance.
3. AND an associated Cheeger inequality.
4. Aquatic layers in food webs
5. Control structures in neural networks
6. Hub structure in transportation networks
7. Anomaly detection in Twitter
Benson, Gleich, Leskovec, Science 2016.
![Page 18: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/18.jpg)
Motif-based conductance generalizes edge-based conductance
Need notions of cut and volume!
Edges cut: $\phi(S) = \dfrac{\#(\text{edges cut})}{\min(\text{vol}(S), \text{vol}(\bar{S}))}$, with $\text{vol}(S) = \#(\text{edge end points in } S)$
Triangles cut: $\phi_M(S) = \dfrac{\#(\text{triangles cut})}{\min(\text{vol}_M(S), \text{vol}_M(\bar{S}))}$, with $\text{vol}_M(S) = \#(\text{triangle end points in } S)$
[Figure: a set S and its complement S̄, showing edges cut and triangles cut]
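For the undirected triangle motif, the definition above can be evaluated directly by enumerating triangles. The sketch below is a brute-force illustration on a small graph, assuming integer node labels and a dict-of-neighbor-sets format; the function names are hypothetical and it is not meant to scale.

```python
from itertools import combinations


def triangles(adj):
    """List every triangle (i, j, k) of an undirected graph exactly once."""
    return [t for t in combinations(sorted(adj), 3)
            if t[1] in adj[t[0]] and t[2] in adj[t[0]] and t[2] in adj[t[1]]]


def motif_conductance(adj, S):
    """Triangle conductance: #(triangles cut) / min(vol_M(S), vol_M(S-bar))."""
    S = set(S)
    tris = triangles(adj)
    # A triangle is cut when it has at least one node on each side.
    cut = sum(1 for t in tris if 0 < sum(u in S for u in t) < 3)
    vol_S = sum(sum(u in S for u in t) for t in tris)  # triangle end points in S
    vol_rest = 3 * len(tris) - vol_S
    return cut / min(vol_S, vol_rest)


# Triangles (0,1,2), (2,3,4), (3,4,5); only (2,3,4) straddles S = {0, 1, 2}.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4, 5}, 4: {2, 3, 5}, 5: {3, 4}}
print(motif_conductance(adj, {0, 1, 2}))  # 0.25
```

The function raises `ZeroDivisionError` when one side contains no triangle end points, which matches the definition being undefined there.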
![Page 19: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/19.jpg)
An example of motif-conductance
[Figure: example graph divided into S and S̄, with the motif shown alongside]
$\phi_M(S) = \dfrac{\text{motifs cut}}{\text{motif volume}} = \dfrac{1}{10}$
![Page 20: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/20.jpg)
Going from motifs back to a matrix for spectral clustering
[Figure: the graph with adjacency matrix A and the weighted graph $W^{(M)}$, whose edges carry weights 1, 2, and 3]
$W^{(M)}_{ij}$ = counts co-occurrences of motif pattern between i, j
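As an illustration of the co-occurrence count, here is a small Python sketch for the undirected triangle motif only (directed motifs such as those in the talk need a different instance test). The dict-of-dicts output format and the function name are assumptions for this example.

```python
from itertools import combinations


def triangle_motif_adjacency(adj):
    """W[i][j] = number of triangles that contain both i and j."""
    W = {u: {} for u in adj}
    for i, j, k in combinations(sorted(adj), 3):
        if j in adj[i] and k in adj[i] and k in adj[j]:
            # Every pair of nodes in the triangle co-occurs once more.
            for a, b in ((i, j), (i, k), (j, k)):
                W[a][b] = W[a].get(b, 0) + 1
                W[b][a] = W[b].get(a, 0) + 1
    return W


adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4, 5}, 4: {2, 3, 5}, 5: {3, 4}}
W = triangle_motif_adjacency(adj)
print(W[3][4])  # 2, because the edge 3-4 lies in triangles (2,3,4) and (3,4,5)
```

Pairs that never share a triangle get no entry at all, so the weighted graph can be sparser than the original one.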
![Page 21: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/21.jpg)
Going from motifs back to a matrix for spectral clustering
[Figure: the graph and the weighted graph $W^{(M)}$, whose edges carry weights 1, 2, and 3]
$W^{(M)}_{ij}$ = counts co-occurrences of motif pattern between i, j
KEY INSIGHT: Spectral clustering on $W^{(M)}$ yields results on the new motif notion of conductance
$\phi_M(S) = \dfrac{\text{motifs cut}}{\text{motif volume}} = \dfrac{1}{10}$
![Page 22: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/22.jpg)
A motif-based clustering algorithm
1. Form the weighted graph $W^{(M)}$.
2. Compute the Fiedler vector associated with λ2 of the motif-normalized Laplacian:
$D = \text{diag}(W^{(M)} e)$
$L^{(M)} = D^{-1/2} (D - W^{(M)}) D^{-1/2}$
$L^{(M)} z = \lambda_2 z$
$f^{(M)} = D^{-1/2} z$
3. Run a (motif-conductance) sweep cut on $f^{(M)}$.
[Figure: the example graph and its weighted graph $W^{(M)}$]
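Step 2 can be approximated without an eigensolver. The sketch below swaps in a deflated power iteration on 2I − L, which is not what the talk uses (the slides mention ARPACK-style solvers); it relies on the facts that the normalized Laplacian has eigenvalues in [0, 2] and that D^{1/2}e is its eigenvector for eigenvalue 0. It assumes W is symmetric, connected, given as a dict of dicts, and that every node has positive weighted degree; the function name is hypothetical.

```python
import math


def fiedler_vector(W, iters=500):
    """Approximate f = D^{-1/2} z with L z = lambda_2 z, L = I - D^{-1/2} W D^{-1/2}."""
    nodes = sorted(W)
    d = {u: sum(W[u].values()) for u in nodes}          # D = diag(W e)
    total = math.sqrt(sum(d.values()))
    v1 = {u: math.sqrt(d[u]) / total for u in nodes}    # unit eigenvector for eigenvalue 0
    z = {u: float(i + 1) for i, u in enumerate(nodes)}  # arbitrary non-constant start
    for _ in range(iters):
        # Apply 2I - L = I + D^{-1/2} W D^{-1/2}.
        z = {u: z[u] + sum(w * z[v] / math.sqrt(d[u] * d[v]) for v, w in W[u].items())
             for u in nodes}
        # Remove the component along v1, then rescale to unit length.
        dot = sum(z[u] * v1[u] for u in nodes)
        z = {u: z[u] - dot * v1[u] for u in nodes}
        length = math.sqrt(sum(x * x for x in z.values()))
        z = {u: x / length for u, x in z.items()}
    return {u: z[u] / math.sqrt(d[u]) for u in nodes}


# Two unit-weight triangles joined by the edge 2-3.
W = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
     3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
f = fiedler_vector(W)
print(sorted(f, key=f.get))  # nodes of one triangle come before the other
```

The overall sign of the returned vector is arbitrary, so only the ordering of the nodes (up to reversal) is meaningful for the sweep cut.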
![Page 23: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/23.jpg)
The sweep cut results
[Plot: motif conductance (0 to 1) of each sweep set, with nodes in the order from the Fiedler vector; the best and 2nd best higher-order clusters are marked]
![Page 24: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/24.jpg)
The motif-based Cheeger inequality
THEOREM: If the motif has three nodes, then the sweep procedure on the weighted graph finds a set S of nodes for which $\phi_M(S) \le 4\sqrt{\phi_M^*}$.
THEOREM: For more than 4 nodes, we use a slightly altered conductance.
Key Proof Step: $\text{cut}_M(S, G) = \sum_{\{i,j,k\} \in M(G)} \text{Indicator}[x_i, x_j, x_k \text{ not the same}]$ = quadratic in $x$, where $M(G) = \{\text{instances of } M \text{ in } G\}$.
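One way to see why the indicator is quadratic, sketched here under the assumption that $x_i = +1$ when node $i$ is in $S$ and $x_i = -1$ otherwise (this encoding is a choice made for the illustration, and $D = \text{diag}(W^{(M)} e)$ as in the algorithm):

```latex
% Three +/-1 values are "not the same" exactly when two of the three products equal -1.
\mathbf{1}[x_i, x_j, x_k \text{ not the same}]
  = \tfrac{1}{4}\left(3 - x_i x_j - x_j x_k - x_i x_k\right)
  = \tfrac{1}{8}\left[(x_i - x_j)^2 + (x_j - x_k)^2 + (x_i - x_k)^2\right]

% Summing over all instances \{i,j,k\} \in M(G), each pair (a, b) appears W^{(M)}_{ab} times:
\mathrm{cut}_M(S, G)
  = \tfrac{1}{8} \sum_{a < b} W^{(M)}_{ab} (x_a - x_b)^2
  = \tfrac{1}{8}\, x^T \left(D - W^{(M)}\right) x
```

Since $(x_a - x_b)^2 = 4$ exactly for pairs split by $S$, the right-hand side is half the weighted edge cut of $S$ in $W^{(M)}$. Each triangle containing a node adds 2 to that node's weighted degree, so the weighted volume is twice the motif volume, and for three-node motifs the edge conductance of $S$ in $W^{(M)}$ equals $\phi_M(S)$.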
![Page 25: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/25.jpg)
Awesome advantages
We inherit 40+ years of research!
• Fast algorithms (ARPACK, etc.)
• Local methods
• Overlapping
• Easy to implement (20 lines of Matlab/Julia)
• Scalable (graphs with 1.4B edges are not a problem)
function [S, conductances] = MotifClusterM36(A)
n = size(A, 1);                      % number of nodes
B = spones(A & A');                  % bidirectional links
U = A - B;                           % unidirectional links
W = (B * U') .* U' + (U * B) .* U + (U' * U) .* B;  % Motif M_3^6
D = diag(sum(W));                    % motif degree matrix
Ln = speye(size(W, 1)) - sqrt(D)^(-1) * W * sqrt(D)^(-1);  % normalized Laplacian
[Z, ~] = eigs(Ln, 2, 'sm');          % two smallest eigenpairs
[~, order] = sort(sqrt(D)^(-1) * Z(:, 2));  % order nodes by the Fiedler vector
conductances = zeros(n, 1);
x = zeros(n, 1);
for i = 1:n                          % sweep over prefix sets of the ordering
    x(order(i)) = 1;
    xn = ~x + 0;
    conductances(i) = x' * (D - W) * x / min(x' * D * x, xn' * D * xn);
end
[~, split] = min(conductances);
S = order(1:split);
![Page 26: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/26.jpg)
Case studies
An intro note!
1. Aquatic layers in food webs. Signed patterns in regulatory networks
2. Control structures in neural networks
3. Hub structure in transportation networks.
4. Scaling and large data
![Page 27: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/27.jpg)
NOTE: The partition depends on the motif
[Figure: two drawings of the example graph, nodes labeled 1 to 12]
![Page 28: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/28.jpg)
Case study 1: Motifs partition the food webs
Food webs model energy exchange in species of an ecosystem.
i -> j means i's energy goes to j (or j eats i).
Via Cheeger, motif conductance is better than edge conductance.
![Page 29: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/29.jpg)
Demo
![Page 30: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/30.jpg)
Case study 1: Motifs partition the food webs
Micronutrient sources
Benthic fishes
Benthic macroinvertebrates
Pelagic fishes and benthic prey
Motif M6 reveals aquatic layers.
Figure 1: Higher-order network structures and the higher-order network clustering framework. A: Higher-order structures are captured by network motifs. For example, all 13 connected three-node directed motifs are shown here. B: Clustering of a network based on motif M7. For a given motif M, our framework aims to find a set of nodes S that minimizes motif conductance, $\phi_M(S)$, which we define as the ratio of the number of motifs cut (filled triangles cut) to the minimum number of nodes in instances of the motif in either S or S̄ (11). In this case, there is one motif cut. C: The higher-order network clustering framework. Given a graph and a motif of interest (in this case, M7), the framework forms a motif adjacency matrix ($W_M$) by counting the number of times two nodes co-occur in an instance of the motif. An eigenvector of a Laplacian transformation of the motif adjacency matrix is then computed. The ordering σ of the nodes provided by the components of the eigenvector (13) produces nested sets $S_r = \{\sigma_1, \dots, \sigma_r\}$ of increasing size r. We prove that the set $S_r$ with the smallest motif-based conductance, $\phi_M(S_r)$, is a near-optimal higher-order cluster (11).
84% accuracy vs. 69% for other methods
![Page 31: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/31.jpg)
Case study 2: Nictation control in neural network
(d) From Lee et al., "Nictation, a dispersal behavior of the nematode Caenorhabditis elegans, is regulated by IL2 neurons," Nature Neuroscience.
We find the control mechanism that explains this based on the bi-fan motif (Milo et al. found it over-expressed).
Nictation: standing on a tail and waving
[Figure panels A, B, C]
![Page 32: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/32.jpg)
Case study 3: Rich structure beyond clusters
North American air transport network
Nodes are airports.
Edges reflect reachability, and are unweighted. (Based on Frey et al., 2007)
![Page 33: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/33.jpg)
We can use complex motifs with non-anchored nodes
Figure 4: Higher-order spectral analysis of a network of airports in Canada and the United States (22). A: The three higher-order structures used in our analysis. Each motif is "anchored" by the blue nodes i and j, which means our framework only seeks to cluster together the blue nodes. Specifically, the motif adjacency matrix adds weight to the (i, j) edge based on the number of third intermediary nodes (green squares). The first two motifs correspond to highly-connected cities and the motif on the right connects non-hubs to non-hubs. B: The top 50 most populous cities in the United States which correspond to nodes in the network. The edge thickness is proportional to the weight in the motif adjacency matrix $W_M$. The thick, dark lines indicate that large weights correspond to popular mainline routes. C: Embedding of nodes provided by their corresponding components of the first two non-trivial eigenvectors of the normalized Laplacian for $W_M$. The marked cities are eight large U.S. hubs (green), three West coast non-hubs (red), and three East coast non-hubs (purple). The primary spectral coordinate (left to right) reveals how much of a hub the city is, and the second spectral coordinate (top to bottom) captures West-East geography (11). D: Embedding of nodes provided by their corresponding components in the first two non-trivial eigenvectors of the standard, edge-based (non-higher-order) normalized Laplacian. This method does not capture the hub and geography found by the higher-order method. For example, Atlanta, the largest hub, is in the center of the embedding, next to Salina, a non-hub.
Counts length-two walks
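The "counts length-two walks" weighting can be illustrated with a short Python sketch. It covers only the plain intermediary count for an undirected, unweighted graph (not the three specific anchored motifs of the figure), and the function name and dict formats are assumptions.

```python
def length_two_walk_weights(adj):
    """W[i][j] = number of intermediary nodes k with edges i-k and k-j, for i != j."""
    W = {u: {} for u in adj}
    for k in adj:
        # Every ordered pair of distinct neighbors of k is joined by one walk through k.
        for i in adj[k]:
            for j in adj[k]:
                if i != j:
                    W[i][j] = W[i].get(j, 0) + 1
    return W


# A small hub-and-spoke example: node 0 is the hub.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
W = length_two_walk_weights(adj)
print(W[1][3])  # 1, the walk 1-0-3 through the hub
```

This is the off-diagonal part of the squared adjacency matrix; pairs with many shared neighbors, such as pairs of spokes of the same hub, receive the largest weights.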
![Page 34: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/34.jpg)
The weighting alone reveals hub-like structure
![Page 35: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/35.jpg)
The motif embedding shows this structure and splits into east-west
Top 10 U.S. hubs
East coast non-hubs
West coast non-hubs
Primary spectral coordinate
Atlanta, the top hub, is next to Salina, a non-hub.
MOTIF SPECTRAL EMBEDDING
EDGE SPECTRAL EMBEDDING
![Page 36: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/36.jpg)
Case study 4: Large scale stuff
The up-linked triangle finds an anomalous cluster in Twitter.
Anomalous cluster in the 1.4B edge Twitter graph. All nodes are holding accounts for a company, and the orange nodes have incomplete profiles.
![Page 37: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/37.jpg)
Related work.
• The Laplacian we propose was originally proposed by Rodríguez [2004] and again by Zhou et al. [2006]. Our new theory (motif Cheeger inequality) explains why these were good ideas.
• Falls under the general strategy of encoding a hypergraph partitioning problem as a graph clustering problem [Agarwal+ 06]
• Serrour, Arenas, and Gómez, Detecting communities of triangles in complex networks using spectral optimization, 2011.
• Arenas et al., Motif-based communities in complex networks, 2008.
![Page 38: Higher-order organization of complex networks](https://reader031.vdocuments.mx/reader031/viewer/2022030316/587a55b51a28ab520b8b5577/html5/thumbnails/38.jpg)
Paper: Benson, Gleich, Leskovec, Science, 2016
1. A generalized conductance metric for motifs
2. A new spectral clustering algorithm to minimize the generalized conductance.
3. AND an associated Cheeger inequality.
4. Aquatic layers in food webs
5. Control structures in neural networks
6. Hub structure in transportation networks
7. Anomaly detection in Twitter
8. Lots of cool stuff on signed networks.
Thank you!
Joint work with Austin Benson and Jure Leskovec, Stanford. Supported by NSF CAREER CCF-1149756, IIS-1422918, IIS-, DARPA SIMPLEX