netmine: mining tools for large graphs
DESCRIPTION
NetMine: Mining Tools for Large Graphs. Deepayan Chakrabarti Yiping Zhan Daniel Blandford Christos Faloutsos Guy Blelloch. Introduction. Protein Interactions [genomebiology.com]. Internet Map [lumeta.com]. Food Web [Martinez ’91]. ► Graphs are ubiquitous. Friendship Network [Moody ’01]. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/1.jpg)
NetMine: Mining Tools for Large Graphs
Deepayan ChakrabartiYiping ZhanDaniel BlandfordChristos FaloutsosGuy Blelloch
![Page 2: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/2.jpg)
2
Introduction
Internet Map [lumeta.com]
Food Web [Martinez ’91]
Protein Interactions [genomebiology.com]
Friendship Network [Moody ’01]
► Graphs are ubiquitous
![Page 3: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/3.jpg)
3
Graph “Patterns”
How does a real-world graph look like? Patterns/“Laws”
Degree distributions/power-laws
Count vs Outdegree
Power Laws
![Page 4: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/4.jpg)
4
Graph “Patterns”
How does a real-world graph look like? Patterns/“Laws”
Degree distributions/power-laws “Small-world” “Scree” plots and others…
Hop-plot
Effective Diameter
![Page 5: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/5.jpg)
5
Graph “Patterns”
Why do we like them? They capture interesting properties of graphs. They provide “condensed information” about the
graph. They are needed to build/test realistic graph
generators (useful for simulation studies extrapolations/sampling )
They help detect abnormalities and outliers.
![Page 6: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/6.jpg)
6
Graph Patterns
Degree-distributions “power laws” “Small-world” small diameter “Scree” plots … What else?
![Page 7: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/7.jpg)
7
Our Work
The NetMine toolkit
contains all the patterns mentioned before, and adds:
The “min-cut” plot a novel pattern which carries interesting
information about the graph. A-plots
a tool to quickly find suspicious subgraphs/nodes.
![Page 8: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/8.jpg)
8
Outline
Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions
![Page 9: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/9.jpg)
9
“Min-cut” plot
What is a min-cut? Minimizes the number of edges cut
Two partitions of almost equal size
Size of mincut = 2
![Page 10: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/10.jpg)
10
“Min-cut” plot
Do min-cuts recursively.
log (# edges)
log (mincut-size / #edges)
N nodes
Mincut size = sqrt(N)
![Page 11: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/11.jpg)
11
“Min-cut” plot
Do min-cuts recursively.
log (# edges)
log (mincut-size / #edges)
N nodes
New min-cut
![Page 12: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/12.jpg)
12
“Min-cut” plot
Do min-cuts recursively.
log (# edges)
log (mincut-size / #edges)
N nodes
New min-cut
Slope = -0.5
For a d-dimensional grid, the slope is -1/d
![Page 13: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/13.jpg)
13
“Min-cut” plot
log (# edges)
log (mincut-size / #edges)
Slope = -1/d
For a d-dimensional grid, the slope is -1/d
log (# edges)
log (mincut-size / #edges)
For a random graph, the slope is 0
![Page 14: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/14.jpg)
14
“Min-cut” plot
Min-cut sizes have important effects on graph properties, such as efficiency of divide-and-conquer algorithms compact graph representation difference of the graph from well-known
graph types for example, slope = 0 for a random graph
![Page 15: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/15.jpg)
15
“Min-cut” plot
What does it look like for a real-world graph?
log (# edges)
log (mincut-size / #edges)
?
![Page 16: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/16.jpg)
16
Experiments
Datasets: Google Web Graph: 916,428 nodes and
5,105,039 edges Lucent Router Graph: Undirected graph of
network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges
User Website Clickstream Graph: 222,704 nodes and 952,580 edges
![Page 17: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/17.jpg)
17
Experiments
Used the METIS algorithm [Karypis, Kumar, 1995]
log (# edges)
log
(min
cut-
size
/ #
edge
s)
• Google Web graph
• Values along the y-axis are averaged
• We observe a “lip” for large edges
• Slope of -0.4, corresponds to a 2.5-dimensional grid!
Slope~ -0.4
![Page 18: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/18.jpg)
18
Experiments
Same results for other graphs too…
log (# edges) log (# edges)
log
(min
cut-
size
/ #
edge
s)
log
(min
cut-
size
/ #
edge
s)
Lucent Router graph Clickstream graph
Slope~ -0.57 Slope~ -0.45
![Page 19: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/19.jpg)
19
Observations
Linear slope for some range of values “Lip” for high #edges Far from random graphs (because slope ≠ 0)
![Page 20: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/20.jpg)
20
Outline
Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions
![Page 21: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/21.jpg)
21
A-plots
How can we find abnormal nodes or subgraphs? Visualization
but most graph visualization techniques do not scale to large graphs!
![Page 22: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/22.jpg)
22
A-plots
However, humans are pretty good at “eyeballing” data
Our idea: Sort the adjacency matrix in novel ways and plot the matrix so that patterns become visible to the user
We will demonstrate this on the SCAN+Lucent Router graph (284,805 nodes and 898,492 edges)
![Page 23: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/23.jpg)
23
A-plots
Three types of such plots for undirected graphs… RV-RV (RankValue vs RankValue) Sort nodes
based on their “network value” (~first eigenvector)
Rank of Network Value
Ra
nk
of N
etw
ork
Va
lue
![Page 24: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/24.jpg)
24
A-plots
Three types of such plots for undirected graphs… RD-RD (RankDegree vs RankDegree) Sort
nodes based on their degree
Rank of Degree of node
Ra
nk
of D
eg
ree
of n
ode
![Page 25: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/25.jpg)
25
A-plots
Three types of such plots for undirected graphs… D-RV (Degree vs RankValue) Sort nodes
according to “network value”, and show their corresponding degree
![Page 26: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/26.jpg)
26
RV-RV plot (RankValue vs RankValue) We can see a
“teardrop” shape and also some
blank “stripes” and a strong
diagonal (even though there
are no self-loops)!Rank of Network Value
Ra
nk
of N
etw
ork
Va
lue
Stripes
![Page 27: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/27.jpg)
27
RV-RV plot (RankValue vs RankValue)
Rank of Network Value
Ra
nk
of N
etw
ork
Va
lue
The “teardrop” structure can be explained by degree-1 and degree-2 nodes
NV1 = 1/λ * NV2
1 2
![Page 28: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/28.jpg)
28
RV-RV plot (RankValue vs RankValue)
Rank of Network Value
Ra
nk
of N
etw
ork
Va
lue
Strong diagonal nodes are more likely to connect to “similar” nodes
![Page 29: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/29.jpg)
29
RD-RD (RankDegree vs RankDegree)
Rank of Degree of node
Ra
nk
of D
eg
ree
of n
ode
• Isolated dots due to 2-node isolated components
![Page 30: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/30.jpg)
30
D-RV (Degree vs RankValue)
Rank of Network Value
Deg
ree
![Page 31: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/31.jpg)
31
D-RV (Degree vs RankValue)
Rank of Network Value
Deg
ree
Why?
![Page 32: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/32.jpg)
32
Explanation of “Spikes” and “Stripes” RV-RV plot had stripes; D-RV plot shows
spikes. Why? “Stripe” nodes degree-2 nodes connecting only to the “Spike” nodes
“Spike” nodes high degree, but all edges to “Stripe” nodes
Stripe
![Page 33: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/33.jpg)
33
A-plots
They helped us detect a buried abnormal subgraph
in a large real-world dataset which can then be taken to the domain
experts.
![Page 34: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/34.jpg)
34
Outline
Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions
![Page 35: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/35.jpg)
35
Conclusions
We presented “Min-cut” plot
A novel graph pattern with relevance for many algorithms and applications
A-plots which help us find interesting abnormalities
All the methods are scalable Their usage was demonstrated on large real-world
graph datasets
![Page 36: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/36.jpg)
36
RV-RV plot (RankValue vs RankValue) We can see a
“teardrop” shape and also some
blank “stripes” and a strong
diagonal.
Rank of Network Value
Ra
nk
of N
etw
ork
Va
lue
![Page 37: NetMine: Mining Tools for Large Graphs](https://reader036.vdocuments.mx/reader036/viewer/2022062520/56815b67550346895dc95a0b/html5/thumbnails/37.jpg)
37
RD-RD (RankDegree vs RankDegree)
Rank of Degree of node
Ra
nk
of D
eg
ree
of n
ode
• Isolated dots due to 2-node isolated components