GraphScope: Parameter-Free Mining of Large Time-Evolving Graphs
Jimeng Sun (CMU), Spiros Papadimitriou (IBM), Philip S. Yu (IBM), Christos Faloutsos (CMU)
![Page 1: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/1.jpg)
GraphScope: Parameter-Free Mining of Large Time-Evolving Graphs
Jimeng Sun CMU
Spiros Papadimitriou IBM
Philip S. Yu IBM
Christos Faloutsos CMU
![Page 2: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/2.jpg)
Motivation of GraphScope
Time-evolving graphs:
Network traffic graphs
Email networks
Customer-product relationships
Call detail records in telecom networks
Financial transaction data
Key questions:
1. How to monitor community structures?
2. How to detect the change points?
![Page 3: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/3.jpg)
1. Community discovery
[Figure: a customer-product graph and its adjacency matrix (Customers × Products, axes 5 to 25), reordered into customer groups and product groups. Example customer groups: CEOs, Researchers. Example product groups: Books, BMWs. Block annotations: 289/300, 48/50, 5/200, 2/75; 97%, 96%, 3%, 3%, 54%.]
Simultaneously group customers and products, or source-destination traffic graphs, or sender-recipient communication, etc.
![Page 4: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/4.jpg)
2. Change detection
Find change points in group structure
[Figure: Customers × Products adjacency matrices along a time axis, with the holiday season marked.]
![Page 5: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/5.jpg)
Problem definition
Given graphs G1, G2, …, Gt, where each Gi is n-by-m:
1. partition them into time segments G(1), G(2), …
2. for each segment, identify the groups
1. Scalable, 2. Parameter-free, 3. Incremental
[Figure: time axis divided into segments G(1), G(2).]
![Page 6: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/6.jpg)
Outline
Motivation
GraphScope
  Community discovery
  Change detection
Experiments
![Page 7: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/7.jpg)
Community detection
Clustering problem
Compression problem
[Figure: graph snapshots at t = 0, t = 1, t = 2.]
![Page 8: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/8.jpg)
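Cost objective within a time segment
[Figure: adjacency matrix of a segment of duration d, partitioned into k = 3 row groups of sizes n1, n2, n3 and ℓ = 3 col. groups of sizes m1, m2, m3. Block (i, j) has density of ones (edges) p_{i,j}.]
Code cost: d n_i m_j H(p_{i,j}) bits for block (i, j), e.g. d n1 m2 H(p1,2) bits for block (1,2).
Bits total:
$$
\underbrace{\sum_{i,j} d\, n_i m_j\, H(p_{i,j})}_{\text{code cost}}
\;+\;
\underbrace{\sum_{i,j} \log (d\, n_i m_j) \;+\; \cdots \;+\; \log^{*} d}_{\text{description cost}}
$$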
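The cost objective above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, graphs are assumed to be lists of 0/1 rows, group labels are assumed to be integers starting at 0, and only the terms legible on the slide are included (the partition description terms are left out).

```python
import math

def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def log_star(x):
    """log* x: log2(x) + log2(log2(x)) + ... over the positive terms (assumes x >= 1)."""
    total, term = 0.0, math.log2(x)
    while term > 0:
        total += term
        term = math.log2(term)
    return total

def segment_cost(graphs, row_group, col_group):
    """Return (code_cost, description_cost) in bits for one time segment.

    graphs:    list of d snapshots, each an n-by-m list of 0/1 rows
    row_group: list of n group labels in 0..k-1 (assumed representation)
    col_group: list of m group labels in 0..l-1 (assumed representation)
    """
    d = len(graphs)
    k, ell = max(row_group) + 1, max(col_group) + 1
    n = [row_group.count(i) for i in range(k)]
    m = [col_group.count(j) for j in range(ell)]
    # Count the ones falling into each block, summed over the whole segment.
    ones = [[0] * ell for _ in range(k)]
    for g in graphs:
        for r, row in enumerate(g):
            for c, v in enumerate(row):
                if v:
                    ones[row_group[r]][col_group[c]] += 1
    code = 0.0
    desc = log_star(d)  # cost of stating the segment duration
    for i in range(k):
        for j in range(ell):
            cells = d * n[i] * m[j]
            if cells == 0:
                continue
            code += cells * binary_entropy(ones[i][j] / cells)
            desc += math.log2(cells)  # the log(d n_i m_j) term
    return code, desc
```

With a grouping that matches the block structure, every block is all ones or all zeros and the code cost drops to zero; collapsing everything into one group makes the description cheaper but the code cost larger.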
![Page 9: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/9.jpg)
Cost objective within a time segment
Total cost = code cost (blocks) + description cost (blocks’ model)
One row group, one col group: code cost high, description cost low
n row groups, m col groups: code cost low, description cost high
![Page 10: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/10.jpg)
Cost objective within a time segment
Total cost = code cost (blocks) + description cost (blocks’ model)
k = 3 row groups, ℓ = 3 col groups: code cost low, description cost low
![Page 11: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/11.jpg)
Search for the optimum grouping
The problem is NP-hard even for one timestamp with column permutation only (reduction from TSP [Johnson+ 03])
Heuristics:
Search: Split, Merge, Shuffle
Initialization: Resume, Restart
![Page 12: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/12.jpg)
Outline
Motivation
GraphScope
  Community discovery
  Change detection
Experiments
![Page 13: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/13.jpg)
Change point detection
Option 1: Append to current segment
![Page 14: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/14.jpg)
Change point detection
Option 2: Start new segment (change point)
![Page 15: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/15.jpg)
Change point detection
Option 1: append
Option 2: split (time)
In both cases, we do row and col. shuffles, splits and/or merges
Choose the most parsimonious option
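The append-or-split decision can be sketched as a comparison of two encoding costs. The code below is an illustration under stated assumptions, not the authors' implementation: `total_cost` and `is_change_point` are hypothetical names, the cost uses only the code cost, the log(d n_i m_j) terms and log* d, and the groupings for both options are passed in by the caller instead of being re-optimized with shuffles, splits and merges.

```python
import math

def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def log_star(x):
    """log* x over the positive terms (assumes x >= 1)."""
    total, term = 0.0, math.log2(x)
    while term > 0:
        total += term
        term = math.log2(term)
    return total

def total_cost(graphs, row_group, col_group):
    """Simplified segment cost in bits: code cost + log(d n_i m_j) terms + log* d."""
    d = len(graphs)
    k, ell = max(row_group) + 1, max(col_group) + 1
    n = [row_group.count(i) for i in range(k)]
    m = [col_group.count(j) for j in range(ell)]
    ones = [[0] * ell for _ in range(k)]
    for g in graphs:
        for r, row in enumerate(g):
            for c, v in enumerate(row):
                if v:
                    ones[row_group[r]][col_group[c]] += 1
    bits = log_star(d)
    for i in range(k):
        for j in range(ell):
            cells = d * n[i] * m[j]
            if cells:
                bits += cells * binary_entropy(ones[i][j] / cells) + math.log2(cells)
    return bits

def is_change_point(segment, new_graph, grouping, new_grouping):
    """True if starting a new segment is cheaper than appending.

    grouping / new_grouping: (row_group, col_group) pairs for the current
    segment and for the candidate new segment (both supplied by the caller).
    """
    rows, cols = grouping
    new_rows, new_cols = new_grouping
    append_cost = total_cost(segment + [new_graph], rows, cols)
    split_cost = total_cost(segment, rows, cols) + total_cost([new_graph], new_rows, new_cols)
    return split_cost < append_cost
```

A snapshot that repeats the existing block structure is cheaper to append, because a second segment pays its description cost again; a snapshot whose block densities differ sharply raises the pooled entropy enough that paying for a new segment wins.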
![Page 16: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/16.jpg)
Outline
Motivation
GraphScope
  Single timestamp
  Multiple timestamp
Experiments
![Page 17: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/17.jpg)
Objectives
Effectiveness on:
  Community discovery
  Change detection
Compression benefit
Scalable, incremental computation
![Page 18: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/18.jpg)
Evolving communities: NETWORK
29K hosts (nodes)
12K edges (on avg)
1,220 hours
~ 14.6M edges total
[Figure: community structure along the time axis.]
![Page 19: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/19.jpg)
Community change points: ENRON
34K email addresses
12K emails (on avg)
165 weeks
~ 2M emails total
Key change points correspond to key events
![Page 20: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/20.jpg)
Compression gain
GraphScope gives 10%-150% compression gain
[Figure: compression gain plot; legend: GraphScope.]
![Page 21: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/21.jpg)
Graph stream clustering
Scalability: NETWORK
29K hosts (nodes)
12K edges per hour (on average)
1,220 hours (timestamps)
~ 14.6M edges total
< 2 sec / snapshot on avg
![Page 22: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/22.jpg)
Related work
Co-clustering [Dhillon+ KDD03] [Chakrabarti+ KDD04]
Graph partitioning [Karypis+ 99]
Time-evolving graphs [Chakrabarti+ KDD06] [Chi+ KDD07] [Asur+ KDD07]
![Page 23: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/23.jpg)
Summary
Organize into few, homogeneous communities
Find changes in community structure
Scalable, parameter-free, incremental
![Page 24: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/24.jpg)
GraphScope: Parameter-Free Mining of Large Time-Evolving Graphs
Jimeng Sun
Spiros Papadimitriou
Philip S. Yu
Christos Faloutsos
![Page 25: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/25.jpg)
Graph stream clustering
[Figure: graph snapshots at t = 0, t = 1, t = 2.]
![Page 26: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/26.jpg)
Graph clustering – [Chakrabarti+ KDD’04]
[Figure: two orderings of the same adjacency matrix into row groups and column groups, shown side by side ("versus"). Why is this better?]
Good clustering:
1. Similar nodes are grouped together
2. As few groups as necessary
This gives a few, homogeneous blocks, which implies good compression.
![Page 27: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/27.jpg)
Graph clustering – [Chakrabarti+ KDD’04]
[Figure: two orderings of the same adjacency matrix into row groups and column groups, shown side by side ("versus"). Why is this better?]
Good clustering:
1. Similar nodes are grouped together
2. As few groups as necessary
This gives a few, homogeneous blocks, which implies good compression.
Good clustering implies good compression.
![Page 28: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/28.jpg)
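Cost objective
Assumes group partitionings, sizes and densities are given
[Figure: n × m adj. matrix partitioned into k = 3 row groups of sizes n1, n2, n3 and ℓ = 3 col. groups of sizes m1, m2, m3. Block (i, j) has density of ones (edges) p_{i,j}.]
Code cost: n_i m_j H(p_{i,j}) bits for block (i, j), e.g. n1 m2 H(p1,2) bits for block (1,2).
Bits total:
$$
\underbrace{\sum_{i,j} n_i m_j\, H(p_{i,j})}_{\text{code cost}}
\;+\;
\underbrace{\sum_{i} \big(\text{row-partition}_i\ \text{description}\big)
\;+\; \sum_{j} \big(\text{col-partition}_j\ \text{description}\big)
\;+\; \sum_{i,j} \big(\text{transmit \#edges } e_{i,j}\big)}_{\text{description cost}}
$$
The partition descriptions cost the block size entropy; transmitting e_{i,j} costs log n_i m_j bits.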
![Page 29: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/29.jpg)
Graph clustering: scalability
[Figure: Time vs. Size. x-axis: number of edges; y-axis: time (sec); curves: Splits, Shuffles.]
Linear in the number of edges: scalable
![Page 30: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/30.jpg)
Cost objective
Total cost = code cost (blocks) + description cost (blocks’ model)
One row group, one col group: code cost high, description cost low
n row groups, m col groups: code cost low, description cost high
![Page 31: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/31.jpg)
Cost objective
Total cost = code cost (blocks) + description cost (blocks’ model)
k = 3 row groups, ℓ = 3 col groups: code cost low, description cost low
![Page 32: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/32.jpg)
Search for optimum
[Figure: Cost vs. number of groups. Bit cost plotted against k and ℓ, with three marked points: one row group, one col group; n row groups, m col groups; k = 3 row groups, ℓ = 3 col groups.]
![Page 33: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/33.jpg)
Search for optimum: summary
Start at k = 1, ℓ = 1, then alternate split and shuffle:
k=1, ℓ=2; k=2, ℓ=2; k=2, ℓ=3; k=3, ℓ=3; k=3, ℓ=4; k=4, ℓ=4; k=4, ℓ=5; ending at k = 5, ℓ = 5
Split: Increase k or ℓ
Shuffle: Rearrange rows and cols
Merge: Decrease k or ℓ
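One plausible reading of the shuffle step, sketched for the rows of a single snapshot: estimate the block densities from the current grouping, then move each row to the row group under which it is cheapest to encode. The slide only says that shuffle rearranges rows and columns, so the reassignment rule, the function names `row_bits` and `shuffle_rows`, and the density clipping constant are all assumptions of this sketch.

```python
import math

def row_bits(row, col_group, densities):
    """Bits to encode one 0/1 row given one density per column group."""
    bits = 0.0
    for c, v in enumerate(row):
        # Clip so that log2 stays finite for all-ones or all-zeros blocks.
        p = min(max(densities[col_group[c]], 1e-9), 1.0 - 1e-9)
        bits += -math.log2(p) if v else -math.log2(1.0 - p)
    return bits

def shuffle_rows(matrix, row_group, col_group, k, ell):
    """Return a new row grouping: each row goes to its cheapest row group."""
    ones = [[0] * ell for _ in range(k)]
    cells = [[0] * ell for _ in range(k)]
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            ones[row_group[r]][col_group[c]] += v
            cells[row_group[r]][col_group[c]] += 1
    # Block densities under the current grouping (0.5 for empty blocks).
    dens = [[ones[i][j] / cells[i][j] if cells[i][j] else 0.5
             for j in range(ell)] for i in range(k)]
    return [min(range(k), key=lambda i: row_bits(row, col_group, dens[i]))
            for row in matrix]
```

The same routine applied to the transposed matrix would shuffle columns. Repeating row and column passes until the grouping stops changing is one way to use it between splits.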
![Page 34: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs](https://reader035.vdocuments.mx/reader035/viewer/2022062309/56813513550346895d9c66f7/html5/thumbnails/34.jpg)
Graph clustering – [Chakrabarti+ KDD’04]
Given a graph of interactions or associations:
Customers to products
Documents to terms
People to people
Computer communications
Financial transactions
Find simultaneously:
Communities (source and destination)
Their number