Segmenting Sequences of Node-labeled Graphs
Sorour E. Amiri, Liangzhe Chen, B. Aditya Prakash
Department of Computer Science
Virginia Tech
ICDM, DaMNet, Barcelona, Spain, December 12, 2016
Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
• Experiments
• Conclusion
Network Sequences
• Epidemiology: disease spreads over contact networks (e.g., Flu)
• Social Media: information spreads over friendship networks (e.g., Meme)
Making sense of network sequences
Flu: when do the infection patterns change?
Patterns: Star, Bridge, Near Clique
Reason:
• Virus mutation
• Vaccination
• …
Making sense of network sequences
Meme: when do the infection patterns change?
Patterns: Star, Clique
Reason:
• Event
• …
Problem 1: Network sequence segmentation
Given: a sequence of networks with labeled nodes.
Find: the best segmentation, one that captures the different distributions of node labels.
Patterns: Star, Bridge, Near Clique
Alternative 1: Feature extraction & time series
[Henderson et al. 2010] [Likas, Vlassis, and Verbeek 2003] [Li et al. 2009]
Step 1: Feature extraction
• F1: #cliques (of active subgraph)
• F2: #ladders (of inactive subgraph)
• F3: #ladders (of active subgraph)
Step 2: Time-series segmentation
[Figure: features time series of F1, F2, F3 over snapshots G1 to G4]
Alternative 1: Feature extraction & time series
Drawbacks:
• Laborious feature engineering
• “Local” change detection:
o One aggregation time period
o Threshold
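The “local” change detection drawback can be illustrated with a minimal sketch. The feature values, the function name `local_change_points`, and both thresholds are hypothetical and not taken from the talk. A cut is declared wherever two consecutive feature vectors differ by more than the threshold, so the result depends entirely on that hand-picked value:

```python
import math

def local_change_points(series, threshold):
    """Return indices t where the jump from series[t-1] to series[t] exceeds threshold.

    `series` is a list of equal-length feature vectors, one per snapshot.
    """
    cuts = []
    for t in range(1, len(series)):
        jump = math.dist(series[t - 1], series[t])  # Euclidean distance
        if jump > threshold:
            cuts.append(t)
    return cuts

# Hypothetical values of three features (F1, F2, F3) for five snapshots.
features = [
    (0, 1, 0),
    (0, 1, 0),
    (0, 0, 0),
    (2, 0, 1),
    (2, 0, 1),
]

cuts_loose = local_change_points(features, threshold=1.5)
cuts_tight = local_change_points(features, threshold=0.5)
print(cuts_loose, cuts_tight)
```

The same series yields a different number of segments for the two thresholds, which is the sensitivity the slide points at.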
Alternative 2: Plain-graph-based analysis
[Shah et al. 2015] [Sun et al. 2007] [Lin et al. 2009] [Qu et al. 2014]
Step 1: Extract active subgraphs
Step 2: Dynamic graph segmentation
Alternative 2: Plain-graph-based analysis
Drawbacks:
• Inactive nodes are important for detecting different patterns, but only the active subgraph is analyzed
[Figure: entire graph vs. active subgraph]
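A small sketch of why inactive nodes matter, using a hypothetical toy graph (node names and helper functions are assumptions for illustration). Two snapshots share the same graph but have different active nodes. Their active subgraphs look identical, a single edge, while the edges linking active to inactive nodes tell the two situations apart:

```python
def active_edges(edges, active):
    """Edges whose two endpoints are both active (the active subgraph)."""
    return [(u, v) for u, v in edges if u in active and v in active]

def boundary_edges(edges, active):
    """Edges with exactly one active endpoint; invisible in the active subgraph."""
    return [(u, v) for u, v in edges if (u in active) != (v in active)]

# Hypothetical graph: a hub 'h' with four leaves, plus a short tail l4 - x - y.
graph = [("h", "l1"), ("h", "l2"), ("h", "l3"), ("h", "l4"), ("l4", "x"), ("x", "y")]

active_1 = {"h", "l1"}  # activity sits at the hub
active_2 = {"x", "y"}   # activity sits at the periphery

print(len(active_edges(graph, active_1)), len(active_edges(graph, active_2)))
print(len(boundary_edges(graph, active_1)), len(boundary_edges(graph, active_2)))
```

Both active subgraphs contain one edge, yet the first snapshot has three active-to-inactive edges and the second only one.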
Desirable Properties
P1. Parameter-free:
• No threshold, no fixed granularity
P2. Comprehensive:
• Use the entire graph
Outline
• Motivation
• Background
• Our Proposed Method: SnapNETS
o Overview
o Goal 1: Summarizing Act-snapshots
o Goal 2: Constructing the segmentation graph
o Goal 3: Finding the best segmentation
• Experiments
• Conclusion
Overview of SnapNETS
Goal 1. Summarize each graph:
• Keep structural and label-dependent properties
Goal 2. Construct the segmentation graph:
• Define nodes and edges
• Define edge weights
o Extract the features of the summarized graphs
Goal 3. Find the best segmentation:
• Define the best segmentation (path)
• Compute the best segmentation
Technical Challenges
Using the entire graph snapshots:
• Summarize each graph while satisfying P2
Finding the number of segments:
• Compute the segmentation while satisfying P1
Reminder: P1. Parameter-free; P2. Comprehensive
Goal 1: Summarizing graph snapshots
We want to preserve:
• Structural properties
• Node labels
Role of the eigenvalue:
• Same leading eigenvalue (λ) of the adjacency matrix implies the same diffusive properties
• The leading eigenvalue determines the epidemic threshold [Prakash et al. 2012]
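As a worked illustration of the leading eigenvalue, here is a minimal pure-Python sketch that is not from the talk. The helper names and the two toy graphs are assumptions. It approximates the largest eigenvalue of an adjacency matrix by power iteration on A + I, where the identity shift keeps the iteration from oscillating on bipartite graphs such as stars:

```python
import math

def leading_eigenvalue(adj, iterations=500):
    """Approximate the largest eigenvalue of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        # One step of power iteration on (A + I): y = x + A x, then normalize.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T A x for the unit-norm vector x.
    return sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))

def star(leaves):
    """Adjacency matrix of a star: node 0 is the hub."""
    n = leaves + 1
    adj = [[0] * n for _ in range(n)]
    for leaf in range(1, n):
        adj[0][leaf] = adj[leaf][0] = 1
    return adj

def clique(n):
    """Adjacency matrix of a complete graph on n nodes."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]

star_lambda = leading_eigenvalue(star(4))      # analytically sqrt(4) = 2
clique_lambda = leading_eigenvalue(clique(5))  # analytically 5 - 1 = 4
print(round(star_lambda, 4), round(clique_lambda, 4))
```

Both toy graphs have five nodes, yet their leading eigenvalues differ by a factor of two, which is the kind of structural difference a summary is meant to preserve.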
Our Approach
We want to get a smaller graph with similar eigenvalues:
• Successively merge nodes
Problem 2: Graph summarization
Given: a graph with labeled nodes and a compression ratio.
Find: a coarsened graph such that its leading eigenvalue stays close to that of the original graph and merged nodes share the same label.
CoarseNet algorithm [Purohit et al. 2014]
• Matrix perturbation approach
• Successively merge nodes
• Keep leading eigenvalue
Our tweak
• Do not merge nodes with different labels
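The label constraint can be sketched with a brute-force greedy step. This is not CoarseNet, which scores merges through matrix perturbation. The version below simply recomputes the leading eigenvalue for every candidate, uses a plain unweighted contraction, and all function names are assumptions made for illustration:

```python
import math

def leading_eigenvalue(adj, iterations=300):
    """Power iteration on A + I, followed by the Rayleigh quotient for A."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iterations):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))

def contract(adj, labels, a, b):
    """Merge node b into node a (unweighted contraction, no self-loop)."""
    keep = [i for i in range(len(adj)) if i != b]
    merged = []
    for i in keep:
        row = []
        for j in keep:
            if i == j:
                row.append(0)
            elif i == a:
                row.append(1 if (adj[a][j] or adj[b][j]) else 0)
            elif j == a:
                row.append(1 if (adj[i][a] or adj[i][b]) else 0)
            else:
                row.append(adj[i][j])
        merged.append(row)
    return merged, [labels[i] for i in keep]

def merge_once(adj, labels):
    """Contract the same-label edge that changes the leading eigenvalue least."""
    base = leading_eigenvalue(adj)
    best = None
    n = len(adj)
    for a in range(n):
        for b in range(a + 1, n):
            # Only edges whose endpoints share a label are candidates.
            if adj[a][b] and labels[a] == labels[b]:
                cand_adj, cand_labels = contract(adj, labels, a, b)
                change = abs(leading_eigenvalue(cand_adj) - base)
                if best is None or change < best[0]:
                    best = (change, cand_adj, cand_labels)
    if best is None:
        return adj, labels  # no same-label edge left to merge
    return best[1], best[2]

# Hypothetical path 0 - 1 - 2 - 3 with two active and two inactive nodes.
path_adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
path_labels = ["active", "active", "inactive", "inactive"]
small_adj, small_labels = merge_once(path_adj, path_labels)
print(small_labels)
```

The edge between nodes 1 and 2 joins an active and an inactive node, so it is never contracted; only the two same-label edges compete.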
Goal 2: Segmentation graph
Nodes: one node for each segment, plus a source (‘s’) and a target (‘t’)
Edges: a directed edge between the nodes of adjacent segments
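A minimal sketch of how such a segmentation graph could be built. It assumes that segment (i, j) covers snapshots i through j and that two segments are adjacent when one ends right before the other begins; the function names are hypothetical. Counting the paths from ‘s’ to ‘t’ shows that each path corresponds to one way of cutting the sequence:

```python
def build_segmentation_graph(num_snapshots):
    """Return (segments, edges) for a sequence of `num_snapshots` snapshots."""
    segments = [(i, j) for i in range(num_snapshots) for j in range(i, num_snapshots)]
    edges = []
    for (i, j) in segments:
        if i == 0:
            edges.append(("s", (i, j)))          # segment starts the sequence
        if j == num_snapshots - 1:
            edges.append(((i, j), "t"))          # segment ends the sequence
        for k in range(j + 1, num_snapshots):
            edges.append(((i, j), (j + 1, k)))   # next segment starts at j + 1
    return segments, edges

def count_paths(edges, source="s", target="t"):
    """Count source-to-target paths by exhaustive walk (fine for tiny graphs)."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)

    def walk(node):
        if node == target:
            return 1
        return sum(walk(nxt) for nxt in successors.get(node, []))

    return walk(source)

segments, edges = build_segmentation_graph(3)
print(len(segments), len(edges), count_paths(edges))
```

With three snapshots there are six candidate segments and four paths, one for each of the 2^(3-1) ways to place cuts between consecutive snapshots.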
Edge Weights
How can we measure the distance between two segments?
Our Approach
Step 1: Extract features from summary graphs:
• Easier and more efficient than on the original graphs
• No complex features
Edge Weights
Step 2: Distance of adjacent segments
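The transcript does not preserve how the distance is computed, so the following is only a sketch under assumed choices: each segment is represented by the mean feature vector of its snapshots, and the distance is Euclidean. The feature values and function names are hypothetical:

```python
import math

def segment_profile(features, segment):
    """Mean feature vector of the snapshots in segment = (start, end), inclusive."""
    start, end = segment
    rows = features[start:end + 1]
    return [sum(column) / len(rows) for column in zip(*rows)]

def segment_distance(features, left, right):
    """Assumed edge weight: Euclidean distance between two segment profiles."""
    return math.dist(segment_profile(features, left), segment_profile(features, right))

# Hypothetical per-snapshot features of four summary graphs.
features = [[1, 0], [1, 0], [5, 3], [5, 3]]

weight_similar = segment_distance(features, (0, 0), (1, 1))
weight_different = segment_distance(features, (0, 1), (2, 3))
print(weight_similar, weight_different)
```

Under this assumption, adjacent segments with similar summaries get a small weight and dissimilar ones a large weight.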
Goal 3: Finding the best segmentation
Observation:
• For each segmentation there is a path from ‘s’ to ‘t’
• For each path from ‘s’ to ‘t’ there is a segmentation
Therefore:
• The best segmentation problem reduces to a path optimization problem
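Possible approach: Longest path?
[Figure: two paths from ‘s’ to ‘t’]
• Many edges of weight 0.01 each: Sum = 3
• Three edges of weight 0.9 each: Sum = 2.7
Over-segmentation problem: the longest path favors the path with many small-weight edges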
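The arithmetic behind the over-segmentation problem can be checked directly. The count of 300 small edges is an assumption chosen so that the total matches the slide's Sum = 3; the helper names are hypothetical:

```python
# Hypothetical edge weights along two paths from 's' to 't'.
many_small = [0.01] * 300  # assumed count, so that the total is about 3
few_large = [0.9] * 3      # three edges, total about 2.7

def total(weights):
    return sum(weights)

def average(weights):
    return sum(weights) / len(weights)

longest_choice = max([many_small, few_large], key=total)
average_choice = max([many_small, few_large], key=average)

print(len(longest_choice), len(average_choice))
```

Ranking by total weight selects the 300-edge path, while ranking by average weight per edge selects the three-edge path.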
Problem 3: Finding the best segmentation
Given: a segmentation graph
Find: the average longest path (ALP) from ‘s’ to ‘t’
Our idea: Average longest path
Advantages:
• Parameter-free
• Naturally balances the weight of the path with the number of segments
Solving ALP
• Finding the ALP in general graphs is NP-hard
• The segmentation graph is a DAG, so ALP can be solved in polynomial time
• Negative cycle detection [Waggoner et al. 2013]
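One way to see that the problem is polynomial on a DAG is a dynamic program over the number of edges. This is a swapped-in illustration, not the negative-cycle-detection approach cited on the slide, and the function name and toy graph are assumptions. For every node and every edge count k it keeps the heaviest path with exactly k edges, then picks the k that maximizes total weight divided by k:

```python
def average_longest_path(edges, source="s", target="t"):
    """edges: dict mapping (u, v) -> weight of a DAG.

    Returns (best average weight, path as a list of nodes), or None if the
    target is unreachable. Runs in O(|V| * |E|) time.
    """
    nodes = {source, target}
    for u, v in edges:
        nodes.update((u, v))
    # best[k][v] = (max total weight of a source-to-v path with k edges, predecessor)
    best = [{source: (0.0, None)}]
    for k in range(1, len(nodes)):  # a path in a DAG has at most |V| - 1 edges
        layer = {}
        for (u, v), weight in edges.items():
            if u in best[k - 1]:
                total = best[k - 1][u][0] + weight
                if v not in layer or total > layer[v][0]:
                    layer[v] = (total, u)
        if not layer:
            break
        best.append(layer)
    candidates = [(best[k][target][0] / k, k)
                  for k in range(1, len(best)) if target in best[k]]
    if not candidates:
        return None
    average, k = max(candidates)
    # Walk the predecessors back from the target, one layer at a time.
    path, node = [target], target
    while k > 0:
        node = best[k][node][1]
        path.append(node)
        k -= 1
    path.reverse()
    return average, path

# Hypothetical DAG: a short heavy path and a long light path.
toy_edges = {
    ("s", "a"): 0.9, ("a", "t"): 0.9,
    ("s", "b"): 0.5, ("b", "c"): 0.5, ("c", "d"): 0.5, ("d", "t"): 0.5,
}
best_average, best_path = average_longest_path(toy_edges)
print(best_average, best_path)
```

The long path has the larger total (2.0 versus 1.8), but the short path has the larger average (0.9 versus 0.5), so the average criterion returns the short path.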
Complete algorithm
Time complexity:
Experiments: datasets
Different domains with a range of sizes:
• BA-degree: Random Barabási-Albert graph
• Higgs: Tweets dataset (with the follower-followee network)
• Memetracker: Who-copies-from-whom blog and website network
• DBLP: Co-authorship network related to the ‘network’ topic
Experiments: baselines
DYNAMMO [Li et al. 2009]:
• Feature extraction & time series
• Change point detection (reconstruction errors)
• # segments = # segments of SnapNETS
VOG [Koutra et al. 2014]:
• Get the active subgraph
• 10 most important sub-structures
• Cut when the set of sub-structures changes significantly
o (threshold = the one that gives the best result)
SN-LP:
• Longest path instead of ALP
Experiments: Quantitative analysis
• SnapNETS outperforms the baselines
• Clear patterns in summary graphs
• We found the ground-truth segmentation
AS-Oregon
Case studies: Memetracker
Televised vice-presidential debates
• Summary graphs are close to the case when all nodes have the same label (f5)
• Random nodes are active (f8)
• Summary graphs are substantially sparser (f2)
• Many active nodes got merged into important nodes such as CNN and BBC to form hubs (f6)
Case studies: AS-Oregon
A new community leads to a new segment
Conclusion: SnapNETS
Properties:
• P1. Parameter-free
• P2. Comprehensive
Patterns: the ‘placement’ and ‘connection’ of active/inactive nodes:
• Structural (e.g. community/role/centrality)
• Rate changes
Global method: SnapNETS is a ‘global’ method and not simply a change-point detection method.
Future Work
• Faster ALP: linear?
• Handle dynamic graphs with varying nodes and edges
• More node labels and real-valued features
• Work with partially observed graphs
Any questions?
Funding:
Code at: https://github.com/SorourAmiri/SnapNETS
Sorour E. Amiri Liangzhe Chen B. Aditya Prakash
[Figure: SnapNETS pipeline]
Goal 1: Graph summarization (successively merge nodes, keep leading eigenvalue, keep same set of labels)
Goal 2: Segmentation graph (nodes, edges, edge weights)
Goal 3: Finding the best segmentation (ALP)
SnapNETS Result