The Very Small World of the Well-Connected
Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert
![Page 1: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/1.jpg)
The Very Small World
of the Well-Connected
Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert
![Page 2: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/2.jpg)
Outline
• VIGS: Vertex-Importance Graph Synopsis
• Testing VIGS with different datasets and importance measures
• Analytical expectations
• Making guarantees about VIGS
  – Connectedness: KeepOne, KeepAll
• Related Work
  – Graph Sampling, Rich Club, K-cores, Web Measure
![Page 3: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/3.jpg)
Network or Hairball?
• Huge networks are difficult to study, store, share...
• Can we shrink or summarize a network?
• Starting point: important vertices
• Vertex-Importance Graph Synopsis
![Page 4: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/4.jpg)
Vertex-Importance Graph Synopsis
• Create subgraph of important vertices
• Study both key nodes and entire graph
• Which vertices are important?
  – High-traffic routers? The most quoted blog?
• Standard, well-defined measures
  – Degree, Betweenness, Closeness, PageRank
![Page 5: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/5.jpg)
VIGS In Action
• Starting point: random graph with 100 vertices
• Select an importance measure: Degree
• Pick the 9 highest-degree vertices
• Keep only edges between these 9 vertices
Figure labels: average degree = 4 (original graph); average degree = 0.9 (subgraph)
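The four steps on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the random-graph generator, the seed, and the function names (`random_graph`, `vigs_by_degree`, `average_degree`) are assumptions made for the example, and the resulting subgraph will not reproduce the slide's exact figure.

```python
import random

def random_graph(n, m, seed=0):
    """Undirected random graph with n vertices and m distinct edges (assumed generator)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < m:
        u, v = rng.sample(range(n), 2)   # two distinct vertices
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            edges += 1
    return adj

def vigs_by_degree(adj, k):
    """Keep the k highest-degree vertices and only the edges among them."""
    top = sorted(adj, key=lambda v: (-len(adj[v]), v))[:k]
    keep = set(top)
    return {v: adj[v] & keep for v in top}

def average_degree(adj):
    """Mean number of neighbours per vertex."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

graph = random_graph(100, 200, seed=1)   # 200 edges on 100 vertices: average degree 4
synopsis = vigs_by_degree(graph, 9)      # 9 highest-degree vertices, as on the slide
print(average_degree(graph), average_degree(synopsis))
```

Swapping the sort key for another score (betweenness, closeness, PageRank) gives the other VIGS variants discussed later.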
![Page 6: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/6.jpg)
Motivating example: citations among ACM papers
Figure panels: 500 random papers; 500 most cited papers
![Page 7: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/7.jpg)
Datasets
• Erdos-Renyi random graph and three real networks
  – BuddyZoo: collection of buddy lists
  – TREC: links between blogs
  – Web: an older web crawl from PARC

| | Erdos-Renyi | BuddyZoo | TREC | Web |
|---|---|---|---|---|
| Vertices | 10,000 | 135,131 | 29,690 | 152,171 |
| Edges | 49,935 | 803,200 | 195,940 | 1,686,541 |
| ASP (average shortest path) | 4.26 | 5.96 | 3.72 | 3.48 |
| Directed | false | false | true | true |
![Page 8: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/8.jpg)
Importance measures
• degree (number of connections), denoted by size
• betweenness (number of shortest paths a vertex lies on), denoted by color
![Page 9: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/9.jpg)
Importance measures
• degree (number of connections), denoted by size
• closeness (length of shortest path to all others), denoted by color
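To make the closeness measure concrete, here is a small sketch using breadth-first search on an unweighted, undirected graph. It assumes the common normalisation (number of reachable vertices divided by the sum of distances to them); the slides do not say which variant was used, and the function names and the toy path graph are illustrative only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(adj, v):
    """Reachable vertices divided by total distance to them (assumed normalisation)."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())
    reached = len(dist) - 1          # exclude v itself
    return reached / total if total else 0.0

# Toy path graph a-b-c-d-e: the middle vertex is closest to everyone else.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
for v in path:
    print(v, len(path[v]), round(closeness(path, v), 3))
```

In this toy graph the three interior vertices all have degree 2 but different closeness, which is one way the two measures can disagree.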
![Page 10: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/10.jpg)
Correlation among measures
• High correlation between different importance measurements
• Undirected graphs: higher correlation
• Closeness has lowest correlation in all datasets
![Page 11: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/11.jpg)
Correlation among measures
• High correlation between different importance measurements
• Undirected graphs: higher correlation
• Closeness has lowest correlation in all datasets
![Page 12: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/12.jpg)
Assortativity
• In an assortative graph, high-value nodes tend to connect to other high-value nodes
• Example: degree
Figure panels: assortative; disassortative
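One standard way to quantify degree assortativity is the Pearson correlation between the degrees at the two ends of each edge: positive values mean assortative, negative values disassortative. The sketch below assumes an undirected graph stored as a dict of neighbour sets; the function name and the two toy graphs are illustrative, and the slides do not state which estimator the authors used.

```python
from math import sqrt

def degree_assortativity(adj):
    """Pearson correlation of endpoint degrees over all edges of an undirected graph."""
    xs, ys = [], []
    for u in adj:
        for v in adj[u]:
            # Each undirected edge is visited in both directions,
            # which makes the correlation symmetric in its two arguments.
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0   # all degrees equal: correlation undefined, reported as 0 here
    return cov / sqrt(var_x * var_y)

# Star: one hub linked to four leaves, so high degree always meets low degree.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
# Triangle plus a separate single edge: every vertex links only to equal-degree vertices.
like_with_like = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4}, 4: {3}}

print(degree_assortativity(star), degree_assortativity(like_with_like))
```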
![Page 13: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/13.jpg)
Assortativity - Degree
• ER: Neutral
• BZ: Assortative
• TREC and Web: Disassortative
![Page 14: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/14.jpg)
Assortativity
![Page 15: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/15.jpg)
Degree distributions
![Page 16: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/16.jpg)
Subgraphs
• Apply VIGS! Select Degree, top 100 nodes
• Example: degree
• Substantial difference between datasets!
![Page 17: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/17.jpg)
Subgraphs
The selection of an importance measure may have an impact, even in the same dataset
![Page 18: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/18.jpg)
Connectivity: size of largest component
Proportion of nodes that are connected either directly or indirectly
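As a rough sketch of this measure, the function below returns the share of vertices that fall in the largest connected component. It assumes an undirected graph given as a dict of neighbour sets (for the directed datasets one would first have to decide how to treat edge direction, which the slides do not specify); the function name and toy graph are illustrative.

```python
def largest_component_fraction(adj):
    """Fraction of vertices belonging to the largest connected component."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / len(adj) if adj else 0.0

# Toy graph: components {0, 1, 2} and {3, 4}.
toy = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
print(largest_component_fraction(toy))
```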
![Page 19: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/19.jpg)
Subgraph Connectivity - ER
• Highly connected, even with only a few vertices
• Subgraphs for all importance measures are almost completely connected by 2000 nodes
• Better connected than a subgraph of random vertices
![Page 20: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/20.jpg)
Subgraph Connectivity
![Page 21: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/21.jpg)
Subgraphs: density
• What is the proportion of edges to nodes in the original graphs vs. subgraphs?
Figure labels: average degree = 4 (original graph); average degree = 0.9 (subgraph)
![Page 22: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/22.jpg)
Subgraph Density - ER
• Black line slope = Edges/Vertices in entire network
• Lower dotted line = subgraph of random vertices
• VIGS subgraphs: lower than total density, higher than random subgraph density
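For reference, the whole-network Edges/Vertices ratio (the slope of the black line) can be worked out directly from the vertex and edge counts in the datasets table. The snippet below is only that arithmetic; the dictionary layout and function name are assumptions made for the example.

```python
# (vertices, edges) as listed in the datasets table.
datasets = {
    "Erdos-Renyi": (10_000, 49_935),
    "BuddyZoo": (135_131, 803_200),
    "TREC": (29_690, 195_940),
    "Web": (152_171, 1_686_541),
}

def edges_per_vertex(vertices, edges):
    """Density in the slide's sense: number of edges divided by number of vertices."""
    return edges / vertices

for name, (v, e) in datasets.items():
    print(f"{name}: {edges_per_vertex(v, e):.2f} edges per vertex")
```

The same function applied to a VIGS subgraph or to a random-vertex subgraph gives the other two curves being compared.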
![Page 23: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/23.jpg)
Subgraph Density
![Page 24: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/24.jpg)
Average Shortest Path ('ASP')
![Page 25: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/25.jpg)
Subgraph Average Shortest Path ('ASP') for Erdos-Renyi
Plot legend:
• whole network ASP
• ASP between IVs (important vertices) in subgraph
• ASP between IVs in whole graph
ER: ASP is shorter between IVs, but higher in the subgraph
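The difference between the two IV curves comes from which paths are allowed: in the whole graph a shortest path between two important vertices may pass through unimportant ones, while in the subgraph it may not. A minimal sketch on a toy ring follows; the graph, the chosen important vertices, and the function names are assumptions for illustration, and unreachable pairs are simply skipped.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def average_shortest_path(adj, vertices):
    """Mean distance over pairs of `vertices` that are connected in `adj`."""
    vertices = list(vertices)
    total = pairs = 0
    for i, u in enumerate(vertices):
        dist = bfs_distances(adj, u)
        for v in vertices[i + 1:]:
            if v in dist:
                total += dist[v]
                pairs += 1
    return total / pairs if pairs else float("inf")

def induced_subgraph(adj, keep):
    """Keep only the given vertices and the edges among them."""
    keep = set(keep)
    return {v: adj[v] & keep for v in keep}

# Ring 0-1-2-3-4-5-0; vertices 0..4 play the role of the important vertices.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
important = [0, 1, 2, 3, 4]

asp_whole = average_shortest_path(ring, important)
asp_sub = average_shortest_path(induced_subgraph(ring, important), important)
print(asp_whole, asp_sub)
```

Dropping vertex 5 removes the shortcut between 0 and 4, so the subgraph ASP is longer than the ASP between the same vertices in the whole graph.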
![Page 26: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/26.jpg)
Subgraph ASPs
![Page 27: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/27.jpg)
Relative Rank of Vertices in Subgraph - ER
• Do IVs maintain their relative rank in subgraphs?
• Subgraph contains the IVs and the edges among them only
• ER: little correlation, steadily increasing until all vertices are included
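One plausible way to measure whether rank is maintained is a rank correlation between each vertex's importance score in the whole graph and its score recomputed inside the subgraph. The slides do not name the statistic, so the Spearman correlation below is an assumption, as are the function names and the made-up score lists.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical scores for four important vertices.
degree_in_whole_graph = [9, 7, 6, 5]
degree_in_subgraph = [3, 1, 2, 0]
rho = spearman(degree_in_whole_graph, degree_in_subgraph)
print(rho)
```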
![Page 28: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/28.jpg)
Relative Rank in Subgraph
![Page 29: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/29.jpg)
TREC anomaly - closeness
![Page 30: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/30.jpg)
Four Regions
• Four regions, highlighted in density plot
Figure panels: Original; Closeness only, regions highlighted
![Page 31: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/31.jpg)
Cause: Blog Aggregator
• One node has connections to 99% of the nodes between 1 and 7961! (regions 1, 2, 3)
• This same node has only 1 connection to a node beyond 7961 (region 4)
• Nodes between 5828 and 7961 (region 3) have only 1 connection: to the aggregator
  – Spam blogs? New blogs? Private blogs?
![Page 32: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/32.jpg)
Examining Density
• The first 3 regions feature nodes connected to the aggregator
• R1: well connected blogs
  – Average increase in total edges per node added: 12.93
• R2: far less connected, but not quite barren
  – Average increase per node: 3.2
• R3: isolated spam/new blogs
  – 1 edge per node increase
![Page 33: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/33.jpg)
Examining Density
• R4: well connected, but not linked to aggregator
  – Average increase even higher than region 1: 17.8
• Aggregator inflated the closeness scores of connected nodes (R1, 2, 3) above those in region 4
![Page 34: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/34.jpg)
Examining Avg Shortest Paths (ASP)
• R1: ASP slightly below 2
  – Some nodes directly connected, 99%+ within 2 hops via aggregator
• R2 and 3: ASP levels at ~2
  – Fewer and fewer direct links, but all accessible via aggregator
• R4: ASPs begin to increase
  – ASP doesn’t explode: ~70% of R4 links are to R1 or R2 nodes
  – R3 only reachable from R4 via the aggregator
  – Access to aggregator through connected R1/R2 nodes: adds a hop to path
![Page 35: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/35.jpg)
Examining Relative Ranking Correlation
R1-3: correlation steadily decreases
R4: rapid increase in correlation!
Spam blogs’ importance in subgraph initially inflated
Realigns when blogs in region 4 connect with real blogs in regions 1-2
![Page 36: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/36.jpg)
Localized to closeness
• Region 1, 2 and 3 nodes have high closeness thanks to the aggregator
  – Recall ASP graph: short distance to many, many nodes via the aggregator
• Connection to aggregator doesn’t confer high degree, PageRank or betweenness: nodes must ‘fend for themselves’
  – Degree: link to the aggregator is just 1 link
  – PR: aggregator’s ‘vote’ diluted by its high degree
  – Bet: aggregator is gateway to its children; could use any child to reach the aggregator
![Page 37: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/37.jpg)
Empirical Analysis Summary
• VIGS results vary by graph and importance measure
• Still, subgraphs tended towards
  – High connectivity
  – Average or higher density
  – Shorter ASPs
  – Maintaining relative importance rank of vertices
• “Spam” affects closeness primarily
![Page 38: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/38.jpg)
Preserving Properties
• So far, just studying subgraphs
• Applying VIGS: may need guarantees
• Hard to make a guarantee?
• Example property: subgraph is connected
![Page 39: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/39.jpg)
Preserving Properties
• Is it difficult to guarantee the connectedness of a VIGS subgraph?
  – NP-complete: reducible to Steiner Minimum Spanning Tree (MST) problem
• Resort to heuristics
  – KeepOne, KeepAll from Gilbert and Levchenko (2004)
![Page 40: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/40.jpg)
KeepOne and KeepAll
• KeepOne: build an MST, dropping as many vertices/edges as possible while maintaining connectivity
  – Problem! ASP/diameter could increase
• Solution: KeepAll: MST, but add all vertices/edges on a shortest path
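The trade-off between the two heuristics can be illustrated with a loose sketch. This is not Gilbert and Levchenko's algorithm: the two functions below are simplified stand-ins written for this example (names and toy graph are hypothetical), assuming an undirected graph. The first connects the important vertices by adding a single shortest path per vertex; the second adds every vertex that lies on any shortest path between two important vertices.

```python
from collections import deque

def bfs(adj, source):
    """Return (distance, parent) maps for all vertices reachable from source."""
    dist, parent = {source: 0}, {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                parent[w] = u
                queue.append(w)
    return dist, parent

def connect_with_single_paths(adj, important):
    """KeepOne-like stand-in: join each important vertex via one shortest path."""
    important = list(important)
    kept = {important[0]}
    for target in important[1:]:
        if target in kept:
            continue
        dist, parent = bfs(adj, target)
        reachable = [v for v in kept if v in dist]
        if not reachable:            # different component: cannot be connected
            kept.add(target)
            continue
        node = min(reachable, key=lambda v: (dist[v], v))
        while node is not None:      # walk the BFS tree back to `target`
            kept.add(node)
            node = parent[node]
    return kept

def connect_with_all_shortest_paths(adj, important):
    """KeepAll-like stand-in: keep every vertex on any shortest path between IVs."""
    important = list(important)
    dist = {u: bfs(adj, u)[0] for u in important}
    kept = set(important)
    for i, u in enumerate(important):
        for w in important[i + 1:]:
            if w not in dist[u]:
                continue
            for v in adj:
                if v in dist[u] and v in dist[w] and dist[u][v] + dist[w][v] == dist[u][w]:
                    kept.add(v)
    return kept

# Ring 0-1-2-3-4-5-0 with two important vertices on opposite sides.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
one = connect_with_single_paths(ring, [0, 3])
all_paths = connect_with_all_shortest_paths(ring, [0, 3])
print(len(one), len(all_paths))
```

Even on this toy ring the second variant keeps more vertices than the first, in line with the trade-off between added vertices and path length noted on the following slides.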
![Page 41: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/41.jpg)
Heuristic Performance - ER
• KO - did not have to add many vertices, but shortest path rather large (ER ASP was 4.26)
• KA - good improvement in path length, but huge increase in vertices
![Page 42: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/42.jpg)
Heuristic Performance - BZ
• Similar performance to ER - KO results in significantly longer shortest paths, but KA adds many vertices
• Is 4000 too many vertices to add? Small compared to total graph, but huge compared to number of important vertices
![Page 43: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/43.jpg)
Heuristic Performance - TREC
• Almost completely connected from the start
• KA adds only a few vertices, doesn’t change much
• Results for Web dataset similar
![Page 44: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/44.jpg)
Related Work
• Graph sampling
  – Similar objective: synopsis
  – Concerned only with original graph
  – Random sampling, snowball sampling…
  – Lee, Kim, Jeong (2006), Leskovec, Faloutsos (2006), Li, Church, Hastie (2006)
• Rich-club
  – Concerned only with high degree nodes
  – Zhou, Mondragon (2004), Colizza, Flammini, Serrano, Vespignani (2006)
![Page 45: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/45.jpg)
Related Work
• K-cores
  – Subgraphs where each vertex has at least k connections within the subgraph
  – Dorogovtsev, Goltsev, Mendes (2006)
• Core connectivity
  – Smallest number of important vertices to remove before destroying largest component
  – Mislove, Marcon, Gummadi, Druschel, Bhattacharjee (2007)
![Page 46: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/46.jpg)
VIGS wrap up
• Vertex-importance graph synopsis
  – Create a subgraph of important vertices to study both the full graph and these vertices in particular
• Properties of VIGS depend on entire network and importance measure
• Real world networks have dense, closely knit VIGS
• In some cases easy to meet connectivity & ASP guarantees
![Page 47: The Very Small World of the Well-Connected Xiaolin Shi, Matt Bonner, Lada Adamic, Anna Gilbert](https://reader036.vdocuments.mx/reader036/viewer/2022070605/5a4d1ad17f8b9ab0599716a4/html5/thumbnails/47.jpg)
Thanks to Xiaolin Shi
Matthew Bonner
Lada Adamic
NSF DMS 0547744