FINDING TIGHT-KNIT CIRCLES (I.E., TRIANGLE COUNTING)
Irene Finocchi
Network properties
• Locality of relationships
  - Relationships tend to cluster: high clustering coefficient
  - If A is related to both B and C, then B and C are related with probability higher than average
• Giant connected component
• Sparse: m = o(n^2)
• Small-world
  - Small average distance between nodes
• Scale-free
  - Power-law degree distribution
Counting triangles
• How many triangles are there in a graph?
• Triangle = smallest clique = smallest tight-knit community
• Of great interest in social network analysis (and not only there!)
[Figure: example graph on nodes 1 to 5]
Naive approach
• List all node triples, and for each triple check whether it forms a triangle
• (n choose 3) = n(n-1)(n-2)/6 triples
• Θ(n^3) time, regardless of the graph structure (and of the number of edges)
[Figure: example graph on nodes 1 to 5]
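A minimal Python sketch of the naive approach follows. It assumes nodes are labeled 0..n-1 and the graph is undirected and simple, given as an edge list. The function name `count_triangles_naive` and the example graph are hypothetical; the graph is not the one in the slide figure. The point is that all (n choose 3) triples are tested no matter how few edges there are.

```python
from itertools import combinations

def count_triangles_naive(n, edges):
    """Test every one of the (n choose 3) node triples; nodes are 0..n-1."""
    edge_set = {frozenset(e) for e in edges}  # undirected edges
    count = 0
    for a, b, c in combinations(range(n), 3):
        if (frozenset((a, b)) in edge_set and frozenset((b, c)) in edge_set
                and frozenset((a, c)) in edge_set):
            count += 1
    return count

# Hypothetical 5-node example with triangles {0, 1, 2} and {1, 2, 3}.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
print(count_triangles_naive(5, edges))  # 2
```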
Three case studies
• Clique on n nodes
• Star with n-1 leaves
• Binary tree on n nodes
Naive approach: Θ(n^3) in all three cases
Algorithm NodeIterator
Γ(x) = neighbors of node x
[Figure: NodeIterator illustrated on the example graph with nodes 1 to 5]
O(n^3) iterations
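Below is a minimal Python sketch of the NodeIterator idea: for every node v, test each pair of neighbors in Γ(v) for adjacency. It assumes an undirected simple graph given as an edge list. The function name is hypothetical. This variant loops over unordered neighbor pairs, so each triangle is found once from each of its three nodes and the total is divided by 3.

```python
from collections import defaultdict
from itertools import combinations

def node_iterator(edges):
    """For each node v, check every pair of neighbors of v for an edge."""
    nbrs = defaultdict(set)                    # nbrs[x] plays the role of Γ(x)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    found = 0
    for v in list(nbrs):
        for u, w in combinations(nbrs[v], 2):  # unordered pairs in Γ(v)
            if w in nbrs[u]:                   # is (u, w) an edge?
                found += 1
    return found // 3                          # each triangle seen from 3 nodes

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
print(node_iterator(edges))  # 2
```

The work is the number of neighbor pairs, Σ_v d(v)(d(v)-1)/2, which is O(n^3) in the worst case.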
NodeIterator on the three case studies
• Clique on n nodes: Θ(n^3)
  - Cannot improve on this bound: after all, the clique contains (n choose 3) = Θ(n^3) triangles
• Star with n-1 leaves: Θ(n^2)
  - Better than the naive approach, but the star has 0 triangles and is sparse (constant average degree)
• Binary tree on n nodes: Θ(n)
  - Cannot do better than linear time
[Figure: example graph on nodes 1 to 5]
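The three case-study bounds can be checked by counting the neighbor pairs NodeIterator examines, Σ_v C(d(v), 2). This is a sketch under stated assumptions. The generators `clique`, `star` and `binary_tree` are hypothetical helpers, and the binary tree uses a heap layout where the parent of node i is (i - 1) // 2. `math.comb` requires Python 3.8 or later.

```python
from collections import Counter
from math import comb

def node_iterator_pairs(edges):
    """Neighbor pairs checked by NodeIterator: sum over v of C(d(v), 2)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sum(comb(d, 2) for d in deg.values())

def clique(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def star(n):            # node 0 is the center, nodes 1..n-1 are leaves
    return [(0, i) for i in range(1, n)]

def binary_tree(n):     # heap layout: parent of node i is (i - 1) // 2
    return [((i - 1) // 2, i) for i in range(1, n)]

for n in (100, 400):
    print(n, node_iterator_pairs(clique(n)),      # about n^3 / 2
          node_iterator_pairs(star(n)),           # about n^2 / 2
          node_iterator_pairs(binary_tree(n)))    # at most 3n
```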
[Figure: MR-NodeIterator on the example graph with nodes 1 to 5: Reduce 1 (reducer for node v = 1) and Reduce 2]
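A two-round MapReduce version of NodeIterator can be simulated in memory, as sketched below.

- **Round 1.** Each reducer for node v receives Γ(v) and emits every length-2 path u-v-w, keyed by the pair (u, w).
- **Round 2.** Every real edge is also sent under its own key with a "$" marker. A Round-2 reducer keyed by (u, w) that sees "$" counts one triangle per middle node v.

`run_round` is a hypothetical stand-in for the map/shuffle/reduce cycle, not a Hadoop API. The other function names are also illustrative.

```python
from collections import defaultdict
from itertools import combinations

def run_round(records, mapper, reducer):
    """Simulate one MapReduce round: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    out = []
    for k, values in groups.items():
        out.extend(reducer(k, values))
    return out

def mr_node_iterator(edges):
    edges = [tuple(sorted(e)) for e in edges]   # keys are sorted node pairs

    def map1(edge, _):                  # send each edge to both endpoints
        u, v = edge
        yield u, v
        yield v, u

    def reduce1(v, neighbors):          # emit path u-v-w keyed by (u, w)
        for u, w in combinations(sorted(neighbors), 2):
            yield (u, w), v

    paths = run_round([(e, None) for e in edges], map1, reduce1)

    def map2(key, value):               # identity map
        yield key, value

    def reduce2(pair, values):          # '$' present => (u, w) is an edge
        if "$" in values:
            yield pair, sum(1 for x in values if x != "$")

    closed = run_round(paths + [(e, "$") for e in edges], map2, reduce2)
    return sum(c for _, c in closed) // 3       # each triangle found 3 times

print(mr_node_iterator([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]))  # 2
```

The list `paths` is the Round-1 output. Its size, Σ_v C(d(v), 2), is what drives the global space bound.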
MR-NodeIterator: analysis
• Rounds: 2
• Global space: O(n^3)
• Local reducer space: O(n)
Better algorithms exist
MR-NodeIterator performance
Algorithm NodeIterator++
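[Figure: NodeIterator++ illustrated on the example graph with nodes 1 to 5]
Each triangle is counted only once, using the total order w ≻ u iff d(w) > d(u), or d(w) = d(u) and w > u (ties broken by node ID).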
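A minimal Python sketch of NodeIterator++ follows, assuming comparable (e.g. integer) node IDs and an undirected simple graph. Nodes are ranked by the pair (degree, ID), which realizes the order ≻ above. From each node v, only pairs of neighbors ranked above v are examined, so every triangle is counted exactly once, at its lowest-ranked node. The function name is hypothetical.

```python
from collections import defaultdict

def node_iterator_pp(edges):
    """Count triangles once each, using the order (degree, node ID)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    def rank(x):                 # w ≻ u  iff  rank(w) > rank(u)
        return (len(nbrs[x]), x)

    count = 0
    for v in list(nbrs):
        # neighbors of v that come after v in the order
        higher = sorted((u for u in nbrs[v] if rank(u) > rank(v)), key=rank)
        for i, u in enumerate(higher):
            for w in higher[i + 1:]:          # rank(w) > rank(u) > rank(v)
                if w in nbrs[u]:
                    count += 1
    return count

print(node_iterator_pp([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]))  # 2
```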
NodeIterator++: analysis
• Let h = # nodes of G with degree > √m. Since the degrees sum to 2m, h × √m ≤ 2m, hence h ≤ 2√m
• Let d+(v) = # neighbors w of v with w ≻ v (so d(w) ≥ d(v))
  1) d(v) ≤ √m: d+(v) ≤ d(v) ≤ √m
  2) d(v) > √m: the nodes counted in d+(v) have degree ≥ d(v) > √m, hence d+(v) ≤ h ≤ 2√m
• In both cases, d+(v) ≤ 2√m
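Number of iterations: at most Σ_v d+(v)^2 ≤ 2√m · Σ_v d+(v) = 2√m · m = O(m^(3/2)), because every edge is counted in d+ of exactly one endpoint (Σ_v d+(v) = m).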
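The bound can be sanity-checked empirically. The sketch below computes d+(v) under the order (degree, node ID) and asserts three facts:

- Σ_v d+(v) = m
- d+(v) ≤ 2√m for every v
- Σ_v d+(v)^2 ≤ 2 m^(3/2)

The random graph, its seed and its size are arbitrary assumptions chosen only for illustration, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def out_degrees(edges):
    """d+(v): neighbors of v that come after v in the order (degree, ID)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    rank = {x: (len(s), x) for x, s in nbrs.items()}
    return {x: sum(1 for y in s if rank[y] > rank[x]) for x, s in nbrs.items()}

def check_bound(edges):
    """Assert the three facts used in the analysis; return sum of d+(v)^2."""
    m = len(edges)
    dplus = out_degrees(edges)
    assert sum(dplus.values()) == m                  # each edge oriented once
    assert max(dplus.values()) <= 2 * math.sqrt(m)   # d+(v) <= 2 sqrt(m)
    work = sum(d * d for d in dplus.values())
    assert work <= 2 * m ** 1.5                      # sum d+(v)^2 <= 2 m^(3/2)
    return work

random.seed(0)
n = 300
edges = {tuple(sorted(random.sample(range(n), 2))) for _ in range(3000)}
print(check_bound(list(edges)))
```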
NodeIterator++ on the three case studies
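• Clique on n nodes: Θ(n^3)
  - Cannot improve on this bound: after all, the clique contains (n choose 3) = Θ(n^3) triangles
• Star with n-1 leaves: O(n^1.5)
  - By the O(m^(3/2)) bound with m = Θ(n), this is at least √n faster than NodeIterator. In fact the work is only Θ(n): each leaf has d+ = 1 and the center has d+ = 0
• Binary tree on n nodes: Θ(n)
  - Cannot do better than linear time
[Figure: example graph on nodes 1 to 5]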
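The three case studies can be compared directly by counting the neighbor pairs each algorithm examines:

- NodeIterator examines Σ_v C(d(v), 2) pairs.
- NodeIterator++ examines Σ_v C(d+(v), 2) pairs.

This is a sketch with hypothetical generator helpers. The binary tree uses a heap layout (parent of i is (i - 1) // 2). `math.comb` needs Python 3.8 or later.

```python
from collections import defaultdict
from math import comb

def pair_checks(edges):
    """Return (NodeIterator pairs, NodeIterator++ pairs) for an edge list."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    rank = {x: (len(s), x) for x, s in nbrs.items()}   # order (degree, ID)
    plain = sum(comb(len(s), 2) for s in nbrs.values())
    plus = sum(comb(sum(1 for y in s if rank[y] > rank[x]), 2)
               for x, s in nbrs.items())
    return plain, plus

def clique(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

def star(n):            # node 0 is the center
    return [(0, i) for i in range(1, n)]

def binary_tree(n):     # parent of node i is (i - 1) // 2
    return [((i - 1) // 2, i) for i in range(1, n)]

for name, g in [("clique", clique(200)), ("star", star(10_000)),
                ("binary tree", binary_tree(10_000))]:
    print(name, pair_checks(g))
```

On the clique, NodeIterator++ still does Θ(n^3) work, namely (n choose 3) pairs. On the star it examines no pairs at all.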
MR-NodeIterator++: analysis
• Rounds: 2
• Global space: O(m^(3/2))
• Local reducer space: O(n)
Local space can be further reduced
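An in-memory simulation of a two-round MapReduce version of NodeIterator++ is sketched below.

- **Round 1 map.** Each edge is sent only to its lower-ranked endpoint, under the order (degree, ID).
- **Round 1 reduce.** Reducer v therefore only pairs up neighbors ranked above v.
- **Round 2.** Pairs that are real edges are marked with "$", and each triangle is counted exactly once.

Assumptions: `run_round` is a hypothetical stand-in for a MapReduce round, not a Hadoop API. Degrees are assumed to be available to the mappers; here they are computed up front, while a real job would need an extra round to compute them.

```python
from collections import Counter, defaultdict
from itertools import combinations

def run_round(records, mapper, reducer):
    """In-memory stand-in for one MapReduce round (map, shuffle, reduce)."""
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return [out for k, vs in groups.items() for out in reducer(k, vs)]

def mr_node_iterator_pp(edges):
    edges = [tuple(sorted(e)) for e in edges]
    deg = Counter(x for e in edges for x in e)   # assumed precomputed degrees

    def rank(x):                                 # w ≻ u  iff  rank(w) > rank(u)
        return (deg[x], x)

    def map1(edge, _):          # key = lower-ranked endpoint, value = the other
        u, v = edge
        yield (u, v) if rank(u) < rank(v) else (v, u)

    def reduce1(v, higher):     # pairs of higher-ranked neighbors of v
        for u, w in combinations(sorted(higher), 2):
            yield (u, w), v

    def map2(key, value):       # identity map
        yield key, value

    def reduce2(pair, values):  # '$' present => (u, w) is an edge
        if "$" in values:
            yield sum(1 for x in values if x != "$")

    paths = run_round([(e, None) for e in edges], map1, reduce1)
    return sum(run_round(paths + [(e, "$") for e in edges], map2, reduce2))

print(mr_node_iterator_pp([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]))  # 2
```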
MR-NodeIterator++ performance
The curse of the last reducer
• Very skewed degree distributions
• NodeIterator++ deals with skewness much better
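A toy illustration of how skew hits a single reducer follows. It assumes the load of the Round-1 reducer for node v is proportional to the number of neighbor pairs it emits:

- C(d(v), 2) for NodeIterator
- C(d+(v), 2) for NodeIterator++

The skewed graph (one hub joined to every node, plus a path) and the function name are hypothetical. Real reducer times also depend on factors this proxy ignores.

```python
from collections import defaultdict
from math import comb

def heaviest_reducer(edges):
    """Largest Round-1 output of one reducer: (NodeIterator, NodeIterator++)."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    rank = {x: (len(s), x) for x, s in nbrs.items()}   # order (degree, ID)
    plain = max(comb(len(s), 2) for s in nbrs.values())
    plus = max(comb(sum(1 for y in s if rank[y] > rank[x]), 2)
               for x, s in nbrs.items())
    return plain, plus

# Hypothetical skewed graph: hub 0 adjacent to all nodes, plus path 1-2-...-(n-1).
n = 2000
edges = [(0, i) for i in range(1, n)] + [(i, i + 1) for i in range(1, n - 1)]
print(heaviest_reducer(edges))  # (1997001, 1)
```

With NodeIterator the hub's reducer emits almost two million pairs, while every NodeIterator++ reducer emits at most one.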
References
Siddharth Suri & Sergei Vassilvitskii, Counting triangles and the curse of the last reducer, Proc. 20th International Conference on World Wide Web, 2011