Fast counting of triangles in large networks without counting: Algorithms and laws

Charalampos E. Tsourakakis
School of Computer Science, Carnegie Mellon University

ICDM, Dec. '08

Triangle related problems

Given an undirected, simple graph G(V,E), a triangle is a set of three vertices such that any two of them are connected by an edge of the graph.

Related problems:
Decide if a graph is triangle-free.
Count the total number of triangles Δ(G).
Count the number of triangles Δ(v) that vertex v participates in.
List the triangles that each vertex v participates in.

[Slide annotations: “Generality”, “Our focus”]
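
The four problems above can be illustrated on a toy graph with a brute-force sketch. This is not a method from the talk; the function names and the example edge list are made up for illustration, and the approach checks every vertex triple, so it only suits tiny inputs.

```python
from itertools import combinations

def list_triangles(edges):
    """Return every triangle of an undirected simple graph as a sorted vertex triple."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A triple is a triangle when all three pairs are joined by an edge.
    return [t for t in combinations(sorted(adj), 3)
            if t[1] in adj[t[0]] and t[2] in adj[t[0]] and t[2] in adj[t[1]]]

def triangle_summary(edges):
    """Answer the four related problems for one (small) graph."""
    triangles = list_triangles(edges)
    local = {v: 0 for e in edges for v in e}
    for t in triangles:
        for v in t:
            local[v] += 1          # Delta(v): triangles that contain v
    return {
        "triangle_free": not triangles,   # decision problem
        "total": len(triangles),          # Delta(G)
        "local": local,                   # Delta(v) for every vertex
        "triangles": triangles,           # listing problem
    }

# Hypothetical example: two triangles sharing the edge (1, 2).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 3)]
summary = triangle_summary(edges)
print(summary)
```

Each later problem in the list subsumes the earlier ones: the listing gives the local counts, the local counts sum to three times the total, and the total decides triangle-freeness.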

Why is Triangle Counting important? From the Graph Mining Perspective

Clustering coefficient
Transitivity ratio
Social Network Analysis fact: “Friends of friends are friends” [WF94]

Other applications include:
Hidden Thematic Structure of the Web [EM02]
Motif Detection, e.g. biological networks [YPSB05]
Web Spam Detection [BPCG08]

[Figure: triangle on vertices A, B, C]
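
The slide names the clustering coefficient and the transitivity ratio without defining them. The sketch below assumes the standard definitions: the local clustering coefficient of v is Δ(v) divided by the number of neighbor pairs of v, and the transitivity ratio is 3Δ(G) divided by the number of connected triples. The function name and the example graph are illustrative only.

```python
def clustering_stats(adj):
    """adj maps each vertex to the set of its neighbours (undirected, simple graph)."""
    local_cc = {}
    closed = 0   # triangle corners: every triangle is counted once per vertex
    triples = 0  # connected triples: pairs of neighbours around a centre vertex
    for v, nbrs in adj.items():
        d = len(nbrs)
        pairs = d * (d - 1) // 2
        # Number of neighbour pairs that are themselves adjacent, i.e. Delta(v).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        local_cc[v] = links / pairs if pairs else 0.0
        closed += links
        triples += pairs
    transitivity = closed / triples if triples else 0.0
    return local_cc, transitivity

# Hypothetical example: two triangles sharing the edge (1, 2).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
local_cc, transitivity = clustering_stats(adj)
print(local_cc, transitivity)
```

Both quantities need only the triangle counts and the degrees, which is why fast triangle counting gives them almost for free.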

Outline

Related Work
Proposed Method: Theorems, Algorithms, Explaining efficiency
Experiments
Triangle-related Laws
Triangles in Kronecker Graphs
Conclusions

Related Work

Dense graphs
                    Fast             Low space
Time complexity     $O(n^{2.37})$    $O(n^3)$
Space complexity    $O(n^2)$         $O(m) = O(n^2)$

Sparse graphs
                    Fast                                  Low space
Time complexity     $O(m^{0.7} n^{1.2} + n^{2+o(1)})$     e.g. $O(n d_{max}^2)$
Space complexity    $O(n^2)$ (eventually)                 $O(m)$

Theorem [EigenTriangle]

Theorem 1

Δ(G) = # triangles in graph G(V,E):

$$\Delta(G) = \frac{1}{6} \sum_{i=1}^{|V|} \lambda_i^3$$

where $\lambda_1, \lambda_2, \ldots, \lambda_{|V|}$ are the eigenvalues of the adjacency matrix $A_G$.
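
Theorem 1 follows from a standard fact: the trace of $A_G^3$ counts closed walks of length 3, each triangle contributes 6 such walks, and the trace equals the sum of the cubed eigenvalues. A small numerical check, as a sketch only (the function names and the choice of the complete graph K4 are illustrative; a dense eigensolver is used for simplicity):

```python
import numpy as np

def triangles_from_spectrum(A):
    """Total triangle count via Theorem 1: sum of cubed eigenvalues divided by 6."""
    eigenvalues = np.linalg.eigvalsh(A)  # A is symmetric, so the eigenvalues are real
    return float(np.sum(eigenvalues ** 3) / 6.0)

def triangles_from_trace(A):
    """Reference count: trace(A^3) counts each triangle 6 times."""
    return float(np.trace(A @ A @ A) / 6.0)

# Complete graph on 4 vertices: every vertex triple is a triangle, so 4 in total.
K4 = np.ones((4, 4)) - np.eye(4)
spectral_count = triangles_from_spectrum(K4)
trace_count = triangles_from_trace(K4)
print(spectral_count, trace_count)
```

For K4 the eigenvalues are 3, -1, -1, -1, so the sum of cubes is 27 - 3 = 24 and 24 / 6 = 4.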

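Theorem [EigenTriangleLocal]

Theorem 2

Δ(i) = # triangles vertex i participates in:

$$\Delta(i) = \frac{1}{2} \sum_{j=1}^{|V|} \lambda_j^3 u_{j,i}^2$$

where $\vec{u}_j$ is the j-th eigenvector (taken orthonormal) and $u_{j,i}$ is the i-th entry of $\vec{u}_j$.

[Figure: example vertex i with Δ(i) = 2]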
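
Theorem 2 can be checked the same way: the i-th diagonal entry of $A_G^3$ counts closed walks of length 3 through vertex i, which is twice Δ(i). A minimal sketch, assuming a dense symmetric eigendecomposition; the function name and the example graph are illustrative.

```python
import numpy as np

def local_triangles_from_spectrum(A):
    """Per-vertex triangle counts via Theorem 2."""
    # Columns of U are orthonormal eigenvectors; U[i, j] is the i-th entry of the j-th one.
    eigenvalues, U = np.linalg.eigh(A)
    # Row i of (U ** 2) weighted by the cubed eigenvalues gives (A^3)[i, i].
    return (U ** 2) @ (eigenvalues ** 3) / 2.0

# Hypothetical example: two triangles, (0, 1, 2) and (1, 2, 3), sharing the edge (1, 2).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
local = local_triangles_from_spectrum(A)
print(np.round(local, 6))
```

Summing the local counts and dividing by 3 recovers Δ(G), since each triangle is seen from its three vertices.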

EigenTriangle Algorithm (interactively)

“I want to compute the number of triangles!”

“Use Lanczos to compute the first two eigenvalues please!”

“Is the cube of the second one significantly smaller than the cube of the first?”

“NO.” “Iterate then!”

After some iterations… (hopefully few!)

“Compute the k-th eigenvalue. Is $|\lambda_k|^3$ much smaller than $\sum_{i=1}^{k-1} \lambda_i^3$?”

“YES! Algorithm terminates! The estimated # of Δs is the sum of cubes of the $\lambda_i$'s divided by 6!”

EigenTriangle Algorithm
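
The slide's own listing did not survive in this transcript. The loop from the interactive walkthrough can be sketched as follows. This is not the author's implementation: `eigen_triangle` and its `tol` parameter are hypothetical names, and NumPy's dense `numpy.linalg.eigvalsh` stands in for Lanczos, so the sketch shows the stopping rule but not the speed. Including the k-th cube in the final sum is also an assumption.

```python
import numpy as np

def eigen_triangle(A, tol=0.05):
    """Estimate the triangle count from the largest-magnitude eigenvalues of A."""
    # Stand-in for Lanczos: compute the full spectrum, then order by decreasing magnitude.
    eigenvalues = np.linalg.eigvalsh(A)
    eigenvalues = eigenvalues[np.argsort(-np.abs(eigenvalues))]
    cube_sum = eigenvalues[0] ** 3
    used = 1
    for lam in eigenvalues[1:]:
        previous = cube_sum
        cube_sum += lam ** 3
        used += 1
        # Stop once the newest cube is negligible next to the cubes gathered before it.
        if abs(lam ** 3) <= tol * abs(previous):
            break
    return cube_sum / 6.0, used

K4 = np.ones((4, 4)) - np.eye(4)              # 4 triangles, spectrum 3, -1, -1, -1
estimate, used = eigen_triangle(K4, tol=0.05)
exact, used_all = eigen_triangle(K4, tol=0.0)  # tol = 0 forces the whole spectrum
print(estimate, used, exact, used_all)
```

On this tiny graph the rule stops after two eigenvalues and returns 26/6, slightly above the exact count of 4. For large sparse graphs a Lanczos solver such as `scipy.sparse.linalg.eigsh` with `which='LM'` would replace the dense call.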

EigenTriangleLocal Algorithm
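
The local variant can be sketched by truncating the sum in Theorem 2 to the largest-magnitude eigenpairs. The slide's own listing is not in the transcript, so this is an assumption-laden sketch: `eigen_triangle_local` and the fixed `k` are hypothetical, and a dense eigensolver again stands in for Lanczos.

```python
import numpy as np

def eigen_triangle_local(A, k):
    """Approximate per-vertex triangle counts from the k largest-magnitude eigenpairs."""
    eigenvalues, U = np.linalg.eigh(A)           # columns of U are orthonormal eigenvectors
    top = np.argsort(-np.abs(eigenvalues))[:k]   # indices of the k largest |lambda|
    # Truncated Theorem 2: 0.5 * sum over kept j of lambda_j^3 * (i-th entry of u_j)^2.
    return (U[:, top] ** 2) @ (eigenvalues[top] ** 3) / 2.0

K4 = np.ones((4, 4)) - np.eye(4)                 # every vertex lies in 3 triangles
approx_local = eigen_triangle_local(K4, k=1)     # top eigenpair only
exact_local = eigen_triangle_local(K4, k=4)      # all eigenpairs: exact
print(approx_local, exact_local)
```

With only the top eigenpair of K4 (eigenvalue 3, entries of magnitude 1/2) each vertex gets 27 · 0.25 / 2 = 3.375, against the exact value 3.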

Why are these two algorithms efficient on power law networks?

Typical Spectra of Power Law Networks

[Figure panels: Airports, Political blogs]

1st Reason: Top Eigenvalues of Power-Law Graphs

Very important for us because:
Few eigenvalues contribute a lot!
Cubes amplify this even more!
Lanczos converges fast due to large spectral gaps [GL89]!

Faloutsos, Faloutsos and Faloutsos [FFF99] were among the first to observe that the top eigenvalues follow a power law.

Some years later, Mihail & Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] gave an explanation of this fact.

2nd Reason: Bulk of eigenvalues

Almost symmetric around 0!

Sum of cubes almost cancels out!

[Figure: spectrum of the Political Blogs network, annotated “Omit!” and “Keep only 3!”]
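
The cancellation argument can be made explicit from Theorem 1. As a short worked equation, assuming the eigenvalues are indexed by decreasing magnitude and writing k (a symbol introduced here) for the number of eigenvalues kept:

```latex
\Delta(G)
  = \underbrace{\frac{1}{6}\sum_{i=1}^{k}\lambda_i^3}_{\text{estimate from the top } k}
  + \underbrace{\frac{1}{6}\sum_{i=k+1}^{|V|}\lambda_i^3}_{\text{omitted bulk}}
```

When the omitted eigenvalues are almost symmetric around 0, their positive and negative cubes nearly cancel, so the second term stays small compared with the first.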

Datasets

Nodes     Edges     Description
~75K      ~405K     Epinions network
~404K     ~2.1M     Flickr
~27K      ~341K     Arxiv Hep-Th
~1K       ~17K      Political blogs
~13K      ~148K     Reuters news
~3M       35M       Wikipedia 2006-Sep-05
~3.15M    ~37M      Wikipedia 2006-Nov-04
~13.5K    ~37.5K    AS Oregon
~23.5K    ~47.5K    CAIDA AS 2004 to 2008 (means over 151 timestamps)

Network types covered: Social Networks, Co-authorship network, Information Networks, Web Graphs, Internet Graphs.

Largest graph: ~3.15M nodes, ~37M edges.

Competitor: Node Iterator

Node Iterator algorithm: for each node, look at its neighbors, then check how many edges exist among them.

Complexity: $O(n d_{max}^2)$

We report the results as the speedup vs. Node Iterator.
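
The baseline described above can be sketched as follows. The function name and the adjacency-set representation are choices made here, not taken from the talk.

```python
def node_iterator(adj):
    """adj maps each vertex to the set of its neighbours (undirected, simple graph)."""
    local = {}
    for v, nbrs in adj.items():
        ordered = sorted(nbrs)
        count = 0
        # Check every pair of neighbours of v: at most d_max^2 / 2 pairs per node.
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                if b in adj[a]:
                    count += 1   # edge (a, b) closes the triangle (v, a, b)
        local[v] = count
    total = sum(local.values()) // 3   # each triangle is found at its 3 vertices
    return total, local

# Hypothetical example: two triangles sharing the edge (1, 2).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
total, local = node_iterator(adj)
print(total, local)
```

The pair loop runs once per node over at most $d_{max}^2$ neighbor pairs, which gives the $O(n d_{max}^2)$ bound quoted on the slide.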

Results: #Eigenvalues vs. Speedup

Results: #Edges vs. Speedup

Observe the trend

Some interesting observations

Typical rank for at least 95% accuracy: 6.2.

Speedups are between 33.7x and 1159x.

The mean speedup is 250x.

Notice the increasing speedup as the size of the network grows.

Evaluating the Local Counting Method

[Figure: x-axis: triangles node i participates in; y-axis: triangles node i participates in according to our estimation]

#Eigenvalues vs. ρ for three networks

2-3 eigenvalues: almost ideal results!

Triangle Participation Power Law (TPPL)

[Figure: EPINIONS. x-axis: δ = #Triangles; y-axis: count of nodes participating in δ triangles]

[Figure panels: HEP_TH (coauthorship), Flickr]

Degree Triangle Power Law (DTPL)

[Figure: EPINIONS. x-axis: d, all degrees appearing in the graph; y-axis: mean #Δs over all nodes with degree d]

[Figure panels: Flickr, Reuters]

Observations on TPPL & DTPL

TPPL: Many nodes participate in few triangles; few nodes participate in many triangles.

DTPL: A power law fits the Degree-Triangle plot nicely. Its slope is the opposite of the slope of the degree distribution (slope complementarity).

Kronecker graphs

Kronecker graphs are a model for generating graphs that mimic properties of real-world networks. The basic operation is the Kronecker product [LCKF05].

Initiator graph, adjacency matrix A[0]:

0 1 1
1 0 1
1 1 0

Kronecker product: A[0] → A[1] → A[2]; repeat k times to obtain the adjacency matrix A[k].

Triangles in Kronecker Graphs

Theorem [KroneckerTRC]

Let B = A[k] be the k-th Kronecker product and let Δ(G_A), Δ(G_B) be the total number of triangles in G_A, G_B. Then the following equality holds:

$$\Delta(G_B) = 6^k \, \Delta(G_A)^{k+1}, \quad k \ge 0$$
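
The closed formula can be checked numerically on small initiators. The sketch below uses `numpy.kron` and counts triangles through trace(A^3) / 6; the helper names are illustrative, and the indexing follows the slides (A[0] is the initiator, so A[k] involves k Kronecker multiplications).

```python
import numpy as np

def count_triangles(A):
    """Exact triangle count: trace(A^3) counts each triangle 6 times."""
    return int(round(np.trace(np.linalg.matrix_power(A, 3)) / 6))

def kronecker_power(A, k):
    """Return A[k]: start from A[0] = A and apply the Kronecker product k times."""
    B = A
    for _ in range(k):
        B = np.kron(B, A)
    return B

# Initiator from the slide: a single triangle, so Delta(G_A) = 1.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
counts = [count_triangles(kronecker_power(A, k)) for k in range(3)]
predicted = [6 ** k * count_triangles(A) ** (k + 1) for k in range(3)]
print(counts, predicted)
```

The identity rests on trace((A ⊗ A)^3) = trace(A^3)^2: with trace(A^3) = 6Δ(G_A), the k-th product has trace (6Δ(G_A))^(k+1), and dividing by 6 gives the formula.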

Conclusions

Triangles can be approximated with high accuracy in power law networks using a small, constant number of eigenvalues.

The method is easily parallelizable (matrix-vector multiplications only) and converges fast due to large spectral gaps.

New triangle-related power laws.

Closed formula for triangles in Kronecker graphs.
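
The "matrix-vector multiplications only" remark can be illustrated with plain power iteration. This is a simpler stand-in for Lanczos, not the method used in the talk: it finds only the top eigenvalue, but it touches the graph through the same single operation, a product of A with a vector, which is what makes the approach easy to distribute. The function name and iteration count are illustrative.

```python
import numpy as np

def top_eigenvalue(A, iterations=200):
    """Power iteration: approximate the largest eigenvalue using only products A @ x."""
    x = np.ones(A.shape[0])
    for _ in range(iterations):
        y = A @ x                    # the only step that reads the graph
        x = y / np.linalg.norm(y)    # renormalise to avoid overflow
    return float(x @ (A @ x))        # Rayleigh quotient of the unit vector x

# Hypothetical example: two triangles sharing the edge (1, 2).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
lam1 = top_eigenvalue(A)
print(lam1)
```

For this graph the top eigenvalue is (1 + √17) / 2 ≈ 2.56. The wider the gap to the next eigenvalue in magnitude, the fewer iterations are needed, which matches the spectral-gap argument on the slide.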

Future Work

Import in HADOOP

PEGASUS (Peta-Graph Mining)

On-going work with U Kang and Christos Faloutsos in collaboration with Yahoo! Research.

Acknowledgements

Christos Faloutsos and Ioannis Koutis, for the helpful discussions.

Maria Tsiarli, for the PEGASUS logo.

References

[WF94] Wasserman, Faust: “Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)”

[EM02] Eckmann, Moses: “Curvature of co-links uncovers hidden thematic layers in the World Wide Web”

[YPSB05] Ye, Peyser, Spencer, Bader: “Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast”

[BPCG08] Becchetti, Boldi, Castillo, Gionis: “Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs”

[LCKF05] Leskovec, Chakrabarti, Kleinberg, Faloutsos: “Realistic, Mathematically Tractable Graph Generation and Evolution using Kronecker Multiplication”

[FFF99] Faloutsos, Faloutsos, Faloutsos: “On power-law relationships of the Internet topology”

[MP02] Mihail, Papadimitriou: “On the Eigenvalue Power Law”

[CLV03] Chung, Lu, Vu: “Spectra of Random Graphs with given expected degrees”

[GL89] Golub, Van Loan: “Matrix Computations”

For more references, paper and slides: http://www.cs.cmu.edu/~ctsourak

Questions?
