computing local and global centrality

COMPUTING LOCAL AND GLOBAL CENTRALITY. DAVID F. GLEICH (AND MANY OTHERS). DATA MINING, NETWORKS AND DYNAMICS, 2011 NOVEMBER 7

Upload: david-gleich

Post on 15-Jan-2015


DESCRIPTION

Some recent results on computing PageRank and Katz scores on large networks that I presented at a Dagstuhl workshop.

TRANSCRIPT

Page 1: Computing Local and Global Centrality

COMPUTING LOCAL AND GLOBAL CENTRALITY

DAVID F. GLEICH (AND MANY OTHERS)

DATA MINING, NETWORKS AND DYNAMICS, 2011 NOVEMBER 7

Page 2: Computing Local and Global Centrality

LOCAL: Pooya Esfandiar, Byung-Won On, Chen Greif, Laks V.S. Lakshmanan, Francesco Bonchi

GLOBAL: Vahab Mirrokni, Reid Andersen

Page 3: Computing Local and Global Centrality

Graph centrality

Global: How important is a node?

Local: How important is a node with respect to another one?

Page 4: Computing Local and Global Centrality

Graph centrality (Koschützki et al.)

Must respect isomorphism. Higher is better.

Examples: node degree, 1/shortest-path

Page 5: Computing Local and Global Centrality

Graph centrality

This talk: path summation

$\sum_{\ell} f(\text{paths of length } \ell)$

Local Katz score:

$\sum_{\ell} \alpha^{\ell} \cdot (\text{number of paths of length } \ell \text{ between } i \text{ and } j)$
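A minimal sketch of the path summation, assuming a small undirected example graph and a weight alpha chosen below 1/||A||_2 (both are illustrative choices, not from the talk). Powers of the adjacency matrix count walks (paths that may revisit nodes), and the sketch starts the sum at length 0 so that it can be compared with the inverse $(I - \alpha A)^{-1}$; for i different from j the length-0 term contributes nothing.

```python
import numpy as np

def katz_by_walk_sum(A, alpha, max_len):
    """Accumulate sum_{l=0..max_len} alpha^l * A^l; (A^l)[i, j] counts length-l walks."""
    n = A.shape[0]
    total = np.zeros((n, n))
    term = np.eye(n)                 # the l = 0 term
    for _ in range(max_len + 1):
        total += term
        term = alpha * (term @ A)    # next term: alpha^(l+1) * A^(l+1)
    return total

# Hypothetical 4-node undirected graph (a 4-cycle plus one chord).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
alpha = 0.5 / np.linalg.norm(A, 2)   # assumed: safely below 1 / ||A||_2
K_series = katz_by_walk_sum(A, alpha, max_len=60)
K_inverse = np.linalg.inv(np.eye(4) - alpha * A)
max_gap = np.max(np.abs(K_series - K_inverse))
print(max_gap)
```

The truncated sum and the matrix inverse agree to rounding error here because the series is geometric with ratio 0.5.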

Page 6: Computing Local and Global Centrality

A: adjacency matrix. L: Laplacian matrix. P: random walk transition matrix.

Katz score: $K_{i,j} = [(I - \alpha A^T)^{-1}]_{i,j}$

Commute time: $C_{i,j} = \mathrm{vol}(G)\,(L^+_{i,i} + L^+_{j,j} - 2L^+_{i,j})$

PageRank: $X_{i,j} = (1 - \alpha)[(I - \alpha P^T)^{-1}]_{i,j}$

$(I - \alpha P^T)x = (1 - \alpha)e/n$
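The three definitions on this slide can be evaluated directly with dense linear algebra on a tiny graph. This is only a sketch: the 4-node graph, the two alpha values, and the convention that P is row-stochastic (so P^T is column-stochastic) are assumptions made for illustration, and dense inverses are exactly what the rest of the talk tries to avoid at scale.

```python
import numpy as np

# Hypothetical 4-node undirected graph.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
n = A.shape[0]
d = A.sum(axis=1)                  # degrees
L = np.diag(d) - A                 # Laplacian
P = A / d[:, None]                 # assumed row-stochastic random walk matrix

# Katz: K = (I - alpha A^T)^{-1}, with alpha below 1 / ||A||_2 (assumed value).
alpha_katz = 0.5 / np.linalg.norm(A, 2)
K = np.linalg.inv(np.eye(n) - alpha_katz * A.T)

# Commute time: C_ij = vol(G) (L+_ii + L+_jj - 2 L+_ij).
L_pinv = np.linalg.pinv(L)
diag = np.diag(L_pinv)
C = d.sum() * (diag[:, None] + diag[None, :] - 2 * L_pinv)

# PageRank: X = (1 - alpha)(I - alpha P^T)^{-1}; global vector uses e/n.
alpha_pr = 0.85                    # assumed, a common choice
M = np.eye(n) - alpha_pr * P.T
X = (1 - alpha_pr) * np.linalg.inv(M)
x_global = np.linalg.solve(M, (1 - alpha_pr) * np.ones(n) / n)

print(K[0, 2], C[0, 2], X[0, 2], x_global)
```

Under these conventions each column of X is a personalized PageRank vector that sums to 1, and the global vector is the average of those columns.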

Page 7: Computing Local and Global Centrality

USES FOR CENTRALITY

Ranking features for web-search/classification

Najork, M. A.; Zaragoza, H. & Taylor, M. J. HITS on the web: How does it compare?

Becchetti, L.; Castillo, C.; Donato, D.; Baeza-Yates, R. & Leonardi, S. Link analysis for Web spam detection

Interesting nodes

GeneRank, ProteinRank, TwitterRank, IsoRank, FutureRank, HostRank, DiffusionRank, ItemRank, SocialPageRank, SimRank

Page 8: Computing Local and Global Centrality

USES FOR CENTRALITY

Ranking networks of comparisons

Chartier, T. P.; Kreutzer, E.; Langville, A. N. & Pedings, K. E. Sensitivity and Stability of Ranking Vectors

Clustering or community detection

Andersen, R.; Chung, F. & Lang, K. Local Graph Partitioning using PageRank Vectors

Link prediction

Savas et al. Hold on about 90 minutes

Page 9: Computing Local and Global Centrality

THESE GET USED A LOT. THEY MUST BE FAST.

Page 10: Computing Local and Global Centrality

MATRICES, MOMENTS, QUADRATURE

Estimate a quadratic form: $l \le x^T f(Z)\, x \le u$

Commute: $(e_i - e_j)^T L^+ (e_i - e_j)$

Katz: $\tfrac{1}{4}(e_i + e_j)^T (I - \alpha P^T)^{-1}(e_i + e_j) - \tfrac{1}{4}(e_i - e_j)^T (I - \alpha P^T)^{-1}(e_i - e_j)$

Also used by Benzi and Boito (LAA) for Katz scores and the matrix exponential
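The pairwise formula on this slide is a polarization identity: an off-diagonal entry of an inverse is written as the difference of two quadratic forms, each of which a quadrature method can bound. A small sketch follows, assuming a symmetric matrix $M = I - \alpha A$ built from a hypothetical undirected graph (the identity needs symmetry); the function name is illustrative.

```python
import numpy as np

def entry_from_quadratic_forms(M, i, j):
    """Return [M^{-1}]_{ij} for symmetric M as a difference of two quadratic forms."""
    n = M.shape[0]
    e_i = np.zeros(n); e_i[i] = 1.0
    e_j = np.zeros(n); e_j[j] = 1.0

    def quad(v):
        return v @ np.linalg.solve(M, v)     # v^T M^{-1} v

    return 0.25 * (quad(e_i + e_j) - quad(e_i - e_j))

# Hypothetical undirected graph and an assumed alpha below 1 / ||A||_2.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
alpha = 0.5 / np.linalg.norm(A, 2)
M = np.eye(4) - alpha * A
score = entry_from_quadratic_forms(M, 0, 2)
direct = np.linalg.inv(M)[0, 2]
print(score, direct)
```

Here the quadratic forms are evaluated with a dense solve purely to check the identity; in the quadrature setting each one would be bracketed by lower and upper bounds instead.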

Page 11: Computing Local and Global Centrality

MMQ - THE BIG IDEA

Quadratic form

Weighted sum (think: A is s.p.d., use EVD)

Stieltjes integral (“a tautology”)

Quadrature approximation

Matrix equation (Lanczos)

David F. Gleich (Purdue) Univ. Chicago SSCS Seminar

Page 12: Computing Local and Global Centrality

MMQ PROCEDURE

Goal: […]

Given: […]

1. Run k steps of Lanczos on […] starting with […].

2. Compute […], […] with an additional eigenvalue at […], set […].

3. Compute […], […] with an additional eigenvalue at […], set […].

4. Output […] as lower and upper bounds on […].

Correspond to a Gauss-Radau rule, with u as a prescribed node.

Correspond to a Gauss-Radau rule, with l as a prescribed node.
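As a rough illustration of the Lanczos step, the sketch below computes plain Gauss-rule estimates $\|x\|^2 e_1^T T_k^{-1} e_1$ of $x^T Z^{-1} x$ for a symmetric positive definite Z. This is a simplification, not the procedure on the slide: it omits the Gauss-Radau modifications with prescribed nodes, so it yields only lower bounds. The test matrix, the seed, full reorthogonalization, and all names are assumptions for the example.

```python
import numpy as np

def lanczos_gauss_estimates(Z, x, k):
    """Gauss-rule estimates of x^T Z^{-1} x after 1..k Lanczos steps (Z s.p.d.)."""
    n = Z.shape[0]
    norm_x = np.linalg.norm(x)
    Q = np.zeros((n, k))
    diag_T, offdiag_T, estimates = [], [], []
    q, q_prev, beta = x / norm_x, np.zeros(n), 0.0
    for step in range(k):
        Q[:, step] = q
        w = Z @ q - beta * q_prev
        a = q @ w
        w = w - a * q
        # Full reorthogonalization keeps the small example numerically clean.
        w -= Q[:, :step + 1] @ (Q[:, :step + 1].T @ w)
        diag_T.append(a)
        T = np.diag(diag_T)
        if offdiag_T:
            T = T + np.diag(offdiag_T, 1) + np.diag(offdiag_T, -1)
        e1 = np.zeros(step + 1); e1[0] = 1.0
        estimates.append(norm_x**2 * np.linalg.solve(T, e1)[0])
        beta = np.linalg.norm(w)
        if beta < 1e-14:             # Krylov space exhausted: estimate is exact
            break
        offdiag_T.append(beta)
        q_prev, q = q, w / beta
    return estimates

# Hypothetical random undirected graph; Z = I - alpha A is s.p.d. for this alpha.
rng = np.random.default_rng(0)
n = 30
upper = np.triu(rng.random((n, n)) < 0.2, 1).astype(float)
A = upper + upper.T
alpha = 0.5 / max(np.linalg.norm(A, 2), 1.0)
Z = np.eye(n) - alpha * A
x = np.zeros(n); x[0] = 1.0; x[1] = 1.0      # e_0 + e_1
estimates = lanczos_gauss_estimates(Z, x, 20)
exact = x @ np.linalg.solve(Z, x)
print(estimates[-1], exact)
```

Each Lanczos step costs one matrix-vector product, and the estimates increase toward the exact value of this single quadratic form.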

Page 13: Computing Local and Global Centrality

How well does it work?

[Figure: arxiv, Katz, hard alpha, $\alpha = 1/(\|A\|_2 + 1)$. Two panels plotted against matrix-vector products (5 to 30): bounds (linear scale, -50 to 50) and error (log scale, $10^{-5}$ to $10^{0}$).]

Page 14: Computing Local and Global Centrality

MY COMPLAINTS

Matvecs are expensive.

Takes many iterations.

Just one score comes out!

Page 15: Computing Local and Global Centrality

KATZ SCORES ARE LOCALIZED

$(I - \alpha A^T) k = e_i$

Up to 50 neighbors is 99.65% of the total mass

Katz scores are highly localized.
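One way to see localization numerically is to solve $(I - \alpha A^T) k = e_i$ and measure how much of the total mass sits in the largest entries. The sketch below assumes a 200-node path graph and an alpha well below 1/||A||_2, chosen only because the decay is easy to reason about; it does not reproduce the 99.65% figure, which comes from the author's own data.

```python
import numpy as np

def mass_in_largest_entries(v, m):
    """Fraction of the total absolute mass of v held by its m largest entries."""
    mags = np.sort(np.abs(v))[::-1]
    return mags[:m].sum() / mags.sum()

# Hypothetical test graph: an undirected path on 200 nodes.
n = 200
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = 1.0
A[idx + 1, idx] = 1.0

alpha = 0.5 / np.linalg.norm(A, 2)           # assumed value
e_i = np.zeros(n); e_i[n // 2] = 1.0         # seed in the middle of the path
k_col = np.linalg.solve(np.eye(n) - alpha * A.T, e_i)
frac = mass_in_largest_entries(k_col, 50)
print(frac)
```

On this graph the Katz column decays geometrically with distance from the seed, so the 50 largest entries carry almost all of the mass.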

Page 16: Computing Local and Global Centrality

HOW CAN WE EXPLOIT THIS?


Page 17: Computing Local and Global Centrality

TOP-K ALGORITHM FOR KATZ

Approximate […] where […] is sparse

Keep […] sparse too

Ideally, don’t “touch” all of […]

Page 18: Computing Local and Global Centrality

TOP-K ALGORITHM FOR KATZ

This is possible for personalized PageRank!

Page 19: Computing Local and Global Centrality

Richardson for $Ax = b$

$x^{(k+1)} = x^{(k)} + r^{(k)}$

$r^{(k+1)} = b - A x^{(k+1)}$

Equivalent to gradient descent on $\min_x \; x^T A x - 2 x^T b$ when $A = A^T$, $A \succeq 0$.

What about coordinate descent?

Gauss-Southwell for $Ax = b$

$x^{(k+1)} = x^{(k)} + r^{(k)}_j e_j$

$r^{(k+1)} = r^{(k)} - r^{(k)}_j A e_j$

How to pick j?

Frequently “rediscovered” for PageRank. McSherry (WWW2005), Berkhin (JIM 2007), Andersen-Chung-Lang (FOCS 2006)
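A minimal sketch of the Gauss-Southwell update applied to one Katz column, $(I - \alpha A^T) x = e_i$, picking j as the largest residual. The graph, the seed, the tolerance, and the conservative choice of alpha below 1 over the maximum degree are assumptions for the example. The dense `argmax` is for brevity; an implementation meant to stay local would keep the residual sparse and track large entries with a queue or heap.

```python
import numpy as np

def gauss_southwell_katz(A, alpha, i, tol=1e-12, max_steps=200000):
    """Approximate column i of (I - alpha A^T)^{-1}; assumes A has a zero diagonal."""
    n = A.shape[0]
    x = np.zeros(n)
    r = np.zeros(n); r[i] = 1.0              # residual of x = 0 for b = e_i
    for _ in range(max_steps):
        j = int(np.argmax(np.abs(r)))        # pick the largest residual
        r_j = r[j]
        if abs(r_j) < tol:
            break
        x[j] += r_j                          # x <- x + r_j e_j
        r[j] -= r_j                          # r <- r - r_j (I - alpha A^T) e_j ...
        r += alpha * r_j * A.T[:, j]         # ... which only touches neighbors of j
    return x, r

# Hypothetical random undirected graph without self-loops.
rng = np.random.default_rng(1)
n = 30
upper = np.triu(rng.random((n, n)) < 0.2, 1).astype(float)
A = upper + upper.T
alpha = 0.5 / max(A.sum(axis=1).max(), 1.0)  # assumed: below 1 / (max degree)
x_gs, r_gs = gauss_southwell_katz(A, alpha, i=0)
e_0 = np.zeros(n); e_0[0] = 1.0
x_direct = np.linalg.solve(np.eye(n) - alpha * A.T, e_0)
print(np.max(np.abs(x_gs - x_direct)))
```

Each step reads only one column of A, which is what lets the method avoid a full matrix-vector product when the solution is localized.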

Page 20: Computing Local and Global Centrality

DEMO!


Page 21: Computing Local and Global Centrality

NEW CONVERGENCE THEORY

Katz and PageRank are equivalent if $\alpha < 1 / \|A\|_1$.

Gauss-Southwell converges when $\alpha < 1 / \|A\|_2$ (Luo and Tseng 1992) if j is picked as the largest residual.

Read all about it: Fast matrix computations for pair-wise and column-wise commute times and Katz scores. Bonchi, Esfandiar, Gleich, Greif, Lakshmanan, J. Internet Mathematics (to appear)

Page 22: Computing Local and Global Centrality

[Figure: hollywood, Katz, hard alpha. Precision@k for exact top-k sets (0 to 1) versus equivalent matrix-vector products (log scale, $10^{-2}$ to $10^{2}$); legend: k=10, k=100, k=1000, cg k=25, k=25.]

1,000,000 nodes, 100,000,000 edges

Page 23: Computing Local and Global Centrality

OPEN QUESTIONS

I can’t find any existing derivation of this method in the non-symmetric case (prior to the PageRank literature). Any thoughts?

How to show that the method converges for a non-symmetric matrix when $(I - \alpha P^T)$ is not diagonally dominant?

Page 24: Computing Local and Global Centrality

OVERLAPPING CLUSTERS FOR DISTRIBUTED CENTRALITY


Page 25: Computing Local and Global Centrality

LARGE GRAPHS, IN PRACTICE

Edge lists, maybe tied together by a common host, stored redundantly on many hard drives.

[Figure: blocks of edge lists (src -> dst), each stored as Copy 1 and Copy 2.]

Page 26: Computing Local and Global Centrality

UTILIZE SOME REDUNDANCY?

To compute global PageRank?


Page 27: Computing Local and Global Centrality

Overlapping clusters for distributed computation. Andersen, Gleich, Mirrokni, WSDM2012 (to appear).

Use the redundancy to reduce communication when solving a PageRank problem

Overlapping Clusters

Page 28: Computing Local and Global Centrality

Communication avoiding algorithms

Communication is the limiting factor in most computations these days. Flops are, relatively speaking, free.

Page 29: Computing Local and Global Centrality

KEY POINTS

Utilize personalized PageRank vectors to find the clusters with “good” conductance scores.

Define “core” vertices for each cluster.

Find a good way to cover the graph with these clusters.

Use restricted additive Schwarz to solve (thanks Prof. Szyld and Frommer!)
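To make the restricted additive Schwarz idea concrete, here is a small serial sketch under several assumptions: a 12-node ring graph, three hand-picked overlapping clusters whose cores partition the vertices, exact dense local solves, and a personalized PageRank right-hand side. Each cluster solves its local system against the current residual but only writes the update for its own core vertices. None of the names or the cluster choices come from the talk.

```python
import numpy as np

def restricted_additive_schwarz(M, b, clusters, cores, iters=300):
    """Stationary RAS iteration for M x = b with overlapping clusters and disjoint cores."""
    x = np.zeros(len(b))
    for _ in range(iters):
        r = b - M @ x
        update = np.zeros_like(x)
        for S, core in zip(clusters, cores):
            local = np.linalg.solve(M[np.ix_(S, S)], r[S])   # solve on the whole cluster
            keep = np.isin(S, core)                          # but keep core entries only
            update[S[keep]] = local[keep]
        x += update
    return x

# Hypothetical example: ring graph, PageRank system (I - alpha P^T) x = (1 - alpha) e_0.
n, alpha = 12, 0.85
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[(v + 1) % n, v] = 1.0
P = A / A.sum(axis=1)[:, None]
M = np.eye(n) - alpha * P.T
b = np.zeros(n); b[0] = 1 - alpha

cores = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
# Each cluster is its core plus one extra vertex of overlap on each side.
clusters = [np.array([(c[0] - 1) % n, *c, (c[-1] + 1) % n]) for c in cores]

x_ras = restricted_additive_schwarz(M, b, clusters, cores)
x_direct = np.linalg.solve(M, b)
print(np.max(np.abs(x_ras - x_direct)))
```

In a distributed setting each cluster would live on a different machine, and only residual entries that cross cluster boundaries would need to be communicated.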

Page 30: Computing Local and Global Centrality

All nodes solve locally using the coordinate descent method.

Page 31: Computing Local and Global Centrality

All nodes solve locally using the coordinate descent method.

A core vertex for the gray cluster.

Page 32: Computing Local and Global Centrality

All nodes solve locally using the coordinate descent method.

Red sends residuals to white. White sends residuals to red.

Page 33: Computing Local and Global Centrality

White then uses the coordinate descent method to adjust its solution. This will cause communication to red/blue.

Page 34: Computing Local and Global Centrality

[Figure: relative work (0 to 2) versus volume ratio (1 to 1.7), with the Metis partitioner as reference; curves for swapping probability and PageRank communication on usroads and web-Google.]

Volume ratio: how much more of the graph we need to store.

It works!

Page 35: Computing Local and Global Centrality

PERSONALIZED PAGERANK CLUSTERS

Solve $(I - \alpha P^T)x = (1 - \alpha)e_i$ to a large degree-weighted tolerance $\varepsilon$.

Sweep over the vertices in order of their degree-normalized rank.

Find the best conductance set.

A Cheeger-like inequality. (Not a heuristic.)
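The sweep step can be sketched as follows, assuming an undirected graph stored as a dense adjacency matrix and, for simplicity, an exact solve of the personalized PageRank system instead of a solve to a loose tolerance. The example graph (two 5-cliques joined by a single edge) and the function names are illustrative.

```python
import numpy as np

def conductance(A, mask):
    """cut(S, complement) / min(vol(S), vol(complement)) for the set given by mask."""
    d = A.sum(axis=1)
    cut = A[mask][:, ~mask].sum()
    vol_S = d[mask].sum()
    return cut / min(vol_S, d.sum() - vol_S)

def sweep_cut(A, x):
    """Scan prefixes of vertices sorted by x/degree and return the best conductance set."""
    d = A.sum(axis=1)
    order = np.argsort(-x / d)
    mask = np.zeros(len(x), dtype=bool)
    best_phi, best_set = np.inf, None
    for v in order[:-1]:                     # skip the prefix that is the whole graph
        mask[v] = True
        phi = conductance(A, mask)
        if phi < best_phi:
            best_phi, best_set = phi, np.flatnonzero(mask)
    return best_set, best_phi

# Hypothetical graph: two 5-cliques joined by the edge (4, 5).
n = 10
A = np.zeros((n, n))
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)
A[4, 5] = A[5, 4] = 1.0

alpha = 0.85                                 # assumed value
P = A / A.sum(axis=1)[:, None]
e_seed = np.zeros(n); e_seed[0] = 1.0
x = np.linalg.solve(np.eye(n) - alpha * P.T, (1 - alpha) * e_seed)
best_set, best_phi = sweep_cut(A, x)
print(best_set, best_phi)
```

Seeding inside one clique recovers that clique: the single bridging edge gives a cut of 1 against a volume of 21.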

Page 36: Computing Local and Global Centrality

CORE VERTICES

Compute the expected “leavetime” for each vertex in a cluster.

Keep increasing the threshold for a “good” vertex until every vertex is core in some cluster.

Then approximate a set-cover problem to cover the graph with clusters, and use a heuristic to pack vertices until

Page 37: Computing Local and Global Centrality

MY QUESTIONS and future directions

REVERSE ORDER

Page 38: Computing Local and Global Centrality

GRAPH SPECTRA

Some work by Banerjee and Jost.