ranking of nodes in graphs - university of pennsylvania

28
Ranking of nodes in graphs Alejandro Ribeiro Dept. of Electrical and Systems Engineering University of Pennsylvania [email protected] http://www.seas.upenn.edu/users/ ~ aribeiro/ October 1, 2014 Stoch. Systems Analysis Ranking of nodes in graphs 1

Upload: others

Post on 20-May-2022

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ranking of nodes in graphs - University of Pennsylvania

Ranking of nodes in graphs

Alejandro RibeiroDept. of Electrical and Systems Engineering

University of [email protected]

http://www.seas.upenn.edu/users/~aribeiro/

October 1, 2014

Stoch. Systems Analysis Ranking of nodes in graphs 1

Page 2: Ranking of nodes in graphs - University of Pennsylvania

Ranking of nodes in graphs: Random walk

Ranking of nodes in graphs: Random walk

Ranking of nodes in graphs: Markov chain

Stoch. Systems Analysis Ranking of nodes in graphs 2

Page 3: Ranking of nodes in graphs - University of Pennsylvania

Graphs

1

2 3

45

6

I Graph ⇒ A set of J nodes j = 1, . . . , J⇒ Connected by a set of edges E defined as ordered pairs (i , j)

I In figure ⇒ nodes are j = 1, 2, 3, 4, 5,⇒ edges E = {(1, 2), (1, 5),(2, 3), (2, 5), (3, 4), ...

(3, 6), (4, 5), (4, 6), (5, 4)}I Websites and links ⇒ “The” web.

I People and friendship ⇒ Social network

Stoch. Systems Analysis Ranking of nodes in graphs 3

Page 4: Ranking of nodes in graphs - University of Pennsylvania

How well connected nodes are?

1

2 3

45

6

I Q: Which node is the most connected? A: Define most connected

I Can define “most connected” in different ways

I Node rankings to measure website quality, social influence

I There are two important connectivity indicators

⇒ How many nodes point to a link (outgoing links irrelevant)

⇒ How Important are the links that point to a node

Stoch. Systems Analysis Ranking of nodes in graphs 4

Page 5: Ranking of nodes in graphs - University of Pennsylvania

Connectivity ranking

I Insight ⇒ There is information in the structure of the network

I Knowledge is distributed through the network

⇒ The network (not the nodes) knows the rankings

I Idea exploited by Google’s PageRank c© to rank webpages

I ... by social scientists to study trust & reputation in social networks

I ... by ISI to rank scientific papers, transactions & magazines ...

1

2 3

45

6

I No one points to 1

I Only 1 points to 2

I Only 2 points to 3, but 2more important than 1

I 4 as high as 5 with less links

I Links to 5 have lower rank

I Same for 6

Stoch. Systems Analysis Ranking of nodes in graphs 5

Page 6: Ranking of nodes in graphs - University of Pennsylvania

Preliminary definitions

I Graph G = (V ,E ) ⇒ sets of vertices V = {1, 2, . . . , J} and edges E

I Edges (elements of E ) are ordered pairs (i , j)

I We say there is a connection from i to j

I Outgoing neighborhood of i is the set of nodes j to which i points

n(i) := {j : (i , j) ∈ E}

I Incoming neighborhood, n−1(i) is the set of nodes that point to i :

n−1(i) := {j : (j , i) ∈ E}

I Connected graph

⇒ There is a path from any node to any other node

Stoch. Systems Analysis Ranking of nodes in graphs 6

Page 7: Ranking of nodes in graphs - University of Pennsylvania

Definition of rank

I Agent A chooses node i , e.g., web page, at random for initial visit

I Next visit randomly chosen between links in the neighborhood n(i)

⇒ All neighbors chosen with equal probability

I If reach a dead end because node i has no neighbors

⇒ Chose next visit at random equiprobably among all nodes

I Redefine graph G = (V ,E ) adding edges from dead ends to all nodes

I Restrict attention to connected (modified) graphs

1

2 3

45

6

I Rank of node i is the average number of visits of A to i

Stoch. Systems Analysis Ranking of nodes in graphs 7

Page 8: Ranking of nodes in graphs - University of Pennsylvania

Equiprobable random walk

I Formally, let An be the node visited at time nI Define transition probability Pij from node i into node j

Pij := P[An+1 = j

∣∣An = i]

I Next visit equiprobable among neighbors

Pij =1

#[n(i)]=

1

Ni, for all j ∈ n(i)

I Defined number of neighbors Ni = #[n(i)]

1

2 3

45

6

to 1

to 2

to 3

to 4

to 51/2

1/2

1/2

1/2 1/2

1/2

1/2

1/2

1

1/5

1/5

1/5

1/5

1/5

I Still have a graph

I But also a MC

I Red (not blue) circles

Stoch. Systems Analysis Ranking of nodes in graphs 8

Page 9: Ranking of nodes in graphs - University of Pennsylvania

Formal definition of rank

I Consider variable I {Am = i} to indicate visit to state i at time m.

I Rank ri of i-th node defined as time average of number of visits, i.e.,

ri := limn→∞

1

n

n∑m=1

I {Am = i}

I Define vector of ranks r := [r1, r2, . . . , rJ ]T

I Rank ri can be approximated by average rni at time n

rni :=1

n

n∑m=1

I {Am = i}

I Since limn→∞ rni = ri , it holds rni ≈ ri for n sufficiently large

I Define vector of approximate ranks rn := [rn1, rn2, . . . , rnJ ]T

I If modified graph is connected, rank independent of initial visit

Stoch. Systems Analysis Ranking of nodes in graphs 9

Page 10: Ranking of nodes in graphs - University of Pennsylvania

Algorithm

Output : Vector r(i) with ranking of node iInput : Vector N(i) containing number of neighbors of iInput : Matrix N(i , k) containing indices j of neighbors of i

m = 1; r=zeros(J,1); % Initialize time and ranksA0 = random(’unid’,J); % Draw first visit uniformly at randomwhile m < n do

jump = random(’unid’,NAm−1); % Neighbor uniformly at randomAm = N(Am−1, jump); % Jump to selected neighborr(Am) = r(Am) + 1; % Update ranking for Am

m = m + 1;endr = r/n; % Normalize by number of iterations n

Stoch. Systems Analysis Ranking of nodes in graphs 10

Page 11: Ranking of nodes in graphs - University of Pennsylvania

Example: Social graph

I Asked students taking ESE303 about homework collaboration

I Created (crude) graph of the social network of students in this class

I Used ranking algorithm to understand connectedness

I E.g., If I want to know how well students are coping with the class itis best to ask people with higher connectivity ranking

I 2009 data

Stoch. Systems Analysis Ranking of nodes in graphs 11

Page 12: Ranking of nodes in graphs - University of Pennsylvania

Ranked class graph

Aarti Kochhar

Ranga Ramachandran

Saksham Karwal

Aditya Kaji

Alexandra Malikova

Pia Ramchandani

Amanda Smith

Amanda Zwarenstein

Jane Kim

Katie Joo

Lisa Zheng

Michael Harker

Rebecca Gittler

Lindsey Eatough

Ankit Aggarwal

Priya Takiar

Anthony Dutcher

Ciara Kennedy

Carolina Lee

Daniela Savoia

Robert Feigenberg

Ceren Dumaz

Charles JeonChris Setian

Eric Lamb

Pallavi Yerramilli

Ella Kim

Jacci JeffriesHarish Venkatesan

Ivan LevcovitzJesse Beyroutey

Jihyoung Ahn

Madhur Agarwal

Owen Tian

Xiang-Li Lim

Paul Deren

Varun Balan

Thomas Cassel

Shahid Bosan

Sugyan Lohiaa

Stoch. Systems Analysis Ranking of nodes in graphs 12

Page 13: Ranking of nodes in graphs - University of Pennsylvania

Convergence metrics

I Recall r is vector of ranks and rn of rank iterates

I By definition limn→∞

rn = r . How fast rn converges to r (r given)?

I Can measure by distance between r and rn ⇒

ζn := ‖r − rn‖2 =

( J∑i=1

(rni − ri )2

)1/2

I If interest is only on largest ranked nodes, e.g., a web search

I Denote r (i) as the index of the i-th highest ranked node

I Similarly, r(i)n is the index of the i-th highest ranked node at time n

I First element wrongly ranked at time n

ξn := mini

r (i) 6= r (i)n

Stoch. Systems Analysis Ranking of nodes in graphs 13

Page 14: Ranking of nodes in graphs - University of Pennsylvania

Evaluation of convergence metrics

Distance

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000010−2

10−1

100

101

time (n)

corre

ctly

rank

ed n

odes

First element wrongly ranked

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 100000

2

4

6

8

10

12

14

time (n)

corre

ctly

rank

ed n

odes

I Distance gets close to10−2 in approx. 5× 103

iterations

I Bad: Two largest ranksin 3× 103 iterations

I Awful: Six best ranks in8× 103 iterations

I Convergence appears(very) slow

Stoch. Systems Analysis Ranking of nodes in graphs 14

Page 15: Ranking of nodes in graphs - University of Pennsylvania

When does this algorithm converge?

I Can confidently claim convergence not until 105 iterations

I True for particular case. Slow convergence inherent to algorithm

I Example has 40 nodes, want to use in network with 109 nodes

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2x 105

0

5

10

15

20

25

30

35

40

time (n)

corre

ctly r

anke

d no

des

I Use fact that this process is a MC to obtain faster algorithm

Stoch. Systems Analysis Ranking of nodes in graphs 15

Page 16: Ranking of nodes in graphs - University of Pennsylvania

Ranking of nodes in graphs: Markov chain

Ranking of nodes in graphs: Random walk

Ranking of nodes in graphs: Markov chain

Stoch. Systems Analysis Ranking of nodes in graphs 16

Page 17: Ranking of nodes in graphs - University of Pennsylvania

Limit probabilities

I Recall definition of rank ⇒ ri := limt→∞

1

t

t∑u=1

I{A(u) = i}

I Rank is time average of number of state visits in a MC

⇒ Can be equally obtained from limiting probabilities

I Recall transition probabilities ⇒ Pij =1

Ni, for all j ∈ n(i)

I Stationary distribution π = [π1, π1, . . . , πJ ]T solution of

πi =∑

j∈n−1(i)

Pj iπj =∑

j∈n−1(i)

πjNj

for all i

I Plus normalization equation∑J

i=1 πi = 1

I As per ergodicity ⇒ r = π

Stoch. Systems Analysis Ranking of nodes in graphs 17

Page 18: Ranking of nodes in graphs - University of Pennsylvania

Matrix notation, eigenvalue problem

I As always, can define matrix P with elements Pij

πi =∑

j∈n−1(i)

Pjiπj =J∑

j=1

Pjiπj for all i

I Right hand side is just definition of a matrix product leading to

π = PTπ, πT1 = 1

I Also added normalization equation

I Can solve as system of linear equations or eigenvalue problem on PT

I Non-iterative method ⇒ Convergence not an issue

I But requires matrix P available at a central location

I Computationally costly (matrix P with 109 rows and columns)I All methods are costly to compute exact solutionI This one is costly to find even approximate solution

Stoch. Systems Analysis Ranking of nodes in graphs 18

Page 19: Ranking of nodes in graphs - University of Pennsylvania

What are limit probabilities?

I Let pi (n) denote probability of agent A visiting node i at time t

pi (n) := P [An = i ]

I Probabilities at time n + 1 and n can be related

P [An+1 = i ] =∑

j∈n−1(i)

P[An = i

∣∣An = j]

P [An = j ]

I Which is, of course, probability propagation in a MC

pi (n + 1) =∑

j∈n−1(i)

Pjipj(n)

I By definition limit probabilities are (let p(n) = [p1(n), . . . , pJ(n)]T )

limn→∞

p(n) = π = r

I Compute ranks from limit of probability propagation

Stoch. Systems Analysis Ranking of nodes in graphs 19

Page 20: Ranking of nodes in graphs - University of Pennsylvania

Probability propagation

I Can also write probability propagation in matrix form

pi (n + 1) =∑

j∈n−1(i)

Pjipj(n) =J∑

j=1

Pjipj(n) for all i

I Right hand side is just definition of a matrix product leading to

p(n + 1) = PTp(n)

I Can approximate rank by probability distribution

⇒ r = limn→∞ p(n) ≈ p(n) for n sufficiently large

Stoch. Systems Analysis Ranking of nodes in graphs 20

Page 21: Ranking of nodes in graphs - University of Pennsylvania

Algorithm

I Algorithm is just a recursive matrix product

Output : Vector r(i) with ranking of node iInput : Matrix P containing transition probabilities

m = 1; % Initialize timer=(1/J)ones(J,1); % Initial distribution uniform across all nodeswhile m < n do

r = PT r; % Probability propagationm = m + 1;

end

Stoch. Systems Analysis Ranking of nodes in graphs 21

Page 22: Ranking of nodes in graphs - University of Pennsylvania

Interpretation of probability propagation

I Why does the random walk converges so slow?

I What does it take to obtain a time average rni close to ri?

I Need to register a large number of agent visits to every state

I Back of hand: 40 nodes, some 100 visits to each ⇒ 4× 103 iters.

I Idea: Unleash a large number of agents K

ri = limn→∞

1

n

n∑m=1

1

K

K∑k=1

I {Akm = i}

I Visits are now spread over time and space

⇒ Converges “K times faster” (depends agents’ initial distribution)

⇒ But haven’t changed computational cost

Stoch. Systems Analysis Ranking of nodes in graphs 22

Page 23: Ranking of nodes in graphs - University of Pennsylvania

Interpretation of prob. propagation (continued)

I What happens if we unleash infinite number of agents K?

ri = limn→∞

1

n

n∑m=1

limK→∞

1

K

K∑k=1

I {Akm = i}

I Using law of large numbers and expected value of indicator function

ri = limn→∞

1

n

n∑m=1

E [I {Am = i}] = limn→∞

1

n

n∑m=1

P [Am = i ]

I Graph walk is a MC, then limm→∞

P [Am = i ] = limm→∞

pi (m) exists, and

ri = limn→∞

1

n

n∑m=1

pi (m) = limn→∞

pi (n)

I Probability propagation ≈ Unleashing infinite number of agents

I Interpretation true for any MC

Stoch. Systems Analysis Ranking of nodes in graphs 23

Page 24: Ranking of nodes in graphs - University of Pennsylvania

Distance to rank

I Initialize with uniform probability distribution ⇒ p(0) = (1/J)1

I Distance between p(n) and r

0 20 40 60 80 100 120 14010−4

10−3

10−2

10−1

100

time (n)

Dis

tanc

e

I Distance is 10−2 in approximately 30 iterations, 10−4 in 140iterations

I Convergence is two orders of magnitude faster than random walk

Stoch. Systems Analysis Ranking of nodes in graphs 24

Page 25: Ranking of nodes in graphs - University of Pennsylvania

Number of nodes correctly ranked

I Rank of highest ranked node that is wrongly ranked by time n

0 20 40 60 80 100 120 1400

5

10

15

20

25

30

35

40

time (n)

corre

ctly

rank

ed n

odes

I Not bad: All nodes correctly ranked in 120 iterations

I Good: Ten best ranks in 80 iterations

I Great: Four best ranks in 20 iterations

Stoch. Systems Analysis Ranking of nodes in graphs 25

Page 26: Ranking of nodes in graphs - University of Pennsylvania

Distributed algorithm to compute ranks

I Nodes want to compute their rank ri

⇒ Can communicate with neighbors only (incoming + outgoing)

⇒ Access to neighborhood information only

I Recall probability update

pi (n + 1) =∑

j∈n−1(i)

Pjipj(n) =∑

j∈n−1(i)

1

Njpj(n)

I Uses local information only

I Algorithm. Nodes keep local rank estimates pi (n)

I Receive rank (probability) estimates pj(n) from neighbors j ∈ n−1(i)

I Update local rank estimate pi (n + 1) =∑

j∈n−1(i) pj(n)/Nj

I Communicate rank estimate pi (n + 1) to outgoing neighbors j ∈ n(i)

I Need only know number of neighbors of my neighbors

Stoch. Systems Analysis Ranking of nodes in graphs 26

Page 27: Ranking of nodes in graphs - University of Pennsylvania

Distributed implementation of random walk

I Can communicate with neighbors only (incoming + outgoing)

I But cannot access neighborhood information

I Pass agent around

I Local rank estimates ri (n) and counter with number of visits Vi

I Algorithm run by node i at time n

if Agent received from neighbor thenVi = Vi + 1Choose random neighborSend agent to chosen neighbor

endn = n + 1; ri (n) = Vi/n;

I Speed up convergence by generating many agents to pass around

Stoch. Systems Analysis Ranking of nodes in graphs 27

Page 28: Ranking of nodes in graphs - University of Pennsylvania

Comparison of different algoithms

I Random walk implementation

⇒ Most secure & robust. No information shared with other nodes

⇒ Implementation can be distributed

⇒ Convergence exceedingly slow

I System of linear equations

⇒ Least security and robustness. Graph in central server

⇒ Distributed implementation not clear

⇒ Non-iterative method, convergence not a problem

⇒ But computationally costly to obtain approximate solutions

I Probability propagation, matrix powers

⇒ Somewhat secure/robust. Info. shared with neighbors only

⇒ Implementation can be distributed

⇒ Convergence rate acceptable (orders of magnitude faster than RW)

Stoch. Systems Analysis Ranking of nodes in graphs 28