fan chung university of california, san diego the pagerank of a graph

93
Fan Chung University of California, San Diego The PageRank of a graph

Upload: ambrose-flynn

Post on 03-Jan-2016

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Fan Chung University of California, San Diego The PageRank of a graph

Fan Chung

University of California, San Diego

The PageRank of a graph

Page 2: Fan Chung University of California, San Diego The PageRank of a graph
Page 3: Fan Chung University of California, San Diego The PageRank of a graph

What is PageRank?

•Ranking the vertices of a graph

•Partially ordered sets + graph theory

-Applications --web search,ITdata sets, xxxxxxxxxxxxxpartitioning algorithms,…

-Theory --- correlations among vertices

A new graph invariant for dealing with:

Page 4: Fan Chung University of California, San Diego The PageRank of a graph

fan

Page 5: Fan Chung University of California, San Diego The PageRank of a graph

Outline of the talk

• Motivation

• Local cuts and the Cheeger constant

using: eigenvectors random walks PageRank heat kernel

• Define PageRank

• Four versions of the Cheeger inequality

• Four partitioning algorithms

• Green’s functions and hitting time

Page 6: Fan Chung University of California, San Diego The PageRank of a graph

What is PageRank?

What is Rank?

Page 7: Fan Chung University of California, San Diego The PageRank of a graph
Page 8: Fan Chung University of California, San Diego The PageRank of a graph
Page 9: Fan Chung University of California, San Diego The PageRank of a graph

What is PageRank?

PageRank is defined on any graph.

Page 10: Fan Chung University of California, San Diego The PageRank of a graph

An induced subgraph of the collaboration graph with authors of Erdös number ≤ 2.

Page 11: Fan Chung University of California, San Diego The PageRank of a graph

A subgraph of the Hollywood graph.

Page 12: Fan Chung University of California, San Diego The PageRank of a graph

A subgraph of a BGP graph

Page 13: Fan Chung University of California, San Diego The PageRank of a graph

Yahoo IM graph Reid Andersen 2005

The Octopus graph

Page 14: Fan Chung University of California, San Diego The PageRank of a graph

Graph Theory has 250 years of history.

Leonhard Euler 1707-

1783

The Bridges of Königsburg

Is it possible to walk over every bridge once and only once?

Page 15: Fan Chung University of California, San Diego The PageRank of a graph

Geometric graphs

Algebraic graphs

General graphs

Topological graphs

Page 16: Fan Chung University of California, San Diego The PageRank of a graph

Massive dataMassive graphs

• WWW-graphs

• Call graphs

• Acquaintance graphs

• Graphs from any data a.base

protein interaction network Jawoong Jeong

Page 17: Fan Chung University of California, San Diego The PageRank of a graph

Big and bigger graphs

New directions in graph theory

Page 18: Fan Chung University of California, San Diego The PageRank of a graph

Many basic questions:

• Correlation among vertices?

• The `geometry’ of a network ?

• Quantitative analysis?

distance, flow, cut, …

eigenvalues, rapid mixing, …

• Local versus global?

Page 19: Fan Chung University of California, San Diego The PageRank of a graph

Google’s answer:

The definition for PageRank?

Page 20: Fan Chung University of California, San Diego The PageRank of a graph

A measure for the “importance” of a website

1 14 79 785( )x R x x x

2 1002 3225 9883 30027( )x R x x x x

The “importance” of a website is proportional

to the sum of the importance of all the sites

that link to it.

Page 21: Fan Chung University of California, San Diego The PageRank of a graph

A solution for the “importance” of a website

1 14 79 785( )x x x x

2 1002 3225 9883 30027( )x x x x x

Solve 1 2( , , , )nx x x for1

n

i ij jj

x a x

x

Page 22: Fan Chung University of California, San Diego The PageRank of a graph

A solution for the “importance” of a website

1 14 79 785( )x x x x

2 1002 3225 9883 30027( )x x x x x

Solve 1 2( , , , )nx x x for1

n

i ij jj

x a x

x

x = ρ A x ij n nA a

Adjacency matrix

Page 23: Fan Chung University of California, San Diego The PageRank of a graph

A solution for the “importance” of a website

1 14 79 785( )x x x x

2 1002 3225 9883 30027( )x x x x x

Solve 1 2( , , , )nx x x for1

n

i ij jj

x a x

x

x = ρ A x

Page 24: Fan Chung University of California, San Diego The PageRank of a graph

Graph models

directed graphs

weighted graphs

(undirected) graphs

Page 25: Fan Chung University of California, San Diego The PageRank of a graph

Graph models

directed graphs

weighted graphs

(undirected) graphs

Page 26: Fan Chung University of California, San Diego The PageRank of a graph

Graph models

directed graphs

weighted graphs

(undirected) graphs

1.5

1.2

3.3\

1.5

2

2.3

1

2.81.1

Page 27: Fan Chung University of California, San Diego The PageRank of a graph

In a directed graph,

authority

there are two types of “importance”:

hub Jon Kleinberg 1998

Page 28: Fan Chung University of California, San Diego The PageRank of a graph

Two types of the “importance” of a website

Solve

1 2( , , , )nx x x

and

x

x = r A y

Importance as Authorities :

1 2( , , , )my y y yImportance as Hubs :

y = s A xT

x = rs A A xT

y = rs A A yT

Page 29: Fan Chung University of California, San Diego The PageRank of a graph

Eigenvalue problem for n x n matrix:.

n ≈ 30 billion websites

Hard to compute eigenvalues

Even harder to compute eigenvectors

Page 30: Fan Chung University of California, San Diego The PageRank of a graph

In the old days,

compute for a given (whole) graph.

In reality,

can only afford to compute “locally”.

Page 31: Fan Chung University of California, San Diego The PageRank of a graph

A traditional algorithm

Input: a given graph on n vertices.

New algorithmic paradigm

Input: access to a (huge) graph

Efficient algorithm means polynomial algorithms

n3, n2, n log n, n

(e.g., for a vertex v, find its neighbors)Bounded number of access.

Page 32: Fan Chung University of California, San Diego The PageRank of a graph

A traditional algorithm

Input: a given graph on n vertices.

New algorithmic paradigm

Input: access to a (huge) graph

Efficient algorithm means polynomial algorithms

n3, n2, n log n, n

(e.g., for a vertex v, find its neighbors)Bounded number of access.

Exponential polynomial

Infinity finite

Page 33: Fan Chung University of California, San Diego The PageRank of a graph

The definition of PageRank given by

Brin and Page is based on

random walks.

Page 34: Fan Chung University of California, San Diego The PageRank of a graph

Random walks in a graph.

1,

( , )

0 .u

if u vdP u v

otherwise

G : a graph

P : transition probability matrix

2

I PW

A lazy walk:

ud := the degree of u.

Page 35: Fan Chung University of California, San Diego The PageRank of a graph

A (bored) surfer

• either surf a random webpage

with probability 1- α

Original definition of PageRank

α : the jumping constant

• or surf a linked webpage

with probability α

1 1 1( , ,...., ) (1 )n n np pW

Page 36: Fan Chung University of California, San Diego The PageRank of a graph

Two equivalent ways to define PageRank pr(α,s)

(1) (1 )p s pW

s: the seed as a row vector

Definition of personalized PageRank

α : the jumping constant

s

Page 37: Fan Chung University of California, San Diego The PageRank of a graph

Two equivalent ways to define PageRank p=pr(α,s)

(1) (1 )p s pW

0

(1 ) ( )t t

t

p sW

(2)

s = 1 1 1( , ,...., )n n n

Definition of PageRank

the (original) PageRank

some “seed”, e.g.,s = (1,0,....,0)

personalized PageRank

Page 38: Fan Chung University of California, San Diego The PageRank of a graph

Depends on the applications?

How good is PageRank as a measure of correlationship?

Isoperimetric properties

How “good” is the cut?

Page 39: Fan Chung University of California, San Diego The PageRank of a graph

Isoperimetric properties

“What is the shortest curve enclosing a unit area?”

In a graph G and an integer m,

what is the minimum cut disconnecting a subgraph of ≥ m vertices?In a graph G, what is the minimum cut e(S,V-S)

so that e(S,V-S) is the smallest?_____Vol S

Page 40: Fan Chung University of California, San Diego The PageRank of a graph

Two types of cuts:

• Vertex cut

• edge cut

How “good” is the cut?

S

E(S,V-S)

Page 41: Fan Chung University of California, San Diego The PageRank of a graph

e(S,V-S)_____Vol S

e(S,V-S)_____

|S|

Vol S = Σ deg(v)

v ε S|S| = Σ 1

v ε S

S

V-S

Page 42: Fan Chung University of California, San Diego The PageRank of a graph

The Cheeger constant

( , )min

min( , )GS

e S Sh

vol S vol S

The Cheeger constant for graphs

hG and its variations are sometimes called

“conductance”, “isoperimetric number”, …

Gh

Gh

The volume of S is ( ) xx S

vol S d

Page 43: Fan Chung University of California, San Diego The PageRank of a graph

The Cheeger constant

( , )min ( ) min

min( , )GS S

e S Sh h S

vol S vol S

The Cheeger inequality

2

22G

G

hh

The Cheeger inequality

: the first nontrivial eigenvalue of the xx(normalized) Laplacian of a connected graph.

Page 44: Fan Chung University of California, San Diego The PageRank of a graph

The spectrum of a graph

Many ways to define the spectrum of a graph.

How are the eigenvalues related to How are the eigenvalues related to

properties of graphs?properties of graphs?

•Adjacency matrix

Page 45: Fan Chung University of California, San Diego The PageRank of a graph

•Combinatorial Laplacian

L D A diagonal degree matrix

adjacency matrix

Gustav Robert Kirchhoff

1824-1887

The spectrum of a graph

•Adjacency matrix

Page 46: Fan Chung University of California, San Diego The PageRank of a graph

•Combinatorial Laplacian

L D A diagonal degree matrix

adjacency matrix

Gustav Robert Kirchhoff

1824-1887

The spectrum of a graph

•Adjacency matrix

Matrix tree theorem

0i

i

#spanning . trees

Page 47: Fan Chung University of California, San Diego The PageRank of a graph

•Combinatorial Laplacian

L D A diagonal degree matrix

adjacency matrix

Gustav Robert Kirchhoff

1824-1887

The spectrum of a graph

•Adjacency matrix

•Normalized Laplacian

Random walks

Rate of convergence

Page 48: Fan Chung University of California, San Diego The PageRank of a graph

The spectrum of a graph

not symmetric in general

•Normalized Laplacian

symmetric normalized

1( ) ( ( ) ( ))

y xx

f x f x f yd

L( , )x y 1 if x y

{ 1

x y

if x y and x yd d

Discrete Laplace operator

with eigenvalues 0 1 10 2n

( , )L x y 1 if x y

{ 1

x

if x y and x yd

loopless, simple

Page 49: Fan Chung University of California, San Diego The PageRank of a graph

The spectrum of a graph

•Normalized Laplacian

symmetric normalized

1( ) ( ( ) ( ))

y xx

f x f x f yd

Discrete Laplace operator

with eigenvalues 0 1 10 2n

=

( , )L x y 1 if x y

{ 1

x

if x y and x yd

L( , )x y 1 if x y

{ 1

x y

if x y and x yd d

not symmetric in general

Page 50: Fan Chung University of California, San Diego The PageRank of a graph

dictates many properties

of a graph.• expander

• diameter

• discrepancy

• subgraph containment

• ….

Spectral implications for finding good cuts?

Page 51: Fan Chung University of California, San Diego The PageRank of a graph

Finding a cut by a sweep

For : ( ) ,f V G R

1 2

1 2 ( )( ) ( ).

n

n

v v v

f vf v f v

d d d

1{ , , }, 1,..., ,fj jS v v j n

order the vertices

Consider sets

and the Cheeger constant of

( ) min ( )fj

jh f h S

.fjS

Define

Page 52: Fan Chung University of California, San Diego The PageRank of a graph

Using a sweep by the eigenvector,

can reduce the exponential number of

choices of subsets to a linear number.

Finding a cut by a sweep

Page 53: Fan Chung University of California, San Diego The PageRank of a graph

Using a sweep by the eigenvector,

can reduce the exponential number of

choices of subsets to a linear number.

Finding a cut by a sweep

Still, there is a lower bound guarantee

by using the Cheeger inequality.

2

22

hh

Page 54: Fan Chung University of California, San Diego The PageRank of a graph

Four types of Cheeger inequalities.

eigenvectors random walks PageRank heat kernel

Four proofs using:

Leading to four different

one-sweep partitioning algorithms.

Page 55: Fan Chung University of California, San Diego The PageRank of a graph

• graph spectral method

• random walks

• PageRank

• heat kernel

spectral partition algorithm

local partition algorithms

Four proofs of Cheeger inequalities

Page 56: Fan Chung University of California, San Diego The PageRank of a graph

Graph partitioning Local graph partitioning

Page 57: Fan Chung University of California, San Diego The PageRank of a graph

What is a local graph partitioning algorithm?

A local graph partitioning algorithm finds a small

cut near the given seed(s) with running time

depending only on the size of the output.

Page 58: Fan Chung University of California, San Diego The PageRank of a graph

Examples of local partitioning

Page 59: Fan Chung University of California, San Diego The PageRank of a graph

Examples of local partitioning

Page 60: Fan Chung University of California, San Diego The PageRank of a graph

Examples of local partitioning

Page 61: Fan Chung University of California, San Diego The PageRank of a graph

Examples of local partitioning

Page 62: Fan Chung University of California, San Diego The PageRank of a graph

Examples of local partitioning

h

Page 63: Fan Chung University of California, San Diego The PageRank of a graph

h

h

Page 64: Fan Chung University of California, San Diego The PageRank of a graph

• graph spectral method

• random walks

• PageRank

• heat kernel

spectral partition algorithm

local partition algorithms

Four proofs of Cheeger inequalities

Page 65: Fan Chung University of California, San Diego The PageRank of a graph

• graph spectral method

• random walks

• PageRank

• heat kernel

Lovasz, Simonovits, 90, 93 Spielman, Teng, 04

Andersen, Chung, Lang, 06

Chung, PNAS , 08.

Four proofs of Cheeger inequalities

Cheeger 60’s, Fiedler ’73Alon 86’, Jerrum+Sinclair

89’

Page 66: Fan Chung University of California, San Diego The PageRank of a graph

where is the first non-trivial eigenvalue of the Laplacian and is the minimum Cheeger ratio in a sweep using the eigenvector .

f

2 2

22 2

hh

Using eigenvector

the Cheeger inequality can be stated as

The Cheeger inequality

Partition algorithm

,

f

Page 67: Fan Chung University of California, San Diego The PageRank of a graph

Proof of the Cheeger inequality:

from definition

by Cauchy-Schwarz ineq.

summation by parts.

from the definition.

Page 68: Fan Chung University of California, San Diego The PageRank of a graph

2 2

28 8G h

h

where is the minimum Cheeger ratio over sweeps by using a lazy walk of k steps from every vertex for an appropriate range of k .

A Cheeger inequality using random walks

G

Lovász, Simonovits, 90, 93

2( )( , ) ( ) 1

8

k

k k

u

vol SW u S S

d

Leads to a Cheeger inequality:

Page 69: Fan Chung University of California, San Diego The PageRank of a graph

Using the PageRank vector.

A Cheeger inequality using PageRank

Recall the definition of PageRank p=pr(α,s):

(1) (1 )p s pW

0

(1 ) ( )t t

t

p sW

(2)

Organize the random walks by a scalar α.

Page 70: Fan Chung University of California, San Diego The PageRank of a graph

Random walks versus PageRank

How fast is the convergence to the

stationary distribution? Choose α to satisfy the

required property. For what k, can one

have

?

kf W

Page 71: Fan Chung University of California, San Diego The PageRank of a graph

with seed as a subset S

2 2

8log 8logS S

S S

hh

s s

Using the PageRank vector

and a Cheeger

inequality can be obtained :

where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheeger ratio over sweeps by using the appropriate personalized PageRank with seeds S.

A Cheeger inequality using PageRank

S

( ) ( ) / 4,vol S vol G

Page 72: Fan Chung University of California, San Diego The PageRank of a graph

2

2

( ( ) ( ))inf

( )u v

Sf

ww

f u f v

f w d

over all f satisfying the Dirichlet boundary

condition:

Dirichlet eigenvalues for a subset

( ) 0f v

S V

S

V

for all .v S

Page 73: Fan Chung University of California, San Diego The PageRank of a graph

min ( )

ST S

h h T

Local Cheeger constant for a subset

S V

S

V

Page 74: Fan Chung University of California, San Diego The PageRank of a graph

with seed as a subset S

2 2

8log 8logS S

S S

hh

s s

Using the PageRank vector

and a Cheeger

inequality can be obtained :

where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheeger ratio over sweeps by using personalized PageRank with seed S.

A Cheeger inequality using PageRank

S

( ) ( ) / 4,vol S vol G

Page 75: Fan Chung University of California, San Diego The PageRank of a graph

Algorithmic aspects of PageRank

• Fast approximation algorithm for x personalized PageRank

Can use the jumping constant to approximate PageRank with a support of the desired size.

greedy type algorithm, almost linear complexity

• Errors can be effectively bounded.

Page 76: Fan Chung University of California, San Diego The PageRank of a graph

A graph partition algorithm using PageRank

Given a set S with ,Sh

randomly choose a vertex v in S.

With probability at least1

,2

the one-sweep algorithm using

( , )f pr v

has an initial segment with the Cheeger

constant at most ( log | |).fh O S

1

2( ) ( ),vol S vol G

Page 77: Fan Chung University of California, San Diego The PageRank of a graph

Graph partitioning using PageRank vector.

198,430 nodes and 1,133,512 edges

Page 78: Fan Chung University of California, San Diego The PageRank of a graph
Page 79: Fan Chung University of California, San Diego The PageRank of a graph
Page 80: Fan Chung University of California, San Diego The PageRank of a graph

Kevin Lang 2007

Page 81: Fan Chung University of California, San Diego The PageRank of a graph

• graph spectral method

• random walks

• PageRank

• heat kernel

Lovasz, Simonovits, 90, 93 Spielman, Teng, 04

Andersen, Chung, Lang, 06

Chung, PNAS , 08.

Four proofs of Cheeger inequalities

Fiedler ’73, Cheeger, 60’sAlon 86’

Page 82: Fan Chung University of California, San Diego The PageRank of a graph

PageRank versus heat kernel

,0

(1 ) ( )k ks

k

p sW

,0

( )

!

kt

t sk

tWe s

k

Geometric sum

Exponential sum

Page 83: Fan Chung University of California, San Diego The PageRank of a graph

PageRank versus heat kernel

,0

(1 ) ( )k ks

k

p sW

,0

( )

!

kt

t sk

tWe s

k

Geometric sum

Exponential sum

(1 )p pW

recurrence

( )I Wt

Heat equation

Page 84: Fan Chung University of California, San Diego The PageRank of a graph

A Cheeger inequality using the heat kernel

2, / 4

,

( )( ) ( ) t ut

t uu

vol SS S e

d

Theorem:

where is the minimum Cheeger ratio over sweeps by using heat kernel pagerank over all u in S.

,t u

Theorem: For

, ( ) ( ) .2

St

t S

eS S

2/3( ) ( ) ,vol S vol G

Page 85: Fan Chung University of California, San Diego The PageRank of a graph

22( ... ...)

2 !

kt k

t

t tH e I tW W W

k

( )t I We

Definition of heat kernel

tLe2

2 ... ( 1) ...2 !

kk kt t

I tL L Lk

( )t tH I W Ht

,t s tsH

Page 86: Fan Chung University of California, San Diego The PageRank of a graph

2 2

8 8S S

S S

hh

Using the upper and lower bounds,

a Cheeger inequality can be

obtained :

where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheeger ratio over sweeps by using heat kernel with seeds S for appropriate t.

A Cheeger inequality using the heat kernel

S

Page 87: Fan Chung University of California, San Diego The PageRank of a graph
Page 88: Fan Chung University of California, San Diego The PageRank of a graph
Page 89: Fan Chung University of California, San Diego The PageRank of a graph
Page 90: Fan Chung University of California, San Diego The PageRank of a graph

• Covering and packing using PageRank

Many applications of PageRank for problems in Graph Theory

• Pebbing and routing using PageRank

• Graph embedding using PageRank

• Relating graph invariants of subgraphs

to the host graph using PageRank

• Your favorite old problem using PageRank?

• Graph drawing using PageRank

Page 91: Fan Chung University of California, San Diego The PageRank of a graph

• Random graphs with general degrees

• pageranks

• Algorithmic game theory, graphical

games

New Directions in Graph Theory for information networks

Topics:

Using: • Spectral methods

• Probabilistic methods

• Quasirandom …

Page 92: Fan Chung University of California, San Diego The PageRank of a graph
Page 93: Fan Chung University of California, San Diego The PageRank of a graph