spectral properties of google matrix and beyond · ) s= s ss s sc 0 s cc where s ssis block...

Spectral properties of Google matrix andbeyond

Klaus M. Frahm1

Quantware MIPS Center Université Paul SabatierK. Jaffrès-Runser2, D.L. Shepelyansky11 Laboratoire de Physique Théorique du CNRS, IRSAMC2 Institut de Recherche en Informatique de Toulouse, INPT

Google Matrix: fundamentals, applications and beyond

IHES, October 15 – 18, 2018

Perron-Frobenius operators

Physical system evolving by a discrete Markov process:

pi(t + 1) =∑j

Gij pj(t) with∑i

Gij = 1 , Gij ≥ 0 .

Transition probabilities Gij ⇒ Perron-Frobenius matrix.Conservation of probability:

∑i pi(t + 1) =

∑i pi(t) = 1.

In general GT 6= G and complex eigenvalues λ with |λ| ≤ 1.eT = (1, . . . , 1) is left eigenvector with λ1 = 1⇒ existence of (atleast) one right eigenvector P for λ1 = 1 also called PageRank in thecontext of Google matrices: GP = 1 P

For non-degenerate λ1 and finite gap |λ2| < 1: limt→∞

p(t) = P

⇒ Power method to compute P with rate of convergence ∼ |λ2|t.2

Google matrixConstruct an Adjacency matrix A for a directed network with Nnodes and N` links by :

Ajk = 1 if there is a link k → j and Ajk = 0 otherwise.

Sum-normalization of each non-zero column of A ⇒ S0.

Replacing each zero column (dangling nodes) with e/N ⇒ S.

Eventually apply the damping factor α < 1 (typically α = 0.85):

Google matrix: G(α) = αS + (1− α)1

NeeT .

⇒ λ1 is non-degenerate and |λ2| ≤ α.

Same procedure for inverted network: A∗ ≡ AT where S∗ and G∗ are obtained in

the same way from A∗. Note: in general: S∗ 6= ST . Leading (right) eigenvector of S∗

or G∗ is called CheiRank .

(Brin and Page, Comp. Networks ISDN Syst. 30, 107 (1998).)

3

Example:

A =

0 1 1 0 01 0 1 1 00 1 0 1 00 0 1 0 00 0 0 1 0

S0 =

0 1

213 0 0

1 0 13

13 0

0 12 0 1

3 0

0 0 13 0 0

0 0 0 13 0

, S =

0 1

213 0 1

5

1 0 13

13

15

0 12 0 1

315

0 0 13 0 1

5

0 0 0 13

15

4

PageRank(KF, Georgeot and Shepelyansky, J. Phys. A 44 465101 (2011).)

Example for university networks of Cambridge 2006 and Oxford 2006 (N ≈ 2× 105

and N` ≈ 2× 106).

P (i) =∑j

Gij P (j)

P (i) represents the “importance” of “node/page i” obtained as sum of all other pages

j pointing to i with weight P (j). Sorting of P (i) ⇒ index K(i) for order of

appearance of search results in search engines such as Google.

5

Numerical methods

• Power method to obtain P or P ∗: rate of convergence for G(α) ∼ α t.

• 2d Rank : representation of nodes (node-density) in K −K∗−plane

where K (K∗) is sorting index of PageRank P (CheiRank P ∗).

• Complex eigenvalues: Full “exact” diagonalization for N . 104 or Arnoldimethod to determine largest nA ∼ 102 − 104 eigenvalues.

• Invariant subspaces in realistic WWW networks⇒ large degeneracies of λ1:

⇒ S =

(Sss Ssc0 Scc

)where Sss is block diagonal according to the subspaces and can be diagonalizedseparately. Scc corresponds to the core space with |λmax| < 1.

• Strange numerical problems to determine accurately “small” eigenvalues, inparticular for (nearly) triangular network structure due to large Jordan-blocks(e.g. citation network of Physical Review and recently Bitcoin network).

6

Applications

• University WWW-networks (N ∼ 2× 105, N` ∼ 2× 106).

• Linux Kernel network (N ∼ 3× 105)

(Nodes = kernel functions)

(Ermann, Chepelianskii and Shepelyansky, EPJB 79, 115 (2011))

• Wikipedia, different language editions (N ∼ 4× 106, N` ∼ 108)

• World trade network (N ∼ 104 but more complicated structure).(Ermann and Shepelyansky, Acta Physica Polonica A 120(6A), A158 (2011))

• Twitter 2009 (N ∼ 4× 107, N` ∼ 1.5× 109).

• Physical Review citation network (N ∼ 5× 105, N` ∼ 5× 106).

(Review: Ermann, KF, and Shepelyansky, Rev. Mod. Phys. 87, 1261 (2015). )

7

Example: Wikipedia 2009

(Ermann, KF, and Shepelyansky, EPJB 86, 193 (2013))N = 3282257 nodes, N` = 71012307 network links.

left (right): PageRank (CheiRank)

black: PageRank (CheiRank) at α = 0.85grey: PageRank (CheiRank) at α = 1− 10−8

red and green: first two core space eigenvectors

blue and pink: two eigenvectors with large imaginary part in the eigenvalue

8

“Themes” of certain Wikipedia eigenvectors:

-1 -0.5 0 0.5 10

0.5

-0.82 -0.8 -0.78 -0.76 -0.74 -0.72

0

0.8 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98

0

Aus

tral

ia

Switz

erla

ndE

ngla

nd

Bangladesh

New Zeland

Poland

KuwaitIceland

Austria

Bra

zil

Chi

na

Aus

tral

ia

Aus

tral

iaC

anad

a

England

muscle-arterybiology

DNAR

NA

prot

ein

skin

muscle-artery

muscle-artery

mathematics

math (function, geometry,surface, logic-circuit)

rail

war

Gaa

fu A

lif A

toll

Qua

ntum

Lea

p

Tex

as-D

alla

s-H

oust

on

Lan

guag

e

music

Bible

poetryfootball

song

poetryaircraft

Q: How to analyze network structure for such and also more generalsub-groups ? (⇒ reduced Google matrix, see below)

9

Anderson Localization(Anderson, Phys. Rev. 109, 1492 (1958); Nobel Prize 1977).

Hψn = −tψn−1 − tψn+1 + εnψn = Eψn , −W2 < εn <

W2

Exponential localization in d = 1, 2 (for W > 0) and d = 3 (for W > Wc = 16.5t).

For typical PageRank vectors of WWW or Wikipedia there is typically a power lawlocalization : P (j) ∼ K(j)−β with β ≈ 0.9.

However, some special cases for leading core space eigenvector :1

10-5

10-10

10-15

10-20

0 100 200 300 400

ψ1(c

ore

)

K (core)

Cambridge 2002Cambridge 2003Cambridge 2004Cambridge 2005

Leeds 20061− λ(core)1

Cambridge 2002 3.996 · 10−17

Cambridge 2003 4.01 · 10−17

Cambridge 2004 2.91 · 10−9

Cambridge 2005 4.01 · 10−17

Leeds 2006 3.126 · 10−19

(KF, Georgeot and Shepelyansky, J. Phys. A 44, 465101 (2011).)

10

Fractal Weyl law(KF, Eom, Shepelyansky. PRE 89, 052814 (2014).)Number of states Nλ with |λ| > λc: Nλ ∼ (Nt)

b with Nt = network size ofPhysical Review citation network at time t:

103

104

105

106

1920 1940 1960 1980 2000

Nt

t

t0 = 1791

τ = 11.4

Nt

2(t-t0)/τ

100

101

102

103

103

104

105

106

Nλ

Nt

a = 0.32

b = 0.51

λc = 0.50

Nλ = a (Nt)b

nA = 4000

nA = 2000 10

0

101

102

103

103

104

105

106

Nλ

Nt

a = 0.24

b = 0.47

λc = 0.65

Nλ = a (Nt)b

nA = 4000

nA = 2000

0.5

0.6

0.7

0.8

0.2 0.4 0.6 0.8 1

b

λc

(b)

(a) (c)

(d)

Linux kernel function network: Nλ ∼ N ν , ν ≈ 0.65 (λc = 0.1 or 0.25).(Ermann, Chepelianskii and Shepelyansky, EPJB 79, 115 (2011))(see also talk of S. Nonnenmacher.)

11

Random Perron-Frobenius matrices(KF, Eom, Shepelyansky. PRE 89, 052814 (2014).)

Random matrix ensembles for hermitian/hamiltonian matrices well known since1955 (Wigner, Annals of Mathematics. 62, 548 (1955); Nobel Prize 1963).

Construct random matrix ensembles Gij for PF-matrices such that:

Gij ≥ 0, Gij (approximately) non-correlated, distributed with same distributionP (Gij) (of finite variance σ2),∑

j

Gij = 1 ⇒ 〈Gij〉 = 1/N

⇒ average of G has one eigenvalue λ1 = 1 (⇒ “flat” PageRank) and othereigenvalues λj = 0 (for j 6= 1).

degenerate perturbation theory for the fluctuations⇒ circular eigenvalue densitywith R =

√Nσ and one unit eigenvalue.

full ⇒ R = 1/√

3N

sparse with Q non-zero elements per column ⇒ R ∼ 1/√Q

power law with P (G) ∼ G−b for 2 < b < 3 ⇒ R ∼ N 1−b/2

12

Numerical verification:

uniform full:N = 400

uniform sparse:N = 400,Q = 20

power law:b = 2.5

triangularrandom andaverage

constant sparse:N = 400,Q = 20

power law case:Rth ∼ N−0.25

13

Reduced Google matrix(KF, Shepelyansky arXiv:1602.02394; KF, Jaffrès-Runser, Shepelyansky, EPJB 89, 269 (2016).)

Consider a sub-network with Nr � N nodes providing adecomposition in reduced and scattering nodes:

G =

(Grr Grs

Gsr Gss

), P =

(PrPs

)GP = P ⇒ GRPr = Pr

with the effective reduced Google matrix :

GR = Grr + Grs(1−Gss)−1Gsr

containing direct link contributions from Grr andscattering contributions from Grs(1−Gss)

−1Gsr.GR has the same symmetries as G: (GR)ij ≥ 0 and

∑i(GR)ij = 1.

Analogy with quantum chaotic scattering: S = 1− 2iW † 1E−H+iWW †

W .(Mahaux and Weidenmüller, Phys. Rev. 170, 847 (1968).)

14

Problem: practical evaluation of (1−Gss)−1 is very difficult for large

network sizes and the expansion

(1−Gss)−1 =

∞∑l=0

G lss

typically converges very slowly since the leading eigenvalue λc of Gss

is very close to unity: 1− λc � 1.More efficient expression:

(1−Gss)−1 = Pc

1

1− λc+Qc

∞∑l=0

G lss

with Gss = QcGssQc, the projectors Pc = ψRψTL ,Qc = 1− Pc and

ψR,L are right/left eigenvectors of Gss for λc such that ψTLψR = 1.The leading eigenvalue of Gss is close to α = 0.85⇒ rapid convergence of the matrix series.

15

⇒ three components of GR:

GR = Grr + Gpr + Gqr

Grr = rr sub-block of G ⇒ direct links

Gpr = GrsψRψ

TL

1− λcGsr =

ψRψTL

1− λc, rank 1

withψR = GrsψR , ψTL = ψTLGsr

Gqr = Grs

[Qc

∞∑l=0

G lss

]Gsr ⇒ indirect links

Typically: Gpr is numerically dominant but

Gqr has a more interesting structure allowing to identify friends/followers.

16

http://www.quantware.ups-tlse.fr/QWLIB/wikipolitnet/

Application : Main network = Wikipedia 2013, different language editions.Groups = leading 20/40 politicians of certain countries or G20 state leaders.

Node density in lnK − lnK∗−plane:

20 US, Enwiki

20 UK, Enwiki

G20, Enwiki

40 DE, Dewiki

40 FR, Frwiki

20 RU, Ruwiki

17

http://www.quantware.ups-tlse.fr/QWLIB/wikipolitnet/

Positions in K −K∗−plane

0

5

10

15

20

0 5 10 15 20

K*

K

Enwiki G20 EN

PutinErdogan

ObamaHarper

KirchnerCameron

RousseffHollande

SinghGillard

JintaoMyung−bak

CalderónZuma

MerkelAbdullah of Saudi Arabia

YudhoyonoMonti

NodaBarroso

0

5

10

15

20

25

30

35

40

0 5 10 15 20 25 30 35 40K

*

K

Frwiki Politicians FR

SarkozyHollande

Jean−Marie Le PenMélenchon

BayrouJuppé

Kosciusko−MorizetFillon

DelanoëFabius

MontebourgBorloo

RoyalStrauss−Kahn

Marine Le PenAubry

Datide Villepin

LangJoly

MoscoviciCopé

Cohn−BenditHortefeux

DuflotGuigou

PécresseLagarde

VallsTaubira

YadeRaffarin

Dupont−AignanMorin

BesancenotSapin

BartoloneGuaino

GuéantPhilippot

18

Enwiki G20 state leader

1− λc = 2.465× 10−4

GR

Grr

Gpr

Gqr

19

Grr Enwiki G20 EN

Obama

Ob

ama

Putin

Pu

tin

Cameron

Cam

eron

Harper

Harp

er

Merkel

Merk

el

Singh

Sin

gh

Jintao

Jintao

Hollande

Ho

lland

e

Gillard

Gillard

Barroso

Barro

so

Zuma

Zu

ma

MontiM

on

tiKirchner

Kirch

ner

Erdogan

Erd

og

an

Calderón

Cald

erón

Abdullah

Ab

du

llah

Yudhoyono

Yu

dh

oy

on

o

Rousseff

Ro

usseff

Myung−bak

My

un

g−

bak

Noda

No

da

20

Gqr Enwiki G20 EN

Obama

Obam

a

Putin

Putin

Cameron

Cam

eron

Harper

Harp

er

Merkel

Merk

el

Singh

Sin

gh

Jintao

Jintao

Hollande

Hollan

de

Gillard

Gillard

Barroso

Barro

so

Zuma

Zum

a

Monti

Monti

Kirchner

Kirch

ner

Erdogan

Erd

ogan

Calderón

Cald

erón

Abdullah

Abdullah

Yudhoyono

Yudhoyono

RousseffR

ousseff

Myung−bak

Myung−

bak

Noda

Noda

G20 EN EnwikiName Friends FollowersObama Putin Noda

Merkel AbdullahCalderón Myung-bak

Putin Merkel NodaObama Myung-bakBarroso Merkel

Cameron Putin GillardObama BarrosoMerkel Hollande

Harper Obama MerkelCameron GillardPutin Myung-bak

Merkel Barroso HollandePutin MontiObama Kirchner

21

Gqr Frwiki Politicians FR

Sarkozy

Sark

ozy

Hollande

Ho

lland

e

Jean−Marie Le Pen

Jean−

Marie L

e Pen

Royal

Ro

yal

Raffarin

Raffarin

Villepin

Villep

in

Fillon

Fillo

n

Bayrou

Bay

rou

Fabius

Fab

ius

Strauss−Kahn

Strau

ss−K

ahn

Lang

Lan

g

Juppé

Jup

pé

Borloo

Bo

rloo

Delanoë

Delan

oë

Mélenchon

Mélen

cho

n

Marine Le Pen

Marin

e Le P

en

Lagarde

Lag

arde

Aubry

Au

bry

Cohn−Bendit

Co

hn

−B

end

it

Pécresse

Pécresse

Copé

Co

pé

Kosciusko−Morizet

Ko

sciusk

o−

Mo

rizet

Montebourg

Mo

nteb

ou

rg

Bartolone

Barto

lon

e

Dati

Dati

Besancenot

Besan

ceno

t

Dupont−Aignan

Du

po

nt−

Aig

nan

Joly

Joly

Taubira

Tau

bira

Guigou

Gu

igo

u

Hortefeux

Ho

rtefeux

Yade

Yad

e

Moscovici

Mo

scov

ici

Valls

Valls

GuéantG

uéan

tMorin

Mo

rin

Duflot

Du

flot

Sapin

Sap

in

Guaino

Gu

aino

Philippot

Ph

ilipp

ot

Politicians FR FrwikiName Friends FollowersSarkozy Fillon Hortefeux

J.-M. Le Pen PécresseHollande Yade

Hollande Sarkozy RoyalRoyal AubryFabius Fabius

J.-M. Le Pen Sarkozy PhilippotM. Le Pen M. Le PenBayrou Taubira

Royal Hollande MoscoviciSarkozy PhilippotFabius Hollande

Raffarin Sarkozy CopéJuppé PécresseFillon Hortefeux

22

GR

Frwiki Friends

12

3

4 5

15

19

SarkozyHollande

JMLePenMélenchon

Cohn−Bendit

GR

Frwiki Followers

12

3

47

8

9

11

14

15

1618

19

22

23

24

25

26

27

28

29

3031 32

33

34 35

36

37

38

39

40

SarkozyHollande

JMLePenMélenchon

Cohn−Bendit

Gqr Frwiki Friends

12

3

4 5 7

8

911

12

1516

18

19

28

SarkozyHollande

JMLePenMélenchon

Cohn−Bendit

Gqr Frwiki Followers

12

3

4

89

11

13

15

16

1718

19

20

21

22

23

2425

26

27

28

29

30

3132

33

34

3536

37

3839

40

SarkozyHollande

JMLePenMélenchon

Cohn−Bendit

(Other applications of GR → talk of K. Jaffrès-Runser.)

23

Ising-PageRankIsing model: H = −J

∑<i,j>

SiSj − h∑i

Si , Si = ±1.

(Ising, Z. Phys., 31, 253 (1925))

Ising model of Google matrix:(work in progress)Double network size (of a given network such as Wikipedia etc.) into red and bluenodes and attribute to each node i a preference with probability wr (or wb = 1− wr)to link to other red (blue) nodes:

wb

wr

i j

i j

i j

24

Preferential vector for dangling nodes or damping factor :

1

N

1...1

→ 1

N

wrwb...wrwb

PageRank Pr(i) (or Pb(i)) for red (blue) nodes.

Vote: Vr = #{nodes i | Pr(i) > Pb(i)}/N (for english Wikipedia 2017):

0

0.2

0.4

0.6

0.8

1

0 0.2 0.4 0.6 0.8 1

Vr

wr

25

Effect of Elite Nodes

Different probabilities of red preference wr,elite for NEl elite nodes and wr for othernodes. Elite nodes are selected as NEl top nodes according to K-Rank (orK∗-Rank or 2D-Rank).

Vote modification: ∆Vr = Vr,El − Vr(for english Wikipedia 2017 with NEl = 1000):

wr,elite = 0

-0.02

-0.015

-0.01

-0.005

0

0 0.2 0.4 0.6 0.8 1

∆V

r

wr

K

K*

2D

Elite = top 1000-K-Rank nodes

wr,

eli

te

rw

−1 10.5−0.5 0

0.5

0

0 0.5 1

26

Conclusions

• Google matrix constructed from directed networks (WWW, Wikipedia, Twitter,Linux kernel, PR citation network etc.) with efficient computation of PageRank,leading complex eigenvalues (also exploiting the structure of invariantsubspaces) and some eigenvectors.

• Typical power law localization of PageRank but also examples of quasiexponential localization.

• Weyl fractal scaling for certain networks (Linux kernel, PR citation network,certain Ulam networks).

• Different simple models of random PF matrices do not describe the spectra ofrealistic Google matrices.

• Approach of reduced Google matrix GR for sub-networks of Wikipedia etc.Decomposition of GR in three contributions; construction of friend/followernetwork using GR or Gqr; different language editions of Wikipedia allow to takeinto account multi-cultural aspects.

• Ising-PageRank for networks with a doubled number of (red and blue) nodes.Effect of selected elite nodes on the vote.

27

spectral properties of google matrix and beyond · ) s= s ss s sc 0 s cc where s ssis block...

Documents