spectral properties of google matrix and beyond · ) s= s ss s sc 0 s cc where s ssis block...
TRANSCRIPT
Spectral properties of Google matrix andbeyond
Klaus M. Frahm1
Quantware MIPS Center Université Paul SabatierK. Jaffrès-Runser2, D.L. Shepelyansky11 Laboratoire de Physique Théorique du CNRS, IRSAMC2 Institut de Recherche en Informatique de Toulouse, INPT
Google Matrix: fundamentals, applications and beyond
IHES, October 15 – 18, 2018
Perron-Frobenius operators
Physical system evolving by a discrete Markov process:
pi(t + 1) =∑j
Gij pj(t) with∑i
Gij = 1 , Gij ≥ 0 .
Transition probabilities Gij ⇒ Perron-Frobenius matrix.Conservation of probability:
∑i pi(t + 1) =
∑i pi(t) = 1.
In general GT 6= G and complex eigenvalues λ with |λ| ≤ 1.eT = (1, . . . , 1) is left eigenvector with λ1 = 1⇒ existence of (atleast) one right eigenvector P for λ1 = 1 also called PageRank in thecontext of Google matrices: GP = 1 P
For non-degenerate λ1 and finite gap |λ2| < 1: limt→∞
p(t) = P
⇒ Power method to compute P with rate of convergence ∼ |λ2|t.2
Google matrixConstruct an Adjacency matrix A for a directed network with Nnodes and N` links by :
Ajk = 1 if there is a link k → j and Ajk = 0 otherwise.
Sum-normalization of each non-zero column of A ⇒ S0.
Replacing each zero column (dangling nodes) with e/N ⇒ S.
Eventually apply the damping factor α < 1 (typically α = 0.85):
Google matrix: G(α) = αS + (1− α)1
NeeT .
⇒ λ1 is non-degenerate and |λ2| ≤ α.
Same procedure for inverted network: A∗ ≡ AT where S∗ and G∗ are obtained in
the same way from A∗. Note: in general: S∗ 6= ST . Leading (right) eigenvector of S∗
or G∗ is called CheiRank .
(Brin and Page, Comp. Networks ISDN Syst. 30, 107 (1998).)
3
Example:
A =
0 1 1 0 01 0 1 1 00 1 0 1 00 0 1 0 00 0 0 1 0
S0 =
0 1
213 0 0
1 0 13
13 0
0 12 0 1
3 0
0 0 13 0 0
0 0 0 13 0
, S =
0 1
213 0 1
5
1 0 13
13
15
0 12 0 1
315
0 0 13 0 1
5
0 0 0 13
15
4
PageRank(KF, Georgeot and Shepelyansky, J. Phys. A 44 465101 (2011).)
Example for university networks of Cambridge 2006 and Oxford 2006 (N ≈ 2× 105
and N` ≈ 2× 106).
P (i) =∑j
Gij P (j)
P (i) represents the “importance” of “node/page i” obtained as sum of all other pages
j pointing to i with weight P (j). Sorting of P (i) ⇒ index K(i) for order of
appearance of search results in search engines such as Google.
5
Numerical methods
• Power method to obtain P or P ∗: rate of convergence for G(α) ∼ α t.
• 2d Rank : representation of nodes (node-density) in K −K∗−plane
where K (K∗) is sorting index of PageRank P (CheiRank P ∗).
• Complex eigenvalues: Full “exact” diagonalization for N . 104 or Arnoldimethod to determine largest nA ∼ 102 − 104 eigenvalues.
• Invariant subspaces in realistic WWW networks⇒ large degeneracies of λ1:
⇒ S =
(Sss Ssc0 Scc
)where Sss is block diagonal according to the subspaces and can be diagonalizedseparately. Scc corresponds to the core space with |λmax| < 1.
• Strange numerical problems to determine accurately “small” eigenvalues, inparticular for (nearly) triangular network structure due to large Jordan-blocks(e.g. citation network of Physical Review and recently Bitcoin network).
6
Applications
• University WWW-networks (N ∼ 2× 105, N` ∼ 2× 106).
• Linux Kernel network (N ∼ 3× 105)
(Nodes = kernel functions)
(Ermann, Chepelianskii and Shepelyansky, EPJB 79, 115 (2011))
• Wikipedia, different language editions (N ∼ 4× 106, N` ∼ 108)
• World trade network (N ∼ 104 but more complicated structure).(Ermann and Shepelyansky, Acta Physica Polonica A 120(6A), A158 (2011))
• Twitter 2009 (N ∼ 4× 107, N` ∼ 1.5× 109).
• Physical Review citation network (N ∼ 5× 105, N` ∼ 5× 106).
(Review: Ermann, KF, and Shepelyansky, Rev. Mod. Phys. 87, 1261 (2015). )
7
Example: Wikipedia 2009
(Ermann, KF, and Shepelyansky, EPJB 86, 193 (2013))N = 3282257 nodes, N` = 71012307 network links.
left (right): PageRank (CheiRank)
black: PageRank (CheiRank) at α = 0.85grey: PageRank (CheiRank) at α = 1− 10−8
red and green: first two core space eigenvectors
blue and pink: two eigenvectors with large imaginary part in the eigenvalue
8
“Themes” of certain Wikipedia eigenvectors:
-1 -0.5 0 0.5 10
0.5
-0.82 -0.8 -0.78 -0.76 -0.74 -0.72
0
0.8 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98
0
Aus
tral
ia
Switz
erla
ndE
ngla
nd
Bangladesh
New Zeland
Poland
KuwaitIceland
Austria
Bra
zil
Chi
na
Aus
tral
ia
Aus
tral
iaC
anad
a
England
muscle-arterybiology
DNAR
NA
prot
ein
skin
muscle-artery
muscle-artery
mathematics
math (function, geometry,surface, logic-circuit)
rail
war
Gaa
fu A
lif A
toll
Qua
ntum
Lea
p
Tex
as-D
alla
s-H
oust
on
Lan
guag
e
music
Bible
poetryfootball
song
poetryaircraft
Q: How to analyze network structure for such and also more generalsub-groups ? (⇒ reduced Google matrix, see below)
9
Anderson Localization(Anderson, Phys. Rev. 109, 1492 (1958); Nobel Prize 1977).
Hψn = −tψn−1 − tψn+1 + εnψn = Eψn , −W2 < εn <
W2
Exponential localization in d = 1, 2 (for W > 0) and d = 3 (for W > Wc = 16.5t).
For typical PageRank vectors of WWW or Wikipedia there is typically a power lawlocalization : P (j) ∼ K(j)−β with β ≈ 0.9.
However, some special cases for leading core space eigenvector :1
10-5
10-10
10-15
10-20
0 100 200 300 400
ψ1(c
ore
)
K (core)
Cambridge 2002Cambridge 2003Cambridge 2004Cambridge 2005
Leeds 20061− λ(core)1
Cambridge 2002 3.996 · 10−17
Cambridge 2003 4.01 · 10−17
Cambridge 2004 2.91 · 10−9
Cambridge 2005 4.01 · 10−17
Leeds 2006 3.126 · 10−19
(KF, Georgeot and Shepelyansky, J. Phys. A 44, 465101 (2011).)
10
Fractal Weyl law(KF, Eom, Shepelyansky. PRE 89, 052814 (2014).)Number of states Nλ with |λ| > λc: Nλ ∼ (Nt)
b with Nt = network size ofPhysical Review citation network at time t:
103
104
105
106
1920 1940 1960 1980 2000
Nt
t
t0 = 1791
τ = 11.4
Nt
2(t-t0)/τ
100
101
102
103
103
104
105
106
Nλ
Nt
a = 0.32
b = 0.51
λc = 0.50
Nλ = a (Nt)b
nA = 4000
nA = 2000 10
0
101
102
103
103
104
105
106
Nλ
Nt
a = 0.24
b = 0.47
λc = 0.65
Nλ = a (Nt)b
nA = 4000
nA = 2000
0.5
0.6
0.7
0.8
0.2 0.4 0.6 0.8 1
b
λc
(b)
(a) (c)
(d)
Linux kernel function network: Nλ ∼ N ν , ν ≈ 0.65 (λc = 0.1 or 0.25).(Ermann, Chepelianskii and Shepelyansky, EPJB 79, 115 (2011))(see also talk of S. Nonnenmacher.)
11
Random Perron-Frobenius matrices(KF, Eom, Shepelyansky. PRE 89, 052814 (2014).)
Random matrix ensembles for hermitian/hamiltonian matrices well known since1955 (Wigner, Annals of Mathematics. 62, 548 (1955); Nobel Prize 1963).
Construct random matrix ensembles Gij for PF-matrices such that:
Gij ≥ 0, Gij (approximately) non-correlated, distributed with same distributionP (Gij) (of finite variance σ2),∑
j
Gij = 1 ⇒ 〈Gij〉 = 1/N
⇒ average of G has one eigenvalue λ1 = 1 (⇒ “flat” PageRank) and othereigenvalues λj = 0 (for j 6= 1).
degenerate perturbation theory for the fluctuations⇒ circular eigenvalue densitywith R =
√Nσ and one unit eigenvalue.
full ⇒ R = 1/√
3N
sparse with Q non-zero elements per column ⇒ R ∼ 1/√Q
power law with P (G) ∼ G−b for 2 < b < 3 ⇒ R ∼ N 1−b/2
12
Numerical verification:
uniform full:N = 400
uniform sparse:N = 400,Q = 20
power law:b = 2.5
triangularrandom andaverage
constant sparse:N = 400,Q = 20
power law case:Rth ∼ N−0.25
13
Reduced Google matrix(KF, Shepelyansky arXiv:1602.02394; KF, Jaffrès-Runser, Shepelyansky, EPJB 89, 269 (2016).)
Consider a sub-network with Nr � N nodes providing adecomposition in reduced and scattering nodes:
G =
(Grr Grs
Gsr Gss
), P =
(PrPs
)GP = P ⇒ GRPr = Pr
with the effective reduced Google matrix :
GR = Grr + Grs(1−Gss)−1Gsr
containing direct link contributions from Grr andscattering contributions from Grs(1−Gss)
−1Gsr.GR has the same symmetries as G: (GR)ij ≥ 0 and
∑i(GR)ij = 1.
Analogy with quantum chaotic scattering: S = 1− 2iW † 1E−H+iWW †
W .(Mahaux and Weidenmüller, Phys. Rev. 170, 847 (1968).)
14
Problem: practical evaluation of (1−Gss)−1 is very difficult for large
network sizes and the expansion
(1−Gss)−1 =
∞∑l=0
G lss
typically converges very slowly since the leading eigenvalue λc of Gss
is very close to unity: 1− λc � 1.More efficient expression:
(1−Gss)−1 = Pc
1
1− λc+Qc
∞∑l=0
G lss
with Gss = QcGssQc, the projectors Pc = ψRψTL ,Qc = 1− Pc and
ψR,L are right/left eigenvectors of Gss for λc such that ψTLψR = 1.The leading eigenvalue of Gss is close to α = 0.85⇒ rapid convergence of the matrix series.
15
⇒ three components of GR:
GR = Grr + Gpr + Gqr
Grr = rr sub-block of G ⇒ direct links
Gpr = GrsψRψ
TL
1− λcGsr =
ψRψTL
1− λc, rank 1
withψR = GrsψR , ψTL = ψTLGsr
Gqr = Grs
[Qc
∞∑l=0
G lss
]Gsr ⇒ indirect links
Typically: Gpr is numerically dominant but
Gqr has a more interesting structure allowing to identify friends/followers.
16
http://www.quantware.ups-tlse.fr/QWLIB/wikipolitnet/
Application : Main network = Wikipedia 2013, different language editions.Groups = leading 20/40 politicians of certain countries or G20 state leaders.
Node density in lnK − lnK∗−plane:
20 US, Enwiki
20 UK, Enwiki
G20, Enwiki
40 DE, Dewiki
40 FR, Frwiki
20 RU, Ruwiki
17
Positions in K −K∗−plane
0
5
10
15
20
0 5 10 15 20
K*
K
Enwiki G20 EN
PutinErdogan
ObamaHarper
KirchnerCameron
RousseffHollande
SinghGillard
JintaoMyung−bak
CalderónZuma
MerkelAbdullah of Saudi Arabia
YudhoyonoMonti
NodaBarroso
0
5
10
15
20
25
30
35
40
0 5 10 15 20 25 30 35 40K
*
K
Frwiki Politicians FR
SarkozyHollande
Jean−Marie Le PenMélenchon
BayrouJuppé
Kosciusko−MorizetFillon
DelanoëFabius
MontebourgBorloo
RoyalStrauss−Kahn
Marine Le PenAubry
Datide Villepin
LangJoly
MoscoviciCopé
Cohn−BenditHortefeux
DuflotGuigou
PécresseLagarde
VallsTaubira
YadeRaffarin
Dupont−AignanMorin
BesancenotSapin
BartoloneGuaino
GuéantPhilippot
18
Enwiki G20 state leader
1− λc = 2.465× 10−4
GR
Grr
Gpr
Gqr
19
Grr Enwiki G20 EN
Obama
Ob
ama
Putin
Pu
tin
Cameron
Cam
eron
Harper
Harp
er
Merkel
Merk
el
Singh
Sin
gh
Jintao
Jintao
Hollande
Ho
lland
e
Gillard
Gillard
Barroso
Barro
so
Zuma
Zu
ma
MontiM
on
tiKirchner
Kirch
ner
Erdogan
Erd
og
an
Calderón
Cald
erón
Abdullah
Ab
du
llah
Yudhoyono
Yu
dh
oy
on
o
Rousseff
Ro
usseff
Myung−bak
My
un
g−
bak
Noda
No
da
20
Gqr Enwiki G20 EN
Obama
Obam
a
Putin
Putin
Cameron
Cam
eron
Harper
Harp
er
Merkel
Merk
el
Singh
Sin
gh
Jintao
Jintao
Hollande
Hollan
de
Gillard
Gillard
Barroso
Barro
so
Zuma
Zum
a
Monti
Monti
Kirchner
Kirch
ner
Erdogan
Erd
ogan
Calderón
Cald
erón
Abdullah
Abdullah
Yudhoyono
Yudhoyono
RousseffR
ousseff
Myung−bak
Myung−
bak
Noda
Noda
G20 EN EnwikiName Friends FollowersObama Putin Noda
Merkel AbdullahCalderón Myung-bak
Putin Merkel NodaObama Myung-bakBarroso Merkel
Cameron Putin GillardObama BarrosoMerkel Hollande
Harper Obama MerkelCameron GillardPutin Myung-bak
Merkel Barroso HollandePutin MontiObama Kirchner
21
Gqr Frwiki Politicians FR
Sarkozy
Sark
ozy
Hollande
Ho
lland
e
Jean−Marie Le Pen
Jean−
Marie L
e Pen
Royal
Ro
yal
Raffarin
Raffarin
Villepin
Villep
in
Fillon
Fillo
n
Bayrou
Bay
rou
Fabius
Fab
ius
Strauss−Kahn
Strau
ss−K
ahn
Lang
Lan
g
Juppé
Jup
pé
Borloo
Bo
rloo
Delanoë
Delan
oë
Mélenchon
Mélen
cho
n
Marine Le Pen
Marin
e Le P
en
Lagarde
Lag
arde
Aubry
Au
bry
Cohn−Bendit
Co
hn
−B
end
it
Pécresse
Pécresse
Copé
Co
pé
Kosciusko−Morizet
Ko
sciusk
o−
Mo
rizet
Montebourg
Mo
nteb
ou
rg
Bartolone
Barto
lon
e
Dati
Dati
Besancenot
Besan
ceno
t
Dupont−Aignan
Du
po
nt−
Aig
nan
Joly
Joly
Taubira
Tau
bira
Guigou
Gu
igo
u
Hortefeux
Ho
rtefeux
Yade
Yad
e
Moscovici
Mo
scov
ici
Valls
Valls
GuéantG
uéan
tMorin
Mo
rin
Duflot
Du
flot
Sapin
Sap
in
Guaino
Gu
aino
Philippot
Ph
ilipp
ot
Politicians FR FrwikiName Friends FollowersSarkozy Fillon Hortefeux
J.-M. Le Pen PécresseHollande Yade
Hollande Sarkozy RoyalRoyal AubryFabius Fabius
J.-M. Le Pen Sarkozy PhilippotM. Le Pen M. Le PenBayrou Taubira
Royal Hollande MoscoviciSarkozy PhilippotFabius Hollande
Raffarin Sarkozy CopéJuppé PécresseFillon Hortefeux
22
GR
Frwiki Friends
12
3
4 5
15
19
SarkozyHollande
JMLePenMélenchon
Cohn−Bendit
GR
Frwiki Followers
12
3
47
8
9
11
14
15
1618
19
22
23
24
25
26
27
28
29
3031 32
33
34 35
36
37
38
39
40
SarkozyHollande
JMLePenMélenchon
Cohn−Bendit
Gqr Frwiki Friends
12
3
4 5 7
8
911
12
1516
18
19
28
SarkozyHollande
JMLePenMélenchon
Cohn−Bendit
Gqr Frwiki Followers
12
3
4
89
11
13
15
16
1718
19
20
21
22
23
2425
26
27
28
29
30
3132
33
34
3536
37
3839
40
SarkozyHollande
JMLePenMélenchon
Cohn−Bendit
(Other applications of GR → talk of K. Jaffrès-Runser.)
23
Ising-PageRankIsing model: H = −J
∑<i,j>
SiSj − h∑i
Si , Si = ±1.
(Ising, Z. Phys., 31, 253 (1925))
Ising model of Google matrix:(work in progress)Double network size (of a given network such as Wikipedia etc.) into red and bluenodes and attribute to each node i a preference with probability wr (or wb = 1− wr)to link to other red (blue) nodes:
wb
wr
i j
i j
i j
24
Preferential vector for dangling nodes or damping factor :
1
N
1...1
→ 1
N
wrwb...wrwb
PageRank Pr(i) (or Pb(i)) for red (blue) nodes.
Vote: Vr = #{nodes i | Pr(i) > Pb(i)}/N (for english Wikipedia 2017):
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
Vr
wr
25
Effect of Elite Nodes
Different probabilities of red preference wr,elite for NEl elite nodes and wr for othernodes. Elite nodes are selected as NEl top nodes according to K-Rank (orK∗-Rank or 2D-Rank).
Vote modification: ∆Vr = Vr,El − Vr(for english Wikipedia 2017 with NEl = 1000):
wr,elite = 0
-0.02
-0.015
-0.01
-0.005
0
0 0.2 0.4 0.6 0.8 1
∆V
r
wr
K
K*
2D
Elite = top 1000-K-Rank nodes
wr,
eli
te
rw
−1 10.5−0.5 0
0.5
0
0 0.5 1
26
Conclusions
• Google matrix constructed from directed networks (WWW, Wikipedia, Twitter,Linux kernel, PR citation network etc.) with efficient computation of PageRank,leading complex eigenvalues (also exploiting the structure of invariantsubspaces) and some eigenvectors.
• Typical power law localization of PageRank but also examples of quasiexponential localization.
• Weyl fractal scaling for certain networks (Linux kernel, PR citation network,certain Ulam networks).
• Different simple models of random PF matrices do not describe the spectra ofrealistic Google matrices.
• Approach of reduced Google matrix GR for sub-networks of Wikipedia etc.Decomposition of GR in three contributions; construction of friend/followernetwork using GR or Gqr; different language editions of Wikipedia allow to takeinto account multi-cultural aspects.
• Ising-PageRank for networks with a doubled number of (red and blue) nodes.Effect of selected elite nodes on the vote.
27