mathematical foundations for search and surf engines
Post on 05-Feb-2016
Mathematical Foundations for Search and Surf Engines
Jean-Louis Lassez
Ryan Rossi
Lecture 1 & 2
• Introduction
• Ergodic Theorem
• Perron-Frobenius Theorem
• Power Method
• Foundations of PageRank
Search Engines
Search engine: deterministic
Ranking of sites
Mathematical foundations: Markov and Perron-Frobenius
Surf Engines
Non-deterministic: window shopping
Ranking of links
Mathematical foundations: Singular Value Decomposition
Initial approach to ranking sites: the in-degree heuristic
[Figure: directed graph on eight sites, numbered 1 to 8]
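A minimal sketch of the in-degree heuristic: rank each site by the number of links pointing to it. The link graph below is hypothetical (the edges of the slide's figure are not recoverable), and the tie-break by page number is an assumption made only to get a deterministic order.

```python
from collections import Counter

# Hypothetical link graph (NOT the one in the slide figure): page -> pages it links to.
links = {
    1: [2, 3],
    2: [3],
    3: [1],
    4: [3, 1],
}

# In-degree of a page = number of links that point to it.
indegree = Counter(target for targets in links.values() for target in targets)

# Rank by decreasing in-degree; ties broken by page number (an arbitrary choice).
ranking = sorted(links, key=lambda page: (-indegree[page], page))
print(ranking)
```

The heuristic looks only at how many links arrive at a site, not at where they come from.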
Google’s PageRank approach
Problem: rank the sites in order of most visited
Another Approach: Hubs and Authorities
Authorities: sites that contain the most important information
Hubs: sites that provide directions to the authorities
[Figure: hubs (H) and authorities (A)]
Ranking Hyperlinks
• Local
• Global
• Update
[Figures: local ranking of links; global ranking of links; a link graph on nodes A to K in 2005, updated with node L in 2006]
Part I: Ranking Sites
• The Web as a Markov Chain
• The Ergodic Theorem
• Perron-Frobenius Theorem
• Algorithms: PageRank, HITS, SALSA
Markov Chains
• Text Analysis
• Speech Recognition
• Statistical Mechanics
• Decision Science: Medicine, Commerce
… more recently …
• Bioinformatics
[Figure: state diagram with nodes labelled M1, M2, M3, d1, d2, I1, I2]
• Internet
Markov’s Ergodic Theorem (1906)
Any irreducible, finite, aperiodic Markov Chain has all states Ergodic (reachable at any time in the future) and has a unique stationary distribution, which is a probability vector.
[Figure: four-state chain s1, s2, s3, s4 with edge probabilities .45, 1, .25, .6, .75, .55, .4]
Probabilities after n steps
Steps   X1    X2    X3    X4
15      .33   .26   .26   .13
30      .33   .26   .23   .16
100     .37   .29   .21   .13
500     .38   .28   .21   .13
1000    .38   .28   .21   .13
[Figure: the same four-state chain, second slide]
Probabilities after n steps
Steps   X1    X2    X3    X4
15      .46   .20   .26   .06
30      .36   .26   .23   .13
100     .38   .28   .21   .13
500     .38   .28   .21   .13
1000    .38   .28   .21   .13
Steady State Constraints
p11x1 + p21x2 + p31x3 = x1
p12x1 + p22x2 + p32x3 = x2
p13x1 + p23x2 + p33x3 = x3
∑xi = 1, xi ≥ 0
x1, x2, x3 are probabilities
p21: probability of going to s1 from s2
Solved by the Power Iteration Method, based on the Perron-Frobenius Theorem:
lim (n → ∞) e M^n
where e is an initial vector and M is the stochastic matrix associated with the system.
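Power Method Example
[Figure: directed graph on nodes 1, 2, 3, 4]
M =
0    1    0    0
1/2  0    1/2  0
1/2  0    0    1/2
1    0    0    0
e = (1, 1, 1, 1)
lim (n → ∞) e M^n = (0.3636, 0.3636, 0.1818, 0.0909), normalised so that the entries sum to 1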
Power Method Example (same graph and same M, different initial vector)
e = 210073 [column vector; entry boundaries lost]
lim (n → ∞) e M^n = (0.3636, 0.3636, 0.1818, 0.0909), normalised so that the entries sum to 1
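The two examples above can be reproduced with a short sketch of the power method. It uses the matrix M from the slides; the function name, the fixed step count and the second starting vector are assumptions made for illustration, and the final normalisation to sum 1 is added so that the output is a probability vector.

```python
def power_iteration(M, e, steps=200):
    """Apply x <- x M (row vector times matrix) `steps` times, then normalise to sum 1."""
    x = list(e)
    n = len(x)
    for _ in range(steps):
        x = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
    total = sum(x)
    return [v / total for v in x]

# Stochastic matrix from the Power Method Example: M[i][j] = probability of going from i+1 to j+1.
M = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
]

from_ones = power_iteration(M, [1, 1, 1, 1])
from_other = power_iteration(M, [5, 0, 2, 9])  # hypothetical second starting vector

print([round(v, 4) for v in from_ones])   # [0.3636, 0.3636, 0.1818, 0.0909]
print([round(v, 4) for v in from_other])  # same limit
```

Both starting vectors lead to the same limit, (4/11, 4/11, 2/11, 1/11).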
Difficulties
• Proof, Design and Implementation
• Notion of convergence
• Deal with complex numbers
• Unique solution
………
Furthermore, it does not always work
The process does not converge, even though the solution is obvious.
[Figure: directed graph on six nodes, numbered 1 to 6]
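A sketch of the failure, assuming the figure is a directed cycle on six nodes (the edges are not recoverable from the transcript). Starting from all the probability on one node, the iteration just moves it around the cycle forever, although the uniform vector is clearly a steady state.

```python
def step(x, M):
    """One step of the power method: x <- x M (row vector times matrix)."""
    n = len(x)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]

n = 6
# Assumed graph: node i links only to node (i + 1) mod n.
cycle = [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]

x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
history = []
for _ in range(12):
    x = step(x, cycle)
    history.append(x.index(1.0))  # position of the whole probability mass

uniform = [1 / n] * n
print(history)                         # the mass keeps circulating, period 6
print(step(uniform, cycle) == uniform) # the uniform vector is a fixed point
```

The chain is irreducible but periodic, so the aperiodicity hypothesis of the Ergodic Theorem fails.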
Perron-Frobenius Theorem: Is it really necessary?
Secret Weapon:
• Fourier
• Gauss
• Tarski
• Robinson
The Power of Elimination
The elimination of the proof is an ideal seldom reached
Elimination gives us the heart of the proof
Symbolic Gaussian Elimination
Three variables:
p11x1 + p21x2 + p31x3 = x1
p12x1 + p22x2 + p32x3 = x2
p13x1 + p23x2 + p33x3 = x3
∑xi = 1
With Maple we get:
x1 = (p31p21 + p31p23 + p32p21) / Σ
x2 = (p13p32 + p12p31 + p12p32) / Σ
x3 = (p13p21 + p12p23 + p13p23) / Σ
Σ = p31p21 + p31p23 + p32p21 + p13p32 + p12p31 + p12p32 + p13p21 + p12p23 + p13p23
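The closed form can be checked numerically. The sketch below plugs a hypothetical set of transition probabilities (chosen only so that each row sums to 1) into the three expressions and verifies, with exact fractions, that the result satisfies the steady-state equations.

```python
from fractions import Fraction as F

# Hypothetical transition probabilities; P[i][j] plays the role of p(i+1)(j+1).
P = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 5), F(2, 5), F(2, 5)],
]
p12, p13 = P[0][1], P[0][2]
p21, p23 = P[1][0], P[1][2]
p31, p32 = P[2][0], P[2][1]

# Numerators of x1, x2, x3 as given on the slide.
n1 = p31 * p21 + p31 * p23 + p32 * p21
n2 = p13 * p32 + p12 * p31 + p12 * p32
n3 = p13 * p21 + p12 * p23 + p13 * p23
sigma = n1 + n2 + n3
x = [n1 / sigma, n2 / sigma, n3 / sigma]

# Steady-state check: sum_i x_i * p_ij == x_j for every j.
balanced = all(sum(x[i] * P[i][j] for i in range(3)) == x[j] for j in range(3))
print(x, balanced)
```

Note that the self-loop probabilities p11, p22, p33 do not appear in the numerators.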
[Figure: three-state chain s1, s2, s3 with edges p12, p21, p13, p31, p23, p32 and self-loops p11, p22, p33, shown over six successive slides next to the expressions for x1, x2, x3 above]
Symbolic Gaussian Elimination
System with four variables:
p21x2 + p31x3 + p41x4 = x1
p12x1 + p42x4 = x2
p13x1 = x3
p34x3 = x4
∑xi = 1
[Figure: four-state chain s1, s2, s3, s4 with edges p12, p13, p21, p31, p34, p41, p42, shown over several successive slides]
With Maple we get:
x1 = (p21p34p41 + p34p42p21 + p21p31p41 + p31p42p21) / Σ
x2 = (p31p41p12 + p31p42p12 + p34p41p12 + p34p42p12 + p13p34p42) / Σ
x3 = (p41p21p13 + p42p21p13) / Σ
x4 = p21p13p34 / Σ
Σ = p21p34p41 + p34p42p21 + p21p31p41 + p31p42p21 + p31p41p12 + p31p42p12 + p34p41p12 + p34p42p12 + p13p34p42 + p41p21p13 + p42p21p13 + p21p13p34
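A numeric counterpart of the symbolic computation, as a sketch: Gauss-Jordan elimination over exact fractions, applied to the four-variable system with hypothetical values for the seven transition probabilities (chosen so that the probabilities leaving each state sum to 1). The solver name is an assumption; the closed-form expressions are the ones listed above.

```python
from fractions import Fraction as F

def stationary_by_elimination(P):
    """Solve x P = x with sum(x) = 1 by Gauss-Jordan elimination over exact fractions."""
    n = len(P)
    # Rows 0..n-2: balance equations sum_i x_i P[i][j] - x_j = 0; last row: sum(x) = 1.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)] for j in range(n - 1)]
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

# Hypothetical values for the edges of the four-state chain.
p12, p13 = F(1, 3), F(2, 3)
p21 = F(1)
p31, p34 = F(1, 4), F(3, 4)
p41, p42 = F(1, 2), F(1, 2)
Z = F(0)
P = [
    [Z,   p12, p13, Z],
    [p21, Z,   Z,   Z],
    [p31, Z,   Z,   p34],
    [p41, p42, Z,   Z],
]

x = stationary_by_elimination(P)

# Closed form from the slide.
n1 = p21*p34*p41 + p34*p42*p21 + p21*p31*p41 + p31*p42*p21
n2 = p31*p41*p12 + p31*p42*p12 + p34*p41*p12 + p34*p42*p12 + p13*p34*p42
n3 = p41*p21*p13 + p42*p21*p13
n4 = p21*p13*p34
sigma = n1 + n2 + n3 + n4
closed_form = [n1 / sigma, n2 / sigma, n3 / sigma, n4 / sigma]
print(x == closed_form)
```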
Do you see the LIGHT?
James Brown (The Blues Brothers)
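Ergodic Theorem Revisited
If there exists a reverse spanning tree in the graph of the Markov chain associated with a stochastic system, then:
(a) the stochastic system admits the following probability vector as a solution:
xi = (sum, over the reverse spanning trees rooted at si, of the product of the transition probabilities on the tree's edges) / Σ, where Σ is the sum of these numerators over all i
(b) the solution is unique.
(c) the conditions {xi ≥ 0}, i = 1, …, n, are redundant and the solution can be computed by Gaussian elimination.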
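The pattern visible in the three- and four-variable solutions can be sketched directly: for each state, enumerate the reverse spanning trees rooted there (every other state keeps exactly one outgoing edge and all paths lead to the root) and sum the products of the edge probabilities. This is an illustrative brute-force enumeration under that reading of the theorem, not the authors' algorithm; function names are assumptions. It is run on the matrix M of the Power Method Example.

```python
from fractions import Fraction as F
from itertools import product

def reaches_root(s, parent, root):
    """Follow the chosen outgoing edges from s; True if the walk ends at root without cycling."""
    seen = set()
    while s != root:
        if s in seen:
            return False
        seen.add(s)
        s = parent[s]
    return True

def tree_weights(P):
    """For each root, sum over reverse spanning trees of the product of edge probabilities."""
    n = len(P)
    weights = []
    for root in range(n):
        others = [s for s in range(n) if s != root]
        # Each non-root state picks one outgoing edge of non-zero probability (no self-loops).
        choices = [[t for t in range(n) if t != s and P[s][t] != 0] for s in others]
        total = F(0)
        for picks in product(*choices):
            parent = dict(zip(others, picks))
            if all(reaches_root(s, parent, root) for s in others):
                w = F(1)
                for s, t in parent.items():
                    w *= P[s][t]
                total += w
        weights.append(total)
    return weights

# Matrix M from the Power Method Example, with exact fractions.
M = [
    [F(0),    F(1), F(0),    F(0)],
    [F(1, 2), F(0), F(1, 2), F(0)],
    [F(1, 2), F(0), F(0),    F(1, 2)],
    [F(1),    F(0), F(0),    F(0)],
]
w = tree_weights(M)
sigma = sum(w)
x = [v / sigma for v in w]
print(x)  # exact fractions 4/11, 4/11, 2/11, 1/11
```

The result agrees with the limit 0.3636, 0.3636, 0.1818, 0.0909 obtained by power iteration, with no iteration and no notion of convergence involved.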
Cycle
This system is now covered by the proof
[Figure: cycle on six nodes, numbered 1 to 6]
Markov Chain as a Conservation System
[Figure: three-state chain s1, s2, s3 with edges p12, p21, p13, p31, p23, p32 and self-loops p11, p22, p33]
p11x1 + p21x2 + p31x3 = x1(p11 + p12 + p13)
p12x1 + p22x2 + p32x3 = x2(p21 + p22 + p23)
p13x1 + p23x2 + p33x3 = x3(p31 + p32 + p33)
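Read as a conservation law, each equation says that the probability flowing into a state equals the probability flowing out of it. A small sketch with a hypothetical three-state matrix and its steady-state vector; treating xi·pij as the "flow" on edge i → j is the interpretation assumed here.

```python
from fractions import Fraction as F

# Hypothetical transition matrix (rows sum to 1) and its steady-state vector.
P = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 5), F(2, 5), F(2, 5)],
]
x = [F(8, 23), F(15, 46), F(15, 46)]

# flow[i][j] = probability mass moving along edge i -> j in one step.
flow = [[x[i] * P[i][j] for j in range(3)] for i in range(3)]

# Self-loops cancel on both sides, so compare flows between distinct states only.
conserved = all(
    sum(flow[i][j] for i in range(3) if i != j) == sum(flow[j][k] for k in range(3) if k != j)
    for j in range(3)
)
print(conserved)
```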
Kirchhoff’s Current Law (1847)
• The sum of currents flowing towards a node is equal to the sum of currents flowing away from the node.
[Figure: circuit on nodes 1, 2, 3 with currents i12, i21, i13, i31, i32]
i31 + i21 = i12 + i13
i32 + i12 = i21
i13 = i31 + i32
i3 + i2 = i1 + i4
Kirchhoff’s Matrix Tree Theorem (1847)
The theorem allows us to calculate the number of spanning trees of a connected graph
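A sketch of the classical statement for undirected graphs: the number of spanning trees equals the determinant of the graph Laplacian with one row and the matching column removed. The helper names and the test graphs are assumptions for illustration; exact fractions avoid rounding in the elimination.

```python
from fractions import Fraction

def determinant(A):
    """Determinant by Gaussian elimination over exact fractions."""
    A = [row[:] for row in A]
    n = len(A)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            det = -det  # a row swap flips the sign
        det *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return det

def count_spanning_trees(n, edges):
    """Matrix Tree Theorem: any cofactor of the Laplacian counts the spanning trees."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    minor = [row[1:] for row in L[1:]]  # drop row 0 and column 0
    return int(determinant(minor))

triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]  # complete graph on 4 nodes
print(count_spanning_trees(3, triangle), count_spanning_trees(4, k4))
```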
Two theorems for the price of one!!
Differences
Ergodic Theorem:
The symbolic proof is simpler and more appropriate than using Perron-Frobenius
Kirchhoff Theorem:
Our version calculates the spanning trees and not their total sum.
Internet Sites
• Kleinberg
• Google
• SALSA
• Indegree Heuristic
Google PageRank Patent
• “The rank of a page can be interpreted as the probability that a surfer will be at the page after following a large number of forward links.”
The Ergodic Theorem
Google PageRank Patent
• “The iteration circulates the probability through the linked nodes like energy flows through a circuit and accumulates in important places.”
Kirchhoff (1847)
Rank Sinks
[Figure: directed graph on seven nodes, numbered 1 to 7]
No Spanning Tree
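A sketch of what a rank sink does to the iteration, on a hypothetical four-page graph (not the seven-node graph of the figure, whose edges are not recoverable): pages 2 and 3 link only to each other, so probability that enters the pair never leaves.

```python
def step(x, M):
    """One step of the power method: x <- x M (row vector times matrix)."""
    n = len(x)
    return [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]

# Hypothetical graph: 0 -> 1, 1 -> 0 and 1 -> 2, 2 -> 3, 3 -> 2.
M = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

x = [0.25, 0.25, 0.25, 0.25]
for _ in range(200):
    x = step(x, M)

mass_outside_sink = x[0] + x[1]
mass_in_sink = x[2] + x[3]
print(mass_outside_sink, mass_in_sink)
```

After enough steps essentially all the probability sits in the sink, so pages 0 and 1 end up with rank close to zero whatever their links.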
References
• Kirchhoff, G. "Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Verteilung galvanischer Ströme geführt wird." Ann. Phys. Chem. 72, 497-508, 1847.
• A. A. Markov. "Распространение закона больших чисел на величины, зависящие друг от друга" [Extension of the law of large numbers to quantities that depend on each other]. Известия Физико-математического общества при Казанском университете [Bulletin of the Physico-Mathematical Society at Kazan University], 2nd series, vol. 15, pp. 135-156, 1906.
• H. Minkowski, Geometrie der Zahlen (Leipzig, 1896).
• J. Farkas, "Theorie der einfachen Ungleichungen," J. F. Reine u. Ang. Mat., 124, 1-27, 1902.
• H. Weyl, "Elementare Theorie der konvexen Polyeder," Comm. Helvet., 7, 290-306, 1935.
• A. Charnes, W. W. Cooper, "The Strong Minkowski Farkas-Weyl Theorem for Vector Spaces Over Ordered Fields," Proceedings of the National Academy of Sciences, pp. 914-916, 1958.
• E. Steinitz. Bedingt Konvergente Reihen und Konvexe Systeme. J. reine angew. Math., 146:1-52, 1916.
• K. Jeev, J-L. Lassez: Symbolic Stochastic Systems. MSV/AMCS 2004: 321-328
• V. Chandru, J-L. Lassez: Qualitative Theorem Proving in Linear Constraints. Verification: Theory and Practice 2003: 395-406
• Jean-Louis Lassez: From LP to LP: Programming with Constraints. DBPL 1991: 257-283