![Page 1: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/1.jpg)
The PageRank Citation Ranking: Bringing Order to the Web
Larry Page et al.
Stanford University
Presented by
Guoqiang Su & Wei Li
![Page 2: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/2.jpg)
Contents
- Motivation
- Related work
- PageRank & Random Surfer Model
- Implementation
- Applications
- Conclusion
![Page 3: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/3.jpg)
Motivation
- The web is heterogeneous and unstructured
- There is no quality control on web content
- Commercial interests have an incentive to manipulate rankings
![Page 4: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/4.jpg)
Related Work
- Academic citation analysis
- Link-based analysis
- Clustering methods on the link structure
- Hubs & Authorities model
![Page 5: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/5.jpg)
Backlinks
- The link structure of the web
- Backlinks approximate a page's importance / quality
![Page 6: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/6.jpg)
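PageRank
- Pages with many backlinks are important
- Backlinks from important pages convey more importance to a page
- Problem: rank sink

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$$

where $B_u$ is the set of pages that link to $u$, $N_v$ is the number of forward links of page $v$, and $c$ is a normalization factor.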
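A minimal sketch of the simplified formula as a fixed-point iteration. The function name `simple_pagerank`, the uniform starting vector, the L1 stopping tolerance and the three-page graph are illustrative assumptions, not details from the slides. The graph is chosen so that every page has a forward link and the walk cannot get trapped, which is exactly what fails in the rank-sink case.

```python
from typing import Dict, List


def simple_pagerank(out_links: Dict[int, List[int]], n: int,
                    tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Iterate R(u) = c * sum_{v in B_u} R(v) / N_v, keeping ||R||_1 = 1."""
    rank = [1.0 / n] * n                          # assumed start: uniform vector
    for _ in range(max_iter):
        new = [0.0] * n
        for v, targets in out_links.items():
            for u in targets:                     # v is a backlink of u
                new[u] += rank[v] / len(targets)  # contributes R(v) / N_v
        c = 1.0 / sum(new)                        # normalization factor c
        new = [c * x for x in new]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
graph = {0: [1, 2], 1: [2], 2: [0]}
ranks = simple_pagerank(graph, 3)
print(ranks)  # ≈ [0.4, 0.2, 0.4]
```

Page 2 ends up tied with page 0 even though it has two backlinks, because one of them comes from the low-ranked page 1.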
![Page 7: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/7.jpg)
Rank Sink
- A cycle of pages that is pointed to by some incoming link but has no links leading out of the cycle
- Problem: the loop accumulates rank but never distributes any rank outside it
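To illustrate the rank sink, the sketch below applies the formula without any escape term to a hypothetical four-page graph: pages 0 and 3 link to each other, and page 0 also links into the loop 1 <-> 2, which has no way out. Every page here has a forward link, so no rank is lost and c stays 1; it is omitted from the code.

```python
from typing import Dict, List


def propagate(out_links: Dict[int, List[int]], rank: List[float]) -> List[float]:
    """One step of R(u) = sum_{v in B_u} R(v) / N_v (no escape term)."""
    new = [0.0] * len(rank)
    for v, targets in out_links.items():
        for u in targets:
            new[u] += rank[v] / len(targets)
    return new


# Pages 0 and 3 link to each other; page 0 also feeds the closed loop 1 <-> 2.
graph = {0: [1, 3], 1: [2], 2: [1], 3: [0]}
rank = [0.25] * 4
for _ in range(60):
    rank = propagate(graph, rank)

# Rank outside the loop halves every two steps; the loop keeps all of it.
print(round(rank[0] + rank[3], 6), round(rank[1] + rank[2], 6))  # 0.0 1.0
```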
![Page 8: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/8.jpg)
Escape Term
Solution: add a rank source

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u)$$

- $c$ is maximized subject to $\|R\|_1 = 1$
- $E(u)$ is some vector over the web pages that acts as a source of rank, e.g. uniform, or concentrated on a favorite page
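A sketch of the escape-term iteration on the same hypothetical rank-sink graph (0 <-> 3, 0 -> 1, loop 1 <-> 2). Taking E uniform with ||E||_1 = 0.15 is an assumption for illustration; the slide only says E may be uniform, a favorite page, and so on. Each step adds E(u) to the backlink sum and then picks c so that the ranks sum to 1.

```python
from typing import Dict, List


def pagerank_with_source(out_links: Dict[int, List[int]], E: List[float],
                         tol: float = 1e-12, max_iter: int = 1000) -> List[float]:
    """Iterate R(u) = c * (sum_{v in B_u} R(v) / N_v + E(u)) with ||R||_1 = 1."""
    n = len(E)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new = list(E)                             # start from the rank source E(u)
        for v, targets in out_links.items():
            for u in targets:
                new[u] += rank[v] / len(targets)
        c = 1.0 / sum(new)                        # choose c so the ranks sum to 1
        new = [c * x for x in new]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


graph = {0: [1, 3], 1: [2], 2: [1], 3: [0]}      # the rank-sink graph
E = [0.15 / 4] * 4                               # assumption: uniform E, ||E||_1 = 0.15
ranks = pagerank_with_source(graph, E)
print([round(x, 4) for x in ranks])
```

Unlike the plain iteration, pages 0 and 3 now keep a positive share of the rank, since E keeps feeding them.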
![Page 9: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/9.jpg)
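Matrix Notation

$$R = c\,(A^T + E \times e^T)\,R$$

where $A_{u,v} = 1/N_u$ if page $u$ links to page $v$ and $0$ otherwise, and $e$ is the all-ones vector. Since $\|R\|_1 = 1$, $e^T R = 1$, so this is the escape-term equation in matrix form. $R$ is the dominant eigenvector of $(A^T + E \times e^T)$, and the corresponding dominant eigenvalue is $1/c$.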
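A quick numeric check of the matrix form, assuming a hypothetical three-page graph and a uniform E with ||E||_1 = 0.15. It builds A^T + E × e^T explicitly, finds its dominant eigenvector by power iteration, and checks that R satisfies the matrix equation. Because every page in this graph has a forward link, every column of the matrix sums to 1 + ||E||_1, so the eigenvalue found should be 1.15, i.e. c = 1/1.15.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def build_matrix(out_links: Dict[int, List[int]], E: List[float]) -> Matrix:
    """M = A^T + E x e^T, where A[v][u] = 1/N_v if page v links to page u."""
    n = len(E)
    M = [[E[u] for _ in range(n)] for u in range(n)]  # (E x e^T)[u][v] = E(u)
    for v, targets in out_links.items():
        for u in targets:
            M[u][v] += 1.0 / len(targets)             # (A^T)[u][v] = A[v][u]
    return M


def matvec(M: Matrix, x: List[float]) -> List[float]:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dominant_eigenpair(M: Matrix, iters: int = 2000) -> Tuple[float, List[float]]:
    """Power iteration; entries stay non-negative, so sum() is the L1 norm."""
    n = len(M)
    R = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        y = matvec(M, R)
        lam = sum(y)                  # ||M R||_1 with ||R||_1 = 1
        R = [v / lam for v in y]
    return lam, R


graph = {0: [1, 2], 1: [2], 2: [0]}
E = [0.05, 0.05, 0.05]                # assumption: uniform E with ||E||_1 = 0.15
M = build_matrix(graph, E)
lam, R = dominant_eigenpair(M)
residual = max(abs(a - lam * b) for a, b in zip(matvec(M, R), R))
print(round(lam, 6), round(1 / lam, 6))  # 1.15 0.869565
```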
![Page 10: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/10.jpg)
Computing PageRank
- $R_0 \leftarrow S$: initialize vector over web pages
- loop:
  - $R_{i+1} \leftarrow A^T R_i$: new ranks are the sum of normalized backlink ranks
  - $d \leftarrow \|R_i\|_1 - \|R_{i+1}\|_1$: compute normalizing factor
  - $R_{i+1} \leftarrow R_{i+1} + dE$: add escape term
  - $\sigma \leftarrow \|R_{i+1} - R_i\|_1$: control parameter
- while $\sigma > \epsilon$: stop when converged
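The loop above, transcribed into Python as a sketch. In this form d equals the rank that flowed into pages with no forward links during the step, and adding dE hands it back, so ||R||_1 stays at 1 when S and E each sum to 1. The function name, the uniform choice of S and E, and the three-page graph (where page 2 has no forward links) are assumptions for illustration.

```python
from typing import Dict, List


def compute_pagerank(out_links: Dict[int, List[int]], S: List[float], E: List[float],
                     eps: float = 1e-10, max_iter: int = 1000) -> List[float]:
    n = len(S)
    R = list(S)                                   # R_0 <- S
    for _ in range(max_iter):
        # R_{i+1} <- A^T R_i: each page passes R(v)/N_v to every page it links to
        R_next = [0.0] * n
        for v, targets in out_links.items():
            for u in targets:
                R_next[u] += R[v] / len(targets)
        # d <- ||R_i||_1 - ||R_{i+1}||_1 (all entries are non-negative)
        d = sum(R) - sum(R_next)
        # R_{i+1} <- R_{i+1} + d E: add the escape term
        R_next = [r + d * e for r, e in zip(R_next, E)]
        # sigma <- ||R_{i+1} - R_i||_1
        sigma = sum(abs(a - b) for a, b in zip(R_next, R))
        R = R_next
        if sigma <= eps:                          # loop while sigma > eps
            break
    return R


graph = {0: [1, 2], 1: [2], 2: []}   # page 2 has no forward links
uniform = [1 / 3] * 3
ranks = compute_pagerank(graph, S=uniform, E=uniform)
print(ranks)  # ≈ [0.1818, 0.2727, 0.5455], i.e. [2/11, 3/11, 6/11]
```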
![Page 11: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/11.jpg)
Random Surfer Model
- PageRank corresponds to the stationary probability distribution of a random walk on the web graph
- E(u) can be read as: the random surfer periodically gets bored and jumps to a different page (chosen according to E) instead of being kept in a loop forever
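A small simulation of the random-surfer reading. The boredom probability of 0.15, the uniform jump target, and jumping whenever the surfer reaches a page without forward links are assumptions chosen for the sketch. The visit frequencies of a long simulated walk should approach the stationary distribution of the same walk, which is computed alongside by power iteration.

```python
import random
from typing import Dict, List

BORED = 0.15   # assumption: probability the surfer gets bored at each step


def surf(out_links: Dict[int, List[int]], n: int, steps: int, seed: int = 0) -> List[float]:
    """Fraction of steps a simulated random surfer spends on each page."""
    rng = random.Random(seed)
    visits = [0] * n
    page = rng.randrange(n)
    for _ in range(steps):
        visits[page] += 1
        targets = out_links.get(page, [])
        if not targets or rng.random() < BORED:
            page = rng.randrange(n)        # bored (or stuck): jump to a random page
        else:
            page = rng.choice(targets)     # follow a random forward link
    return [v / steps for v in visits]


def surfer_distribution(out_links: Dict[int, List[int]], n: int,
                        iters: int = 200) -> List[float]:
    """Stationary distribution of the same walk, by power iteration."""
    R = [1.0 / n] * n
    for _ in range(iters):
        new = [0.0] * n
        jump = 0.0                         # total probability of a random jump
        for v in range(n):
            targets = out_links.get(v, [])
            if targets:
                for u in targets:
                    new[u] += (1 - BORED) * R[v] / len(targets)
                jump += BORED * R[v]
            else:
                jump += R[v]
        R = [x + jump / n for x in new]
    return R


graph = {0: [1, 2], 1: [2], 2: [0]}
simulated = surf(graph, 3, steps=200_000)
exact = surfer_distribution(graph, 3)
print([round(x, 3) for x in simulated])
print([round(x, 3) for x in exact])
```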
![Page 12: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/12.jpg)
Implementation
- Computing resources: 24 million pages, 75 million URLs
- Memory and disk storage
  - Weight vector (4-byte floats)
  - Matrix A (linear access)
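Back-of-the-envelope sizing plus a sketch of the linear-access pattern, assuming one 4-byte float per URL: 75 million × 4 bytes = 300 million bytes, about 300 MB per rank vector. The sketch streams hypothetical (source, targets) records once, front to back, as if read sequentially from disk, so only the rank vectors have to live in memory. The record format and the function names are illustrative, not the original system's.

```python
from array import array
from typing import Iterable, List, Tuple


def vector_bytes(num_urls: int) -> int:
    """Bytes for one rank vector of 4-byte floats (typecode 'f' is a C float)."""
    return num_urls * array('f').itemsize


def streaming_step(n: int, links: Iterable[Tuple[int, List[int]]], rank: array) -> array:
    """One pass of R_{i+1} = A^T R_i over (source, targets) records.

    `links` stands in for the link file on disk: it is consumed once, in order,
    so the matrix A is never held in memory, only the two rank vectors.
    """
    new = array('f', [0.0]) * n
    for src, targets in links:
        if targets:
            share = rank[src] / len(targets)
            for t in targets:
                new[t] += share
    return new


print(vector_bytes(75_000_000))               # 300000000 bytes, about 300 MB
records = [(0, [1, 2]), (1, [2]), (2, [0])]   # hypothetical records sorted by source
r = array('f', [1 / 3] * 3)
result = streaming_step(3, iter(records), r)
print(list(result))                           # ≈ [0.3333, 0.1667, 0.5]
```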
![Page 13: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/13.jpg)
Implementation (cont.)
- Assign a unique integer ID to each URL
- Sort the links and remove dangling links
- Assign initial ranks
- Iterate until convergence
- Add the dangling links back and recompute
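A sketch of the first two preprocessing steps: giving each URL an integer ID and removing dangling links, i.e. links to pages that have no forward links. Removing one layer can leave new pages without forward links, so the sketch repeats the removal for a fixed number of passes; that pass count, the data layout and the function names are assumptions. Adding the removed pages back after convergence is not shown.

```python
from typing import Dict, List, Set, Tuple


def assign_ids(links: List[Tuple[str, str]]) -> Dict[str, int]:
    """Give every URL a unique integer ID in order of first appearance."""
    ids: Dict[str, int] = {}
    for src, dst in links:
        for url in (src, dst):
            if url not in ids:
                ids[url] = len(ids)
    return ids


def remove_dangling(out_links: Dict[int, Set[int]],
                    passes: int = 5) -> Tuple[Dict[int, Set[int]], Set[int]]:
    """Repeatedly drop pages with no forward links, and the links pointing to them.

    Returns the remaining graph and the set of removed pages.
    """
    graph = {u: set(t) for u, t in out_links.items()}
    removed: Set[int] = set()
    for _ in range(passes):
        dangling = {u for u, t in graph.items() if not t}
        if not dangling:
            break
        removed |= dangling
        for u in dangling:
            del graph[u]
        for targets in graph.values():
            targets -= dangling               # drop links into removed pages
    return graph, removed


links = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d")]
ids = assign_ids(links)
out: Dict[int, Set[int]] = {i: set() for i in ids.values()}
for src, dst in links:
    out[ids[src]].add(ids[dst])
core, removed = remove_dangling(out)
print(ids, core, removed)  # {'a': 0, 'b': 1, 'c': 2, 'd': 3} {0: {1}, 1: {0}} {2, 3}
```

Here "d" has no forward links, and once it is gone "c" has none either, which is why a second pass is needed.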
![Page 14: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/14.jpg)
Convergence Properties
- A graph $G = (V, E)$ is an expander with factor $\alpha$ if, for all (not too large) subsets $S$, $|A_S| \ge \alpha |S|$, where $A_S$ is the neighborhood of $S$
- Eigenvalue separation: a graph has good expansion if and only if its largest eigenvalue is sufficiently larger than its second-largest eigenvalue
- A random walk on such a graph converges quickly to a limiting probability distribution on the set of nodes (the walk is rapidly mixing)
![Page 15: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/15.jpg)
Convergence Properties (cont.)
- The number of iterations PageRank needs to converge grows as $O(\log |V|)$, because the web graph $G$ is rapidly mixing
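A toy illustration of why eigenvalue separation matters; it does not test the O(log |V|) claim itself. It runs the simplified iteration R ← A^T R from a single starting page on two hypothetical 8-page graphs, a complete graph (a good expander) and two 4-page cliques joined by one bridge (a poor expander, since the bridge is a bottleneck), and counts iterations until the L1 change drops below 1e-8.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]


def iterations_to_converge(out_links: Graph, n: int,
                           eps: float = 1e-8, max_iter: int = 10_000) -> int:
    """Iterations of R <- A^T R, started on page 0, until the L1 change < eps."""
    R = [0.0] * n
    R[0] = 1.0
    for i in range(1, max_iter + 1):
        new = [0.0] * n
        for v, targets in out_links.items():
            for u in targets:
                new[u] += R[v] / len(targets)
        delta = sum(abs(a - b) for a, b in zip(new, R))
        R = new
        if delta < eps:
            return i
    return max_iter


def complete_graph(n: int) -> Graph:
    return {v: [u for u in range(n) if u != v] for v in range(n)}


def two_cliques(k: int) -> Graph:
    """Two k-cliques joined by a single two-way bridge between pages k-1 and k."""
    g = {v: [u for u in range(k) if u != v] for v in range(k)}
    g.update({v: [u for u in range(k, 2 * k) if u != v] for v in range(k, 2 * k)})
    g[k - 1].append(k)
    g[k].append(k - 1)
    return g


fast = iterations_to_converge(complete_graph(8), 8)
slow = iterations_to_converge(two_cliques(4), 8)
print(fast, slow)   # the bottlenecked graph needs many more iterations
```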
![Page 16: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/16.jpg)
Personalized PageRank
The rank source E can be initialized:
- uniformly over all pages
  - copyright warnings, disclaimers and mailing-list archives then receive overly high ranks
- with total weight on a single page, e.g. the Netscape home page or John McCarthy's home page
  - ranks vary greatly depending on which single page is used as the rank source
- with anything in between, e.g. server root pages
  - this allows manipulation by commercial interests
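A sketch contrasting a uniform rank source with one that puts all its weight on a single page, on a hypothetical three-page cycle 0 -> 1 -> 2 -> 0. The total weight ||E||_1 = 0.15 is an assumption. On a cycle the uniform source leaves every page at 1/3, while concentrating E on page 0 lifts page 0 and lowers the pages farther downstream of it.

```python
from typing import Dict, List


def pagerank(out_links: Dict[int, List[int]], E: List[float],
             tol: float = 1e-12, max_iter: int = 10_000) -> List[float]:
    """Iterate R <- c (A^T R + E), with c chosen so that ||R||_1 = 1."""
    n = len(E)
    R = [1.0 / n] * n
    for _ in range(max_iter):
        new = list(E)                         # rank source contribution E(u)
        for v, targets in out_links.items():
            for u in targets:
                new[u] += R[v] / len(targets)
        c = 1.0 / sum(new)
        new = [c * x for x in new]
        done = sum(abs(a - b) for a, b in zip(new, R)) < tol
        R = new
        if done:
            break
    return R


cycle = {0: [1], 1: [2], 2: [0]}
uniform = pagerank(cycle, [0.05, 0.05, 0.05])   # assumption: ||E||_1 = 0.15
single = pagerank(cycle, [0.15, 0.0, 0.0])      # all source weight on page 0
print([round(x, 4) for x in uniform])  # [0.3333, 0.3333, 0.3333]
print([round(x, 4) for x in single])   # [0.3808, 0.3312, 0.288]
```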
![Page 17: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/17.jpg)
Applications I
- Estimate web traffic
  - Server/page aliases
  - Link/traffic disparity, e.g. porn sites, free web-mail
- Backlink predictor
  - Citation counts have been used to predict future citations
  - It is very difficult to map the citation structure of the web completely
  - PageRank avoids the local maxima that citation counts get stuck in, and performs better
![Page 18: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/18.jpg)
Applications II - Ranking Proxy
- Surfer's navigation aid
- Annotates links with their PageRank (bar graph)
- Not query dependent
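One way the bar-graph annotation could look, as a sketch. The slide does not specify the display; the linear scale relative to the highest-ranked link, the bar width and the example URLs are all assumptions. Because PageRank values on the real web span several orders of magnitude, a logarithmic scale might be more readable in practice.

```python
from typing import Dict, List


def annotate(ranks: Dict[str, float], width: int = 10) -> List[str]:
    """Prefix each link with a bar whose length is proportional to its PageRank."""
    top = max(ranks.values())
    lines = []
    for url, r in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
        filled = round(width * r / top)       # linear scale, top link = full bar
        lines.append(f"[{'#' * filled}{' ' * (width - filled)}] {url}")
    return lines


lines = annotate({"http://example.com/a": 0.5, "http://example.com/b": 0.25}, width=8)
for line in lines:
    print(line)
# [########] http://example.com/a
# [####    ] http://example.com/b
```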
![Page 19: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/19.jpg)
Issues
- Users are not random walkers
  - Content-based methods
- Starting-point distribution
  - Use actual usage data as the starting vector
- Reinforcing effects / bias toward main pages
- What about traffic to the ranked pages?
- No query-specific rank
- Linkage spam
  - PageRank favors pages that managed to get other pages to link to them
  - Linkage is not necessarily a sign of relevance, only of promotion (advertisement…)
![Page 20: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/20.jpg)
Evaluation I
![Page 21: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/21.jpg)
Evaluation II
![Page 22: Pagerank(2)](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559c1b271a28ab2c598b4885/html5/thumbnails/22.jpg)
Conclusion
- PageRank is a global ranking based on the web's graph structure
- PageRank uses backlink information to bring order to the web
- PageRank can separate out representative pages as cluster centers
- PageRank has a great variety of applications