The PageRank Citation Ranking: Bringing Order to the Web

Larry Page et al., Stanford University, Technical Report, 1998

Presented by: Ratiya Komalarachun

Contents

Motivation

Related work

Background Knowledge

PageRank & Random Surfer Model

Implementation

Application

Conclusion

Motivation

Web: heterogeneous and unstructured

No quality control on the web

Commercial interest in manipulating rankings

Related Work

Academic citation analysis

Link based analysis

Clustering methods of link structure

Hubs & Authorities Model based on an eigenvector calculation

Hubs & Authorities Model

[Figure: hubs and authorities]

Hubs & Authorities Model

Mutually reinforcing relationship

“A good hub is a page that points to many good authorities”

“A good authority is a page that is pointed to by many good hubs”
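The mutual reinforcement can be sketched as an iteration in which authority scores are recomputed from hub scores and vice versa. This is a minimal illustration, not the slide's own code: the function name `hits`, the toy graph, the fixed iteration count, and the Euclidean normalization are all assumptions.

```python
import math

def hits(out_links, iterations=50):
    """Iterate hub/authority scores on a graph given as page -> list of linked pages."""
    pages = set(out_links) | {t for targets in out_links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages that point to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in out_links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of the authority scores of the pages it points to.
        hub = {p: sum(auth[t] for t in out_links.get(p, [])) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical toy graph: three hubs pointing at two authorities.
graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(graph)
```

In this toy graph `a1` ends up with the higher authority score because all three hubs point to it, and `h3` ends up with a lower hub score than `h1` and `h2` because it points to only one authority.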

Link Structure of the Web

Forward links (outedges)

Backlinks (inedges)

Approximation of importance / quality

PageRank

A page has high rank if the sum of the ranks of its backlinks is high

Backlinks coming from important pages convey more importance to a page

Problem: Dangling Links, Rank Sink

Dangling Links

PageRank Calculation

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v}$$

Given:

R(u) = rank of u, R(v) = rank of v

c < 1 (used for normalization)

$N_v$ = number of links from v

$B_u$ = the set of pages that point to u
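A direct reading of this formula for a single page can be sketched as follows. The function name `simplified_rank`, the page names, and the choice c = 1 are assumptions made for illustration only.

```python
def simplified_rank(page, ranks, out_links, c=1.0):
    """R(u) = c * sum over backlinks v of R(v) / N_v, for one page u."""
    # B_u: every page v whose forward links include `page`.
    backlinks = [v for v, targets in out_links.items() if page in targets]
    # Each backlink v contributes its rank divided by its number of forward links N_v.
    return c * sum(ranks[v] / len(out_links[v]) for v in backlinks)

# Hypothetical graph: p has two forward links, q has three.
out_links = {"p": ["x", "y"], "q": ["x", "z1", "z2"]}
ranks = {"p": 100.0, "q": 9.0}

rank_x = simplified_rank("x", ranks, out_links)  # 100/2 + 9/3 = 53.0
rank_y = simplified_rank("y", ranks, out_links)  # 100/2 = 50.0
```

Page `x` receives half of p's rank and a third of q's rank, while `y` receives only half of p's rank.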

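[Figure: simplified PageRank calculation. A page with rank 100 passes 50 along each of its two forward links; a page with rank 9 passes 3 along each of its three forward links; a page receiving 50 and 3 has rank 53, and a page receiving only 50 has rank 50.]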

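Rank Sink

A cycle of pages that is pointed to by some incoming link but has no links leading out of the cycle

Problem: the cycle accumulates rank but never distributes any rank to pages outside it

[Figure: a cycle of pages acting as a rank sink; values of .6 appear in the figure]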
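The effect can be reproduced on a toy graph. The sketch below assumes the simplified update with no escape term and c = 1; the page names and the 50-step horizon are hypothetical. Pages `a` and `b` link to each other, `a` also links into a two-page cycle, and the cycle has no link back out.

```python
def propagate(ranks, out_links):
    """One step of the simplified update: each page splits its rank over its forward links."""
    new_ranks = {page: 0.0 for page in ranks}
    for page, targets in out_links.items():
        share = ranks[page] / len(targets)
        for target in targets:
            new_ranks[target] += share
    return new_ranks

# sink1 and sink2 point only at each other, so rank that enters never leaves.
out_links = {
    "a": ["b", "sink1"],
    "b": ["a"],
    "sink1": ["sink2"],
    "sink2": ["sink1"],
}
ranks = {page: 0.25 for page in out_links}

sink_history = []
for _ in range(50):
    ranks = propagate(ranks, out_links)
    sink_history.append(ranks["sink1"] + ranks["sink2"])
```

The share of rank held by the cycle never decreases from one step to the next, and after 50 steps the cycle holds almost all of it while `a` and `b` are left with almost none.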

Escape Term

Solution: Rank Source

E(u) is some vector over the web pages (uniform, favorite page, etc.)

$$R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + c\,E(u)$$

Matrix Notation

$$R = c\,(A^T + E e^T)\,R$$

R is the dominant eigenvector of $(A^T + E e^T)$, where e is the all-ones vector; with the equation written this way, the corresponding dominant eigenvalue is $1/c$
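The step from the per-page escape-term equation to the matrix form can be written out as follows. The derivation assumes that $A_{v,u} = 1/N_v$ when page v links to page u and 0 otherwise, that e is the vector of all ones, and that R is non-negative and normalized so that $\|R\|_1 = 1$.

```latex
\begin{aligned}
R &= c\,(A^T R + E)
  && \text{escape-term equation in vector form} \\
e^T R &= \|R\|_1 = 1
  && \text{since } R \ge 0 \\
E &= E\,(e^T R) = (E e^T)\,R \\
R &= c\,(A^T + E e^T)\,R
  && \Longleftrightarrow\quad (A^T + E e^T)\,R = \tfrac{1}{c}\,R
\end{aligned}
```

Under these assumptions R is an eigenvector of $A^T + E e^T$ with eigenvalue $1/c$.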

Computing PageRank

- initialize vector over web pages: $R_0 \leftarrow S$

Loop:

- new ranks, sum of normalized backlink ranks: $R_{i+1} \leftarrow A^T R_i$

- compute normalizing factor: $d \leftarrow \|R_i\|_1 - \|R_{i+1}\|_1$

- add escape term: $R_{i+1} \leftarrow R_{i+1} + dE$

- control parameter: $\delta \leftarrow \|R_{i+1} - R_i\|_1$

While $\delta > \epsilon$ (stop when converged)
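The loop can be sketched in plain Python. One assumption needs to be stated loudly: if every page has forward links and the propagated ranks are not scaled, the normalizing factor d is always zero and the escape term never takes effect. This sketch therefore scales the propagated ranks by a constant `c` (0.85 here, a hypothetical choice), takes S and E to be uniform, and uses illustrative names throughout.

```python
def pagerank(out_links, escape, c=0.85, epsilon=1e-8, max_iterations=1000):
    """Iterate the loop above on a graph given as page -> list of linked pages.

    Assumes every link target also appears as a key of `out_links`
    and that the entries of `escape` (the vector E) sum to 1.
    """
    pages = list(out_links)
    ranks = {p: 1.0 / len(pages) for p in pages}  # R_0 <- S (uniform S assumed)
    for _ in range(max_iterations):
        # R_{i+1} <- A^T R_i, scaled by c (the scaling is an assumption of this sketch).
        new_ranks = {p: 0.0 for p in pages}
        for source, targets in out_links.items():
            for target in targets:
                new_ranks[target] += c * ranks[source] / len(targets)
        # d <- ||R_i||_1 - ||R_{i+1}||_1: the rank lost in this step.
        d = sum(ranks.values()) - sum(new_ranks.values())
        # R_{i+1} <- R_{i+1} + dE: redistribute the lost rank through E.
        for p in pages:
            new_ranks[p] += d * escape[p]
        # delta <- ||R_{i+1} - R_i||_1: stop when the ranks no longer change.
        delta = sum(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if delta <= epsilon:
            break
    return ranks

# Hypothetical three-page graph with a uniform escape vector.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
uniform = {p: 1.0 / len(graph) for p in graph}
result = pagerank(graph, uniform)
```

On this graph `c` receives the highest rank, followed by `a` and then `b`, and the ranks sum to 1 because the lost rank d is returned through E at every step.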

Random Surfer Model

PageRank vs. Random Surfer Model

E(u) = “the random surfer periodically gets bored and jumps to a different page, so is not kept in a loop forever”
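The random surfer can be simulated directly by counting how often each page is visited. This is an illustrative Monte Carlo sketch: the jump probability of 0.15, the uniform choice of jump target, the fixed seed, and the toy graph are all assumptions.

```python
import random

def random_surfer(out_links, steps=100_000, jump_probability=0.15, seed=0):
    """Return the fraction of steps the surfer spends on each page."""
    rng = random.Random(seed)
    pages = list(out_links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if rng.random() < jump_probability or not out_links[current]:
            # Bored (or stuck on a page without forward links): jump to a random page.
            current = rng.choice(pages)
        else:
            # Otherwise follow one of the current page's forward links at random.
            current = rng.choice(out_links[current])
    return {p: count / steps for p, count in visits.items()}

# Hypothetical graph: nothing links to b, so it is reached only by jumps.
frequencies = random_surfer({"a": ["c"], "b": ["c"], "c": ["a"]})
```

Page `b` is visited far less often than `a` or `c`, since the surfer only lands on it after getting bored, while the a/c loop no longer traps the surfer forever.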

Implementation

Computing resources

- 24 million pages

- 75 million URLs

- Process 550 pages/sec

Memory and disk storage

- Weight vector (4-byte float)

- Matrix A (linear access)
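As a rough check on these storage figures, assuming one 4-byte weight per URL and decimal megabytes, the weight vector works out to:

```latex
75 \times 10^{6}\ \text{URLs} \times 4\ \text{bytes}
  = 3 \times 10^{8}\ \text{bytes}
  = 300\ \text{MB}
```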

Implementation

Assign a unique integer ID

Sort and remove dangling links

Rank initial assignment

Iteration until convergence

Add back dangling links and recompute
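The dangling-link removal step can be sketched as below. It assumes that a dangling page is one with no forward links and that removal is repeated, because deleting one dangling page can leave another page without forward links. The function name and the toy graph are hypothetical.

```python
def remove_dangling(out_links):
    """Repeatedly drop pages with no forward links; return (remaining links, removed pages)."""
    links = {page: set(targets) for page, targets in out_links.items()}
    # Make sure every link target is also present as a page.
    for targets in list(links.values()):
        for target in targets:
            links.setdefault(target, set())
    removed = []
    while True:
        dangling = [page for page, targets in links.items() if not targets]
        if not dangling:
            break
        removed.extend(dangling)
        for page in dangling:
            del links[page]
        # Delete links that pointed at the removed pages.
        for targets in links.values():
            targets.difference_update(dangling)
    return links, removed

# d has no forward links; once d is gone, c has none either.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["d"]}
core, removed = remove_dangling(graph)
```

Here `d` is removed in the first pass and `c` in the second, leaving only `a` and `b`. The removed pages would be added back after the ranks of the remaining pages have converged.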

Convergence Properties

Analyzed using the theory of random walks on graphs

Convergence takes roughly O(log |V|) iterations because the web graph G is rapidly mixing


Searching with PageRank

Using title search

Comparing with AltaVista

Sample Results

Some Applications

Estimate web traffic

Backlink predictor

User Navigation

Conclusion

PageRank is a global ranking based on the web's graph structure

PageRank uses backlink information to bring order to the web

PageRank can separate out representative pages as cluster centers

PageRank has a great variety of applications
