page rank - users.cs.fiu.edugiri/teach/5768/f19/lecs/unitx8-pagerank.pdfpagerank! essential...

25
Page Rank

Upload: others

Post on 01-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

Page Rank

Page 2: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

PageRank & Link AnalysisLawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. ”The PageRank citation ranking: Bringing order to the web.” 1999.

Page 3: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Early Search Engines

! Crawl the web, collect terms, build inverted index (term to URL mapping)

! Spammers found tricks to beat the system ❑ Add irrelevant terms to URL (Term Spam) ❑ Copy top hit URL to your page

11/18/19

"3

Page 4: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Importance of Web Pages

! If more links into X, then X is important ! If important pages link to X, then X is important

❑ Chicken and Egg problem for importance

! Random Walk Idea ❑ Simulate random walk & count # of recurring visits ❑ Spam farm problem to trap random walker

11/18/19

"4

Page 5: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Random Walks & PageRank

! Perform random walks ! Importance of page is proportional to how often you visit a node ! During web search, prefer to report important pages

11/18/19

"5

Page 6: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Example

11/18/19

"6

Page 7: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Example

11/18/19

"7

Page 8: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Example

11/18/19

"8

Page 9: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Random Walks

11/18/19

"9

Page 10: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Computing the Stationary Prob

11/18/19

"10

Page 11: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Random Walks w/ Dead Ends

11/18/19

"11

Page 12: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Rand Walks w/o Dead Ends

11/18/19

"12

Page 13: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Rand Walk w/ Teleporting (0.8)

11/18/19

"13

Page 14: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

PageRank

! Essential component of Google search engine ! PageRank allows efficient and stable prioritization of search results ! Vote of Confidence Principle

❑ PageRank of a web page will be high if it is linked to other highly ranked pages

! Random Walk / Markov Chain Analogy ❑ Which pages are most visited on a random walk?

! Teleportation Analogy ❑ Models when a user jumps next to an unlinked page

Page 15: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

Random Walk + Teleportation Analogy

Page 16: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Implementing PageRank

! Even v may not fit in memory ! Use MapReduce

11/18/19

"16

Page 17: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Partitioning for MapReduce

11/18/19

"17

Page 18: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

Impact of teleportation parameter

10 Wikipedia pages with highest PageRank [Gleich, ‘09]

Page 19: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

Applications of PageRank

! Clustering [Andersen ‘06] [reset to same page] ! Sports Ranking [Govan ‘08] ! Bioinformatics – GeneRank [Morrison ‘05], ProteinRank [Freschi ‘07] ! Network Alignment [Singh ‘07] ! Literature – BookRank; Bibliometrics – CiteRank, AuthorRank, TimedPageRank; ! Information Systems – PopRank, FactRank, ObjectRank, FolkRank ! Recommender Systems – ItemRank ! Social Networks – BuddyRank, TwitterRank ! Web – HostRank, DirRank, TrustRank, BadRank, VisualRank

Page 20: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

Mathematics of PageRank

! Given: Pij = prob of transition from j to i ! PageRank is given by solution x to equation

❑ (αP + (1-α)veT)x = x ❑ Thus x is eigenvector of a certain matrix ❑ Alternative equation: (I-αP)x = (1-α)v ❑ Eigenvectors can be computed using ❑ ge et al. used an iterative powering method:

▪ x(k+1) = αPx(k) + (1-α)v ❑ Convergence in 3656 iterations to within 10-16 error

Random Walk Teleportation

Page 21: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

Variants of PageRank: Localized

Page 22: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Topic-Sensitive PageRank

! Teleport to a URL with same topic

11/18/19

"22

Page 23: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Variants of PageRank

! Hubs & Authorities ❑ Authorities are nodes with information

▪ E.g., Course webpage

❑ Hubs provide links to authorities ▪ E.g., list of courses offered

! Good Authorities linked from Good Hubs ! Good Hubs link to Good Authorities

11/18/19

"23

Page 24: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

CAP 5510 / CGS 5166

Computing h and a

11/18/19

"24

Page 25: Page Rank - users.cs.fiu.edugiri/teach/5768/F19/lecs/UnitX8-PageRank.pdfPageRank! Essential component of Google search engine ! PageRank allows efficient and stable prioritization

Variants of PageRank

! Reverse PageRank (follow inlinks, not outlinks) ! Dirichlet PageRank (fix importance of subset) ! Weighted PageRank ! Undirected PageRank ! Timed PageRank ! PageTrust