ch. 14: link analysis and web search - people | mit csaillink analysis: voting by in-links no...
TRANSCRIPT
![Page 1: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/1.jpg)
Ch. 14: Link Analysis and Web Search
Bishesh Khadka
![Page 2: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/2.jpg)
Search “MIT” ---> How do we get to www.mit.edu?
![Page 3: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/3.jpg)
Problem of Ranking
● Search is hard○ Information retrieval systems○ Keywords
■ Synonymy■ Polysemy
● Ranking on web is harder○ Abundance of information○ Content credibility
![Page 4: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/4.jpg)
Link Analysis: Voting by In-Links
● No intrinsic “rank” value in web pages● Aggregate the number of In-Links
○ In-Links = Endorsements● Algorithm:
1. Find “sample” of “relevant” pages2. Aggregate In-Links3. Rank based on In-Link counts
![Page 5: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/5.jpg)
![Page 6: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/6.jpg)
Link Analysis: List-Finding Technique
● In-Link voting isn’t perfect○ Skewed to pages with most In-Links○ Even irrelevant ones
![Page 7: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/7.jpg)
![Page 8: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/8.jpg)
Link Analysis: List-Finding Technique
● In-Link voting isn’t perfect○ Skewed to pages with most In-Links○ Even irrelevant ones
● “Hub” pages● “score” is sum of votes for pages it points to
![Page 9: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/9.jpg)
![Page 10: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/10.jpg)
Link Analysis: Repeated Improvement
● Intuition: Lists with links to “good” sites are credible● Pages with list compilations are “hubs”● Pages these hubs point to are “authorities”● Algorithm:
1. All hubs and auths have score 1 2. For k iterations:
● ∀ auth page p: auth(p) = ∑ hub(j) ∀ j hubs that have voted for p
● ∀ hub page p: hub(p) = ∑ auth(j) ∀ j auths that p has voted for
![Page 11: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/11.jpg)
![Page 12: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/12.jpg)
Link Analysis: Repeated Improvement
● Hub and auth scores normalized between each set of pages● Scores stabilize as k gets large
![Page 13: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/13.jpg)
![Page 14: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/14.jpg)
PageRank
● Intuition: a page is important if it is cited by other important pages● Algorithm:
1. ∀ page i PageRanki = 12. For k iterations:
● ∀ page i send PageRanki / (# outgoing edges in i) to every outgoing edge
● Update all PageRank values to be ∑ received
![Page 15: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/15.jpg)
![Page 16: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/16.jpg)
![Page 17: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/17.jpg)
PageRank: Scaled
● Invalid nodes can end up with all the PageRank
![Page 18: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/18.jpg)
![Page 19: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/19.jpg)
PageRank: Scaled
● Invalid nodes can end up with all the PageRank● Intuition: all water going to deepest point● Scaled Algorithm:
1. ∀ page i PageRanki = 12. For k iterations:
● Perform normal PageRank updates● Scale all PageRanks by factor s● Add (1-s)/n PageRanks to all nodes
S = 0.8 - 0.9 in practice
![Page 20: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/20.jpg)
PageRank: Random Walk Definition
● The probability of being at a page X after k steps of random walk is precisely the PageRank of X after k applications of the Basic PageRank Update Rule
● Scaled: with probability s the traveler follows random edge as before, but with probability 1 - s the traveler jumps to any random node
● Proof in 14.6
![Page 21: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/21.jpg)
Link Analysis: Beyond Web
● Authority in network structures● Publications● Supreme Court Cases
![Page 22: Ch. 14: Link Analysis and Web Search - People | MIT CSAILLink Analysis: Voting by In-Links No intrinsic “rank” value in web pages Aggregate the number of In-Links In-Links = Endorsements](https://reader033.vdocuments.mx/reader033/viewer/2022060911/60a5c661978d71105c4197d3/html5/thumbnails/22.jpg)
● References ○ D. Easley, J. Kleinberg, Networks, Crowds, and Markets:
Reasoning About a Highly Connected World, Cambridge University Press, Cambridge, UK, 2010