
Google’s PageRank

By Zack Kenz


Outline

Intro to web searching

Review of Linear Algebra

Weather example

Basics of PageRank

Solving the Google Matrix

Calculating the PageRank

Wrapping up


Some Search Engine History

Early searching was based on page content only

Bonuses for word placement

Paying for placement

Natural language searches (think: Ask Jeeves)

Meta search engines


Why Google?

No one else had exploited the link structure of the internet

Content-based engines were relatively easy to manipulate with concealed text

Adaptive to a growing internet

Simpler, faster


PageRank, According to Google

“PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B.”

“Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves ‘important’ weigh more heavily and help to make other pages ‘important.’”


Linear Algebra Terms

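Row stochastic matrix: a square matrix with nonnegative entries in which every row sums to 1

Eigenvector: a nonzero vector x such that Ax = λx for some scalar λ

Eigenvalue: a scalar λ for which Ax = λx has a nontrivial solution x

Dominant eigenvalue (eigenvector): the eigenvalue of largest absolute value (and a corresponding eigenvector)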
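To make these terms concrete, here is a minimal Python sketch using a small 2x2 row stochastic matrix chosen purely for illustration (it is not from the talk). It checks that every row sums to 1 and that Ae = e, so λ = 1 is an eigenvalue. It also shows that a left eigenvector x with xA = x exists and that the other eigenvalue is smaller, which makes λ = 1 dominant. The helper names are hypothetical. Exact fractions keep the equality checks free of rounding error.

```python
from fractions import Fraction as F

# A small row stochastic matrix chosen for illustration.
A = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]

def is_row_stochastic(M):
    """True if every entry is nonnegative and every row sums to 1."""
    return all(x >= 0 for row in M for x in row) and all(sum(row) == 1 for row in M)

def mat_vec(M, x):
    """Column vector M x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def vec_mat(x, M):
    """Row vector x M (used for left eigenvectors)."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

assert is_row_stochastic(A)

# Rows summing to 1 means A e = e, so lambda = 1 is an eigenvalue.
e = [F(1), F(1)]
assert mat_vec(A, e) == e

# lambda = 1 also has a left eigenvector: x A = x (x found by hand for this A).
x = [F(1, 3), F(2, 3)]
assert vec_mat(x, A) == x

# For a 2x2 matrix the eigenvalues sum to the trace and multiply to the determinant,
# so the other eigenvalue is trace - 1 = 1/4, and lambda = 1 is dominant.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
other = trace - 1
assert other == F(1, 4) and other == det
```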


Tomorrow’s Weather

Example on the board
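The board example is not in the transcript, so the sketch below uses hypothetical numbers. It models a two-state chain (sunny, rainy) whose row stochastic transition matrix P gives tomorrow's weather probabilities from today's. Multiplying today's row vector by P predicts tomorrow. Repeating the multiplication approaches a steady state q with qP = q, which is the same idea PageRank applies to web pages.

```python
# Hypothetical two-state weather chain; the numbers are illustrative only.
STATES = ["sunny", "rainy"]
P = [[0.9, 0.1],   # sunny today: 90% sunny, 10% rainy tomorrow
     [0.5, 0.5]]   # rainy today: 50% sunny, 50% rainy tomorrow

def step(x, M):
    """Advance one day: the row vector x times the transition matrix M."""
    return [sum(x[i] * M[i][j] for i in range(len(x))) for j in range(len(M[0]))]

today = [0.0, 1.0]          # it is raining today
tomorrow = step(today, P)   # [0.5, 0.5]

# Repeated steps approach the steady state q with q P = q.
x = today
for _ in range(100):
    x = step(x, P)
print(dict(zip(STATES, (round(p, 4) for p in x))))  # {'sunny': 0.8333, 'rainy': 0.1667}
```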


Scoring Web Pages

Random web surfer

Goal: assign a score to each of over 25 billion web pages and store the scores

Each page’s score is the probability that the random surfer is on that page
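The random surfer can be simulated directly. This Python sketch uses a hypothetical four-page web and anticipates the fixes introduced later in the talk. It assumes the surfer follows a random link with probability 0.85 and otherwise jumps to a page chosen uniformly at random; the surfer also jumps when stuck on a page with no links. The fraction of steps spent on each page estimates that page's score. This Monte Carlo approach is only for intuition and is not how scores are computed at web scale.

```python
import random
from collections import Counter

# Hypothetical 4-page web: page -> pages it links to. Page 3 has no incoming links.
LINKS = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
ALPHA = 0.85  # assumed probability of following a link rather than jumping

def surf(links, steps, alpha=ALPHA, seed=0):
    """Simulate a random surfer; return the fraction of steps spent on each page."""
    rng = random.Random(seed)
    pages = list(links)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        out = links[page]
        if out and rng.random() < alpha:
            page = rng.choice(out)      # follow a random outgoing link
        else:
            page = rng.choice(pages)    # jump to a random page (also used at dead ends)
    return {p: visits[p] / steps for p in pages}

freq = surf(LINKS, 200_000)
print({p: round(f, 3) for p, f in freq.items()})
```

Page 2, which three pages link to, should come out near the top, and page 3, which no page links to, should come out last.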


Surf’s Up!


Hyperlink Matrix
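The slide's matrix is not in the transcript. In the standard formulation, the hyperlink matrix H has H[i][j] = 1/(number of links on page i) when page i links to page j, and 0 otherwise. The Python sketch below builds H for a hypothetical four-page web. It assumes pages are numbered 0 to n-1 and that no link is listed twice.

```python
from fractions import Fraction

# Hypothetical 4-page web: page -> pages it links to (no repeated links).
# Page 3 has no outgoing links.
LINKS = {0: [1, 2], 1: [2], 2: [0, 1, 3], 3: []}

def hyperlink_matrix(links):
    """H[i][j] = 1/(number of links on page i) if page i links to page j, else 0.

    Assumes pages are numbered 0..n-1."""
    n = len(links)
    H = [[Fraction(0)] * n for _ in range(n)]
    for i, out in links.items():
        for j in out:
            H[i][j] = Fraction(1, len(out))
    return H

H = hyperlink_matrix(LINKS)
for row in H:
    print(" ".join(str(x) for x in row))
# 0 1/2 1/2 0
# 0 0 1 0
# 1/3 1/3 0 1/3
# 0 0 0 0
```

Row 3 is all zeros because page 3 has no links, so H is not row stochastic.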



Dangling Nodes


Page 15: Google’s PageRank By Zack Kenz. Outline Intro to web searching Review of Linear Algebra Weather example Basics of PageRank Solving the Google Matrix Calculating

Web Link Surfer Matrix
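A page with no outgoing links (a dangling node) gives an all-zero row in the hyperlink matrix, so that matrix is not row stochastic. A common textbook fix replaces each such row with the uniform row 1/n, as if the surfer jumps to a random page. In matrix form this is S = H + a(1/n)eᵀ, where the column vector a marks the dangling pages. The Python sketch below applies that fix. The example H is hypothetical.

```python
from fractions import Fraction

def surfer_matrix(H):
    """Replace every all-zero row of H (a dangling node) with the uniform row 1/n.

    Equivalent to S = H + a (1/n) e^T, where a[i] = 1 exactly when row i is all zero."""
    n = len(H)
    return [[Fraction(1, n)] * n if all(x == 0 for x in row) else list(row) for row in H]

half, third = Fraction(1, 2), Fraction(1, 3)
# Hypothetical hyperlink matrix; page 3 is dangling (no outgoing links).
H = [[0, half, half, 0],
     [0, 0, 1, 0],
     [third, third, 0, third],
     [0, 0, 0, 0]]

S = surfer_matrix(H)
print(S[3])                              # the dangling row, now uniform
print(all(sum(row) == 1 for row in S))   # True: S is row stochastic
```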


One More Fix

Need to account for the fact that a surfer can type in URLs instead of using links

Add a personalization vector v, a row probability vector giving the chance of jumping to each page; multiplying the column vector of ones e by v gives the personalization matrix E = ev, every row of which equals v
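The personalization matrix can be sketched in Python as follows. Stacking the row vector v once per page is the same as the product ev of the column of ones with v, so every row of E equals v. Both vectors below are hypothetical. A uniform v treats every page alike, while a biased v favors some pages when the surfer types in a URL.

```python
from fractions import Fraction

def personalization_matrix(v):
    """E = e v: the column of ones times the row vector v, so every row of E is v."""
    return [list(v) for _ in v]

n = 4
v_uniform = [Fraction(1, n)] * n   # every page equally likely to be typed in
# Hypothetical bias toward page 0 (entries must be nonnegative and sum to 1).
v_biased = [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]

for v in (v_uniform, v_biased):
    E = personalization_matrix(v)
    assert all(row == v for row in E)          # each row equals v
    assert all(sum(row) == 1 for row in E)     # so E is row stochastic
```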



Google Matrix

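Recall the web link surfer matrix S and the personalization matrix E; the Google matrix is

G = αS + (1 − α)E

α is a damping factor, usually 0.85: the probability that the surfer follows a link rather than typing in a URL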
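Putting the pieces together, this Python sketch forms G = αS + (1 − α)E on a hypothetical four-page surfer matrix. It assumes α = 0.85 and a uniform personalization vector. It then checks that G is still row stochastic and that every entry is strictly positive, a property that matters when solving for the PageRank vector.

```python
ALPHA = 0.85  # damping factor

def google_matrix(S, v, alpha=ALPHA):
    """G = alpha*S + (1 - alpha)*E, where E = e v has every row equal to v."""
    n = len(S)
    return [[alpha * S[i][j] + (1 - alpha) * v[j] for j in range(n)] for i in range(n)]

# Hypothetical surfer matrix (page 3 was dangling, so its row is already uniform).
S = [[0, 1/2, 1/2, 0],
     [0, 0, 1, 0],
     [1/3, 1/3, 0, 1/3],
     [1/4, 1/4, 1/4, 1/4]]
v = [1/4] * 4  # uniform personalization vector

G = google_matrix(S, v)
assert all(abs(sum(row) - 1) < 1e-12 for row in G)   # rows still sum to 1
assert all(x > 0 for row in G for x in row)          # every entry strictly positive
print(round(G[0][0], 4))  # 0.0375: even without a link, the surfer can jump to page 0
```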


The True Google Matrix?


Solution of the Google Matrix

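Since the Google matrix is row stochastic, Ge = e for the column vector of ones e, so it has an eigenvalue λ = 1

Because every entry of G is positive (given a positive personalization vector), the Perron-Frobenius theorem guarantees that λ = 1 is the largest eigenvalue in absolute value and is not repeated

Let π be the corresponding left eigenvector, so that πG = π

The eigensystem πG = π, together with the requirement that the entries of π sum to 1, has a unique solution for π; π is then a row probability vector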


The PageRank vector π contains every page’s PageRank: its ith entry is the score of page i
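The claim that λ = 1 is an eigenvalue follows in one line from the definitions. Assume G = αS + (1 − α)E with E = ev, where S is row stochastic (so Se = e), e is the column vector of ones, and the entries of the personalization vector v sum to 1 (so ve = 1):

```latex
\begin{aligned}
G\mathbf{e} &= \alpha S\mathbf{e} + (1-\alpha)\,\mathbf{e}\,(v\mathbf{e}) \\
            &= \alpha\mathbf{e} + (1-\alpha)\,\mathbf{e} \\
            &= \mathbf{e}.
\end{aligned}
```

Since G and its transpose have the same eigenvalues, λ = 1 also has a left eigenvector π with πG = π.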


Computing Scores: The Linear Algebra Way

Recall that the PageRank vector π satisfies πG = π, with entries summing to 1
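One direct linear-algebra route is sketched below in Python, assuming G = αS + (1 − α)ev. Because the entries of π sum to 1, πG = απS + (1 − α)(πe)v = απS + (1 − α)v. The equation πG = π therefore reduces to the linear system π(I − αS) = (1 − α)v. This system has exactly one solution because α < 1. The sketch solves its transpose with plain Gaussian elimination, a textbook solver swapped in for illustration that would not be practical at web scale. The four-page graph is hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def pagerank_direct(S, v, alpha=0.85):
    """Solve pi (I - alpha S) = (1 - alpha) v for the row vector pi.

    Transposing gives (I - alpha S)^T pi^T = (1 - alpha) v^T, a standard A x = b system."""
    n = len(S)
    A = [[(1.0 if i == j else 0.0) - alpha * S[i][j] for i in range(n)] for j in range(n)]
    b = [(1 - alpha) * vj for vj in v]
    return solve(A, b)

# Hypothetical 4-page web with no dangling pages: 0 -> {1, 2}, 1 -> 2, 2 -> 0, 3 -> 2.
S = [[0, 0.5, 0.5, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 0]]
pi = pagerank_direct(S, [0.25] * 4)
print([round(p, 4) for p in pi])  # [0.3725, 0.1958, 0.3941, 0.0375]
```

Multiplying π(I − αS) = (1 − α)v by e shows that the solution already sums to 1, so no separate normalization step is needed.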


Computing Scores: The Power Method

λ = 1 is the dominant eigenvalue of G, and the PageRank vector π is the dominant left eigenvector

As a result, the power method applied to G converges to the PageRank vector

Given a starting vector π(0) (for example, the uniform vector with every entry 1/n), the power method computes successive iterates π(k+1) = π(k)G until a stopping condition is reached
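Here is a minimal Python sketch of the power method. It assumes a uniform starting vector and a hypothetical stopping condition: the 1-norm change between iterates drops below 1e-10. It builds the dense G for a small hypothetical graph. The other eigenvalues of G are at most α in absolute value, so in the worst case the error shrinks by roughly a factor of α = 0.85 per iteration.

```python
def power_method(G, tol=1e-10, max_iter=1000):
    """Iterate pi_(k+1) = pi_k G from the uniform vector until the 1-norm change is below tol.

    Returns the final iterate and the number of iterations used."""
    n = len(G)
    pi = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        nxt = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt, k
        pi = nxt
    return pi, max_iter

ALPHA, n = 0.85, 4
# Hypothetical surfer matrix (no dangling pages): 0 -> {1, 2}, 1 -> 2, 2 -> 0, 3 -> 2.
S = [[0, 0.5, 0.5, 0],
     [0, 0, 1, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 0]]
# Google matrix with a uniform personalization vector: every row of E is 1/n.
G = [[ALPHA * S[i][j] + (1 - ALPHA) / n for j in range(n)] for i in range(n)]

pi, iterations = power_method(G)
print([round(p, 4) for p in pi], iterations)
```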


Speeding Things Up
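The slide's content is not in the transcript. One well-known speed-up, sketched here in Python, is to never form the dense matrix G. Assume a uniform personalization vector and uniform rows for dangling pages. Then πG = απH + (απa + 1 − α)eᵀ/n, where a marks the dangling pages, so each iteration only touches the links themselves. The function name and the example graphs are hypothetical.

```python
def pagerank_sparse(links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method using only the link lists; never builds the dense n x n matrix G.

    Assumes a uniform personalization vector and uniform rows for dangling pages, so
    pi_(k+1) = alpha * pi_k H + (alpha * (score on dangling pages) + 1 - alpha) / n."""
    n = len(links)
    pi = [1.0 / n] * n
    for k in range(1, max_iter + 1):
        nxt = [0.0] * n
        dangling_mass = 0.0
        for i, out in links.items():
            if out:
                share = alpha * pi[i] / len(out)   # row i of alpha * H
                for j in out:
                    nxt[j] += share
            else:
                dangling_mass += pi[i]             # the product pi_k a
        spread = (alpha * dangling_mass + 1 - alpha) / n
        nxt = [x + spread for x in nxt]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt, k
        pi = nxt
    return pi, max_iter

LINKS = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # hypothetical, no dangling pages
pr, k = pagerank_sparse(LINKS)
print([round(p, 4) for p in pr], k)

# A graph with a dangling page (page 2 has no links) is handled without a dense S.
pr_dangling, _ = pagerank_sparse({0: [1, 2], 1: [2], 2: []})
print(round(sum(pr_dangling), 6))  # 1.0: the scores still form a probability vector
```

Each iteration costs time proportional to the number of links plus the number of pages, instead of n² for the dense product.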


Wrapping Up: The Overall Page Scoring

PageRank is only one part of what determines the order of search results

Results are based on many factors, especially page content


Wrapping Up: Improving PageRank

Avoiding link spamming: tweak the personalization vector and α

Power method convergence algorithms

Dummy node
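Here is a small Python illustration of how the personalization vector shifts scores. It uses a hypothetical four-page web in which page 3 has no incoming links. Page 3's only source of PageRank is the (1 − α)v jump term, so its score is exactly (1 − α) times its entry in v. Shifting weight in v toward page 3 therefore raises its score. This shows the mechanism only; it says nothing about how Google actually chooses v or α.

```python
def pagerank(links, v, alpha=0.85, iters=200):
    """Power method for G = alpha*S + (1 - alpha)*e v.

    Dangling pages (no links) spread their score uniformly, as in S; v is the
    personalization vector and must be nonnegative and sum to 1."""
    n = len(links)
    pi = list(v)
    for _ in range(iters):
        nxt = [(1 - alpha) * vj for vj in v]            # jump term (1 - alpha) v, using pi e = 1
        for i, out in links.items():
            targets = out if out else range(n)          # dangling page: all pages
            for j in targets:
                nxt[j] += alpha * pi[i] / len(targets)  # follow-a-link term alpha * pi S
        pi = nxt
    return pi

# Hypothetical 4-page web; page 3 has no incoming links.
LINKS = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
uniform = pagerank(LINKS, [0.25, 0.25, 0.25, 0.25])
favor_3 = pagerank(LINKS, [0.1, 0.1, 0.1, 0.7])         # hypothetical bias toward page 3
print(round(uniform[3], 4), round(favor_3[3], 4))         # 0.0375 0.105
```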


Questions?


Sources

Rebecca S. Wills. Google’s PageRank: The Math Behind the Search Engine. Department of Mathematics, North Carolina State University. 1 May 2006.

Amy N. Langville and Carl D. Meyer. Fiddling with PageRank. Department of Mathematics, North Carolina State University. 15 August 2003.

David C. Lay. Linear Algebra and Its Applications, 3rd ed. Pearson Education, 2003.

Dr. Biebighauser

http://www.searchenginehistory.com/

http://www.google.com/technology/

http://www.google.com

http://eperformance.co.uk/uploaded_images/google%20beta-786468.jpg

http://webmechanics.uoregon.edu/Images/Surf%20web.jpg

http://www.modmyifone.com/iphone_wallpapers/file.php?n=282&w=l

http://www.smashingmagazine.com/images/pagerank/google-pagerank.jpg
