Investigating Google’s PageRank Algorithm

Researchers: Erik Andersson, Per-Anders Ekström
Advisors: Lars Eldén, Maya G. Neytcheva
Project in course ”Scientific Computing 10p.” at the Division of Scientific Computing, Department of Information Technology, Uppsala University
Contact: Lina von Sydow




Iterations for Convergence

The Power method requires far more iterations to converge than the Arnoldi method.

Execution Time

The restarted Arnoldi method converges in less time than the other tested methods. The difference becomes more evident as alpha increases.
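One way to see why large alpha-values hurt the Power method in particular: for the usual PageRank matrix, the error of the Power method is known to shrink by at most a factor of roughly alpha per iteration in the worst case. The sketch below turns that bound into a rough iteration estimate. The function name and the tolerance are illustrative assumptions, not values taken from the poster.

```python
import math

def estimated_power_iterations(alpha: float, tol: float = 1e-8) -> int:
    """Smallest k with alpha**k <= tol, assuming the error shrinks
    by a factor alpha per iteration (a worst-case estimate)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return math.ceil(math.log(tol) / math.log(alpha))

# The estimate grows quickly as alpha approaches 1.
for alpha in (0.85, 0.95, 0.99):
    print(alpha, estimated_power_iterations(alpha))
```

The estimate says nothing about the Arnoldi variants, whose cost per iteration differs; it only illustrates the Power method's sensitivity to alpha.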

PageRank Explained

A page is important if other important pages link to it.

This is an eigenvector problem:

The matrix Q describes the link structure. To ensure a reasonable answer, Q must be modified.

Here d shows which pages lack outlinks, and alpha determines the general probability of “teleporting” to a random Web page.
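The formulas themselves are not preserved in this transcript. A common way to write the construction is given below as an assumption, not as the poster's exact notation. Here n is the number of pages, e is the vector of all ones, N_j is the number of outlinks of page j, and r is the PageRank vector.

```latex
\begin{align*}
Q_{ij} &= \begin{cases} 1/N_j & \text{if page } j \text{ links to page } i,\\
                        0     & \text{otherwise,} \end{cases} \\
P &= Q + \tfrac{1}{n}\, e\, d^{T}, \qquad
d_j = \begin{cases} 1 & \text{if } N_j = 0,\\ 0 & \text{otherwise,} \end{cases} \\
A &= \alpha P + (1-\alpha)\,\tfrac{1}{n}\, e\, e^{T}, \\
A r &= r, \qquad r \ge 0, \quad \lVert r \rVert_1 = 1.
\end{align*}
```

In this convention a random surfer follows a link with probability alpha and teleports to a random page with probability 1 - alpha.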

Example

The following small 6-page link structure would give us the following Q.
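The link graph and the matrix shown on the poster are not preserved here, so the sketch below uses a made-up 6-page structure to show how a matrix Q and a vector d could be built. The `links` dictionary, the `build_q` helper, and the column convention (column j holds the outlinks of page j) are assumptions for illustration.

```python
from fractions import Fraction

# Hypothetical 6-page web (NOT the poster's example): page -> pages it links to.
links = {
    1: [2, 3],
    2: [3],
    3: [1],
    4: [],      # page 4 has no outlinks
    5: [4, 6],
    6: [5],
}

def build_q(links):
    """Return (Q, d). Q[i][j] = 1/N_j if page j links to page i, else 0.
    d[j] = 1 if page j has no outlinks, else 0."""
    pages = sorted(links)
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    q = [[Fraction(0)] * n for _ in range(n)]
    d = [0] * n
    for page, targets in links.items():
        j = index[page]
        unique_targets = set(targets)  # repeated links count once
        if not unique_targets:
            d[j] = 1
            continue
        for target in unique_targets:
            q[index[target]][j] = Fraction(1, len(unique_targets))
    return q, d

Q, d = build_q(links)
for row in Q:
    print("  ".join(f"{str(x):>3}" for x in row))
print("d =", d)
```

Exact fractions are used so the column sums can be checked by eye: every column sums to 1 except the all-zero column of the page without outlinks.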

Eigenvector methods used

Power method
  + low memory demands
  - slow for large alpha-values

Arnoldi method
  + few iterations for convergence
  - high memory demands
  - increasing work for each iteration

Restarted Arnoldi
  + fast for all alpha-values
  ± much less memory needed than for the normal Arnoldi method, but more than for the Power method
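As a minimal sketch of the Power method entry, the code below iterates on a small made-up link graph without ever forming the modified matrix, which is where the low memory demand comes from. The graph, the function name `pagerank_power`, the tolerance, and the convention that a surfer follows a link with probability alpha are all assumptions, not the authors' implementation. The Arnoldi variants are not shown.

```python
# Hypothetical 6-page web (not from the poster): page -> pages it links to.
links = {1: [2, 3], 2: [3], 3: [1], 4: [], 5: [4, 6], 6: [5]}

def pagerank_power(links, alpha=0.85, tol=1e-10, max_iter=10_000):
    """Power method on the modified link matrix, applied implicitly.

    Returns (ranks, iterations), where ranks maps page -> score.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for iteration in range(1, max_iter + 1):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        base = (alpha * dangling + (1.0 - alpha)) / n
        new = {p: base for p in pages}
        for p in pages:
            targets = set(links[p])
            for t in targets:
                new[t] += alpha * rank[p] / len(targets)
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            return rank, iteration
    return rank, max_iter

# Only two vectors of length n are stored, but the iteration count grows with alpha.
for alpha in (0.5, 0.85, 0.99):
    ranks, iterations = pagerank_power(links, alpha=alpha)
    print(f"alpha={alpha}: {iterations} iterations")
```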

Web-Crawler

Written in Perl and used to retrieve the link structure of a specified domain.

This is the link structure of it.uu.se.
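The authors' crawler was written in Perl and is not reproduced in this transcript. The following is a Python sketch of the same idea, a breadth-first crawl restricted to one domain, under stated assumptions: the names `LinkParser`, `in_domain`, and `crawl` are hypothetical, and page fetching is passed in as a function so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects the href targets of all <a> tags in a page."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def in_domain(url, domain):
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == domain or host.endswith("." + domain)

def crawl(start_url, domain, fetch, max_pages=1000):
    """Breadth-first crawl; returns {url: sorted list of in-domain outlinks}.

    fetch(url) must return the page's HTML as a string, or None on failure.
    """
    links = {}
    queue = deque([start_url])
    while queue and len(links) < max_pages:
        url = queue.popleft()
        if url in links:
            continue
        parser = LinkParser()
        html = fetch(url)
        if html:
            parser.feed(html)
        targets = set()
        for href in parser.hrefs:
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and in_domain(absolute, domain):
                targets.add(absolute)
        links[url] = sorted(targets)
        queue.extend(t for t in targets if t not in links)
    return links

# In-memory stand-in for a website, so the sketch runs offline.
site = {
    "http://example.org/": '<a href="/a.html">A</a> <a href="http://other.com/">x</a>',
    "http://example.org/a.html": '<a href="/">home</a> <a href="b.html#top">B</a>',
    "http://example.org/b.html": "",
}
structure = crawl("http://example.org/", "example.org", site.get)
for page, outlinks in structure.items():
    print(page, "->", outlinks)
```

For a live crawl, `fetch` could wrap `urllib.request.urlopen`. The resulting dictionary has the page-to-outlinks shape from which a matrix Q can be assembled.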
