Google PageRank

Post on 22-Nov-2014

DESCRIPTION

Explanation of Google's PageRank algorithm with some examples, followed by a discussion of related search engine optimisation (SEO) issues.

TRANSCRIPT

December 9, 2008

Google PageRank

Prof. Beat Signer

Department of Computer Science

Vrije Universiteit Brussel

http://www.beatsigner.com

<bsigner@vub.ac.be>

December 9, 2008 Beat Signer, signer@inf.ethz.ch

Overview

History of PageRank

PageRank algorithm

Examples

Implications for website development

History of PageRank

Developed as part of an academic project at Stanford University

research platform to aid understanding of large-scale web data and enable researchers to easily experiment with new search technologies

Larry Page and Sergey Brin worked on a project for a new kind of search engine (1995-1998), which finally led to a functional prototype called Google

[Photos: Larry Page, Sergey Brin]

Web Search Until 1998

Find all documents using a query term

use information retrieval (IR) solutions

ranking based on "on-page factors"

problem: poor quality of search results (order)

Page and Brin proposed to compute the absolute quality of a page (PageRank)

based on the number and quality of pages linking to a page (votes)

PageRank

A page has a high PageRank R if

there are many pages linking to it

or, if there are some pages with a high PageRank linking to it

Total score = IR score x PageRank
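As a minimal sketch of how the two scores could be combined, the following Python snippet multiplies a query-dependent IR score by a query-independent PageRank and sorts the matching pages by the product. The page names and all score values are made up for illustration and do not come from the slides.

```python
# Hypothetical scores for three pages matching a query (illustrative values only).
ir_score = {"page_a": 0.9, "page_b": 0.6, "page_c": 0.8}
pagerank = {"page_a": 0.1, "page_b": 0.5, "page_c": 0.4}


def total_score(page):
    """Total score = IR score x PageRank, as stated on the slide."""
    return ir_score[page] * pagerank[page]


# Pages with the highest combined score come first.
ranking = sorted(ir_score, key=total_score, reverse=True)
print(ranking)
```

In this made-up case page_a has the best IR score but ends up last, because its PageRank is low.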

[Figure: example graph of pages P1 to P8 with PageRank values R1 to R8]

PageRank Algorithm

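$$PR(P_i) = \sum_{P_j \in B_i} \frac{PR(P_j)}{L_j}$$

where

$B_i$ is the set of pages that link to page $P_i$

$L_j$ is the number of outgoing links for page $P_j$

[Figure: example graph with pages P1, P2 and P3. Values shown in three successive steps: P1 = 1, P2 = 1, P3 = 1; then P1 = 1.5, P2 = 1.5, P3 = 0.75; then again P1 = 1.5, P2 = 1.5, P3 = 0.75]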
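The three-page example can be replayed with a few lines of Python. This is only a sketch under two assumptions that are not stated in the transcript: the links are P1 to P2, P2 to P1 and P3, and P3 to P1, and the values are updated in place, one page after the other. Under these assumptions the sweep reproduces the values 1.5, 1.5 and 0.75 from the example.

```python
# Assumed link structure of the three-page example:
# P1 -> P2, P2 -> P1 and P3, P3 -> P1.
links = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P1"]}


def update_in_place(rank, links):
    """One sweep of PR(P_i) = sum over P_j in B_i of PR(P_j) / L_j.

    Pages are updated one after the other, so pages later in the sweep
    already see the new values of the pages before them.
    """
    for page in links:
        rank[page] = sum(
            rank[source] / len(targets)   # PR(P_j) / L_j
            for source, targets in links.items()
            if page in targets            # P_j is in B_i
        )
    return rank


rank = {page: 1.0 for page in links}      # every page starts with 1
history = [dict(rank)]
for _ in range(2):
    update_in_place(rank, links)
    history.append(dict(rank))

for step, values in enumerate(history):
    print(step, values)
```

The second sweep leaves the values unchanged, so they are a fixed point of the formula for this graph.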

Matrix Representation

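Let us define a hyperlink matrix H

$$H_{ij} = \begin{cases} 1/L_j & \text{if } P_j \in B_i \\ 0 & \text{otherwise} \end{cases}$$

[Figure: example graph with pages P1, P2 and P3]

$$H = \begin{pmatrix} 0 & 1/2 & 1 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix} \quad \text{and} \quad R = [PR(P_i)]$$

$$R = HR$$

R is an eigenvector of H with eigenvalue 1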
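The eigenvector property can be checked directly with exact fractions. The sketch below assumes the links P1 to P2, P2 to P1 and P3, and P3 to P1 for the three-page example; the helper name mat_vec is introduced here for illustration only.

```python
from fractions import Fraction

half = Fraction(1, 2)

# H[i][j] = 1 / L_j if page j+1 links to page i+1, otherwise 0
# (entries assume the links P1 -> P2, P2 -> P1, P2 -> P3, P3 -> P1).
H = [[0, half, 1],
     [1, 0, 0],
     [0, half, 0]]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


R = [2, 2, 1]
print(mat_vec(H, R) == R)   # H R equals R, so the eigenvalue is 1
```

Any multiple of R, for example (0.4, 0.4, 0.2), satisfies the same equation.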

Matrix Representation ...

We can use the power method to find R

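$$R_{t+1} = H R_t$$

For our example

$$H = \begin{pmatrix} 0 & 1/2 & 1 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$$

this results in $R = (2, 2, 1)^T$ or, normalised, $R = (0.4, 0.4, 0.2)^T$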
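A minimal power-method sketch in plain Python, assuming the same three-page example (links P1 to P2, P2 to P1 and P3, P3 to P1). The uniform start vector and the fixed number of 100 iterations are choices made for this sketch, not values taken from the slides.

```python
H = [[0.0, 0.5, 1.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.5, 0.0]]


def power_method(H, iterations=100):
    """Iterate R_{t+1} = H R_t, starting from a uniform vector that sums to 1."""
    n = len(H)
    R = [1.0 / n] * n
    for _ in range(iterations):
        R = [sum(H[i][j] * R[j] for j in range(n)) for i in range(n)]
    return R


R = power_method(H)
print([round(x, 3) for x in R])   # approximately [0.4, 0.4, 0.2]
```

Because every column of H sums to 1, the iteration keeps the sum of R at 1, which gives the normalised form of the result.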

Dangling Pages

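Problem with pages that have no outbound links (P2)

[Figure: example graph with pages P1 and P2; the added transitions are labelled C]

$$H = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

$$C = \begin{pmatrix} 0 & 1/2 \\ 0 & 1/2 \end{pmatrix} \quad \text{and} \quad S = H + C = \begin{pmatrix} 0 & 1/2 \\ 1 & 1/2 \end{pmatrix}$$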
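The correction for dangling pages can be sketched as follows: every all-zero column of H (a page without outbound links) is replaced by a column with the value 1/n, which corresponds to S = H + C. The two-page matrix assumes that P1 links to P2 and that P2 is the dangling page; the function name is hypothetical.

```python
from fractions import Fraction


def add_dangling_correction(H):
    """Return S = H + C, where C holds 1/n in every all-zero column of H."""
    n = len(H)
    S = [list(row) for row in H]
    for j in range(n):
        # Column j is all zero: page j+1 has no outbound links.
        if all(H[i][j] == 0 for i in range(n)):
            for i in range(n):
                S[i][j] = Fraction(1, n)
    return S


H = [[0, 0],
     [1, 0]]          # P1 links to P2, P2 is dangling
S = add_dangling_correction(H)
print(S)
```

After the correction every column of S sums to 1, so rank no longer leaks out of the graph through P2.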

Strongly Connected Pages (Graph)

Add new transition probabilities between all pages

with probability d we follow the hyperlink structure S

with probability 1-d we choose a random page

[Figure: graph with pages P1 to P5; the added transitions are labelled 1-d]

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

$$R = GR$$
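A minimal sketch of the Google matrix and its stationary vector, reading the last term of the formula as the n x n matrix of ones scaled by (1-d)/n. The value d = 0.85 is the damping factor commonly quoted for PageRank and is an assumption here, as is the small two-page matrix S used as input.

```python
def google_matrix(S, d=0.85):
    """Return G = d*S + (1 - d) * (1/n) * 1, with 1 the n x n matrix of ones."""
    n = len(S)
    return [[d * S[i][j] + (1.0 - d) / n for j in range(n)] for i in range(n)]


def power_method(G, iterations=100):
    """Iterate R_{t+1} = G R_t, starting from a uniform vector."""
    n = len(G)
    R = [1.0 / n] * n
    for _ in range(iterations):
        R = [sum(G[i][j] * R[j] for j in range(n)) for i in range(n)]
    return R


# Assumed input: two pages, P1 links to P2, P2 had no outbound links
# and was already corrected to link to both pages with probability 1/2.
S = [[0.0, 0.5],
     [1.0, 0.5]]
G = google_matrix(S)
R = power_method(G)
print(G)
print(R)
```

All entries of G are positive, so every page can be reached from every other page and the power method converges to a unique R.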

Examples

$$G = dS + (1-d)\frac{1}{n}\mathbf{1}$$

[Figure: example graph with pages A1, A2 and A3]

A1 = 0.26, A2 = 0.37, A3 = 0.37

Examples ...

[Figure: example graph with pages A1 to A3 and B1 to B3]

A1 = 0.13, A2 = 0.185, A3 = 0.185, P(A) = 0.5

B1 = 0.13, B2 = 0.185, B3 = 0.185, P(B) = 0.5

Examples ...

[Figure: example graph with pages A1 to A3 and B1 to B3]

A1 = 0.10, A2 = 0.14, A3 = 0.14, P(A) = 0.38

B1 = 0.22, B2 = 0.20, B3 = 0.20, P(B) = 0.62

Examples ...

[Figure: example graph with pages A1 to A3 and B1 to B3]

A1 = 0.3, A2 = 0.23, A3 = 0.18, P(A) = 0.71

B1 = 0.10, B2 = 0.095, B3 = 0.095, P(B) = 0.29

Examples ...

[Figure: example graph with pages A1 to A3 and B1 to B3]

A1 = 0.35, A2 = 0.24, A3 = 0.18, P(A) = 0.77

B1 = 0.09, B2 = 0.07, B3 = 0.07, P(B) = 0.23

Examples ...

[Figure: example graph with pages A1 to A4 and B1 to B3]

A1 = 0.33, A2 = 0.17, A3 = 0.175, A4 = 0.125, P(A) = 0.80

B1 = 0.08, B2 = 0.06, B3 = 0.06, P(B) = 0.20

Implications for Website Development

First make sure that your page gets indexed

on-page factors

Think about your site's internal link structure

create many internal links for important pages

be "careful" about where to put outgoing links

Increase the number of pages

Ensure that webpages are addressed consistently

e.g. http://www.vub.ac.be vs. http://www.vub.ac.be/index.php

Make sure that you get links from good websites

Consistent Addressing of Webpages
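One way to spot inconsistent addressing is to map URL variants to a single canonical form and check whether several variants collapse to the same page. The sketch below uses Python's urllib.parse; the list of default document names and the function canonical_url are assumptions made for illustration, and a real site would typically enforce one form with server-side redirects.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default document names; a real server may use others.
INDEX_FILES = ("index.php", "index.html", "index.htm")


def canonical_url(url):
    """Lower-case scheme and host, use '/' for an empty path, drop index files."""
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]   # keep the trailing slash
            break
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


variants = ["http://www.vub.ac.be", "http://www.vub.ac.be/index.php"]
canonical = {canonical_url(u) for u in variants}
print(canonical)
```

Both variants map to one canonical URL, which indicates that inbound links to them would otherwise be split between two addresses of the same page.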

Search Engine Optimisations (SEO)

Internet marketing has become a big business

white hat and black hat optimisations

Bad ranking or removal from index can cost a company a lot of money

e.g. supplemental index ("Google hell")

Black Hat Optimisations (Don'ts)

Link farms

Spamdexing in guestbooks, Wikipedia etc.

"solution": <a rel="nofollow" href="…">…</a>

Doorway pages (cloaking)

e.g. BMW Germany and Ricoh Germany banned in February 2006

Selling/buying links

...
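To illustrate the rel="nofollow" attribute mentioned above, the following sketch separates the links of a page into followed and nofollow links, as a crawler building a link graph might do. It uses Python's standard html.parser module; the class name, the sample HTML and the example.org URLs are made up for illustration, and how a search engine actually treats such links is not specified in the slides.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags, split by rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


# Hypothetical guestbook entry with one regular and one nofollow link.
sample = (
    '<p><a href="http://example.org/article">article</a> '
    '<a rel="nofollow" href="http://example.org/spam">spam</a></p>'
)
collector = LinkCollector()
collector.feed(sample)
print(collector.followed, collector.nofollow)
```

Only the followed links would then count as votes when the hyperlink matrix is built.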

On-Page Factors (Speculative)

It is assumed that there are over 200 on-page and off-page factors

Positive factors

keyword in title tag

keyword in URL

keyword in domain name

quality of HTML code

page freshness (occasional changes)

On-Page Factors (Speculative) …

Negative factors

links to "bad neighbourhood"

over-optimisation penalty (keyword stuffing)

text with same colour as background (hidden content)

automatic redirects via the refresh meta tag

any copyright violations

Off-Page Factors (Speculative)

Positive factors

high PageRank

anchor text of inbound links

links from authority sites (Hilltop algorithm)

listed in DMOZ (ODP) and Yahoo directories

site age (stability)

domain expiration date

Off-Page Factors (Speculative) …

Negative factors

link buying (fast increasing number of inbound links)

link farms

cloaking (different pages for spider and user)

limited (temporary) availability of site

links from bad neighbourhood?

competitor attack (e.g. duplicate content)?

Tools

Google toolbar

PageRank information not frequently updated

Google webmaster tools

meta description issues

title tag issues

non-indexable content issues

number and URLs of indexed pages

number and URLs of inbound/outbound links

...

Questions

Is PageRank fair?

What about Google's power and influence?

Conclusions

PageRank algorithm

absolute quality of a page based on incoming links

random surfer model

computed as eigenvector of Google matrix G

Implications for website development and SEO

PageRank is just one (important) factor

References

The PageRank Citation Ranking: Bringing Order to the Web, L. Page, S. Brin, R. Motwani and T. Winograd, January 1998

The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin and L. Page, Computer Networks and ISDN Systems, 30(1-7), April 1998

References …

PageRank Uncovered, C. Ridings and M. Shishigin, September 2002

PageRank Calculator, http://www.webworkshop.net/pagerank_calculator.php
