Egalitarian engines? S. Fortunato, A. Flammini, F. Menczer & A. Vespignani

Upload: kaethe

Post on 21-Jan-2016


Page 1

Egalitarian engines?

S. Fortunato, A. Flammini, F. Menczer & A. Vespignani

Page 2

Outline

- Search engines
- The Google revolution: PageRank
- Popularity bias
- The feared scenario: Googlearchy!
- Empirical test: Googlocracy?
- The importance of query topics
- Outlook

Page 3

Search Engines

“A search engine is a program designed to help find information stored on a computer system such as the World Wide Web …” (Wikipedia)

First search engine: Archie (1990, Internet)

First WWW search engine: Wandex (1993)

Page 4

Timeline

Page 5

The revolution

Invented by S. Brin and L. Page (1998). Novelty: for the first time, a search engine ranks pages according to their relevance in the graph topology of the Web!

Web pages → nodes

Hyperlinks → edges

Page 6

Degree distribution of the Web Graph

Page 7

PageRank

It is the prestige measure used by Google to rank Web pages.

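$$p(i) = \frac{q}{N} + (1-q) \sum_{j:\, j \to i} \frac{p(j)}{k_{\mathrm{out}}(j)}$$

p(i) is the probability that a user browsing the Web by clicking from one page to another (i.e. by following hyperlinks) visits page i. Here N is the number of pages, k_out(j) is the number of outgoing links (out-degree) of page j, and q is the probability that, instead of following a link, the user jumps to a page chosen at random.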
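The formula can be evaluated by applying it repeatedly until the scores stop changing (power iteration). The sketch below is a minimal illustration under stated assumptions, not Google's implementation: the graph is a hypothetical toy adjacency list `links`, q = 0.15 is a conventional textbook value rather than one given on the slide, and pages without outgoing links spread their score uniformly over all pages (one common convention among several).

```python
def pagerank(links, q=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for p(i) = q/N + (1-q) * sum_{j -> i} p(j) / k_out(j).

    links: dict mapping each page to the list of pages it links to.
    Dangling pages (no out-links) spread their score uniformly over all
    pages; this is an assumption of the sketch, not stated on the slides.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Score held by pages without out-links, redistributed uniformly.
        dangling = sum(p[v] for v in nodes if not links.get(v))
        # Random-jump term q/N plus the redistributed dangling score.
        new = {v: q / n + (1 - q) * dangling / n for v in nodes}
        for j, outs in links.items():
            if outs:
                share = (1 - q) * p[j] / len(outs)  # (1-q) * p(j) / k_out(j)
                for i in outs:
                    new[i] += share
        if sum(abs(new[v] - p[v]) for v in nodes) < tol:
            return new
        p = new
    return p
```

For example, on the toy graph `{"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}` the page with the most incoming links, c, ends up with the highest score, while d, which nobody links to, keeps only the random-jump share q/N.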

Page 8

Theoretical/empirical result: the PageRank of a page is approximately proportional to the number of incoming links of the page (link popularity or in-degree).

Page 9

Google recipe: Web pages are ranked according to their in-degree.
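The simplified recipe amounts to counting incoming links and sorting. A minimal sketch, assuming hyperlinks are given as a hypothetical list of (source, target) pairs; ties are broken alphabetically, which is an arbitrary choice:

```python
from collections import Counter

def rank_by_indegree(edges):
    """Rank pages by decreasing in-degree k (rank 1 = most linked page).

    edges: iterable of (source, target) hyperlinks.
    Returns {page: (rank, in_degree)}. Ties are broken alphabetically,
    an arbitrary choice for this sketch.
    """
    edges = list(edges)
    indeg = Counter(target for _, target in edges)  # missing pages count as 0
    pages = {page for edge in edges for page in edge}
    order = sorted(pages, key=lambda page: (-indeg[page], page))
    return {page: (rank, indeg[page]) for rank, page in enumerate(order, start=1)}
```

For instance, with the links a→c, b→c, d→c, a→b and c→a, page c comes first with in-degree 3 and page d, which has no incoming links, comes last.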

Other factors play a role in the ranking, but PageRank is the only factor that treats Web pages as nodes of a graph, regardless of their semantic features. How attractive are Web pages for users?

Traffic

Page 10

Traffic is related to the frequency of visits of Web pages by users.

Operational definition: the traffic t to a page is the fraction of all clicks within some period that go to that page.
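To make the definition concrete, a minimal sketch assuming the clicks of one period are available as a hypothetical list of clicked page identifiers (real traffic measurements, such as Alexa's, are collected very differently):

```python
from collections import Counter

def traffic(clicks):
    """Traffic t of each page: fraction of all clicks in the period that hit it.

    clicks: list of page identifiers, one entry per click (hypothetical input).
    """
    counts = Counter(clicks)
    total = sum(counts.values())
    return {page: n / total for page, n in counts.items()}
```

With four clicks, two of them on page a, `traffic(["a", "b", "a", "c"])` assigns a traffic of 0.5 to a and 0.25 to each of b and c.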

Question: how does the traffic t grow with the link popularity (in-degree) k of a page?

Page 11

Null model: in a world where people navigate the Web only by browsing, the traffic t to a page is just the probability of visiting that page during this process, i.e. its PageRank, and PageRank ~ in-degree. Null model prediction: t ~ k.

In the real Web, navigation by searching is replacing navigation by browsing.

What consequences does this have on the relation between t and k? Do search engines introduce a popularity bias?

Page 12

There are three possible scenarios, with $t \sim k^{\alpha}$:

- $\alpha = 1$ ($t \sim k$) → no bias;
- $\alpha > 1$ → Googlearchy;
- $\alpha < 1$ → Googlocracy.
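To decide which scenario holds, α can be estimated as the slope of log t against log k. A minimal sketch using an ordinary least-squares fit on log-log data; the tolerance `tol` used to call α equal to 1 is a hypothetical choice, and a real analysis would also bin the data and quantify the uncertainty of the slope:

```python
import math

def fit_alpha(k, t):
    """Least-squares slope of log t vs log k, i.e. alpha in t ~ k**alpha.

    Needs at least two distinct, positive values of k.
    """
    xs = [math.log(v) for v in k]
    ys = [math.log(v) for v in t]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def scenario(alpha, tol=0.05):
    """Classify the popularity bias; tol is a hypothetical threshold."""
    if abs(alpha - 1) <= tol:
        return "no bias"
    return "googlearchy" if alpha > 1 else "googlocracy"
```

For instance, data following $t \sim k^{1.8}$ give a fitted α of 1.8 and are classified as Googlearchy, while $t = 2k$ gives α = 1 and no bias.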

Page 13

The feared scenario: Googlearchy

Page 14
Page 15
Page 16

Search-Dominant Model

All users discover and navigate the Web by submitting queries to search engines and looking at the results.

Two empirical ingredients:

1) the distribution of clicks on the hits of a hit list;
2) the relation between the rank of a hit in a hit list and its PageRank/in-degree.

Page 17

The fraction of clicks on a hit is our traffic t.

Hits are identified by their rank r in the list.

Page 18

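By ordering all Web pages by decreasing in-degree, a page with in-degree k will have rank

$$r \sim k^{-1.1}$$

(from the cumulative degree distribution: r is the number of pages with in-degree at least k).

Combining this with the distribution of clicks on the hits of a hit list, $t \sim r^{-1.6}$:

$$t \sim r^{-1.6} \sim \left(k^{-1.1}\right)^{-1.6} \sim k^{1.8}$$

Googlearchy: search engines boost the popularity of the most popular pages much faster than simply surfing on the Web!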
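The exponent arithmetic can be checked numerically. This sketch, under assumed parameters, draws hypothetical in-degrees with cumulative distribution $P(K \ge k) = k^{-1.1}$ (k ≥ 1), ranks the pages by decreasing in-degree, assigns each page the traffic $t = r^{-1.6}$ of the search-dominant model, and fits the slope of log t against log k. The number of pages and the random seed are arbitrary choices:

```python
import math
import random

def search_dominant_alpha(n_pages=20000, rank_exp=1.1, click_exp=1.6, seed=1):
    """Simulate the search-dominant model and fit alpha in t ~ k**alpha."""
    rng = random.Random(seed)
    # Inverse-transform sampling: P(K >= k) = k**(-rank_exp) for k >= 1,
    # so the rank of a page with in-degree k scales as r ~ k**(-rank_exp).
    k = sorted(((1.0 - rng.random()) ** (-1.0 / rank_exp) for _ in range(n_pages)),
               reverse=True)
    # Traffic from the click distribution over ranks r = 1, 2, ...: t ~ r**(-click_exp).
    t = [r ** (-click_exp) for r in range(1, n_pages + 1)]
    # Least-squares slope of log t vs log k.
    xs = [math.log(v) for v in k]
    ys = [math.log(v) for v in t]
    mx, my = sum(xs) / n_pages, sum(ys) / n_pages
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

The fitted slope should come out close to 1.1 × 1.6 = 1.76, i.e. the Googlearchy exponent of about 1.8.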

Page 19

Empirical test of popularity bias

- 28,124 sites
- Traffic from Alexa
- In-degree from Google and Yahoo
- Analysis repeated after 2 months

Page 20

Googlocracy?

Data vs. Models

Page 21

What are we missing?

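$$r_G \sim k^{-1.1} \quad \text{(overall)}$$

$$t \sim r_L^{-1.6} \quad \text{(at hit list level)}$$

The two relations cannot be combined! r_G is the rank of a page among all Web pages, while r_L is its rank within a single hit list.

$$t(r_G) = \,?$$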

Page 22

The importance of query topics

Hit lists depend on the interests of the users and can be of various sizes.

In particular, very specific queries lead to small hit lists, which often contain not very popular Web sites/pages.

Similarly, it is unlikely that small hit lists contain very popular sites/pages.

Page 23
Page 24
Page 25

Hit list size distribution

Page 26

Our model

- “Artificial” Web with N pages, labeled from 1 to N;
- at each step, a hit list is created such that: 1) all pages have the same probability to appear in the hit list; 2) the size of the hit list is taken from the empirical distribution;
- for each hit list, clicks are distributed among the hits according to the empirical distribution, $t \sim r_L^{-1.6}$;
- after a sufficient number of hit lists has been created, we check how many clicks went to the page with label/rank r, which gives $t(r_G)$.
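The steps of the model can be sketched as a small simulation. This is an illustration under explicit assumptions, not the authors' code: the empirical hit list size distribution is not reproduced here, so sizes are drawn roughly log-uniformly between 1 and N as a hypothetical stand-in; hits within a list are assumed to be ordered by label (i.e. by global rank r_G), and the hit at list position r_L receives a share of that list's clicks proportional to $r_L^{-1.6}$. The function name `simulate_traffic` and its parameters are hypothetical.

```python
import random

def simulate_traffic(n_pages=1000, n_lists=5000, click_exp=1.6, seed=0):
    """Sketch of the model above; returns t, where t[r_G - 1] is the traffic of label r_G.

    Assumptions (hypothetical, for illustration only):
    - hit-list sizes are roughly log-uniform in 1..n_pages-1, standing in
      for the empirical size distribution;
    - hits are ordered by label (global rank), and list position r_L gets
      a click share proportional to r_L ** (-click_exp).
    """
    rng = random.Random(seed)
    clicks = [0.0] * n_pages  # clicks[r_G - 1] accumulates clicks of label r_G
    for _ in range(n_lists):
        size = int(n_pages ** rng.random())  # integer in 1..n_pages-1
        # Every page is equally likely to appear (sampling without replacement);
        # sorting by label puts the globally best-ranked hit first in the list.
        hits = sorted(rng.sample(range(n_pages), size))
        weights = [r_l ** (-click_exp) for r_l in range(1, size + 1)]
        norm = sum(weights)
        for page, w in zip(hits, weights):
            clicks[page] += w / norm  # each hit list distributes one unit of clicks
    total = sum(clicks)
    return [c / total for c in clicks]
```

Because every page is equally likely to appear in a hit list, a page with a large label can still collect clicks from the small lists in which it happens to rank near the top; the resulting t(r_G) can then be expressed as a function of in-degree through $r_G \sim k^{-1.1}$ and compared with the data.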

Page 27

Data vs “Semantically Correct” model

Page 28

Conclusions

The use of search engines partially mitigates the rich-get-richer nature of the Web, giving new sites an increased chance of being discovered (compared to surfing alone), as long as they are about specific topics that match the interests of users.

The combination of (i) how search engines index and rank results, (ii) what queries users submit, and (iii) how users view the results, leads to an egalitarian effect (“Googlocracy”).

Page 29

Reactions

Page 30

Looks scientific, but actually biased, and not right! A research paper floats "The Egalitarian Effects of Search Engines"

Being on good terms with Google and googling people, including bloggers and blogging people, still did not stop me from thinking streets are much better than the rough roads of the past centuries, too! That is what I thought after reading the full text of the research paper. I do not think the survey methods were right, though they looked very scientific. I have made an experiment with Google page ranking; here is a look at what "egalitarian" Google is really like:

Page 31
Page 32