CS 728 Lecture 4 It’s a Small World on the Web

Upload: kana

Post on 25-Feb-2016



TRANSCRIPT

Page 1: CS 728 Lecture 4 It’s a Small World on the Web

CS 728 Lecture 4

It’s a Small World on the Web

Page 2

Small World Networks

• It is a ‘small world’ after all

– Billions of people on Earth, yet every pair is separated by “six degrees” of acquaintance relationships

– Notion popularized by experimental psychologist Stanley Milgram, distinct from his more infamous (obedience) experiment

• Mathematically:

– Sparse: the number of edges is linear in the number of nodes

– Small diameter: grows like the logarithm of the number of nodes (log N)

– High clustering: a node’s neighbors tend to be neighbors of each other

Page 3

Small World = Small Diameter + Clustering

• Defined by two measures:

– Characteristic path length L: the number of edges in the shortest path between two vertices, averaged over all vertex pairs

– Clustering coefficient C:

• Take a vertex v with $k > 1$ neighbors

• At most $k(k-1)/2$ edges can exist among these neighbors

• $C(v)$ = fraction of these $k(k-1)/2$ possible edges that are present

• C = average of $C(v)$ over all vertices

• Small world: $C \gg C_{\text{random}}$ while $L \approx L_{\text{random}}$
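Both measures can be computed directly from an adjacency list. The sketch below is a minimal illustration, assuming an undirected graph stored as a dict of neighbor sets; the function names are hypothetical, vertices with fewer than two neighbors are assigned C(v) = 0 (one common convention), and L is averaged only over pairs that are actually connected.

```python
from collections import deque
from itertools import combinations

# Illustrative sketch; function names are hypothetical.


def local_clustering(adj, v):
    """C(v): fraction of the k(k-1)/2 possible neighbor-neighbor edges that exist."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: C(v) = 0 when v has fewer than 2 neighbors
    present = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return present / (k * (k - 1) / 2)


def average_clustering(adj):
    """C: average of C(v) over all vertices."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def bfs_distances(adj, source):
    """Shortest-path (edge-count) distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def characteristic_path_length(adj):
    """L: shortest-path length averaged over all connected vertex pairs."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Toy graph: a triangle (0, 1, 2) with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(average_clustering(adj))          # 7/12 ≈ 0.583
print(characteristic_path_length(adj))  # 4/3 ≈ 1.333
```

To test the small-world condition, one would compare these two values against those of a random graph with the same number of vertices and edges.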

Page 4

The small world of the Web

• Empirical study of the Web graph reveals the small-world property:

– Sparse graph

– Average distance d in a simulated web: $d = 0.35 + 2.06 \log_{10} n$; e.g., for $n = 10^9$ pages, $d \approx 19$

– Diameter properties are inferred from sampling

• Calculating the maximum diameter exactly is computationally demanding for large values of n

– Clustering is unknown
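The fitted formula can be evaluated directly. This sketch assumes the logarithm is base 10, which is the reading under which n = 10^9 gives d ≈ 19; the function name is hypothetical.

```python
import math

# Illustrative sketch; avg_web_distance is a hypothetical name.


def avg_web_distance(n):
    """Fitted average distance d = 0.35 + 2.06 * log10(n) for a web of n pages."""
    return 0.35 + 2.06 * math.log10(n)


for n in (10**6, 10**9, 10**12):
    print(f"n = {n:.0e}: d = {avg_web_distance(n):.2f}")
```

For n = 10^9 this gives about 18.89, which the slide rounds to 19.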

Page 5

Implications for Web

• Logarithmic scaling of the diameter keeps future navigation of the Web manageable

– A 10-fold increase in the number of Web pages adds only about 2 more ‘clicks’, but …

– Users may not take the shortest path; they may use bookmarks or simply get distracted along the way

– Search engines play a crucial role: how can they exploit this small-world (SW) link structure?
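The “2 more clicks” figure follows from the fitted formula on the previous slide, assuming its logarithm is base 10: multiplying n by 10 adds exactly the coefficient of the log term.

$$
d(10n) - d(n) = \bigl(0.35 + 2.06\log_{10}(10n)\bigr) - \bigl(0.35 + 2.06\log_{10} n\bigr) = 2.06\,\log_{10} 10 = 2.06 \approx 2
$$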

Page 6

Small World in Real World of Hollywood: The Kevin Bacon Game

Goal: Connect any actor to Kevin Bacon by linking actors who have acted in the same movie.

The Oracle of Bacon website uses the Internet Movie Database (IMDB.com) to find the shortest link between any two actors. It was created by students at the University of Virginia: http://oracleofbacon.org/

Boxed version of the Kevin Bacon Game
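Finding the shortest link is a breadth-first search (BFS) over the co-star graph. A minimal sketch follows; the cast lists are hypothetical toy data (not real IMDB.com records), the function name is illustrative, and this is not claimed to be how the Oracle of Bacon is implemented.

```python
from collections import deque

# Illustrative sketch; bacon_numbers and the toy casts below are hypothetical.


def bacon_numbers(casts, center="Kevin Bacon"):
    """BFS distance from `center` to every actor reachable via shared movies."""
    # Link every pair of actors who appear in the same movie.
    costars = {}
    for cast in casts.values():
        for actor in cast:
            costars.setdefault(actor, set()).update(a for a in cast if a != actor)

    dist = {center: 0}
    queue = deque([center])
    while queue:
        actor = queue.popleft()
        for other in costars.get(actor, ()):
            if other not in dist:
                dist[other] = dist[actor] + 1
                queue.append(other)
    return dist  # actors missing from the result have no path to `center`


# Hypothetical toy data: movie title -> cast list.
casts = {
    "Movie 1": ["Kevin Bacon", "Actor A"],
    "Movie 2": ["Actor A", "Actor B", "Actor C"],
    "Movie 3": ["Actor C", "Actor D"],
    "Movie 4": ["Actor E"],
}
print(bacon_numbers(casts))
```

Here Actor D has Bacon number 3, and Actor E, who shares no movie with anyone connected to Bacon, gets no number at all.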

Page 7
Page 8

The Hollywood Network

Total # of actors in database: ~550,000

Most actors are within three links of each other!

Average path length to Kevin Bacon: 2.79

Actor closest to the “center”: Rod Steiger (2.53)

Kevin Bacon’s rank in closeness to the center: 876th

Center of Hollywood?
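Finding the “center” amounts to ranking every node by its average distance to all the others, as in the Rod Steiger and Kevin Bacon figures above. A minimal sketch on a hypothetical five-node chain, assuming a connected, undirected graph stored as a dict of neighbor lists; the function names are illustrative.

```python
from collections import deque

# Illustrative sketch; function names and the toy graph are hypothetical.


def average_distance(adj, source):
    """Mean shortest-path distance from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")


def rank_by_centrality(adj):
    """Nodes sorted from most to least central (smallest average distance first)."""
    return sorted(adj, key=lambda v: average_distance(adj, v))


# Hypothetical chain a - b - c - d - e: the middle node is the "center".
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
ranking = rank_by_centrality(adj)
print(ranking[0], average_distance(adj, ranking[0]))  # c 1.5
print(ranking.index("a") + 1)                          # rank of "a": 4
```

On the real actor graph this requires one BFS per actor, which is why such rankings are usually computed offline.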

Page 9

Math Co-authorship Network: Erdős Number

Number of links required to connect scholars to Erdős, via co-authorship of papers

Erdős wrote 1500+ papers with 507 co-authors.

Jerry Grossman’s (Oakland Univ.) website lets mathematicians compute their Erdős numbers:

http://www.oakland.edu/enp/

Connecting path lengths, among mathematicians only:

– Average: 4.65

– Maximum: 13

Paul Erdős (1913-1996)

Page 10

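My number is 3

- Erdős and Rényi showed that the average path length between connected nodes in a random graph grows logarithmically: $L \approx \ln N / \ln k$, where N is the number of nodes and k is the average degree

- But unlike random graphs, whose degrees are Poisson-distributed, real networks such as the Web and Hollywood do not have Poisson degree distributions

- Back to power laws

[Figure: co-authorship chain linking Erdős, Fan Chung, Arny Rosenberg, and Fred Annexstein]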
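The logarithmic estimate can be checked numerically. The sketch below assumes the G(N, p) version of the Erdős–Rényi model and estimates the average path length from 100 randomly chosen BFS sources rather than all pairs; the function names and parameters (N = 1000, p = 0.01) are illustrative assumptions. The two numbers should be close but not identical, since $L \approx \ln N / \ln k$ is only a leading-order approximation.

```python
import math
import random
from collections import deque

# Illustrative sketch; function names and parameters are hypothetical.


def erdos_renyi(n, p, seed=0):
    """G(n, p): each of the n(n-1)/2 possible edges appears independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def sampled_path_length(adj, samples=100, seed=0):
    """Average BFS distance from sampled sources to the nodes they can reach."""
    rng = random.Random(seed)
    total, pairs = 0, 0
    for s in rng.sample(list(adj), samples):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


N, p = 1000, 0.01
adj = erdos_renyi(N, p)
k = sum(len(nbrs) for nbrs in adj.values()) / N  # average degree, about p * (N - 1)
measured = sampled_path_length(adj)
estimate = math.log(N) / math.log(k)
print(f"average degree k = {k:.2f}")
print(f"measured L = {measured:.2f}, ln N / ln k = {estimate:.2f}")
```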

Page 11

Classes of small-world networks

– Single-scale: connectivity distribution decays exponentially (e.g., Poisson and random graphs)

– Scale-free: power-law distribution of connectivity over the entire range

– Broad-scale: power law over a “broad range”, followed by an abrupt cut-off
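The difference between single-scale and scale-free tails is easy to see numerically. This sketch compares a Poisson degree distribution with mean 10 against a pure power law $P(k) \propto k^{-\gamma}$; the parameters (mean 10, γ = 2.1, normalization truncated at k = 10^5) and function names are illustrative assumptions, and the broad-scale cut-off is not modeled.

```python
import math
from functools import lru_cache

# Illustrative sketch; function names and parameters are hypothetical.


def poisson_pmf(k, mean):
    """P(k) = e^(-mean) * mean^k / k!, computed in log space to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


@lru_cache(maxsize=None)
def _power_law_norm(gamma, k_max):
    """Normalizing constant: sum of j^(-gamma) for j = 1..k_max."""
    return sum(j ** -gamma for j in range(1, k_max + 1))


def power_law_pmf(k, gamma, k_max=10**5):
    """P(k) proportional to k^(-gamma) for k = 1..k_max."""
    return k ** -gamma / _power_law_norm(gamma, k_max)


for k in (10, 50, 100, 1000):
    print(f"k = {k:5d}  Poisson: {poisson_pmf(k, 10):.3e}  power law: {power_law_pmf(k, 2.1):.3e}")
```

Near the mean the Poisson distribution dominates, but far out in the tail the power law is larger by many orders of magnitude, which is why high-degree hubs appear in scale-free networks and essentially never in random graphs.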

Page 12

Bow-tie Structure of Web

• A large-scale study (AltaVista crawls) reveals another interesting property of the Web: “symmetric asymmetry”

– Study of 200 million nodes and 1.5 billion links

– The small-world property does not apply to the entire Web

• Some parts are unreachable

• Others have long paths

– Power-law connectivity still holds

• Page in-degree ($\gamma = 2.1$), out-degree ($\gamma = 2.72$)
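One way to estimate an exponent like the in-degree value 2.1 above is the maximum-likelihood estimator for a continuous power law, $\hat{\gamma} = 1 + n / \sum_i \ln(x_i / x_{\min})$. This is a swapped-in technique for illustration, not necessarily how the AltaVista study obtained its exponents. The sketch checks it on synthetic samples drawn by inverse-transform sampling; $x_{\min} = 1$ and the function names are assumptions.

```python
import math
import random

# Illustrative sketch; function names and x_min = 1 are hypothetical choices.


def sample_power_law(n, gamma, x_min=1.0, seed=0):
    """Draw n samples from a continuous power law p(x) ∝ x^(-gamma), x >= x_min."""
    rng = random.Random(seed)
    # Inverse CDF: x = x_min * (1 - u)^(-1 / (gamma - 1)), with 1 - u in (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


def estimate_exponent(samples, x_min=1.0):
    """Maximum-likelihood estimate 1 + n / sum(ln(x / x_min)) over samples x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


data = sample_power_law(100_000, gamma=2.1)
print(f"estimated gamma = {estimate_exponent(data):.3f}")
```

Real degree data are discrete and usually follow a power law only above some threshold, so in practice $x_{\min}$ must be chosen carefully; the synthetic data here sidestep that issue.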

Page 13

Bow-tie Components

• Strongly Connected Component (SCC)

– Core with the small-world property

• Upstream (IN)

– IN can reach the core, but the core can’t reach IN

• Downstream (OUT)

– The core can reach OUT, but OUT can’t reach the core

• Disconnected (Tendrils)
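Given a directed graph and one node already known to lie in the core, the components can be separated with two breadth-first searches, one along the links and one against them. This is a minimal sketch; the function names, the toy graph, and the assumption that a core node is known in advance are all hypothetical.

```python
from collections import deque

# Illustrative sketch; function names and the toy graph are hypothetical.


def reachable(graph, start):
    """All nodes reachable from start by following directed links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in graph.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


def reverse_graph(graph):
    """Same nodes with every link u -> v turned into v -> u."""
    rev = {u: [] for u in graph}
    for u, targets in graph.items():
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev


def bow_tie(graph, core_node):
    """Split nodes into SCC, IN, OUT, and everything else, relative to core_node."""
    fwd = reachable(graph, core_node)                  # core can reach these
    bwd = reachable(reverse_graph(graph), core_node)   # these can reach core
    scc = fwd & bwd
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    return {
        "SCC": scc,
        "IN": bwd - scc,
        "OUT": fwd - scc,
        "OTHER": nodes - fwd - bwd,  # tendrils, tubes, disconnected pieces
    }


# Hypothetical toy web: t1 is a tendril hanging off IN, iso is disconnected.
graph = {
    "in1": ["core1", "t1"],
    "core1": ["core2"],
    "core2": ["core1", "out1"],
    "out1": [],
    "t1": [],
    "iso": [],
}
parts = bow_tie(graph, "core1")
for name, members in parts.items():
    print(name, sorted(members))
```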

Page 14

Component Properties

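• Each component is roughly the same size: ~50 million nodes

• Tendrils are not connected to the SCC

– But they are reachable from IN, or can reach OUT

• Tubes: directed paths IN -> Tendrils -> OUT that bypass the SCC

• Disconnected components

– Because some pairs of nodes have no path between them, the maximal and average diameter of the whole graph is infinite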

Page 15

Empirical Numbers for Bow-tie

• Maximal minimal diameter (the longest shortest path): 28 for the SCC, 500 for the entire graph

• Probability that a directed path exists between two random nodes: ~1/4 (0.24)

• Average path length: 16 when a directed path exists, 7 when links are treated as undirected

• Shortest directed path between 2 nodes in the SCC: 16–20 links on average
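The path probability and the average directed path length can both be computed from BFS runs. On a Web-scale graph one would sample sources rather than run BFS from every node; this sketch does the exact all-sources version on a hypothetical four-node graph, and the function name is illustrative.

```python
from collections import deque

# Illustrative sketch; the function name and toy graph are hypothetical.


def directed_reachability_stats(graph):
    """Return (fraction of ordered pairs u != v with a directed u -> v path,
    average shortest directed path length over those pairs)."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    reachable_pairs, total_length = 0, 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in graph.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        reachable_pairs += len(dist) - 1  # exclude s itself
        total_length += sum(dist.values())
    n = len(nodes)
    prob = reachable_pairs / (n * (n - 1))
    avg = total_length / reachable_pairs if reachable_pairs else float("inf")
    return prob, avg


# Hypothetical toy graph: a -> b -> c, plus an isolated node d.
graph = {"a": ["b"], "b": ["c"], "c": [], "d": []}
prob, avg = directed_reachability_stats(graph)
print(f"P(path) = {prob:.2f}, average length = {avg:.2f}")  # P(path) = 0.25, average length = 1.33
```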

Page 16

Next Time: Models for the Web Graph

• Stochastic models that can explain, or at least partially reproduce, the properties of the Web graph

• Goals of a model:

– Reproduce power-law distribution properties

– Maintain the small-world property

– Produce the bow-tie structure