cs728 lecture 5 generative graph models and the web

25
CS728 Lecture 5 Generative Graph Models and the Web

Upload: amos-mccoy

Post on 18-Dec-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

CS728Lecture 5

Generative Graph Models and the Web

Importance of Generative Models

Gives insight into the graph formation process:– Anomaly detection – abnormal behavior,

evolution– Predictions – predicting future from the past– Simulations and evaluation of new algorithms– Graph sampling – many real world graphs like

the web are too large and complex to deal with – Goal: generating graphs with small world

property, clustering, power-laws, other naturally occurring structures

Graph Models: Waxman Models

• Used for models of clustering in Internet-like topologies and networks with long and short edges

• The vertices are distributed at random in a plane. • An edge is added between each pair of vertices with probability

p.

p(u,v) = * exp( -d / (*L) ), 0 , 1.

• L is the maximum distance between any two nodes. • Increase in alpha increases the number of edges in the graph. • Increase in beta increases the number of long edges relative to

short edges. • d is the Euclidean distance from u to v in Waxman-1. • d is a random number between [0, L] in Waxman-2.

Graph Models: Configuration Model

• Random Graph from given degree sequence

• Problem: Given a degree sequence, d1,d2, d3, …., dn generate a random graph with that degree sequence

• Solution:

Place di stubs onto vertex I

Choose pairs of stubs at random

• Problem: we may construct graphs with loops and multiedges

• To prevent this there must be enough “absorbing” residual degree capacity.

• Algorithm:• Maintain list of nodes sorted by residual degrees d(v)• Repeat until all nodes have been chosen:

– pick arbitrary vertex v– add edges from v to d(v) vertices of highest residual

degree– update residual degrees

To randomize further, we can start with a realization and repeatedly 2-swap pairs of edges (u,v), (s,t) to (u,t), (s,v)

Works OK, But is there a more ‘natural’ generative model?

Generative Graph models: Preferential attachment

• Price’s Model [65] : Physics citations – “cummulative advantage”

• Herb Simon [50’s]: Nobel and Turing Awards, political scientist “rich get richer” (Pareto)

• Matthew effect / Matilda effect: sociology • Barabasi and Albert 99: Preferential attachment:

– Add a new node, create d out-links– Probability of linking a node is proportional to

its current degree• Simple explanation of power-law degree

distributions

Issues with preferential attachment and Power-laws

• Barabasi model fixed constant m for out-degree• Price’s model directed with m mean out-degree• Probability of adding a new edge is proportional to its (in)

degree k – problem at the start degree 0– Price’s model: prop to deg + 1– Analysis: prob a node has degree k

• pk ~ k-3 (Barabasi model)• pk ~ k-(2+1/m) power-law with exponent 2-3 (Price)

• Exercise: give pseudocode that generates such a graph in linear time

Variations on the PA Theme

• Clustering, Small-World and Ageing

• Copying Model

• Alpha and beta Models

• Temporal Evolution

• Densification

Graph models: Copying model

• Copying model • [Kleinberg, Kumar, Raghavan, Rajagopalan and Tomkins, 99]:

– Add a node and choose the number of edges to add

– Choose a random vertex and “copy” its links (neighbors)

• Also generates power-law degree distributions

• Generates communities - clustering

Graph Models: The Alpha ModelWatts (1999)

model: Add edges to nodes, as in random graphs, but makes links more likely when two nodes have a common friend.

For a range of values:

– The world is small (average path length is short), and

– Groups tend to form (high clustering coefficient).

Probability of linkage as a functionof number of mutual friends

( is 0 in upper left,1 in diagonal,

and ∞ in bottom right curves.)

Graph Models: The Beta ModelWatts and Strogatz (1998)“Link Rewiring”

= 0 = 0.125 = 1

People knowothers atrandom.

Not clustered,but “small world”

People knowtheir neighbors,

and a few distant people.

Clustered and“small world”

People know their neighbors.

Clustered, butnot a “small world”

Graph Models: The Beta Model

First five random links reduce the average path length of the network by half, regardless of N!

Both and models reproduce short-path results of random graphs, but also allow for clustering.

Small-world phenomena occur at threshold between order and chaos.

Watts and Strogatz (1998)

Clu

ster

ing

coef

ficie

nt /

Nor

mal

ized

pat

h le

ngth

Clustering coefficient (C) and average path length (L) plotted against

Other Related Work

• Hybrid models: Beta + Waxman on grid• Huberman and Adamic, 1999: Growth dynamics of the

world wide web – Argue against Barabasi model for its age dependence

• Kumar, Raghavan, Rajagopalan, Sivakumar and Tomkins, 1999: Stochastic models for the web graph

• Watts, Dodds, Newman, 2002: Identity and search in social networks

• Medina, Lakhina, Matta, and Byers, 2001: BRITE: An Approach to Universal Topology Generation

• …

Statistics

• Statistics of common networks:

N - nodes K - degree

D - distance

C- clique fraction

Actors 225,226 61 3.65 0.79

Power-grid 4,941 2.67 18.7 0.08

C.elegans 282 14 2.65 0.28

Large k = large c?

Small c = large d?

Modeling Ageing and Temporal Evolution

• N(t) … nodes at time t

• E(t) … edges at time t

• Suppose thatN(t+1) = 2 * N(t)

• Q: what is guess for E(t+1) =? 2 * E(t)

• A: over-doubled?

Temporal Evolution of Graphs

• Densification Power Law – networks appear denser over time – the number of edges grows faster than the

number of nodes – average degree is increasing

a … densification exponent

or

equivalently

Graph Densification

• Densification Power Law

• Densification exponent: 1 ≤ a ≤ 2:– a=1: linear growth – constant out-degree

(assumed in the literature so far)– a=2: quadratic growth – clique

• Let’s see the real graphs!

Densification – ArXiv citation graph in Physics

• Citations among physics papers

• 1992:– 1,293 papers,

2,717 citations• 2003:

– 29,555 papers, 352,807 citations

• For each month M, create a graph of all citations up to month M N(t)

E(t)

1.69

Densification – Patent Citations

• Citations among patents granted

• 1975– 334,000 nodes– 676,000 edges

• 1999– 2.9 million nodes– 16.5 million edges

• Each year is a datapoint N(t)

E(t)

1.66

Densification – Internet Autonomous Systems

• Graph of Internet• 1997

– 3,000 nodes– 10,000 edges

• 2000– 6,000 nodes– 26,000 edges

• One graph per day

N(t)

E(t)

1.18

Evolution of the Diameter

• Prior work on Power Law graphs hints at Slowly growing diameter:– diameter ~ O(log N)– diameter ~ O(log log N)

• What is happening in real data?

• Diameter shrinks over time– As the network grows the distances

between nodes slowly decrease

Diameter – ArXiv citation graph

• Citations among physics papers

• 1992 –2003

• One graph per year

time [years]

diameter

Diameter – “Patents”

• Patent citation network

• 25 years of data

time [years]

diameter

Diameter – Autonomous Systems

• Graph of Internet

• One graph per day

• 1997 – 2000

number of nodes

diameter

Next Time: Densification – Possible Explanations

• Generative models to capture the Densification Power Law and Shrinking diameters

• 2 proposed models:– Community Guided Attachment – obeys

Densification– Forest Fire model – obeys Densification,

Shrinking diameter (and Power Law degree distribution)