
Page 1: Finding patterns in large, real networks

CMU SCS

Finding patterns in large, real networks

Christos Faloutsos

CMU

Page 2: Finding patterns in large, real networks

LLNL 05 C. Faloutsos 2


Thanks to

• Deepayan Chakrabarti (CMU/Yahoo)

• Michalis Faloutsos (UCR)

• Spiros Papadimitriou (CMU/IBM)

• George Siganos (UCR)

Page 3: Finding patterns in large, real networks


Introduction

Internet Map [lumeta.com]

Food Web [Martinez ’91]

Protein Interactions [genomebiology.com]

Friendship Network [Moody ’01]

Graphs are everywhere!

Page 4: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

• tools/results

• future steps

Page 5: Finding patterns in large, real networks


Motivating questions

• Q1: What do real graphs look like?

• Q2: How to generate realistic graphs?

Page 6: Finding patterns in large, real networks


Virus propagation

• Q3: How do viruses/rumors propagate?

• Q4: Who is the best person/computer to immunize against a virus?

Page 7: Finding patterns in large, real networks


Graph clustering & mining

• Q5: which edges/nodes are ‘abnormal’?

• Q6: split a graph in k ‘natural’ communities - but how to determine k?

Page 8: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

• tools/results

• future steps

Page 9: Finding patterns in large, real networks


Topology

Q1: What does the Internet look like? Any rules?

(Looks random – right?)

Page 11: Finding patterns in large, real networks


Laws – degree distributions

• Q: avg degree is ~2 - what is the most probable degree?

[Figure: two sketches of count vs. degree, each marked at degree 2; the first is left open (‘??’)]

Page 12: Finding patterns in large, real networks


I. Power-law: outdegree O

The plot is linear in log-log scale [FFF’99]

freq = degree^(-2.15)

O = -2.15

Exponent = slope

[Plot, Nov’97: frequency vs. outdegree; slope -2.15]
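The exponent on this slide is the slope of a straight line fitted in log-log scale. A minimal sketch of that fit, assuming the data are available as (degree, frequency) pairs; `fit_loglog_slope` is a hypothetical helper name, and the data below are synthetic rather than the Nov’97 measurements:

```python
import math

def fit_loglog_slope(points):
    """Least-squares slope and intercept of log(y) against log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic (degree, frequency) pairs that follow freq = 1000 * degree^(-2.15).
synthetic = [(d, 1000.0 * d ** -2.15) for d in range(1, 51)]
slope, intercept = fit_loglog_slope(synthetic)
print(round(slope, 2))
```

On real degree counts the points scatter around the line, especially in the tail, so the fitted slope is only an estimate of the exponent.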

Page 13: Finding patterns in large, real networks


II. Power-law: rank R

• The plot is a line in log-log scale

Rank: nodes in decreasing degree order

[Plot, April’98: log(degree) vs. log(rank); slope (rank exponent R) -0.82; labeled points: att.com, ibm.com]

Page 14: Finding patterns in large, real networks


III. Power-law: eigen E

• Eigenvalues in decreasing order (first 20)

• [Mihail+, 02]: R = 2 * E

E = -0.48

Exponent = slope

[Plot, Dec’98: eigenvalue vs. rank of decreasing eigenvalue]

Page 15: Finding patterns in large, real networks


IV. The Node Neighborhood

• N(h) = # of pairs of nodes within h hops

Page 17: Finding patterns in large, real networks


IV. The Node Neighborhood

• Q: average degree = 3 - how many neighbors should I expect within 1,2,… h hops?

• Potential answer:

1 hop -> 3 neighbors

2 hops -> 3 * 3

h hops -> 3^h

WE HAVE DUPLICATES!

Page 18: Finding patterns in large, real networks


IV. The Node Neighborhood

• Q: average degree = 3 - how many neighbors should I expect within 1,2,… h hops?

• Potential answer:

1 hop -> 3 neighbors

2 hops -> 3 * 3

h hops -> 3^h

‘avg’ degree: meaningless!

Page 19: Finding patterns in large, real networks


IV. Power-law: hopplot H

Pairs of nodes as a function of hops: N(h) = h^H

[Plots: # of pairs vs. hops. Dec 98: H = 4.86. Router level ’95: H = 2.83]
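The hop plot itself can be computed with one breadth-first search per node. A minimal sketch, assuming an undirected graph stored as a dict of neighbor lists and counting unordered pairs of distinct nodes (other conventions, such as ordered pairs or including self-pairs, shift the counts but not the idea); `hop_plot` is a hypothetical name:

```python
from collections import deque

def hop_plot(adj):
    """Return {h: number of unordered node pairs within h hops}."""
    dist_counts = {}
    for source in adj:
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            if d > 0:
                dist_counts[d] = dist_counts.get(d, 0) + 1
    pairs_within = {}
    running = 0
    for h in sorted(dist_counts):
        running += dist_counts[h] // 2  # each pair was reached from both ends
        pairs_within[h] = running
    return pairs_within

# A 5-node chain: 0 - 1 - 2 - 3 - 4
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
print(hop_plot(chain))
```

Fitting a line to log N(h) against log h, over the range before N(h) saturates, would then give the hop exponent H. Running a full search from every node is quadratic in the worst case, so this sketch only suits small graphs.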

Page 20: Finding patterns in large, real networks


Observation

• Q: Intuition behind ‘hop exponent’?

• A: ‘intrinsic=fractal dimensionality’ of the network

[Figure: two example networks, one with N(h) ~ h^1 and one with N(h) ~ h^2]

Page 21: Finding patterns in large, real networks


Hop plots

• More on fractal/intrinsic dimensionalities: very soon

Page 23: Finding patterns in large, real networks


Any other ‘laws’?

Yes!

• Small diameter

– six degrees of separation / ‘Kevin Bacon’

– small worlds [Watts and Strogatz]

Page 24: Finding patterns in large, real networks


Any other ‘laws’?

• Bow-tie, for the web [Kumar+ ‘99]

• IN, SCC, OUT, ‘tendrils’

• disconnected components

Page 25: Finding patterns in large, real networks


Any other ‘laws’?

• power-laws in communities (bi-partite cores) [Kumar+, ‘99]

[Figure: a 2:3 core (m:n core); plot of log(count) vs. log(m), with lines for n:1, n:2, n:3]

Page 26: Finding patterns in large, real networks


More Power laws

• Also hold for other web graphs [Barabasi+, ‘99], [Kumar+, ‘99]

• citation graphs (see later)

• and many more

Page 27: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

• tools/results

• future steps

Page 28: Finding patterns in large, real networks


How do graphs evolve?

Page 29: Finding patterns in large, real networks


Time Evolution: rank R

The rank exponent has not changed! [Siganos+, ‘03]

[Plot: rank exponent vs. #days since Nov. ‘97, domain level]

Page 30: Finding patterns in large, real networks


How do graphs evolve?

• degree-exponent seems constant - anything else?

Page 32: Finding patterns in large, real networks


Evolution of diameter?

• Prior analysis, on power-law-like graphs, hints that

diameter ~ O(log(N)) or

diameter ~ O( log(log(N)))

• i.e., slowly increasing with network size

• Q: What is happening, in reality?

• A: It shrinks(!!), towards a constant value

Page 33: Finding patterns in large, real networks


Shrinking diameter

[Leskovec+05a]

• Citations among physics papers

• 11yrs; @ 2003:

– 29,555 papers

– 352,807 citations

• For each month M, create a graph of all citations up to month M

Page 34: Finding patterns in large, real networks


Shrinking diameter

• Authors & publications

• 1992

– 318 nodes

– 272 edges

• 2002

– 60,000 nodes

  • 20,000 authors

  • 38,000 papers

– 133,000 edges

Page 35: Finding patterns in large, real networks


Shrinking diameter

• Patents & citations

• 1975

– 334,000 nodes

– 676,000 edges

• 1999

– 2.9 million nodes

– 16.5 million edges

• Each year is a datapoint

Page 36: Finding patterns in large, real networks


Shrinking diameter

• Autonomous systems

• 1997

– 3,000 nodes

– 10,000 edges

• 2000

– 6,000 nodes

– 26,000 edges

• One graph per day

Page 38: Finding patterns in large, real networks


Temporal evolution of graphs

• N(t) nodes; E(t) edges at time t

• suppose that

N(t+1) = 2 * N(t)

• Q: what is your guess for

E(t+1) =? 2 * E(t)

• A: over-doubled!

Page 39: Finding patterns in large, real networks


Temporal evolution of graphs

• A: over-doubled - but obeying:

E(t) ~ N(t)^a for all t

where 1<a<2

Page 40: Finding patterns in large, real networks


Temporal evolution of graphs

• A: over-doubled - but obeying:

E(t) ~ N(t)^a for all t

• Equivalently (taking the proportionality constant as 1):

log(E(t)) / log(N(t)) = constant for all t
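A small worked example of what the densification law implies for the "doubling" question. This is only a sketch: the function names are hypothetical, the exponents are the ones quoted on the densification plots in this talk, and the two snapshots used to check the estimator are synthetic:

```python
import math

def edge_growth_factor(a, node_factor=2.0):
    """Factor by which E grows when N grows by node_factor, if E ~ N^a."""
    return node_factor ** a

def exponent_from_snapshots(n1, e1, n2, e2):
    """Estimate a from two (nodes, edges) snapshots, assuming E = c * N^a.

    Taking the ratio of the two snapshots cancels the constant c.
    """
    return math.log(e2 / e1) / math.log(n2 / n1)

for label, a in [("ArXiv citations", 1.69), ("patents", 1.66),
                 ("autonomous systems", 1.18), ("ArXiv authors & papers", 1.15)]:
    # With 1 < a < 2 the factor lies strictly between 2 and 4: "over-doubled".
    print(label, round(edge_growth_factor(a), 2))

# Synthetic snapshots generated with a = 1.5 and c = 5.
a_hat = exponent_from_snapshots(1000, 5 * 1000 ** 1.5, 8000, 5 * 8000 ** 1.5)
print(round(a_hat, 3))
```

With more than two snapshots, a least-squares line through the points (log N(t), log E(t)) is the more robust way to estimate a.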

Page 41: Finding patterns in large, real networks


Densification Power Law

ArXiv: Physics papers and their citations

[Plot: E(t) vs. N(t); slope 1.69]

Page 42: Finding patterns in large, real networks


Densification Power Law

ArXiv: Physics papers and their citations

[Plot: E(t) vs. N(t); slope 1.69; reference line for a ‘tree’: slope 1]

Page 43: Finding patterns in large, real networks


Densification Power Law

ArXiv: Physics papers and their citations

[Plot: E(t) vs. N(t); slope 1.69; reference line for a ‘clique’: slope 2]

Page 44: Finding patterns in large, real networks


Densification Power Law

U.S. Patents, citing each other

[Plot: E(t) vs. N(t); slope 1.66]

Page 45: Finding patterns in large, real networks


Densification Power Law

Autonomous Systems

[Plot: E(t) vs. N(t); slope 1.18]

Page 46: Finding patterns in large, real networks


Densification Power Law

ArXiv: authors & papers

[Plot: E(t) vs. N(t); slope 1.15]

Page 47: Finding patterns in large, real networks


Summary of ‘laws’

• Power laws for degree distributions

• Power laws for eigenvalues, bi-partite cores

• Small & shrinking diameter (‘6 degrees’)

• ‘Bow-tie’ for web; ‘jelly-fish’ for internet

• ‘Densification Power Law’, over time

Page 48: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

  • other power laws

  • fractals & self-similarity

  • case study: Kronecker graphs

• tools/results

• future steps

Page 49: Finding patterns in large, real networks


Power laws

• Many more power laws, in diverse settings:

Page 50: Finding patterns in large, real networks


A famous power law: Zipf’s law

• Bible - rank vs frequency (log-log)

[Plot: log(freq) vs. log(rank); labeled words: “a”, “the”]

Page 51: Finding patterns in large, real networks


Power laws, cont’ed

• length of file transfers [Bestavros+]

• web hit counts [Huberman]

• magnitude of earthquakes (Gutenberg-Richter law)

• sizes of lakes/islands (Korcak’s law)

• Income distribution (Pareto’s law)

Page 52: Finding patterns in large, real networks


The Peer-to-Peer Topology

• Frequency versus degree

• Number of adjacent peers follows a power-law

[Jovanovic+]

Page 53: Finding patterns in large, real networks


Web Site Traffic

Click-stream data (u-id’s, url’s)

[Plots: log(count) vs. log(freq), Zipf-like, one for u-id’s and one for url’s; labeled extremes: ‘yahoo’, ‘super-surfer’]

Page 54: Finding patterns in large, real networks


Lotka’s law

(Lotka’s law of publication count); and citation counts: (citeseer.nj.nec.com 6/2001)

[Plot: log(count) vs. log(#citations); labeled point: J. Ullman]

Page 56: Finding patterns in large, real networks


Swedish sex-web

Nodes: people (Females; Males)

Links: sexual relationships

Liljeros et al. Nature 2001

4781 Swedes; 18-74; 59% response rate.

Albert Laszlo Barabasi

http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt

Page 57: Finding patterns in large, real networks


Power laws

• Q: Why so many power laws?

• A1: self-similarity

• A2: ‘rich-get-richer’

Page 58: Finding patterns in large, real networks


Digression: intro to fractals

• Fractals: sets of points that are self similar

Page 59: Finding patterns in large, real networks


A famous fractal

e.g., Sierpinski triangle:

...zero area;

infinite length!

dimensionality = ??

Page 60: Finding patterns in large, real networks


A famous fractal

e.g., Sierpinski triangle:

...zero area;

infinite length!

dimensionality = log(3)/log(2) = 1.58

Page 61: Finding patterns in large, real networks


A famous fractal

equivalent graph:

Page 62: Finding patterns in large, real networks


Intrinsic (‘fractal’) dimension

How to estimate it?

Page 63: Finding patterns in large, real networks


Intrinsic (‘fractal’) dimension

• Q: fractal dimension of a line?

• A: nn ( <= r ) ~ r^1 (‘power law’: y=x^a)

• Q: fd of a plane?

• A: nn ( <= r ) ~ r^2

fd== slope of (log(nn) vs log(r) )

Page 65: Finding patterns in large, real networks


Sierpinski triangle

[Plot: log(#pairs within <=r) vs. log(r); slope 1.58]

== ‘correlation integral’

= CDF of pairwise distances
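The slope of the correlation integral can be estimated numerically. A rough sketch under several assumptions of mine: the Sierpinski triangle is approximated by the lattice points (x, y) with `x & y == 0`, distances are Chebyshev (max of coordinate differences) rather than Euclidean, and only a few small radii are used. A finite point set like this only approximates the limiting value log(3)/log(2), so the fitted slope should land in its neighborhood rather than on it:

```python
import math

def sierpinski_points(levels):
    """Lattice points (x, y) with x & y == 0, a discrete Sierpinski triangle."""
    size = 2 ** levels
    return [(x, y) for x in range(size) for y in range(size) if x & y == 0]

def correlation_integral(points, radii):
    """For each radius r, count unordered pairs at Chebyshev distance <= r."""
    counts = {r: 0 for r in radii}
    for i, (x1, y1) in enumerate(points):
        for x2, y2 in points[i + 1:]:
            d = max(abs(x1 - x2), abs(y1 - y2))
            for r in radii:
                if d <= r:
                    counts[r] += 1
    return counts

def loglog_slope(counts):
    """Least-squares slope of log(count) against log(r)."""
    xs = [math.log(r) for r in counts]
    ys = [math.log(c) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

points = sierpinski_points(6)  # 3**6 = 729 points on a 64 x 64 lattice
counts = correlation_integral(points, [1, 2, 4, 8])
slope = loglog_slope(counts)
print(len(points), round(slope, 2))
```

The all-pairs loop is quadratic in the number of points, which is fine for a few hundred points but not for large datasets.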

Page 66: Finding patterns in large, real networks


Line

[Plot: log(#pairs within <=r) vs. log(r); slope 1 (Sierpinski triangle, for comparison: 1.58)]

== ‘correlation integral’

= CDF of pairwise distances

Page 67: Finding patterns in large, real networks


2-d (Plane)

[Plot: log(#pairs within <=r) vs. log(r); slope 2 (Sierpinski triangle, for comparison: 1.58)]

== ‘correlation integral’

= CDF of pairwise distances

Page 68: Finding patterns in large, real networks


Recall: Hop Plot

• Internet routers: how many neighbors within h hops? (= correlation integral!)

Reachability function: number of neighbors within r hops, vs r (log-log).

[Plot: log(#pairs) vs. log(hops), Mbone routers, 1995; slope 2.8]

Page 69: Finding patterns in large, real networks


Fractals and power laws

They are related concepts:

• fractals <=>

• self-similarity <=>

• scale-free <=>

• power laws ( y = x^a )

– F = C r^(-2)

Page 70: Finding patterns in large, real networks


Fractals and power laws

• Power laws and fractals are closely related

• And fractals appear in MANY cases

– coast-lines: 1.1-1.5

– brain-surface: 2.6

– rain-patches: 1.3

– tree-bark: ~2.1

– stock prices / random walks: 1.5

– ... [see Mandelbrot; or Schroeder]

Page 71: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

  • other power laws

  • fractals & self-similarity

  • case study: Kronecker graph generator

• tools/results

• future steps

Page 72: Finding patterns in large, real networks


Self-similarity at work

• Q2: How to generate realistic graphs?

• (Q2’) and how to grow it over time?

Page 74: Finding patterns in large, real networks


Wish list for a generator:

• Power-law-tail in- and out-degrees

• Power-law-tail scree plots

• shrinking/constant diameter

• Densification Power Law

• communities-within-communities

Q: how to achieve all of them?

A: Kronecker matrix product [Leskovec+05b]

Page 77: Finding patterns in large, real networks


Kronecker product

[Figure: adjacency matrices of successive Kronecker powers, with N, N*N and N**4 nodes]
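A minimal sketch of the deterministic Kronecker construction, in plain Python on 0/1 adjacency matrices. The 3-node initiator (a chain with self-loops) is an illustrative choice of mine, not necessarily the one drawn on the slides, and `kronecker_product` is a hypothetical helper name:

```python
def kronecker_product(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_val * b_val for a_val in a_row for b_val in b_row]
        for a_row in a
        for b_row in b
    ]

def count_ones(matrix):
    """Number of nonzero entries (edges, counting self-loops) in a 0/1 matrix."""
    return sum(sum(row) for row in matrix)

# Initiator graph: chain 0 - 1 - 2 with self-loops (3 nodes, 7 nonzero entries).
g1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
g2 = kronecker_product(g1, g1)  # N*N  = 9 nodes
g4 = kronecker_product(g2, g2)  # N**4 = 81 nodes

for g in (g1, g2, g4):
    print(len(g), count_ones(g))
```

Each Kronecker power multiplies the node count by 3 and the entry count by 7 for this initiator, so the entries grow as a fixed power of the nodes, which is the sense in which the densification law holds exactly for the deterministic version.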

Page 79: Finding patterns in large, real networks


Properties of Kronecker graphs:

• Power-law-tail in- and out-degrees

• Power-law-tail scree plots

• constant diameter

• perfect Densification Power Law

• communities-within-communities

and we can prove all of the above

(first and only generator that does that)

Page 80: Finding patterns in large, real networks


Kronecker - ArXiv

[Figure: panels for the real graph, (det.) Kronecker and (stochastic) Kronecker; columns: Degree, Scree, Diameter, D.P.L.]

Page 81: Finding patterns in large, real networks


Kronecker - patents

Degree Scree Diameter D.P.L.

Page 82: Finding patterns in large, real networks


Kronecker - A.S.

Page 83: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

• tools/results

– virus propagation

– connection subgraphs

– cross-associations

• future steps

Page 84: Finding patterns in large, real networks


Virus propagation

• Q3: How do viruses/rumors propagate?

• Q4: Who is the best person/computer to immunize against a virus?

Page 85: Finding patterns in large, real networks


The model: SIS

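• ‘Flu’ like: Susceptible-Infected-Susceptible

• Virus ‘strength’ s = β/δ

[Figure: node N with neighbors N1, N2, N3; an infected node passes the virus to each healthy neighbor with prob. β and recovers (becomes healthy again) with prob. δ]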

Page 86: Finding patterns in large, real networks


Epidemic threshold

of a graph: the value of τ, such that

if strength s = β/δ < τ, an epidemic cannot happen

Thus,

• given a graph

• compute its epidemic threshold

Page 87: Finding patterns in large, real networks


Epidemic threshold

What should τ depend on?

• avg. degree? and/or highest degree?

• and/or variance of degree?

• and/or third moment of degree?

• and/or diameter?

Page 89: Finding patterns in large, real networks


Epidemic threshold

• [Theorem] We have no epidemic, if

β/δ < τ = 1/λ1,A

where β = attack prob., δ = recovery prob., τ = epidemic threshold, and λ1,A = largest eigenvalue of adj. matrix A

Proof: [Wang+03]

Page 90: Finding patterns in large, real networks


Experiments (Oregon)

β/δ > τ (above threshold)

β/δ = τ (at the threshold)

β/δ < τ (below threshold)

Page 91: Finding patterns in large, real networks


Our result:

• Holds for any graph

• includes older results as special cases

Page 92: Finding patterns in large, real networks


Virus propagation

• Q3: How do viruses/rumors propagate?

• Q4: Who is the best person/computer to immunize against a virus?

• A4: the one that causes the highest decrease in λ1,A (the largest eigenvalue of the adjacency matrix)
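The threshold condition and the immunization rule can be sketched together. This is an illustration under my own assumptions, not the procedure from [Wang+03]: the graph is undirected and given as a 0/1 adjacency matrix, the largest eigenvalue is approximated by power iteration on A + I (the shift avoids oscillation on bipartite graphs), and the best node is found by brute force, removing each node in turn. All function names are hypothetical:

```python
def largest_eigenvalue(adj, iterations=200):
    """Approximate the largest eigenvalue of a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    v = [1.0] * n
    for _ in range(iterations):
        # One step of power iteration on A + I.
        w = [v[i] + sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, av))  # Rayleigh quotient, v has unit norm

def below_threshold(beta, delta, adj):
    """True if beta/delta < 1/lambda_1, i.e. no epidemic per the theorem."""
    return beta / delta < 1.0 / largest_eigenvalue(adj)

def remove_node(adj, node):
    """Adjacency matrix with one node (row and column) deleted."""
    keep = [i for i in range(len(adj)) if i != node]
    return [[adj[i][j] for j in keep] for i in keep]

def best_node_to_immunize(adj):
    """Node whose removal leaves the smallest largest eigenvalue."""
    return min(range(len(adj)),
               key=lambda node: largest_eigenvalue(remove_node(adj, node)))

# Star graph: node 0 is the hub, nodes 1..4 are leaves (lambda_1 = 2).
star = [[0, 1, 1, 1, 1],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [1, 0, 0, 0, 0]]
lam = largest_eigenvalue(star)
print(round(lam, 6), below_threshold(0.1, 0.5, star), best_node_to_immunize(star))
```

The brute-force search recomputes an eigenvalue per candidate node, so it is only practical for small graphs; it is meant to make the criterion concrete, not to be efficient.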

Page 93: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

• tools/results

– virus propagation

– connection subgraphs

– cross-associations

• future steps

Page 95: Finding patterns in large, real networks


(A-A) Kidman-Diaz

Strong, direct link

• What are the best paths between ‘Kidman’ and ‘Diaz’? [KDD’04, w/ McCurley+Tomkins]

Page 96: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

• tools/results

– virus propagation

– connection subgraphs

– cross-associations

• future steps

Page 97: Finding patterns in large, real networks


Graph clustering & mining

• Q5: which edges/nodes are ‘abnormal’?

• Q6: split a graph in k ‘natural’ communities - but how to determine k?

Page 100: Finding patterns in large, real networks


Graph partitioning

• Documents x terms

• Customers x products

• Users x web-sites

• Q: HOW MANY PIECES?

• A: MDL/compression

Page 101: Finding patterns in large, real networks


Cross-associations

[Figure: the matrix split into 1x2, then 2x2 groups]

Page 102: Finding patterns in large, real networks


Cross-associations

[Figure: further splits into 2x3, 3x3, 3x4 groups]

Page 105: Finding patterns in large, real networks


Graph clustering & mining

• Q5: which edges/nodes are ‘abnormal’?

• Q6: split a graph in k ‘natural’ communities - but how to determine k?

• A6: choose the k that leads to best overall compression (= MDL = Minimum Description Length)
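To make the "best overall compression" criterion concrete, here is a simplified sketch of an MDL-style cost for a 0/1 matrix under a given row and column grouping. The cost model is my own simplification, not the exact encoding used by cross-associations: group labels cost log2(k) bits per row and log2(l) bits per column, each block costs log2(cells + 1) bits for its count of ones plus cells * H(density) bits for its contents. The candidate groupings are hand-picked, whereas the real method also searches for them:

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source; 0 for p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def description_length(matrix, row_groups, col_groups):
    """Approximate bits needed to encode `matrix` given the two groupings."""
    k = max(row_groups) + 1
    l = max(col_groups) + 1
    ones = [[0] * l for _ in range(k)]
    cells = [[0] * l for _ in range(k)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            ones[row_groups[i]][col_groups[j]] += value
            cells[row_groups[i]][col_groups[j]] += 1
    # Model cost: which group each row and each column belongs to.
    bits = len(matrix) * math.log2(k) + len(matrix[0]) * math.log2(l)
    for a in range(k):
        for b in range(l):
            n = cells[a][b]
            if n == 0:
                continue
            bits += math.log2(n + 1)                        # count of ones in the block
            bits += n * binary_entropy(ones[a][b] / n)      # block contents
    return bits

# Two clean 3 x 3 communities on the diagonal.
matrix = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3

one_group = description_length(matrix, [0] * 6, [0] * 6)
two_groups = description_length(matrix, [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1])
three_groups = description_length(matrix, [0, 0, 1, 2, 2, 2], [0, 0, 1, 2, 2, 2])
print(round(one_group, 2), round(two_groups, 2), round(three_groups, 2))
```

Too few groups pay for mixed blocks; too many groups pay for the extra labels and block counts. The grouping with the smallest total is the one this criterion would pick among the candidates tried.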

Page 106: Finding patterns in large, real networks


Cross-associations

[Figure: matrix with an outlier edge and a missing edge marked]

Page 107: Finding patterns in large, real networks


Outline

• Laws

– static

– time-evolution laws

– why so many power laws?

• tools/results

• other related projects

• future steps

Page 109: Finding patterns in large, real networks


Motivation

water distribution network

normal operation; major leak

May have hundreds of measurements, but it is unlikely they are completely unrelated!

[Figure: chlorine concentrations over Phase 1, Phase 2, Phase 3, for sensors near the leak and sensors away from the leak]

Page 110: Finding patterns in large, real networks


Motivation

We would like to discover a few “hidden (latent) variables” that summarize the key trends

[Figure: chlorine concentrations in Phase 1; actual measurements (n streams) vs. k hidden variable(s); k = 1]

Page 111: Finding patterns in large, real networks


Motivation

We would like to discover a few “hidden (latent) variables” that summarize the key trends

[Figure: chlorine concentrations in Phase 1 and Phase 2; actual measurements (n streams) vs. k hidden variable(s); k = 2]

Page 112: Finding patterns in large, real networks


Motivation

We would like to discover a few “hidden (latent) variables” that summarize the key trends

[Figure: chlorine concentrations in Phase 1, Phase 2 and Phase 3; actual measurements (n streams) vs. k hidden variable(s); k = 1]

Page 113: Finding patterns in large, real networks


Stream mining

• Solution: SPIRIT [VLDB’05]

– incremental, on-line PCA

Page 114: Finding patterns in large, real networks


Outline

• Laws

• tools/results

• other related projects

• future steps

Page 115: Finding patterns in large, real networks


Future steps

• connection sub-graphs on multi-graphs [Tong]

• traffic matrix, evolving over time [Sun+]

• influence propagation on blogs [Leskovec+]

• automatic web site classification [Leskovec]

Page 116: Finding patterns in large, real networks


Conclusions

• Power laws & self similarity

– excellent tools, for real datasets,

– where Gaussian, uniform etc assumptions fail

• Additional tools

– Kronecker, for graph generators

– eigenvalues for virus propagation

– MDL/compression for graph partitioning

Page 117: Finding patterns in large, real networks


References

• [Aiello+, '00] William Aiello, Fan R. K. Chung, Linyuan Lu: A random graph model for massive graphs. STOC 2000: 171-180

• [Albert+] Reka Albert, Hawoong Jeong, and Albert-Laszlo Barabasi: Diameter of the World Wide Web, Nature 401 130-131 (1999)

• [Barabasi, '03] Albert-Laszlo Barabasi Linked: How Everything Is Connected to Everything Else and What It Means (Plume, 2003)

Page 118: Finding patterns in large, real networks


References, cont’d

• [Barabasi+, '99] Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999

• [Broder+, '00] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web, WWW, 2000

Page 119: Finding patterns in large, real networks


References, cont’d

• [Chakrabarti+, ‘04] RMAT: A recursive graph generator, D. Chakrabarti, Y. Zhan, C. Faloutsos, SIAM-DM 2004

• [Dill+, '01] Stephen Dill, Ravi Kumar, Kevin S. McCurley, Sridhar Rajagopalan, D. Sivakumar, Andrew Tomkins: Self-similarity in the Web. VLDB 2001: 69-78

Page 120: Finding patterns in large, real networks


References, cont’d

• [Fabrikant+, '02] A. Fabrikant, E. Koutsoupias, and C.H. Papadimitriou. Heuristically Optimized Trade-offs: A New Paradigm for Power Laws in the Internet. ICALP, Malaga, Spain, July 2002

• [FFF, 99] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the Internet topology," in SIGCOMM, 1999.

Page 121: Finding patterns in large, real networks


References, cont’d

• [Leskovec+05a] Jure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations KDD 2005, Chicago, IL. (Best research paper award)

• [Leskovec+05b] Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg and Christos Faloutsos, Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, ECML/PKDD 2005, Porto, Portugal.

Page 122: Finding patterns in large, real networks


References, cont’d

• [Jovanovic+, '01] M. Jovanovic, F.S. Annexstein, and K.A. Berman. Modeling Peer-to-Peer Network Topologies through "Small-World" Models and Power Laws. In TELFOR, Belgrade, Yugoslavia, November, 2001

• [Kumar+ '99] Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins: Extracting Large-Scale Knowledge Bases from the Web. VLDB 1999: 639-650

Page 123: Finding patterns in large, real networks


References, cont’d

• [Leland+, '94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1-15, Feb. 1994.

• [Mihail+, '02] Milena Mihail, Christos H. Papadimitriou: On the Eigenvalue Power Law. RANDOM 2002: 254-262

Page 124: Finding patterns in large, real networks


References, cont’d

• [Milgram '67] Stanley Milgram: The Small World Problem, Psychology Today 1(1), 60-67 (1967)

• [Montgomery+, ‘01] Alan L. Montgomery, Christos Faloutsos: Identifying Web Browsing Trends and Patterns. IEEE Computer 34(7): 94-95 (2001)

Page 125: Finding patterns in large, real networks


References, cont’d

• [Palmer+, ‘01] Chris Palmer, Georgos Siganos, Michalis Faloutsos, Christos Faloutsos and Phil Gibbons The connectivity and fault-tolerance of the Internet topology (NRDM 2001), Santa Barbara, CA, May 25, 2001

• [Pennock+, '02] David M. Pennock, Gary William Flake, Steve Lawrence, Eric J. Glover, C. Lee Giles: Winners don't take all: Characterizing the competition for links on the web. Proc. Natl. Acad. Sci. USA 99(8): 5207-5211 (2002)

Page 126: Finding patterns in large, real networks


References, cont’d

• [Schroeder, ‘91] Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W H Freeman & Co., 1991 (excellent book on fractals)

Page 127: Finding patterns in large, real networks


References, cont’d

• [Siganos+, '03] G. Siganos, M. Faloutsos, P. Faloutsos, C. Faloutsos Power-Laws and the AS-level Internet Topology, Transactions on Networking, August 2003.

• [Wang+03] Yang Wang, Deepayan Chakrabarti, Chenxi Wang and Christos Faloutsos: Epidemic Spreading in Real Networks: an Eigenvalue Viewpoint, SRDS 2003, Florence, Italy.

Page 128: Finding patterns in large, real networks


References, cont’d

• [Watts+ Strogatz, '98] D. J. Watts and S. H. Strogatz Collective dynamics of 'small-world' networks, Nature, 393:440-442 (1998)

• [Watts, '03] Duncan J. Watts Six Degrees: The Science of a Connected Age W.W. Norton & Company; (February 2003)

Page 129: Finding patterns in large, real networks


Thank you!

www.cs.cmu.edu/~christos

www.db.cs.cmu.edu