graph mining, self-similarity and power laws

59
School of Computer Science Carnegie Mellon Graph Mining, self- similarity and power laws Christos Faloutsos Carnegie Mellon University

Upload: phyllis-petty

Post on 02-Jan-2016

33 views

Category:

Documents


3 download

DESCRIPTION

Graph Mining, self-similarity and power laws. Christos Faloutsos Carnegie Mellon University. Overview. Achievements global patterns and ‘laws’ (static/dynamic) generators influence propagation communities; graph partitioning local patterns: frequent subgraphs Challenges. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Graph Mining, self-similarity and  power  laws

School of Computer ScienceCarnegie Mellon

Graph Mining, self-similarity and power laws

Christos Faloutsos

Carnegie Mellon University

Page 2: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 2

School of Computer ScienceCarnegie Mellon

Overview

• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs

• Challenges

Page 3: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 3

School of Computer ScienceCarnegie Mellon

Motivating questions:

• How does the Internet look like?• What constitutes a ‘normal’ social

network?• ‘network value’ of a customer?

[Domingos+]• which gene/species affects the others

the most?

Page 4: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 4

School of Computer ScienceCarnegie Mellon

Problem #1 - topology

How does the Internet look like? Any rules?

Page 5: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 5

School of Computer ScienceCarnegie Mellon

Solution#1: Rank exponent R• A1: Power law in the degree distribution

[SIGCOMM99]

internet domains

log(rank)

log(degree)

-0.82

att.com

ibm.com

Page 6: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 6

School of Computer ScienceCarnegie Mellon

Power laws

• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]

log indegree

log(freq)

from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]

Page 7: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 7

School of Computer ScienceCarnegie Mellon

epinions.com

• who-trusts-whom [Richardson + Domingos, KDD 2001]

(out) degree

count

Page 8: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 8

School of Computer ScienceCarnegie Mellon

Even more power laws:

• web hit counts [w/ A. Montgomery]

Web Site Traffic

log(freq)

log(count)

Zipf“yahoo.com”

Page 9: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 9

School of Computer ScienceCarnegie Mellon

Overview

• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs

• Challenges

Page 10: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 10

School of Computer ScienceCarnegie Mellon

Problem#2: evolutionGiven a graph:

• how will it look like, next year?

[from Lumeta: ISPs 6/1999]

Page 11: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 11

School of Computer ScienceCarnegie Mellon

Evolution of diameter?

• Prior analysis, on power-law-like graphs, hints that

diameter ~ O(log(N)) or

diameter ~ O( log(log(N)))

• i.e.., slowly increasing with network size

• Q: What is happening, in reality?

Page 12: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 12

School of Computer ScienceCarnegie Mellon

Evolution of diameter?

• Prior analysis, on power-law-like graphs, hints that

diameter ~ O(log(N)) or

diameter ~ O( log(log(N)))

• i.e.., slowly increasing with network size

• Q: What is happening, in reality?

• A: It shrinks(!!), towards a constant value

Page 13: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 13

School of Computer ScienceCarnegie Mellon

Shrinking diameter

ArXiv physics papers and their citations

[Leskovec+05a]

Page 14: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 14

School of Computer ScienceCarnegie Mellon

Shrinking diameter

ArXiv: who wrote what

Page 15: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 15

School of Computer ScienceCarnegie Mellon

Shrinking diameter

U.S. patents citing each other

Page 16: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 16

School of Computer ScienceCarnegie Mellon

Shrinking diameter

Autonomous systems

Page 17: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 17

School of Computer ScienceCarnegie Mellon

Temporal evolution of graphs

• N(t) nodes; E(t) edges at time t

• suppose that

N(t+1) = 2 * N(t)

• Q: what is your guess for

E(t+1) =? 2 * E(t)

Page 18: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 18

School of Computer ScienceCarnegie Mellon

Temporal evolution of graphs

• N(t) nodes; E(t) edges at time t

• suppose that

N(t+1) = 2 * N(t)

• Q: what is your guess for

E(t+1) =? 2 * E(t)

• A: over-doubled!x

Page 19: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 19

School of Computer ScienceCarnegie Mellon

Densification Power Law

ArXiv: Physics papers

and their citations

1.69

N(t)

E(t)

Page 20: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 20

School of Computer ScienceCarnegie Mellon

Densification Power Law

U.S. Patents, citing each other

1.66

N(t)

E(t)

Page 21: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 21

School of Computer ScienceCarnegie Mellon

Densification Power Law

Autonomous Systems

1.18

N(t)

E(t)

Page 22: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 22

School of Computer ScienceCarnegie Mellon

Densification Power Law

ArXiv: who wrote what

1.15

N(t)

E(t)

Page 23: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 23

School of Computer ScienceCarnegie Mellon

Summary of ‘laws’

Static graphs

• power law degrees; power law eigenvalues

• communities (within communities)

• small diameters

Dynamic graphs

• shrinking diameter

• densification power law

Page 24: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 24

School of Computer ScienceCarnegie Mellon

Overview

• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs

• Challenges

Page 25: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 25

School of Computer ScienceCarnegie Mellon

Problem#3: Generators

• Q: what local behavior can generate such graphs?

Page 26: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 26

School of Computer ScienceCarnegie Mellon

Problem#3: Generators

Q: what local behavior can generate such graphs?

• A1: Preferential attachment [Barabasi+]

• A2: ‘copying’ model [Kleinberg+]

• A3: ‘forest-fire’ model [Leskovec+]

• A4: Kronecker [Leskovec+]

• A5: Economic reasons [Papadimitriou+]

Page 27: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 27

School of Computer ScienceCarnegie Mellon

Q: what local behavior can generate such graphs?

• A1: Preferential attachment

• A2: ‘copying’ model

• A3: ‘forest-fire’ model

• A4: Kronecker

• A5: Economic reasons

Problem#3: Generators

power law degree

+ communities

+ DPL

+ easily parallilizable

Page 28: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 28

School of Computer ScienceCarnegie Mellon

Overview

• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs

• Challenges

Page 29: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 29

School of Computer ScienceCarnegie Mellon

Problem#4: influence propagation

• how do influence/rumors/viruses propagate?

• what is the best customer to market to?

Page 30: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 30

School of Computer ScienceCarnegie Mellon

Problem#4: influence propagation

• how do influence/rumors/viruses propagate?– tipping point [Kleinberg+]– first eigenvalue -> epidemic threshold

[Chakrabarti+]

• what is the best customer to market to?– network value of a customer [Domingos+]

Page 31: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 31

School of Computer ScienceCarnegie Mellon

Overview

• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs

• Challenges

Page 32: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 32

School of Computer ScienceCarnegie Mellon

Problem #5: communities

• how to find ‘natural’ communities, quickly?

• how to find ‘strange’/suspicious/valuable edges?

Page 33: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 33

School of Computer ScienceCarnegie Mellon

Problem #5: communities

• how to find ‘natural’ communities, quickly?– network flow [Flake+]– node/edge betweeness (~ ‘stress’)– cross-associations [Chakrabarti+]– 2nd eigenvalue; METIS [Karypis+]– random walks [Newman]– etc etc etc

Page 34: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 34

School of Computer ScienceCarnegie Mellon

Problem #5: communities

• connection sub-graphs [Faloutsos+]

Page 35: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 35

School of Computer ScienceCarnegie Mellon

Problem #5: communities

• connection sub-graphs [Faloutsos+]

Page 36: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 36

School of Computer ScienceCarnegie Mellon

Problem #5: communities

• connection sub-graphs [Faloutsos+]

• BANKS system [Chakrabarti+]

• ObjectRank [Papakonstantinou+]

Page 37: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 37

School of Computer ScienceCarnegie Mellon

Overview

• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs

• Challenges

Page 38: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 38

School of Computer ScienceCarnegie Mellon

Problem #6: local patterns

• Which sub-graphs are common/frequent?

molecule #1 molecule #M

C C

C H

HC C

CN

N

Page 39: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 39

School of Computer ScienceCarnegie Mellon

Problem #6: local patterns

• Which sub-graphs are common/frequent?

molecule #1 molecule #M

C C

C H

HC C

CN

N

Page 40: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 40

School of Computer ScienceCarnegie Mellon

Problem #6: local patterns

• Clever extensions of ‘Association Rules’ (= frequent itemsets / market basket analysis)– [Jiawei Han+]– [Jian Pei+]– [G. Karypis]– [M. Zaki+] – ...

Page 41: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 41

School of Computer ScienceCarnegie Mellon

Overview

• Achievements

• Challenges– time evolving graphs– multi-graphs

Page 42: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 42

School of Computer ScienceCarnegie Mellon

Time evolving graphs

• Q: what will happen next? (eg., on a traffic matrix?)

1/1/2005

Page 43: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 43

School of Computer ScienceCarnegie Mellon

Time evolving graphs

• Q: what will happen next? (eg., on a traffic matrix?)

1/2/2005

Page 44: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 44

School of Computer ScienceCarnegie Mellon

Time evolving graphs

• Q: what will happen next? (eg., on a traffic matrix?)

1/3/2005

Page 45: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 45

School of Computer ScienceCarnegie Mellon

Time evolving graphs

• Q: what will happen next? (eg., on a traffic matrix?)

1/4/2005

??

? ?

Page 46: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 46

School of Computer ScienceCarnegie Mellon

Overview

• Achievements

• Challenges– time evolving graphs– multi-graphs

Page 47: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 47

School of Computer ScienceCarnegie Mellon

Multi-graphs

• Patterns/outliers?

friends

co-authors

Page 48: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 48

School of Computer ScienceCarnegie Mellon

Multi-graphs

• Relational Learning

• link prediction, feature extraction [Jensen+] etc

• Dis-ambiguation, de-duplication [Getoor+] etc

friends

co-authors

Page 49: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 49

School of Computer ScienceCarnegie Mellon

Promising solution: tensors

I x J x K

destinationperson

sour

cepe

rson

type ofcontact

Page 50: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 50

School of Computer ScienceCarnegie Mellon

Tensors: ~SVD, for >=3 modes

¼

I x R

K x R

A

B

J x R

C

R x R x R

I x J x K

+…+=

Foil: from Tamara Kolda (Sandia)

Page 51: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 51

School of Computer ScienceCarnegie Mellon

=

U

I x R

V

J x R

WK x R

R x R x R

Specially Structured Tensors• Tucker Tensor • Kruskal Tensor

I x J x K

=

U

I x R

V

J x S

WK x T

R x S x T

I x J x K

Our Notation

Our Notation

+…+=

u1 uR

v1

w1

vR

wR

“core”

Foil: from Tamara Kolda (Sandia)

Page 52: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 52

School of Computer ScienceCarnegie Mellon

Conclusions - achievements

• Surprising patterns in graphs– power laws; communities– small/shrinking diameters

• simple, local behavior can lead to such patterns (eg., preferential attachment, etc)

• fast algorithms for communities/partitioning

• fast algorithms for ‘frequent subgraphs’

Page 53: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 53

School of Computer ScienceCarnegie Mellon

Conclusions - next steps

• multi-graphs

• time-evolving graphs

• scalability

• (graph sampling)

• (large graph visualization)

Page 54: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 54

School of Computer ScienceCarnegie Mellon

Conclusions - philosophically:

• deep connections with self-similarity, cellular automata (~agents), and ‘fractals’

Page 55: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 55

School of Computer ScienceCarnegie Mellon

Resources

• Manfred Schroeder “Chaos, Fractals and Power Laws”, 1991

• A-L. Barabasi, “Linked”, 2002

• D. Watts, “Six Degrees”, 2004

Page 56: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 56

School of Computer ScienceCarnegie Mellon

References

• R. Albert, H. Jeong, and A.-L. Barabási, Diameter of the World Wide Web. Nature 401, 130-131 (1999)

• A. Fabrikant, E. Koutsoupias, and C. Papadimitriou. Heuristically Optimized Trade-offs: A New Paradigm for Power Laws in the Internet. Int. Colloquium on Automata, Languages, and Programming (ICALP), Malaga, Spain, July 2002.

Page 57: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 57

School of Computer ScienceCarnegie Mellon

References

• [sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

• [Leskovec 05] Jure Leskovec, Jon M. Kleinberg, Christos Faloutsos: Graphs over time: densification laws, shrinking diameters and possible explanations. KDD 2005: 177-187

Page 58: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 58

School of Computer ScienceCarnegie Mellon

References

• [brite] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers. BRITE: An Approach to Universal Topology Generation. MASCOTS '01

• Xifeng Yan, Xianghong Jasmine Zhou, Jiawei Han: Mining closed relational graphs with connectivity constraints. KDD 2005: 324-333

Page 59: Graph Mining, self-similarity and  power  laws

DoE/DoD, 2007 C. Faloutsos 59

School of Computer ScienceCarnegie Mellon

Thank you!

Contact info:christos <at> cs.cmu.edu

www. cs.cmu.edu /~christos

(w/ papers, datasets, code, etc)