graph mining, self-similarity and power laws
DESCRIPTION
Graph Mining, self-similarity and power laws. Christos Faloutsos Carnegie Mellon University. Overview. Achievements global patterns and ‘laws’ (static/dynamic) generators influence propagation communities; graph partitioning local patterns: frequent subgraphs Challenges. - PowerPoint PPT PresentationTRANSCRIPT
School of Computer ScienceCarnegie Mellon
Graph Mining, self-similarity and power laws
Christos Faloutsos
Carnegie Mellon University
DoE/DoD, 2007 C. Faloutsos 2
School of Computer ScienceCarnegie Mellon
Overview
• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs
• Challenges
DoE/DoD, 2007 C. Faloutsos 3
School of Computer ScienceCarnegie Mellon
Motivating questions:
• How does the Internet look like?• What constitutes a ‘normal’ social
network?• ‘network value’ of a customer?
[Domingos+]• which gene/species affects the others
the most?
DoE/DoD, 2007 C. Faloutsos 4
School of Computer ScienceCarnegie Mellon
Problem #1 - topology
How does the Internet look like? Any rules?
DoE/DoD, 2007 C. Faloutsos 5
School of Computer ScienceCarnegie Mellon
Solution#1: Rank exponent R• A1: Power law in the degree distribution
[SIGCOMM99]
internet domains
log(rank)
log(degree)
-0.82
att.com
ibm.com
DoE/DoD, 2007 C. Faloutsos 6
School of Computer ScienceCarnegie Mellon
Power laws
• In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]
log indegree
log(freq)
from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]
DoE/DoD, 2007 C. Faloutsos 7
School of Computer ScienceCarnegie Mellon
epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
(out) degree
count
DoE/DoD, 2007 C. Faloutsos 8
School of Computer ScienceCarnegie Mellon
Even more power laws:
• web hit counts [w/ A. Montgomery]
Web Site Traffic
log(freq)
log(count)
Zipf“yahoo.com”
DoE/DoD, 2007 C. Faloutsos 9
School of Computer ScienceCarnegie Mellon
Overview
• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs
• Challenges
DoE/DoD, 2007 C. Faloutsos 10
School of Computer ScienceCarnegie Mellon
Problem#2: evolutionGiven a graph:
• how will it look like, next year?
[from Lumeta: ISPs 6/1999]
DoE/DoD, 2007 C. Faloutsos 11
School of Computer ScienceCarnegie Mellon
Evolution of diameter?
• Prior analysis, on power-law-like graphs, hints that
diameter ~ O(log(N)) or
diameter ~ O( log(log(N)))
• i.e.., slowly increasing with network size
• Q: What is happening, in reality?
DoE/DoD, 2007 C. Faloutsos 12
School of Computer ScienceCarnegie Mellon
Evolution of diameter?
• Prior analysis, on power-law-like graphs, hints that
diameter ~ O(log(N)) or
diameter ~ O( log(log(N)))
• i.e.., slowly increasing with network size
• Q: What is happening, in reality?
• A: It shrinks(!!), towards a constant value
DoE/DoD, 2007 C. Faloutsos 13
School of Computer ScienceCarnegie Mellon
Shrinking diameter
ArXiv physics papers and their citations
[Leskovec+05a]
DoE/DoD, 2007 C. Faloutsos 14
School of Computer ScienceCarnegie Mellon
Shrinking diameter
ArXiv: who wrote what
DoE/DoD, 2007 C. Faloutsos 15
School of Computer ScienceCarnegie Mellon
Shrinking diameter
U.S. patents citing each other
DoE/DoD, 2007 C. Faloutsos 16
School of Computer ScienceCarnegie Mellon
Shrinking diameter
Autonomous systems
DoE/DoD, 2007 C. Faloutsos 17
School of Computer ScienceCarnegie Mellon
Temporal evolution of graphs
• N(t) nodes; E(t) edges at time t
• suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
DoE/DoD, 2007 C. Faloutsos 18
School of Computer ScienceCarnegie Mellon
Temporal evolution of graphs
• N(t) nodes; E(t) edges at time t
• suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
• A: over-doubled!x
DoE/DoD, 2007 C. Faloutsos 19
School of Computer ScienceCarnegie Mellon
Densification Power Law
ArXiv: Physics papers
and their citations
1.69
N(t)
E(t)
DoE/DoD, 2007 C. Faloutsos 20
School of Computer ScienceCarnegie Mellon
Densification Power Law
U.S. Patents, citing each other
1.66
N(t)
E(t)
DoE/DoD, 2007 C. Faloutsos 21
School of Computer ScienceCarnegie Mellon
Densification Power Law
Autonomous Systems
1.18
N(t)
E(t)
DoE/DoD, 2007 C. Faloutsos 22
School of Computer ScienceCarnegie Mellon
Densification Power Law
ArXiv: who wrote what
1.15
N(t)
E(t)
DoE/DoD, 2007 C. Faloutsos 23
School of Computer ScienceCarnegie Mellon
Summary of ‘laws’
Static graphs
• power law degrees; power law eigenvalues
• communities (within communities)
• small diameters
Dynamic graphs
• shrinking diameter
• densification power law
DoE/DoD, 2007 C. Faloutsos 24
School of Computer ScienceCarnegie Mellon
Overview
• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs
• Challenges
DoE/DoD, 2007 C. Faloutsos 25
School of Computer ScienceCarnegie Mellon
Problem#3: Generators
• Q: what local behavior can generate such graphs?
DoE/DoD, 2007 C. Faloutsos 26
School of Computer ScienceCarnegie Mellon
Problem#3: Generators
Q: what local behavior can generate such graphs?
• A1: Preferential attachment [Barabasi+]
• A2: ‘copying’ model [Kleinberg+]
• A3: ‘forest-fire’ model [Leskovec+]
• A4: Kronecker [Leskovec+]
• A5: Economic reasons [Papadimitriou+]
DoE/DoD, 2007 C. Faloutsos 27
School of Computer ScienceCarnegie Mellon
Q: what local behavior can generate such graphs?
• A1: Preferential attachment
• A2: ‘copying’ model
• A3: ‘forest-fire’ model
• A4: Kronecker
• A5: Economic reasons
Problem#3: Generators
power law degree
+ communities
+ DPL
+ easily parallilizable
DoE/DoD, 2007 C. Faloutsos 28
School of Computer ScienceCarnegie Mellon
Overview
• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs
• Challenges
DoE/DoD, 2007 C. Faloutsos 29
School of Computer ScienceCarnegie Mellon
Problem#4: influence propagation
• how do influence/rumors/viruses propagate?
• what is the best customer to market to?
DoE/DoD, 2007 C. Faloutsos 30
School of Computer ScienceCarnegie Mellon
Problem#4: influence propagation
• how do influence/rumors/viruses propagate?– tipping point [Kleinberg+]– first eigenvalue -> epidemic threshold
[Chakrabarti+]
• what is the best customer to market to?– network value of a customer [Domingos+]
DoE/DoD, 2007 C. Faloutsos 31
School of Computer ScienceCarnegie Mellon
Overview
• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs
• Challenges
DoE/DoD, 2007 C. Faloutsos 32
School of Computer ScienceCarnegie Mellon
Problem #5: communities
• how to find ‘natural’ communities, quickly?
• how to find ‘strange’/suspicious/valuable edges?
DoE/DoD, 2007 C. Faloutsos 33
School of Computer ScienceCarnegie Mellon
Problem #5: communities
• how to find ‘natural’ communities, quickly?– network flow [Flake+]– node/edge betweeness (~ ‘stress’)– cross-associations [Chakrabarti+]– 2nd eigenvalue; METIS [Karypis+]– random walks [Newman]– etc etc etc
DoE/DoD, 2007 C. Faloutsos 34
School of Computer ScienceCarnegie Mellon
Problem #5: communities
• connection sub-graphs [Faloutsos+]
DoE/DoD, 2007 C. Faloutsos 35
School of Computer ScienceCarnegie Mellon
Problem #5: communities
• connection sub-graphs [Faloutsos+]
DoE/DoD, 2007 C. Faloutsos 36
School of Computer ScienceCarnegie Mellon
Problem #5: communities
• connection sub-graphs [Faloutsos+]
• BANKS system [Chakrabarti+]
• ObjectRank [Papakonstantinou+]
DoE/DoD, 2007 C. Faloutsos 37
School of Computer ScienceCarnegie Mellon
Overview
• Achievements– global patterns and ‘laws’ (static/dynamic)– generators– influence propagation– communities; graph partitioning– local patterns: frequent subgraphs
• Challenges
DoE/DoD, 2007 C. Faloutsos 38
School of Computer ScienceCarnegie Mellon
Problem #6: local patterns
• Which sub-graphs are common/frequent?
molecule #1 molecule #M
C C
C H
HC C
CN
N
DoE/DoD, 2007 C. Faloutsos 39
School of Computer ScienceCarnegie Mellon
Problem #6: local patterns
• Which sub-graphs are common/frequent?
molecule #1 molecule #M
C C
C H
HC C
CN
N
DoE/DoD, 2007 C. Faloutsos 40
School of Computer ScienceCarnegie Mellon
Problem #6: local patterns
• Clever extensions of ‘Association Rules’ (= frequent itemsets / market basket analysis)– [Jiawei Han+]– [Jian Pei+]– [G. Karypis]– [M. Zaki+] – ...
DoE/DoD, 2007 C. Faloutsos 41
School of Computer ScienceCarnegie Mellon
Overview
• Achievements
• Challenges– time evolving graphs– multi-graphs
DoE/DoD, 2007 C. Faloutsos 42
School of Computer ScienceCarnegie Mellon
Time evolving graphs
• Q: what will happen next? (eg., on a traffic matrix?)
1/1/2005
DoE/DoD, 2007 C. Faloutsos 43
School of Computer ScienceCarnegie Mellon
Time evolving graphs
• Q: what will happen next? (eg., on a traffic matrix?)
1/2/2005
DoE/DoD, 2007 C. Faloutsos 44
School of Computer ScienceCarnegie Mellon
Time evolving graphs
• Q: what will happen next? (eg., on a traffic matrix?)
1/3/2005
DoE/DoD, 2007 C. Faloutsos 45
School of Computer ScienceCarnegie Mellon
Time evolving graphs
• Q: what will happen next? (eg., on a traffic matrix?)
1/4/2005
??
? ?
DoE/DoD, 2007 C. Faloutsos 46
School of Computer ScienceCarnegie Mellon
Overview
• Achievements
• Challenges– time evolving graphs– multi-graphs
DoE/DoD, 2007 C. Faloutsos 47
School of Computer ScienceCarnegie Mellon
Multi-graphs
• Patterns/outliers?
friends
co-authors
DoE/DoD, 2007 C. Faloutsos 48
School of Computer ScienceCarnegie Mellon
Multi-graphs
• Relational Learning
• link prediction, feature extraction [Jensen+] etc
• Dis-ambiguation, de-duplication [Getoor+] etc
friends
co-authors
DoE/DoD, 2007 C. Faloutsos 49
School of Computer ScienceCarnegie Mellon
Promising solution: tensors
I x J x K
destinationperson
sour
cepe
rson
type ofcontact
DoE/DoD, 2007 C. Faloutsos 50
School of Computer ScienceCarnegie Mellon
Tensors: ~SVD, for >=3 modes
¼
I x R
K x R
A
B
J x R
C
R x R x R
I x J x K
+…+=
Foil: from Tamara Kolda (Sandia)
DoE/DoD, 2007 C. Faloutsos 51
School of Computer ScienceCarnegie Mellon
=
U
I x R
V
J x R
WK x R
R x R x R
Specially Structured Tensors• Tucker Tensor • Kruskal Tensor
I x J x K
=
U
I x R
V
J x S
WK x T
R x S x T
I x J x K
Our Notation
Our Notation
+…+=
u1 uR
v1
w1
vR
wR
“core”
Foil: from Tamara Kolda (Sandia)
DoE/DoD, 2007 C. Faloutsos 52
School of Computer ScienceCarnegie Mellon
Conclusions - achievements
• Surprising patterns in graphs– power laws; communities– small/shrinking diameters
• simple, local behavior can lead to such patterns (eg., preferential attachment, etc)
• fast algorithms for communities/partitioning
• fast algorithms for ‘frequent subgraphs’
DoE/DoD, 2007 C. Faloutsos 53
School of Computer ScienceCarnegie Mellon
Conclusions - next steps
• multi-graphs
• time-evolving graphs
• scalability
• (graph sampling)
• (large graph visualization)
DoE/DoD, 2007 C. Faloutsos 54
School of Computer ScienceCarnegie Mellon
Conclusions - philosophically:
• deep connections with self-similarity, cellular automata (~agents), and ‘fractals’
DoE/DoD, 2007 C. Faloutsos 55
School of Computer ScienceCarnegie Mellon
Resources
• Manfred Schroeder “Chaos, Fractals and Power Laws”, 1991
• A-L. Barabasi, “Linked”, 2002
• D. Watts, “Six Degrees”, 2004
DoE/DoD, 2007 C. Faloutsos 56
School of Computer ScienceCarnegie Mellon
References
• R. Albert, H. Jeong, and A.-L. Barabási, Diameter of the World Wide Web. Nature 401, 130-131 (1999)
• A. Fabrikant, E. Koutsoupias, and C. Papadimitriou. Heuristically Optimized Trade-offs: A New Paradigm for Power Laws in the Internet. Int. Colloquium on Automata, Languages, and Programming (ICALP), Malaga, Spain, July 2002.
DoE/DoD, 2007 C. Faloutsos 57
School of Computer ScienceCarnegie Mellon
References
• [sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999
• [Leskovec 05] Jure Leskovec, Jon M. Kleinberg, Christos Faloutsos: Graphs over time: densification laws, shrinking diameters and possible explanations. KDD 2005: 177-187
DoE/DoD, 2007 C. Faloutsos 58
School of Computer ScienceCarnegie Mellon
References
• [brite] Alberto Medina, Anukool Lakhina, Ibrahim Matta, and John Byers. BRITE: An Approach to Universal Topology Generation. MASCOTS '01
• Xifeng Yan, Xianghong Jasmine Zhou, Jiawei Han: Mining closed relational graphs with connectivity constraints. KDD 2005: 324-333
DoE/DoD, 2007 C. Faloutsos 59
School of Computer ScienceCarnegie Mellon
Thank you!
Contact info:christos <at> cs.cmu.edu
www. cs.cmu.edu /~christos
(w/ papers, datasets, code, etc)