CMU SCS
Large Graph Mining
Christos Faloutsos
CMU
NGDM 2007 C. Faloutsos 2
Thank you!
• Hillol Kargupta
Outline
• Problem definition / Motivation
• Static & dynamic laws; generators
• Tools: CenterPiece graphs; fraud detection
• Conclusions
Motivation
Data mining: ~ find patterns (rules, outliers)
• Problem#1: What do real graphs look like?
• Problem#2: How do they evolve?
• Problem#3: How to generate realistic graphs?
TOOLS
• Problem#4: Who is the ‘master-mind’?
• Problem#5: Fraud detection
Problem#1: Joint work with
Dr. Deepayan Chakrabarti
(CMU/Yahoo R.L.)
Graphs - why should we care?
Internet Map
[lumeta.com]
Food Web
[Martinez ’91]
Protein Interactions
[genomebiology.com]
Friendship Network
[Moody ’01]
Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)
• web: hyper-text graph
• ... and more:
[Figure: bipartite graph linking documents D1 … DN to terms T1 … TM]
Graphs - why should we care?
• network of companies & board-of-directors members
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ....
Problem #1 - network and graph mining
• What does the Internet look like?
• What does the web look like?
• What is ‘normal’/‘abnormal’?
• Which patterns/laws hold?
Graph mining
• Are real graphs random?
Laws and patterns
• Are real graphs random?
• A: NO!!
– Diameter
– in- and out- degree distributions
– other (surprising) patterns
Solution#1
• Power law in the degree distribution [SIGCOMM99]
[Plot: log(degree) versus log(rank) for internet domains; slope -0.82; labeled points: att.com, ibm.com]
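A slope like the -0.82 quoted on this slide can be estimated by fitting a straight line to the log-log rank plot. The following is a minimal sketch in Python, assuming ordinary least squares on (log rank, log degree); the function name `rank_exponent` and the synthetic degree list are illustrative and not taken from the talk, and the fitting procedure used in [SIGCOMM99] may differ.

```python
import math

def rank_exponent(degrees):
    """Least-squares slope of log(degree) versus log(rank).

    Nodes are ranked by decreasing degree (rank 1 = highest degree).
    Zero-degree nodes are skipped because log(0) is undefined.
    """
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Synthetic (made-up) degrees that follow degree = 1000 * rank^(-0.82) exactly.
toy_degrees = [1000.0 * r ** -0.82 for r in range(1, 201)]
slope = rank_exponent(toy_degrees)
```

On real data the points scatter around the line, so the fitted slope is an estimate rather than an exact exponent.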
Solution#1’: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency matrix
[Plot: eigenvalue versus rank of decreasing eigenvalue; exponent = slope, E = -0.48; May 2001]
But:
How about graphs from other domains?
More power laws:
• web hit counts [w/ A. Montgomery]
Web Site Traffic
[Plot: log(count) versus log(in-degree); Zipf; labels: users, sites, “ebay”]
epinions.com
• who-trusts-whom [Richardson + Domingos, KDD 2001]
[Plot: count versus (out) degree; labeled point: trusts-2000-people user]
Problem#2: Time evolution
• with Jure Leskovec (CMU/MLD)
• and Jon Kleinberg (Cornell – sabb. @ CMU)
Evolution of the Diameter
• Prior work on Power Law graphs hints at slowly growing diameter:
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
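To check a claim like this on data, one needs the diameter of each snapshot. The sketch below is one simple way to do that in Python, assuming an undirected, connected graph small enough for all-pairs breadth-first search; the helper names and the two toy snapshots are hypothetical, and the slides do not say which diameter definition the plots use.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop counts from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest-path length over all reachable node pairs."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

def build_undirected(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

# Toy snapshots (made up): a 5-node path, then the same nodes with two extra edges.
path = [(0, 1), (1, 2), (2, 3), (3, 4)]
early = build_undirected(path)
later = build_undirected(path + [(0, 3), (1, 4)])
early_diameter = diameter(early)
later_diameter = diameter(later)
```

In this toy case the extra edges shorten the longest shortest path, which is the qualitative effect the slide describes; all-pairs BFS costs O(N·E) and would need sampling or approximation on large graphs.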
Diameter – ArXiv citation graph
• Citations among physics papers
• 1992–2003
• One graph per year
[Plot: diameter versus time [years]]
Diameter – “Autonomous Systems”
• Graph of Internet
• One graph per day
• 1997–2000
[Plot: diameter versus number of nodes]
Diameter – “Affiliation Network”
• Graph of collaborations in physics – authors linked to papers
• 10 years of data
[Plot: diameter versus time [years]]
Diameter – “Patents”
• Patent citation network
• 25 years of data
[Plot: diameter versus time [years]]
Temporal Evolution of the Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
• A: over-doubled!
– But obeying the “Densification Power Law”
Densification – Physics Citations
• Citations among physics papers
• 2003:
– 29,555 papers, 352,807 citations
[Plot: E(t) versus N(t); fitted slope 1.69, shown against reference slopes 1 (tree) and 2 (clique)]
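The slope on this plot can be read as the exponent of the Densification Power Law. As a worked equation (a sketch, using the symbol a for the exponent, which the slides do not name), doubling the number of nodes multiplies the number of edges by 2^a, which for the slope 1.69 is more than 3 and lies between the tree-like value 2 (a = 1) and the clique-like value 4 (a = 2):

```latex
\begin{align*}
E(t) &\propto N(t)^{a}, \qquad 1 \le a \le 2 \\
\frac{E(t+1)}{E(t)} &= \left(\frac{N(t+1)}{N(t)}\right)^{a} = 2^{a}
  \quad \text{when } N(t+1) = 2\,N(t) \\
2^{1.69} &\approx 3.23
\end{align*}
```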
Densification – Patent Citations
• Citations among patents granted
• 1999
– 2.9 million nodes
– 16.5 million edges
• Each year is a datapoint
[Plot: E(t) versus N(t); slope 1.66]
Densification – Autonomous Systems
• Graph of Internet
• 2000
– 6,000 nodes
– 26,000 edges
• One graph per day
[Plot: E(t) versus N(t); slope 1.18]
Densification – Affiliation Network
• Authors linked to their publications
• 2002
– 60,000 nodes
• 20,000 authors
• 38,000 papers
– 133,000 edges
[Plot: E(t) versus N(t); slope 1.15]
Problem#3: Generation
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
Problem Definition
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
– Static Patterns
Power Law Degree Distribution
Power Law eigenvalue and eigenvector distribution
Small Diameter
– Dynamic Patterns
Growth Power Law
Shrinking/Stabilizing Diameters
Problem Definition
• Given a growing graph with count of nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
• Idea: Self-similarity
– Leads to power laws
– Communities within communities
– …
Kronecker Product – a Graph
[Figure panels: Adjacency matrix; Intermediate stage; Adjacency matrix]
Kronecker Product – a Graph
• Continuing to multiply with G1, we obtain G4 and so on …
[Figure: G4 adjacency matrix]
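The repeated multiplication can be sketched directly on adjacency matrices. This is a minimal pure-Python illustration, assuming G1 is given as a small square 0/1 matrix; the 3-node initiator below and the function names are hypothetical and not taken from the talk.

```python
def kronecker(a, b):
    """Kronecker product of two square matrices given as lists of lists."""
    n, m = len(a), len(b)
    return [
        [a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
        for i in range(n * m)
    ]

def kronecker_power(g1, k):
    """G_k = G1 (x) G1 (x) ... (x) G1, with k factors."""
    result = g1
    for _ in range(k - 1):
        result = kronecker(result, g1)
    return result

# Hypothetical initiator: a 3-node path with self-loops.
g1 = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
g2 = kronecker_power(g1, 2)
g4 = kronecker_power(g1, 4)
nonzeros_g1 = sum(map(sum, g1))
nonzeros_g4 = sum(map(sum, g4))
```

With this initiator the matrix side grows as 3^k while the number of nonzero entries grows as 7^k, so each block of G4 is a scaled copy of the whole pattern, which is the self-similarity the construction relies on.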
Properties:
• We can PROVE that
– Degree distribution is multinomial ~ power law
– Diameter: constant
– Eigenvalue distribution: multinomial
– First eigenvector: multinomial
• See [Leskovec+, PKDD’05] for proofs
Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all the patterns
– Static Patterns
Power Law Degree Distribution
Power Law eigenvalue and eigenvector distribution
Small Diameter
– Dynamic Patterns
Growth Power Law
Shrinking/Stabilizing Diameters
• First and only generator for which we can prove all these properties
(Q: how to fit the parameters?)
A:
• Stochastic version of Kronecker graphs +
• Max likelihood +
• Metropolis sampling
• [Leskovec+, ICML’07]
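The generative side of a stochastic Kronecker graph can be sketched as follows: replace the 0/1 initiator with probabilities, take its Kronecker power, and flip one coin per matrix entry. This Python sketch covers only that sampling step, under the assumption of independent edges; the initiator values, function names, and seed are made up for illustration, and the fitting procedure (maximum likelihood with Metropolis sampling) is not shown.

```python
import random

def edge_probability(theta, k, u, v):
    """Entry (u, v) of the k-th Kronecker power of initiator `theta`.

    The entry is a product of k initiator entries, one per base-n digit
    of the row index u and the column index v.
    """
    n = len(theta)
    p = 1.0
    for _ in range(k):
        p *= theta[u % n][v % n]
        u //= n
        v //= n
    return p

def sample_graph(theta, k, seed=0):
    """Draw each directed edge (u, v) independently with its Kronecker probability."""
    rng = random.Random(seed)
    size = len(theta) ** k
    return [
        (u, v)
        for u in range(size)
        for v in range(size)
        if rng.random() < edge_probability(theta, k, u, v)
    ]

# Hypothetical 2x2 initiator of edge probabilities.
theta = [[0.9, 0.5],
         [0.5, 0.2]]
edges = sample_graph(theta, k=4)
```

Testing every node pair costs O(N^2) coin flips, so this form is only practical for small graphs.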
Experiments on real AS graph
[Figure panels: Degree distribution; Hop plot; Network value; Adjacency matrix eigenvalues]
Conclusions
• Kronecker graphs have:
– All the static properties
Heavy tailed degree distributions
Small diameter
Multinomial eigenvalues and eigenvectors
– All the temporal properties
Densification Power Law
Shrinking/Stabilizing Diameters
– We can formally prove these results
Problem#4: MasterMind – ‘CePS’
• w/ Hanghang Tong, KDD 2006
• htong <at> cs.cmu.edu
Center-Piece Subgraph (CePS)
• Given Q query nodes
• Find Center-piece ( )
• Applications:
– Social Networks
– Law Enforcement, …
• Idea:
– Proximity -> random walk with restarts
[Figure: graph with nodes A, B, and C; annotation: b≤]
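The proximity idea, random walk with restarts, can be sketched with plain power iteration: a walker repeatedly moves to a random neighbour and, with some probability, jumps back to the query node, and the long-run visiting probabilities serve as proximity scores. The Python below is a minimal illustration, assuming an unweighted graph in which every node has at least one neighbour; the restart probability 0.15, the function name, and the toy graph are assumptions, not values from the talk.

```python
def random_walk_with_restart(adj, query, restart=0.15, iterations=100):
    """Approximate steady-state visiting probabilities of a walker that restarts at `query`.

    adj: dict mapping each node to a list of neighbours (at least one each).
    restart: probability of jumping back to the query node at each step.
    """
    nodes = list(adj)
    scores = {n: 0.0 for n in nodes}
    scores[query] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for u in nodes:
            # Spread the non-restarting mass of u evenly over its neighbours.
            share = (1.0 - restart) * scores[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        nxt[query] += restart
        scores = nxt
    return scores

# Toy graph (made up): a path A - B - C - D.
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}
from_a = random_walk_with_restart(graph, "A")
from_d = random_walk_with_restart(graph, "D")
```

Nodes closer to the query end up with higher scores, so running the walk once per query node gives one proximity score per (query, node) pair that a center-piece search can then combine.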
Case Study: AND query
R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan
[Figure: subgraph that links these four authors and also includes H.V. Jagadish, Laks V.S. Lakshmanan, Heikki Mannila, Christos Faloutsos, Padhraic Smyth, Corinna Cortes, and Daryl Pregibon; edges carry numeric labels]
Conclusions
• Q1: How to measure the importance?
• A1: RWR+K_SoftAnd
• Q2: How to do it efficiently?
• A2: Graph Partition (Fast CePS)
– ~90% quality
– 150x speedup (ICDM’06)
E-bay Fraud detection
w/ Polo Chau & Shashank Pandit, CMU
E-bay Fraud detection
• lines: positive feedbacks
• would you buy from him/her?
• or him/her?
E-bay Fraud detection - NetProbe
OVERALL CONCLUSIONS
• Graphs pose a wealth of fascinating problems
• Self-similarity and power laws work, when textbook methods fail!
• New patterns (shrinking diameter!)
• New generator: Kronecker
Promising directions
• Reaching out
– Sociology, epidemiology; physics, ++…
– Computer networks, security, intrusion detection
– Numerical analysis (tensors)
[Figure: tensor with modes time, IP-source, IP-destination]
Promising directions – cont’d
• Scaling up, to GB/TB/PB
– Storage Systems
– Parallelism (Hadoop/MapReduce)
E.g.: self-* system @ CMU
• >200 nodes
• 40 racks of computing equipment
• 774 kW of power
• target: 1 PetaByte
• goal: self-correcting, self-securing, self-monitoring, self-...
DM for Tera- and Peta-bytes
Two-way street:
<- DM can use such infrastructures to find patterns
-> DM can help such infrastructures become self-healing, self-adjusting, ‘self-*’
References
• Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
• Hanghang Tong and Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.
• Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005, Chicago, IL. ("Best Research Paper" award).
• Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, and Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. ECML/PKDD 2005, Porto, Portugal.
• Jure Leskovec and Christos Faloutsos. Scalable Modeling of Real Graphs using Kronecker Multiplication. ICML 2007, Corvallis, OR, USA.
• Shashank Pandit, Duen Horng (Polo) Chau, Samuel Wang, and Christos Faloutsos. NetProbe: A Fast and Scalable System for Fraud Detection in Online Auction Networks. WWW 2007, Banff, Alberta, Canada, May 8-12, 2007.
• Jimeng Sun, Dacheng Tao, and Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.
• Jimeng Sun, Yinglian Xie, Hui Zhang, and Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr 2007.
Contact info:
www.cs.cmu.edu/~christos
(w/ papers, datasets, code, etc)