large graph mining – patterns, tools and cascade...

126
CMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos Faloutsos CMU

Upload: others

Post on 30-May-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Large Graph Mining – Patterns, Tools and Cascade

analysis

Christos Faloutsos CMU

Page 2: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Thank you! •  Xuewen Chen

•  Dennis Schwartz

Wayne State, Feb. 2013 C. Faloutsos (CMU) 2

Page 3: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 3

Roadmap

•  Introduction – Motivation – Why ‘big data’ – Why (big) graphs?

•  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

Wayne State, Feb. 2013

Page 4: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Why ‘big data’ •  Why? •  What is the problem definition? •  What are the major research challenges?

Wayne State, Feb. 2013 C. Faloutsos (CMU) 4

Page 5: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS Main message:

Big data: often > experts •  ‘Super Crunchers’ Why Thinking-By-Numbers is the

New Way To Be Smart by Ian Ayres, 2008

•  Google won the machine translation competition 2005

•  http://www.itl.nist.gov/iad/mig//tests/mt/2005/doc/mt05eval_official_results_release_20050801_v3.html

Wayne State, Feb. 2013 C. Faloutsos (CMU) 5

Page 6: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Problem definition – big picture

Wayne State, Feb. 2013 C. Faloutsos (CMU) 6

Tera/Peta-byte data

Analytics Insights, outliers

Page 7: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Problem definition – big picture

Wayne State, Feb. 2013 C. Faloutsos (CMU) 7

Tera/Peta-byte data

Analytics Insights, outliers

Main emphasis in this talk

Page 8: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Problem definition – big picture

Wayne State, Feb. 2013 C. Faloutsos (CMU) 8

Tera/Peta-byte data

(my personal) rules of thumb: if data •  fits in memory -> R, matlab,

scipy •  single disk -> RDBMS (sqlite3,

mysql, postgres) •  multiple (<100-1000) disks:

parallel RDBMS (Vertica, TeraData)

•  multiple (>1000) disks: hadoop, pig

Page 9: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 9

(Free) Resource for graphs

Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining

System) •  www.cs.cmu.edu/~pegasus •  Apache license for s/w •  code and papers Wayne State, Feb. 2013

Page 10: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Research challenges •  The usual ones from data mining

– Data cleansing – Feature engineering – …

PLUS – Scalability ( < O(N**2)) – Real data *disobey* textbook assumptions

(uniformity, independence, Gaussian, Poisson) with huge performance implications

Wayne State, Feb. 2013 C. Faloutsos (CMU) 10

Page 11: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 11

Roadmap

•  Introduction – Motivation – Why ‘big data’ – Why (big) graphs?

•  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

Wayne State, Feb. 2013

Page 12: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 12

Graphs - why should we care?

Internet Map [lumeta.com]

Food Web [Martinez ’91]

>$10B revenue

>0.5B users

Wayne State, Feb. 2013

Page 13: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 13

Graphs - why should we care? •  IR: bi-partite graphs (doc-terms)

•  web: hyper-text graph

•  ... and more:

D1

DN

T1

TM

... ...

Wayne State, Feb. 2013

Page 14: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 14

Graphs - why should we care? •  ‘viral’ marketing •  web-log (‘blog’) news propagation •  computer network security: email/IP traffic

and anomaly detection •  .... •  Subject-verb-object -> graph •  Many-to-many db relationship -> graph

Wayne State, Feb. 2013

Page 15: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 15

Outline

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

Wayne State, Feb. 2013

Page 16: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 16

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

Wayne State, Feb. 2013

Page 17: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 17

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

Wayne State, Feb. 2013

Page 18: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 18

Problem #1 - network and graph mining

•  What does the Internet look like? •  What does FaceBook look like?

•  What is ‘normal’/‘abnormal’? •  which patterns/laws hold?

–  To spot anomalies (rarities), we have to discover patterns

–  Large datasets reveal patterns/anomalies that may be invisible otherwise…

Wayne State, Feb. 2013

Page 19: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 19

Graph mining •  Are real graphs random?

Wayne State, Feb. 2013

Page 20: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 20

Laws and patterns •  Are real graphs random? •  A: NO!!

– Diameter –  in- and out- degree distributions –  other (surprising) patterns

•  So, let’s look at the data

Wayne State, Feb. 2013

Page 21: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 21

Solution# S.1 •  Power law in the degree distribution

[SIGCOMM99]

log(rank)

log(degree)

internet domains

att.com

ibm.com

Wayne State, Feb. 2013

Page 22: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 22

Solution# S.1 •  Power law in the degree distribution

[SIGCOMM99]

log(rank)

log(degree)

-0.82

internet domains

att.com

ibm.com

Wayne State, Feb. 2013

Page 23: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 23

Solution# S.2: Eigen Exponent E

•  A2: power law in the eigenvalues of the adjacency matrix

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

Wayne State, Feb. 2013

Page 24: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 24

Solution# S.2: Eigen Exponent E

•  [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent

E = -0.48

Exponent = slope

Eigenvalue

Rank of decreasing eigenvalue

May 2001

Wayne State, Feb. 2013

Page 25: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 25

But: How about graphs from other domains?

Wayne State, Feb. 2013

Page 26: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 26

More power laws: •  web hit counts [w/ A. Montgomery]

Web Site Traffic

in-degree (log scale)

Count (log scale)

Zipf

users sites

``ebay’’

Wayne State, Feb. 2013

Page 27: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 27

epinions.com •  who-trusts-whom

[Richardson + Domingos, KDD 2001]

(out) degree

count

trusts-2000-people user

Wayne State, Feb. 2013

Page 28: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

And numerous more •  # of sexual contacts •  Income [Pareto] –’80-20 distribution’ •  Duration of downloads [Bestavros+] •  Duration of UNIX jobs (‘mice and

elephants’) •  Size of files of a user •  … •  ‘Black swans’ Wayne State, Feb. 2013 C. Faloutsos (CMU) 28

Page 29: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 29

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs •  degree, diameter, eigen, •  triangles •  cliques

– Weighted graphs – Time evolving graphs

•  Problem#2: Tools Wayne State, Feb. 2013

Page 30: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 30

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles

Wayne State, Feb. 2013

Page 31: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 31

Solution# S.3: Triangle ‘Laws’

•  Real social networks have a lot of triangles –  Friends of friends are friends

•  Any patterns?

Wayne State, Feb. 2013

Page 32: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 32

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions X-axis: # of participating triangles Y: count (~ pdf)

Wayne State, Feb. 2013

Page 33: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 33

Triangle Law: #S.3 [Tsourakakis ICDM 2008]

ASN HEP-TH

Epinions

Wayne State, Feb. 2013

X-axis: # of participating triangles Y: count (~ pdf)

Page 34: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 34

Triangle Law: #S.4 [Tsourakakis ICDM 2008]

SN Reuters

Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles

Wayne State, Feb. 2013

Page 35: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] 38 Wayne State, Feb. 2013 38 C. Faloutsos (CMU)

? ?

?

Page 36: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] 39 Wayne State, Feb. 2013 39 C. Faloutsos (CMU)

Page 37: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] 40 Wayne State, Feb. 2013 40 C. Faloutsos (CMU)

Page 38: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges)

[U Kang, Brendan Meeder, +, PAKDD’11] 41 Wayne State, Feb. 2013 41 C. Faloutsos (CMU)

Page 39: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 42

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs •  degree, diameter, eigen, •  triangles •  cliques

– Weighted graphs – Time evolving graphs

•  Problem#2: Tools Wayne State, Feb. 2013

Page 40: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 43

Observations on weighted graphs?

•  A: yes - even more ‘laws’!

M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008

Wayne State, Feb. 2013

Page 41: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 44

Observation W.1: Fortification Q: How do the weights of nodes relate to degree?

Wayne State, Feb. 2013

Page 42: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 45

Observation W.1: Fortification

More donors, more $ ?

$10

$5

Wayne State, Feb. 2013

‘Reagan’

‘Clinton’ $7

Page 43: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Edges (# donors)

In-weights ($)

C. Faloutsos (CMU) 46

Observation W.1: fortification: Snapshot Power Law

•  Weight: super-linear on in-degree •  exponent ‘iw’: 1.01 < iw < 1.26

Orgs-Candidates

e.g. John Kerry, $10M received, from 1K donors

More donors, even more $

$10

$5

Wayne State, Feb. 2013

Page 44: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 47

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  …

Wayne State, Feb. 2013

Page 45: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 48

Problem: Time evolution •  with Jure Leskovec (CMU ->

Stanford)

•  and Jon Kleinberg (Cornell – sabb. @ CMU)

Wayne State, Feb. 2013

Page 46: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 49

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data?

Wayne State, Feb. 2013

Page 47: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 50

T.1 Evolution of the Diameter •  Prior work on Power Law graphs hints

at slowly growing diameter: –  diameter ~ O(log N) –  diameter ~ O(log log N)

•  What is happening in real data? •  Diameter shrinks over time

Wayne State, Feb. 2013

Page 48: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 51

T.1 Diameter – “Patents”

•  Patent citation network

•  25 years of data •  @1999

–  2.9 M nodes –  16.5 M edges

time [years]

diameter

Wayne State, Feb. 2013

Page 49: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 52

T.2 Temporal Evolution of the Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

Wayne State, Feb. 2013

Page 50: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 53

T.2 Temporal Evolution of the Graphs

•  N(t) … nodes at time t •  E(t) … edges at time t •  Suppose that

N(t+1) = 2 * N(t) •  Q: what is your guess for

E(t+1) =? 2 * E(t)

•  A: over-doubled! – But obeying the ``Densification Power Law’’

Wayne State, Feb. 2013

Page 51: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 54

T.2 Densification – Patent Citations

•  Citations among patents granted

•  @1999 –  2.9 M nodes –  16.5 M edges

•  Each year is a datapoint

N(t)

E(t)

1.66

Wayne State, Feb. 2013

Page 52: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 55

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs

– Static graphs – Weighted graphs – Time evolving graphs

•  Problem#2: Tools •  …

Wayne State, Feb. 2013

Page 53: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 56

T.3 : popularity over time

Post popularity drops-off – exponentially?

lag: days after post

# in links

1 2 3

@t

@t + lag

Wayne State, Feb. 2013

Page 54: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 57

T.3 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent?

# in links (log)

days after post (log)

Wayne State, Feb. 2013

Page 55: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 58

T.3 : popularity over time

Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 •  close to -1.5: Barabasi’s stack model •  and like the zero-crossings of a random walk

# in links (log) -1.6

days after post (log)

Wayne State, Feb. 2013

Page 56: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Wayne State, Feb. 2013 C. Faloutsos (CMU) 59

-1.5 slope J. G. Oliveira & A.-L. Barabási Human Dynamics: The

Correspondence Patterns of Darwin and Einstein. Nature 437, 1251 (2005) . [PDF]

Response time (log)

Prob(RT > x) (log)

Page 57: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 60

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– Belief Propagation – Tensors – Spike analysis

•  Problem#3: Scalability •  Conclusions

Wayne State, Feb. 2013

Page 58: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Wayne State, Feb. 2013 C. Faloutsos (CMU) 61

E-bay Fraud detection

w/ Polo Chau & Shashank Pandit, CMU [www’07]

Page 59: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Wayne State, Feb. 2013 C. Faloutsos (CMU) 62

E-bay Fraud detection

Page 60: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Wayne State, Feb. 2013 C. Faloutsos (CMU) 63

E-bay Fraud detection

Page 61: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Wayne State, Feb. 2013 C. Faloutsos (CMU) 64

E-bay Fraud detection - NetProbe

Page 62: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Wayne State, Feb. 2013 C. Faloutsos (CMU) 65

E-bay Fraud detection - NetProbe

F A H F 99% A 99% H 49% 49%

Compatibility matrix

heterophily

Page 63: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 66

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– Belief Propagation – Tensors – Spike analysis

•  Problem#3: Scalability •  Conclusions

Wayne State, Feb. 2013

Page 64: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

GigaTensor: Scaling Tensor Analysis Up By 100 Times –

Algorithms and Discoveries

U Kang

Christos Faloutsos

KDD’12

Evangelos Papalexakis

Abhay Harpale

Wayne State, Feb. 2013 67 C. Faloutsos (CMU)

Page 65: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Background: Tensor

•  Tensors (=multi-dimensional arrays) are everywhere – Hyperlinks &anchor text [Kolda+,05]

URL 1

URL 2

Anchor Text

Java

C++

C#

11

1

11

1 1

Wayne State, Feb. 2013 68 C. Faloutsos (CMU)

Page 66: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Background: Tensor

•  Tensors (=multi-dimensional arrays) are everywhere – Sensor stream (time, location, type) – Predicates (subject, verb, object) in knowledge base

“Barrack Obama is the president of

U.S.”

“Eric Clapton plays guitar”

(26M)

(26M)

(48M)

NELL (Never Ending Language Learner) data

Nonzeros =144M

Wayne State, Feb. 2013 69 C. Faloutsos (CMU)

Page 67: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Background: Tensor

•  Tensors (=multi-dimensional arrays) are everywhere – Sensor stream (time, location, type) – Predicates (subject, verb, object) in knowledge base

Wayne State, Feb. 2013 70 C. Faloutsos (CMU) IP-destination

IP-source

Time-stamp Anomaly Detection in Computer networks

Page 68: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Problem Definition

•  How to decompose a billion-scale tensor? – Corresponds to SVD in 2D case

Wayne State, Feb. 2013 72 C. Faloutsos (CMU)

Page 69: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS Problem Definition

  Q1: Dominant concepts/topics?   Q2: Find synonyms to a given noun phrase?   (and how to scale up: |data| > RAM)

(26M)

(26M)

(48M)

NELL (Never Ending Language Learner) data

Nonzeros =144M

Wayne State, Feb. 2013 73 C. Faloutsos (CMU)

Page 70: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Experiments

•  GigaTensor solves 100x larger problem

Number of nonzero = I / 50

(J)

(I)

(K)

GigaTensor

Out of Memory

100x

Wayne State, Feb. 2013 74 C. Faloutsos (CMU)

Page 71: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

A1: Concept Discovery

•  Concept Discovery in Knowledge Base

Wayne State, Feb. 2013 75 C. Faloutsos (CMU)

Page 72: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS A1: Concept Discovery

Wayne State, Feb. 2013 76 C. Faloutsos (CMU)

Page 73: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS A2: Synonym Discovery

Wayne State, Feb. 2013 77 C. Faloutsos (CMU)

Page 74: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 78

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools

– Belief propagation – Tensors – Spike analysis

•  Problem#3: Scalability -PEGASUS •  Conclusions

Wayne State, Feb. 2013

Page 75: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

•  Meme (# of mentions in blogs) –  short phrases Sourced from U.S. politics in 2008

79

20 40 60 80 100 120 140 1600

100

200

Time (hours)

# of

men

tions

20 40 60 80 100 120 140 1600

50

100

Time (hours)

# of

men

tions

“you can put lipstick on a pig”

“yes we can”

Rise and fall patterns in social media

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 76: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

0 50 1000

50

100

0 50 1000

50

100

0 50 1000

50

100

0 50 1000

50

100

0 50 1000

50

100

0 50 1000

50

100

Rise and fall patterns in social media

80

•  Can we find a unifying model, which includes these patterns?

•  four classes on YouTube [Crane et al. ’08] •  six classes on Meme [Yang et al. ’11]

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 77: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Rise and fall patterns in social media

81

•  Answer: YES!

•  We can represent all patterns by single model

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

20 40 60 80 100 1200

50

100

Time

Valu

e

OriginalSpikeM

C. Faloutsos (CMU) Wayne State, Feb. 2013 In Matsubara+ SIGKDD 2012

Page 78: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

82

Main idea - SpikeM -  1. Un-informed bloggers (uninformed about rumor)

-  2. External shock at time nb (e.g, breaking news)

-  3. Infection (word-of-mouth)

Time n=0 Time n=nb

β

C. Faloutsos (CMU) Wayne State, Feb. 2013

Infectiveness of a blog-post at age n:

- Strength of infection (quality of news) - Decay function

β

f (n)

Time n=nb+1

Page 79: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

83

-  1. Un-informed bloggers (uninformed about rumor)

-  2. External shock at time nb (e.g, breaking news)

-  3. Infection (word-of-mouth)

Time n=0 Time n=nb

β

C. Faloutsos (CMU) Wayne State, Feb. 2013

Infectiveness of a blog-post at age n:

- Strength of infection (quality of news) - Decay function

β

f (n)

Time n=nb+1

f (n) = β *n−1.5

Main idea - SpikeM

Page 80: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

SpikeM - with periodicity •  Full equation of SpikeM

84

ΔB(n+1) = p(n+1) ⋅ U(n) ⋅ (ΔB(t)+ S(t)) ⋅ f (n+1− t)+εt=nb

n

∑%

&''

(

)**

Periodicity

noon Peak 3am

Dip

Time n

Bloggers change their activity over time

(e.g., daily, weekly, yearly)

activity

p(n)

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 81: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Details •  Analysis – exponential rise and power-raw fall

85

Lin-log

Log-log

Rise-part

SI -> exponential SpikeM -> exponential

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 82: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Details •  Analysis – exponential rise and power-raw fall

86

Lin-log

Log-log

Fall-part

SI -> exponential SpikeM -> power law

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 83: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Tail-part forecasts

87

•  SpikeM can capture tail part

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 84: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

“What-if” forecasting

88

e.g., given (1) first spike, (2) release date of two sequel movies (3) access volume before the release date

?

(1) First spike

(2) Release date

(3) Two weeks before release

C. Faloutsos (CMU) Wayne State, Feb. 2013

?

Page 85: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

“What-if” forecasting

89

SpikeM can forecast upcoming spikes

(1) First spike

(2) Release date

(3) Two weeks before release

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 86: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 90

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability –PEGASUS

– Diameter – Connected components

•  Conclusions

Wayne State, Feb. 2013

Page 87: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Wayne State, Feb. 2013 C. Faloutsos (CMU) 91

Scalability •  Google: > 450,000 processors in clusters of ~2000

processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]

•  Yahoo: 5Pb of data [Fayyad, KDD’07] •  Problem: machine failures, on a daily basis •  How to parallelize data mining tasks, then? •  A: map/reduce – hadoop (open-source clone)

http://hadoop.apache.org/

Page 88: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 92

Centralized Hadoop/PEGASUS

Degree Distr. old old

Pagerank old old

Diameter/ANF old HERE

Conn. Comp old HERE

Triangles done HERE

Visualization started

Roadmap – Algorithms & results

Wayne State, Feb. 2013

Page 89: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

HADI for diameter estimation •  Radius Plots for Mining Tera-byte Scale

Graphs U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10

•  Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B)

•  Our HADI: linear on E (~10B) – Near-linear scalability wrt # machines – Several optimizations -> 5x faster

C. Faloutsos (CMU) 93 Wayne State, Feb. 2013

Page 90: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

????

19+ [Barabasi+]

94 C. Faloutsos (CMU)

Radius

Count

Wayne State, Feb. 2013

~1999, ~1M nodes

Page 91: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

????

19+ [Barabasi+]

95 C. Faloutsos (CMU)

Radius

Count

Wayne State, Feb. 2013

??

~1999, ~1M nodes

Page 92: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  Largest publicly available graph ever studied.

????

19+? [Barabasi+]

96 C. Faloutsos (CMU)

Radius

Count

Wayne State, Feb. 2013

14 (dir.) ~7 (undir.)

Page 93: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) • 7 degrees of separation (!) • Diameter: shrunk

????

19+? [Barabasi+]

97 C. Faloutsos (CMU)

Radius

Count

Wayne State, Feb. 2013

14 (dir.) ~7 (undir.)

Page 94: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Q: Shape?

????

98 C. Faloutsos (CMU)

Radius

Count

Wayne State, Feb. 2013

~7 (undir.)

Page 95: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

99 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality (?!)

Wayne State, Feb. 2013

Page 96: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Radius Plot of GCC of YahooWeb.

100 C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 97: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

101 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

Wayne State, Feb. 2013

Page 98: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

102 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

Wayne State, Feb. 2013

EN

~7

Conjecture: DE

BR

Page 99: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

103 C. Faloutsos (CMU)

YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) •  effective diameter: surprisingly small. •  Multi-modality: probably mixture of cores .

Wayne State, Feb. 2013

~7

Conjecture:

Page 100: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 105

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability –PEGASUS

– Diameter – Connected components

•  Conclusions

Wayne State, Feb. 2013

Page 101: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS Generalized Iterated Matrix

Vector Multiplication (GIMV)

C. Faloutsos (CMU) 106

PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. (ICDM) 2009, Miami, Florida, USA. Best Application Paper (runner-up).

Wayne State, Feb. 2013

Page 102: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS Generalized Iterated Matrix

Vector Multiplication (GIMV)

C. Faloutsos (CMU) 107

•  PageRank •  proximity (RWR) •  Diameter •  Connected components •  (eigenvectors, •  Belief Prop. •  … )

Matrix – vector Multiplication

(iterated)

Wayne State, Feb. 2013

details

Page 103: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

108

Example: GIM-V At Work •  Connected Components – 4 observations:

Size

Count

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 104: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

109

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) Wayne State, Feb. 2013

1) 10K x larger than next

Page 105: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

110

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) Wayne State, Feb. 2013

2) ~0.7B singleton nodes

Page 106: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

111

Example: GIM-V At Work •  Connected Components

Size

Count

C. Faloutsos (CMU) Wayne State, Feb. 2013

3) SLOPE!

Page 107: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

112

Example: GIM-V At Work •  Connected Components

Size

Count 300-size

cmpt X 500. Why? 1100-size cmpt

X 65. Why?

C. Faloutsos (CMU) Wayne State, Feb. 2013

4) Spikes!

Page 108: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

113

Example: GIM-V At Work •  Connected Components

Size

Count

suspicious financial-advice sites

(not existing now)

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 109: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

114

GIM-V At Work •  Connected Components over Time •  LinkedIn: 7.5M nodes and 58M edges

Stable tail slope after the gelling point

C. Faloutsos (CMU) Wayne State, Feb. 2013

Page 110: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 115

Roadmap

•  Introduction – Motivation •  Problem#1: Patterns in graphs •  Problem#2: Tools •  Problem#3: Scalability •  Conclusions

Wayne State, Feb. 2013

Page 111: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 116

OVERALL CONCLUSIONS – low level:

•  Several new patterns (fortification, triangle-laws, conn. components, etc)

•  New tools: –  belief propagation, gigaTensor, etc

•  Scalability: PEGASUS / hadoop

Wayne State, Feb. 2013

Page 112: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 117

OVERALL CONCLUSIONS – high level

•  BIG DATA: Large datasets reveal patterns/outliers that are invisible otherwise

Wayne State, Feb. 2013

Page 113: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

ML & Stats.

Comp. Systems

Theory & Algo.

Biology

Econ.

Social Science

Physics

118

Graph Analytics

Wayne State, Feb. 2013 C. Faloutsos (CMU)

Page 114: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 119

References •  Leman Akoglu, Christos Faloutsos: RTG: A Recursive

Realistic Graph Generator Using Random Typing. ECML/PKDD (1) 2009: 13-28

•  Deepayan Chakrabarti, Christos Faloutsos: Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38(1): (2006)

Wayne State, Feb. 2013

Page 115: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 120

References •  Deepayan Chakrabarti, Yang Wang, Chenxi Wang,

Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Trans. Inf. Syst. Secur. 10(4): (2008)

•  Deepayan Chakrabarti, Jure Leskovec, Christos Faloutsos, Samuel Madden, Carlos Guestrin, Michalis Faloutsos: Information Survival Threshold in Sensor and P2P Networks. INFOCOM 2007: 1316-1324

Wayne State, Feb. 2013

Page 116: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 121

References •  Christos Faloutsos, Tamara G. Kolda, Jimeng Sun:

Mining large graphs and streams using matrix and tensor tools. Tutorial, SIGMOD Conference 2007: 1174

Wayne State, Feb. 2013

Page 117: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 122

References •  T. G. Kolda and J. Sun. Scalable Tensor

Decompositions for Multi-aspect Data Mining. In: ICDM 2008, pp. 363-372, December 2008.

Wayne State, Feb. 2013

Page 118: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 123

References •  Jure Leskovec, Jon Kleinberg and Christos Faloutsos

Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award).

•  Jure Leskovec, Deepayan Chakrabarti, Jon M. Kleinberg, Christos Faloutsos: Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. PKDD 2005: 133-145

Wayne State, Feb. 2013

Page 119: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

References •  Yasuko Matsubara, Yasushi Sakurai, B. Aditya

Prakash, Lei Li, Christos Faloutsos, "Rise and Fall Patterns of Information Diffusion: Model and Implications", KDD’12, pp. 6-14, Beijing, China, August 2012

Wayne State, Feb. 2013 C. Faloutsos (CMU) 124

Page 120: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 125

References •  Jimeng Sun, Yinglian Xie, Hui Zhang, Christos

Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs, SDM, Minneapolis, Minnesota, Apr 2007.

•  Jimeng Sun, Spiros Papadimitriou, Philip S. Yu, and Christos Faloutsos, GraphScope: Parameter-free Mining of Large Time-evolving Graphs ACM SIGKDD Conference, San Jose, CA, August 2007

Wayne State, Feb. 2013

Page 121: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

References •  Jimeng Sun, Dacheng Tao, Christos

Faloutsos: Beyond streams and graphs: dynamic tensor analysis. KDD 2006: 374-383

Wayne State, Feb. 2013 C. Faloutsos (CMU) 126

Page 122: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 127

References •  Hanghang Tong, Christos Faloutsos, and

Jia-Yu Pan, Fast Random Walk with Restart and Its Applications, ICDM 2006, Hong Kong.

•  Hanghang Tong, Christos Faloutsos, Center-Piece Subgraphs: Problem Definition and Fast Solutions, KDD 2006, Philadelphia, PA

Wayne State, Feb. 2013

Page 123: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 128

References •  Hanghang Tong, Christos Faloutsos, Brian

Gallagher, Tina Eliassi-Rad: Fast best-effort pattern matching in large attributed graphs. KDD 2007: 737-746

Wayne State, Feb. 2013

Page 124: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 129

Project info & ‘thanks’

Wayne State, Feb. 2013

Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab

www.cs.cmu.edu/~pegasus

Page 125: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

C. Faloutsos (CMU) 130

Cast

Akoglu, Leman

Chau, Polo

Kang, U

McGlohon, Mary

Tong, Hanghang

Prakash, Aditya

Wayne State, Feb. 2013

Koutra, Danae

Beutel, Alex

Papalexakis, Vagelis

Page 126: Large Graph Mining – Patterns, Tools and Cascade analysischristos/TALKS/13-wayne/faloutsos_WayneState2013.pdfCMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos

CMU SCS

Take-home message

Wayne State, Feb. 2013 C. Faloutsos (CMU) 131

Tera/Peta-byte data

Analytics Insights, outliers

Big data reveal insights that would be invisible otherwise (even to experts)