CMU SCS Graph Analytics wkshp C. Faloutsos (CMU) 1 Graph Analytics Workshop: Tools Christos Faloutsos CMU

Upload: moris-paul

Post on 29-Dec-2015


Page 1:

CMU SCS

Graph Analytics Workshop: Tools

Christos Faloutsos (CMU)

Page 2:

Welcome!

Time          Tue             Wed           Thu
9:00-10:30    Tools           Laplacians    Parallelism
11:00-12:30   NELL            Rich graphs   Communities
1:30-3:00     Exercises       Panel         Scalability
3:30-5:00     Graph. models   Posters       Graph ‘Laws’

Reception

Page 3:

Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions

Page 4:

Graphs - why should we care?

[Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; two further examples annotated ‘>$10B revenue’ and ‘>0.5B users’]

Page 5:

Graphs - why should we care?
• IR: bi-partite graphs (doc-terms)
• ‘NELL’: ‘merkel’ ‘chancellor’ ‘germany’ - <S><V><O> facts -> tensors
• web: hyper-text graph
• ... and more:

[Figure: bi-partite graph linking documents D1 ... DN to terms T1 ... TM]

Page 6:

Graphs - why should we care?
• ‘viral’ marketing
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and anomaly detection
• ....
• Any M:N relationship -> Graph
• Any subject-verb-object construct: -> Graph/Tensor

Page 7:

Graphs and matrices
• Closely related
• Powerful tools from matrix algebra, for graph mining

Page 8:

Examples of Matrices: Graph - social network

         John   Peter   Mary   Nick   ...
John     0      11      22     55     ...
Peter    5      0       6      7      ...
Mary     ...    ...     ...    ...    ...
Nick     ...    ...     ...    ...    ...
...      ...    ...     ...    ...    ...

Page 9:

Examples of Matrices: Market basket

• market basket as in Association Rules

[Matrix: rows = customers (John, Peter, Mary, Nick, ...); columns = products (milk, bread, choc., wine, ...)]

Page 10:

Examples of Matrices: Documents and terms

           data   mining   classif.   tree   ...
Paper#1    13     11       22         55     ...
Paper#2    5      4        6          7      ...
Paper#3    ...    ...      ...        ...    ...
Paper#4    ...    ...      ...        ...    ...
...        ...    ...      ...        ...    ...

Page 11:

Examples of Matrices: Authors and terms

         data   mining   classif.   tree   ...
John     13     11       22         55     ...
Peter    5      4        6          7      ...
Mary     ...    ...      ...        ...    ...
Nick     ...    ...      ...        ...    ...
...      ...    ...      ...        ...    ...

Page 12:

Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions

Page 13:

Node importance - Motivation:

• Given a graph (e.g., web pages containing the desired query word)

• Q: Which node is the most important?

Page 14:

Node importance - Motivation:

• Given a graph (e.g., web pages containing the desired query word)
• Q: Which node is the most important?
• A1: HITS (SVD = Singular Value Decomposition)
• A2: eigenvector (PageRank)

‘I am important, if my friends are important’ -> Fixed point / eigenvector

Page 15:

Node importance - motivation

• SVD and eigenvector analysis: very closely related

Page 16:

Roadmap
• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions

Page 17:

Task 1 - SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
  – HITS
  – PageRank

Page 18:

SVD - Motivation

• problem #1: text - LSI: find ‘concepts’
• problem #2: compression / dim. reduction

Page 19:

SVD - Motivation

• problem #1: text - LSI: find ‘concepts’

Page 20:

SVD - Motivation

• Customer-product, for recommendation system:

               bread   lettuce   tomatoes   beef   chicken
vegetarians    1       1         1          0      0
               2       2         2          0      0
               1       1         1          0      0
               5       5         5          0      0
meat eaters    0       0         0          2      2
               0       0         0          3      3
               0       0         0          1      1

Page 21:

SVD - Motivation

• problem #2: compress / reduce dimensionality

Page 22:

Problem - specs

• Visualize customers

Page 23:

SVD - Motivation

Page 24:

SVD - Motivation

Page 25:

Task 1 - SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
  – HITS
  – PageRank

Page 26:

SVD - Definition

• A = U Λ V^T - example:

Page 27:

SVD - Definition

A[n x m] = U[n x r] Λ[r x r] (V[m x r])^T

• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (n documents, r concepts)
• Λ: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
• V: m x r matrix (m terms, r concepts)

Page 28:

SVD - Properties

THEOREM [Press+92]: it is always possible to decompose matrix A into A = U Λ V^T, where

• U, Λ, V: unique (*)
• U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other)
  – U^T U = I; V^T V = I (I: identity matrix)
• Λ: singular values are positive, and sorted in decreasing order
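The properties above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; it uses the 7 x 5 matrix from the workshop's running example, and the variable names and the `1e-10` rank tolerance are illustrative choices, not part of the slides.

```python
import numpy as np

# Running-example matrix from the slides (7 rows x 5 columns).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

# Thin SVD; NumPy returns the singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the r components with non-negligible singular values (r = rank).
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

print("rank r =", r)
print("singular values:", np.round(s, 2))
print("U^T U = I ?", np.allclose(U.T @ U, np.eye(r)))
print("V^T V = I ?", np.allclose(Vt @ Vt.T, np.eye(r)))
print("A = U diag(s) V^T ?", np.allclose(A, U @ np.diag(s) @ Vt))
```

The signs of the columns of U and V may differ from those printed on the slides; flipping the sign of a matching pair (u_i, v_i) leaves the product unchanged.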

Page 29:

SVD - Example

• A = U Λ V^T - example:

Columns of A (terms): data, inf., retrieval, brain, lung. Rows of A (documents): the first four are CS documents, the last three are MD documents.

\[
\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&2&2\\0&0&0&3&3\\0&0&0&1&1\end{bmatrix}
=
\begin{bmatrix}0.18&0\\0.36&0\\0.18&0\\0.90&0\\0&0.53\\0&0.80\\0&0.27\end{bmatrix}
\times
\begin{bmatrix}9.64&0\\0&5.29\end{bmatrix}
\times
\begin{bmatrix}0.58&0.58&0.58&0&0\\0&0&0&0.71&0.71\end{bmatrix}
\]

Page 30:

SVD - Example

• A = U Λ V^T - example: same decomposition as on Page 29.

Annotation: the first column of U is the CS-concept, the second column is the MD-concept.

Page 31:

SVD - Example

• A = U Λ V^T - example: same decomposition as on Page 29.

Annotation: U (columns: CS-concept, MD-concept) is the doc-to-concept similarity matrix.

Page 32:

SVD - Example

• A = U Λ V^T - example: same decomposition as on Page 29.

Annotation: the first diagonal entry of Λ (9.64) is the ‘strength’ of the CS-concept.

Page 33:

SVD - Example

• A = U Λ V^T - example: same decomposition as on Page 29.

Annotation: V^T is the term-to-concept similarity matrix; its first row corresponds to the CS-concept.


Page 35:

Task 1 - SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case studies
• Additional properties

Page 36:

SVD - Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:
• U: document-to-concept similarity matrix
• V: term-to-concept sim. matrix
• Λ: its diagonal elements: ‘strength’ of each concept

Page 37:

SVD – Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:
Q: if A is the document-to-term matrix, what is A^T A?
A:
Q: A A^T?
A:

Page 38:

SVD – Interpretation #1

‘documents’, ‘terms’ and ‘concepts’:
Q: if A is the document-to-term matrix, what is A^T A?
A: term-to-term ([m x m]) similarity matrix
Q: A A^T?
A: document-to-document ([n x n]) similarity matrix

Copyright: Faloutsos, Tong (2009), ICDE’09
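To make the two answers concrete, here is a small sketch, assuming NumPy, that forms both products for the running-example matrix; the names `term_sim` and `doc_sim` are illustrative and do not appear in the slides.

```python
import numpy as np

# Document-to-term matrix from the running example (n = 7 documents, m = 5 terms).
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

# Entry (i, j) is the inner product of term columns i and j.
term_sim = A.T @ A   # [m x m]

# Entry (i, j) is the inner product of document rows i and j.
doc_sim = A @ A.T    # [n x n]

print(term_sim.shape, doc_sim.shape)
print(term_sim)
```

Terms from different blocks (for example the first and fourth columns) never co-occur in a document, so their similarity entry is zero.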

Page 39:

SVD properties

• The columns of V are the eigenvectors of the covariance matrix A^T A (a covariance matrix up to scaling, when the columns of A are mean-centered)

• The columns of U are the eigenvectors of the Gram (inner-product) matrix A A^T

Further reading:
1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002.
2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.

Page 40:

SVD - Interpretation #2

• best axis to project on: (‘best’ = min sum of squares of projection errors)

Page 41:

SVD - Motivation

Page 42:

SVD - interpretation #2

• minimum RMS error

SVD: gives best axis to project

[Figure: data points with the first singular vector v1 drawn as the projection axis]

Page 43:

SVD - Interpretation #2

Page 44:

SVD - Interpretation #2

• A = U Λ V^T - example:

\[
\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&2&2\\0&0&0&3&3\\0&0&0&1&1\end{bmatrix}
=
\begin{bmatrix}0.18&0\\0.36&0\\0.18&0\\0.90&0\\0&0.53\\0&0.80\\0&0.27\end{bmatrix}
\times
\begin{bmatrix}9.64&0\\0&5.29\end{bmatrix}
\times
\begin{bmatrix}0.58&0.58&0.58&0&0\\0&0&0&0.71&0.71\end{bmatrix}
\]

Annotation: v1 is the first row of V^T.

Page 45:

SVD - Interpretation #2

• A = U Λ V^T - example: same decomposition as on Page 44.

Annotation: the first singular value (9.64) gives the variance (‘spread’) on the v1 axis.

Page 46:

SVD - Interpretation #2

• A = U Λ V^T - example (same decomposition as on Page 44):
  – U Λ gives the coordinates of the points in the projection axis
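A short sketch of this coordinate view, assuming NumPy: the coordinates U Λ can equivalently be obtained by projecting the rows of A onto the right singular vectors (A V). The names `coords` and `coords_alt` are illustrative.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # number of projection axes to keep

# Coordinates of each row of A on the first k axes: U Lambda ...
coords = U[:, :k] * s[:k]
# ... which equals projecting the rows of A onto v1, ..., vk (A V).
coords_alt = A @ Vt[:k, :].T

print(np.round(coords, 2))
print("U Lambda == A V ?", np.allclose(coords, coords_alt))
```

Each row of `coords` is one point (document or customer) expressed in concept space; the overall sign of each axis is arbitrary.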

Page 47:

SVD - Interpretation #2

• More details
• Q: how exactly is dim. reduction done?

(Same decomposition as on Page 44.)

Page 48:

SVD - Interpretation #2

• More details
• Q: how exactly is dim. reduction done?
• A: set the smallest singular values to zero:

(Same decomposition as on Page 44.)

Page 49:

SVD - Interpretation #2

\[
\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&2&2\\0&0&0&3&3\\0&0&0&1&1\end{bmatrix}
\approx
\begin{bmatrix}0.18&0\\0.36&0\\0.18&0\\0.90&0\\0&0.53\\0&0.80\\0&0.27\end{bmatrix}
\times
\begin{bmatrix}9.64&0\\0&0\end{bmatrix}
\times
\begin{bmatrix}0.58&0.58&0.58&0&0\\0&0&0&0.71&0.71\end{bmatrix}
\]


Page 51:

SVD - Interpretation #2

\[
\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&2&2\\0&0&0&3&3\\0&0&0&1&1\end{bmatrix}
\approx
\begin{bmatrix}0.18\\0.36\\0.18\\0.90\\0\\0\\0\end{bmatrix}
\times
\begin{bmatrix}9.64\end{bmatrix}
\times
\begin{bmatrix}0.58&0.58&0.58&0&0\end{bmatrix}
\]

Page 52:

SVD - Interpretation #2

\[
\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&2&2\\0&0&0&3&3\\0&0&0&1&1\end{bmatrix}
\approx
\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&0&0\\0&0&0&0&0\\0&0&0&0&0\end{bmatrix}
\]
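The truncation step can be sketched as follows, assuming NumPy. Keeping only the strongest singular value (k = 1) of the running-example matrix reproduces the first block exactly and zeroes the second; the name `A1` is illustrative.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation: keep the k strongest singular values, drop the rest.
k = 1
A1 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(A1, 2))
# The Frobenius-norm error equals the root of the sum of squares of the
# dropped singular values.
print("error:", np.linalg.norm(A - A1))
```

For this matrix the error comes entirely from the second singular value (about 5.29), since all later ones are zero.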

Page 53:

SVD - Interpretation #2

Exactly equivalent: ‘spectral decomposition’ of the matrix:

(Same decomposition A = U Λ V^T as on Page 44.)

Page 54:

SVD - Interpretation #2

Exactly equivalent: ‘spectral decomposition’ of the matrix (A is the 7 x 5 example matrix of Page 44):

\[
A = \begin{bmatrix} u_1 & u_2 \end{bmatrix}
\times
\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}
\times
\begin{bmatrix} v_1^T \\ v_2^T \end{bmatrix}
\]

Page 55:

SVD - Interpretation #2

Exactly equivalent: ‘spectral decomposition’ of the matrix (A is the n x m example matrix of Page 44):

\[
A = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \dots
\]

Page 56:

SVD - Interpretation #2

Exactly equivalent: ‘spectral decomposition’ of the matrix (A is the n x m example matrix of Page 44):

\[
A = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \dots
\]

Each u_i is n x 1 and each v_i^T is 1 x m; the sum has r terms.
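The spectral form can be verified directly. This sketch, assuming NumPy, rebuilds the running-example matrix as a sum of r rank-1 outer products; `terms` and `A_sum` are illustrative names.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))  # rank = number of non-negligible singular values

# One (n x 1)(1 x m) outer product per singular triplet, scaled by lambda_i.
terms = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(r)]
A_sum = sum(terms)

print("number of terms:", len(terms))
print("sum of rank-1 terms == A ?", np.allclose(A_sum, A))
```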

Page 57:

SVD - Interpretation #2

approximation / dim. reduction: by keeping the first few terms (Q: how many?)

\[
A = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \dots
\]

assume: λ1 >= λ2 >= ...

Page 58:

SVD - Interpretation #2

A (heuristic - [Fukunaga]): keep 80-90% of ‘energy’ (= sum of squares of λi’s)

\[
A = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \dots
\]

assume: λ1 >= λ2 >= ...
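One way to apply the energy heuristic is sketched below, assuming NumPy. The helper `choose_k` and its `energy_fraction` parameter are hypothetical names introduced here for illustration; the slides only state the 80-90% rule.

```python
import numpy as np

def choose_k(singular_values, energy_fraction=0.9):
    """Smallest k whose leading singular values hold the requested energy share.

    Assumes the singular values are sorted in decreasing order.
    """
    energy = np.cumsum(np.asarray(singular_values, dtype=float) ** 2)
    fractions = energy / energy[-1]
    # First index whose cumulative fraction reaches the threshold, plus one.
    return int(np.searchsorted(fractions, energy_fraction) + 1)

A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

s = np.linalg.svd(A, compute_uv=False)
print("energy shares:", np.round(s ** 2 / np.sum(s ** 2), 3))
print("k for 90% energy:", choose_k(s, 0.9))
```

In the running example the first singular value alone carries 93/121 (about 77%) of the energy, so a 90% target needs both components.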

Page 59:

Pictorially: matrix form of SVD

– Best rank-k approximation in L2

[Figure: the n x m matrix A drawn as the product of U, the diagonal matrix of singular values, and V^T]

Page 60:

Pictorially: Spectral form of SVD

– Best rank-k approximation in L2

[Figure: the n x m matrix A drawn as the sum of rank-1 terms λ1 u1 v1^T + λ2 u2 v2^T + ...]

Page 61:

Task 1 - SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
  – #1: documents/terms/concepts
  – #2: dim. reduction
  – #3: picking non-zero, rectangular ‘blobs’
• Complexity
• Case studies
• Additional properties

Page 62:

SVD - Interpretation #3

• finds non-zero ‘blobs’ in a data matrix

\[
\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&2&2\\0&0&0&3&3\\0&0&0&1&1\end{bmatrix}
=
\begin{bmatrix}0.18&0\\0.36&0\\0.18&0\\0.90&0\\0&0.53\\0&0.80\\0&0.27\end{bmatrix}
\times
\begin{bmatrix}9.64&0\\0&5.29\end{bmatrix}
\times
\begin{bmatrix}0.58&0.58&0.58&0&0\\0&0&0&0.71&0.71\end{bmatrix}
\]


Page 64:

SVD - Interpretation #3

• finds non-zero ‘blobs’ in a data matrix =
• ‘communities’ (bi-partite cores, here)

\[
\begin{bmatrix}1&1&1&0&0\\2&2&2&0&0\\1&1&1&0&0\\5&5&5&0&0\\0&0&0&2&2\\0&0&0&3&3\\0&0&0&1&1\end{bmatrix}
\]

[Figure: two bi-partite cores, one linking Row 1 ... Row 4 to Col 1 ... Col 3, the other linking Row 5 ... Row 7 to the columns from Col 4 on]

Page 65:

Task 1 - SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
  – HITS
  – PageRank

Page 66:

SVD - Complexity

• O(n * m * m) or O(n * n * m) (whichever is less)
• less work, if we just want singular values
• or if we want first k singular vectors
• or if the matrix is sparse [Berry]
• Implemented: in any linear algebra package (LINPACK, matlab, Splus, mathematica ...)
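As a rough illustration of the "less work" cases, the sketch below (assuming NumPy) computes singular values only, first with `np.linalg.svd(..., compute_uv=False)` and then through the smaller of the two Gram matrices, which mirrors the "whichever is less" cost. The helper `singular_values_via_gram` is a hypothetical name; squaring the matrix loses some numerical accuracy, so it is shown for intuition rather than as a recommended method.

```python
import numpy as np

def singular_values_via_gram(A):
    """Singular values of A from the eigenvalues of the smaller Gram matrix."""
    n, m = A.shape
    # Form the m x m matrix A^T A if m <= n, otherwise the n x n matrix A A^T.
    G = A.T @ A if m <= n else A @ A.T
    eigvals = np.linalg.eigvalsh(G)        # ascending order; G is symmetric
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    return np.sqrt(eigvals[::-1])          # descending, like np.linalg.svd

A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

s_direct = np.linalg.svd(A, compute_uv=False)  # singular values only, no U or V
s_gram = singular_values_via_gram(A)

print(np.round(s_direct, 2))
print(np.round(s_gram, 2))
```

For large sparse matrices, iterative solvers that return only the top k singular triplets are the usual choice; they are not shown here.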

Page 67:

SVD - conclusions so far

• SVD: A = U Λ V^T: unique (*)
• U: document-to-concept similarities
• V: term-to-concept similarities
• Λ: strength of each concept
• dim. reduction: keep the first few strongest singular values (80-90% of ‘energy’)
  – SVD: picks up linear correlations
• SVD: picks up non-zero ‘blobs’

Page 68:

Task 1 - SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
  – HITS
  – PageRank

Page 69:

Kleinberg’s algo (HITS)

Kleinberg, Jon (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Page 70:

Recall: problem dfn

• Given a graph (e.g., web pages containing the desired query word)

• Q: Which node is the most important?

Page 71:

Kleinberg’s algorithm
• Problem dfn: given the web and a query
• find the most ‘authoritative’ web pages for this query

Step 0: find all pages containing the query terms

Step 1: expand by one move forward and backward

Page 72:

Kleinberg’s algorithm
• Step 1: expand by one move forward and backward

Page 73:

Kleinberg’s algorithm
• on the resulting graph, give high score (= ‘authorities’) to nodes that many important nodes point to
• give high importance score (‘hubs’) to nodes that point to good ‘authorities’

[Figure: hubs pointing to authorities]

Page 74:

Kleinberg’s algorithm

observations
• recursive definition!
• each node (say, ‘i’-th node) has both an authoritativeness score a_i and a hubness score h_i

Page 75:

Kleinberg’s algorithm

Let E be the set of edges and A be the adjacency matrix: entry (i,j) is 1 if the edge from i to j exists

Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores.

Then:

Page 76:

Kleinberg’s algorithm

Then:

a_i = h_k + h_l + h_m

that is

a_i = Sum (h_j) over all j such that the (j,i) edge exists

or

a = A^T h

[Figure: nodes k, l, m each pointing to node i]

Page 77:

Kleinberg’s algorithm

symmetrically, for the ‘hubness’:

h_i = a_n + a_p + a_q

that is

h_i = Sum (a_j) over all j such that the (i,j) edge exists

or

h = A a

[Figure: node i pointing to nodes n, p, q]

Page 78:

Kleinberg’s algorithm

In conclusion, we want vectors h and a such that:

h = A a

a = A^T h

SVD properties:

A [n x m] v1 [m x 1] = λ1 u1 [n x 1]

u1^T A = λ1 v1^T

Page 79:

Kleinberg’s algorithm

In short, the solutions to

h = A a

a = A^T h

are the left- and right- singular-vectors of the adjacency matrix A.

Starting from random a’ and iterating, we’ll eventually converge

(Q: to which of all the singular-vectors? why?)

Page 80:

Kleinberg’s algorithm

(Q: to which of all the singular-vectors? why?)

A: to the ones of the strongest singular-value: (A^T A)^k v’ ~ (constant) v1
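The iteration can be sketched in a few lines, assuming NumPy. The 4-node graph below is hypothetical (it is not from the slides), the scores start from all ones rather than a random vector so that the run is reproducible, and each step is normalized to unit length, which absorbs the ‘constant’ mentioned above.

```python
import numpy as np

def hits(A, num_iters=100):
    """Alternate a = A^T h and h = A a, normalizing after each step."""
    n = A.shape[0]
    h = np.ones(n)
    a = np.ones(n)
    for _ in range(num_iters):
        a = A.T @ h               # authorities: pointed to by good hubs
        a /= np.linalg.norm(a)
        h = A @ a                 # hubs: point to good authorities
        h /= np.linalg.norm(h)
    return h, a

# Hypothetical directed graph; entry (i, j) = 1 if the edge i -> j exists.
A = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
], dtype=float)

h, a = hits(A)

# Compare with the first left/right singular vectors (sign is arbitrary).
U, s, Vt = np.linalg.svd(A)
print("hub scores:      ", np.round(h, 3))
print("authority scores:", np.round(a, 3))
print("h matches u1 ?", np.allclose(h, np.abs(U[:, 0])))
print("a matches v1 ?", np.allclose(a, np.abs(Vt[0])))
```

Convergence is fast here because the strongest singular value of this small graph is well separated from the next one; graphs with nearly equal top singular values converge more slowly.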

Page 81:

Kleinberg’s algorithm - results

E.g., for the query ‘java’:

0.328 www.gamelan.com

0.251 java.sun.com

0.190 www.digitalfocus.com (“the java developer”)

Page 82:

Kleinberg’s algorithm - discussion
• ‘authority’ score can be used to find ‘similar pages’ (how?)

Page 83:

Task 1 - SVD - Detailed outline

• Motivation
• Definition - properties
• Interpretation
• Complexity
• Case Studies
  – HITS
  – PageRank

Page 84:

PageRank (google)

• Brin, Sergey and Lawrence Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf.

[Photos: Larry Page, Sergey Brin]

Page 85:

Problem: PageRank

Given a directed graph, find its most interesting/central node

A node is important, if it is connected with important nodes (recursive, but OK!)

Page 86:

Problem: PageRank - solution

Given a directed graph, find its most interesting/central node

Proposed solution: Random walk; spot most ‘popular’ node (-> steady state prob. (ssp))

A node has high ssp, if it is connected with high ssp nodes (recursive, but OK!)

Page 87:

(Simplified) PageRank algorithm

• Let A be the adjacency matrix;
• let B be the transition matrix: transpose, column-normalized - then

B p = p, with p = (p1, p2, p3, p4, p5)^T

[Figure: directed graph on nodes 1 ... 5, next to its 5 x 5 ‘To’ (rows) / ‘From’ (columns) transition matrix B, whose non-zero entries are 1, 1, 1, 1/2, 1/2, 1/2, 1/2]

Page 88:

(Simplified) PageRank algorithm
• B p = p

[Figure: same 5-node directed graph and transition matrix B as on Page 87, with p = (p1, p2, p3, p4, p5)^T on both sides of B p = p]

Page 89:

Definitions

A   Adjacency matrix (from-to)
D   Degree matrix = diag(d1, d2, …, dn)
B   Transition matrix: to-from, column normalized

B = A^T D^{-1}
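These definitions translate directly into code. The sketch below assumes NumPy and uses a hypothetical 5-node graph (its edges are not taken from the slides' figure) in which every node has at least one out-edge; it builds B = A^T D^{-1} and then extracts the eigenvector with eigenvalue 1.

```python
import numpy as np

# Hypothetical graph, 0-indexed; entry (i, j) = 1 if the edge i -> j exists.
# Edges: 0->1, 1->2, 1->3, 2->0, 3->4, 4->0, 4->2.
A = np.array([
    [0, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
], dtype=float)

d = A.sum(axis=1)                # out-degrees d1, ..., dn (all non-zero here)
B = A.T @ np.diag(1.0 / d)       # to-from transition matrix, column-normalized

print("column sums of B:", B.sum(axis=0))

# Steady state: the eigenvector of B for eigenvalue 1, rescaled to sum to 1.
w, V = np.linalg.eig(B)
idx = int(np.argmax(w.real))
p = V[:, idx].real
p = p / p.sum()

print("largest eigenvalue:", np.round(w[idx].real, 6))
print("p =", np.round(p, 4))
```

A node with no out-edges would give a zero in `d` and break the division; the random jumps of the full algorithm are one common way around that.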

Page 90:

(Simplified) PageRank algorithm
• B p = 1 * p
• thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is column-normalized)
• Why does such a p exist?
  – p exists if B is n x n, nonnegative, irreducible [Perron–Frobenius theorem]


(Simplified) PageRank algorithm

• In short: imagine a particle randomly moving along the edges
• compute its steady-state probabilities (ssp)

Full version of algo: with occasional random jumps

Why? To make the matrix irreducible


Full Algorithm

• With probability 1-c, fly out to a random node
• Then, we have

$p = c B p + \frac{1-c}{n} \mathbf{1} \;\Rightarrow\; p = \frac{1-c}{n} [I - c B]^{-1} \mathbf{1}$
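The fixed-point equation p = c B p + (1-c)/n 1 can be solved by simply iterating it. The sketch below assumes c = 0.85 (a commonly used value, not stated in the slides) and a hypothetical 3-node column-normalized matrix B; the function name `pagerank` is illustrative.

```python
def pagerank(B, c=0.85, tol=1e-12, max_iter=1000):
    """Iterate p <- c*B*p + (1-c)/n * 1 until the L1 change drops below tol."""
    n = len(B)
    p = [1.0 / n] * n                      # start from the uniform distribution
    for _ in range(max_iter):
        new_p = [c * sum(B[i][j] * p[j] for j in range(n)) + (1.0 - c) / n
                 for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p

# Hypothetical graph 0->1, 0->2, 1->2, 2->0, already column-normalized (to-from)
B = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]
p = pagerank(B)

# How far p is from satisfying p = c*B*p + (1-c)/n * 1
residual = max(abs(p[i] - (0.85 * sum(B[i][j] * p[j] for j in range(3)) + 0.15 / 3))
               for i in range(3))
```

Since every column of B sums to 1, each iteration keeps the entries of p summing to 1, so p stays a probability distribution. Node 2 ends up with the highest score because both other nodes link to it.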


Alternative notation

M: Modified transition matrix

$M = c B + \frac{1-c}{n} \mathbf{1} \mathbf{1}^T$

Then

$p = M p$

That is: the steady-state probabilities = PageRank scores form the first eigenvector of the ‘modified transition matrix’


Parenthesis: intuition behind eigenvectors


Formal definition

If A is an (n x n) square matrix, then ($\lambda$, x) is an eigenvalue/eigenvector pair of A if $A x = \lambda x$

CLOSELY related to singular values:


Property #1: Eigen- vs singular-values

If

$B_{[n \times m]} = U_{[n \times r]} \Lambda_{[r \times r]} (V_{[m \times r]})^T$

then $A = B^T B$ is symmetric and

C(4): $B^T B v_i = \lambda_i^2 v_i$

i.e., $v_1, v_2, \ldots$ are eigenvectors of $A = B^T B$


Property #2

• If A[nxn] is a real, symmetric matrix

• Then it has n real eigenvalues

(if A is not symmetric, some eigenvalues may be complex)


Property #3

• If A[nxn] is a real, symmetric matrix

• Then it has n real eigenvalues
• And they agree with its n singular values, except possibly for the sign


Intuition

• A as vector transformation

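$A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}$, $x = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $x' = A x = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$

[Figure: x and x’ drawn as vectors in the plane]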


Intuition

• By defn., eigenvectors remain parallel to themselves (‘fixed points’)

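$A v_1 = \lambda_1 v_1$: $\begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 0.52 \\ 0.85 \end{bmatrix} = 3.62 \cdot \begin{bmatrix} 0.52 \\ 0.85 \end{bmatrix}$, with $v_1 = (0.52, 0.85)^T$ and $\lambda_1 = 3.62$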

Convergence

• Usually, fast
• depends on the ratio $\lambda_1 : \lambda_2$
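One way to see both the ‘fixed point’ intuition and the convergence behaviour is the power method: repeatedly apply A to a vector and renormalize. The sketch below uses the 2 x 2 matrix A = [2 1; 1 3] from the intuition slides; the function name `power_iteration`, the starting vector and the iteration count are assumptions made for illustration.

```python
import math

def power_iteration(A, iters=50):
    """Approximate the strongest eigenpair of a symmetric matrix A."""
    n = len(A)
    v = [1.0] * n                                   # assumed starting vector
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]                   # keep unit length
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(v[i] * Av[i] for i in range(n))       # Rayleigh quotient, |v| = 1
    return lam, v

A = [[2.0, 1.0],
     [1.0, 3.0]]
lam1, v1 = power_iteration(A)
```

For this A the eigenvalues are (5 ± sqrt(5))/2, so the second-to-first ratio is about 0.38 and the error shrinks by roughly that factor per step, which is why a few dozen iterations are plenty.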


Kleinberg/google - conclusions

SVD helps in graph analysis:

hub/authority scores: strongest left- and right- singular-vectors of the adjacency matrix

random walk on a graph: steady state probabilities are given by the strongest eigenvector of the (modified) transition matrix


Conclusions

• SVD: a valuable tool
• given a document-term matrix, it finds ‘concepts’ (LSI)
• ... and can find fixed-points or steady-state probabilities (google / Kleinberg / Markov Chains)


Conclusions cont’d

(We didn’t discuss/elaborate, but) SVD
• ... can reduce dimensionality (KL)
• ... and can find rules (PCA; RatioRules)
• ... and can optimally solve over- and under-constrained linear systems (least squares / query feedbacks)


References

• Berry, Michael: http://www.cs.utk.edu/~lsi/
• Brin, S. and L. Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf.


References

• Christos Faloutsos, Searching Multimedia Databases by Content, Springer, 1996. (App. D)

• Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.

• Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.), Springer.


References cont’d

• Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.
• Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press. www.nr.com


PART 2: Communities


Roadmap

• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions


Task 2 – Communities - Detailed outline

• Motivation
• Hard clustering – k pieces
• Hard clustering – optimal # pieces
• Observations


Problem

• Given a graph, and k
• Break it into k (disjoint) communities

k = 2


Solution #1: METIS

• Arguably, the best algorithm
• Open source, at
  – http://www.cs.umn.edu/~metis
• and *many* related papers, at same url
• Main idea:
  – coarsen the graph;
  – partition;
  – un-coarsen


Solution #1: METIS

• G. Karypis and V. Kumar. METIS 4.0: Unstructured graph partitioning and sparse matrix ordering system. TR, Dept. of CS, Univ. of Minnesota, 1998.

• <and many extensions>


Solution #2

(problem: hard clustering, k pieces)

Spectral partitioning:
• Consider the eigenvector of the 2nd smallest eigenvalue of the (normalized) Laplacian

See details in ‘Task 4’, later


Solutions #3, …

Many more ideas:
• Clustering on $A^2$ (square of the adjacency matrix) [Zhou, Woodruff, PODS’04]
• Minimum cut / maximum flow [Flake+, KDD’00]
• …


Task 2 – Communities - Detailed outline

• Motivation
• Hard clustering – k pieces
• Hard clustering – optimal # pieces
• Observations


Cross-association

Desiderata:

Simultaneously discover row and column groups

Fully Automatic: No “magic numbers”

Scalable to large matrices

Reference:
1. Chakrabarti et al. Fully Automatic Cross-Associations, KDD’04


What makes a cross-association “good”?

[Figure: two alternative cross-associations of the same matrix, shown side by side (“versus”); in each, the vertical axis shows row groups and the horizontal axis shows column groups]

Why is this better?

simpler; easier to describe; easier to compress!


What makes a cross-association “good”?

Problem definition: given an encoding scheme
• decide on the # of row and col. groups, k and l
• and reorder rows and columns,
• to achieve best compression


Main Idea

Total Encoding Cost = $\sum_i \text{size}_i \cdot H(x_i)$ + Cost of describing cross-associations

(first term: Code Cost; second term: Description Cost)

Good Compression

Better Clustering

Minimize the total cost (# bits) for lossless compression
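To make the code-cost term concrete, the sketch below evaluates the sum of size_i * H(x_i) over the blocks induced by a given row/column grouping of a binary matrix. It assumes H is the binary Shannon entropy of the fraction of ones in block i and it leaves out the description cost. The function names and the 4 x 4 matrix are hypothetical, and this is not the authors' implementation.

```python
import math

def binary_entropy(p):
    """Shannon entropy (bits) of a Bernoulli(p) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def code_cost(matrix, row_groups, col_groups):
    """Sum over blocks of size_i * H(density_i), in bits.

    row_groups[r] / col_groups[c] give the group label of row r / column c.
    """
    blocks = {}                                   # (row group, col group) -> (size, ones)
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            key = (row_groups[r], col_groups[c])
            size, ones = blocks.get(key, (0, 0))
            blocks[key] = (size + 1, ones + value)
    return sum(size * binary_entropy(ones / size) for size, ones in blocks.values())

# Hypothetical block-diagonal matrix with two obvious groups
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
good_cost = code_cost(matrix, [0, 0, 1, 1], [0, 0, 1, 1])    # homogeneous blocks
single_cost = code_cost(matrix, [0, 0, 0, 0], [0, 0, 0, 0])  # one big mixed block
```

With the matching grouping every block is all ones or all zeros, so the code cost is 0 bits. A single group mixes ones and zeros half and half and costs 16 bits. A full objective would add the description cost, which grows with the number of groups and keeps the search from splitting forever.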


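Algorithm

[Figure: sequence of intermediate groupings, alternately adding column and row groups: k=1, l=2 → k=2, l=2 → k=2, l=3 → k=3, l=3 → k=3, l=4 → k=4, l=4 → k=4, l=5, ending at k = 5 row groups, l = 5 col groups]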


Experiments

“CLASSIC”

• 3,893 documents

• 4,303 words

• 176,347 “dots”

Combination of 3 sources:

• MEDLINE (medical)

• CISI (info. retrieval)

• CRANFIELD (aerodynamics)

[Figure: Documents (rows) x Words (columns) matrix]


Experiments

“CLASSIC” graph of documents & words: k=15, l=19

[Figure: the reordered Documents (rows) x Words (columns) matrix]


Experiments

“CLASSIC” graph of documents & words: k=15, l=19

MEDLINE(medical)

insipidus, alveolar, aortic, death, prognosis, intravenous

blood, disease, clinical, cell, tissue, patient


Experiments

“CLASSIC” graph of documents & words: k=15, l=19

CISI(Information Retrieval)

providing, studying, records, development, students, rules

abstract, notation, works, construct, bibliographies

MEDLINE(medical)


Experiments

“CLASSIC” graph of documents & words: k=15, l=19

CRANFIELD (aerodynamics)

shape, nasa, leading, assumed, thin

CISI(Information Retrieval)

MEDLINE(medical)


Experiments

“CLASSIC” graph of documents & words: k=15, l=19

paint, examination, fall, raise, leave, based

CRANFIELD (aerodynamics)

CISI(Information Retrieval)

MEDLINE(medical)


Algorithm

Code for cross-associations (matlab):

www.cs.cmu.edu/~deepay/mywww/software/CrossAssociations-01-27-2005.tgz

Variations and extensions:
• ‘Autopart’ [Chakrabarti, PKDD’04]
• www.cs.cmu.edu/~deepay


Algorithm

• Hadoop implementation [ICDM’08]

Spiros Papadimitriou, Jimeng Sun: DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining. ICDM 2008: 512-521


Task 2 – Communities - Detailed outline

• Motivation
• Hard clustering – k pieces
• Hard clustering – optimal # pieces
• Observations


Observation #1

• Skewed degree distributions – there are nodes with huge degree (>O(10^4), in facebook/linkedIn popularity contests!)


Observation #2

• Maybe there are no good cuts: “jellyfish” shape [Tauro+,’01], [Siganos+,’06], strange behavior of cuts [Chakrabarti+,’04], [Leskovec+,’08]


Jellyfish model [Tauro+]

A Simple Conceptual Model for the Internet Topology, L. Tauro, C. Palmer, G. Siganos, M. Faloutsos, Global Internet, November 25-29, 2001

Jellyfish: A Conceptual Model for the AS Internet Topology G. Siganos, Sudhir L Tauro, M. Faloutsos, J. of Communications and Networks, Vol. 8, No. 3, pp 339-350, Sept. 2006.


Strange behavior of min cuts

• ‘negative dimensionality’ (!)

NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy

Statistical Properties of Community Structure in Large Social and Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. WWW 2008.


“Min-cut” plot

• Do min-cuts recursively.

[Figure: log-log plot with x-axis log (# edges) and y-axis log (mincut-size / #edges); the example graph has N nodes and mincut size = sqrt(N), and each recursive split adds a new min-cut point to the plot]

Slope = -0.5

For a d-dimensional grid, the slope is -1/d
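The -1/d slope can be checked numerically from closed-form counts for a regular grid. The sketch assumes a grid with `side` nodes per axis, whose d * side^(d-1) * (side-1) edges are cut in half by a hyperplane crossing side^(d-1) edges; treating that hyperplane cut as the min-cut is an assumption, and the function names are hypothetical.

```python
import math

def grid_point(side, d):
    """One point of the min-cut plot for a d-dimensional grid.

    Returns (log #edges, log(cut size / #edges)).
    """
    edges = d * side ** (d - 1) * (side - 1)
    cut = side ** (d - 1)          # edges crossing a hyperplane that halves the grid
    return math.log(edges), math.log(cut / edges)

def slope(d, small_side, large_side):
    """Slope of the line through the plot points of two grid sizes."""
    x0, y0 = grid_point(small_side, d)
    x1, y1 = grid_point(large_side, d)
    return (y1 - y0) / (x1 - x0)

slope_2d = slope(2, 64, 1024)      # close to -1/2
slope_3d = slope(3, 16, 256)       # close to -1/3
```

The ratio cut/#edges equals 1/(d * (side-1)) while #edges grows like side^d, so on log-log axes the points fall on a line of slope approaching -1/d as the grid grows.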


“Min-cut” plot

[Figure: two min-cut plots, each with x-axis log (# edges) and y-axis log (mincut-size / #edges)]

• For a d-dimensional grid, the slope is -1/d
• For a random graph, the slope is 0


“Min-cut” plot

• What does it look like for a real-world graph?

[Figure: empty min-cut plot marked “?”, with x-axis log (# edges) and y-axis log (mincut-size / #edges)]


Experiments

• Datasets:
  – Google Web Graph: 916,428 nodes and 5,105,039 edges
  – Lucent Router Graph: Undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges
  – User Website Clickstream Graph: 222,704 nodes and 952,580 edges

NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy


Experiments

• Used the METIS algorithm [Karypis, Kumar, 1995]

[Figure: min-cut plot for the Google Web graph; x-axis log (# edges), y-axis log (mincut-size / #edges); slope ~ -0.4]

• Google Web graph
• Values along the y-axis are averaged
• “lip” for large edges
• Slope of -0.4 corresponds to a 2.5-dimensional grid!


Experiments

• Same results for other graphs too…

[Figure: two min-cut plots, each with x-axis log (# edges) and y-axis log (mincut-size / #edges)]

• Lucent Router graph: slope ~ -0.57
• Clickstream graph: slope ~ -0.45


Task 2 – Communities Conclusions – Practitioner’s guide

• Hard clustering – k pieces: METIS
• Hard clustering – optimal # pieces: Cross-associations
• Observations: ‘jellyfish’: no good cuts


PART 3: Tensors


Roadmap

• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions


Task 3 – Tensors - Detailed roadmap

• Motivation
• Definitions: PARAFAC
• Case study: web mining


Examples of Matrices: Authors and terms

          data   mining   classif.   tree   ...
John       13      11        22       55    ...
Peter       5       4         6        7    ...
Mary      ...     ...       ...      ...    ...
Nick      ...     ...       ...      ...    ...
...       ...     ...       ...      ...    ...


But: if it changes over time??

• A: treat it as ‘tensor’

[Figure: the authors x terms matrix (John, Peter, Mary, Nick, ... by data, mining, classif., tree, ...) stacked into one slice per conference edition: KDD’07, KDD’08, KDD’09]


Motivation: Why tensors?

• Q: what is a tensor?
• A: N-D generalization of matrix:

[Figure: the authors x terms matrix for KDD’09, extended to a 3-mode array with one slice each for KDD’07, KDD’08 and KDD’09]


Tensors are useful for 3 or more modes

Terminology: ‘mode’ (or ‘aspect’):

[Figure: the 3-mode authors x terms x conference array, with its three axes labeled Mode (== aspect) #1, Mode #2 and Mode #3]
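A 3-mode tensor can be held as a sparse map from (author, term, venue) index triples to counts. In the sketch below the counts 13, 11, 5 and 4 are the John/Peter entries from the authors x terms example, but placing them in the KDD’09 slice is an assumption and the remaining two entries are made up. The helper names `slice_at` and `fiber_sum` are hypothetical.

```python
def slice_at(tensor, mode, label):
    """Fix one mode to `label`; return the remaining 2-mode matrix as a dict."""
    return {index[:mode] + index[mode + 1:]: value
            for index, value in tensor.items() if index[mode] == label}

def fiber_sum(tensor, mode):
    """Sum the tensor over all modes except `mode` (one total per label)."""
    totals = {}
    for index, value in tensor.items():
        totals[index[mode]] = totals.get(index[mode], 0) + value
    return totals

# Modes: 0 = author, 1 = term, 2 = venue
tensor = {
    ("John", "data", "KDD'09"): 13,
    ("John", "mining", "KDD'09"): 11,
    ("Peter", "data", "KDD'09"): 5,
    ("Peter", "mining", "KDD'09"): 4,
    ("John", "data", "KDD'08"): 2,     # made-up value
    ("Peter", "tree", "KDD'07"): 1,    # made-up value
}

kdd09 = slice_at(tensor, 2, "KDD'09")  # ordinary author x term matrix
per_author = fiber_sum(tensor, 0)      # total count per author
```

Fixing the third mode gives back an ordinary matrix, which is the sense in which a tensor generalizes one. Storing only the nonzeros is the natural layout when the data is as sparse as real author/term counts.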


Notice

• 3rd mode does not need to be time
• we can have more than 3 modes

[Figure: tensor with modes IP source, IP destination and Dest. port (example ports: 80, 125)]

Background: Tensors

• Tensors (= multi-dimensional arrays) are everywhere
  – Sensor stream (time, location, type)
  – Predicates (subject, verb, object) in knowledge base, e.g. “Barack Obama is the president of U.S.”, “Eric Clapton plays guitar”

[Figure: NELL (Never Ending Language Learner) data as a 3-mode tensor with mode sizes 26M, 26M and 48M; nonzeros = 144M]


Task 3 – Tensors - Detailed roadmap

• Motivation
• Definitions: PARAFAC
• Case study: web mining


Tensor basics

• Multi-mode extensions of SVD – recall that:

Reminder: SVD

– Best rank-k approximation in L2

$A_{[m \times n]} = U \Sigma V^T$

$A \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \ldots$

Extension to (>=)3 modes


Main points:

• 2 major types of tensor decompositions: PARAFAC and Tucker (not examined here)

• both can be solved with “alternating least squares” (ALS)
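For orientation, the sketch below shows only the form of a PARAFAC model, not the ALS fitting procedure: a 3-mode tensor written as a weighted sum of R rank-1 terms, X[i][j][k] = sum over r of w_r * a_r[i] * b_r[j] * c_r[k]. The factor values and the function name `parafac_reconstruct` are made up for illustration.

```python
def parafac_reconstruct(weights, A, B, C):
    """Build X[i][j][k] = sum_r weights[r] * A[r][i] * B[r][j] * C[r][k].

    A, B and C each hold one factor vector per component r.
    """
    I, J, K = len(A[0]), len(B[0]), len(C[0])
    return [[[sum(w * a[i] * b[j] * c[k]
                  for w, a, b, c in zip(weights, A, B, C))
              for k in range(K)]
             for j in range(J)]
            for i in range(I)]

# Two hypothetical components (R = 2) for a 2 x 3 x 2 tensor
weights = [2.0, 1.0]
A = [[1.0, 0.0], [0.0, 1.0]]            # a_1, a_2 (mode 1)
B = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # b_1, b_2 (mode 2)
C = [[1.0, 0.0], [0.0, 1.0]]            # c_1, c_2 (mode 3)
X = parafac_reconstruct(weights, A, B, C)
```

This mirrors the SVD sum of rank-1 matrices, with one extra factor vector per additional mode. Fitting would run in the other direction, e.g. ALS solving for one factor matrix at a time while holding the other two fixed.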


Task 3 – Tensors - Detailed outline

• Motivation
• Definitions: PARAFAC
• Case study: web mining

Discoveries: Problem Definition

Most important concepts and synonyms?

[Figure: NELL (Never Ending Language Learner) data as a 3-mode tensor with mode sizes 26M, 26M and 48M; nonzeros = 144M]

A1: Concept Discovery

• Concept Discovery in Knowledge Base

A2.1: Concept Discovery

A2: Synonym Discovery

• Synonym Discovery in Knowledge Base

[Figure: factor vectors a1, a2, …, aR, marking the (given) noun phrase, (discovered) synonym 1 and (discovered) synonym 2]


GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries

U Kang, Christos Faloutsos, Evangelos Papalexakis, Abhay Harpale

KDD 2012

Experiments

• GigaTensor solves 100x larger problem

[Figure: scalability on I x J x K tensors (number of nonzeros = I / 50): Tensor Toolbox runs out of memory, while GigaTensor handles 100x larger tensors]


Conclusions

• Real data may have multiple aspects (modes)

• Tensors provide elegant theory and algorithms
  – PARAFAC (and Tucker): discover groups

• GigaTensor: scales up (hadoop/PEGASUS)
  – www.cs.cmu.edu/~pegasus


References

• T. G. Kolda, B. W. Bader and J. P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. In: ICDM 2005, Pages 242-249, November 2005.

• Jimeng Sun, Spiros Papadimitriou, Philip Yu. Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams, Proc. of the Int. Conf. on Data Mining (ICDM), Hong Kong, China, Dec 2006


Resources

• See tutorial on tensors, KDD’07 (w/ Tamara Kolda and Jimeng Sun):

www.cs.cmu.edu/~christos/TALKS/KDD-07-tutorial


Tensor tools - resources

• Toolbox: from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/TensorToolbox

• T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review, Volume 51, Number 3, September 2009

csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/TensorReview-preprint.pdf

• T. Kolda and J. Sun: Scalable Tensor Decomposition for Multi-Aspect Data Mining (ICDM 2008)


PART 4: Theory


Roadmap

• Introduction – Motivation
• Task 1: Node importance
• Task 2: Community detection
• Task 3: Mining graphs over time – Tensors
• Task 4: Theory – intro to Laplacians
• Conclusions

Task 4 – Theory - Detailed roadmap

• Adjacency matrix
• Laplacian
  – Connected Components
  – Intuition: 2nd smallest eigenvalue -> ‘good cut’

Adjacency matrix

[Figure: example graph on nodes 1-4 and its adjacency matrix A; the entries of A mark 1-step-away paths]

Obvious extensions, for directed and/or weighted cases

Task 4 – Theory - Detailed roadmap

• Adjacency matrix
• Laplacian
  – Connected Components
  – Intuition: 2nd smallest eigenvalue -> ‘good cut’

Main upcoming result

The eigenvector of the second smallest eigenvalue of the Laplacian (u2) gives a good cut:
• Nodes with positive scores should go to one group
• And the rest to the other
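A small sketch of the sign-based cut, assuming the unnormalized Laplacian L = D - A of an undirected, unweighted graph. The vector u2 is approximated by power iteration on (shift * I - L), restricted to vectors orthogonal to the all-ones vector. This is an illustrative swap-in for a proper eigen-solver, and the 6-node graph (two triangles joined by one edge) and the function names are hypothetical.

```python
import math

def laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L

def fiedler_vector(L, iters=500):
    """Approximate the eigenvector of the 2nd smallest eigenvalue of L."""
    n = len(L)
    shift = 2.0 * max(L[i][i] for i in range(n))   # upper bound on eigenvalues of L
    x = [float(i) for i in range(n)]               # assumed starting vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]                # remove the all-ones direction
        y = [shift * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x

# Two triangles {0,1,2} and {3,4,5} joined by the single edge 2-3
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
L = laplacian(6, edges)
u2 = fiedler_vector(L)
lambda2 = sum(u2[i] * sum(L[i][j] * u2[j] for j in range(6)) for i in range(6))

positive_group = {i for i in range(6) if u2[i] > 0}
other_group = set(range(6)) - positive_group
```

Splitting by sign separates the two triangles, so only the bridging edge 2-3 is cut. For this graph the second smallest eigenvalue is (5 - sqrt(17))/2, about 0.44, well below the next eigenvalue (3), which is what makes the cut clear.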

Laplacian

$L = D - A$, where D is the diagonal degree matrix, $d_{ii} = d_i$

[Figure: example graph on nodes 1-4 and its Laplacian]

Task 4 – Theory - Detailed roadmap

• Adjacency matrix
• Laplacian
  – Connected Components
  – Intuition: 2nd smallest eigenvalue -> ‘good cut’

Page 188:

Connected Components
• Lemma: Let G be a graph with n vertices and c connected components. If L is the Laplacian of G, then rank(L) = n - c.
• Proof: see p. 279, Godsil-Royle
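The lemma can be checked numerically on a small case. The sketch below is an illustration, not a proof: the 7-node edge list is hypothetical, the rank is computed exactly with rational Gaussian elimination, and the components are counted independently with a graph search.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Return L = D - A for an undirected, unweighted graph on nodes 1..n."""
    lap = [[0] * n for _ in range(n)]
    for i, j in edges:
        lap[i - 1][j - 1] -= 1
        lap[j - 1][i - 1] -= 1
        lap[i - 1][i - 1] += 1
        lap[j - 1][j - 1] += 1
    return lap


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def count_components(n, edges):
    """Count connected components with a depth-first search."""
    neighbors = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    seen, components = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in neighbors[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


# Hypothetical graph: a path on {1, 2, 3} and a cycle on {4, 5, 6, 7}.
N = 7
EDGES = [(1, 2), (2, 3), (4, 5), (5, 6), (6, 7), (7, 4)]

c = count_components(N, EDGES)
r = rank(laplacian(N, EDGES))
print("n =", N, "c =", c, "rank(L) =", r)   # the lemma says r == n - c
```

Adding any edge between the two groups, for example (3, 4), merges them into one component and raises the rank by one.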

Page 190:

Connected Components

[Figure: graph G(V,E) on nodes 1-7, its Laplacian L, and eig(L)]

#zeros = #components: the number of zero eigenvalues of L equals the number of connected components

Page 192:

Connected Components

[Figure: graph G(V,E) on nodes 1-7 with the label 0.01, its Laplacian L, and eig(L)]

#zeros = #components

Page 193:

Connected Components

[Figure: graph G(V,E) on nodes 1-7 with the label 0.01, its Laplacian L, and eig(L)]

Indicates a “good cut”
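A sketch of the effect these slides point at, under two assumptions: that the 0.01 in the figure is the weight of a weak edge joining two otherwise separate groups, and that the groups look like the hypothetical triangle and 4-cycle used here. The second smallest eigenvalue is estimated with a plain power iteration on (shift*I - L), restricted to vectors orthogonal to the all-ones vector; this is a stand-in for a library eigensolver, not the method used on the slides.

```python
from math import sqrt


def weighted_laplacian(n, weighted_edges):
    """L = D - A for an undirected graph; weighted_edges holds (i, j, weight)."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        lap[i - 1][j - 1] -= w
        lap[j - 1][i - 1] -= w
        lap[i - 1][i - 1] += w
        lap[j - 1][j - 1] += w
    return lap


def second_smallest_eigenvalue(lap, iterations=500):
    """Estimate lambda_2 of a graph Laplacian by shifted power iteration."""
    n = len(lap)
    # Twice the largest degree bounds every eigenvalue of L (Gershgorin),
    # so shift*I - L has no negative eigenvalues.
    shift = 2.0 * max(lap[i][i] for i in range(n))
    x = [float(i) for i in range(n)]            # arbitrary non-constant start
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]               # project out the all-ones vector
        y = [shift * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    lx = [sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(x, lx))    # Rayleigh quotient x^T L x


# Hypothetical groups: triangle {1, 2, 3} and 4-cycle {4, 5, 6, 7}.
GROUPS = [(1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
          (4, 5, 1.0), (5, 6, 1.0), (6, 7, 1.0), (7, 4, 1.0)]

lam_disconnected = second_smallest_eigenvalue(weighted_laplacian(7, GROUPS))
lam_weak_link = second_smallest_eigenvalue(
    weighted_laplacian(7, GROUPS + [(3, 4, 0.01)]))

print("two components:  lambda_2 ~", lam_disconnected)   # numerically zero
print("weak 0.01 link:  lambda_2 ~", lam_weak_link)      # small but positive
```

With two components the second eigenvalue is zero as well; with the weak link it becomes small but positive, which is the signal that cutting that link separates the graph cheaply.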

Page 194:

Task 4 – Theory - Detailed roadmap
• Reminders
• Adjacency matrix
• Laplacian
  – Connected Components
  – Intuition: 2nd smallest eigenvalue -> ‘good cut’

Page 195:

Example: Spectral Partitioning

• Dumbbell graph: K500 and K500
• The two cliques are the Montagues and the Capulets, linked through Romeo and Juliet

[Figure: dumbbell graph marked with a "?"; image from http://en.wikipedia.org/wiki/File:Romeo_and_juliet_brown.jpg]

Page 196:

Example: Spectral Partitioning
• This is how the adjacency matrix of B looks

spy(B)

Page 197:

Example: Spectral Partitioning

• 2nd eigenvector u2 of the Laplacian L of B: L u2 = λ2 u2

L = diag(sum(B)) - B;        % Laplacian L = D - B
[u, v] = eigs(L, 2, 'SM');   % the 2 smallest-magnitude eigenpairs
% u(:,1) is assumed to pair with the nonzero eigenvalue; confirm with diag(v)
plot(u(:,1), 'x')

[Plot: u2,i score vs. node-id i]

Not so much information yet…

Page 199:

Example: Spectral Partitioning

• 2nd eigenvector after sorting on u2,i score

[ign, ind] = sort(u(:,1));
plot(u(ind,1), 'x')

[Plot: sorted u2,i score vs. node-id i]

But now we see the two communities!

Page 200:

Example: Spectral Partitioning
• This is how the adjacency matrix of B looks now

spy(B(ind,ind))
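The same steps can be sketched outside MATLAB. The version below is an illustration under stated assumptions, not the deck's code: it uses two small cliques (K6 instead of K500) so that pure Python stays fast, interleaves the node ids (even ids in one clique, odd ids in the other) to mimic an uninformative node ordering, joins the cliques by a single edge between nodes 0 and 1, and replaces `eigs` with a simple shifted power iteration. All function names are hypothetical.

```python
from math import sqrt


def dumbbell_edges(m):
    """Two cliques K_m joined by one edge; even ids form one clique, odd ids the other."""
    edges = []
    for parity in (0, 1):
        clique = range(parity, 2 * m, 2)
        edges += [(i, j) for i in clique for j in clique if i < j]
    edges.append((0, 1))   # the single link between the two cliques
    return edges


def laplacian(n, edges):
    """L = D - B for an undirected, unweighted graph on nodes 0..n-1."""
    lap = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def fiedler_vector(lap, iterations=500):
    """Approximate u2 by power iteration on (shift*I - L), orthogonal to all-ones."""
    n = len(lap)
    shift = 2.0 * max(lap[i][i] for i in range(n))   # bounds the eigenvalues of L
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]                    # remove the constant component
        y = [shift * x[i] - sum(lap[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


M = 6
N = 2 * M
L = laplacian(N, dumbbell_edges(M))
u2 = fiedler_vector(L)

# Counterpart of [ign, ind] = sort(u(:,1)): node ids ordered by u2 score.
ind = sorted(range(N), key=lambda i: u2[i])

# The cut: nodes with positive scores in one group, the rest in the other.
positive = sorted(i for i in range(N) if u2[i] > 0)
rest = sorted(i for i in range(N) if u2[i] <= 0)

print("sorted node order:", ind)
print("positive scores:  ", positive)
print("the rest:         ", rest)
```

Reordering rows and columns by `ind`, as in spy(B(ind,ind)), then shows the two cliques as two dense diagonal blocks.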

Page 201:

Why λ2?

[Figure: balls x1 ... xn, each ball 1 unit of mass, set to OSCILLATE]

Matrix viewpoint: L x = λ x (dfn of eigenvector)

Page 202:

Why λ2?

[Figure: balls x1 ... xn, each ball 1 unit of mass, set to OSCILLATE]

Physics viewpoint: L x = λ x
• L x: force due to neighbors' displacement
• λ: Hooke's constant

Page 203:

Why λ2?

[Figure: balls x1 ... xn, each ball 1 unit of mass, set to OSCILLATE; plot of eigenvector value vs. node id]

For the first eigenvector:
• All nodes: same displacement (= value)

Page 204:

Why λ2?

[Figure: balls x1 ... xn, each ball 1 unit of mass, set to OSCILLATE; plot of eigenvector value vs. node id]
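The formulas on the "Why λ2?" slides did not survive extraction, so what follows is a sketch of the standard ball-and-spring argument rather than a transcription. It assumes unit masses (as the slides state), one spring of unit stiffness per edge, and a displacement x_i(t) for ball i; the unit stiffness and the symbol ω are assumptions introduced here.

```latex
\begin{align*}
F_i &= -\sum_{j \sim i} (x_i - x_j) = -(Lx)_i
    && \text{(force due to the neighbors' displacement)} \\
\ddot{x} &= -Lx
    && \text{(Newton's law, unit masses)} \\
x(t) &= u \cos(\omega t) \;\Longrightarrow\; Lu = \lambda u, \quad \lambda = \omega^2
    && \text{(pure oscillation = eigenvector)}
\end{align*}
```

In such a mode every ball moves as if it were tied to a single spring with Hooke's constant λ. For λ1 = 0 all balls share the same displacement and no spring stretches. For a connected graph, λ2 is the slowest true oscillation: the balls split into two groups that move in opposite directions while stretching as few springs as possible, which is the intuition behind the sparse cut.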

Page 205:

Conclusions

Spectrum tells us a lot about the graph:
• Adjacency: #Paths
• Laplacian: Sparse Cut

Page 206:

References
• Fan R. K. Chung: Spectral Graph Theory (AMS)
• Chris Godsil and Gordon Royle: Algebraic Graph Theory (Springer)
• Bojan Mohar and Svatopluk Poljak: Eigenvalues in Combinatorial Optimization, IMA Preprint Series #939
• Gilbert Strang: Introduction to Applied Mathematics (Wellesley-Cambridge Press)

Page 207:

PART 5: Conclusions

Page 209:

Summary
• Task 1: Node importance -> SVD, PageRank, HITS
• Task 2: Community detection -> METIS; ‘no good cuts’
• Task 3: Mining graphs over time -> Tensors
• Task 4: Spectral graph theory -> Laplacians

Page 210:

Acknowledgements

Funding: IIS-0705359, IIS-0534205, DBI-0640543, CNS-0721736

Page 211:

THANK YOU!

Christos Faloutsos
www.cs.cmu.edu/~christos
www.cs.cmu.edu/~pegasus

http://www.cs.cmu.edu/~epapalex/gmc/