Colibri: Fast Mining of Large Static and Dynamic Graphs

Joint work by Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip S. Yu, Christos Faloutsos
Speaker: Hanghang Tong
KDD 2008, Aug. 24-27, 2008, Las Vegas
SCS CMU




Graphs are everywhere!

Q: How to find patterns? e.g., community, anomaly, etc.

Motivation

• Q: How to find patterns?
– e.g., community, anomaly, etc.
• A: Low-Rank Approximation (LRA) for the adjacency matrix of the graph:

$A \approx L \times M \times R$

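LRA for Graph Mining: Example

[Figure: author-conference graph with authors John, Tom, Bob, Carl, Van, Roy and conferences KDD, ICDM, RECOMB, ISMB]

• Adjacency matrix: $A \approx L \times M \times R$
– L: author clusters
– M: interaction
– R: conference clusters
• Reconstruction error is high: 'Carl' is abnormal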
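As a rough illustration of the anomaly reading on this slide, the sketch below scores each author by the reconstruction error of their row. It is a toy setup, not the authors' data or method: the 0/1 author-conference matrix is made up, and a rank-2 truncated SVD stands in for the (L, M, R) factorization.

```python
import numpy as np

# Hypothetical toy data: rows are authors, columns are the conferences
# KDD, ICDM, RECOMB, ISMB (1 = the author publishes there).
authors = ["John", "Tom", "Bob", "Carl", "Van", "Roy"]
A = np.array([
    [1, 1, 0, 0],   # John
    [1, 1, 0, 0],   # Tom
    [1, 1, 0, 0],   # Bob
    [1, 0, 1, 0],   # Carl: fits neither group
    [0, 0, 1, 1],   # Van
    [0, 0, 1, 1],   # Roy
], dtype=float)


def row_reconstruction_error(A, k):
    """Per-row error of the best rank-k approximation of A (via SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return np.linalg.norm(A - A_hat, axis=1)


errors = row_reconstruction_error(A, k=2)
most_abnormal = authors[int(np.argmax(errors))]
```

With two clean groups of authors, a rank-2 approximation reproduces the group members well and leaves most of the error on the row that belongs to neither group.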

Challenges

• How to get (L, M, R)
+ Efficiently (both time and space);
+ Intuitively (easy for interpretation);
+ Dynamically (track patterns over time)?


Roadmap

• Motivation

• Existing Methods
– SVD
– CUR/CX

• Proposed Methods: Colibri

• Experimental Results

• Conclusion

Matrix & Column Space

• Matrix: $B = [b_1, b_2] = \begin{bmatrix} 3 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix}$
• Column Space of a Matrix: $b_1$, $b_2$ are vectors in 3-d space!

[Figure: the vectors b1 and b2 drawn in 3-d space]

Projection, Projection Matrix & Core Matrix

$\tilde{v} = B \times (B^T B)^{+} \times B^T \times v$

• $\tilde{v}$: projection of v
• $B (B^T B)^{+} B^T$: projection matrix of B
• $(B^T B)^{+}$: core matrix
• $v$: an arbitrary vector
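The projection formula can be checked numerically. The following is a minimal NumPy sketch; the function name, the example matrix, and the vector are illustrative assumptions, and the pseudo-inverse is taken with an explicit cutoff (`rcond=1e-10`) chosen for this example.

```python
import numpy as np


def project_onto_column_space(B, v):
    """Return (v_tilde, core) with v_tilde = B (B^T B)^+ B^T v."""
    core = np.linalg.pinv(B.T @ B, rcond=1e-10)  # core matrix (B^T B)^+
    v_tilde = B @ (core @ (B.T @ v))             # projection of v
    return v_tilde, core


# Hypothetical example: the two columns of B span the x-y plane of 3-d space.
B = np.array([[3.0, 1.0],
              [1.0, 1.0],
              [0.0, 0.0]])
v = np.array([1.0, 2.0, 3.0])

v_tilde, core = project_onto_column_space(B, v)
# The projection keeps the x-y part of v and drops its z part.
```

Multiplying right to left, as done here, avoids forming the n x n projection matrix explicitly.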

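Singular-Value-Decomposition (SVD)

$A \approx U \times \Sigma \times V^T$

• $A = [a_1, a_2, a_3, \dots, a_m]$: n x m
• $U = [u_1, \dots, u_k]$: left singular vectors
• $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_k)$
• $V = [v_1, \dots, v_k]$: right singular vectors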

SVD: How to

• #1: Find the left matrix U, where

$u_i = \frac{A v_i}{\sigma_i} = \frac{a_1 v_{i,1} + a_2 v_{i,2} + \dots + a_m v_{i,m}}{\sigma_i}$

• #2: Project A into the column space of U

$A \approx U (U^T U)^{+} U^T A = \dots = U \Sigma V^T$

Here $U (U^T U)^{+} U^T$ is the projection matrix of the column space of U.
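Both steps can be verified on a small matrix. This is a sketch using NumPy's `np.linalg.svd`; the helper name `truncated_svd` and the 4 x 3 example matrix are assumptions made for illustration.

```python
import numpy as np


def truncated_svd(A, k):
    """Top-k singular triplets of A: U (n x k), s (k,), Vt (k x m)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


# Hypothetical example matrix (n = 4, m = 3).
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [1.0, 0.0, 1.0]])
U, s, Vt = truncated_svd(A, k=2)

# Step 1: each left singular vector is a combination of A's columns,
# u_i = A v_i / sigma_i (column i of Vt.T is v_i).
U_from_V = (A @ Vt.T) / s

# Step 2: projecting A onto the column space of U gives U Sigma V^T.
A_proj = U @ np.linalg.pinv(U.T @ U) @ U.T @ A
A_lowrank = (U * s) @ Vt
```

Because the columns of U are orthonormal, $(U^T U)^{+}$ is the identity here, so the projection reduces to $U U^T A$.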

SVD: drawbacks

• Efficiency
– Time: $O(\min(n^2 m, n m^2))$
– Space: (U, V) are dense
• Interpretation
• Dynamic: not easy

[Figure: $A = U \Sigma V^T$, showing the 1st and 2nd singular vectors]

CUR (CX) decomposition

$A \approx C \times U \times R$, with $U = (C^T C)^{+}$ and $R = C^T A$

• A: n x m
• Sample columns from A to form C
• Project A onto the column space of C
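The two bullets above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the sampling scheme of any particular CUR variant: the column indices are passed in explicitly so the example is reproducible, and the function name and test matrix are made up.

```python
import numpy as np


def cx_approximation(A, col_idx):
    """Form C from the given columns of A, then U = (C^T C)^+ and R = C^T A."""
    C = A[:, col_idx]                          # sampled columns (may repeat)
    U = np.linalg.pinv(C.T @ C, rcond=1e-10)   # pseudo-inverse copes with duplicates
    R = C.T @ A
    return C, U, R


# Hypothetical rank-2 matrix: every column is a combination of p and q.
p = np.array([1.0, 0.0, 1.0, 0.0])
q = np.array([0.0, 1.0, 0.0, 1.0])
A = np.column_stack([p, q, p + q, 2 * p, p - q])

# Column 0 is sampled twice and column 3 is a multiple of it: C is redundant.
C, U, R = cx_approximation(A, [0, 0, 3, 1])
A_hat = C @ U @ R
```

Here four columns are stored although two independent ones would span the same space, which is the kind of redundancy in C that wastes time and space.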

CUR (CX): advantages

• Efficiency (better than SVD)
– Time: $O(c^2 n)$ or $O(c^3 + cm)$ (c is the # of sampled columns)
– Space: (C, R) are sparse
• Interpretation

CUR (CX): drawbacks

• Redundancy in C, wasting both time and space
– 3 copies of green
– 2 copies of red
– 2 copies of purple
– purple = 0.5*green + red…
• Dynamic: not easy


Roadmap

• Motivation

• Existing Methods

• Colibri
– Colibri-S for static graphs
– Colibri-D for dynamic graphs

• Experimental Results

• Conclusion

Colibri-S: Basic Idea

[Figure: the original matrix, its CUR (CX) decomposition, and its Colibri-S decomposition L x M x R. The sampled columns contain 3 copies of green, 2 copies of red, 2 copies of purple, and purple = 0.5*green + red…]

We want the columns in L to be linearly independent of each other!

• Input: initially sampled matrix C
• Output:
– L: linearly independent columns
– M = $(L^T L)^{-1}$: core matrix
– R = $L^T \times A$

Q: How to find L & M from C efficiently?

A: Find L & M iteratively!

• Start from the initially sampled matrix C and the current L & M
• For each column v in C, project it onto L
– Redundant? Yes: discard v
– Redundant? No: expand L & M
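The loop above can be sketched as follows. This is an illustrative reading of the slide, not the authors' code: the function name, the relative tolerance `tol`, and the block-inverse update used to expand M without re-inverting $L^T L$ are all assumptions.

```python
import numpy as np


def select_independent_columns(C, tol=1e-10):
    """Scan the columns of C; keep a column only if it is not (numerically)
    in the span of the columns kept so far.
    Returns (L, M, kept) with M = inv(L.T @ L)."""
    n, c = C.shape
    L = np.zeros((n, 0))
    M = np.zeros((0, 0))
    kept = []
    for j in range(c):
        v = C[:, j]
        coef = M @ (L.T @ v)          # coordinates of the projection of v on L
        residual = v - L @ coef
        delta = residual @ residual   # squared distance from v to span(L)
        if delta <= tol * (v @ v):
            continue                  # redundant: discard v
        # Expand M with the block-inverse (Schur complement) formula.
        k = len(kept)
        M_new = np.empty((k + 1, k + 1))
        M_new[:k, :k] = M + np.outer(coef, coef) / delta
        M_new[:k, k] = -coef / delta
        M_new[k, :k] = -coef / delta
        M_new[k, k] = 1.0 / delta
        M = M_new
        L = np.column_stack([L, v])   # expand L
        kept.append(j)
    return L, M, kept


# Hypothetical sampled matrix mimicking the colored-columns example.
green = np.array([1.0, 0.0, 2.0, 0.0])
red = np.array([0.0, 1.0, 0.0, 1.0])
purple = 0.5 * green + red
C = np.column_stack([green, green, red, purple, green, red, purple])

L, M, kept = select_independent_columns(C)
```

Only one green and one red column survive; the duplicates and the purple columns are discarded because their residual after projection onto L is zero.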

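Colibri-S vs. CUR (CX)

• Quality: Colibri-S = CUR (CX)
• Time: Colibri-S >= CUR (CX), i.e., at least as good
– $O(\tilde{c}^3 + \tilde{c}\tilde{m})$ vs. $O(c^3 + cm)$, where $\tilde{c} \le c$, $\tilde{m} \le m$
• Space: Colibri-S >= CUR (CX), i.e., at least as good
• Illustrations

[Figure: Colibri-S vs. CUR (CX)]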

Colibri-D for dynamic graphs

[Figure: the initially sampled matrix at time t, with ($L_t$, $M_t$, $R_t$), and at time t+1, with ($L_{t+1}$, $M_{t+1}$, $R_{t+1}$) still to be found]

Q: How to update L and M efficiently?

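Colibri-D: How-To

[Figure: the initially sampled matrices at t and t+1, each with its columns marked as selected or redundant; ($L_t$, $M_t$, $R_t$) is known, ($L_{t+1}$, $M_{t+1}$, $R_{t+1}$) is to be found; some columns at t+1 have changed from t]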

Colibri-D: How-To

[Figure: the initially sampled matrices at t and t+1, each with its columns marked as selected or redundant, together with ($L_t$, $M_t$) and ($L_{t+1}$, $M_{t+1}$)]

• $\tilde{L}$: subspace spanned by the blue columns at t+1
• Unchanged columns!
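One ingredient of such an update is a core matrix for the subspace spanned by the kept columns alone. The slides do not say how this is obtained; the sketch below assumes a standard block-inverse (Schur complement) identity, which derives it from the previous core matrix without re-inverting the Gram matrix. The function name and the example matrix are hypothetical.

```python
import numpy as np


def shrink_core_matrix(M, keep):
    """Given M = inv(L.T @ L), return inv(Lk.T @ Lk) for Lk = L[:, keep],
    using only entries of M (Schur complement of the dropped block)."""
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(M.shape[0]), keep)
    P = M[np.ix_(keep, keep)]
    if drop.size == 0:
        return P.copy()
    Q = M[np.ix_(keep, drop)]
    S = M[np.ix_(drop, drop)]
    return P - Q @ np.linalg.solve(S, Q.T)


# Hypothetical selected columns at time t (full column rank).
L_t = np.array([[1.0, 0.0, 0.0, 0.0],
                [1.0, 1.0, 0.0, 0.0],
                [0.0, 1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0, 1.0],
                [1.0, 0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0, 1.0]])
M_t = np.linalg.inv(L_t.T @ L_t)

unchanged = [0, 2]                    # columns assumed unchanged at t+1
M_tilde = shrink_core_matrix(M_t, unchanged)
```

The columns that did change would then be re-examined one by one, projecting each onto the kept subspace and expanding it when the column is not redundant.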


Roadmap

• Motivation

• Existing Methods

• Colibri

• Experimental Results

• Conclusion

Experimental Setup

• Datasets
– Network traffic
– 21,837 sources/destinations
– 1,222 consecutive hours
– 22,800 edges per hour
• Accuracy: Accu = [formula not preserved in the transcript]
• Space Cost: [formula not preserved in the transcript]

Performance of Colibri-S

[Figure: two panels, Time and Space, comparing CUR, CMD, and Ours]

• Accuracy
– Same, 91%+
• Time
– 12x faster than CMD
– 28x faster than CUR
• Space
– ~1/3 of CMD
– ~10% of CUR

More Evaluation on Colibri-S

[Figure: axes Approximation Accuracy and Log Time (Sec); curves for CUR, CMD, and Colibri-S]

Performance of Colibri-D

[Figure: axes # of changed cols and Time; curves for CMD, Colibri-S, and Colibri-D]

Colibri-D achieves up to 112x speedups

A Family of Low-Rank Approximations for Fast Graph Mining

• Colibri-S
– For static graphs
– Removes redundancy
– Significant savings in time & space for “free”
• Colibri-D
– For dynamic graphs
– Explores “smoothness”
– Up to 112x faster than the best known methods


Poster tonight!

Thank you!

www.cs.cmu.edu/~htong