CS345a: Data Mining
Jure Leskovec and Anand Rajaraman
Stanford University
2/4/2010

Q: What is the most related conference to ICDM?

A: Random walks! [Sun, ICDM2005]

[Figure: conference-author graph. Conference nodes: IJCAI, AAAI, NIPS, …; author nodes: Philip S. Yu, Ning Zhong, M. Jordan, R. Ramakrishnan, … Scores shown in the figure: 0.008, 0.007, 0.005, 0.011, 0.005, 0.004, 0.004, 0.004.]

[Pan, KDD '04]

[Figure: two captioned images, {Sea, Sun, Sky, Wave} and {Cat, Forest, Grass, Tiger}, next to an uncaptioned test image {?, ?, ?}.]

A: Proximity on graphs!

[Figure: graph connecting the test image to region nodes and to the keyword nodes Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass. Keywords found for the test image: {Grass, Forest, Cat, Tiger}.]

[Tong-Faloutsos, '06]

Connection subgraphs: What is the most likely connection between Andrew McCallum and Yiming Yang?

[Figure: connection subgraph between Andrew McCallum and Yiming Yang. Other nodes shown: Michael I. Jordan, Rong Jin, Andrew Ng, Zoubin Ghahramani, John D. Lafferty, Xiaojin Zhu, Jian Zhang, Fernando C.N. Pereira.]

Given Q query nodes, find the center-piece subgraph on b nodes.

Input of CEPS:
• Q query nodes
• Budget b
• k-softAND coefficient

Individual score calculation: measure the proximity of each node j with respect to each individual query node i, $r(i, j)$ (an $n \times Q$ array of scores).

Combine individual scores: measure the importance of each node j to the whole query set, $r(Q, j)$ (an $n \times 1$ vector).

"Extract" the connection subgraph: $\arg\max_H g(H)$

Goal: Calculate the importance score $r(i, j) = r_{i,j}$ for each node j and each query node i.

How to do that?

a.k.a.: Relevance, Closeness, 'Similarity'…

Shortest path is not good:
• No influence for degree-1 nodes (E, F)! Known as the 'pizza delivery guy' problem in undirected graphs
• Multi-faceted relationships

Max-flow is not good:
• Does not punish long paths

[Figure: example graph with nodes A, B, H, I, J and unit edge weights.]

A good proximity score should capture:
• Multiple connections
• Quality of connection
• Direct & indirect connections
• Length, degree, weight…

[Figure: 12-node example graph shaded by relevance to node 4. More red, more relevant. Nearby nodes, higher scores.]

Ranking vector $r_4$:

Node   Score
1      0.13
2      0.10
3      0.13
4      0.22
5      0.13
6      0.05
7      0.05
8      0.08
9      0.04
10     0.03
11     0.04
12     0.02

$Q(i, j) \propto r_{i,j}$

$Q = (I - cW)^{-1} = I + cW + c^2 W^2 + c^3 W^3 + \dots$

W: adjacency matrix
c: damping factor

• $cW$: all paths from i to j with length 1
• $c^2 W^2$: all paths from i to j with length 2
• $c^3 W^3$: all paths from i to j with length 3
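The series expansion can be checked numerically. The sketch below is only an illustration under assumptions that are not in the slides: a hypothetical 3-node path graph, a column-normalized W, c = 0.9, and helper names (`matvec`, `neumann_apply`) chosen here. It accumulates the terms $c^k W^k e$ and then checks that the sum x satisfies $(I - cW)x = e$, which is what $x = (I - cW)^{-1} e$ means.

```python
def matvec(W, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def neumann_apply(W, e, c, terms=500):
    """Approximate (I - cW)^{-1} e by a partial sum of c^k W^k e."""
    total = list(e)   # k = 0 term: I e
    term = list(e)
    for _ in range(terms):
        # Next term c^(k+1) W^(k+1) e, built from the previous term.
        term = [c * t for t in matvec(W, term)]
        total = [s + t for s, t in zip(total, term)]
    return total

# Hypothetical path graph 0 - 1 - 2, column-normalized (each column sums to 1).
W = [[0.0, 0.5, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
c = 0.9
e = [1.0, 0.0, 0.0]

x = neumann_apply(W, e, c)

# Residual of (I - cW) x = e; close to zero once the series has converged.
Wx = matvec(W, x)
residual = max(abs(xi - c * wxi - ei) for xi, wxi, ei in zip(x, Wx, e))
print(residual)
```

The series converges here because the normalized W has spectral radius 1 and c < 1; for an unnormalized adjacency matrix the same sum can diverge.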

$r = cWr + (1 - c)\,e$

For query node i:

$r_i = cWr_i + (1 - c)\,e_i$

where $r_i$ is the ranking vector (n x 1), $W$ is the transition matrix (n x n; the column-normalized adjacency matrix), $1 - c$ is the restart probability, and $e_i$ is the starting vector (n x 1).

Transition matrix W of the 12-node example:

      0   1/3  1/3  1/3   0    0    0    0    0    0    0    0
     1/3   0   1/3   0    0    0    0   1/4   0    0    0    0
     1/3  1/3   0   1/3   0    0    0    0    0    0    0    0
     1/3   0   1/3   0   1/4   0    0    0    0    0    0    0
      0    0    0   1/3   0   1/2  1/2  1/4   0    0    0    0
      0    0    0    0   1/4   0   1/2   0    0    0    0    0
      0    0    0    0   1/4  1/2   0    0    0    0    0    0
      0   1/3   0    0   1/4   0    0    0   1/2   0   1/3   0
      0    0    0    0    0    0    0   1/4   0   1/3   0    0
      0    0    0    0    0    0    0    0   1/2   0   1/3  1/2
      0    0    0    0    0    0    0   1/4   0   1/3   0   1/2
      0    0    0    0    0    0    0    0    0   1/3  1/3   0

With the starting vector $e_4$ (1 in position 4, 0 elsewhere), the ranking vector is $r_4$ = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02).

How do we compute the ranking vector from the adjacency matrix and the starting vector?

[Figure: power iteration on the 12-node example. The update $r \leftarrow cWr + (1 - c)\,e_4$ is applied repeatedly, starting from the starting vector $e_4$; the columns show the ranking vector after successive iterations until it converges to $r_4$.]

• No pre-computation / light storage
• Slow on-line response: O(mE)
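A minimal sketch of this on-the-fly computation for the 12-node example follows. Several things here are assumptions rather than material from the slides: the edge list is read off the nonzero pattern of W, the value c = 0.9 is inferred from the first iterate in the figure (0.3, 0.1, 0.3 around node 4), and the function names and the convergence tolerance are hypothetical. Each iteration touches every edge a constant number of times, so m iterations over E edges gives the O(mE) on-line cost.

```python
# Undirected edges of the 12-node example (inferred from the nonzeros of W).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
N = 12

def build_neighbors(edges, n):
    """Adjacency lists for nodes 1..n."""
    nbrs = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs

def rwr(nbrs, query, c=0.9, tol=1e-12, max_iter=10000):
    """Iterate r <- c W r + (1 - c) e until the largest change is below tol.

    W is column-normalized: W[j][u] = 1 / deg(u) when u is a neighbor of j.
    Returns (scores by node, number of iterations used).
    """
    nodes = sorted(nbrs)
    e = {v: 1.0 if v == query else 0.0 for v in nodes}
    r = dict(e)
    for it in range(1, max_iter + 1):
        new_r = {}
        for j in nodes:
            walk = sum(r[u] / len(nbrs[u]) for u in nbrs[j])
            new_r[j] = c * walk + (1 - c) * e[j]
        delta = max(abs(new_r[v] - r[v]) for v in nodes)
        r = new_r
        if delta < tol:
            break
    return r, it

nbrs = build_neighbors(EDGES, N)
scores, iterations = rwr(nbrs, query=4)
first_step, _ = rwr(nbrs, query=4, max_iter=1)

# Print the three highest-scoring nodes for query node 4.
for node in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(node, round(scores[node], 2))
```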

Individual scores $r(i, j)$ for three query nodes Q1, Q2, Q3:

Node   Q1      Q2      Q3
1      0.5767  0.0088  0.0088
2      0.1235  0.0076  0.0076
3      0.0283  0.0283  0.0283
4      0.0076  0.1235  0.0076
5      0.0088  0.5767  0.0088
6      0.0076  0.0076  0.1235
7      0.0088  0.0088  0.5767
8      0.0333  0.0024  0.1260
9      0.1260  0.0024  0.0333
10     0.1260  0.0333  0.0024
11     0.0333  0.1260  0.0024
12     0.0024  0.1260  0.0333
13     0.0024  0.0333  0.1260

The RWR score r(i, j):
• i … query node, $i \in Q$
• j … node in the network

Combine the r(i, j) to get the score r(Q, j):
• AND query: meeting probability, i.e. the probability that |Q| random particles coincide on node j
• OR query: at least one particle arrives at node j (i.e., node j is important to at least 1 query node)
• k-SoftAND query: node j is important to at least k query nodes. Can be computed recursively.
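One way to read the three query types is as probabilities over independent particles, one per query node, where particle i sits on node j with probability r(i, j). Under that independence assumption (made here for illustration), the AND score is $\prod_{i \in Q} r(i, j)$, the OR score is $1 - \prod_{i \in Q} (1 - r(i, j))$, and the k-SoftAND score is the probability that at least k particles are on j. The sketch below computes the last one with a simple recursion over the query nodes; the function names are hypothetical and the recursion may differ in form from the one on the original slide.

```python
def and_score(p):
    """Probability that all particles are on the node."""
    result = 1.0
    for x in p:
        result *= x
    return result

def or_score(p):
    """Probability that at least one particle is on the node."""
    miss = 1.0
    for x in p:
        miss *= (1.0 - x)
    return 1.0 - miss

def k_softand_score(p, k):
    """Probability that at least k of the particles are on the node.

    at_least[m] holds P(at least m of the particles processed so far are here).
    """
    at_least = [1.0] + [0.0] * k
    for x in p:
        # Go from high m to low m so at_least[m - 1] is still the old value.
        for m in range(k, 0, -1):
            at_least[m] = x * at_least[m - 1] + (1.0 - x) * at_least[m]
    return at_least[k]

# Individual scores of node 1 and node 3 for (Q1, Q2, Q3), taken from the table.
node1 = [0.5767, 0.0088, 0.0088]
node3 = [0.0283, 0.0283, 0.0283]

for name, p in (("node 1", node1), ("node 3", node3)):
    print(name, and_score(p), k_softand_score(p, 2), or_score(p))
```

With k = 1 the recursion reduces to the OR score and with k = |Q| to the AND score, so k-SoftAND interpolates between the two.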

Goal: Extract a subgraph S on b nodes that maximizes the sum $\sum_{u \in S} r(Q, u)$.

Idea: Iterate until the budget is reached:
• Pick the not-yet-selected node $j = \arg\max_j r(Q, j)$
• Find good paths to all query nodes
• Add the paths to S
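The loop above can be sketched as follows. This is a simplified stand-in, not the extraction procedure of the cited work: "good paths" are replaced here by plain breadth-first shortest paths, the query nodes are assumed to be in S from the start, and the last path added may push |S| slightly past the budget. The graph, the scores, and the function names are hypothetical.

```python
from collections import deque

def shortest_path(nbrs, src, dst):
    """Breadth-first search path from src to dst (stand-in for a 'good path')."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in nbrs[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None  # dst is not reachable from src

def extract_subgraph(nbrs, query_nodes, combined, budget):
    """Greedy sketch: grow S around high-scoring nodes until |S| >= budget."""
    selected = set(query_nodes)
    candidates = sorted((v for v in nbrs if v not in selected),
                        key=lambda v: combined[v], reverse=True)
    for j in candidates:
        if len(selected) >= budget:
            break
        if j in selected:   # already added as part of an earlier path
            continue
        for q in query_nodes:
            path = shortest_path(nbrs, q, j)
            if path:
                selected.update(path)
    return selected

# Hypothetical 5-node graph and combined scores r(Q, j).
nbrs = {1: [2], 2: [1, 3, 4], 3: [2, 5], 4: [2], 5: [3]}
combined = {1: 0.10, 2: 0.30, 3: 0.40, 4: 0.05, 5: 0.10}

S = extract_subgraph(nbrs, query_nodes=[1, 5], combined=combined, budget=4)
print(sorted(S))
```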

[Figure: subgraphs extracted from the example graph for an AND query and for a 2-SoftAND query, with the combined score r(Q, j) shown at each node.]

Solving the random walk for each query separately is time consuming:
• Not appropriate for real-time

Can we do better?

Idea 1) Pre-compute scores for each possible query node i

[Figure: the pre-computed 12 x 12 score matrix $(I - cW)^{-1}$ for the example graph, with columns $r_1, \dots, r_{12}$, one ranking vector per possible query node.]

• Fast on-line response
• Heavy pre-computation/storage cost: $O(n^3)$ pre-computation, $O(n^2)$ storage
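The trade-off can be made concrete with a small sketch. The assumptions here are not from the slides: a hypothetical 3-node path graph, c = 0.9, a plain Gauss-Jordan inverse, and function names chosen for illustration. The offline stage inverts $I - cW$ once (cubic time, quadratic storage); the online stage reads off one column and scales it by $1 - c$, since $r_i = (1 - c)(I - cW)^{-1} e_i$ follows from $r_i = cWr_i + (1 - c)e_i$.

```python
def invert(A):
    """Gauss-Jordan inverse with partial pivoting; O(n^3) time."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def precompute_scores(W, c):
    """Offline stage: compute and store the full n x n matrix (I - cW)^{-1}."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - c * W[i][j] for j in range(n)]
         for i in range(n)]
    return invert(A)

def query(scores, i, c):
    """Online stage: r_i is (1 - c) times column i of the stored matrix."""
    return [(1 - c) * row[i] for row in scores]

# Hypothetical path graph 0 - 1 - 2, column-normalized.
W = [[0.0, 0.5, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 0.5, 0.0]]
c = 0.9

scores = precompute_scores(W, c)
r0 = query(scores, 0, c)
print([round(x, 4) for x in r0])
```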

[Tong-Faloutsos-Pan, '06]

• Find communities
• Fix the remaining
• Combine

[Figure: the 12-node example graph partitioned into communities.]

$W = W_1 + W_2$: within-partition links ($W_1$) + cross-partition links ($W_2$)

$|S| \ll |W_2|$

Offline stage:
• Communities
• Bridges

Sherman-Morrison-Woodbury matrix identity: The inverse of a rank-k correction of a matrix X can be computed by doing a rank-k correction to $X^{-1}$.

In our case, with $Q_1 = (I - cW_1)^{-1}$:

$Q = (I - cW)^{-1} = (I - cW_1 - cW_2)^{-1}$

$Q = Q_1 + c\,Q_1 U \Lambda V Q_1$

Online stage: Compute column i of

$Q = Q_1 + c\,Q_1 U \Lambda V Q_1$

using the low-rank approximation $W_2 \approx U S V$ and $\Lambda = (S^{-1} - c\,V Q_1 U)^{-1}$.
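The identity can be verified numerically on a toy case. Everything in this sketch is an assumption made for illustration: a hypothetical 4-node path graph split into two communities {0, 1} and {2, 3} with a single bridge edge 1-2, c = 0.9, an exact rank-2 factorization of the cross-partition part as U S V (so the result is exact rather than approximate), the formula $\Lambda = (S^{-1} - cVQ_1U)^{-1}$ with $Q_1 = (I - cW_1)^{-1}$, and the helper names.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def add(A, B, scale=1.0):
    """Entrywise A + scale * B."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def invert(A):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(A)
    M = [list(row) + identity(n)[i] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

c = 0.9
# Within-partition links of the column-normalized path graph 0 - 1 - 2 - 3.
W1 = [[0.0, 0.5, 0.0, 0.0],
      [1.0, 0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0, 1.0],
      [0.0, 0.0, 0.5, 0.0]]
# Cross-partition links W2 = U S V (entries (1,2) and (2,1), both 0.5).
U = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
S = [[0.5, 0.0], [0.0, 0.5]]
V = [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
W2 = matmul(matmul(U, S), V)
W = add(W1, W2)

# Offline: invert the within-partition part and the small core matrix.
Q1 = invert(add(identity(4), W1, -c))
Lam = invert(add(invert(S), matmul(matmul(V, Q1), U), -c))

# Low-rank correction versus direct inversion of I - cW.
Q_smw = add(Q1, matmul(matmul(matmul(Q1, U), Lam), matmul(V, Q1)), c)
Q_direct = invert(add(identity(4), W, -c))

max_diff = max(abs(a - b) for ra, rb in zip(Q_smw, Q_direct)
               for a, b in zip(ra, rb))
print(max_diff)
```

In this toy case the small matrix being inverted is 2 x 2 while the full system is 4 x 4; the saving only matters when the number of bridges is much smaller than the graph.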

Most slides are borrowed from the tutorial by Hanghang Tong and Christos Faloutsos from CMU:

http://www.cs.cmu.edu/~htong/tut/cikm2008/cikm_tutorial.html
