8/13/2007 KDD 2007, San Jose
Graph X-Ray: Fast Best-Effort Pattern Matching in Large Attributed Graphs
Hanghang Tong, Brian Gallagher, Christos Faloutsos, Tina Eliassi-Rad
LLNL


Page 1

8/13/2007 KDD 2007, San Jose

Graph X-Ray: Fast Best-Effort Pattern Matching

in Large Attributed Graphs

Hanghang Tong, Brian Gallagher,

Christos Faloutsos, Tina Eliassi-Rad

LLNL

Page 2

Input: Attributed Data Graph and Query Graph

Output: Matching Subgraph

[Figure labels: Accountant, CEO, Manager, SEC]

Page 3

Terminology: "Conform"

[Figure: Query Graph and Matching Subgraph] The matching subgraph conforms to the query graph.

Page 4

Terminology: "Interception"

[Figure: Query Graph and Matching Subgraph, with four matching nodes and one intermediate node]

Path 12-13-4 is an interception: it connects matching nodes 12 and 4 through intermediate node 13.
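A minimal sketch of the interception test, assuming our reading of the slide: a path is an interception if it joins two matching nodes through one or more intermediate (non-matching) nodes, following existing edges of the data graph. The function name `is_interception` and the adjacency-dict representation are hypothetical.

```python
def is_interception(path, adj, matching):
    """Return True if `path` joins two matching nodes via intermediate nodes.

    path: list of data-node ids, e.g. [12, 13, 4]
    adj: dict mapping each data node to the set of its neighbors
    matching: set of data nodes that instantiate query nodes
    """
    if len(path) < 3 or len(set(path)) != len(path):
        return False  # a direct edge (or a path with repeated nodes) is not an interception
    if path[0] not in matching or path[-1] not in matching:
        return False  # both endpoints must be matching nodes
    if any(v in matching for v in path[1:-1]):
        return False  # every interior node must be an intermediate node
    # consecutive nodes must be joined by an edge of the data graph
    return all(b in adj.get(a, set()) for a, b in zip(path, path[1:]))


# Toy fragment mirroring the slide: 12 -- 13 -- 4, plus a direct edge 12 -- 11.
adj = {12: {13, 11}, 13: {12, 4}, 4: {13}, 11: {12}}
matching = {4, 7, 11, 12}
print(is_interception([12, 13, 4], adj, matching))  # True
print(is_interception([12, 11], adj, matching))     # False: direct edge
```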

Page 5

Terminology: "Instantiate"

[Figure: Query Graph Hq and Matching Subgraph Ht]

Node 11 instantiates the SEC node; Ht instantiates Hq.

Page 6

Roadmap

• Introduction
  – Problem Definition
  – Motivation

• How to: Graph X-Ray

• Experimental Results

• Conclusion

Page 7

Motivation: Why Not SQL?

• Case 1: No exact match exists
  – Q: How do we find an approximate answer?

• Case 2: Too many exact matches
  – Q: How do we rank them?

Page 8

Motivation: Why Not SQL? (Cont.)

• Case 3: The exact match might not be the best answer
  – "Find a CEO who has heavy contact with an Accountant"

• Q: How do we find the right answer?

[Figure: an exact match with 1 direct connection vs. an inexact match with many indirect connections]

Page 9

Motivation: Efficiency

• Why not subgraph isomorphism?
  – Polynomial only for a fixed number of query nodes

• Q1: How do we scale up linearly?

• Q2: … and with a small slope?
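A back-of-the-envelope count behind the first bullet, assuming a naive enumerate-and-check matcher (our illustration, not any specific published algorithm). With k query nodes and n data nodes, the one-to-one candidate mappings and the total checking cost are:

```latex
\frac{n!}{(n-k)!} \;=\; n(n-1)\cdots(n-k+1) \;=\; O\!\left(n^{k}\right),
\qquad
\text{total cost} \;=\; O\!\left(\tbinom{k}{2}\, n^{k}\right) \;=\; O\!\left(k^{2}\, n^{k}\right)
```

Each mapping is checked against at most k(k-1)/2 query edges. The bound is polynomial in n for fixed k, but its degree grows with the query size, so it cannot give linear scaling.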

Page 10

Wish List

• Effectiveness
  – Both exact and inexact matches
  – Ranking among multiple results
  – The "best" answer (proximity-based)

• Efficiency
  – Scale linearly
  – Scale with a small slope

G-Ray meets all of these!

Page 11

Roadmap

• Introduction
  – Problem Definition
  – Motivation

• How to: Graph X-Ray

• Experimental Results

• Conclusion

Page 12

Preliminary: Center-Piece Subgraph [Tong+]

[Figure: original graph with query nodes A, B, C shown in black (label: Q), and its center-piece subgraph CePS(A, B, C)]

CePS is meta opt. in G-Ray!
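A minimal sketch of the score-combination step, assuming the AND-style scoring of CePS in which a node's score is the product of its proximities to every query node (given here as precomputed dicts, e.g. from random walk with restart). The function name `ceps_and` and the toy scores are hypothetical, and the sketch omits the step that extracts the connecting subgraph itself.

```python
import math


def ceps_and(scores_per_query):
    """Combine per-query proximity dicts with an AND (product) rule.

    scores_per_query: list of dicts, one per query node, mapping data node -> proximity.
    Returns (best_node, combined_scores); nodes missing from a dict get proximity 0.
    """
    nodes = set().union(*scores_per_query)
    combined = {v: math.prod(s.get(v, 0.0) for s in scores_per_query) for v in nodes}
    best = max(combined, key=combined.get)
    return best, combined


# Node "x" is moderately close to both queries; "y" and "z" are close to only one.
best, combined = ceps_and([{"x": 0.5, "y": 0.9, "z": 0.1},
                           {"x": 0.5, "y": 0.1, "z": 0.9}])
print(best)  # x
```

The product rewards a node that is reasonably close to all query nodes over one that is very close to a single query node, which is the "center-piece" intuition.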

Page 13

Preliminary: Augmented Graph

• Data nodes
  – 1, …, 13

• Attribute nodes

[Figure: augmented graph over data nodes 1-13 plus attribute nodes]

The augmented graph is crucial for computation!
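A minimal sketch of building the augmented graph, assuming one attribute node per attribute value, linked to every data node that carries that value. The `("attr", value)` node naming and the `attr_weight` parameter are hypothetical choices. With such nodes in place, a single proximity computation can measure how close a data node is to "some CEO" rather than to one specific node.

```python
def augment(data_edges, node_attrs, attr_weight=1.0):
    """Return an undirected weighted adjacency dict for the augmented graph.

    data_edges: iterable of (u, v, weight) edges between data nodes
    node_attrs: dict mapping data node -> iterable of attribute values
    Attribute nodes are represented as ("attr", value) tuples.
    """
    adj = {}

    def link(a, b, w):
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w

    for u, v, w in data_edges:
        link(u, v, w)
    for node, attrs in node_attrs.items():
        adj.setdefault(node, {})  # keep data nodes that have no edges
        for value in attrs:
            link(node, ("attr", value), attr_weight)
    return adj


g = augment([(1, 2, 1.0), (2, 3, 1.0)], {1: ["CEO"], 2: ["Manager"], 3: ["CEO"]})
print(sorted(g[("attr", "CEO")]))  # [1, 3]
```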

Page 14

G-Ray: quick overview (for a loop query)

Step 1: SF
Step 2: NE
Step 3: BR
Step 4: NE
Step 5: BR
Step 6: NE
Step 7: BR
Step 8: BR

SF: Seed-Finder; NE: Neighborhood-Expander; BR: Bridge
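One way to see where the eight steps come from, sketched under our own assumptions: after the seed, expand the query graph breadth-first; each newly instantiated query node costs one NE step, followed by one BR step for every query edge back to an already instantiated node. The function `gray_schedule` is hypothetical and assumes a connected query graph. On a four-node loop it reproduces the SF, NE, BR, NE, BR, NE, BR, BR sequence.

```python
from collections import deque


def gray_schedule(query_adj, seed):
    """List the SF/NE/BR steps for a connected query graph, starting at `seed`.

    query_adj: dict mapping each query node to the set of its query neighbors.
    Returns a list of (step, payload) tuples.
    """
    steps = [("SF", seed)]
    done = {seed}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in sorted(query_adj[u]):
            if v in done:
                continue
            steps.append(("NE", v))  # instantiate query node v
            for w in sorted(query_adj[v] & done):
                steps.append(("BR", (w, v)))  # bridge each edge back to instantiated nodes
            done.add(v)
            queue.append(v)
    return steps


loop = {"q1": {"q2", "q4"}, "q2": {"q1", "q3"}, "q3": {"q2", "q4"}, "q4": {"q1", "q3"}}
print([step for step, _ in gray_schedule(loop, "q1")])
# ['SF', 'NE', 'BR', 'NE', 'BR', 'NE', 'BR', 'BR']
```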

Page 15

Seed-Finder

• Q: How do we instantiate the SEC node?

• A: 11 = CePS( )

[Figure: augmented graph over data nodes 1-13]

"11" is close to some unknown data nodes for "CEO", "Accountant" and "Manager".
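A minimal sketch of Seed-Finder under stated assumptions: proximities from each attribute node of the augmented graph are precomputed (e.g. by random walk with restart), the candidates are the data nodes carrying the seed's attribute, and each candidate is scored AND-style by the product of its proximities to the other query attributes. The name `seed_finder` and all scores are hypothetical.

```python
import math


def seed_finder(attr_prox, seed_attr, other_attrs, node_attrs):
    """Pick the data node with `seed_attr` that is jointly closest to `other_attrs`.

    attr_prox: dict attribute -> {data node: proximity from that attribute node}
    node_attrs: dict data node -> set of attribute values
    Raises ValueError if no data node carries `seed_attr`.
    """
    candidates = [v for v, attrs in node_attrs.items() if seed_attr in attrs]

    def score(v):
        return math.prod(attr_prox[a].get(v, 0.0) for a in other_attrs)

    return max(candidates, key=score)


node_attrs = {11: {"SEC"}, 5: {"SEC"}, 12: {"CEO"}}
attr_prox = {"CEO": {11: 0.3, 5: 0.4},
             "Accountant": {11: 0.3, 5: 0.05},
             "Manager": {11: 0.2, 5: 0.2}}
print(seed_finder(attr_prox, "SEC", ["CEO", "Accountant", "Manager"], node_attrs))  # 11
```

Node 5 is closer to the CEO attribute alone, but node 11 wins because it is reasonably close to all three attributes.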

Page 16

Neighborhood-Expander

[Figure: augmented graph over data nodes 1-13]

• Q: How do we instantiate the CEO node? (Step 1 to Step 2)

• A: 12 = CePS(11)

• Footnote:
  – Step 3 to Step 4? 7 = CePS(11)
  – Step 5 to Step 6? 4 = CePS(7, 12)
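A minimal sketch of Neighborhood-Expander, assuming precomputed proximities from every already matched data node: among unused data nodes carrying the required attribute, pick the one with the largest product of proximities from the matched neighbors of the new query node (one neighbor at Steps 2 and 4, two at Step 6). `neighborhood_expander`, the toy numbers, and the attribute labels in the toy data are hypothetical.

```python
import math


def neighborhood_expander(prox_from, matched_neighbors, target_attr, node_attrs, used):
    """Instantiate a query node next to already matched data nodes.

    prox_from: dict matched data node -> {data node: proximity}
    matched_neighbors: data nodes matched to the new query node's query neighbors
    used: data nodes already in the matching subgraph (never reused)
    """
    candidates = [v for v, attrs in node_attrs.items()
                  if target_attr in attrs and v not in used]
    return max(candidates,
               key=lambda v: math.prod(prox_from[u].get(v, 0.0) for u in matched_neighbors))


# Step 6 analogue: choose between candidates 4 and 9 using matched nodes 7 and 12.
prox_from = {7: {4: 0.3, 9: 0.5}, 12: {4: 0.3, 9: 0.05}}
node_attrs = {4: {"Accountant"}, 9: {"Accountant"}}
print(neighborhood_expander(prox_from, [7, 12], "Accountant", node_attrs, used={7, 11, 12}))  # 4
```

With only node 7 as the matched neighbor, node 9 would win; requiring closeness to both 7 and 12 picks node 4.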

Page 17

Bridge (Step 6: NE, then Step 7: BR)

• Q: [question shown in the figure]

• A: A Prim-like algorithm
  – To maximize [objective shown in the figure]
  – Should block nodes 11 and 7

• Footnote
  – A connection subgraph, or one single path?
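A minimal sketch of a Prim-like bridge, under our own assumptions since the slide's objective is not recoverable: grow a tree from the source one node at a time, always adding the frontier node with the highest precomputed `score` (for example, its proximity to both endpoints), never entering blocked (already matched) nodes, and stop as soon as the target is reachable. The function, its `score` input, and the toy graph are hypothetical; on this toy graph, with nodes 11 and 7 blocked, it returns the 12-13-4 interception.

```python
def bridge(adj, source, target, score, blocked):
    """Return a source-to-target path that avoids `blocked`, or None if none exists.

    adj: dict node -> list of neighbors
    score: dict node -> how attractive it is to route through that node
    blocked: matched nodes the path must not pass through
    """
    parent = {source: None}  # the growing tree, stored as child -> parent
    while target not in parent:
        frontier = {}
        for u in parent:
            for v in adj[u]:
                if v not in parent and v not in blocked:
                    frontier.setdefault(v, u)  # attach v to the first tree node that reaches it
        if not frontier:
            return None  # target unreachable without crossing blocked nodes
        v = target if target in frontier else max(frontier, key=lambda n: score.get(n, 0.0))
        parent[v] = frontier[v]
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


adj = {12: [11, 13, 9], 13: [12, 4], 9: [12, 10], 10: [9, 4],
       4: [13, 10, 7], 11: [12, 7], 7: [11, 4]}
print(bridge(adj, 12, 4, {13: 0.5, 9: 0.2, 10: 0.2}, blocked={11, 7}))  # [12, 13, 4]
```

This returns one single path; keeping more of the grown tree would be one way toward the footnote's alternative, a connection subgraph.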

Page 18

Roadmap

• Introduction
  – Problem Definition
  – Motivation

• How to: Graph X-Ray

• Experimental Results

• Conclusion

Page 19

Experimental Results

• Datasets
  – DBLP
  – Nodes: authors (315k)
  – Edges: co-authorship (1,800k)
  – Attributes: conference & year (13k), e.g., KDD-2001, SIGMOD…

Page 20

Effectiveness: star-query

[Figure: query and result]

Page 21

Effectiveness: line-query

[Figure: query and result]

Page 22

Effectiveness: loop-query

[Figure: query and result]

Page 23

Efficiency

[Figure: average response time (seconds, axis 0-80) vs. # of edges (axis 0 to 2 x 10^6); legend: Fast FSGM, Iterative method]

• Scales linearly
• Small slope
• 3-5 seconds on ~2 M edges

Page 24

Roadmap

• Introduction
  – Problem Definition
  – Motivation

• How to: Graph X-Ray

• Experimental Results

• Conclusion

Page 25

Conclusion

• Graph X-Ray (G-Ray)
  – Best-effort pattern matching in large attributed graphs
  – Scales linearly, with a small slope

• More details in the poster session
  – Monday (tonight), board number 8

Page 26

www.cs.cmu.edu/~htong

[Figure: "X-Ray" vs. "G-Ray" illustration: data graph (nodes 1-13) and the matching subgraph on nodes 4, 7, 11, 12, 13]

Thank you!

Page 27

Backup slides

Page 28

Proximity on Graph (a.k.a. relevance, closeness)

[Figure: example graph with nodes 1-12]

• Multi-faceted
• Punishes long paths
• Accounts for edge weights

How to compute it: random walk with restart
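A minimal sketch of random walk with restart by power iteration, for a small weighted undirected graph stored as a dict of dicts. At each step the walker follows an edge with probability `c` (chosen in proportion to edge weight) or jumps back to the query node with probability `1 - c`; the steady-state visiting probabilities serve as proximity scores. The value `c = 0.9`, the tolerance, and the function name `rwr` are assumptions for illustration, not the settings used in the talk.

```python
def rwr(adj, seed, c=0.9, max_iter=1000, tol=1e-12):
    """Random-walk-with-restart scores of every node with respect to `seed`.

    adj: dict node -> {neighbor: edge weight}; every node must appear as a key.
    c: probability of continuing the walk (restart probability is 1 - c).
    Nodes without edges simply drop their mass at each step.
    """
    r = {v: 0.0 for v in adj}
    r[seed] = 1.0
    for _ in range(max_iter):
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                nxt[v] += c * r[u] * w / total  # spread u's mass in proportion to edge weight
        nxt[seed] += 1.0 - c  # restart at the query node
        diff = sum(abs(nxt[v] - r[v]) for v in adj)
        r = nxt
        if diff < tol:
            break
    return r


path = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"b": 1.0}}
scores = rwr(path, "a")
print({v: round(s, 4) for v, s in scores.items()})
# {'a': 0.3132, 'b': 0.4737, 'c': 0.2132}
```

On this path a-b-c, the one-hop neighbor b scores well above the two-hop node c, matching "nearby nodes, higher scores".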

Page 29

Random walk with restart

Ranking vector r4 for query node 4 (in the figure, redder nodes are more relevant; nearby nodes get higher scores):

Node 1: 0.13
Node 2: 0.10
Node 3: 0.13
Node 4: 0.22
Node 5: 0.13
Node 6: 0.05
Node 7: 0.05
Node 8: 0.08
Node 9: 0.04
Node 10: 0.03
Node 11: 0.04
Node 12: 0.02

Page 30

How to rank the results

• Our goodness function
  – Measure the proximity, in both directions (two-way), between every two matching nodes that are required to be connected
  – Multiply these proximities together

• In G-Ray, we approximately optimize this goodness function

• If there are multiple matching subgraphs, we rank them by this goodness function
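A minimal sketch of the goodness function and the resulting ranking, assuming a proximity function `prox(u, v)` (for example, an RWR score of v with respect to u) is already available. For every query edge, the proximities in both directions between the two matched data nodes are multiplied in. `goodness`, `rank_matches`, and the toy proximities are hypothetical.

```python
import math


def goodness(prox, query_edges, match):
    """Product of two-way proximities over all query edges.

    prox: callable prox(u, v) -> proximity of data node v with respect to u
    query_edges: list of (query node, query node) pairs that must be connected
    match: dict query node -> data node instantiating it
    """
    return math.prod(prox(match[a], match[b]) * prox(match[b], match[a])
                     for a, b in query_edges)


def rank_matches(prox, query_edges, matches):
    """Sort candidate matching subgraphs from best to worst goodness."""
    return sorted(matches, key=lambda m: goodness(prox, query_edges, m), reverse=True)


def prox(u, v):
    """Toy proximities: anything touching node 99 is far away."""
    return 0.1 if 99 in (u, v) else 0.5


loop_edges = [("q1", "q2"), ("q2", "q3"), ("q3", "q4"), ("q4", "q1")]
m1 = {"q1": 11, "q2": 12, "q3": 4, "q4": 7}
m2 = {"q1": 11, "q2": 99, "q3": 4, "q4": 7}
print(goodness(prox, loop_edges, m1))  # 0.00390625
print(rank_matches(prox, loop_edges, [m2, m1])[0] is m1)  # True
```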

Page 31

How to rank the results

[Figure: matching subgraph with matching nodes 4, 7, 11, 12]

Goodness = Prox (12, 4) x Prox (4, 12) x Prox (7, 4) x Prox (4, 7) x Prox (11, 7) x Prox (7, 11) x Prox (12, 11) x Prox (11, 12)