
Page 1: STAR: Steiner-Tree Approximation in Relationship Graphs

STAR: Steiner-Tree Approximation in Relationship Graphs

Max-Planck Institute for Informatics, Database and Information Systems

Gjergji Kasneci, Maya Ramanath, Mauro Sozio, Fabian M. Suchanek, Gerhard Weikum

Page 2: STAR: Steiner-Tree Approximation in Relationship Graphs

Introduction

• Entity-Relationship Graphs
– Another way of representing relational data
– Consist of labeled nodes and edges
– Node labels correspond to entities
– Edge labels represent relations between entities
– Edge weights represent the strength of the relation between entities
– Taxonomic relations (subClassOf, type)
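A minimal sketch of how such a graph could be held in memory, assuming a plain list of labeled, weighted edges. The class `Edge`, the helper `neighbors`, the relation name `bornIn`, and all weights are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    source: str     # node label (entity)
    relation: str   # edge label (relation between the two entities)
    target: str     # node label (entity)
    weight: float   # strength of the relation (illustrative value)

# Relation names that make up the taxonomy, as listed on the slide.
TAXONOMIC = {"subClassOf", "type"}

# Hypothetical edges; labels and weights are made up for this sketch.
EDGES = [
    Edge("Max Planck", "type", "Physicist", 0.9),
    Edge("Physicist", "subClassOf", "Scientist", 0.9),
    Edge("Max Planck", "bornIn", "Germany", 0.8),
]

def neighbors(node, edges, taxonomic_only=False):
    """Return (relation, other node, weight) for every edge touching node."""
    result = []
    for e in edges:
        if taxonomic_only and e.relation not in TAXONOMIC:
            continue
        if e.source == node:
            result.append((e.relation, e.target, e.weight))
        elif e.target == node:
            result.append((e.relation, e.source, e.weight))
    return result

ALL_NEIGHBORS = neighbors("Max Planck", EDGES)
TAXONOMIC_NEIGHBORS = neighbors("Max Planck", EDGES, taxonomic_only=True)
```

The `taxonomic_only` switch reflects the distinction the slides draw between taxonomic relations and all other relations.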

Page 3: STAR: Steiner-Tree Approximation in Relationship Graphs

Introduction

• Example of an Entity-Relationship Graph

[Figure: example entity-relationship graph, with arrows labeled "Specialization" and "Generalization"]

Page 4: STAR: Steiner-Tree Approximation in Relationship Graphs

Introduction

• Querying E-R Graphs
– The relationship search query class:
• Given a set of two, three, or more entities (nodes), find their closest relationships (edges or paths) that connect the entities in the strongest possible way.
• "Strongest" is related to informativeness
– A relationship search query example
• Query: “How are Germany’s chancellor Angela Merkel, the mathematician Richard Courant, Turing-Award winner Jim Gray, and the Dalai Lama related?”
• Informative answer: All have a doctoral degree from a German university
– Another query: How are Angela Merkel, Arnold Schwarzenegger, Max Planck, and Germany related?

Page 5: STAR: Steiner-Tree Approximation in Relationship Graphs

Motivation and Problem

• Information discovery as opposed to lookup
• The nature of the answer
– Can be a tree embedded in the original graph
– The input nodes (query) must be connected by the tree
– How good is the answer?
• A scoring model can exploit node and edge weights
• The formal definition of the problem:
– Compute the k lowest-cost Steiner trees

Page 6: STAR: Steiner-Tree Approximation in Relationship Graphs

Motivation and Problem

• What is the Steiner Tree Problem?
• Steiner tree examples:
– Steiner tree for three terminals V’ = {A, B, C}; note the Steiner point S.
– Steiner tree for four terminals V’ = {A, B, C, D}; note the Steiner points S1, S2.

Page 7: STAR: Steiner-Tree Approximation in Relationship Graphs

Motivation and Problem

• Steiner Tree Problem complexity
– Computing an optimal Steiner tree is NP-hard
– Approximation algorithms compute near-optimal solutions
– Approximation ratio:
• Measures the quality of an approximation algorithm
• Weight of the approximate output tree / weight of the optimal output tree
• Benefits of reducing the approximation ratio
– Viable runtimes (efficiency)
– Better, near-optimal graph quality (informativeness)
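As a small worked example of the ratio defined above, the sketch below divides the weight of an approximate tree by the weight of an optimal tree. The function name `approximation_ratio` and the two weights are hypothetical.

```python
def approximation_ratio(approx_weight, optimal_weight):
    """Weight of the approximate tree divided by the weight of the optimal tree."""
    if optimal_weight <= 0:
        raise ValueError("optimal weight must be positive")
    return approx_weight / optimal_weight

# Hypothetical weights: the approximate tree costs 6.0, the optimal tree 4.0.
RATIO = approximation_ratio(6.0, 4.0)
```

A ratio of 1 means the approximate tree is optimal; the closer the ratio is to 1, the better the approximation.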

Page 8: STAR: Steiner-Tree Approximation in Relationship Graphs

Paper Contributions

• Presents STAR, a new efficient algorithm
– Computes near-optimal Steiner trees
– Exploits a taxonomic schema (when available)
– Viable runtimes over large graphs
• STAR approximation ratio proofs:
– O(log n) for n given query entities (worst case)
• Improvement over other approximation ratios
– [ratios not preserved in the transcript]
– In practice, STAR is better than a [ratio not preserved]-approximation algorithm
• STAR top-k tree capability
• STAR outperforms state-of-the-art algorithms by an order of magnitude
• Can be applied either to main-memory datasets or to disk-resident large graphs
• Evaluation via comparison with other cutting-edge algorithms

Page 9: STAR: Steiner-Tree Approximation in Relationship Graphs

The Star Algorithm

• Introduction
• First Phase
• Second Phase
• Examples


Page 11: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – Introduction

• Problem definition
– As stated in the introduction
– Further, we are interested in finding the top-k result trees in increasing order of cost
• Exploitation of taxonomic backbones
– Node labels as entities
– Edge labels as weights or relations
– A taxonomy does not have to be available
• Runs in 2 phases
• Phase 1: uses taxonomic information (when available)
– Builds a quick tree by pruning the original graph
– Interconnects all given nodes
• Phase 2: iteratively improves the tree from Phase 1


Page 13: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – First Phase

• Prunes the original graph
• Runs an iterator from each terminal
• Iterators run in a round-robin manner
• Iterators follow only taxonomic edges:
– subClassOf, type
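The first phase described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes one breadth-first iterator per terminal, expands one node per iterator per round, and stops as soon as some node has been reached by every iterator. The function `phase1_tree` and the taxonomy `TAXONOMY` are hypothetical; the taxonomy is loosely modeled on the Max Planck / Arnold Schwarzenegger / Germany example used later in the deck.

```python
from collections import deque

# Hypothetical taxonomy: each node maps to the nodes reachable over
# taxonomic edges (type, subClassOf).
TAXONOMY = {
    "Max Planck": ["Physicist"],
    "Physicist": ["Scientist"],
    "Scientist": ["Person"],
    "Arnold Schwarzenegger": ["Politician", "Actor"],
    "Politician": ["Person"],
    "Actor": ["Person"],
    "Person": ["Entity"],
    "Germany": ["State"],
    "State": ["Organization Unit"],
    "Organization Unit": ["Entity"],
}
TERMINALS = ["Max Planck", "Arnold Schwarzenegger", "Germany"]

def phase1_tree(taxonomy, terminals):
    """Round-robin BFS from every terminal; returns the edges of a connecting tree."""
    queues = {t: deque([t]) for t in terminals}
    pred = {t: {t: None} for t in terminals}   # BFS predecessor per iterator
    while any(queues.values()):
        for t in terminals:                     # round-robin over the iterators
            if not queues[t]:
                continue
            node = queues[t].popleft()
            for nxt in taxonomy.get(node, ()):
                if nxt not in pred[t]:          # already discovered: don't enqueue
                    pred[t][nxt] = node
                    queues[t].append(nxt)
            # Nodes that every iterator has discovered so far.
            common = set.intersection(*(set(pred[u]) for u in terminals))
            if common:
                meet = min(common)              # deterministic choice
                edges = set()
                for u in terminals:             # walk back from the meeting node
                    cur = meet
                    while pred[u][cur] is not None:
                        edges.add(frozenset((cur, pred[u][cur])))
                        cur = pred[u][cur]
                return edges
    return None                                 # terminals are not connected

TREE = phase1_tree(TAXONOMY, TERMINALS)
```

On this input the iterators meet at Entity, and the resulting tree leaves out Actor because it lies on none of the recorded paths back to a terminal.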

Page 14: STAR: Steiner-Tree Approximation in Relationship Graphs

Single Breadth-First-Search Iterator: Pruning Example

Page 15: STAR: Steiner-Tree Approximation in Relationship Graphs

Breadth First Search (animation, Pages 15–43)

[Figure: breadth-first search from source s on an example graph with nodes s, 2, 3, 4, 5, 6, 7, 8, 9. Legend: Undiscovered, Discovered, Finished. Each frame shows the queue, the top of the queue, and the shortest-path distance from s of every discovered node.]

• Observe the taxonomic structure
• Queue contents, frame by frame: s → s 2 → s 2 3 → 2 3 5 → 2 3 5 4 → 3 5 4 → 3 5 4 6 → 5 4 6 → 4 6 → 4 6 8 → 6 8 → 6 8 7 → 6 8 7 9 → 8 7 9 → 7 9 → 9 → (empty)
• A node that is already discovered is not enqueued again (e.g. 5, when it is reached a second time)
• Level graph (final shortest-path distances from s):
– Distance 0: s
– Distance 1: 2, 3, 5
– Distance 2: 4, 6
– Distance 3: 7, 8, 9
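The breadth-first search traced in the frames above can be sketched as follows. The adjacency lists in `GRAPH` are an assumption: they are reconstructed from the order in which the frames discover nodes, so edges that the trace does not imply are unknown and left out. The names `GRAPH`, `bfs_levels`, and `LEVELS` are hypothetical.

```python
from collections import deque

# Assumed adjacency, reconstructed from the discovery order in the frames.
GRAPH = {
    "s": ["2", "3", "5"],
    "2": ["s", "4", "5"],
    "3": ["s", "6"],
    "5": ["s", "2"],
    "4": ["2", "8"],
    "6": ["3", "7", "9"],
    "7": ["6"],
    "8": ["4"],
    "9": ["6"],
}

def bfs_levels(graph, source):
    """Return the BFS distance from source for every reachable node."""
    level = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()              # top of the queue
        for nbr in graph[node]:
            if nbr not in level:            # already discovered: don't enqueue
                level[nbr] = level[node] + 1
                queue.append(nbr)
    return level

LEVELS = bfs_levels(GRAPH, "s")
```

With this adjacency the discovery order is s, 2, 3, 5, 4, 6, 8, 7, 9, and the distances agree with the level graph on the last frame.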

Page 44: STAR: Steiner-Tree Approximation in Relationship Graphs

First-Phase Example

(Simple breadth-first-search iterator from each terminal)

V’ = {Max Planck, Arnold Schwarzenegger, Germany}

Page 45: STAR: Steiner-Tree Approximation in Relationship Graphs

Breadth First Search Iterators from Each Terminal (animation, Pages 45–68)

[Figure: taxonomy with the nodes Max Planck, Physicist, Scientist, Person, Entity, Politician, Actor, Arnold Schwarzenegger, Organization Unit, State, Germany. Legend: Undiscovered, Discovered, Finished. One queue per iterator: T1 starts at Max Planck, T2 at Arnold Schwarzenegger, T3 at Germany.]

• As soon as iterators meet, a result is constructed
• Queue T1, frame by frame: Max Planck → Max Planck, Physicist → Physicist → Physicist, Scientist → Scientist → Scientist, Person → Person → Person, Entity → (empty)
• Queue T2, frame by frame: Arnold Schwarzenegger → Arnold Schwarzenegger, Politician → Arnold Schwarzenegger, Politician, Actor → Politician, Actor → Politician, Actor, Entity → Actor, Entity → Entity → (empty)
• Queue T3, frame by frame: Germany → Germany, State → State → State, Organization Unit → Organization Unit → Organization Unit, Entity → (empty)
• Entity is already discovered in the T2 iterator when it is reached again: don't enqueue
• T3 reaches Entity, which the T2 iterator has already discovered: the T2 and T3 iterators have met, so the T3 iterator stops
• T1 reaches Entity, which the T2 iterator has already discovered: the T1 and T2 iterators have met, so the T1 iterator stops
• From Page 63 on, Actor no longer appears in the graph

Page 69: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – Second Phase

• Aims to improve the tree from Phase 1
• Follows an iterative improvement procedure
– Certain paths are replaced in each iteration
– The new paths have lower weights
• Some definitions:
• Terminal node:
– Any node v ∈ V’
• Degree of a node v, deg(v):
– The number of edges connected to the node
• Fixed node:
– Any node v with deg(v) ≥ 3
– Any terminal node

Page 70: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – Second Phase

• Loose path:
– A path p in T is a loose path if it has minimal length and its end nodes are fixed nodes.
• Fixed nodes should not be removed during improvement
• It follows that every intermediate node v on a loose path must be a Steiner node with deg(v) = 2
• A loose path is a path that can be replaced during the improvement process
• A minimal Steiner tree with respect to V’ is a tree in which all loose paths represent shortest paths between fixed nodes.
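The definitions of fixed nodes and loose paths can be made concrete with a short sketch. It assumes the tree is given as an undirected edge list and that every leaf of the tree is a terminal. The edge list `TREE_EDGES` is an assumption modeled on the example tree used in this deck, and the function name `loose_paths` is hypothetical.

```python
from collections import defaultdict

# Assumed example tree (undirected edges), modeled on the deck's example.
TREE_EDGES = [
    ("Max Planck", "Physicist"), ("Physicist", "Scientist"),
    ("Scientist", "Person"), ("Arnold Schwarzenegger", "Politician"),
    ("Politician", "Person"), ("Person", "Entity"),
    ("Entity", "Organization Unit"), ("Organization Unit", "State"),
    ("State", "Germany"),
]
TERMINALS = {"Max Planck", "Arnold Schwarzenegger", "Germany"}

def loose_paths(edges, terminals):
    """Return (fixed nodes, loose paths) of a tree whose leaves are all terminals."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Fixed node: any terminal, or any node of degree >= 3.
    fixed = set(terminals) | {n for n in adj if len(adj[n]) >= 3}
    paths = []
    for start in sorted(fixed):
        for nxt in sorted(adj[start]):
            path = [start, nxt]
            # Walk through degree-2 Steiner nodes until the next fixed node.
            while path[-1] not in fixed:
                prev, cur = path[-2], path[-1]
                (step,) = adj[cur] - {prev}
                path.append(step)
            if path[0] < path[-1]:          # report each loose path only once
                paths.append(path)
    return fixed, paths

FIXED, PATHS = loose_paths(TREE_EDGES, TERMINALS)
```

On this tree Person becomes a fixed node because its degree is 3, and the loose path with the most edges runs between Germany and Person.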

Page 71: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – Second Phase: Observations

• Removing an LP splits T into two subtrees, T1 and T2
• Replacing any LP by a shorter path
– Compute the shortest path between any node of T1 and any node of T2
• Removing and inserting LPs can change which nodes are fixed and which are unfixed

Page 72: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – Second Phase: Finding an Approximate Steiner Tree

1. Remove an LP
2. Decompose T into T1 and T2
3. Connect T1 and T2 by a path that is shorter than the LP

Page 73: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – Second Phase: Finding an Approximate Steiner Tree

Page 74: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – Second Phase: The Tree-Improving Algorithm

• The difficult Steiner Tree Problem is reduced to an easier one
– Find shortest paths between node subsets
• In each iteration the loose path with the maximum weight is removed (heuristic)

Page 75: STAR: Steiner-Tree Approximation in Relationship Graphs

The STAR Algorithm – Second Phase: The Method replace(lp, T)

• Removes the loose path from T
• T is split into subgraphs T1 and T2
• The shortest path connecting any node of T1 to any node of T2 is determined
– replace(lp, T) calls findShortestPath(VT1, VT2, lp)
– findShortestPath(VT1, VT2, lp) returns the shortest path
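The search for a shortest path between two node sets can be sketched with a multi-source Dijkstra run. This is a simplification and an assumption: the deck describes two iterators, one started from each node set, whereas the sketch below uses a single search from one set that stops at the first node of the other set. The function names are hypothetical, and `WEIGHTED_EDGES` contains only the edges whose weights appear in the deck's worked example; all other edges of the graph are omitted because their weights are not given.

```python
import heapq

# Edges and weights as they appear in the deck's worked example.
WEIGHTED_EDGES = [
    ("Germany", "State", 0.95),
    ("Germany", "Angela Merkel", 0.96),
    ("Angela Merkel", "Physicist", 0.95),
    ("Angela Merkel", "Politician", 0.95),
    ("Physicist", "Scientist", 0.99),
]

def build_graph(edges):
    """Undirected weighted graph as a dict of dicts."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def shortest_path_between(graph, sources, targets):
    """Cheapest path from any source node to any target node."""
    dist = {s: 0.0 for s in sources}
    pred = {s: None for s in sources}
    heap = [(0.0, s) for s in sorted(sources)]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node in targets:                 # first settled target ends the search
            path = []
            while node is not None:
                path.append(node)
                node = pred[node]
            return d, path[::-1]
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                pred[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return float("inf"), []

V_T1 = {"Person", "Politician", "Scientist", "Physicist",
        "Max Planck", "Arnold Schwarzenegger"}
V_T2 = {"Germany"}
COST, PATH = shortest_path_between(build_graph(WEIGHTED_EDGES), V_T2, V_T1)
```

Starting the search from the smaller set keeps the initial frontier small. Ties between equally distant nodes are broken alphabetically by the heap, which is an artifact of this sketch rather than a rule from the slides.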

Page 76: STAR: Steiner-Tree Approximation in Relationship Graphs

Steiner Tree Approximation – Phase 2

• The overall graph G

[Figure: graph G with the nodes Max Planck, Physicist, Scientist, Person, Entity, Politician, Actor, Arnold Schwarzenegger, Organization Unit, State, Germany, Angela Merkel]

Page 77: STAR: Steiner-Tree Approximation in Relationship Graphs

Steiner Tree Approximation – Phase 2

• Output of Phase 1
• deg(Person) = 3, therefore it is a fixed node
• The largest LP lies between Person and Germany

[Figure: Phase 1 tree over Max Planck, Physicist, Scientist, Person, Entity, Politician, Arnold Schwarzenegger, Organization Unit, State, Germany]

Page 78: STAR: Steiner-Tree Approximation in Relationship Graphs

Steiner Tree Approximation – Phase 2

• Remove the largest LP
• Fixed nodes are not removed

[Figure: the Phase 1 tree with the loose path between Person and Germany marked for removal]

Page 79: STAR: Steiner-Tree Approximation in Relationship Graphs

Steiner Tree Approximation – Phase 2

• The tree is split into subgraphs T1 and T2
• V(T1) = {Person, Politician, Scientist, Physicist, Max Planck, Arnold Schwarzenegger}
• V(T2) = {Germany}
• An algorithm finds the shortest path between T1 and T2
• Method call: shortestPath(V(T1), V(T2), lp)

[Figure: subtree T1 (Max Planck, Physicist, Scientist, Person, Politician, Arnold Schwarzenegger) and subtree T2 (Germany)]

Page 80: STAR: Steiner-Tree Approximation in Relationship Graphs

Phase 2 – Shortest-Path Algorithm

• All vertices pruned in Phase 1 are needed again
• Runs one single-source shortest-path iterator from V(T1) and one from V(T2)
• A single-source shortest-path iterator finds the shortest paths from a source vertex to all other vertices in the graph

[Figure: the overall graph G, including Angela Merkel]

Page 81: STAR: Steiner-Tree Approximation in Relationship Graphs

Phase 2 – Shortest-Path Algorithm

• Initialization of the vertex distances d(v)
• Assign two distances (d1, d2) to each vertex
• Assign d1 = 0 to all vertices of V(T1)
• Assign d2 = 0 to all vertices of V(T2)
• Assign d1 = ∞ to all vertices of V(T2)
• Assign d2 = ∞ to all vertices of V(T1)
• Assign d1 = d2 = ∞ to all pruned or not queried vertices

[Figure: the overall graph G, including Angela Merkel]
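The initialization rules listed above translate directly into code. The sketch below stores the pair (d1, d2) per vertex; the function name `init_distances` and the variable names are hypothetical, and the node sets follow the deck's running example.

```python
import math

def init_distances(all_nodes, vt1, vt2):
    """Map every vertex to its initial (d1, d2) pair."""
    return {
        v: (0.0 if v in vt1 else math.inf,   # d1: distance from T1
            0.0 if v in vt2 else math.inf)   # d2: distance from T2
        for v in all_nodes
    }

V_T1 = {"Person", "Politician", "Scientist", "Physicist",
        "Max Planck", "Arnold Schwarzenegger"}
V_T2 = {"Germany"}
OTHERS = {"Entity", "Actor", "Organization Unit", "State", "Angela Merkel"}

DIST = init_distances(V_T1 | V_T2 | OTHERS, V_T1, V_T2)
```

Treating every vertex of a subtree as a source with distance 0 is what lets each subtree behave like a single node during the search.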

Page 82: STAR: Steiner-Tree Approximation in Relationship Graphs

Phase 2 – Shortest-Path Algorithm (trace, Pages 82–93)

Initial distance labels (d1, d2):
• Max Planck, Physicist, Scientist, Person, Politician, Arnold Schwarzenegger: (0, ∞)
• Germany: (∞, 0)
• Entity, Actor, Organization Unit, State, Angela Merkel: (∞, ∞)
• T1 is considered a single node with distance 0 from itself and distance ∞ from T2
• T2 accordingly
• Nodes that belong to neither T1 nor T2 have infinite distance from both

Initial queues:
• Q1 (d1): Arn(0), Pol(0), Max(0), Phy(0), Sci(0), Per(0)
• Q2 (d2): Ger(0)
• Current points to the iterator with the minimal fringe, which is the one currently being expanded

Steps:
1. Fringe(Q2) < Fringe(Q1): swap(current, other) and dequeue Germany from Q2
2. d2(State) = 0 + 0.95 = 0.95; enqueue State in Q2
3. d2(Angela Merkel) = 0 + 0.96 = 0.96; enqueue Angela Merkel in Q2
4. Dequeue Angela Merkel from Q2
5. d2(Physicist) = 0.96 + 0.95 = 1.91; enqueue Physicist in Q2
6. d2(Politician) = 0.96 + 0.95 = 1.91; enqueue Politician in Q2
7. Dequeue Physicist from Q2
8. d2(Scientist) = 1.91 + 0.99 = 2.9; enqueue Scientist in Q2
9. Stop, since Physicist ∈ V(T1)
10. Return the vertices in vector V: V = {Germany, Angela Merkel, Physicist}

Trace table as shown on the final frames:

Itr  Cur  Oth  V    V’
-    1    2
1    2    1    Ger  Sta
1    2    1    Ger  Ang
2    2    1    Ang  Phy
2    2    1    Ang  Pol
3    2    1    Phy  Sci

Final labels that changed: State (∞, 0.95), Angela Merkel (∞, 0.96), Physicist (0, 1.91), Politician (0, 1.91), Scientist (0, 2.9)

Page 94: STAR: Steiner-Tree Approximation in Relationship Graphs

Phase 2 – Shortest Path Algorithm: First Iteration Result

Page 95: STAR: Steiner-Tree Approximation in Relationship Graphs

Phase 2 – Shortest Path Algorithm: Second Iteration

Remove the loose path (LP)

Apply the algorithm again to find the shortest path between T1 and T2

Stop here, since no loose path can be improved
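
The walkthrough on the preceding slides can be sketched in Python. This is a simplification under stated assumptions, not the authors' implementation: it runs a single multi-source Dijkstra search from V(T2) and stops at the first dequeued node that belongs to V(T1), whereas the slides alternate between two iterators (Q1 and Q2). The function name, the adjacency-list layout, and the edge list (inferred from the distance labels on the slides, with edges assumed undirected) are all assumptions.

```python
import heapq

def find_shortest_path(graph, sources, targets, upper_bound=float("inf")):
    """Multi-source Dijkstra from `sources`, stopping at the first dequeued node in `targets`.

    graph: dict mapping node -> list of (neighbour, weight) pairs (hypothetical layout).
    Returns (cost, path), or None if no connecting path cheaper than upper_bound exists.
    """
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        if v in targets:
            # Walk the predecessor links back to a source node.
            path = [v]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for nbr, weight in graph.get(v, ()):
            nd = d + weight
            # Prune anything that cannot beat the bound (the loose path weight).
            if nd < upper_bound and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = v
                heapq.heappush(heap, (nd, nbr))
    return None

# Edge list inferred from the (d1, d2) labels on the slides; assumed undirected.
EDGES = [
    ("Germany", "Angela Merkel", 0.96),
    ("Germany", "State", 0.95),
    ("Angela Merkel", "Physicist", 0.95),
    ("Angela Merkel", "Politician", 0.95),
    ("Physicist", "Scientist", 0.99),
]
GRAPH = {}
for a, b, w_ab in EDGES:
    GRAPH.setdefault(a, []).append((b, w_ab))
    GRAPH.setdefault(b, []).append((a, w_ab))

# T1 holds the nodes whose d1 label is 0 on the slides; T2 is Germany alone.
T1 = {"Physicist", "Politician", "Scientist", "Person",
      "Max Planck", "Arnold Schwarzenegger"}
T2 = {"Germany"}
cost, path = find_shortest_path(GRAPH, T2, T1)
print(round(cost, 2), path)
```

Physicist and Politician are equally close to Germany in this example (both at 1.91). The sketch breaks the tie by node name, which happens to select Physicist, the node reported in vector V on the slides. Passing the weight of the removed loose path as `upper_bound` mirrors the pruning idea that nodes farther away than W(lp) need not be visited.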

Page 96: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximation Guarantee: Lemmas and Theorems

• Lemma 1
– A tree T with terminal set V’, |V’| ≥ 2, has at least |V’| − 1 and at most 2|V’| − 3 loose paths.

• The approximation ratio for the cost of the tree returned by STAR is independent of the result of the first phase.
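
The two bounds in Lemma 1 can be checked on small trees. The sketch below assumes the usual definition from the STAR setting: a fixed node is a terminal or a node of degree at least three, and a loose path connects two fixed nodes through non-fixed nodes only. It also assumes every leaf of the tree is a terminal. The function name and the adjacency layout are hypothetical.

```python
def loose_paths(tree, terminals):
    """Enumerate the loose paths of an undirected tree.

    tree: dict mapping node -> set of neighbours (hypothetical layout).
    Assumes every leaf is a terminal, so each walk ends at a fixed node.
    """
    fixed = {v for v in tree if v in terminals or len(tree[v]) >= 3}
    paths = set()
    for start in fixed:
        for nxt in tree[start]:
            path = [start, nxt]
            # Walk through non-terminal degree-2 nodes until a fixed node is hit.
            while path[-1] not in fixed:
                came_from, here = path[-2], path[-1]
                (step,) = tree[here] - {came_from}
                path.append(step)
            # Store each path once, regardless of the walking direction.
            paths.add(min(tuple(path), tuple(reversed(path))))
    return paths

# Three terminals joined through a central Steiner node: 2|V'| - 3 = 3 loose paths.
STAR_TREE = {"c": {"a", "b", "d"}, "a": {"c"}, "b": {"c"}, "d": {"c"}}
# Three terminals on a chain a-x-b-y-c: |V'| - 1 = 2 loose paths.
CHAIN = {"a": {"x"}, "x": {"a", "b"}, "b": {"x", "y"}, "y": {"b", "c"}, "c": {"y"}}

print(len(loose_paths(STAR_TREE, {"a", "b", "d"})))
print(len(loose_paths(CHAIN, {"a", "b", "c"})))
```

The chain reaches the lower bound because no Steiner node has degree three, while the star-shaped tree reaches the upper bound for three terminals.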

Page 97: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximation Guarantee: Lemmas and Theorems

• Lemma 2
– Let T_A be the Steiner tree yielded by the STAR algorithm, and let L(T_A) be the set of loose paths in T_A. For any circular ordering u_1, …, u_N of the terminals in T_A there is a mapping μ: L(T_A) → V’ × V’ such that:

1. μ is defined for all loose paths in T_A;

2. For each loose path P with end points u and v, let T1 and T2 be the two trees obtained by removing from T_A all nodes in P (and their edges) except u and v; then μ(P) = {u_i, u_{i+1}} for some i = 1, …, N, and one of the nodes u_i, u_{i+1} belongs to T1 while the other belongs to T2;

3. For each pair of terminals {u_i, u_{i+1}} there are at most 2⌈log N⌉ + 2 loose paths mapped to {u_i, u_{i+1}}.

Page 98: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximation Guarantee: Lemmas and Theorems

• Theorem 1 (approximation order)
– The STAR algorithm is a (4⌈log N⌉ + 4)-approximation algorithm for the Steiner tree problem.

– Therefore:

Page 99: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximation Guarantee: Lemmas and Theorems

• Improvement-Guarantee Rule
– STAR might have exponential running time.
– The cost reduction at each iteration can be infinitesimally small.
– An improvement-guarantee rule solves this:
– Replace loose path P if and only if:

– where P’ is the replacement path found by STAR, given ε > 0

Page 100: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximation Guarantee: Lemmas and Theorems

• Lemma 3 (time complexity)
– Given ε > 0, the STAR algorithm with the improvement-guarantee rule is guaranteed to terminate in [bound not preserved] steps,
– where m is the number of edges, and
– [symbol not preserved] is the ratio of the maximum to the minimum cost of the edges in the input graph.

Page 101: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximation Guarantee: Lemmas and Theorems

• Theorem 2
– Given ε > 0, the STAR algorithm with the improvement-guarantee rule is a [factor not preserved]-approximation algorithm for the Steiner tree problem. Its running time is [bound not preserved],

where n, m, and N denote the numbers of vertices, edges, and terminals of the input graph.

Page 102: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximate Top-K Interconnections

• Observation: the weight of a loose path is an upper bound on the weight of any new interconnecting path that replaces it

• After the improvement phase, no loose path in the final tree T can be improved further

• Top-k interconnections are computed starting from the final tree T returned by the original STAR algorithm

Page 103: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximate Top-K Interconnections

• Lines 1-3 compute the original tree T

• T is enqueued in priority queue Q

• New trees are generated by artificially relaxing the loose paths (lps) of the current tree (lines 4-9)

Page 104: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximate Top-K Interconnections

• Relax(T, ε)
– A tunable value ε > 0 is used to artificially raise the loose-path weights

– The new weights are used as upper bounds

– They serve as artificial upper bounds for new interconnecting paths between subtrees

Page 105: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximate Top-K Interconnections

• improveTree’(T’, V’)
– Replace(lp, T) calls findShortestPath(V(T1), V(T2), lp)
– findShortestPath(V(T1), V(T2), lp) uses the higher artificial weights
– The new interconnecting paths are not the same as the loose path, but are still the shortest between T1 and T2
– New interconnecting paths that are node-disjoint from the loose path are considered
– This gives the result diversity required for top-k

[Figure label: Original algorithm]

Page 106: STAR: Steiner-Tree Approximation in Relationship Graphs

Approximate Top-K Interconnections

• reweight(T’)
– Re-weights the result of improveTree(T, V’) by:
– acting on the loose paths of T’ (which are also loose paths of T)
– setting W(T’) back to its initial value before relaxation
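
The control flow described on the top-k slides (enqueue the first tree, then repeatedly derive new trees from the current one) resembles a best-first enumeration. The skeleton below is one plausible reading, not the algorithm listing from the slides, which is not preserved in this transcript. The names `top_k`, `weight`, and `successors` are hypothetical; `successors` stands in for the combination of Relax, improveTree’, and reweight.

```python
import heapq
from itertools import count

def top_k(initial_tree, k, weight, successors):
    """Best-first enumeration of up to k distinct trees.

    weight(tree) -> float and successors(tree) -> iterable of trees are
    caller-supplied placeholders. Trees must be hashable, for example a
    frozenset of edges.
    """
    tie = count()  # tie-breaker so that trees themselves are never compared
    queue = [(weight(initial_tree), next(tie), initial_tree)]
    seen = {initial_tree}
    results = []
    while queue and len(results) < k:
        w, _, tree = heapq.heappop(queue)
        results.append((w, tree))
        # Derive alternative trees from the one just reported.
        for nxt in successors(tree):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(queue, (weight(nxt), next(tie), nxt))
    return results

# Toy stand-in: "trees" are integers, weight is the value itself.
demo = top_k(0, 3, weight=lambda t: t, successors=lambda t: [t + 1, t + 2])
print(demo)
```

The `seen` set prevents the same tree from being reported twice when several relaxations lead to it. Whether the original listing deduplicates in this way is not stated on the slides.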

Page 107: STAR: Steiner-Tree Approximation in Relationship Graphs

Evaluation

• STAR is compared with the best-known Steiner tree approximation algorithms:
– DNH, DPBF, BLINKS, BANKS (both versions)

• Compared in terms of quality (avg. weight) and performance (avg. runtime)

• Semantic quality, or user-perceived relevance, is not considered

• Earlier work by the authors showed that:
– A Steiner-tree-based scoring function contributes to high relevance from a user’s point of view

Page 108: STAR: Steiner-Tree Approximation in Relationship Graphs

Evaluation - Algorithms in Comparison

• DNH (Distance Network Heuristic)
– 2-approximation algorithm

• DPBF
– Dynamic programming approach
– The optimal tree can be computed (not an approximation)
– Best for a small number of terminals (queries)

• BLINKS
– Newest
– Experimentally the best in the field

• BANKS I & II
– Keyword proximity search on relational data

Page 109: STAR: Steiner-Tree Approximation in Relationship Graphs

Evaluation: Types of Comparisons Performed

• Top-1 comparison of STAR, DNH, DPBF, BANKS I & II

• Top-k comparison of STAR, BANKS I & II, BLINKS

• External Storage Comparison of STAR and BANKS

Page 110: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Worst-case theoretical properties of the algorithms (n = |V’|):
– DNH: approximation ratio 2(1 − 1/n)
– STAR: approximation ratio 4⌈log n⌉ + 4
– BANKS I & II: approximation ratio O(n)
– DPBF: has none (computes the optimal Steiner tree)
  • Used to compare all others against the optimal tree weights

• Goal: a good approximation ratio on the given G, V’
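
To make the worst-case ratios above concrete, the following sketch evaluates them for the query sizes used in the experiments (3, 5, and 7 terminals). The base of the logarithm is not stated on the slides; base 2 is assumed here, and the function names are illustrative.

```python
import math

def star_ratio(n: int) -> int:
    """Worst-case ratio 4*ceil(log n) + 4 quoted for STAR (log base 2 assumed)."""
    return 4 * math.ceil(math.log2(n)) + 4

def dnh_ratio(n: int) -> float:
    """Worst-case ratio 2*(1 - 1/n) quoted for DNH."""
    return 2 * (1 - 1 / n)

for n in (3, 5, 7):
    print(n, star_ratio(n), round(dnh_ratio(n), 3))
```

Under this assumption the DNH bound stays below 2 while the STAR bound is 12 or more. These are guarantees on the worst case only, so they do not by themselves predict which algorithm returns lighter trees on a given dataset.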

Page 111: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Datasets
– View DBLP and IMDB as graphs
  • Nodes are entities (author, publication, conference, actor, movie, year, etc.)
  • Edges are relations (cited by, author of, acted in, etc.)
– Dataset DBLP: subgraph of 15,000 nodes and 150,000 edges
  • Due to DNH and DPBF constraints (they run in main memory only)
– Dataset IMDB: subgraph of 30,000 nodes and 80,000 edges
– Two different datasets are needed to cover different topologies
– No edge weights are present in either dataset, so they are randomly assigned
– No taxonomic information is present in either dataset (not a problem for STAR; handled in the 1st phase)

Page 112: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Queries
– Query sets with 3, 5, and 7 terminals
– Each set consists of 60 queries
– All queries in a set have the same number of terminals
– Terminals are chosen at random

Page 113: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Metrics
– Reference: optimal scores returned by DPBF
– Compare the weight of STAR’s trees with the weights of all others
– Compare the running times of all algorithms

Page 114: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• Results
– Observe DPBF performance for all numbers of terminals
  • Weight
  • Runtime
– Observe DPBF performance for 7 terminals ????

Page 115: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• DBLP Results
– For all numbers of terminals:
  • STAR’s weight is better than that of all the others
  • STAR’s runtime outperforms all others
  • Even though DNH has a better approximation ratio

Page 116: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-1 Comparisons (STAR, DNH, DPBF, BANKS I & II)

• IMDB Results
– STAR’s weight is slightly worse than DNH’s
– Hypothesis: DBLP has a higher edge-to-node ratio
– BANKS II performance improved relative to competitors?
– Still outperformed by STAR

Page 117: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

• DNH cannot compute top-k results
• BLINKS
– Uses indexing for query-time speedup
– Requires the entire graph in main memory
– The same datasets are used again
– Uses a partitioning strategy (block sizes of nodes)
– Initially tuned for better results
  • DBLP: block size of 100 nodes
  • IMDB: block size of 5 nodes

Page 118: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

• Metrics
– Avg. weight is not applicable to BLINKS
  • It returns only the root nodes of the result trees

• Queries
– Comparison for k = 10, k = 50, k = 100
– DBLP & IMDB:
  • 5 terminals
  • Random queries
  • 60 queries

Page 119: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

• Results
– Index construction time for BLINKS is excluded
– Even so, BLINKS has the worst runtime
– BANKS II and BLINKS runtimes are worse on the denser DBLP graph

Page 120: STAR: Steiner-Tree Approximation in Relationship Graphs

Top-k Comparisons (STAR, BANKS I & II, BLINKS)

• Explanation of STAR’s performance:
– Uses only 2 iterators per improvement step
– Does not visit nodes at distance d > W(lp)
– Tighter upper bounds for pruning

Page 121: STAR: Steiner-Tree Approximation in Relationship Graphs

External Storage Comparison of STAR and BANKS

• STAR and BANKS are directly applicable to graphs that do not fit in main memory

• Simulation of such a scenario
– Disk-resident datasets

• Dataset:
– YAGO knowledge base
  • Nodes: 1.7 million, edges: 14 million
  • Edge weights supported
  • Graph stored in a relational database with schema EDGE(source, target, weight)
  • Type and subclass taxonomy (STAR 1st phase) supported
– Database call overhead is treated uniformly for STAR and BANKS

Page 122: STAR: Steiner-Tree Approximation in Relationship Graphs

External Storage Comparison of STAR and BANKS

• STAR and BANKS are directly applicable to graphs that do not fit in main memory

• Simulation of such a scenario:
– Disk-resident datasets

• Dataset:
– YAGO knowledge base
  • Nodes: 1.7 million, edges: 14 million
  • Edge weights supported
  • Type and subclass taxonomy (STAR 1st phase) supported
– Graph stored in a relational database with schema EDGE(source, target, weight)
– Edge exploration: database access for each edge
– Overhead is treated uniformly for both STAR and BANKS through edge loading
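
A disk-resident edge store with the EDGE(source, target, weight) schema can be sketched with Python's built-in sqlite3 module. This is an illustrative setup, not the one used in the experiments: the slides do not name the database system, and the helper names, the index, and the choice of one query per expanded node (with a counter for edges accessed) are assumptions. An in-memory database is used here only to keep the example self-contained.

```python
import sqlite3

def open_edge_store(edges):
    """Create an EDGE(source, target, weight) table and load the given edges."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EDGE (source TEXT, target TEXT, weight REAL)")
    conn.executemany("INSERT INTO EDGE VALUES (?, ?, ?)", edges)
    # Index on source so that each neighbour lookup avoids a full table scan.
    conn.execute("CREATE INDEX edge_source ON EDGE (source)")
    return conn

class EdgeLoader:
    """Loads the outgoing edges of a node on demand and counts edges accessed."""

    def __init__(self, conn):
        self.conn = conn
        self.edges_accessed = 0

    def neighbors(self, node):
        rows = self.conn.execute(
            "SELECT target, weight FROM EDGE WHERE source = ?", (node,)
        ).fetchall()
        self.edges_accessed += len(rows)
        return rows

conn = open_edge_store([
    ("Germany", "Angela Merkel", 0.96),
    ("Germany", "State", 0.95),
    ("Angela Merkel", "Physicist", 0.95),
])
loader = EdgeLoader(conn)
rows = loader.neighbors("Germany")
print(sorted(rows), loader.edges_accessed)
```

Routing both algorithms through the same loader is one way to treat the database overhead uniformly and to obtain an "edges accessed" figure. The table stores directed edges, so an undirected graph would need each edge inserted in both directions.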

Page 123: STAR: Steiner-Tree Approximation in Relationship Graphs

External Storage Comparison of STAR and BANKS

• Queries:
– 2 sets, with 3 and 6 terminals
– Top-1, top-3, top-6 results
– Terminal nodes randomly chosen
– 30 queries made

• Metrics:
– Average weight (quality of the output trees)
– Efficiency (running times)
– Number of edges accessed

Page 124: STAR: Steiner-Tree Approximation in Relationship Graphs

External Storage Comparison of STAR and BANKS

• Results:
– BANKS I & II sometimes take 30 minutes to return results
  • Excluded from the evaluation (fair enough)
– STAR outperforms BANKS:
  • An order of magnitude faster
– STAR accesses an order of magnitude fewer edges
  • Gain from the taxonomic structure (1st phase)

Page 125: STAR: Steiner-Tree Approximation in Relationship Graphs

Results Summary

• Fairness: all algorithms are given the same inputs
• Diversity of algorithms
– DNH only handles graphs in main memory
– BLINKS: indexing, a different metric, lack of an approximation guarantee
– Query methods that are not Steiner-tree-like
• STAR’s outstanding performance comes from:
– 1) Exploiting the graph’s taxonomic structure when possible
– 2) The number of iterators needed per improvement step being independent of the number of terminals
– 3) Tight upper bounds and path pruning

Page 126: STAR: Steiner-Tree Approximation in Relationship Graphs

Conclusion

• The E-R-style data graph query problem is addressed
• The inherent taxonomic structure is exploited
• STAR does not depend ONLY on taxonomic information
– 2nd phase: fast “findShortestPath” algorithm
• DNH contradiction:
– Better approximation ratio, yet results similar to STAR’s
• STAR achieves a good approximation, O(log n), of the optimal Steiner tree

Page 127: STAR: Steiner-Tree Approximation in Relationship Graphs

Thank You For Your Attention