UNIVERSITY OF JYVÄSKYLÄ Optimal Resource Discovery Paths of Gnutella2 The IEEE 22nd International Conference on Advanced Information Networking and Applications (AINA 2008) 27.3.2008 Mikko Vapa, research student P2P Computing Group Department of Mathematical Information Technology www.mit.jyu.fi/ cheesefactory

UNIVERSITY OF JYVÄSKYLÄ

Optimal Resource Discovery Paths of Gnutella2
The IEEE 22nd International Conference on Advanced Information Networking and Applications (AINA 2008)
27.3.2008

Mikko Vapa, research student
P2P Computing Group
Department of Mathematical Information Technology
www.mit.jyu.fi/cheesefactory

Resource Discovery Problem

• In the peer-to-peer (P2P) resource discovery problem, any node in the network can possess resources and can also query resources held by other nodes

[Figure: a network of four nodes, Node 1 to Node 4. Node 1 asks where a resource (shown as an icon on the slide) is located.]

A Simple Solution for the Problem

• Gnutella, the most studied P2P network, used a Breadth-First Search (BFS) flooding algorithm, which forwards each query to all neighbors

• Problem: all resources in the network can be found, but the network becomes congested and many of the packets are useless

[Figure: Node 1 floods its query through the four-node network; six Query packets are shown. Node 4 answers "I have it!", Node 2 answers "I have it!" and relays "Node 4 has it too!", and two Reply packets are shown.]
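The packet overhead of flooding can be illustrated with a small simulation. This is a minimal sketch, not the simulator used in the study: the four-node topology, the `flood_query` function and the TTL value are all assumptions made for illustration, and duplicate suppression is simplified (a node forwards a query only the first time it sees it).

```python
from collections import deque

def flood_query(adjacency, source, ttl):
    """Simulate BFS flooding; return (query packets sent, nodes reached)."""
    reached = {source}
    packets = 0
    # Each queue entry: (node holding the query, node it came from, hops left).
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        for neighbor in adjacency[node]:
            if neighbor == sender:
                continue  # never echo the query straight back to the sender
            packets += 1  # every transmission counts, duplicates included
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append((neighbor, node, hops_left - 1))
    return packets, reached

# Hypothetical four-node topology (not taken from the slide figure).
adjacency = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
packets, reached = flood_query(adjacency, source=1, ttl=3)
print(packets, sorted(reached))
```

On this assumed topology the flood sends 7 query packets to reach all four nodes, although a tree over the edges 1-2 and 2-4 would reach Node 2 and Node 4 with only 2.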

Steiner Minimum Tree Problem

• Optimal paths for resource discovery can be found with a non-distributed algorithm, which requires global knowledge of the topology and the resources

• More precisely, the problem can be formulated as the task of finding a Steiner Minimum Tree (SMT) in a graph:

Steiner Minimum Tree Problem

• V = {Node 1, Node 2, Node 3, Node 4}
• R = {Node 1, Node 2, Node 4}
• min T = ({Node 1, Node 2, Node 4}, {1-2, 2-4})
• min w(T) = 2

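[Figure: the same four-node network with the query following the Steiner Minimum Tree; two Query packets are shown. Node 4 answers "I have it!", Node 2 answers "I have it!" and relays "Node 4 has it too!", and two Reply packets are shown.]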
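For a graph this small the Steiner Minimum Tree can be checked by brute force. The sketch below is an illustration only: it tries every subset of non-terminal nodes and takes the cheapest minimum spanning tree of the induced subgraph, which is exact but exponential in the number of non-terminals. The edge list is an assumed topology with unit weights (the slide figure's edges are not recoverable), and `mst_of` and `steiner_minimum_tree` are hypothetical helper names.

```python
from itertools import combinations

def mst_of(nodes, edges):
    """Kruskal on the subgraph induced by `nodes`.

    Returns (cost, tree_edges), or None if the induced subgraph is disconnected.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    cost, used = 0, []
    for w, u, v in sorted(edges):
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                cost += w
                used.append((u, v))
    return (cost, used) if len(used) == len(nodes) - 1 else None

def steiner_minimum_tree(vertices, edges, terminals):
    """Exact SMT by enumerating which non-terminal nodes to include."""
    optional = [v for v in vertices if v not in terminals]
    best = None
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            result = mst_of(set(terminals) | set(extra), edges)
            if result is not None and (best is None or result[0] < best[0]):
                best = result
    return best

# Assumed unit-weight edges as (weight, u, v); V and R follow the slide.
V = [1, 2, 3, 4]
R = [1, 2, 4]
E = [(1, 1, 2), (1, 1, 3), (1, 2, 3), (1, 2, 4), (1, 3, 4)]
cost, tree = steiner_minimum_tree(V, E, R)
print(cost, tree)
```

With these assumed edges the result agrees with the slide: the tree uses edges 1-2 and 2-4 and has weight 2.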

Rooted k-Steiner Minimum Tree Problem

• SMT locates all resources in the network, but if only k instances of the matching resource need to be found, the problem becomes the k-Steiner Minimum Tree problem

• The problem is also rooted: the root defines which node starts the query
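One way to write the rooted problem down, using the symbols G = (V,E), R, r, k and w(T) that appear on the slides. The edge-weight function w(e) and the "at least k" form of the constraint are assumptions; with non-negative weights, requiring at least k terminals gives the same optimum as requiring exactly k.

```latex
\begin{aligned}
\min_{T = (V_T, E_T)} \quad & w(T) = \sum_{e \in E_T} w(e) \\
\text{subject to} \quad & T \text{ is a tree and a subgraph of } G, \\
& r \in V_T, \\
& |V_T \cap R| \ge k .
\end{aligned}
```

Setting k = |R| recovers the ordinary Steiner Minimum Tree problem of the previous slide.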

MST k-Steiner Minimum Tree Algorithm

• The MST k-Steiner Minimum Tree algorithm was developed to find an approximate solution:

Algorithm: MST k-Steiner Minimum Tree

Input: A connected graph G = (V,E), a terminal set R ⊆ V, a root vertex r ∈ R and an integer k with 2 ≤ k ≤ |R|

Output: A Steiner tree T for R in G, rooted at the vertex r and containing k terminal vertices.

(1) Add one node to the graph G and connect it to all terminal nodes contained in R with an edge having cost 0. The result is denoted as graph GV.

(2) Replace GV with the minimum spanning tree of GV.

(3) Compute the shortest paths between terminal nodes by iterating over all edges of E in G and constructing the corresponding triplets. Transform the resulting triplets into graph GR.

(4) Compute a k-minimum spanning tree approximation TR from GR, rooted at the vertex r and containing k vertices of R.

(5) Transform TR into subtree T of G by replacing each edge of TR by the corresponding shortest path.
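The overall shape of steps (3) to (5) can be sketched in code. The sketch below is a simplified stand-in, not the algorithm from the slide: it skips the auxiliary-node construction of steps (1) and (2) and instead runs Dijkstra from every terminal, and it replaces the k-minimum spanning tree approximation of step (4) with a plain Prim-style greedy growth from the root. The graph, its weights and all function names are hypothetical.

```python
import heapq

def build_graph(weighted_edges):
    """Undirected weighted graph as a dict of dicts."""
    graph = {}
    for u, v, w in weighted_edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def dijkstra(graph, source):
    """Shortest distances and predecessor links from `source`."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev

def greedy_rooted_k_steiner(graph, terminals, root, k):
    """Grow a tree from `root` until it contains k terminals (root included)."""
    # Shortest paths between terminals (the role played by graph GR).
    paths = {t: dijkstra(graph, t) for t in terminals}
    in_tree, tree_edges = {root}, set()
    while len(in_tree) < k:
        # Greedy choice: the outside terminal closest to any tree terminal.
        _, t, s = min((paths[s][0][t], t, s)
                      for s in in_tree for t in terminals if t not in in_tree)
        # Replace the GR edge (s, t) by its shortest path in G.
        prev, node = paths[s][1], t
        while node != s:
            tree_edges.add(frozenset((prev[node], node)))
            node = prev[node]
        in_tree.add(t)
    cost = sum(graph[u][v] for u, v in map(tuple, tree_edges))
    return in_tree, tree_edges, cost

# Hypothetical example graph: terminals r1..r4, one intermediate node m1.
graph = build_graph([("r1", "m1", 1), ("m1", "r2", 1), ("m1", "r3", 1),
                     ("r1", "r2", 3), ("r3", "r4", 5), ("r2", "r4", 4)])
found, edges, cost = greedy_rooted_k_steiner(
    graph, ["r1", "r2", "r3", "r4"], root="r1", k=3)
print(sorted(found), cost)
```

In this example the tree reaches r1, r2 and r3 through m1 at a total cost of 3. In general the union of expanded shortest paths can contain a cycle, so a fuller implementation would prune the result to a tree; the sketch omits that step.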

MST k-Steiner Minimum Tree Algorithm

Time complexity: O(E log E), where E = number of edges in graph G

Worst-case approximation ratio: 2 R (operator symbols lost in extraction), where R = available resources

[Figure: six panels illustrating the algorithm on an example weighted graph with terminal nodes r1 to r5 and intermediate nodes m1 to m3: Graph G; Graph GV after step (1); Graph GV after step (2); Graph GR after step (3); Tree TR after step (4); Tree T after step (5).]

Simulation Scenarios

Scenario          PL10000      N10000     Gnutella2
Distribution      Power-Law    Normal     -
Nodes             10000        10000      74297
Edges             19997        19997      609036
Largest hub       161          11         360
Resources         1000         1000       10
Res. instances    39994        39994      43216
Queries           100          100        100
Diameter          8            10         12
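A few derived figures make the scenarios easier to compare. The snippet below is a small worked calculation from the table values only; it assumes that "Edges" counts undirected links and that "Res. instances" is the total number of resource copies in the network. The name `derived_stats` is hypothetical.

```python
# Values copied from the Simulation Scenarios table.
scenarios = {
    "PL10000":   {"nodes": 10000, "edges": 19997,  "resources": 1000, "instances": 39994},
    "N10000":    {"nodes": 10000, "edges": 19997,  "resources": 1000, "instances": 39994},
    "Gnutella2": {"nodes": 74297, "edges": 609036, "resources": 10,   "instances": 43216},
}

def derived_stats(s):
    """Average degree and replication figures for one scenario."""
    return {
        # Each undirected edge contributes to the degree of two nodes.
        "avg_degree": 2 * s["edges"] / s["nodes"],
        "instances_per_resource": s["instances"] / s["resources"],
        "instances_per_node": s["instances"] / s["nodes"],
    }

stats = {name: derived_stats(s) for name, s in scenarios.items()}
for name, values in stats.items():
    print(name, {key: round(value, 2) for key, value in values.items()})
```

Under these assumptions the two synthetic scenarios have an average degree of about 4, while the measured Gnutella2 topology has an average degree of about 16.4 and far more copies of each of its 10 resources.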

Query Packets for Gnutella2 with ~75000 nodes

• The MST k-Steiner Minimum Tree algorithm shows that the paths found by current local search algorithms for peer-to-peer networks are far from optimal

[Figure: packets / query (axis ticks from 1 to 1000000 in powers of ten) versus % of resources (0 to 100). Series: DQP, BFS, HDS, RWSA, k-Steiner, k.]

Hops for Gnutella2 with ~75000 nodes

• MST k-Steiner does not use the shortest paths to locate resources

[Figure: hops (0 to 70) versus % of resources (0 to 100). Series: k-Steiner, DQP, BFS.]
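Why a minimum-packet tree can need more hops than shortest paths is easiest to see on a toy case. The sketch below compares two hand-picked trees over a hypothetical unit-weight graph with edges r-a, a-t1, t1-t2, r-b and b-t2, where r is the querying node and t1, t2 hold the resource. It assumes one query packet per tree edge; `hops_from_root` is a hypothetical helper.

```python
from collections import deque

def hops_from_root(tree_edges, root):
    """Hop count from `root` to every node of a tree given as an edge list."""
    adjacency = {}
    for u, v in tree_edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    hops = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Fewest edges that connect r, t1 and t2 in the hypothetical graph.
min_packet_tree = [("r", "a"), ("a", "t1"), ("t1", "t2")]
# Union of the shortest paths from r to t1 and from r to t2.
shortest_path_tree = [("r", "a"), ("a", "t1"), ("r", "b"), ("b", "t2")]

for name, tree in [("min-packet", min_packet_tree),
                   ("shortest-path", shortest_path_tree)]:
    hops = hops_from_root(tree, "r")
    print(name, "packets:", len(tree), "hops to t2:", hops["t2"])
```

The minimum-packet tree uses 3 packets but reaches t2 in 3 hops, while the shortest-path tree reaches t2 in 2 hops at the price of a fourth packet.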

[Figure: two panels showing the query paths for one query, labeled "Highest Degree Search" and "K-Steiner Minimum Tree".]

The K-Steiner Tree algorithm locates 9 resource instances with 11 query packets. For this query the approximated solution is also the optimal solution. HDS uses almost twice as many query packets for this query.

Future Work

• Conducting an extensive survey of related work in graph theory on k-Steiner Minimum Trees, and modifying the problem to support multiple resource instances on the same node (Prize Collecting Steiner Tree problem with Quota)

• What makes the resource discovery problem hard in P2P networks is that only local information is available
  – It would be interesting to know how close to the optimum algorithms can get using local knowledge

• A record of the global network topology is used in the Open Shortest Path First IP routing protocol, together with Dijkstra's algorithm for computing the shortest paths
  – It might be possible to adapt the MST k-Steiner tree algorithm to P2P networks
  – In this case, information about the resources needs to be at least partially cached in the nodes