just a test document
TRANSCRIPT
-
8/7/2019 Just a test document
1/32
-
Disclaimer
Results, statements, and opinions in this talk do not represent Cisco in any way
This talk is about technical problems in networking, and does not discuss moral, legal, or other issues related to P2P networks and their applications
-
Outline
Brief survey of P2P architectures
Evaluation methodologies
Search methods
Replication strategies and analysis
Simulation results
-
Characteristics of Peer-to-Peer Networks
Unregulated overlay network
Current application: file swapping
Dynamic: nodes join and leave frequently
Example systems:
Napster, Gnutella, Freenet, FreeHaven, MojoNation, Alpine, JXTA, Ohaha, Chord, CAN, Past, Tapestry, OceanStore, ...
-
Architecture Comparisons
Napster: centralized
A central website holds the file directory of all participants
Very efficient
Scales
Problem: single point of failure
Gnutella: decentralized
No central directory; uses flooding with TTL
Very resilient against failure
Problem: doesn't scale
-
Architecture Comparisons
Various research projects such as CAN: decentralized, but structured
CAN: distributed hash table
Structure: all nodes participate in a precise scheme to maintain certain invariants
Extra work when nodes join and leave
Scales very well, but can be fragile
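The "precise scheme" behind a distributed hash table can be illustrated with a minimal sketch. This is a generic consistent-hashing illustration, not CAN's actual d-dimensional coordinate space, and the node names are hypothetical: the point is that every peer applies the same hash rule, so any peer can compute which node is responsible for a key without flooding.

```python
import hashlib

def h(value: str) -> int:
    """Map a string onto a small fixed identifier space via SHA-1."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** 16)

def responsible_node(key: str, nodes: list) -> str:
    """Consistent-hashing rule: the first node clockwise on the hash
    ring from the key's position owns the key."""
    key_id = h(key)
    ring = sorted(nodes, key=h)
    for node in ring:
        if h(node) >= key_id:
            return node
    return ring[0]  # wrap around the ring

nodes = ["node-a", "node-b", "node-c", "node-d"]
owner = responsible_node("song.mp3", nodes)
# Every peer computes the same owner, so lookups need no flooding.
# When a node joins or leaves, only the keys adjacent to it on the
# ring move -- the "extra work" the slide mentions.
assert owner in nodes
```

The invariant maintained here (sorted ring ownership) is the structured-overlay trade-off: exact lookups become cheap, but membership changes require repair work.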
-
Architecture Comparisons
FreeNet: decentralized, but semi-structured
Intended for file storage
Files are stored along a route biased by hints
Queries for files follow a route biased by the same hints
Scales very well
Problem: would it really work?
Simulation says yes in most cases, but no proof so far
-
Our Focus: Gnutella-Style Systems
Advantages of Gnutella:
Supports more flexible queries
Typically, precise name search is a small portion of all queries
Simplicity, high resilience against node failures
Problems of Gnutella: scalability
Bottleneck: interrupt rates on individual nodes
Self-limiting network: nodes have to exit to get real work done!
-
Evaluation Methodologies
Simulation based:
Network topology
Distribution of object popularity
Distribution of replication density of objects
-
Evaluation Methods
Network topologies:
Uniform Random Graph (Random)
Average and median node degree is 4
Power-Law Random Graph (PLRG)
Max node degree: 1746, median: 1, average: 4.46
Gnutella network snapshot (Gnutella)
Oct 2000 snapshot; max degree: 136, median: 2, average: 5.5
Two-dimensional grid (Grid)
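Two of these topologies can be generated with a short standard-library sketch. These are illustrative stand-ins (a simple G(n, m)-style random graph and a plain grid), not the exact graphs used in the study:

```python
import random

def random_graph(n: int, avg_degree: int = 4, seed: int = 1) -> dict:
    """Uniform random graph: add random edges until the average
    degree reaches the target (4, as in the Random topology)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    target_edges = n * avg_degree // 2
    edges = 0
    while edges < target_edges:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v); adj[v].add(u)
            edges += 1
    return adj

def grid_graph(side: int) -> dict:
    """Two-dimensional grid: node (r, c) is encoded as r*side + c."""
    adj = {v: set() for v in range(side * side)}
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:
                adj[v].add(v + 1); adj[v + 1].add(v)
            if r + 1 < side:
                adj[v].add(v + side); adj[v + side].add(v)
    return adj

g = random_graph(1000)
avg = sum(len(nbrs) for nbrs in g.values()) / len(g)
# avg is exactly 4.0, matching the Random topology above.
```

A PLRG would instead draw a power-law degree sequence and wire stubs together (configuration model); the mechanics are the same adjacency-dict representation.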
-
Modeling Methods
Object popularity distribution p_i:
Uniform
Zipf-like
Object replication density distribution r_i:
Uniform
Proportional: r_i ∝ p_i
Square-root: r_i ∝ √p_i
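The two popularity models and three replication strategies can be sketched directly. The Zipf exponent and object count below are assumptions for illustration; each distribution is normalized:

```python
import math

def uniform_popularity(m: int) -> list:
    return [1.0 / m] * m

def zipf_like(m: int, alpha: float = 0.8) -> list:
    """p_i proportional to 1 / i^alpha (alpha = 0.8 is an assumed exponent)."""
    w = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    s = sum(w)
    return [x / s for x in w]

def replication(p: list, total: float, strategy: str) -> list:
    """r_i under the three strategies, normalized so sum(r) == total."""
    if strategy == "uniform":
        w = [1.0] * len(p)
    elif strategy == "proportional":
        w = list(p)
    else:  # "square-root"
        w = [math.sqrt(x) for x in p]
    s = sum(w)
    return [total * x / s for x in w]

p = zipf_like(100)
r = replication(p, total=1000.0, strategy="square-root")
# Popular objects still get more copies under square-root,
# but the skew is dampened relative to proportional.
assert r[0] > r[-1]
```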
-
Evaluation Metrics
Overhead: average # of messages per node per query
Probability of search success: Pr(success)
Delay: # of hops till success
-
Load on Individual Nodes
Why is a node interrupted?
To process a query
To route the query to other nodes
To process duplicate queries sent to it
-
Duplication in Flooding-Based Searches
Duplication increases as TTL increases in flooding
Worst case: a node A is interrupted by N * q * degree(A) messages
[Figure: example flood spreading through nodes 1-8, showing duplicate deliveries]
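The duplicate-delivery effect can be measured with a minimal sketch of TTL-limited flooding. The graph here is the slides' Grid topology (chosen because duplicates are guaranteed by its cycles); the forwarding rule is the usual one of sending to every neighbor except the sender:

```python
def flood(adj, source, ttl):
    """TTL-limited flooding: each node forwards the query to every
    neighbor except the one it heard it from.
    Returns (total messages, duplicate deliveries)."""
    seen = {source}
    frontier = [(source, None)]   # (node, node it heard the query from)
    messages = duplicates = 0
    for _ in range(ttl):
        nxt = []
        for node, parent in frontier:
            for nbr in adj[node]:
                if nbr == parent:
                    continue
                messages += 1
                if nbr in seen:
                    duplicates += 1   # nbr is interrupted again, uselessly
                else:
                    seen.add(nbr)
                    nxt.append((nbr, node))
        frontier = nxt
    return messages, duplicates

# 14x14 grid (the Grid topology); node (r, c) encoded as r*14 + c.
side = 14
adj = {v: set() for v in range(side * side)}
for r in range(side):
    for c in range(side):
        v = r * side + c
        if c + 1 < side:
            adj[v].add(v + 1); adj[v + 1].add(v)
        if r + 1 < side:
            adj[v].add(v + side); adj[v + side].add(v)

msgs3, dup3 = flood(adj, 0, ttl=3)
msgs7, dup7 = flood(adj, 0, ttl=7)
# Duplicate deliveries grow with TTL, matching the slide's observation.
```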
-
Duplications in Various Network Topologies
[Figure: % duplicate msgs vs TTL (2-9) under flooding, for the Random, PLRG, Gnutella, and Grid topologies]
-
Relationship between TTL and Search Successes
[Figure: Pr(success) (%) vs TTL (2-9) under flooding, for the Random, PLRG, Gnutella, and Grid topologies]
-
Problems with Simple TTL-Based Flooding
Hard to choose TTL:
For objects that are widely present in the network, small TTLs suffice
For objects that are rare in the network, large TTLs are necessary
Number of query messages grows exponentially as TTL grows
-
Idea #1: Adaptively Adjust TTL
Expanding Ring:
Multiple floods: start with TTL=1; increment TTL by 2 each time until the search succeeds
Success varies by network topology:
For Random, 30- to 70-fold reduction in message traffic
For Power-law and Gnutella graphs, only 3- to 9-fold reduction
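The expanding-ring scheme can be sketched on top of a plain TTL flood. This is a minimal illustration, not the paper's simulator; `flood_search`, the early-stop rule, and the ring graph are assumptions:

```python
def flood_search(adj, source, ttl, has_object):
    """TTL-limited flood that stops as soon as the object is found.
    Returns (found, messages sent)."""
    if has_object(source):
        return True, 0
    seen = {source}
    frontier = [source]
    messages = 0
    for _ in range(ttl):
        nxt = []
        for node in frontier:
            for nbr in adj[node]:
                messages += 1
                if nbr not in seen:
                    seen.add(nbr)
                    nxt.append(nbr)
                    if has_object(nbr):
                        return True, messages
        frontier = nxt
    return False, messages

def expanding_ring(adj, source, has_object, max_ttl=9):
    """Start with TTL=1 and increment TTL by 2 until the search succeeds,
    as on the slide."""
    total, ttl = 0, 1
    while ttl <= max_ttl:
        found, msgs = flood_search(adj, source, ttl, has_object)
        total += msgs
        if found:
            return True, total
        ttl += 2
    return False, total

# A 50-node ring: the object sits 5 hops from the querier, so the
# TTL=1 and TTL=3 floods fail and the TTL=5 flood succeeds.
ring = {v: {(v - 1) % 50, (v + 1) % 50} for v in range(50)}
found, msgs = expanding_ring(ring, 0, lambda v: v == 5)
```

The cost of the repeated small floods is what limits the savings on PLRG and Gnutella graphs: high-degree hubs make even small-TTL floods expensive.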
-
Limitations of Expanding Ring
[Figure: # nodes visited vs TTL (2-9) under flooding, for the Random, PLRG, Gnutella, and Grid topologies; counts range up to 12000]
-
Idea #2: Random Walk
Simple random walk: takes too long to find anything!
Multiple-walker random walk:
N agents, each walking T steps, visit as many nodes as 1 agent walking N*T steps
When to terminate the search: check back with the query originator once every C steps
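The multiple-walker scheme with periodic check-back can be sketched as follows. The walker count, check interval C, and the test graph are illustrative assumptions:

```python
import random

def k_walker_search(adj, source, has_object, walkers=16, check_every=4,
                    max_steps=10_000, seed=0):
    """k walkers step simultaneously; every `check_every` steps they
    check back with the originator, and all stop once any walker has
    succeeded. Returns (success, messages sent)."""
    rng = random.Random(seed)
    positions = [source] * walkers
    messages = 0
    success = False
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            nxt = rng.choice(sorted(adj[node]))  # one message per step
            positions[i] = nxt
            messages += 1
            if has_object(nxt):
                success = True
        if success and step % check_every == 0:
            return True, messages  # check-back terminates all walkers
    return success, messages

# 30-node ring with replicas of the object at every 5th node.
ring = {v: {(v - 1) % 30, (v + 1) % 30} for v in range(30)}
ok, msgs = k_walker_search(ring, 0, lambda v: v % 5 == 0 and v != 0)
```

Unlike flooding, message count grows linearly in steps taken, and the check-back bounds the wasted work after success to at most C steps per walker.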
-
Search Traffic Comparison
Avg. # msgs per node per query:
         Random    Gnutella
Flood    1.863     2.85
Ring     0.053     0.961
Walk     0.027     0.031
-
Search Delay Comparison
# hops till success:
         Random    Gnutella
Flood    2.51      2.39
Ring     4.03      3.4
Walk     9.12      7.3
-
Lessons Learnt about Search Methods
Adaptive termination
Minimize message duplication
Small expansion in each step
-
Flexible Replication
In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters
Limited node storage => what's the optimal replication density distribution?
In Gnutella, only nodes that query an object store it => r_i ∝ p_i
What if we have different replication strategies?
-
Optimal r_i Distribution
Goal: minimize Σ( p_i / r_i ), subject to Σ r_i = R
Calculation: introduce Lagrange multiplier λ, and find the r_i and λ that minimize:
Σ( p_i / r_i ) + λ * (Σ r_i - R)
=> λ - p_i / r_i² = 0 for all i
=> r_i ∝ √p_i
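The Lagrange condition above can be checked numerically: among allocations with the same total R, the square-root allocation r_i ∝ √p_i minimizes Σ p_i/r_i. The Zipf exponent, object count, and R below are assumptions for illustration:

```python
import math

def cost(p, r):
    """Objective from the slide: sum of p_i / r_i."""
    return sum(pi / ri for pi, ri in zip(p, r))

def allocate(weights, total):
    """Scale weights so they sum to `total` (the constraint sum(r) = R)."""
    s = sum(weights)
    return [total * w / s for w in weights]

# Zipf-like popularity over 50 objects (exponent 1.0 is an assumption).
p = allocate([1.0 / i for i in range(1, 51)], 1.0)
R = 500.0

uniform = allocate([1.0] * 50, R)
proportional = allocate(p, R)
square_root = allocate([math.sqrt(x) for x in p], R)

# The Lagrange solution r_i proportional to sqrt(p_i) gives the lowest
# cost; uniform and proportional turn out to give the same cost m/R.
assert cost(p, square_root) < cost(p, uniform)
assert cost(p, square_root) < cost(p, proportional)
```

The tie between uniform and proportional is easy to see analytically: with Σ p_i = 1, both give Σ p_i/r_i = m/R, while the square-root allocation achieves (Σ √p_i)²/R.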
-
Square-Root Distribution
General principle: to minimize Σ( p_i / r_i ) under the constraint Σ r_i = R, make r_i proportional to the square root of p_i
Other application examples:
Bandwidth allocation to minimize expected download times
Server load balancing to minimize expected request latency
-
Achieving Square-Root Distribution
Suggestions from some heuristics:
Store an object at a number of nodes proportional to the number of nodes visited in order to find the object
Each node uses random replacement
Two implementations:
Path replication: store the object along the path of a successful walk
Random replication: store the object randomly among nodes visited by the agents
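The two implementations differ only in which of the visited nodes receive copies, so they can be sketched together. The replication factor and the store model below are assumptions for illustration:

```python
import random

def replicate(stores, path, obj, mode, rng, factor=0.5):
    """After a walk that visited `path` finds `obj`, create copies at a
    number of nodes proportional to the path length (the heuristic on
    the slide), placed per the chosen implementation."""
    copies = max(1, int(factor * len(path)))
    copies = min(copies, len(path))
    if mode == "path":
        targets = path[-copies:]            # along the successful walk
    else:  # "random"
        targets = rng.sample(path, copies)  # random among visited nodes
    for node in targets:
        stores[node].add(obj)

rng = random.Random(0)
stores = {v: set() for v in range(10)}
visited = [0, 3, 7, 2, 9, 4]   # nodes a successful walk passed through
replicate(stores, visited, "objA", "path", rng)
replicate(stores, visited, "objA", "random", rng)
total_copies = sum("objA" in s for s in stores.values())
# Copies only ever land on visited nodes; with random replacement at
# each node, repeated queries drive r_i toward the square-root shape.
```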
-
Evaluation of Replication Methods
Metrics:
Overall message traffic
Search delay
Dynamic simulation:
Assume Zipf-like object query probability
5 queries/sec Poisson arrival
Results are collected during 5000-9000 sec
-
Distribution of r_i
[Figure: replication distribution under path replication: normalized replication ratio (log scale, 0.001-1) vs object rank (1-100), with curves for the measured results and the square-root distribution]
-
Total Search Message Comparison
Observation: path replication is slightly inferior to random replication
[Figure: avg. # msgs per node (5000-9000 sec) for Owner, Path, and Random replication; counts range up to 60000]
-
Search Delay Comparison
[Figure: dynamic simulation, hop distribution (5000-9000 s): % of queries finished vs # hops (1-256), for Owner, Path, and Random replication]
-
Summary
Multi-walker random walk scales much better than flooding
It won't scale as perfectly as a structured network, but current unstructured networks can be improved significantly
Square-root replication distribution is desirable and can be achieved via path replication