just a test document
TRANSCRIPT
-
8/7/2019 Just a test document
1/32
-
Disclaimer
Results, statements, and opinions in this talk do not represent Cisco in any way
This talk is about technical problems in networking, and does not discuss moral, legal, or other issues related to P2P networks and their applications
-
Outline
Brief survey of P2P architectures
Evaluation methodologies
Search methods
Replication strategies and analysis
Simulation results
-
Characteristics of Peer-to-Peer Networks
Unregulated overlay network
Current application: file swapping
Dynamic: nodes join and leave frequently
Example systems:
Napster, Gnutella, Freenet, FreeHaven, MojoNation, Alpine, JXTA, Ohaha, Chord, CAN, Past, Tapestry, OceanStore, ...
-
Architecture Comparisons
Napster: centralized
A central website holds the file directory of all participants
Very efficient
Scales
Problem: single point of failure
Gnutella: decentralized
No central directory; uses flooding with TTL
Very resilient against failure
Problem: doesn't scale
-
Architecture Comparisons
Various research projects such as CAN: decentralized, but structured
CAN: distributed hash table
Structure: all nodes participate in a precise scheme to maintain certain invariants
Extra work when nodes join and leave
Scales very well, but can be fragile
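The "precise scheme" behind a distributed hash table can be illustrated with a minimal sketch. This is a generic consistent-hashing illustration, not CAN's actual d-dimensional coordinate space, and the node names are hypothetical: the point is that every peer applies the same hash rule, so any peer can compute which node is responsible for a key without flooding.

```python
import hashlib

def h(value: str) -> int:
    """Map a string onto a small fixed identifier space via SHA-1."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** 16)

def responsible_node(key: str, nodes: list) -> str:
    """Consistent-hashing rule: the first node clockwise on the hash
    ring from the key's position owns the key."""
    key_id = h(key)
    ring = sorted(nodes, key=h)
    for node in ring:
        if h(node) >= key_id:
            return node
    return ring[0]  # wrap around the ring

nodes = ["node-a", "node-b", "node-c", "node-d"]
owner = responsible_node("song.mp3", nodes)
# Every peer computes the same owner, so lookups need no flooding.
# When a node joins or leaves, only the keys adjacent to it on the
# ring move -- the "extra work" the slide mentions.
assert owner in nodes
```

The invariant maintained here (sorted ring ownership) is the structured-overlay trade-off: exact lookups become cheap, but membership changes require repair work.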
-
Architecture Comparisons
FreeNet: decentralized, but semi-structured
Intended for file storage
Files are stored along a route biased by hints
Queries for files follow a route biased by the same hints
Scales very well
Problem: would it really work?
Simulation says yes in most cases, but no proof so far
-
Our Focus: Gnutella-Style Systems
Advantages of Gnutella:
Supports more flexible queries
Typically, precise name search is a small portion of all queries
Simplicity, high resilience against node failures
Problems of Gnutella: scalability
Bottleneck: interrupt rates on individual nodes
Self-limiting network: nodes have to exit to get real work done!
-
Evaluation Methodologies
Simulation based:
Network topology
Distribution of object popularity
Distribution of replication density of objects
-
Evaluation Methods
Network topologies:
Uniform Random Graph (Random)
Average and median node degree is 4
Power-Law Random Graph (PLRG)
Max node degree: 1746, median: 1, average: 4.46
Gnutella network snapshot (Gnutella)
Oct 2000 snapshot; max degree: 136, median: 2, average: 5.5
Two-dimensional grid (Grid)
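Two of these topologies can be generated with a short standard-library sketch. These are illustrative stand-ins (a simple G(n, m)-style random graph and a plain grid), not the exact graphs used in the study:

```python
import random

def random_graph(n: int, avg_degree: int = 4, seed: int = 1) -> dict:
    """Uniform random graph: add random edges until the average
    degree reaches the target (4, as in the Random topology)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    target_edges = n * avg_degree // 2
    edges = 0
    while edges < target_edges:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:
            adj[u].add(v); adj[v].add(u)
            edges += 1
    return adj

def grid_graph(side: int) -> dict:
    """Two-dimensional grid: node (r, c) is encoded as r*side + c."""
    adj = {v: set() for v in range(side * side)}
    for r in range(side):
        for c in range(side):
            v = r * side + c
            if c + 1 < side:
                adj[v].add(v + 1); adj[v + 1].add(v)
            if r + 1 < side:
                adj[v].add(v + side); adj[v + side].add(v)
    return adj

g = random_graph(1000)
avg = sum(len(nbrs) for nbrs in g.values()) / len(g)
# avg is exactly 4.0, matching the Random topology above.
```

A PLRG would instead draw a power-law degree sequence and wire stubs together (configuration model); the mechanics are the same adjacency-dict representation.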
-
Modeling Methods
Object popularity distribution p_i:
Uniform
Zipf-like
Object replication density distribution r_i:
Uniform
Proportional: r_i ∝ p_i
Square-root: r_i ∝ √p_i
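The two popularity models and three replication strategies can be sketched directly. The Zipf exponent and object count below are assumptions for illustration; each distribution is normalized:

```python
import math

def uniform_popularity(m: int) -> list:
    return [1.0 / m] * m

def zipf_like(m: int, alpha: float = 0.8) -> list:
    """p_i proportional to 1 / i^alpha (alpha = 0.8 is an assumed exponent)."""
    w = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    s = sum(w)
    return [x / s for x in w]

def replication(p: list, total: float, strategy: str) -> list:
    """r_i under the three strategies, normalized so sum(r) == total."""
    if strategy == "uniform":
        w = [1.0] * len(p)
    elif strategy == "proportional":
        w = list(p)
    else:  # "square-root"
        w = [math.sqrt(x) for x in p]
    s = sum(w)
    return [total * x / s for x in w]

p = zipf_like(100)
r = replication(p, total=1000.0, strategy="square-root")
# Popular objects still get more copies under square-root,
# but the skew is dampened relative to proportional.
assert r[0] > r[-1]
```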
-
Evaluation Metrics
Overhead: average # of messages per node per query
Probability of search success: Pr(success)
Delay: # of hops till success
-
Load on Individual Nodes
Why is a node interrupted?
To process a query
To route the query to other nodes
To process duplicate queries sent to it
-
Duplication in Flooding-Based Searches
Duplication increases as TTL increases in flooding
Worst case: a node A is interrupted by N * q * degree(A) messages
[Figure: example flood spreading through nodes 1-8, showing duplicate deliveries]
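The duplicate-delivery effect can be measured with a minimal sketch of TTL-limited flooding. The graph here is the slides' Grid topology (chosen because duplicates are guaranteed by its cycles); the forwarding rule is the usual one of sending to every neighbor except the sender:

```python
def flood(adj, source, ttl):
    """TTL-limited flooding: each node forwards the query to every
    neighbor except the one it heard it from.
    Returns (total messages, duplicate deliveries)."""
    seen = {source}
    frontier = [(source, None)]   # (node, node it heard the query from)
    messages = duplicates = 0
    for _ in range(ttl):
        nxt = []
        for node, parent in frontier:
            for nbr in adj[node]:
                if nbr == parent:
                    continue
                messages += 1
                if nbr in seen:
                    duplicates += 1   # nbr is interrupted again, uselessly
                else:
                    seen.add(nbr)
                    nxt.append((nbr, node))
        frontier = nxt
    return messages, duplicates

# 14x14 grid (the Grid topology); node (r, c) encoded as r*14 + c.
side = 14
adj = {v: set() for v in range(side * side)}
for r in range(side):
    for c in range(side):
        v = r * side + c
        if c + 1 < side:
            adj[v].add(v + 1); adj[v + 1].add(v)
        if r + 1 < side:
            adj[v].add(v + side); adj[v + side].add(v)

msgs3, dup3 = flood(adj, 0, ttl=3)
msgs7, dup7 = flood(adj, 0, ttl=7)
# Duplicate deliveries grow with TTL, matching the slide's observation.
```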
-
Duplications in Various Network Topologies
[Figure: % duplicate msgs vs TTL (2-9) under flooding, for the Random, PLRG, Gnutella, and Grid topologies]
-
Relationship between TTL and Search Successes
[Figure: Pr(success) (%) vs TTL (2-9) under flooding, for the Random, PLRG, Gnutella, and Grid topologies]
-
Problems with Simple TTL-Based Flooding
Hard to choose TTL:
For objects that are widely present in the network, small TTLs suffice
For objects that are rare in the network, large TTLs are necessary
Number of query messages grows exponentially as TTL grows
-
Idea #1: Adaptively Adjust TTL
Expanding Ring:
Multiple floods: start with TTL=1; increment TTL by 2 each time until the search succeeds
Success varies by network topology:
For Random, 30- to 70-fold reduction in message traffic
For Power-law and Gnutella graphs, only 3- to 9-fold reduction
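The expanding-ring scheme can be sketched on top of a plain TTL flood. This is a minimal illustration, not the paper's simulator; `flood_search`, the early-stop rule, and the ring graph are assumptions:

```python
def flood_search(adj, source, ttl, has_object):
    """TTL-limited flood that stops as soon as the object is found.
    Returns (found, messages sent)."""
    if has_object(source):
        return True, 0
    seen = {source}
    frontier = [source]
    messages = 0
    for _ in range(ttl):
        nxt = []
        for node in frontier:
            for nbr in adj[node]:
                messages += 1
                if nbr not in seen:
                    seen.add(nbr)
                    nxt.append(nbr)
                    if has_object(nbr):
                        return True, messages
        frontier = nxt
    return False, messages

def expanding_ring(adj, source, has_object, max_ttl=9):
    """Start with TTL=1 and increment TTL by 2 until the search succeeds,
    as on the slide."""
    total, ttl = 0, 1
    while ttl <= max_ttl:
        found, msgs = flood_search(adj, source, ttl, has_object)
        total += msgs
        if found:
            return True, total
        ttl += 2
    return False, total

# A 50-node ring: the object sits 5 hops from the querier, so the
# TTL=1 and TTL=3 floods fail and the TTL=5 flood succeeds.
ring = {v: {(v - 1) % 50, (v + 1) % 50} for v in range(50)}
found, msgs = expanding_ring(ring, 0, lambda v: v == 5)
```

The cost of the repeated small floods is what limits the savings on PLRG and Gnutella graphs: high-degree hubs make even small-TTL floods expensive.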
-
Limitations of Expanding Ring
[Figure: # nodes visited vs TTL (2-9) under flooding, for the Random, PLRG, Gnutella, and Grid topologies; counts range up to 12000]
-
Idea #2: Random Walk
Simple random walk: takes too long to find anything!
Multiple-walker random walk:
N agents, each walking T steps, visit as many nodes as 1 agent walking N*T steps
When to terminate the search: check back with the query originator once every C steps
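The multiple-walker scheme with periodic check-back can be sketched as follows. The walker count, check interval C, and the test graph are illustrative assumptions:

```python
import random

def k_walker_search(adj, source, has_object, walkers=16, check_every=4,
                    max_steps=10_000, seed=0):
    """k walkers step simultaneously; every `check_every` steps they
    check back with the originator, and all stop once any walker has
    succeeded. Returns (success, messages sent)."""
    rng = random.Random(seed)
    positions = [source] * walkers
    messages = 0
    success = False
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            nxt = rng.choice(sorted(adj[node]))  # one message per step
            positions[i] = nxt
            messages += 1
            if has_object(nxt):
                success = True
        if success and step % check_every == 0:
            return True, messages  # check-back terminates all walkers
    return success, messages

# 30-node ring with replicas of the object at every 5th node.
ring = {v: {(v - 1) % 30, (v + 1) % 30} for v in range(30)}
ok, msgs = k_walker_search(ring, 0, lambda v: v % 5 == 0 and v != 0)
```

Unlike flooding, message count grows linearly in steps taken, and the check-back bounds the wasted work after success to at most C steps per walker.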
-
Search Traffic Comparison
Avg. # msgs per node per query:
         Random    Gnutella
Flood    1.863     2.85
Ring     0.053     0.961
Walk     0.027     0.031
-
Search Delay Comparison
# hops till success:
         Random    Gnutella
Flood    2.51      2.39
Ring     4.03      3.4
Walk     9.12      7.3
-
Lessons Learnt about Search Methods
Adaptive termination
Minimize message duplication
Small expansion in each step
-
Flexible Replication
In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters
Limited node storage => what's the optimal replication density distribution?
In Gnutella, only nodes that query an object store it => r_i ∝ p_i
What if we have different replication strategies?
-
Optimal r_i Distribution
Goal: minimize Σ( p_i / r_i ), subject to Σ r_i = R
Calculation: introduce Lagrange multiplier λ, and find the r_i and λ that minimize:
Σ( p_i / r_i ) + λ * (Σ r_i - R)
=> λ - p_i / r_i² = 0 for all i
=> r_i ∝ √p_i
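The Lagrange condition above can be checked numerically: among allocations with the same total R, the square-root allocation r_i ∝ √p_i minimizes Σ p_i/r_i. The Zipf exponent, object count, and R below are assumptions for illustration:

```python
import math

def cost(p, r):
    """Objective from the slide: sum of p_i / r_i."""
    return sum(pi / ri for pi, ri in zip(p, r))

def allocate(weights, total):
    """Scale weights so they sum to `total` (the constraint sum(r) = R)."""
    s = sum(weights)
    return [total * w / s for w in weights]

# Zipf-like popularity over 50 objects (exponent 1.0 is an assumption).
p = allocate([1.0 / i for i in range(1, 51)], 1.0)
R = 500.0

uniform = allocate([1.0] * 50, R)
proportional = allocate(p, R)
square_root = allocate([math.sqrt(x) for x in p], R)

# The Lagrange solution r_i proportional to sqrt(p_i) gives the lowest
# cost; uniform and proportional turn out to give the same cost m/R.
assert cost(p, square_root) < cost(p, uniform)
assert cost(p, square_root) < cost(p, proportional)
```

The tie between uniform and proportional is easy to see analytically: with Σ p_i = 1, both give Σ p_i/r_i = m/R, while the square-root allocation achieves (Σ √p_i)²/R.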
-
Square-Root Distribution
General principle: to minimize Σ( p_i / r_i ) under the constraint Σ r_i = R, make r_i proportional to the square root of p_i
Other application examples:
Bandwidth allocation to minimize expected download times
Server load balancing to minimize expected request latency
-
Achieving Square-Root Distribution
Suggestions from some heuristics:
Store an object at a number of nodes proportional to the number of nodes visited in order to find the object
Each node uses random replacement
Two implementations:
Path replication: store the object along the path of a successful walk
Random replication: store the object randomly among nodes visited by the agents
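The two implementations differ only in which of the visited nodes receive copies, so they can be sketched together. The replication factor and the store model below are assumptions for illustration:

```python
import random

def replicate(stores, path, obj, mode, rng, factor=0.5):
    """After a walk that visited `path` finds `obj`, create copies at a
    number of nodes proportional to the path length (the heuristic on
    the slide), placed per the chosen implementation."""
    copies = max(1, int(factor * len(path)))
    copies = min(copies, len(path))
    if mode == "path":
        targets = path[-copies:]            # along the successful walk
    else:  # "random"
        targets = rng.sample(path, copies)  # random among visited nodes
    for node in targets:
        stores[node].add(obj)

rng = random.Random(0)
stores = {v: set() for v in range(10)}
visited = [0, 3, 7, 2, 9, 4]   # nodes a successful walk passed through
replicate(stores, visited, "objA", "path", rng)
replicate(stores, visited, "objA", "random", rng)
total_copies = sum("objA" in s for s in stores.values())
# Copies only ever land on visited nodes; with random replacement at
# each node, repeated queries drive r_i toward the square-root shape.
```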
-
Evaluation of Replication Methods
Metrics:
Overall message traffic
Search delay
Dynamic simulation:
Assume Zipf-like object query probability
5 queries/sec Poisson arrival
Results are collected during 5000-9000 sec
-
Distribution of r_i
[Figure: replication distribution under path replication: normalized replication ratio (log scale, 0.001-1) vs object rank (1-100), with curves for the measured results and the square-root distribution]
-
Total Search Message Comparison
Observation: path replication is slightly inferior to random replication
[Figure: avg. # msgs per node (5000-9000 sec) for Owner, Path, and Random replication; counts range up to 60000]
-
Search Delay Comparison
[Figure: dynamic simulation, hop distribution (5000-9000 s): % of queries finished vs # hops (1-256), for Owner, Path, and Random replication]
-
Summary
Multi-walker random walk scales much better than flooding
It won't scale as perfectly as a structured network, but current unstructured networks can be improved significantly
Square-root replication distribution is desirable and can be achieved via path replication