just a test document

Upload: hasrulmaruf

Posted on 08-Apr-2018

TRANSCRIPT

  • 8/7/2019 Just a test document

    1/32

  • 2/32

    Disclaimer

    Results, statements, and opinions in this talk do not represent Cisco in any way

    This talk is about technical problems in networking and does not discuss moral, legal, or other issues related to P2P networks and their applications

  • 3/32

    Outline

    Brief survey of P2P architectures

    Evaluation methodologies

    Search methods

    Replication strategies and analysis

    Simulation results

  • 4/32

    Characteristics of Peer-to-Peer Networks

    Unregulated overlay network

    Current application: file swapping

    Dynamic: nodes join or leave frequently

    Example systems:

    Napster, Gnutella; Freenet, FreeHaven, MojoNation, Alpine, ...

    JXTA, Ohaha

    Chord, CAN, PAST, Tapestry, OceanStore

  • 5/32

    Architecture Comparisons

    Napster: centralized

    A central website holds the file directory of all participants; very efficient

    Scales

    Problem: single point of failure

    Gnutella: decentralized

    No central directory; uses flooding with TTL

    Very resilient against failure

    Problem: doesn't scale

  • 6/32

    Architecture Comparisons

    Various research projects such as CAN: decentralized, but structured

    CAN: distributed hash table

    Structure: all nodes participate in a precise scheme to maintain certain invariants

    Extra work when nodes join and leave

    Scales very well, but can be fragile

  • 7/32

    Architecture Comparisons

    Freenet: decentralized, but semi-structured

    Intended for file storage

    Files are stored along a route biased by hints

    Queries for files follow a route biased by the same hints

    Scales very well

    Problem: would it really work?

    Simulation says yes in most cases, but there is no proof so far

  • 8/32

    Our Focus: Gnutella-Style Systems

    Advantages of Gnutella:

    Supports more flexible queries

    Typically, precise name search is a small portion of all queries

    Simplicity, high resilience against node failures

    Problems of Gnutella: scalability

    Bottleneck: interrupt rates on individual nodes

    Self-limiting network: nodes have to exit to get real work done!

  • 9/32

    Evaluation Methodologies

    Simulation-based:

    Network topology

    Distribution of object popularity

    Distribution of replication density of objects

  • 10/32

    Evaluation Methods

    Network topologies:

    Uniform Random Graph (Random)

    Average and median node degree is 4

    Power-Law Random Graph (PLRG)

    Max node degree: 1746, median: 1, average: 4.46

    Gnutella network snapshot (Gnutella): Oct 2000 snapshot

    Max degree: 136, median: 2, average: 5.5

    Two-dimensional grid (Grid)
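    Below is one way to build the two synthetic topologies for experimentation. It is a minimal sketch under stated assumptions, not the generator behind these results. `uniform_random_graph` adds distinct random edges until the target average degree is reached. `power_law_random_graph` uses the configuration model: it draws power-law degrees, then pairs edge stubs at random. Both function names and the exponent 2.5 are illustrative assumptions, since the slide does not give the PLRG exponent.

```python
import random


def uniform_random_graph(n, avg_degree, seed=0):
    """Hypothetical helper: random simple graph with n * avg_degree / 2 distinct edges."""
    rng = random.Random(seed)
    edges_needed = n * avg_degree // 2
    if edges_needed > n * (n - 1) // 2:
        raise ValueError("more edges requested than a simple graph can hold")
    adj = {u: set() for u in range(n)}
    while edges_needed > 0:
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v and v not in adj[u]:  # no self-loops, no duplicate edges
            adj[u].add(v)
            adj[v].add(u)
            edges_needed -= 1
    return adj


def power_law_random_graph(n, exponent=2.5, seed=0):
    """Hypothetical helper: configuration-model PLRG, P(degree = k) roughly ~ k**-exponent."""
    rng = random.Random(seed)
    # paretovariate(a) has density ~ x**-(a + 1), so a = exponent - 1; int() gives degree >= 1.
    degrees = [min(int(rng.paretovariate(exponent - 1)), n - 1) for _ in range(n)]
    stubs = [u for u, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        stubs.pop()  # every edge consumes two stubs
    rng.shuffle(stubs)
    adj = {u: set() for u in range(n)}
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:  # drop self-loops; the sets silently drop repeated edges
            adj[u].add(v)
            adj[v].add(u)
    return adj
```

    Self-loops and repeated edges are discarded, so realized PLRG degrees can fall slightly below the sampled ones, and a few nodes can end up isolated.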

  • 11/32

    Modeling Methods

    Object popularity distribution $p_i$:

    Uniform

    Zipf-like

    Object replication density distribution $r_i$:

    Uniform

    Proportional: $r_i \propto p_i$

    Square-Root: $r_i \propto \sqrt{p_i}$
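    A short sketch makes the three replication strategies concrete. It assumes a Zipf-like popularity $p_i \propto 1/i^s$, where the exponent `s` is a hypothetical parameter. It also treats $r_i$ as a fractional number of copies drawn from a total budget $R$. The names `zipf_popularity` and `allocate` are illustrative and do not come from the talk.

```python
import math


def zipf_popularity(m, s=1.0):
    """Zipf-like query popularity: p_i proportional to 1 / i**s for ranks i = 1..m."""
    weights = [1.0 / (i ** s) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def allocate(p, total_copies, strategy):
    """Split total_copies (R) among objects; returns fractional r_i that sum to R."""
    if strategy == "uniform":
        weights = [1.0] * len(p)
    elif strategy == "proportional":
        weights = list(p)  # r_i proportional to p_i
    elif strategy == "square-root":
        weights = [math.sqrt(pi) for pi in p]  # r_i proportional to sqrt(p_i)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    scale = total_copies / sum(weights)
    return [scale * w for w in weights]
```

    With four objects and $s = 1$, proportional allocation gives the most popular object 4 times as many copies as the least popular one. Square-root allocation gives it only 2 times as many.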

  • 12/32

    Evaluation Metrics

    Overhead: average # of messages per node per query

    Probability of search success: Pr(success)

    Delay: # of hops till success

  • 13/32

    Load on Individual Nodes

    Why is a node interrupted:

    To process a query

    To route the query to other nodes

    To process duplicated queries sent to it

  • 14/32

    Duplication in Flooding-Based Searches

    Duplication increases as TTL increases in flooding

    Worst case: a node A is interrupted by N * q * degree(A) messages

    [Figure: flooding example on nodes 1 to 8]
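    A small TTL-limited flooding sketch reproduces the duplication effect. It makes three assumptions. The overlay is an adjacency dict. Each node forwards a query once, to all neighbors except the one it received it from. A "duplicate" is any message delivered to a node that has already seen the query. `flood` and `has_object` are hypothetical names.

```python
from collections import deque


def flood(adj, source, ttl, has_object=lambda node: False):
    """TTL-limited flooding sketch (hypothetical helper).

    Each node forwards the query once, to every neighbor except the one it got it from.
    Returns (messages, duplicates, found); a duplicate is a message delivered to a node
    that had already seen the query.
    """
    seen = {source}
    queue = deque([(source, None, ttl)])  # (node, sender, remaining TTL); FIFO = BFS order
    messages = duplicates = 0
    found = has_object(source)
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue  # TTL exhausted: this copy is not forwarded
        for nbr in adj[node]:
            if nbr == sender:
                continue
            messages += 1
            if nbr in seen:
                duplicates += 1  # interrupts nbr without reaching anyone new
                continue
            seen.add(nbr)
            found = found or has_object(nbr)
            queue.append((nbr, node, hops_left - 1))
    return messages, duplicates, found
```

    On the complete graph of 4 nodes, a TTL-2 flood from node 0 sends 9 messages, and 6 of them are duplicates. Every node is already reached after the first hop, so every later message is pure overhead.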

  • 15/32

    Duplications in Various Network Topologies

    [Figure: Flooding: % duplicate msgs vs. TTL (2 to 9) for the Random, PLRG, Gnutella, and Grid topologies]

  • 16/32

    Relationship between TTL and Search Successes

    [Figure: Flooding: Pr(success) (%) vs. TTL (2 to 9) for the Random, PLRG, Gnutella, and Grid topologies]

  • 17/32

    Problems with Simple TTL-Based Flooding

    Hard to choose TTL:

    For objects that are widely present in the network, small TTLs suffice

    For objects that are rare in the network, large TTLs are necessary

    Number of query messages grows exponentially as TTL grows
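    A rough worked count shows why. Assume, as an idealization not taken from the slides, that every node has degree $d$ and the flood meets no cycles within $t$ hops. The source sends $d$ messages, and every newly reached node forwards to its other $d - 1$ neighbors. A flood with TTL $t$ therefore sends

    $$M(t) = \sum_{k=1}^{t} d \, (d-1)^{k-1} = \frac{d \left( (d-1)^{t} - 1 \right)}{d - 2}, \qquad d > 2$$

    For $d = 4$ (the average degree of the Random topology), this gives $M(1), \dots, M(4) = 4, 16, 52, 160$. Each extra hop multiplies the traffic by roughly $d - 1 = 3$.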

  • 18/32

    Idea #1: Adaptively Adjust TTL

    Expanding Ring

    Multiple floods: start with TTL=1; increment TTL by 2 each time until search succeeds

    Success varies by network topology

    For Random, 30- to 70-fold reduction in message traffic

    For Power-law and Gnutella graphs, only 3- to 9-fold reduction
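    The expanding-ring idea can be sketched on top of a simple level-by-level flood. In this sketch, `flood_once` and `expanding_ring` are hypothetical names. Each newly reached node forwards the query to all of its neighbors, including the one it came from, which slightly overcounts messages compared with a real Gnutella node. TTLs follow the slide's schedule of 1, 3, 5, ... up to an assumed cap of 9.

```python
def flood_once(adj, source, ttl, has_object):
    """One TTL-limited flood, level by level (hypothetical helper).

    Every newly reached node forwards to all its neighbors (simplification: including
    the one it came from). Returns (messages sent, whether a reached node holds the object).
    """
    seen = {source}
    frontier = [source]
    messages = 0
    found = has_object(source)
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for nbr in adj[node]:
                messages += 1
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
                    found = found or has_object(nbr)
        frontier = next_frontier
    return messages, found


def expanding_ring(adj, source, has_object, start_ttl=1, step=2, max_ttl=9):
    """Repeat the flood with TTL = 1, 3, 5, ... until it succeeds.

    Returns (successful TTL or None, total messages over all floods).
    """
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        messages, found = flood_once(adj, source, ttl, has_object)
        total += messages
        if found:
            return ttl, total
        ttl += step
    return None, total
```

    Take the five-node path 0-1-2-3-4 with the object at node 3. The ring stops at TTL 3 after 6 messages in total, whereas one flood with TTL 9 sends 8. The saving comes from not overshooting the TTL. The price is that the inner rings are flooded again on every round.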

  • 19/32

    Limitations of Expanding Ring

    [Figure: Flooding: # nodes visited vs. TTL (2 to 9) for the Random, PLRG, Gnutella, and Grid topologies]

  • 20/32

    Idea #2: Random Walk

    Simple random walk takes too long to find anything!

    Multiple-walker random walk

    N agents, each walking T steps, visit as many nodes as 1 agent walking N*T steps

    When to terminate the search: check back with the query originator once every C steps
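    A multiple-walker random walk with periodic check-back can be sketched as follows. The sketch rests on assumptions that are not details from the talk. The overlay is a dict of adjacency lists, and each walker hop counts as one message. A walker stops as soon as it reaches a node holding the object. The remaining walkers learn that the query is satisfied only at the next multiple of `check_every` (the slide's C) steps. `multi_walker_search` and its parameters are hypothetical names.

```python
import random


def multi_walker_search(adj, source, has_object, walkers=16, check_every=4,
                        max_steps=1024, seed=0):
    """Multiple-walker random walk with periodic check-back (hypothetical helper).

    Returns (first_hit_step, stop_step, walker_messages); first_hit_step is None if no
    walker reached a node holding the object within max_steps.
    """
    rng = random.Random(seed)
    positions = [source] * walkers
    active = [True] * walkers  # a walker stops once it finds the object
    first_hit = None
    messages = 0
    for step in range(1, max_steps + 1):
        for w in range(walkers):
            if not active[w]:
                continue
            positions[w] = rng.choice(adj[positions[w]])  # one hop = one message
            messages += 1
            if has_object(positions[w]):
                active[w] = False
                if first_hit is None:
                    first_hit = step
        # Every check_every steps the walkers ask the originator whether the query
        # is already satisfied; if it is, they all terminate.
        if step % check_every == 0 and first_hit is not None:
            return first_hit, step, messages
    return first_hit, max_steps, messages
```

    The gap between the first hit and the stop step is the price of checking back only every C steps. A smaller C ends the search sooner but sends more check-back messages, which this sketch does not count.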

  • 21/32

    Search Traffic Comparison

    Avg. # msgs per node per query:

                  Flood    Ring     Walk
    Random        1.863    0.053    0.027
    Gnutella      2.85     0.961    0.031

  • 22/32

    Search Delay Comparison

    # hops till success:

                  Flood    Ring     Walk
    Random        2.51     4.03     9.12
    Gnutella      2.39     3.4      7.3

  • 23/32

    Lessons Learnt about Search Methods

    Adaptive termination

    Minimize message duplication

    Small expansion in each step

  • 24/32

    Flexible Replication

    In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters

    Limited node storage => what's the optimal replication density distribution?

    In Gnutella, only nodes that query an object store it => $r_i \propto p_i$

    What if we have different replication strategies?

  • 25/32

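    Optimal $r_i$ Distribution

    Goal: minimize $\sum_i p_i / r_i$, where $\sum_i r_i = R$

    Calculation:

    Introduce a Lagrange multiplier $\lambda$; find $r_i$ and $\lambda$ that minimize:

    $$\sum_i \frac{p_i}{r_i} + \lambda \left( \sum_i r_i - R \right)$$

    $$\Rightarrow \lambda - \frac{p_i}{r_i^2} = 0 \quad \text{for all } i$$

    $$\Rightarrow r_i \propto \sqrt{p_i}$$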
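    The calculation can be carried one step further; this step is not on the slide. Solving the stationarity condition $\lambda - p_i / r_i^2 = 0$ gives $r_i = \sqrt{p_i} / \sqrt{\lambda}$, and the budget constraint then fixes $\lambda$:

    $$\sum_i r_i = \frac{\sum_j \sqrt{p_j}}{\sqrt{\lambda}} = R \;\Rightarrow\; r_i = R \, \frac{\sqrt{p_i}}{\sum_j \sqrt{p_j}}$$

    Substituting back, the minimum value of the objective is

    $$\sum_i \frac{p_i}{r_i} = \frac{1}{R} \Big( \sum_i \sqrt{p_i} \Big)^2$$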

  • 26/32

    Square-Root Distribution

    General principle: to minimize $\sum_i p_i / r_i$ under the constraint $\sum_i r_i = R$, make $r_i$ proportional to the square root of $p_i$

    Other application examples:

    Bandwidth allocation to minimize expected download times

    Server load balancing to minimize expected request latency
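    For two objects, the square-root principle can be checked by brute force. As a modeling simplification, the sketch assumes that a search for object $i$ costs about $1/r_i$, up to a constant factor such as the network size. The average cost over queries is then $\sum_i p_i / r_i$. `expected_cost` and `best_integer_split` are hypothetical names.

```python
def expected_cost(p, r):
    """Objective from the slides: sum of p_i / r_i (lower = smaller expected search size)."""
    return sum(pi / ri for pi, ri in zip(p, r))


def best_integer_split(p1, p2, total):
    """Brute force: copies r1 for object 1 (object 2 gets total - r1) that minimize the cost."""
    return min(range(1, total), key=lambda r1: expected_cost([p1, p2], [r1, total - r1]))
```

    Take $p = (0.8, 0.2)$ and 30 copies. The best split is 20 and 10, a 2:1 ratio that matches $\sqrt{0.8} : \sqrt{0.2} = 2 : 1$. The proportional split (24/6) and the uniform split (15/15) both cost about 0.0667, against 0.06 for the square-root split.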

  • 27/32

    Achieving Square-Root Distribution

    Suggestions from some heuristics:

    Store an object at a number of nodes that is proportional to the number of nodes visited in order to find the object

    Each node uses random replacement

    Two implementations:

    Path replication: store the object along the path of a successful walk

    Random replication: store the object randomly among nodes visited by the agents
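    Both replication schemes can be sketched with per-node stores of fixed capacity and random replacement. This is an illustrative sketch, not the simulator used for the results. `store_copy`, `path_replication`, and `random_replication` are hypothetical names. The sketch also assumes that random replication makes as many copies as the successful path is long, which keeps the two schemes comparable.

```python
import random


def store_copy(stores, capacity, node, obj, rng):
    """Add obj to node's store; if the store is full, evict a uniformly random item."""
    items = stores.setdefault(node, [])
    if obj in items:
        return
    if len(items) >= capacity:
        items.pop(rng.randrange(len(items)))  # random replacement
    items.append(obj)


def path_replication(stores, capacity, obj, walk_path, rng):
    """Copy obj onto every node along the successful walker's path."""
    for node in walk_path:
        store_copy(stores, capacity, node, obj, rng)


def random_replication(stores, capacity, obj, visited, copies, rng):
    """Copy obj onto `copies` nodes drawn uniformly from all nodes the walkers visited.

    Assumption: the caller passes copies = len(walk_path) to match path replication.
    """
    for node in rng.sample(sorted(visited), min(copies, len(visited))):
        store_copy(stores, capacity, node, obj, rng)
```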

  • 28/32

    Evaluation of Replication Methods

    Metrics:

    Overall message traffic

    Search delay

    Dynamic simulation:

    Assume Zipf-like object query probability

    5 queries/sec, Poisson arrivals

    Results are measured during 5000-9000 sec
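    The dynamic workload can be generated as below. The sketch assumes a Zipf-like query distribution $p_i \propto 1/i^s$ over object ranks and exponential inter-arrival times, which is what makes the arrivals Poisson. The function name `query_workload`, the exponent `s`, and the default run length are illustrative assumptions. The slide fixes only the rate of 5 queries/sec.

```python
import random


def query_workload(num_objects, rate=5.0, duration=9000.0, s=1.0, seed=0):
    """Return (time_sec, object_rank) pairs for one simulated run (hypothetical helper)."""
    rng = random.Random(seed)
    ranks = range(1, num_objects + 1)
    weights = [1.0 / (rank ** s) for rank in ranks]  # Zipf-like: p_rank ~ 1 / rank**s
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gaps, mean 1/rate sec => Poisson arrivals
        if t >= duration:
            return events
        events.append((t, rng.choices(ranks, weights=weights)[0]))
```

    To mirror the slide's measurement window, statistics would be collected only from events whose timestamps fall between 5000 and 9000 sec.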

  • 29/32

    Distribution of $r_i$

    [Figure: Replication Distribution: Path Replication. Normalized replication ratio vs. object rank (1 to 100) on log-log axes, compared with the square-root distribution]

  • 30/32

    Total Search Message Comparison

    Observation: path replication is slightly inferior to random replication

    [Figure: Avg. # msgs per node (5000-9000 sec) for Owner Replication, Path Replication, and Random Replication]

  • 31/32

    Search Delay Comparison

    [Figure: Dynamic simulation: Hop Distribution (5000-9000 s). Queries finished (%) vs. # hops (1 to 256) for Owner Replication, Path Replication, and Random Replication]

  • 32/32

    Summary

    Multi-walker random walk scales much better than flooding

    It won't scale as perfectly as structured networks, but current unstructured networks can be improved significantly

    Square-root replication distribution is desirable and can be achieved via path replication