Christos Gkantsidis, Milena Mihail, Amin Saberi. Presented by Paul Bogdan, February 28th, 2007


“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks”

“Random Walks in Peer-to-Peer Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi

Presented by Paul Bogdan

February 28th, 2007


“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi


Outline

• Random Graph Models

• Flooding and Normalization

• Random Walks and Replication

• Generalized Search Schemes

• Experimental evaluation


Motivation

• Flooding with a small time-to-live (TTL) performs well in regular graphs
  • Performance metric: number of exchanged messages per distinct response
  • Its performance decreases when the TTL increases or for irregular networks

• Random Walk performs better than flooding
  • scalability, granularity

• Hybrid + Generalized search schemes:
  • Random Walks with lookahead, Random Walks with 1-step replication
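The messages-per-distinct-response metric can be illustrated with a small sketch. It assumes a simple flooding rule that the slides do not spell out: a node forwards a query only the first time it receives it, to every neighbor except the one it came from. The function name `flood` and the two toy graphs are hypothetical.

```python
def flood(adjacency, source, ttl):
    """Flood from `source` for `ttl` hops.

    Returns (distinct nodes reached, excluding the source; messages sent).
    Assumption: a node forwards only on first receipt, and never back to
    the neighbor it received the query from.
    """
    reached = {source}
    came_from = {source: None}
    frontier = [source]
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency[node]:
                if neighbor == came_from[node]:
                    continue
                messages += 1  # every forwarded copy counts as a message
                if neighbor not in reached:
                    reached.add(neighbor)
                    came_from[neighbor] = node
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(reached) - 1, messages


# A 6-node ring (sparse) and a 4-node complete graph (dense).
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
k4 = {i: [j for j in range(4) if j != i] for i in range(4)}

print(flood(ring, 0, 1))  # (2, 2): one message per distinct node
print(flood(ring, 0, 3))  # (5, 6)
print(flood(k4, 0, 2))    # (3, 9): three messages per distinct node
```

In the dense toy graph the second hop only produces duplicates, which is the kind of efficiency loss the metric is meant to capture.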


Contribution

• Random walks (RW) with shallow flooding offer good performance (analytic justification)
  R1: In a random graph model with O(n) nodes of constant degree and O(n^(1/2)) nodes of degree O(n^(1/2)), the expected time to discover Ω(n) nodes is O(n^(1/2)).
  R2: Random walks with lookahead 1 or 1-step replication perform better when node degrees in the underlying topology vary widely.

• Normalized Flooding (NF) solution
  R3: NF achieves performance comparable to flooding in regular graphs.
  R4: NF with 1-step replication achieves performance comparable to RW with 1-step replication.
  R5: Local information about the network (node degrees) offers global benefit.

• Generalized Search Schemes


Random Graph Models

• Random Regular Graphs – G_{n,d}

G_{n,d} denotes a graph with n nodes in which every node has degree d.

G_{n,d} has degree sum D = nd.

• Random Graphs with super-nodes – G_{n,d,α,β}

Given constants α and β, G_{n,d,α,β} denotes a graph with αn^(1/2) nodes of degree βn^(1/2) (i.e. large vertices) and the remaining nodes of degree d (i.e. small vertices).

G_{n,d,α,β} has degree sum D = (αβ+d)n.
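As a quick check of the degree sum, the sketch below builds the degree sequence of the supernode model and adds it up. The function name is hypothetical, and it assumes n is a perfect square and that α and β are integers, so that the counts come out whole.

```python
import math


def supernode_degree_sequence(n, d, alpha, beta):
    """Degree sequence with alpha*sqrt(n) large vertices of degree
    beta*sqrt(n) and all remaining vertices of degree d.

    Assumption: n is a perfect square and alpha, beta are integers.
    """
    root = math.isqrt(n)
    n_large = alpha * root
    large_degree = beta * root
    return [large_degree] * n_large + [d] * (n - n_large)


n, d, alpha, beta = 10_000, 3, 2, 3
degrees = supernode_degree_sequence(n, d, alpha, beta)
degree_sum = sum(degrees)
leading_term = (alpha * beta + d) * n

print(degree_sum)                 # 89400
print(leading_term - degree_sum)  # 600, i.e. alpha * d * sqrt(n)
```

The exact sum differs from (αβ+d)n only by the lower-order term αd·n^(1/2), because the αn^(1/2) large vertices are counted among the n nodes here.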


Flooding and Normalization

• Theorem 3.1: Let G_{n,d} be a random regular graph and consider flooding from a node v with time-to-live τ. Let S be the set of distinct nodes queried by the flood, with |S| ≤ |V|/2.

Claims (the formulas did not survive in this transcript; only the wording of each claim is recoverable):

(1) For τ below a bound given as a ratio of logarithms, the slide gives the number of distinct responses and a lower bound on the number of distinct responses per message.

(2) For any ε and any S with |S| ≤ ε|V|, the slide gives the number of responses and a lower bound on the number of distinct responses per message that holds almost surely.

(3) For any S with |S| ≤ |V|/2, the slide gives the number of distinct responses and a lower bound on the number of distinct responses per message.


(1)

• Proof (outline; the equations did not survive in this transcript):

  • The number of distinct responses received with TTL τ, |S_τ(v)|, is written as a sum over the levels i = 0, …, τ-1 of the flood from v.
  • For all vertices v and all levels i up to a logarithmic bound, a per-level estimate is stated together with the probability with which it holds. Similarly, a second estimate is stated for all vertices and the remaining levels.
  • Combining these gives a lower bound on the number of distinct responses received with TTL τ.
  • For a random graph G_{n,d}, this yields the stated bound on |S_τ(v)| and on the number of distinct responses per message.


(2)

• Proof: the slide repeats the derivation shown for (1) and then restates claim (2): for any ε and any S with |S| ≤ ε|V|, a bound on the number of distinct responses and an almost sure lower bound on the number of distinct responses per message. The formulas did not survive in this transcript.


[Formula slide: two copies of a bound involving d, |S| and |V|, with thresholds |V|/4 and |V|/2. The formulas did not survive in this transcript.]


(3)

• Proof: the slide repeats the derivation shown for (1) and then restates claim (3): for any S with |S| ≤ |V|/2, a bound on the number of distinct responses and a lower bound on the number of distinct responses per message. The formulas did not survive in this transcript.


Flooding and Normalization

• Theorem 3.2: Let G_{n,d,α,β} be a random graph with supernodes, and consider flooding from a node v of degree d with time-to-live τ.

Claim: For some τ = O(log log n), the number of distinct responses is Ω(n).

Proof: Consider flooding with τ = log_{d-1}(c log n) + 1 and the vertices visited with TTL τ-1.

Assumption: this set (of visited nodes) doesn’t contain a large-degree vertex.

From d-regular graphs we know that this set contains at least (d - 1)^(τ-1) edges.

The probability that no vertex in Γ(S_{τ-1}(v)) is a large vertex is bounded by (d/(d+αβ))^((d - 1)^(τ-1)) = (d/(d+αβ))^(c log n), so within the first O(log log n) steps we see a large vertex.
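To get a feel for how fast the bound in this proof shrinks, the sketch below evaluates (d/(d+αβ))^(c log n) for illustrative parameter values. The function name and the values are hypothetical, and the natural logarithm is an assumption, since the slide does not state the base.

```python
import math


def no_large_vertex_bound(n, d, alpha, beta, c):
    """Evaluate (d / (d + alpha*beta)) ** (c * log n).

    Assumption: log is the natural logarithm.
    """
    exponent = c * math.log(n)
    return (d / (d + alpha * beta)) ** exponent


for c in (1, 2, 4):
    bound = no_large_vertex_bound(n=10**6, d=3, alpha=1, beta=3, c=c)
    print(c, bound)
```

With these values the base is 1/2, so every doubling of c squares the bound.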


Flooding and Normalization

• Theorem 3.3: Let G_{n,d,α,β} be a random graph with supernodes, and consider normalized flooding from a node v with TTL τ ≈ (log n)/(2 log(d - 1)). Then the number of distinct responses is Ω((d - 1)^(τ-1)) and the number of messages per response is O(1).

Proof:

From Theorem 3.1, the number of minigroups seen is (d - 1)^(τ-1). The expected number of small vertices is Q = d(d - 1)^(τ-1)/(d+αβ).

Let X_i, i = 1,…,N, be independent random variables with P[X_i = 1] = p_i and P[X_i = 0] = 1 - p_i, and let p = (p_1 + … + p_N)/N. For a deviation a > 0, the Chernoff bounds are

$$\Pr\left[\sum_{i=1}^{N} X_i < pN - a\right] \le \exp\left(-\frac{a^2}{2pN}\right) \quad\text{and}\quad \Pr\left[\sum_{i=1}^{N} X_i > pN + a\right] \le \exp\left(-\frac{a^2}{2\left(pN + a/3\right)}\right)$$

Using the above Chernoff bound, the probability that fewer than Q/2 small vertices are seen is vanishingly small.
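The last step of this proof can be made concrete with a short calculation. The sketch assumes the standard lower-tail Chernoff bound Pr[X < E[X] - a] ≤ exp(-a²/(2E[X])) for a sum X of independent 0/1 variables. With E[X] = Q and a = Q/2 this gives exp(-Q/8). The function names and parameter values are hypothetical.

```python
import math


def expected_small_vertices(d, alpha, beta, ttl):
    """Q = d * (d - 1)**(ttl - 1) / (d + alpha*beta), as on the slide."""
    return d * (d - 1) ** (ttl - 1) / (d + alpha * beta)


def lower_tail_bound(mean, deviation):
    """Assumed Chernoff lower tail: exp(-deviation**2 / (2 * mean))."""
    return math.exp(-deviation ** 2 / (2 * mean))


q = expected_small_vertices(d=4, alpha=1, beta=4, ttl=6)
print(q)                           # 121.5
print(lower_tail_bound(q, q / 2))  # equals exp(-q / 8)
```

Because Q grows like (d - 1)^(τ-1), the bound exp(-Q/8) falls off doubly exponentially in the TTL.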


Random Walks and Replication

• Random Walk with Look-Ahead:
  • a random walk with shallow flooding at each step of the walk
  • RW with lookahead 1 visits Ω(n) nodes with response time O(n^(1/2))

• Theorem 4.2: Let G_{n,d,α,β} be a random graph with supernodes and consider a random walk from a node v. Then, in the 1-step replication scenario, the slide bounds the expected number of messages and the response time needed to obtain a given number of distinct responses. (The number of responses and the bounds did not survive in this transcript.)
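A minimal sketch of a random walk with lookahead 1 follows. It assumes that lookahead 1 means the walker also queries every neighbor of the node it currently sits on. The function name and the star-shaped toy graph are hypothetical.

```python
import random


def random_walk_with_lookahead(adjacency, source, steps, rng):
    """Walk `steps` hops from `source`; at each visited node, also
    discover all of its neighbors (lookahead 1, as assumed here)."""
    current = source
    discovered = {source} | set(adjacency[source])
    for _ in range(steps):
        current = rng.choice(adjacency[current])
        discovered.add(current)
        discovered.update(adjacency[current])
    return discovered


# Star graph: node 0 plays the supernode, nodes 1..5 are leaves.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
rng = random.Random(0)

print(len(random_walk_with_lookahead(star, 1, 0, rng)))  # 2
print(len(random_walk_with_lookahead(star, 1, 1, rng)))  # 6
```

One step onto the high-degree node reveals the whole toy graph, which is the effect the supernode model is meant to capture.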


• Theorem 4.3: Let G_{n,d,α,β} be a random graph with supernodes and consider normalized flooding from v with TTL τ ≈ (log n)/(2 log(d - 1)). Then, in the 1-step replication scenario, the slide gives a lower bound on the number of distinct responses and an upper bound on the number of messages. (Both formulas did not survive in this transcript.)

Proof:

The number of minigroups seen is (d - 1)^(τ-1) and, using the Chernoff bounds, a number of these minigroups will correspond to large vertices. (The count did not survive in this transcript.)


Generalized Search Schemes

• Searching procedure:
  • A node of degree d initiates a search with a budget k (budget = number of messages that are propagated in the network)
  • Among its d neighbors the node picks quantities k1, k2, …, kd such that k1 + k2 + … + kd = k
  • For every neighbor i the master node forwards the message with budget ki (for ki = 0 the message is not transmitted)
  • Each neighbor i reduces its budget by 1 unit and repeats the process as long as the budget is greater than 0
  • Every node that receives the message for the second time from another neighbor forwards the message with the corresponding budget

• Random Walks + Flooding
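The budgeted procedure above can be sketched as follows. The splitting rules are assumptions chosen for illustration: giving the whole budget to one random neighbor behaves like a random walk, and an even split behaves like flooding. All function names are hypothetical.

```python
import random


def even_split(budget, degree, rng):
    """Flooding-like rule: spread the budget as evenly as possible."""
    base, extra = divmod(budget, degree)
    return [base + (1 if i < extra else 0) for i in range(degree)]


def single_neighbor(budget, degree, rng):
    """Random-walk-like rule: hand the whole budget to one neighbor."""
    shares = [0] * degree
    shares[rng.randrange(degree)] = budget
    return shares


def budgeted_search(adjacency, source, budget, split, rng):
    """Propagate a query with a message budget; return (visited, messages)."""
    visited = {source}
    messages = 0
    pending = [(source, budget)]
    while pending:
        node, k = pending.pop()
        neighbors = adjacency[node]
        for neighbor, share in zip(neighbors, split(k, len(neighbors), rng)):
            if share == 0:
                continue  # a zero share means no message is transmitted
            messages += 1
            visited.add(neighbor)
            if share - 1 > 0:  # the receiver spends one unit, then repeats
                pending.append((neighbor, share - 1))
    return visited, messages


k5 = {i: [j for j in range(5) if j != i] for i in range(5)}
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
rng = random.Random(1)

print(budgeted_search(k5, 0, 4, even_split, rng))  # ({0, 1, 2, 3, 4}, 4)
print(budgeted_search(ring, 0, 4, single_neighbor, rng)[1])  # 4
```

Under either rule the number of messages equals the initial budget k, since every transmitted message consumes exactly one unit.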


Experimental Evaluation

• Methodology

– Performance Metrics
  • Median and mean number of distinct peers discovered (hits)
  • Minimum, maximum, and standard deviation of the number of hits
  • Number of messages
  • Granularity of the number of messages
  • Response time

– Topologies
  • Random d-regular graphs
  • Power-law graphs
  • Bimodal topologies
  • Clustered topologies


Normalized Flooding (NF)

• Mean number of unique peers discovered as a function of the initial TTL

• NF and standard flooding behave similarly in regular graphs

• NF controls the number of messages and provides higher efficiency


Normalized Flooding (NF)

• The number of unique peers increases exponentially with TTL in the NF case

• The number of peers increases faster than exponentially with TTL in topologies with high degrees


Random Walk with 1-step replication


Random Walk with LookAhead (RWLA)

• RWLA performance is similar to long RW without lookahead (in terms of unique peers discovered)

• RWLA response time is much smaller compared to standard RW


Edge Criticality & Searching with weights

• Generalized Searching performs similarly to Standard Flooding in regular graphs

• Generalized Searching behaves similarly to Standard Flooding in other topologies if normalized edge criticality is used.


Conclusions

• Normalized Flooding (NF) could replace standard flooding in irregular graphs

• RW with 1-step replication performs better than RW and NF in irregular graphs

• Open for improvements:
  • Generalized schemes (analytic investigation)
  • Quantifying directional flooding


“Random Walks in Peer-to-Peer (P2P) Networks”

Christos Gkantsidis, Milena Mihail, Amin Saberi


Outline

• Motivation

• Statistical Estimation and Random Walks (RW)

• Searching
  • Methodology and the importance of topologies

• Construction and Summary


Motivation

• Random Walks (RW) were proposed for constructing searching and topology maintenance protocols in P2P networks
  • RW improve searching performance compared to flooding (Cao et al., 2002)
  • A RW approach to constructing and maintaining unstructured topologies provides good connectivity properties (i.e. constant degree, constant expansion)

• Claim: the RW approach is a good candidate
  • to simulate uniform sampling
  • the number of simulation steps required can be as low as the number of samples in independent uniform sampling

• Searching and Overlay Topology Construction
  • RW searching performs better than flooding, for the same number of messages, in clustered and slowly changing topologies
  • Construction of P2P networks by random walks


Statistical Estimation & Random Walks

• Coupon collection and Chernoff bounds
  • n types of coupons; each draw returns one type, uniformly at random
  • T_n - time by which we have drawn coupons of all n types

$$E[T_n] = \frac{n}{n} + \frac{n}{n-1} + \dots + \frac{n}{2} + \frac{n}{1} = n \log n + O(n)$$

  • T_{αn} - time by which we have encountered αn distinct types, 0 < α < 1

$$E[T_{\alpha n}] = \frac{n}{n} + \frac{n}{n-1} + \dots + \frac{n}{n - \alpha n + 1} = O(n)$$

  • X_1,…,X_k independent Bernoulli trials, P[X_i = 1] = p_i and P[X_i = 0] = 1 - p_i
  • p - probability that a randomly drawn object has a particular property
  • with X = X_1 + … + X_k, the probability that the property is found in substantially fewer draws than its frequency in the search space, Pr[X ≤ (1 - ε)kp], and the quality of the estimator X/k, Pr[|X/k - p| ≥ εp], are both bounded by terms that decay exponentially in ε²kp (the constants in the exponents did not survive in this transcript)
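The two coupon-collection expectations can be checked numerically. The sketch uses the standard fact that, once i types have been seen, the expected wait for a new type is n/(n - i). The function name is hypothetical.

```python
from fractions import Fraction


def expected_draws(n, distinct):
    """Expected number of uniform draws until `distinct` of the n coupon
    types have been seen: the sum of n / (n - i) for i = 0..distinct-1."""
    return sum(Fraction(n, n - i) for i in range(distinct))


print(expected_draws(4, 4))            # 25/3
print(float(expected_draws(1000, 1000)))  # about 7485, close to n log n + O(n)
print(float(expected_draws(1000, 500)))   # about 693, linear in n
```

Collecting half of the types costs roughly n ln 2 draws, while collecting all of them costs roughly n ln n, which is why a constant fraction αn is so much cheaper to reach.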


Statistical Estimation & Random Walks

• Random Walks (RW), Convergence and Cover Time
  • G = (V,E) undirected graph, |V| = n, and d_i the degree of vertex i
  • A_{ij} - adjacency matrix, P - transition matrix, which satisfies

$$\pi P = \pi, \quad\text{with } \pi_i = \frac{d_i}{2|E|}$$

  • f: V→{0,1}, which satisfies

$$p = \sum_{v \in V :\, f(v) = 1} \pi_v = \sum_{v \in V} \pi_v f(v)$$

  • Convergence rate metric - the rate at which the RW approaches the stationary distribution
  • Cover time metric - the time by which all nodes have been visited
  • Trajectory sample average - the rate at which the value of f averaged over successive vertices of the RW trajectory approaches p
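The relation between the transition matrix and the stationary distribution can be verified on a toy graph. The sketch assumes the simple random walk, P_ij = A_ij/d_i, whose stationary distribution is the standard π_i = d_i/(2|E|). The function names are hypothetical.

```python
def transition_matrix(adjacency_matrix):
    """P[i][j] = A[i][j] / d_i for the simple random walk."""
    return [[a / sum(row) for a in row] for row in adjacency_matrix]


def stationary_distribution(adjacency_matrix):
    """pi_i = d_i / (2|E|); the degree sum equals 2|E|."""
    degrees = [sum(row) for row in adjacency_matrix]
    total = sum(degrees)
    return [deg / total for deg in degrees]


def step(distribution, P):
    """One step of the walk applied to a distribution: returns pi * P."""
    n = len(P)
    return [sum(distribution[i] * P[i][j] for i in range(n)) for j in range(n)]


# Star with center 0 and three leaves.
A = [[0, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
P = transition_matrix(A)
pi = stationary_distribution(A)

print(pi)           # the center carries half of the stationary mass
print(step(pi, P))  # matches pi up to rounding, so pi * P = pi
```

With f marking a subset of vertices, p is then just the sum of π over that subset.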


Statistical Estimation & Random Walks

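• Convergence rate is related to the second eigenvalue λ_2 of P (1)

$$\Delta(t) \le \frac{\lambda_2^{t}}{\pi_{\min}}, \quad\text{where } \Delta(t) = \max_{S \subseteq V} \left|\Pr[y_t \in S] - \pi(S)\right|$$

• y_t – the vertex that the RW visits at time t

• Cover time (2): for π_min = Ω(1/n),

$$E[C_n] = O\left(\frac{n \log n}{1 - \lambda_2}\right) \quad\text{and}\quad E[C_{\alpha n}] = O\left(\frac{n}{1 - \lambda_2}\right)$$

• Trajectory sample average (3): with Y the sum of f(y_t) over k steps of the walk taken after an initial τ steps, where τ is set from log π_min and log λ_2,

$$\Pr\left[\left|\frac{Y}{k} - p\right| > \varepsilon p\right] \le 8\,e^{-\varepsilon^2 k p (1 - \lambda_2)/20}$$

(1): [11], (2): [12, 13], (3): [3, 4, 5, 6]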


Statistical Estimation & Random Walks

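• Second Eigenvalue, Expansion and Conductance
  • S a subset of V, C(S) the cutset of S (i.e. the edges with one endpoint in S and the other in V\S), vol(S) the volume of S (i.e. the sum of the degrees of the vertices in S)
  • Expansion

$$\varphi = \min_{S \subset V,\ |S| \le |V|/2} \frac{|C(S)|}{|S|}$$

  • Conductance

$$\Phi = \min_{S \subset V,\ \mathrm{vol}(S) \le \mathrm{vol}(V)/2} \frac{|C(S)|}{\mathrm{vol}(S)}$$

  • Known bound

$$1 - 2\Phi \le \lambda_2 \le 1 - \frac{\Phi^2}{2}$$

[11, 14, 15, 16, 17, 18, 19]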
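For small graphs, expansion and conductance can be computed by brute force over all subsets. The sketch below assumes the usual definitions, the minimum of |C(S)|/|S| over sets with |S| ≤ |V|/2 and the minimum of |C(S)|/vol(S) over sets with vol(S) ≤ vol(V)/2. The function names are hypothetical, and the enumeration is exponential, so it only suits toy inputs.

```python
from itertools import combinations


def cut_size(adjacency, subset):
    """Number of edges with one endpoint in `subset` and one outside."""
    return sum(1 for u in subset for v in adjacency[u] if v not in subset)


def expansion(adjacency):
    nodes = list(adjacency)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            s = set(subset)
            best = min(best, cut_size(adjacency, s) / len(s))
    return best


def conductance(adjacency):
    nodes = list(adjacency)
    total_volume = sum(len(adjacency[u]) for u in nodes)
    best = float("inf")
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            s = set(subset)
            volume = sum(len(adjacency[u]) for u in s)
            if 0 < volume <= total_volume / 2:
                best = min(best, cut_size(adjacency, s) / volume)
    return best


ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(expansion(ring))    # 2/3: three consecutive nodes, two cut edges
print(conductance(ring))  # 1/3: same set, volume 6
```

For this 6-node ring the second eigenvalue of the walk is cos(2π/6) = 0.5, which lies between 1 - 2Φ = 1/3 and 1 - Φ²/2 ≈ 0.944, as the known bound requires.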


Searching

• Performance metrics for Flooding and RW
  • average number of distinct copies of an item located in the search
  • number of messages used by the searching algorithm

• RW performs better than flooding if there are
  • multiple search requests for the same item with a slow-changing topology
  • peer clustering (see [20, 21, 22, 23, 24, 25] for details)

• Searching analysis
  • Methodology
  • Flat topologies with Uniformly Distributed Content
  • Topologies with Peer Clustering
  • Re-issuing the Same Query
  • Real topologies


Searching - Methodology

• Performance Metrics
  • mean of the number of distinct copies (i.e. Mean)
  • discrepancy around the mean (i.e. Std) and the failure probability

• Cost
  • number of messages or queries performed during search

• Peer-to-peer topologies (≈ 1 million nodes)
  • Flat regular expanders, two-tier topologies with clustering, power-law graphs, samples from real topologies

• Dynamic topologies
  • rewiring

• Content placement
  • Content clustering affects the performance of searching


Searching – Flat Topologies

• Experiment:
  • one request in a network of 500K peers
  • Mean hits, minimum number of hits and Std are similar for Flooding and RW
  • the entire distribution of hits is similar for Flooding and RW


Searching - Topologies with Peer Clustering

• The cluster topology consists of
  • 5 flat regular graphs of size 40K; from each one, 1000 randomly picked nodes are used to construct another flat regular graph

• The number of hits for RW is more concentrated around the mean than for Flooding


Searching - Reissuing the Same Query

• Experiment setup – repeat the following procedure 4 times
  • each peer sends a request and waits for the response
  • between requests 2% of the links are rewired
  • each peer initiates a new search

• RW perform better than Flooding
  • Mean Hits and Failure Probability


Searching - Reissuing the Same Query

• Performance of successive searches depends
  • on the number of topology changes between consecutive searches

• Performance of Flooding increases as the rate of topological changes increases

• RW performance remains the same for small variations


Searching – Real Topologies

• The number of hits for RW is more concentrated around the mean than in Flooding

• P2P networks have good expansion properties


Construction

• P2P network construction is concerned with:
  • peers arriving at and leaving the network dynamically
  • strong and weak decentralization
  • low network overhead per addition or deletion


Baseline Construction of Expander Graphs

• A_BASE (undirected graph) consists of:
  • n vertices, each of which chooses d vertices at random
  • total number of edges = nd and expected vertex degree = 2d

• Theorem 4.1: Let G(V,E) be a graph constructed by A_BASE. Then G is an expander with high probability: for a positive constant α < 1,

$$\Pr\left[\min_{S \subset V,\ |S| \le |V|/2} \frac{|C(S)|}{|S|} \ge \alpha\right] = 1 - o(1)$$
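The baseline construction can be sketched in a few lines. The details are assumptions made for illustration: each vertex picks d distinct vertices other than itself, and parallel edges are kept so that the edge count is exactly nd. The function names are hypothetical.

```python
import random


def build_baseline_graph(n, d, rng):
    """Each of the n vertices picks d distinct other vertices uniformly
    at random; every pick becomes an undirected edge.

    Assumption: no self-loops, and parallel edges are kept.
    """
    edges = []
    for u in range(n):
        for pick in rng.sample(range(n - 1), d):
            v = pick if pick < u else pick + 1  # skip u itself
            edges.append((u, v))
    return edges


def degree_list(n, edges):
    degrees = [0] * n
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


rng = random.Random(7)
edges = build_baseline_graph(50, 3, rng)
degrees = degree_list(50, edges)

print(len(edges))          # 150, i.e. n * d
print(sum(degrees) / 50)   # 6.0, i.e. average degree 2d
```

Every vertex contributes d edges of its own and receives d more on average from the choices of others, which gives the expected degree 2d.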


Baseline Construction of Expander Graphs with Constant Overhead in Random Bits

• A’_BASE construction algorithm:
  • start a RW at a random vertex of H (a constant-degree expander graph)
  • when A_BASE needs a random number, it is taken from the RW on H

• Theorem 4.2: Let G(V,E) be a graph constructed by A’_BASE. There are positive constants α and 0 < β < 0.5 such that any subset S of at least β|V| and at most 0.5|V| vertices has cutset expansion α almost surely:

$$\Pr\left[\min_{S \subset V,\ \beta|V| \le |S| \le |V|/2} \frac{|C(S)|}{|S|} \ge \alpha\right] = 1 - o(1)$$


Distributed Construction of Expanders with Constant Overhead on Network Resources

• A’_H – construction
  • d daemons, one for each Hamilton cycle
  • a newly arriving node contacts the daemon associated with the i-th Hamilton cycle
  • after c steps, it attaches between the peer that currently hosts daemon i and one of that peer’s neighbors in cycle i


Distributed Construction of Expanders with Constant Overhead on Network Resources

• A’_M – construction
  • d daemons, one for each Hamilton cycle
  • an arrival consists of two nodes, X and Y; X and Y contact the central server to discover the location of the d daemons
  • X becomes the neighbor of daemon i and Y the neighbor of the initial daemon’s neighbor


Summary

• For Searching
  • Random Walks (RW) are superior to Flooding

• For Construction
  • RW add new peers with constant overhead

• Open Problems
  • A strongly decentralized construction algorithm
  • Can we better handle deletions and the expansion of small sets?
  • How do the P2P network parameters (e.g. capacities) affect the performance of RW?