pastry peter druschel, rice university antony rowstron, microsoft research uk some slides are...

32
Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Upload: kaleigh-forton

Post on 14-Dec-2015

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry

Peter Druschel, Rice University

Antony Rowstron, Microsoft Research UK

Some slides are borrowed from the original presentation by the authors

Page 2: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Outline

• Background

• Pastry

• Pastry proximity routing

• PAST

Page 3: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Background

Peer-to-peer systems

• distribution

• decentralized control

• self-organization

• symmetry (communication, node roles)

Page 4: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Common issues

• Organize, maintain overlay network

• Resource allocation/load balancing

• Resource location

• Network proximity routing

Pastry provides a generic p2p substrate

Page 5: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Architecture

TCP/IP

Pastry

Network storage

Event notification

Internet

P2p substrate (self-organizingoverlay network)

P2p application layer?

Page 6: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Structured p2p overlays

The primitive route(M, X) routes message M to the live node with node Id closest to key X

Node ids and keys are from a large, sparse id space

Page 7: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Distributed Hash Tables (DHT)

k6,v6

k1,v1

k5,v5

k2,v2

k4,v4

k3,v3

nodes

Operations:insert(k,v)lookup(k)

P2P overlay networ

k

P2P overlay networ

k

• p2p overlay maps keys to nodes• completely decentralized and self-organizing• robust, scalable

Page 8: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Outline

• Background

• Pastry

• Pastry proximity routing

• PAST

• SCRIBE

• Conclusions

Page 9: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Object distribution

objId

Consistent hashing [Karger et al. ‘97]

128 bit circular id space

nodeIds (uniform random)

objIds (uniform random)

Invariant: node with numerically closest nodeId maintains object

nodeIds

O2128-1

Page 10: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Object insertion/lookup

X

Route(X)

Msg with key X is routed to live node with nodeId closest to X

Problem: complete

routing table not feasible

O2128-1

Page 11: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Routing

Tradeoff

• O(log N) routing table size

• O(log N) message forwarding steps

Page 12: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Routing table of # 65a1fcx0x

1x

2x

3x

4x

5x

7x

8x

9x

ax

bx

cx

dx

ex

fx

60x

61x

62x

63x

64x

66x

67x

68x

69x

6ax

6bx

6cx

6dx

6ex

6fx

650x

651x

652x

653x

654x

655x

656x

657x

658x

659x

65bx

65cx

65dx

65ex

65fx

65a0x

65a2x

65a3x

65a4x

65a5x

65a6x

65a7x

65a8x

65a9x

65aax

65abx

65acx

65adx

65aex

65afx

log16 N rows

Row 0

Row 1

Row 2

Row 3

Page 13: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Routing

Propertieslog16 N steps O(log N) state

d46a1c

Route(d46a1c)

d462ba

d4213f

d13da3

65a1fc

d467c4d471f1

Page 14: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Leaf sets

Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIds, respectively.

• routing efficiency/robustness

• fault detection (keep-alive)

• application-specific local coordination

Page 15: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Routing procedureif (destination is within range of our leaf set)

forward to numerically closest memberelse

let l = length of shared prefix let d = value of l-th digit in D’s addressif (Rl

d exists) (Rld = entry at column d row l)

forward to Rld

else forward to a known node that (a) shares at least as long a prefix(b) is numerically closer than this node

Page 16: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Performance

Integrity of overlay/ message delivery:• guaranteed unless L/2 simultaneous failures

of nodes with adjacent nodeIds

Number of routing hops:• No failures: < log16 N expected, 128/b + 1 max

• During failure recovery:– O(N) worst case, average case much better

Page 17: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Self-organization

How are the routing tables and leaf sets

initialized and maintained?

• Node addition

• Node departure (failure)

Page 18: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Node addition

d46a1c

Route(d46a1c)

d462ba

d4213f

d13da3

65a1fc

d467c4d471f1

New node: d46a1c

Page 19: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Node departure (failure)

Leaf set members exchange heartbeat

• Leaf set repair (eager): request set from farthest live node in set

• Routing table repair (lazy): get table from peers in the same row, then higher rows

Page 20: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Experimental results

Prototype

• implemented in Java

• emulated network

• deployed currently at ~25 sites worldwide

Page 21: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Average # of hops

L=16, 100k random queries

0

0.5

1

1.5

2

2.5

3

3.5

4

4.5

1000 10000 100000

Number of nodes

Average number of hops

Pastry

Log(N)

Page 22: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Outline

• Background

• Pastry

• Pastry proximity routing

Page 23: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Proximity routing

Proximity metric = time delay estimated by ping

A node can probe distance to any other node

Each routing table entry uses a node close to the local node (in the proximity space), among all nodes with the appropriate node Id prefix.

Page 24: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Routes in proximity space

d46a1c

Route(d46a1c)

d462ba

d4213f

d13da3

65a1fc

d467c4d471f1

NodeId space

d467c4

65a1fcd13da3

d4213f

d462ba

Proximity space

Page 25: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Distance traveled

L=16, 100k random queries, Euclidean proximity space

0.8

0.9

1

1.1

1.2

1.3

1.4

1000 10000 100000Number of nodes

Relative DistancePastry

Complete routing table

Page 26: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Locality properties

Expected distance traveled by a message in the proximity space is within a small constant of the minimum

Among k nodes with node Ids closest to the key, message likely to reach the node closest to the source node first

Page 27: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

d467c4

65a1fcd13da3

d4213f

d462ba

Proximity space

Pastry: Node addition

New node: d46a1c

d46a1c

Route(d46a1c)

d462ba

d4213f

d13da3

65a1fc

d467c4d471f1

NodeId space

Page 28: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry delay vs IP delay

0

500

1000

1500

2000

2500

0 200 400 600 800 1000 1200 1400

Distance between source and destination

Distance traveled by Pastry message

Mean = 1.59

GATech top., .5M hosts, 60K nodes, 20K random messages

Page 29: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: API

• route(M, X): route message M to node with nodeId numerically closest to X

• deliver(M): deliver message M to application• forwarding(M, X): message M is being

forwarded towards key X• newLeaf(L): report change in leaf set L to

application

Page 30: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Security

• Secure nodeId assignment

• Secure node join protocols

• Randomized routing

• Byzantine fault-tolerant leaf set membership protocol

Page 31: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

Pastry: Summary

• Generic p2p overlay network

• Scalable, fault resilient, self-organizing, secure

• O(log N) routing steps (expected)

• O(log N) routing table size

• Network proximity routing

Page 32: Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors

PAST: File Retrieval

fileId file located in log16 N steps (expected)

usually locates replica nearest client C

Lookup

k replicasC