Distributed Hash Tables: An Overview
Ashwin Bharambe, Carnegie Mellon University
Definition of a DHT
A hash table supports two operations: insert(key, value) and value = lookup(key).

Distributed: map hash buckets to nodes.

Requirements:
- Uniform distribution of buckets
- Cost of insert and lookup should scale well
- Amount of local state (routing table size) should scale well
Fundamental Design Idea - I
Consistent Hashing: map keys and nodes to an identifier space; this gives an implicit assignment of responsibility.

(Figure: nodes A, B, C, D and a key placed along the identifier space from 0000000000 to 1111111111.)

- Mapping is performed using hash functions (e.g., SHA-1)
- Spreads nodes and keys uniformly throughout the identifier space
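As an illustrative sketch (the class name and parameters are mine, not from the talk), consistent hashing amounts to hashing both node names and keys onto one circle and assigning each key to the first node clockwise from it:

```python
import hashlib
from bisect import bisect_right

def ring_id(name, bits=20):
    """Hash a key or node name onto the identifier space [0, 2^bits)."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

class ConsistentHashRing:
    def __init__(self, nodes):
        # Sort nodes by ring position; a key belongs to the first
        # node at or after it, wrapping around the circle.
        self.ring = sorted((ring_id(n), n) for n in nodes)

    def lookup(self, key):
        ids = [i for i, _ in self.ring]
        idx = bisect_right(ids, ring_id(key)) % len(self.ring)
        return self.ring[idx][1]
```

Adding or removing one node only moves the keys in the arc it owns, which is what makes the scheme attractive under churn.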
Fundamental Design Idea - II
Prefix / Hypercube routing: each hop forwards the message to a neighbor whose ID matches the destination in one more position.

(Figure: a route from Source to Destination, with a zoomed-in view of one hop.)
But, there are so many of them! DHTs are hot!

- Scalability trade-offs: routing table size at each node vs. cost of lookup and insert operations
- Simplicity: routing operations, join-leave mechanisms
- Robustness
Talk Outline
- DHT Designs: Plaxton Trees, Pastry/Tapestry; Chord; overview of CAN, Symphony, Koorde, Viceroy, etc.; SkipNet
- DHT Applications: file systems, multicast, databases, etc.
- Conclusions / New Directions
Plaxton Trees [Plaxton, Rajaraman, Richa]
Motivation: access nearby copies of replicated objects.

Time-space trade-off:
- Space = routing table size
- Time = access hops
Plaxton Trees: Algorithm

1. Assign labels to objects and nodes using randomizing hash functions. Each label has log_(2^b) n digits, e.g., object 9AE4 and node 247B.
Plaxton Trees: Algorithm

2. Each node knows about other nodes with varying prefix matches. For example, node 247B keeps entries for neighbors whose labels share a prefix of length 0, length 1 (2...), length 2 (24..), and length 3 (247.).
Plaxton Trees: Object Insertion and Lookup

Given an object, route successively towards nodes with greater prefix matches, and store the object at each of these locations. For example, object 9AE4 starting at node 247B is routed 247B -> 9F10 -> 9A76 -> 9AE2, matching one more digit of the object's label at each hop.
It takes O(log n) steps to insert or locate an object.
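This greedy prefix routing can be sketched in a few lines of Python (illustrative only: the function names are mine, and `nodes` is a global view, whereas a real Plaxton node stores just O(log n) neighbors):

```python
def prefix_len(a, b):
    """Number of leading digits two labels share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def route(source, target, nodes):
    """Hop to nodes sharing ever-longer prefixes with the target
    label; the node where no improvement is possible is the root."""
    path, cur = [source], source
    while True:
        closer = [n for n in nodes
                  if prefix_len(n, target) > prefix_len(cur, target)]
        if not closer:
            return path  # cur is the target's root node
        cur = closer[0]  # Plaxton picks the neighbor matching one more digit
        path.append(cur)
```

With the slide's labels, route("247B", "9AE4", ["247B", "9F10", "9A76", "9AE2"]) reproduces the path 247B -> 9F10 -> 9A76 -> 9AE2.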
Plaxton Trees: Why is it a tree?

(Figure: many objects are routed through the same intermediate nodes 247B, 9F10, 9A76 toward 9AE2, so the union of routing paths for an object forms a tree rooted at that object's root node.)
Plaxton Trees: Network Proximity

Overlay tree hops could be totally unrelated to the underlying network hops (e.g., successive hops bouncing between the USA, Europe, and East Asia).

Plaxton trees guarantee a constant-factor approximation of network distance, but only when the topology is uniform in some sense.
Pastry
- Based directly upon Plaxton Trees; exports a DHT interface
- Stores an object only at the node whose ID is closest to the object ID
- In addition to the main routing table, maintains a leaf set: the L closest nodes in ID space (typically L = 2^(b+1), half on each side of the node)
Pastry
(Figure: the object with ID 9AE2 is stored only at the root, not along the routing path through 247B, 9F10, and 9A76.)

Key insertion and lookup = routing to the root; takes O(log n) steps.
Pastry: Self-Organization

Node join:
- Start with a node "close" to the joining node
- Route a message to the nodeID of the new node
- Take the union of the routing tables of the nodes on the path
- Joining cost: O(log n)

Node leave:
- Update the routing table: query nearby members in the routing table
- Update the leaf set
Chord [Karger, et al]
- Map nodes and keys to identifiers using randomizing hash functions
- Arrange them on an identifier circle

(Figure: key x = 010110110 lies between pred(x) = 010110000 and succ(x) = 010111110 on the circle.)
Chord: Efficient Routing

Routing table: the i-th entry is succ(n + 2^i), giving log n finger pointers that are exponentially spaced around the identifier circle.
Chord: Key Insertion and Lookup

To insert or look up a key x, route from the source to succ(x); this takes O(log n) hops.
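The finger-table construction and greedy lookup can be sketched as follows (a toy model with a global membership list; the identifier size M and helper names are mine, not from the talk):

```python
M = 8  # bits in the identifier space [0, 2^M)

def dist(a, b):
    """Clockwise distance from a to b on the identifier circle."""
    return (b - a) % (1 << M)

def succ(ids, x):
    """First node ID at or after x, wrapping around the circle."""
    return min((i for i in ids if i >= x), default=min(ids))

def fingers(ids, n):
    """Finger i of node n points to succ(n + 2^i)."""
    return [succ(ids, (n + (1 << i)) % (1 << M)) for i in range(M)]

def in_arc(x, a, b):
    """True if x lies on the clockwise arc (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(ids, start, key):
    """Route to succ(key), hopping each time to the farthest finger
    that does not overshoot the target: O(log n) hops expected."""
    target = succ(ids, key)
    cur, hops = start, [start]
    while cur != target:
        cands = [f for f in fingers(ids, cur) if in_arc(f, cur, target)]
        nxt = max(cands, key=lambda f: dist(cur, f))
        hops.append(nxt)
        cur = nxt
    return target, hops
```

For the ring {1, 8, 14, 21, 32, 38, 42, 48, 51, 56}, looking up key 54 from node 8 routes 8 -> 42 -> 51 -> 56, halving the remaining distance at each hop.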
Chord: Self-Organization

Node join:
- Set up finger i by routing to succ(n + 2^i); log n fingers means O(log^2 n) join cost

Node leave:
- Maintain a successor list for ring connectivity
- Update successor lists and finger pointers
CAN [Ratnasamy, et al]
Map nodes and keys to coordinates in a multi-dimensional Cartesian space; each node owns a zone of the space.

(Figure: a source routes to a key through neighboring zones, approximating the shortest Euclidean path.)

For d dimensions, routing takes O(d n^(1/d)) hops.
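A toy version with d = 2 and equal-sized zones on a k x k torus (the function and parameters are illustrative, not from CAN itself) shows where the O(d n^(1/d)) bound comes from:

```python
def can_route(src, dst, k):
    """Greedy routing on a k x k torus of unit zones: each hop crosses
    into the neighboring zone that shortens the wrapped distance to
    the key. Hop count is the sum of per-dimension circular distances,
    at most d*k/2 = O(d * n^(1/d)) for n = k^d zones."""
    path, cur = [src], list(src)
    while tuple(cur) != dst:
        for i in range(2):
            fwd = (dst[i] - cur[i]) % k  # clockwise distance in dim i
            if fwd == 0:
                continue
            # step in whichever wrap direction is shorter
            cur[i] = (cur[i] + 1) % k if fwd <= k - fwd else (cur[i] - 1) % k
            break
        path.append(tuple(cur))
    return path
```

On a 4 x 4 torus, routing from (0, 0) to (3, 2) wraps backwards in the first dimension and takes 1 + 2 = 3 hops.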
Symphony [Manku, et al]
Similar to Chord in its mapping of nodes and keys, but each node's k long links are constructed probabilistically: a link spanning distance x is chosen with probability density p(x) = 1/(x ln n).

Expected routing guarantee: O((1/k) log^2 n) hops.
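This harmonic distribution is easy to sample by inverse transform (a sketch; the function name is mine): integrating p(x) = 1/(x ln n) over [1/n, 1] gives CDF u = 1 + ln x / ln n, hence x = n^(u-1) for uniform u.

```python
import random

def symphony_link_distance(n, rng=random):
    """Sample a long-link span from the harmonic pdf p(x) = 1/(x ln n)
    on [1/n, 1] via inverse-transform sampling: x = n^(u - 1)."""
    u = rng.random()  # uniform in [0, 1)
    return n ** (u - 1)
```

Most sampled links are short, but a heavy tail of long links keeps greedy routing fast.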
SkipNet [Harvey, et al]
Previous designs distribute data uniformly throughout the system. That is good for load balancing, but my data can be stored in Timbuktu! Many organizations want stricter control over data placement.

What about the routing path? Should a Microsoft-to-Microsoft end-to-end path pass through Sun?
SkipNet: Content and Path Locality

Basic idea: probabilistic skip lists. Each node chooses a height at random, picking height h with probability 1/2^h.

(Figure: nodes plotted by height.)
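Choosing height h with probability 1/2^h is just repeated fair coin flips (sketch; the function name is mine):

```python
import random

def sample_height(rng=random):
    """Flip a fair coin until tails: P(height = h) = 1/2^h, h >= 1."""
    h = 1
    while rng.random() < 0.5:
        h += 1
    return h
```

The expected height is 2, while the tallest of n nodes reaches about log2 n levels, which is where the O(log n) routing cost comes from.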
SkipNet: Content and Path Locality

(Figure: machine1.cmu.edu, machine2.cmu.edu, and machine1.berkeley.edu plotted by height.)

Nodes are lexicographically sorted by name, and there is still an O(log n) routing guarantee!
Summary (Ah, at last!)
|                 | # Links per node   | Routing hops     |
|-----------------|--------------------|------------------|
| Pastry/Tapestry | O(2^b log_(2^b) n) | O(log_(2^b) n)   |
| Chord           | log n              | O(log n)         |
| CAN             | d                  | d n^(1/d)        |
| SkipNet         | O(log n)           | O(log n)         |
| Symphony        | k                  | O((1/k) log^2 n) |
| Koorde          | d                  | log_d n          |
| Viceroy         | 7                  | O(log n)         |

Koorde and Viceroy are optimal: they match the lower bound on routing hops for a constant number of links per node.
What can DHTs do for us?
- Distributed object lookup: based on object ID
- De-centralized file systems: CFS, PAST, Ivy
- Application-layer multicast: Scribe, Bayeux, SplitStream
- Databases: PIER
De-centralized file systems
- CFS [Chord]: block-based read-only storage
- PAST [Pastry]: file-based read-only storage
- Ivy [Chord]: block-based read-write storage
PAST
Storing a file:
- Insert (filename, file) into Pastry
- Replicate the file at the leaf-set nodes
- Cache it at any node with empty space
CFS
- Blocks are inserted into the Chord DHT with insert(blockID, block) and replicated at the successor-list nodes
- Read the root block through the public key of the file system
- Look up the remaining blocks from the DHT and interpret them as the file system
- Cache blocks on the lookup path
CFS
(Figure: the root block carries the file system's public key and a signature; it points to directory block D by its hash H(D), which points to file block F by H(F), which points to data blocks B1 and B2 by H(B1) and H(B2).)
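The hash-linked block structure can be sketched as a toy content-addressed store (illustrative: a dict stands in for the Chord DHT, and the signature and public-key machinery are omitted):

```python
import hashlib

class BlockStore:
    """Blocks are keyed by the SHA-1 of their contents, so parents can
    name children by hash and every fetched block is self-verifying."""
    def __init__(self):
        self.blocks = {}  # stands in for the Chord DHT

    def insert(self, data):
        block_id = hashlib.sha1(data).hexdigest()
        self.blocks[block_id] = data
        return block_id

    def lookup(self, block_id):
        data = self.blocks[block_id]
        assert hashlib.sha1(data).hexdigest() == block_id  # integrity check
        return data

store = BlockStore()
b1 = store.insert(b"data block 1")
b2 = store.insert(b"data block 2")
file_block = store.insert(f"{b1} {b2}".encode())  # file = list of block IDs
root = store.insert(file_block.encode())          # root names the file block
```

Starting from the root ID, a reader walks hash pointers down to the data blocks, verifying each block against its ID along the way.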
CFS vs. PAST
- Block-based vs. file-based insertion, lookup, and replication
- CFS has better performance for small popular files, and performance comparable to FTP for larger files
- PAST is susceptible to storage imbalances
- Plaxton trees can provide PAST with network locality
Ivy
Each user maintains a log of updates; to construct the file system, scan the logs of all users.

(Figure: Alice and Bob each keep a log head pointing to a chain of update records such as create, write, link, and delete.)
Ivy
Scanning from the log head every time would be stupid, so make periodic snapshots.

Conflicts will arise; for resolution, use any tactics (e.g., Coda's).
Application Layer Multicast
Embed multicast tree(s) over the DHT graph.

- Multiple sources, multiple groups: Scribe, CAN-based multicast, Bayeux
- Single source, multiple trees: SplitStream
Scribe
(Figure: a new member joins the underlying Pastry DHT.)
Scribe: Tree Construction

(Figure: the new member routes a join message towards the multicast groupID over the underlying Pastry DHT; the message heads for the rendezvous point, the node whose ID is closest to the groupID.)
Scribe: Discussion

- Very scalable: inherits scalability from the DHT
- Anycast is a simple extension
- How good is the multicast tree compared to native IP multicast? How does it compare to Narada?
- Node heterogeneity is not considered
SplitStream
Single-source, high-bandwidth multicast.

Idea: use multiple trees instead of one, and make them internal-node-disjoint, so that every node is an internal node in only one tree. This satisfies bandwidth constraints and is robust.

Uses cute Pastry prefix-routing properties to construct the node-disjoint trees.
Databases, Service Discovery
SOME OTHER TIME!
Where are we now?
Many DHTs now offer efficient and relatively robust routing.

Unanswered questions:
- Node heterogeneity
- Network-efficient overlays vs. structured overlays: a conflict of interest!
- What happens with high user churn rates?
- Security
Are DHTs a panacea?
- A useful primitive, but there is tension between network-efficient construction and uniform key-value distribution
- Does every non-distributed application use only hash tables? Many rich data structures cannot be built on top of hash tables alone; exact-match lookups are not enough
- Does any P2P file-sharing system use a DHT?