Post on 20-Dec-2015
A Scalable Content-Addressable Network
Authors: S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker
University of California, Berkeley
Presenter: Andrzej Kochut
Main features of CAN
• Distributed, Internet-scale hash table
• No form of centralized control
• Scalable: the amount of information stored at each node is independent of the total number of nodes in the system
• Does not impose any form of hierarchical name structure
• Can be implemented at the application level
Basic operations of CAN
• Inserting, updating, and deleting (key,value) pairs
• Retrieving the value associated with a given key
• Adding new nodes to the CAN
• Handling departing nodes
• Dealing with node failures
Basic CAN design
Design of CAN
• Virtual d-dimensional coordinate space on a d-torus
• At any instant, the entire coordinate space is partitioned among all the nodes in the system
• Each node holds a chunk (zone) of the hash table and information about its neighbors in the d-space
• A uniform hash function maps keys to points in the d-dimensional space
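The hashing step in the last bullet can be sketched as follows. This is only an illustration: SHA-256 as the uniform hash, the unit torus [0, 1)^d as the coordinate range, and the name `key_to_point` are all assumptions, since the slides fix none of them.

```python
import hashlib


def key_to_point(key: str, d: int = 2) -> tuple[float, ...]:
    """Map a key to a point in [0, 1)^d (assumed coordinate range)."""
    coords = []
    for axis in range(d):
        # Salt the key with the axis index to get one hash per coordinate.
        digest = hashlib.sha256(f"{axis}:{key}".encode()).digest()
        # Keep 53 bits so the quotient is an exact float strictly below 1.0.
        coords.append((int.from_bytes(digest[:8], "big") >> 11) / 2**53)
    return tuple(coords)


point = key_to_point("song.mp3", d=2)
```

The same key always maps to the same point, which is what lets any node compute where a key lives without asking anyone.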
Example 2-space partition
Design of CAN
• To store a (K,V) pair, the hash value P of K is computed and the pair is stored at the node that owns point P
• To retrieve the value for K, the hash value P of K is computed. If the requesting node does not own point P, the request is routed toward the node that owns P
• The set of immediate neighbors serves as a coordinate routing table that enables routing between arbitrary points in the d-space
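A rough sketch of store and retrieve by zone ownership. To stay short it finds the owner with a global scan of all zones instead of hop-by-hop routing, which a real CAN node cannot do; `TinyCan`, `key_to_point`, and the SHA-256 hash are hypothetical choices for illustration.

```python
import hashlib

Zone = tuple[tuple[float, float], ...]  # one half-open (lo, hi) span per dimension


def key_to_point(key: str, d: int = 2) -> tuple[float, ...]:
    """Assumed uniform hash: SHA-256 salted per axis, scaled into [0, 1)."""
    return tuple(
        (int.from_bytes(hashlib.sha256(f"{a}:{key}".encode()).digest()[:8], "big") >> 11) / 2**53
        for a in range(d)
    )


def contains(zone: Zone, point: tuple[float, ...]) -> bool:
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))


class TinyCan:
    def __init__(self, zones: dict[str, Zone]):
        self.zones = zones                       # node name -> zone it owns
        self.tables = {name: {} for name in zones}  # node name -> its hash-table chunk

    def owner(self, point: tuple[float, ...]) -> str:
        # Global scan stands in for CAN routing (simplifying assumption).
        return next(n for n, z in self.zones.items() if contains(z, point))

    def put(self, key: str, value: object) -> None:
        self.tables[self.owner(key_to_point(key))][key] = value

    def get(self, key: str) -> object:
        return self.tables[self.owner(key_to_point(key))].get(key)


can = TinyCan({"A": ((0.0, 0.5), (0.0, 1.0)), "B": ((0.5, 1.0), (0.0, 1.0))})
can.put("song.mp3", "10.0.0.7")
```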
Routing in CAN
• Intuitively: follow the straight-line path through the Cartesian space from source to destination
• Each node maintains a coordinate routing table that holds the IP address and zone coordinates of each of its neighbors in the space
• Two nodes are neighbors if their coordinate spans overlap along d-1 dimensions and abut along one dimension
• A CAN message carries its destination coordinates. Each node greedily forwards it to the neighbor whose coordinates are closest to the destination
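The neighbor rule and the greedy forwarding step above might look like this. The sketch assumes zones are half-open boxes in the unit torus and measures "closest" from each neighbor's zone centre; both are assumptions, and all names are hypothetical.

```python
import math

Zone = tuple[tuple[float, float], ...]  # one (lo, hi) span per dimension


def spans_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def spans_abut(a: tuple[float, float], b: tuple[float, float]) -> bool:
    # On the unit torus the coordinate 1.0 wraps around to 0.0.
    return a[1] % 1.0 == b[0] % 1.0 or b[1] % 1.0 == a[0] % 1.0


def are_neighbors(z1: Zone, z2: Zone) -> bool:
    """True if spans overlap along d-1 dimensions and abut along exactly one."""
    abutting = 0
    for a, b in zip(z1, z2):
        if spans_overlap(a, b):
            continue
        if not spans_abut(a, b):
            return False
        abutting += 1
    return abutting == 1


def torus_distance(p: tuple[float, ...], q: tuple[float, ...]) -> float:
    # Per-axis distance takes the shorter way around the torus.
    return math.sqrt(sum(min(abs(x - y), 1 - abs(x - y)) ** 2 for x, y in zip(p, q)))


def centre(zone: Zone) -> tuple[float, ...]:
    return tuple((lo + hi) / 2 for lo, hi in zone)


def next_hop(neighbors: dict[str, Zone], dest: tuple[float, ...]) -> str:
    """Greedy choice: the neighbor whose zone centre is closest to dest."""
    return min(neighbors, key=lambda n: torus_distance(centre(neighbors[n]), dest))
```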
Routing in CAN
• For a d-dimensional space partitioned into n equal zones, the average routing path length is (d/4) * n^(1/d) hops
• Each node maintains 2d neighbors
• Path length therefore grows as O(n^(1/d))
• Many different routes exist between two points
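One way to read the formula: with n equal zones there are n^(1/d) zones along each axis, the mean shortest distance on a ring of m positions is roughly m/4, and a route crosses d axes. The helper below just evaluates the two expressions from the bullets; the function names are made up for this sketch.

```python
def average_path_length(n: int, d: int) -> float:
    """Average hops for n equal zones in d dimensions: (d/4) * n^(1/d)."""
    return (d / 4) * n ** (1 / d)


def neighbors_per_node(d: int) -> int:
    """Two neighbors per dimension: one on each side."""
    return 2 * d


# Raising d shortens paths at the cost of more routing state per node.
for d in (2, 4, 8):
    print(d, neighbors_per_node(d), round(average_path_length(2**16, d), 1))
```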
CAN construction
• A new node is allocated its own portion of the coordinate space in three steps:
– Find a node already in the CAN by looking up the CAN domain name in DNS
– Pick a zone to join, route the join request to its owner using CAN mechanisms, and split the zone between the old and the new node
– Notify the neighbors of the split zone so that routing can include the new node
Finding and splitting a zone
• Randomly choose a point P in the space
• Send a join request destined for P
• Route the request using CAN mechanisms
• Split the zone occupied by the owner of P in half, assuming a fixed ordering of the dimensions (e.g. first X, then Y)
• Transfer the (key, value) pairs from the handed-over half of the zone to the new node
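The split and handover steps above can be sketched as follows. The sketch assumes the split axis cycles through the dimensions with the number of splits a zone has already undergone (`depth`), and that stored pairs are indexed by their hashed point; both are illustrative assumptions, as are the function names.

```python
Zone = tuple[tuple[float, float], ...]  # one half-open (lo, hi) span per dimension


def split_zone(zone: Zone, depth: int) -> tuple[Zone, Zone]:
    """Halve a zone along axis (depth mod d): first X, then Y, and so on."""
    axis = depth % len(zone)
    lo, hi = zone[axis]
    mid = (lo + hi) / 2
    old_half = zone[:axis] + ((lo, mid),) + zone[axis + 1:]
    new_half = zone[:axis] + ((mid, hi),) + zone[axis + 1:]
    return old_half, new_half


def contains(zone: Zone, point: tuple[float, ...]) -> bool:
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))


def hand_over(store: dict, new_half: Zone) -> dict:
    """Remove and return the entries whose point falls in the new node's half."""
    moved = {p: kv for p, kv in store.items() if contains(new_half, p)}
    for p in moved:
        del store[p]
    return moved


unit_square = ((0.0, 1.0), (0.0, 1.0))
old_half, new_half = split_zone(unit_square, depth=0)  # splits along X
```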
Joining the routing
• Update neighbor information in the old and new nodes
• Inform all neighbors about the changes in the zones: every node in the system sends its neighbors an immediate update message when its zone changes, followed by periodic refreshes
Node departure and recovery
• Normal procedure: explicit handover of the (key,value) database to one of the neighbors
• Node failure: immediate takeover procedure:
– Failure is detected as a lack of update messages
– Each neighbor starts a timer proportional to the size of its own zone
– When its timer expires, the node extends its own zone to contain the failed neighbor's zone and sends a TAKEOVER message to all of the failed node's neighbors
– On receiving a TAKEOVER message, a node cancels its timer if the sender's zone is smaller than its own. Otherwise it sends its own TAKEOVER message
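A small sketch of the takeover rule, under the assumption that the timer is a constant multiple of the neighbor's zone volume, so the neighbor with the smallest zone fires first. The constant `base` and all function names are invented for illustration.

```python
import math

Zone = tuple[tuple[float, float], ...]  # one (lo, hi) span per dimension


def volume(zone: Zone) -> float:
    return math.prod(hi - lo for lo, hi in zone)


def takeover_delay(zone: Zone, base: float = 1.0) -> float:
    """Timer length proportional to the node's own zone volume."""
    return base * volume(zone)


def first_to_fire(neighbors: dict[str, Zone]) -> str:
    """The neighbor whose timer expires first, i.e. the smallest zone."""
    return min(neighbors, key=lambda n: takeover_delay(neighbors[n]))


def on_takeover(own_zone: Zone, sender_zone: Zone) -> str:
    """Reaction of a neighbor that receives a TAKEOVER message."""
    if volume(sender_zone) < volume(own_zone):
        return "cancel_timer"   # the sender has the smaller zone and takes over
    return "send_takeover"      # contest the takeover with our own message
```

The effect is that the failed zone tends to go to the neighbor holding the least space, which keeps the partition from becoming more lopsided.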
Recovery cont.
• CAN state may become inconsistent if multiple adjacent nodes fail simultaneously
• In such cases, perform an expanding ring search for any nodes residing beyond the failure region (?)
• If it fails, initiate the repair mechanism (?)
• If a node holds more than one zone, initiate the background zone-reassignment algorithm
Drawbacks of the basic CAN design
• Not fault tolerant
• No recovery mechanism
• Lack of load-balancing mechanisms (no caching and replication)
• Does not consider the underlying IP topology while building the overlay network
Design improvements
• Realities: multiple coordinate spaces
• Topologically-sensitive construction of the CAN
• Multiple hash functions
• Better routing metrics
• Overloading coordinate zones
• More uniform partitioning
• Caching and replication
Multiple coordinate spaces
• Each node is assigned a different zone in each reality (coordinate space)
• Improved robustness: a point P is unreachable only if it is unreachable in every reality
• Improved routing: to forward a message, a node checks its neighbors in every reality and forwards the message to the neighbor whose coordinates are closest to the destination
• Increased per-node state
Topologically-sensitive construction of the CAN
• Assumes the existence of a well-known set of machines (e.g. the DNS root name servers) called landmarks
• Each node that joins the CAN places itself in the region of the space associated with its perceived ordering of the landmarks (ordered by measured RTT)
• Reduces latency stretch (the ratio of CAN latency to IP latency)
• Leads to an unbalanced partition of the space between nodes
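A sketch of how a landmark ordering could select a region. It assumes the space is divided into one region per possible ordering (m! regions for m landmarks) and numbers the regions by the ordering's position among all sorted permutations; the numbering scheme and the function names are illustrative assumptions, and the enumeration is only practical for a handful of landmarks.

```python
from itertools import permutations


def landmark_ordering(rtts_ms: dict[str, float]) -> tuple[str, ...]:
    """Landmarks sorted by increasing measured RTT from the joining node."""
    return tuple(sorted(rtts_ms, key=rtts_ms.get))


def region_index(ordering: tuple[str, ...]) -> int:
    """Index of this ordering among all m! orderings (illustrative numbering)."""
    all_orderings = sorted(permutations(sorted(ordering)))
    return all_orderings.index(tuple(ordering))


ordering = landmark_ordering({"a": 30.0, "b": 10.0, "c": 20.0})
region = region_index(ordering)
```

Nodes that see the landmarks in the same order are likely to be close in the IP topology, so they end up as CAN neighbors.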
Multiple hash functions
• Several different hash functions map each (key, value) pair to several points in the space
• Data is replicated accordingly
• Improved fault tolerance
• Increase in the size of the (key,value) database and in the volume of query traffic
• Data consistency issues
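A minimal sketch of mapping one key to several points. It assumes the k hash functions are SHA-256 salted with a replica index, and a coordinate range of [0, 1)^d; `replica_points` is a hypothetical name.

```python
import hashlib


def replica_points(key: str, k: int = 3, d: int = 2) -> list[tuple[float, ...]]:
    """Return k points for one key, one per (assumed) salted hash function."""
    points = []
    for i in range(k):
        coords = []
        for axis in range(d):
            # The replica index i acts as the choice of hash function.
            digest = hashlib.sha256(f"{i}:{axis}:{key}".encode()).digest()
            coords.append((int.from_bytes(digest[:8], "big") >> 11) / 2**53)
        points.append(tuple(coords))
    return points


points = replica_points("song.mp3", k=3)
```

A store writes the pair at all k points; a lookup can try any of them, which is where both the fault tolerance and the extra storage and query traffic come from.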
Effects of the improvements
Effects of the improvements (cont.)
Better routing metrics
• Each node measures the network-level RTT to each of its neighbors
• Each message is forwarded to the neighbor with the highest ratio of progress to RTT
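The ratio rule can be sketched as below, taking "progress" to mean the reduction in torus distance to the destination, which is an assumption of this sketch, as are the names and the representation of each neighbor as a (point, RTT) pair.

```python
import math


def torus_distance(p: tuple[float, ...], q: tuple[float, ...]) -> float:
    # Per-axis distance takes the shorter way around the unit torus.
    return math.sqrt(sum(min(abs(x - y), 1 - abs(x - y)) ** 2 for x, y in zip(p, q)))


def best_next_hop(
    current: tuple[float, ...],
    dest: tuple[float, ...],
    neighbors: dict[str, tuple[tuple[float, ...], float]],
) -> str:
    """Pick the neighbor maximizing (distance gained toward dest) / RTT."""
    here = torus_distance(current, dest)

    def ratio(name: str) -> float:
        point, rtt_ms = neighbors[name]
        return (here - torus_distance(point, dest)) / rtt_ms

    return max(neighbors, key=ratio)


choice = best_next_hop(
    current=(0.1, 0.1),
    dest=(0.4, 0.1),
    neighbors={"fast": ((0.2, 0.1), 10.0), "far": ((0.3, 0.1), 50.0)},
)
```

In this example plain greedy routing would pick "far" because it is closer to the destination, while the ratio rule prefers "fast", which makes less progress per hop but more progress per millisecond.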
Overloading coordinate zones
• More than one node is associated with each zone
• Each node maintains a list of its peers in the zone, plus neighbor information for one selected node in each of its neighboring zones
• Adding new nodes: if the zone that a new node joins contains fewer than MAXPEERS nodes, the new node joins the zone without any space splitting
• The selected neighbor in each adjacent zone is chosen based on RTT
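The join rule and the RTT-based neighbor choice might be sketched like this. The value of `MAXPEERS`, the function names, and the choice to only signal a split when the zone is full (without performing it) are assumptions of the sketch.

```python
MAXPEERS = 4  # illustrative value; the slides do not give one


def join_zone(peers: list[str], new_node: str, max_peers: int = MAXPEERS) -> tuple[list[str], bool]:
    """Return (peer list after the join attempt, whether the zone must split)."""
    if len(peers) < max_peers:
        # Room left: the new node shares the zone, no space splitting.
        return peers + [new_node], False
    # Zone is full: the caller has to fall back to splitting the zone.
    return peers, True


def pick_neighbor(rtts_ms: dict[str, float]) -> str:
    """Among the nodes of one adjacent zone, keep the one with the lowest RTT."""
    return min(rtts_ms, key=rtts_ms.get)
```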
Advantages of the zone overloading
• Reduced path length
• Reduced per-hop latency: a node has multiple choices when selecting its neighbors
• Improved fault-tolerance
More uniform partitioning
• Attempt to partition the space equally between nodes
• The existing occupant of the zone that a new node joins redirects the request to the zone with the largest volume among its own zone and those of all its neighbors
• Not true load balancing: does not reflect the popularity of the data
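The redirect rule can be sketched in a few lines, assuming the occupant knows the zone coordinates of all its neighbors (as the routing table described earlier implies). The function names are hypothetical, and ties go to whichever zone is listed first.

```python
import math

Zone = tuple[tuple[float, float], ...]  # one (lo, hi) span per dimension


def volume(zone: Zone) -> float:
    return math.prod(hi - lo for lo, hi in zone)


def zone_to_split(candidates: dict[str, Zone]) -> str:
    """Among the occupant's own zone and its neighbors' zones, pick the largest."""
    return max(candidates, key=lambda name: volume(candidates[name]))


target = zone_to_split({
    "occupant": ((0.0, 0.25), (0.0, 0.5)),
    "n1": ((0.25, 0.5), (0.0, 0.5)),
    "n2": ((0.5, 1.0), (0.0, 1.0)),
})
```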
Caching and replication
• Each node maintains a cache of the most popular requests
• A node that is overloaded by requests for a particular data key replicates that key at all of its neighbors
• No description of how to check the validity of cached and replicated data
Performance tests
• System size: 2^18 nodes
• Topology generated by GT modeler
• Results only from simulations
Performance test results
Conclusion and presenter’s opinion
• Addresses the problems of scalable routing and indexing
• Test results only from simulation
• Unresolved security issues (e.g. denial-of-service attacks)
• Lack of search techniques (e.g. keyword searching)
• Recovery techniques not clearly specified
• Data consistency issues (replication, caching) not clearly specified
• Does not address the problem of a node's connection quality