scalable content- addressable networks prepared by kuhan paramsothy march 5, 2007
TRANSCRIPT
Scalable Content-Addressable Networks
Prepared by
Kuhan Paramsothy
March 5, 2007
ECE 1770 – Content-Addressable Networks
High-Level Overview
Hash tables (map keys to values) are heavily used in building software applications
The concept of a Content-Addressable Network (CAN) provides hash table-like functionality on Internet-like scales.
CAN is: Scalable Robust/Fault-tolerant Self-organizing Low-latency
ECE 1770 – Content-Addressable Networks
Hash Tables and CAN
A data structure that efficiently maps keys onto values
CANs are a form of distributed, Internet-scale hash tables.
ECE 1770 – Content-Addressable Networks
What CAN would do for us CAN would improve peer-to-peer systems
Napster: the process of locating a file is centralized Expensive to scale the central repository, single point of failure
Gnutella: decentralized the file location process (network self-organizes into an application layer mesh) Requests for files are done through flooding, not scalable, may not find
content Conclusion: P2P systems need a scalable indexing mechanism
CAN would improve large data repositories These systems need efficient insertion and retrieval
CAN would create large-scale name resolution services that don’t use a naming scheme (ie. Not DNS) No more location-dependent naming schemes
ECE 1770 – Content-Addressable Networks
Basic Operations Performed On CANs Basic Operations
Insertion (of key,value pairs) Lookup (of key,value pairs) Deletion (of key,value pairs)
Each CAN stores1. A piece (called a zone) of the entire hash table2. Holds information about a small number of adjacent zones in the table
Routing in a CAN Done by intermediate CAN nodes towards the CAN node whose zone contains that
key CAN Design is
Distributed (requires no centralized control or coordination) Scalable (nodes hold only a small about of information that doesn’t grow with the
network) Fault-tolerant (nodes can route around failures) Doesn’t require a naming hierarchy Is entirely Application Layer
ECE 1770 – Content-Addressable Networks
CAN Design
Centers around a virtual d-dimensional Cartesian coordinate space on a d-torus
At any time, the entire coordinate space is dynamically partitioned among all the nodes in the system Each node owns a
distinct zone
ECE 1770 – Content-Addressable Networks
CAN Design (2)1. To store a pair, key K1 is mapped to
P via a uniform hash function2. The pair is then stored at the node
that owns the zone where P lies3. To retrieve an entry corresponding
to K1, any node can apply the same hash function to map K1 to P and get the corresponding value
A node learns and maintains the IP addresses of those nodes that hold adjoining coordinate zones
Efficient routing is critical to a useful CAN
ECE 1770 – Content-Addressable Networks
Routing in a CAN Routing in a Content Addressable Networks works by following the straight line
path through the Cartesian space from source to destination coordinates. A CAN node maintains a coordinate routing table that holds the IP address and
virtual coordinate zone of each of its immediate neighbors in the coordinate space.
Average Path Length = (d/4)(n1/d)Individual Nodes Have 2d Neighbors
Average Path Length Grows As O(n1/d)
ECE 1770 – Content-Addressable Networks
Construction of a CAN Overlay The entire CAN space is divided amongst the nodes currently in the system Incremental construction process takes three steps
The new node finds a node already in the CAN Using the CAN routing mechanisms, finds a node whose zone will be split The neighbors of the split zone must be notified so that routing can include the new
node Bootstrapping: There are CAN bootstrap nodes associated to a DNS domain
nameNode Insertion Affects Only O(number of dimensions) existing nodes
ECE 1770 – Content-Addressable Networks
Maintenance of a CAN Overlay Node Graceful Departure: node explicitly hands over its zone and the
associated (key,value) database to one of its neighbors Node Abrupt Disappearance: An immediate takeover algorithm ensures
one of the “failed” node’s neighbors takes over the zone Under normal conditions, a node sends periodic update messages to
each of its neighbors and a list of neighbors and their zone coordinates. Prolonged absence of an update message from a neighbor signals it’s
failure
ECE 1770 – Content-Addressable Networks
Design Improvements
Basic CAN algorithm provides Low per-node state (O(d) for a d-dimensional
space) Short path lengths (O(dn1/d) hops for d dimensions
and n nodes) The problem is that there are application-
layer hops, not IP-layer hops Latency of each hop might be substantial
ECE 1770 – Content-Addressable Networks
Design Improvements (2) Improvement: Multi-dimensioned Coordinate Spaces
Increasing the dimensions of the CAN coordinate space reduces the routing path length and path latency for a small increase in the size of the coordinate routing table
Path Length scales as O(d(n1/d)) Fault-tolerance improves
Improvement: Multiple Coordinate Spaces (a.k.a. Multiple Realities) Maintain multiple independent coordinate spaces with each node in the system being assigned a different zone in the coordinate
space (each coordinate space is a reality) Fault-tolerance improves Low per-node state (O(d) for a d-dimensional space) Short path lengths (O(dn1/d) hops for d dimensions and n nodes)
Which is better? Increasing the dimensions
ECE 1770 – Content-Addressable Networks
Design Improvements (3) Improvement: Better CAN Routing Metrics
Have each node measure the network-level round-trip-time RTT to each of its neighbors. Then route messages accordingly.
Favors lower latency paths and avoids unnecessarily long hops Improvement: Caching and Replication
A CAN node can maintain a cache of the data keys it recently accessed A CAN node can replicate the data key at each of its neighboring nodes Both schemes need an associated time-to-live field, to eventually expire from
the cache
ECE 1770 – Content-Addressable Networks
Related Systems Domain Name System
CANs are more general than the DNS because DNS closely ties the naming scheme to the manner in which a name is resolved to an IP address
Peer-to-Peer A simple example is keys being analogous to a URL Will improve robustness Key difference is that content within the CAN can always be located by any
other node because there is a clear “home” (point) in the CAN for that content and every other node knows what the home is how to reach it
ECE 1770 – Content-Addressable Networks
Discussion
Security? Better or worse with CAN?
Any Other Design Improvement? Is The Communication Overhead Significant?
ECE 1770 – Content-Addressable Networks
References A Scalable Content-Addressable Network, Ratnasamy, University of
California – Berkeley, http://www.sigcomm.org/sigcomm2001/p13-ratnasamy.pdf