Scalable Content-Addressable Networks

Prepared by

Kuhan Paramsothy

March 5, 2007

ECE 1770 – Content-Addressable Networks

High-Level Overview

Hash tables, which map keys to values, are heavily used in building software applications.

The concept of a Content-Addressable Network (CAN) provides hash table-like functionality on Internet-like scales.

CAN is:
    Scalable
    Robust/Fault-tolerant
    Self-organizing
    Low-latency


Hash Tables and CAN

A hash table is a data structure that efficiently maps keys onto values.

CANs are a form of distributed, Internet-scale hash tables.


What CAN would do for us

CAN would improve peer-to-peer systems
    Napster: the process of locating a file is centralized
        Expensive to scale the central repository; single point of failure
    Gnutella: decentralizes the file location process (the network self-organizes into an application-layer mesh)
        Requests for files are flooded, which is not scalable and may not find the content
    Conclusion: P2P systems need a scalable indexing mechanism

CAN would improve large data repositories
    These systems need efficient insertion and retrieval

CAN would enable large-scale name resolution services that decouple the naming scheme from the name resolution process (i.e., not DNS)
    No more location-dependent naming schemes


Basic Operations Performed On CANs

Basic operations
    Insertion (of key,value pairs)
    Lookup (of key,value pairs)
    Deletion (of key,value pairs)

Each CAN node
    1. Stores a piece (called a zone) of the entire hash table
    2. Holds information about a small number of adjacent zones in the table

Routing in a CAN
    Requests for a key are forwarded by intermediate CAN nodes towards the CAN node whose zone contains that key

CAN design is
    Distributed (requires no centralized control or coordination)
    Scalable (nodes hold only a small amount of information that doesn't grow with the network)
    Fault-tolerant (nodes can route around failures)
    Independent of any naming hierarchy
    Entirely application-layer


CAN Design

The design centers on a virtual d-dimensional Cartesian coordinate space on a d-torus.

At any time, the entire coordinate space is dynamically partitioned among all the nodes in the system. Each node owns a distinct zone.


CAN Design (2)

1. To store a (key,value) pair, key K1 is mapped to a point P in the coordinate space via a uniform hash function
2. The pair is then stored at the node that owns the zone where P lies
3. To retrieve the entry corresponding to K1, any node can apply the same hash function to map K1 to P and get the corresponding value from the node that owns P
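The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the use of SHA-256, the helper names (`key_to_point`, `Zone`, `owner_of`, `put`, `get`), and the four fixed zones are all hypothetical, and a linear scan over the zones stands in for real CAN routing.

```python
import hashlib

def key_to_point(key, d=2):
    """Hash a key to a point in the unit d-dimensional space [0, 1)^d."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Assumption: 4 digest bytes per coordinate, so this supports d <= 8.
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32
        for i in range(d)
    )

class Zone:
    """An axis-aligned box [lo, hi) owned by one node, plus its local store."""
    def __init__(self, lo, hi):
        self.lo, self.hi = tuple(lo), tuple(hi)
        self.store = {}

    def contains(self, point):
        return all(l <= x < h for l, x, h in zip(self.lo, point, self.hi))

def owner_of(zones, point):
    """Return the zone containing the point (stand-in for routing)."""
    for zone in zones:
        if zone.contains(point):
            return zone
    raise LookupError("zones do not cover the point")

def put(zones, key, value):
    # Steps 1 and 2: hash the key to P, store the pair at P's owner.
    owner_of(zones, key_to_point(key)).store[key] = value

def get(zones, key):
    # Step 3: any node applies the same hash and asks P's owner.
    return owner_of(zones, key_to_point(key)).store.get(key)

# Four equal zones that partition the unit square.
zones = [Zone((0.0, 0.0), (0.5, 0.5)), Zone((0.5, 0.0), (1.0, 0.5)),
         Zone((0.0, 0.5), (0.5, 1.0)), Zone((0.5, 0.5), (1.0, 1.0))]
put(zones, "K1", "V1")
print(get(zones, "K1"))
```

Because the hash is deterministic, every node maps K1 to the same point, which is why no central index is needed.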

A node learns and maintains the IP addresses of those nodes that hold adjoining coordinate zones

Efficient routing is critical to a useful CAN


Routing in a CAN

Routing in a Content-Addressable Network works by following the straight-line path through the Cartesian space from source to destination coordinates.

A CAN node maintains a coordinate routing table that holds the IP address and virtual coordinate zone of each of its immediate neighbors in the coordinate space.
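A minimal sketch of this kind of routing, assuming the space is split into a uniform grid of equal zones with side `k` so that each node can be named by its integer grid cell. The function names and the choice of Manhattan distance on the torus are assumptions made for illustration; each hop simply forwards to the neighbor closest to the destination.

```python
def torus_dist(a, b, k):
    """Manhattan distance between grid cells a and b on a torus of side k."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))

def neighbors(cell, k):
    """The 2d cells adjacent to `cell`: one step along one axis, wrapping."""
    result = []
    for axis in range(len(cell)):
        for step in (-1, 1):
            nxt = list(cell)
            nxt[axis] = (nxt[axis] + step) % k
            result.append(tuple(nxt))
    return result

def route(src, dst, k):
    """Greedy forwarding: every hop goes to the neighbor closest to dst."""
    path = [src]
    while path[-1] != dst:
        here = path[-1]
        path.append(min(neighbors(here, k),
                        key=lambda c: torus_dist(c, dst, k)))
    return path

path = route((0, 0), (3, 6), k=8)
print(len(path) - 1)  # 5 hops: 3 along the first axis, 2 (wrapping) along the second
```

Each node only consults its own 2d neighbors, so the routing table stays small no matter how many nodes are in the system.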

Average path length = $(d/4)\,n^{1/d}$

Individual nodes have $2d$ neighbors

Average path length grows as $O(n^{1/d})$
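As a worked example of the path-length formula, assume $n = 2^{20}$ equal zones (roughly a million nodes). The value of $n$ and the two choices of $d$ are illustrative numbers, not figures from the slides.

```latex
\[
d = 2:\quad \frac{d}{4}\,n^{1/d} = \frac{2}{4}\cdot 2^{10} = 512 \text{ hops},
\qquad 2d = 4 \text{ neighbors}
\]
\[
d = 10:\quad \frac{d}{4}\,n^{1/d} = \frac{10}{4}\cdot 2^{2} = 10 \text{ hops},
\qquad 2d = 20 \text{ neighbors}
\]
```

Under these assumptions, going from 4 to 20 neighbors per node cuts the average path from 512 hops to 10.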


Construction of a CAN Overlay

The entire CAN space is divided amongst the nodes currently in the system

Incremental construction process takes three steps
    1. The new node finds a node already in the CAN
    2. Using the CAN routing mechanisms, it finds a node whose zone will be split
    3. The neighbors of the split zone are notified so that routing can include the new node

Bootstrapping: there are CAN bootstrap nodes associated with a DNS domain name

Node insertion affects only O(number of dimensions) existing nodes
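The zone split in step 2 might look like the sketch below. It assumes the joining node names a point in the space and that the zone containing that point is halved, with one half handed to the newcomer. Splitting along the longest side is an assumption of this sketch, and the function names are hypothetical; neighbor notification and transfer of stored pairs are left out.

```python
def contains(zone, point):
    """True if `point` lies inside the box zone = (lo, hi), hi exclusive."""
    lo, hi = zone
    return all(l <= x < h for l, x, h in zip(lo, point, hi))

def longest_axis(lo, hi):
    """Assumption: split along the longest side (first one on ties)."""
    return max(range(len(lo)), key=lambda a: hi[a] - lo[a])

def split_zone(lo, hi):
    """Halve the box [lo, hi); return (kept_half, handed_over_half)."""
    axis = longest_axis(lo, hi)
    mid = (lo[axis] + hi[axis]) / 2
    kept_hi = list(hi)
    kept_hi[axis] = mid
    new_lo = list(lo)
    new_lo[axis] = mid
    return (tuple(lo), tuple(kept_hi)), (tuple(new_lo), tuple(hi))

def join(zones, point):
    """Admit a new node by splitting the zone that contains `point`."""
    for i, zone in enumerate(zones):
        if contains(zone, point):
            kept, handed_over = split_zone(*zone)
            zones[i] = kept              # the existing owner keeps one half
            zones.append(handed_over)    # the new node owns the other half
            return handed_over
    raise LookupError("no zone contains the point")

zones = [((0.0, 0.0), (1.0, 1.0))]       # a single node owns everything
join(zones, (0.8, 0.3))
join(zones, (0.8, 0.3))
print(zones)
```

Only the split zone and its neighbors change, which matches the claim that insertion touches O(number of dimensions) existing nodes.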


Maintenance of a CAN Overlay

Graceful node departure: the node explicitly hands over its zone and the associated (key,value) database to one of its neighbors

Abrupt node disappearance: an immediate takeover algorithm ensures one of the failed node's neighbors takes over the zone

Under normal conditions, a node sends periodic update messages to each of its neighbors, giving its zone coordinates and a list of its neighbors and their zone coordinates

Prolonged absence of an update message from a neighbor signals its failure
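Failure detection by missing updates can be sketched as a table of last-seen timestamps. The class name, the 30-second timeout, and the example addresses are all assumptions for illustration; the takeover step itself is not modelled here, only the detection that would trigger it.

```python
class NeighborTable:
    """Records when each neighbor's last update message arrived."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds of silence treated as failure
        self.last_seen = {}      # neighbor address -> time of last update
        self.zones = {}          # neighbor address -> advertised zone

    def on_update(self, neighbor, zone, now):
        """Handle a periodic update message from a neighbor."""
        self.last_seen[neighbor] = now
        self.zones[neighbor] = zone

    def failed_neighbors(self, now):
        """Neighbors whose updates have been absent longer than the timeout."""
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

table = NeighborTable(timeout=30.0)
table.on_update("10.0.0.2", ((0.0, 0.0), (0.5, 0.5)), now=0.0)
table.on_update("10.0.0.3", ((0.5, 0.0), (1.0, 0.5)), now=20.0)
print(table.failed_neighbors(now=35.0))  # ['10.0.0.2']
```

Time is passed in explicitly so the example is deterministic; a real node would read a clock instead.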


Design Improvements

Basic CAN algorithm provides
    Low per-node state ($O(d)$ for a d-dimensional space)
    Short path lengths ($O(d\,n^{1/d})$ hops for d dimensions and n nodes)

The problem is that these are application-layer hops, not IP-layer hops
    Latency of each hop might be substantial


Design Improvements (2)

Improvement: Multi-dimensioned Coordinate Spaces
    Increasing the dimensions of the CAN coordinate space reduces the routing path length and path latency for a small increase in the size of the coordinate routing table
    Path length scales as $O(d\,n^{1/d})$
    Fault-tolerance improves

Improvement: Multiple Coordinate Spaces (a.k.a. Multiple Realities)
    Maintain multiple independent coordinate spaces, with each node in the system being assigned a different zone in each coordinate space (each coordinate space is a reality)
    Fault-tolerance improves
    Low per-node state ($O(d)$ for a d-dimensional space)
    Short path lengths ($O(d\,n^{1/d})$ hops for d dimensions and n nodes)

Which is better? Increasing the dimensions


Design Improvements (3)

Improvement: Better CAN Routing Metrics
    Have each node measure the network-level round-trip time (RTT) to each of its neighbors, then route messages accordingly
    Favors lower-latency paths and avoids unnecessarily long hops

Improvement: Caching and Replication
    A CAN node can maintain a cache of the data keys it recently accessed
    A CAN node can replicate a data key at each of its neighboring nodes
    Both schemes need an associated time-to-live field so that entries eventually expire from the cache
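The time-to-live idea for cached entries can be sketched as follows. The class name `TTLCache`, the 60-second TTL, and the lazy expiry on lookup are assumptions chosen for illustration, not details given in the slides.

```python
class TTLCache:
    """A per-node cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl):
        self.ttl = ttl       # lifetime of each entry, in seconds
        self.entries = {}    # key -> (value, expiry time)

    def put(self, key, value, now):
        """Cache a recently accessed pair, stamped with its expiry time."""
        self.entries[key] = (value, now + self.ttl)

    def get(self, key, now):
        """Return the cached value, or None if absent or expired."""
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self.entries[key]   # lazily drop the stale entry
            return None
        return value

cache = TTLCache(ttl=60.0)
cache.put("K1", "V1", now=0.0)
print(cache.get("K1", now=30.0))  # V1
print(cache.get("K1", now=60.0))  # None
```

A node would consult such a cache before forwarding a request, and fall back to normal routing on a miss.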


Related Systems

Domain Name System
    CANs are more general than the DNS because DNS closely ties the naming scheme to the manner in which a name is resolved to an IP address

Peer-to-Peer
    A simple example: keys are analogous to URLs
    Will improve robustness
    Key difference is that content within the CAN can always be located by any other node, because there is a clear "home" (point) in the CAN for that content and every other node knows what the home is and how to reach it


Discussion

Security? Better or worse with CAN?

Any other design improvement?

Is the communication overhead significant?


References

A Scalable Content-Addressable Network, Ratnasamy et al., University of California – Berkeley, http://www.sigcomm.org/sigcomm2001/p13-ratnasamy.pdf