Post on 18-Dec-2015
![Page 1: 1 CS 268: Lecture 22 DHT Applications Ion Stoica Computer Science Division Department of Electrical Engineering and Computer Sciences University of California,](https://reader034.vdocuments.mx/reader034/viewer/2022042821/56649d255503460f949fc0fc/html5/thumbnails/1.jpg)
1
CS 268: Lecture 22 DHT Applications
Ion Stoica
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776
(Presentation based on slides from Robert Morris and Sean Rhea)
Outline
Cooperative File System (CFS)
Open DHT
Target CFS Uses
Serving data with inexpensive hosts:
- open-source distributions
- off-site backups
- tech report archive
- efficient sharing of music
[Figure: nodes connected to one another through the Internet]
How to mirror open-source distributions?
Multiple independent distributions
- Each has high peak load, low average
Individual servers are wasteful
Solution: aggregate
- Option 1: single powerful server
- Option 2: distributed service
• But how do you find the data?
Design Challenges
Avoid hot spots
Spread storage burden evenly
Tolerate unreliable participants
Fetch speed comparable to whole-file TCP
Avoid O(#participants) algorithms
- Centralized mechanisms [Napster], broadcasts [Gnutella]
CFS solves these challenges
CFS Architecture
Each node is a client and a server
Clients can support different interfaces
- File system interface
- Music key-word search
[Figure: nodes connected through the Internet, each running both a client and a server]
Client-server interface
Files have unique names
Files are read-only (single writer, many readers)
Publishers split files into blocks
Clients check files for authenticity
[Figure: an FS client handles "Insert file f" and "Lookup file f" by sending "Insert block" and "Lookup block" requests to servers on other nodes]
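As a rough illustration of "publishers split files into blocks" and "clients check files for authenticity", the sketch below names each block by the SHA-1 hash of its content, so a client can verify a fetched block on its own. The slides do not spell out the naming scheme; the content-hash naming, the block size, and the function names `split_into_blocks` and `verify_block` are assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 8192  # bytes; an assumed value, not taken from the slides


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[tuple[str, bytes]]:
    """Split data into fixed-size blocks, each paired with the SHA-1 of its content."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append((hashlib.sha1(chunk).hexdigest(), chunk))
    return blocks


def verify_block(block_id: str, content: bytes) -> bool:
    """A client-side check: the content must hash back to the ID it was fetched under."""
    return hashlib.sha1(content).hexdigest() == block_id
```

With this naming, a server that returns altered content is caught by the client, because the altered bytes no longer hash to the requested block ID.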
Server Structure
• DHash stores, balances, replicates, caches blocks
• DHash uses Chord [SIGCOMM 2001] to locate blocks
[Figure: Node 1 and Node 2 each run a DHash layer on top of a Chord layer]
Chord Hashes a Block ID to its Successor
• Nodes and blocks have randomly distributed IDs
• Successor: node with next highest ID
[Figure: circular ID space with node IDs N10, N32, N60, N80, N100 and block IDs grouped by successor: B11, B30 at N32; B33, B40, B52 at N60; B65, B70 at N80; B100 at N100; B112, B120, …, B10 at N10]
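The successor rule can be sketched in a few lines. This is a minimal illustration using the node IDs from the ring on this slide; the function name `successor` and the use of a sorted list are choices made for the example, not part of Chord itself.

```python
from bisect import bisect_left

NODES = [10, 32, 60, 80, 100]  # node IDs from the slide, kept sorted


def successor(block_id: int, nodes: list[int] = NODES) -> int:
    """Return the first node whose ID is >= block_id, wrapping around the ring."""
    i = bisect_left(nodes, block_id)
    return nodes[i % len(nodes)]
```

For example, `successor(33)` is 60, and `successor(112)` wraps around the ring to 10.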
DHash/Chord Interface
lookup() returns a list of nodes whose IDs are closer in ID space to the block ID
- Sorted, closest first
[Figure: within a server, DHash calls Lookup(blockID) on Chord and gets back a list of <node ID, IP address> pairs; Chord keeps a finger table of <node ID, IP address> entries]
DHash Uses Other Nodes to Locate Blocks
[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Lookup(BlockID=45) proceeds in three numbered steps]
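A multi-step lookup of this kind can be simulated with Chord-style finger tables. The sketch below is a toy model under several assumptions: a 7-bit ID space, global knowledge of the node list (a real node only knows its own fingers), and N80 as the starting node, which the slide does not specify. The names `fingers`, `between`, and `lookup` are made up for the example.

```python
from bisect import bisect_left

M = 7            # assumed ID width: IDs live in [0, 128)
RING = 2 ** M
NODES = [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]  # node IDs from the slide, sorted


def successor(ident: int) -> int:
    """First node at or after ident on the ring."""
    i = bisect_left(NODES, ident % RING)
    return NODES[i % len(NODES)]


def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def fingers(node: int) -> list[int]:
    """Finger i points at the successor of node + 2**i."""
    return [successor(node + 2 ** i) for i in range(M)]


def lookup(start: int, key: int) -> tuple[int, list[int]]:
    """Return (node responsible for key, list of nodes visited)."""
    node, path = start, [start]
    while True:
        succ = successor(node + 1)
        if key == succ or between(key, node, succ):
            return succ, path
        # Jump to the farthest finger that still precedes the key.
        node = next(f for f in reversed(fingers(node)) if between(f, node, key))
        path.append(node)
```

Starting at N80, `lookup(80, 45)` visits N80, N20, and N40 before returning N50 as the successor of block 45.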
Storing Blocks
Long-term blocks are stored for a fixed time
- Publishers need to refresh periodically
Cache uses LRU
[Figure: a server's disk, split between the cache and long-term block storage]
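The LRU policy for the cache portion of the disk might look like the following sketch. It is illustrative only: the class name `BlockCache` is invented, and capacity is counted in blocks rather than bytes for simplicity.

```python
from collections import OrderedDict


class BlockCache:
    """A least-recently-used cache of blocks (capacity counted in blocks)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first

    def get(self, block_id):
        if block_id not in self.blocks:
            return None
        self.blocks.move_to_end(block_id)  # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
```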
Replicate blocks at r successors
[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Block 17 is replicated at successive nodes on the ring]
• Node IDs are SHA-1 of IP Address
• Ensures independent replica failure
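Replica placement at successive nodes can be sketched as below. This is a toy version under stated assumptions: IDs are truncated to 7 bits so they fit the small ring drawn on the slide (a full SHA-1 ID has 160 bits), and `r` is taken to count the first successor as well, which the slide leaves open. The names `node_id` and `replica_nodes` are invented for the example.

```python
import hashlib
from bisect import bisect_left


def node_id(ip: str, bits: int = 7) -> int:
    """Derive a node ID from SHA-1 of the IP address, truncated to a toy ID space."""
    digest = hashlib.sha1(ip.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def replica_nodes(block_id: int, nodes: list[int], r: int) -> list[int]:
    """Return the r nodes that follow block_id on the ring, closest first."""
    ring = sorted(nodes)
    start = bisect_left(ring, block_id)
    return [ring[(start + k) % len(ring)] for k in range(min(r, len(ring)))]
```

On the ring from the slide, `replica_nodes(17, [5, 10, 20, 40, 50, 60, 68, 80, 99, 110], 3)` gives N20, N40, and N50. Because hashed IDs are effectively random, nodes adjacent on the ring are unlikely to share a network or location.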
Lookups find replicas
[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Lookup(BlockID=17) for Block 17, with RPCs numbered 1 to 4]
RPCs:
1. Lookup step
2. Get successor list
3. Failed block fetch
4. Block fetch
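The fetch-with-fallback behavior in steps 3 and 4 can be sketched as a loop over the successor list. Everything here is a simplification: RPCs are replaced by dictionary lookups, and the names `fetch_block`, `stores`, and `alive` are hypothetical.

```python
def fetch_block(block_id, successor_list, stores, alive):
    """Try each replica holder in order; return (node, data) from the first that answers.

    stores maps node -> {block_id: data}; alive is the set of reachable nodes.
    """
    for node in successor_list:
        if node not in alive:
            continue  # corresponds to a failed block fetch
        data = stores.get(node, {}).get(block_id)
        if data is not None:
            return node, data
    return None
```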
First Live Successor Manages Replicas
[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110, showing Block 17 and a copy of 17]
• Node can locally determine that it is the first live successor
DHash Copies to Caches Along Lookup Path
[Figure: Chord ring with nodes N5, N10, N20, N40, N50, N60, N68, N80, N99, N110; Lookup(BlockID=45), with RPCs numbered 1 to 4]
RPCs:
1. Chord lookup
2. Chord lookup
3. Block fetch
4. Send to cache
Caching at Fingers Limits Load
[Figure: Chord ring with fingers from other nodes pointing to N32]
• Only O(log N) nodes have fingers pointing to N32
• This limits the single-block load on N32
Virtual Nodes Allow Heterogeneity
Hosts may differ in disk/net capacity
Hosts may advertise multiple IDs
- Chosen as SHA-1(IP Address, index)
- Each ID represents a “virtual node”
Host load is proportional to its number of virtual nodes
The number of virtual nodes is manually controlled
[Figure: Node A hosts virtual nodes N10, N60, and N101; Node B hosts virtual node N5]
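Deriving several virtual node IDs for one host could look like this sketch. The slide only says the IDs are SHA-1(IP Address, index); how the two values are combined before hashing is an assumption here (a comma-joined string), as is the function name `virtual_node_ids`.

```python
import hashlib


def virtual_node_ids(ip: str, count: int) -> list[int]:
    """Return `count` 160-bit IDs for one host, one per virtual node index."""
    ids = []
    for index in range(count):
        # Assumed encoding: "ip,index" as UTF-8 text.
        digest = hashlib.sha1(f"{ip},{index}".encode()).digest()
        ids.append(int.from_bytes(digest, "big"))
    return ids
```

A host with more disk or bandwidth would be configured with a larger `count`, so it owns more points on the ring and takes a proportionally larger share of blocks.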
Why Blocks Instead of Files?
Cost: one lookup per block
- Can tailor cost by choosing good block size
Benefit: load balance is simple
- For large files
- Storage cost of large files is spread out
- Popular files are served in parallel
Outline
Cooperative File System (CFS)
Open DHT
Questions:
How many DHTs will there be?
Can all applications share one DHT?
Benefits of Sharing a DHT
Amortizes costs across applications
- Maintenance bandwidth, connection state, etc.
Facilitates “bootstrapping” of new applications
- Working infrastructure already in place
Allows for statistical multiplexing of resources
- Takes advantage of spare storage and bandwidth
Facilitates upgrading existing applications
- “Share” DHT between application versions
The DHT as a Service
[Figure: a set of key-value (K, V) pairs stored in the DHT]
The DHT as a Service
[Figure: the same key-value (K, V) pairs, now labeled OpenDHT]
The DHT as a Service
[Figure: OpenDHT and its clients]
The DHT as a Service
OpenDHT
The DHT as a Service
OpenDHT
What is this interface?
It’s not lookup()
[Figure: lookup(k) leads to the node responsible for key k]
What does this node do with it?
Challenges:
1. Distribution
2. Security
How are DHTs Used?
1. Storage
- CFS, UsenetDHT, PKI, etc.
2. Rendezvous
- Simple: Chat, Instant Messenger
- Load balanced: i3
- Multicast: RSS Aggregation, White Board
- Anycast: Tapestry, Coral
What about put/get?
Works easily for storage applications
Easy to share
- No upcalls, so no code distribution or security complications
But does it work for rendezvous?
- Chat? Sure: put(my-name, my-IP)
- What about the others?
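The chat case can be illustrated with a toy in-memory stand-in for a put/get service. This is not the OpenDHT API: the class `ToyDHT`, its methods, and the addresses are hypothetical, and a real DHT would spread keys across many nodes.

```python
class ToyDHT:
    """An in-memory stand-in for a shared put/get service."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # Keep every value stored under a key, so several parties can register.
        self._table.setdefault(key, []).append(value)

    def get(self, key):
        return list(self._table.get(key, []))


dht = ToyDHT()
dht.put("alice", "192.0.2.10")    # Alice announces where she can be reached
alice_addrs = dht.get("alice")    # Bob looks her up by name, then connects directly
```

The rendezvous needs no code running inside the DHT: one party stores its address under an agreed name and the other reads it back.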
Protecting Against Overuse
Must protect system resources against overuse
- Resources include network, CPU, and disk
- Network and CPU straightforward
- Disk harder: usage persists long after requests
Hard to distinguish malice from eager usage
- Don’t want to hurt eager users if utilization low
Number of active users changes over time
- Quotas are inappropriate
Fair Storage Allocation
Our solution: give each client a fair share
- Will define “fairness” in a few slides
Limits strength of malicious clients
- Only as powerful as they are numerous
Protect storage on each DHT node separately
- Must protect each subrange of the key space
- Rewards clients that balance their key choices
The Problem of Starvation
Fair shares change over time
- Decrease as system load increases
[Figure: timeline. Client 1 arrives and fills 50% of disk; Client 2 arrives and fills 40% of disk; Client 3 arrives with max share = 10%: starvation!]
Preventing Starvation
Simple fix: add time-to-live (TTL) to puts
- put(key, value) → put(key, value, ttl)
Prevents long-term starvation
- Eventually all puts will expire
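A put with a TTL can be sketched as a store that records an expiry time with each value. This is an illustration only: the class `TTLStore` is invented, and the current time is passed in explicitly (`now`) to keep the example deterministic.

```python
class TTLStore:
    """Key-value store in which every put expires after its TTL."""

    def __init__(self):
        self._items = {}  # key -> (value, expiry time)

    def put(self, key, value, ttl, now):
        self._items[key] = (value, now + ttl)

    def get(self, key, now):
        item = self._items.get(key)
        if item is None or item[1] <= now:
            self._items.pop(key, None)  # drop expired entries lazily
            return None
        return item[0]
```

Because every entry eventually expires, a client that once filled the disk cannot hold that space forever.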
Can still get short-term starvation
[Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B starves until Client A’s values start expiring]
Preventing Starvation
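Stronger condition:
- Be able to accept $r_{min}$ bytes/sec new data at all times
This is non-trivial to arrange!
[Figure: space vs. time plot, from now to max TTL on the time axis and 0 to max on the space axis. A line of slope $r_{min}$ marks the space reserved for future puts; a candidate put is a rectangle of height size and width TTL; the sum must be < max capacity]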
Preventing Starvation
Stronger condition:
- Be able to accept $r_{min}$ bytes/sec new data at all times
This is non-trivial to arrange!
[Figure: two space vs. time plots, each showing a candidate put as a rectangle of height size and width TTL; the second plot is marked "Violation!"]
Preventing Starvation
Formalize graphical intuition:
$f(\tau) = B(t_{now}) - D(t_{now}, t_{now}+\tau) + r_{min} \cdot \tau$
• $D(t_{now}, t_{now}+\tau)$: aggregate size of puts expiring in the interval $(t_{now}, t_{now}+\tau)$
To accept put of size $x$ and TTL $l$:
$f(\tau) + x < C$ for all $0 \le \tau < l$
Can track the value of f efficiently with a tree
- Leaves represent inflection points of f
- Add put, shift time are O(log n), n = # of puts
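The acceptance test can be checked directly, without the tree, at the cost of a linear scan per put. The sketch below is a naive version under stated assumptions: B(t_now) is taken to be the total size of puts that have not yet expired, C is the disk capacity, and the condition is evaluated only where f can peak (just before each expiry, and at the end of the TTL window, which is slightly conservative). The function name `can_accept` and its argument layout are invented.

```python
def can_accept(stored, now, x, ttl, capacity, r_min):
    """Decide whether a put of size x and lifetime ttl keeps f(tau) + x < capacity.

    stored is a list of (size, expiry_time) pairs for puts already accepted.
    """
    live = [(size, expiry) for size, expiry in stored if expiry > now]
    b_now = sum(size for size, _ in live)  # assumed meaning of B(t_now)

    # Between expirations f rises with slope r_min, so it peaks just before
    # each expiry and at the end of the window.
    checkpoints = {0.0, float(ttl)}
    checkpoints.update(expiry - now for _, expiry in live if expiry - now < ttl)

    for tau in sorted(checkpoints):
        # D counts puts expiring strictly inside (now, now + tau).
        expired = sum(size for size, expiry in live if expiry < now + tau)
        f = b_now - expired + r_min * tau
        if f + x >= capacity:
            return False
    return True
```

For instance, with capacity 100, `r_min` of 1 byte/sec, and one stored 60-byte put expiring in 5 seconds, a 30-byte put with TTL 20 is accepted, while a 36-byte put is rejected because f reaches 65 just before the expiry.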
Fair Storage Allocation
[Figure: put-processing pipeline]
1. Per-client put queues. Queue full: reject put. Not full: enqueue put.
2. Select the most under-represented client.
3. Wait until the put can be accepted without violating $r_{min}$.
4. Store and send accept message to client.
The Big Decision: Definition of “most under-represented”
Defining “Most Under-Represented”
Not just sharing disk, but disk over time
- 1 byte put for 100s same as 100 byte put for 1s
- So units are bytes × seconds, call them commitments
Equalize total commitments granted?
- No: leads to starvation
- A fills disk, B starts putting, A starves up to max TTL
[Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; now A starves!]
Defining “Most Under-Represented”
Instead, equalize rate of commitments granted
- Service granted to one client depends only on others putting “at same time”
[Figure: timeline. Client A arrives and fills the entire disk; Client B arrives and asks for space; B catches up with A; A & B share the available rate]
Defining “Most Under-Represented”
Instead, equalize rate of commitments granted
- Service granted to one client depends only on others putting “at same time”
Mechanism inspired by Start-time Fair Queuing
- Have virtual time, $v(t)$
- Each put gets a start time $S(p_c^i)$ and finish time $F(p_c^i)$
$F(p_c^i) = S(p_c^i) + size(p_c^i) \times ttl(p_c^i)$
$S(p_c^i) = \max(v(A(p_c^i)) - \alpha, F(p_c^{i-1}))$
v(t) = maximum start time of all accepted puts
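The tagging rules can be sketched as a small bookkeeping class. This is a simplified reading of the formulas, not the OpenDHT implementation: the class `FairTagger` is invented, tags are computed at arrival time so v(A(p)) is simply the current virtual time, the offset `alpha` stands for the symbol that was lost after the minus sign in the slide text, and selecting the queued put with the lowest start time as "most under-represented" is an assumption.

```python
class FairTagger:
    """Assigns start/finish tags to puts, in units of bytes x seconds."""

    def __init__(self, alpha: float = 0.0):
        self.alpha = alpha
        self.v = 0.0            # virtual time: max start time of accepted puts
        self.last_finish = {}   # client -> finish tag of that client's previous put

    def tag(self, client, size, ttl):
        """Compute (S, F) for a put arriving now from `client`."""
        start = max(self.v - self.alpha, self.last_finish.get(client, 0.0))
        finish = start + size * ttl
        self.last_finish[client] = finish
        return start, finish

    def accept(self, start):
        """Advance virtual time when a put with this start tag is accepted."""
        self.v = max(self.v, start)
```

A client that has just been granted a large commitment gets a late start tag on its next put, so a newcomer's put, tagged near the current virtual time, sorts ahead of it.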
FST Performance