TRANSCRIPT
1
Secure Peer-to-Peer File Sharing
Frans Kaashoek, David Karger, Robert Morris, Ion Stoica, Hari Balakrishnan
http://www.pdos.lcs.mit.edu/chord
MIT Laboratory for Computer Science
2
SFS: a secure global file system
• One name space for all files
• Global deployment
• Security over untrusted networks
[Diagram: a client on an Oxygen device contacts an SFS server at MIT over the network using the self-certifying pathname /global/mit/kaashoek/sfs; both sides hold the server's key hash H21]
3
SFS results
• Research: how to do server authentication?
– Self-certifying pathnames
– Flexible key management
• Complete system available
– www.fs.net
– 65,000 lines of C++ code
– Toolkit for file system research
• System used inside and outside MIT
• Ported to iPAQ
4
New direction: peer-to-peer file sharing
• How to build distributed systems without centrally-managed servers?
• Many Oxygen technologies are peer-to-peer
– INS, SFS/Chord, Grid
• Chord is a new elegant primitive for building peer-to-peer applications
5
Peer-to-Peer Properties
• Main advantage: decentralized
– No single point of failure in the system
– More robust to random faults or adversaries
– No need for central administration
• Main disadvantage: decentralized
– All failures equally important---no “clients”
– Difficult to coordinate use of resources
– No opportunity for central administration
6
Peer-to-Peer Challenges
• Load balancing
– No node should be overloaded
• Coordination
– Agree globally on who is responsible for what
• Dynamic network/fault tolerance
– Readjust responsibility as peers come and go
• Scalability
– Resources per peer must be negligible
7
Peer-to-peer sharing example
• Internet users share music files
– Share disk storage and network bandwidth
– 10 Gb of disk, 1 hour/day of continuous 400 Mb
[Figure: peers sharing files across the Internet]
8
Key Primitive: Lookup
[Figure: peers use the lookup primitive to insert and find documents]
9
Chord: a P2P Routing Primitive
• Lookup is the key problem
– Given an identifier, find the responsible machine
• Lookup is not easy:
– Gnutella scales badly---too much lookup work
– Freenet is imprecise---lookups can fail
• Chord lookup provides:
– Good naming semantics and efficiency
– Elegant base for layered features
10
Chord Architecture
• Interface:
– Lookup(ID) → IP address
– ID might be a node name, document ID, etc.
– Get the IP address of the node responsible for ID
– Application decides what to do with the IP address
• Chord consists of:
– Consistent hashing to assign IDs to nodes
– Efficient routing protocol to find the right node
– Fast join/leave protocol
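The Lookup(ID) → IP-address interface can be sketched in miniature. This is an illustration only, not Chord's distributed implementation: it models the whole ring as one known table, and the node IDs and IP addresses are made up.

```python
# Illustrative sketch of the Lookup(ID) -> IP-address interface.
# A real Chord node holds only partial routing state; here the whole
# ring is known centrally so the mapping rule is easy to see.
import bisect

class ToyLookup:
    def __init__(self, nodes):
        # nodes: {node ID on the identifier circle: IP address} (made up)
        self.ids = sorted(nodes)
        self.ips = nodes

    def lookup(self, ident):
        # The responsible node is the first node clockwise from ident
        # (its "successor"), wrapping around the circle.
        i = bisect.bisect_left(self.ids, ident)
        node = self.ids[i % len(self.ids)]
        return self.ips[node]

ring = ToyLookup({6: "10.0.0.6", 36: "10.0.0.36", 51: "10.0.0.51"})
ring.lookup(49)   # node 51 is the successor of ID 49
```

The application then contacts the returned IP address itself; Chord only answers the lookup.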
11
Chord Properties
• log(n) lookup messages and table space
– log(1,000,000) ≈ 20
• Well-defined location for each ID
– No search required
• Natural load balance
• Minimal join/leave disruption
• Does not store documents
– But a document store layers easily on top
12
Assignment of Responsibility
13
Consistent Hashing
14
Consistent Hashing
[Figure: identifier circle starting at 0, with nodes at 6, 13, 18, 22, 31, 36, 42, 47, 51, 60]
Each node picks a random point on the identifier circle
15
Consistent Hashing
Hash the document ID to a point on the identifier circle
[Figure: identifier circle with nodes at 6, 13, 18, 22, 31, 36, 42, 47, 51, 60; a document hashes to 49]
16
Consistent Hashing
Assign each ID to its “successor” node on the circle
[Figure: identifier circle with nodes at 6, 13, 18, 22, 31, 36, 42, 47, 51, 60; the document with hash 49 is assigned to node 51]
Assign the doc with hash 49 to node 51
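The hash-then-successor rule from these slides can be written in a few lines. This sketch uses the node IDs from the figure on a 6-bit circle; truncating SHA-1 to 6 bits is an assumption chosen to match the figure's scale, not what a real deployment would use.

```python
# Sketch of consistent hashing as on the slides: hash a document name onto
# a 6-bit identifier circle, then assign it to the first node at or after
# that point (its "successor"), wrapping past 0.
import hashlib

NODES = sorted([6, 13, 18, 22, 31, 36, 42, 47, 51, 60])  # figure's node IDs

def ident(name, bits=6):
    # Truncated SHA-1 stands in for the real identifier hash (assumption).
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

def successor(point):
    for n in NODES:
        if n >= point:
            return n
    return NODES[0]   # wrap around the circle

successor(49)   # the document with hash 49 goes to node 51, as in the figure
```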
17
Load Balance
• Each node is responsible for the circle segment between it and the previous node
• But random node positions mean the previous node is close
• So no node is responsible for too much
[Figure: the arc between nodes 22 and 31 is the segment owned by node 31]
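The load-balance claim is easy to check numerically. This simulation is illustrative and not from the talk: it drops 1,000 random node IDs on a 32-bit circle and compares the largest segment to the average one.

```python
# Quick simulation of the load-balance claim: with random node positions,
# the largest segment is only a small multiple of the average segment,
# so no node owns a disproportionate share of the circle.
import random

random.seed(1)                    # arbitrary seed, for repeatability only
RING = 2 ** 32
nodes = sorted(random.randrange(RING) for _ in range(1000))
# Segment i is the arc from the previous node up to node i (index 0 wraps).
segments = [(nodes[i] - nodes[i - 1]) % RING for i in range(len(nodes))]
average = RING / len(nodes)
ratio = max(segments) / average   # typically on the order of log(n)
```

The maximum-to-average ratio stays modest, which is the "natural load balance" the deck refers to.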
18
Dynamic Network
• To know the appropriate successor, a node must know the identifiers of all nodes on the circle
• Requires lots of state per node
• And that state must be kept current
• Requires a huge number of messages when a node joins or leaves
19
Successor Pointers
• Each node keeps track of successor on circle
• To find objects, walk around circle using successor pointers
• When node joins, notify one node to update successor
• Problem: slow!
[Figure: identifier circle with nodes at 6, 13, 18, 22, 31, 36, 42, 47, 51, 60, linked by successor pointers]
20
Fingers
• Each node keeps carefully chosen “fingers”---shortcuts around the circle
• For a distant ID, a shortcut covers much of the distance
• Result:
– fast lookups
– small tables
[Figure: identifier circle with nodes at 6, 13, 18, 22, 31, 36, 42, 47, 51, 60; finger pointers cut across the circle]
21
Powers of 2
• The node at ID n stores fingers to the nodes at IDs n+1/2, n+1/4, n+1/8, n+1/16, … of the way around the circle
• log(n) fingers reach across n nodes
• Key fact: from whatever the current node is, some power-of-2 finger is halfway to the target
• Distance to the target halves with each step
• log(n) steps suffice to reach the target
• log(1,000,000) ≈ 20
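The powers-of-2 routing can be sketched on a 6-bit circle with the figure's node IDs. This is an illustration of the idea, not Chord's full protocol: each node fingers the successor of n + 2^i, and lookup repeatedly jumps to the farthest finger that does not pass the target, halving the remaining distance each step.

```python
# Sketch of powers-of-2 fingers on a 6-bit identifier circle.
BITS = 6
RING = 2 ** BITS
NODES = sorted([6, 13, 18, 22, 31, 36, 42, 47, 51, 60])

def succ(x):
    x %= RING
    return next((n for n in NODES if n >= x), NODES[0])

def between(x, a, b):
    # True if x lies in the clockwise interval (a, b] on the circle.
    return a < x <= b if a < b else (x > a or x <= b)

def finger_lookup(start, target):
    node, hops = start, 0
    # Stop once the target falls between node and its immediate successor.
    while not between(target, node, succ(node + 1)):
        fingers = [succ(node + 2 ** i) for i in range(BITS)]
        ahead = [f for f in fingers if between(f, node, target)]
        # Jump to the farthest finger that has not passed the target.
        node = max(ahead, key=lambda f: (f - node) % RING)
        hops += 1
    return succ(target), hops

finger_lookup(6, 49)   # reaches node 51 in 2 hops, not a walk around the circle
```

Because each hop at least halves the clockwise distance to the target, hops are bounded by the number of identifier bits.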
22
Chord Lookups
[Figure: identifier circle with nodes at 6, 13, 18, 22, 31, 36, 42, 47, 51, 60; a lookup hops along fingers toward the target]
23
Node Join Operations
• Integrate into the routing mechanism
– New node finds its successor (via lookup)
– Determines its fingers (more lookups)
– Total: O(log²(n)) time to join the network
• Takes responsibility for certain objects from its successor
– Upcall for application-dependent reaction
– E.g., may copy documents from the other node
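The responsibility transfer in the second bullet can be sketched as follows. This is illustrative only: the node and key IDs are made up, and wraparound at 0 is ignored for brevity. The moment `moved` is handed over is where the application-dependent upcall (e.g., copying documents) would fire.

```python
# Sketch of a node join: find the successor, then take over the
# successor's keys that now fall in the new node's segment.
def join(nodes, store, new_id):
    # store: {node ID: set of key IDs that node is responsible for}
    succ = min((n for n in nodes if n >= new_id), default=min(nodes))
    moved = {k for k in store[succ] if k <= new_id}  # keys in the new segment
    store[succ] -= moved                             # (wraparound ignored)
    store[new_id] = moved
    nodes.add(new_id)
    return succ, moved

nodes = {6, 36, 51}
store = {6: {2, 5}, 36: {10, 20, 30}, 51: {40, 49}}
join(nodes, store, 25)   # node 25's successor is 36; keys 10 and 20 move over
```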
24
Fault Tolerance
• Node failures cause two problems:
– Lost data
– Corrupted routing (fingers cut off)
• Data solution: replicate
– Place copies of data at adjacent nodes
– If the successor fails, the next node becomes successor
• Finger solution: alternate paths
– If a finger is lost, use a different (shorter) finger
– Lookups are still fast
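The replication idea can be sketched with the figure's node IDs: store each key at its successor plus the next r-1 nodes on the circle. The choice r=3 here is an assumption for illustration.

```python
# Sketch of replication at adjacent nodes: when the successor fails,
# the next node on the circle already holds a copy and takes over.
def replicas(nodes, key_id, r=3):
    ring = sorted(nodes)
    i = next((j for j, n in enumerate(ring) if n >= key_id), 0)
    return [ring[(i + k) % len(ring)] for k in range(r)]

ring = {6, 13, 18, 22, 31, 36, 42, 47, 51, 60}
replicas(ring, 49)           # key 49 lives on nodes 51, 60, and 6
replicas(ring - {51}, 49)    # after 51 fails, the new successor 60 has a copy
```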
25
File sharing with Chord
[Diagram: a client app (e.g., a browser) on a client node, a key/value layer on the client and servers, and Chord beneath on every node; the app calls get(key)/put(k, v) on the key/value layer, which calls lookup(id) on Chord]
• Fault tolerance: store values at r successors
• Hot documents: cache values along the Chord lookup path
• Authentication: self-certifying names (SFS)
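The layering in this slide can be sketched in miniature: a key/value store whose only dependence on the routing layer is the lookup primitive. The class and method names are illustrative, and replication/caching are omitted to keep the layering itself visible.

```python
# Sketch of a key/value layer built on nothing but the lookup primitive:
# put() stores a value at the key's successor node, get() retrieves it.
import hashlib

class KVLayer:
    def __init__(self, node_ids, bits=6):
        self.bits = bits
        self.ring = sorted(node_ids)
        self.store = {n: {} for n in self.ring}   # each node's local storage

    def _lookup(self, key):
        # The Chord primitive: key -> responsible node (successor of its hash).
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** self.bits)
        return next((n for n in self.ring if n >= h), self.ring[0])

    def put(self, key, value):
        self.store[self._lookup(key)][key] = value

    def get(self, key):
        return self.store[self._lookup(key)].get(key)

dht = KVLayer([6, 13, 18, 22, 31, 36, 42, 47, 51, 60])
dht.put("song.mp3", b"mp3 bytes")
dht.get("song.mp3")
```

Because all placement goes through `_lookup`, the extensions on this slide (r successors, path caching) slot in without changing the interface above it.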
26
Chord Status
• Working Chord implementation
• SFSRO file system layered on top
• Prototype deployed at 12 sites around the world
• Understand design tradeoffs
27
Open Issues
• Network proximity
• Malicious data insertion
• Malicious Chord table information
• Anonymity
• Keyword search and indexing
28
Chord Summary
• Chord provides distributed lookup– Efficient, low-impact join and leave
• Flat key space allows flexible extensions
• Good foundation for peer-to-peer systems
http://www.pdos.lcs.mit.edu/chord