8/14/2019 CS514: Intermediate Course in Computer Systems
http://slidepdf.com/reader/full/cs514-intermediate-course-in-computer-systems 1/17
CS514: Intermediate Course in Computer Systems
Lecture 25: Nov 17, 2003
“Peer-to-peer protocols for file and data replication: file sharing”
CS514
P2P file sharing
• File sharing dominates traffic usage for the University of Washington
  • Recent study, presented at Cornell by Hank Levy
  • Kazaa
• This is a recent sea change, so the P2P phenomenon is worth looking at
File sharing is nothing new
• Has been going on over IRC (Internet Relay Chat) for years
  • Chat groups focusing on certain artists or genres
• Upload servers were popular for a while
  • Users would upload (FTP) a song
  • This allowed them to download N songs
  • Performance typically sucked
  • The record industry shut these down
• In all the above it took user effort to find what was wanted
Napster changed everything
• Central search engine, but peer-to-peer file transfer
  • 160+ search engines at peak
  • User attaches to one of them
  • Engine would index collections of its own active users
  • Search on a given engine returned results from that engine
  • Unless not enough results, then would ask other engines
    • This is what I understand from the Saroiu et al. U-Wash measurement paper
• Peer-to-peer file transfer improved scalability
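The search behavior described on this slide can be sketched roughly as follows. This is a minimal illustration, not Napster's actual protocol: the `SearchEngine` class, the `min_results` threshold, and the peer addresses are all hypothetical names chosen for the example. The engine returns only addresses; the file transfer itself would then happen directly between peers.

```python
from dataclasses import dataclass, field


@dataclass
class SearchEngine:
    """One central index server; it indexes only its own attached users."""
    name: str
    index: dict = field(default_factory=dict)   # filename -> set of peer addresses
    others: list = field(default_factory=list)  # other engines to fall back on

    def attach(self, peer, filenames):
        # A user attaches and the engine indexes that user's collection.
        for f in filenames:
            self.index.setdefault(f, set()).add(peer)

    def search(self, filename, min_results=1):
        hits = set(self.index.get(filename, ()))
        if len(hits) < min_results:
            # Not enough local results: ask the other engines (hypothetical rule).
            for engine in self.others:
                hits |= engine.index.get(filename, set())
        return sorted(hits)


a, b = SearchEngine("a"), SearchEngine("b")
a.others, b.others = [b], [a]
a.attach("peer1:6699", ["song.mp3"])
b.attach("peer2:6699", ["song.mp3", "rare.mp3"])

local_hits = a.search("song.mp3")   # answered from engine a alone
remote_hits = a.search("rare.mp3")  # engine a has none, so b is consulted
```

Note that the index holds only pointers to peers, which is why the search engines stay cheap while the bandwidth-heavy transfers scale with the number of users.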
Napster problem
• As we all know, the problem with Napster is that it is a single point of litigation
• Gnutella designed as a lawyer-resilient Napster
  • Nothing more, nothing less (in my opinion)
In the meanwhile…
• Ian Clarke (Edinburgh, 1999*) was thinking about P2P from an anonymity perspective
  • He was interested in free speech
  • Whistleblowers, political dissent, etc.
• Ian designed Freenet as a class project
• Freenet is not so much file sharing as it is a publishing medium
* Before Gnutella
Freenet, Gnutella, and DHTs: One out of three…
Systems: Gnutella, Freenet, DHT
Properties: Anonymity, Keyword search, Scalability
Get-em-out-quick projects
• Both Gnutella and Freenet were quick-and-dirty prototypes
  • Neither really worked through all the issues
  • As a result, both are deeply flawed
• Nevertheless, both captured the popular imagination, and so are worth talking about
• And both have spawned new work
  • For instance, FastTrack (Kazaa, Grokster …) spawned from Gnutella
Bootstrap
• DHT: Have to know at least one active member
• Gnutella: Have to know at least one active member
• Freenet: Have to know at least one active member
• Scaling issue for everybody
• But fundamental operational issue for Gnutella (which cannot have any central point of control)
Bootstrap
• Various possible ways to know a current member
  • Email from friends, web site with list of addresses, rendezvous server with active knowledge of P2P network
  • Pastry folks suggest using a universal DHT to bootstrap other DHTs
• Gnutella uses rendezvous server approach, called “pong server” or “host cache”!!!
  • Host cache lists are distributed with software
    • They may change, so also listed on various websites, forums, etc.
  • Single point of litigation!
  • Scaling issue also
Neighbor discovery and selection
• DHT: Search network, download neighbor tables, adjust as needed
• Gnutella:
  • “Ping” message truncated-flooded through network
  • “Pong” messages returned from each node via reverse path of ping
• Freenet: Not sure about initial discovery and selection
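In the Gnutella protocol the ping is the flooded message and each pong travels back along the path its ping took. A minimal sketch of that discovery step, assuming an idealized overlay given as an adjacency dictionary and a breadth-first delivery order (real message timing is messier); the function name `ping_flood` and the sample graph are made up for illustration:

```python
from collections import deque


def ping_flood(graph, origin, ttl):
    """Flood a ping from origin; return {node: reverse path back to origin}."""
    came_from = {origin: None}          # previous hop for each node that saw the ping
    queue = deque([(origin, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue                    # TTL exhausted: the flood is truncated here
        for neighbor in graph[node]:
            if neighbor not in came_from:   # duplicate pings are dropped
                came_from[neighbor] = node
                queue.append((neighbor, remaining - 1))

    pong_paths = {}
    for node in came_from:
        if node == origin:
            continue
        path, hop = [node], came_from[node]
        while hop is not None:          # each pong retraces the ping's path
            path.append(hop)
            hop = came_from[hop]
        pong_paths[node] = path
    return pong_paths


graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
pongs = ping_flood(graph, "A", ttl=2)
```

With a TTL of 2, node D is three hops away and never sees the ping, so A does not learn about it.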
Network (neighbor) maintenance
• DHT: Nodes ping each other, do repair when neighbor lost (or in background)
• Gnutella: Nodes only need to have enough active neighbors … no structure to maintain
• Freenet: Loose structure … learns of neighbors near itself in the node ID space over time through process of insertion and search
File insertion and storage
• DHT: File (or file pointer) stored at hashed node
  • May be replicated or cached
• Gnutella: File stored at owner node
• Freenet: File stored in “vicinity” of hashed node
  • May be cached
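A rough sketch of what “stored at hashed node” can mean for a DHT. The rule used here, picking the node whose ID is numerically closest to the file's hash on a circular ID space, is one common convention and is an assumption for this example; other DHTs use a different rule, such as the first node ID following the key. The 16-bit ID space and the function names are also illustrative only.

```python
import hashlib


def key_id(name, bits=16):
    """Hash a name into a small ID space (16 bits here, for readability)."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def responsible_node(node_ids, file_id, bits=16):
    """Return the node ID numerically closest to file_id on the ring."""
    size = 1 << bits

    def ring_distance(n):
        d = abs(n - file_id)
        return min(d, size - d)         # distance may wrap around the ring

    # Ties are broken toward the smaller node ID so the choice is deterministic.
    return min(node_ids, key=lambda n: (ring_distance(n), n))


node_ids = [100, 20000, 40000, 65000]
home = responsible_node(node_ids, file_id=64)         # nearest without wrapping
wrapped = responsible_node(node_ids, file_id=65500)   # nearest across the wrap
```

Because every participant computes the same hash, any node can work out where a file belongs without asking a central index.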
File search and retrieval
• DHT: Search for hashed node or replica
  • Opportunistically may find cache
• Gnutella: Truncated flood of search query
• Freenet: “Soft” search based on “steepest-ascent hill-climbing”
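The Freenet-style “steepest-ascent hill-climbing” search can be sketched as a greedy depth-first walk: at each node, try the neighbor that looks closest to the key first, and back up when a branch dead-ends. This is a simplified sketch under stated assumptions: closeness is measured between the key and neighbor node IDs (integers here), and a single hop budget is shared by the whole search. The names `freenet_search` and `hops_to_live` are illustrative, not taken from the Freenet code.

```python
def freenet_search(neighbors, stores, start, key, hops_to_live):
    """Depth-first greedy search: always try the neighbor closest to key first."""
    visited = set()
    budget = [hops_to_live]  # shared hop budget across the whole search

    def visit(node):
        visited.add(node)
        if key in stores.get(node, ()):
            return [node]
        # Try neighbors in order of closeness to the key; back up on dead ends.
        for nxt in sorted(neighbors[node], key=lambda n: abs(n - key)):
            if nxt in visited or budget[0] == 0:
                continue
            budget[0] -= 1
            path = visit(nxt)
            if path is not None:
                return [node] + path
        return None

    return visit(start)


neighbors = {10: [50, 90], 90: [10, 95], 95: [90], 50: [10, 60], 60: [50]}
stores = {60: {72}}          # node 60 holds the file with key 72
path = freenet_search(neighbors, stores, start=10, key=72, hops_to_live=10)
short = freenet_search(neighbors, stores, start=10, key=72, hops_to_live=2)
```

In this example the greedy first choice (node 90) leads to a dead end, so the search backs up and succeeds through node 50. With only two hops allowed it gives up, which is why the search is “soft”: a file can exist and still not be found.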
File deletion
• DHT: Send delete message to hashed node
• Gnutella: Delete from local directory
• Freenet: LRU replacement policy
  • No explicit delete
  • (No update either, because don’t know how to flush old copies)
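To make the LRU replacement policy concrete, here is a minimal sketch of a per-node store with no delete operation: files leave only by being pushed out when capacity is exceeded. The class name `LruStore` and the capacity of two are assumptions for the example, not details of Freenet itself.

```python
from collections import OrderedDict


class LruStore:
    """Fixed-capacity file store: no delete(); old files simply fall out."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.files = OrderedDict()

    def insert(self, key, data):
        self.files[key] = data
        self.files.move_to_end(key)          # mark as most recently used
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)   # evict the least recently used

    def request(self, key):
        if key not in self.files:
            return None
        self.files.move_to_end(key)          # a hit refreshes the file
        return self.files[key]


store = LruStore(capacity=2)
store.insert("a", b"1")
store.insert("b", b"2")
store.request("a")        # "a" is now more recent than "b"
store.insert("c", b"3")   # evicts "b"
```

A file that nobody requests is eventually evicted, so keeping a file alive requires that it keeps being requested or reinserted.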
Freenet structure
• Every node gets a GUID
  • Globally Unique ID
  • Same hash space as files
  • Assigned by some cryptographic distributed random number generation protocol run among nodes discovered with a truncated random walk
    • This is supposed to prevent a newcomer from making up its own GUID, though seems easy to break to me…
How Freenet supports anonymity
• All messages are encrypted (hop-by-hop)
• Source of search (find and insert) is hidden
  • Each node remembers previous hop in chain back to source
• Originator of file does not store file in network
• Nodes in search path occasionally (randomly) claim to be the holder of a file
• Search can start with initial random walk to hide “location” of searcher (partly deducible through TTL value)
Freenet scalability: Fits curve of N^0.28
That’s pretty good! But . . .
(simulation, no deletes)
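To get a feel for what an exponent of 0.28 implies, the short calculation below works out how much a quantity proportional to N^0.28 grows as the network grows. The slide does not say which quantity was fitted; the sketch assumes it is search cost (for example, path length in hops), and `growth_factor` is a name made up for this example.

```python
def growth_factor(size_ratio, exponent=0.28):
    """How much an N**exponent quantity grows when N grows by size_ratio."""
    return size_ratio ** exponent


tenfold = growth_factor(10)          # roughly 1.9
thousandfold = growth_factor(1000)   # roughly 6.9
```

Under that assumption, a network ten times larger costs a bit under twice as much per search, and one a thousand times larger costs roughly seven times as much.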
Distribution of node degree
Artificial max node degree of 250
Distribution of node degree
• Reinforcing nature of learning tends to concentrate knowledge on a small percentage of nodes
• Freenet inventors consider this a feature, but . . .
  • “Small world network” (power-law distribution of node degree)
  • Terrible load balance!!!
  • In essence like centralizing the search!!!
Freenet’s anonymity is weak at best
• Attacker can attach itself all over the network
  • Pretend to have lots of different keys so attach to lots of nodes
  • Now can see a lot of the traffic
• Attacker can push files out of the system by authoring lots of files of similar ID, and by suppressing requests for the file
  • Cache file itself and answer queries from cache
  • Other nodes won’t see queries, so won’t refresh cache
Freenet’s anonymity is weak at best
• Originator has to repeatedly refresh the file
  • Look for these refreshes, and deduce location of originator
• Of course, can similarly find out who is searching for certain files
• In fact, repressive government’s best strategy might be to encourage Freenet, in order to encourage usage by subversives and find them!
  • Weak anonymity worse than none at all!
Fundamental problem with Gnutella
• Scalability of broadcast search
  • Low bandwidth nodes essentially unusable because of search load
• Solution is to elect “super-nodes” (FastTrack does this) among stable, well-connected participants
  • Clients upload index to super-nodes
• Two positive effects:
  • Searches sent to nodes better equipped to handle them
  • Searches sent to fewer nodes total
Gnutella load balancing is very bad
• Suffers from same “small world network” phenomenon as Freenet
  • Saroiu et al. measurement study
• Broadcast discovery (ping) tends to discover already well-connected nodes
  • Connecting to them makes them more well-connected
• Well-connected nodes will get disproportionate share of searches
• Like Freenet, this makes it easy for an attacker to be everywhere in the network
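The rich-get-richer effect can be illustrated with a toy growth model. This is a sketch under a strong simplifying assumption, namely that each newcomer links to exactly one existing node chosen with probability proportional to its current degree; it is not a model of any real Gnutella client, and `grow_overlay` is a hypothetical name.

```python
import random


def grow_overlay(num_nodes, seed=0):
    """Each newcomer links to one existing node chosen in proportion to degree."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}         # start with two nodes joined by one link
    endpoints = [0, 1]            # every node appears once per link it has
    for new in range(2, num_nodes):
        # Picking uniformly from link endpoints favors well-connected nodes.
        target = rng.choice(endpoints)
        degree[target] += 1
        degree[new] = 1
        endpoints += [target, new]
    return degree


degree = grow_overlay(2000)
busiest = max(degree.values())
typical = sorted(degree.values())[len(degree) // 2]   # median degree
```

Comparing `busiest` with `typical` shows the imbalance: most nodes keep a single link while a few early nodes collect many, and those few would see a disproportionate share of any flooded search.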
Gnutella file transfer performed poorly
• >50% of downloads failed (in 2001)
  • Bad client software
  • Thinly connected clients
• Improvements:
  • Persistent automatic attempts by client (acts like pub/sub in a way)
  • Chunked data
    • Allows interrupted downloads to continue later
    • Allows “striped” parallel download
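A small sketch of how chunking enables both resumption and striping. The chunk size, the round-robin assignment, and the function names are assumptions made for illustration; real clients choose sources and chunk boundaries in their own ways.

```python
def plan_chunks(file_size, chunk_size):
    """Split a file into (offset, length) byte ranges."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def assign_stripes(chunks, sources, done=()):
    """Round-robin the chunks still missing across the available sources."""
    done_set = set(done)
    missing = [c for c in chunks if c not in done_set]
    plan = {s: [] for s in sources}
    for i, chunk in enumerate(missing):
        plan[sources[i % len(sources)]].append(chunk)
    return plan


chunks = plan_chunks(file_size=1000, chunk_size=300)
# An earlier, interrupted session already fetched the first chunk.
plan = assign_stripes(chunks, ["peerA", "peerB"], done=[(0, 300)])
```

Only the missing chunks are requested, and they are spread over both peers, so a failed source costs one chunk rather than the whole download.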
P2P measurement study
• Saroiu, Gummadi, Gribble, UWash
• Crawled Gnutella and Napster
• Measured:
  • Bottleneck bandwidths
  • IP-level latencies
  • Connect/disconnect patterns
  • Degree of sharing and download
  • Degree of cooperation
  • Correlations of above
P2P measurement study major results
• Extreme heterogeneity
  • 7% of users offer more than 50% of files
  • 20% longest sessions an order of magnitude longer than 20% shortest sessions
  • Design for extreme heterogeneity
• Gnutella overlays often disjoint
P2P measurement study major results
• Latency: Closest 20% 4 times closer than furthest 20%
  • Have techniques to find low-latency neighbors
• Peers deliberately misrepresent themselves
  • 30% advertise lower than actual bandwidth, presumably to discourage upload
  • 25% don’t advertise bandwidth
  • Don’t trust clients to be honest, measure their capabilities in the system
Session duration
Status of distributed file sharing
• FastTrack/Kazaa the big player today
• RIAA tried to sue them, claiming:
  • They control the network
  • They are aware of and encourage pirating
  • They profit from it
• RIAA lost (now they are suing users!)
• But I suspect that RIAA is right in that Kazaa could easily control the network