Computer networks, Lecture 11: Peer to Peer
Prof. Younghee Lee


Page 1:

Computer networks, Lecture 11: Peer to Peer

Prof. Younghee Lee

Page 2:

Peer-to-Peer? Centralized server vs. distributed server

– Client–server paradigm
  » Flat: RPC
  » Hierarchical: DNS, mount
– Peer-to-peer paradigm
  » Each node is both a web client and a transient web server: the easy part

How does a peer determine which peers have the desired content?
– Finding connected peers that hold copies of the desired object: the difficult part
  » A dynamic member list makes it more difficult
– Pure: Gnutella, Chord
– Hybrid: Napster, Groove

Other challenges
– Scalability: up to hundreds of thousands or millions of machines
– Dynamicity: machines can come and go at any time

Page 3:

Peer-to-Peer? The network is hard to change, especially the Internet! An overlay network lives at the end nodes, which are easy to change. What if overlay end nodes act as network nodes?

– Overlay multicast
– VoIP
– File sharing

The Internet itself was an overlay on the telephone network. Future Internet:
– "Naming": the key design issue today
– Querying and data independence: the key to tomorrow?
  » Decouple the application-level API from data organization

Page 4:

Peer-to-Peer? Share the resources of individual peers
– CPU, disk, bandwidth, information, …

Communication and collaboration
– Magi, Groove, Skype

File sharing
– Napster, Gnutella, Kazaa, Freenet, Overnet

P2P applications built over emerging overlays
– PlanetLab

Page 5:

Peer-to-Peer? Distributed computing

– SETI@Home
  » A scientific experiment that harnesses the power of hundreds of thousands of Internet-connected computers in the Search for Extra-Terrestrial Intelligence
  » Server assigns work units: computers send machine information to the server, the server assigns a task, and computers send back results
– Folding@Home
  » A distributed computing project that studies protein folding, misfolding, aggregation, and related diseases

Page 6:

Overlay Networks

– Virtual edge: a TCP connection, or simply a pointer to an IP address
– Overlay maintenance: pings, messaging, establishing a new edge when a neighbor goes down, …

Layering (bottom to top): TCP/IP, then P2P/overlay middleware, then P2P applications (DNS, CDN, ALM, …)

Page 7:

P2P file sharing

Napster
– Centralized, sophisticated search
– Client–server search
– Point-to-point file transfer

Gnutella
– Open source; flooding, TTL, unreachable nodes

FastTrack (KaZaA)
– Heterogeneous peers

Freenet
– Anonymity, caching, replication

Page 8:

Centralized directory: Napster

Napster: the first commercial company, for MP3 distribution
– Large-scale server (server farm)

How to find a file:
– On startup, a client contacts the central server and reports its list of files
– Query the index system; it returns a machine that stores the required file
  » Ideally the closest/least-loaded machine
– FTP the file directly from the peer

Centralized index
– Lawsuits
– Denial of service

Copyright issues
– Direct infringement: download/upload
– Indirect infringement: an individual accountable for the actions of others; contributory
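The central-index flow above (clients register their file lists on startup, queries hit the index, and transfers then go peer-to-peer) can be sketched as follows. This is an illustrative toy, not Napster's actual protocol; all names are made up.

```python
class CentralIndex:
    """Napster-style directory server: maps file names to peers holding them."""

    def __init__(self):
        self.index = {}                      # file name -> set of peer addresses

    def register(self, peer, files):
        # On startup, a client reports its list of files to the central server.
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        # Called when the server detects that a peer has disconnected.
        for peers in self.index.values():
            peers.discard(peer)

    def lookup(self, name):
        # Return peers that store the requested file; the transfer itself
        # then happens point-to-point between peers, not via the server.
        return sorted(self.index.get(name, set()))

index = CentralIndex()
index.register("peer-a", ["song.mp3", "talk.mp3"])
index.register("peer-b", ["song.mp3"])
print(index.lookup("song.mp3"))   # ['peer-a', 'peer-b']
index.unregister("peer-a")
print(index.lookup("song.mp3"))   # ['peer-b']
```

Note how the O(N) state, single point of failure, and bottleneck listed on the next slide all live in this one object.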

Page 9:

Centralized Lookup (Napster)

[Figure: the publisher at N4, holding Key="title", Value=MP3 data, registers with the central DB via SetLoc("title", N4) (inform & update); a client issues Lookup("title") to the DB and then fetches the content directly from N4. Peers N1–N9 surround the central DB.]

– Simple, but O(N) state
– Single point of failure
– Performance bottleneck
– Copyright infringement

To keep its database current, the directory server can determine when a peer becomes disconnected:
– Send messages periodically to the peers
– Keep a permanent TCP connection with each connected peer

Page 10:

Decentralized directory: Flooding (1)

Gnutella
– Distributed file location
– Idea: flood the request
– How to find a file:
  » Send the request to all neighbors
  » Neighbors recursively forward the request
  » A machine that has the file receives the request and sends back the answer
  » Transfers are done with HTTP between peers
– Advantages:
  » Totally decentralized, highly robust
– Disadvantages:
  » Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL: a limited-scope query)
  » Worst case O(N) messages per lookup
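The flood-with-TTL idea above can be sketched over a toy overlay graph. The node names and message shape are illustrative only; real Gnutella carries descriptor IDs and routes answers back along the reverse path.

```python
def flood_query(overlay, files, start, wanted, ttl):
    """Flood a query from `start`; return the set of nodes that would answer."""
    answers = set()
    seen = {start}                        # loop detection, as real clients do
    frontier = [(start, ttl)]
    while frontier:
        node, ttl_left = frontier.pop()
        if wanted in files.get(node, ()):
            answers.add(node)             # this node sends an answer back
        if ttl_left == 0:
            continue                      # TTL limits the scope of the flood
        for neigh in overlay[node]:
            if neigh not in seen:
                seen.add(neigh)
                frontier.append((neigh, ttl_left - 1))
    return answers

overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
files = {"c": {"x.mp3"}, "d": {"x.mp3"}}
print(flood_query(overlay, files, "a", "x.mp3", ttl=1))  # {'c'} (d is 2 hops away)
print(flood_query(overlay, files, "a", "x.mp3", ttl=2))  # {'c', 'd'}
```

The two calls show exactly the trade-off on this slide: a small TTL bounds the message storm but can miss copies that sit just beyond the horizon.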

Page 11:

Decentralized directory: Flooding (2)

FastTrack (aka KaZaA)
– Modifies the Gnutella protocol into a two-level hierarchy
  » Hybrid of Gnutella and Napster
– Group leader: super node
  » Nodes that have a better connection to the Internet
  » Act as temporary directory servers for the other nodes in their group
  » Maintain a database mapping names of content to the IP addresses of group members
  » Not a dedicated server; an ordinary peer
– Bootstrapping node
  » A peer that wants to join the network contacts this node
  » This node can designate the joining peer as a new bootstrapping node
– Standard nodes: ordinary nodes
  » Connect to super nodes and report their list of files
  » Allows slower nodes to participate
– Broadcast (Gnutella-style) search across group-leader peers; query flooding
– Drawbacks
  » Fairly complex protocol to construct and maintain the overlay network
  » Group leaders have more responsibility; not truly decentralized
  » Still not purely serverless (the bootstrapping node is an "always-up" server)

[Figure: overlay peers attached to group-leader peers; neighboring relationships in the overlay network.]

KaZaA metadata
– File name
– File size
– Content hash
– File descriptors: used for keyword matches during queries

Page 12:

Gossip protocols

Epidemic algorithms, originally targeted at database replication
– Rumor mongering
  » Propagate a newly received update to k random neighbors

Extended to routing
– Rumor mongering of queries instead of flooding
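A minimal sketch of rumor mongering, assuming a tiny fully connected group: in each round, every node that knows the update pushes it to k random neighbors. Topology, k, and the round structure are illustrative.

```python
import random

def gossip_round(neighbors, informed, k):
    """One rumor-mongering round: each informed node forwards the update
    to k randomly chosen neighbors."""
    newly = set()
    for node in informed:
        peers = neighbors[node]
        for target in random.sample(peers, min(k, len(peers))):
            newly.add(target)
    return informed | newly

random.seed(0)                            # deterministic demo
neighbors = {n: [m for m in range(5) if m != n] for n in range(5)}
informed, rounds = {0}, 0
while len(informed) < len(neighbors) and rounds < 20:
    informed = gossip_round(neighbors, informed, k=2)
    rounds += 1
print(sorted(informed), "after", rounds, "rounds")   # every node ends up informed
```

The epidemic spread is what makes gossip attractive for routing: the update reaches everyone in O(log N) expected rounds without any node flooding all of its neighbors.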

Page 13:

Hierarchical Networks

IP
– Hierarchical routing

DNS
– Hierarchical name space
  » Client + hierarchy of servers

Pros & cons of hierarchical data management
– Works well for things aligned with the hierarchy
  » Physical locality
– Inflexible: no data independence

A Layered Naming Architecture for the Internet [Balakrishnan+ 04]
– Three levels of name resolution: for mobility, multihoming, integrating middleboxes (NAT, firewall), …
  » From user-level descriptors to service identifiers (SIDs)
  » From SID to endpoint identifier (EID)
  » From EID to IP address: DNS
– Flat names for SIDs and EIDs
– Scalable resolution for a flat namespace? => DHT

Page 14:

Commercial products

JXTA
– Java/XML framework for p2p applications
– Name resolution and routing are done with floods & superpeers

MS WinXP p2p networking
– An unstructured overlay, flooded publication and caching
– "Does not yet support distributed searches"

Security support
– Authentication via signatures (assumes a trusted authority)
– Encryption of traffic

Groove
– Platform for a p2p "experience"; MS collaboration tools
  » Microsoft Office 2007
– Client–server name resolution, backup services, etc.

Page 15:

Routed lookups: Freenet, Chord, CAN

[Figure: a client issues Lookup("title"); the query is routed hop by hop through peers N1–N9 to the publisher at N4, which holds Key="title", Value=MP3 data.]

Page 16:

Routing: Freenet

Additional goals beyond file location:
– Provide publisher anonymity and security
– Resistance to attacks: a third party shouldn't be able to deny access to a particular file (data item, object), even if it compromises a large fraction of machines

Architecture:
– Each file is identified by a unique identifier
– Each machine stores a set of files and maintains a "routing table" to route individual requests
– Files are stored according to their associated key (unique identifier)
– Core idea: try to cluster information about similar keys

Messages
– A random 64-bit ID is used for loop detection

Page 17:

Routing: Freenet Routing Tables

Each node maintains a common stack
– id: file identifier
– next_hop: another node that stores the file id
– file: the file identified by id, stored on the local node

Forwarding of a query for file id
– If file id is stored locally, stop
  » Forward the data back to the upstream requestor
  » The requestor adds the file to its cache and adds an entry to its routing table
– If not, search for the "closest" id in the stack and forward the message to the corresponding next_hop
– If the data is not found, a failure is reported back
  » The requestor then tries the next-closest match in its routing table

Routing table columns: id | next_hop | file
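The closest-id forwarding rule above can be sketched as follows. The integer ids, the absolute-difference distance, and the table contents are toy assumptions; real Freenet keys are hashes and the failure/backtrack handling is richer.

```python
def freenet_query(tables, stores, start, key, max_hops=10):
    """Follow closest-id routing entries until a node storing `key` is found."""
    node, path = start, [start]
    for _ in range(max_hops):
        if key in stores[node]:
            return node, path            # found: data flows back along `path`
        entries = tables[node]           # this node's id -> next_hop stack
        if not entries:
            return None, path            # dead end: failure reported upstream
        # Pick the routing-table entry whose id is closest to the key...
        closest = min(entries, key=lambda e: abs(e - key))
        node = entries[closest]          # ...and forward to its next_hop.
        path.append(node)
    return None, path                    # hop budget (TTL-like) exhausted

tables = {"n1": {4: "n2", 12: "n2"}, "n2": {9: "n3"}, "n3": {}}
stores = {"n1": set(), "n2": set(), "n3": {10}}
print(freenet_query(tables, stores, "n1", key=10))   # ('n3', ['n1', 'n2', 'n3'])
```

Caching the returned file along the reverse path (next slide) is what makes popular keys cluster near the nodes that ask for them.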

Page 18:

Routing: Freenet Query

API: file = query(id)

Notes:
– Any node forwarding a reply may change the source of the reply (to itself or any other node)
  » Helps anonymity
– Each query carries a TTL that is decremented each time the query message is forwarded
  » To obscure the distance to the originator, the TTL can be initialized to a random value within some bounds; when TTL = 1, the query is forwarded with a finite probability
– Depth counter
  » Opposite of the TTL: incremented with each hop
  » Initialized to a small random value
– Each node maintains state for all outstanding queries that have traversed it, which helps avoid cycles
– When a file is returned, it is cached along the reverse path

Page 19:

Routing: Freenet Example

[Figure: nodes n1–n5, each holding a small routing table of (id, next_hop, file) entries, e.g. (4, n1, f4), (12, n2, f12), (9, n3, f9), (14, n5, f14), (10, n5, f10). A query(10) is forwarded hop by hop (steps 1–5, with one backtracking step 4' after a failure report) until it reaches the node whose table leads to f10. Note: the figure doesn't show file caching on the reverse path.]

Page 20:

Insert

API: insert(id, file)

Two steps:
– Search for the file to be inserted
– If not found, insert the file

Searching:
– Like a query, but nodes maintain state; after a collision is detected, the reply is sent back to the originator

Insertion:
– Follow the forward path; insert the file at all nodes along the path
– A node probabilistically replaces the originator with itself, obscuring the true originator

Page 21:

Cache Management

LRU (Least Recently Used) cache of files
– Files are not guaranteed to live forever
  » Files "fade away" as fewer requests are made for them

File contents can be encrypted, with the original text name as the key (id)
– Cache owners know neither the original name nor the contents, so they cannot be held responsible

Page 22:

Freenet Summary

Advantages
– Provides publisher anonymity
– Totally decentralized architecture: robust and scalable
– Resistant to malicious file deletion

Disadvantages
– Does not always guarantee that a file is found, even if the file is in the network

Page 23:

Routing: Structured Approaches

Goal: make sure that an identified item (file) is always found in a reasonable number of steps

Abstraction: a distributed hash table (DHT) data structure
– insert(id, item)
– item = query(id)
– Note: the item can be anything: a data object, document, file, pointer to a file, …

Proposals
– CAN (ICIR/Berkeley)
– Chord (MIT/Berkeley)
– Pastry (Rice)
– Tapestry (Berkeley)

Page 24:

High-level idea: Indirection

Indirection in space
– Logical IDs (content-based)
  » Routing to those IDs
  » A "content-addressable" network
– Tolerant of nodes joining and leaving the network

Indirection in time
– A scheme to temporally decouple send and receive
– Soft state
  » The "publisher" requests a TTL on storage

Distributed hash table
– Directed search

Page 25:

Distributed Hash Table (DHT)

Hash table
– A data structure that maps "keys" to "values"

DHT
– A hash table, but spread across the Internet

Interface
– insert(key, value)
– lookup(key)

Every DHT node supports a single operation
– Given a key as input, route messages toward the node holding that key
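The interface above can be made concrete with a deliberately tiny sketch: keys are hashed, and each key is owned by one participating node. The ownership rule here (hash modulo the node count) is a toy assumption; real DHTs use consistent hashing and route toward the owner in multiple hops rather than knowing it directly.

```python
import hashlib

class ToyDHT:
    """Hash table spread across several nodes, exposing insert/lookup."""

    def __init__(self, nodes):
        self.nodes = sorted(nodes)
        self.store = {n: {} for n in self.nodes}   # per-node local storage

    def _owner(self, key):
        # Toy placement rule: hash the key, take it modulo the node count.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def insert(self, key, value):
        self.store[self._owner(key)][key] = value

    def lookup(self, key):
        return self.store[self._owner(key)].get(key)

dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.insert("song.mp3", b"...bytes...")
print(dht.lookup("song.mp3"))    # b'...bytes...'
print(dht.lookup("missing"))     # None
```

The weakness of the modulo rule, namely that adding a node remaps almost every key, is exactly what consistent hashing (a few slides ahead) fixes.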

Page 26:

Distributed Hash Table (DHT)

Page 27:

DHT in action: put()

[Figure: a mesh of DHT nodes, each storing key–value (K, V) pairs. An insert(K1, V1) request is routed through the overlay to the node responsible for K1, where (K1, V1) is stored.]

Operation: take a key as input; route messages to the node holding that key

Page 28:

DHT in action: get()

[Figure: the same DHT mesh; a retrieve(K1) request is routed through the overlay to the node holding (K1, V1).]

Operation: take a key as input; route messages to the node holding that key

Page 29:

Routing: Chord

Associate to each node and item a unique id in a one-dimensional space

Goals
– Scales to hundreds of thousands of nodes
– Handles rapid arrival and failure of nodes

Properties
– Routing table size O(log N), where N is the total number of nodes
– Guarantees that a file is found in O(log N) steps

Page 30:

Aside: Consistent Hashing [Karger97]

– A key is stored at its successor: the node with the next-higher ID
– This is designed to let nodes enter and leave the network with minimal disruption
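The successor rule can be stated in a few lines of code. The 8-slot ring and small node ids mirror the scale of the slides' later examples; the helper name is made up.

```python
from bisect import bisect_left

def successor(node_ids, key_id, ring_size=8):
    """Return the node storing key_id: the first node id >= key_id,
    wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key_id % ring_size)
    return ids[i % len(ids)]          # past the top, wrap to the smallest id

nodes = [0, 1, 3, 6]
print(successor(nodes, 2))   # 3
print(successor(nodes, 7))   # 0  (wraps around the ring)
# Minimal disruption: a node joining at id 7 only takes over keys in (6, 7].
print(successor([0, 1, 3, 6, 7], 7))   # 7
```

Contrast this with hash-mod-N placement: here a join or leave moves only the keys between the affected node and its neighbor, not the whole table.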

Page 31:

Routing: Chord Basic Lookup

Page 32:

Routing: Finger table - Faster Lookups

Page 33:

Routing: join operation

Page 34:

Routing: join operation

Before and after node 6 joins.

Page 35:

Routing: Chord Summary

Assume the identifier space is 0…2^m

Each node maintains
– A finger table
  » Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
– Its predecessor node

An item identified by id is stored on the successor node of id

Pastry
– Similar to Chord
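The finger-table rule above can be computed directly. The ring size (m = 3, ids 0..7) and node set follow the slides' running example; values here follow the stated successor rule, so they may differ from the hand-drawn tables on the example slides.

```python
def finger_table(n, node_ids, m=3):
    """Entry i of node n's finger table: the first node >= n + 2^i (mod 2^m),
    returned as (i, target id, successor node)."""
    ids = sorted(node_ids)

    def succ(x):
        x %= 2 ** m
        for cand in ids:
            if cand >= x:
                return cand
        return ids[0]                 # wrap around the ring

    return [(i, (n + 2 ** i) % 2 ** m, succ(n + 2 ** i)) for i in range(m)]

# Nodes on an 8-id ring, as in the slides' example:
print(finger_table(1, [0, 1, 3, 6]))   # [(0, 2, 3), (1, 3, 3), (2, 5, 6)]
print(finger_table(6, [0, 1, 3, 6]))   # [(0, 7, 0), (1, 0, 0), (2, 2, 3)]
```

Because the targets double (n+1, n+2, n+4, …), each node keeps only O(log N) entries yet can always at least halve the remaining ring distance to any id.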

Page 36:

Routing: Chord Example

Assume an identifier space 0..8. Node n1:(1) joins; all entries in its finger table are initialized to itself.

Succ. table of n1:
i | id+2^i | succ
0 | 2 | 1
1 | 3 | 1
2 | 5 | 1

[Figure: ring positions 0..7.]

Page 37:

Routing: Chord Example

Node n2:(3) joins.

Succ. table of n1:
i | id+2^i | succ
0 | 2 | 2
1 | 3 | 1
2 | 5 | 1

Succ. table of n2:
i | id+2^i | succ
0 | 3 | 1
1 | 4 | 1
2 | 6 | 1

[Figure: ring positions 0..7.]

Page 38:

Routing: Chord Example

Nodes n3:(0) and n4:(6) join.

Succ. table of n1:
i | id+2^i | succ
0 | 2 | 2
1 | 3 | 6
2 | 5 | 6

Succ. table of n2:
i | id+2^i | succ
0 | 3 | 6
1 | 4 | 6
2 | 6 | 6

Succ. table of n3:
i | id+2^i | succ
0 | 1 | 1
1 | 2 | 2
2 | 4 | 0

Succ. table of n4:
i | id+2^i | succ
0 | 7 | 0
1 | 0 | 0
2 | 2 | 2

[Figure: ring positions 0..7.]

Page 39:

Routing: Chord Example

Nodes: n1:(1), n2:(3), n3:(0), n4:(6). Items: f1:(7), f2:(2).

[Figure: the ring 0..7 with the four successor tables from the previous slide; each item is stored at the successor node of its id, with item lists (Items: 7, Items: 1) annotated next to the storing nodes.]

Page 40:

Routing: Query

Upon receiving a query for item id, a node
– Checks whether it stores the item locally
– If not, forwards the query to the largest node in its successor table that does not exceed id

[Figure: the ring 0..7 with the successor tables from the previous slide; a query(7) issued at node n1 is routed around the ring to the node storing item 7.]
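The forwarding rule above ("largest table entry not exceeding the id") is the closest-preceding-finger step of a Chord lookup, which can be sketched end to end. The ring (ids 0..7, nodes 0, 1, 3, 6) matches the slides' example; the query(7) trace below is my reconstruction under the stated rule, not a transcription of the figure.

```python
M = 3
RING = 2 ** M
NODES = [0, 1, 3, 6]

def succ(x):
    """First node id >= x, wrapping around the ring."""
    x %= RING
    return min((n for n in NODES if n >= x), default=NODES[0])

def between(x, a, b):
    """Is x in the ring interval (a, b]?"""
    return (a < x <= b) if a < b else (x > a or x <= b)

def strictly_between(x, a, b):
    """Is x in the open ring interval (a, b)?"""
    return (a < x < b) if a < b else (x > a or x < b)

FINGERS = {n: [succ(n + 2 ** i) for i in range(M)] for n in NODES}

def lookup(node, key):
    path = [node]
    while not between(key, node, succ(node + 1)):
        nxt = succ(node + 1)                      # fallback: plain successor step
        for f in reversed(FINGERS[node]):         # try the largest jump first
            if strictly_between(f, node, key % RING):
                nxt = f
                break
        node = nxt
        path.append(node)
    return succ(node + 1), path                   # key's successor stores the item

print(lookup(1, 7))    # (0, [1, 6]): query(7) from node 1 hops via 6 to node 0
```

Each iteration at least halves the remaining ring distance, which is where the O(log N) lookup bound on the earlier Chord slide comes from.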

Page 41:

CAN: Query Example

– Each node knows its neighbors in the d-dimensional space
– Forward the query to the neighbor that is closest to the query id
– Example: assume n1 queries f4
– Can route around some failures

[Figure: a 2-d coordinate space (0..7 on each axis) partitioned into zones owned by nodes n1–n5, with files f1–f4 placed at points in the space; n1's query for f4 is forwarded greedily toward f4's coordinates.]
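The greedy neighbor rule can be sketched for a 2-d space. The grid, zone centers, and neighbor lists below are illustrative stand-ins, not the figure's actual layout; real CAN splits the space into zones and routes toward the point a key hashes to.

```python
def can_route(neighbors, centers, start, target):
    """Greedy geometric forwarding: each hop goes to the neighbor whose
    zone center is closest to the target point. Returns the path taken."""
    def dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    node, path = start, [start]
    while dist(centers[node], target) > 0:
        nxt = min(neighbors[node], key=lambda n: dist(centers[n], target))
        if dist(centers[nxt], target) >= dist(centers[node], target):
            break                 # no neighbor is closer: stop (or detour
                                  # around the failed/missing zone)
        node = nxt
        path.append(node)
    return path

centers = {"n1": (1, 1), "n2": (5, 1), "n3": (1, 5), "n4": (5, 5)}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n4"],
             "n3": ["n1", "n4"], "n4": ["n2", "n3"]}
print(can_route(neighbors, centers, "n1", target=(5, 5)))  # ['n1', 'n2', 'n4']
```

Because any neighbor that makes progress is acceptable, the same greedy step can route around a failed node by picking a different closer neighbor, which is the fault-tolerance point on this slide.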

Page 42:

Node Failure Recovery

Simple failures
– Know your neighbor's neighbors
– When a node fails, one of its neighbors takes over its zone

More complex failure modes
– Simultaneous failure of multiple adjacent nodes
– Scoped flooding to discover neighbors
– Hopefully a rare event

Page 43:

Routing: Concerns/Optimizations

Each hop in a routing-based P2P network can be expensive
– No correlation between neighbors and their locations
– A query can repeatedly jump from Europe to North America, even though both the initiator and the node that stores the item are in Europe!
– Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance

CAN/Chord optimizations
– Weight neighbor nodes by RTT
  » When routing, choose the neighbor that is closer to the destination with the lowest RTT from me
  » Reduces path latency
– Multiple physical nodes per virtual node
  » Reduces path length (fewer virtual nodes)
  » Reduces path latency (can choose the physical node of a virtual node with the lowest RTT)
  » Improves fault tolerance (only one node per zone needs to survive to allow routing through the zone)

What type of lookups?
– Only exact match!

Page 44:

BitTorrent

A p2p file sharing system
– Load sharing through file splitting
– Uses the bandwidth of peers instead of a server

Successfully used:
– To distribute Red Hat 9 ISOs (about 80 TB)

Setup
– A "seed" node has the file
– The file is split into fixed-size segments (typically 256 KB)
– A hash is calculated for each segment
– A "tracker" node is associated with the file
– A ".torrent" meta-file is built for the file; it identifies the address of the tracker node
– The .torrent file is passed around the web
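The per-segment hashes in the setup above are what let a downloader verify each piece independently. A minimal sketch, with a deliberately tiny piece size; real .torrent files carry SHA-1 hashes over much larger pieces inside bencoded metadata.

```python
import hashlib

PIECE = 4                         # toy piece size; real torrents use 256 KB+

def make_piece_hashes(data):
    """Split the file into fixed-size pieces and hash each one.
    This is the hash list the .torrent meta-file would carry."""
    pieces = [data[i:i + PIECE] for i in range(0, len(data), PIECE)]
    return [hashlib.sha1(p).hexdigest() for p in pieces]

def verify_piece(piece, index, hashes):
    """Check a downloaded piece against the hash from the .torrent."""
    return hashlib.sha1(piece).hexdigest() == hashes[index]

data = b"hello world!"
hashes = make_piece_hashes(data)           # published with the meta-file
print(verify_piece(b"o wo", 1, hashes))    # True: legitimate segment
print(verify_piece(b"evil", 1, hashes))    # False: corrupted/forged segment
```

Because verification is per piece, a client can safely fetch different segments from different untrusted peers in parallel, which is the whole load-sharing point.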

Page 45:

BitTorrent Download

– A client contacts the tracker identified in the .torrent file (using HTTP)
– The tracker sends the client a (random) list of peers who have, or are downloading, the file
– The client contacts peers on the list to see which segments of the file they have
– The client requests segments from peers (via TCP)
– The client uses the hashes from the .torrent to confirm that each segment is legitimate
– The client reports to other peers on the list that it has the segment
– Other peers start to contact the client to get that segment (while the client is getting other segments)

file.torrent info:
– length
– name
– hash
– URL of the tracker

Page 46:

Conclusions

Distributed hash tables are a key component of scalable and robust overlay networks
– CAN: O(d) state, O(d·n^(1/d)) distance
– Chord: O(log n) state, O(log n) distance
– Both can achieve stretch < 2
– Simplicity is key

Services built on top of distributed hash tables
– p2p file storage, i3 (Chord)
– Multicast (CAN, Tapestry)
– Persistent storage (OceanStore using Tapestry)