Computer networks, Lecture 11: Peer to Peer
Prof. Younghee Lee


Page 1:

Computer networks, Lecture 11: Peer to Peer

Prof. Younghee Lee

Page 2:

Peer-to-Peer? Centralized server vs. distributed server

– Client–server paradigm
  » Flat: RPC
  » Hierarchical: DNS, mount
– Peer-to-peer paradigm
  » Each node is both a web client and a transient web server: the easy part

How does a peer determine which peers have the desired content?
– Finding connected peers that hold copies of the desired object: the difficult part
  » A dynamic member list makes it more difficult
– Pure: Gnutella, Chord
– Hybrid: Napster, Groove

Other challenges
– Scalability: up to hundreds of thousands or millions of machines
– Dynamicity: machines can come and go at any time

Page 3:

Peer-to-Peer? The network is hard to change, especially the Internet! An overlay network lives at the end nodes, which are easy to change. What if overlay end nodes act as network nodes?

– Overlay multicast
– VoIP
– File sharing

The Internet itself was an overlay on the telephone network. Future Internet:
– "Naming": the key design issue today
– Querying and data independence: the key to tomorrow?
  » Decouple the application-level API from data organization

Page 4:

Peer-to-Peer? Share the resources of individual peers
– CPU, disk, bandwidth, information, …

Communication and collaboration
– Magi, Groove, Skype

File sharing
– Napster, Gnutella, Kazaa, Freenet, Overnet

P2P applications built over emerging overlays
– PlanetLab

Page 5:

Peer-to-Peer? Distributed computing

– SETI@Home
  » A scientific experiment that harnesses the power of hundreds of thousands of Internet-connected computers in the Search for Extra-Terrestrial Intelligence
  » Server assigns work units: computers send machine information to the server, the server assigns a task, and computers send back results
– Folding@Home
  » A distributed computing project that studies protein folding, misfolding, aggregation, and related diseases

Page 6:

Overlay Networks

– Virtual edge: a TCP connection, or simply a pointer to an IP address
– Overlay maintenance: pings, messaging, establishing a new edge when a neighbor goes down, …

Layering (bottom to top): TCP/IP, then P2P/overlay middleware, then P2P applications (DNS, CDN, ALM, …)

Page 7:

P2P file sharing

Napster
– Centralized, sophisticated search
– Client–server search
– Point-to-point file transfer

Gnutella
– Open source; flooding, TTL, unreachable nodes

FastTrack (KaZaA)
– Heterogeneous peers

Freenet
– Anonymity, caching, replication

Page 8:

Centralized directory: Napster

Napster: the first commercial company, for MP3 distribution
– Large-scale server (server farm)

How to find a file:
– On startup, a client contacts the central server and reports its list of files
– Query the index system; it returns a machine that stores the required file
  » Ideally the closest/least-loaded machine
– FTP the file directly from the peer

Centralized index
– Lawsuits
– Denial of service

Copyright issues
– Direct infringement: download/upload
– Indirect infringement: an individual accountable for the actions of others; contributory
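The central-index flow above (clients register their file lists on startup, queries hit the index, and transfers then go peer-to-peer) can be sketched as follows. This is an illustrative toy, not Napster's actual protocol; all names are made up.

```python
class CentralIndex:
    """Napster-style directory server: maps file names to peers holding them."""

    def __init__(self):
        self.index = {}                      # file name -> set of peer addresses

    def register(self, peer, files):
        # On startup, a client reports its list of files to the central server.
        for name in files:
            self.index.setdefault(name, set()).add(peer)

    def unregister(self, peer):
        # Called when the server detects that a peer has disconnected.
        for peers in self.index.values():
            peers.discard(peer)

    def lookup(self, name):
        # Return peers that store the requested file; the transfer itself
        # then happens point-to-point between peers, not via the server.
        return sorted(self.index.get(name, set()))

index = CentralIndex()
index.register("peer-a", ["song.mp3", "talk.mp3"])
index.register("peer-b", ["song.mp3"])
print(index.lookup("song.mp3"))   # ['peer-a', 'peer-b']
index.unregister("peer-a")
print(index.lookup("song.mp3"))   # ['peer-b']
```

Note how the O(N) state, single point of failure, and bottleneck listed on the next slide all live in this one object.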

Page 9:

Centralized Lookup (Napster)

[Figure: the publisher at N4, holding Key="title", Value=MP3 data, registers with the central DB via SetLoc("title", N4) (inform & update); a client issues Lookup("title") to the DB and then fetches the content directly from N4. Peers N1–N9 surround the central DB.]

– Simple, but O(N) state
– Single point of failure
– Performance bottleneck
– Copyright infringement

To keep its database current, the directory server can determine when a peer becomes disconnected:
– Send messages periodically to the peers
– Keep a permanent TCP connection with each connected peer

Page 10:

Decentralized directory: Flooding (1)

Gnutella
– Distributed file location
– Idea: flood the request
– How to find a file:
  » Send the request to all neighbors
  » Neighbors recursively forward the request
  » A machine that has the file receives the request and sends back the answer
  » Transfers are done with HTTP between peers
– Advantages:
  » Totally decentralized, highly robust
– Disadvantages:
  » Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL: a limited-scope query)
  » Worst case O(N) messages per lookup
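The flood-with-TTL idea above can be sketched over a toy overlay graph. The node names and message shape are illustrative only; real Gnutella carries descriptor IDs and routes answers back along the reverse path.

```python
def flood_query(overlay, files, start, wanted, ttl):
    """Flood a query from `start`; return the set of nodes that would answer."""
    answers = set()
    seen = {start}                        # loop detection, as real clients do
    frontier = [(start, ttl)]
    while frontier:
        node, ttl_left = frontier.pop()
        if wanted in files.get(node, ()):
            answers.add(node)             # this node sends an answer back
        if ttl_left == 0:
            continue                      # TTL limits the scope of the flood
        for neigh in overlay[node]:
            if neigh not in seen:
                seen.add(neigh)
                frontier.append((neigh, ttl_left - 1))
    return answers

overlay = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
files = {"c": {"x.mp3"}, "d": {"x.mp3"}}
print(flood_query(overlay, files, "a", "x.mp3", ttl=1))  # {'c'} (d is 2 hops away)
print(flood_query(overlay, files, "a", "x.mp3", ttl=2))  # {'c', 'd'}
```

The two calls show exactly the trade-off on this slide: a small TTL bounds the message storm but can miss copies that sit just beyond the horizon.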

Page 11:

Decentralized directory: Flooding (2)

FastTrack (aka KaZaA)
– Modifies the Gnutella protocol into a two-level hierarchy
  » Hybrid of Gnutella and Napster
– Group leader: super node
  » Nodes that have a better connection to the Internet
  » Act as temporary directory servers for the other nodes in their group
  » Maintain a database mapping names of content to the IP addresses of group members
  » Not a dedicated server; an ordinary peer
– Bootstrapping node
  » A peer that wants to join the network contacts this node
  » This node can designate the joining peer as a new bootstrapping node
– Standard nodes: ordinary nodes
  » Connect to super nodes and report their list of files
  » Allows slower nodes to participate
– Broadcast (Gnutella-style) search across group-leader peers; query flooding
– Drawbacks
  » Fairly complex protocol to construct and maintain the overlay network
  » Group leaders have more responsibility; not truly decentralized
  » Still not purely serverless (the bootstrapping node is an "always-up" server)

[Figure: overlay peers attached to group-leader peers; neighboring relationships in the overlay network.]

KaZaA metadata
– File name
– File size
– Content hash
– File descriptors: used for keyword matches during queries

Page 12:

Gossip protocols

Epidemic algorithms, originally targeted at database replication
– Rumor mongering
  » Propagate a newly received update to k random neighbors

Extended to routing
– Rumor mongering of queries instead of flooding
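A minimal sketch of rumor mongering, assuming a tiny fully connected group: in each round, every node that knows the update pushes it to k random neighbors. Topology, k, and the round structure are illustrative.

```python
import random

def gossip_round(neighbors, informed, k):
    """One rumor-mongering round: each informed node forwards the update
    to k randomly chosen neighbors."""
    newly = set()
    for node in informed:
        peers = neighbors[node]
        for target in random.sample(peers, min(k, len(peers))):
            newly.add(target)
    return informed | newly

random.seed(0)                            # deterministic demo
neighbors = {n: [m for m in range(5) if m != n] for n in range(5)}
informed, rounds = {0}, 0
while len(informed) < len(neighbors) and rounds < 20:
    informed = gossip_round(neighbors, informed, k=2)
    rounds += 1
print(sorted(informed), "after", rounds, "rounds")   # every node ends up informed
```

The epidemic spread is what makes gossip attractive for routing: the update reaches everyone in O(log N) expected rounds without any node flooding all of its neighbors.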

Page 13:

Hierarchical Networks

IP
– Hierarchical routing

DNS
– Hierarchical name space
  » Client + hierarchy of servers

Pros & cons of hierarchical data management
– Works well for things aligned with the hierarchy
  » Physical locality
– Inflexible: no data independence

A Layered Naming Architecture for the Internet [Balakrishnan+ 04]
– Three levels of name resolution: for mobility, multihoming, integrating middleboxes (NAT, firewall), …
  » From user-level descriptors to service identifiers (SIDs)
  » From SID to endpoint identifier (EID)
  » From EID to IP address: DNS
– Flat names for SIDs and EIDs
– Scalable resolution for a flat namespace? => DHT

Page 14:

Commercial products

JXTA
– Java/XML framework for p2p applications
– Name resolution and routing are done with floods & superpeers

MS WinXP p2p networking
– An unstructured overlay, flooded publication and caching
– "Does not yet support distributed searches"

Security support
– Authentication via signatures (assumes a trusted authority)
– Encryption of traffic

Groove
– Platform for a p2p "experience"; MS collaboration tools
  » Microsoft Office 2007
– Client–server name resolution, backup services, etc.

Page 15:

Routed lookups: Freenet, Chord, CAN

[Figure: a client issues Lookup("title"); the query is routed hop by hop through peers N1–N9 to the publisher at N4, which holds Key="title", Value=MP3 data.]

Page 16:

Routing: Freenet

Additional goals beyond file location:
– Provide publisher anonymity and security
– Resistance to attacks: a third party shouldn't be able to deny access to a particular file (data item, object), even if it compromises a large fraction of machines

Architecture:
– Each file is identified by a unique identifier
– Each machine stores a set of files and maintains a "routing table" to route individual requests
– Files are stored according to their associated key (unique identifier)
– Core idea: try to cluster information about similar keys

Messages
– A random 64-bit ID is used for loop detection

Page 17:

Routing: Freenet Routing Tables

Each node maintains a common stack
– id: file identifier
– next_hop: another node that stores the file id
– file: the file identified by id, stored on the local node

Forwarding of a query for file id
– If file id is stored locally, stop
  » Forward the data back to the upstream requestor
  » The requestor adds the file to its cache and adds an entry to its routing table
– If not, search for the "closest" id in the stack and forward the message to the corresponding next_hop
– If the data is not found, a failure is reported back
  » The requestor then tries the next-closest match in its routing table

Routing table columns: id | next_hop | file
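The closest-id forwarding rule above can be sketched as follows. The integer ids, the absolute-difference distance, and the table contents are toy assumptions; real Freenet keys are hashes and the failure/backtrack handling is richer.

```python
def freenet_query(tables, stores, start, key, max_hops=10):
    """Follow closest-id routing entries until a node storing `key` is found."""
    node, path = start, [start]
    for _ in range(max_hops):
        if key in stores[node]:
            return node, path            # found: data flows back along `path`
        entries = tables[node]           # this node's id -> next_hop stack
        if not entries:
            return None, path            # dead end: failure reported upstream
        # Pick the routing-table entry whose id is closest to the key...
        closest = min(entries, key=lambda e: abs(e - key))
        node = entries[closest]          # ...and forward to its next_hop.
        path.append(node)
    return None, path                    # hop budget (TTL-like) exhausted

tables = {"n1": {4: "n2", 12: "n2"}, "n2": {9: "n3"}, "n3": {}}
stores = {"n1": set(), "n2": set(), "n3": {10}}
print(freenet_query(tables, stores, "n1", key=10))   # ('n3', ['n1', 'n2', 'n3'])
```

Caching the returned file along the reverse path (next slide) is what makes popular keys cluster near the nodes that ask for them.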

Page 18:

Routing: Freenet Query

API: file = query(id)

Notes:
– Any node forwarding a reply may change the source of the reply (to itself or any other node)
  » Helps anonymity
– Each query carries a TTL that is decremented each time the query message is forwarded
  » To obscure the distance to the originator, the TTL can be initialized to a random value within some bounds; when TTL = 1, the query is forwarded with a finite probability
– Depth counter
  » Opposite of the TTL: incremented with each hop
  » Initialized to a small random value
– Each node maintains state for all outstanding queries that have traversed it, which helps avoid cycles
– When a file is returned, it is cached along the reverse path

Page 19:

Routing: Freenet Example

[Figure: nodes n1–n5, each holding a small routing table of (id, next_hop, file) entries, e.g. (4, n1, f4), (12, n2, f12), (9, n3, f9), (14, n5, f14), (10, n5, f10). A query(10) is forwarded hop by hop (steps 1–5, with one backtracking step 4' after a failure report) until it reaches the node whose table leads to f10. Note: the figure doesn't show file caching on the reverse path.]

Page 20:

Insert

API: insert(id, file)

Two steps:
– Search for the file to be inserted
– If not found, insert the file

Searching:
– Like a query, but nodes maintain state; after a collision is detected, the reply is sent back to the originator

Insertion:
– Follow the forward path; insert the file at all nodes along the path
– A node probabilistically replaces the originator with itself, obscuring the true originator

Page 21:

Cache Management

LRU (Least Recently Used) cache of files
– Files are not guaranteed to live forever
  » Files "fade away" as fewer requests are made for them

File contents can be encrypted, with the original text name as the key (id)
– Cache owners know neither the original name nor the contents, so they cannot be held responsible

Page 22:

Freenet Summary

Advantages
– Provides publisher anonymity
– Totally decentralized architecture: robust and scalable
– Resistant to malicious file deletion

Disadvantages
– Does not always guarantee that a file is found, even if the file is in the network

Page 23:

Routing: Structured Approaches

Goal: make sure that an identified item (file) is always found in a reasonable number of steps

Abstraction: a distributed hash table (DHT) data structure
– insert(id, item)
– item = query(id)
– Note: the item can be anything: a data object, document, file, pointer to a file, …

Proposals
– CAN (ICIR/Berkeley)
– Chord (MIT/Berkeley)
– Pastry (Rice)
– Tapestry (Berkeley)

Page 24:

High-level idea: Indirection

Indirection in space
– Logical IDs (content-based)
  » Routing to those IDs
  » A "content-addressable" network
– Tolerant of nodes joining and leaving the network

Indirection in time
– A scheme to temporally decouple send and receive
– Soft state
  » The "publisher" requests a TTL on storage

Distributed hash table
– Directed search

Page 25:

Distributed Hash Table (DHT)

Hash table
– A data structure that maps "keys" to "values"

DHT
– A hash table, but spread across the Internet

Interface
– insert(key, value)
– lookup(key)

Every DHT node supports a single operation
– Given a key as input, route messages toward the node holding that key
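The interface above can be made concrete with a deliberately tiny sketch: keys are hashed, and each key is owned by one participating node. The ownership rule here (hash modulo the node count) is a toy assumption; real DHTs use consistent hashing and route toward the owner in multiple hops rather than knowing it directly.

```python
import hashlib

class ToyDHT:
    """Hash table spread across several nodes, exposing insert/lookup."""

    def __init__(self, nodes):
        self.nodes = sorted(nodes)
        self.store = {n: {} for n in self.nodes}   # per-node local storage

    def _owner(self, key):
        # Toy placement rule: hash the key, take it modulo the node count.
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]

    def insert(self, key, value):
        self.store[self._owner(key)][key] = value

    def lookup(self, key):
        return self.store[self._owner(key)].get(key)

dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.insert("song.mp3", b"...bytes...")
print(dht.lookup("song.mp3"))    # b'...bytes...'
print(dht.lookup("missing"))     # None
```

The weakness of the modulo rule, namely that adding a node remaps almost every key, is exactly what consistent hashing (a few slides ahead) fixes.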

Page 26:

Distributed Hash Table (DHT)

Page 27:

DHT in action: put()

[Figure: a mesh of DHT nodes, each storing key–value (K, V) pairs. An insert(K1, V1) request is routed through the overlay to the node responsible for K1, where (K1, V1) is stored.]

Operation: take a key as input; route messages to the node holding that key

Page 28:

DHT in action: get()

[Figure: the same DHT mesh; a retrieve(K1) request is routed through the overlay to the node holding (K1, V1).]

Operation: take a key as input; route messages to the node holding that key

Page 29:

Routing: Chord

Associate to each node and item a unique id in a one-dimensional space

Goals
– Scales to hundreds of thousands of nodes
– Handles rapid arrival and failure of nodes

Properties
– Routing table size O(log N), where N is the total number of nodes
– Guarantees that a file is found in O(log N) steps

Page 30:

Aside: Consistent Hashing [Karger97]

– A key is stored at its successor: the node with the next-higher ID
– This is designed to let nodes enter and leave the network with minimal disruption
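The successor rule can be stated in a few lines of code. The 8-slot ring and small node ids mirror the scale of the slides' later examples; the helper name is made up.

```python
from bisect import bisect_left

def successor(node_ids, key_id, ring_size=8):
    """Return the node storing key_id: the first node id >= key_id,
    wrapping around the ring."""
    ids = sorted(node_ids)
    i = bisect_left(ids, key_id % ring_size)
    return ids[i % len(ids)]          # past the top, wrap to the smallest id

nodes = [0, 1, 3, 6]
print(successor(nodes, 2))   # 3
print(successor(nodes, 7))   # 0  (wraps around the ring)
# Minimal disruption: a node joining at id 7 only takes over keys in (6, 7].
print(successor([0, 1, 3, 6, 7], 7))   # 7
```

Contrast this with hash-mod-N placement: here a join or leave moves only the keys between the affected node and its neighbor, not the whole table.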

Page 31:

Routing: Chord Basic Lookup

Page 32:

Routing: Finger table - Faster Lookups

Page 33:

Routing: join operation

Page 34:

Routing: join operation

Before and after node 6 joins.

Page 35:

Routing: Chord Summary

Assume the identifier space is 0…2^m

Each node maintains
– A finger table
  » Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
– Its predecessor node

An item identified by id is stored on the successor node of id

Pastry
– Similar to Chord
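The finger-table rule above can be computed directly. The ring size (m = 3, ids 0..7) and node set follow the slides' running example; values here follow the stated successor rule, so they may differ from the hand-drawn tables on the example slides.

```python
def finger_table(n, node_ids, m=3):
    """Entry i of node n's finger table: the first node >= n + 2^i (mod 2^m),
    returned as (i, target id, successor node)."""
    ids = sorted(node_ids)

    def succ(x):
        x %= 2 ** m
        for cand in ids:
            if cand >= x:
                return cand
        return ids[0]                 # wrap around the ring

    return [(i, (n + 2 ** i) % 2 ** m, succ(n + 2 ** i)) for i in range(m)]

# Nodes on an 8-id ring, as in the slides' example:
print(finger_table(1, [0, 1, 3, 6]))   # [(0, 2, 3), (1, 3, 3), (2, 5, 6)]
print(finger_table(6, [0, 1, 3, 6]))   # [(0, 7, 0), (1, 0, 0), (2, 2, 3)]
```

Because the targets double (n+1, n+2, n+4, …), each node keeps only O(log N) entries yet can always at least halve the remaining ring distance to any id.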

Page 36:

Routing: Chord Example

Assume an identifier space 0..8. Node n1:(1) joins; all entries in its finger table are initialized to itself.

Succ. table of n1:
i | id+2^i | succ
0 | 2 | 1
1 | 3 | 1
2 | 5 | 1

[Figure: ring positions 0..7.]

Page 37:

Routing: Chord Example

Node n2:(3) joins.

Succ. table of n1:
i | id+2^i | succ
0 | 2 | 2
1 | 3 | 1
2 | 5 | 1

Succ. table of n2:
i | id+2^i | succ
0 | 3 | 1
1 | 4 | 1
2 | 6 | 1

[Figure: ring positions 0..7.]

Page 38:

Routing: Chord Example

Nodes n3:(0) and n4:(6) join.

Succ. table of n1:
i | id+2^i | succ
0 | 2 | 2
1 | 3 | 6
2 | 5 | 6

Succ. table of n2:
i | id+2^i | succ
0 | 3 | 6
1 | 4 | 6
2 | 6 | 6

Succ. table of n3:
i | id+2^i | succ
0 | 1 | 1
1 | 2 | 2
2 | 4 | 0

Succ. table of n4:
i | id+2^i | succ
0 | 7 | 0
1 | 0 | 0
2 | 2 | 2

[Figure: ring positions 0..7.]

Page 39:

Routing: Chord Example

Nodes: n1:(1), n2:(3), n3:(0), n4:(6). Items: f1:(7), f2:(2).

[Figure: the ring 0..7 with the four successor tables from the previous slide; each item is stored at the successor node of its id, with item lists (Items: 7, Items: 1) annotated next to the storing nodes.]

Page 40:

Routing: Query

Upon receiving a query for item id, a node
– Checks whether it stores the item locally
– If not, forwards the query to the largest node in its successor table that does not exceed id

[Figure: the ring 0..7 with the successor tables from the previous slide; a query(7) issued at node n1 is routed around the ring to the node storing item 7.]
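The forwarding rule above ("largest table entry not exceeding the id") is the closest-preceding-finger step of a Chord lookup, which can be sketched end to end. The ring (ids 0..7, nodes 0, 1, 3, 6) matches the slides' example; the query(7) trace below is my reconstruction under the stated rule, not a transcription of the figure.

```python
M = 3
RING = 2 ** M
NODES = [0, 1, 3, 6]

def succ(x):
    """First node id >= x, wrapping around the ring."""
    x %= RING
    return min((n for n in NODES if n >= x), default=NODES[0])

def between(x, a, b):
    """Is x in the ring interval (a, b]?"""
    return (a < x <= b) if a < b else (x > a or x <= b)

def strictly_between(x, a, b):
    """Is x in the open ring interval (a, b)?"""
    return (a < x < b) if a < b else (x > a or x < b)

FINGERS = {n: [succ(n + 2 ** i) for i in range(M)] for n in NODES}

def lookup(node, key):
    path = [node]
    while not between(key, node, succ(node + 1)):
        nxt = succ(node + 1)                      # fallback: plain successor step
        for f in reversed(FINGERS[node]):         # try the largest jump first
            if strictly_between(f, node, key % RING):
                nxt = f
                break
        node = nxt
        path.append(node)
    return succ(node + 1), path                   # key's successor stores the item

print(lookup(1, 7))    # (0, [1, 6]): query(7) from node 1 hops via 6 to node 0
```

Each iteration at least halves the remaining ring distance, which is where the O(log N) lookup bound on the earlier Chord slide comes from.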

Page 41:

CAN: Query Example

– Each node knows its neighbors in the d-dimensional space
– Forward the query to the neighbor that is closest to the query id
– Example: assume n1 queries f4
– Can route around some failures

[Figure: a 2-d coordinate space (0..7 on each axis) partitioned into zones owned by nodes n1–n5, with files f1–f4 placed at points in the space; n1's query for f4 is forwarded greedily toward f4's coordinates.]
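The greedy neighbor rule can be sketched for a 2-d space. The grid, zone centers, and neighbor lists below are illustrative stand-ins, not the figure's actual layout; real CAN splits the space into zones and routes toward the point a key hashes to.

```python
def can_route(neighbors, centers, start, target):
    """Greedy geometric forwarding: each hop goes to the neighbor whose
    zone center is closest to the target point. Returns the path taken."""
    def dist(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    node, path = start, [start]
    while dist(centers[node], target) > 0:
        nxt = min(neighbors[node], key=lambda n: dist(centers[n], target))
        if dist(centers[nxt], target) >= dist(centers[node], target):
            break                 # no neighbor is closer: stop (or detour
                                  # around the failed/missing zone)
        node = nxt
        path.append(node)
    return path

centers = {"n1": (1, 1), "n2": (5, 1), "n3": (1, 5), "n4": (5, 5)}
neighbors = {"n1": ["n2", "n3"], "n2": ["n1", "n4"],
             "n3": ["n1", "n4"], "n4": ["n2", "n3"]}
print(can_route(neighbors, centers, "n1", target=(5, 5)))  # ['n1', 'n2', 'n4']
```

Because any neighbor that makes progress is acceptable, the same greedy step can route around a failed node by picking a different closer neighbor, which is the fault-tolerance point on this slide.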

Page 42:

Node Failure Recovery

Simple failures
– Know your neighbor's neighbors
– When a node fails, one of its neighbors takes over its zone

More complex failure modes
– Simultaneous failure of multiple adjacent nodes
– Scoped flooding to discover neighbors
– Hopefully a rare event

Page 43:

Routing: Concerns/Optimizations

Each hop in a routing-based P2P network can be expensive
– No correlation between neighbors and their locations
– A query can repeatedly jump from Europe to North America, even though both the initiator and the node that stores the item are in Europe!
– Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance

CAN/Chord optimizations
– Weight neighbor nodes by RTT
  » When routing, choose the neighbor that is closer to the destination with the lowest RTT from me
  » Reduces path latency
– Multiple physical nodes per virtual node
  » Reduces path length (fewer virtual nodes)
  » Reduces path latency (can choose the physical node of a virtual node with the lowest RTT)
  » Improves fault tolerance (only one node per zone needs to survive to allow routing through the zone)

What type of lookups?
– Only exact match!

Page 44:

BitTorrent

A p2p file sharing system
– Load sharing through file splitting
– Uses the bandwidth of peers instead of a server

Successfully used:
– To distribute Red Hat 9 ISOs (about 80 TB)

Setup
– A "seed" node has the file
– The file is split into fixed-size segments (typically 256 KB)
– A hash is calculated for each segment
– A "tracker" node is associated with the file
– A ".torrent" meta-file is built for the file; it identifies the address of the tracker node
– The .torrent file is passed around the web
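The per-segment hashes in the setup above are what let a downloader verify each piece independently. A minimal sketch, with a deliberately tiny piece size; real .torrent files carry SHA-1 hashes over much larger pieces inside bencoded metadata.

```python
import hashlib

PIECE = 4                         # toy piece size; real torrents use 256 KB+

def make_piece_hashes(data):
    """Split the file into fixed-size pieces and hash each one.
    This is the hash list the .torrent meta-file would carry."""
    pieces = [data[i:i + PIECE] for i in range(0, len(data), PIECE)]
    return [hashlib.sha1(p).hexdigest() for p in pieces]

def verify_piece(piece, index, hashes):
    """Check a downloaded piece against the hash from the .torrent."""
    return hashlib.sha1(piece).hexdigest() == hashes[index]

data = b"hello world!"
hashes = make_piece_hashes(data)           # published with the meta-file
print(verify_piece(b"o wo", 1, hashes))    # True: legitimate segment
print(verify_piece(b"evil", 1, hashes))    # False: corrupted/forged segment
```

Because verification is per piece, a client can safely fetch different segments from different untrusted peers in parallel, which is the whole load-sharing point.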

Page 45:

BitTorrent Download

– A client contacts the tracker identified in the .torrent file (using HTTP)
– The tracker sends the client a (random) list of peers who have, or are downloading, the file
– The client contacts peers on the list to see which segments of the file they have
– The client requests segments from peers (via TCP)
– The client uses the hashes from the .torrent to confirm that each segment is legitimate
– The client reports to other peers on the list that it has the segment
– Other peers start to contact the client to get that segment (while the client is getting other segments)

file.torrent info:
– length
– name
– hash
– URL of the tracker

Page 46:

Conclusions

Distributed hash tables are a key component of scalable and robust overlay networks
– CAN: O(d) state, O(d·n^(1/d)) distance
– Chord: O(log n) state, O(log n) distance
– Both can achieve stretch < 2
– Simplicity is key

Services built on top of distributed hash tables
– p2p file storage, i3 (Chord)
– Multicast (CAN, Tapestry)
– Persistent storage (OceanStore using Tapestry)