P2P Computing MIRA YUN September 16, 2005. Outline What is P2P P2P taxonomies Characteristics Different P2P systems Conclusion.


  • P2P Computing

    MIRA YUN
    September 16, 2005

  • Outline

    What is P2P
    P2P taxonomies
    Characteristics
    Different P2P systems
    Conclusion

  • P2P

    Peer-to-peer (P2P) refers to a class of systems and applications that employ distributed resources to perform a function in a decentralized manner.

    Generally opposed to the client/server architecture

  • Peers

    A peer gives some resources and obtains other resources in return.
    Peer = "like each other".
    All participants are peers (in the pure form of a P2P network).

    Each peer depends on other peers; a peer on its own is meaningless.

    Peers are autonomous (self-governing) when they are not wholly controlled by each other or by a single common authority.

  • What is P2P?

    The sharing of computer resources and services by direct exchange between systems [p2pwg, 2001].

    Systems and applications that employ distributed resources to perform critical functions in a decentralized manner.

    P2P enables peers to share their resources (information, processing, presence, etc.) with at most limited interaction with a centralized server.

  • Taxonomy of computer systems

  • P2P Models: pure, hybrid, super-peers

    Pure: peers have the same capabilities and responsibilities.
    Communication is symmetric. No host is superior; all hosts can act as client or server.
    Examples: Gnutella, Freenet

    Hybrid: servers facilitate the interaction between peers.
    Addressing bypasses the DNS, but a central server acts as a directory.
    Examples: Napster, ICQ, Jabber

  • P2P Models: pure, hybrid, super-peers

    Super-peers
    A super-peer is a node in a peer-to-peer network that operates both as a server to a set of clients and as an equal in a network of super-peers.
    Super-peer networks try to balance the efficiency of centralized search against the autonomy, load balancing, and robustness to attacks provided by distributed search.
    Example: Kazaa
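
    The super-peer arrangement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SuperPeer class, its attach and search methods, and the one-hop forwarding between super-peers are hypothetical and do not reproduce how Kazaa actually works.

```python
class SuperPeer:
    """Hypothetical super-peer: a server for its own clients, an equal among super-peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}    # filename -> set of attached client names
        self.others = []   # other super-peers this one treats as equals

    def attach(self, client, filenames):
        # A client uploads its file list to its super-peer (server role).
        for filename in filenames:
            self.index.setdefault(filename, set()).add(client)

    def search(self, filename, forward=True):
        # Answer from the local index first (centralized search within the cluster).
        hits = set(self.index.get(filename, ()))
        if forward:
            # Assumption: the query is forwarded one hop to the other super-peers.
            for other in self.others:
                hits |= other.search(filename, forward=False)
        return hits


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.others.append(sp2)
sp2.others.append(sp1)
sp1.attach("c1", ["a.txt"])
sp2.attach("c2", ["a.txt", "b.txt"])
found = sp1.search("a.txt")
```

    Clients only talk to their own super-peer, so a search costs one request plus one message per super-peer rather than a flood across every client.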

  • P2P search models

    Centralized directory model
    There is a central index.
    Once the requested file is located, the exchange takes place directly between peers.
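
    A minimal sketch of the centralized directory model, assuming an in-memory index. The CentralIndex and Peer classes and their method names are made up for illustration; they are not Napster's protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central directory: maps file names to the peers that share them."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, peer_name, filenames):
        for filename in filenames:
            self._peers_by_file[filename].add(peer_name)

    def lookup(self, filename):
        # Only the location is returned; the index never stores file contents.
        return sorted(self._peers_by_file.get(filename, ()))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # filename -> content

    def fetch_from(self, other, filename):
        # Direct peer-to-peer exchange; the central index is not involved.
        self.files[filename] = other.files[filename]


index = CentralIndex()
alice = Peer("alice", {"song.mp3": b"audio bytes"})
bob = Peer("bob", {})
peers = {peer.name: peer for peer in (alice, bob)}

index.register(alice.name, alice.files)
holders = index.lookup("song.mp3")            # step 1: ask the central index
bob.fetch_from(peers[holders[0]], "song.mp3")  # step 2: transfer between peers
```

    The index is a single point of failure and a bottleneck for lookups, but the transfer itself puts no load on it.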

  • P2P search models

    Napster
    Created in 1999 by Shawn Fanning, a freshman at Northeastern University.
    Purpose: to get MP3 music files for free.
    Central index server, P2P exchange.

    Sued several times and suspended.
    The music industry opposes Napster because people can get music for free instead of paying for a CD.
    Napster's defense is that the files are personal files that people maintain on their own machines, and therefore Napster is not responsible.

  • P2P search models

    Flooded requests model
    Each request from a peer is flooded (broadcast) to its directly connected peers (1), which in turn flood their own peers (2).
    The request is propagated until a maximum number of flood steps is reached (typically 5 to 9) or the request is answered.
    Used by Gnutella.
    Requires a lot of bandwidth and does not scale.
    Good for company networks.
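
    The flooding behavior and its bandwidth cost can be illustrated with a small simulation. This is a sketch under assumptions: the overlay is a plain adjacency dictionary, flood_search and max_hops are hypothetical names, and a peer that can answer stops forwarding.

```python
from collections import deque


def flood_search(graph, files, origin, wanted, max_hops=7):
    """Breadth-first flood of a request. Returns (peers_with_file, messages_sent)."""
    seen = {origin}
    frontier = deque([(origin, 0)])
    hits = []
    messages = 0
    while frontier:
        peer, hops = frontier.popleft()
        if wanted in files.get(peer, ()):
            hits.append(peer)
            continue  # request answered here, so this peer does not forward it
        if hops == max_hops:
            continue  # flood limit reached
        for neighbor in graph[peer]:
            # Count every transmission, including duplicates that the
            # receiver drops because it has already seen the request.
            messages += 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return hits, messages


graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
files = {"e": {"report.pdf"}}
hits, messages = flood_search(graph, files, "a", "report.pdf")
```

    Even in this five-peer overlay, one request costs more messages than there are peers, which is why the model is described as bandwidth-hungry.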

  • P2P search models

    Document routing model
    Each peer is assigned a random ID, and each peer knows a number of other peers.
    When a document is published, an ID is computed by hashing the document's contents and name.
    Each peer routes the document toward the peer whose ID is most similar to the document ID, until the current peer's ID is the nearest.

  • P2P search models

    Document routing model
    When a peer requests the document, the request goes to the peer whose ID is most similar to the document ID.
    This process is repeated until a copy of the document is found. The document is then transferred back to the request originator, and each peer that took part in the routing keeps a local copy.

  • P2P search models

    Document routing model
    Efficient for large communities.
    But the document ID must be known before a request can be posted.
    Used in Freenet.
    Four improved algorithms: Chord, CAN, Tapestry, and Pastry.
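
    The publish and request steps of the document routing model can be sketched as greedy routing on numeric IDs. Everything here is an assumption made for illustration: the 16-bit ID space, SHA-1 as the hash, absolute difference as the "similarity" measure, fixed instead of random peer IDs, and a line-shaped overlay in which each peer knows only its two nearest peers. It is not Freenet's, Chord's, or any other system's actual algorithm.

```python
import hashlib

ID_BITS = 16  # assumption: a small ID space keeps the example readable


def document_id(name, content):
    # ID computed by hashing the document name and contents.
    digest = hashlib.sha1(name.encode() + b"\0" + content).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


def route(neighbors, start, target_id):
    """Hop to the known peer with the closest ID until no neighbor is closer."""
    path = [start]
    current = start
    while True:
        candidates = neighbors[current] | {current}
        best = min(candidates, key=lambda p: (abs(p - target_id), p))
        if best == current:
            return path
        current = best
        path.append(current)


def publish(neighbors, stores, start, name, content):
    doc_id = document_id(name, content)
    owner = route(neighbors, start, doc_id)[-1]
    stores[owner][doc_id] = content
    return doc_id


def fetch(neighbors, stores, start, doc_id):
    path = route(neighbors, start, doc_id)
    for hop, peer in enumerate(path):
        if doc_id in stores[peer]:
            content = stores[peer][doc_id]
            for earlier in path[:hop]:
                stores[earlier][doc_id] = content  # peers on the path keep a copy
            return content
    return None


peer_ids = [1000, 9000, 20000, 33000, 47000, 60000]
neighbors = {p: set() for p in peer_ids}
for left, right in zip(peer_ids, peer_ids[1:]):
    neighbors[left].add(right)
    neighbors[right].add(left)
stores = {p: {} for p in peer_ids}

doc_id = publish(neighbors, stores, start=1000, name="notes.txt", content=b"hello")
copy = fetch(neighbors, stores, start=60000, doc_id=doc_id)
```

    The request needs the document ID up front, which mirrors the limitation noted above: a peer that only knows a keyword cannot compute where to route.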

  • Characteristics

    Decentralization
    Centralized systems
      Ideal for some applications
      Bottlenecks
      Inefficient use of resources
      Expensive to set up
      Hard to maintain
    Decentralized systems
      P2P emphasizes the users' ownership and control of data and resources.
      Full decentralization is difficult in practice.
      Hybrid approach

  • Characteristics

    Scalability
    Limited by factors such as:
      The amount of centralized operations
      The amount of state
      The inherent parallelism an application exhibits
    Scalability also depends on the ratio of communication to computation between the nodes.
    Napster: can scale up to over 6 million users
    SETI@home: close to 3.5 million users so far

  • Characteristics

    Anonymity
    One goal of P2P is to allow people to use systems without concern for legal issues.
    Three different kinds of anonymity: sender anonymity, receiver anonymity, and mutual anonymity.
    Gnutella: a request is broadcast and rebroadcast until it reaches a peer with the content.
    Freenet: a request is sent and forwarded to the peer that is most likely to have the content.

  • Characteristics

    Self-Organization
    Needed because of scalability, fault resilience, and the cost of ownership.
    Adaptation is required to handle the changes caused by peers connecting to and disconnecting from the P2P system.

    Cost of Ownership
    P2P reduces the cost of owning the systems and the content, and the cost of maintaining them.
    SETI@home: faster than the fastest supercomputer in the world, at 1% of the cost.

    Ad-Hoc Connectivity
    Has a strong effect on all classes of P2P systems.

  • Characteristics

    Performance
    Influenced by three types of resources: processing, storage, and networking.
    Three key approaches to optimizing performance:
      Replication: puts copies of objects/files closer to the requesting peers.
      Caching: reduces the path length required to fetch a file/object, and therefore the number of messages exchanged between the peers.
      Intelligent routing and network organization:

  • Taxonomy of P2P systems

  • Distributed Computing

    - Processing scalability in massive multi-parameter systems
    - Run by a central controller
    - Fork and join mechanism
    - Limitations
      Independent small parts
      Internet latencies
    - Intel claims speed-ups from 15 hours to 30 minutes for interest rate swap modeling by using P2P
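
    The fork and join mechanism run by a central controller can be sketched as follows. This is an illustration only: worker threads in one process stand in for remote peers, and price_scenario is a made-up placeholder for an independent unit of work, not Intel's interest rate swap model.

```python
from concurrent.futures import ThreadPoolExecutor


def price_scenario(rate):
    # Placeholder for an expensive computation that needs no data from
    # the other work items (the "independent small parts" requirement).
    return sum(100.0 / (1.0 + rate) ** year for year in range(1, 11))


def fork_join(work_items, workers=4):
    """Central controller: fork the items out to workers, then join the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(price_scenario, work_items))  # fork
    return sum(partials) / len(partials)                       # join


rates = [0.01 * k for k in range(1, 9)]
average = fork_join(rates)
```

    The pattern only pays off when each part runs long enough to outweigh the cost of shipping it to a peer, which is where the Internet latency limitation comes in.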

  • Distributed Computing

    SETI@home (Search for Extraterrestrial Intelligence)
    A collection of research projects aimed at discovering alien civilizations.
    Goal: to search for extraterrestrial radio emissions.
    Design: two major components, a data server and a client.
    Decentralization and scalability: distributes files (350 KB each) to its users.

  • Collaboration

    Jay Sheth

    - Application-level collaboration between users
    - Event-based applications such as instant messaging, chat, and online games
    - Challenges
      Locating other peers (e.g. NetMeeting requires knowing the other peers' IP addresses)
      Real-time constraints (e.g. the game DOOM)

  • Platforms

    - Platforms support the primary P2P components: naming, discovery, communication, security, and resource aggregation
    - Candidates for a future P2P platform: .NET, JXTA

  • Platforms (JXTA)

    JXTA = Juxtapose = side by side
    Open-source initiative from Sun (Java)
    JXTA technology is a set of open protocols that allow any connected device on the network, ranging from cell phones and wireless PDAs to PCs and servers, to communicate and collaborate in a P2P manner.
    JXTA peers create a virtual network where any peer can interact with other peers and resources directly, even when some of the peers and resources are behind firewalls and NATs or are on different network transports.
    Objectives:
      Interoperability: across systems and communities
      Platform independence: multiple/diverse languages, systems, and networks
      Ubiquity: every device with a digital heartbeat

  • Platforms (JXTA)

    Architecture
      JXTA application layer
      JXTA service layer
      JXTA core layer
    Set of 6 protocols
      Peer Endpoint Protocol: available route to destination
      Peer Rendezvous Protocol: sign in/out, authentication
      Peer Resolver Protocol: send/receive search queries for peers
      Pipe Binding Protocol: pipe advertisement to pipe endpoint
      Peer Information Protocol: learn peers' status/properties
      Peer Discovery Protocol: find peers, groups, advertisements

  • File Sharing

    - Content storage and exchange is where P2P is most successful
      Napster, Gnutella, Kazaa

  • Gnutella Protocol v0.4 (1/5)

    One of the most popular file-sharing protocols.
    Operates without a central index server (unlike Napster).
    Clients (downloaders) are also servers => servents.
    Clients may join or leave the network at any time => highly fault-tolerant, but at a cost!
    Searches are done within the virtual network, while the actual downloads are done outside it (with HTTP).
    The core of the protocol consists of 5 descriptors (PING, PONG, QUERY, QUERYHIT, and PUSH).

  • Gnutella Protocol (2/5)

    A peer (p) needs to connect to one or more other Gnutella peers in order to participate in the virtual network.
    p initially doesn't know the IP addresses of its fellow file-sharers.

  • Gnutella Protocol (3/5)

    a. HostCaches: the initial connection
    p connects to a HostCache H to obtain a set of IP addresses of active peers.
    Alternatively, p might probe its own cache to find peers it was connected to in the past.

    [Figure: (1) request/receive a set of active peers from H; (2) connect to network]

  • Gnutella Protocol (4/5)

    b. Ping/Pong: the communication overhead
    Although p is already connected, it must discover new peers, since its current connections may break.
    Thus, it periodically sends PING messages, which are broadcast (message flooding).
    If a host, e.g. p2, is available, it responds with a PONG (routed back only along the path the PING came from).
    p might use this response to attempt a connection to p2 in order to increase its degree.

    [Figure: PING (1) sent into Gnutella network N; PONG (2) returned by servent p2]
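
    The reverse-path rule for PONGs can be shown with a small in-memory simulation. This is a sketch, not the wire protocol: the Servent class, its method names, and the use of a random hex string as the descriptor ID are assumptions, and direct method calls stand in for network messages.

```python
import uuid


class Servent:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.came_from = {}  # descriptor ID -> neighbor that delivered the PING
        self.pongs = []      # responders heard from, when we originated the PING

    def connect(self, other):
        self.neighbors.append(other)
        other.neighbors.append(self)

    def ping(self, ttl=7):
        ping_id = uuid.uuid4().hex
        self.came_from[ping_id] = None  # None marks us as the originator
        for neighbor in self.neighbors:
            neighbor.on_ping(ping_id, ttl, sender=self)
        return ping_id

    def on_ping(self, ping_id, ttl, sender):
        if ping_id in self.came_from:
            return  # duplicate PING: already handled, drop it
        self.came_from[ping_id] = sender
        sender.on_pong(ping_id, responder=self.name)  # reply toward the sender
        if ttl > 1:
            for neighbor in self.neighbors:
                if neighbor is not sender:
                    neighbor.on_ping(ping_id, ttl - 1, sender=self)

    def on_pong(self, ping_id, responder):
        if ping_id not in self.came_from:
            return  # PONG for a PING we never saw: drop it
        previous = self.came_from[ping_id]
        if previous is None:
            self.pongs.append(responder)
        else:
            previous.on_pong(ping_id, responder)  # relay back along the PING's path


p, p1, p2, p3 = (Servent(name) for name in ("p", "p1", "p2", "p3"))
p.connect(p1)
p1.connect(p2)
p2.connect(p3)
p3.connect(p1)  # a cycle, so some PINGs arrive twice and are dropped
ping_id = p.ping(ttl=7)
```

    Each servent remembers only which neighbor handed it a given PING, so a PONG finds its way back without any peer knowing the full route.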

  • Gnutella Protocol (5/5)

    c. Query/QueryHit: the utilization
    Query descriptors contain unstructured queries, e.g. "celine dion mp3".
    Like PINGs, they are broadcast, with a typical TTL of 7.
    If a host, e.g. p2, matches the query, it responds with a QueryHit descriptor.

    d. Push: enables downloads from peers that are firewalled.
    If a peer is firewalled, we can't connect to it. Hence we ask it to establish a connection to us and send us the file.
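
    The two decisions described above, whether a servent answers a query and how the download is then set up, can be sketched as plain functions. Both are hypothetical: the keyword-matching rule is an assumption (how a servent interprets the query string is up to the servent), and the function names are made up.

```python
def matching_files(query, shared_files):
    """Return shared file names that contain every keyword of the query."""
    keywords = query.lower().split()
    return [
        name for name in shared_files
        if all(keyword in name.lower() for keyword in keywords)
    ]


def download_strategy(requester_firewalled, responder_firewalled):
    if not responder_firewalled:
        return "direct"       # requester opens an HTTP connection to the responder
    if not requester_firewalled:
        return "push"         # responder is asked to connect back and send the file
    return "unreachable"      # neither side can accept an incoming connection


shared = ["Celine_Dion-Song.mp3", "notes.txt"]
hits = matching_files("celine dion mp3", shared)
strategy = download_strategy(requester_firewalled=False, responder_firewalled=True)
```

    A servent would send a QueryHit only when matching_files returns something, and the requester would fall back to a Push only when the direct connection is not possible.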

  • Conclusions

    Nothing new ... but the right time to:
      Take advantage of available resources
      Find an alternative to centralized client/server solutions
    There is something attractive about the defiance or avoidance of authority.

    Raised legal copyright issues.

    Currently, 60% to 89% of all Internet traffic is P2P traffic => source of revenue => marketing argument.

    Potential good match between ad hoc networks and P2P.

    Interesting architectural and technical issues behind it ... and challenging requirements.

  • Summary of P2P computing

    Now a peer knows where to connect to
