Page 1: Dr. Jack Lange Computer Science Department …jacklange/teaching/cs2510-f...Epidemic Algorithms! Easy to deploy, robust, and resilient to failure, epidemic algorithms are a potentially

10/8/15

DISTRIBUTED COMPUTER SYSTEMS

Data Dissemination

Dr. Jack Lange Computer Science Department

University of Pittsburgh

Fall 2015

Outline

- Multicast Communication
- Network-Level Multicasting Design Issues
  - Group Management
  - IP Multicasting Protocols
- Application-Level Multicasting Design Issues
- Gossiping – Epidemic Protocols
  - Data Propagation Models – Anti-Entropy
  - Rumor Spreading
  - Data Removal
  - Applications
- Conclusion

Multicast Routing

- Unicast: one source to one destination
- Multicast: one source to many destinations
- Main goal: efficient data distribution

Multicasting on the Internet

[Figure: Internet architecture, showing the Application, IP, and Access Network layers]

At which layer should multicast be implemented?

Data Dissemination Models

Network-Level Multicasting

IP Multicast Architecture

[Figure: hosts and routers, labeled with the Service Model, the Host-to-Router Protocol (IGMP), and the Multicast Routing Protocols (various)]

Logical Naming

- Single name/address maps to a logically related set of destinations
  - The destination set is referred to as the multicast group
- Scalability is a key challenge
  - Single name/address independent of group growth or changes

Multicast Router Responsibilities

- Learn of the existence of multicast groups
  - Typically done through advertisement
- Identify links with group members
- Establish state to route packets
  - Replicate packets on appropriate interfaces
  - Routing entry in the routing table: (Src, incoming interface) → list of outgoing interfaces

IP Multicast Service Model

- Each group identified by a single IP address
- Groups may be of any size
- Members of groups may be located anywhere in the Internet
- Members of groups can join and leave at will
- Senders need not be members
- Group membership not known explicitly
- Analogy:
  - Each multicast address is like a radio frequency, on which anyone can transmit, and to which anyone can tune in.

IP Multicast API

- Sending – Send packets to a multicast group
- Receiving – Two new operations
  - Join-IP-Multicast-Group – (Group-address, Interface)
  - Leave-IP-Multicast-Group – (Group-address, Interface)
- Receive multicast packets for joined groups via normal IP-Receive operation
- Implemented using socket options
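As an illustration of the socket-option route, here is a minimal Python sketch of the join and leave operations on an IPv4 UDP socket. The helper names (`make_mreq`, `join_group`, `leave_group`, `open_receiver`) are hypothetical, and any group address and port passed in are placeholders chosen by the caller; only the `IP_ADD_MEMBERSHIP` and `IP_DROP_MEMBERSHIP` options are standard.

```python
import socket
import struct


def make_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the membership request: group address, then local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def join_group(sock: socket.socket, group: str, interface: str = "0.0.0.0") -> None:
    # Corresponds to Join-IP-Multicast-Group(Group-address, Interface).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_mreq(group, interface))


def leave_group(sock: socket.socket, group: str, interface: str = "0.0.0.0") -> None:
    # Corresponds to Leave-IP-Multicast-Group(Group-address, Interface).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP,
                    make_mreq(group, interface))


def open_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket, bind it to the port, and join the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    join_group(sock, group)
    return sock  # the normal sock.recvfrom(...) now also delivers group traffic
```

Passing `0.0.0.0` as the interface leaves the choice of interface to the operating system. Sending needs no join at all: a plain `sendto` to the group address is enough, which matches the point that senders need not be members.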

IP Multicast Addresses

- Class D IP addresses
  - 224.0.0.0 – 239.255.255.255
- How should these addresses be allocated?
  - Well-known multicast addresses, assigned by IANA
  - Transient multicast addresses, assigned and reclaimed dynamically, e.g., by the “sdr” program

Address format: the high-order bits 1 1 1 0, followed by the Group ID (the remaining 28 bits).
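The address layout can be checked with a few lines of Python. This is only a sketch: the function names `is_class_d` and `group_id` are made up for the example, and it relies on an IPv4 address being 32 bits, so the group ID occupies the low 28 bits.

```python
import ipaddress


def is_class_d(addr: str) -> bool:
    """True if the top four bits of the IPv4 address are 1110."""
    value = int(ipaddress.IPv4Address(addr))
    return (value >> 28) == 0b1110


def group_id(addr: str) -> int:
    """Return the low 28 bits (the group ID) of an IPv4 address."""
    return int(ipaddress.IPv4Address(addr)) & 0x0FFFFFFF
```

The standard library offers the same test directly as `ipaddress.IPv4Address(addr).is_multicast`.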

Multicast Scope Control – Small TTLs

- TTL expanding-ring search to reach or find a nearby subset of a group

[Figure: source s with expanding rings labeled 1, 2, 3]

Multicast Scope Control – Large TTLs

- Administrative TTL boundaries keep multicast traffic within an administrative domain, e.g., for privacy or resource reasons

[Figure: an administrative domain connected to the rest of the Internet. A TTL threshold, greater than the diameter of the administrative domain, is set on the interfaces to the links leaving the domain.]

Internet Group Management Protocol

- End system to router protocol is IGMP
- Each host keeps track of which multicast groups it is subscribed to
  - Socket API informs the IGMP process of all joins
- Objective is to keep the router up-to-date with the group membership of the entire LAN
  - Routers need not know who all the members are, only that members exist

IGMP Basic Operations – Queriers

- On each link, one router is elected to be the “Querier”
- Querier periodically sends a Membership Query message to the all-systems group (224.0.0.1), with TTL = 1
- On receipt, hosts start random timers (between 0 and 10 seconds) for each multicast group to which they belong

[Figure: routers and hosts on a shared link, with one router marked as Querier]

IGMP Basic Operations

- When a host’s timer for group G expires, it sends a Membership Report to group G, with TTL = 1
- Other members of G hear the report and stop their timers
- Routers hear all reports, and time out non-responding groups

[Figure: routers and hosts on a shared link, with one router marked as Querier]
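The query and report exchange amounts to a report-suppression scheme: per group, only the member whose timer fires first answers. A small simulation sketch follows, under the simplifying assumptions that reports are heard instantly by everyone on the link and that nothing is lost. The function `igmp_round` and its arguments are hypothetical.

```python
import random


def igmp_round(memberships, max_delay=10.0, rng=None):
    """Simulate the response to one Membership Query on a single link.

    memberships: dict mapping host -> set of groups the host belongs to.
    Returns a dict mapping group -> host that sent the Membership Report.
    """
    rng = rng or random.Random()
    # On receipt of the query, each host starts one random timer per group.
    timers = [(rng.uniform(0, max_delay), host, group)
              for host, groups in memberships.items()
              for group in groups]
    reports = {}
    for _, host, group in sorted(timers):
        # The first timer to expire for a group triggers the report;
        # the other members hear it and stop their own timers.
        if group not in reports:
            reports[group] = host
    return reports


# Usage: two groups on the link, so the router hears exactly two reports.
link = {"h1": {"G1", "G2"}, "h2": {"G1"}, "h3": set()}
reports = igmp_round(link, rng=random.Random(1))
```

A group that appears in no report is the one the router would eventually time out.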

Routing Techniques

- The main objective is to build a distribution tree for multicast packets
- Flood and prune – DVMRP, PIM-DM
  - Begin by flooding traffic to the entire network
  - Prune branches with no receivers
  - Unwanted state where there are no receivers
- Link-state multicast protocols – MOSPF
  - Routers advertise groups for which they have receivers to the entire network
  - Compute trees on demand
  - Unwanted state where there are no senders
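The prune step of flood and prune can be sketched as a recursive walk over the flooding tree. This is a toy model rather than DVMRP or PIM-DM themselves: the tree is given as a plain dictionary, and the router names are invented for the example.

```python
def prune(tree, node, receivers):
    """Return the pruned subtree rooted at node, or None if it has no receivers.

    tree: dict mapping each router to a list of its downstream neighbors.
    receivers: set of routers with at least one attached group member.
    """
    kept_children = {}
    for child in tree.get(node, []):
        subtree = prune(tree, child, receivers)
        if subtree is not None:
            kept_children[child] = subtree
    if kept_children or node in receivers:
        return kept_children
    return None  # no receivers below this router: the branch is pruned


# Usage: only router C has receivers, so the branches to D and B are pruned.
flood_tree = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"]}
distribution_tree = prune(flood_tree, "S", {"C"})  # {"A": {"C": {}}}
```

Note that every router still had to see the flooded traffic before pruning, which is where the unwanted state comes from.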

IP Multicast Challenges

- IP-level multicasting presents various challenges that turned out to be very hard to address in large-scale, heterogeneous networks
  - Scalability of routing protocols
  - Hard to manage
  - Hard to implement a TCP equivalent
  - Hard to get applications to use IP multicast without existing wide deployment
  - Hard to get router vendors to support the functionality and hard to get ISPs to configure routers to enable it

Data Dissemination Models

Application-Level Multicasting

Application-Level Multicasting

- IP multicasting gives rise to several challenges
  - Often difficult to configure
  - Requires complex administrative support
  - Involves mutual agreements among different parties
- Application-level multicasting seeks to address those issues
  - Structured overlay management
  - No need to set up explicit communication paths
  - Use gossiping for information dissemination

Application-Level Multicasting

- The basic idea is to organize the nodes of a distributed system into an overlay network and use the resulting overlay network to disseminate data.
- Example: overlay multicasting over a structured Chord-based peer-to-peer network
  1. Initiator generates a multicast identifier (MiD).
  2. Look up succ(MiD), the node responsible for MiD.
  3. Request is routed to succ(MiD), which will become the root.
  4. If P wants to join, it sends a join request to the root.
  5. When the request arrives at Q:
     - If Q has not seen a join request before, then it becomes a forwarder.
       - P becomes a child of Q.
       - The join request continues to be forwarded.
     - If Q knows about the tree, then P becomes a child of Q.
       - No need to forward the join request anymore.
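The join procedure in steps 4 and 5 can be sketched as follows. The route a request takes from P toward the root is supplied by the caller here, standing in for the Chord lookup of succ(MiD). The function name `join` and the node names are hypothetical.

```python
def join(tree_parent, route):
    """Process one join request and update the multicast tree in place.

    tree_parent: dict child -> parent for nodes already in the tree
                 (the root is present and maps to None).
    route: nodes visited by the request, from the joining node P to the root.
    """
    child = route[0]
    for q in route[1:]:
        already_in_tree = q in tree_parent
        tree_parent[child] = q      # the requester becomes a child of Q
        if already_in_tree:
            return                  # Q knows about the tree: stop forwarding
        child = q                   # Q is a new forwarder; the request continues


# Usage: P1 builds a path of forwarders; P2 attaches at the first tree node it meets.
parent = {"root": None}
join(parent, ["P1", "A", "B", "root"])
join(parent, ["P2", "C", "A", "B", "root"])
# parent now holds P1->A, A->B, B->root, P2->C, C->A
```

The second join stops at A, so nodes B and the root never see it. That is what keeps join traffic away from the root as the tree grows.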

Overlay Construction

- Overlays may be very inefficient.
  - Multicasting from A will traverse <B, Rb>, <Ra, Rb>, <Rc, Rd>, and <D, Rd> twice.
  - It would have been better to make an overlay link between A and C instead of B and D.

Overlay Network – Quality Metrics

- Several metrics can be used to indicate the quality of an overlay network.
- Link stress – How often does a packet cross the same link?
  - A message from A to D needs to cross <Ra, Rb> twice.
- Stretch or Relative Delay Penalty – Ratio between the delay in the overlay network and the delay in the underlying network.
  - Messages from B to C follow a path of length 59 using application-level multicasting, but 47 with network-level multicasting → Stretch = 59/47.
- Tree cost – Minimizing some metric of a tree, such as delays.
  - Minimal-delay spanning tree.
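Both link stress and stretch are simple to compute once the physical path of a message is known. In this sketch the list of traversed links is an input the caller must supply (the slides' topology figure is not reproduced here), and the function names are hypothetical.

```python
from collections import Counter


def link_stress(physical_links_used):
    """Count how many times each physical link carries the same message."""
    # Treat <X, Y> and <Y, X> as the same undirected link.
    return Counter(frozenset(link) for link in physical_links_used)


def stretch(overlay_delay, network_delay):
    """Relative delay penalty: overlay delay divided by underlying network delay."""
    return overlay_delay / network_delay


# Usage with the numbers quoted above: 59 over the overlay versus 47 natively.
rdp = stretch(59, 47)  # roughly 1.26
```

A stretch of about 1.26 means the overlay path from B to C is roughly 26% slower than the network-level path.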

Overlay Networks – Operations

- Joining a multicast overlay network
  - Joining node contacts a well-known node
    - Typically referred to as a rendezvous node
  - Rendezvous node returns a list of possible parents.
  - What parent should the joining node select?
    - There are many alternatives, and different proposals often follow different solutions.

Switch Trees

- One specific family of solutions to parent selection uses switch trees.
  - It assumes that, periodically, a node will seek to switch parents.
  - Nodes periodically check other nodes to determine whether any of them has a shorter path.
  - Whenever a node notices that its parent has failed, it simply attaches itself to the root.

Data Dissemination Models

Gossiping

Epidemic Algorithms

- Easy to deploy, robust, and resilient to failure, epidemic algorithms are a potentially effective mechanism for propagating information in large peer-to-peer systems deployed on the Internet or ad hoc networks.
- The term epidemic protocol is sometimes used as a synonym for gossip protocol.
  - Gossip spreads information in a manner similar to the spread of a virus in a biological community.

Epidemic Protocols

- Nodes in epidemic protocols are in one of three possible states
  - Infected: node holds data that it is willing to spread.
  - Susceptible: node has not yet seen this data.
  - Removed: node is not able or willing to spread data.
- Anti-entropy
  - Node P picks another node Q at random, and exchanges updates.
  - Three approaches to the exchange:
    1. P only pushes its own updates to Q.
    2. P only pulls in new updates from Q.
    3. P and Q send updates to each other (a combined push-pull approach).
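The three exchange styles differ only in the direction in which updates flow. A minimal sketch, assuming each replica is a dictionary mapping a key to a `(timestamp, value)` pair and that the newer timestamp wins. That versioning scheme is an assumption made for the example, not something the slides specify.

```python
def push(p, q):
    """P sends its updates to Q; Q keeps the newer version of each item."""
    for key, (ts, value) in p.items():
        if key not in q or q[key][0] < ts:
            q[key] = (ts, value)


def pull(p, q):
    """P pulls in new updates from Q."""
    push(q, p)


def push_pull(p, q):
    """P and Q send updates to each other."""
    push(p, q)
    push(q, p)


# Usage: after a push-pull exchange both replicas hold the same, newest data.
p = {"x": (2, "new"), "y": (1, "a")}
q = {"x": (1, "old"), "z": (5, "c")}
push_pull(p, q)
```

After a plain `push(p, q)` only Q is guaranteed to be up to date, and after `pull(p, q)` only P is. The combined form leaves both replicas identical.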

Anti-Entropy Approaches

- Push-based model – When it comes to rapidly spreading updates, only pushing updates turns out to be a bad choice
  - In a pure push-based model, updates can be propagated only by infected nodes
- Pull-based model – A pull-based approach achieves better update spreading when many nodes are infected.
- It will take O(log(N)) rounds to propagate a single update to all nodes.
  - A round is a period of time in which each node will have had a chance to be active.
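The round counts can be explored with a small simulation. The sketch below assumes a fully connected system, uniform random peer selection, a single initially infected node, and synchronous rounds. The function name and parameters are hypothetical, and results depend on the random seed.

```python
import random


def rounds_to_infect_all(n, mode, seed=0, max_rounds=10_000):
    """Count rounds until all n >= 2 nodes hold the update, starting from one."""
    rng = random.Random(seed)
    infected = [False] * n
    infected[0] = True
    rounds = 0
    while not all(infected) and rounds < max_rounds:
        snapshot = infected[:]            # state at the start of the round
        for p in range(n):                # each node is active once per round
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1                    # uniform random peer other than p
            if mode == "push" and snapshot[p]:
                infected[q] = True        # infected P pushes the update to Q
            elif mode == "pull" and snapshot[q]:
                infected[p] = True        # P pulls the update from infected Q
        rounds += 1
    return rounds


# Usage: compare the two models for a few system sizes.
for n in (100, 1000):
    print(n, rounds_to_infect_all(n, "push"), rounds_to_infect_all(n, "pull"))
```

Growing `n` by a factor of ten should add only a few rounds, which is the logarithmic behavior stated above. The difference between push and pull shows mostly in the final rounds, when few susceptible nodes remain.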

Gossiping Basic Operation

- Forever (sending side)
  - Wait for a time period
  - Pick a random peer, P
  - Send current state to P
- Forever (receiving side)
  - Receive a message, M
  - Build a new state: Merge(Crnt_State, M)
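The two loops translate almost directly into code. The following single-process sketch replaces the wait period and the network with direct method calls, and it assumes a merge rule in which the entry with the higher version number wins. The class and function names are hypothetical.

```python
import random


class GossipNode:
    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})   # key -> (version, value)
        self.peers = []

    def merge(self, message):
        """Build a new state: keep the higher-versioned entry for each key."""
        for key, (version, value) in message.items():
            if key not in self.state or self.state[key][0] < version:
                self.state[key] = (version, value)

    def receive(self, message):
        # Receiving side: receive a message M, then Merge(Crnt_State, M).
        self.merge(message)

    def tick(self, rng):
        # Sending side: one iteration of the loop, run after each wait period.
        peer = rng.choice(self.peers)
        peer.receive(dict(self.state))


def run(nodes, rounds, seed=0):
    """Give every node one turn per round, for the given number of rounds."""
    rng = random.Random(seed)
    for node in nodes:
        node.peers = [n for n in nodes if n is not node]
    for _ in range(rounds):
        for node in nodes:
            node.tick(rng)
```

In a real deployment each node would run both loops concurrently and `tick` would send over the network. The merge function is the only part that depends on the application.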

- Information dissemination model
  - If P has just been updated, it will contact an arbitrary node Q.
  - If Q was already updated, P will lose interest and become removed, with probability 1/k.
- The fraction s of nodes that will always remain ignorant of an update (that is, stay in the susceptible state) satisfies the equation:

$$s = e^{-(k+1)(1-s)}$$

Gossip Propagation Property

- With k = 4, s ≈ 0.7%
- Solutions to guarantee that those nodes will also be updated?
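The quoted figure can be checked numerically. The sketch assumes the relation between s and k is s = e^{-(k+1)(1-s)}, which is the reading that reproduces s ≈ 0.7% at k = 4, and solves it by fixed-point iteration. The function name is hypothetical.

```python
import math


def ignorant_fraction(k, iterations=100):
    """Solve s = exp(-(k + 1) * (1 - s)) by fixed-point iteration from s = 0."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1 - s))
    return s


# Usage: fraction of nodes that never hear the rumor when k = 4.
s4 = ignorant_fraction(4)  # about 0.007, i.e. roughly 0.7%
```

Starting the iteration at 0 matters: s = 1 also satisfies the equation trivially, but it is not the solution of interest.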

Possible Approaches

- Combining anti-entropy with gossiping
- Directional gossiping: nodes that are connected to only a few other nodes are contacted with a relatively high probability.
- By regularly updating the partial view of each node, random selection is no longer a problem.

Deleting Data

- How do you delete data using gossiping?
  - If you completely erase a datum, how do you remember that you got it, so that you don’t receive it again via gossiping?
- Use a record of deletion; these are known as death certificates.
  - But how do you prevent them from accumulating?
    - Timestamp the death certificates, then discard them after a certain time period has passed.
- How secure is this? What if you need to be absolutely sure that something does not come back?
  - A few nodes will maintain a dormant death certificate, which will “reawaken” if the node is re-infected with the deleted datum.
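A timestamped death certificate can be sketched as a second table kept next to the data. The class below is an illustrative toy, not a full protocol: the names (`Store`, `certificate_ttl`) are hypothetical, time is passed in explicitly as `now`, and dormant certificates are not modeled.

```python
class Store:
    def __init__(self, certificate_ttl):
        self.data = {}                 # key -> (timestamp, value)
        self.certificates = {}         # key -> deletion time (death certificate)
        self.certificate_ttl = certificate_ttl

    def delete(self, key, now):
        """Erase the datum but remember the deletion."""
        self.data.pop(key, None)
        self.certificates[key] = now

    def expire(self, now):
        """Discard death certificates older than the configured period."""
        self.certificates = {k: t for k, t in self.certificates.items()
                             if now - t < self.certificate_ttl}

    def receive_update(self, key, timestamp, value, now):
        """Apply a gossiped update unless a death certificate covers it."""
        self.expire(now)
        deleted_at = self.certificates.get(key)
        if deleted_at is not None and timestamp <= deleted_at:
            return False               # stale copy of deleted data: ignore it
        if key not in self.data or self.data[key][0] < timestamp:
            self.data[key] = (timestamp, value)
            return True
        return False


# Usage: the old copy is rejected while the certificate lives, accepted after.
s = Store(certificate_ttl=10)
s.receive_update("x", 1, "v", now=1)
s.delete("x", now=5)
rejected = s.receive_update("x", 1, "v", now=6)     # False
resurrected = s.receive_update("x", 1, "v", now=20)  # True
```

The last call shows the weakness raised above: once the certificate has been discarded, an old copy still circulating can bring the deleted datum back, which is what dormant certificates are meant to prevent.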

Applications

- Data dissemination: perhaps the most important one. Note that there are many variants of dissemination.
- Aggregation: let every node i maintain a variable xi. When two nodes i and j gossip, they each reset their variable to the average of the two values, (xi + xj)/2.

Gossiping Applications

- Say you have a set of nodes, and each node has a number. How do you quickly compute the average of the numbers?
  - xi, xj ← (xi + xj)/2
- Can you use this to estimate the number of nodes in the system?
  - All nodes zero except one. What is the average?
- How about picking a random node?
  - Each node generates a random number.
  - Disseminate the max.
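Both the averaging rule and the size-estimation question can be tried out in a few lines. In this sketch, pairs are chosen uniformly at random from a global list, which stands in for real peer selection, and the function names are hypothetical.

```python
import random


def gossip_average(values, exchanges, seed=0):
    """Repeatedly pick two distinct nodes and set both to the mean of their values."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(exchanges):
        i, j = rng.sample(range(len(x)), 2)
        x[i] = x[j] = (x[i] + x[j]) / 2   # xi, xj <- (xi + xj)/2
    return x


def estimate_size(n, exchanges, seed=0):
    """One node starts at 1 and the rest at 0, so the average tends to 1/n."""
    x = gossip_average([1.0] + [0.0] * (n - 1), exchanges, seed)
    return [1 / v if v > 0 else float("inf") for v in x]


# Usage: every node ends up with (almost) the same local estimate of the size.
estimates = estimate_size(16, 2000)
```

Each exchange preserves the sum of all values, so the only value the nodes can agree on is the true average. With a single 1 among zeros that average is 1/N, and each node recovers N as the reciprocal of its local value.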
