08 distributed algorithms

7/28/2019 08 Distributed Algorithms


1 (49) - DISTRIBUTED SYSTEMS Distributed Algorithms - Sven Arne Andreasson - Computer Science and Engineering

Distributed Algorithms

A distributed algorithm computes across more than one process, using message passing over a network.

Complexity Analysis

Cost analysis

Examples:

Communication Protocols

Flow Control

Routing

Resource Allocation

Leader Election

choosing a synchronization process


Complexity Analysis

Internal computation in a node is normally negligible compared to message times.

Secondary memory time is also mostly negligible.

Complexity analysis for Distributed Algorithms:

The number of messages sent, relative to the number of participating processes (nodes).

The least number of message jumps between nodes until the algorithm terminates, relative to the number of participating processes (nodes):

cost of waiting.

The highest number of message jumps for any message before the algorithm terminates, relative to the number of participating processes (nodes):

cost on the network.

Bit complexity

Relevant if the algorithm uses a large amount of data.

The amount of data to send (times message jumps), relative to the number of participating processes (nodes).


Complexity Analysis (2)

A choice between:

Cheapest algorithm

Minimum number of total messages: nice to the network

Fastest algorithm

Allows more messages than necessary: not nice to the network


Algorithms for Information Distribution

Flooding Algorithms

Fast

Expensive

Echo Algorithms

Not so fast

Cheaper

Virtual Ring Algorithms

Slow

Cheap




Echo Algorithms

Mainly for mesh networks. (Ernest Chang 1979)

A refinement of flooding.

Can be used for

Election Algorithms

Broadcast in mesh networks

Presumptions

Each node has a unique name (identifier).

No shared memory; processes use message passing.

FIFO on communication links.

A node does not know about all nodes in the network, only its neighbors.

Single source: one initiating node.

Multi source: there might be several initiating nodes concurrently.


Traversal Algorithm

Two phases

forward phase

echo phase

The initiating node, IN, sends an Explorer Message, EM, on all its outgoing links.

When a node gets its first EM it marks the corresponding link as its First Link, FL.

If the node doesn't have more links (it is a leaf) it sends an echo message, ECHO, back to the node that sent the EM.

If the node has more links it sends an EM on these links.

Then the node waits for an echo message, ECHO, on each of these links.

If a node gets another EM it sends an echo message, ECHO, on the corresponding link.

When a node gets an echo message, ECHO, the corresponding link is marked as ready.

When the node has got an ECHO on all links except its First Link, FL, it sends an ECHO back on its FL.

When IN has got an ECHO on all its links the algorithm terminates.
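The steps above can be sketched as a small single-process simulation. This is only an illustration under stated assumptions: the five-node graph is hypothetical (it is not taken from the slides), one global FIFO queue stands in for links of roughly equal speed, and the names `echo_traversal`, `first_link` and `waiting` are made up for this sketch.

```python
from collections import deque

def echo_traversal(graph, initiator):
    """Simulate the two-phase traversal; return (first links, message count)."""
    first_link = {initiator: None}             # node -> neighbour on its FL
    waiting = {initiator: set(graph[initiator])}  # links still awaiting an ECHO
    messages = 0
    queue = deque()                            # global FIFO of (kind, src, dst)

    def send(kind, src, dst):
        nonlocal messages
        messages += 1
        queue.append((kind, src, dst))

    for nb in sorted(graph[initiator]):        # forward phase starts at IN
        send("EM", initiator, nb)

    while queue:
        kind, src, dst = queue.popleft()
        if kind == "EM":
            if dst not in first_link:          # first EM: mark FL, explore on
                first_link[dst] = src
                waiting[dst] = set(graph[dst]) - {src}
                for nb in sorted(waiting[dst]):
                    send("EM", dst, nb)
                if not waiting[dst]:           # leaf: echo immediately
                    send("ECHO", dst, src)
            else:                              # another EM: answer with ECHO
                send("ECHO", dst, src)
        else:                                  # ECHO: mark the link as ready
            waiting[dst].discard(src)
            if not waiting[dst] and first_link[dst] is not None:
                send("ECHO", dst, first_link[dst])
    return first_link, messages

# Hypothetical mesh with 5 nodes and 6 links.
graph = {
    "A": ["B", "D"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["A", "C", "E"],
    "E": ["B", "D"],
}
first_link, messages = echo_traversal(graph, "A")
```

The returned `first_link` map is the tree of First Links; the loop ends exactly when the initiator has no link left to wait for.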


Echo Algorithm

Traversal Execution Tree (TET)

The tree of links formed by all the Explorer Messages, EMs.

P-tree

The tree of links formed by all the First Links, FLs.


Echo Algorithm

[Figure: an example network with nodes A, B, C, D and E, shown together with its Traversal Execution Tree (TET) and its P-tree.]




Echo Algorithm

Complexity analysis

The algorithm requires at most 4l messages, where l is the number of links in the network.

If the speed is roughly the same on all links the algorithm takes roughly 2D+2 time units, where D is the network diameter.


Echo Algorithm

Applications

Distribution of list among nodes.

Calculation of the maximum value (identity) among the nodes

Election Algorithm


Distribution of List among the nodes in a Mesh Network

Each node in the network should be given a unique number.

The numbers are distributed by a node S that, when the algorithm starts, doesn't know which nodes there are in the network.

S: the initiating node.

Phase 1:

1. S starts an Echo Algorithm.

2. Each leaf node returns the value 1 in its ECHO.

3. Each ECHO sent back on a link that is not the FL returns the value 0.

4. Each node registers the return values of the ECHOs for the corresponding links.

5. When a node has got all its ECHO messages it returns an ECHO on its FL containing the sum of all its return values + 1 (for itself).

6. When the initiating node, S, has got ECHOs on all its links it knows how many nodes are present in the network at that moment.


Distribution of List among the nodes in a Mesh Network (2)

Phase 2:

7. The initiating node creates as many unique identifiers as there are nodes in the network and sends them on its links to its neighbors. Each link message gets as many identifiers as were reported in the corresponding ECHO.

8. Every other node gets a message with unique identities on its FL (the FL of Phase 1).

The node keeps one of the identities and sends the rest on its links according to the corresponding registered number of nodes.

Each link gets as many as it indicated in its ECHO.

9. Echo messages can optionally be sent back so that the initiating node gets a confirmation that the algorithm has terminated.
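The two phases can be illustrated on an already-built P-tree. The sketch below assumes the tree of First Links is given as a hypothetical `children` map, and replaces message passing with plain recursion; `subtree_sizes` plays the role of the counts carried by the ECHOs in Phase 1, and `distribute` plays the role of Phase 2.

```python
def subtree_sizes(children, node, sizes):
    """Phase 1: each node reports 1 (itself) plus the counts from its children."""
    sizes[node] = 1 + sum(
        subtree_sizes(children, child, sizes) for child in children.get(node, [])
    )
    return sizes[node]

def distribute(children, node, ids, sizes, assigned):
    """Phase 2: keep one identifier, pass on as many as each link reported."""
    assigned[node] = ids[0]
    rest = ids[1:]
    for child in children.get(node, []):
        share, rest = rest[:sizes[child]], rest[sizes[child]:]
        distribute(children, child, share, sizes, assigned)

# Hypothetical P-tree rooted at the initiating node S.
children = {"S": ["B", "D"], "B": ["C", "E"]}

sizes = {}
total = subtree_sizes(children, "S", sizes)   # S now knows the node count
assigned = {}
distribute(children, "S", list(range(1, total + 1)), sizes, assigned)
```

Links that are not First Links report 0 in Phase 1 and so receive no identifiers in Phase 2, which is why only the tree needs to be modelled here.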


[Figure sequence: an example network with nodes N1 to N6, shown at successive stages of the algorithm. Captioned stages: "All EM messages have reached their destination" and "The algorithm has terminated".]


Broadcast in a Mesh Network using Echo Algorithm

The P-tree can be used for broadcast message distribution.

The messages follow the tree.

One tree for each sender.

Reasonable-cost broadcast in a mesh network.

Broadcasts from different nodes might reach receivers in different orders.


Improved Traversal Algorithm (Segall)

An improved traversal algorithm (Segall).

The initiating node, IN, sends an Explorer Message, EM, on all its outgoing links.

When a node gets its first EM it marks the corresponding link as its First Link, FL.

If the node doesn't have more links (it is a leaf) it sends an echo message, ECHO, back to the node that sent the EM.

If the node has more links it sends an EM on these links.

Then the node waits for an EM or ECHO on these links.

If a node gets another EM, the corresponding link is marked as ready.

If a node gets an ECHO, the corresponding link is marked as ready.

When the node has got an EM or ECHO on all links, it sends an ECHO back on its FL.

When IN has got an EM or ECHO on all its links the algorithm terminates.
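A minimal simulation of this improved variant, under the same assumptions as any single-process sketch: the graph is hypothetical, one global FIFO queue approximates links of equal speed, and the function name `improved_traversal` is made up here. The only change from the basic traversal is that a later EM is not answered; it simply marks its link as ready.

```python
from collections import deque

def improved_traversal(graph, initiator):
    """Simulate the improved traversal; return (first links, message count)."""
    first_link = {initiator: None}
    waiting = {initiator: set(graph[initiator])}  # links not yet marked ready
    messages = 0
    queue = deque()                               # global FIFO of (kind, src, dst)

    def send(kind, src, dst):
        nonlocal messages
        messages += 1
        queue.append((kind, src, dst))

    for nb in sorted(graph[initiator]):
        send("EM", initiator, nb)

    while queue:
        kind, src, dst = queue.popleft()
        if kind == "EM" and dst not in first_link:
            first_link[dst] = src                 # first EM: mark FL, explore on
            waiting[dst] = set(graph[dst]) - {src}
            for nb in sorted(waiting[dst]):
                send("EM", dst, nb)
        else:
            waiting[dst].discard(src)             # later EM or ECHO: link ready
        # All links ready: echo on the FL (the initiator has no FL and just stops).
        if not waiting[dst] and first_link[dst] is not None:
            send("ECHO", dst, first_link[dst])
    return first_link, messages

# Hypothetical mesh with 5 nodes and 6 links.
graph = {
    "A": ["B", "D"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["A", "C", "E"],
    "E": ["B", "D"],
}
first_link, messages = improved_traversal(graph, "A")
```

In this run every link carries exactly two messages (an EM each way, or an EM and an ECHO), so the 6-link example needs 12 messages.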


Logical Clocks

Distributed Resource Allocation Algorithm

Lamport algorithm

Already shown.

Ricart and Agrawala algorithm

Given in the textbook (6.3.4).

A little more efficient.




Virtual Ring Algorithms

A special message, a control-token, is sent among the nodes.

The node possessing the token has the right to perform operations that require mutual exclusion.

The nodes must be able to:

assure that there is one and only one token

create a new token

discover if the ring is broken

create a new ring


Example of a Virtual Ring

[Figure: eight nodes, A to H, connected in the virtual ring order given below.]

A - B - E - H - F - G - D - C - A


Algorithm that guarantees exactly one control-token on the ring.

Le Lann 1978.

All nodes have unique names, Ni, which are totally ordered.

A special message, a control-token CT, is sent among the nodes.

Another special message, an election token ET(Ni), created by node Ni, is also used by the algorithm.

Each node has a timer that is used for time-outs if no token arrives within a given time limit.

All tokens circulate in a given order, FIFO.

A node can be in normal state or election state


In each node the following algorithm is executed:

Each time the CT passes the node its timer is restarted with a given time-out value.

The node will keep/change to normal state.

At time-out at node Ni, i.e. when the timer signals that no CT or ET has passed within the given time-out limit:

The node creates a new token, an election token, ET(Ni), which contains the node's identification.

The node changes to election state and restarts its timer as it sends the ET(Ni) on the ring.

Each time anET(Nj) arrives at Ni:

If the node is in normal state the timer is restarted.

- if Nj ≠ Ni the ET is sent further on the ring.

If the node is in election state the node compares the originator's identity with its own:

- if Nj < Ni then the node (Ni) changes to normal state, sends the ET further on the ring and restarts its timer.

- if Nj > Ni then the node sends the ET further on the ring and restarts its timer.

- if Nj = Ni, i.e. it is the node's own ET, it converts it into a control-token CT that is then sent further on the ring, and it restarts its timer.
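The token rules can be tried out in a round-based sketch. Several simplifications are assumptions of this example, not part of the slides: the CT is taken to be lost and every node times out at the same moment, each ET moves one hop per round, timers are not modelled, and a node in normal state is assumed to forward every ET except its own. The name `elect_on_ring` is hypothetical.

```python
def elect_on_ring(ring):
    """Return the ids of the nodes that turn their own ET into a CT."""
    n = len(ring)
    state = {node: "election" for node in ring}   # every node has timed out
    # Each token is (index of the node holding it, id of its originator).
    tokens = [(i, node) for i, node in enumerate(ring)]
    ct_creators = []
    while tokens:
        moved = []
        for pos, origin in tokens:
            pos = (pos + 1) % n                   # one hop further on the ring
            me = ring[pos]
            if state[me] == "normal":
                if origin != me:                  # assumed rule: drop own stale ET
                    moved.append((pos, origin))
            elif origin < me:                     # lower id wins: stand down
                state[me] = "normal"
                moved.append((pos, origin))
            elif origin > me:                     # higher id: just pass it on
                moved.append((pos, origin))
            else:                                 # own ET came back: create the CT
                state[me] = "normal"
                ct_creators.append(me)
        tokens = moved
    return ct_creators

winners = elect_on_ring([3, 7, 1, 5, 2])
```

Exactly one node, the one with the lowest name, ends up creating a control-token in this run; all other election tokens die out when they return to an originator that is already back in normal state.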




Election algorithm

The unique control token algorithm can be used for election purposes.

The node that was allowed to create a CT will be the elected one.

A whole family of election algorithms has been designed based on this, with different modifications:

let ET messages traverse both directions on the ring

let ET messages randomly choose which direction to be sent in

and so on ...

30 years of research!!


Voting Algorithms (1)

Voting can be used for resource allocation or election.

A group of nodes cooperate in some way and need to make decisions together.

A node that wants to be elected (or use a resource) sends a request message to all other nodes in the group.

A node that gets a request message answers the originator:

yes if no other node has requested since the last release (of the resource)

no otherwise

The requesting node is elected (can use the resource) when it gets a majority of yes answers to its request.

Here a majority means more than half of the group.

If the requesting node doesn't get a majority of yes answers it will not be elected.
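A small sketch of the voting rule, showing how a split vote leaves no one elected. The class and function names are hypothetical, and two things are assumptions of this example: the requester is counted as voting yes for itself, and all votes are released before the second attempt.

```python
class Voter:
    """One group member; remembers whom it has said yes to since the last release."""

    def __init__(self):
        self.granted_to = None

    def on_request(self, requester):
        if self.granted_to is None:
            self.granted_to = requester
            return True       # yes
        return False          # no: someone else has requested since last release

    def on_release(self):
        self.granted_to = None


def is_elected(yes_votes, group_size):
    """A majority means more than half of the group."""
    return yes_votes > group_size / 2


group = {name: Voter() for name in "ABCD"}

# A's request reaches A and B first; C's request reaches C and D first.
yes_a = sum(group[n].on_request("A") for n in "AB")
yes_c = sum(group[n].on_request("C") for n in "CD")
# The remaining voters have already said yes to the other candidate.
yes_a += sum(group[n].on_request("A") for n in "CD")
yes_c += sum(group[n].on_request("C") for n in "AB")
split = (is_elected(yes_a, len(group)), is_elected(yes_c, len(group)))

# Assumed here: votes are released, then C tries again alone and succeeds.
for voter in group.values():
    voter.on_release()
yes_c = sum(voter.on_request("C") for voter in group.values())
retry = is_elected(yes_c, len(group))
```

With four voters each candidate collects two yes answers in the first round, which is not more than half, so neither is elected.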



Doesn't require FIFO on network links

Doesn't require totally ordered identifications on nodes

Doesn't require an answer from all nodes

Drawback: voting might end with no one elected (no one can use the resource)



After each vote there can be two different states:

One is elected.

No one is elected.

It is important to distinguish between these states.

Therefore a message must be sent from the winner to all other nodes so they know that one was elected.

That no one is elected can only be determined by time-out.

When a node wants to release a resource a Release message must be sent.

Then the other nodes can start a new election.

When there is a time-out for an election another node can start a new election.

Then there will be a need to distinguish between different elections.

This can be done using the node name and an ordering number as the identity for the request.

Then all answers to that request must use this identity, as well as the eventual Release.




The Bully Algorithm

The Bully Algorithm is an Election Algorithm, Garcia-Molina 1982.

Can handle process crashes.

Presumptions

All processes have unique identities, which are totally ordered.

Every process knows about all other processes in the network.

The system is synchronous, i.e. there is a maximal time limit T within which a request will be answered if the requested process is alive.


The Bully Algorithm (2)

Algorithm:

1. The process that wants an election sends an election-message to all processes with higher identity than itself and then waits for answer-messages.

- if no answer-message arrives within the time limit T the process considers itself elected and then sends a coordinator message to all processes with a lower identity.

- if there is one or more answer-messages the process waits a further time period T for a coordinator message. If no such message arrives the process restarts the algorithm.

2. A process receiving an election-message returns an answer-message and starts the algorithm from the beginning if it has not already done so.

3. A process receiving a coordinator message registers the sender's identity and considers it elected.

4. When a faulty process restarts it also starts the algorithm.
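The election steps can be sketched without real timers. In the simplified model below (an assumption of this example), a crashed process simply never answers, which stands in for the time limit T expiring, and no process crashes during the election. The function name `bully_election` and its parameters are hypothetical.

```python
def bully_election(all_ids, alive, initiator):
    """Return the identity that ends up sending the coordinator message."""
    started = set()          # processes that have already started the algorithm
    coordinator = None
    pending = [initiator]    # processes that still have to run step 1
    while pending:
        p = pending.pop()
        if p in started:
            continue         # step 2: start only "if not done so before"
        started.add(p)
        # Step 1: election-message to every higher identity; only live ones answer.
        answers = [q for q in all_ids if q > p and q in alive]
        if not answers:
            coordinator = p  # no answer within T: p considers itself elected
        else:
            pending.extend(answers)  # each answering process starts its own election
    return coordinator

# Processes 4 and 5 have crashed; process 1 notices and starts an election.
elected = bully_election([1, 2, 3, 4, 5], alive={1, 2, 3}, initiator=1)
```

The process with the highest identity among those still alive wins, which is where the algorithm gets its name.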


Local Networks

Communication on Local Networks:

faster

cheaper

cheap broadcast

might guarantee atomic broadcast

all nodes get all broadcasts in the same order


Skansholm Algorithm for Resource Allocation

Utilizes atomic broadcast at the same cost as a single message

Networks:

control-token-ring

ETHERNET

the network is the synchronization tool

or general:

any network with an atomic broadcast service, but then the broadcast might be expensive




Skansholm Algorithm for Resource Allocation (2)

Each node has a copy of the Request Queue.

A node that wants to allocate a resource sends a Request Message as a broadcast to all nodes.

Since there can be only one message at a time on the network, all nodes will receive these messages in the same order.

This order will be the order in the Request Queue.

Note that this also holds for the sending node. It should not put the request in its queue until its Request Message has actually been transmitted on the network.

When a node's request is first in the local Request Queue it can be processed.

After processing, a Release message is sent to all.

Then the first message in each local queue is removed and the next message can be processed.

This algorithm uses broadcast (multicast), but since it is a single local network the broadcast has the same cost as a single message. Thus this algorithm can have very high performance.
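The replicated queue can be illustrated with a toy model in which a single `Bus` object stands in for the shared network and delivers each broadcast to every node, the sender included, before the next one starts. The class names and the `"request"`/`"release"` message kinds are hypothetical choices for this sketch.

```python
from collections import deque

class Node:
    """Holds a local copy of the Request Queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def deliver(self, kind, sender):
        if kind == "request":
            self.queue.append(sender)   # enqueue in network delivery order
        elif kind == "release":
            self.queue.popleft()        # the holder, first in queue, is done

    def may_use_resource(self):
        return bool(self.queue) and self.queue[0] == self.name


class Bus:
    """Atomic broadcast: one message at a time, same order at every node."""

    def __init__(self, nodes):
        self.nodes = nodes

    def broadcast(self, kind, sender):
        for node in self.nodes:         # the sender receives its own message too
            node.deliver(kind, sender)


nodes = {name: Node(name) for name in "ABC"}
bus = Bus(list(nodes.values()))

bus.broadcast("request", "B")           # B's request is transmitted first
bus.broadcast("request", "A")
holder_before = [n for n in "ABC" if nodes[n].may_use_resource()]

bus.broadcast("release", "B")           # B is done; A is now first everywhere
holder_after = [n for n in "ABC" if nodes[n].may_use_resource()]
```

Because every node applies the same messages in the same order, all local queues stay identical and no extra coordination messages are needed.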
