optimum delay reliability distributed systems architecture

Reliability Engineering and System Safety 26 (1989) 279-288

Optimum Delay Reliability Distributed Systems Architecture

V. K. Jain & Krishna Gopal

Electrical Engineering Department, Regional Engineering College, Kurukshetra, Haryana, India

ABSTRACT

This paper deals with the combined problem of designing a fault-tolerant network topology and maximizing reliability for distributed communication networks. The resulting topologies maximize network reliability while simultaneously reducing communication cost and delay. The system architecture presented has low complexity and a short routing distance (low communication delay), and it is fault tolerant. Retaining the topology of the communication network, system reliability is then enhanced by altering the link arrangement. A simple and efficient heuristic algorithm for obtaining the best arrangement of links is proposed and illustrated through an example.

NOTATION

$C$: Node connectivity of graph $G$, $C = \min\{C_{ij} \mid \text{all } i, j;\ 0 \le i, j \le n-1\}$; also $C \le d_{\min}$, where $d_{\min}$ is the minimum node degree

$C_{ij}$: Number of node-disjoint paths between nodes $i$ and $j$

$d_i$: Degree of node $i$, i.e. the number of edges incident on it

$G$: Network graph (system topology)

$K_{av}$: $K_{av} = 2\sum_{j=0}^{n-2}\sum_{i=j+1}^{n-1} k_{ij}/(n^2 - n)$, the average number of data links required to send messages between all pairs of nodes

Reliability Engineering and System Safety 0951-8320/89/$03.50 © 1989 Elsevier Science Publishers Ltd, England. Printed in Great Britain


$K$: $K = \max\{k_{ij} \mid \text{all } i, j;\ 0 \le i, j \le n-1\}$, the maximum number of data links required to send messages between all pairs of nodes

$k_{ij}$: Minimum number of data links required to send a message from node $i$ to node $j$

$L$: Set of links under consideration for link allocation

$m$: Maximum number of data links in the shortest message path between any pair of nodes

$n$: Number of nodes

$N$: Total number of links

$p_i$, $q_i$: Reliability (unreliability) of link $x_i$, $i = 1, 2, \ldots, N$

PBA: Available link reliabilities arranged in descending order

PCA: Link reliability allocation at a particular stage

$R$: Network reliability function

$S_i(P)$: Sensitivity of the reliability function with respect to link $i$, $S_i = \partial R/\partial p_i$

$x$: Parameter based upon the maximum allowable transmission delay for messages

$x_i$: Link $i$

1 INTRODUCTION

Distributed system architecture is of great importance in computer communication networks because of the advantages it offers: sharing of resources, reorganisation of work when one or more computers fail, increased efficiency, flexibility and reliability, and the versatility to replace or upgrade some or all of the nodes or links with newer technology that is more cost-effective or more powerful. Because of these advantages the distributed network concept is of great practical importance, and numerous projects using it have been completed or are currently under way.

An important component of a distributed system is its topology, which defines the interprocessor communication architecture. Message delay and system reliability are other important considerations, and there are well-defined relationships between the system topology and message delay, routing algorithms and fault tolerance; for example, message delay may be directly proportional to internode distance (Ref. 1). Fault tolerance refers to the ability of the network to continue carrying out its function of communication among the various sites during failures, which requires that the network remain connected in the wake of faults. Fault-tolerant interconnection networks can help achieve satisfactory reliability levels.

A truly optimal design of a distributed system can proceed only when all the conflicting goals and design specifications are considered systematically. In computer communication networks it is therefore important to design a system architecture that has a short routing distance and is highly reliable and fault tolerant. This paper considers the problem of designing a distributed system architecture for a computer communication network that is fault tolerant, highly reliable, restricted in the number of data links, and has a short routing distance between nodes, i.e. low communication delay. The data available for the design are the number of nodes, the maximum allowable communication delay and the link reliabilities.

The problem is attacked in two parts. In the first part, a system topology for the computer communication network is determined assuming that all links of the network have equal delay and reliability (Ref. 2). This topology has low complexity and a short routing distance, and it is fault tolerant. The interconnection topology is defined in terms of a parameter $x$, selected on the basis of the number of I/O ports available and/or the maximum allowable transmission delay for messages (packets). The number of nodes $n$ is assumed to equal $x^m$, where $m$ is the maximum number of data links required to send a message between any pair of nodes, which may be related directly to the maximum delay experienced by any message. The topology requires exactly $nx - (x^2 + x)/2$ data links. The network allows simple routing and is fault tolerant, being able to tolerate up to $(x - 1)$ node failures. If the given number of sites $n$ cannot be expressed as $x^m$, $n$ is increased until it can; only auxiliary equipment is used at the added sites.
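The sizing rule above can be sketched as follows. This assumes (our reading) that the number of sites is padded up to the smallest power $x^m$ that is at least as large; `size_topology` is a hypothetical helper, not a routine from the paper, and it simply applies the link-count formula $nx - (x^2 + x)/2$ quoted above.

```python
def size_topology(sites: int, x: int) -> tuple[int, int, int, int]:
    """Return (m, n, auxiliary_nodes, links) for `sites` sites and parameter x.

    Assumption: n is the smallest power x**m with x**m >= sites; the extra
    x**m - sites nodes carry only auxiliary equipment.
    """
    if x < 2 or sites < 1:
        raise ValueError("need x >= 2 and sites >= 1")
    m, n = 1, x
    while n < sites:              # integer loop avoids floating-point log rounding
        m += 1
        n *= x
    links = n * x - (x * x + x) // 2   # x*x + x is always even
    return m, n, n - sites, links


print(size_topology(8, 2))    # (3, 8, 0, 13): the n = 8, x = 2 case of Example 1
print(size_topology(10, 2))   # (4, 16, 6, 29): 10 sites are padded to 16 nodes
```

With $x = 2$, ten sites are padded to 16 nodes, six of which carry only auxiliary equipment.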

TABLE 1

| | Bridge network (Fig. 2) | Seven-link network (Fig. 3) |
|---|---|---|
| Given link reliabilities | {0.1, 0.2, 0.3, 0.4, 0.5} | {0.8, 0.4, 0.2, 0.6, 0.5, 0.7, 0.65} |
| Allocated link reliabilities, Kontoleon method | {0.4, 0.3, 0.1, 0.5, 0.2} | {0.65, 0.4, 0.5, 0.6, 0.2, 0.7, 0.8} |
| Allocated link reliabilities, proposed method | {0.5, 0.2, 0.1, 0.4, 0.3} | {0.6, 0.5, 0.65, 0.2, 0.4, 0.8, 0.7} |
| System reliability achieved, Kontoleon method | 0.258 | 0.71104 |
| System reliability achieved, proposed method | 0.258 | 0.71020 |
| % loss in reliability by proposed method | 0.0 | 0.12 |
| Reliability function evaluations required, Kontoleon method | 111 | 309 |
| Reliability function evaluations required, proposed method | 19 | 34 |
| % reduction of reliability function evaluations | 82.88 | 88.99 |


In the second part of the problem, link assignment is done by altering the link arrangement while retaining the system topology, so as to enhance system reliability. The terminal pair selected for evaluating system reliability is the one with the maximum internode distance. An efficient heuristic algorithm for finding the best arrangement of links is proposed. It is a modification of Ref. 3, in which the author searches over a large number of combinations of link reliabilities; this is highly inefficient, as the number of reliability function evaluations required shows (see Table 1). In the proposed algorithm, the most reliable network, in which every link has the highest available reliability, is assumed initially. This network is then modified iteratively to include only the available links, using the sensitivity of the network reliability function with respect to each link. At each iteration one link reliability is fixed and a better allocation is obtained, so at most $N - 1$ iterations are required, one fewer than the number of links in the network. Being heuristic, the method does not guarantee a globally optimum solution, but it has been found to work well in most problems (see Table 1).
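As a rough check on the evaluation counts in Table 1, suppose (our reading of Steps 3 to 6 of Section 5) that each iteration evaluates $R$ once for the current allocation and once more for each link $i$ still in $L$ with $p_i = 0$, and that the final pass, with a single link left, needs only the first of these. With $k = N, N-1, \ldots, 2$ links still under consideration, the total number of evaluations is then

$$E(N) = 1 + \sum_{k=2}^{N} (k+1) = \frac{(N+1)(N+2)}{2} - 2,$$

so $E(5) = 19$ and $E(7) = 34$, matching the proposed-method entries of Table 1 for the bridge and seven-link networks. Under the same count, the 13-link network of Fig. 1 needs $E(13) = 103$ evaluations.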

2 ASSUMPTIONS

1. All nodes of the network are perfect.
2. Links are bidirectional.
3. Each link has only two states, good or bad, and the links are s-independent.
4. Link reliabilities are constant.

3 SYSTEM TOPOLOGY

This section presents a topology (Ref. 2) for the proposed distributed computer communication system architecture. The interconnection topology is defined in terms of a parameter $x$, whose value can be selected based on the number of I/O ports available and/or the maximum allowable transmission delay for messages. The number of nodes $n$ is assumed to equal $x^m$ (possibly after adding some auxiliary nodes, if required). For a given value of $n$, different values of $x$ can be used to construct different interconnections.

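Let $(i_m, i_{m-1}, \ldots, i_1)$ be the radix-$x$ representation of node $i$, $0 \le i \le n-1$, and let $(j_m, j_{m-1}, \ldots, j_1)$ be the radix-$x$ representation of node $j$, $0 \le j \le n-1$.

Every node $i$ is connected to node $j \ne i$ if

$$i_r = j_{r-1} \ \text{for all } 2 \le r \le m \qquad \text{or} \qquad i_r = j_{r+1} \ \text{for all } 1 \le r \le m-1. \qquad (1)$$

In other words, $j$ is obtained from $i$ by a one-digit shift: one end digit of $i$ is dropped, the remaining digits move one place, and an arbitrary digit is inserted at the other end.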


Fig. 1. System topology for $n = 8$, $N = 13$.

Example 1

Figure 1 shows the system topology for $n = 8$ and $x = 2$ (so $m = 3$). Nodes are interconnected according to Eqn (1). This topology has 13 data links in total; four nodes have degree 4, two nodes have degree 3 and two nodes have degree 2. Here $K = 3$ and $C = 2$. The node connectivity follows from the observation that a loop structure (Hamiltonian circuit) is embedded in this graph, as shown in Fig. 1: given any two nodes, one can construct two node-disjoint paths, one clockwise and the other counterclockwise, so $C \ge 2$; since the minimum node degree is 2, $C = 2$.
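The construction can be checked numerically with a short sketch. It assumes that a node's radix-$x$ digits are stored most significant first, so that Eqn (1) amounts to a one-digit shift, $j = (ix + d) \bmod n$ for some digit $d$, or the same with $i$ and $j$ exchanged. `build_topology`, `distances_from` and `summarize` are hypothetical helper names; $K$ and $K_{av}$ are computed from breadth-first shortest paths.

```python
from collections import Counter, deque


def build_topology(x: int, m: int) -> dict[int, set[int]]:
    """Adjacency sets of the Eqn (1) topology on n = x**m nodes.

    Node j is adjacent to node i when j = (i*x + d) mod n for a digit d.
    Enumerating this shift from every node and storing each edge in both
    adjacency sets also covers the reverse shift (the other condition of Eqn (1)).
    """
    n = x ** m
    adj = {u: set() for u in range(n)}
    for i in range(n):
        for d in range(x):
            j = (i * x + d) % n
            if j != i:                 # all-equal-digit nodes would map to themselves
                adj[i].add(j)
                adj[j].add(i)
    return adj


def distances_from(adj: dict[int, set[int]], src: int) -> dict[int, int]:
    """Breadth-first search: minimum number of links from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def summarize(x: int, m: int):
    """Return (N, degree histogram, K, K_av) for the Eqn (1) topology."""
    adj = build_topology(x, m)
    n = len(adj)
    links = sum(len(nbrs) for nbrs in adj.values()) // 2
    degrees = Counter(len(nbrs) for nbrs in adj.values())
    k = [d for s in adj for t, d in distances_from(adj, s).items() if t != s]
    return links, dict(degrees), max(k), sum(k) / (n * n - n)


links, degrees, K, K_av = summarize(2, 3)
print(links, degrees, K, round(K_av, 3))
```

For $n = 8$, $x = 2$ the sketch reports 13 links, four nodes of degree 4, two of degree 3 and two of degree 2, $K = 3$, and $K_{av} = 92/56 \approx 1.64$.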

In this topology the following observations are made:

(1) There are exactly:
(a) $n - x^2$ nodes of degree $2x$;
(b) $x$ nodes of degree $2x - 2$;
(c) $x^2 - x$ nodes of degree $2x - 1$.

(2) There are exactly $nx - (x^2 + x)/2$ data links in the system.

(3) $K = m$ and $K_{av} \approx m - 1$.

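Proof of (3): Let $i = 0$ and $j = n - 1$. In radix $x$, $i = (0, 0, \ldots, 0)$ and $j = (x-1, x-1, \ldots, x-1)$. Since one data link can introduce only one new digit, $m$ data links are required, so $K \ge m$. Conversely, shifting the digits of any node $j$ into any node $i$ one at a time, as Eqn (1) allows, reaches $j$ in at most $m$ links, so $K = m$.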

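(4) A Hamiltonian circuit is embedded in $G$. This follows from the observation that an Euler digraph (Ref. 4) exists for a maximum-length sequence of length $x^m$ for every $x$ and $m$: reading such a sequence cyclically, successive windows of $m$ digits differ by a one-digit shift and are therefore adjacent under Eqn (1), and the $x^m$ windows visit every node exactly once.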

(5) The node connectivity $C$ of $G$ is at least $x$, i.e. $C \ge x$.

This is a direct consequence of the fact that between any two nodes $i$ and $j$ there are $x$ paths that are clearly node-disjoint, since every intermediate node in the $e$th path has $e$ as its least significant digit. The $x$ paths between nodes $i$ and $j$ are listed below, with nodes represented in radix $x$; path $e$, for $e = 0, 1, \ldots, x-1$, is

$$i_{m-1}\cdots i_1 i_0 \to i_{m-2}\cdots i_0\,e \to i_{m-3}\cdots i_0\,e\,e \to \cdots \to e\,e\cdots e \to j_0\,e\cdots e \to j_1 j_0\,e\cdots e \to \cdots \to j_{m-2}\cdots j_0\,e \to j_{m-1}\cdots j_1 j_0$$

The above statement implies that the removal of any $(x - 1)$ nodes from $G$ does not disconnect $G$.
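Observation (5) can be checked by brute force on small instances. The sketch below rebuilds the Eqn (1) topology under the same one-digit-shift reading (the helper names are hypothetical), deletes every set of $x - 1$ nodes, and tests whether the surviving nodes stay connected. It is exhaustive, so it is only practical for small $n$.

```python
from itertools import combinations


def build_topology(x: int, m: int) -> dict[int, set[int]]:
    """Eqn (1) topology read as a one-digit shift: j = (i*x + d) mod n."""
    n = x ** m
    adj = {u: set() for u in range(n)}
    for i in range(n):
        for d in range(x):
            j = (i * x + d) % n
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_without(adj: dict[int, set[int]], removed: set[int]) -> bool:
    """True if the nodes not in `removed` form a single connected component."""
    alive = [u for u in adj if u not in removed]
    if not alive:
        return True
    seen, stack = {alive[0]}, [alive[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(alive)


def survives_all(x: int, m: int, failures: int) -> bool:
    """Check every choice of `failures` failed nodes (exhaustive, small n only)."""
    adj = build_topology(x, m)
    return all(connected_without(adj, set(c)) for c in combinations(adj, failures))


print(survives_all(2, 3, 1))   # any single node failure, n = 8
print(survives_all(3, 2, 2))   # any two node failures, n = 9
```

For $n = 8$, $x = 2$, deleting the two neighbours of node 0 (nodes 1 and 4) isolates it, so two failures can disconnect that graph, consistent with $C = 2$ in Example 1.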

Example 2

Consider the graph $G$ shown in Fig. 1 ($x = 2$, $m = 3$). Given any two nodes $i = (i_2, i_1, i_0)$ and $j = (j_2, j_1, j_0)$, we have the following two node-disjoint paths:

Path 0: $i_2 i_1 i_0 \to i_1 i_0 0 \to i_0 0 0 \to 000 \to j_0 0 0 \to j_1 j_0 0 \to j_2 j_1 j_0$

Path 1: $i_2 i_1 i_0 \to i_1 i_0 1 \to i_0 1 1 \to 111 \to j_0 1 1 \to j_1 j_0 1 \to j_2 j_1 j_0$

Thus, even after the removal of any one node from $G$, there exists a path between any pair of nodes $i$, $j$.
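The path construction used in (5) and in Example 2 can be written out directly. In this sketch (hypothetical helper names, digits stored most significant first as in the path listing), path $e$ first shifts $e$ in from the right $m$ times and then shifts $j_0, j_1, \ldots, j_{m-1}$ in from the left. The listed sequences can revisit nodes, so `simplify` cuts out repeats; the check then confirms that consecutive nodes are adjacent under the one-digit-shift reading of Eqn (1) and that the interiors of the $x$ paths share no node.

```python
def to_digits(v: int, x: int, m: int) -> list[int]:
    """Radix-x digits of v, most significant first, padded to length m."""
    out = []
    for _ in range(m):
        out.append(v % x)
        v //= x
    return out[::-1]


def from_digits(ds: list[int], x: int) -> int:
    """Inverse of to_digits."""
    v = 0
    for d in ds:
        v = v * x + d
    return v


def path_e(i: int, j: int, e: int, x: int, m: int) -> list[int]:
    """Walk from i to j through e e..e, following the path listing."""
    cur, jd = to_digits(i, x, m), to_digits(j, x, m)
    walk = [i]
    for _ in range(m):            # i_{m-2}..i_0 e, i_{m-3}..i_0 e e, ..., e e..e
        cur = cur[1:] + [e]
        walk.append(from_digits(cur, x))
    for d in reversed(jd):        # j_0 e..e, j_1 j_0 e.., ..., j_{m-1}..j_0
        cur = [d] + cur[:-1]
        walk.append(from_digits(cur, x))
    return walk


def simplify(walk: list[int]) -> list[int]:
    """Cut out repeated nodes so that the walk becomes a simple path."""
    path = []
    for v in walk:
        if v in path:
            path = path[: path.index(v) + 1]
        else:
            path.append(v)
    return path


def adjacent(u: int, v: int, x: int, n: int) -> bool:
    """One-digit shift in either direction (assumed reading of Eqn (1))."""
    return u != v and ((v - u * x) % n < x or (u - v * x) % n < x)


def check_paths(x: int, m: int) -> bool:
    """True if, for every pair i != j, the x paths are valid and internally disjoint."""
    n = x ** m
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            paths = [simplify(path_e(i, j, e, x, m)) for e in range(x)]
            for p in paths:
                if p[0] != i or p[-1] != j:
                    return False
                if not all(adjacent(a, b, x, n) for a, b in zip(p, p[1:])):
                    return False
            inner = [set(p[1:-1]) for p in paths]
            if any(inner[a] & inner[b] for a in range(x) for b in range(a + 1, x)):
                return False
    return True


print(simplify(path_e(0, 7, 0, 2, 3)))   # [0, 4, 6, 7]
print(simplify(path_e(0, 7, 1, 2, 3)))   # [0, 1, 3, 7]
print(check_paths(2, 3), check_paths(3, 2))
```

For the terminal pair (0, 7) of Fig. 1 the two paths are 0-4-6-7 and 0-1-3-7, which share no intermediate node.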


4 LINK ASSIGNMENT ALGORITHM

In this section, an optimal or near-optimal link arrangement that enhances system reliability is determined while retaining the system topology obtained earlier. An efficient heuristic algorithm for link allocation is proposed, based on the idea of Kontoleon's method (Ref. 3), in which the link reliabilities are allocated iteratively using the sensitivity of the network reliability function with respect to each link. The major disadvantage of that method is that the optimum link allocation is obtained by searching over a large number of combinations of link reliabilities, which requires a very large number of evaluations of the network reliability function (see Table 1). In the proposed algorithm the search domain is reduced considerably by allocating one link at each iteration. Initially all links are considered for allocation and the highest reliability value is assigned to all of them. Using the network sensitivity, the link whose failure causes the largest reduction in network reliability is determined. This link is allocated the currently assigned reliability value and is removed from further consideration, and all remaining links under consideration are assigned the next lower reliability value. The iteration is repeated until all link reliabilities are allocated. If in any iteration more than one link gives the same reduction in network reliability, any one of them can be chosen.

5 ALGORITHM FOR LINK ASSIGNMENT

Step 0: Select the node pair with the maximum internode distance and determine the terminal reliability of the network using any of the known methods (Ref. 5).

Step 1: Arrange the given link reliabilities in descending order (PBA).

Step 2: To begin with, consider all links of the network for allocation ($L$, the set of links under consideration).

Step 3: Assign the highest value remaining in PBA to all links under consideration; links already allocated keep their values. Call this the current allocation (PCA).

Step 4: Calculate the network reliability $R(\mathrm{PCA})$ for the current allocation of link reliabilities.

Step 5: If there is only one link in L, the set of links under consideration, then stop.

Step 6: For all links $i$ under consideration, calculate

$$S_i(\mathrm{PCA}) = R(\mathrm{PCA})\big|_{p_i=1} - R(\mathrm{PCA})\big|_{p_i=0}.$$

To simplify calculations, however, $S_i(\mathrm{PCA})$ is taken here as the reduction in network reliability if link $i$ fails, i.e.

$$S_i(\mathrm{PCA}) = R(\mathrm{PCA}) - R(\mathrm{PCA})\big|_{p_i=0}.$$
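The simplification in Step 6 does not change which link Step 7 selects. Because the links are s-independent, $R$ is linear in each $p_i$ (pivotal decomposition), so

$$R(\mathrm{PCA}) = p_i\,R(\mathrm{PCA})\big|_{p_i=1} + q_i\,R(\mathrm{PCA})\big|_{p_i=0} \;\Longrightarrow\; R(\mathrm{PCA}) - R(\mathrm{PCA})\big|_{p_i=0} = p_i\Bigl(R(\mathrm{PCA})\big|_{p_i=1} - R(\mathrm{PCA})\big|_{p_i=0}\Bigr) = p_i\,\frac{\partial R}{\partial p_i}.$$

In Step 3 every link in $L$ receives the same value $p_i > 0$, so the simplified and exact sensitivities differ only by a common factor and rank the links in $L$ identically.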


Step 7: Determine the link with the maximum $S_i(\mathrm{PCA})$. If more than one such link exists, select any one of them. Allocate to this link the highest reliability value remaining in PBA.

Step 8: (a) Remove the first element of PBA. (b) Remove the link chosen in Step 7 from L.

Step 9: Go to Step 3.
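Steps 1 to 9 can be sketched in Python as below. This is an illustrative sketch, not the authors' program. Terminal reliability is computed by exhaustive enumeration of the $2^N$ link states (practical only for small $N$) in place of the method of Ref. 5; `assign_links` and `terminal_reliability` are hypothetical names; ties in Step 7 go to the lowest-numbered link. The bridge labelling in the usage lines ($x_1$ = s-a, $x_2$ = s-b, $x_3$ = a-b, $x_4$ = a-t, $x_5$ = b-t) is an assumption, since Fig. 2 does not survive.

```python
from itertools import product


def _joined(links, state, s, t) -> bool:
    """Depth-first search over the links that are up in `state`."""
    adj = {}
    for up, (u, v) in zip(state, links):
        if up:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    seen, stack = {s}, [s]
    while stack:
        for v in adj.get(stack.pop(), []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return t in seen


def terminal_reliability(links, p, s, t) -> float:
    """Exact s-t reliability of s-independent links by state enumeration."""
    total = 0.0
    for state in product((False, True), repeat=len(links)):
        prob = 1.0
        for up, pk in zip(state, p):
            prob *= pk if up else 1.0 - pk
        if prob > 0.0 and _joined(links, state, s, t):
            total += prob
    return total


def assign_links(links, available, s, t, tol=1e-12):
    """Sensitivity-based link allocation; returns (allocation, R, evaluations)."""
    pba = sorted(available, reverse=True)             # Step 1
    L = list(range(len(links)))                       # Step 2
    alloc = [None] * len(links)
    evaluations = 0
    while True:
        # Step 3: links still in L get the highest remaining value
        pca = [pba[0] if a is None else a for a in alloc]
        r = terminal_reliability(links, pca, s, t)    # Step 4
        evaluations += 1
        if len(L) == 1:                               # Step 5
            alloc[L[0]] = pba[0]
            return alloc, r, evaluations
        sens = {}
        for k in L:                                   # Step 6 (simplified S_k)
            failed = list(pca)
            failed[k] = 0.0
            sens[k] = r - terminal_reliability(links, failed, s, t)
            evaluations += 1
        best = max(sens.values())
        chosen = next(k for k in L if sens[k] >= best - tol)   # Step 7
        alloc[chosen] = pba.pop(0)                    # Step 8(a)
        L.remove(chosen)                              # Step 8(b); Step 9 loops


BRIDGE = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
alloc, r, evals = assign_links(BRIDGE, [0.1, 0.2, 0.3, 0.4, 0.5], "s", "t")
print(alloc, round(r, 6), evals)
```

Under these assumptions the sketch reproduces the proposed-method column of Table 1 for the bridge network: allocation {0.5, 0.2, 0.1, 0.4, 0.3}, $R = 0.258$, and 19 evaluations of $R$. The seven-link network is not reproduced because the link numbering of Fig. 3 is not recoverable.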

Example 3

The algorithm for link assignment is illustrated on the network $G$ shown in Fig. 1. The given link reliabilities are:

$P_1 = 0.92$, $P_2 = 0.9$, $P_3 = 0.9$, $P_4 = 0.85$, $P_5 = 0.85$, $P_6 = 0.8$, $P_7 = 0.8$, $P_8 = 0.8$, $P_9 = 0.75$, $P_{10} = 0.75$, $P_{11} = 0.75$, $P_{12} = 0.7$, $P_{13} = 0.7$.

Step 0: The terminal pair selected is node pair (0, 7). The corresponding reliability expression is given in the Appendix.

Step 1: PBA = 0.92, 0.9, 0.9, 0.85, 0.85, 0.8, 0.8, 0.8, 0.75, 0.75, 0.75, 0.7, 0.7.

Step 2: $L = x_1, x_2, \ldots, x_{13}$.

Step 3: PCA = 0.92 for all 13 links.

Step 4: $R(\mathrm{PCA}) = 0.9861345$.

Steps 5 & 6: $S_i(\mathrm{PCA})$, $i = 1, \ldots, 13$: 0.0734737, 0.0734782, 0.0050295, 0.0005204, 0.0005188, 0.0678737, 0.0016538, 0.0629426, 0.0627299, 0.0560329, 0.0005595, 0.0734651, 0.0737961.

Step 7: Max $S_i(\mathrm{PCA}) = S_{13}(\mathrm{PCA}) = 0.0737961$, so $P_{13} = 0.92$.

Step 8: New PBA = 0.9, 0.9, 0.85, 0.85, 0.8, 0.8, 0.8, 0.75, 0.75, 0.75, 0.7, 0.7. New $L = x_1, x_2, \ldots, x_{12}$.

Step 3: PCA = 0.9 for $x_1, \ldots, x_{12}$ and 0.92 for $x_{13}$.

Step 4: $R(\mathrm{PCA}) = 0.980313607$.

Steps 5 & 6: $S_i(\mathrm{PCA})$, $i = 1, \ldots, 12$: 0.090348703, 0.089627448, 0.003513837, 0.001427763, 0.000906393, 0.022497683, 0.008010117, 0.011884508, 0.001175786, 0.001244032, 0.001357233, 0.071070474.

Step 7: Max $S_i(\mathrm{PCA}) = S_1(\mathrm{PCA}) = 0.090348703$, so $P_1 = 0.9$.

Step 8: New PBA = 0.9, 0.85, 0.85, 0.8, 0.8, 0.8, 0.75, 0.75, 0.75, 0.7, 0.7. New $L = x_2, x_3, \ldots, x_{12}$.


Proceeding similarly, the remaining link reliabilities are assigned in the following order:

$P_2 = 0.9$; $P_{12} = 0.85$; $P_8 = 0.85$; $P_7 = 0.8$; $P_6 = 0.8$; $P_5 = 0.8$; $P_{10} = 0.75$; $P_9 = 0.75$; $P_4 = 0.75$; $P_3 = 0.7$; $P_{11} = 0.7$.

The system reliability achieved after all the link reliabilities are assigned is $R(S) = 0.964013$.

6 CONCLUSIONS

An efficient method is presented for designing the system architecture of a distributed communication network, optimizing the solution for both communication delay and network reliability. The system topology presented has low complexity and is fault tolerant. In the proposed topology a Hamiltonian circuit is embedded in $G$, and there is an Euler digraph for a maximum-length sequence of length $x^m$. The maximum number of data links required to send a message between any pair of nodes, $K$, is equal to $m$, and $K_{av} \approx m - 1$. The node connectivity is at least $x$. A heuristic algorithm for efficiently obtaining a best (optimal or near-optimal) arrangement of links is proposed. The link allocation is obtained in only $(N - 1)$ iterations, in contrast to Kontoleon's algorithm (Ref. 3), which requires a large number of trials of link reliability combinations. Table 1 gives comparative data for the proposed method and the algorithm of Ref. 3 for the networks of Figs 2 and 3. The table shows that the proposed algorithm saves 80% to 90% of the reliability function evaluations while reducing the achieved network reliability only very slightly (by at most 0.12%).

Fig. 2. Bridge network.

Fig. 3. 7-link network.

REFERENCES

1. Davies, D. W., et al., Computer Networks and their Protocols. Wiley, New York, 1979.

2. Pradhan, D. K. & Reddy, S. M., A fault-tolerant communication architecture for distributed system. IEEE Trans. Computers, C-31(9) (1982).

3. Kontoleon, J. M., Optimum link allocation of fixed topology networks. IEEE Trans. Reliability, R-28(2) (1979).

4. Deo, N., Graph Theory with Applications to Engineering and Computer Science. Prentice-Hall, Englewood Cliffs, NJ, 1974.

5. Gopal, K., Aggarwal, K. K. & Gupta, J. S., An event expansion algorithm for reliability evaluation in complex systems. Int. J. Systems Sci., 10(4) (1977).

APPENDIX

$$
\begin{aligned}
R(S) ={}& p_3p_{11}(1-q_1q_2)(1-q_6q_8(1-p_7(1-q_4q_5)(1-q_9q_{10})))(1-q_{12}q_{13})\\
&+p_3q_{11}p_9p_8(1-q_1q_2)(1-q_{13}(1-p_{12}(1-q_{10}q_6(1-p_7(1-q_4q_5)))))\\
&+p_3q_{11}p_9q_8(1-q_1q_2)(1-q_6(1-p_7(1-q_4q_5)))(1-q_{12}(1-p_{10}p_{13}))\\
&+p_3q_{11}q_9(1-q_1q_2)(1-(1-p_6p_{12})(1-p_{13}(1-q_8(1-p_{10}p_7(1-q_4q_5)))))\\
&+q_3p_{11}p_4p_5(1-q_1q_2)(1-q_8q_6(1-p_7(1-q_9q_{10})))(1-q_{12}q_{13})\\
&+q_3p_{11}p_4q_5(1-q_{12}q_{13})(1-(1-p_2p_8)(1-p_1(1-q_6(1-p_7(1-q_9q_{10})))))\\
&+q_3p_{11}q_4(1-q_{12}q_{13})(1-(1-p_1p_6)(1-p_2(1-q_8(1-p_5p_7(1-q_9q_{10})))))\\
&+q_3q_{11}p_8p_4p_9p_5(1-q_1q_2)(1-q_{13}(1-p_{12}(1-q_6q_7q_{10})))\\
&+q_3q_{11}p_8p_4p_9q_5\bigl[p_{10}(1-q_2(1-p_1(1-q_6q_7)))(1-q_{12}q_{13})+q_{10}(1-(1-p_2p_{13})(1-p_{12}p_1(1-q_6q_7)))\bigr]\\
&+q_3q_{11}p_8p_4q_9\bigl[p_2(1-q_{13}(1-p_6p_{12}(1-q_1q_5(1-p_7p_{10}))))+q_2p_1(1-(1-p_6p_{12})(1-p_{13}(1-q_5(1-p_7p_{10}))))\bigr]\\
&+q_3q_{11}p_8q_4p_2(1-q_{13}(1-p_{12}(1-(1-p_1p_6)(1-p_9(1-q_{10}(1-p_5p_7))))))\\
&+q_3q_{11}p_8q_4q_2p_1p_6(1-q_{12}(1-p_{13}p_9(1-q_{10}(1-p_5p_7))))\\
&+q_3q_{11}q_8p_4p_9(1-q_1(1-p_2p_5))(1-q_6q_7)(1-q_{12}(1-p_{10}p_{13}))\\
&+q_3q_{11}q_8p_4q_9(1-q_1(1-p_2p_5))(1-(1-p_6p_{12})(1-p_7p_{10}p_{13}))\\
&+q_3q_{11}q_8q_4p_9(1-(1-p_1p_6)(1-p_2p_5p_7))(1-q_{12}(1-p_{10}p_{13}))\\
&+q_3q_{11}q_8q_4q_9(1-(1-p_1p_6p_{12})(1-p_2p_5p_7p_{10}p_{13}))
\end{aligned}
$$
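The Appendix expression can be cross-checked against exhaustive state enumeration on the graph of Fig. 1. Because the figure does not survive, the link labels below are an assumption inferred from the structure of the expression (for example, $x_1$ and $x_2$ are the two links at node 0, $x_{12}$ and $x_{13}$ the two links at node 7), with node numbers as in Example 2. `appendix_R` and `enumerate_R` are hypothetical names. Agreement for arbitrary link reliabilities supports both the transcribed expression and the inferred labels.

```python
import random
from itertools import product

# Assumption: LINKS[k - 1] is link x_k of Fig. 1. These labels are inferred from
# the structure of the Appendix expression; the figure itself is not recoverable.
LINKS = [(0, 4), (0, 1), (1, 4), (2, 4), (1, 2), (4, 6), (2, 5),
         (1, 3), (5, 6), (3, 5), (3, 6), (6, 7), (3, 7)]


def enumerate_R(pl, s=0, t=7):
    """Terminal reliability between s and t by enumerating all 2**13 link states."""
    total = 0.0
    for state in product((False, True), repeat=len(LINKS)):
        prob = 1.0
        for up, pk in zip(state, pl):
            prob *= pk if up else 1.0 - pk
        adj = {}
        for up, (u, v) in zip(state, LINKS):
            if up:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        seen, stack = {s}, [s]
        while stack:
            for v in adj.get(stack.pop(), []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        if t in seen:
            total += prob
    return total


def appendix_R(pl):
    """The Appendix expression for R(S); p[k] and q[k] refer to link x_k."""
    p = [0.0] + list(pl)                  # index 0 unused
    q = [1.0] + [1.0 - v for v in pl]
    return (
        p[3]*p[11]*(1-q[1]*q[2])*(1-q[6]*q[8]*(1-p[7]*(1-q[4]*q[5])*(1-q[9]*q[10])))*(1-q[12]*q[13])
        + p[3]*q[11]*p[9]*p[8]*(1-q[1]*q[2])*(1-q[13]*(1-p[12]*(1-q[10]*q[6]*(1-p[7]*(1-q[4]*q[5])))))
        + p[3]*q[11]*p[9]*q[8]*(1-q[1]*q[2])*(1-q[6]*(1-p[7]*(1-q[4]*q[5])))*(1-q[12]*(1-p[10]*p[13]))
        + p[3]*q[11]*q[9]*(1-q[1]*q[2])*(1-(1-p[6]*p[12])*(1-p[13]*(1-q[8]*(1-p[10]*p[7]*(1-q[4]*q[5])))))
        + q[3]*p[11]*p[4]*p[5]*(1-q[1]*q[2])*(1-q[8]*q[6]*(1-p[7]*(1-q[9]*q[10])))*(1-q[12]*q[13])
        + q[3]*p[11]*p[4]*q[5]*(1-q[12]*q[13])*(1-(1-p[2]*p[8])*(1-p[1]*(1-q[6]*(1-p[7]*(1-q[9]*q[10])))))
        + q[3]*p[11]*q[4]*(1-q[12]*q[13])*(1-(1-p[1]*p[6])*(1-p[2]*(1-q[8]*(1-p[5]*p[7]*(1-q[9]*q[10])))))
        + q[3]*q[11]*p[8]*p[4]*p[9]*p[5]*(1-q[1]*q[2])*(1-q[13]*(1-p[12]*(1-q[6]*q[7]*q[10])))
        + q[3]*q[11]*p[8]*p[4]*p[9]*q[5]*(p[10]*(1-q[2]*(1-p[1]*(1-q[6]*q[7])))*(1-q[12]*q[13])
                                          + q[10]*(1-(1-p[2]*p[13])*(1-p[12]*p[1]*(1-q[6]*q[7]))))
        + q[3]*q[11]*p[8]*p[4]*q[9]*(p[2]*(1-q[13]*(1-p[6]*p[12]*(1-q[1]*q[5]*(1-p[7]*p[10]))))
                                     + q[2]*p[1]*(1-(1-p[6]*p[12])*(1-p[13]*(1-q[5]*(1-p[7]*p[10])))))
        + q[3]*q[11]*p[8]*q[4]*p[2]*(1-q[13]*(1-p[12]*(1-(1-p[1]*p[6])*(1-p[9]*(1-q[10]*(1-p[5]*p[7]))))))
        + q[3]*q[11]*p[8]*q[4]*q[2]*p[1]*p[6]*(1-q[12]*(1-p[13]*p[9]*(1-q[10]*(1-p[5]*p[7]))))
        + q[3]*q[11]*q[8]*p[4]*p[9]*(1-q[1]*(1-p[2]*p[5]))*(1-q[6]*q[7])*(1-q[12]*(1-p[10]*p[13]))
        + q[3]*q[11]*q[8]*p[4]*q[9]*(1-q[1]*(1-p[2]*p[5]))*(1-(1-p[6]*p[12])*(1-p[7]*p[10]*p[13]))
        + q[3]*q[11]*q[8]*q[4]*p[9]*(1-(1-p[1]*p[6])*(1-p[2]*p[5]*p[7]))*(1-q[12]*(1-p[10]*p[13]))
        + q[3]*q[11]*q[8]*q[4]*q[9]*(1-(1-p[1]*p[6]*p[12])*(1-p[2]*p[5]*p[7]*p[10]*p[13]))
    )


rng = random.Random(0)
for _ in range(3):
    pl = [rng.uniform(0.05, 0.95) for _ in range(13)]
    print(abs(appendix_R(pl) - enumerate_R(pl)))   # differences at rounding level
```

Any difference well above floating-point rounding would point to a transcription error in the expression or in the assumed link labels.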
