voting mechanisms in distributed systems


IEEE TRANSACTIONS ON RELIABILITY, VOL. 40, NO. 5, 1991 DECEMBER

Voting Mechanisms in Distributed Systems

Akhil Kumar
Cornell University, Ithaca

Kavindra Malik
Cornell University, Ithaca

Key Words - Voting, Cost vs reliability, Distributed system, Decision-making.

Reader Aids -
Purpose: Show an application
Special math needed for explanations: Basic probability
Special math needed: Same
Results useful to: Designers of distributed systems

Summary & Conclusions - This paper illustrates how voting mechanisms can be exploited to improve the reliability of decisions in a distributed system. We assume a model of decision making in which several processors (nodes) are assigned to work independently on various aspects of a problem, and each returns a binary answer to a coordinator node. The coordinator combines the answers using a voting mechanism to arrive at a final answer. Two issues are addressed.

If the reliability of each node is known, then by assigning suitable votes to the various nodes, it is possible to maximize the probability of a correct decision by the coordinator. If a cost vs reliability (C-R) function for each node is known, then it is possible to determine a best operating point for each node so as to minimize the total cost of the computation.

Algorithms for minimizing the cost were designed and tested, and conditions under which savings can be realized were identified. The magnitude of savings from such a distributed system was a function of the parameters of the individual C-R function. The parameters were associated with the slope (steepness) of the curve, and the savings were larger when the individual curve was steeper. On the other hand, if the slope of the individual curve was below a cut-off, then savings were not realized. The potential for savings from a judicious assignment of votes is illustrated. As one would anticipate, such savings occurred when, and only when, the C-R functions were non-identical.

1. INTRODUCTION

In a distributed model of decision making, several nodes cooperate to solve a problem. The problem is framed such that it has a yes-no answer. Each system-node is specialized to: 1) work on some aspect of the problem, 2) reach an independent answer, and 3) send it to a coordinator. The coordinator combines the answers of all nodes. One common approach for the coordinator is to count the yes votes; if this number is greater than a threshold, the coordinator announces that the answer is yes, else it is no. In a military application to detect whether an object on the radar screen is an enemy aircraft, several nodes might work independently on different aspects of the problem, and come up with independent answers. Eventually, the coordinator combines their replies and decides yes or no. Similarly, in an organization, a prospective employee is interviewed and evaluated independently by several managers and the final decision (to hire or not) is based on the number of favorable reports.

Voting mechanisms have been proposed in the context of replicated data [1-2] where read and update operations on multiple copies of a file must be synchronized. Here multiple copies of a file (or data item) are kept on different computers or nodes with statistically independent failure modes, a vote is associated with each copy, and to perform a read or a write operation, a quorum of votes must be assembled. By doing so, the file can be accessed in spite of failures of some nodes, and the file-system reliability is thus increased.

This paper extends these mechanisms to another environment. We assume that a problem (eg, making a decision) is distributed across several processors or nodes and that the prior probability that each node arrives at a correct decision is known. We address the following two issues:

How can the probability that the coordinator decides correctly be maximized by assigning votes to the nodes?

How can the cost of making the right decision be minimized?

To the best of our knowledge, this extension has not been discussed in the literature.

The two underlying reasons for distributed systems are speed and cost. If a reply is required in real time, then the response time improves by distributing the processing across several nodes. A single node might not produce a reasonable response time, or would have to be extremely powerful (and hence, expensive) to do so. Therefore, the characteristic plot of cost vs reliability (C-R curve) for a single node is of the form in figure 1.

Figure 1. Characteristic Plot: Cost vs Reliability (C-R Curve) [horizontal axis: reliability, from 0 to 1]

0018-9529/91/1200-0593$01.00 ©1991 IEEE


Nomenclature

Node reliability. Pr{the node statistically independently computes the correct decision}.

Decision reliability. Pr{the correct decision receives a majority of the votes}.

Notation

$A$                  decision-reliability threshold
$B_i, D_i, K_i$      parameters of the cost function for node i
$C_i(p_i)$           cost of node i at a desired $p_i$
C-R curve            cost vs reliability relationship for a single node
$N$                  number of nodes
$p_i$                reliability of node i
$P$                  reliability vector $\{p_1, \ldots, p_N\}$
$P^*(V)$             best reliability vector for a vote vector $V$
$R$                  decision reliability (of a multiple-node system)
$TC$                 total cost
$v_i$                vote assigned to node i
$V$                  vote vector $\{v_1, \ldots, v_N\}$
$W$                  collection (set) of all possible vote vectors.

Other, standard notation is given in "Information for Readers & Authors" at the rear of each issue.

Assumptions

1. The cost of each node is represented by a continuous function that rises sharply as the decision-reliability threshold approaches 1.

2. The total cost is the simple sum (over all nodes) of the node costs:

$$TC = \sum_{i=1}^{N} C_i(p_i)$$

We consider two models for exploiting voting mechanisms in decision making:

Model 1

Maximize $R$ (given $P$). Decision variable: $V$

Model 2

Minimize $TC$ such that $R \ge A$. Decision variables: $P$, $V$

Model 1 has been studied in the context of replicated data [3-5] where multiple copies of a file are maintained. In that environment, the individual node reliabilities are known, and the objective is to maximize availability of the file by appropriately assigning votes to copies. In order to ensure read and write synchronization, an operation can be performed on the file system only if a majority of votes are assembled. Hence, there is a direct mapping between the availability in the multiple-copy case, and the reliability of a decision in our environment.

The contribution of this paper lies in proposing an efficient solution to Model 2. Consequently, we develop an algorithm to solve it, test the algorithm, and report the results.

Section 2 defines some basic concepts related to voting and shows how system reliability is computed from individual reliabilities and votes of individual nodes. Section 3 describes some propositions which are applied in the design of algorithm MIN for obtaining a minimum cost solution. Section 4 turns to experimental results obtained from testing MIN and compares it against a 1-node alternative.

2. BASIC CONCEPTS, FRAMEWORK

2.1 Model for Distributed Computation

A problem is accepted by a coordinator node that assigns each aspect of the problem to a specialized node. Each node computes and returns an answer to the coordinator. The coordinator weights each answer with the votes assigned to the corresponding node, and makes the decision that receives a majority of votes.
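The coordinator's rule can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `coordinator_decision` is hypothetical, a majority is assumed to mean strictly more than half of all votes, and the tie case (which the text does not define) is assumed to raise an error.

```python
from typing import Sequence


def coordinator_decision(answers: Sequence[bool], votes: Sequence[int]) -> bool:
    """Return the answer (True = yes) backed by a strict majority of the votes.

    Assumption: a tie is treated as an error because the text does not say
    how the coordinator should break it.
    """
    if len(answers) != len(votes):
        raise ValueError("one vote weight is needed per answer")
    total = sum(votes)
    yes_votes = sum(v for a, v in zip(answers, votes) if a)
    if 2 * yes_votes == total:
        raise ValueError("tie: no decision has a majority of the votes")
    return 2 * yes_votes > total


if __name__ == "__main__":
    # One node with 3 votes outweighs two nodes with 1 vote each.
    print(coordinator_decision([True, False, False], [3, 1, 1]))
    # With equal votes the two "no" answers win.
    print(coordinator_decision([True, False, False], [1, 1, 1]))
```

With unequal votes a single heavily weighted node can decide the outcome, which is why the vote assignment matters for the decision reliability discussed next.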

2.2 Computation of System Reliability

After all nodes compute their decisions, the number of votes for each decision (node) is summed. The quantity of interest is the decision reliability.

Section 3 gives an efficient algorithm for computing the decision reliability $R$. Since there are $N$ nodes and each returns a binary answer, there are $2^N$ possible state combinations; $R$ is determined by aggregating the probability of each state in which a majority of votes is received for the correct decision. Function R(V, P) computes $R$ using a recursive function R1. The 4 arguments supplied to R1 are:

number of nodes that have been examined so far, i
number of votes included in obtaining the majority, V_in
number of votes excluded, V_out
probability corresponding to this combination of included and excluded votes, p.

    function R(V, P): real;
      function R1(i, V_in, V_out, p): real;
      begin
        if V_in ≥ majority then R1 := p
        else if V_out ≥ majority then R1 := 0
        else if Σ_{j=1..n} v_j − V_out = majority then R1 := p · Π_{j=i..n} p_j
        else if i < n then R1 := R1(i+1, V_in + v_i, V_out, p·p_i)
                                + R1(i+1, V_in, V_out + v_i, p·p̄_i)
      end;


The main advantage of R(V, P) over a brute force method is that it prunes several search paths in the tree describing all possible combinations of included and excluded nodes, and therefore, the computational effort is reduced considerably. Now we outline our approach to integrating several nodes to minimize cost.
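The following Python sketch illustrates the pruned recursion and checks it against a brute-force enumeration of all $2^N$ states. It is an assumption-laden illustration rather than the paper's exact routine: the names `decision_reliability` and `brute_force_reliability` are hypothetical, majority is taken as strictly more than half of the total votes, and the "excluded votes" test is expressed as "the included votes plus all votes still undecided cannot reach a majority", a variant of the pruning rule.

```python
from itertools import product
from typing import Sequence


def decision_reliability(votes: Sequence[int], probs: Sequence[float]) -> float:
    """Pr{correct decision gets a majority}, computed by a pruned recursion."""
    n = len(votes)
    majority = sum(votes) // 2 + 1
    # undecided[i] = total votes of nodes i..n-1, not yet examined
    undecided = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        undecided[i] = undecided[i + 1] + votes[i]

    def r1(i: int, v_in: int, p: float) -> float:
        if v_in >= majority:                 # majority already assembled
            return p
        if v_in + undecided[i] < majority:   # majority no longer reachable
            return 0.0
        # node i answers correctly (included) or incorrectly (excluded)
        return (r1(i + 1, v_in + votes[i], p * probs[i])
                + r1(i + 1, v_in, p * (1.0 - probs[i])))

    return r1(0, 0, 1.0)


def brute_force_reliability(votes: Sequence[int], probs: Sequence[float]) -> float:
    """Same quantity by enumerating every combination of right/wrong nodes."""
    majority = sum(votes) // 2 + 1
    total = 0.0
    for state in product((True, False), repeat=len(votes)):
        pr, v_in = 1.0, 0
        for ok, v, p in zip(state, votes, probs):
            pr *= p if ok else 1.0 - p
            v_in += v if ok else 0
        if v_in >= majority:
            total += pr
    return total


if __name__ == "__main__":
    print(decision_reliability([1, 1, 1], [0.9, 0.9, 0.9]))
    print(brute_force_reliability([2, 1, 1, 1], [0.9, 0.8, 0.7, 0.6]))
```

For three equal-vote nodes with reliability 0.9 each, both routines give $3(0.9)^2(0.1) + (0.9)^3 = 0.972$.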

2.3 Cost Minimization

A general methodology is developed for minimizing the cost of designing a system with a prespecified decision reliability. The distinguishing characteristic of our approach lies in recognizing that the total cost can be minimized by selecting a best operating point on the cost function of each node and a best vote assignment for each node. The benefits are illustrated by the experimental results in section 4.

3. ALGORITHM MIN

We describe Algorithm MIN, which minimizes the total cost of achieving a decision-reliability threshold. Assigning votes to nodes so as to maximize system availability is a hard problem [3], and no efficient optimal algorithm for it is known. Since the vote assignment problem is a sub-problem of our problem, no known optimal algorithm for our problem is efficient either. Hence, optimality of MIN cannot be guaranteed. Our objective is to develop an algorithm that runs efficiently, and to demonstrate its performance.

MIN is based upon some results which are explained in section 3.1. Then section 3.2 discusses MIN.

3.1 Relationship Between Individual $p_i$ and $R$

MIN begins with an initial set of $p_i$ that produces a decision reliability at least as high as the decision-reliability threshold, $A$. Then it makes 2 improvements:

Identify pairs of nodes i and j such that increasing $p_i$ and decreasing $p_j$, while holding the decision reliability constant, results in a cost reduction. Then adjust those individual node reliabilities accordingly.

Adjust the votes assigned to individual nodes.

These two steps are repeated successively until no further improvement in cost is possible.

Before describing the details of this process, it is necessary to discuss some key relationships between individual node reliabilities and the decision reliability. The decision reliability is:

$$R = \bar{p}_i \bar{p}_j P_{00} + p_i \bar{p}_j P_{10} + \bar{p}_i p_j P_{01} + p_i p_j P_{11} \qquad (1)$$

Notation

$P_{00}$     Pr{Forming a majority given i votes no and j votes no}
$P_{10}$     Pr{Forming a majority given i votes yes and j votes no}
$P_{01}$     Pr{Forming a majority given i votes no and j votes yes}
$P_{11}$     Pr{Forming a majority given i votes yes and j votes yes}
overbar      implies the complement, eg, $\bar{p} = 1 - p$.

Thus, if the reliability of node j is changed from $p_j$ to $p_j'$, then the reliability of node i must be changed from $p_i$ to $p_i'$ for the decision reliability to stay unchanged:

$$p_i' = \frac{R - \bar{p}_j' P_{00} - p_j' P_{01}}{\bar{p}_j' P_{10} + p_j' P_{11} - \bar{p}_j' P_{00} - p_j' P_{01}} \qquad (2)$$

The following problem can be posed in this context. Given a reliability vector $P$, with a decision reliability $R$, consider a pair of nodes with individual reliabilities $p_i$ and $p_j$ ($p_i \ge p_j$); what is the effect on $R$ of increasing $p_i$ by $\Delta$ while decreasing $p_j$ by $\Delta$? This relationship between $R$ and $\Delta$ is investigated in the appendix.
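As a hedged numerical illustration of the compensating adjustment, the Python sketch below solves the four-term decomposition of $R$ over the states of nodes i and j for the new reliability of node i after node j's reliability is changed. The helper names (`reliability`, `compensating_reliability`) are hypothetical, and the conditional majority probabilities are obtained by an assumed shortcut: setting the two node reliabilities to 0 or 1 and re-evaluating $R$.

```python
from typing import List, Sequence


def reliability(votes: Sequence[int], probs: Sequence[float]) -> float:
    """Pr{correct votes form a strict majority}, via a distribution over vote totals."""
    total = sum(votes)
    dist = [1.0] + [0.0] * total          # dist[k] = Pr{k correct votes so far}
    for v, p in zip(votes, probs):
        new = [0.0] * (total + 1)
        for k, pr in enumerate(dist):
            if pr:
                new[k] += pr * (1.0 - p)  # node answers incorrectly
                new[k + v] += pr * p      # node answers correctly
        dist = new
    return sum(dist[total // 2 + 1:])


def compensating_reliability(votes: Sequence[int], probs: Sequence[float],
                             i: int, j: int, new_pj: float) -> float:
    """New reliability of node i that keeps R unchanged when p_j becomes new_pj."""
    def cond(a: float, b: float) -> float:
        q: List[float] = list(probs)
        q[i], q[j] = a, b
        return reliability(votes, q)

    r = reliability(votes, probs)
    p00, p10, p01, p11 = cond(0, 0), cond(1, 0), cond(0, 1), cond(1, 1)
    qj = 1.0 - new_pj
    numerator = r - qj * p00 - new_pj * p01
    denominator = qj * p10 + new_pj * p11 - qj * p00 - new_pj * p01
    return numerator / denominator


if __name__ == "__main__":
    votes, probs = [1, 1, 1], [0.9, 0.9, 0.9]
    new_pi = compensating_reliability(votes, probs, i=0, j=1, new_pj=0.85)
    print(new_pi)                                        # about 0.941
    print(reliability(votes, [new_pi, 0.85, 0.9]))       # R stays at 0.972
```

In this three-node example, lowering one node from 0.90 to 0.85 requires raising another to about 0.941 to hold the decision reliability at 0.972.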

Another useful expression can be derived by conditioning on whether a node i is included or excluded in forming the majority:

$$R = p_i P_1(i) + \bar{p}_i P_0(i) \qquad (3)$$

Notation

$P_1(i)$     Pr{Forming a majority | node i is included}
$P_0(i)$     Pr{Forming a majority | node i is excluded}

Given the reliabilities of all other nodes, (3) gives the value of $p_i$ that yields a decision reliability $R$. Therefore, if the reliabilities of all nodes, except one, are known, then (3) computes the unknown node reliability necessary so that the decision reliability is equal to the decision-reliability threshold. Eq (2) & (3) are used in MIN.
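A small Python sketch of this use of the conditioning expression follows. It rearranges $R = p_i P_1(i) + (1 - p_i) P_0(i)$ into $p_i = (A - P_0(i)) / (P_1(i) - P_0(i))$. The function names are hypothetical, and the conditional probabilities are again obtained by the assumed shortcut of fixing node i's reliability at 1 or 0.

```python
from typing import List, Sequence


def reliability(votes: Sequence[int], probs: Sequence[float]) -> float:
    """Pr{correct votes form a strict majority}, via a distribution over vote totals."""
    total = sum(votes)
    dist = [1.0] + [0.0] * total
    for v, p in zip(votes, probs):
        new = [0.0] * (total + 1)
        for k, pr in enumerate(dist):
            if pr:
                new[k] += pr * (1.0 - p)
                new[k + v] += pr * p
        dist = new
    return sum(dist[total // 2 + 1:])


def solve_node_reliability(votes: Sequence[int], probs: Sequence[float],
                           i: int, target: float) -> float:
    """Reliability node i needs so that the decision reliability equals target.

    probs[i] is ignored; only the other nodes' reliabilities are used.
    """
    q: List[float] = list(probs)
    q[i] = 1.0
    p_included = reliability(votes, q)    # P1(i): node i certainly correct
    q[i] = 0.0
    p_excluded = reliability(votes, q)    # P0(i): node i certainly wrong
    if p_included == p_excluded:
        raise ValueError("node i has no influence on the decision")
    return (target - p_excluded) / (p_included - p_excluded)


if __name__ == "__main__":
    # Two known nodes at 0.9; which reliability must the third one have
    # for a decision reliability of 0.972?
    print(solve_node_reliability([1, 1, 1], [0.0, 0.9, 0.9], i=0, target=0.972))
```

In the example, $P_1(i) = 0.99$ and $P_0(i) = 0.81$, so the required reliability is $(0.972 - 0.81)/0.18 = 0.9$.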

3.2 Discussion of MIN

The 2 inputs to MIN are:

maximum number of available nodes, $N$
the C-R curve for each node i, $C_i(p_i)$.

Initially, the cost minimization problem is solved assuming that all $N$ nodes are employed. Subsequently, the number of nodes is iteratively reduced by successively eliminating the node with the least reliability, and a new solution is computed. Finally, the least-cost solution is chosen.

MIN has 3 phases: an initial phase; then phases I & II are executed repeatedly until no further improvement occurs.

Initial Phase

An initial $V$ & $P$ are constructed. The votes are always assigned to nodes in order of decreasing reliability: The highest reliability node is given the largest vote, etc.


Initial V: If the number of nodes is odd, then each node is assigned 1 vote, else node 1 receives 2 votes and the remaining nodes receive 1 vote each. This assignment ensures that the initial $V$ is undominated [3].

Initial P: Assign a reliability $p$ to each node i; $p$ is computed to ensure that the decision reliability (for this $p$) is $A$.
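The initial phase can be sketched as follows. The vote rule is taken from the text; how the common reliability $p$ is computed is not stated, so the bisection used here is an assumption, as are the function names and the search interval [0.5, 1].

```python
from typing import List, Sequence


def reliability(votes: Sequence[int], probs: Sequence[float]) -> float:
    """Pr{correct votes form a strict majority}, via a distribution over vote totals."""
    total = sum(votes)
    dist = [1.0] + [0.0] * total
    for v, p in zip(votes, probs):
        new = [0.0] * (total + 1)
        for k, pr in enumerate(dist):
            if pr:
                new[k] += pr * (1.0 - p)
                new[k + v] += pr * p
        dist = new
    return sum(dist[total // 2 + 1:])


def initial_votes(n: int) -> List[int]:
    """1 vote each if n is odd; otherwise node 1 gets 2 votes, the rest 1."""
    return [1] * n if n % 2 == 1 else [2] + [1] * (n - 1)


def initial_reliability(n: int, threshold: float, iterations: int = 60) -> float:
    """Common node reliability p with R(V, (p, ..., p)) = threshold.

    Assumption: found by bisection on [0.5, 1], relying on R being
    non-decreasing in p for a majority vote.
    """
    votes = initial_votes(n)
    lo, hi = 0.5, 1.0
    if not reliability(votes, [lo] * n) <= threshold <= 1.0:
        raise ValueError("threshold outside the range covered by the search")
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if reliability(votes, [mid] * n) < threshold:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(initial_votes(4))                 # [2, 1, 1, 1]
    print(initial_reliability(3, 0.972))    # close to 0.9
```

The even-N rule avoids ties: with votes 2, 1, 1, 1 the total is 5, so one side always holds a strict majority.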

Phase I

This phase attempts to adjust $P$ to minimize cost, while holding $V$ unchanged. This is done by identifying a pair of nodes i and j, such that the cost per unit-increase in $p_i$ is smaller than the cost per unit-decrease in $p_j$. Next, the possibility of lowering $p_j$ by a small interval, to a new value $p_j'$, is considered. The new $p_i'$ for $p_i$ such that $R$ remains constant is computed from (2). If the cost of operating at $p_i'$ and $p_j'$ is less than the cost at $p_i$ and $p_j$, then the operating point is moved to $p_i'$ and $p_j'$. This process is repeated until no other such pair is found. This results in a least-cost $P^*(V)$ for a given $V$.
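A single trade of the kind described for Phase I can be sketched in Python as below. This is an illustration under loud assumptions, not the paper's implementation: the cost curve `cost(p, k, b, d) = k * exp(b * p - d)` is a hypothetical stand-in chosen only because it rises steeply near 1 and equals e times k at p = 1 when b - d = 1; the function names and the step size `eps` are likewise invented.

```python
import math
from typing import List, Sequence, Tuple


def reliability(votes: Sequence[int], probs: Sequence[float]) -> float:
    """Pr{correct votes form a strict majority}, via a distribution over vote totals."""
    total = sum(votes)
    dist = [1.0] + [0.0] * total
    for v, p in zip(votes, probs):
        new = [0.0] * (total + 1)
        for k, pr in enumerate(dist):
            if pr:
                new[k] += pr * (1.0 - p)
                new[k + v] += pr * p
        dist = new
    return sum(dist[total // 2 + 1:])


def cost(p: float, k: float, b: float, d: float) -> float:
    # ASSUMPTION: hypothetical exponential C-R curve, not the paper's formula.
    return k * math.exp(b * p - d)


def try_trade(votes: Sequence[int], probs: Sequence[float],
              params: Sequence[Tuple[float, float, float]],
              i: int, j: int, eps: float = 1e-3) -> List[float]:
    """Lower p_j by eps, raise p_i so R is unchanged; keep the move only if cheaper."""
    def cond(a: float, b: float) -> float:
        q = list(probs)
        q[i], q[j] = a, b
        return reliability(votes, q)

    r = reliability(votes, probs)
    p00, p10, p01, p11 = cond(0, 0), cond(1, 0), cond(0, 1), cond(1, 1)
    new_pj = probs[j] - eps
    qj = 1.0 - new_pj
    den = qj * p10 + new_pj * p11 - qj * p00 - new_pj * p01
    if den <= 0.0:
        return list(probs)
    new_pi = (r - qj * p00 - new_pj * p01) / den
    if not 0.0 <= new_pi <= 1.0:
        return list(probs)
    old = cost(probs[i], *params[i]) + cost(probs[j], *params[j])
    new = cost(new_pi, *params[i]) + cost(new_pj, *params[j])
    if new >= old:
        return list(probs)            # no saving: keep the operating point
    moved = list(probs)
    moved[i], moved[j] = new_pi, new_pj
    return moved


if __name__ == "__main__":
    votes = [1, 1, 1]
    params = [(16000, 20, 19), (22000, 50, 49), (16000, 20, 19)]
    print(try_trade(votes, [0.9, 0.9, 0.9], params, i=1, j=0))
```

Under the assumed curve, node 1 has the smaller marginal cost at 0.9, so raising it slightly while lowering node 0 reduces the pair's combined cost at constant decision reliability. Repeating such trades until none is accepted corresponds to the loop described above.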

Phase II

This phase examines the possibility of further cost savings by modifying $V$. Ref [4] gives an algorithm for determining a partially exhaustive set of vote assignments for a given number of nodes. This algorithm is used to generate such a set, $W$, of alternative vote assignments. Then the $R$ corresponding to the reliability vector $P^*(V)$ is recomputed for each $V$ in $W$. If any of these assignments results in a value of $R$ greater than $A$, then it is possible to reduce the cost further by selecting the assignment $V'$ that produces the highest $R$. For this new assignment, $P$ is readjusted by lowering one $p_i$, using (3), to produce a decision reliability of exactly $A$. The cost is reduced further in this manner; then phase I is reinvoked with the new $V$ & $P$.

This repeated execution of phases I & II terminates when no savings result from phase II. The detailed steps in MIN are listed below.

    algorithm MIN(N);

    function COST(P): real;
    { Given probability vector P, compute cost }
    begin
      COST := Σ_i C_i(p_i)
    end;

    function Rel(V, P): real;
    { Given vote vector V and probability vector P, compute maximum reliability }
    begin
      Let π be a permutation of 1, 2, ..., N such that
        p_π(1) ≥ p_π(2) ≥ ... ≥ p_π(N);
      Let vote vector V be such that
        v_1 ≥ v_2 ≥ ... ≥ v_N;
      Construct a vote vector V' such that v'_π(i) := v_i;
      Rel := R(V', P)
    end;

    procedure phaseI(V, P);
    { Given vote vector V, compute a least cost probability vector P }
    begin
      for each pair i, j of processors such that (dC_j/dp_j > dC_i/dp_i) do
      begin
        repeat
          p_j' := p_j − ε;
          compute p_i' from expression (2);
          if C_i(p_i') + C_j(p_j') < C_i(p_i) + C_j(p_j) then
          begin
            p_i := p_i'; p_j := p_j'
          end
          else flag(i, j) := 1;
        until (dC_i/dp_i = dC_j/dp_j) or flag(i, j) = 1;
      end
    end;

    procedure phaseII(V, P);
    { Examine other vote vectors to further reduce cost }
    begin
      done := false;
      repeat
        Find V' such that Rel(V', P) = max_W {Rel(W, P)}
        { Vote vectors W are elements in a list of distinct vote assignments }
        if Rel(V', P) > Rel(V, P) then
        begin
          V := V';
          lower a p_i using (3) so that Rel(V, P') = A;
          P := P';
          phaseI(V, P);
        end
        else done := true;
      until done
    end;

    begin
      MinCost := min_i C_i(A); { This is the minimum single system cost }
      repeat
        if odd(N) then v_1 := 1 else v_1 := 2;
        v_2 := v_3 := ... := v_N := 1;
        In the initial vector P, set p_1 = p_2 = ... = p_N = p_0 s.t. Rel(V, P) = A;
        phaseI(V, P);
        phaseII(V, P);
        if MinCost > COST(P) then
        begin
          MinCost := COST(P); bestV := V; bestP := P
        end;
        Exclude system l with the least reliability p_l;
        N := N − 1;
      until N < 3;
      Output MinCost, bestV, bestP
    end.

4. EXPERIMENTAL RESULTS

4.1 Introduction

We ran the tests on a VAX-11/785 computer, and the running time in all cases was a fraction of a second. The algorithm converges extremely fast. The following form is assumed for $C_i(p_i)$:

$K_i$, $B_i$, $D_i$ are parameters of node i.

This family of cost functions has the general form in figure 1. Two features about this cost function are important:

$B_i$ affects the slope of the curve. A high value of $B_i$ corresponds to a curve which rises very steeply as $p_i$ approaches 1, while a lower value corresponds to a curve which rises gradually.

By making $(B_i - D_i)$ equal to 1 in all our experiments, we ensured that, when $p_i = 1$, then $C_i(p_i) = e \cdot K_i \approx 2.7 K_i$.

Two types of experiments were conducted.

1. The cost functions for all nodes are identical. The results are in section 4.2.

2. The cost function for each node is different. The results are in section 4.3.

4.2 Identical Nodes

$K_i = 15000$, $N$ ranges from 1 to 7.

A. Figure 2: $B_i = 15$, $D_i = 14$.

Figure 2 is a plot of the cost vs decision reliability for odd values of $N$. The plots for even values of $N$ were omitted because they were dominated both by the next-higher and next-lower odd value of $N$. $R$ ranges between 0.95 and 0.999. This figure shows that the 7-node case is the least-cost alternative for all values of $R$, except 0.999. The savings in the 7-node case, compared to the 1-node case, are largest (49.5%) when $R = 0.95$, and taper off to 3.7% when $R = 0.995$.

Figure 2. Cost vs Reliability for Various N [Identical nodes case: $K_i = 15000$, $B_i = 15$, $D_i = 14$]

In the next two sets of experiments, $B_i$ & $D_i$ were changed to see how the sharpness of the individual curves affects the results.

B. Figure 3: $B_i = 8$ & $D_i = 7$.

Figure 3 shows that the 1-node case is the least expensive for all values of $R$ considered. The reason is that the individual cost functions are not sufficiently steep in the neighborhood of $R = 1$, and the cost does not drop rapidly as the corresponding reliability falls away from 1. If the cost function drops sharply, then it is cheaper to have several nodes operating at lower reliability (and therefore, lower cost) than having 1 node operating at a high reliability (and therefore, high cost). On the other hand, if the cost function drops very gradually, then 1 node is preferable.

Figure 3. Cost vs Reliability for Various N [Identical nodes case: $K_i = 15000$, $B_i = 8$, $D_i = 7$]

On varying $B_i$ & $D_i$ further, it was found that $B_i = 12$, $D_i = 11$ is the cut-off point where the 1-node case ceases to dominate.

C. Figure 4: $B_i = 35$, $D_i = 34$.

Figure 4 is quite similar to figure 2, except that in figure 4:

The 7-node case is always the least-cost alternative, even when $R = 0.999$.

The cost savings are much larger than those in figure 2. This is because the individual C-R curve ($N = 1$) in figure 4 is considerably steeper, in the neighborhood of $R = 1$, than the one in figure 2; and operating at a lower value of $R$ on a steeper curve leads to larger benefits.

Figure 4. Cost vs Reliability for Various N [Identical nodes case: $K_i = 15000$, $B_i = 35$, $D_i = 34$]

D. Summary of Results

1. The least cost occurred either in a 7-node or a 1-node situation. $B_i$ is a surrogate for the steepness of an individual C-R curve. When $B_i$ is sufficiently large, the 7-node case was the best, while for $B_i < 12$, the 1-node case was the best. For some $B_i$ in between, the 1-node case was the best at very high values of $R$ (eg, 0.999) and the 7-node case was the best when $R$ was lowered.

2. Voting did not help in most cases. This means that when all nodes are identical, phase 2 of the algorithm does not lead to any additional savings.

4.3 Non-Identical C-R Curves

The parameters $K_i$, $B_i$, $D_i$ differ among nodes. Two cost functions were considered.

A. Figure 5

$K_i = 22000$, $B_i = 50$, $D_i = 49$ for 4 nodes
$K_i = 16000$, $B_i = 20$, $D_i = 19$ for the other 3 nodes.

Figure 5 is a plot of least cost vs $R$. $R$ was varied from 0.98 to 0.999, and the cost computed at several points in the range for all values of $N$. Again, only odd values of $N$ are plotted because the even values are dominated by the odd ones.

Figure 5. Cost vs Reliability for Various N [2 different cost functions]

The 7-node system is the least-cost alternative for all values of $R$ examined. For values of $R$ less than 0.99, the 7-node system is only marginally better than the 5-node system; however, the gap increases rapidly beyond that point. The cost savings in a 7-node system as against a 1-node system are larger at lower values of $R$. For instance, when $R = 0.98$, the savings are 94.7%, whereas when $R$ increases to 0.999, the savings shrink to 31.7%. The costs also include any improvements from adjusting the vote assignment in phase 2 of MIN.

To study the savings that occur from finding the best assignment of votes, table 1 gives the cost of a 7-node system for the case of equal votes, and unequal votes, along with the fraction of savings. A judicious assignment of votes does lead to reduced cost. For instance, when N = 7, and R = 0.98, phase 2 of MIN results in an additional cost-reduction of 17.9%.

TABLE 1
Effect of Unequal Votes on Cost, vs Reliability
[N = 7, and 2 different cost functions]

Desired        Equal      Unequal    Savings
Reliability    Votes      Votes      (%)

0.9800          1409       1157      17.9
0.9825          1723       1440      16.4
0.9850          2151       1831      14.9
0.9875          2759       2446      11.3
0.9900          3674       3440       6.4
0.9925          5168       5168       0
0.9950          7975       7975       0
0.9990         29116      29116       0

Table 1 also illustrates that savings from the best vote-assignment are greater when $R$ is small. For a large value of $R$ (above 0.99), savings do not occur in phase 2 of the algorithm. This is because, for a high value of $R$, the spread between the minimum and maximum reliability in $P$ is small, while it is larger for lower values of $R$. When the spread in $P$ is small, adjusting the vote assignment does not help.
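The savings column of Table 1 follows directly from the two cost columns as 100 × (equal-vote cost − unequal-vote cost) / equal-vote cost. The short Python check below recomputes it from the tabulated costs; the variable and function names are illustrative only.

```python
from typing import List, Tuple

# (desired reliability, cost with equal votes, cost with unequal votes), from Table 1
TABLE_1: List[Tuple[float, int, int]] = [
    (0.9800, 1409, 1157),
    (0.9825, 1723, 1440),
    (0.9850, 2151, 1831),
    (0.9875, 2759, 2446),
    (0.9900, 3674, 3440),
    (0.9925, 5168, 5168),
    (0.9950, 7975, 7975),
    (0.9990, 29116, 29116),
]


def savings_percent(equal_cost: float, unequal_cost: float) -> float:
    """Percentage cost reduction from moving to the unequal vote assignment."""
    return 100.0 * (equal_cost - unequal_cost) / equal_cost


if __name__ == "__main__":
    for target, equal_cost, unequal_cost in TABLE_1:
        print(f"{target:.4f}  {savings_percent(equal_cost, unequal_cost):5.1f}")
```

For the first row, (1409 − 1157)/1409 ≈ 17.9%, matching the table; the rows at 0.9925 and above have identical costs and hence zero savings.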

B. Figure 6

A different cost function is used for each node.

Steepest curve: $K_i = 22000$, $B_i = 70$, $D_i = 69$
Shallowest curve: $K_i = 16000$, $B_i = 20$, $D_i = 19$

The remaining C-R curves were uniformly spaced out within the range defined by these two curves.

Figure 6 shows that a multiple-node system is better than a 1-node alternative. However, a 5-node system is the best for $R$ in the range 0.98 to 0.99, and a 7-node system is best for $R$ beyond 0.99.

An intuitive understanding for this behavior can be gained by thinking in terms of the slope of an individual C-R curve. Since this slope increases monotonically with reliability, cost savings from lowering the operating point by a small, fixed interval are greater when the operating point is itself higher. Because of this phenomenon, for $R = 0.99$, using 7 nodes, instead of 5, lowers the operating point of each node appreciably on the C-R curve, and leads to cost savings. On the other hand, when $R < 0.99$, the savings from shifting the operating point lower in the 7-node case do not offset the increase in cost from adding 2 nodes. Hence, the 5-node system is best in this range of $R$.

Figure 6. Cost vs Reliability for Various N [7 different cost functions]

Table 2 gives the savings from unequal vote assignment when N = 7 . Phase 2 does lead to a cost reduction; however, as illustrated in figure 6, in all cases except one, the 5-node system is cheaper than the 7-node system. The exception is the case, R = 0.9925; the savings from unequal voting are a modest 2.6%. Therefore, the savings from phase 2 are low when all 7 nodes have different cost functions.

TABLE 2
Effect of Unequal Votes on Cost, vs Reliability
[N = 7, and 7 different cost functions]

Desired        Equal      Unequal    Savings
Reliability    Votes      Votes      (%)

0.9800           392        316      19.4
0.9825           502        432      13.9
0.9850           661        592      10.4
0.9875           898        833       7.2
0.9900          1281       1228       4.1
0.9925          1950       1900       2.6
0.9950          3333       3333       0
0.9990         16689      16689       0

APPENDIX

Relationship Between Variation in $p_i$ & $R$

Given a pair of node reliabilities ($p_i$, $p_j$), assume, without loss of generality, that $p_i > p_j$. We determine how increasing $p_i$ to $p_i + \Delta$, while decreasing $p_j$ to $p_j - \Delta$ ($\Delta > 0$), affects $R$. An understanding of this relationship is useful because: If a) the slope of the cost curve for node i is smaller at $p_i$ than the slope of the cost curve for node j at $p_j$, and b) $R$ increases with increasing $\Delta$, then increasing $p_i$ by a small interval, while decreasing $p_j$, definitely reduces total cost. The notation is explained in section 3.1.

To understand how $\Delta$ affects $R$, we assume that initially $p_i = p_j = p_m$, and write $R$ as (from (1)):

$$R = \bar{p}_m^2 P_{00} + p_m \bar{p}_m P_{10} + p_m \bar{p}_m P_{01} + p_m^2 P_{11}.$$

Now consider a change such that $p_i = p_m + \Delta$, and $p_j = p_m - \Delta$. We would like to find the conditions under which this would lead to an increase in $R$. The new value of $R$ is:

$$R' = (\bar{p}_m^2 - \Delta^2) P_{00} + (p_m \bar{p}_m + \Delta^2 + p_m \Delta + \bar{p}_m \Delta) P_{10} + (p_m \bar{p}_m + \Delta^2 - p_m \Delta - \bar{p}_m \Delta) P_{01} + (p_m^2 - \Delta^2) P_{11}.$$

On further simplification, the condition for $R' - R \ge 0$ is:

$$\Delta^2 (P_{10} + P_{01} - P_{00} - P_{11}) + \Delta (P_{10} - P_{01}) \ge 0,$$

that is, $K_1 \Delta^2 + K_2 \Delta \ge 0$ with $K_1 = P_{10} + P_{01} - P_{00} - P_{11}$ and $K_2 = P_{10} - P_{01}$.

1. All Votes Are Equal

$K_2 = 0$ because $P_{10} = P_{01}$. Therefore, the relationship between $R' - R$ and $\Delta$ depends only upon $K_1$. $R$ increases monotonically with $\Delta$ if $K_1 > 0$, while it decreases monotonically if $K_1 < 0$.

2. Otherwise.

Since $p_i > p_j$, it follows that:

a. $v_i \ge v_j$: A more reliable node is never assigned a vote smaller than that assigned to a less reliable node. (The proof is simple and is omitted.)

b. $P_{10} > P_{01}$ and hence $K_2 > 0$. There are 2 cases:

$K_1 \ge 0$. $R$ increases monotonically with $\Delta$.

Otherwise. Define $\kappa = -K_2 / (2 K_1)$. $R$ increases monotonically in the range $0 < \Delta < \kappa$, decreases monotonically for $\Delta > \kappa$, and is constant for $\Delta = \kappa$.

REFERENCES

[1] D. K. Gifford, "Weighted voting for replicated data", Proc. 7th ACM SIGOPS Symp. Operating Systems Principles, 1979 Dec, pp 150-159.

[2] R. H. Thomas, "A majority consensus approach to concurrency control", ACM Trans. Database Systems, vol 4, 1979 Jun, pp 180-209.

[3] H. Garcia-Molina, D. Barbara, "How to assign votes in a distributed system", J. ACM, vol 32, 1985 Oct, pp 841-860.

[4] A. Kumar, A. Segev, "Optimizing and evaluating algorithms for replicated data concurrency control", Proc. 9th Int'l IEEE Conf. Distributed Computing Systems, 1989 Jun, pp 101-109.

[5] A. Kumar, A. Segev, "Cost and availability trade-offs in replicated data concurrency control", Lawrence Berkeley Labs Technical Report, 1990 Feb, pp 1-32.

AUTHORS

Dr. Akhil Kumar; S. C. Johnson Graduate School of Management; Cornell University; Ithaca, New York 14853 USA.

Akhil Kumar has been an Assistant Professor of Information Systems at the Johnson Graduate School of Management of Cornell University since 1988 July. He received a BTech in Electrical Engineering from the Indian Institute of Technology, New Delhi and an MBA from the Indian Institute of Management, Ahmedabad. He also obtained an MS in Computer Science and a PhD in Management Information Systems from the University of California, Berkeley. His research interests are in database systems, distributed systems, and expert systems. He has also worked in industry for 4 years in system analysis and design. He is a member of ACM and the IEEE Computer Society.

Dr. Kavindra Malik; S. C. Johnson Graduate School of Management; Cornell University; Ithaca, New York 14853 USA.

Kavindra Malik is an Assistant Professor of Operations Research at the Johnson Graduate School of Management of Cornell University. He received a BEngg in Industrial Engineering from University of Roorkee, an MTech in Industrial and Management Engineering from Indian Institute of Technology, Kanpur, and a PhD in Decision Sciences from the Wharton School of the University of Pennsylvania. His research interests are in applied mathematical programming and computational solution of applied problems. He is a member of TIMS, ORSA, and MPS.

Manuscript TR89-209 received 1989 December 11; revised 1990 August 21; revised 1991 April 25.

IEEE Log Number 01572