IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-12, NO. 3, MARCH 1986
Optimal Partitioning of Randomly Generated Distributed Programs
BIPIN INDURKHYA, HAROLD S. STONE, SENIOR MEMBER, IEEE, AND LU XI-CHENG
Abstract-This paper investigates an optimal task-assignment policy for a random-graph model of a distributed program. The model of the distributed computer system assumes that communications overhead adds to total run time and that total run time decreases as the number of processors running the program is increased. When the processors are homogeneous, the optimal task assignments are extremal in the sense that tasks are either distributed among all processors as evenly as possible or not distributed at all. The point at which the policy shows a sharp change of behavior depends upon the ratio of runtimes to communication times.
We derive two important properties of the optimal task assignments for heterogeneous processors. The first property is that an optimal policy distributes the cost of processing among the processors as evenly as possible, so that a processor with higher speed gets more tasks and vice versa. The second property determines the number of processors among which to distribute the tasks evenly. In the special case when there is a uniform degradation of processing speed, it is shown that the optimal policy again exhibits an extremal characteristic.
Index Terms-Computer networks, distributed computers, local area networks, multiprocessors, optimal partitioning, random-graph models, task assignments.
I. INTRODUCTION
As the number of distributed-computer systems has increased in recent years, a major research question that has developed is the question of task partitioning for maximum performance. Intuitively, a program should be distributed over many processors to take advantage of parallel computation. However, overhead and communication delays dictate that tasks should be distributed over as few processors as possible in order to reduce the negative effects on performance due to distribution of computation. These two contrary aspects of distributed systems make task assignment particularly difficult because they drive task-assignment policies in opposite directions.
Task assignment is the process of assigning tasks, dynamically or statically, to processors in a distributed system. Past research has treated various techniques for such assignment [2], [3], [5]-[7], [9]-[12]. In this paper we derive an optimal policy for task assignment for a random-graph model of a distributed program. The model of the distributed-computer system assumes that the total communication bandwidth among processors is a fixed constant, and that the time attributed to communication delays is proportional to the total amount of data exchanged among processors. This model is quite reasonable for distributed systems that share a common communication bus such as ETHERNET [8] or a common communications ring such as the proposed IEEE Standard 802.4. The model is less acceptable for a distributed system that contains internal point-to-point connections. By aggregating bandwidth and aggregating traffic in order to estimate communication delays, the model fails to account for additional delays in a point-to-point communications structure due to nonuniform traffic patterns, which tend to add delays on some links while other links are lightly loaded.

Manuscript received January 31, 1983; revised March 6, 1984. This work was supported in part by the National Science Foundation under Grant MCS-78-05295.

B. Indurkhya was with the Department of Computer and Information Science, University of Massachusetts, Amherst, MA 01003. He is now with the Department of Computer Science, Boston University, Boston, MA 02215.

H. S. Stone was with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003. He is now with the IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598.

L. Xi-Cheng was with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003. He is now with the Changsha Institute, Changsha, People's Republic of China.

IEEE Log Number 8404940.
Distributed programs are modeled by random graphs. We do not claim that a random graph is an accurate model of a distributed program. In fact, the random graphs used in this paper do not exhibit locality of reference. A program module is represented by a node in the random graph, and its traffic with other modules is represented by arcs connecting that node with other nodes. In our model, each node is like every other node, and communication between any pair of nodes is as likely as between any other pair of nodes. This model is tractable, and we can obtain an optimal policy for distributing tasks by working with the model. Although real distributed programs are not accurately characterized by our model, we conjecture that program locality forces sets of modules to be assigned together into "super modules," and that the random-graph model may be a more acceptable characterization of the super modules of a program where it is not acceptable for the modules themselves.
The results obtained in the body of this paper are somewhat surprising. The effects of parallel execution tend to force tasks to be distributed evenly, while the effects of communication overhead tend to force tasks to be assigned to as few processors as possible. The two effects taken together result in a policy that is discontinuous. For two processors, the optimal policy assigns tasks to the two processors in such a way that either both the processors are kept busy all the time or only one of the processors is doing all the tasks, depending on the relative amount of computation and communication. There is no intermediate policy that allows one machine to do more work than the other unless the heavier loaded machine does all the work. If both the processors are identical, then this results in a policy that either distributes the tasks evenly between the two processors or assigns all of them to any one of the processors. If the processors have different processing speeds, then this policy dictates that either the processors are assigned tasks inversely proportional to their speeds, i.e., the slower processor is assigned fewer tasks and vice versa, or all the tasks are assigned to the faster processor.
For P processors, for small communication times, the optimal policy distributes the tasks in a way that maximizes the use of all P processors, i.e., all the processors are kept busy all the time. If the communication overhead is high, then the optimal policy is to place all tasks on one machine. As for two processors, either the work is distributed as evenly as possible or is not distributed at all.
Section II of this paper provides background on the problem and describes the computational model. The optimal policy is derived in Section III. The central limit theorem [4, pp. 243-246] is the key tool that simplifies the problem and yields a sharp statement of the optimal policy. In Section IV we extend our model so that different processors can have different processing speeds, and we derive an optimal scheduling policy for this more general case. Section V concludes the paper with a discussion of how the results can be used and lists other related open research questions.
II. BACKGROUND AND COMPUTATIONAL MODEL
Distributed-computer systems are computer systems that consist of multiple processors connected together through a high-bandwidth communication link. The communication link provides a means for each processor to access data or subroutines on remote processors, albeit with a much greater delay and lower net bandwidth than access to local data and subroutines. Typical systems exhibit communication bandwidths from 10 to 100 times lower than memory bandwidths. Distributed computers support distributed execution of programs in the sense that each program is divided into tasks, and each task is assigned to a processor, not all of which are the same processor. Hence, a distributed program can exhibit parallel execution, and therefore is potentially faster to execute than the same program executed on a single processor. If the communications bandwidth is very low compared to the memory bandwidth, say more than 100 times lower, the cost for communications among tasks becomes very high and negates gains attributed to parallel execution. When bandwidths are sufficiently low, parallel execution is of no benefit in terms of reducing total execution time. Nevertheless, such systems are still useful for purposes of reliability or for supporting multiple access to shared pools of data. On the other hand, when communication bandwidths are very high and approach the bandwidth of local memory, it becomes feasible not only to distribute tasks across processors, but to reassign tasks dynamically to balance loads on the processors.
The purpose of this paper is to quantify the general statements made here. We derive an optimal policy for task distribution that is a function of the relative costs attributed to computation time and communication delays. To derive this policy, we first present our model of a distributed computer and distributed program.
A distributed computer system consists of:

1) P identical processors,¹ and

2) a communications network that has a maximum aggregate bandwidth B.

Programs are broken into tasks, which in turn are assigned to processors in the network. The tasks communicate by means of messages exchanged over the communications network.
A random-graph model of a distributed program models a program as a graph with N weighted nodes, each node representing a task of the program with the node weight representing the running time, and a collection of weighted arcs, where an arc ⟨x, y⟩ that connects nodes x and y with weight w represents w bits of data that are interchanged between x and y during the running of the program. The graph is a random graph in that we assume the following:
1) an arc is present in the graph with probability p or missing with probability (1 − p);

2) the communication times, represented by the weights of the arcs that are present, are independent and identically distributed nonnegative random variables with any probability density P_c that has a finite variance σ_c² and mean C; and

3) the running times of the tasks are independent, identically distributed nonnegative random variables, whose probability density P_r has finite variance σ_r² and mean R.
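As an illustration of this model, the following Python sketch draws a program graph according to assumptions 1)-3). It is not part of the original paper; the exponential densities, the function name, and the parameter names are our own illustrative assumptions (the model only requires nonnegative i.i.d. weights with finite variance and the stated means).

```python
import random

def random_program_graph(N, p, mean_R, mean_C, seed=0):
    """Draw a random program graph per assumptions 1)-3).

    Returns (run_times, comm), where run_times[i] is the runtime of
    task i and comm[i][j] is the communication time between tasks i
    and j (0.0 if the arc is absent).
    """
    rng = random.Random(seed)
    # Assumption 3): i.i.d. nonnegative runtimes with mean R.
    run_times = [rng.expovariate(1.0 / mean_R) for _ in range(N)]
    comm = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            # Assumption 1): each arc is present with probability p.
            if rng.random() < p:
                # Assumption 2): i.i.d. nonnegative weights with mean C.
                w = rng.expovariate(1.0 / mean_C)
                comm[i][j] = comm[j][i] = w
    return run_times, comm
```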
This model is similar to the random-graph model studied by Stone and Benard [1]. The major difference is that in the former research no two processors execute concurrently on one graph program; parallelism is achieved strictly through multiprogramming. In the present research, parallel execution is presumed to be possible, and we attempt to exploit parallelism to minimize total execution time.
The objective of a task assignment policy is to take advantage of parallelism to reduce total execution time. A task assignment is simply an assignment of each node of the program graph to one of the P processors. Every node is assigned to some processor, and no node is assigned to more than one processor by the assignments considered in this paper. Execution times differ from assignment to assignment and are a function of task runtimes and processor-to-processor communication times. We model the execution time of an assignment by summing all communication times from processor to processor and adding this sum to the execution time of the busiest processor.

¹We will extend this model to include the case when the processors are not identical in Section IV.

The algorithm for computing the execution time of an assignment is the following:
1) For each arc (x_i, x_j), let d_ij be defined by

$$d_{ij} = \begin{cases} 0, & \text{if } x_i \text{ and } x_j \text{ are assigned to the same processor} \\ 1, & \text{otherwise.} \end{cases}$$

2) Compute the communication time from the formula

$$T_c = \sum_{i=1}^{N} \sum_{j=1}^{N} d_{ij}\, c_{ij}$$

where c_ij is the cost of communication between the ith and jth modules.

3) Variable e_ij denotes task assignment and is defined by

$$e_{ij} = \begin{cases} 0, & \text{if processor } i \text{ does not execute task } j \\ 1, & \text{otherwise.} \end{cases}$$

4) Compute the task runtime from the formula

$$T_r = \max_i \left[ \sum_{j=1}^{N} e_{ij}\, r_j \right]$$

where r_j is the time it takes to run the jth module. Thus, T_r is the execution time of the busiest processor.

5) The execution time of the assignment is the sum

$$T_e = T_r + T_c.$$
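The five steps above translate directly into code. The following sketch is ours, not the paper's; it assumes the data layout of the generator sketched in the previous subsection (a runtime list and a symmetric communication matrix), and it counts each arc weight once per unordered pair, which matches the totals in Fig. 1 below.

```python
def execution_time(run_times, comm, assign, P):
    """Steps 1)-5): execution time T_e = T_r + T_c of an assignment.

    assign[j] is the processor (0..P-1) that executes task j.
    """
    N = len(run_times)
    # Steps 1)-2): sum c_ij over arcs whose endpoints are on
    # different processors (d_ij = 1); each unordered pair once.
    T_c = sum(comm[i][j]
              for i in range(N) for j in range(i + 1, N)
              if assign[i] != assign[j])
    # Steps 3)-4): T_r is the load of the busiest processor.
    load = [0.0] * P
    for j, p in enumerate(assign):
        load[p] += run_times[j]
    T_r = max(load)
    # Step 5).
    return T_r + T_c

# Fig. 1's assignment, with tasks A..F numbered 0..5: A, B, D on
# processor 0 and C, E, F on processor 1 would be
# execution_time(run_times, comm, [0, 0, 1, 0, 1, 1], 2).
```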
Fig. 1 shows an example of the model. In this example there are two processors, P1 and P2, and six tasks. The tasks are represented as circles, and the numbers inside the circles represent their run times. Communication times between any two tasks are represented by labeling the corresponding arc. Thus, for instance, Task A has a runtime of 6 units, and the communication time between Tasks A and B is 7 units. The arcs with zero weight are not drawn, for clarity. In the assignment shown, Tasks A, B, and D are assigned to Processor 1 and Tasks C, E, and F are assigned to Processor 2. The total execution time given in the figure is the sum of the communication times between the processors and the time to execute Tasks A, B, and D, since Processor 1 is the busiest one.
Consider the execution time function and observe how it models a distributed computer. Communication costs are nonzero when a task communicates over a link to another task. They are zero otherwise. Therefore, the communication times in this model reflect the time required to communicate over a link that is in excess of the time required to communicate within a single computer. This excess includes the overhead required to initiate and terminate a message, as well as transmission delays due to the finite bandwidth of a communication link.
Fig. 1. An example of task assignment. [Figure: Tasks A, B, and D on Processor P1 and Tasks C, E, and F on Processor P2, with Tr = 17, Tc = 12, and Te = 29.]

Step 2 adds all communication times together. This is an accurate model when all communications use a common link such as a shared bus or a communications ring that acts like a bus. It is not accurate when the communications system has independent point-to-point links. A system with several links has a potentially large aggregate bandwidth, but at specific times during a computation it can suffer from saturation of relatively few links. Our model treats such a system as if all links were aggregated into a single link, and as if all messages were transmitted over that single link. Therefore, it presents an overly optimistic view of communication times because it does not account for saturation of individual links.
The estimate for the running time of tasks is an optimistic estimate because the model does not account for delays attributed to synchronization among tasks. In our model no task has to wait for any other task. In reality, waiting may occur rather frequently among a set of cooperating tasks.
Finally, the timing estimate simply adds the communication delays to the running times, and does not account for situations in which communication occurs in parallel with task execution. Consider, for example, a situation in which Task A must send to and receive from Task B. Our model assumes that Task A expends time to send a message to B, then expends time computing, and then expends time receiving a return message from B. In reality, Task A may execute concurrently with the sending and receiving of messages from Task B. Hence, this inaccuracy of the model tends to overestimate execution times and therefore tends to compensate for the inaccuracies mentioned above. In fact, in the present example, assume Task B must wait for the message from Task A before it computes, and that it runs to completion before it returns a message to Task A. Then the time expended by invoking Task B is the sum of its communication and runtimes. If the runtime of Task B is greater than that of Task A, then the actual execution time is the same as our estimated execution time. If, on the other hand, the runtime of Task A is greater than that of Task B, then our model overestimates execution times. At worst, our model adds communication times that may actually be performed fully in parallel with the execution of Task A.
Fig. 2. An assignment of N tasks between two processors. [Figure: k tasks on processor P1 and (N − k) tasks on P2, joined by k(N − k) communication links.]
III. AN OPTIMAL TASK-ASSIGNMENT POLICY
This section derives an optimal task-assignment policy for the processor and program models developed in the previous section. The policy for a two-processor system is somewhat surprising in that it selects one of the following two assignments:

1) tasks are evenly distributed between the two processors, or

2) all tasks are placed on one processor.

There are no assignments that distribute tasks unevenly between the two processors. The choice depends on the ratio of running time on the processors to communication delays. For small communication times, the policy distributes tasks evenly. For larger communication times, the policy concentrates the tasks on a single processor. Thus, the policy exhibits a sharp change as communication times increase from negligible to large. For three or more processors, the optimal task policy again distributes the tasks evenly among all the processors for small communication times and assigns all the tasks to one of the processors for higher communication times. The point at which the policy exhibits a sharp change of behavior depends on the ratio of communication times to runtimes.
A. Optimal Task Assignment for Two Processors
Fig. 2 shows a model of a program to be assigned to two processors, P1 and P2. The program consists of N tasks and is modeled as a random graph as per our description in the previous section. The tasks are represented as circles, and the numbers inside the circles represent their indexes. Assume that processor P1 is assigned k tasks and processor P2 is assigned the remaining (N − k) tasks. For convenience, the tasks are indexed so that the first k tasks are assigned to P1 and the remaining (N − k) tasks are assigned to P2. Likewise, the k(N − k) communication links between the two processors, not shown in the figure explicitly, are arbitrarily numbered from 1 to k(N − k). We let r_i denote the runtime of the ith task, and c_i denote the ith communication time. Then, according to our model, the total execution time is given by:

$$T_e = \max\left[\sum_{i=1}^{k} r_i,\ \sum_{i=k+1}^{N} r_i\right] + \sum_{i=1}^{k(N-k)} c_i. \tag{1}$$

Let T_m be the expected value of the execution time T_e. Then, taking the mean of both sides yields:

$$T_m = E(T_e) = E\left[\max\left(\sum_{i=1}^{k} r_i,\ \sum_{i=k+1}^{N} r_i\right)\right] + E\left[\sum_{i=1}^{k(N-k)} c_i\right] \tag{2}$$
where E denotes the mean, or expected value. In order to simplify the above equation by eliminating the MAX operator, we introduce the following approximation at this point:

Approximation 1 (A1):

$$E\left[\max\left(\sum_{i=1}^{k} r_i,\ \sum_{i=k+1}^{N} r_i\right)\right] = \max\left(E\left[\sum_{i=1}^{k} r_i\right],\ E\left[\sum_{i=k+1}^{N} r_i\right]\right). \tag{3}$$
At the end of this section we will consider how realistic this approximation A1 is and discuss the error that may be introduced due to A1.
According to our assumptions, the variables r_i are independent and identically distributed. By the central limit theorem, the sum of such variables has a distribution that approaches a normal distribution with a mean equal to the sum of the individual means and with a variance equal to the sum of the individual variances. Similarly, the variables c_i are independent and identically distributed, and hence the central limit theorem is also applicable to their sum. This leads to the following theorem.
Theorem 1: Under the assumption A1, the optimal task-assignment policy, i.e., the one that minimizes T_m, is given as follows:

$$k = \begin{cases} \dfrac{N}{2}, & \text{if } \dfrac{N}{2} \le \dfrac{R}{C} \\[2mm] 0 \text{ or } N, & \text{otherwise} \end{cases}$$

where R denotes the mean run time of a module and C denotes the mean communication time between two modules.
Proof: From (2) above and using A1 we can write it as:

$$T_m(k) = \max\left(E\left[\sum_{i=1}^{k} r_i\right],\ E\left[\sum_{i=k+1}^{N} r_i\right]\right) + E\left[\sum_{i=1}^{k(N-k)} c_i\right]. \tag{4}$$

At this point we can invoke the central limit theorem and replace the sums of the random variables r_i and c_i by the sums of their means, viz. R and C, respectively. This gives us:

$$T_m(k) = \max\{kR,\ (N-k)R\} + k(N-k)C. \tag{5}$$
Without loss of generality, let us suppose that N ≥ k ≥ N/2. Then the above equation reduces to:

$$T_m(k) = kR + k(N-k)C = C\left[k(N-k) + k\,\frac{R}{C}\right] = C\left[-\left(k - \frac{1}{2}\left(N + \frac{R}{C}\right)\right)^2 + \frac{1}{4}\left(N + \frac{R}{C}\right)^2\right]. \tag{6}$$

This expression for T_m is a parabolic function of k that has only one critical point (at k = ½(N + R/C)), which is a maximum. Therefore, the minimum value must occur at one of the extremities, viz. at k = N or at k = N/2.

At k = N/2 we have:

$$T_m\left(\frac{N}{2}\right) = \frac{N}{2}R + \frac{N^2}{4}C. \tag{7}$$

At k = N we have:

$$T_m(N) = NR. \tag{8}$$

In order that T_m(N/2) ≤ T_m(N) we must have:

$$\frac{N}{2}R + \frac{N^2}{4}C \le NR,$$

or

$$\frac{N}{2} \le \frac{R}{C}. \tag{9}$$

Thus the optimal assignment policy must assign the N tasks evenly between the two processors if N/2 ≤ R/C, and assign all N tasks to one of the processors otherwise. Q.E.D.

To understand (6) intuitively, we show in Fig. 3 superimposed plots of the two terms in the equation for two different values of R and C. Note that the effect of adding a linear (runtime) function to a parabolic (communication-time) function is to create a different parabola. The minimum must be at one of the end points k = N/2 and k = N, depending on whether the runtime at k = N is small or large.

Fig. 3. Two graphs showing the composition of the terms in the task-assignment function. [Figure: the runtime term Tr and the communication term Tc plotted against k, for k between N/2 and N.]
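Theorem 1 is easy to check numerically. The sketch below, which is not part of the original paper, sweeps all splits k of equation (5) and reports the minimizer; the parameter values are arbitrary illustrative choices.

```python
def mean_execution_time(k, N, R, C):
    # Equation (5): T_m(k) = max{kR, (N - k)R} + k(N - k)C.
    return max(k * R, (N - k) * R) + k * (N - k) * C

def best_split(N, R, C):
    # Exhaustive sweep over k = 0..N.
    return min(range(N + 1), key=lambda k: mean_execution_time(k, N, R, C))

N = 20
for R, C in [(10.0, 0.1), (10.0, 5.0)]:
    k = best_split(N, R, C)
    # Theorem 1: k = N/2 when N/2 <= R/C; k = 0 or N otherwise
    # (0 and N are symmetric: all tasks on a single processor).
    print(f"R/C = {R / C:5.1f}, N/2 = {N / 2}: optimal k = {k}")
```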
B. Optimal Task Assignment for P Processors

Now we extend our optimal assignment policy to include the cases when there are P identical processors and N tasks. Consider an assignment that assigns k_1 tasks to the 1st processor, k_2 tasks to the 2nd processor, ..., and the remaining k_P tasks to the Pth processor. For simplicity, let us assume that the first k_1 tasks are assigned to Processor 1, the next k_2 tasks are assigned to Processor 2, and so on. Thus, following the algorithm presented in the last section, the runtime of this assignment is given by

$$T_r = \max\left[\sum_{i=1}^{k_1} r_i,\ \sum_{i=k_1+1}^{k_1+k_2} r_i,\ \ldots,\ \sum_{i=N-k_P+1}^{N} r_i\right]. \tag{10}$$
In order to compute the communication time, consider the k_j tasks that are assigned to the jth processor. Each of these tasks communicates with the remaining (N − k_j) tasks that are assigned to different processors. Thus, there will be k_j(N − k_j) communication links running from the jth processor to the other processors. Let these links be arbitrarily numbered from 1 to k_j(N − k_j), and let c_ij denote the ith link (the second subscript j is for the processor). Then the cost of communication between the jth processor and the other processors is given by:
$$C_j = \sum_{i=1}^{k_j(N-k_j)} c_{ij}. \tag{11}$$
In order to find the total communication cost of this assignment, we can sum the costs of communication of every processor, except that every link is counted twice in the process of summing. Therefore, the communication cost is given by:

$$T_c = \frac{1}{2} \sum_{j=1}^{P} C_j. \tag{12}$$

As before, the mean execution time can be written as:

$$T_m = E[T_r] + E[T_c]. \tag{13}$$

In order to remove the MAX operation from the expression for T_r (10), we make another approximation at this point, which is a generalization of A1.

Approximation 2 (A2):

$$E\left[\max\left(\sum_{i=1}^{k_1} r_i,\ \ldots,\ \sum_{i=N-k_P+1}^{N} r_i\right)\right] = \max\left(E\left[\sum_{i=1}^{k_1} r_i\right],\ \ldots,\ E\left[\sum_{i=N-k_P+1}^{N} r_i\right]\right). \tag{14}$$

We will discuss the justification for this approximation at the end of this section.

With this approximation, and using the assumption that the variables r_i and c_i are independent and identically distributed (so that the central limit theorem is applicable to their sums), we can simplify the expression for the mean execution time T_m as follows. From (10) above and using A2 we can write the mean run time E[T_r] as:

$$E[T_r] = \max\left(E\left[\sum_{i=1}^{k_1} r_i\right],\ E\left[\sum_{i=k_1+1}^{k_1+k_2} r_i\right],\ \ldots,\ E\left[\sum_{i=N-k_P+1}^{N} r_i\right]\right). \tag{15}$$

Using the central limit theorem this can be written as:

$$E[T_r] = \max\{k_1R,\ k_2R,\ \ldots,\ k_PR\}. \tag{16}$$

Without loss of generality, let us suppose that

$$k_1 = \max\{k_1,\ k_2,\ \ldots,\ k_P\}, \tag{17}$$

which entails that N ≥ k_1 ≥ N/P. Similarly, the expression for E[T_c] can, by using the central limit theorem, be simplified to

$$E[T_c] = \frac{1}{2}\left[k_1(N-k_1) + k_2(N-k_2) + \cdots + k_P(N-k_P)\right]C. \tag{18}$$

Using the fact that k_1 + k_2 + ⋯ + k_P = N, we can write the above as:

$$E[T_c] = \frac{1}{2}\left[N^2 - k_1^2 - k_2^2 - \cdots - k_P^2\right]C. \tag{19}$$

Combining (13), (16), (17), and (19), we can write the mean time for executing the task assignment, viz. T_m, as:

$$T_m(k_1, k_2, \ldots, k_P) = k_1R + \frac{C}{2}\left[N^2 - k_1^2 - k_2^2 - \cdots - k_P^2\right]. \tag{20}$$

From (20) we are easily led to the following properties of task assignments. (Note: All results obtained in the rest of this section are based on (20), which assumes the validity of approximation A2. In order to avoid mundane repetition we will not mention it explicitly each time we state a result or a theorem.)

Theorem 2: Under the constraint that a definite number of tasks, say m (m ≤ N), are to be assigned to a processor, and no other processor is to be assigned more than m tasks (i.e., m ≥ N/P), the optimal task assignment takes the following form:

1) Let I be the greatest integer such that mI ≤ N. Then assign m tasks to each of I processors. These I processors will include the processor that is constrained to have m tasks assigned to it.

2) The remaining (N − mI) tasks are assigned to the (I + 1)th processor.

3) The remaining (P − I − 1) processors are left idle.

A consequence of the above policy is that there is at most one processor which is assigned fewer than m tasks and more than zero.

Proof: We will prove this result by contradiction. Let us fix k_1 to be m, which satisfies the constraints of the theorem [by (17)]. Let TA1 be an optimal task assignment that results in two processors being assigned fewer than m and more than zero tasks. Without loss of generality, let them be Processors 2 and 3, so that we have m > k_2 > 0 and m > k_3 > 0. Also suppose that k_2 ≥ k_3. Now consider a task assignment TA2 that assigns (k_2 + 1) tasks to Processor 2 and (k_3 − 1) tasks to Processor 3. All other processors are assigned the same number of tasks as in TA1. Obviously, TA2 satisfies all the constraints mentioned in the theorem. Using (20) we can write the mean execution time of TA2 as follows:

$$T_m(k_1, k_2+1, k_3-1, \ldots, k_P) = k_1R + \frac{C}{2}\left[N^2 - k_1^2 - (k_2+1)^2 - (k_3-1)^2 - \cdots - k_P^2\right]$$
$$= k_1R + \frac{C}{2}\left[N^2 - k_1^2 - k_2^2 - \cdots - k_P^2\right] - \frac{C}{2}\left[2(k_2 - k_3) + 2\right]$$
$$= T_m(k_1, k_2, k_3, \ldots, k_P) - C\left[k_2 - k_3 + 1\right]. \tag{21}$$

Since k_2 ≥ k_3, the second term in the above equation is always positive, i.e., TA2 has less execution time than TA1. Therefore, TA1 is not optimal, which is a contradiction. Q.E.D.
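The policy of Theorem 2 is simple to express in code. The following sketch is ours, not the paper's; it ignores the integrality issues discussed later, and all names are illustrative.

```python
def theorem2_assignment(N, m, P):
    """Fill processors with m tasks each until N tasks are placed.

    Per Theorem 2, at most one processor receives a partial share
    strictly between 0 and m; the rest get exactly m or stay idle.
    """
    assert m * P >= N, "feasibility requires m >= N/P"
    counts, remaining = [], N
    for _ in range(P):
        take = min(m, remaining)
        counts.append(take)
        remaining -= take
    return counts

print(theorem2_assignment(10, 4, 4))  # -> [4, 4, 2, 0]
```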
This theorem suggests to us that an optimal policy will select some M ≤ P and assign tasks among all M processors as evenly as possible. One immediate question it raises is: how does one choose M? It may seem intuitive that as the ratio of runtime to communication time increases, M will increase gradually from 1 to P. Surprisingly enough, our next theorem shows that the optimal policy results in selecting an extremal value of M, viz. 1 or P, depending on the runtimes and communication times.

Theorem 3: Under the constraint that all the processors that are assigned nonzero tasks receive exactly the same number of tasks, the optimal task assignment takes the following form:

1) If R/C ≥ N/2, then distribute the N tasks among all the P processors.

2) If R/C < N/2, then assign all tasks to one of the processors.

Proof: Let M < P and consider two task assignments, TA1 and TA2. In TA1 we assign all N tasks equally among M processors, so that each of these M processors gets N/M tasks. In TA2 we distribute the tasks evenly among (M + 1) processors, so that each of them gets assigned N/(M + 1) tasks. Let us represent the mean execution times of these assignments by T_m(M) and T_m(M + 1), respectively. Then by using (20) we get:

$$T_m(M) = \frac{N}{M}R + \frac{C}{2}\left[N^2 - M\left(\frac{N}{M}\right)^2\right] = \frac{N}{M}R + \frac{C}{2}N^2\,\frac{M-1}{M}. \tag{22}$$

Similarly, for T_m(M + 1) we have:

$$T_m(M+1) = \frac{N}{M+1}R + \frac{C}{2}N^2\,\frac{M}{M+1}. \tag{23}$$

In order that T_m(M + 1) ≤ T_m(M) we must have:

$$\frac{N}{M+1}R + \frac{C}{2}N^2\,\frac{M}{M+1} \le \frac{N}{M}R + \frac{C}{2}N^2\,\frac{M-1}{M},$$

or

$$\frac{C}{2}N\,\frac{1}{M(M+1)} \le R\,\frac{1}{M(M+1)},$$

or

$$\frac{N}{2} \le \frac{R}{C}. \tag{24}$$

This shows that when N/2 ≤ R/C, then assigning tasks among (M + 1) processors results in lower execution time than assigning them among M processors. Applying this argument inductively to all values of M less than P, we reach the conclusion that the minimal execution time is obtained by assigning the tasks equally among all P processors. Similarly, for N/2 ≥ R/C, distributing tasks among fewer processors gives less execution time, and therefore the optimal policy will result in assigning all the tasks to one of the processors. Q.E.D.

Although the result of this theorem may seem surprising at first glance, it will not be so if we pause for a moment and analyze (20). The expression for T_m depends on P variables, but we need study its behavior only with respect to k_1, since k_2, k_3, ..., k_P are chosen according to Theorem 2. We note that (20) has only one critical point, which is a maximum, and therefore the minimum value of T_m will occur at an extremal value of k_1. The result of the theorem follows directly from this observation.

Thus, Theorem 3 suggests that the optimal policy chooses for k_1 either N/P, the minimum value, or N, the maximum value. In practice, the minimum value assignment may not be possible, since N may not be divisible by P. In this case, the smallest integer value for k_1 will be ⌈N/P⌉. In order to find the point at which the optimal policy changes behavior, we need to compare the assignment that assigns all N tasks to the first processor with the one that assigns ⌈N/P⌉ tasks to the first processor and assigns tasks to all other processors according to Theorem 2. This breakpoint will differ slightly from N/2; the exact deviation will depend on the values of N and P. If we do not want to compute the exact breakpoint at which the optimal policy changes behavior, we can simply decide to distribute tasks among all P processors whenever R/C ≥ N/2, in which case we will have either an optimal assignment (if P divides N) or an assignment very close to optimal.
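Theorem 3's all-or-nothing character can also be verified by brute force against equation (20). This check is ours, not the paper's; it enumerates every partition of N tasks over P processors for small instances with arbitrary parameters.

```python
from itertools import product

def mean_time(ks, N, R, C):
    # Equation (20): T_m = k1*R + (C/2)(N^2 - sum k_i^2), k1 the largest.
    return max(ks) * R + 0.5 * C * (N * N - sum(k * k for k in ks))

def optimal_counts(N, P, R, C):
    """Exhaustive search over all (k_1, ..., k_P) with sum N (small cases)."""
    candidates = (ks for ks in product(range(N + 1), repeat=P)
                  if sum(ks) == N)
    return min(candidates, key=lambda ks: mean_time(ks, N, R, C))

print(optimal_counts(9, 3, 10.0, 1.0))  # R/C = 10 >= N/2 -> (3, 3, 3)
print(optimal_counts(9, 3, 1.0, 1.0))   # R/C = 1 < N/2 -> all 9 on one processor
```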
C. Remarks on Approximations A1 and A2

Earlier we introduced the approximations A1 and A2 in order to remove the MAX operation from the expression for the mean execution time of the task assignment. In this section we discuss the justification for these approximations and derive upper bounds on the error that may be introduced due to them. Let us consider A1 first. Let us denote Σ_{i=1}^{k} r_i by R_1, and Σ_{i=k+1}^{N} r_i by R_2, for convenience. From the central limit theorem, R_1 and R_2 are random variables which can be approximated by normal distributions. Now, given that the mean of R_1 is larger than the mean of R_2, what A1 states is that if we choose two random samples from R_1 and R_2, respectively, then the former is always greater than the latter. Apparently this is not always true, as depicted in Fig. 4. The figure shows two normal density functions that describe R_1 and R_2, respectively, and a normal density function that describes their difference. The shaded area in the density of the difference gives the probability of error, in that the sample from R_2 is larger than the sample from R_1 in this region. One can immediately see from the figure that as the separation between the mean values of R_1 and R_2 increases, the shaded area becomes smaller, and therefore the probability of error becomes smaller too. On the other hand, if the means are very close together, then the magnitude of the error will be very small. We derive these results analytically in the remainder of this section.

Fig. 4. (a) Two normal density functions. (b) The probability density function for the difference of the functions in (a). [Figure: the densities of R1 and R2 peak at kR and (N − k)R; the density of their difference peaks at (2k − N)R, with the area below zero shaded.]

By the central limit theorem, R_1 is a random variable whose distribution approaches a normal distribution with mean kR and variance kσ_r²; R_2 is a random variable whose distribution approaches a normal distribution with mean (N − k)R and variance (N − k)σ_r².
Let us define a variable Z to be R_1 − R_2. If R_1 and R_2 are normal, then Z also has a normal distribution, with mean (2k − N)R and variance Nσ_r², which we denote by μ_Z and σ_Z², respectively. Then the area under the shaded portion is given by:

$$\Pr(R_1 < R_2) = \Pr(Z < 0) = \frac{1}{\sqrt{2\pi}\,\sigma_Z}\int_{-\infty}^{0}\exp\left(-\frac{(z-\mu_Z)^2}{2\sigma_Z^2}\right)dz. \tag{25}$$

Let u = (z − μ_Z)/σ_Z, so that dz = σ_Z du, and let x = μ_Z/σ_Z; then we get:

$$\Pr(R_1 < R_2) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{-x}\exp\left(-\frac{u^2}{2}\right)du = \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty}\exp\left(-\frac{u^2}{2}\right)du. \tag{26}$$

This integral is bounded by the inequality [4, p. 175]

$$\frac{1}{\sqrt{2\pi}}\left(\frac{1}{x}-\frac{1}{x^3}\right)\exp\left(-\frac{x^2}{2}\right) < \frac{1}{\sqrt{2\pi}}\int_{x}^{\infty}\exp\left(-\frac{u^2}{2}\right)du < \frac{1}{\sqrt{2\pi}\,x}\exp\left(-\frac{x^2}{2}\right). \tag{27}$$

If the mean μ_Z is more than σ_Z, i.e., x > 1, then the upper bound on the probability of error is no more than (1/√(2π)) exp(−1/2), and it becomes smaller as x increases further. On the other hand, if the mean μ_Z is less than the standard deviation σ_Z, i.e., x < 1, then we can compute the upper bound on the magnitude of the error as follows.

The absolute value of the error is μ_Z = (2k − N)R. Therefore the relative error will be μ_Z/(R max{k, N − k}), which is (2k − N)/k, or (2 − N/k). Let us denote the relative error by ε. Since as k decreases, ε decreases too, in order to derive an upper bound on ε we must first find an upper bound on k. Knowing that x < 1, we can derive an upper bound on k as follows:

$$x = \frac{(2k-N)R}{\sqrt{N}\,\sigma_r} < 1,$$

or

$$(2k-N)R < \sqrt{N}\,\sigma_r,$$

or

$$k < \frac{N}{2} + \frac{\sqrt{N}}{2}\cdot\frac{\sigma_r}{R}. \tag{28}$$

This gives us the following bound on ε:

$$\varepsilon < 2 - \frac{N}{\dfrac{N}{2} + \dfrac{\sqrt{N}}{2}\dfrac{\sigma_r}{R}} = 2 - \frac{2}{1 + \dfrac{\sigma_r}{\sqrt{N}\,R}}. \tag{29}$$

Since 1/(1 + x) > 1 − x for x > 0, this last inequality can be rewritten as

$$\varepsilon < \frac{2\sigma_r}{\sqrt{N}\,R}, \tag{30}$$

which goes to 0 as N becomes large. This shows that as N becomes large, the bound on the error due to approximation A1 goes to 0.

Similar observations can be applied to approximation A2. Let R_i represent the summation term Σ r_i over the k_i tasks assigned to the ith processor. Then by the central limit theorem the distribution of R_i approaches a normal distribution with mean k_iR and variance k_iσ_r². Given that the mean of R_1 is the greatest, A2 states that if we pick random samples from R_1, ..., R_P, then the one picked from R_1 will always be the greatest. The remarks we made earlier apply to R_1 and any R_i, for i > 1, pairwise, in that either the probability of the sample from R_i being greater than the sample from R_1 is very small, or the means of R_1 and R_i are so close together that the relative error due to A2 is very small.
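A quick Monte Carlo experiment, ours rather than the paper's, illustrates the spirit of bound (30): the observed relative gap between E[max(R1, R2)] and the larger of the two means stays below 2σ_r/(√N R) and shrinks as N grows. Exponential runtimes are an arbitrary illustrative choice.

```python
import random

def a1_error(N, k, mean_R, trials=20000, seed=1):
    """Relative error of A1: compare E[max(R1, R2)] with max of the means."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        r = [rng.expovariate(1.0 / mean_R) for _ in range(N)]
        acc += max(sum(r[:k]), sum(r[k:]))
    exact = acc / trials                # Monte Carlo estimate of E[max(R1, R2)]
    approx = max(k, N - k) * mean_R     # A1 replaces it by max(E[R1], E[R2])
    return (exact - approx) / exact

mean_R = sigma_r = 1.0                  # exponential with mean 1 has sigma_r = 1
for N in (20, 80, 320):
    k = N // 2 + 1                      # nearly equal means: the hardest case
    bound = 2 * sigma_r / (N ** 0.5 * mean_R)   # bound (30)
    print(N, round(a1_error(N, k, mean_R), 4), "<", round(bound, 4))
```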
IV. AN EXTENSION OF THE MODEL
In this section we extend the model described in Section II to include the cases when different processors have different processing speeds, and we derive some results on the nature of the optimal task-assignment policy. The policy for a two-processor system either assigns all the tasks to the faster processor or divides them between the processors in a ratio inversely proportional to their processing speeds, i.e., the product of processing speed and the number of tasks assigned is kept the same for both processors. The choice depends on the ratio of running time on the processors to communication times, as well as on the ratio of the processing speeds of the two processors. If the communication times are small compared to the running time, then the optimal policy tends to distribute the tasks among both processors, and for relatively large communication times it favors assigning all the tasks to the faster processor. Also, if the total runtime is kept constant, then for a large difference in processing speeds of the two processors the optimal policy assigns all tasks to the faster processor, and for a small difference it tends to distribute the tasks among both processors. If the policy favors distribution of tasks at all, then it always distributes the tasks among the processors in a ratio inversely proportional to their processing speeds.

For three or more processors, the optimal policy selects some M ≤ P, where P is the total number of processors available, and distributes the tasks among these M processors in such a way that the number of tasks assigned to a processor is inversely proportional to its processing speed. In other words, the cost of execution is equally distributed among all the M processors. We also derive an expression for the condition under which the optimal policy tends to distribute the tasks among M + 1 processors rather than M processors. In the special case when there is a smooth degradation of processing speed among all P processors, this expression turns out to be independent of M, a consequence of which is that the optimal policy favors one of the extremal assignments: either it distributes the tasks among all P processors or it assigns them all to the fastest processor. The point at which the policy changes behavior depends on the ratio of runtime to communication time and on the factor by which the processing speed degrades. Large runtimes and a degradation factor close to unity favor the distribution of tasks among all P processors, whereas large communication times and large degradation factors tend to result in assigning all the tasks to the fastest processor.

Before deriving the optimal task-assignment policy for nonidentical processors, we modify the random-graph model described in Section II as follows: the running times of the tasks on a processor, say the jth processor, are independent, identically distributed, nonnegative random variables, whose probability density P_rj has finite variance σ_rj² and mean R_j.

This allows us to have different mean run times for different processors in the model. The algorithm for computing the execution time of an assignment remains the same, except that Step 4 is modified as follows:

4a) Compute the task runtime from the formula

$$T_r = \max_i\left[\sum_{j=1}^{N} e_{ij}\,r_{ij}\right]$$

where r_ij is the time it takes to run the jth module on the ith processor.

For simplicity, we will not be considering the effects of different processing speeds on the communication times in our model. In practice there may be some synchronization overheads which may increase the communication times. Also, the cost of communication between the ith and jth modules, c_ij, may depend on the processors to which these modules are assigned. We ignore this effect in our model.

A. Optimal Task Assignment for Two Processors

In this section we derive the optimal task-assignment policy for two nonidentical processors. Let the mean run time of the tasks on Processor 1 be R_1 and on Processor 2 be R_2. We define ρ to be the ratio of R_1 to R_2, i.e., ρ = R_1/R_2. Without loss of generality, suppose that ρ ≤ 1, i.e., Processor 1 is faster than Processor 2. We define C to be the mean communication time between the tasks.

Consider a task assignment that assigns k tasks to the first processor and the remaining (N − k) tasks to the second processor. For convenience, suppose that the tasks are indexed in such a way that the first k tasks are assigned to Processor 1, and that the k(N − k) communication links between the two processors are arbitrarily numbered from 1 to k(N − k). Let r_ij denote the runtime of the jth task on the ith processor, and c_i denote the ith communication time. Then the total execution time of this assignment is given by:

$$T_e = \max\left[\sum_{i=1}^{k} r_{1i},\ \sum_{i=k+1}^{N} r_{2i}\right] + \sum_{i=1}^{k(N-k)} c_i. \tag{31}$$

Taking the mean of both sides we get:

$$T_m = E(T_e) = E\left[\max\left(\sum_{i=1}^{k} r_{1i},\ \sum_{i=k+1}^{N} r_{2i}\right)\right] + E\left[\sum_{i=1}^{k(N-k)} c_i\right] \tag{32}$$

where E denotes the mean, or expected value. Since, according to our assumptions, the variables r_1i, r_2i, and c_i are independent and identically distributed, we can apply the central limit theorem to their sums. This, in conjunction with approximation A1 to remove the MAX operation, leads us to the following theorem.

Theorem 4: Under the assumption A1, the optimal task-assignment policy is given as follows:

$$k = \begin{cases} \dfrac{N}{1+\rho}, & \text{if } N \le \dfrac{R_1}{C}(1+\rho) \\[2mm] N, & \text{otherwise.} \end{cases}$$

Proof: From (32) and using A1 we can write it as:

$$T_m(k) = \max\left(E\left[\sum_{i=1}^{k} r_{1i}\right],\ E\left[\sum_{i=k+1}^{N} r_{2i}\right]\right) + E\left[\sum_{i=1}^{k(N-k)} c_i\right]. \tag{33}$$

We invoke the central limit theorem here and replace the sums of the random variables with the sums of their means.
This gives us:

$$T_m(k) = \max\{kR_1,\ (N-k)R_2\} + k(N-k)C = R_2\max\{k\rho,\ (N-k)\} + k(N-k)C. \tag{34}$$

Without loss of generality, suppose that kρ ≥ (N − k), which entails that N ≥ k ≥ N/(1 + ρ). Then the above equation reduces to:

$$T_m(k) = k\rho R_2 + k(N-k)C = C\left[k(N-k) + k\,\frac{\rho R_2}{C}\right] = \frac{C}{4}\left[-\left(2k - N - \frac{\rho R_2}{C}\right)^2 + \left(N + \frac{\rho R_2}{C}\right)^2\right]. \tag{35}$$

This equation for T_m is a parabolic function of k and has only one critical point, which is a maximum. Therefore, the minimum will occur when k takes one of the boundary values. Also, from the above equation we can conclude that the value of k that minimizes T_m is the same as the one that maximizes |2k − N − ρR_2/C|. Let us refer to this latter expression as F(k). We will now derive the conditions under which k maximizes F(k). At k = N/(1 + ρ) we have:

$$F\left(\frac{N}{1+\rho}\right) = \left|\frac{2N}{1+\rho} - N - \frac{\rho R_2}{C}\right| = \left|N\,\frac{1-\rho}{1+\rho} - \frac{\rho R_2}{C}\right|. \tag{36}$$

At k = N we have:

$$F(N) = \left|2N - N - \frac{\rho R_2}{C}\right| = \left|N - \frac{\rho R_2}{C}\right|. \tag{37}$$

In order to derive the conditions under which F(N) or F[N/(1 + ρ)] is maximum, we have to consider three cases, depending on the relative magnitudes of N, ρ, R_2, and C.

Case I: For this case we have N ≤ ρR_2/C. Then

$$F(N) = \frac{\rho R_2}{C} - N \le \frac{\rho R_2}{C} - N\,\frac{1-\rho}{1+\rho} = F\left(\frac{N}{1+\rho}\right) \tag{38}$$

since ρ ≤ 1. From this we can conclude that if N ≤ ρR_2/C, then the optimal policy assigns N/(1 + ρ) tasks to the first processor and the remaining tasks to the second processor.

Case II: For this case we have

$$N\,\frac{1-\rho}{1+\rho} \le \frac{\rho R_2}{C} \le N.$$

In order that F[N/(1 + ρ)] ≥ F(N), we must have:

$$\left(\frac{\rho R_2}{C} - N\,\frac{1-\rho}{1+\rho}\right) \ge \left(N - \frac{\rho R_2}{C}\right),$$

or

$$N \le (1+\rho)\,\frac{\rho R_2}{C},$$

or

$$N \le \frac{R_1}{C}(1+\rho). \tag{39}$$

Thus, for this case, the optimal policy assigns N/(1 + ρ) tasks to the first processor if ρR_2/C ≤ N ≤ (R_1/C)(1 + ρ). Otherwise, if (R_1/C)(1 + ρ) ≤ N ≤ (ρR_2/C)(1 + ρ)/(1 − ρ), then the optimal policy assigns all N tasks to the first processor.

Case III: For this case we have ρR_2/C ≤ N(1 − ρ)/(1 + ρ). Then

$$F(N) = \left|N - \frac{\rho R_2}{C}\right| \ge \left|N\,\frac{1-\rho}{1+\rho} - \frac{\rho R_2}{C}\right| = F\left(\frac{N}{1+\rho}\right)$$

since ρ ≤ 1. Thus the optimal policy for this case always results in assigning all N tasks to the first processor.

Combining these three cases, we get the result stated in the theorem. Q.E.D.

Comparing this result to that of Theorem 1, we notice that as ρ becomes smaller, the breakpoint at which the optimal policy changes behavior occurs at values of R_1/C higher than N/2.
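Theorem 4 can be checked against a sweep of equation (34) over integer k. The code below is our illustration, not the paper's; real task counts are integers, so the closed-form split N/(1 + ρ) is rounded, and the parameter values are arbitrary.

```python
def mean_time_hetero(k, N, R1, R2, C):
    # Equation (34): T_m(k) = max{k R1, (N - k) R2} + k(N - k)C.
    return max(k * R1, (N - k) * R2) + k * (N - k) * C

def theorem4_split(N, R1, R2, C):
    rho = R1 / R2
    if N <= (R1 / C) * (1 + rho):        # breakpoint of Theorem 4
        return round(N / (1 + rho))      # distribute; the faster gets more
    return N                             # all tasks on the faster processor

N, R1, R2 = 9, 5.0, 10.0                 # rho = 0.5, so N/(1 + rho) = 6
for C in (0.5, 5.0):
    swept = min(range(N + 1), key=lambda k: mean_time_hetero(k, N, R1, R2, C))
    print(f"C = {C}: closed form k = {theorem4_split(N, R1, R2, C)}, sweep k = {swept}")
```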
B. Optimal Task Assignment for P Processors

In this section we extend our optimal task-assignment policy to include the cases when there are P processors, not necessarily identical, and N tasks. Let the runtimes of different tasks on the ith processor have the mean R_i. The communication times between the tasks have the mean C. For brevity, we introduce the following notation in this section:

$$\gamma_i = \frac{R_1}{R_i}, \qquad 1 \le i \le P,$$

$$S_M = \sum_{i=1}^{M} \gamma_i, \tag{40}$$

$$P_M = \sum_{i=1}^{M}\ \sum_{j=i+1}^{M} \gamma_i\,\gamma_j. \tag{41}$$
Without loss of generality, suppose that the processors are ordered in the order of nonincreasing processing speeds, or that γ_1 ≥ γ_2 ≥ ⋯ ≥ γ_P.

Now consider an assignment that assigns k_i tasks to the ith processor. For simplicity, assume that the first k_1 tasks are assigned to the 1st processor, and so on. Then the runtime of this assignment is given by (10), which is

$$T_r = \max\left[\sum_{i=1}^{k_1} r_i,\ \sum_{i=k_1+1}^{k_1+k_2} r_i,\ \ldots,\ \sum_{i=N-k_P+1}^{N} r_i\right]. \tag{42}$$

Similarly, the total communication time for this assignment is given by (12), that is,

$$T_c = \frac{1}{2}\sum_{j=1}^{P} C_j \tag{43}$$

where C_j is as defined in (11).

Using (42) and approximation A2, we can write the mean runtime as:

$$E[T_r] = \max\left(E\left[\sum_{i=1}^{k_1} r_i\right],\ E\left[\sum_{i=k_1+1}^{k_1+k_2} r_i\right],\ \ldots,\ E\left[\sum_{i=N-k_P+1}^{N} r_i\right]\right). \tag{44}$$

Using the central limit theorem this can be written as:

$$E[T_r] = \max\{k_1R_1,\ k_2R_2,\ \ldots,\ k_PR_P\} = R_1\max\left[\frac{k_1}{\gamma_1},\ \frac{k_2}{\gamma_2},\ \ldots,\ \frac{k_P}{\gamma_P}\right]. \tag{45}$$

Without loss of generality, let us suppose that

$$\frac{k_1}{\gamma_1} = \max\left[\frac{k_1}{\gamma_1},\ \frac{k_2}{\gamma_2},\ \ldots,\ \frac{k_P}{\gamma_P}\right], \tag{46}$$

that is,

$$\frac{k_1}{\gamma_1} \ge \frac{k_i}{\gamma_i} \qquad \text{for all } i,\ 1 \le i \le P.$$

Then it is true that

$$k_1 \ge \frac{N}{S_P} \tag{47}$$

since γ_1 = 1. This gives us the bound N ≥ k_1 ≥ N/S_P.

Similarly, by using the central limit theorem, the expression for E[T_c] can be simplified to

$$E[T_c] = \frac{C}{2}\left[k_1(N-k_1) + k_2(N-k_2) + \cdots + k_P(N-k_P)\right] \tag{48}$$

$$= \frac{C}{2}\left[N^2 - k_1^2 - k_2^2 - \cdots - k_P^2\right]. \tag{49}$$

Combining (13), (45), (46), and (49), we can write the mean time for executing the task assignment, T_m(k_1, ..., k_P), as:

$$T_m(k_1, \ldots, k_P) = k_1R_1 + \frac{C}{2}\left[N^2 - k_1^2 - k_2^2 - \cdots - k_P^2\right]. \tag{50}$$

Using (50) we can derive the following properties of the optimal task assignment, keeping in mind that all these results assume the validity of approximation A2.

Theorem 5: Under the constraint that a definite number of tasks, say m (m ≤ N), are to be assigned to the first processor (k_1 = m), and processor i is to be assigned not more than mγ_i tasks, where i is the processor index (this condition puts a lower bound on k_1, which is m ≥ N/S_P), the optimal task assignment takes the following form:

1) Let I be the smallest integer such that mS_I ≥ N. Then for all i, 1 ≤ i < I, assign mγ_i tasks to the ith processor. Obviously, since γ_1 = 1, the first processor gets assigned m tasks.

2) The remaining (N − mS_{I−1}) tasks are assigned to the Ith processor.

3) All other (P − I) processors are left idle.

A consequence of the above policy is that there is at most one processor to which fewer than mγ_i tasks are assigned, except for processors that receive no tasks.

Proof: The proof of this theorem is very similar to that of Theorem 2. Let TA1 be an optimal task assignment that results in two processors, say i and j, being assigned more than 0 but fewer than mγ_i and mγ_j tasks, respectively. Further, suppose that k_i ≥ k_j and Δ = min{mγ_i − k_i, k_j}. Now consider a task assignment TA2 that assigns (k_i + Δ) tasks to the ith processor and (k_j − Δ) tasks to the jth processor. TA2 satisfies all the constraints mentioned in the theorem and has the mean execution time:

$$T_m(k_1, \ldots, k_i+\Delta, \ldots, k_j-\Delta, \ldots, k_P) = k_1R_1 + \frac{C}{2}\left[N^2 - k_1^2 - \cdots - (k_i+\Delta)^2 - \cdots - (k_j-\Delta)^2 - \cdots - k_P^2\right]$$
$$= T_m(k_1, \ldots, k_i, \ldots, k_j, \ldots, k_P) - C\Delta\left[k_i - k_j + \Delta\right]. \tag{51}$$

Since k_i ≥ k_j and Δ > 0, the second term in the above equation is always positive, and TA2 has less execution time than TA1. Therefore TA1 is not optimal, which is a contradiction. Q.E.D.

This theorem suggests that the optimal policy assigns tasks among processors in a ratio inversely proportional to their processing speeds. In other words, the distribution of tasks is done in such a way that the total cost of execution is distributed among the processors as evenly as possible.
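As a concrete sketch of Theorem 5's fill rule, the processors can be filled in speed order, each up to its quota mγ_i. This code is ours, with integrality ignored and illustrative parameter names.

```python
def theorem5_assignment(N, m, gammas):
    """Theorem 5's policy: processor i holds at most m * gammas[i] tasks.

    Fill processors in nonincreasing speed order until all N tasks
    are placed; at most one processor ends up partially filled.
    """
    assert m * sum(gammas) >= N, "feasibility: m >= N / S_P"
    counts, remaining = [], N
    for g in gammas:
        take = min(m * g, remaining)
        counts.append(take)
        remaining -= take
    return counts

print(theorem5_assignment(10, 4, [1.0, 0.9, 0.8, 0.7]))
# -> [4, 3.6, 2.4, 0.0]: two full quotas, one partial, one idle
```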
Now we derive an expression that specifies the condition under which the optimal policy distributes the tasks among M + 1 processors rather than M, for some M < P. The distribution among the selected processors is done according to Theorem 5, which results in the number of tasks assigned to each processor being inversely proportional to its processing speed. Let TA1 be a task assignment which distributes the tasks among M processors in the manner just explained. Note that k_i = k_1γ_i for all i, 1 ≤ i ≤ M. Since k_1 + k_2 + ⋯ + k_M = N, we have k_1 = N/S_M. Using (50), we can write the mean execution time of this assignment, T_m(M), as:

$$T_m(M) = k_1R_1 + \frac{C}{2}\left[N^2 - k_1^2(\gamma_1^2 + \gamma_2^2 + \cdots + \gamma_M^2)\right] = \frac{N}{S_M}R_1 + \frac{C}{2}\left[1 - \frac{\gamma_1^2 + \gamma_2^2 + \cdots + \gamma_M^2}{S_M^2}\right]N^2 = \frac{N}{S_M}R_1 + C\left[\frac{P_M}{S_M^2}\right]N^2. \tag{52}$$

Now consider a task assignment TA2 that distributes the tasks among M + 1 processors. The mean execution time for this assignment can be derived in a similar fashion to be:

$$T_m(M+1) = \frac{N}{S_{M+1}}R_1 + C\left[\frac{P_{M+1}}{S_{M+1}^2}\right]N^2. \tag{53}$$

From (52) and (53) we can derive the condition under which distributing the tasks among M + 1 processors results in a reduced execution time:

$$T_m(M+1) < T_m(M) \quad \text{iff} \quad N < \frac{R_1}{C}\cdot\frac{\gamma_{M+1}\,S_M\,S_{M+1}}{S_M^2P_{M+1} - S_{M+1}^2P_M}. \tag{54}$$

Note the complexity of this expression compared to the corresponding expression in Section III, where all the processors are identical. For some special cases we can simplify it further. For instance, suppose that there is a uniform degradation of processing speed from the 1st processor to the Pth processor; in other words, γ_i = ρ^(i−1) for all i, 1 ≤ i ≤ P, where ρ is the inverse of the degradation factor.² For this case, by using (54), we get the following result for the optimal task assignment:

²Thus a small value of ρ means that the processing speed degrades rapidly, and hence a large degradation factor, and vice versa.

Theorem 6: Under the assumption that there is a uniform degradation of processing speed among the processors, that is, R_i/R_{i−1} = 1/ρ for all i, 1 < i ≤ P, the optimal task assignment results in distributing the tasks among all P processors if N ≤ (R_1/C)(1 + ρ), and assigning all N tasks to the first processor otherwise.

Proof: Under the assumption mentioned in the theorem, we have γ_i = ρ^(i−1), and therefore we can write S_M and P_M as:

$$S_M = \frac{1-\rho^M}{1-\rho}, \qquad P_M = \rho\,\frac{(1-\rho^M)(1-\rho^{M-1})}{(1-\rho)^2(1+\rho)}. \tag{55}$$

Substituting this in (54) and simplifying, we obtain:

$$T_m(M+1) \le T_m(M) \quad \text{iff} \quad N \le \frac{R_1}{C}(1+\rho). \tag{56}$$

This shows that when N ≤ (R_1/C)(1 + ρ), assigning tasks among M + 1 processors results in lower execution time than assigning them among M processors. Applying this argument inductively to all values of M less than P, we arrive at the conclusion that the minimal execution time is obtained by assigning the tasks among all P processors. In a similar fashion, when N > (R_1/C)(1 + ρ), distributing the tasks among one fewer processors results in reduced execution time, and therefore the optimal policy will assign all the tasks to the 1st processor. Q.E.D.
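Equations (52) and (54) are easy to evaluate numerically. The sketch below is ours and uses arbitrary parameters; it picks the best M for a given speed profile, and for the uniform-degradation profile γ_i = ρ^(i−1) it lands on M = 1 or M = P exactly as Theorem 6 predicts.

```python
def mean_time_M(M, N, R1, C, gammas):
    # Equation (52): T_m(M) = (N / S_M) R1 + C (P_M / S_M^2) N^2.
    g = gammas[:M]
    S = sum(g)
    P_M = sum(g[i] * g[j] for i in range(M) for j in range(i + 1, M))
    return (N / S) * R1 + C * (P_M / S ** 2) * N * N

def best_M(N, R1, C, gammas):
    P = len(gammas)
    return min(range(1, P + 1), key=lambda M: mean_time_M(M, N, R1, C, gammas))

rho, P, R1, C = 0.9, 4, 100.0, 1.0
gammas = [rho ** i for i in range(P)]
# Theorem 6 breakpoint: (R1/C)(1 + rho) = 190 with these values.
for N in (10, 400):
    print(f"N = {N}: best M = {best_M(N, R1, C, gammas)}")
# N = 10 <= 190 -> M = P = 4; N = 400 > 190 -> M = 1
```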
V. CONCLUSIONS AND FURTHER RESEARCH
This paper has investigated a model of a distributed computer system based on random-graph models of programs. The crucial assumption that communications overhead adds to total execution time leads to a very interesting result for systems with homogeneous processors. The optimal task assignments are extremal in the sense that tasks are either distributed as evenly as possible or not distributed at all. When the processors are not homogeneous, the optimal policy displays extremal characteristics, but is not extremal in the same sense. Whether the processors are homogeneous or heterogeneous, maximum throughput occurs when processing time is distributed as equally as possible over the participating processors. The optimal policy determines how many and which processors participate. The surprising result for the homogeneous case is that the optimal policy uses all processors or one processor, depending on the ratio of running time to communication time. The optimal policy never selects a number of processors strictly greater than one and strictly less than the maximum.
Given the results of this paper, how can they be interpreted for real systems? Real systems do not behave like the random-graph model, nor are the communication costs necessarily additive as we have assumed here. We conjecture that real systems have locality of reference when distributed across processors, just as real programs exhibit locality of reference in uniprocessors with virtual memory. The actual behavior of real systems does not have a uniform intermodule reference pattern as we have assumed, but it probably breaks into "super modules," which are clusters of modules. Between super modules, references occur more rarely than references between modules within one cluster. We expect that super modules should be assigned to one machine and not broken across machines. In this case, the statistical model might apply to references between super modules, and the probability of such references would tend to be low. Hence we expect that distribution across processors is practical, since the costs incurred by communicating across processors may be tolerable, provided that the assignments are made on super-module boundaries. If the statistical model is to be useful, it is most likely to be used for the assignment of super modules.
Our second assumption of significance is that communication costs are additive. In reality, to a certain extent communication time can be overlapped with processing time, thereby reducing the apparent cost of communication. Our model correctly treats the unoverlapped portion of communication time. We use the parameter C in the paper to relate communication time to processing time. The effect of overlapping communication with processing is to reduce the value of C. If the reduction is proportional to the amount of information transmitted, then our model handles this effect simply by scaling C appropriately. If the reduction is a more complex function of system parameters, then the effect may have to be modeled mathematically to obtain meaningful results.
The open research questions at this time are mainly concerned with validating the model studied here, or with improving it if it cannot be validated. The present model seems to say that there has to be a positive gain from distribution, and if the gain is there, then distribute maximally. If the gain is not there, then do not distribute at all.
To some extent, distribution will be forced on systems because particular data or programs will be held at specific sites. It is not clear how the optimal policy generalizes when such constraints are present.
In closing, we note that performance prediction for distributed programs is still a difficult challenge. The programmer who writes a uniprocessor program has the benefit of an accurate model of machine execution and three decades of work in analysis of algorithms for that machine model. The state of the art for distributed programs is quite primitive by comparison. The programmer who writes a distributed program has many choices for overall program strategy, and for each possible strategy there exist many ways to partition the program into tasks and assign tasks to processors. The underlying model makes performance prediction difficult, and with so many possible policies that do not exist on uniprocessors, the selection of the best policy is extremely difficult. We hope that the results of this paper shed some light on how to write efficient distributed programs. The advice appears to be:
1) create modules that have as little intercommunication as possible,

2) compare the estimated running time of the fully distributed assignment against a totally local computation, and finally,

3) select the best of the two choices.

The results of this paper suggest that the two choices considered in step 2 are the principal choices, and other choices can be ignored.
REFERENCES
[1] M. Benard and H. S. Stone, "A probabilistic analysis of the minimum cuts of random graphs," J. ACM, submitted for publication, 1983.

[2] S. H. Bokhari, "Dual processor scheduling with dynamic reassignment," IEEE Trans. Software Eng., vol. SE-5, pp. 341-349, July 1979.

[3] W. W. Chu, L. J. Holloway, M. Lan, and K. Efe, "Task allocation in distributed data processing," Computer, vol. 13, no. 11, pp. 57-69, Nov. 1980.

[4] W. Feller, An Introduction to Probability Theory and Its Applications, 3rd ed. New York: Wiley-Interscience, 1968.

[5] D. Gusfield, "Parametric combinatorial computing and a problem of program module distribution," J. ACM, vol. 30, no. 3, pp. 551-563, July 1983.

[6] S. P. Kartashev and S. I. Kartashev, "Distribution of programs for a system with dynamic architecture," IEEE Trans. Comput., vol. C-31, pp. 488-514, June 1982.

[7] P. R. Ma, E. Y. S. Lee, and M. Tsuchiya, "A task allocation model for distributed computing systems," IEEE Trans. Comput., vol. C-31, pp. 41-47, Jan. 1982.

[8] R. M. Metcalfe and D. R. Boggs, "Ethernet: Distributed packet switching for local computer networks," Commun. ACM, vol. 19, no. 7, pp. 395-404, July 1976.

[9] C. C. Price and U. W. Pooch, "Search techniques for a nonlinear multiprocessor scheduling problem," Naval Res. Logist. Quart., vol. 29, no. 2, pp. 213-233, June 1982.

[10] G. S. Rao, H. S. Stone, and T. C. Hu, "Assignment of tasks in a distributed processor system with limited memory," IEEE Trans. Comput., vol. C-28, pp. 291-299, Apr. 1979.

[11] H. S. Stone, "Multiprocessor scheduling with the aid of network flow algorithms," IEEE Trans. Software Eng., vol. SE-3, pp. 85-93, Jan. 1977.

[12] H. S. Stone, "Critical load factors in two-processor distributed systems," IEEE Trans. Software Eng., vol. SE-4, pp. 254-258, May 1978.
Bipin Indurkhya received the B.E. degree in electronics engineering from Bhopal University, Bhopal, India, in 1979, the M.E.E. degree in electronics engineering from Philips International Institute, Eindhoven, The Netherlands, in 1981, and the Ph.D. degree in computer science from the University of Massachusetts at Amherst in 1985.

He is currently an Assistant Professor with the Department of Computer Science of Boston University, Boston, MA. His research interests include metaphors and analogies, theoretical artificial intelligence, and parallel architectures.
Harold S. Stone (S'61-M'63-SM'81) received the Ph.D. degree in electrical engineering from the University of California at Berkeley in 1963.

He is the Manager of Advanced Architecture Studies at the IBM Thomas J. Watson Research Center in Yorktown Heights, NY. He has formerly been a faculty member at the University of Massachusetts, Amherst, and Stanford University, and has held visiting faculty appointments at institutions throughout the world. He is the author, coauthor, or editor of six textbooks, and has produced over sixty technical publications. The series he has produced as a consulting editor to Addison-Wesley, McGraw-Hill, and University Microfilms contain more than seventy titles in all areas of computer science and engineering. His research contributions have been primarily in computer architecture and digital systems design.

Dr. Stone has been active in both the IEEE and the Association for Computing Machinery, and has served as Technical Editor of Computer magazine and Governing Board Member of the IEEE Computer Society.
Lu Xi-Cheng was born in 1946 in Shanghai, China, and graduated from the Harbin Institute of Technology.

He is currently an Instructor at the Changsha Institute, China, and is in the doctoral program in the Department of Electrical Engineering. From 1982 to 1984 he was a visiting scholar at the University of Massachusetts, Amherst, where he conducted research in distributed systems and parallel computing. He has been active in the design of high-performance input/output systems and is currently engaged in research in networks and distributed processing.