
Real-Time Systems, 13, 167–199 (1997)
© 1997 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.

A Heuristic Approach to the Multitask-Multiprocessor Assignment Problem using the Empty-Slots Method and Rate Monotonic Scheduling

J. SANTOS iesantos@criba.edu.ar
Dep. Ing. Electrica, Instituto de Ciencias e Ingeniería de Computacion, Universidad Nacional del Sur, Bahía Blanca, Argentina

E. FERRO
Dep. Ing. Electrica, Instituto de Ciencias e Ingeniería de Computacion, Universidad Nacional del Sur, Bahía Blanca, Argentina

J. OROZCO
Dep. Ing. Electrica, Instituto de Ciencias e Ingeniería de Computacion, Universidad Nacional del Sur, Bahía Blanca, Argentina

R. CAYSSIALS
Dep. Ing. Electrica, Instituto de Ciencias e Ingeniería de Computacion, Universidad Nacional del Sur, Bahía Blanca, Argentina

Abstract. A heuristic approach to the problem of assigning a set of preemptible, resource-sharing and blockable real-time tasks to be executed on a set of heterogeneous processors communicating through an interprocessor network is presented. The problem is NP-hard. The empty-slots method is used to test the RM schedulability in each processor. There are placement, time, memory, communication and precedence constraints. A general expression for the modification of hard-precedence deadlines of related tasks executing in the same or in different processors is given. The effects of the Average Processor Utilization Factor and the Network Bandwidth on the number of solutions found are shown and discussed through systematic sets of examples. Success Ratios are also obtained and plotted vs. Average Processor Utilization Factors for different Network Bandwidths. The results obtained are compared to those obtained by other methods.

Keywords: real-time, distributed systems, multitask-multiprocessor, RM-scheduling, heuristic

1. Introduction

Real-time scheduling theory is one of the areas in which efforts must be concentrated in order to develop a science of large-scale real-time systems (Stankovic 1988).

Conventional real-time systems are mainly concerned with tasks that do not change their fundamental timing parameters during execution (static tasks). They may be sporadic (e.g. alarms: aperiodic but with a definite deadline) or periodic (with a definite period and deadline). On top of the real-time load, non-real-time aperiodic tasks may also be present.

The deadline and maximum repetition rate of each sporadic task, as well as the period, deadline and execution time of each periodic task, must be known in advance in order to make possible the pre-run scheduling computations guaranteeing that no deadline is missed. This is particularly important in the case of hard real-time systems, in which missing one deadline may have catastrophic effects, including the loss of human lives. Examples of such systems are airplane avionics, smart robots, space vehicles and many industrial applications (Cheng 1993).

A system is said to be schedulable if it meets all the time constraints. If they are known in advance, the system schedulability may be tested beforehand, with the added advantage of reducing the use of precious resources required for scheduling and context switching during actual run-time (Xu 1993).

The problem of scheduling a set of independent preemptible real-time tasks in a single processor has been treated in Liu (1973), Leung and Whitehead (1983), Joseph (1986), Lehozcky (1989) and Katcher (1993) for the case of unrestricted granularity, and by Santos (1993) for the case of restricted granularity. The upper bound to the computational complexity of the problem using the empty-slots method is O(m · T_max), where m denotes the number of tasks and T_max the maximum task period in the system (Santos 1993).

On the other hand, the problem of scheduling multiple tasks in a multiprocessor system is NP-hard. Early results were presented in Muntz (1970) and Ma (1982). More recently, deadline-driven heuristic methods have been described in Zhao (1987), Ramamritham (1990) and Xu (1993). In Tindell (1992) and Borriello (1994) the problem is attacked using the global optimization technique known as simulated annealing.

The philosophy of the heuristic method presented here can be expressed as an actively guided search towards optimal or near-optimal solutions, tackling the tougher problems first. Tasks that must be executed in one and only one processor are allocated first, allowing early detection of absolutely non-schedulable systems. Following that, communicating tasks go to the same processor whenever possible. This has a double advantage: hard-precedence constraints are more easily met and the communication network is less loaded. Finally, the allocation of the remaining tasks is attempted, trying to allocate tasks with higher utilization factors to less loaded processors.

The use of the empty-slots method makes it possible to test the schedulability of the network and of the processors with preemptible tasks (Santos 1993). As a result, the proposed deadline-driven heuristic deals with the assignment of real-time preemptible tasks to a set of heterogeneous processors with allocation, precedence, time, communication and resource constraints, a generality not shared by any of the previously proposed methods. It must also be noted that Rate Monotonic scheduling, a de facto standard, is used throughout the paper.

In Section 2, the empty-slots method is summarized. In Section 3 the assignment problem is defined; two types of precedence constraints present in the current literature are analyzed and a general formula for the modification of deadlines among precedence-related tasks is derived. In Section 4 the heuristic algorithm is described. In Section 5, the algorithm's performance is analyzed on the basis of results obtained with simulations. In Section 6, the algorithm is compared to other methods. Finally, in Section 7, conclusions are drawn. A pseudocode for the method is presented in the Appendix.


2. The Empty-Slots Method

The problem of finding response times in real-time systems has been studied in Joseph (1986) and Lehozcky (1989). The empty-slots method was presented in Santos (1991) to solve the scheduling problem in real-time LANs. The method was later extended to the scheduling of preemptible multitasks in a single-processor system with restricted granularity (Santos 1993). In fact, both problems are shown to be isomorphic and particular cases of a more general problem that can be defined in terms of a multiuser population competing for the use of a single resource.

Time is considered to be slotted and the duration of one slot is taken as the unit of time. Slots are notated t and numbered 1, 2, . . . . The expressions instant t and at the beginning of slot t are equivalent. A set of m real-time periodic tasks τ_1, τ_2, . . . , τ_m, with deadlines assumed to be less than or equal to their periods, is completely specified as S(m) = {(C_1, T_1, D_1), (C_2, T_2, D_2), . . . , (C_m, T_m, D_m)}, where C_i, T_i and D_i denote the execution time, period and deadline of task τ_i. C_i, T_i and D_i are mutually commensurable and tasks are assumed to be generated at the beginning of slots. A common assumption (Liu 1973; Joseph 1986; Lehozcky 1989; Santos 1991), in line with real applications, is that T_i = D_i. In this paper, however, because of precedence constraints, deadlines must usually be shortened. The modified deadlines are, of course, harder to meet and are the ones used in the schedulability calculations.

The three main deterministic priority disciplines used in hard real-time systems are Fair Round-Robin, Rate Monotonic and Earlier Deadline. The system is Fair Round-Robin schedulable iff

\sum_{i=1}^{m} C_i ≤ D_min   (1)

where D_min denotes the minimum deadline of all the tasks.

The Fair Round-Robin discipline is generally not difficult to implement. Sometimes, however, the system is not FRR schedulable and it demands some tighter discipline. In Liu (1973) it is formally proved that Rate Monotonic is the best of the fixed-priority type. It is a de facto standard supported by the DoD and adopted by Boeing, General Dynamics, General Electric, Honeywell, IBM, McDonnell Douglas, Magnavox, Mitre, NASA, Naval Air Warfare Center, Paramax, etc. (Obenza 1993). If deadlines are shorter than periods, the priority stack must be Deadline Monotonic.

The Rate Monotonic schedulability requires a more complex analysis. The function

W_m(t) = \sum_{h=1}^{m} C_h ⌈t/T_h⌉

gives the cumulative demands on the processor made by the m tasks in the closed interval [1, t] after a simultaneous release of all of them at t = 1. This has been proved to be the worst case of load (Liu 1973). If

W_m(M) < M


where M is the least common multiple of all the periods, the system is said to be non-saturated. In that case, there will be M − W_m(M) empty slots in [1, M]. In Santos (1991) it has been formally proved that the j-th empty slot in a system of m tasks, notated e_j(m), is

e_j(m) = least t | t = j + W_m(t)   (2)

A system S(m) is RM schedulable iff, for i = 2, 3, . . . , m,

\sum_{h=1}^{i−1} C_h/T_h < 1   (3a)

and

D_i ≥ e_{C_i}(i−1) = least t | t = C_i + W_{(i−1)}(t)   (3b)

hold. Simply put, S(i) is schedulable if S(i − 1) is non-saturated and the deadline of task τ_i, D_i, is larger than or equal to the C_i-th empty slot in S(i − 1).

When sharing data and/or resources, tasks may suffer priority inversions and be blocked in critical sections by lower-priority tasks. It has been formally proved (Sha 1990) that if the ceiling protocol is used, a task may be blocked only in one critical section. The worst case of blocking can be incorporated into the expression of the j-th empty slot by adding K_i, the longest among all the blockings that can affect the task (Ferro 1993). Expression (3b) becomes

D_i ≥ e_{(C_i+K_i)}(i−1) = least t | t = K_i + C_i + W_{(i−1)}(t)

and this is the form used when iterating (3b).
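As an illustration of how (2), (3a) and (3b) are applied, the following minimal sketch (Python; function and variable names are ours, not the paper's pseudocode) computes empty slots by fixed-point iteration and runs the RM test with an optional blocking term K_i. Tasks are assumed to be listed in decreasing RM (or Deadline Monotonic) priority order.

```python
from math import ceil

def workload(tasks, t):
    # W(t): cumulative demand of the given tasks in the closed interval [1, t]
    return sum(C * ceil(t / T) for (C, T, D) in tasks)

def empty_slot(j, tasks, horizon=10**6):
    # Expression (2): the j-th empty slot is the least t such that t = j + W(t)
    t = j
    while t <= horizon:
        w = j + workload(tasks, t)
        if w == t:
            return t
        t = w
    return None  # no fixed point found within the explored horizon

def rm_schedulable(tasks, blocking=None):
    # tasks: list of (C, T, D) in decreasing-priority order; blocking[i] = K_i
    blocking = blocking or [0] * len(tasks)
    if tasks and tasks[0][0] + blocking[0] > tasks[0][2]:
        return False                                   # first task must fit its own deadline
    for i in range(1, len(tasks)):
        higher = tasks[:i]
        if sum(C / T for (C, T, D) in higher) >= 1:    # condition (3a): S(i-1) non-saturated
            return False
        C_i, _, D_i = tasks[i]
        e = empty_slot(C_i + blocking[i], higher)      # condition (3b) with the blocking term
        if e is None or e > D_i:
            return False
    return True

# The task set used later in the text: S(3) = {(1,4,4), (2,8,8), (2,10,10)}
print(rm_schedulable([(1, 4, 4), (2, 8, 8), (2, 10, 10)]))   # True
```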

3. The Multitasks Multiprocessor Assignment Problem

The problems treated above are of the multiuser/single-resource type. The problem to be treated now, allocating a set of real-time tasks to a set of processors in such a way that all constraints are met, is of the multiuser/multiresource type. It may be defined in the following terms:

There is a set of m preemptible hard real-time tasks to be executed in a set of n processors, each one with a definite amount of memory. The tasks are periodic and each one has a definite period, deadline, execution time and memory requirement. The system is hard real-time and therefore no deadline may be missed. Each task will be completely executed in one processor. It is said then that the task is allocated to that processor. A set of allocations distributing all the tasks on all or part of the processors is called an assignment. If there are no allocation constraints, the number of possible different assignments is n^m, which makes the problem NP-hard. For example, 43 tasks on 8 processors, the numbers used in the example of Tindell (1992), produce a number of different assignments in the order of 10^38.


The tasks have different constraints:

3.1. Placement Constraints

Because they need special resources (e.g. coprocessors), some tasks must be executed in a certain definite processor and therefore there is a unique feasible allocation. These tasks will be said to be preallocated.

Because of fault tolerance, the processing of some tasks must be duplicated. Obviously, the pair (original and replica) must be executed in different processors.

3.2. Resource Constraints

In order to be executed, each task requires a known amount of resources (e.g. memory, input/output, etc.). After allocating several tasks to a certain processor, some of the resources may be insufficient to receive a given task, which must then be allocated to another processor. If R_h denotes the resource requirement of task τ_h and R_p the resource capacity of the processor to which j tasks have been allocated, then

\sum_{h=1}^{j} R_h ≤ R_p   (4)

must hold for each resource. In the examples given to illustrate and test the method, memory constraints are considered.

3.3. Communication Constraints

The tasks also have communication requirements: in certain cases, data must flow from one task to another. If the tasks are allocated to different processors, an interprocessor communication network, generally a LAN, is used. If they are allocated to the same processor, the communication is made through some internal mechanism and does not load the network.

Although the interprocessor traffic is always the most costly and least reliable factor (Katcher 1993), it is not always given proper consideration.

The communication network itself operates in real time and therefore has its own time constraints. When a message is generated at a node, a certain time elapses until the node gains access to the transmission medium. The message must then be transmitted and propagated.

The message consists of overhead bytes (preamble, delimiters, addresses, control, error detection, etc.) and payload bytes. Because of access, propagation and overhead delays, the payload bandwidth is only a fraction of the network bandwidth as specified in the physical level of the protocol. For instance, a LAN with 1 Mb/s (125 bytes/ms) bandwidth may have only a 0.8 Mb/s (100 bytes/ms) payload bandwidth.


Any protocol with a ring topology, e.g. 802.5 or FDDI with the same priority for all messages, naturally implements a Fair Round-Robin discipline (Santos 1991). The network is FRR schedulable if

Δ = \sum_{i=1}^{j} B_i/P ≤ D_min   (5)

where B_i, P and D_min denote the number of payload bytes transmitted by τ_i, the payload bandwidth and the minimum deadline of the j tasks using the network to transmit data, respectively. Δ is the communication delay.

With certain restrictions, 802.5 is also able to implement the Rate Monotonic discipline (Santos 1993).

The bus utilization factor is defined as the ratio between the number of payload bytes actually transmitted and the maximum number of bytes that it is possible to transmit according to the payload bandwidth. Obviously, for a given assignment, the bus utilization factor decreases for faster networks.

The bus relative load is defined as the ratio between the number of bytes really transmitted through the network and the total number of bytes interchanged by communicating tasks. A low relative load indicates a good clustering of communicating tasks in the same processor.
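These network-level quantities are straightforward to compute. The sketch below is a minimal illustration (Python; names and the example figures are ours, with the payload bandwidth P expressed in bytes/ms as in the text):

```python
def frr_delay(payload_bytes, P):
    # Expression (5): worst-case FRR communication delay, Delta = sum(B_i) / P.
    # The network is FRR schedulable if this value does not exceed D_min.
    return sum(payload_bytes) / P

def bus_utilization(bytes_sent, P, horizon_ms):
    # Payload bytes actually transmitted over the maximum transmittable in the horizon.
    return bytes_sent / (P * horizon_ms)

def bus_relative_load(bytes_sent, bytes_exchanged):
    # Bytes really sent through the LAN over the total bytes interchanged by
    # communicating tasks; a low value indicates good clustering.
    return bytes_sent / bytes_exchanged

# Example: three LAN messages of 250, 90 and 60 payload bytes on a network with
# a payload bandwidth of 90 bytes/ms (the figure used for the 0.9 Mb/s LAN later on).
print(frr_delay([250, 90, 60], P=90))    # ~4.4 ms
print(bus_relative_load(400, 1000))      # 0.4 if 1000 bytes are exchanged in total
```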

3.4. Time Constraints

Tasks allocated to the same processor must execute meeting their time constraints. The processor RM schedulability is tested using the empty-slots method, with deadlines corrected, where necessary, to allow for precedence constraints.

3.5. Precedence Constraints

In Chetto (1990), a technique for modifying deadlines to schedule tasks with precedence constraints in a dynamic monoprocessor environment using the Earlier Deadline algorithm has been presented. A general approach to the multiprocessor case, more complex because it involves interprocessor communication delays, is treated here. Because it is a de facto standard, Rate Monotonic scheduling is used.

When, in order to be executed, a task τ_j needs data produced by another task τ_i, a precedence relation, notated τ_i ≺ τ_j, is established, and it determines a partial ordering of the tasks. If τ_i ≺ τ_j and there is no task τ_l such that τ_i ≺ τ_l ≺ τ_j, τ_i and τ_j will be called predecessor and successor tasks, respectively. Both usually have the same period and deadline. If they are executed in the same processor, in order to take care of the precedence constraint it is sufficient to assign a higher priority to the predecessor. If by applying the empty-slots method it is found that the processor is schedulable, data will automatically be available to the successor in time.

If, on the contrary, predecessor and successor execute in different processors, the successor task has to wait for data coming from the predecessor via the communication network.


Figure 1. P1, P2: processors 1 and 2. Predecessor task τ_p. Successor task τ_s. Both have the same period and deadline, but the τ_p deadline is advanced to allow for the transmission. With soft-precedence constraints the execution of τ_s is initially deferred one period. From then on both tasks are executed periodically.

Although not explicitly differentiated, the current literature deals with two types of precedence constraints that will be called soft- and hard-precedence constraints. It must be noted that, since the empty-slots method is used, the unit of time in what follows is the slot time.

3.5.1. Soft-Precedence Constraints

This is the type of constraint used in the example of Tindell (1992). In this case, when predecessor and successor execute in different processors, it is tacitly assumed that at the start of the system the initiation of the successor may be deferred one period. With this initial phasing, the worst case occurs when the last bit of the message is received at the end of the predecessor period and the successor is the first task to be executed in its processor in the next period (Fig. 1). From then on, both tasks are executed periodically, independently meeting their own deadlines. The original predecessor deadline, D_p, must be modified to take into account the communication delay as indicated in (5):

D*_p = D_p − Δ   (6)

where D*_p and Δ denote the predecessor's modified deadline and the communication delay, respectively. Obviously, in this case predecessor and successor can be viewed as independent tasks, loosely coupled only by the fact that one of them requires data produced by the other, but they cannot be considered parts of a job with the same total deadline.

3.5.2. Hard-Precedence Constraints

If communicating tasks are considered to be part of a job with a period equal to the tasks' period and the initial phasing of tasks is not allowed, the worst case of load, a simultaneous release of all tasks, must be analyzed. In order to be schedulable, all tasks forming part of the job must be executed within the period. A new type of constraint, the hard-precedence constraint, arises. Let τ_h and τ_i be the predecessor and successor tasks, respectively.

When they are executed in different processors, the successor may not start executing when its time comes because it has to wait for data coming from the predecessor via the communication network. In that case, the start of τ_i's execution is event-driven by the arrival of the data provided by τ_h. While τ_i waits, slots do not go empty because in the meantime tasks with less priority than τ_i execute. When the data arrives in the processor, preemption takes place and τ_i starts executing. This mechanism guarantees that if expressions (3a) and (3b) and the conditions found in the following theorem and its corollary hold, the time constraints are met in the processor.

In the following theorem, the latest possible starting time for the execution of τ_i is determined. As usual, S(i − 1), W_(i−1)(t) and C_i denote the set of the (i − 1) tasks already allocated to the processor where τ_i is going to be incorporated, the processor demand made by them in the interval [1, t] and τ_i's execution time, respectively. NE_(i−1) denotes the number of slots left empty by S(i − 1) in the interval [1, D_i].

THEOREM The latest possible starting time for executing τ_i is

L_i = least t | t = NE_(i−1) − C_i + 1 + W_(i−1)(t)   (7)

Proof: In order to meet its deadline, τ_i must start its execution at an instant such that C_i slots are left empty by S(i − 1) before D_i. This instant is the C_i-th empty slot counted backwards from D_i, which in turn is the (NE_(i−1) − C_i + 1)-th empty slot in S(i − 1) counted from t = 1 on. According to (2), it is the least t | t = NE_(i−1) − C_i + 1 + W_(i−1)(t). A predecessor τ_h, therefore, must finish its execution at most at instant L_i − 1.

Example. S(3) = {(1, 4, 4), (2, 8, 8), (2, 10, 10)}. NE_(2) in [1, 10] is 4. The latest possible starting time for executing τ_3 is 7, which is the least t | t = NE_(2) − C_3 + 1 + W_2(t) = 4 − 2 + 1 + 4 = 7. The example is illustrated in Fig. 2. Should τ_3 have a predecessor in the same processor, the predecessor's execution should finish at most at t = 6. If τ_h and τ_i are executed in different processors and Δ is the communication delay, D_h must be modified in order to allow for the data transmission and τ_i's execution to be completed before D_i. The shortened D_h, notated D*_h, is determined in the following corollary to the previous theorem.

COROLLARY If τ_h and τ_i reside in different processors,

D*_h = (L_i − 1) − Δ   (8)

must hold.

Proof: It follows from the previous theorem and expression (6).
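A numerical sketch of the theorem and of the worked example above (Python; all names are ours, and the prior task set is assumed to be non-saturated):

```python
from math import ceil

def workload(tasks, t):
    return sum(C * ceil(t / T) for (C, T, D) in tasks)

def empty_slot(j, tasks):
    # least t such that t = j + W(t), as in expression (2)
    t = j
    while True:
        w = j + workload(tasks, t)
        if w == t:
            return t
        t = w

def count_empty(tasks, D):
    # NE: number of slots the given (non-saturated) task set leaves empty in [1, D]
    n = 0
    while empty_slot(n + 1, tasks) <= D:
        n += 1
    return n

def latest_start(prior, C_i, D_i):
    # Theorem (7): the latest start of tau_i is the (NE - C_i + 1)-th empty slot of S(i-1)
    NE = count_empty(prior, D_i)
    return empty_slot(NE - C_i + 1, prior)

prior = [(1, 4, 4), (2, 8, 8)]       # S(2) of the example
L = latest_start(prior, C_i=2, D_i=10)
print(L)                              # 7, as in the example
# A predecessor in the same processor must finish by L - 1 = 6; in a different
# processor, corollary (8) shortens its deadline to (L - 1) - delta.
```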

Figure 2. τ_1 executes in t = 1, 5 and 9. τ_2 executes in t = 2, 3, 10 and 11. NE_(2) = 4 (t = 4, 6, 7 and 8). W_2(7) = 4 (t = 1, 2, 3 and 5). By application of (7), the latest starting time for τ_3 is t = 7. It will be executed in t = 7 and 8, meeting its deadline.

Figure 3. a) τ_h and τ_i, predecessor and successor tasks. C_h, C_i: execution times. B_h: length of the message. T: period of both tasks. b) A job of four chained tasks. c) One predecessor with two successors.

When predecessor and successor tasks execute in the same processor, it suffices to give a higher priority to the predecessor to make sure that the data needed by its successor will be available in time. The RM scheduling automatically takes care that

D*_h = L_i − 1   (9)

holds.

A graph G can be associated to the computation (Ramamritham 1990). Each node of the graph represents a task and there is a directed arc from the node representing τ_h to the node representing τ_i iff they are predecessor and successor, respectively (Fig. 3a). The execution time of a task is represented by a weight associated to the corresponding node. The length of the message to be transmitted is associated to the arc linking predecessor and successor. The graph is therefore acyclic, weighted and directed. The period, common to both tasks, is also shown.

A set of linked tasks constitutes a job; the two are sometimes called, as in Ramamritham (1990), subtasks and task, respectively. In its simplest form, the graph of the job will be a chain in which only one successor follows one predecessor, as in the example of Fig. 3b. For the sake of clarity, execution times, lengths of messages and periods are not shown.

Sometimes, however, in order to have fault tolerance in critical tasks, the execution must be duplicated in another processor. In that case, one predecessor may have two successors, as shown in Fig. 3c. If the three tasks are executed in different processors,

D*_h = ∧(L_i − 1 − Δ, L_j − 1 − Δ) = ∧(L_i, L_j) − 1 − Δ   (10)

where ∧ denotes the infimum operator, must hold.


Figure 4. A job of seven tasks. Only τ_1, τ_3 and τ_4 execute in the same processor.

If, on the contrary, both successors are executed in the same processor although the predecessor is in a different one,

D*_h = L_i − 1 − Δ  and  D*_i = L_j − 1   (11)

must hold. It is assumed that τ_i has a higher priority than τ_j in the shared processor.

A general expression covering (8) to (11) is given in (12):

D*_p = ∧_{r: τ_p ≺ τ_s} {L_s − 1 − k_r Δ}   (12)

where D*_p denotes the modified deadline of a predecessor, obtained as the infimum of the set of the latest starting times of its successors (L_s) minus 1 minus k_r Δ. k_r takes the value 1 if predecessor and successor are executed in different processors and the value 0 if they are executed in the same processor.

The graph may turn out to be rather complicated but, in any case, the above expressions are systematically applied from the last successor backwards in order to find the shortened deadlines to be used in the empty-slots method to test the schedulability of each processor by application of (3a) and (3b).

Example. In Fig. 4, a job of seven tasks is represented. The subset (τ_1, τ_3, τ_4) is executed in the same processor. The modified deadlines are: D*_5 = L_6 − 1 − Δ; D*_4 = L_6 − 1 − Δ; D*_2 = L_3 − 1 − Δ; D*_1 = ∧(L_3, L_4, L_5 − Δ) − 1; D*_0 = ∧(L_1, L_2) − 1 − Δ.

The precedences τ_1 ≺ τ_3 and τ_1 ≺ τ_4 are taken care of simply by giving appropriate higher priorities. In the worst case, the messages from both τ_4 and τ_5 to τ_6 must be transmitted in at most Δ.
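Expression (12) amounts to a backward pass over the job graph. The fragment below is a minimal sketch of one such step for a single predecessor (Python; the data layout and the numerical latest-start values are assumptions chosen only to mirror the τ_1 case of the example above):

```python
def modified_deadline(p, successors, L, cpu, delta, D):
    # Expression (12): D*_p = min over immediate successors s of (L_s - 1 - k_r * delta),
    # with k_r = 1 iff p and s run on different processors.  A task with no
    # successor keeps its original deadline D[p].
    if not successors.get(p):
        return D[p]
    return min(L[s] - 1 - (0 if cpu[s] == cpu[p] else 1) * delta
               for s in successors[p])

# tau_1 precedes tau_3 and tau_4 (same processor) and tau_5 (different processor),
# so D*_1 = min(L_3, L_4, L_5 - delta) - 1, as in the example.
successors = {1: [3, 4, 5]}
cpu = {1: 0, 3: 0, 4: 0, 5: 1}
L = {3: 20, 4: 24, 5: 30}         # assumed latest starting times, purely illustrative
print(modified_deadline(1, successors, L, cpu, delta=4, D={1: 46}))   # min(19, 23, 25) = 19
```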


4. The Heuristic Approach

The system will be said to be absolutely non-assignable if none of the n^m possible assignments meets all the placement, time, memory, communication and precedence constraints. The set of tests to determine whether the first four constraints are met shall be called the allocability test. If all tasks pass the allocability test, then the final precedence test is performed on the tentative assignment.

The assignment problem, in essence to determine which tasks go to which processor, is NP-hard, and therefore heuristic methods constitute one of the few valid approaches to the problem.
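As a concrete reading of the allocability test, the sketch below checks the placement, fault-tolerance, memory and processor-time constraints for one candidate (task, processor) pair; the communication check and the final precedence test are left to later steps, as in the method. All structures and names are ours (Python), and the processor-time check is delegated to an RM test such as the one sketched in Section 2.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Task:
    name: str
    C: int                               # execution time (slots)
    T: int                               # period (slots)
    D: int                               # deadline (slots)
    memory: int = 0
    preallocated: Optional[int] = None   # processor index, if placement-constrained
    excluded_with: tuple = ()            # replica tasks that must run elsewhere

@dataclass
class Processor:
    memory_capacity: int
    tasks: list = field(default_factory=list)
    used_memory: int = 0

def allocable(task, p, processors, assignment, rm_test):
    """Allocability test for allocating `task` to processor `p`:
    placement, fault-tolerance, memory (expression (4)) and RM schedulability."""
    proc = processors[p]
    if task.preallocated is not None and task.preallocated != p:
        return False
    if any(assignment.get(r) == p for r in task.excluded_with):
        return False
    if proc.used_memory + task.memory > proc.memory_capacity:
        return False
    candidates = sorted(proc.tasks + [task], key=lambda t: t.T)   # RM priority order
    return rm_test([(t.C, t.T, t.D) for t in candidates])
```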

The rationale behind the method proposed here is:

1) Assign preallocated tasks first. In this way, it is made sure that they are not jeopardized by non-preallocated tasks previously incorporated to the same processor. An early detection of an absolute non-schedulability of the system is also achieved.

2) Assign to the same processor tasks that communicate with each other, putting longer communications first. In this way, precedence constraints are more easily met, the load on the communication network is lightened and therefore there are better chances of meeting the real-time communication constraints.

3) Assign the remaining tasks, larger utilization factors first, to the less loaded processors. In this way, the most critical tasks have a better chance of fitting in a processor and are not jeopardized by less critical tasks incorporated previously.

In the proposed algorithm, the sets of ordered tasks, ordered processors and ordered communicating pairs shall be called the task-, processor- and communication-stacks, respectively.

The method proceeds in the following steps (pseudo-code is presented in the Appendix):

Step I. The allocability test is performed on all preallocated tasks. If any of the tests fails, the system is absolutely non-schedulable; else Step II follows.

Step II. The processor-stack is assembled by randomly ordering the processors.

Step III. The communication-stack is assembled by ordering the communicating pairs by monotonically decreasing times of communication. The task-stack is ordered by decreasing task utilization factors.

Step IV. For each processor 1, 2, . . . , n of the processor-stack, the communication-stack is polled top-down until finding the first pair containing a task allocated to this processor. The allocability test is performed on the companion task; if it passes the test, the task is definitely allocated to that processor and the pair deleted from the communication-stack. Each time a task is allocated to this processor, the top-down polling of the whole communication-stack is repeated and candidate companion tasks are tested in the same way, until allocating all candidates or until finding that no further tasks can be allocated to this processor.

After dealing with the last processor, only a subset of free (non-allocated) tasks remains.

Step V. The processor-stack is reordered by increasing processor utilization factors.


Ties between elements of a set with the same value are broken by a random ordering. The combined effect of ordering processors (Step V) and tasks (Step III) in those ways has a dual advantage: 1) free tasks with more stringent time requirements will be tried on the less loaded processors, making scheduling easier; 2) the possibility of incorporating an associated communicating task is improved.

Step VI. The allocability test is performed on the first free task for allocation to the first, second, . . . , processor until finding one to which it can be allocated. If no allocation is possible, Step IX follows; else the task is deleted from the stack and Step VII follows.

Step VII. The communication-stack remaining after the deletions of Step IV is polled top-down until finding the first pair containing a task allocated to the processor under consideration. The allocability test is performed on the companion task; if it passes the test, the task is definitely allocated to that processor and the pair deleted from the communication-stack. Each time a task is allocated to this processor, the top-down polling of the whole stack is repeated and candidate companion tasks are tested in the same way. If the task-stack is not empty, Step V follows.

The tests are made easier and faster by the fact that, from past calculations, data about the residual memory in each processor as well as the total load on the communication network are known. When the task-stack is empty, a tentative assignment has been obtained. It must be noted, however, that communications have a strong influence on precedence constraints. It may be that the communication network meets its own time constraints but the added communication delays result in a violation of precedence constraints. This is found out when the tentative solution of Step VII is tested for precedence constraints in Step VIII.

Step VIII. The schedulability of each processor containing sending tasks is reverified with deadlines corrected according to the precedence constraints. If the test is successful, one solution to the assignment problem has been found. If this is not true, the system is not schedulable for that particular processor-stack. In that case, Step IX follows.

Step IX. A permutation of the processor-stack is generated and the process is repeated from Step III on. If all possible n! permutations of the processor-stack have been tried with no success but the system has not been found to be absolutely non-schedulable, the system is deemed to be non-schedulable by this method.
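To make the overall control structure concrete, the following toy sketch (Python) mimics the outer search of Steps II to IX: processor-stack permutations on the outside, tasks tried in decreasing utilization order on the least loaded processors inside. It deliberately reduces the allocability test to memory plus a utilization bound, and it omits the communication clustering of Steps IV and VII and the precedence check of Step VIII; it is an illustration of the search structure, not the authors' pseudocode.

```python
from itertools import permutations

def allocable(task, load, mem, capacity, u_bound=1.0):
    # Reduced allocability test: a utilization bound (standing in for the
    # empty-slots RM test) and the memory capacity, expression (4).
    _, u, m = task
    return load + u < u_bound and mem + m <= capacity

def try_stack(proc_stack, tasks, capacity):
    load = {p: 0.0 for p in proc_stack}
    mem = {p: 0 for p in proc_stack}
    assignment = {}
    for name, u, m in sorted(tasks, key=lambda t: t[1], reverse=True):   # Steps III/VI
        for p in sorted(proc_stack, key=load.get):                       # Step V
            if allocable((name, u, m), load[p], mem[p], capacity):
                assignment[name] = p
                load[p] += u
                mem[p] += m
                break
        else:
            return None          # this task fits nowhere: try the next permutation
    return assignment

def assign(tasks, n_procs, capacity):
    for proc_stack in permutations(range(n_procs)):                      # Steps II and IX
        result = try_stack(proc_stack, tasks, capacity)
        if result is not None:                                           # Step VIII omitted
            return result
    return None                   # non-schedulable by this (reduced) sketch

tasks = [("t0", 0.196, 300), ("t1", 0.196, 150), ("t4", 0.098, 300)]
print(assign(tasks, n_procs=2, capacity=500))
```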

In Fig. 5, the set A of all possible task/processor assignments (cardinality n^m), the subset C of the assignments leading to a schedulable system and the subset M of the schedulable assignments produced by the proposed method are represented. Obviously, C may be empty, and in that case the system is absolutely non-schedulable. If M is empty but C is not, the method has been unable to find a solution although one exists. It must be noted that the nature of the problem treated here, each task executing completely in one processor and tasks structured sequentially to constitute a job, favors, in the search for optimal or near-optimal solutions, the tendency to cluster communicating tasks within the same processor. Reducing communication delays has a double effect: communication constraints as well as precedence constraints are more easily met. Since the search is directed away from non-optimal or non-near-optimal solutions, for certain problems the number of different solutions offered by the method may be low.

Figure 5. A: set of all possible task/processor assignments. C: schedulable assignments. M: schedulable assignments obtained by the heuristic method.

Figure 6. The graphs representing the six jobs of the problem. A number within a square beside the task indicates the processor to which it is preallocated.

Example. Fig. 6 shows a set of six graphs representing three jobs of two tasks and three jobs of four, six and eight tasks each, to be executed on six processors interconnected through a 1 Mb/s LAN. In Table 1, utilization factors, memory requirements and preallocations are shown. Since it is used for illustrative purposes only, the problem was softened by keeping the job utilization factors low and by eliminating placement constraints due to fault tolerance. All 720 permutations led to a solution, although only three of them were different. They are shown in Table 2. Solution 2 will be used as an example for a run, showing the successive allocations, the evolution of the stacks, etc., for each step of the algorithm.


Table 1. Task preallocations, utilization factors, periods, deadlines and memory requirements.

Task  Period  Deadline  UF     Memory  Location
0     46      46        0.196  300
1     46      46        0.196  150
2     46      46        0.196  120     0
3     46      46        0.196  170
4     92      92        0.098  300
5     92      92        0.098  300     1
6     92      92        0.098  110
7     92      92        0.098  50      2
8     92      92        0.098  70
9     92      92        0.098  90
10    92      92        0.098  220
11    92      92        0.098  100
12    138     138       0.065  100
13    138     138       0.065  150
14    138     138       0.065  160     3
15    138     138       0.065  130     4
16    138     138       0.065  110
17    138     138       0.065  100
18    138     138       0.065  100     5
19    138     138       0.065  160
20    138     138       0.065  190
21    138     138       0.065  200
22    138     138       0.065  100     1
23    138     138       0.065  200

Step I. The allocability test is performed on tasks 2, 5, 7, 14, 15, 18 and 22, preallocated to processors 0, 1, 2, 3, 4, 5 and 1, respectively. Since no test fails, the allocations are definitive and Step II follows.

Step II. The processor-stack, obtained by permutation of the first one, is 〈0, 1, 2, 3, 5, 4〉.

Step III. The communication-stack is 〈11/13, 20/21, 21/23, 11/12, 19/20, 13/14, 18/19, 6/7, 0/1, 12/14, 8/9, 10/11, 14/15, 22/23, 20/22, 17/19, 7/9, 16/17, 16/18, 2/3, 4/5, 6/8〉.


Table 2. The three different solutions found to the problem.

Processor  Solution 1                     Solution 2                     Solution 3
0          2; 3                           2; 3                           2; 3
1          5; 16; 17; 19; 20; 21; 22; 23  5; 16; 17; 19; 20; 21; 22; 23  4; 5; 22
2          6; 7; 8; 9                     6; 7; 8; 9                     6; 7; 8; 9
3          10; 11; 12; 13; 14             10; 11; 12; 13; 14             10; 11; 12; 13; 14
4          0; 1; 15                       4; 15                          0; 1; 15
5          4; 18                          0; 1; 18                       16; 17; 18; 19; 20; 21; 23

Step IV. According to the processor-stack, Processor 0 is considered first. The allocability test performed on task 3, companion of preallocated task 2, allows its incorporation. The pair 2/3 is deleted from the communication-stack. Since the stack now does not contain any task allocated to Processor 0, Processor 1 is brought into consideration. The allocability tests performed on tasks 23 and 20, companions of preallocated task 22, allow their incorporation. The pairs 20/22 and 22/23 are deleted and the test on task 21, companion of tasks 20 and 23 already incorporated, is performed. It can be allocated and the pairs 20/21 and 21/23 are consequently deleted. Successive tests allocate tasks 19, 17 and 16, in that order, and the pairs 19/20, 17/19 and 16/17 are successively deleted. When trying to incorporate task 4, companion of preallocated task 5 in the last-but-one pair, it is found that this is not possible because the residual memory available at the processor is not enough. Following the procedure, tasks 6, 9 and 8 are incorporated to Processor 2, and tasks 13, 11, 12 and 10 to Processor 3, deleting, as a consequence, the pairs 6/7, 8/9, 6/8, 11/13, 11/12, 13/14, 12/14 and 10/11. The companion tasks of tasks 15 and 18, preallocated to processors 4 and 5 respectively, are already allocated. At the end of this step, the subset of free tasks is {0, 1, 4}.

Step V. Processors 4 and 5 have the same utilization factor. The tie is randomly broken and the processor-stack turns out to be 〈5, 4, 0, 2, 1, 3〉. The task-stack is 〈0, 1, 4〉, with a random ordering of tasks 0 and 1, which have the same utilization factor.

Step VI. Task 0 is allocated to processor 5.

Step VII. The communication-stack is now 〈18/19, 0/1, 14/15, 16/18, 4/5〉. The first pair containing a task allocated to the processor under consideration is 0/1. The allocability test performed on task 1 allows its incorporation. The pair 0/1 is deleted from its stack. Since the remaining pairs do not contain tasks associated to tasks allocated to processor 5, Step V follows.

Step V. The processor- and task-stacks are now 〈4, 5, 0, 2, 1, 3〉 and 〈4〉, respectively.

Step VI. Task 4 can be allocated to processor 4.

Step VII. The task-stack is now empty. The tasks communicating via LAN are {18/19, 14/15, 16/18, 4/5}. The process is complete and a tentative solution has been found.


Step VIII. The test shows that the precedence constraints are met. Therefore, the solution is validated.

Many other solutions are possible. The problem is so soft that many of them can even be found by hand. The method, however, offers only three, and the reason behind this small number is that it optimizes communications, with an effective clustering of communicating tasks in the same processor. Communications made via LAN are restricted to the pairs 4/5, 14/15, 16/18 and 18/19 in the first two solutions, and to the pairs 14/15, 20/22 and 22/23 in the third one.

Since communicating tasks 14 and 15 are preallocated to different processors, they must communicate via LAN. Task 22 is preallocated to processor 1 and, because of transitive communications, tasks 16, 17, 19, 20, 21 and 23 are allocated to the same processor in solutions 1 and 2. Since tasks 18 and 22 are preallocated to different processors, it follows that the communications 16/18 and 18/19 must then necessarily be made via LAN. In the third solution, tasks 16, 17, 19, 20, 21 and 23 are allocated, by transitive communications, to processor 5, to which task 18 is preallocated. Since tasks 18 and 22 are preallocated to different processors, it follows that the communications 20/22 and 22/23 must then necessarily be made via LAN.

In the first two solutions, the communication between tasks 4 and 5 also appears via LAN. This is due to the fact that the starting processor-stack is 〈0, 1, 2, 3, 4, 5〉 in the first case and 〈0, 1, 2, 3, 5, 4〉 in the second one. Tasks 5 and 22, preallocated to processor 1, are definitively allocated in Step I of the method. In Step IV, tasks 16, 17, 19, 20, 21 and 23 are allocated to the same processor because they have a heavier communication load than the pair 4/5. When trying to allocate task 4 to processor 1, the method finds that not enough memory is available and, consequently, cannot do it. In Step VI, task 4 is allocated to processors 5 and 4 in the first and second solutions, respectively, because they are the least loaded. In the third solution, the starting processor-stack is 〈0, 2, 3, 4, 5, 1〉. All of them are filled, in that order, in Steps I to IV. Most of job 5 is allocated to processor 5, the last but one to be considered. Finally, the remaining task, 4, is allocated to processor 1, the last to be considered.

The example shows what could be expected: being actively guided towards optimal or near-optimal solutions, the method does not produce solutions that, although possible, are far from optimal. Hence the small number of solutions (0.41% of all permutations tried).

On the other hand, other problems may be expected to return a high number of different solutions. In fact, in certain cases, either n! solutions or none at all will be found.

Example. Take the same problem of the previous example, this time without preallocations. The first solution found is presented in Table 3. It was obtained for the first processor-stack (〈0, 1, 2, 3, 4, 5〉). It is obvious that, since no preallocations have been imposed and all processors are assumed to be equal, there are n! = 720 different solutions. They are easily obtained by keeping the task rows of the table invariant and putting in the processor column all possible processor permutations (〈0, 1, 2, 3, 5, 4〉, 〈0, 1, 2, 4, 3, 5〉, etc.). If all tasks are preallocated, there is either exactly one solution or none at all.


Table 3. The first of the n! solutions to the problem with no preallocations.

Processor  Tasks
0          0; 1
1          2; 3
2          4; 5
3          6; 7; 8; 9
4          10; 11; 12; 13; 14; 15
5          16; 17; 18; 19; 20; 21; 22; 23

5. The Method’s Performance. Results and Analyses

5.1. Number of Solutions vs. Network Bandwidth for Different Average Processor Utilization Factors

The performance of the method was determined for different average processor utilization factors (APUF) and network bandwidths (NBW). The APUF is, of course, the sum of the utilization factors of all tasks divided by the number of processors. It must be noted that Liu and Layland (1973) found the theoretical processor utilization factor upper bound U ≤ m(2^{1/m} − 1) for the case of m independent schedulable tasks in a single processor. This bound was later increased (Leung 1982; Joseph 1986; Lehozcky 1989; Santos 1993) and may reach the value 1.
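As a quick numerical check (ours, not part of the paper), the bound is easy to tabulate; it decreases from 1.0 for a single task towards ln 2 ≈ 0.693 as m grows:

```python
from math import log

def liu_layland_bound(m):
    # U <= m * (2**(1/m) - 1), the Liu and Layland (1973) utilization bound
    return m * (2 ** (1 / m) - 1)

for m in (1, 2, 3, 10, 100):
    print(m, round(liu_layland_bound(m), 3))   # 1.0, 0.828, 0.78, 0.718, 0.696
print(round(log(2), 3))                        # 0.693, the limit for large m
```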

In order to determine the method's performance, systematic tests were conducted on 920 problems. They are based on three families of graphs, each graph representing a job. The number of beginning tasks (with no predecessor) of each job and the number of successors of each task are determined randomly between 1 and 3, with probabilities decreasing in that order. All tasks of a job have at least one successor except the last one. Six tasks (25% of the total) are randomly selected to be preallocated to some of the first three processors. Two successors of one predecessor are randomly selected to be executed in different processors, to make them fault-tolerant. The length of the messages to be transmitted between communicating tasks is randomly assigned for each pair in the range 50 to 450 bytes. Hard-precedence constraints are assumed.

Graphs belonging to the first, second and third families have 4, 8 and 12 nodes, respectively. Five underlying sets of three graphs, representing three jobs, are generated by instantiation of a graph of each family. The only exceptions to these rules are the first job in the first set of graphs and the third job in the fifth set of graphs; in both cases, the graphs were forced to be chains, in order to toughen the precedence constraints. The five sets are depicted in Figs. 7 to 11, showing periods, communication requirements and preallocations (the processor is indicated by a number within a square at the right of the node). As can be seen, the periods of the second and third jobs are twice and thrice the period of the first one, respectively. The period is common to all tasks of a job. Execution times are adjusted to produce APUFs between 0.1 and 0.8 in steps of 0.1. The NBW is varied between 0.1 and 100 Mb/s, producing payload bandwidths between 10 and 10,000 bytes/ms. An average memory utilization factor of 0.50 is used. Different job utilization factors, leading to different APUFs, are obtained by varying the execution times.

Figure 7. Set of graphs 0.

Each of the five underlying sets is used to find the performance of the method for eight different task utilization factors (0.1 to 0.8) and 23 different bandwidths (in the range 0.1 to 100 Mb/s). Each of the 920 triplets (graph, utilization factor, bandwidth) defines a problem. The numbers of different solutions obtained for each duple (APUF, NBW) are later averaged.

Results are presented in Fig. 12. In Fig. 13, the average communication delays (in slots) are represented vs. APUF for different NBWs.

Figure 8. Set of graphs 1.

Figure 9. Set of graphs 2.

Figure 10. Set of graphs 3.

Figure 11. Set of graphs 4.

Figure 12. Average number of different solutions vs. network bandwidth (Mb/s), for different processor utilization factors.

Figure 13. Average communication delays (in slots) vs. average processor utilization factors for different network bandwidths (Mb/s). For 0.3 and 0.7 Mb/s no solutions were found for APUF 0.7.

The analysis of the results shown in the previous figures shows:

a) For a given average processor utilization factor, APUF, the number of different solutions found increases with the LAN bandwidth. This is a consequence of smaller communication delays, which make it easier to meet hard-precedence constraints.

b) As the APUF increases, solutions are found only with higher bandwidths and in smaller numbers. This is explained by the fact that the problem is harder to solve because increasing processor utilization factors, obtained through increments in the tasks' execution times, leave less time free for communications. This, in turn, leads to the necessity of a higher bandwidth.

c) After a certain bandwidth, no additional different solutions are found. This is explained by the fact that the number of different tentative solutions found at the end of Step VII, prior to the precedence check, is always the same. With lower bandwidths, some of them are discarded for not meeting the precedence constraints. Increasing the bandwidth, and consequently diminishing the communication delays, allows the incorporation of previously discarded solutions. When all solutions have been incorporated, nothing more can be gained by increasing the bandwidth.

d) For a given bandwidth, the number of solutions decreases as the APUF increases. This is explained by two facts. In the first place, some of the solutions found at the end of Step VII fail to meet the precedence constraints because the incremented execution times allow less time for communications. Secondly, increments in job utilization factors reduce the number of tasks that can be processed in a given processor, and eventually all processors are saturated but not all tasks have been allocated.

e) Average communication delays vs. average processor utilization factors for a given bandwidth are not monotonic. This is explained by the fact that different utilization factors lead to different assignments, and the reshuffling of tasks produces different communication needs and, therefore, different delays.

f) No solution could be found for a 0.8 APUF. Since the problem is NP-hard, it is impossible to know whether the solution does not exist or the method is unable to find it. Nevertheless, it must be pointed out that the graphs are not completely random but were arbitrarily toughened by forcing two jobs to be chains, thereby imposing a heavy burden on precedences. As will be seen later, hard-precedence constraints, as defined in this paper, play a decisive role in the feasibility of the assignment.

5.2. Success Ratio vs. Average Processor Utilization Factors for Different Network Bandwidths

Success ratio, SR, is the relation between the number of times that at least one solution is found for a problem and the total number of problems considered. The curves representing SR vs. APUF, for different NBWs and six processors, were determined. APUFs varied between 0.1 and 0.8 and NBWs between 0.2 and 20 Mb/s. 100 sets of three graphs representing jobs of four, eight and twelve tasks were randomly generated. In this case, the number of starting tasks, in the first level, is randomly chosen in the intervals [1,3], [1,7] and [1,11], respectively. The number of tasks in successive levels is generated in a similar way, although the interval is, in each case, [1, number of tasks not assigned to a previous level]. Connectivity between tasks in one level and tasks in the following levels is established, in each case, with probability 20%. The communication load, in bytes, associated to each arc was obtained by multiplying by 300 a randomly generated integer in the interval [1,10]. Six tasks were randomly preallocated and two pairs of tasks in the same level of a given job were randomly selected to be executed in different processors. A typical set of graphs with the associated communication loads and placement constraints is illustrated in Fig. 14.
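A sketch of a generator following that recipe (Python; parameter names are ours, and details the text does not fix, such as periods and execution times, are omitted):

```python
import random

def generate_job(n_tasks, connect_prob=0.2):
    """Random job graph as described above: a random number of starting tasks in the
    first level, successive levels sized at random from the tasks still unassigned,
    arcs from a level to any following level with probability connect_prob, and a
    communication load of 300 * randint(1, 10) bytes on each arc."""
    levels = []
    first = random.randint(1, max(1, n_tasks - 1))
    levels.append(list(range(first)))
    next_id, remaining = first, n_tasks - first
    while remaining > 0:
        size = random.randint(1, remaining)
        levels.append(list(range(next_id, next_id + size)))
        next_id, remaining = next_id + size, remaining - size
    arcs = {}
    for i, level in enumerate(levels[:-1]):
        for t in level:
            for later in levels[i + 1:]:
                for s in later:
                    if random.random() < connect_prob:
                        arcs[(t, s)] = 300 * random.randint(1, 10)   # bytes
    return levels, arcs

levels, arcs = generate_job(8)       # a job of eight tasks, as in the second family
print(len(levels), "levels,", len(arcs), "arcs")
```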

Figure 14. Typical set of graphs used to determine SR vs. APUF for different NBWs. A number within a square indicates preallocation. The symbol * in a pair of tasks of the same job indicates that the two tasks cannot be executed in the same processor.

Each of the 100 underlying sets of graphs was used to generate different problems corresponding to the different duples (APUF, NBW). In order to assess the relative influence of the type of precedence constraint on the SR, the same problem was attacked with both soft and hard precedences. Results are presented in Figs. 15 and 16. In each of them, 100 task sets were processed for each point. That makes 4,800 problems for the first figure and 5,600 problems for the second one. As could be expected, in both cases the basic results are:

a) For a given NBW, the SR decreases as the APUF increases.

b) For a given APUF, the SR increases as the NBW increases.

The very important influence of the type of precedence constraint can be clearly seen: with soft precedence, an SR of 98% is already reached for NBW = 2 Mb/s (at APUF = 0.4). With hard precedence, 20 Mb/s are necessary to reach only 96% at the same APUF. In the soft-precedence case, the usually accepted 95% confidence interval produced the smallest gap between the upper and lower limits of the interval, 6.98, for NBW = 2 Mb/s and APUF = 0.4. The largest gap, 20.2, was found for NBW = 0.6 Mb/s and APUF = 0.8. In the case of hard-precedence constraints, the smallest and largest gaps for SR > 0 were 5.54 and 20.2. They were found for the duples (20 Mb/s, 0.3) and (0.4 Mb/s, 0.2), respectively. Since SRs are referred to the total number of problems treated and not to the number of problems for which at least one solution exists, the results show a pessimistic view of the effectiveness of the method: as the problems get tougher because of increasing APUFs and/or decreasing NBWs, it is possible that some of them have no solution at all.


Figure 15. SR vs. APUF for different NBWs, soft precedences.

Figure 16. SR vs. APUF for different NBWs, hard precedences.


6. Comparison with Other Methods

The touchstone of a heuristic method is its comparison to other heuristics or simulations. The best comparison is made when exactly the same well-specified set of problems, used as benchmarks, is attacked by the different methods. After eliminating differences arising from, for instance, the power of the computing equipment or the programming language used, the times taken to find the first or the "best" solution can be used as a convenient metric. Unfortunately, that detailed information is not normally available in the papers. In order to find it, some authors were contacted, although the effort was of little avail. Maybe the time has come to devise a well-specified benchmark with adequate metrics to compare different methods of solving the multitask/multiprocessor assignment problem. Besides time, other indicators (e.g. success ratios, fault tolerance, robustness, etc.) can be used as metrics, but it must always be possible to reproduce the experiments and verify the results obtained.

The Tindell (1992) example is completely specified (with execution times, periods, preallocations, memory and communication requirements for each task, network bandwidth and memory capacity in each processor). Therefore it was taken as a convenient example to start comparisons.

6.1. Simulated Annealing

Tables 4 and 5 show the system of 8 processors and 43 real-time periodic tasks used as an illustrative example in Tindell (1992). Table 4 lists the memory capacity of each processor. In Table 5 the period, worst-case execution time, memory requirement, communication load and, if it exists, the preallocated or the forbidden processor for each task are shown. 50/1 and 150/2 in the first line mean that task τ_0 must send messages of 50 bytes to task τ_1 and of 150 bytes to task τ_2. The following pairs are original and replica and cannot be executed in the same processor: (33,38), (34,39), (35,40), (36,41) and (37,42). The simulated annealing method starts with an arbitrary energy and temperature associated to a random assignment in the system. The energy is a measure of the unschedulability and high temperatures are associated to high energies. Jumps are performed from one assignment to another trying to lower the energy. Jumps from one energy point to another are performed in the order of thousands for each temperature until reaching one solution.

Table 6 shows the solution presented in Tindell (1992), with tasks allocated to processors and the processor and memory utilization factors. The payload bandwidth of the Round-Robin network is 90 bytes/ms, corresponding approximately to a 0.9 Mb/s 802.5 Token-Ring LAN. Only soft-precedence constraints are considered. The communication delay, Δ, found with (5), is used to verify soft precedence, according to (6).

The assignment, however, is incorrect since the processor 0 memory capacity (10,000) is exceeded by the memory requirements of the set of tasks assigned to it (12,600). It is probably a typing error, since the solution is correct if the memory requirement of task 0 is 300 instead of 3000 as stated in Table 1 of the Tindell paper.

The heuristic method proposed here produced the first solution on the first trial. It is shown in Table 7. The average processor utilization is 56%, the average memory utilization is 79.7%, the bus utilization factor is 61% and the bus relative load is 40%.


Table 4. Processor memory capacities.

Processor  Memory
0          10,000
1          10,000
2          10,000
3          12,000
4          7,000
5          7,000
6          12,000
7          10,000

Testing all 40,320 possible permutations on an Alpha AXP 3000 took approximately 28 min. 38,389 permutations produced schedulable assignments, although only 26 of them were different. For instance, the solution in Table 7 was also produced by the next generated permutation, in which the processors were ordered 〈0, 1, 2, 3, 4, 5, 7, 6〉. The solution found by simulated annealing was none of those found by the heuristic method. This fact seemed so curious that it led us to verify the simulated annealing solution, finding it defective as explained above.

The method, programmed in C, took less than 42 ms to find the first solution. Unfortunately, the time taken by simulated annealing, indispensable for comparing both methods, is not mentioned in Tindell (1992) and could not be obtained by other means. Nevertheless, given the complexity of the proposed problem, the short time itself suggests a good performance of the heuristic method. Since 38,389 out of the 40,320 possible processor permutations led to one of the 26 successful assignments, the probability of finding a solution on the first trial was better than 95%. Although the probability that it occurs is practically nil, in the worst case the first solution would have been found after 1,931 unsuccessful trials, that is, in less than 82 s.
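The quoted probability and worst-case time follow from simple counting (a check of the arithmetic, assuming the 42 ms per trial measured above):

```python
from math import factorial

total = factorial(8)                      # 40,320 processor-stack permutations
successful = 38_389                       # permutations leading to a schedulable assignment
print(round(successful / total, 3))       # 0.952: better than 95% on the first trial
failures = total - successful             # 1,931 unsuccessful permutations
print(round((failures + 1) * 0.042, 1))   # ~81.1 s until the first solution in the worst case
```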

In a subsequent test, soft-precedence constraints were changed to hard, keeping the rest of the problem's parameters invariant. In Fig. 17, the graphs representing all the tasks of the jobs are shown. With the 90 bytes/ms payload bandwidth, none of the 40,320 permutations produced an assignment. This is due to the fact that the communication delay amounts to 8 to 12 ms. This is comparable to the shorter periods (14 ms), which makes it very difficult to execute a task and transmit its data to the successor within one period. If the payload bandwidth is increased to 400 bytes/ms (approximately corresponding to a 4 Mb/s 802.5 LAN), the communication delay is reduced to 2 to 3 ms and the method produces 10 different solutions out of 30,240 processor-stacks leading to schedulable assignments.

If the payload bandwidth is increased to 1,600 bytes/ms (approximately corresponding to a 16 Mb/s 802.5 LAN), the communication delay is reduced to 1 ms. The method produced 26 different solutions out of 40,320 processor-stacks leading to schedulable assignments.


Table 5. Period, worst-case execution time, memory requirement, length of message/destination and preallocation of each task.

Task  Period  WCET  Memory  Messages          Location
0     60      4     3000    50/1; 150/2       0
1     60      4     1500    60/3; 70/4; 30/5
2     60      2     1200    20/3
3     60      2     1700                      1
4     60      2     3000    60/6
5     60      4     3000    80/6
6     60      6     1100                      2
7     35      2     500     40/8              1
8     35      2     700                       1
9     35      8     900     90/11             0
10    35      14    2200    250/11
11    35      4     1000                      1
12    14      2     1000    150/13; 150/14    2
13    14      2     1500    50/15
14    14      2     1600    50/15
15    14      2     1300                      3
16    14      2     1100    50/17             3
17    14      2     1000                      2
18    35      1     1000    50/19             1
19    35      1     1600                      1
20    14      1     1900    40/21
21    14      2     2000                      3
22    14      1     1000    40/23
23    14      1     2000    40/24
24    14      1     1000    20/25
25    14      1     2000    20/26
26    14      2     7000    20/27; 20/28
27    14      1     1100    50/29
28    14      1     900     30/29
29    14      1     500                       6
30    14      1     600     50/31             7
31    14      2     800     70/32
32    14      2     1300                      7
33    20      3     1000    50/35             2; 3
34    20      2     1000    50/35             0; 1
35    20      2     1000    60/36; 60/37
36    20      2     1000                      6; 7
37    20      2     1000
38    20      3     1000    50/40             2; 3
39    20      2     1000    50/40             0; 1
40    20      2     1000    60/41; 60/42
41    20      2     1000                      6; 7
42    20      2     1000

The first solution found was identical to the first solution found for soft precedence and the 0.9 Mb/s LAN (Table 7). Although the average processor and memory utilizations and the relative load were the same, the bus utilization factor fell to 4%. This was to be expected, since the number of bytes to be transmitted is the same but the available bandwidth is much larger. Since communication delays are now much smaller than execution times, increases in the LAN bandwidth do not necessarily lead to more solutions.

Table 8 shows the first solution found with hard-precedence constraints and a 4 Mb/s LAN. The bus utilization factor is 14% and the relative load is 37%.

6.2. Other Heuristics

In the absence of a benchmark, comparison with other heuristics is also difficult. However,since Ramamritham (1992) is still one of the outstanding papers on heuristics to solve theproblem treated here, it can be used as a reference mark. Straight comparisons betweenboth methods are not possible because in that paper preemptions are not allowed, the onlyresource constraints refer to CPU and communications, a Cyclic Executive instead of aRate Monotonic scheduler is used, communications loads are measured in units of timeinstead of units of data (precluding the use of NBW as variable/parameter), Success Ratiois determinedvs. Laxity Factor for different Communication Factors (a variable and a


Table 6. The Tindell (1992) solution.

Processor  Tasks                          PU%    MU%
    0      0; 1; 2; 4; 9; 34; 35; 37      72.9   127.0
    1      3; 7; 8; 10; 11; 18; 19; 39    81.9    97.0
    2      6; 12; 13; 14; 17; 33          82.1    72.0
    3      5; 15; 16; 20; 21; 38          71.7    85.8
    4      22; 23; 24; 25                 28.6    85.7
    5      (none)                          0.0     0.0
    6      26; 27; 28; 29; 36             45.7    87.5
    7      30; 31; 32; 40; 41; 42         65.7    57.0

Table 7. First solution found by the heuristic method (soft-precedence constraints, 0.9 Mb/s LAN).

Processor  Tasks                          PU%   MU%
    0      0; 1; 2; 9; 34; 35; 37          70    96
    1      3; 7; 8; 10; 11; 18; 19; 39     82    97
    2      6; 12; 13; 14; 17; 33           82    72
    3      15; 16; 20; 21; 38; 40          69    75
    4      42; 4; 5                        20   100
    5      22; 23; 24; 25                  29    86
    6      26; 27; 28; 29; 36              46    88
    7      30; 31; 32; 41                  46    37


APUF and NBW, used as variable/parameter and vice versa, lend themselves to a clearer performance evaluation and presentation, as shown in Figs. 15 and 16. On top of that, and for comparison purposes, APUFs can be translated into Laxity Factors, essentially a function of the Task Utilization Factors and the number of processors. In our case, as in Ramamritham (1990), LF = 1/(2.2 APUF). This means that we cover an LF interval of [0.56, 4.5], reaching a non-zero SR even for APUFs of 0.8 and 0.7 for soft and hard precedences, respectively, whereas in Ramamritham (1990) SR is plotted only up to an APUF of 0.5 and, extrapolating the best curves, SR falls to zero before an APUF of 0.6.
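For concreteness, the endpoints of that interval follow directly from the translation formula; this is our own arithmetic check, assuming the plotted APUF range runs from roughly 0.1 to 0.8:

\[
\mathrm{LF} = \frac{1}{2.2\,\mathrm{APUF}}: \qquad
\mathrm{APUF}=0.8 \;\Rightarrow\; \mathrm{LF}=\frac{1}{2.2\times 0.8}\approx 0.56, \qquad
\mathrm{APUF}=0.1 \;\Rightarrow\; \mathrm{LF}=\frac{1}{2.2\times 0.1}\approx 4.5 .
\]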


Table 8. First solution found by the heuristic method (hard-precedence constraints, 4 Mb/s LAN).

Processor  Tasks                          PU%   MU%
    0      0; 1; 2; 9; 34; 35; 37          70    96
    1      3; 7; 8; 10; 11; 18; 19; 39     82    97
    2      6; 12; 13; 14; 17; 33           82    72
    3      15; 16; 20; 21; 38              65    61
    4      22; 23; 24; 25                  29    86
    5      4; 5                            10    86
    6      26; 27; 28; 29; 36              46    88
    7      30; 31; 32; 40; 41; 42          66    57

Figure 17. The 11 jobs of the Tindell example.

7. Summary and Conclusions

The heuristic presented in this paper is an actively guided search towards optimal or near-optimal solutions to the problem of scheduling a set of preemptible, resource-sharing and blockable real-time tasks on a set of heterogeneous processors. The Empty-Slots method and the Rate Monotonic discipline are used to schedule tasks in the processors. Fair Round Robin is used to schedule the communication network. Preallocated tasks are tested first, allowing early detection of absolutely non-assignable systems. By sending communicating tasks to the same processor whenever possible, not only is the communication network unloaded but, more importantly, precedence constraints are more easily met. The method is quite general in the sense that it deals with all types of constraints: placement (due to preallocation or fault tolerance), resource (CPU and memory, easily extensible to I/O), communication, time and precedence. A formal approach to the theoretical aspects of soft and hard precedences is offered. The method's performance is evaluated through simulations numbering in the thousands. Results are expressed and analyzed as number of solutions vs. bandwidth for different average processor utilization factors and as success ratio vs. average processor utilization factor for different bandwidths. A comparison with other methods is then carried out.

In Stankovic (1988), several research issues are pointed out as requiring development in order to build a science of large-scale real-time systems. Among them, Scheduling Theory, essentially the assignment of resources according to well-understood algorithms so that all deadlines are met, stands out as one of the most important. That the timing behavior must be understandable, predictable and maintainable is a requirement especially significant in the case of multitask-multiprocessor systems. Proposed methods to solve this problem date back to the seventies, although the bulk of the related literature was published in the eighties and nineties.

Unfortunately, most methods cannot be compared, simply because the problems they attack are defined in different terms. Sometimes small differences in those terms produce big differences in the evaluation results. For instance, if a Cyclic Executive were used in this heuristic instead of Rate Monotonic, the results would probably not be as good. Even if the same priority discipline is used, a different method of applying it (e.g. Lehozcky vs. Liu) would lead to different results.

The lack of a comparative base is a common phenomenon in the early days of a discipline, and the appearance of standards is a sign of maturity. Consequently, maybe the time has come to define a benchmark against which proposed multitask-multiprocessor heuristics or simulations can be tested. Admittedly, this is not an easy task because of the many attributes and parameters characterizing distributed systems, but it is, no doubt, worth a try.

In the meantime, and with the caveat that what are being compared are methods that solve similar but not exactly identical problems, the heuristic presented here shows very acceptable performance in terms of success ratio, the time needed to find solutions, and the number of different optimal or near-optimal solutions found.

Appendix

Program Heuristic Algorithm(tasks, processors, communications)
begin
  If Allocability test(processors ← pre-allocated tasks) = TRUE Then
  begin
    Assemble Processor-stack RND                       ; random ordering
    While all permutations have not been tried do
    begin
      Assemble Task-stack DUF                          ; decreasing utilization factor
      Assemble Communication-stack DCT                 ; decreasing communication times
      For each processor ∈ processor-stack, following stack order
      begin
        While ∃ task pair := first pair ((τa, companion task) or (companion task, τa)) ∈
              communication-stack | τa ∈ {tasks allocated to processor} and
              Allocability test(processor ← companion task) = TRUE do
        begin
          Allocate(processor ← companion task)
          Delete pair(task pair)
        end
      end
      Sort Processor-stack IUF                         ; increasing utilization factor
      While ∃ proc place := first processor ∈ processor-stack |
            Allocability test(processor ← first task in the task-stack) = TRUE do
      begin
        Allocate(proc place ← first task in the task-stack)
        Delete first task(task-stack)
        While ∃ task pair := first pair ((τa, companion task) or (companion task, τa)) ∈
              communication-stack | τa ∈ {tasks allocated to proc place} and
              Allocability test(proc place ← companion task) = TRUE do
        begin
          Allocate(proc place ← companion task)
          Delete pair(task pair)
        end
        Sort Processor-stack IUF
      end
      If task-stack = ∅ and Precedence test(processors) = TRUE then solution++
      Permutation(processor-stack)
    end
    Return(the system is not schedulable for this method)
  end
  else Return(the system is absolutely non schedulable)
end
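For readers who prefer an executable form, the following Python sketch is our own simplified rendering of the skeleton above, not the authors' implementation: pre-allocated tasks are tested first, communicating companion tasks are pulled onto the same processor whenever they fit, remaining tasks are placed on the least-utilized feasible processor, and every processor-stack permutation is tried. The names (heuristic, allocable, pull_companions) and the task fields "id", "c", "t" are our own; the allocability test is a plain CPU-utilization check standing in for the empty-slots RM analysis, and the memory, placement and precedence tests are omitted.

    from itertools import permutations

    def allocable(processor, task, load, capacity):
        """Simplified allocability test: CPU utilization only."""
        return load[processor] + task["c"] / task["t"] <= capacity[processor]

    def heuristic(tasks, processors, comm_pairs, capacity, prealloc=None):
        prealloc = prealloc or {}                   # task id -> processor
        base_load = {p: 0.0 for p in processors}
        base_assign = {}
        for t in tasks:                             # pre-allocated tasks are tested first
            if t["id"] in prealloc:
                p = prealloc[t["id"]]
                if not allocable(p, t, base_load, capacity):
                    return []                       # absolutely non schedulable
                base_assign[t["id"]] = p
                base_load[p] += t["c"] / t["t"]
        task_stack = sorted((t for t in tasks if t["id"] not in prealloc),
                            key=lambda tk: tk["c"] / tk["t"], reverse=True)  # DUF order
        solutions = []
        for proc_stack in permutations(processors):  # try every processor-stack order
            load, assign = dict(base_load), dict(base_assign)
            pending, comm = list(task_stack), list(comm_pairs)

            def pull_companions(p):
                # Co-locate communicating companions of tasks already on p, if they fit.
                for pair in list(comm):
                    for here, other in (pair, pair[::-1]):
                        if assign.get(here["id"]) == p and other in pending \
                                and allocable(p, other, load, capacity):
                            assign[other["id"]] = p
                            load[p] += other["c"] / other["t"]
                            pending.remove(other)
                            comm.remove(pair)
                            break

            for p in proc_stack:                    # attach companions to pre-allocations
                pull_companions(p)
            while pending:
                task = pending[0]
                fits = [p for p in sorted(proc_stack, key=load.get)  # IUF order
                        if allocable(p, task, load, capacity)]
                if not fits:
                    break                           # this permutation fails
                assign[task["id"]] = fits[0]
                load[fits[0]] += task["c"] / task["t"]
                pending.pop(0)
                pull_companions(fits[0])
            if not pending:                         # precedence test omitted here
                solutions.append(assign)
        return solutions

    # Toy usage: task 0 is pre-allocated to P0 and communicates with task 1.
    ts = [{"id": i, "c": c, "t": per}
          for i, (c, per) in enumerate([(2, 10), (4, 20), (6, 15)])]
    print(heuristic(ts, ["P0", "P1"], [(ts[0], ts[1])],
                    {"P0": 0.8, "P1": 0.8}, {0: "P0"}))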

Acknowledgments

The authors wish to express their sincere appreciation to the referees, whose observations led to a definite improvement of the original version. Thanks are also due to Prof. A. Mendelzon (University of Toronto) for logistic support and to Prof. E. Brignole (UNS) for providing computing facilities.

References

Borriello, G., and Miles, D. 1994. Task scheduling for real-time multiprocessor simulations. 11th Workshop on Real-Time Operating Systems and Software, pp. 70–73.

Cheng, A. M. K., Browne, J., Mok, A., and Wang, R. 1993. Analysis of real-time rule-based systems with behavioral constraint assertions specified in Estella. IEEE Trans. Software Eng. 19(9): 863–885.

Chetto, H., Silly, M., and Bouchentouf, T. 1990. Dynamic scheduling of real-time tasks under precedence constraints. Real-Time Systems 2: 181–194.

Ferro, E., Orozco, J., and Santos, J. 1993. Scheduling of real-time tasks with blocking conditions (in Spanish). Proc. XIX Conferencia Latinoamericana de Informatica, pp. 14.19–14.27.

Joseph, M., and Pandya, P. 1986. Finding response times in a real-time system. The Computer Journal 29(5): 390–395.

Katcher, D., Arakawa, H., and Strosnider, J. 1993. Engineering and analysis of fixed priority schedulers. IEEE Trans. Software Eng. 19(9): 920–934.

Leung, J., and Whitehead, J. 1982. On the complexity of fixed priority scheduling of real-time tasks. Performance Evaluation 2(4): 237–250.

Lehozcky, J. P., Sha, L., and Ding, Y. 1989. The rate monotonic scheduling algorithm: Exact characterization and average case behavior. Proc. Real-Time Systems Symp., IEEE CS, Los Alamitos, CA, pp. 166–171.

Liu, C. L., and Layland, J. W. 1973. Scheduling algorithms for multiprogramming in a hard real-time environment. J. ACM 20(1): 46–61.

Ma, P.-Yi R., Lee, E., and Tsuchiya, M. 1982. A task allocation model for distributed computing systems. IEEE Trans. Computers 31(1): 41–47.

Muntz, R., and Coffman, E. 1970. Preemptive scheduling of real-time tasks on a multiprocessor system. J. ACM 17(2): 324–338.

Obenza, R. 1993. Rate monotonic analysis for real-time systems. IEEE Computer 26(3): 73–74.

Ramamritham, K. 1990. Allocation and scheduling of complex periodic tasks. Proc. 10th International Conference on Distributed Computing Systems, pp. 108–115.

Ramamritham, K., Fohler, G., and Adan, J. 1993. Issues in the static allocation and scheduling of complex periodic tasks. Proc. IEEE RTOSS, pp. 11–16.

Santos, J., Gastaminza, M. L., Orozco, J., and Matrangolo, C. 1994. 802.5 priority mechanism in hard real-time RMS applications. Computer Communications 17(6): 439–442.

Santos, J., Gastaminza, M. L., Orozco, J., Picardi, D., and Alimenti, O. 1991. Priorities and protocols in hard real-time LANs: Implementing a crisis-free system. Computer Communications 14(9): 507–514.

Santos, J., and Orozco, J. 1993. Rate monotonic scheduling in hard real-time systems. Information Processing Letters 48: 39–45.

Sha, L., Rajkumar, R., and Lehoczky, J. 1990. Priority inheritance protocols: An approach to real-time synchronization. IEEE Trans. Computers 39(9): 1175–1185.

Stankovic, J. 1988 (Ed.). Real-time computing systems: The next generation. IEEE Computer 21(10): 10–19.

Tindell, K., Burns, A., and Wellings, A. 1992. Allocating hard real-time tasks: An NP-hard problem made easy. Real-Time Systems 4: 145–165.

Xu, J. 1993. Multiprocessor scheduling of processes with release times, deadlines, precedence and exclusion relations. IEEE Trans. Software Eng. 19(2): 139–154.

Zhao, W., Ramamritham, K., and Stankovic, J. 1987. Preemptive scheduling under time and resource constraints. IEEE Trans. Computers C-36(8): 949–960.
