
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 32, 74–89 (1996), ARTICLE NO. 0006

A Comparison of Fast and Low Overhead Distributed Priority Locks¹

THEODORE JOHNSON² AND RICHARD NEWMAN-WOLFE³

Department of CIS, University of Florida, Gainesville, Florida 32611-2024

0743-7315/96 $12.00 Copyright 1996 by Academic Press, Inc. All rights of reproduction in any form reserved.

¹ We acknowledge the support of USRA Grant 5555-19 and NSF Grant DMS-9223088.
² E-mail: [email protected].
³ E-mail: [email protected].

Distributed synchronization is necessary to coordinate the diverse activities of a distributed system. Priority synchronization is needed for real-time systems or to improve the performance of critical tasks. Practical synchronization techniques require fast response and low overhead. In this paper, we present three priority synchronization algorithms that send O(log n) messages per critical section request on average and use O(log n) bits of storage per processor. Previous algorithms required 2(n − 1) messages per critical section entry and O(n) bits of storage per processor. Two of the algorithms are based on Li and Hudak's path compression techniques, and the third algorithm uses Raymond's fixed-tree structure. Since each of the algorithms has the same theoretical complexity, we make a performance comparison to determine which of the algorithms is best under different loads and different request priority distributions. We find that when the request priority distribution is stationary, the path-compression algorithm that uses a singly linked list is best overall, but the fixed-tree algorithm requires fewer messages when the number of processors is small and the load is high (100% or greater). When the request priority distribution is nonstationary, the fixed-tree algorithm requires the fewest messages when the load is 100% or greater. © 1996 Academic Press, Inc.

1. INTRODUCTION

Distributed synchronization is an important activity that is required to coordinate access to shared resources in a distributed system. A set of n processors synchronize their access to a shared resource by requesting an exclusive privilege to access the resource. The privilege is often represented as a token. Real-time systems, or systems that have critical tasks that must execute quickly for good performance, need prioritized synchronization. In priority synchronization, every request for the token has a priority attached. When the token holder releases the token, it should be given to the processor with the highest priority request.

In this paper, we present three distributed priority synchronization algorithms, each requiring O(log n) bits of storage per processor (the O(log n) bits are required to store the names of O(1) processors), and an average of O(log n) messages per critical section entry (if the load on the critical section is less than 100%). The low space and message passing overhead make them scalable and practical for implementation. The processors synchronize by sending and interpreting messages according to a synchronization protocol. We assume that every message that is sent is eventually received. The third protocol also requires that messages are delivered in the order sent. Since all three of the protocols have similar theoretical performance, we implemented a simulation of the algorithms and made a performance study.

Considerable attention has been paid to the problem of distributed synchronization. Lamport [12] proposes a timestamp-based distributed synchronization algorithm. A processor broadcasts its request for the token to all of the other processors, which reply with a permission. A processor implicitly receives the token when it receives permissions from all other processors. Ricart and Agrawala [19] and Carvalho and Roucairol [2] improve on Lamport's algorithm by reducing the message passing overhead. However, all of these algorithms require O(n) messages per request.

Thomas [20] introduces the idea of quorum consensus for distributed synchronization. When a processor requests the token, it sends a vote request to all of the other processors in the system. A processor will vote for the critical section entry of at most one processor at a time. When a processor receives a majority of the votes, it implicitly receives the token. The number of votes that are required to obtain the token can be reduced by observing that the only requirement for mutual exclusion is that any pair of processors require a vote from the same processor. Maekawa [14] presents an algorithm that requires O(√n) messages per request and O(√n log n) space per processor. Kumar [11] presents the hierarchical quorum consensus protocol, which requires O(n^0.63) votes for consensus, but is more fault-tolerant than Maekawa's algorithm.

Li and Hudak [13] present a distributed synchronization algorithm to enforce coherence in a distributed shared virtual memory (DSVM) system. In DSVM, a page of memory in a processor is treated as a cached version of a globally shared memory page. Typical cache coherence algorithms require a home site for the shared page, which tracks the positions of the copies of the page.


The "distributed dynamic" algorithm of Li and Hudak removes the need for a fixed reference point that will locate a shared page. Instead, every processor associates a pointer with each globally shared page. This pointer is a guess about the current location of the page. When the system is quiescent, the pointers form a tree that is rooted at the current page owner. When a processor faults on a nonresident page, it sends a request to the pointed-to processor. Eventually, the page is returned and the faulting processor unblocks. The request for the page follows the chain of pointers until it reaches a processor that owns the page (or will own the page shortly). Though a request for a page might make n − 1 hops to find the owner, the path compression that occurs while the hops are being made guarantees that a sequence of K requests for the page requires only O(n + K log n) messages. However, the blocking that the algorithm requires incurs an O(n) space overhead, to store the identities of the blocked requests.

Trehel and Naimi [22, 21] present two algorithms for distributed mutual exclusion that are similar to the "distributed dynamic" algorithm of Li and Hudak. The algorithm in [21] uses a distributed list to reduce the O(n) storage required of each processor to O(log(n)). Chang et al. [4] present an improvement to the Trehel and Naimi algorithm [21] that reduces the average number of messages required per critical section entry. Johnson [10] makes a performance analysis of current nonprioritized synchronization algorithms, and finds that the algorithm of Chang et al. has the best overall performance.

Raymond [18] has proposed a simple synchronization algorithm that can be configured to require O(log n) storage per processor and O(log n) messages per critical section request. The algorithm organizes the participating processors in a fixed tree. The execution of the algorithm is similar to that of the Li and Hudak algorithm. Neilsen and Mizuno [17] present an improved version of Raymond's algorithm that requires fewer messages because it passes tokens directly between processors instead of through the tree. Woo and Newman-Wolfe [23] use a fixed tree based on a Huffman code, which is optimal for a fixed request pattern. Because the tree is fixed, however, it does not adapt to the pattern of requests in the system. Often, only a small population of the processors make requests for the token. Processors that do not request the token should not be required to take part in the synchronization. Li and Hudak show that their path compression algorithm requires O(n + K log q) messages if only q of the n processors use the page.

Some work has been done to develop prioritized critical section algorithms. Goscinski [7] has proposed a fully distributed priority synchronization algorithm. However, this algorithm requires O(n log n) storage per processor and O(n) messages per critical section request. Recent work on multiprocessor priority synchronization algorithms has focused on contention-free algorithms, with algorithms proposed by Markatos and LeBlanc [15], Craig [5], and Harathi and Johnson [8].

Two of our priority synchronization algorithms use the path compression technique of Li and Hudak to achieve low message passing overhead. To avoid the O(n) storage cost of blocking, the algorithms use distributed lists to block processors externally instead of internally. The third algorithm uses a fixed tree, as in Raymond's approach. Recent distributed synchronization algorithms by Naimi and Trehel [22], Neilsen and Mizuno [17], and Chang et al. [4] make use of simple distributed lists to improve the performance of their algorithms and avoid O(n) storage. However, these protocols do not provide prioritized mutual exclusion.

Some work has been done on distributed lists, primarily in the context of directory-based cache coherence algorithms. For example, the scalable coherent interface (SCI) [9] uses a distributed queue to chain together all of the processors that are requesting access to a memory block. The algorithm is greatly simplified because the pointer to the head of the list is stored in a standard place (the home memory block). Translated to a distributed algorithm, the SCI algorithm requires a manager at a fixed site to remember the head of the list. In a path compression algorithm, no such fixed-site manager is available. We note that Li and Hudak found that their path compression algorithm had far superior performance to algorithms that required fixed site managers. A shared-memory synchronization algorithm that is quite similar in nature to the SCI algorithm is the contention-free lock of Mellor-Crummey and Scott [16]. The recent contention-free priority locks are based on the Mellor-Crummey and Scott lock.

The contribution of this paper is to present three new practical distributed priority synchronization algorithms that require only O(log n) storage per processor and O(log n) messages per synchronization request, where n is the number of processors. Two of the algorithms make a novel use of distributed lists to transfer the burden of remembering which processors are blocked to the blocked processors themselves. We provide a performance analysis to show which of the algorithms has the best performance. We find that the algorithm performance depends on the request priority distribution and on the metric used to measure the algorithm.

2. THREE DISTRIBUTED PRIORITY LOCKS

The algorithms execute as a distributed protocol. When a processor decides to ask for the token, it sends its request to the processor indicated by a forwarding pointer. Eventually, the processor will receive the token and enter the critical section. A processor will receive unsolicited messages, and will treat them as events to be handled. Event handling will in general cause a change to the local state variables and often cause messages to be sent. The unprocessed events are stored in a pending event queue. Since we want the algorithms to require only O(log n) bits of storage, they must in general process all events immediately.



To simplify the presentation, we assume that all priorities are unique. Nonunique priorities can be made unique by attaching a timestamp and processor ID. Alternatively, the algorithms can be modified to handle nonunique priorities directly.
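
As an illustration (not code from the paper), the following Python sketch shows the standard tie-breaking construction just described; the class and field names are our own.

    # Hypothetical sketch: break priority ties with a logical timestamp and the
    # processor id, so that no two requests ever compare equal.
    from dataclasses import dataclass

    @dataclass(order=True)
    class UniquePriority:
        priority: int        # the application-supplied priority (may repeat)
        timestamp: int       # logical clock value when the request was made
        processor_id: int    # unique processor name, final tie-breaker

    # Two requests with the same application priority are still totally ordered.
    a = UniquePriority(priority=5, timestamp=12, processor_id=3)
    b = UniquePriority(priority=5, timestamp=12, processor_id=7)
    assert a < b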

2.1. Single-Link Protocol

The first algorithm keeps blocked processors in a singly linked list. (The code for the single-link protocol is listed in Appendix A.) Our starting point is the path compression algorithm of Li and Hudak. Translated to distributed synchronization, their algorithm works as follows. Every processor has a pointer, currentdir, that initially points in the direction of the current token holder. When processor A decides that it needs the token, it sends a request to the processor indicated by currentdir. If a processor that neither holds nor is requesting the token receives the request from A, it forwards the request to the processor indicated by its version of currentdir and then sets currentdir to A. If a processor that is using or is requesting the token receives A's request, it stores the request and takes no further action. In Li and Hudak's algorithm, the token holder (or a waiting processor) may store O(n) such requests. This execution is illustrated in Fig. 1. When a processor releases the token, it checks to see if there are any blocked requests. If so, the processor sends the token to one of the requestors, sets its version of currentdir to the new requestor, and unblocks any remaining blocked requests. If there are no blocked requests, the processor holds the token until a request arrives, and sends the token to the requestor (with a corresponding update to currentdir).
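
As a minimal, single-address-space sketch of the path compression step (our own illustration, not the paper's code), the Python fragment below simulates the currentdir redirections for a nonprioritized token; it omits the case in which an intermediate processor is itself requesting.

    # Each processor keeps a guess, currentdir, about where the token is;
    # forwarding a request redirects that guess toward the requestor.
    class Processor:
        def __init__(self, pid, currentdir, has_token=False):
            self.pid = pid
            self.currentdir = currentdir   # guess about the token's location
            self.has_token = has_token
            self.blocked = []              # requests stored at the holder (the O(n) cost)

    def request_token(procs, requestor):
        """Route requestor's request along currentdir pointers to the token holder."""
        hop = procs[requestor].currentdir
        while not procs[hop].has_token:
            nxt = procs[hop].currentdir
            procs[hop].currentdir = requestor   # path compression step
            hop = nxt
        procs[hop].blocked.append(requestor)    # the holder stores the request

    # Example: processor 0 holds the token; 1 and 2 initially point toward it.
    procs = {0: Processor(0, 0, True), 1: Processor(1, 0), 2: Processor(2, 1)}
    request_token(procs, 2)
    assert procs[1].currentdir == 2 and procs[0].blocked == [2]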

In a nonprioritized lock, it is permissible to block a request at a processor that is requesting the token because the blocker should get the token first anyway. In priority synchronization, every request has a priority attached.


The blocked request might have a higher priority than the blocker. Thus, a processor must be able to find the set of all current requestors to register its request.

All requesting processors must therefore know about each other. Since each processor is allowed only O(log n) storage, the processors can only use pointers to form lists. In the first algorithm we make the requesting processors form a waiting ring. All waiting processors point to the next lower priority processor, except for the lowest priority processor, which points to the highest priority processor. In addition to knowing the identity of its successor, a processor in the waiting ring also knows the priority of its successor's request. The token holder must be able to release the token into the waiting ring, so it points to a processor in this ring. The processors that are not requesting the token might make a request, so they also lie on paths that point to the ring. The structure of the synchronization is shown in Fig. 2.

The processors handle two classes of events: those related to requesting the token and those related to releasing the token. Although these activities have some subtle interactions, we initially describe them separately.

2.1.1. Requesting the Token

Each processor p that is not holding or requesting the token stores a guess about the identity of the token holder in the local variable currentdir. Note that each processor has its own version of currentdir.

When a processor decides to use the token, it sends a Token-Request to the processor indicated by currentdir (if the processor already has the token, the processor can use it directly). When a processor that neither holds nor is requesting the token receives a Token-Request message, it forwards the request to the processor indicated by currentdir. After receiving the message, the processor knows that the requestor will soon have the token, so the processor changes currentdir to point to the requesting processor. This continues until the Token-Request message reaches the token holder or a processor that is in the waiting ring.

FIG. 2. Structure of the distributed priority synchronization.

FIG. 1. Execution of Li and Hudak's path compression synchronization.


When a Token-Request message reaches a processor, P, in the waiting ring, the correct position in the ring must be found for the requestor. If the requestor's priority is between that of P and P's successor, or if P is the lowest priority processor in the ring and the requestor's priority is lower than P's or greater than P's successor's (i.e., the highest priority waiting processor's), then the requestor should follow P. P changes its ring pointer (we can re-use currentdir) to point to the requestor, and sends the requestor a Request-Done message indicating the requestor's successor in the ring. When the requestor receives the Request-Done message, it sets its value of currentdir to its successor in the waiting ring. Otherwise, P sends the Token-Request message to its successor in the ring.
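
As an illustration of the admission test just described (ours, not the protocol's code), the Python predicate below decides whether a requestor should become P's successor; it assumes unique priorities and that a larger number means a higher priority.

    def admits_after(p_pri, succ_pri, req_pri):
        """True if the requestor should become P's successor in the waiting ring."""
        if p_pri > succ_pri:
            # Normal position: P points to the next lower priority processor.
            return p_pri > req_pri > succ_pri
        # P is the lowest priority member, so its successor is the highest priority
        # member; both a new lowest and a new highest priority are inserted here.
        return req_pri < p_pri or req_pri > succ_pri

    # A request that does not fit here is forwarded to P's successor instead.
    assert admits_after(p_pri=7, succ_pri=3, req_pri=5)      # fits between 7 and 3
    assert admits_after(p_pri=2, succ_pri=9, req_pri=1)      # new lowest priority
    assert not admits_after(p_pri=7, succ_pri=3, req_pri=8)  # forwarded onward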

If a Token-Request message arrives at the token holder and the token holder is not using the token, the token holder replies to the requestor with the token. If the token holder is using the token, then there might or might not be a waiting ring. If the waiting ring exists, the message is forwarded into the waiting ring. Otherwise, the token holder replies with a Request-Done message indicating that the requestor is the only processor in the waiting ring.

2.1.2. Releasing the Token

If there are no processors waiting to acquire the token, the token holder sets an internal flag to indicate that the token is available. Otherwise, the token holder releases the token into the waiting ring. When a processor receives the token, it cannot immediately use the token, since the processor does not know if it is the highest priority processor. Instead, each processor passes the token to its successor until the highest priority processor is found. The lowest priority processor knows that it is the lowest priority processor (because its successor has a higher priority) and therefore that its successor is the highest priority processor. Thus, the lowest priority processor marks the token before passing it to its successor. When a processor receives a marked token, it accepts the token and enters the critical section.
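
The following Python sketch (our own, single-address-space illustration) traces the token hops just described; the ring dictionary stands in for the per-processor currentdir and successor-priority state, and a larger number is assumed to mean a higher priority.

    def release_into_ring(ring, entry):
        """ring maps pid -> (successor_pid, priority); entry is where the holder points."""
        pid, marked = entry, False
        while True:
            succ, pri = ring[pid]
            if marked:
                return pid                  # a marked token is accepted: highest priority waiter
            succ_pri = ring[succ][1]
            if pri < succ_pri:              # lowest priority member: its successor is the highest
                marked = True
            pid = succ

    # Example ring 5 -> 3 -> 1 -> 8 -> 5 (priorities equal to the pids here).
    ring = {5: (3, 5), 3: (1, 3), 1: (8, 1), 8: (5, 8)}
    assert release_into_ring(ring, 3) == 8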

After the new token holder accepts the token, the waiting ring structure must be repaired. The new token holder sends the address of its successor to its predecessor (the processor that forwarded the token), which updates its currentdir pointer. Exceptions occur if the new token holder was alone in the waiting ring, or if the previous token holder released the token directly to the new token holder.

At this point, we can wonder if it might be better to reverse the direction of the pointers in the waiting ring. If we require a processor in the waiting ring to point to the processor with the next higher priority request, it is the highest priority processor that knows its status. However, the ring still needs to be repaired, and the token holder still needs to update the currentdir pointer in its predecessor. In general, finding the predecessor would require a circuit of the waiting ring, incurring a high message cost.

2.1.3. Implementation Details

The essential operation of the algorithm is as described in the previous section. However, there are a number of details that must be addressed to ensure the correct execution of the algorithm. The complications occur primarily because of two concerns: out-of-order messages and the O(log n) storage requirement.

Out-of-Order Messages. Out-of-order messages occur when a message arrives that was not expected. Usually, the problem is due to noncausal message delivery. That is, processor A sends message m1 to processor B, then sends message m2 to processor C. Processor C receives m2 and sends m3 to B. At B, message m3 arrives before message m1. For example, a requesting processor can be given the token before being told that it is part of the waiting ring. This problem usually occurs when a new processor is admitted to the waiting ring. If A has successor B in the waiting ring, then admits C into the ring as its new successor, the first message from C to B might arrive before the last message from A to B. In our example, the last message from A to B tells B that it is in the waiting ring, and the first message from C to B is the token. Considerable work has been done on implementing causal communications [1], but this work requires that all messages are broadcast, which in general requires O(n) messages.

The messages that need to be processed in causal order involve the waiting ring maintenance. Out-of-order reception can be detected (i.e., you are given the token before entering the waiting ring), and the processing of the too-early message can be blocked until the appropriate predecessor message arrives. There can be only O(1) messages that can arrive too early, so delaying their processing does not impose too large a space requirement. Because the protocol must handle noncausal messages, it also correctly handles messages from a processor that are delivered out of order.

No Blocking. There are many occasions when a processor, A, cannot correctly interpret a message that has arrived (as discussed above). If the unexpected message involves A's state, then the message processing can be safely blocked because there can be only O(1) such messages. However, the message might be a request from a different processor, B. Since many processors might send their request to A, processor A must be able to handle these requests immediately.

In the algorithm by Li and Hudak, a processor blocks requests from other processors when it is using or is requesting the token. In our algorithm, the token holder does not block requests; instead it tells the requestors to form a waiting ring. Once a requesting processor joins the waiting ring, it helps the new requests to also join the waiting ring. However, between the time that a processor, A, requests the token and the time it joins the waiting ring, it cannot handle requests from other processors. The problem is that during this time, A's forwarding pointer currentdir does not have a significant meaning. Handling the requests of others would cause cycles among the non-ring processors.



Since processor A cannot handle foreign requests and cannot block these requests internally, processor A will block them externally. During the time that processor A is requesting the token but has not yet been told that it is in the waiting ring, processor A will respond to token requests by linking them into a blocking list. The blocking list is managed as a LIFO. When a request arrives, processor A responds with a Block message, with the address of the previous head of the blocking list (or a null pointer if the list is empty). When processor A joins the waiting ring, it sends an Unblock message to the head of the blocking list. The Unblock message is relayed down the chain of blocked processors. After unblocking, the processors resubmit their requests to A.
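
As a sketch of this external blocking list (our illustration; the class and field names are assumptions, not the paper's identifiers), the Python code below builds the LIFO chain out of the Block message payloads and then walks it when A unblocks.

    class Requestor:
        def __init__(self, pid):
            self.pid = pid
            self.next_blocked = None       # filled in by the Block message payload

    def handle_foreign_request(a_state, requestor):
        """A is requesting but not yet in the ring: chain the requestor into the list."""
        requestor.next_blocked = a_state['blocked_head']   # previous head (or None)
        a_state['blocked_head'] = requestor.pid

    def unblock_all(a_state, requestors):
        """Relay Unblock down the chain; each processor would then resubmit to A."""
        pid = a_state['blocked_head']
        a_state['blocked_head'] = None
        resubmitted = []
        while pid is not None:
            resubmitted.append(pid)                         # resubmit request to A
            pid = requestors[pid].next_blocked
        return resubmitted

    a = {'blocked_head': None}
    rs = {1: Requestor(1), 2: Requestor(2)}
    handle_foreign_request(a, rs[1]); handle_foreign_request(a, rs[2])
    assert unblock_all(a, rs) == [2, 1]    # LIFO order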

Miscellaneous. When a processor accepts the token, it sends a message out to repair the ring. The ring must be repaired before the token can be released back into the ring. Therefore, when the ring is repaired an acknowledgement is sent to the token holder. The token holder blocks its release of the token until the acknowledgement that the ring is repaired has been received.

When a processor is in the waiting ring, it does not block new processors from entering the ring. In particular, the lowest priority processor will admit new processors into the ring during the time between sending a marked token to the highest priority processor and receiving a request to repair the ring. Only the processor that points to the token holder can repair the ring, so the request is passed along in the waiting ring until it reaches a processor that points to the token holder. This processor repairs the ring and sends an acknowledgment to the token holder. This process is illustrated in Fig. 3.

2.1.4. Nonstationary Priorities

The single-link algorithm as described does not make use of all the information available to it. A processor in the waiting ring can learn about the identities of lower priority processors whenever it processes a token request. For example, suppose that a request from r arrives at a processor p in the waiting ring, and r's priority is lower than p's priority. Then, p knows that it will receive the token before r. By keeping a map of these requests, p can send a new request directly to a processor r that had previously made a request, avoiding several links in the waiting ring.

If the distribution of priorities is stationary (i.e., does not change over time), then maintaining the mapping is not likely to be worth the additional protocol complexity. However, if priorities tend to decrease over time, then it is likely that new requests will have lower priorities than those of processors in the waiting ring. In particular, if each processor in the waiting ring remembers the lowest priority request it has processed thus far (while in the waiting ring), new requests can be sent directly to the lowest priority processor in the ring.
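
The Python sketch below (ours, with hypothetical names, and assuming a larger number means a higher priority) shows the bookkeeping this heuristic requires of a ring member: remember the lowest priority request forwarded so far, and shortcut any new request that is even lower.

    class RingMember:
        def __init__(self, pid, priority):
            self.pid = pid
            self.priority = priority
            self.lowest_seen = None        # (priority, pid) of the lowest request forwarded

        def route_request(self, req_pid, req_pri):
            """Return the pid to shortcut to, or None for the normal ring traversal."""
            target = None
            if self.lowest_seen is not None and req_pri < self.lowest_seen[0]:
                target = self.lowest_seen[1]      # skip the intervening ring members
            if self.lowest_seen is None or req_pri < self.lowest_seen[0]:
                self.lowest_seen = (req_pri, req_pid)
            return target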


In Section 4, we will use this heuristic when we test the performance of the single-link algorithm in the presence of nonstationary priorities. The specifics of the implementation of the heuristic are discussed in Appendix B.

2.1.5. Correctness

In this section, we give some intuitive arguments for the single-link algorithm's correctness. We loosely refer to events as occurring at a point in "time." While global time does not exist in an asynchronous distributed system, we can view the events in the system as being totally ordered using Lamport timestamps [12] and view a point in time as being a consistent cut [3].

We note that all processors that are not requesting the token lie on a path that leads to a processor that either holds or is requesting the token. This property can be seen by induction. We assume the property holds initially (and this is required for correctness). The property can change if a processor modifies its currentdir pointer, or if the processor it points to changes its state. A processor that is not requesting the token will change its currentdir pointer if it relays a request. Then, however, it points to the requesting processor. A nonrequesting processor can also change its pointer if it sends the token to another processor, but currentdir is set to the new token holder. A processor can change its state from not requesting to requesting, but the property still holds. Finally, a processor can change its state from holding the token to not-requesting. However, after changing state the processor points to the new token holder or to a requesting processor.

The token is not lost because it is only released to processors in the waiting ring (and a processor must enter the waiting ring before accepting the token). The token is released to the highest priority processor in the waiting ring at the time that the lowest priority processor in the ring handles the token.

FIG. 3. Repairing the waiting ring after passing the token.


2.2. Double-Link Algorithm

In the single-link algorithm, if a request hits the waiting ring at a processor with a lower priority, the request must traverse most of the ring until reaching its proper position. Thus, we can hope to improve on the single-link algorithm by having blocked processors point to both the next higher and the next lower priority processor, forming a doubly linked list. Further, since a request can move directly towards its proper place, there is no need to maintain a waiting ring, so instead the processors form a waiting chain. This is illustrated in Fig. 4.

The execution of the double-link protocol is similar in many ways to the single-link protocol, and it is best described by its differences from the single-link protocol. The first difference is that when a processor is admitted to the waiting chain, it is told its successor and its predecessor. The processor that admits the newcomer knows who its new neighbor is, but the other neighbor of the newcomer does not. This is illustrated in Fig. 5.

Thus, the second difference between the double-link and the single-link algorithm is that now only one of the links is guaranteed to be consistent, and the other can only be used for navigation. Which link is consistent depends on the policy for admitting a processor to the waiting chain. If the requestor is admitted if its priority is between that of the current processor and its successor (the next lower priority processor), then the forward link is always correct, but the backward link can be inconsistent. If the requestor can be admitted between the current processor and its predecessor, then the backward link is always consistent, but not the forward link.

FIG. 5. Inconsistency after admitting a processor to the waiting ring.

If we choose to make the forward link consistent, then the token holder can pass the token on along its forward pointer. When a processor receives the token, it must have been the highest priority waiter, so it can enter the critical section immediately. However, before releasing the token, the token holder must wait until it is certain that there are no inconsistent backward pointers that point to it. Otherwise, a request might be forwarded out of the waiting chain and cause cycles among nonchain requestors.

For example, processor A might release the token while processor C still has a backward link to A. Suppose that a request from B is subsequently routed through A, and then to C. If B's request is subsequently routed through C's backward link, B's request will again be routed through A. Because A's request had previously visited B, B points to A, so A's request is sent back to A, where it is blocked permanently.

If we make the backward link always consistent, then all links of processors in the waiting chain point to the token holder, or to other processors in the waiting chain (with one exception, which we discuss shortly). For this reason, we choose to keep the backward link consistent. As a result, when a processor in the waiting chain receives the token, it must pass the token backward if its backward link does not point to the original token sender.

When the token holder releases its token into the waiting chain, the link that points back to it suddenly points to a processor that is outside of the waiting chain. If a request is sent along this inconsistent link, the system can experience blocking as described above. We can solve this problem by observing that if the highest priority waiting processor knows that it is the highest priority waiting processor, it will never send a request along its backward link. Instead, when the highest priority waiting processor receives a request from a higher priority processor, it inserts the higher priority processor before it in the waiting chain (and tells the new processor that it is the highest priority waiting processor). Thus, to ensure correctness we need to ensure that the highest priority waiting processor knows that it is the highest priority waiting processor when the token holder releases the token.


FIG. 4. Structure of the waiting chain.



When a waiting processor receives the token, it sends a message to inform its predecessor that it is the highest priority waiting processor. The token holder cannot release the token until it receives an acknowledgment.

Since we do not assume causal message delivery, a processor may receive link change messages out of order. Fortunately, it is easy to determine the proper order of the link change messages, since a processor should only change its forward link to point to a higher priority processor. An additional problem with the link change messages is that a link change request might arrive after the processor has already acquired and released the token, and has reentered the waiting chain. To solve this problem, each processor appends an entry count to its name, and only responds to a link change request if the entry count on the request matches its current entry count.
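
The Python sketch below (our own illustration, with hypothetical names, assuming a larger number means a higher priority) combines the two rules just stated: a stale entry count causes the message to be ignored, and the forward link only ever moves toward a higher priority processor.

    class ChainMember:
        def __init__(self, pid):
            self.pid = pid
            self.entry_count = 0
            self.forward = None            # (pid, priority) of the next higher priority processor

        def enter_chain(self):
            self.entry_count += 1          # new incarnation of this processor's request

        def handle_link_change(self, msg_entry_count, new_forward, new_pri):
            if msg_entry_count != self.entry_count:
                return                     # stale message from a previous incarnation
            # Only move the forward link toward a higher priority processor.
            if self.forward is None or new_pri > self.forward[1]:
                self.forward = (new_forward, new_pri)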

2.3. Fixed-Tree Algorithm

The single-link and the double-link algorithm maintain a dynamic tree, with the token holder as the root. We can take a different approach, and maintain the processors in a fixed tree. Each processor maintains a single pointer, currentdir, which indicates the location of the token. If the token is in a subtree rooted at the processor, currentdir points to the child whose subtree contains the token. If the token is not in the subtree rooted at the processor, currentdir points to the processor's parent. This structure is illustrated in Fig. 6. Raymond [18] presents a simple distributed nonprioritized synchronization algorithm with guaranteed O(log n) performance by using this technique. Raymond's algorithm can be fairly easily modified to permit prioritized synchronization. The code of this algorithm is presented in Appendix C.

In addition to keeping currentdir, each processor keeps a priority queue of the requests that it has received from its neighbors. When the processor receives a request from a neighbor (or makes a request itself), the request is put in the priority queue, replacing any previous requests from that neighbor. If the request is the highest priority request in the list, the request is forwarded to the processor indicated by currentdir. This execution is illustrated in Fig. 7.



When the processor receives the token, it removes the highest priority request from its priority queue, sends the token to the appropriate processor (possibly to itself), and modifies currentdir accordingly. If the queue is nonempty after forwarding the token, the processor sends a request to currentdir with the priority of the highest priority request in the priority queue.
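
As a simplified sketch of a node's behavior in this prioritized fixed-tree scheme (our own Python illustration, not Raymond's or the paper's code; send is an assumed callback and a larger number means a higher priority), the class below keeps at most one pending request per neighbor and forwards only the best one toward the token.

    class TreeNode:
        def __init__(self, pid, currentdir):
            self.pid = pid
            self.currentdir = currentdir   # parent or child in the direction of the token
            self.pending = {}              # neighbor (or self) -> priority of its best request

        def receive_request(self, source, priority, send):
            self.pending[source] = priority             # replaces any earlier request from source
            if priority == max(self.pending.values()):  # new highest priority request seen here
                send(self.currentdir, ('REQUEST', self.pid, priority))

        def receive_token(self, send):
            # The node only receives the token because something is pending here.
            best = max(self.pending, key=self.pending.get)
            del self.pending[best]
            send(best, ('TOKEN',))                      # possibly to this node itself
            self.currentdir = best
            if self.pending:                            # ask for the token back for the rest
                send(self.currentdir, ('REQUEST', self.pid, max(self.pending.values())))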

3. THEORETICAL ANALYSIS

In the path-compression algorithms, there are two components to the number of messages required for a processor to enter the waiting structure: the number of hops for the request to reach the ring, and the number of hops around the waiting structure. Previous analyses show that the amortized number of hops to find the waiting processes is O(log n) [6]. The number of hops that a request makes around the waiting structure is difficult to determine, but we can estimate that the number of hops is proportional to the number of waiting processors. Let C be the time to execute the critical section, and let R be the time between requests. If n < R/C and requests are uncorrelated, then the waiting structure will be small. If n > R/C, then on average there will be n − R/C blocked processors. However, the utilization of the lock should be less than 100% in the long term, or the system suffers from a serialization bottleneck.

In the fixed-tree algorithm, a request must travel to the token, and then the token must travel to the requestor. Both distances will be proportional to the diameter of the tree, which is O(log n). However, some of the requests do not need to be forwarded the entire distance.

If the lock utilization is less than 100%, then an order-of-magnitude analysis tells us little about the relative performance of the algorithms.

FIG. 7. Processing a request in the fixed-tree algorithm.

FIG. 6. Structure of the fixed-tree algorithm.


If the lock utilization is 100%, then the set of waiting processes is likely to be long. It would seem that the fixed-tree algorithm has the performance advantage, because of its guaranteed O(log n) performance. However, in the path compression algorithms a request might on average need to make significantly fewer than n − R/C hops to enter the waiting structure. Thus, again the relative performance of the algorithms is not clear.

4. PERFORMANCE ANALYSIS

Since a theoretical analysis of the three distributed priority lock algorithms does not clearly show that one algorithm is better than another, we make a simulation study of the algorithms. The simulator modeled a set of processors that communicate through message passing. All delays are exponentially distributed. The parameters to the simulator are the number of processors, the message transit delay (mean value is 1 tick), the message processing delay (1 tick), the time between releasing the token and requesting it again (the interaccess time, varied), and the time that a token is held once acquired (the release delay, 10 ticks). The fixed-tree algorithm uses a nearly complete binary tree (requiring about the same number of bytes per processor as the single-link algorithm).

We ran the simulator for varying numbers of processors and varying loads, where we define the load to be the product of the number of processors and the release delay divided by the interaccess time (nC/R). For each run, we executed the simulation for 100,000 critical section entries. We collected a variety of statistics, but principally the amount of time to finish the simulation (which captures the time overhead of running the protocol) and the number of messages sent.
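
As a small worked example of the load formula (ours, not from the paper), the helper below inverts load = nC/R to find the interaccess time that produces a target load, using the 10-tick release delay of the experiments.

    def interaccess_time(n_processors, load, release_delay=10):
        """Interaccess time R needed so that the load n*C/R equals the given value."""
        return n_processors * release_delay / load

    # For example, 40 processors at a 50% load need R = 800 ticks between requests.
    assert interaccess_time(40, 0.5) == 800.0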

The performance of the algorithms depends on the distribution of the request priorities. If the requests are prioritized based on the importance of a job (i.e., critical tasks have high priorities), or if the requests are due to real-time requests that use static priority assignments, the request priority distribution will be stationary. If requests are prioritized based on the deadlines of the requestors, then the request priorities will tend to decrease over time. In our study, we simulated both types of request priority distributions. In the stationary priority experiments, the priority of a request is an integer chosen uniformly at random between 1 and 10,000. In the nonstationary priority experiments, the priority of a request is a value chosen uniformly between 1 and twice the interrequest time minus the simulation time when the request is generated.
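
One plausible reading of the nonstationary rule (this is our assumption about the intended arithmetic, not the paper's code) is sketched below: a deadline-like value drawn uniformly from [1, 2R], shifted down by the simulation time, so later requests tend to carry lower priorities.

    import random

    def nonstationary_priority(interrequest_time, sim_time, rng=random):
        # Hypothetical interpretation: uniform over [1, 2R], minus the current time.
        return rng.uniform(1, 2 * interrequest_time) - sim_time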

4.1. Stationary Priorities

In our first set of experiments, we plot the number of messages sent per critical section entry against the number of participating processors. Every processor issues requests at the same rate, and the load (nC/R) is varied between 50 and 200%. The results of these experiments are plotted in Figs. 8–11. The single-link and double-link algorithms both require about the same number of messages per critical section, and far fewer messages than the fixed-tree algorithm when the load is less than 100%. However, the fixed-tree algorithm requires fewer messages when the load is high (greater than or equal to 100%) and there are only a few (less than 20) processors. In addition, the single-link algorithm requires significantly fewer messages than the double-link algorithm when the load is high.



FIG. 8. Number of messages per critical section entry, 50% load, uniform requests and stationary priorities.

FIG. 9. Number of messages per critical section entry, 75% load, uniform requests and stationary priorities.

FIG. 10. Number of messages per critical section entry, 100% load, uniform requests and stationary priorities.



The single-link algorithm requires fewer messages than the fixed-tree algorithm for two reasons. First, the fixed-tree algorithm must send the token over a long path when the token is released. The single-link algorithm usually requires only one or two token hops. Second, path compression generally does a good job of compressing the tree, while in the fixed-tree algorithm most processors are on the periphery of the tree. When the load is high (100% or more), the fixed-tree algorithm requires the fewest messages when the system is small. This occurs because the path-compression algorithms require a couple of overhead messages to repair the structure when the token is released, while the fixed-tree algorithm does not. Also, in the fixed-tree algorithm some requests get subsumed by higher priority requests, while in the path compression algorithms the requests need to traverse large waiting lists.

The single-link algorithm has surprisingly good performance when the load is high. This occurs because the lowest priority requesting processor requires a very long time to obtain the lock. When a processor releases the token, it sets its currentdir pointer to the lowest priority waiting processor.


When it requests the token again, it is pointing to a processor that is in the waiting ring and which points to the highest priority waiting processor. In the double-link algorithm, a processor points to the highest priority waiter after it releases the token, and so is likely to be far away from the waiting chain when it requests the token next.

One possible advantage of using a path-compression algorithm over the fixed-tree approach is that the path-compression algorithms react to changes in the access patterns of the requestors. Typically, only a small subset of the processors makes requests for the critical section, and this set changes dynamically over time. We revised the simulations so that about one tenth of the processors make requests at any given time. We reran the experiments with a 50 and a 100% load, and plot the results in Figs. 12 and 13. The results confirm the hypothesis that path-compression algorithms react better to hot spots in the request pattern than do fixed-tree algorithms, in addition to having better performance when the requests are uniformly generated among the processors. However, the effect is not large.

We also investigated the time overhead of running the distributed priority lock algorithms. The rate at which a critical section can be accessed by different processors depends on the time to execute the critical section and on the time to pass the CS token from one processor to another. In Fig. 14, we plot the amount of time that the token is busy (either in use by a processor or in transit due to a request) when the load is 50%. In Fig. 15, we plot the average time per critical section entry when the load is 100%. Under both low and high loads, the fixed-tree algorithm imposes a significantly larger time overhead than the path-compression algorithms do. The fixed-tree algorithm will saturate in its ability to serve the lock well before the path-compression algorithms do, and while in saturation will serve critical section requests at a lower rate. The path compression algorithms send the token almost directly to the next processor to enter, while the fixed-tree algorithm requires the token to follow the return path over which it traveled.

FIG. 11. Number of messages per critical section entry, 200% load, uniform requests and stationary priorities.

FIG. 12. Number of messages per critical section entry, 50% load, hot spots in the requests and stationary priorities.

FIG. 13. Number of messages per critical section entry, 100% load, hot spots in the requests and stationary priorities.


The fixed-tree algorithm requires that some processors do more work to execute the critical section protocol than other processors. In particular, the root of the tree and its children can be expected to process many messages, as they are on many internode paths. We found that the root and its children process about one message per critical section entry, and nodes lower in the tree process proportionally fewer messages. In a very large system, this overhead might saturate the root node.

4.2. Nonstationary Priorities

We ran a set of experiments to test the performance of the distributed priority locks when the request priority distribution is nonstationary (i.e., similar to that encountered with earliest deadline scheduling). For these experiments, both path compression algorithms use the heuristic described in Section 2.1.4. In our first set of experiments, requests are uniformly generated among all of the processors, and we varied the load and the number of processors. We plot the number of messages required per critical section entry against the number of processors for loads of 50, 75, 100, and 200% in Figs. 16–19, respectively.

When the load on the critical section is low (less than 100%), the path-compression algorithms require significantly fewer messages than the fixed-tree algorithm, with the single-link algorithm having a slight edge over the double-link algorithm. When the load is 100%, the fixed-tree algorithm requires slightly fewer messages than the single-link algorithm. As the load increases beyond 100%, the path-compression algorithms require correspondingly more messages, while the fixed-tree algorithm actually requires fewer messages (as some requests are blocked early). Thus, for average loads of 100% or greater, the fixed-tree algorithm requires fewer messages.

The path compression algorithms require many messages if the load is 100% or greater. This occurs because of the long waiting chains that develop. While the heuristic described in Section 2.1.4 ensures that low priority requests are quickly routed to the end of the chain, some requests have a higher priority and must traverse many nodes in the chain. The guaranteed O(log(n)) performance of the fixed-tree algorithm ensures that it never suffers from a high message passing overhead. However, the fixed-tree algorithm requires fewer messages than the path compression algorithms only if the long-term average load is 100% or greater. Since the simulator presents a stochastic workload to the algorithms, it often presents short periods of high demand to the algorithms when the load is less than 100%.



FIG. 14. Lock utilization under a 50% load.

FIG. 15. Time per critical section entry under a 100% load.

FIG. 16. Number of messages per critical section entry, 50% load, uniform requests and nonstationary priorities.

FIG. 17. Number of messages per critical section entry, 75% load, uniform requests and nonstationary priorities.


In spite of the increased number of messages required by the path compression algorithms during these periods, the path-compression algorithms still require fewer messages per critical section entry over the long term. The fixed-tree algorithm can be counted upon not to impose an excessive message passing overhead when the lock is a severe serialization bottleneck, but in this case the system has many problems beyond message-passing overhead.

We also collected statistics about the time overhead imposed by the distributed priority lock algorithms under nonstationary priorities. The execution time overhead characteristics of the algorithms are essentially the same as those depicted in Figs. 14 and 15 (to save space we do not repeat these figures).

5. CONCLUSION

We have presented three algorithms for prioritized distributed synchronization. Two of the algorithms use the path compression technique of Li and Hudak for fast access and low message passing overhead. The third algorithm uses the fixed-tree approach of Raymond.


Each of these algorithms has a low message passing and space overhead. The O(log n) message passing overhead per request and the O(log n) bits of storage overhead per processor make the algorithms scalable.

To evaluate the algorithms, we made a simulation study. We examined the performance of the algorithms, in terms of message passing and time overhead, under two types of request priority distributions. The request priorities could be stationary (i.e., the priority is the importance of the task) or nonstationary (i.e., the priority is the deadline of the task). We found that when the request priority distribution is stationary, the single-link algorithm is best overall, but that the fixed-tree algorithm requires fewer messages when the load is high and the system is small. When the request priority distribution is nonstationary, the fixed-tree algorithm requires significantly fewer messages than the path-compression algorithms when the long-term load on the critical section is 100% or greater. If the load is less than 100%, the single-link algorithm requires the fewest messages.

The fixed-tree algorithm can place an O(log(n)) upper bound (i.e., the diameter of the tree) on the number of message delays before a request for the token is registered and an O(log(n)) upper bound on the number of messages sent per critical section entry (i.e., twice the diameter of the tree). However, the fixed-tree algorithm requires significantly more messages than the single-link algorithm for the scenarios of interest, increases the effective time to use the critical section, and places a disproportionate processing burden on the root of the tree. The single-link algorithm does not suffer from these problems, but has an O(n) upper bound on the number of messages required to register a critical section request. The implementor will need to weigh the advantages and disadvantages of the algorithms.

The prioritized mutual exclusion algorithms presented in this paper have performance comparable to the best nonprioritized mutual exclusion algorithms. Johnson [10] made a performance study of four O(log(n)) mutual exclusion algorithms, and found that the algorithm of Chang et al. [4] had the best overall performance. This algorithm requires 5.6 messages per critical section entry with 160 processors and a uniform load, while the single-link algorithm requires between 8 and 9 messages. The additional messages required by the single-link algorithm are largely due to the additional overhead messages.

APPENDIX A: THE SINGLE-LINK PROTOCOL

In this section, we provide the code for the single-link algorithm. The protocol uses the following variables:

    Boolean tokenhldr       True iff the processor holds the token
    Boolean incs            True iff the processor is using the token
    Boolean in_ring         True iff the processor knows it is in the waiting ring
    Boolean isrequesting    True when the processor has made a token request but has not yet joined the waiting ring
    Boolean change_acked    True if the waiting ring has been repaired
    Boolean areblocking     True if the processor is blocking token requests
    Boolean areblocked      True if the processor is in a list of blocked requests
    Integer currentdir      Direction of the successor processor
    Integer blocked_dir     Head of the list of blocked requestors
    Integer unblockdir      Next blocked requestor
    Integer id              Name of the processor (never changes)
    Real priority           The priority of the processor's request
    Real link_pri           Priority of the successor's request

FIG. 18. Number of messages per critical section entry, 100% load, uniform requests and nonstationary priorities.

FIG. 19. Number of messages per critical section entry, 200% load, uniform requests and nonstationary priorities.


The protocol is specified by specifying how events are handled. We assume that each event is handled atomically at the processor where it occurs, except where a specific condition for waiting is specified. A processor makes an event occur at a remote processor by sending a message. The parameters of the send procedure are

    send(processor id, event; parameters for the event)

A message is sent to cause the specified event to occur on the remote processor, and the event is passed the parameters.

The protocol driver takes events off of a queue and calls the appropriate routine to handle them. The events may be due to the receipt of messages, or may be caused internally.

    DPQ_handler()
        while (1)
            get an event from the event queue
            call the appropriate routine to process the event

Initially, the currentdir pointers form a tree with the token holder at the root. The token holder has the variable tokenhldr set TRUE, and the variable is set to FALSE at all other processors. The variables incs, in_ring, and isrequesting are FALSE. The constant id contains the processor name. All other variables are set before they are used.

RECEIVE RQST(requestor,requestor pri)if(tokenhldr and not incs)//Send the token.

currentdir=requestorsend(requestor,REQUEST DONE;requestor,requestpri)send(requestor,TOKEN;id,FALSE,TRUE)tokenhldr[id]=FALSE

Boolean in ring True iff the processorknows it is in the waitingring

Boolean isrequesting True when the processorhas made a token requestbut has not joined thewaiting ring

Boolean changed acked True if the waiting ringhas been repaired

Boolean areblocking True if the processor isblocking token requests

Boolean areblocked True if you are in a list ofblocked requests

Integer currentdir Direction of the successorprocessor

Integer blocked dir Head of a list of blockedrequestors

Integer unblockdir Next blocked requestorInteger id Name of the processor

(never changes)Real priority The priority of the proces-

sor’s requestReal linkpri Priority of the successor’s

request

The protocol is specified by giving the routine that handles each event. We assume that each event is handled atomically at the processor where it occurs, except where a specific condition for waiting is specified. A processor makes an event occur at a remote processor by sending a message. The parameters of the send procedure are

    send(processor_id, event; parameters for the event).

A message is sent to cause the specified event to occur on the remote processor, and the event is passed the parameters.

The protocol driver takes events off of a queue and calls the appropriate routine to handle them. The events may be due to the receipt of messages, or may be caused internally.

    DPQ_handler()
        while(1)
            get an event from the event queue
            call the appropriate routine to process the event
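To make the event-driven structure concrete, the following sketch (ours, not the paper's; the class and method names are hypothetical) models the convention in Python: each processor owns an event queue, send() appends an event to the destination's queue, and one driver step dispatches an event to the routine registered for it.

    import collections

    class Processor:
        def __init__(self, ident, network):
            self.id = ident
            self.network = network                  # processor id -> Processor
            self.events = collections.deque()       # pending events for this processor
            self.handlers = {}                      # event name -> handler routine

        def send(self, dest, event, *params):
            # send(processor_id, event; parameters): enqueue the event at the destination.
            self.network[dest].events.append((event, params))

        def driver_step(self):
            # One iteration of the protocol driver: take an event off the queue
            # and call the appropriate routine to process it.
            if self.events:
                event, params = self.events.popleft()
                self.handlers[event](*params)

    # Example: two processors, a PING event answered by a PONG event.
    network = {}
    p0, p1 = Processor(0, network), Processor(1, network)
    network.update({0: p0, 1: p1})
    p0.handlers["PONG"] = lambda sender: print("p0 got PONG from", sender)
    p1.handlers["PING"] = lambda sender: p1.send(sender, "PONG", 1)
    p0.send(1, "PING", 0)
    p1.driver_step()                                # p1 handles PING and replies
    p0.driver_step()                                # p0 handles PONG

The protocol routines in the remainder of this appendix can be read as handlers registered in such a dispatch table.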

Initially, the currentdir pointers form a tree with the token holder at the root. The token holder has the variable tokenhldr set to TRUE, and the variable is set to FALSE at all other processors. The variables incs, in_ring, and isrequesting are FALSE. The constant id contains the processor name. All other variables are set before they are used.

When a processor needs to use the token, it generates a REQUEST_TOKEN event. We note here a particular use of the currentdir variable. If the processor holds the token and currentdir points to the processor itself, then there is no waiting ring and hence no blocked processors. If currentdir points to a remote processor, that processor is part of the waiting ring.

    REQUEST_TOKEN(request_priority)
        priority = request_priority
        if(tokenhldr)                           //If you hold the token, use it.
            incs = TRUE
            change_acked = TRUE                 //The ring doesn't need repair.
            currentdir = id                     //Indicate that the waiting ring is empty.
            in_ring = FALSE
        else
            send(currentdir, RECEIVE_RQST; id, priority)
            isrequesting = TRUE

A processor enters the waiting ring when it receives a REQUEST_DONE message. Later, the processor will receive the token. Since the processor can now handle token requests, it wakes up any blocked requests.

    REQUEST_DONE(successor, successor_priority)
        in_ring = TRUE
        currentdir = successor
        link_pri = successor_priority
        isrequesting = FALSE
        if(areblocking)                         //Wake up blocked requests, if any.
            send(blocked_dir, UNBLOCK; id)
            areblocking = FALSE

Most of the work of the protocol is in the RECEIVE_RQST event, which handles remote requests to obtain the token. The handling of the event depends on the state of the processor (holding the token, using the token, in the waiting ring, requesting but not yet in the waiting ring, not requesting). Note that the protocol as written requires a processor to enter the waiting ring before receiving the token, so two messages must be sent even if the critical section is idle. This can be optimized by using a single special message.

    RECEIVE_RQST(requestor, request_pri)
        if(tokenhldr and not incs)              //Send the token.
            currentdir = requestor
            send(requestor, REQUEST_DONE; requestor, request_pri)
            send(requestor, TOKEN; id, FALSE, TRUE)
            tokenhldr = FALSE



        else if(tokenhldr and incs)
            if(currentdir == id)                //No waiting ring, so create one.
                link_pri = request_pri
                currentdir = requestor
                send(requestor, REQUEST_DONE; requestor, request_pri)
            else                                //Waiting ring exists, so forward the request there.
                send(currentdir, RECEIVE_RQST; requestor, request_pri)
        else if(in_ring)
            if(priority < link_pri and priority >= request_pri)
                                                //New lowest priority processor.
                send(requestor, REQUEST_DONE; currentdir, link_pri)
                currentdir = requestor
                link_pri = request_pri
            else if((request_pri <= priority and request_pri > link_pri) or
                    (priority < link_pri and request_pri > link_pri))
                                                //Found the requestor's position in the ring.
                send(requestor, REQUEST_DONE; currentdir, link_pri)
                currentdir = requestor
                link_pri = request_pri
            else                                //Not the right position, so forward the request.
                send(currentdir, RECEIVE_RQST; requestor, request_pri)
        else                                    //Not token holder, not in waiting ring.
            if(!isrequesting)                   //Not requesting, so forward the request.
                send(currentdir, RECEIVE_RQST; requestor, request_pri)
                currentdir = requestor
            else                                //Can't forward the request, so block it.
                if(areblocking)
                    send(requestor, BLOCK; blocked_dir, FALSE, id)
                else                            //First blocked processor.
                    send(requestor, BLOCK; NULL, TRUE, id)
                    areblocking = TRUE
                blocked_dir = requestor
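The in_ring tests above decide where a new request joins the circular waiting ring, which is kept sorted by priority with a single wrap point at the lowest priority member. The following sketch is our own illustration (not code from the paper); it assumes distinct priorities with larger values meaning more urgent requests, represents the ring as a Python list in successor order, and applies the same two tests to find the member after which the new request is inserted.

    def accepts(priority, link_pri, request_pri):
        # The same two tests as the in_ring branch of RECEIVE_RQST: the member either
        # becomes the predecessor of a new lowest priority request (first test) or the
        # request fits between the member and its successor (second test).
        new_lowest = priority < link_pri and priority >= request_pri
        fits_here = ((request_pri <= priority and request_pri > link_pri) or
                     (priority < link_pri and request_pri > link_pri))
        return new_lowest or fits_here

    def insert(ring, request_pri):
        # ring: member priorities listed in successor (currentdir) order.
        for i in range(len(ring)):                  # forward at most once around the ring
            link_pri = ring[(i + 1) % len(ring)]    # priority of member i's successor
            if accepts(ring[i], link_pri, request_pri):
                ring.insert(i + 1, request_pri)     # REQUEST_DONE: requestor becomes i's successor
                return ring
        raise RuntimeError("no position found")

    ring = [9, 7, 4, 2]                             # successor order: 9 -> 7 -> 4 -> 2 -> back to 9
    print(insert(list(ring), 5))                    # [9, 7, 5, 4, 2]: fits between 7 and 4
    print(insert(list(ring), 1))                    # [9, 7, 4, 2, 1]: new lowest, inserted after the old lowest
    print(insert(list(ring), 10))                   # [9, 7, 4, 2, 10]: new highest, inserted at the wrap point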

When the token holder releases the token, it sends the token to the waiting ring, if it exists. If no waiting ring exists, then currentdir will be pointing to the token holder.

    RELEASE_TOKEN()
        wait until change_acked is TRUE
        change_acked = FALSE
        incs = FALSE


        if(currentdir != id)                    //There is a blocked processor.
            send(currentdir, TOKEN; id, FALSE, FALSE)
            tokenhldr = FALSE

This event unblocks the release of the token.

    CHANGE_ACK()
        change_acked = TRUE

The lowest priority processor tags the token. When a processor receives the token, it accepts the token only if the token is tagged. As an optimization, the processor can accept the token if it is alone in the waiting ring. After accepting the token, the token holder points to the processor that sent the token, which is likely to be the lowest priority processor in the waiting ring. Finally, the waiting ring is repaired.

    TOKEN(sender, tag, direct)
        wait until in_ring is TRUE
        if(tag or currentdir == id)
            tokenhldr = TRUE
            incs = TRUE
            if(currentdir == id)                //If you are alone, you don't need to repair the ring.
                change_acked = TRUE
            else
                if(!direct)                     //If the token holder didn't send the token, the sender
                                                //is probably the lowest priority processor.
                    currentdir = sender
                send(currentdir, CHANGE_LINK; id, currentdir, link_pri)
                in_ring = FALSE
        else
            if(priority < link_pri)             //Lowest, so tag the token.
                send(currentdir, TOKEN; id, TRUE, FALSE)
            else
                send(currentdir, TOKEN; id, FALSE, FALSE)
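As a small illustration of the tagging rule (again our own sketch, not the paper's code, ignoring the direct-send and alone-in-ring cases), the fragment below circulates a token around a waiting ring: members forward an untagged token, the member whose priority is below its successor's (the lowest priority member) tags it, and the first member to receive a tagged token accepts it, so the token ends up at the highest priority waiting processor.

    def circulate(ring, start):
        # ring: member priorities in successor order; start: index that first receives the token.
        i, tag = start, False
        while True:
            if tag:                             # a tagged token is accepted
                return ring[i]
            link_pri = ring[(i + 1) % len(ring)]
            tag = ring[i] < link_pri            # only the lowest priority member tags the token
            i = (i + 1) % len(ring)             # forward the token to the successor

    ring = [9, 7, 4, 2]                         # successor order, as in the earlier sketch
    print(circulate(ring, 1))                   # injected at priority 7; tagged at 2; accepted by 9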

This event repairs the ring after a new processor accepts the token.

    CHANGE_LINK(tokhldr, successor, succ_pri)
        wait until in_ring is TRUE
        if(currentdir == tokhldr)
            currentdir = successor
            link_pri = succ_pri
            send(tokhldr, CHANGE_ACK)
        else
            send(currentdir, CHANGE_LINK; tokhldr, successor, succ_pri)



A requesting processor blocks if its request is forwarded to another requesting processor that has not yet joined the waiting ring. The blocked processors form a LIFO list, which removes the need for O(n) storage per processor.

    BLOCK(next_blocked, islast, blocker)
        unblock_dir = next_blocked
        lastblocked = islast
        currentdir = blocker
        areblocked = TRUE

When a requesting processor joins the waiting ring, it unblocks the head of the blocking list, which unblocks its successor, and so on.

    UNBLOCK(unblocker)
        wait until areblocked is TRUE
        if(not lastblocked)                     //Relay the unblocking.
            send(unblock_dir, UNBLOCK; unblocker)
        currentdir = unblocker
        send(currentdir, RECEIVE_RQST; id, priority)   //Resubmit your request.
        isrequesting = TRUE
        areblocked = FALSE
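To see how the blocked requests are chained using only a constant number of pointers per processor, the sketch below (our own, with hypothetical names) mimics the BLOCK/UNBLOCK bookkeeping: the blocker remembers only the most recently blocked requestor in blocked_dir, each BLOCK message hands the previous head to the new requestor as its unblock_dir, and a single UNBLOCK is relayed down the resulting LIFO chain.

    class Requestor:
        def __init__(self, name):
            self.name = name
            self.unblock_dir = None             # next blocked requestor to relay UNBLOCK to
            self.lastblocked = False            # True iff this requestor is the tail of the chain

    def block(blocker, requestor):
        # The blocker's side of RECEIVE_RQST: hand the old head (if any) to the new
        # requestor and make the new requestor the head of the chain (blocked_dir).
        head = blocker["blocked_dir"]
        requestor.unblock_dir = head
        requestor.lastblocked = head is None
        blocker["blocked_dir"] = requestor

    def unblock(requestor, woken):
        # UNBLOCK: record the wake-up, then relay down the chain unless at the tail.
        woken.append(requestor.name)
        if not requestor.lastblocked:
            unblock(requestor.unblock_dir, woken)

    blocker = {"blocked_dir": None}
    r1, r2, r3 = Requestor("R1"), Requestor("R2"), Requestor("R3")
    for r in (r1, r2, r3):                      # R1 blocks first, then R2, then R3
        block(blocker, r)
    woken = []
    unblock(blocker["blocked_dir"], woken)
    print(woken)                                # ['R3', 'R2', 'R1']: last blocked, first woken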

APPENDIX B: HEURISTIC FOR NONSTATIONARY PRIORITIES

In this appendix, we discuss the modifications to the single-link protocol that allow the waiting processors to keep a guess of the lowest priority processor. The heuristic uses the following two variables in addition to the ones used by the single-link protocol.

Integer lowreq     Current guess about the lowest priority requesting processor
Real lowpri        Priority of processor lowreq

When a processor is admitted to the waiting ring, it guesses that its successor is the lowest priority waiting processor.

    REQUEST_DONE(successor, successor_priority)
        ...
        if(successor_priority < priority)
            lowreq = successor
            lowpri = successor_priority
        else
            lowreq = NIL
        ...


Most of the code to maintain the lowreq pointer occurs in the RECEIVE_RQST routine. We mark the new code with an asterisk (*).

    RECEIVE_RQST(requestor, request_pri)
        if(tokenhldr and not incs)              //Send the token.
            ...
        else if(tokenhldr and incs)
            if(currentdir == id)                //No waiting ring, so create one.
                link_pri = request_pri
                currentdir = requestor
                send(requestor, REQUEST_DONE; requestor, request_pri)
            else                                //Waiting ring exists, so forward the request there.
                send(currentdir, RECEIVE_RQST; requestor, request_pri)
  *             if(request_pri < link_pri)      //Use the better guess about the lowest priority request.
  *                 currentdir = requestor
  *                 link_pri = request_pri
        else if(in_ring)
            if(priority < link_pri and priority >= request_pri)
                                                //New lowest priority processor.
                send(requestor, REQUEST_DONE; currentdir, link_pri)
                currentdir = requestor
                link_pri = request_pri
  *             lowreq = requestor
  *             lowpri = request_pri
            else if((request_pri <= priority and request_pri > link_pri) or
                    (priority < link_pri and request_pri > link_pri))
                                                //Found the requestor's position in the ring.
                send(requestor, REQUEST_DONE; currentdir, link_pri)
                currentdir = requestor
                link_pri = request_pri
            else                                //Not the right position, so forward the request.
  *             if(lowreq != NIL and (request_pri < lowpri or request_pri > priority))
  *                 send(lowreq, RECEIVE_RQST; requestor, request_pri)
  *             else
  *                 send(currentdir, RECEIVE_RQST; requestor, request_pri)
  *             if(lowreq != NIL and request_pri < lowpri)
  *                 lowreq = requestor
  *                 lowpri = request_pri
        else                                    //Not token holder, not in waiting ring.
            ...

Finally, when a processor accepts the token, it records the priority of the lowest priority processor in the ring.

    TOKEN(sender, sender_pri, tag, direct)
        ...
            if(!direct)                         //If the token holder didn't send the token, the sender
                                                //is probably the lowest priority processor.
                currentdir = sender
  *             link_pri = sender_pri
            send(currentdir, CHANGE_LINK; id, currentdir, link_pri)
            in_ring = FALSE
        else
            ...

APPENDIX C: THE FIXED-TREE PROTOCOL

The protocol uses the following variables:

Boolean tokenhldr    True iff the processor holds the token
Boolean incs         True iff the processor is using the token
Integer currentdir   Direction of the successor processor
Integer id           Name of the processor (never changes)

In addition, the protocol uses the following routines to manage the priority queue of pending requests. There must be room for as many entries in the priority queue as there are neighbors of the processor. (A sketch of one possible realization of these routines follows the list.)

Boolean pushreq(processor, priority)    Put the request from the processor with the attached priority into the priority queue. If there is another request from the same processor already in the queue, replace it. Return TRUE iff the request is the highest priority one.
Integer popreq()                        Remove and return the direction of the highest priority request.


Boolean qmt()                           Return TRUE iff the priority queue is empty.
Integer toppri()                        Return the priority of the highest priority request in the priority queue.
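One possible realization of these routines is sketched below (our own code, not the paper's; it assumes larger numeric priorities are more urgent and that at most one request per direction is pending, which is all the protocol requires).

    class RequestQueue:
        def __init__(self):
            self.requests = {}                  # direction -> priority of its pending request

        def pushreq(self, direction, priority):
            # Insert (or replace) the request arriving from 'direction'.
            # Return True iff it is now the highest priority pending request.
            self.requests[direction] = priority
            return all(priority >= p for d, p in self.requests.items() if d != direction)

        def popreq(self):
            # Remove and return the direction of the highest priority request.
            direction = max(self.requests, key=self.requests.get)
            del self.requests[direction]
            return direction

        def qmt(self):
            # True iff the priority queue is empty.
            return not self.requests

        def toppri(self):
            # Priority of the highest priority pending request.
            return max(self.requests.values())

    q = RequestQueue()
    print(q.pushreq("left", 3))                 # True: the only request, so it is the highest
    print(q.pushreq("right", 7))                # True: new highest
    print(q.pushreq("left", 5))                 # False: replaces the old 'left' request, still below 7
    print(q.popreq(), q.toppri())               # right 5
    print(q.qmt())                              # False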

In the REQUEST_TOKEN routine, if the processor is the token holder, it just uses the token. Otherwise the processor puts its request on the priority queue. If the request is the highest priority one, it passes the request in the direction of the token.

    REQUEST_TOKEN(request_priority)
        priority = request_priority
        if(tokenhldr)                           //If you hold the token, use it.
            incs = TRUE
        else
            if(pushreq(id, priority))           //Send the request on if it's the highest priority.
                send(currentdir, RECEIVE_RQST; id, priority)

If the processor is holding but not using the token when it receives a request, it returns the token to the requestor. If the processor is using the token, it puts the request in its priority queue. Otherwise, the processor puts the request in its priority queue and passes the request along if it has a higher priority than previous requests. The processor will ignore requests from the direction of the token holder. Such requests can occur if the processor recently sent the token in the direction of currentdir.

    RECEIVE_RQST(requestdir, requestpri)
        if(tokenhldr and not incs)              //Pass the token.
            currentdir = requestdir
            send(requestdir, TOKEN; NULL, 0)
            tokenhldr = FALSE
        else if(tokenhldr and incs)             //Just remember the request.
            pushreq(requestdir, requestpri)
        else                                    //Remember the request; pass it along if highest.
            if(requestdir != currentdir)
                if(pushreq(requestdir, requestpri))
                    send(currentdir, RECEIVE_RQST; id, requestpri)

When the processor releases the token, it checks the priority queue to see if there are any pending requests. If so, the processor sends the token in the direction of the highest priority request. If there is another pending request, the processor must ask for the token back on behalf of the requestor. The protocol is written to make this additional request when it passes the token (a small optimization).



    RELEASE_TOKEN()
        incs = FALSE
        if(not qmt())                           //If there is a waiting processor.
            currentdir = popreq()
            tokenhldr = FALSE
            if(qmt())                           //Is there a request to piggyback?
                send(currentdir, TOKEN; NULL, 0)
            else
                send(currentdir, TOKEN; id, toppri())

When the processor receives the token, it consults the priority queue to determine the direction in which the token should be sent. The processor also puts the piggybacked request (if any) in the priority queue. If the processor is the one that should receive the token, it enters the critical section. Otherwise, it passes the token along, possibly piggybacking another request.

    TOKEN(requestdir, requestpri)
        currentdir = popreq()
        if(requestdir != NULL)                  //Test for a piggybacked request.
            pushreq(requestdir, requestpri)
        if(currentdir != id)
            if(qmt())
                send(currentdir, TOKEN; NULL, 0)
            else
                send(currentdir, TOKEN; id, toppri())
        else                                    //Accept the token if your request is the highest priority.
            tokenhldr = TRUE
            incs = TRUE
            isrequesting = FALSE

REFERENCES

1. K. Birman, A. Schiper, and P. Stephenson, Lightweight causal and atomic group multicast. ACM Trans. Comput. Systems 9(3), 272–314 (1991).

2. O. S. F. Carvalho and G. Roucairol, On mutual exclusion in computer networks. Comm. ACM 26(2), 146–147 (1983).

3. K. M. Chandy and L. Lamport, Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Systems 3(1), 63–75 (1985).

4. Y. I. Chang, M. Singhal, and M. T. Liu, An improved O(log(n)) mutual exclusion algorithm for distributed systems. Int'l Conf. on Parallel Processing, 1990, pp. III295–302.

5. T. S. Craig, Queuing spin lock alternatives to support timing predictability. Technical Report, Univ. of Washington, 1993.

6. D. Ginat, D. D. Sleator, and R. Tarjan, A tight amortized bound for path reversal. Information Processing Letters 31, 3–5 (1989).

7. A. Goscinski, Two algorithms for mutual exclusion in real-time distributed computer systems. J. Parallel Distrib. Comput. 9, 77–82 (1990).

8. K. Harathi and T. Johnson, A priority synchronization algorithm for multiprocessors. Technical Report tr93.005, UF, 1993. Available at ftp.cis.ufl.edu:cis/tech-reports.

9. D. V. James, A. T. Laundrie, S. Gjessing, and G. S. Sohi, Scalable coherent interface. Computer 23(6), 74–77 (1990).

10. T. Johnson, A performance comparison of fast distributed mutual exclusion algorithms. Proc. Int'l Parallel Processing Symp., 1995, pp. 258–264. Available at ftp.cis.ufl.edu:cis/tech-reports/tr94/tr94-022.ps.Z.

11. A. Kumar, Hierarchical quorum consensus: A new algorithm for managing replicated data. IEEE Trans. Comput. 40(9), 994–1004 (1991).

12. L. Lamport, Time, clocks, and the ordering of events in a distributed system. Comm. ACM 21(7), 558–564 (1978).

13. K. Li and P. Hudak, Memory coherence in shared virtual memory systems. ACM Trans. Comput. Systems 7(4), 321–359 (1989).

14. M. Maekawa, A sqrt(N) algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Systems 3(2), 145–159 (1985).

15. E. P. Markatos and T. J. LeBlanc, Multiprocessor synchronization primitives with priorities. Technical Report, Univ. of Rochester, 1991.

16. J. M. Mellor-Crummey and M. L. Scott, Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Systems 9(1), 21–65 (1991).

17. M. L. Neilsen and M. Mizuno, A dag-based algorithm for distributed mutual exclusion. International Conference on Distributed Computer Systems, 1991, pp. 354–360.

18. K. Raymond, A tree-based algorithm for distributed mutual exclusion. ACM Trans. Comput. Systems 7(1), 61–77 (1989).

19. G. Ricart and A. K. Agrawala, An optimal algorithm for mutual exclusion in computer networks. Comm. ACM 24(1), 9–17 (1981).

20. R. H. Thomas, A majority consensus approach to concurrency control for multiple copy databases. ACM Trans. Database Systems 4(2), 180–209 (1979).

21. M. Trehel and M. Naimi, A distributed algorithm for mutual exclusion based on data structures and fault tolerance. IEEE Phoenix Conference on Computers and Communications, 1987, pp. 36–39.

22. M. Trehel and M. Naimi, An improvement of the log(n) distributed algorithm for mutual exclusion. Proc. IEEE Intl. Conf. on Distributed Computer Systems, 1987, pp. 371–375.

23. T. K. Woo and R. Newman-Wolfe, Huffman trees as a basis for a dynamic mutual exclusion algorithm for distributed systems. Proceedings of the 12th IEEE International Conference on Distributed Computing Systems, 1992, pp. 126–133.


Received July 18, 1994; revised March 22, 1995; accepted June 28, 1995