
Pergamon Information Systems Vol. 24, No. 8, pp. 673-698, 1999

© 2000 Published by Elsevier Science Ltd. All rights reserved. Printed in Great Britain

0306-4379/00 $20.00

PII: S0306-4379(00)00004-1

CONCURRENCY CONTROL FOR STEP-DECOMPOSED TRANSACTIONS†

ARTHUR J. BERNSTEIN, DAVID S. GERSTL and PHILIP M. LEWIS

Department of Computer Science, State University of New York at Stony Brook, Stony Brook, NY 11794-4400, USA

(Received 1 May 1999; in final revised form 17 November 1999)

†Recommended by Dennis Shasha

Abstract - The throughput of a transaction processing system can be improved by decomposing transactions into steps and allowing the steps of concurrent transactions to be interleaved. In some cases all interleavings are assumed to be acceptable; in others certain interleavings are forbidden. In this paper we describe a new concurrency control that guarantees that only acceptable interleavings occur. We describe the implementation of the new control within the CA-OpenIngres® database management system and experiments that were run to evaluate its effectiveness using the TPC-C™ Benchmark Transactions. The experiments demonstrate up to 80% improvement when lock contention is high, when long running transactions are a part of the transaction suite, and/or when sufficient system resources are present to support the additional concurrency that the new control allows. Finally, we describe a new correctness criterion that is weaker than serializability and yet guarantees that the specifications of all transactions are met. The criterion can be used to determine the acceptable interleavings for a particular application. The specification of these interleavings can serve as input to the new control. © 2000 Published by Elsevier Science Ltd. All rights reserved

Key words: Concurrency Control, Semantics, Transaction Processing, Database Management Systems

1. INTRODUCTION

Transaction processing systems frequently have strict requirements on throughput and latency.

Serializability is the most stringent level of isolation and has been widely accepted as the correctness criterion of choice for transaction processing systems. It generally implies the use of the strict two-phase locking protocol, in which locks are held until transactions commit. Unfortunately, this implies that locks might be held for long periods, causing performance to suffer, particularly in applications having long running transactions and/or data hotspots.

One common technique for dealing with this problem is to use an isolation level that is less stringent than serializability. With the READ COMMITTED isolation level, for example, performance gains are achieved by releasing (read) locks early. Unfortunately, only the SERIALIZABLE isolation level guarantees correct execution for all applications. Hence, the semantics of an application must be analyzed to determine whether it can be run correctly at a lower level. Furthermore, the use of READ COMMITTED as a technique for causing early lock release is inflexible since the application programmer has no control over the point at which locks are released.

Some authors have proposed decomposing programmed transactions into atomic and isolated pieces called steps. Steps release locks when they complete, and thus the steps of concurrently executing transactions can be interleaved. Since locks are held for shorter periods, performance is often improved. Furthermore, by choosing the step boundaries, the programmer can control the points at which locks are released. However, correctness remains a problem: interleaved schedules of steps will not be serializable and might not be correct. Hence, it is necessary for the application programmer to provide a specification of acceptable step interleavings and for the system to provide a concurrency control that produces schedules based on that specification.

In this paper we describe a project to design, implement, and test such a concurrency control. We refer to the control as an assertional concurrency control (ACC). We have implemented an ACC within the CA-OpenIngres® database management system and have tested it using TPC-C™ Benchmark Transactions. The ACC improves the performance of benchmark transactions by up to 80% when lock contention is high, when long running transactions are a part of the transaction suite, and/or when sufficient system resources are present to support the additional concurrency that the new control makes possible. These experimental results are described in this paper.

The ACC has been implemented by introducing a new lock mode, called assertional lock mode, into a conventional locking system. Assertional locks are weaker than conventional read/write locks. While all conventional locks are released by a step when it completes, assertional locks are held between steps to control interleaving. The implementation of assertional locks is described in this paper.

The specification of acceptable step interleavings used by the ACC can be constructed by the application programmer in an ad hoc fashion. Such ad hoc decompositions might be appropriate when performance is degraded by a few serious points of lock contention. Performance might then be significantly improved by decomposing a few transactions into a few steps and determining allowable interleavings using informal reasoning. In more complicated situations, a new technique is needed to guarantee that the interleaving specification produces only correct schedules.

We present such a technique, based on the semantics of the application's transactions. We propose a new correctness criterion, called semantic correctness, which requires that an interleaved schedule of the steps of a set of transactions have the same semantic result as a serial schedule of the same transactions. We assume that the desired effect of each transaction can be described by a postcondition derived from its specification and use that postcondition to determine the precondition of each step of the transaction. Alternatively, since step boundaries generally correspond to the completion of major subtasks, it might be possible to determine the preconditions by inspection.

The ACC produces semantically correct schedules by ensuring that each step's precondition is true when that step is executed and by preventing transaction postconditions from being invalidated by concurrently executing transactions. Assertions are protected using assertional locks. The execution of a step is delayed if it can cause a currently locked assertion to become false.

This paper is an expanded version of a conference paper presented earlier [5]. Although an earlier version of the basic algorithm is described there, this paper contains new experimental results, a proof of the correctness of the algorithm, a description of the implementation, the handling of compensation, crashes and deadlocks, algorithm optimizations, and a generalization of the correctness criterion.

2. PREVIOUS WORK

Considerable research has been done on the design of high-performance concurrency controls.

Many of the proposals try to increase performance above that available with strict two-phase locking while retaining serializability as a correctness criterion. Some work uses the semantics of abstract objects [33, 15, 34, 4]. [29] demonstrates how a set of transactions can be analyzed to find a decomposition with the property that any interleaving of the decomposed transactions is equivalent to a serial execution of the transactions. Unfortunately, the serializability requirement places stringent limits on the decomposition.

Other related work considers models that do not use serializability as a correctness criterion. A Saga [12, 8] is a long running transaction that is decomposed into steps. Since no restrictions are placed on how the steps of concurrent Sagas can be interleaved, it is implicitly assumed that each step preserves database integrity. This assumption limits the extent to which the Saga can be decomposed. The performance of Sagas is tested using analytical and simulation methods in [21]. Other examples of schemes involving decomposition, non-serializability, and compensation in a variety of ways include [25, 7, 20, 30]. In these projects correctness is generally defined in terms of restrictions on allowable schedules. A ConTract [32] can be thought of as a Saga whose steps are partially ordered. Not all step interleavings are permissible, however, and compensation is used to recover from incorrect schedules. A related approach is taken in [1].

[28] uses semantics in a more formal way. The database is decomposed into a collection of atomic data sets (ADS) such that the set of consistent database states is the Cartesian product of the consistent states of the ADS's. Transactions are decomposed into segments of code, called ADS segments, that access distinct ADS's serializably. The notion of correctness in [28] is related to the concept of predicate-wise serializability [18, 19, 27]. In both cases the database consistency constraint is used to decompose the database into subsets and transactions are required to be serializable (although not necessarily in the same order) in each subset.

The use of decomposed transactions and non-serializable schedules is also proposed in [11, 23, 10]. [11] introduces a model in which transactions are grouped into sets, and the steps of transactions within the same set can be arbitrarily interleaved, while transactions in different sets must be completely isolated from each other. The scheme is tested (under simulation) for a distributed system in [9]. [23] generalizes this by introducing the notion of multi-level atomicity. Transactions are decomposed into steps and a hierarchical structure of allowable interleavings is established. The steps of transactions closely related in the hierarchy can be interleaved with each other while those distantly related cannot. The hierarchical nature implies that closeness is transitive: that is, if T1 and T2 are close, and T2 and T3 are close, then T1 and T3 must be close. [10] generalizes [11] and [23] by associating with each interstep point a set of transactions whose steps may interleave at that point. For implementation purposes the assumption is made that allowable interleavings are transitive (if T2 can interleave at point B1 in T1, and T3 can interleave at some point in T2, then T3 can interleave at B1). There are two shortcomings of these proposals: (1) they do not give any guidance on the considerations that should be used in decomposing transactions and in deciding what constitutes a correct schedule, and (2) primarily because of transitivity, the technique used for specifying allowable interleavings is not flexible.

The work described here essentially generalizes these proposals by presenting a non-transitive, table-driven algorithm that enumerates the steps that can be interleaved between two successive steps of each transaction. Hence the technique for specifying allowable interleavings is completely flexible. The concurrency control that enforces the specification has been implemented and tested as described in subsequent sections. Equally important, we describe an analysis technique that bases the decomposition and the interleaving specification on the semantic correctness criterion, thereby guaranteeing that when a transaction is executed it will satisfy its specifications.

A recent work that takes a similar approach to [6] is [2]. [2] uses a logic to formalize the semantics of transactions and bases a decomposition on that formalization. [2] is updated in [17, 3], which give the design of an improved implementation of their method. The present paper goes beyond [17, 3] in that it describes a design for an assertional concurrency control based on a system that has actually been implemented and also gives performance results for that system.

3. THE DESIGN OF AN ASSERTIONAL CONCURRENCY CONTROL

We first describe the use of assertional locks as a generalized mechanism to control the interleaving of the steps of a set of programmed transactions. No assumption is made concerning how the interleaving specification has been determined. We then introduce the notion of semantically correct schedules and show that a semantic analysis of the transactions of an application can be used to determine interleavings that result in correct schedules. Finally we describe a simplified version of an assertional concurrency control that implements assertional locks. We show how the use of those locks can be based on the semantic analysis and hence how the control can guarantee semantically correct schedules.

3.1. Assertional Locks and the Control of Step Interleaving

The ACC uses a new type of lock, called an assertional lock, to control step interleaving. A transaction might hold an assertional lock on a data item in the same way that it holds a read or write lock on the item. Assertional locks, however, are more complex than read/write locks since they are parameterized by the step that requests the lock. We assume that each step type of each transaction type is assigned a unique index and denote the step with index n by Sn. We denote the assertional lock held by Sn on a data item by A(n).

Step Sn of transaction Ti is granted an assertional lock, A(n), on a data item x at the time it is granted a conventional lock on x. When Sn completes, it releases the conventional lock, but retains the assertional lock in a new form. If the next step in Ti is Sn+1, then Ti's assertional lock is converted to an A(n+1) lock. Hence, an assertional lock can be viewed as simply a marker indicating the transaction's control point. Using this scheme, it is a straightforward matter to control step interleaving.

Suppose the goal is to prevent Si of T1 from being interleaved between two successive steps, Sj and Sj+1, of T2. It is natural to assume that Sj and Si access common variables since if this were not the case, the order in which they were executed would make no difference. Then it is only necessary to declare a lock conflict between A(i) and A(j) and between A(i) and A(j+1). If x is a common variable and Sj acquires an A(j) lock on x, then Si's request for an A(i) lock on x will be made to wait. The wait will continue when T2 enters Sj+1 because of the conflict between A(i) and A(j+1), but the A(i) lock will be granted when Sj+1 terminates (assuming the A(j+1) lock is converted to an assertional lock that does not conflict with the A(i) lock).

Conflicts between assertional locks are stored in an interference table. If there are N distinct step types, the table has dimension N x N and the (i,j)th entry has value true if a request for an A(i) lock conflicts with a granted A(j) lock. A single interference table is used for all data items. A conflict can be identified with a simple table look-up and hence can be done quickly.
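The following sketch is illustrative only; InterferenceTable and can_grant_assertional_lock are hypothetical names chosen for exposition, not part of the implemented system. It shows the kind of indexed lookup the interference table supports and how a request for an A(i) lock would be checked against the assertional locks already granted on an item.

```python
# Illustrative sketch; all names are assumptions, not the implemented interface.

class InterferenceTable:
    """Entry (i, j) is True when a request for an A(i) lock conflicts with a
    granted A(j) lock.  Conflicts are declared at design time."""
    def __init__(self, num_step_types):
        n = num_step_types + 1                      # step indices run 1..N
        self.conflicts = [[False] * n for _ in range(n)]

    def declare_conflict(self, requested, granted):
        self.conflicts[requested][granted] = True

    def conflict(self, requested, granted):
        return self.conflicts[requested][granted]   # a single indexed lookup

def can_grant_assertional_lock(table, requested_step, granted_steps_on_item):
    """Grant A(requested_step) on an item only if it conflicts with none of the
    assertional locks already held on that item."""
    return not any(table.conflict(requested_step, g) for g in granted_steps_on_item)

# Example from the text: to keep S_i out of the gap between S_j and S_(j+1),
# declare A(i) to conflict with both A(j) and A(j+1).
table = InterferenceTable(num_step_types=3)
i, j = 1, 2
table.declare_conflict(i, j)
table.declare_conflict(i, j + 1)
assert not can_grant_assertional_lock(table, i, {j})    # must wait while A(j) is held
assert can_grant_assertional_lock(table, j + 1, {j})    # S_(j+1) itself is unaffected
```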

A number of authors have described approaches for decomposing a transaction into steps and scheduling the interleaving of these steps. The ACC can be used as the scheduling mechanism for many of these proposals.

3.2. Semantically Correct Schedules

The semantics of a transaction, Ti, can be formally characterized by the triple

{I ∧ (x̄i = X̄i)} Ti {I ∧ Qi}    (1)

I is the consistency constraint of the database†, and Qi is called the result and asserts that Ti has performed its intended function. x̄i is the set of variables modified by Ti and X̄i is a corresponding set of logical variables. In the precondition the initial state of x̄i is recorded in X̄i so that Qi can (optionally) express the final state in terms of the initial values of these variables. For example, the consistency constraint of a banking application might state that all balances are non-negative. If Ti is a deposit transaction that adds dep (a non-negative number) to your bank balance, bal, and BAL is the logical variable corresponding to bal, then Qi might assert that the new value of your balance is dep more than the initial value that Ti read:

bal = BAL + dep    (2)

(1) goes beyond the consistency requirement placed on a transaction by asserting that not only must Ti move the database from one consistent state to another, but that only a subset of the consistent states are acceptable when the transaction terminates. (1) can be regarded as a formal restatement of the specification of Ti. We can demonstrate that [an implementation of] Ti is correct by proving that (1) is a theorem using a formal system such as that of [16] (although this is not generally done).

Serializability is the standard technique for guaranteeing the correctness of a schedule of correct transactions. Since each transaction is individually correct and we assume that the database starts in a consistent state, a serial schedule must be correct since the precondition of each transaction, I, is true when that transaction is initiated and each transaction sees a starting state that reflects the cumulative results of all prior transactions. A serializable schedule must also be correct since it is equivalent to a serial schedule.

Unfortunately, serializability is an overly restrictive condition since it not only guarantees that each transaction executes correctly, but in addition requires that the final state be obtainable by a serial schedule. We propose a new correctness criterion, called semantic correctness, which eliminates this last requirement. A schedule, S, is semantically correct if

{I} S {I ∧ QS}    (3)

is true.

†In contrast to some other work, we consider the precondition of a transaction to be a predicate whose truth the transaction can assume when it is initiated. Thus the precondition of a withdraw transaction from a bank cannot assert that sufficient funds exist. Rather, withdraw must operate correctly from a state where sufficient funds do not exist and it does this by checking the balance and giving an "insufficient funds" message if necessary.

Semantic correctness has two components. A semantically correct schedule must maintain the consistency of the database, as indicated by the fact that I is a pre- and postcondition of (3). A semantically correct schedule must also transform the database to a state that reflects the cumulative results of all the transactions in S. We denote the assertion that describes that set of states by QS, the cumulative result, and define it below. For example, if S consists of several deposit transactions on some bank account, QS might assert that the final balance is greater than the initial balance by an amount equal to the sum of the deposits.

While the result of an individual transaction can (hopefully) be found in a specification document, the cumulative result of a schedule of transactions is not usually stated because of the large number of possible schedules. QS depends not only on the results of the individual transactions in the schedule, but also on the way those transactions are interleaved. For example, if T1 deposits dep1, T2 withdraws with2, dep1 > with2, and the initial balance is 0, the cumulative result of a schedule in which T2 precedes T1 can assert that the final balance is dep1, whereas if the order is reversed the cumulative result can assert that the final balance is (dep1 - with2). QS is useful because it sheds light on the conditions that must be enforced by a concurrency control that produces semantically correct schedules.

We define the cumulative result of a (possibly interleaved) schedule, S, recursively. Let Ti be the ith transaction in S to complete, and let QS^i be the cumulative result of the schedule, Si, obtained from S by deleting all transactions Tj, j > i. We assume the existence of an initialization transaction, T0, that sets the database, db, to its initial state, dbinit. Then the cumulative result of S is defined to be

QS^0 ≡ (db = dbinit)    (4)

and for k > 0

QS^k ≡ Qk ∧ (I ∧ QS^{k-1})[X̄k/x̄k] ∧ qS^{k-1}(x̄k)    (5)

To understand (5) we analyze each term separately. x̄k denotes the set of database variables modified by Tk and X̄k is a set of an equal number of logical variables that records their initial values (as seen by Tk). The first term in (5), Qk, is the result of Tk. It might, for example, describe the state produced by Tk in terms of those initial values, as in (2). Had S been a serial schedule, those initial values would have been described by (I ∧ QS^{k-1}). By substituting X̄k for x̄k in the second term of (5), (I ∧ QS^{k-1})[X̄k/x̄k], we describe the state seen by Tk in terms of the logical variables used in Qk. Suppose, for example, that there are two accounts at the bank with balances bal and bal', that I asserts that all balances must be non-negative and that all balances are initialized to zero. Then if the transaction that deposits dep into bal is the first transaction to complete,

(I ∧ QS^0)[BAL/bal] ≡ ((BAL ≥ 0 ∧ bal' ≥ 0) ∧ (BAL = 0 ∧ bal' = 0))

QS^{k-1} would also describe the state when Tk completes, except for conjuncts involving x̄k. Tk might cause some of these conjuncts to become false. The remaining conjuncts of QS^{k-1}, which must also be included in QS^k, fall into two sets: those that do not involve x̄k (and hence cannot possibly be made false by Tk) and those that involve x̄k but are not made false by Tk. The former set is contained in the second term of (5) (since the substitution does not apply to these terms). In the example, these are the terms bal' ≥ 0 ∧ bal' = 0. The latter set is denoted by qS^{k-1}(x̄k) in (5). In the example, this is the term bal ≥ 0. If S contains K transactions then QS ≡ QS^K.

This definition implies semantic serializability: the cumulative result when Tk completes is equivalent to the result that would have been obtained if Tk had been initiated after Sk-1 had completed. By requiring that QS be a postcondition of S, (5) asserts that a schedule of K transactions is semantically correct if its postcondition can be produced by a serial schedule of those same transactions. (Note that this does not imply that the state produced by a semantically correct schedule can be produced by a serial schedule.)

Semantic correctness requires only that each transaction's specification is met and that the final state reflects the result of all prior transactions. Since both these conditions are necessary, semantic correctness is arguably the weakest possible correctness criterion.


Semantic correctness is weaker than serializability since any schedule that is serializable is semantically correct†, but semantic correctness allows schedules that result in states that could not have been reached in any serial schedule.

†We consider schedules that are conflict equivalent to semantically correct schedules to be semantically correct themselves.

For example, a stock trading application might have a buy transaction type that takes as parameters the identity of a stock and the number of shares, n, to be purchased, and a result that states "when each share was purchased no cheaper unbought shares of the stock existed in the database". In a semantically correct schedule, two concurrent transactions, T1 and T2, could each buy some shares at $30 and some at $31 per share, even though initially there are n shares available at $30. First T1 buys n/2 shares at $30; then T2 buys n/2 shares at $30; then, since there are no more shares available at $30, T1 buys n/2 shares at $31; and finally T2 buys n/2 shares at $31. When each transaction terminates its result is true since when each share was bought, no cheaper unbought shares existed in the database. The final state could not have been produced by a serializable schedule since the purchase price of all shares bought by one or the other of the two transactions would have been $30.
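The following small Python sketch is an assumed setup, not taken from the paper or its benchmark; it simply replays the interleaving just described and shows that each transaction's result still holds even though the final state is not serially reachable.

```python
# Assumed setup for illustration only.
inventory = {30: 8, 31: 8}          # shares available at each price; n = 8

def buy_step(shares):
    """One step of a buy transaction: purchase `shares` at the cheapest price
    level that still has that many unbought shares."""
    price = min(p for p, qty in inventory.items() if qty >= shares)
    inventory[price] -= shares
    return price

n = 8
t1 = [buy_step(n // 2)]             # T1 buys n/2 shares at $30
t2 = [buy_step(n // 2)]             # T2 buys n/2 shares at $30; none left at $30
t1.append(buy_step(n // 2))         # T1 buys n/2 shares at $31
t2.append(buy_step(n // 2))         # T2 buys n/2 shares at $31

# Each transaction's result holds: whenever a share was bought, no cheaper
# unbought share existed.  Yet no serial schedule could leave both T1 and T2
# holding $31 shares, since serially one of them would have bought all n at $30.
print(t1, t2)                       # [30, 31] [30, 31]
```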

3.3. Achieving Semantic Correctness

A proof of (1) can be abbreviated by an annotated program in which each [atomic] statement of Ti, si,j, is preceded by an assertion, pre(si,j), its precondition, describing the state of the system at the time si,j begins execution. Each assertion states some condition on the values of items in Ti's workspace and in the database. We say an assertion pre(si,j) is active if the statement si,j is to be executed next. If si,j is executed starting in a state where pre(si,j) is true, the next assertion, pre(si,j+1), will be true of the state when si,j terminates. Hence, if when each statement is executed its precondition is true, the postcondition of the transaction will be true when the transaction terminates.

The major issue with respect to concurrency is invalidation: if the execution of the statements of Ti and Tk are interleaved, pre(si,j) might not be true of the database state when si,j is initiated. Thus, if sk,l is executed when pre(si,j) is active and true, it might transform the state to one in which pre(si,j) is false. If this occurs, we say that sk,l has invalidated pre(si,j). If sk,l invalidates pre(si,j) then the postcondition of Ti might not be true when Ti terminates and hence the concurrent schedule involving Ti and Tk will not be semantically correct.

A sufficient condition to ensure that each transaction's postcondition is true when it terminates is that [26]:

{pre(si,j) ∧ pre(sk,l)} sk,l {pre(si,j)}    (6)

is a theorem for all si,j and sk,l. (6) states that if si,j and sk,l are the next statements to be executed and if their preconditions are true, and if sk,l is executed next, the precondition of si,j will still be true when execution of sk,l terminates. Hence the interleaving can be allowed. If (6) is true for all pairs of statements, when each statement in the schedule is executed its precondition will be true. If (6) cannot be proven, we say that sk,l interferes with pre(si,j), and hence there is a possibility of invalidation at run time if the interleaving actually occurs. (As we shall see, even if the interleaving occurs, invalidation does not necessarily result.)

If no invalidation occurs in a schedule, I will be true of the final state produced since the precondition of the last statement in the schedule is true when that statement is executed and, from (1), its postcondition has I as a conjunct.

Requiring non-interference at the statement level is too strong a condition for most applications and hence unlimited interleaving leads to schedules that are neither serializable nor semantically correct. Instead, with the ACC we use the following strategy.

1. At design time, we decompose each transaction, Ti, into a sequence of steps, Si,1; Si,2; ...; Si,Ni, and consider those assertions in the proof of (1) that appear at step boundaries, the interstep assertions. We denote the precondition of the step Si,j as pre(Si,j) and the postcondition of Si,j-1 as post(Si,j-1) (where post(Si,j-1) ⇒ pre(Si,j)). Hence pre(Si,1) ≡ I and post(Si,Ni) ≡ post(Ti) ≡ (I ∧ Qi). A proof of Ti can be abbreviated

{pre(Si,1)} Si,1 {pre(Si,2)} Si,2 ... Si,Ni-1 {pre(Si,Ni)} Si,Ni {post(Si,Ni)}    (7)

Theoretically, the strategy used in the decomposition is to choose steps in such a way that the number of interstep assertions that are interfered with by some transaction step is small. As a practical matter, step boundaries are chosen to be points at which major modules are called. It is expected that some steps will interfere with some interstep assertions. We refer to this interference as residual interference. Although it would be desirable to choose steps such that there is no residual interference, such a goal may not be consistent with a decomposition having a reasonable granularity. In the limit, as step size increases, each transaction becomes a single step and residual interference disappears entirely.

2. At execution time a strict two-phase locking algorithm is used to guarantee step isolation and atomicity, and thus any schedule produced by an ACC is equivalent to a serial schedule of steps. We assume that all schedules have this form in what follows. Hence, we need not be concerned with the invalidation of intra-step assertions. The only remaining problem is the invalidation of interstep assertions that might occur at the points where residual interference exists.

3. To deal with the potential for invalidation, the ACC controls step interleaving by locking active interstep assertions and results. Assertions are locked using assertional locks. Conceptually, in the serial schedule of steps, a lock on the interstep assertion pre(Si,j) is held between the time Si,j-1 completes and the time Si,j is initiated. The lock ensures that pre(Si,j) is true when Si,j is initiated by inhibiting the initiation of a step that can invalidate pre(Si,j). In practice, assertional locks are acquired on items as they are accessed in such a manner that the effect is equivalent, in the serializable schedule of steps, to locking the interstep assertions between steps of the serial schedule of steps. Note the difference between [26] and the approach taken here. (6) guarantees semantic correctness by assuming all possible interleavings can occur and showing (at design time, using a non-interference proof) that none will cause invalidation. With the ACC, interleavings that cause interference are identified (at design time) and prevented (at run time).

Using the above approach the ACC guarantees that the precondition of each step is true when it is executed. This in turn guarantees that each transaction's result is true when it terminates and that I is a postcondition of the schedule. One additional consideration must be dealt with: the invalidation of a transaction's result after it terminates. A result cannot be invalidated if it does not reference database variables. For example, the result of the buy transaction of the stock trading application, "when each share was purchased no cheaper unbought share of the stock existed in the database", refers to a snapshot of the database and hence cannot be invalidated by subsequent changes to the database. This is the case that was considered in [6].

Suppose, however, that a result, Qi, references a database variable and therefore can be invalidated. (2) is an example of this. We might consider locking Qi when Ti terminates and holding the lock for the remainder of the schedule. To understand why this is too strong a condition, consider the deposit transaction described earlier.

{BAL = bal}
    step1: B := bal;
{B = BAL = bal}
    step2: B := B + dep;
           bal := B;
{bal = BAL + dep}    (8)


The transaction has been decomposed into two steps. B is a local variable and BAL is the logical variable that records the initial value of the account balance, bal. Interstep assertions are shown in the first column in curly brackets. For simplicity we have ignored I in the proof. The postcondition of step2 is the result (2). If the steps of two instances of the deposit transaction on the same account, T1 and T2, are interleaved and T1 finishes first, its result will be true when it terminates. However, the second step of T2 will invalidate T1's result. Such an interleaving is generally referred to as a lost update. We can prevent this invalidation by assertionally locking T1's result when it terminates. This lock will prevent the second step of T2 from completing (since it interferes with the locked assertion) and can be used as an indication that T2 should be aborted.

If the result lock is held for the remainder of the schedule, all subsequent deposit transactions would be aborted. Note, however, that our concern is only with deposit transactions like T2 that are interleaved with T1. If no interleaving has occurred the transactions are executed serially and result invalidation is not a concern. Thus, T2 does not cause a lost update if it is initiated after T1 completes.

Hence, to ensure the semantic correctness of a schedule S, we require that the result of a transaction, Tk, must be locked until all transactions whose steps are interleaved with Tk in S have completed. If any interleaved transaction attempts to execute a step that would invalidate Qk, it must be aborted. This requirement corresponds to ensuring that the following theorem ([26]) holds for all concurrently executing transactions, Ti and Tk:

{Qi ∧ pre(Sk,l)} Sk,l {Qi}    (9)

Theorem 1. If in a schedule, S, when each step is executed its precondition is true and if the result of a transaction is not invalidated by concurrently executing transactions, then S is a semantically correct schedule.

Proof. See Appendix A. □

The stronger the interstep assertions, the more likely they are to be interfered with. It is therefore important that the proof of Ti involve the weakest assertions that are sufficient to yield Qi as a postcondition and to ensure that any part of I invalidated by Ti is restored before Ti commits. This issue is discussed in [6], which describes a proof, called a maximally reduced proof, using weaker assertions than those in (1). I can be represented as a conjunction I1 ∧ I2 ∧ ... ∧ In. The precondition, pre(Si,1), of a maximally reduced proof of Ti is a conjunction of a subset of {I1, I2, ..., In}. Thus, conjuncts of I that only reference database variables not accessed by Ti need not be included in the subset. pre(Si,1) is sufficiently strong to demonstrate Qi as a postcondition and to demonstrate that any conjunct of I that is temporarily made false by a step of Ti has been restored to the true state by a subsequent step. The maximally reduced proof of Ti can still be abbreviated as (7) with the meaning of each assertion pre(Si,j) adjusted accordingly.

We have described a formal technique for determining interstep assertions and an interleaving specification. Step decomposition, however, is frequently done informally. Each step is a method in a module that performs a complete subtask and returns the module's data (not necessarily the database as a whole) to a consistent state (or to some other state for which the interstep assertion is intuitively evident). For example, a market trading transaction for buying a stock might invoke two modules: one for determining the amount of credit a customer has with the broker and another for searching the list of all pending sell orders for a suitable trade. If each module is taken to be a step, an informal approach to determining acceptable interleavings would assume that the modules are correct and take as the interstep assertion {customer's credit = max_credit}.

3.4. The Assertional Concurrency Control

In an early, but conceptually simple design [6], the ACC is decomposed into two levels where the lower level implements a two-phase locking algorithm for use within each step. Since each step uses conventional shared and exclusive locks on data items in a two-phase fashion, step isolation is guaranteed. The higher level is responsible for dispatching steps to the lower level. It implements the abstraction of locks on assertions. At design time, transactions are decomposed into steps and an analysis is performed to determine residual interference. This analysis produces tables that (a) identify steps that interfere with interstep assertions and results and (b) identify transaction prefixes that interfere with each conjunct of I. The information in (b) is used for determining when a transaction can be initiated. At run time the upper level of the ACC uses these tables to decide whether or not a step should be dispatched. The precondition of step Si,j is locked when Si,j is executing (or eligible to execute) and the ACC will not dispatch Sk,l if Sk,l interferes with pre(Si,j) and pre(Si,j) is locked.

A major problem with a two-level ACC is that the identity of some database items referenced by a step or an assertion might not be known at design time, and hence assertional locking must be more conservative than necessary. Thus, Sk,l might interfere with pre(Si,j) because it cannot be determined at design time that certain items accessed by Sk,l are distinct from those referenced by pre(Si,j). In that case the design-time analysis must conclude that the high-level control should not dispatch Sk,l when pre(Si,j) is locked.

For example, if step Si,j has a precondition asserting account.bal > $1000 for the account it is accessing, and step Sk,l executes the statement UPDATE accounts SET bal = bal - 100 WHERE <CONDITION>, then a two-level concurrency control would delay the execution of Sk,l if it cannot be determined at design time that the two transactions will access different accounts. If the two transactions access distinct accounts, but the ACC delays Sk,l, then Tk has been delayed unnecessarily. We can avoid this type of unnecessary delay by using information available at run time. If Ti and Tk access different accounts, the ACC should allow Sk,l to execute despite the interference. By integrating the two levels into a single level, we can construct an ACC that can make such a determination efficiently.

As with a two-level control, in a one-level control each step uses conventional locks in a two-phase manner (and hence is isolated). Assertional locks are used in a non-two-phase fashion to ensure that when a step executes, its precondition is true and to guard results. Instead of locking the abstraction of an assertion at the upper level of a two-level control, assertional locks are attached to individual database items in the same way that shared and exclusive locks are, and hence all locks can be checked in a uniform way. An assertional lock on the assertion A is denoted A(A). Hence, in addition to being locked conventionally, a database item x referenced in A can be locked with an A(A) lock. An A(A) lock on x prevents the update of x by a step that interferes with A. We say that the assertion A is locked when Ti holds A(A) locks on all items referenced by A. Since the only way for the database to move from a state in which A is true to one in which it is false is for a transaction to modify an item that A references, Ti can preserve the truth of A by locking all such items.

As in the two-level ACC, interference between steps and assertions is determined at design time and is stored in an interference table, which can be efficiently accessed at run time by the ACC. Hence, the overhead of acquiring and releasing an assertional lock is comparable to that for conventional locks. In this section we describe a version of the one-level ACC that has a number of inefficiencies, but is simple enough to be presented concisely. Some optimizations are described in Section 3.7.

The primary concerns of the one-level ACC are to ensure that

1. pre(Si,1) is true when Ti is initiated,

2. pre(Si,j), j > 1, is true when Si,j is executed, and

3. Qi is not invalidated by transactions interleaved with Ti.

In the simple version of the one-level ACC, presented below, the single interference table encodes the possible interleavings of steps. Thus, for example, if Sk,1; ...; Sk,l interferes with pre(Si,1), the interference table will encode that Si,1 cannot be interleaved between Sk,l and Sk,l+1. Similarly, if Si,j interferes with pre(Sk,l+1), the interference table will encode that Si,j cannot be interleaved between Sk,l and Sk,l+1. The determination of when the lock on Qi can be released is done with timestamps.

The algorithm can be summarized as follows:


Simplified One-Level ACC Algorithm

Before initiating transaction Ti: Acquire a timestamp, τi, and A(pre(Si,1)) locks on all items referenced by pre(Si,1).

As a step executes: Acquire conventional read and write locks on items as they are accessed.

When a step Si,j terminates: If Si,j is not the final step, then atomically:

1. Convert all A(pre(Si,j)) locks to A(pre(Si,j+1)) locks and release all conventional locks.

2. For all items referenced in pre(Si,j+1) but not pre(Si,j), grant A(pre(Si,j+1)) locks.

If Si,j is the final step, then atomically:

1. For all items not referenced in Qi, release all conventional and assertional locks.

2. For all items referenced in Qi, release all conventional locks, convert all A(pre(Si,j)) locks to A(Qi) locks, and timestamp each such lock with the current time.

We can interpret this algorithm as follows:

Acquire A(pre(Si,1)) locks: To ensure that pre(Si,1) is true when Ti is initiated, the ACC locks pre(Si,1) on behalf of Ti. Since pre(Si,1) consists only of conjuncts of I, it can be false only if some active transaction has modified an item, x, referenced in pre(Si,1). Thus the ACC examines the locks on x as the request for an A(pre(Si,1)) lock on x is processed to see if pre(Si,1) might have been invalidated. If Tk holds an A(pre(Sk,l)) lock on x, then Si,1 will either be serialized between Sk,l-1 and Sk,l or between Sk,l and Sk,l+1. If either of the sequences Sk,1; ...; Sk,l-1 or Sk,1; ...; Sk,l (as a whole) interferes with pre(Si,1) then the interference table will encode that the A(pre(Si,1)) lock on x cannot be granted at this time and Ti will be made to wait.

Acquire conventional read and write locks: Conflicts between conventional shared and exclusive locks are dealt with in the usual way. In addition, a request for an exclusive lock might conflict with a previously granted assertional lock. If Si,j interferes with pre(Sk,l), then a request by Si,j for an exclusive lock on an item x cannot be granted if an A(pre(Sk,l)) lock exists on x (since Si,j might invalidate pre(Sk,l)). The interference table will encode this conflict.

Convert A(pre(Si,j)) locks to A(pre(Si,j+1)) locks: When Si,j was initiated, pre(Si,j) was locked and true of the database. Hence, it follows from (1) that when Si,j completes (at the time this request is made) pre(Si,j) is still locked and pre(Si,j+1) is true of the database. Thus the assertional lock on all items referenced by both pre(Si,j) and pre(Si,j+1) can be converted to an A(pre(Si,j+1)) lock. Any item referenced by pre(Si,j+1) but not by pre(Si,j) must have been accessed by Si,j and hence Si,j holds a conventional lock on it. Hence an A(pre(Si,j+1)) lock can be granted on such items as well. Furthermore, once the lock on pre(Si,j+1) has been granted, we are guaranteed that any concurrently executing step, which might be serialized by the ACC between Si,j and Si,j+1, does not invalidate pre(Si,j+1). (The proof of this is given in Appendix B.)

Release all conventional locks: Conventional locks are released by converting their granted mode to N (none) but leaving the lock in place (except in the last step, when the lock mode is not converted to N).

For the final step, retain locks on the assertions in Qi: Result locking is handled with timestamps. Assertional locks on results are stamped with the time at which they are granted, and each transaction carries the time at which it was initiated. Suppose step Sk,l of Tk with start time τk interferes with Qi and requests an exclusive lock on an item on which a result lock, A(Qi), with timestamp τlock is held. If τk > τlock, the lock can be granted, and if τk < τlock, the lock cannot be granted and Tk is aborted. A result lock can be released when all active transactions have start times greater than that of the lock. Optimizations to this and other aspects of this simplified version of the ACC are discussed in Section 3.7.

Note that the locking algorithm operates only on the lock table, never the value of the locked item, and that all accesses to the interference table are indexed table lookups.
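As a rough illustration of the lock transitions just described, the sketch below condenses the end-of-step actions and the timestamp rule for result locks. The data structures and names (LockEntry, end_of_step, may_write, and so on) are assumptions made for exposition; the implemented ACC performs these operations inside the DBMS lock manager and handles details (conversion of conventional locks to mode N, newly referenced items, deadlocks) that are omitted here.

```python
# Condensed, assumption-laden sketch of end-of-step lock transitions and the
# timestamp rule for result locks; not the implemented interface.
import time

class LockEntry:
    def __init__(self):
        self.conventional = None        # 'S', 'X', or None
        self.assertional = None         # index j such that A(pre(S_i,j)) is held
        self.result_owner = None        # transaction holding an A(Q_i) result lock
        self.result_timestamp = None    # time at which the result lock was granted

def end_of_step(lock_table, txn, next_step_index, is_final, result_items):
    """Actions taken atomically when a step of `txn` terminates."""
    for item in txn.held_items:
        entry = lock_table[item]
        entry.conventional = None                    # conventional locks are released
        if not is_final:
            entry.assertional = next_step_index      # A(pre(S_i,j)) -> A(pre(S_i,j+1))
        elif item in result_items:                   # items referenced in Q_i
            entry.assertional = None
            entry.result_owner = txn
            entry.result_timestamp = time.time()     # timestamp the result lock
        else:
            entry.assertional = None                 # everything else is dropped

def may_write(entry, requester, interferes_with_result):
    """A step that interferes with Q_i may write the item only if its transaction
    started after the result lock was granted; otherwise it must be aborted."""
    if entry.result_owner is None or not interferes_with_result:
        return True
    return requester.start_time > entry.result_timestamp
```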

Theorem 2. The Simplified One-Level ACC Algorithm guarantees that when each step is initiated its precondition is true and the result of each transaction is not invalidated by concurrently executing transactions.

Proof. See Appendix A. □

Together, Theorems 2 and 1 ensure that the Simplified One-Level ACC Algorithm produces only semantically correct schedules.

One concern about non-isolated execution is the effect it might have on legacy transactions or ad hoc interactive transactions. These transactions have not been, or cannot be, analyzed. The ACC will ensure that these transactions execute in an isolated manner. The ACC requires multi-step transactions to hold assertional locks on the items they access until commit. It uses these assertional locks to ensure that unanalyzed transactions do not see the intermediate results of multi-step transactions and are completely isolated. Thus correct unanalyzed transactions will operate correctly in the presence of multi-step transactions.

3.5. Example of Simplified ACC Algorithm

To illustrate the manipulation of locks performed by the simplified ACC algorithm, consider an airlines reservation system with three transaction types: reservation, print_flt_info, and change_seat.

For each flight, the database maintains aggregate information and we will take the passenger count, cnt, as a representative of that information. In addition, a passenger list, list, is maintained which contains an entry for each passenger. Each entry contains an integer seat number, seat#, and a Boolean, smoke, that is true or false depending on whether smoking or non-smoking has been requested. Two integrity constraints apply:

I1 : cnt = |list|

I2 : (∀i)(list[i].smoke ⇒ (list[i].seat# > 100))

I2 indicates that smokers must sit in the back of the airplane. A sketch of the three transaction types together with a description of how they acquire and release locks is given in Figure 1. Since conventional locks are acquired and released in a two-phase fashion we can consider each step as an isolated unit and concern ourselves only with how steps are interleaved.

Assuming that I1 is true when a reservation transaction is initiated, pre(S1,2) will be true on the termination of S1,1. Hence, the assertional locks on list and cnt acquired by S1,1 can simply be converted from A(pre(S1,1)) to A(pre(S1,2)); no request or waiting is required.

Since S1,1 invalidates I1, it is necessary to prevent a print_flt_info transaction (which requires I1 as a precondition) from being interleaved between steps S1,1 and S1,2 of a reservation transaction. This is done by declaring a conflict between an A(pre(S1,1)) lock and an A(pre(S2,1)) lock and also between an A(pre(S1,2)) lock and an A(pre(S2,1)) lock, and encoding this in the interference table. Since a reservation transaction will hold either an A(pre(S1,1)) or an A(pre(S1,2)) lock on both cnt and list for the duration of the transaction, the request for an A(pre(S2,1)) lock on either item when a print_flt_info transaction is initiated will be denied, preventing the interleaving. A change_seat transaction, however, does not require I1 as a precondition and therefore can be interleaved between S1,1 and S1,2. Hence, no conflict need be declared between A(pre(S3,1)) and either A(pre(S1,1)) or A(pre(S1,2)).

Note that conflicts between assertional locks are determined, and the interference table is created, at design time. Hence the ACC need only perform a table lookup at run time to decide whether or not a conflict exists.


reservation
    {I1}                                Acquire A(pre(S1,1)) locks on list and cnt.
    S1,1: add entry to list             Acquire write lock on list dynamically.
    pre(S1,2): {cnt = |list| - 1}       Release write lock on list and convert A(pre(S1,1))
                                        locks on list and cnt to A(pre(S1,2)) locks.
    S1,2: increment cnt                 Acquire write lock on cnt dynamically.
    {I1}                                Release all locks.

print_flt_info
    {I1 ∧ I2}                           Acquire A(pre(S2,1)) locks on list and cnt.
    S2,1: print flight information      Acquire read locks on list and cnt dynamically.
    {I1 ∧ I2}                           Release all locks.

change_seat
    {I2}                                Acquire A(pre(S3,1)) lock on list.
    S3,1: change seat                   Acquire write lock on list dynamically.
    {I2}                                Release all locks.

Fig. 1: Lock Manipulation Required by Transactions of an Airlines Reservations System
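The conflicts declared for Figure 1 could be encoded as follows, reusing the hypothetical InterferenceTable sketch from Section 3.1; the step indices (1 = S1,1, 2 = S1,2, 3 = S2,1, 4 = S3,1) are our own numbering, chosen only for this illustration.

```python
# Hypothetical encoding of the Figure 1 conflicts (illustrative numbering).
table = InterferenceTable(num_step_types=4)

# print_flt_info needs I1, which a reservation invalidates between its steps, so a
# requested A(pre(S2,1)) lock conflicts with granted A(pre(S1,1)) and A(pre(S1,2)) locks.
table.declare_conflict(3, 1)
table.declare_conflict(3, 2)

# change_seat needs only I2, so no conflict with the reservation steps is declared.
assert not can_grant_assertional_lock(table, 3, {1})   # print_flt_info must wait
assert can_grant_assertional_lock(table, 4, {1})       # change_seat may interleave
```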

3.6. Additional Restrictions on the Specification of Transactions

When a transaction is executed in a serializable schedule, the values of all items it reads come from the same committed snapshot of the database. This is an unstated precondition of the proof of each transaction. An assertion, pre(Si,j), of a proof (7) states additional properties of the values of these items.

When transactions are not executed serializably, but are instead decomposed into interleavable steps, a transaction is not guaranteed that the values it reads from the database come from the same committed snapshot, and as a result the truth of pre(Si,j) might not be sufficient to guarantee that the transaction executes correctly. By correctly we mean that the results of the transaction (its database updates and its printouts) model the real world, not just that its postcondition is true of the database state when it terminates.

For example, a retail store might maintain a database describing the inventory in each department. The database invariant might state only that the inventory amount for each item in each department be non-negative. Consider a transaction, T1, that transfers a specific item from Department A to Department B. T1 might be divided into two steps: the first step deletes the item from Department A's inventory; the second step adds it to Department B's inventory. Between steps, the database invariant is true, but the database state does not reflect the actual state of the store, since that item is not in the inventory of either department. A second transaction, T2, that is required to print out a complete inventory of the store cannot execute between the steps of T1 even though the invariant is true between the steps, because T2 must see a committed snapshot in order to execute correctly.

To deal with this problem, the designer of a transaction must explicitly state any additional conditions that are required of the state of the items read by the transaction in order to guarantee the correct execution of that transaction. There exists a variety of such conditions. One such condition is that the value of each item read by the transaction must come from some committed snapshot, but that each value can come from a different snapshot. A stronger condition is that the set of items read by the transaction is decomposed into several disjoint subsets, and the values it reads of all items in the same subset must come from the same committed snapshot, but the values of items in different subsets can come from different snapshots. This type of condition is considered in [28, 18, 19, 27]. For example, a catalog mailing transaction might be decomposed into steps. Each step reads an address from the database and prints an address label. The transaction might require that all the information on a single label be read from some committed snapshot, but different labels might be allowed to come from different committed snapshots. Hence, a transaction that updates an address can execute between steps of the mailing transaction.

A weaker condition requires only that the state read by a transaction be "committable". A state is committable if it represents an acceptable description of the real world. For example, a transaction that adds several names to the mailing list database might be divided into steps each of which adds one name and address. In between steps the mailing list is not committed, but is committable. A mailing transaction might find these committable states of the list acceptable. A state that was not committable, on the other hand, might be one in which the mailing list contains incomplete addresses.

The conditions that must be imposed on the view of the database seen by a transaction depend upon the requirements of the application and are not derivable from the program for that transaction. For example, a manager executing an inventory transaction for a retail store might be satisfied with a committable database state (resulting from the intermediate state of another transaction), or that manager might require exact results obtained from a committed snapshot. It would be impossible to determine these requirements of the manager from an inspection of the transaction program, since the same program could be used to satisfy either of these requirements.

Since the ACC can be used to enforce an arbitrary interleaving specification, such requirements can easily be implemented. For example, suppose T1 requires that its view of x and y come from the same committed snapshot and S1,1 is the first step in T1 to read either of them. Furthermore, suppose T2 changes one or both of these variables and S2,1 is the first step in T2 to write either of them. Then the suffix of T1 starting with S1,1 and the suffix of T2 starting with S2,1 must mutually exclude one another. This restriction can be enforced by encoding in the interference tables that the steps of one suffix interfere with the preconditions of the steps of the other.
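To make this concrete, the following sketch shows one way such interference-table entries could be consulted; the table layout (steps mapped to the precondition assertions they interfere with) and the step names are assumptions made for illustration, not the implemented representation.

# Hypothetical interference table: each step is mapped to the set of
# precondition assertions (identified by the step that holds them) with
# which it interferes.
INTERFERES = {
    ("T2", 1): {("T1", 1), ("T1", 2)},   # T2's suffix vs. T1's suffix
    ("T2", 2): {("T1", 1), ("T1", 2)},
    ("T1", 1): {("T2", 1), ("T2", 2)},
    ("T1", 2): {("T2", 1), ("T2", 2)},
}

def may_run(step, active_assertions):
    # A step may proceed only if it interferes with no precondition
    # assertion currently locked by a concurrent transaction.
    return INTERFERES.get(step, set()).isdisjoint(active_assertions)

# Once T1 has locked pre(S1,1), the first write step of T2's suffix is delayed.
assert not may_run(("T2", 1), {("T1", 1)})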

3.7. Implementation and Optimizations

The Simplified ACC Algorithm described in Section 3.4 has a number of inefficiencies that are not present in the version of the ACC that we have implemented. Most noticeably, we have implemented dynamic acquisition of assertional locks, in contrast with the static assertional lock acquisition used in the Simplified ACC Algorithm (and illustrated in Figure 1). In the Simplified ACC, assertional locks on all variables referenced in a step's precondition are acquired when the step is initiated. In the implemented version, assertional locks are acquired dynamically: when each item is first referenced in a step, the assertional lock is acquired (in addition to the conventional read or write lock), with the additional requirement that all items referenced in the precondition of a step are assertionally locked before the step commits.

Dynamic lock acquisition not only reduces lock holding time by allowing a transaction to acquire an assertional lock on an item when the transaction first accesses the item, but also reduces the number of assertional locks that must be acquired in cases where the identity of an item referenced in an assertion is not known when the step is initiated. For example, in a banking application the first part of a step might identify the account to be accessed, and the second part might access that account. With static lock allocation, the step must (conservatively) assertionally lock all accounts, while with dynamic locking the step can acquire an assertional lock on the account to be referenced when that account is accessed (and therefore has been identified).
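The difference between the two acquisition policies can be sketched as follows for the banking example; the lock-manager interface and the account data are assumptions made purely for illustration.

class LockManager:
    # Minimal stand-in for a lock manager; a real one would block on conflict.
    def __init__(self):
        self.assertional = set()
        self.conventional = set()
    def acquire_assertional(self, item):
        self.assertional.add(item)
    def acquire_conventional(self, item, mode):
        self.conventional.add((item, mode))

ACCOUNTS = {"alice": "acct-17", "bob": "acct-42"}   # hypothetical data

def step_static(mgr, target_name):
    # Static acquisition: the target account is not yet known, so every
    # account the precondition might mention is assertionally locked
    # before the step starts.
    for acct in ACCOUNTS.values():
        mgr.acquire_assertional(acct)
    acct = ACCOUNTS[target_name]             # identify the account
    mgr.acquire_conventional(acct, "W")      # then access it

def step_dynamic(mgr, target_name):
    # Dynamic acquisition: the assertional lock is taken when the item is
    # first referenced, once its identity is known; all items referenced
    # in pre(S) must be assertionally locked before the step commits.
    acct = ACCOUNTS[target_name]
    mgr.acquire_assertional(acct)
    mgr.acquire_conventional(acct, "W")

m1, m2 = LockManager(), LockManager()
step_static(m1, "alice"); step_dynamic(m2, "alice")
assert len(m1.assertional) == 2 and len(m2.assertional) == 1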

Dynamic lock allocation does not entirely solve the problem of locking items early or unnecessarily. The proof (7) requires that each step's precondition (except for the first) is implied by the postcondition of the preceding step. Hence, any conjunct of I required as the precondition of some step must be included in the precondition of the transaction as a whole and therefore must be locked before (if static lock acquisition is used) or during (if dynamic lock acquisition is used) the execution of the first step. To overcome this problem, the implemented version of the ACC bases assertional locking on a weakened version of (7), in which the precondition of each step (except for the first) follows from the postcondition of the preceding step conjoined with certain conjuncts of I. For example, suppose the print_flt_info transaction were decomposed into two steps: S2,1 prints the aggregate information about a flight and S2,2 prints the passenger list with seat numbers. Then I2 need not be a precondition of S2,1. The optimized ACC requires only that I2 be locked before S2,2 commits†.

† Note that in Figure 1, I2 is invariant to the steps of all transactions, so I2 will be a precondition of S2,2 in all interleavings. However, if the change-seat transaction were decomposed into several steps, this would not necessarily be the case.

In general, suppose Si,j is the first step of Ti that has a conjunct of I, Ip, in its precondition. When Si,j requests assertional locks on variables referenced in Ip, the ACC validates Ip in a manner similar to the validation of conjuncts of I in pre(Ti) in the simplified ACC algorithm. Hence, the truth of Ip is assured at the point in the execution of Ti at which it is required.

An additional problem with the simplified ACC algorithm is the requirement that every item in an assertion be assertionally locked in order for the assertion to be considered locked. This requirement is imposed because a transaction can invalidate an assertion by modifying only a subset of the variables in that assertion. Thus, to protect a locked assertion, the ACC prevents any step that interferes with the assertion from modifying any item referenced in the assertion. In cases where the set of transactions and steps can be characterized at design time, however, we can allow a step, Si,j, to lock a subset of the items in pre(Si,j) if it can be shown that an undetected invalidation is impossible. This will be the case, for example, if any step that interferes with pre(Si,j) must acquire a lock on at least one of the items that Si,j has locked. We have designed and implemented a tool to aid in this determination.

A transaction, Tk, must be aborted if it attempts to execute a step that interferes with a locked result, Qi, and if it executed concurrently with Ti. A common paradigm for such a situation is the lost update situation on some item, x: Ti and Tk both read x, Ti writes x and commits (and its result asserts something about the new value of x), and then Tk attempts to write x. Instead of aborting Tk when it attempts its write, it would be preferable to delay the start of Tk until Ti completes. Since the pattern of interaction is known at design time, the interference table can be set up to limit interleaving in exactly this way. In some cases result locks can be dispensed with entirely. These cases are discussed in [13].

3.8. Compensation and Deadlock

In a conventional database system that implements isolation, the effects of an aborted transaction can be undone, since intermediate states produced by the transaction are not visible to concurrently executing transactions. With the ACC, a step can be aborted before it commits, but since an intermediate state is exposed when a step commits, compensation must be used to reverse the effects of a committed step. Hence, with the ACC, transaction rollback involves aborting the current, uncommitted, step and compensating for prior steps that have committed (similar to the scheme in Section 4.3.2 of [22]). Compensation is procedural in nature and seeks to "semantically undo" the effects of committed steps [11]. Compensating steps are written by the application programmer‡. In the implemented ACC, a single compensating step CSi,j, with precondition implied by pre(Si,j+1), compensates for an entire sequence Si,1; . . . ; Si,j of committed, forward, steps. The programmer must ensure that compensation preserves consistency and that the postcondition of each transaction admits the possibility of compensation. Thus, for each non-final step, Si,j-1, the programmer must prove that

{I} Si,1; . . . ; Si,j-1; CSi,j-1 {I ∧ Qi}                                  (10)

is a theorem.

‡ The actual effects of compensation are determined by the transaction programmer. Thus forward recovery can be implemented.

The programmer must also consider the effect of compensation on the execution of concurrent transactions. Thus, in order to allow Sk,l to be interleaved between Si,j-1 and Si,j, the programmer must show that CSk,l-1 does not interfere with pre(Si,j):

{pre(Si,j) ∧ pre(Sk,l)} CSk,l-1 {pre(Si,j)}

The ACC logging algorithm uses the write-ahead log (WAL) protocol and is based upon ARIES [24] and MLR [22]. An active transaction writes log records for every update of the database, including updates during backward processing (undo due to abort). When a step, Si,j, requests an update during forward processing, it reserves log space for both the forward log record and for the undo record that will be written should it abort. In addition, before Si,j commits, it must reserve log space for the compensating step that will be executed should Si,j+1 abort§. Since each step instance can either abort or be compensated for, but not both, only the difference between the space already reserved for step abort and the space required for compensation must be allocated before step commit. Hence, the reservation algorithm ensures that sufficient log space remains to abort or compensate for active transactions (as appropriate).

§ Since compensation might be aborted, space for the undo of compensation is also reserved.
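The reservation rule can be sketched as follows; the record sizes and the bookkeeping interface are assumptions made for illustration, not the implemented log manager.

class LogSpaceReservation:
    # Tracks reserved log space so an active step can always be aborted and
    # a committed step can always be compensated for.
    def __init__(self):
        self.reserved = 0

    def on_forward_update(self, redo_size, undo_size):
        # During forward processing, reserve space for the forward (redo)
        # record and for the undo record written if the step aborts.
        self.reserved += redo_size + undo_size

    def before_step_commit(self, undo_reserved, compensation_size):
        # A step instance either aborts or is compensated for, never both,
        # so only the difference between the compensation space and the
        # space already reserved for abort is added before step commit.
        self.reserved += max(0, compensation_size - undo_reserved)

r = LogSpaceReservation()
r.on_forward_update(redo_size=120, undo_size=80)
r.before_step_commit(undo_reserved=80, compensation_size=200)
assert r.reserved == 320   # 120 + 80 + (200 - 80)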

When a step commits, it writes a step-commit log record. This record includes a reference to the corresponding compensating step. Compensating steps are maintained by the database server as stored procedures, and thus the server can autonomously initiate compensation by reading the log record and invoking the appropriate stored procedure. A unique identifier is passed to the stored procedure as a parameter when it is invoked. This identifier allows the compensating step to find any data stored for it by the forward step.
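A step-commit record therefore needs to carry only enough information for the server to initiate compensation on its own; the field names below are assumptions made for illustration.

from dataclasses import dataclass

@dataclass
class StepCommitRecord:
    transaction_id: str
    step_number: int
    compensating_procedure: str   # name of the stored procedure to invoke
    compensation_key: str         # unique identifier passed as a parameter,
                                  # letting the compensating step find data
                                  # stashed for it by the forward step

rec = StepCommitRecord("T17", 3, "cs_new_order", "T17-step3")
# On rollback (or during crash recovery) the server would read such records
# from the log and invoke rec.compensating_procedure with rec.compensation_key.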

The general form of deadlock involves steps waiting for conventional and assertional locks. Step abort releases conventional, but not assertional, locks. Thus the abort of a step involved in a deadlock might not resolve the deadlock. In these cases compensation is necessary. However, if compensating steps must wait for assertional locks, the deadlock might not be resolved. We refer to this situation as an unrecoverable deadlock. Unrecoverable deadlocks are prevented by ensuring that compensating steps never wait within deadlocks. Compensating steps never wait for conventional locks in deadlocks since they kill the steps delaying them when a deadlock is detected. Preventing compensating steps from waiting for assertional locks is done in two parts. First, we restrict compensating steps to access only those items already accessed by forward execution. This restriction ensures that compensating steps do not have to acquire new assertional locks. Second, we modify the assertional conflict matrix so that the assertional locks acquired by a forward step ensure that the corresponding compensating step will not have to wait for an assertional lock. To do this, we determine when an assertional lock, A(pre(Sk,l)), can cause the assertional delay of a compensating step, CSi,j. To prevent this delay, we modify the conflict matrix so that Si,j's assertional locks block the forward progress of Sk,l-1, and Sk,l-1's assertional locks block the forward progress of Si,j. This modified conflict matrix prevents a situation in which Tk has completed Sk,l-1, Ti has completed Si,j, and the two have accessed an item in common (the item at which CSi,j would be delayed).
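The second part of the rule amounts to adding symmetric entries to the conflict matrix at design time; the representation below (pairs of steps that block one another's forward progress) is an assumption made for illustration.

# Hypothetical symmetric additions to the assertional conflict matrix: a pair
# in this set means each step's assertional locks block the other's forward
# progress.
mutual_blocks = set()

def make_mutually_blocking(step_a, step_b):
    mutual_blocks.add(frozenset((step_a, step_b)))

def blocked(requesting_step, holding_step):
    return frozenset((requesting_step, holding_step)) in mutual_blocks

# Wherever A(pre(Sk,l)) could delay CSi,j, record that Si,j and Sk,l-1 block
# one another, so Tk cannot have completed Sk,l-1 while Ti has completed Si,j
# with an item accessed in common.
make_mutually_blocking("S_i,j", "S_k,l-1")

assert blocked("S_k,l-1", "S_i,j") and blocked("S_i,j", "S_k,l-1")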

This algorithm has the beneficial side effect that after a crash (and the undo of all in-progress steps), any serializable execution of the pending compensating steps will produce a semantically correct schedule, and thus crash recovery is simplified. Details can be found in [13].

4. EXAMPLE

We consider a simple order processing system loosely based on the TPC-C benchmarks in order to illustrate various facets of the ACC. The experiments described in the next section use actual benchmark code. The database tables are organized as follows (key attributes are listed first):

orders (order_id, customer_id, number_of_distinct_items, price). Each tuple encodes an order. price contains the total price of all items to be shipped, and number_of_distinct_items contains the number of distinct item_ids ordered. Each distinct item in the order is described by a tuple in the orderlines table.


stock (item_id, s_level). Each item has a tuple that contains the quantity of stock that is available to fill incoming orders.

prices (item_id, price). Each item has a tuple that contains the unit price.

orderlines (order_id, item_id, ordered, filled). Each tuple encodes an item named in some order. The ordered field is the quantity ordered. The filled field is the quantity of the item that will be shipped for this order. A tuple with the same order_id must exist in the orders table. Tuples with the same item_id must exist in the stock and prices tables.

There is also a table current_o_number, with a single attribute current_order_number and a single tuple, which acts as a counter.
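For concreteness, the schema can be written down as follows; the column types and constraints are our assumptions (the paper does not give DDL), shown here using SQLite for brevity.

import sqlite3

DDL = """
CREATE TABLE orders (
    order_id                 INTEGER PRIMARY KEY,
    customer_id              INTEGER,
    number_of_distinct_items INTEGER,
    price                    NUMERIC
);
CREATE TABLE stock  (item_id INTEGER PRIMARY KEY, s_level INTEGER);
CREATE TABLE prices (item_id INTEGER PRIMARY KEY, price NUMERIC);
CREATE TABLE orderlines (
    order_id INTEGER REFERENCES orders(order_id),
    item_id  INTEGER REFERENCES stock(item_id),
    ordered  INTEGER,
    filled   INTEGER,
    PRIMARY KEY (order_id, item_id)
);
CREATE TABLE current_o_number (current_order_number INTEGER);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO current_o_number VALUES (1)")   # the single counter tuple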

We deal with only two types of transactions. bill determines the billing information for an order: it totals the prices of the orderlines for an order and puts that total price into the price field of the order's tuple. It is not decomposed into steps.
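The paper does not list the code for bill; a plausible single-step rendering, against the schema sketched above, might be:

def bill(conn, order_id):
    # Single-step transaction: total the orderlines for the order and store
    # the total in the price field of the orders tuple.
    conn.execute(
        """UPDATE orders
              SET price = (SELECT SUM(ol.filled * p.price)
                             FROM orderlines ol JOIN prices p USING (item_id)
                            WHERE ol.order_id = :oid)
            WHERE order_id = :oid""",
        {"oid": order_id},
    )
    conn.commit()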

new-order, shown in Figure 2 (in pseudo embedded-SQL), enters the information about an order into the database. It inserts a single tuple into the orders table and inserts num_items tuples into the orderlines table (one tuple for each item ordered). When it inserts an orderline tuple, it tests whether there is sufficient stock of the item requested to satisfy the request and, if not, supplies only the amount in stock. (We assume that the specification of new-order states that for each item ordered, either the amount requested is supplied or, if that amount is not available when the request is made, the amount in stock is supplied.) new-order is divided into steps. The first step ends when the first orderline is about to be inserted. Subsequent steps consist of the insertion of a single orderline.

/* items[] and quant[] are arrays of size num_items. Each entry describes one orderline. */

new_order(cust_id, num_items, items[], quant[]) {

    STEP 1;   /* STEP BOUNDARY */

    /* get and increment the order number */
    SELECT current_order_number INTO :o_num FROM current_o_number;
    UPDATE current_o_number SET current_order_number = :o_num + 1;

    /* insert the new order */
    INSERT INTO orders VALUES (:o_num, :cust_id, :num_items, 0);

    for (i = 0; i < num_items; i++) {   /* for each requested item */

        STEP 2;   /* STEP BOUNDARY (note: the boundary is within the loop) */

        /* get the lesser of the requested and in-stock quantities and update the stock level */
        SELECT LEAST(:quant[i], s_level) INTO :filled FROM stock WHERE item_id = :items[i];
        UPDATE stock SET s_level = s_level - :filled WHERE item_id = :items[i];

        /* insert the orderline */
        INSERT INTO orderlines VALUES (:o_num, :items[i], :quant[i], :filled);
    }
}

Fig. 2: new-order Transaction


For the purposes of this example, we are concerned with only a single conjunct of I, denoted I1. Informally, I1 asserts

For each order_id, the number of tuples in orderlines with that order_id is equal to the value of the number_of_distinct_items field in the tuple in orders with that order_id.

I1 applies to the entire database, but it can be rewritten as conjuncts, one for each order_id. We denote the conjunct of I1 for order o_num as I1^o_num. The partial execution of new-order interferes with I1^o_num, where o_num is the index of the order being added. Informally, the interstep assertion is

For the order_id corresponding to I1^o_num, the number of tuples in orderlines with that order_id is equal to i.

where i is the loop variable. The complete execution of new-order restores I1^o_num.
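For a single order, the conjunct I1^o_num can be checked directly against the schema sketched earlier; the query below is our illustration and is not part of the ACC itself.

def i1_holds(conn, o_num):
    # I1^o_num: the number of orderlines tuples for o_num equals the
    # number_of_distinct_items field of the corresponding orders tuple.
    row = conn.execute(
        """SELECT o.number_of_distinct_items,
                  (SELECT COUNT(*) FROM orderlines ol WHERE ol.order_id = o.order_id)
             FROM orders o WHERE o.order_id = ?""",
        (o_num,),
    ).fetchone()
    return row is not None and row[0] == row[1]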

Even without a formal proof, it is intuitively clear that the execution of one new-order transaction does not interfere with the execution of another new-order transaction for a different order. Specifically, new-order does not require I1 as a precondition. Hence the steps of different instances of new-order transactions (for different orders) can be arbitrarily interleaved.

A schedule produced by the ACC might be non-serializable if some items go out of stock. For example, if Ti and Tk are concurrently executing instances of new-order, and both are ordering 10 units each of televisions and VCRs, Ti might have the order for televisions filled, but not the order for VCRs, while Tk has the order for VCRs filled, but not the order for televisions. This non-serializable schedule is acceptable since (1) the final database state is consistent and (2) the specification of each new-order is met.

On the other hand, it is also intuitively clear that the bill transaction for an order cannot be executed until all the orderlines for that order have been inserted. (Otherwise it might not bill for certain items.) Specifically, bill does require I1^o_num as a precondition, where o_num is the index of the order being billed. Thus bill cannot be interleaved between the steps of a new-order acting on the same order. Stated somewhat differently, bill must see a committed snapshot of the items in the database related to the order being billed. In both cases the ACC enforces a correct schedule.

Consider now the possible need for compensation. bill is a single step and thus does not require compensation. The compensation for new-order consists of returning any items in orderlines with order_id equal to o_num to stock and removing the relevant tuples from orders and orderlines. The concurrent execution of multiple instances of new-order, one of which is rolled back, can lead to a state that could not have been reached by the serial execution of the remaining transactions. For example, let T1 be an instance that is rolled back and let T2 be another instance, some of whose steps are executed between the forward steps and the compensating step of T1. T2 might request items that are returned to stock by the compensating step of T1; T2 might or might not get the item, depending on the item's stock level and whether the request was made before or after the item was returned by T1. The schedule is acceptable since it is semantically correct.
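A compensating step for new-order matching this description might look as follows; the function is our sketch against the assumed schema, not the implemented stored procedure.

def cs_new_order(conn, o_num):
    # Return the filled quantities to stock, then remove the order's tuples.
    conn.execute(
        """UPDATE stock
              SET s_level = s_level + (SELECT ol.filled FROM orderlines ol
                                        WHERE ol.order_id = :oid
                                          AND ol.item_id = stock.item_id)
            WHERE item_id IN (SELECT item_id FROM orderlines WHERE order_id = :oid)""",
        {"oid": o_num},
    )
    conn.execute("DELETE FROM orderlines WHERE order_id = ?", (o_num,))
    conn.execute("DELETE FROM orders WHERE order_id = ?", (o_num,))
    conn.commit()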

5. EXPERIMENTAL RESULTS

An ACC was implemented as a modification to the concurrency control provided in the CA-Open Ingres database system, and the resulting system was tested using the TPC-C Benchmark Transactions. The same load was applied to the unmodified Open Ingres system, and the performance of the two systems was compared. These experiments and their results are described in this section.

Open Ingres version 2.0 provides serializability as the default isolation level using strict two-phase locking of tables, pages, and tuples, with intention locks when appropriate. Additionally, indices are used, when appropriate, to allow the system to use small granularity locks (page and tuple) as much as possible.


An optimized one-level ACC was implemented by adding assertion-mode locks to the conventional locks within Open Ingres. Assertional lock conflicts are checked by indexing into a vector array; hence the time to acquire an assertional lock is comparable to that required to acquire a conventional lock. It was not necessary to change the basic deadlock detection algorithm (with the exception of modifying the algorithm's notion of conflict to include assertional conflicts). However, the action taken when a deadlock is detected was changed and, in the ACC, depends on the nature of the deadlock.
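Checking conflicts by indexing into a vector can be sketched as follows; the mode numbering and the matrix contents are assumptions made for illustration and are not the Open Ingres lock modes.

# Hypothetical lock modes: 0 = read, 1 = write, 2 = assertional.
N_MODES = 3
# Flattened conflict matrix (a vector) indexed as held * N_MODES + requested.
# Here writes conflict with everything; assertional locks conflict only with writes.
CONFLICTS = [
    False, True,  False,   # held = read
    True,  True,  True,    # held = write
    False, True,  False,   # held = assertional
]

def conflicts(held_mode, requested_mode):
    # A single index computation, so an assertional-lock check costs about
    # the same as a conventional-lock check.
    return CONFLICTS[held_mode * N_MODES + requested_mode]

assert conflicts(2, 1)        # a held assertional lock blocks a write request
assert not conflicts(2, 0)    # but not a read request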

In addition to the events discussed in Section 3.4, before a step terminates, the ACC stores an end-of-step record, used in crash recovery, in the log. These log records are not forced, since the steps of an uncommitted transaction need not be durable. Furthermore, the transaction saves some of its work area in a database table so that compensation can be initiated if the transaction needs to be rolled back due to a crash. When the transaction commits, the log is flushed and all locks are released. The system uses a single log, and thus when a transaction T commits, flushing the log makes all the preceding steps of other transactions (whose results the steps of T might have read) durable. These actions represent the additional overhead (beyond acquiring and releasing assertional locks) of the ACC and are included in the measured results. Additional details of the implementation of the ACC are contained in [13] and [14].

5.1. The TPC-C Benchmark

The TPC-C Benchmark (designed by the Transaction Processing Performance Council) is a popular benchmark for online transaction processing systems. The benchmark simulates a simple order processing system for a geographical area served by a set of warehouses. The area served by each warehouse is divided into districts. An order is placed by a customer over a terminal connected to a particular warehouse. There are five transaction types, new-order, payment, delivery, stock-level, and order-status, of varying frequency, response time requirements, and data access characteristics.

As an example of the interaction between transactions, consider new-order and payment transactions. Each tuple in the district table describes a district and contains a counter used to number the orders in that district. Orders in a district are required to be consecutively numbered. Each new-order transaction increments the counter and hence must acquire a write lock on the tuple. This tuple also contains the year-to-date total payments for orders placed within the district, and hence new-order and payment transactions relating to the same district conflict, since they all update the tuple. This conflict can have a substantial impact on performance, since these two transaction types together are specified to constitute approximately 86% of the transaction mix. The ACC is capable of recognizing that updates to the counter and the year-to-date payment field do not interfere and hence allows transactions of these two types, within the same district, to interleave.

Serializable isolation is specified for all transaction types except one, which is allowed to run at the READ COMMITTED level. The standard requires that 1% of new-order transactions abort. The experiments discussed here are based upon the transactions and arrival model of revision 3.0 [31]. While the general format of the tested transactions satisfies the requirements of the benchmark, most of the experiments violated the benchmark specification in at least one respect. Additionally, the decomposed transactions do not, in general, satisfy the isolation requirements of the specification.

Each transaction type within the TPC-C benchmark was analyzed and decomposed into steps. The decomposition and interference analysis was similar to that in Section 4, except that the TPC- C analysis was more involved since the number of tables and transactions is greater, and since the TPC-C specifications give twelve consistency constraints that form the basis for I. An interference table for the benchmark application was constructed at design time, based on the decomposition, the postcondition of each transaction type, and I. This table was loaded into the database system before the initiation of transactions.

The decomposition was made using data output by an instrumentation of the database engine for measuring lock contention. Lock contention with and without decomposition was measured.


The instrumentation provided information about the specific statements requesting and holding contention-causing locks, as well as the magnitude of the delays (correcting for multiple blocking and waiting transactions). This information was used, along with a semantic analysis of the application, to decompose the transactions so that the expected benefit of the decomposition was maximized. Details are contained in [13].

Only four of the five transaction types were decomposed (since it was determined that the fifth does not substantially contribute to locking delays). In total, eleven distinct forward step types were defined. In addition, compensating steps were implemented, including one for the new-order transaction as a whole, since the specification forces the aborts of new-orders to occur during the ordering of the final item.

5.2. Experiments

We expected the performance benefits of the ACC to be most obvious when lock contention was high, either due to hotspots or due to long-running transactions. To study these effects, we identified a number of variables whose values could be used to parameterize the experiments. For the experiments reported on here, these include:

Degree of Concurrency - The level of concurrency is limited by the number of terminals connected to a warehouse. The greater the number of terminals per warehouse, the greater the potential for lock contention. Our experiments varied the number of terminals per warehouse from ten (as required by the specification) to forty.

Distribution of Reads and Writes - The amount of lock contention is directly related to the number of transactions and the mode of the locks they acquire. Some experiments replaced the read-only stock-level transaction with a read-write restock transaction. restock acquires write locks on a large number of commonly ordered items (items are selected in a non-uniform manner).

Lock Duration - The time a transaction holds a lock has an effect on lock contention. As locks are held for longer periods, especially when held on hotspots, lock queues tend to grow. Duration was varied by adding simulated compute time before lock release for the stock-level/restock transactions.

5.3. Instrumentation

The actual performance results described in Section 5.4 are based upon timings performed in application code, not in the database engine itself. Some instrumentation was added to the database engine and used in a parallel series of experiments to determine the nature of the lock conflicts. Since this instrumentation reduces the efficiency of the server, instrumentation runs and the runs used for performance comparisons were executed separately.

The instrumentation of the server seeks to determine the statements acquiring and waiting for contention-causing locks. The system records the occurrence of a specific set of significant events (for example, when a lock is granted from the wait queue). The events are recorded with sufficient information to reconstruct all lock delays offline. These delays are then loaded into a database where they can be queried.

5.4. Experimental Results

Response time was used as a measure of the benefit of the ACC. Throughput is also favorably affected by the ACC. Since the benchmark specifies that each virtual terminal spend most of its time either keying-in or thinking, however, a large decrease in response time achieves a much smaller increase in throughput at a fixed number of terminals. Increasing the number of terminals while holding response time below the response time threshold in the benchmark specifications would also illustrate the benefit of the ACC. Unfortunately, our experimental setup proved insufficient to drive the database at the required level of concurrency.


[Figure 3 plots the ratio of non-ACC (strict two-phase locking) to ACC response time against the number of terminals per warehouse (10 to 40), for four configurations: restock and stock_level, each with 0 and 3 seconds of added delay.]

Fig. 3: Ratio NACC/ACC Response Time in Four Configurations

Figure 3 compares the average response time using strict two-phase locking and the ACC. The ordinate is the ratio of the two measures, and values greater than one indicate that the ACC outperforms strict two-phase locking. As can be seen from the figure, the benefit of the ACC is not evident in the stock-level configuration with no sleep time and is most evident in the restock configuration with three seconds of added sleep time. The ordering of the benefit of the ACC is consistent with the measured total contention in the strict two-phase locking system in the various configurations.

The superior performance of the ACC is to be expected in high-contention environments where lock conflicts have a major effect on performance. A puzzling aspect of the results is the magnitude of the difference between the various configurations. The benchmark specification calls for stock-level to constitute approximately four percent of the workload. The difference in mean response time between, for example, the stock-level experiment with no added compute time and that with three seconds of added compute time is greater than the three seconds multiplied by the four percent of the transactions directly affected. This is not surprising, since presumably the stock-level transactions are delaying other transactions, and thus the response times of other transactions are also affected. What is surprising is that the measured magnitude of the delays to transactions waiting for stock-level still accounts for only a small portion of the response time difference.

The mystery is actually explained by the same instrumentation results that exposed it. The majority of the response time differences between the experiments are attributable to increased response times for transactions not directly affected by the stock_level/restock transactions. By taking the database of delay instances from the instrumentation and recursively creating chains of delaying transactions, we find that the stock_level/restock transactions directly delay only a small number of transactions. These directly delayed transactions then delay a large number of other transactions. This results in a larger than expected drop in response time under the ACC. Details of the analysis and the algorithms for reconstructing lock wait information are contained in [13].
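The chain reconstruction can be sketched roughly as below; the delay-record format and the recursion are assumptions made to illustrate the idea, not the algorithm of [13].

from collections import defaultdict

# Hypothetical delay records gathered offline from the instrumentation
# events: (blocking transaction, waiting transaction) pairs.
delays = [("stock_level_1", "payment_7"), ("payment_7", "new_order_3"),
          ("payment_7", "new_order_9"), ("stock_level_1", "delivery_2")]

blocks = defaultdict(set)
for blocker, waiter in delays:
    blocks[blocker].add(waiter)

def delayed_transitively(root, seen=None):
    # All transactions delayed, directly or through a chain, by `root`.
    seen = set() if seen is None else seen
    for waiter in blocks[root] - seen:
        seen.add(waiter)
        delayed_transitively(waiter, seen)
    return seen

# A few directly delayed transactions fan out into many indirect delays.
assert delayed_transitively("stock_level_1") == {
    "payment_7", "delivery_2", "new_order_3", "new_order_9"}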


6. CONCLUSION

Our goal in this paper has been to demonstrate that the performance of a transaction processing system can be improved by decomposing the transactions into atomic, interleavable steps and then scheduling those steps with an assertional concurrency control to achieve semantic correctness. Semantic correctness is a new correctness criterion, weaker than serializability, that guarantees only that each transaction satisfies its specifications. We designed such an ACC and implemented it within the Open Ingres database management system. We then performed experiments using the TPC-C benchmark transactions, which we decomposed into steps according to our theory. The experiments demonstrated significant performance improvements, up to 80%, when lock contention is high, when long running transactions are a part of the transaction suite, and when sufficient system resources are present to support the additional concurrency that the ACC makes possible.

Acknowledgements - This paper is based upon work supported by NSF grant CCR-9402415. The authors would like to express their gratitude to Computer Associates International for the donation of the copy of CA-Open Ingres used in these experiments. An earlier version of this work appeared in [5]. The authors would also like to express their gratitude to Wai-Hong Leung, who made substantial contributions to the design of the ACC and performed many of the experiments reported on in [5].

REFERENCES

[1] D. Agrawal, A. El Abbadi, and A. Singh. Consistency and orderability: semantics-based correctness criteria for databases. ACM Transactions on Database Systems, 18(3):460-486 (1993).
[2] P. Ammann, S. Jajodia, and I. Ray. Using formal methods to reason about semantic-based decomposition of transactions. In Proceedings of the Conference on Very Large Databases, Zürich, Switzerland, pp. 218-227, Morgan Kaufman (1995).
[3] P. Ammann, S. Jajodia, and I. Ray. Applying formal methods to semantic-based decomposition of transactions. ACM Transactions on Database Systems, 22(2):215-254 (1997).
[4] B. Badrinath and K. Ramamritham. Semantics-based concurrency control: beyond commutativity. ACM Transactions on Database Systems, 17(1):163-199 (1992).
[5] A. Bernstein, D. Gerstl, W. Leung, and P. Lewis. Design and performance of an assertional concurrency control. In IEEE International Conference on Data Engineering, Orlando, FL, pp. 436-445, IEEE Computer Society (1998).
[6] A. Bernstein and P. Lewis. High performance transaction systems using transaction semantics. Distributed and Parallel Databases, 4(1):25-47 (1996).
[7] A. Buchmann, M. Ozsu, M. Hornick, D. Georgakopoulos, and F. Manola. A transaction model for active distributed systems. In A. Elmagarmid, editor, Database Transaction Models for Advanced Applications, pp. 123-158, Morgan Kaufman Publishers (1992).
[8] P. Chrysanthis and K. Ramamritham. ACTA: the saga continues. In A. Elmagarmid, editor, Database Transaction Models for Advanced Applications, pp. 349-398, Morgan Kaufman Publishers (1992).
[9] R. Cordon and H. Garcia-Molina. The performance of a concurrency control algorithm that exploits semantic knowledge. In Proceedings of the 5th IEEE International Conference on Distributed Computing Systems, Denver, CO, pp. 350-358, Computer Society Press (1985).
[10] A. Farrag and M. Ozsu. Using semantic knowledge of transactions to increase concurrency. ACM Transactions on Database Systems, 14(4):503-525 (1989).
[11] H. Garcia-Molina. Using semantic knowledge for transaction processing in a distributed database. ACM Transactions on Database Systems, 8(2):186-213 (1983).
[12] H. Garcia-Molina and K. Salem. SAGAS. In Proceedings of the ACM-SIGMOD International Conference on Management of Data, San Francisco, CA, pp. 249-259, ACM Press (1987).
[13] D. Gerstl. Semantic Concurrency Control, Recovery, and Performance Profiling for Improving Response Time in Database Systems. PhD thesis, State University of New York at Stony Brook (1998).
[14] D. Gerstl, A. Bernstein, and P. Lewis. Implementing and using multi-step transactions. Internal Report (1999).
[15] M. Herlihy. Apologizing versus asking permission: optimistic concurrency control for abstract data types. ACM Transactions on Database Systems, 15(1):96-124 (1990).
[16] C. Hoare. An axiomatic basis for computer programming. Communications of the Association for Computing Machinery, 12(10):576-580 (1969).
[17] S. Jajodia, I. Ray, and P. Ammann. Implementing semantic-based decomposition of transactions. In Proceedings of the 9th Conf. on Advanced Information Systems Engineering, Barcelona, Spain, pp. 75-88, Springer-Verlag (1997).
[18] H. Korth, W. Kim, and F. Bancilhon. On long-duration CAD transactions. Information Sciences, 46:73-107 (1988).
[19] H. Korth and G. Speegle. Formal model for correctness without serializability. In Proceedings of the ACM-SIGMOD International Conference on Management of Data, Chicago, IL, pp. 379-386, ACM Press (1988).
[20] E. Kühn, F. Puntigam, and A. Elmagarmid. Multidatabase transaction and query processing in logic. In A. Elmagarmid, editor, Database Transaction Models for Advanced Applications, pp. 297-348, Morgan Kaufman Publishers (1992).
[21] D. Liang and S. Tripathi. Performance analysis of long-lived transaction processing systems with rollbacks and aborts. IEEE Transactions on Knowledge and Data Engineering, 8(5):802-815 (1996).
[22] D. Lomet. MLR: A recovery method for multi-level systems. In Proceedings of the ACM-SIGMOD International Conference on Management of Data, San Diego, CA, pp. 185-194, ACM Press (1992).
[23] N. Lynch. Multilevel atomicity - a new correctness criterion for database concurrency control. ACM Transactions on Database Systems, 8(4):484-502 (1983).
[24] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems, 17(1):94-162 (1992).
[25] M. Nodine, S. Ramaswamy, and S. Zdonik. A cooperative transaction model for design databases. In A. Elmagarmid, editor, Database Transaction Models for Advanced Applications, pp. 349-398, Morgan Kaufman Publishers (1992).
[26] S. Owicki and D. Gries. An axiomatic proof technique for parallel programs I. Acta Informatica, 6:319-340 (1976).
[27] R. Rastogi, S. Mehrotra, Y. Breitbart, H. Korth, and A. Silberschatz. On correctness of non-serializable executions. In ACM Principles of Database Systems, pp. 97-108 (1993).
[28] L. Sha, J. Lehoczky, and E. Jensen. Modular concurrency control and failure recovery. IEEE Transactions on Computers, 37(2):146-169 (1988).
[29] D. Shasha, F. Llirbat, E. Simon, and P. Valduriez. Transaction chopping: algorithms and performance studies. ACM Transactions on Database Systems, 20(3):325-363 (1995).
[30] A. Sheth, M. Rusinkiewicz, and G. Karabatis. Using polytransactions to manage interdependent data. In A. Elmagarmid, editor, Database Transaction Models for Advanced Applications, pp. 555-582, Morgan Kaufman Publishers (1992).
[31] Transaction Processing Performance Council (TPC). TPC Benchmark C, Standard Specification, Revision 3.1 (1996).
[32] H. Wächter and A. Reuter. The ConTract model. In A. Elmagarmid, editor, Database Transaction Models for Advanced Applications, pp. 220-263, Morgan Kaufman Publishers (1992).
[33] W. Weihl. Commutativity-based concurrency control for abstract data types. IEEE Transactions on Computers, 37(12):1488-1505 (1988).
[34] G. Weikum. Principles and realization strategies of multilevel transaction management. ACM Transactions on Database Systems, 16(1):132-180 (1991).

APPENDIX A: PROOF OF THEOREMS

Theorem 1 If in a schedule, S, when each step is executed its precondition is true and the result of a transaction is not invalidated by concurrently executing transactions, then S is a semantically correct schedule.

Proof. Consider a schedule, S, in which the transactions T1, T2, . . . , Tk are executed concurrently under the control of the ACC. Since the ACC guarantees that when each step is initiated its precondition is true, it follows that I is true when S completes. We show that Q_S is true when S completes by using induction on the number of transactions. We do this by demonstrating that Q_S^k is a postcondition of the schedule Sk, where Sk is obtained by deleting from S all transactions Tj with j > k.

For the base case, k = 1: since S1 consists only of the transaction T1, it follows from (1) that Q1 is true in the final state. Assertions in the term db = db_init about the initial state of the database variables modified in T1 are replaced by assertions about the corresponding logical variables; some of these logical variables are used in Q1 to denote the initial values of the corresponding real variables. Conjuncts of db = db_init not involving variables modified by T1 remain part of Q_S^1. We assume that all conjuncts of db = db_init describing the initial state of the variables modified by T1 are interfered with, and hence the third term of (5) is empty. Thus (5) is true in the base case.

The inductive assumption asserts that Q_S^(k-1) is a postcondition of Sk-1. Consider Sk. Qk is a postcondition of Sk, since the ACC guarantees that when each step of Tk starts, its precondition is true. To demonstrate that the second term of (5) is also a postcondition of Sk, we consider two cases.

case 1: Tk starts after Tk-1 terminates - Then, by the induction hypothesis, Q_S^(k-1) describes the database state when Tk starts in Sk. The reasoning used in the base case demonstrates that (5) is true of the final state of Sk.

case 2: Tk starts before Tk-1 terminates - Then the steps of Tk are interleaved with some subset, {Tj, Tj+1, . . . , Tk-1}, of the transactions in Sk-1. By the inductive assumption, Q_S^(k-1) is a postcondition of Sk-1. Consider an arbitrary conjunct, C, of Q_S^(k-1). If the last time C became true in Sk-1 was at the completion of Ti, i < j, then C is true in Sk when Tk is initiated, and C, with the variables modified by Tk replaced by the corresponding logical variables, is true in Sk when Tk completes, since Tk modifies only those variables. If i >= j, then, since the ACC guarantees that Tk does not invalidate the precondition of any step of, or the postcondition of, any transaction in the subset, C is true in Sk when Tk completes. Hence, Q_S^(k-1), so modified, is true in the final state of Sk, and thus the second term of (5) is true when Sk completes.

Since the third term of (5) contains only conjuncts of Q_S^(k-1) that are not interfered with by Tk, they all must be true when Tk completes in Sk. Hence, Q_S^k is the postcondition of Sk. □

Theorem 2 The Simplified One-Level ACC Algorithm guarantees that when each step is initiated its precondition is true and the result of each transaction is not invalidated by concurrently executing transactions.

Proof. That when each step is initiated its precondition is true follows from Lemma 1, and that the result of each transaction is not invalidated by any concurrently executing transaction is implied by Lemma 2. □

Lemma 1 The Simplified One-Level ACC Algorithm guarantees that when each step is initiated its precondition is true.

Proof. (by contradiction) Consider a schedule, Sc, produced by an ACC using the Simplified One-Level ACC Algorithm and an equivalent serial schedule of steps in step-commit order, Sc_serial. Assume that in Sc_serial some step executes from a state not satisfying its precondition. For this to occur, either an initial step of a transaction executes from a state not satisfying its precondition, or an active assertion of some transaction is invalidated by a concurrent step. We examine the first instance in Sc_serial where one of these occurs:

An initial step Si,1 of a transaction Ti executes from a state not satisfying its precondition: Si,1 must request assertional locks on all items in pre(Si,1), and the invalidated assertion must be a conjunct of I, Ip. Name the transaction that invalidated Ip, Tk; its current control point is after a step Sk,l, where Sk,1; . . . ; Sk,l interferes with Ip. Sk,l must modify at least one item mentioned in pre(Si,1). Name the last such item x. There are two cases:

(A) If Sk,l modifies x after Si,1 initiates: Then the ACC will delay Sk,l at least until Ti releases its pre(Si,1) locks, and therefore Si,1 precedes Sk,l in Sc_serial, contradicting our assumption that Tk's control point is after Sk,l.

(B) If Sk,l modifies x before Si,1 initiates: Si,1 will be delayed, since Sk,1; . . . ; Sk,l interferes with Ip. Tk must restore Ip before it commits, and Si,1 will serialize after Tk restores Ip, which is no earlier than the commit of some later step Sk,m for which Sk,1; . . . ; Sk,m does not interfere with Ip. Thus Sk,m precedes Si,1 in Sc_serial, and Ip is true when Si,1 executes, contradicting our assumption that Si,1 executes from a state not satisfying its precondition.


[Figures 4-6 are timing diagrams of the schedules discussed below.]

Fig. 4: y is Modified before Ti is Initiated - Actual Schedule

Fig. 5: y is Modified after Ti is Initiated - Actual Schedule

Fig. 6: Commit Order Serialization of Figure 5

An executing step, Sk,l, invalidates the active assertion of a concurrent transaction: Name the invalidated assertion pre(Si,j). Sk,l must have modified a variable, x, referenced in pre(Si,j), and Sk,l commits between the commit of Si,j-1 and Si,j. Sk,l interferes with pre(Si,j). There are two cases to consider:

(A) If Sk,l modifies x after Si,j-1 commits: Then Ti has a pre(Si,j) lock on x when Sk,l tries to update x, and the ACC will delay Sk,l at least until the commit of Si,j. Therefore Si,j precedes Sk,l in Sc_serial and pre(Si,j) is not an active assertion when Sk,l executes, contradicting our assumption that Sk,l invalidates the active assertion pre(Si,j).

(B) If Sk,l modifies x before Si,j-1 commits: Consider a modification of some item y by Sk,l satisfying the following conditions:

(1) y is held with an assertional lock by Ti after Si,j initiates (y might have been locked earlier, or might have been locked when Si,j initiates).

(2) y is the last item Sk,l modifies satisfying condition (1).

The modification of y by Sk,l must have occurred before Si,j-1 committed (if not, Sk,l would wait for the A(pre(Si,j)) lock on y to be released, and Si,j would precede Sk,l in Sc_serial). There are two cases:

(i) Sk,1 modified y after Ti initiated: Name the step of Ti executing when Sk,1 last modified y: Si,j-p (Figure 5).

Split Sk,l into Si,, and S& where SL,, are the operations of Sk,[ in SC before the commit

of Si,j_p and SE,r are the operations after (Figure 7). SC is equivalent to the serial

[Figures 7-9 are timing diagrams of the equivalent and serialized schedules discussed below.]

Fig. 7: An Equivalent Schedule to Figure 5

Fig. 8: Another Serialization of Figure 5

Fig. 9: Commit Order Serialization of Figure 4

schedule SC’ = . . . ; Si,l; Si,j_+; . . . ; Si,j_l; Sz,l; Si,j (Figure 8). Since the ACC alIowed Sk,l to modify y while Si,j_-p was executing, S~,J (as a whole) does not interfere with

~&j-p). S& d oes not modify any item mentioned in pe(Si,j_p) (since Tk does

not release asseitional locks until commit, and by condition (2)). Thus Sl,, does not interfere with pe(Si,jUp).

In SC’, therefore, pe(Si,j_,) is true when Si,j_-p+l initiates and pre(Si,j) is true when Si,j_l completes. Since Si,l does not access any item in pre(Si,j) (by condition (2)), re(Si,j) is true when Sk,1 completes in SC’ and therefore true when Sk,l completes in SCserial, contradicting our assumption that Sk,1 invalidates pre(Si,j).

(ii) This modification occurred before Ti initiated (Figure 4): Split Sk,l into S'k,l and S''k,l, where S'k,l are the operations of Sk,l in Sc before the initiation of Si,1 and S''k,l are the operations after (Figure 10). Sc is equivalent to the serial schedule Sc' = . . . ; S'k,l; Si,1; . . . ; Si,j-1; S''k,l; Si,j (Figure 11).

The ACC allows Si,1 to get assertional locks on all items in pre(Si,1), and thus the sequence Sk,1; . . . ; Sk,l (as a whole) does not interfere with pre(Si,1). S''k,l does not modify any item mentioned in pre(Si,1) (since Tk does not release assertional locks until commit, and by condition (2)). Thus the sequence Sk,1; . . . ; S'k,l does not interfere with pre(Si,1).

Thus, in Sc', pre(Si,1) is true when Si,1 initiates and pre(Si,j) is true when Si,j-1 completes. Since S''k,l does not modify any item in pre(Si,j) (by condition (2)), pre(Si,j) is true when S''k,l completes in Sc' and is therefore true when Sk,l completes in Sc_serial, contradicting our assumption that Sk,l invalidates pre(Si,j). □


[Figures 10 and 11 are timing diagrams of the schedules used in case (ii).]

Fig. 10: An Equivalent Schedule to Figure 4

Fig. 11: Another Serialization of Figure 4

Lemma 2 The Simplified One-Level ACC Algorithm guarantees that the result of each transaction is not invalidated by concurrently executing transactions.

Proof. (by contradiction) Assume that Sk,l invalidates Qi sometime after the commit of Ti in the equivalent serial schedule of steps, Sc_serial. Since the A(Qi) locks are held until all concurrent transactions have completed, and since Tk is a concurrent transaction, the invalidation must occur before Ti releases its A(Qi) locks. Consider the addition of an imaginary step of Ti, Si,n+1, that does nothing and occurs at the point Ti releases its A(Qi) locks. The A(Qi) locks provide the same protection that a regular precondition lock does, and thus, by Lemma 1, Sk,l cannot invalidate Qi and serialize between the completion of Si,n and [the imaginary] Si,n+1. Thus, in Sc_serial, Qi is not invalidated after the commit of Ti by a concurrent transaction, contradicting our assumption. □