
Volume 14, Number 5 INFORMATION PROCESSING LETTERS 23 July 1982

A MODEL OF CONCURRENCY CONTROL IN DISTRIBUTED DATABASE SYSTEMS

P.G. REDDY, Subhash BHALLA and B.E. PRASAD
Computer Centre, Indian Institute of Technology, Delhi, New Delhi 110016, India

Received 11 May 1981; revised version received 27 April 1982

Keywords: Concurrency control, transaction processing, database, stochastic process

1. Introduction

Concurrency control has been an active research and development field for the past several years. The set of update algorithms proposed so far for distributed databases can be broadly divided into two classes: the centralized and the distributed control algorithms. A representative centralized control algorithm has been chosen for the purpose of study here. A model of such a scheme, based on a stochastic process, has been developed, and a parallel between the initial phase of seizing locks and the pure birth process has been drawn. Though the exact formula obtained is not of any immediate consequence, it has given us an insight into the behaviour of the update algorithm. With the help of the model, we are able to identify key parameters which help us in improving the efficiency of the concurrency control scheme.

2. The algorithm

For expository convenience, a simple set-up has been chosen. The system is assumed to consist of multiple nodes, and each node is assumed to have a copy of the database. The two-phase locking algorithm, with preordering as a means of deadlock prevention, has been adopted for the study here [2] (see also [4], CLAA algorithm). In the two-phase centralized control algorithm, one special node (the central node) is assigned the concurrency control function. An update transaction T that arrives at node X is processed as follows:

Step 1. Node X sends a request to the central node to obtain locks for all the items referenced by transaction T.

Step 2. The central node checks all the requested locks. In case all the locks can be granted, a grant message is sent back to node X. If some items are already locked, then the request is queued. There is a queue for each item and a request waits in one queue at a time. To prevent deadlocks, all transactions request locks for their items in the same predefined order.

Step 3. Once node X gets the grant message, it can proceed with the transaction.

Step 4. The locks held by transaction T at the central node are released only after T has been able to communicate its update values to all the nodes in the set-up.

Let us consider a set of objects (items) at the central node. Let their names be I1, I2, I3, ..., IN. Let the transactions which arrive for processing be given as in Table 1.

Table 1

Arrival  Transaction  Items requested by the transaction
1st      T1           I4, I6, I11
2nd      T2           I3, I7, I8, I10, I11
3rd      T3           I12, I14
4th      T4           I8, I15
5th      T5           I13, I14
6th      T6           I13, I17

T1 will find all the locks free. It gets a grant message as soon as it arrives at the central node (see Table 2: locks obtained by the transactions are denoted by a '*'; in case the transaction is waiting in a queue to obtain an item, this is denoted by a '0'). T2 will obtain locks for items I3, I7, I8 and I10, but it will find the item I11 occupied; hence, after obtaining locks for I3, I7, I8 and I10, it will wait in the queue for item I11. T3 gets locks for I12 and I14, therefore the requesting node for T3 is sent a grant message. Such transactions are encircled in Table 2 (similarly for transactions T4, T5 and T6, see Table 2).

0020-0190/82/0000-0000/$02.75 © 1982 North-Holland
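The walk-through above can be reproduced with a short simulation. The following is a minimal sketch in Python, not the authors' implementation: the function name `grant_locks`, the data layout and the use of the integer k for item Ik are assumptions made for illustration. It covers only the initial lock-seizing phase (no transaction releases a lock), with each transaction requesting its items in the predefined serial order and queueing at the first item it finds locked.

```python
# Table 1: transactions in arrival order with the items they request
# (the integer k stands for item Ik; this encoding is an assumption).
TABLE_1 = [
    ("T1", [4, 6, 11]),
    ("T2", [3, 7, 8, 10, 11]),
    ("T3", [12, 14]),
    ("T4", [8, 15]),
    ("T5", [13, 14]),
    ("T6", [13, 17]),
]


def grant_locks(transactions, item_order):
    """Seize locks at the central node, one transaction at a time.

    Returns (held, waiting): held[t] lists the items locked by t, and
    waiting[t] is the item whose queue t sits in (None if t does not wait).
    """
    rank = {item: pos for pos, item in enumerate(item_order)}
    owner = {}  # item -> transaction holding its lock
    held, waiting = {}, {}
    for name, items in transactions:
        held[name], waiting[name] = [], None
        # Preordering: every transaction asks for its items in the same order.
        for item in sorted(items, key=rank.__getitem__):
            if item in owner:
                # Item is busy: wait in its queue and request nothing further.
                waiting[name] = item
                break
            owner[item] = name
            held[name].append(item)
    return held, waiting


held, waiting = grant_locks(TABLE_1, item_order=range(1, 18))
no_wait = [t for t, item in waiting.items() if item is None]
for name, _ in TABLE_1:
    print(name, "holds", held[name], "waits for", waiting[name])
print("no-wait group:", no_wait)
```

Under these assumptions the sketch places T1 and T3 in the no-wait group, with T2 queued for I11, T4 for I8, T5 for I14 and T6 for I13, in agreement with the description above.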

We proceed to form the mathematical model of this process in the next section.

3. Model of transaction behaviour

Our main objective is to identify the set of key parameters which can help us in increasing the efficiency of the concurrency control algorithm. In order to increase the efficiency of the concurrency control technique, our specific objectives for the given algorithm are

(1) to increase the number of those transactions which do not need to wait in a queue for obtaining a lock (transactions of no-wait group), and

(2) to reduce the average waiting time that the remaining transactions (wait group) need to spend for obtaining locks held by other transactions.

For the purpose of making a stochastic process model, we consider the duration of time at the central node from when the system just starts receiving the transactions and processing them up to the time when any one of the transactions returns locks held by it. We are now able to relate the process that we have described to the pure birth process [3]. When exactly n locks have been granted, i.e., n items from the database are held by transactions which have so far arrived at the system, we say that the system is in state n. A stochastic process is a time-homogeneous birth process if the probability of a transition from state n to state n + 1 is given by

$$P(n \to n+1 \text{ in } (t, t+\Delta t)) = \lambda_n \Delta t, \quad \lambda_n \ge 0,$$

and if no other type of transition (e.g., $n \to n-1$) is possible. It is obvious that n can only increase, and the identification of the two processes will be complete when we have expressed $\lambda_n$ in terms of the parameters of our model.

Table 2
Transaction processing map ('*' = lock obtained, '0' = waiting in the queue for the item; encircled transactions are shown in parentheses)

Items   I1  I2  I3  I4  I5  I6  I7  I8  I9  I10 I11 I12 I13 I14 I15 I16 I17
(T1)                *       *                   *
T2              *               *   *       *   0
(T3)                                                *       *
T4                                  0
T5                                                      *   0
T6                                                      0

Some underlying assumptions which we need to express are the following:

(1) The duration of time under consideration for the stochastic model is the time before any transaction completes its processing and returns its locks, since the start of the process.

(2) There is a certain fixed probability of a transaction arriving at the system and requiring at least one item.

(3) The time interval $\Delta t$ is sufficiently small, such that at most one item is picked up during any such interval (for the purpose of granting locks).

(4) N, the total number of objects (items) which constitute the database, is very large.

3.1. Derivation of parameters

$\lambda_n$ denotes the rate at which the system may change its state from n to n + 1. We are interested in studying the two groups of transactions (no-wait group and wait group). We define

$$\lambda_n = \lambda_v + \lambda_w, \quad n = 0, 1, \ldots, N-1,$$

where $\lambda_v$ is the component of $\lambda_n$ contributed when the transaction in question belongs to the no-wait group, and $\lambda_w$ is the other component of $\lambda_n$, contributed when the transaction belongs to the wait group. The expressions for $\lambda_v$ and $\lambda_w$ (see Appendix A) are:

$$\lambda_v = (N-n)\alpha, \quad n = 0, 1, \ldots, N-1,$$

and

$$\lambda_w = n(N-n)\beta, \quad n = 0, 1, \ldots, N-1.$$

Let X(t) be the number of items which are in the busy state at time t, and let

$$P_n(t) = P[X(t) = n \mid X(0) = 0], \quad n = 0, 1, \ldots, N-1.$$

$P_n(t)$ is the probability that, at time t, there are n items in the resource list that have been allotted to various transactions. Under the Markov assumptions (see Appendix A) that have been made, X(t) is a pure birth process with parameters $\lambda_n$ (n = 0, 1, ..., N). Hence for such a case [3]

$$P_n(t) = \sum_{j=0}^{n} A_j^{(n)} \exp(-\lambda_j t),$$

where

$$A_j^{(n)} = \frac{\lambda_0 \lambda_1 \cdots \lambda_{n-1}}{\prod_{i=0,\, i \ne j}^{n} (\lambda_i - \lambda_j)}, \quad n = 0, 1, \ldots, N-1,$$

also $\lambda_i \ne \lambda_j$ for all $i \ne j$ [since $\lambda_n = (N-n)(\alpha + n\beta)$].

In order to keep the mathematical treatment of the model simple, we will not consider finding out the components of $P_n(t)$ contributed by transactions of either the no-wait group or the wait group (see [1] for details).
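The closed-form expression for $P_n(t)$ can be evaluated numerically. The sketch below is an illustration only: the function names and the values of N, $\alpha$, $\beta$ and t are assumptions, not taken from the paper. Evaluating the formula at n = N with $\lambda_N = 0$ is also an extension made here, so that the probabilities can be checked to sum to one. The formula needs pairwise distinct rates; since $\lambda_i = \lambda_j$ for $i \ne j$ would require $i + j = N - \alpha/\beta$, the example uses a non-integer ratio $\alpha/\beta$.

```python
import math


def birth_rates(N, alpha, beta):
    """Rates lambda_n = (N - n)(alpha + n*beta) for n = 0, ..., N."""
    return [(N - n) * (alpha + n * beta) for n in range(N + 1)]


def state_probability(n, t, rates):
    """P_n(t) from the closed form; needs pairwise distinct rates[0..n]."""
    lead = math.prod(rates[:n])  # lambda_0 * ... * lambda_{n-1}; 1 when n == 0
    total = 0.0
    for j in range(n + 1):
        denom = math.prod(rates[i] - rates[j] for i in range(n + 1) if i != j)
        total += math.exp(-rates[j] * t) / denom
    return lead * total


# Illustrative values only (not from the paper); alpha/beta = 2.5 is not an
# integer, so no two rates coincide.
N, alpha, beta, t = 5, 0.5, 0.2, 0.4
rates = birth_rates(N, alpha, beta)
probs = [state_probability(n, t, rates) for n in range(N + 1)]
print([round(p, 4) for p in probs], "sum =", round(sum(probs), 6))
```

For larger N the differences $\lambda_i - \lambda_j$ in the denominators can become small and the alternating sum loses precision, so a numerical solution of the birth equations may be preferable to the closed form in that regime.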

4. Discussion

The expression for $P_n(t)$ is mainly dependent on $\lambda_n$, i.e., if a total of m transactions is assumed to have reached the central node at any instant of time and the total number of database items desired by them is M, then n will tend to approach M at a faster rate if each of the $\lambda_n$ (n = 0, 1, ..., N - 1) has a higher value. Apparently, increasing $\lambda_n$ is beyond our control and it does not fall in line with our objectives. We can only manipulate the value of one component of $\lambda_n$ at the cost of the other. Looking at the expression for $\lambda_n$ again,

$$\lambda_n = \lambda_v + \lambda_w, \quad n = 0, 1, \ldots, N-1.$$

We would like to increase the number of those transactions which belong to the no-wait group and want to reduce the number of transactions falling in the wait group. In other words, one needs to increase the contribution to $\lambda_n$ from the transactions in the no-wait group ($\lambda_v$) and decrease the contribution to $\lambda_n$ from the transactions in the wait group ($\lambda_w$). We define the term performance factor (PF), for which we need to obtain a high value in order to improve the efficiency of the concurrency control algorithm. Hence

$$\mathrm{PF} = \frac{\lambda_v}{\lambda_w} = \frac{(N-n)\alpha}{n(N-n)\beta} = \frac{\alpha}{n\beta}.$$

Hence, the two parameters which have a bearing on the performance of the concurrency control algorithm are $\alpha/\beta$ (to be called the coefficient of interference, CI) and n.
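As a small numerical illustration of how PF depends on CI and n, the sketch below computes the ratio of the two rate components directly and compares it with $\alpha/(n\beta)$. The function name and all parameter values are assumptions chosen for illustration; PF is evaluated only for $1 \le n \le N-1$, because the wait-group component is zero at n = 0.

```python
def performance_factor(N, n, alpha, beta):
    """PF = lambda_v / lambda_w for 1 <= n <= N - 1."""
    if not 1 <= n <= N - 1:
        raise ValueError("PF is evaluated here only for 1 <= n <= N - 1")
    lam_v = (N - n) * alpha       # no-wait group component
    lam_w = n * (N - n) * beta    # wait group component
    return lam_v / lam_w


# Illustrative values only (not from the paper).
N, alpha, beta = 100, 0.5, 0.2
for n in (1, 5, 10, 50):
    # The second column is CI / n; N cancels out of the ratio.
    print(n, performance_factor(N, n, alpha, beta), alpha / (n * beta))
```

Doubling n halves PF for a fixed CI, while doubling CI doubles PF for a fixed n.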

5. Application of the model

We consider various ways of obtaining a higher performance factor, firstly by varying CI and then by varying n.

(1) To improve the value of CI, we need to introduce a bias in the normal situation such that the value of $\alpha$ increases (maybe at the cost of the value of $\beta$).

Consider that, while designing the system, we experimentally determine the most frequently occurring transactions and make a list of items which these may desire. We then modify the serial order list in which locks are granted. The items desired by the most frequently occurring transactions are grouped together. This grouping may be put as close as possible to the beginning of the serial order list. Hence, in the example considered in Table 2, let T1 and T3 be the most frequently occurring transactions. The items desired by them are grouped together as shown in Table 3.

The total number of transactions which have been granted all the items desired by them has increased from 2 to 4. It is evident from the situation that the introduction of a convenient bias in the serial order at the time of creation of the database will result in an improvement in the efficiency of the algorithm.
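The increase from 2 to 4 can be checked with a small simulation of the lock-seizing phase. This is a sketch under stated assumptions rather than the authors' procedure: the function name is hypothetical, the integer k stands for item Ik, and `REORDERED` is the serial order read from the item header of Table 3. Each transaction requests its items in serial order and stops at the first busy item, keeping the locks it has already obtained.

```python
def count_no_wait(transactions, item_order):
    """Number of transactions granted every requested lock while seizing."""
    rank = {item: pos for pos, item in enumerate(item_order)}
    locked = set()
    granted_all = 0
    for items in transactions.values():  # dict keeps arrival order
        for item in sorted(items, key=rank.__getitem__):
            if item in locked:
                break  # wait at the first busy item; earlier locks stay held
            locked.add(item)
        else:
            granted_all += 1  # no busy item met: transaction is in the no-wait group
    return granted_all


# Transactions of Table 1, in arrival order.
TABLE_1 = {
    "T1": [4, 6, 11],
    "T2": [3, 7, 8, 10, 11],
    "T3": [12, 14],
    "T4": [8, 15],
    "T5": [13, 14],
    "T6": [13, 17],
}
ORIGINAL_ORDER = list(range(1, 18))
# Serial order as read from the Table 3 header (an assumption of this sketch).
REORDERED = [1, 2, 3, 4, 6, 11, 5, 7, 8, 9, 10, 12, 14, 13, 15, 16, 17]

before = count_no_wait(TABLE_1, ORIGINAL_ORDER)
after = count_no_wait(TABLE_1, REORDERED)
print("no-wait transactions:", before, "->", after)
```

In this sketch the gain comes from T2 and T5 meeting their busy item earlier in the new order, so they no longer hold I8 and I13 while waiting, which frees those items for T4 and T6.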

(2) The other factor on which the performance factor depends is n. PF varies inversely as n, where n is the number of locks held by the transactions which have arrived at the central node so far. This corresponds to the well-known fact that locks at the central node need to be released at the fastest rate possible. If the total time for which a transaction holds an item locked is termed 'dead-time', we need to reduce the dead-time to a bare minimum in order to keep n low. The later algorithms considered in [4] have concentrated on reducing the dead-time (e.g., CLA, etc.). A very obvious way to reduce dead-time would be to rely on the machine for faster processing, to the maximum extent possible.

There are other schemes in which transactions, instead of obtaining locks and then going for execution, compute the update values first and then check if the values that they have used are up-to-date. In such a scheme the dead-time will be smaller than in the previous case. One such scheme (though for distributed control) has been proposed in [5], which has been shown to perform better than the existing centralized control as well as distributed control algorithms.

Table 3
Transaction processing map with reordered entries

Items   I1  I2  I3  I4  I6  I11 I5  I7  I8  I9  I10 I12 I14 I13 I15 I16 I17
(T1)                *   *   *

6. Conclusion

A representative centralized concurrency control algorithm has been studied in this work. A model of such a scheme, based on a stochastic process, has been developed. The model has helped us in identifying two key parameters which enhance our knowledge of transaction behaviour under the considered scheme of concurrency control. It has also given credence to some of the intuitive results that existed in our minds. Another way in which the model can help is in comparing various algorithms (which fall in the class of algorithms considered above) in the light of the parameters identified in the model.

Appendix A

It is assumed that the database consists of N objects (items). For every change of state ($n \to n+1$) there is an unassigned item being picked up by the system from the database. Let the probability of an event occurring in the group of items during a time interval $(t, t+\Delta t)$ be given as follows:

$$P\{\text{an item of the database will be picked up by a transaction}\} = \gamma \Delta t + o(\Delta t),$$

where $o(\Delta t)$ denotes terms of smaller order than $\Delta t$, i.e., $o(\Delta t)/\Delta t \to 0$ as $\Delta t \to 0$ (see [3] for terms in use), and $\gamma$ is a constant. For the purpose of clarity, and also because we are interested in studying transaction behaviour in terms of the two groups of transactions, let

$$P\{\text{an item of the database will be picked up by a transaction of the no-wait group}\} = \alpha \Delta t + o(\Delta t)$$

and

$$P\{\text{an item of the database will be picked up by a transaction of the wait group}\} = \beta \Delta t + o(\Delta t),$$

where

$$\alpha = \gamma P(v), \quad \beta = \gamma P(w),$$

and where P(v) and P(w) are the probabilities that the transaction picking up an item from the data- base belongs to the no-wait group and the wait group, respectively.

Other Markov assumptions that we need to make concern the following probabilities:

P{more than one item will be picked up by a transaction during $(t, t+\Delta t)$}

P{any of the n items of the database will be picked up by a transaction during $(t, t+\Delta t)$}

As there are $N - n$ items which remain to be picked up, the total contribution to $\lambda_n$ from transactions of the no-wait group is $(N-n)\alpha$, or

$$\lambda_v = (N-n)\alpha.$$

The contribution from transactions of the wait group is obtained as follows. Of the possible pairs which could be formed, there are $n(N-n)$ which consist of one item in the locked list and the other item in the free list. These are the only pairs which can give rise to the transition $n \to n+1$, and the total probability associated with them is $n(N-n)\beta$, or $\lambda_w = n(N-n)\beta$. Hence,

$$\lambda_n = \lambda_v + \lambda_w = (N-n)\alpha + n(N-n)\beta = (N-n)(\alpha + \beta n), \quad n = 0, 1, \ldots, N-1.$$

The probability that during $(t, t+\Delta t)$ the number of items which have been picked up will remain at n is given by $1 - \lambda_n \Delta t + o(\Delta t)$, and the probability that the number will increase by more than one is $o(\Delta t)$.
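The rates derived above are enough to simulate one sample path of the lock-seizing phase. The sketch below is an illustration under assumptions that go beyond the text: the holding time in state n is drawn as an exponential variate with rate $\lambda_n$ (the standard construction for a birth process), and each transition is attributed to the no-wait group with probability $\lambda_v/\lambda_n$, which is one natural reading of the two components. The function name, seed and parameter values are hypothetical.

```python
import random


def simulate_lock_seizing(N, alpha, beta, rng):
    """One sample path from state 0 to state N.

    Returns a list of (time, new_state, group) tuples, one per transition.
    """
    path, t = [], 0.0
    for n in range(N):
        lam_v = (N - n) * alpha       # no-wait group component
        lam_w = n * (N - n) * beta    # wait group component
        lam_n = lam_v + lam_w
        t += rng.expovariate(lam_n)   # holding time in state n
        # Attribute the transition to a group in proportion to its rate.
        group = "no-wait" if rng.random() < lam_v / lam_n else "wait"
        path.append((t, n + 1, group))
    return path


# Illustrative values only (not from the paper).
rng = random.Random(1982)
path = simulate_lock_seizing(N=10, alpha=0.5, beta=0.2, rng=rng)
for time, state, group in path:
    print(f"t={time:.3f}  n={state}  ({group})")
```

Because $\lambda_w = 0$ at n = 0, the first transition is always attributed to the no-wait group; as n grows, the wait-group share $n\beta/(\alpha + n\beta)$ increases, in line with PF varying inversely as n.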

Acknowledgment

The authors would like to thank the referee for his/her helpful comments.


References

[1] D.J. Bartholomew, Stochastic Models for Social Processes (Wiley, New York, 1967) Chapter 8.

[2] P.A. Bernstein and N. Goodman, Fundamental algorithms for concurrency control in distributed database systems, Harvard University, TR, 1980.

[3] U.N. Bhat, Elements of Applied Stochastic Processes (Wiley, New York, 1971) Chapter 14.6.

[4] H. Garcia-Molina, Performance of update algorithms for replicated data in a distributed database, Ph.D. Thesis, Dept. of Computer Science, Stanford University, 1979.

[5] G. Gardarin and W.W. Chu, A reliable distributed control algorithm for updating replicated databases, 6th Data Communications Symposium, 1979.