
Performance of Multiversion and Distributed Two-Phase Locking Concurrency Control Mechanisms in Distributed Databases

ALBERT BURGER, VIJAY KUMAR, and MARY LOU HINES

Computer Science Telecommunications, 5100 Rockhill, University of Missouri-Kansas City, Kansas City, Missouri 64110

ABSTRACT

In this paper, we present a detailed simulation study of a distributed multiversion and a distributed two-phase locking concurrency control mechanism (CCM). Our experiment concentrated on measuring the effect of message overhead, read:write ratios, data partitioning, and partial replication on the performance of these mechanisms. The effect of these parameters has not been investigated in any previous work. We simulated a blind-write model for two reasons: (a) all other works studied the behavior of multiversion CCMs under a read-before-write model and observed a similar result, and (b) the performance of any multiversion CCM has not been studied under a blind-write model. A blind-write model is not unrealistic, and intuitively the multiversion approach should provide much better performance. We observed that multiversion outperforms wound-wait (WW) in both partitioned and partially replicated databases. Multiversion (MV) handles read-only and write-only transactions efficiently, and after a certain write percentage the throughput improves with this percentage. The message overhead progressively becomes less significant as the MPL (multiprogramming level) increases, indicating that in a heavily loaded system the throughput is least sensitive to message cost. We found that in the partially replicated case, 50% write does not show the lowest performance, as observed in the partitioned case. ©Elsevier Science Inc. 1997

1. INTRODUCTION

The behavior of database concurrency control mechanisms (CCMs) has been an area of extensive research. Most of the work, however, is confined to the performance study of CCMs for centralized database systems. The picture is somewhat different for distributed database CCMs, where comparatively very few performance studies have been done. There is an increasing interest in the performance of distributed database systems, in particular the use of multiple versions of data for improving concurrency [1-6, 15, 16]. The basic idea projected in these papers is to support more than one version of a data item and to allow a transaction to access the correct version. Informally, a correct version of a data item is the version which a transaction would have used had it run serially. For example, suppose transaction T1 precedes transaction T2 in execution order. If T2 created a version x_{i+1} (i = 1, 2, ..., n) of a data item x, and if T1 wants to access x, then the correct version in the serialization order would be x_i. The availability of x_i indicates that the transaction which created x_i committed successfully. In a single-version approach, T1 would have either been rolled back or blocked, depending upon the underlying CCM. The multiversion approach has a number of incarnations, each proposing a different method for using a version. For example, the scheme proposed in [1, 6] maintains only one version of a data item, whereas in [2-5] a number of versions are maintained. The algorithms that support multiple versions of a data item guarantee that read-only transactions will not be rolled back and a write-write conflict will not arise. This means that read-only transactions (long or short) will read the correct version of the desired data item while new versions are concurrently created by update transactions.

The potential of multiversion schemes for improving useful concurrency has led to a number of theoretical studies [3, 7] where the serializability theory has been extended to establish their correctness. These studies have motivated the database community, only recently, to study the performance of multiversion CCMs. As far as we know, there are only three papers [4, 6, 11] which deal with this topic. We briefly review these works and justify the need for further investigation.

In Lin and Nolte [6], a simplified simulation model is used to compare the performance of basic timestamp, multiversion timestamp, and two-phase locking algorithms. They did not include different data distributions (partitioned, replicated, etc.), and simplified communication delay by combining CPU processing time, communication delays, and I/O processing time for each transaction. They found that their multiversion CCM performs only marginally better than the two-phase locking algorithm, and when the average transaction size is large, the two-phase locking outperformed the multiversion.

In Carey and Muhanna [4], the performance of several multiversion CCMs is described for centralized database systems. They report that multiversion schemes offer significant performance advantages over single-version algorithms. This was especially true for read-only transactions, where the majority of read requests were satisfied in a single disk access. The results of this work, however, may not be used to analyze the performance of multiversion CCMs for distributed systems.

Son and Haghighi [11] compare the performance of a multiversion CCM with its single-version counterpart using a simulation model. They have used only data partitioning; the communication aspects have not been investigated in detail. They found that under multiversion, read requests are processed much more efficiently, but if the number of read-only transactions is small then multiversion does not offer any significant performance improvement. The results also support the findings of [6] that the multiversion CCM performs only marginally better in some specific cases.

These works provide useful information about the behavior and capability of multiversion schemes under a number of environments; however, they have some limitations. They did not investigate in detail the effects of data distribution, message overhead, and transaction size and type on performance. Recognizing the importance of previous works and their limitations, we decided to study the behavior of a multiversion scheme in a distributed environment from a different perspective. We studied the effect of a number of useful parameters, in particular message overhead, data distribution (partitioning and replication), transaction type (read:write ratio), and transaction size, on the performance of a multiversion scheme and a distributed two-phase locking scheme called wound-wait [8].

One of the significant differences between our work and the works reviewed above is that we have simulated a write as a "blind-write" (a read is not performed before the data item is written). There are two reasons for using this model: (a) all other works have studied the behavior of multiversion CCMs under a read-before-write model and have observed a similar result, and (b) the performance of any multiversion CCM has not been studied under a blind-write model. A blind-write model is not unrealistic, and it occurs in real-life information processing. For example, recording new telephone numbers, opening new accounts, changing addresses, etc., are some of the typical cases.

We did not simulate buffer management and recovery issues. To our knowledge, no other works have included these parameters. We believe, as do others, that including these parameters does not change the behavior significantly. The performance may show some degradation, but the behavior is unlikely to change. Furthermore, for a comparative study, it is useful to keep the simulation model similar to earlier works to identify the effect of the specific parameters studied; in our case, message overhead and blind-write fall into this category.


The rest of the paper is organized as follows. In Section 2, we describe the distributed multiversion and distributed two-phase locking schemes, and the distribution of data we have simulated. Section 3 describes our simulation model, and Section 4 discusses the results. Section 5 concludes the paper with future directions.

2. DISTRIBUTED MULTIVERSION TIMESTAMP ORDERING AND DATA DISTRIBUTION

There are several incarnations of the multiversion (MV) approach based on timestamp ordering. MV is a pessimistic CCM with distributed control [2]. We describe the mechanism which we have simulated. Timestamps (TS) are assigned to transactions when they enter the system. Each version of a data item is associated with a write TS (WTS) and the TS of the last executed read (RTS). A transaction which requests a data item is called the "requester". Read and write operations on a data item are performed as follows.

Processing a read. A transaction reads only a committed version (one created by a committed transaction). If the transaction which created the requested version is still active, then the reader transaction waits until the creator has committed or aborted. If the creator is aborted, then the read request is scheduled for another version. Thus a read operation may be delayed but is never rejected, and can be described as follows:

find the version with the largest WTS < the requester's TS;
if the transaction which created this version has committed
then begin
    grant read; create new RTS for the version
end
else block the read until the creator transaction of this version has
    committed or aborted;

Processing a write. A write creates a new version of the data item. A write is rejected, i.e., the requester is rolled back, if the request is made too late. This happens if there exists a read timestamp for that version greater than the timestamp of the requester. A write operation can be described as follows:

obtain the version which has the largest WTS < the requester's TS;
if there exists an RTS for that version > the TS of the requester then
    reject write and roll-back the transaction; {the write came too late}
else begin
    create a new version; {TS of new version is TS of the requester transaction}
    mark transaction as not committed; grant write
end;

Fig. 1. Versions of data item X at nodes N1, N2, and N3: each node holds versions with timestamps 2 and 6, and the version XN2 with timestamp 2 carries a read timestamp of 4. A new transaction T1 with timestamp 3 arrives at N1.

Figure 1 illustrates the working of the algorithm. Let there be three nodes N1, N2, and N3 in the system, and let data item X be replicated at N1 as XN1, at N2 as XN2, and at N3 as XN3. A transaction T1 at node N1 wants to update X. Let there exist two versions of X: one with timestamp 6 and the other with timestamp 2. Thus there exists a total of six physical copies of X, identified as XN1(TS 2), XN2(TS 2), XN3(TS 2), and XN1(TS 6), XN2(TS 6), XN3(TS 6). Let T1's timestamp be 3. Assume further that for XN2(TS 2) a read timestamp of 4 is recorded. When T1 requests a write on X, the request is translated as write requests on XN1, XN2, and XN3. At nodes N1 and N3 the request is granted, but at node N2 the write request is rejected, since the read timestamp of XN2(TS 2) is 4 > 3; therefore, T1 is rolled back. T1 restarts with a timestamp of, say, 10. Again a write on X is translated as writes on XN1, XN2, and XN3. This time, however, all requests can be granted, since no conflicting read timestamps exist at any node. The consequence of T1's write is a new version of X on all three nodes: XN1(TS 10), XN2(TS 10), XN3(TS 10).
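To make the two rules concrete, the following is a minimal Python sketch of version selection for a single data item at a single node. It is our illustration rather than the authors' simulator code: the Version and MVItem names are assumptions, blocking is signalled with an exception instead of a queue, and distribution and two-phase commit are omitted.

from dataclasses import dataclass
from typing import List

@dataclass
class Version:
    wts: int                # write timestamp of the creating transaction
    rts: int = 0            # largest timestamp of any reader so far
    committed: bool = True  # has the creating transaction committed?

class MVItem:
    """Multiversion timestamp-ordering rules for one data item at one node."""

    def __init__(self) -> None:
        self.versions: List[Version] = [Version(wts=0)]  # initial committed version

    def _latest_before(self, ts: int) -> Version:
        # The version with the largest WTS smaller than the requester's TS.
        return max((v for v in self.versions if v.wts < ts),
                   key=lambda v: v.wts)

    def read(self, ts: int) -> Version:
        v = self._latest_before(ts)
        if not v.committed:
            # In the paper the reader blocks until the creator commits or
            # aborts; a real scheduler would enqueue it instead of raising.
            raise RuntimeError("block: creator not yet committed")
        v.rts = max(v.rts, ts)  # record the read; reads are never rejected
        return v

    def write(self, ts: int) -> bool:
        v = self._latest_before(ts)
        if v.rts > ts:
            return False        # the write came too late: roll back requester
        self.versions.append(Version(wts=ts, committed=False))
        return True             # new version carries the requester's TS

# Replaying the Figure 1 scenario at node N2: versions with WTS 2 and 6,
# and a read timestamp of 4 recorded on the WTS-2 version.
xn2 = MVItem()
xn2.versions = [Version(wts=2, rts=4), Version(wts=6)]
assert xn2.write(ts=3) is False   # RTS 4 > 3: rejected, T1 is rolled back
assert xn2.write(ts=10) is True   # restarted with TS 10: granted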

2.1. VERSION MANAGEMENT

In MV, the number of versions of a data item always increases. It is not possible to retain all versions due to storage limitations and processing overheads. It is, however, not necessary to retain all the versions, and a subset of them can be removed dynamically. We assume a virtual memory scenario where a required version may be in the memory or on the disk. The directory that holds information about versions is memory-resident. Our method of deleting a redundant version is similar to [11].

Recall that a read operation reads a version which was created by a committed transaction. At a successful read, the version acquires the read timestamp of the transaction. This means that a version which has a write timestamp smaller than the timestamp of the oldest committed transaction is redundant, because no transaction will ever need to access this version. This mechanism requires that a read transaction notify the system of its commitment. For this reason, the system maintains information about the oldest transaction in a memory-resident directory. Let there be two versions of data item x, identified as x(WTS=0, RTS=0) and x(WTS=3, RTS=4), and let the timestamp of the oldest committed transaction (which read x) as recorded by the system be 4. In this scenario, no transaction will ever exist in the system with a timestamp smaller than 4; therefore, no transaction will ever require x(WTS=0, RTS=0), and it can be deleted. The version deletion algorithm can be described as follows:

find the version with the smallest WTS, say V;
if the TS of the oldest committed transaction > the WTS of V,
then delete V.
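One reading of this rule that is consistent with the example above is to keep the newest version whose WTS lies below the oldest committed reader's timestamp (that version may still be read) and delete everything older. The sketch below follows that interpretation, which is ours; the tuple representation of a version is also our assumption.

from typing import List, Tuple

def prune_versions(versions: List[Tuple[int, int]],
                   oldest_committed_ts: int) -> List[Tuple[int, int]]:
    # Each version is a (WTS, RTS) pair for one data item, in any order.
    versions = sorted(versions)   # ascending by WTS
    # The newest version with WTS below the oldest committed timestamp may
    # still be read; every strictly older version is redundant.
    needed = max((w for w, _ in versions if w < oldest_committed_ts),
                 default=None)
    return [(w, r) for (w, r) in versions if needed is None or w >= needed]

# The example from the text: x(WTS=0, RTS=0) and x(WTS=3, RTS=4) with the
# oldest committed reader at timestamp 4 -> the WTS=0 version is deleted.
assert prune_versions([(0, 0), (3, 4)], oldest_committed_ts=4) == [(3, 4)]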

2.2. WOUND-WAIT ALGORITHM

The wound-wait (WW) algorithm was proposed by Rosenkrantz et al. [8]. The algorithm uses a two-phase locking policy and timestamps to resolve conflicts between transactions. It works as follows: if a requester transaction (the transaction making a request for a lock) started earlier than a holder transaction (the transaction holding the entity requested by the requester), then wound (roll back) the holder; otherwise, block the requester. The superior performance of WW has been reported in earlier works [12-14]. It is observed there that two-phase locking CCMs are not consistently superior, especially when the frequency of deadlock is high. The mixed approach, especially WW, outperforms other CCMs for some system dynamics. We selected WW for our investigation mainly because no other paper has compared its behavior with an MV scheme and it is deadlock-free.
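The decision itself is a one-line rule; here is a minimal sketch, assuming integer timestamps where a smaller value means an older transaction (the function name is ours):

def wound_wait(requester_ts: int, holder_ts: int) -> str:
    """Wound-wait decision [8]; a smaller timestamp means an older
    transaction. Returns the action taken on a lock conflict."""
    if requester_ts < holder_ts:
        return "wound"   # requester is older: roll back (wound) the holder
    return "block"       # requester is younger: it waits for the holder

assert wound_wait(3, 7) == "wound"   # older requester wounds younger holder
assert wound_wait(7, 3) == "block"   # younger requester blocks behind holder

Because a transaction only ever waits for an older one, the waits-for graph cannot contain a cycle, which is why WW is deadlock-free.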

2.3. ARCHITECTURE OF THE SIMULATED SYSTEM

Figure 2 presents a high-level architecture of our simulated distributed database system. We have enhanced the TM/DM/RM model [2] by including some extra components for developing the simulation model. These components do not affect the basic functionality of a distributed database system.

Fig. 2. Architecture of a distributed database platform.

A transaction is divided into subtransactions when it enters the system. A subtransaction is sent to the node where the data is available and is executed there under the local concurrency control mechanisms. The TM (Transaction Manager) of a node executes subtransactions locally under a CCM. At each node, a Recovery Manager (RM) handles undo operations. If a subtransaction fails, then the parent transaction is rolled back and restarts after some delay to avoid repeated restarts. The TC (Transaction Coordinator) determines at which node the data items requested by a transaction are located. If a file is locally available, it is accessed there; otherwise, if there is no local copy and multiple copies exist at more than one node, then one copy is randomly selected. The TC creates one subtransaction for each node that needs to be visited and acts as the coordinator in the two-phase commit process.

The CCM (Concurrency Control Manager) coordinates concurrency control activities with other nodes. In the case of data replication, it implements a read-one-write-all policy: a read request is served by one copy, while a write request consults all nodes that hold a copy of the desired data item. It also acts as a lower-level coordinator for the two-phase commit protocol. The CCP (Concurrency Control Participant) implements the concurrency control mechanism chosen for simulation (multiversion or distributed two-phase locking). A Facility Manager (FM) at each node manages CPU and I/O resources. Requests from subtransactions are processed in FIFO order, except roll-backs, which have the highest priority.

Messages during transaction processing are handled by a module called the Network Interface (NWI). The NWI sends and receives messages from other nodes through the NW (NetWork) module. The CPU takes some time to process (send, receive, etc.) these messages. In accordance with earlier works, we assume that the underlying network is reliable, no message is lost during transmission, and each message is subject to a network delay. An SM (Source Module) at each node creates the transaction workload and maintains the multiprogramming level. A DD (Data Dictionary) at every node contains information about data distribution and replication.
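As a rough illustration of how a single message is costed under the Table 1 values (0.5 ms of CPU time to process a message at each end and a 10 ms network delay), the accounting can be sketched as follows; this function is our hypothetical summary, not the simulator's code.

MSG_PROCESSING_MS = 0.5   # CPU time to process a message (Table 1)
NETWORK_DELAY_MS = 10.0   # network delay per message (Table 1)

def message_latency(processing_ms: float = MSG_PROCESSING_MS,
                    delay_ms: float = NETWORK_DELAY_MS) -> float:
    # Send-side processing + transit + receive-side processing; the network
    # is assumed reliable, so no retransmission term appears.
    return processing_ms + delay_ms + processing_ms

print(message_latency())            # 11.0 ms end to end
print(message_latency(delay_ms=0))  # 1.0 ms: Experiment 2's p != 0, d = 0 case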

2.4. TRANSACTION MODEL

Figure 3 illustrates the execution of a distributed transaction. A transaction during its execution may visit more than one node. All subtransactions of a parent transaction are executed strictly serially, and if any one of them fails, then the parent transaction is rolled back. At the end of the execution of all subtransactions, the parent transaction is committed under the two-phase commit protocol. Even though the dispatches of subtransactions of a transaction appear sequential, they are dispatched concurrently. Parent transactions originate from a fixed number of terminals, and their number in the system is the sum of the terminals connected to each node.
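The parent-transaction life cycle, spelled out in the Appendix flow diagrams, can be condensed into a sketch like the following; the callback-based shape and all names are our assumptions, and messaging details are abstracted into booleans.

from typing import Callable, List

def run_parent_transaction(subtransactions: List[Callable[[], bool]],
                           prepare: Callable[[], bool],
                           commit: Callable[[], None],
                           rollback: Callable[[], None]) -> bool:
    # Execute every subtransaction; any failure rolls back the parent
    # (the simulator resubmits it after a delay to avoid repeated restarts).
    for sub in subtransactions:
        if not sub():
            rollback()
            return False
    if not prepare():        # phase 1: all executing nodes must vote ready
        rollback()
        return False
    commit()                 # phase 2: commit at every node
    return True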

3. SIMULATION MODEL

Figure 4 presents the simulation model of a node. The other nodes of the network are identical and fully connected through a network. The Appendix gives a detailed transaction flow diagram.

Fig. 3. Transaction processing model: subtransactions are created and sent to nodes for execution; if any subtransaction fails the parent transaction is rolled back, and when all subtransactions finish the parent transaction is committed.

Fig. 4. Simulation model of a node: new and resumed transactions pass through the CPQ and request items; on conflict they are blocked (BTQ), wound (WW), or rolled back when a write is too late (MV); granted requests proceed to an IOQ and the subtransaction eventually commits.


The CPQ (Central Processing Queue) holds four types of transactions/subtransactions (called members): new, active (having made at least one resource request), to be rolled back, and to be committed. These members receive service depending on their status. A new member (transaction) arrives at the CPQ if the multiprogramming level (MPL) allows it to do so and competes for CCM service. A rolled-back member leaves the node; its node of origin (home node) is informed about its status and can then resubmit it after some delay. A member joins one of the IOQs (Input Output Queues) to access the data if it has been granted permission to do so. In a conflict, the proper action is taken by the CCM being simulated. In WW, a member goes to the BTQ (Blocked Transaction Queue) if it is blocked. From the BTQ it returns to the CPQ to resume execution when the desired entity becomes free. A roll-back process starts from one of the IOQs, and the requester, in the meantime, waits until the holder is rolled back.

In MV also, a blocked transaction goes to BTQ and then to CPQ. However, in the case of a write request, if it is not too late, it goes to one of the IOQs to access the data item; otherwise, it is rolled-back. A read request may have to wait for some time but is never denied. A node communicates with other nodes about the progress of a subtransaction at appropriate execution phases. A transaction commits under two-phase commit protocol when all its subtransactions have been committed at their nodes of execution. The transaction execution flow presented in the Appendix illustrates the message communication and two-phase commit processes.
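The movement of a member between the queues of a node can be captured as a small state table. This is our condensation of the description above, not code from the simulator, and the state and event names are shorthand of our choosing.

# States a (sub)transaction can occupy at a node, per the description above.
# Transitions are (state, event) -> next state.
TRANSITIONS = {
    ("CPQ", "granted"):        "IOQ",   # permission granted: perform the I/O
    ("CPQ", "blocked"):        "BTQ",   # WW block, or MV read awaiting a creator
    ("CPQ", "write_too_late"): "HOME",  # MV: late write, roll back to home node
    ("CPQ", "wounded"):        "HOME",  # WW: holder wounded by an older requester
    ("BTQ", "entity_free"):    "CPQ",   # desired entity freed: resume execution
    ("IOQ", "io_done"):        "CPQ",   # back to request more items or commit
}

def step(state: str, event: str) -> str:
    return TRANSITIONS[(state, event)]

assert step("CPQ", "blocked") == "BTQ"
assert step("BTQ", "entity_free") == "CPQ"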

4. SIMULATION RESULTS AND DISCUSSIONS

We simulate the database as a collection of files consisting of a number of pages, a page being the unit of access. Data distribution and partial replication are implemented at the file level. In the partitioned model, each node stores four files, which are distributed randomly. In partial replication, a file is duplicated at two nodes, i.e., a node can have a maximum of eight files. A transaction accesses a maximum of two files, and from each of these files it accesses a minimum of three pages and a maximum of nine pages. Table 1 lists the set of important parameter values.
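A workload generator consistent with these parameters might be sketched as follows (an illustration under our naming, not the simulator's code). Note that the Table 1 size bounds (6 to 18 pages, average 12) imply that every transaction touches exactly two files.

import random

def generate_transaction(write_pct: float, total_files: int = 16,
                         pages_per_file: int = 100) -> list:
    """One transaction's page accesses: two files, 3-9 pages from each,
    each access a blind read or write with the experiment's write%."""
    ops = []
    for f in random.sample(range(total_files), k=2):
        for page in random.sample(range(pages_per_file),
                                  k=random.randint(3, 9)):
            kind = "write" if random.random() < write_pct else "read"
            ops.append((kind, f, page))
    return ops

random.seed(1)
txn = generate_transaction(write_pct=0.5)
print(len(txn))   # between 6 and 18 accesses, 12 on average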

TABLE 1
Simulation Parameter Values

Parameter description                          Value
Total number of nodes                          4
Largest transaction size                       18 pages
Smallest transaction size                      6 pages
Average transaction size                       12 pages
Total number of files in the entire database   16
Number of pages per file                       100
Database partitioning                          4 files/node
Database replication                           2 copies/file
Data in the buffer probability                 0.5
Message delay                                  10 ms
Message processing                             0.5 ms
Number of I/O servers per node                 2
I/O time                                       30 ms
CPU processing time                            2 ms
Time to search a version                       0.5 ms
Create a version in the memory                 0.7 ms
Prewrite to write                              0.1 ms
Remove a version                               0.2 ms

Experiment 1: No Message Overhead and Partitioned Database

Figures 5 and 6 show the relationship between throughput and MPL under MV and WW, respectively, with different read:write ratios. 0%W means all transactions are read-only, 25%W means transactions perform on average 25% write operations, and so on. These graphs indicate that the throughput under WW declines comparatively more rapidly with MPL. A striking result in MV is that the throughput with 0%W and 100%W is identical and is lowest with 50%W.

Fig. 5. Throughput vs MPL under MV (no message overhead, partitioned database).


Fig. 6. Throughput vs MPL under WW (no message overhead, partitioned database).

Figure 7 combines Figures 5 and 6. We observe that the throughputs of read-only transactions in MV and WW, and of 100%W transactions in MV, are the same. The difference begins to show for other read:write ratios. In WW, the throughput decreases with an increase in write percentage, but in MV this pattern is not strictly followed. There are two reasons for this behavior. First, there is no write-write conflict in MV, since each write creates a new version; second, we simulated blind-writes, i.e., transactions do not read before they change the data item value. We wanted to study the behavior of MV under a scenario which was not investigated in earlier works. Blind-writes are not unrealistic, and they do appear frequently in database systems.

Fig. 7. Throughput vs MPL for the MV and WW CCMs (no message overhead, partitioned database).

To explain the 50%W result in MV, we define a term called "conflict potential." The conflict potential between two transactions is the likelihood that one will conflict with the other. As explained before, under MV two operations may conflict only if exactly one of them is a write operation. The conflict potential between transactions is defined as the percentage of all pairs of operations (each pair consisting of one operation from each conflicting transaction) which may cause a conflict.

When there is no write-write conflict, it is intuitively clear that the probability of conflict will eventually decrease as the percentage of writes grows. Figure 8 shows write% versus conflict potential. We observe that under MV, the conflict potential at first increases with write% because there are enough reads to create significant conflicts with the writes. When it reaches 50% write, it appears that every write conflicts with a read, because they are equal in number, and therefore the conflict potential reaches its maximum. After that, as W% increases, there will be at least one write which does not conflict with another write. A further increase in W% increases the probability of no conflict; consequently, the conflict potential declines further. A 100%W workload completely eliminates the possibility of conflict, and the conflict potential reaches zero. In WW, on the other hand, two writes always conflict, and for that reason the conflict potential continuously increases with write%.

Fig. 8. Conflict potential vs write% (partitioned database).
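The shape of these curves follows from a simple calculation. Under the simplifying assumption (ours, not the paper's definition verbatim) that each operation of a pair is independently a write with probability w, a pair can conflict under MV only when exactly one operation is a write, giving 2w(1-w), while under WW only read-read pairs are harmless, giving 1-(1-w)^2:

def conflict_potential_mv(w: float) -> float:
    # A pair conflicts under MV only if exactly one operation is a write.
    return 2 * w * (1 - w)

def conflict_potential_ww(w: float) -> float:
    # Under WW (two-phase locking) only read-read pairs are conflict-free.
    return 1 - (1 - w) ** 2

for pct in (0, 25, 50, 75, 100):
    w = pct / 100
    print(pct, round(conflict_potential_mv(w), 2),
          round(conflict_potential_ww(w), 2),
          round(conflict_potential_ww(w) - conflict_potential_mv(w), 2))
# MV peaks at 50%W and vanishes at 0%W and 100%W; WW rises monotonically;
# the difference, w**2, grows with the write percentage, as in Fig. 10.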

Figure 9 shows the relationship between write% and roll-back percentage, and how the roll-back percentage is related to conflict potential. Under MV, the roll-back percentage is highest at 50%W and then declines with write%. This indicates that with 50%W the majority of writes conflict with reads; consequently, transactions are rolled back. Under WW (Figure 9), on the other hand, the roll-back percentage continuously increases with write%. The figure also shows that the roll-back percentage increases with MPL and that, under MV, 50%W has the highest roll-backs for every MPL.

Figure 10 shows the difference in conflict potential between MV and WW, which increases with write%. We expected this difference to be reflected in the throughputs of MV and WW. As shown in Figure 11, the nature of the curves is similar to Figure 10. This indicates a proportionate increase in the throughput difference between MV and WW with write probability. We argue at this point that a multiversion scheme outperforms a distributed two-phase locking scheme by eliminating write-write conflicts, particularly at higher write probabilities.

Fig. 9. Roll-back% vs write% for MV and WW at MPL = 8, 16, 24, and 32 (no message overhead, partitioned database).


Fig. 10. Difference in conflict potential between MV and WW vs write% (partitioned database).

Fig. 11. Throughput difference vs write% at MPL = 8 to 32 (partitioned database).

Experiment 2: With Message Overhead and Partitioned Database

Message overhead is likely to have a significant effect on the performance of CCMs. We wanted to investigate the degree of performance degradation due to this overhead and to find the effect of different types of messages. We divided the message processing cost of a transaction into two parts: the CPU time required to process a message (send or receive) and the transmission delay through the network. We studied the effect of these costs individually to see which had the stronger effect.

Fig. 12. Throughput vs MPL under MV (partitioned database, with message overhead).

Figures 12 and 13 show the relationship between throughput and MPL for MV and WW, respectively, at different write percentages. Figure 14 combines the results of Figures 12 and 13. We do not observe any noticeable difference in the behavior of MV and WW with message overhead. Of course, there is some performance degradation, which is illustrated in Figures 15 and 16.

Fig. 13. Throughput vs MPL under WW (partitioned database, with message overhead).

Fig. 14. Throughput vs MPL for MV and WW (partitioned database, with message overhead).

Figures 15 and 16 illustrate the difference between throughput with and without message overhead for MV and WW. We have identified throughput values with message overhead by the label msg and without message overhead by the label no msg.

Fig. 15. Throughput vs MPL for MV and WW, with and without message overhead (partitioned database, 25%W).


Fig. 16. Throughput vs MPL under MV with three delay settings (partitioned database, 50% write); p denotes the message processing delay and d the network delay.

We observe that the effect of message overhead on throughput is more visible at lower MPL, and it declines as MPL increases. To determine which parameter (message processing or network delay) had the stronger effect, we performed a simple experiment with three settings. In the first setting both delays were set to nil, in the second there was only a message processing delay, and in the third we introduced the communication delay as well. Figure 16 illustrates the result of this experiment, where p indicates the message processing value and d the network delay value. The figure indicates that the throughputs with no delay of any kind and with only a message processing delay are very similar. However, with the introduction of network delay, a noticeable decline in throughput was observed, mainly at smaller values of MPL. At the higher end of MPL, the effect of network delay was not significant.

Experiment 3: Partially Replicated Database with and without Message Overhead

Figure 17 shows the relationship between throughput and MPL for a partially replicated database without any message overhead. We observe that the throughput declines as the write probability increases. In this scenario, we do not observe the lowest throughput at 50%W, as we did in the partitioned case. We compared the throughput of the partitioned and replicated cases (Fig. 18), showing the relationship between MPL and throughput.


Fig. 17. Throughput vs MPL under MV (no message overhead, replicated database).

We observe that in the case of read-only transactions, the throughput is higher in the replicated case. This was expected since, in this case, data was locally available most of the time. However, with increasing write probability, the cost of multiple updates increased and the throughput was better in the partitioned case.

Figure 19 shows the result for WW in the same scenario. Figure 20 compares WW and MV under the replicated database. We again observe that MV offers better performance in this case.

Fig. 18. Throughput vs MPL for the partitioned and replicated cases (no message overhead).


Fig. 19. Throughput vs MPL under WW (no message overhead, replicated database).

Figure 21 shows the relationship between the throughputs of MV for the replicated database with and without message overhead. The label no msg indicates no message overhead. This figure illustrates that the effect of message overhead is significant only at lower MPL for all read:write ratios. At the higher end of MPL, the throughputs converge to the same point.

Fig. 20. Throughput vs MPL for MV and WW (no message overhead, replicated database).


Fig. 21. Throughput vs MPL under MV for the replicated database, with and without message overhead.

5. CONCLUSION AND FUTURE WORK

In this paper, we presented a performance comparison of a distributed multiversion CCM and distributed WW under partitioned and replicated databases using simulation modeling. Our aim has been to investigate mainly the effect of message cost and read:write ratios on the performance of these CCMs, since these parameters have not been investigated in earlier works. We did not simulate buffer management and the storage cost of these CCMs because we wanted to compare our results with other works, and to do so the simulation environment needed to be closely comparable. Further, as has been observed in other works, the storage cost of maintaining versions is not part of the working of MV but a separate activity, and the number of versions maintained does not affect its performance. We observed that multiversion outperforms WW in both partitioned and partially replicated databases. We observed that MV handles read-only and write-only transactions efficiently, and after a certain write percentage the throughput improves with this percentage. We also observed that the performance of MV reaches its minimum at an equal read:write ratio. The message overhead progressively becomes less significant as MPL increases, indicating that in a heavily loaded system the throughput is least sensitive to message cost. We found that in the partially replicated case, 50% write does not show the lowest performance, as observed in the partitioned case. Our results confirm some of the findings of earlier works.

APPENDIX

Fig. A.1. Transaction flow diagram: the coordinator creates subtransactions, sends each to its node of execution, and waits for responses; on a failure it sends an ABORT message to the nodes that executed subtransactions, and when all subtransactions finish it sends PREPARE, collects the responses, sends COMMIT, and the parent transaction is committed.


Fig. A.2. Subtransaction flow diagram: a subtransaction sends read/write requests to the local CCM, performs I/O on granted pages, reports end of execution to the TC, and participates in two-phase commit by answering the PREPARE message with READY and acting on COMMIT or ABORT; an aborted subtransaction is rolled back and resumes as a new subtransaction.


REFERENCES

1. R. Bayer, H. Heller, and A. Reiser, Parallelism and recovery in database systems, ACM Trans. Database Syst. 5(2):139-156 (1980).

2. P. Bernstein and N. Goodman, Concurrency control in distributed database systems, ACM Comput. Surveys 13(2):185-221 (1981).

3. P. Bernstein and N. Goodman, Multiversion concurrency control: Theory and algorithms, ACM Trans. Database Syst. 8(4):465-483 (1983).

4. M. Carey and W. A. Muhanna, The performance of multiversion concurrency control algorithms, ACM Trans. Comp. Syst. 4(4):338-378 (1986).

5. A. Chan, S. Fox, W. Lin, A. Nori, and D. Ries, The implementation of an integrated concurrency control and recovery scheme, in: Proc. ACM SIGMOD, Orlando, FL, June 2-4, 1982.

6. W. Lin and J. Nolte, Basic timestamp, multiple version timestamp, and two-phase locking, in: Proc. 9th VLDB Conf., Florence, Italy, 1983.

7. C. Papadimitriou and P. Kanellakis, On concurrency control by multiple versions, ACM Trans. Database Syst. 9(1):89-99 (1984).

8. D. Rosenkrantz, R. Stearns, and P. Lewis, System level concurrency control for distributed database systems, ACM Trans. Database Syst. 3(2) (1978).

9. D. Reed, Implementing atomic actions on decentralized data, ACM Trans. Comp. Syst. 1(1):3-23 (1983).

10. R. Stearns and D. Rosenkrantz, Distributed database concurrency controls using before values, in: Proc. ACM SIGMOD, Ann Arbor, MI, Apr. 29-May 1, 1981.

11. S. H. Son and N. Haghighi, Performance of multiversion database systems, in: Proc. 6th IEEE Int. Conf. on Data Engrg., Los Angeles, CA, Feb. 5-9, 1990.

12. V. Kumar, Performance comparison of database concurrency control mechanisms based on two-phase locking, timestamping and mixed approaches, Inform. Sci. 51(3) (1990).

13. V. Kumar and M. Hsu, A superior two-phase locking algorithm and its performance, Inform. Sci. 54(1/2) (1991).

14. R. Agrawal, M. J. Carey, and L. W. McVoy, The performance of alternative strategies for dealing with deadlocks in database management systems, IEEE Trans. Software Engrg. SE-13(12) (1987).

15. A. Chan and R. Gray, Implementing distributed read-only transactions, IEEE Trans. Software Engrg. (1985).

16. S. Son and Y. Kim, A software prototyping environment and its use in developing a multiversion distributed database system, presented at: 18th Int. Conf. on Parallel Processing, IL, Aug. 1989.

Received 1 February 1996; revised 27 May 1996
