Group Commit Timers and High-Volume Transaction Systems

Pat Helland, Harald Sammer, Jim Lyon, Richard Carr, Phil Garrett
TANDEM COMPUTERS INCORPORATED

Andreas Reuter
UNIVERSITY OF STUTTGART

Technical Report 88.1, March 1988
Part Number: 12522


Contents

1. Introduction
2. Group Commit and Timers
3. Our Model
4. Motivation
5. Response Time With Zero Timers
5.1 Calculating $C_0$
6. Response Time With Non-Zero Timers
6.1 Calculating $C_d$
7.1 Minimizing Response Time
7.2 Computing the Optimal Timer Delay (D)
8. Calculating A From System Statistics
9. The Effect of Group Commit Timers on Throughput
10. Testimonial
11. Future Research
11.1 Shepherds
11.2 A Minor Anomaly
12. Conclusions

Appendix
A. Hieroglyphics
B. References

When "Group Commit Timers and High-Volume Transaction Systems" was first published in the proceedings of the Second International Workshop on High-Performance Transaction Systems (Asilomar, California, September 1987), the graphs in Figures 4 and 5 were accidentally switched. The authors apologize for the error. In this publication, the graphs are correct.


GROUP COMMIT TIMERS AND HIGH VOLUME TRANSACTION SYSTEMS

PAT HELLAND, HARALD SAMMER, JIM LYON, RICHARD CARR, AND PHIL GARRETT
Tandem Computers, Inc.

ANDREAS REUTER
University of Stuttgart

1) Introduction

In the summer of 1986, Tandem released the B30 version of its Transaction Monitoring Facility (TMF). This release included a GROUP COMMIT mechanism. It also included a primitive mechanism to gate the commit writes using manually adjusted TIMERS [1]. Subsequent experiments on large benchmarks have shown that timers reduce the overhead of transactions running on high volume systems. This reduced overhead increases system throughput while meeting response time constraints.

While other systems have used Group Commit Timers [2], we have investigated the fact that various timer values are appropriate for various system loads. This paper discusses TIMERs and describes how a transaction's commit overhead is reduced by making it delay. It turns out that the system can set the timer dynamically to minimize average response time. Calculations for the optimal TIMER DELAY are presented. Finally, some directions for future research are described.

2) Group Commit and Timers

Group Commit refers to the technique used in high volume transaction systems where many transactions are committed with a single disk I/O to the log. That single I/O may contain commit records for several transactions.

Group Commit has been in use for a number of years now [Gawlick]. Without group commit, the transaction rate of a single-log system would be limited by the number of transfers per second that the log could support. Group Commit allows transaction systems with disk-based logs to break through that barrier of roughly 30 transactions per second.

When Group Commit Timers are in use, the system behaves as follows:

When a transaction needs to write a commit record it will wait.

When the Group Commit Timer pops, all of the transactions waiting to get a commit record written will have a record written as a part of a single disk I/O to the log disk.


We also define special behavior when the Group Commit Timer value is 0. This is:

If a transaction needs to write a commit record and there is no log I/O outstanding, then the commit record is immediately written to the log.

If a log I/O completes and other transactions are waiting to commit, then commit records for all waiting transactions are written using a single I/O.

When non-zero Group Commit Timers are in effect, the system awakens periodically and writes commit records for all waiting transactions using a single I/O. The result is that commit writes occur at scheduled intervals. Non-zero Group Commit Timers are a lot like people waiting on the street corner for a bus that arrives every five minutes. When the Group Commit Timer is zero, the bus driver will leave when the first passenger arrives.
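The two policies described above can be sketched as a small batching exercise. This is an illustrative model only, not the TMF implementation: the function names, the use of integer milliseconds, and the assumption that the timer pops at exact multiples of the delay are all hypothetical choices made for the example.

```python
def batches_with_timer(ready_times_ms, delay_ms):
    """Non-zero timer: every waiting transaction rides the next timer pop."""
    batches = {}
    for t in sorted(ready_times_ms):
        # Assumed pops at 0, delay_ms, 2*delay_ms, ...; ceiling division
        # picks the first pop at or after time t.
        pop = -(-t // delay_ms)
        batches.setdefault(pop, []).append(t)
    return [batches[pop] for pop in sorted(batches)]


def batches_zero_timer(ready_times_ms, log_io_ms):
    """Zero timer: write at once if the log is idle, otherwise join the
    write that is issued when the outstanding log I/O completes."""
    times = sorted(ready_times_ms)
    batches = []
    busy_until = None  # completion time of the outstanding log I/O, if any
    i = 0
    while i < len(times):
        if busy_until is None or times[i] >= busy_until:
            # Log idle: immediate write carrying a single commit record.
            batches.append([times[i]])
            busy_until = times[i] + log_io_ms
            i += 1
        else:
            # Log busy: all transactions that become ready before the I/O
            # completes share one delayed write, issued at busy_until.
            j = i
            while j < len(times) and times[j] < busy_until:
                j += 1
            batches.append(times[i:j])
            busy_until += log_io_ms
            i = j
    return batches
```

Each inner list is one log write. With the zero timer, the first write always carries a single record (the bus leaves with its first passenger), while later writes pick up whoever arrived during the previous I/O.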

3) Our Model

We have chosen to examine the effect of Group Commit Timers on CPU response time. One could justifiably argue that this is inadequate because disk I/O and the effects of queuing for disk I/O have not been included. This has been done for the following reasons:

• Since we have a single writer of the Log and the Log is kept on a separate disk from the database, queuing for I/O on the Log disk is negligible.

• We believe that Group Commit Timers will not significantly affect the queuing for disk at a fixed transaction rate.

• The response times that we use are for comparison purposes only. We intend to minimize the time that a transaction spends queued for the CPU and utilizing the CPU.

• Most importantly, we were getting a headache understanding this much.

Fundamentally, we are assuming that Group Commit Timers do not affect the I/O component of the response time. We hope to see further extensions of this work, which incorporate disks and disk queuing into the model.


4) Motivation

Timers cause individual transactions to be delayed. Intuitively, this also implies an increase in the transaction's response time. Surprisingly, timers can reduce the average transaction's CPU response time by reducing the per-transaction overhead.

To understand this, one must realize that transactions spend much of their time waiting in the CPU queue. By reducing the total demand for CPU resources, we reduce the length of the queue. The time that a transaction waits in the queue can be dramatically reduced if the CPU is heavily loaded.

Group Commit Timers cause the number of transactions in each Group Commit Buffer to increase. The number of Group Commit writes drops. This decreases both the average CPU cost for each transaction and the total CPU work performed by the system. As the total system CPU work drops, so does the queue length for the CPU and the average response time for a transaction. If the savings in CPU queuing exceeds the average wait time for the Group Commit Timer, then Timers improve the transaction's response time.

Fig. 1. Response Time vs. Timer Value (x-axis: Timer Value, in Seconds, 0.0 to 0.9; stacked components: Delay Time, Tx's Work, Commit Work)
Note that as the timer values increase, the time spent for the transaction's work (including CPU queuing) drops.


The remainder of this paper is organized as follows:

Section 5 describes the technique for calculating the average response time of a transaction when the Group Commit Timer is zero.

Section 6 describes how transaction response times are calculated in a system in which timer values are non-zero.

Section 7 discusses a technique by which timer values may be automatically determined as a function of measurable system parameters.

Section 8 describes how to calculate all of the necessary parameters for the equations from the system. This allows the system to plug in values to the described equations and set the timers dynamically.

Section 9 presents some predictions of the effect of Group Commit Timers on various transaction loads.

Section 10 consists of a brief description of the use of the Group Commit Timers at Tandem.

Section 11 discusses areas in which we feel further research may be beneficial.

Finally, Appendix A contains all of the mathematics necessary to derive the formulas presented. If you trust our math, the appendix may be skipped.

5) Response Time With Zero Timers

Before examining the expected response times for transactions when timers are non-zero, it is important to analyze the behavior of transactions when the Group Commit Timers are zero. In this discussion, as in the rest of the paper, we examine the total time spent by the transaction either executing in the CPU or waiting in the CPU queue. The time for the physical I/Os to the database has been omitted. We are assuming that Group Commit Timers do not affect the I/O time.

Let's divide the transaction into two parts to examine the CPU costs associated with it. We choose to consider the transaction's CPU cost as the sum of:

A - The CPU cost of a transaction (excluding COMMIT).

B - The CPU cost for one transaction's COMMIT (i.e. the cost to write a Group Commit Buffer).

Both A and B are measured in seconds.


Define:

$C_0$ - The average number of transactions in a Group Commit Buffer when a timer value of zero is used.

T - The transaction rate (transactions per second).

$U_0$ - The CPU utilization of a system when no timers are used.

$R_0$ - The average transaction's CPU response time on a system when the timer has a value of zero.

We can now calculate CPU utilization by determining the amount of work for each transaction times the number of transactions per second. The utilization is dimensionless.

$$U_0 = (A + B / C_0) \cdot T$$

Next, assuming M/M/1 queuing gives us a response time for a transaction of:

$$\text{CPU Response Time} = \frac{\text{CPU Work}}{1 - \text{utilization}} \quad \text{seconds.}$$

This means that the transaction's average CPU response time (that is, its response time exclusive of I/O) will be:

$$R_0 = \frac{A + B}{1 - U_0} \quad \text{seconds.}$$

This can be rewritten as:

$$R_0 = \frac{A + B}{1 - T(A + B / C_0)} \quad \text{seconds.}$$

5.1) Calculating $C_0$

Unless B is small with respect to A, the calculation for the average transaction's response time will be highly dependent on $C_0$ (the fullness of the Group Commit Buffer). To understand the calculation of $C_0$, we must review what can cause a Group Commit Buffer to be written.

There are two reasons why a Group Commit buffer would be written when the timer is zero:

If a transaction is ready to commit and no I/O is in progress.

If an I/O completes and a transaction is waiting to commit.


Let's define the following terms:

L - The amount of time it takes a Log I/O to complete. This time is wall clock time rather than CPU time. We are not including queuing for the disk.

$W_t$ - The total writes to the log per second.

$W_i$ - The immediate writes to the log per second. These are the writes caused by the transaction arriving and finding no I/O to the log in progress.

$W_d$ - The delayed writes to the log per second. These are the writes caused by an I/O completing with one or more transactions waiting.

$C_{0d}$ - The average number of transactions in a delayed write buffer.

Note that:

$$W_t = W_i + W_d$$

Now, we remember that each immediate write will have one transaction in each buffer. Each delayed write will have $C_{0d}$ transactions in each buffer. So, now we can state that:

$$C_0 = \frac{W_i + W_d C_{0d}}{W_t}$$

After much gnashing of teeth, this becomes:

$$C_0 = LT + e^{-LT}$$
(see Appendix A.1)

As you can see, when the timer is zero, the number of transactions in a Group Commit buffer is a function of the transaction rate and the time a write to the log takes. It is completely independent of any other factors. For this reason, it is possible to pick a reasonable time for the log write and chart $C_0$ as a function of transaction rate. See Figure 2.
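The zero-timer formulas of this section translate directly into a few lines of code. The sketch below is one possible transcription; the function names are hypothetical, all times are in seconds, and the guard against utilization of 1 or more is an addition made here because the M/M/1 response-time formula is undefined in that range.

```python
import math


def zero_timer_buffer_size(T, L):
    """C0 = LT + e^(-LT): average commit records per write, timer = 0."""
    return L * T + math.exp(-L * T)


def zero_timer_response(A, B, T, L):
    """R0 = (A + B) / (1 - U0), with U0 = (A + B / C0) * T."""
    c0 = zero_timer_buffer_size(T, L)
    u0 = (A + B / c0) * T
    if u0 >= 1.0:
        raise ValueError("CPU utilization >= 1; the M/M/1 model does not apply")
    return (A + B) / (1.0 - u0)
```

For example, with A = 0.010, B = 0.005, L = 0.033 and T = 50 transactions per second, `zero_timer_response` evaluates to roughly 0.041 seconds.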

Fig. 2. Zero-Timer Group Commit Buffer Size with L = .033 seconds (x-axis: Transactions per Second)

6) Response Time With Non-Zero Timers

The calculation of the system utilization and transaction response time follows the same pattern for systems using non-zero timers as it does for systems with zero timers. There are two differences: we expect the utilization to be different and the response time with non-zero timers must include an expected wait for the timer to pop. Let's define:

D - The timer delay (in seconds). Every D seconds, the timer will pop and a Group Commit Buffer write will be attempted.

$C_d$ - The average number of transactions in a Group Commit Buffer when timers are non-zero.

$U_d$ - The CPU utilization when timers are non-zero.

$R_d$ - The average transaction's CPU response time when timers are non-zero.

We can now state that:

$$U_d = (A + B / C_d) \cdot T$$

$$R_d = \frac{A + B}{1 - U_d} + \frac{D}{2} \quad \text{seconds.}$$

In a system using non-zero Group Commit Timers, there is only one reason for writing a Group Commit Buffer: the timer pops, and at least one transaction needs a commit record written.

We must determine $C_d$ to be able to evaluate this.

6.1) Calculating $C_d$

To evaluate $C_d$, we must first discuss the conditions under which a Group Commit buffer will be written. When the timer pops (every D seconds), the Group Commit buffer will be written to the log if one or more transactions are waiting to write a commit record. All of the waiting transactions will be included in the buffer. We are assuming that a full buffer is so unusual a condition as to be negligible.

What we must be careful to consider, however, is the probability that the timer will pop and there will be no transactions waiting to be committed. The Group Commit buffer will not be written under those circumstances.

To evaluate this probability, we use a Poisson distribution. We represent the probability that exactly n transactions are waiting for a commit record to be written as:

$$P_{TD}(n) = \frac{(TD)^n e^{-TD}}{n!}$$

where TD is the transaction rate (T) times the delay interval (D). This represents the expected mean number of transactions.


We can now describe the expected number of records written in a Group Commit buffer as:

$$\sum_{n=1}^{\infty} n P_{TD}(n)$$

The probability that a Group Commit write will occur is the same as the probability that at least one transaction is ready to commit when the timer pops. This is:

$$1 - P_{TD}(0)$$

So, the average number of transactions in a Group Commit buffer will be:

$$C_d = \frac{\text{Expected records per write}}{\text{Probability that a write occurs}}$$

Which is:

$$C_d = \frac{\sum_{n=1}^{\infty} n P_{TD}(n)}{1 - P_{TD}(0)}$$

Which boils down to become:

$$C_d = \frac{TD}{1 - e^{-TD}}$$
(see Appendix A.2)

Now that we know how to calculate the average response time, what do we do with it? Our goal is to employ the timer mechanism in such a way that it minimizes the average transaction's response time. If we can derive a formula based on information available in the system, we can construct an algorithm to set the timers dynamically, based upon the system load.
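As a companion to the non-zero-timer formulas of section 6, the following sketch evaluates the buffer size, utilization, and response time for a given delay. Function names are hypothetical, times are in seconds, and the input checks are additions made for the example rather than anything stated in the text.

```python
import math


def timer_buffer_size(T, D):
    """Cd = TD / (1 - e^(-TD)): average commit records per timer-driven write."""
    if T <= 0 or D <= 0:
        raise ValueError("T and D must be positive")
    td = T * D
    # -expm1(-td) equals 1 - e^(-td) and stays accurate when td is small.
    return td / -math.expm1(-td)


def timer_response(A, B, T, D):
    """Rd = (A + B) / (1 - Ud) + D / 2, with Ud = (A + B / Cd) * T."""
    cd = timer_buffer_size(T, D)
    ud = (A + B / cd) * T
    if ud >= 1.0:
        raise ValueError("CPU utilization >= 1; the M/M/1 model does not apply")
    return (A + B) / (1.0 - ud) + D / 2.0
```

With A = 0.010, B = 0.005, T = 50 and a deliberately long delay of D = 0.1 seconds, `timer_response` gives about 0.083 seconds. The D/2 wait dominates here, which illustrates why the delay has to be chosen with care.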


7.1) Minimizing Response Time

The average response time for a transaction (excluding I/O) when timers are in use is:

$$R_d = \frac{A + B}{1 - U_d} + \frac{D}{2} \quad \text{seconds.}$$

Our goal is to rearrange this formula to be a function of:

T - The transaction rate of the system. This we can directly measure as the system operates.

B - The cost of writing a Group Commit Buffer. This is a constant cost on a given system. Its value is determined by using performance measuring tools.

A - The average transaction's CPU cost (excluding the cost of commit). This we can calculate as a function of B, T, measured CPU utilization, and measured rate of writing the Group Commit Buffer.

D - The Group Commit Timer Delay.

Once we have this formula, we then assume that B, A, and T are constants with respect to $R_d$. The calculation of $R_d$ can then be viewed as a function in terms of D. The minimum value of $R_d$ can then be found by taking its derivative and solving for:

$$R_d' = 0$$

This strategy gives us:

$$R_d = \frac{DA + DB}{D - TAD - B + Be^{-TD}} + \frac{D}{2}$$
(see Appendix A.3)

$$R_d' = \frac{B(A + B)(TDe^{-TD} - 1 + e^{-TD})}{(D - TAD - B + Be^{-TD})^2} + \frac{1}{2}$$
(see Appendix A.4)

If we assume that $R_d' = 0$, we can then derive that:

$$D = \frac{B(1 - e^{-TD}) + \sqrt{2B(A + B)(1 - e^{-TD} - TDe^{-TD})}}{1 - TA}$$
(see Appendix A.5)

This is the formula that defines the proper value for a Group Commit Timer.


7.2) Computing the Optimal Timer Delay (D)

As we have just seen, D may be expressed as:

$$D = \frac{B(1 - e^{-TD}) + \sqrt{2B(A + B)(1 - e^{-TD} - TDe^{-TD})}}{1 - TA}$$

Unfortunately, this solution for D involves D in the right-hand side of the equation. We have chosen to determine a method to guess at an initial value for D and then refine the guess to be within acceptable limits.

When we solved for D, we did so assuming that $R_d' = 0$. We then rearranged the function to arrive at a value for D. In doing so, we really arrived at a different function Y which we know has the same value for D at the point Y = 0. It is important to realize that the function Y is not really the same as $R_d'$. We have chosen as Y the function:

$$Y = 2B(A + B)(TDe^{-TD} - 1 + e^{-TD}) + (D - TAD - B + Be^{-TD})^2$$
(see Appendix A.6)

Once we have an initial guess for the value of D, we will use the function Y to find a better guess. The Newton-Raphson method [Press] extrapolates to find the next estimate of the root that we are searching for. This is done by geometrically extending the tangent line at the current point $Y(D_i)$ until it crosses zero. This then becomes our next guess $D_{i+1}$.

So, to iterate, we have the equation:

$$D_{i+1} = D_i - \frac{Y(D_i)}{Y'(D_i)}$$

To do this we need Y'. We have concluded that this is:

$$Y' = 2\left((1 - TA)D - B(1 - e^{-TD})\right)(1 - TA - TBe^{-TD}) - 2B(A + B)T^2De^{-TD}$$
(see Appendix A.6)

For our initial guess, we will assume that TD is large. This causes $e^{-TD}$ to approach zero. We will take as our guess the value of D when $e^{-TD}$ is zero. This gives us:

$$D_0 = \frac{B + \sqrt{2B(A + B)}}{1 - TA}$$

So, to calculate the optimal timer values, we will first calculate $D_0$ using the above formula. We then calculate successive $D_i$ values for a few iterations. In most cases, this converges very rapidly.
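The procedure just described (start from $D_0$, then apply Newton-Raphson steps to Y) can be sketched as follows. This is a minimal illustration under stated assumptions, not the production algorithm: the function name, the iteration cap, and the stopping tolerance are hypothetical, and the code assumes TA < 1 so that the initial guess is defined.

```python
import math


def optimal_delay(A, B, T, max_iterations=50, tolerance=1e-12):
    """Find D with Y(D) = 0 by Newton-Raphson, starting from D0."""
    if T * A >= 1.0:
        raise ValueError("T * A must be below 1 for the formulas to apply")
    # Initial guess: the value of D when e^(-TD) is taken to be zero.
    d = (B + math.sqrt(2.0 * B * (A + B))) / (1.0 - T * A)
    for _ in range(max_iterations):
        e = math.exp(-T * d)
        m = (1.0 - T * A) * d - B * (1.0 - e)  # equals D - TAD - B + Be^(-TD)
        y = 2.0 * B * (A + B) * (T * d * e - 1.0 + e) + m * m
        dy = (2.0 * m * (1.0 - T * A - T * B * e)
              - 2.0 * B * (A + B) * T * T * d * e)
        if dy == 0.0:
            break  # flat tangent: no further Newton step is possible
        step = y / dy
        d -= step
        if abs(step) < tolerance:
            break
    return d
```

For A = 0.010, B = 0.005 and T = 50, the iteration starts near 0.034 seconds and settles at a delay of roughly 0.017 seconds. That value is below a 0.033-second log write time, which is the situation section 11.2 treats as a special case.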


8) Calculating A From System Statistics

The preceding section derives a formula for the timer delay value which is a function of B, T, and A. B is a constant (for a given release of the operating system). T is measurable from the system. It remains to describe how A can be calculated dynamically.

Define the following terms:

R - The frequency with which the Group Commit Buffer is written. The system can trivially maintain statistics to provide this.

$U_{sys}$ - The total system work. This is represented by the CPU utilization. Again, the system can trivially maintain statistics to provide this.

As discussed above, all of the work for transactions running on the system may be considered to be divisible into two parts: A and B. B is the cost of writing a commit buffer. A is the remaining cost of the transaction.

This means that we can consider the work performed by the system to have two components:

$$U_{sys} = (R \cdot B) + \text{Other transaction work}$$

The other transaction work may be averaged among the transactions processed during the sample interval. This implies that:

$$\text{Other transaction work} = (A \cdot T)$$

This leaves us with:

$$U_{sys} = (R \cdot B) + (A \cdot T)$$

Where R, $U_{sys}$, and T are measurable and B is a constant. Rearranging things leaves us with:

$$A = \frac{U_{sys} - (R \cdot B)}{T}$$
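The rearranged formula for A is a one-line computation once the statistics are available. In the sketch below the function and parameter names are hypothetical; it assumes utilization is expressed as a fraction of one CPU and that the write rate and transaction rate are both per second over the same sample interval.

```python
def estimate_a(u_sys, write_rate, b, txn_rate):
    """A = (U_sys - R * B) / T, the per-transaction CPU cost excluding commit."""
    if txn_rate <= 0:
        raise ValueError("transaction rate must be positive")
    return (u_sys - write_rate * b) / txn_rate
```

For instance, a measured utilization of 0.6 with 20 buffer writes per second, B = 0.005 seconds and 50 transactions per second yields A of about 0.010 seconds.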


9) The Effect of Group Commit Timers on Throughput

To learn about the effects of Group Commit timers, we set up some spreadsheets to plot throughput versus response time for a number of artificial transactions. In the first, we assumed that:

A = 100 Milliseconds

B = 50 Milliseconds

L = 33 Milliseconds

This nicely models a machine in which a relatively low transaction rate is achieved on a single processor. This gave us the following results:

Fig. 3. System Throughput vs. Response Time -- Trial 1 (x-axis: Transactions per Second; legend: + Non-Zero Timers)


Next, we tried to analyze what would happen if the clock rate of the machine was 10 times as great. This would give us:

A = 10 Milliseconds

B = 5 Milliseconds

L = 33 Milliseconds (remember, L is wall clock time)

Fig. 4. System Throughput vs. Response Time -- Trial 2 (x-axis: Transactions per Second, 10 to 90; legend: o Zero Timers, + Non-Zero Timers)


Notice that the dramatic results are a consequence of the ratio between A and B (which is what we experience on our machine). If you change this ratio because you have extra good logging or you have larger transactions, the effect is present, but not as pronounced. Let's try:

A = 100 Milliseconds

B = 10 Milliseconds

L = 33 Milliseconds (remember, L is wall clock time)

Fig. 5. System Throughput vs. Response Time -- Trial 3 (x-axis: Transactions per Second; legend: o Zero Timers, + Non-Zero Timers)
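A spreadsheet-style sweep over the three trials can be approximated with the formulas from sections 5 and 6. The sketch below is not a reproduction of the published figures: it uses a brute-force grid search over D instead of the Newton iteration, it ignores the small-timer restriction discussed in section 11.2, and every name in it (including the `TRIALS` table layout and the grid parameters) is a hypothetical choice.

```python
import math

# (A, B, L) in seconds, taken from the three trials described in the text.
TRIALS = {
    "Trial 1": (0.100, 0.050, 0.033),
    "Trial 2": (0.010, 0.005, 0.033),
    "Trial 3": (0.100, 0.010, 0.033),
}


def zero_timer_response(A, B, L, T):
    """R0 from section 5; infinite when the CPU would be saturated."""
    c0 = L * T + math.exp(-L * T)
    u0 = (A + B / c0) * T
    return (A + B) / (1.0 - u0) if u0 < 1.0 else math.inf


def timer_response(A, B, T, D):
    """Rd from section 6; infinite when the CPU would be saturated."""
    cd = T * D / (1.0 - math.exp(-T * D))
    ud = (A + B / cd) * T
    return (A + B) / (1.0 - ud) + D / 2.0 if ud < 1.0 else math.inf


def best_timer_response(A, B, T, step=0.001, steps=2000):
    """Smallest Rd over the grid D = step, 2*step, ..., steps*step."""
    return min(timer_response(A, B, T, k * step) for k in range(1, steps + 1))


def sweep(trial, rates):
    """Return (T, R0, best Rd) rows for one trial and a list of rates."""
    A, B, L = TRIALS[trial]
    return [(T, zero_timer_response(A, B, L, T), best_timer_response(A, B, T))
            for T in rates]
```

Calling `sweep("Trial 2", [80])` gives a zero-timer response time of about 0.29 seconds against about 0.15 seconds with the best delay on the grid, in line with the qualitative claim that timers help most at high load.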


10) Testimonial

In the spring of 1987, Tandem gathered 32 VLX processors together to perform a large benchmark as a prelude to the announcement of our NonStop SQL product. When we started the benchmark (which consisted of DebitCredit [3] transactions with 2 second response time), we were not using the Group Commit Timers. Running with timer values of zero, we were able to process 165 transactions per second.

After performing some "back-of-the-envelope" calculations, the Group Commit Timer was set. By using the Group Commit Timer (and the Audit Flush Timer discussed below in section 11), we were able to process 208 transactions per second with 2 second response time.

11) Future Research

A number of other applications for dynamic timers have been discussed amongst our cohorts. The idea of Group Commit Timers is an approach that is not specifically related to Group Commit. The idea works whenever you have multiple activities which could be accomplished more efficiently if bundled together into one unit of work.

Some ideas that have come to mind include:

• TIMERS AND MESSAGE BASED SYSTEMS

The Tandem system is a distributed multi-processor system. The various CPUs communicate with each other by sending messages. Currently, somewhere in excess of 20% of our machine cycles are spent sending messages.

The overhead for sending a message has two components: a "per-message" overhead and a "per-byte" overhead. If we could bundle multiple messages into a single transmission, some of the "per-message" overhead cost could be amortized over the messages that were bundled.

• TIMERS FOR DATA COMMUNICATION

Most of the data communications protocols have large overheads associated with each transmission. By bundling multiple messages going in the same direction into a single transmission, many CPU cycles could be saved.

As with Group Commit, the savings associated with increased sharing of the same work could reduce the response time of the work enough to justify hanging around for a while to see if anyone else wants to hitch a ride.


• AUDIT FLUSH TIMERS

The Tandem system is a distributed system in which the before and after audit images for database updates are frequently generated and buffered in a CPU that is not physically connected to the disk holding the log [4,5]. To commit a transaction, this audit must be transmitted to one of the CPUs that is physically connected to the Log disk.

We have found that significant savings may be obtained by using timers to delay the transmission of the audit images for a while. When timers are used here, the buffer that is sent will contain the audit images for more transactions that are ready to commit.

• LOCK RELEASE TIMERS

This was an idea that didn't work. In the Tandem system, the record locks on the database are maintained in the process that manages the disk containing the records being locked. To release the locks, the system must dispatch the disk process. We tried to see if by delaying the dispatch of the disk process, we could cause the disk process to release the locks for multiple transactions at the same time.

Well, it would release the locks for multiple transactions at the same time, but the CPU savings weren't worth the increased lock contention. Oh, well.

11.1) Shepherds

As discussed above, the Tandem system is already employing two timers that affect the life of a transaction. We are considering the possibility of adding more.

When one function controlled by a timer may be begun as soon as another function controlled by a timer completes, a token (or Shepherd) may be passed as each function is processed. Even if the second function (e.g. the Group Commit buffer flush) must wait for a bunch of completions of the first function (e.g. Audit Flush from all of the constituent processors), the shepherd may be used.

The basic advantage to this approach is that you can get savings by sharing the work of all of the various functions while only delaying for the processing of the first function. Once the first function starts, you, your other transactions, and the shepherd will get immediate service from the remaining functions.

Our guesstimates indicate that shepherds come out dead even with multiple timers on our system in which two levels of timers are employed. Once we start bundling more of the system functions into activities that can be simultaneously performed for multiple transactions, shepherds will be looked at even more closely.


11.2) A Minor Anomaly

When we went to model the number of transactions in a Group Commit buffer when non-zero timers are used, we found that the number was worse than the zero timer cases in a very small range (this effect is visible in Figure 6). The problem occurs at low transaction rates when timers are first enabled and the timer value is still smaller than L (the wall clock time to do a log I/O). At this point, our Response Time model ceases to be valid.

We decided not to worry about this at this time and chose, instead, to restrict the system to behave as if the timer value was zero when the calculated optimal timer value was smaller than L.
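The restriction described above amounts to a simple guard applied after the optimal delay has been computed. A minimal sketch, with a hypothetical function name and both arguments in seconds:

```python
def effective_timer(optimal_delay, log_io_time):
    """Fall back to zero-timer behavior when the computed delay is below L."""
    return 0.0 if optimal_delay < log_io_time else optimal_delay
```

A computed delay of 0.017 seconds against L = 0.033 seconds would therefore be replaced by zero, while a delay of 0.086 seconds would be used as is.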

Fig. 6 - Non-Zero Timer vs Zero Timer Buffer Sizes (x-axis: Transactions per Second, 10 to 90; legend: Zero Timers, + Non-Zero Timers)
This is assuming the following values:

L = .033 seconds
A = .010 seconds
B = .005 seconds


12) Conclusions

Group Commit timers have been shown to provide very noticeable increases in throughput while still meeting response time objectives. They offer a surprising gain for a modest amount of implementation.

We feel quite strongly that timers are important. The arbitrary selection of a specific value for the timer could harm some transaction mixes. An example of this would be a system running serialized batch transactions. The use of a non-zero timer will only add to the response time. To expect a system manager to wisely select an appropriate timer value is also untenable. For these reasons, it seems clear that the system should dynamically calculate an appropriate timer value based on the system load.

While more work is needed to perfect the model, we feel comfortable using these formulas in our production software to calculate the Group Commit Timer value.


A) Appendix A: Hieroglyphics

This appendix fills in all of the "leaps of faith" that have been made in the rest of the paper.

A.1) Calculate $C_0$

As described in section 5.1, the formula for the expected Group Commit Buffer population ($C_0$) when no timers are used is:

$$C_0 = \frac{W_i + W_d\,C_{0d}}{W_t}$$

A.1.1) Calculate $C_{0d}$

The calculation for $C_{0d}$ is almost identical to the calculation for $C_d$. The only difference is that the time to accumulate transactions is not D, but L (the service time on a log I/O). Following the same arguments as those presented for $C_d$, we have:

$$C_{0d} = \frac{\text{Expected records per delayed write}}{\text{Probability that a delayed write occurs}}$$

Which is:

$$C_{0d} = \frac{\sum_{i=1}^{\infty} i\,P_{LT}(i)}{\sum_{i=1}^{\infty} P_{LT}(i)}$$

Note that:

$$\sum_{i=1}^{\infty} i\,P_{LT}(i) = \sum_{i=0}^{\infty} i\,P_{LT}(i)$$

This is because when i = 0, the term $i\,P_{LT}(i)$ equals 0. But note that:

$$\sum_{i=0}^{\infty} i\,P_{LT}(i) = LT$$

Hence:

$$\sum_{i=1}^{\infty} i\,P_{LT}(i) = LT$$

Now we observe that:

$$\sum_{i=1}^{\infty} P_{LT}(i) = 1 - P_{LT}(0)$$

This means that:

$$C_{0d} = \frac{LT}{1 - P_{LT}(0)}$$
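The two Poisson identities used in this step are easy to check numerically. The following Python sketch truncates the infinite sums at a finite number of terms, an assumption that is harmless for the small means LT of interest here. The function names are illustrative only.

```python
import math


def poisson_pmf(mean: float, i: int) -> float:
    """Probability of exactly i arrivals when the expected number is mean."""
    return math.exp(-mean) * mean ** i / math.factorial(i)


def c0d_by_summation(mean: float, terms: int = 100) -> float:
    """Ratio of the two (truncated) sums that define C_0d, with mean = LT."""
    numerator = sum(i * poisson_pmf(mean, i) for i in range(1, terms))
    denominator = sum(poisson_pmf(mean, i) for i in range(1, terms))
    return numerator / denominator


def c0d_closed_form(mean: float) -> float:
    """LT / (1 - P_LT(0)), using P_LT(0) = exp(-LT)."""
    return mean / (1.0 - math.exp(-mean))
```

For example, with L = .033 seconds and T = 60 transactions per second the mean is LT = 1.98, and the two functions should agree to within rounding error.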

A.1.2) Calculate $W_d$

The number of delayed writes to the disk is somewhat complicated to compute. A delayed write occurs whenever a log write finishes and one or more transactions are waiting to commit. These transactions must have arrived while the log I/O was outstanding (that is, in the time interval L). The probability that one or more transactions are waiting can then be expressed as:

$$1 - P_{LT}(0)$$

This means that the number of delayed writes is:

$$\begin{aligned}
W_d &= W_t\,(1 - P_{LT}(0)) \\
    &= (W_i + W_d)(1 - P_{LT}(0)) \\
    &= W_i\,(1 - P_{LT}(0)) + W_d\,(1 - P_{LT}(0)) \\
    &= W_i\,(1 - P_{LT}(0)) + W_d - W_d\,P_{LT}(0)
\end{aligned}$$

So:

$$W_d\,P_{LT}(0) = W_i\,(1 - P_{LT}(0))$$

$$W_d = \frac{W_i\,(1 - P_{LT}(0))}{P_{LT}(0)}$$


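A.1.3) Calculate $W_t$

Let's start out with:

$$\begin{aligned}
W_t &= W_i + W_d \\
    &= W_i + \frac{W_i\,(1 - P_{LT}(0))}{P_{LT}(0)} \\
    &= \frac{W_i\,P_{LT}(0)}{P_{LT}(0)} + \frac{W_i\,(1 - P_{LT}(0))}{P_{LT}(0)}
\end{aligned}$$

Giving us:

$$W_t = \frac{W_i}{P_{LT}(0)}$$

Which means that:

$$W_i = W_t\,P_{LT}(0)$$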

A.1.4) Put $C_0$ Together

Now that we know both $C_{0d}$ and $W_d$, we can calculate that:

$$W_d\,C_{0d} = \frac{W_i\,(1 - P_{LT}(0))}{P_{LT}(0)} \cdot \frac{LT}{1 - P_{LT}(0)} = \frac{W_i\,LT}{P_{LT}(0)}$$

Next, we recall that:

$$C_0 = \frac{W_i + W_d\,C_{0d}}{W_t}$$

Substituting (and using $W_t = W_i / P_{LT}(0)$ from section A.1.3) gives us:

$$\begin{aligned}
C_0 &= \frac{W_i + W_i\,LT / P_{LT}(0)}{W_t} \\
    &= P_{LT}(0) + LT
\end{aligned}$$

Now, we have to look in our textbook [PRESS] to tell us the value of the Poisson Probability Function. According to the magic formula in the book,

$$P_{\mu}(0) = e^{-\mu}$$

Hence,

$$P_{LT}(0) = e^{-LT}$$

And we get:

$$C_0 = LT + e^{-LT}$$
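As a cross-check of the algebra, the closed form for the zero-timer buffer population can be compared with the value assembled from the immediate and delayed write counts. The Python sketch below assumes an arbitrary unit for the number of immediate writes (it cancels out) and a transaction rate greater than zero; the function names are illustrative.

```python
import math


def c0_closed_form(rate: float, log_io_time: float) -> float:
    """C_0 = LT + exp(-LT)."""
    lt = log_io_time * rate
    return lt + math.exp(-lt)


def c0_from_write_counts(rate: float, log_io_time: float) -> float:
    """C_0 = (W_i + W_d * C_0d) / W_t, built from A.1.1 to A.1.3.

    Requires rate > 0, otherwise C_0d is undefined (0 / 0).
    """
    lt = log_io_time * rate
    p0 = math.exp(-lt)             # P_LT(0)
    w_i = 1.0                      # arbitrary unit; it cancels out
    w_d = w_i * (1.0 - p0) / p0    # A.1.2
    w_t = w_i + w_d                # A.1.3
    c0d = lt / (1.0 - p0)          # A.1.1
    return (w_i + w_d * c0d) / w_t


if __name__ == "__main__":
    # L = .033 seconds is the value used for Figure 6.
    for tps in (10, 30, 50, 70, 90):
        print(tps, round(c0_closed_form(tps, 0.033), 3))
```

Note that the closed form never drops below 1: even at very low rates each log write carries at least the one transaction that triggered it.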

A.2) Calculate $C_d$

The calculation for $C_d$ is almost identical to the calculation for $C_{0d}$. The only difference is that the time to accumulate transactions is D when calculating $C_d$ and L (the service time on a log I/O) when calculating $C_{0d}$. For this reason, the discussion here and in section 6.1 is almost identical to the discussion in sections 5.1 and A.1.1.

In section 6.1, we determined that:

$$C_d = \frac{\sum_{i=1}^{\infty} i\,P_{TD}(i)}{\sum_{i=1}^{\infty} P_{TD}(i)}$$

Note that:

$$\sum_{i=1}^{\infty} i\,P_{TD}(i) = \sum_{i=0}^{\infty} i\,P_{TD}(i)$$

This is because when i = 0, the term $i\,P_{TD}(i)$ equals 0. But note that:

$$\sum_{i=0}^{\infty} i\,P_{TD}(i) = TD$$

Hence:

$$\sum_{i=1}^{\infty} i\,P_{TD}(i) = TD$$

Now we observe that:

$$\sum_{i=1}^{\infty} P_{TD}(i) = 1 - P_{TD}(0)$$

This means that:

$$C_d = \frac{TD}{1 - P_{TD}(0)}$$

$$C_d = \frac{TD}{1 - e^{-TD}}$$
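The closed form for the buffer population with a timer delay D is straightforward to evaluate, with one practical wrinkle: at D = 0 the expression becomes 0/0. A minimal Python sketch, assuming we want the limiting value 1 in that case (the function name is illustrative):

```python
import math


def c_d(rate: float, delay: float) -> float:
    """C_d = TD / (1 - exp(-TD)) for transaction rate T and timer delay D."""
    td = rate * delay
    if td == 0.0:
        # Limit of x / (1 - exp(-x)) as x goes to 0.
        return 1.0
    # -expm1(-x) equals 1 - exp(-x) but keeps precision for small x.
    return td / -math.expm1(-td)
```

The value is always at least 1 and at least TD, which matches the intuition that a buffer is written with at least one transaction in it and that longer delays gather more transactions.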

A.3) Calculate $R_d$

As described in section 6, $R_d$ is:

$$R_d = \frac{A + B}{1 - U_d} + \frac{D}{2}$$

Substituting for $U_d$ (again described in section 6) we have:

$$\begin{aligned}
R_d &= \frac{A + B}{1 - T(A + B/C_d)} + \frac{D}{2} \\
    &= \frac{A + B}{1 - TA - TB/C_d} + \frac{D}{2} \\
    &= \frac{C_d A + C_d B}{C_d - TAC_d - TB} + \frac{D}{2}
\end{aligned}$$

Substituting for $C_d$ (described in section 6.1) yields:

$$R_d = \frac{ATD/(1 - e^{-TD}) + BTD/(1 - e^{-TD})}{TD/(1 - e^{-TD}) - T^2AD/(1 - e^{-TD}) - TB} + \frac{D}{2}$$

Multiplying above and below by $(1 - e^{-TD})$ yields:

$$\begin{aligned}
R_d &= \frac{ATD + BTD}{TD - T^2AD - TB + TBe^{-TD}} + \frac{D}{2} \\
    &= \frac{AD + BD}{D - TAD - B + Be^{-TD}} + \frac{D}{2}
\end{aligned}$$
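To confirm that the simplification preserves the value of the response time, the starting expression and the final closed form can be evaluated side by side. The Python sketch below assumes D > 0 and a utilization below 1; A and B are the service-time parameters from section 6, and the function names are illustrative.

```python
import math


def response_time_via_utilization(t: float, a: float, b: float, d: float) -> float:
    """R_d = (A + B) / (1 - U_d) + D/2 with U_d = T(A + B/C_d)."""
    c_d = t * d / (1.0 - math.exp(-t * d))
    u_d = t * (a + b / c_d)
    return (a + b) / (1.0 - u_d) + d / 2.0


def response_time_closed_form(t: float, a: float, b: float, d: float) -> float:
    """R_d = (AD + BD) / (D - TAD - B + B exp(-TD)) + D/2."""
    g = d - t * a * d - b + b * math.exp(-t * d)
    return (a * d + b * d) / g + d / 2.0


if __name__ == "__main__":
    # A and B as in Figure 6; T = 60 and D = .030 are picked for illustration.
    print(response_time_via_utilization(60.0, 0.010, 0.005, 0.030))
    print(response_time_closed_form(60.0, 0.010, 0.005, 0.030))
```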

A.4) Calculate $R_d'$

Let's make the following assignments:

$$f = AD + BD$$

$$g = D - TAD - B + Be^{-TD}$$

Then we have:

$$R_d' = \frac{f'g - fg'}{g^2} + \frac{1}{2}$$

$$f' = A + B$$

$$g' = 1 - TA - TBe^{-TD}$$

$$\begin{aligned}
f'g &= (A + B)(D - TAD - B + Be^{-TD}) \\
    &= AD - TA^2D - AB + ABe^{-TD} + BD - TABD - B^2 + B^2e^{-TD}
\end{aligned}$$

$$\begin{aligned}
fg' &= (DA + DB)(1 - TA - BTe^{-TD}) \\
    &= AD - TA^2D - ABTDe^{-TD} + BD - TABD - B^2TDe^{-TD}
\end{aligned}$$

So, when we combine $f'g$ and $fg'$ we get:

$$\begin{aligned}
f'g - fg' &= AD - TA^2D - AB + ABe^{-TD} + BD - TABD - B^2 + B^2e^{-TD} \\
          &\quad - AD + TA^2D + ABTDe^{-TD} - BD + TABD + B^2TDe^{-TD} \\
          &= -AB - B^2 + ABe^{-TD} + B^2e^{-TD} + ABTDe^{-TD} + B^2TDe^{-TD} \\
          &= (A + B)(-B + Be^{-TD} + TBDe^{-TD}) \\
          &= B(A + B)(TDe^{-TD} - 1 + e^{-TD})
\end{aligned}$$

Now, we can assemble $R_d'$ as:

$$\begin{aligned}
R_d' &= \frac{f'g - fg'}{g^2} + \frac{1}{2} \\
     &= \frac{B(A + B)(TDe^{-TD} - 1 + e^{-TD})}{(D - TAD - B + Be^{-TD})^2} + \frac{1}{2}
\end{aligned}$$
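A derivative worked out by hand is worth checking against a finite difference. The Python sketch below compares the analytic derivative of the response time with a central difference of the response time itself. The step size and the sample parameters are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def r_d(d, t, a, b):
    """R_d = (AD + BD) / (D - TAD - B + B exp(-TD)) + D/2."""
    g = d - t * a * d - b + b * math.exp(-t * d)
    return (a * d + b * d) / g + d / 2.0


def r_d_prime(d, t, a, b):
    """Analytic derivative of R_d with respect to D, as assembled in A.4."""
    e = math.exp(-t * d)
    g = d - t * a * d - b + b * e
    return b * (a + b) * (t * d * e - 1.0 + e) / (g * g) + 0.5


def central_difference(func, d, step, *args):
    """Numerical derivative of func at d using a symmetric difference."""
    return (func(d + step, *args) - func(d - step, *args)) / (2.0 * step)


if __name__ == "__main__":
    # A and B as in Figure 6; T = 60 is picked for illustration.
    for delay in (0.02, 0.03, 0.05):
        print(delay,
              r_d_prime(delay, 60.0, 0.010, 0.005),
              central_difference(r_d, delay, 1e-6, 60.0, 0.010, 0.005))
```

With these sample parameters the derivative changes sign from negative to positive somewhere between D = .03 and D = .04, which is where the response time reaches its minimum.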

A.5) Solve for D When ($R_d' = 0$)

Now, we are going to attempt to solve for $R_d' = 0$. We do this by assuming that $R_d' = 0$ and munching around with the equation.

$$R_d' = \frac{B(A + B)(TDe^{-TD} - 1 + e^{-TD})}{(D - TAD - B + Be^{-TD})^2} + \frac{1}{2}$$

$$0 = \frac{B(A + B)(TDe^{-TD} - 1 + e^{-TD})}{(D - TAD - B + Be^{-TD})^2} + \frac{1}{2}$$

So this gives us:

$$-\frac{1}{2} = \frac{B(A + B)(TDe^{-TD} - 1 + e^{-TD})}{(D - TAD - B + Be^{-TD})^2}$$

$$(D - TAD - B + Be^{-TD})^2 = -2B(A + B)(TDe^{-TD} - 1 + e^{-TD})$$

$$\left(D(1 - TA) - B(1 - e^{-TD})\right)^2 = 2B(A + B)(1 - e^{-TD} - TDe^{-TD})$$

Taking the square root of both sides we have:

$$D(1 - TA) - B(1 - e^{-TD}) = \sqrt{2B(A + B)(1 - e^{-TD} - TDe^{-TD})}$$

$$D(1 - TA) = B(1 - e^{-TD}) + \sqrt{2B(A + B)(1 - e^{-TD} - TDe^{-TD})}$$

$$D = \frac{B(1 - e^{-TD}) + \sqrt{2B(A + B)(1 - e^{-TD} - TDe^{-TD})}}{1 - TA}$$
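Because D appears on both sides of this last expression, it cannot be evaluated directly. One simple way to explore it is to substitute a guess for D on the right-hand side and repeat. The Python sketch below does this; it is only an illustration and not the method used in the report, which relies on Newton-Raphson (section 7.2). Convergence of this iteration is not guaranteed in general, and the starting guess, tolerance, and function names are assumptions.

```python
import math


def next_delay(d, t, a, b):
    """Right-hand side of the A.5 expression for D, evaluated at the current D."""
    e = math.exp(-t * d)
    # Guard against a tiny negative value caused by rounding near D = 0.
    inside = max(0.0, 2.0 * b * (a + b) * (1.0 - e - t * d * e))
    return (b * (1.0 - e) + math.sqrt(inside)) / (1.0 - t * a)


def solve_by_iteration(t, a, b, start=0.05, tol=1e-12, max_iter=500):
    """Repeat D <- next_delay(D) until successive values differ by less than tol."""
    d = start
    for _ in range(max_iter):
        new_d = next_delay(d, t, a, b)
        if abs(new_d - d) < tol:
            return new_d
        d = new_d
    return d


if __name__ == "__main__":
    # A and B as in Figure 6; T = 60 is picked for illustration.
    print(solve_by_iteration(60.0, 0.010, 0.005))
```

Note that D = 0 always satisfies the expression, so a starting guess of zero would never move; the sketch starts from a positive value for that reason.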


A.6) Calculating Y and Y'

We take the following equation from section A.5 above.

$$(D - TAD - B + Be^{-TD})^2 = -2B(A + B)(TDe^{-TD} - 1 + e^{-TD})$$

Modifying this slightly we have:

$$0 = (D - TAD - B + Be^{-TD})^2 + 2B(A + B)(TDe^{-TD} - 1 + e^{-TD})$$

Let's define a function Y as:

$$Y = \left(D - TAD - B(1 - e^{-TD})\right)^2 + 2B(A + B)(TDe^{-TD} - 1 + e^{-TD})$$

The function Y has a value of 0 exactly when $R_d' = 0$. Therefore, if we find the value for D that causes Y to equal 0, we will have found the value for D which causes $R_d'$ to equal 0. This value for D is the optimal value for the timer.

So, as mentioned above in section 7.2, we need to find Y' to use the Newton-Raphson method to approximate where Y = 0. Let's rearrange Y slightly to get:

$$Y = 2B(A + B)(TDe^{-TD} - 1 + e^{-TD}) + \left((1 - TA)D - B(1 - e^{-TD})\right)^2$$

And we get:

$$Y' = 2B(A + B)(Te^{-TD} - T^2De^{-TD} - Te^{-TD}) + 2\left((1 - TA)D - B(1 - e^{-TD})\right)(1 - TA - TBe^{-TD})$$

$$Y' = 2\left((1 - TA)D - B(1 - e^{-TD})\right)(1 - TA - TBe^{-TD}) - 2B(A + B)T^2De^{-TD}$$
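With Y and Y' in hand, the Newton-Raphson search can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production code: the search bracket, tolerance, and function names are hypothetical, a bisection step is used as a fallback whenever a Newton step leaves the bracket, and a result of zero is returned when Y is not negative at the low end of the bracket (taken here to mean that no positive timer helps). The sketch also assumes TA < 1.

```python
import math


def y(d, t, a, b):
    """Y = ((1 - TA)D - B(1 - e^-TD))^2 + 2B(A + B)(TD e^-TD - 1 + e^-TD)."""
    e = math.exp(-t * d)
    g = (1.0 - t * a) * d - b * (1.0 - e)
    return g * g + 2.0 * b * (a + b) * (t * d * e - 1.0 + e)


def y_prime(d, t, a, b):
    """Derivative of Y with respect to D."""
    e = math.exp(-t * d)
    g = (1.0 - t * a) * d - b * (1.0 - e)
    return 2.0 * g * (1.0 - t * a - t * b * e) - 2.0 * b * (a + b) * t * t * d * e


def optimal_timer(t, a, b, lo=1e-4, hi=1.0, tol=1e-12, max_iter=200):
    """Find D > 0 with Y(D) = 0 by Newton-Raphson, safeguarded by bisection."""
    # Y(0) is always 0, so the search starts slightly above zero.
    if y(lo, t, a, b) >= 0.0 or y(hi, t, a, b) <= 0.0:
        return 0.0
    d = 0.5 * (lo + hi)
    for _ in range(max_iter):
        value = y(d, t, a, b)
        if value == 0.0:
            return d
        # Keep the root bracketed: Y is negative at lo and positive at hi.
        if value < 0.0:
            lo = d
        else:
            hi = d
        slope = y_prime(d, t, a, b)
        candidate = d - value / slope if slope != 0.0 else lo
        if not (lo < candidate < hi):
            candidate = 0.5 * (lo + hi)  # fall back to bisection
        if abs(candidate - d) < tol:
            return candidate
        d = candidate
    return d


if __name__ == "__main__":
    # A and B as in Figure 6; the rates are picked for illustration.
    for tps in (20.0, 60.0, 80.0):
        print(tps, optimal_timer(tps, 0.010, 0.005))
```

The bracket matters because D = 0 is itself a root of Y; starting Newton-Raphson too close to zero can converge to that trivial root rather than to the optimal timer value.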


B) References

[1] HELLAND, PAT, "The Transaction Monitoring Facility (TMF)", IEEE Database Engineering, June 1985.

[2] GAWLICK, DIETER, AND KINKADE, DAVID, "Varieties of Concurrency Control in IMS/VS Fast Path", IEEE Database Engineering, June 1985.

[3] ANON, ET AL., "A Measure of Transaction Processing Power", Tandem Technical Report TR 85.2, February 1985.

[4] BORR, ANDREA, "Transaction Monitoring in ENCOMPASS: Reliable Distributed Transaction Processing", Proc. Seventh International Conference on Very Large Data Bases, September 1981.

[5] BORR, ANDREA, "Robustness to Crash in a Distributed Database: A Non Shared-Memory Multi-Process Approach", Proc. Tenth International Conference on Very Large Data Bases, August 1984.


Distributed by TANDEM

Corporate Information Center
10400 N. Tantau Ave., LOC 248-07
Cupertino, CA 95014-0708