Copyright © 2003 – 2013 by Curt Hill
Transaction Management
An Overview

Transaction
• Any one execution of a user program
– A sequence of SQL statements
– A program that accesses the DBMS to accomplish a similar action
• Mostly interested in transactions that change the database
– An interaction with queries is also interesting
• A transaction is the unit of interest for concurrent execution and recovery

Examples
• Buying a product
– Finding how many there are
– Removing the sold ones from stock
– Problem is two requests that appear simultaneously and cannot both be satisfied
• Complicated queries
– Any query where a single table appears more than once in the From
– The multiply referenced tables should be the same even with concurrent updates

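The product-buying problem above can be sketched in a few lines of Python. This is only an illustrative simulation, not DBMS code: the function names `buy_serial` and `buy_interleaved` are hypothetical, and a plain integer stands in for the stock row.

```python
def buy_serial(stock, wanted_a, wanted_b):
    """Request A finishes completely before request B starts."""
    ok_a = stock >= wanted_a
    if ok_a:
        stock -= wanted_a
    ok_b = stock >= wanted_b
    if ok_b:
        stock -= wanted_b
    return ok_a, ok_b, stock


def buy_interleaved(stock, wanted_a, wanted_b):
    """Both requests read the stock level before either one updates it."""
    seen_a = stock  # A finds how many there are
    seen_b = stock  # B finds how many there are (the same value)
    ok_a = seen_a >= wanted_a
    ok_b = seen_b >= wanted_b
    if ok_a:
        stock = seen_a - wanted_a  # A removes the sold ones
    if ok_b:
        stock = seen_b - wanted_b  # B overwrites A's result using its stale read
    return ok_a, ok_b, stock
```

With one item in stock and two buyers who each want one, the serial version refuses the second buyer, while the interleaved version tells both buyers they succeeded.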
The acronym ACID
• Atomic
– Transaction perceived to be indivisible
• Consistent
– Transforms database from one consistent state to another
• Isolated
– Understandable without regard to any other agents or transactions
• Durable
– Once committed, permanence is guaranteed even with system crashes

Atomic
• Either all the actions are applied or none of them are applied
• No incomplete actions are allowed
• Since a transaction is made up of many smaller actions:
– Some of these may be done before a problem occurs
– The DBMS must be able to undo any of the smaller pieces of a transaction if the entire transaction is aborted

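One way to picture "undo any of the smaller pieces" is an undo log that remembers the old value of everything a transaction writes. The sketch below is an assumption-laden toy: `TinyTransaction` is a hypothetical class, and a Python dict stands in for the database. Real systems log to disk and handle far more cases.

```python
_MISSING = object()  # marks "this key did not exist before the write"


class TinyTransaction:
    """Records the old value of everything it writes so it can be undone."""

    def __init__(self, db):
        self.db = db        # a plain dict standing in for the database
        self.undo_log = []  # (key, previous value) pairs, oldest first

    def write(self, key, value):
        self.undo_log.append((key, self.db.get(key, _MISSING)))
        self.db[key] = value

    def commit(self):
        self.undo_log.clear()  # the changes stay; nothing is left to undo

    def rollback(self):
        # Undo newest-first so the oldest saved value is restored last.
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is _MISSING:
                self.db.pop(key, None)
            else:
                self.db[key] = old
```

After `rollback()` the dict looks as if none of the writes had happened, which is the "all or none" behaviour the slide describes.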
Errors
• What causes a transaction to fail?
• Expected problem
– The withdrawal amount exceeds what the ATM or the source account holds
– The transaction aborts itself
• Transient problem
– For reasons seen later, the DBMS aborts the transaction and restarts it later
• System error
– Disk failure, power failure, among others

Durability
• A transaction may either commit or roll back
• Committed transactions must be durable
• Rolled back transactions must be completely undone
– As if they were never executed at all
– Queries are no problem, updates are
• Once a transaction executes a commit or rollback, that outcome must hold even if the system crashes

Consistency
• Two domains:
– A database is consistent if all constraints are met
– A database is consistent if it correctly models some real-world situation
• The first is met by the normal checking of the DBMS
• The second is the responsibility of the transaction

Consistency example
• Transferring money from one account to another
– Removing the money from one account leaves the database inconsistent in the second sense
– This is corrected when the money is added to the second account
• Both actions must be in the same transaction

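The transfer can be tried out with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions: the `account` table, its columns, and the helper names `open_bank`, `total`, and `transfer` are all invented for illustration, and the `fail_midway` flag simulates a problem between the two updates.

```python
import sqlite3


def open_bank():
    """Create a throwaway in-memory database with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 500), ("b", 100)])
    conn.commit()
    return conn


def total(conn):
    """The real-world invariant: money is neither created nor destroyed."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money inside one transaction; returns True if it committed."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two updates")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False
```

Because both updates sit in the same transaction, the failed transfer leaves the total unchanged. Splitting them into two transactions would let the failure strand the database in the inconsistent middle state.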
Isolation
• Interleaving of the actions within several different transactions may occur
• The DBMS must guarantee that the result will be the same as if they were completely serialized
• This interleaving is from concurrent execution
• Without interleaved execution, performance will be poor

Transaction Schedules
• A transaction is a list of actions
• The actions include:
– Reads and writes of tuples
– Commit or rollback commands
– Others: arithmetic, comparisons, etc.
• Two lists may only interact through the reads and writes
• A transaction schedule is an ordering of all the actions from a group of transactions

Schedules
• A schedule interleaves the actions of several transactions
• A complete schedule has all the actions of all the transactions
• A serial schedule removes interleaving
• Isolation dictates that the serial schedule ends with the same result as an interleaved schedule

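A schedule can be written down as a plain list of steps, which makes "serial" easy to check. The representation here is an assumption made for illustration: each step is a `(transaction, action, item)` tuple, and `is_serial` and `serialize` are hypothetical helper names.

```python
def is_serial(schedule):
    """True when no transaction resumes after another one has started."""
    finished = set()
    current = None
    for txn, _action, _item in schedule:
        if txn != current:
            if txn in finished:
                return False  # txn ran, was interrupted, and is running again
            if current is not None:
                finished.add(current)
            current = txn
    return True


def serialize(schedule, order):
    """Rebuild the schedule with each transaction run to completion, in `order`."""
    return [step for txn in order for step in schedule if step[0] == txn]
```

For example, `[("T1", "R", "A"), ("T2", "R", "X"), ("T1", "W", "A"), ("T2", "W", "X")]` is interleaved, and `serialize(..., ["T1", "T2"])` gives the serial schedule in which T1 finishes before T2 begins.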
Concurrency
• Can we afford a serial schedule?
– One transaction completely finished before the next one is begun
• No
– The impact on performance is too large
• Instead
– We have to handle the transactions concurrently
• The issues are discussed in the concurrency presentation

Serializability
• A serializable schedule has an equivalent effect to a serial schedule
• The serializable schedule allows interleaved operations, while the serial schedule does not
• The idea is that we get concurrent execution and consistent results
• Consider some examples using two transactions

Notes on Examples
• We have two transactions, T1 and T2
• Each does two reads and writes
• These will be denoted by R(A) and W(A)
– Read A (a page)
– Write A (a page)

Example 1 – No Commonality
T1 T2
R(A)
R(Y)R(B)
R(X)W(A)
W(B)
W(X)
W(Y)

No Commonality
• Easy example
• Since there are no pages in common, any interleaving works
– Provided the order is maintained on each side
• Painless and free concurrency
• Somewhat more difficult if the Write is an insert in a B+Tree index which causes splits
– Then the pages could be different and still interfere

Example 2 – Commonality
T1 T2
R(X)
R(Y)
R(Y)
R(X)W(X)
W(Y)
W(X)
W(Y)

Commonality
• This serializes the same as T1 completed and then T2
• The order of this could change and give a different serial schedule
• Such as the following

Example 2 – Again
T1 T2
R(X)
R(Y)
R(Y)
R(X)W(X)
W(Y)
W(X)
W(Y)

Again
• This serializes the same as T2 completed and then T1
• Either of these is acceptable
• All we have to do is guarantee that our interleaved schedule is equivalent to some serial schedule, not a particular serial schedule
• We do not care which customer gets the product as long as the game is fair

Non-Serializable Schedule
T1 T2
R(X)
R(Y)
R(Y)
R(X)
W(X)
W(Y)
W(X)
W(Y)

No Equivalent Serial Schedule
• T2’s write of X is lost
• T1’s write of Y is also lost
• This assumes that the entire page is given to or received from the buffer pool
• There are many other interleavings that also lose something
• This lost update is one of several concurrent execution anomalies

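A standard way to test for an equivalent serial schedule is the precedence graph used for conflict serializability. This is a stricter test than the "equivalent effect" definition used in these slides, so it is brought in here as an assumption. The sketch adds an edge Ti → Tj whenever an action of Ti comes before a conflicting action of Tj (same item, different transactions, at least one write) and then looks for a cycle. Steps are hypothetical `(transaction, action, item)` tuples, and the sample schedules in the usage note are invented rather than copied from the slide tables.

```python
def precedence_edges(schedule):
    """Edges (Ti, Tj): some action of Ti precedes a conflicting action of Tj."""
    edges = set()
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (a1, a2):
                edges.add((t1, t2))
    return edges


def is_conflict_serializable(schedule):
    """True when the precedence graph has no cycle."""
    edges = precedence_edges(schedule)
    remaining = {txn for txn, _action, _item in schedule}
    while remaining:
        # Transactions that no remaining transaction must precede.
        free = [
            n for n in remaining
            if not any(src in remaining and dst == n for src, dst in edges)
        ]
        if not free:
            return False  # every remaining node has a predecessor: a cycle
        remaining -= set(free)
    return True
```

A lost-update schedule such as `[("T1","R","X"), ("T2","R","X"), ("T1","W","X"), ("T2","W","X")]` produces edges in both directions between T1 and T2, so the function returns `False`. The same four actions run serially produce only T1 → T2 and pass.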
Interleaved Execution Anomalies
• The above is similar to updating a shared variable in memory
• Databases allow both commit and rollback operations which cause further problems
• These are unlike the shared memory problem

Four combinations
• RR
– Two separate transactions doing a Read
– This is the only case that is always harmless
• RW
– T1 wants to Read and T2 wants to Write the page
• WR
– T1 wants to Write and T2 wants to Read the page
• WW
– Both want to write the same page
– This was the first example seen

Reading Uncommitted Data
• Uncommitted data is any data modified by a transaction that has not yet committed
• The modification may be in the past or future
• Reading uncommitted data is called a dirty read

An Aborted Transaction Problem
T1 T2
R(X)
R(X)W(X)
Rollback
T1 T2
R(X)
R(X)W(X)
RollbackW(X)

Rollback Problems
• Notice in the previous example that T1, the transaction that had problems, was a query, not an update
• This is an example of a WR problem
• We do not need a rollback to cause the problem
• Consider the next situation where each record has $10,000
– T1 increases this by 10% and T2 increases it by $1000

A WR Problem
T1 T2
R(X)
R(Y)
R(Y)
R(X)W(X)
W(Y)
W(X)
W(Y)Commit
Commit

Results of the WR anomaly
• The X record gets increased by 2000
– 2000 = 10000*10% + 1000
• The Y record gets increased by 2100
– 2100 = 1000 + (10000+1000)*10%
• Not consistent or serializable
• T2’s read of X and T1’s read of Y are dirty reads

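The arithmetic can be checked directly. In this small sketch the function names are hypothetical and balances are kept as whole dollars. It shows that the two serial orders each treat X and Y alike, while the anomalous schedule applies the updates in opposite orders to the two records.

```python
def add_ten_percent(balance):  # T1's update, in whole dollars
    return balance + balance * 10 // 100


def add_thousand(balance):  # T2's update
    return balance + 1000


START = 10_000

# Either serial order gives X and Y the same final value.
serial_t1_t2 = add_thousand(add_ten_percent(START))  # T1 then T2: 12000
serial_t2_t1 = add_ten_percent(add_thousand(START))  # T2 then T1: 12100

# The anomalous schedule: T1 then T2 on X, but T2 then T1 on Y.
x_after = add_thousand(add_ten_percent(START))
y_after = add_ten_percent(add_thousand(START))
```

Here `x_after` rises by 2000 and `y_after` by 2100. No serial order of T1 and T2 leaves two records that started equal with different balances.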
RW Problems
• An RW problem can result in an unrepeatable read
• Two reads in a row that yield different results

An Unrepeatable Read
T1 T2
R(X)
R(X)
R(X)
W(X)
W(X)Commit
Commit

What’s the fix?
• Lock an object in one of several ways to prevent this type of interleaving
• The most common protocol is Strict Two Phase Locking
• AKA Strict 2PL
• There are other protocols as well

Types of Locks
• Shared lock
– Used for reading
• Exclusive lock
– Used for writing but also allows reading
• The lock may be on a:
– Tuple
– Page
– Relation
– This is its granularity
• The larger the granularity, the less concurrency but the easier the locks are to manage

Strict 2PL Rules
• A transaction requests a lock when it desires to access the object
– Shared lock for reads and exclusive for writes
• It holds all locks until complete
– Either a commit or rollback
• These locks are inserted by the DBMS, not necessarily observed in the transaction
• A transaction that cannot get the lock is suspended until the item is available
• If both reading and writing are desired, get the exclusive lock

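The rules above can be sketched as a small lock table. This is a single-threaded toy under stated assumptions: `LockTable` is a hypothetical class, `request` returns `False` where a real DBMS would suspend the transaction, and `release_all` stands for the release that happens only at commit or rollback.

```python
class LockTable:
    """Tracks which transactions hold shared (S) or exclusive (X) locks."""

    def __init__(self):
        self.holders = {}  # item -> {transaction: "S" or "X"}

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False if txn must wait."""
        held = self.holders.setdefault(item, {})
        others = [m for t, m in held.items() if t != txn]
        if mode == "S":
            granted = all(m == "S" for m in others)  # S is compatible with S
        else:
            granted = not others  # X is compatible with nothing
        if granted and held.get(txn) != "X":
            held[txn] = mode  # new lock or upgrade; never downgrade X to S
        return granted

    def release_all(self, txn):
        """Strict 2PL: every lock is dropped together at commit or rollback."""
        for held in self.holders.values():
            held.pop(txn, None)
```

Two readers can share an item, but a writer gets it only when nobody else holds any lock on it, and nothing is given back until the holder finishes.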
Some examples revisited
• Two new commands are added
• S(A) gets a shared lock on A
• X(A) gets an exclusive lock on A
• These locks are held until a commit or rollback

WW Problem with Locks
T1 T2
R(X)
R(Y)
R(Y)
R(X)
W(X)
W(Y)
W(X)
W(Y)
X(X)X(X)
Commit
X(Y)
X(Y)
Commit
Suspended until Commit

WR Problem with locks
T1 T2
R(X)
R(X)W(X)
Rollback
X(X)S(X)
Shared lock causes suspension until rollback

Another WR
T1 T2
R(X)
R(Y)
R(Y)
R(X)
W(X)
W(Y)
W(X)
W(Y)Commit
Commit
X(X)
X(X)
Exclusive lock suspended until commit occurs
DeadLock
T1 T2
R(X)
R(Y)W(X)
W(Y)
X(X)
X(Y)
X(Y)X(X)
Suspended since T2 has Y Suspended since T1 has X
Both are now deadlocked

What do we do?
• The usual solution is timers
• When a transaction’s suspension for a lock exceeds a threshold value
– Abort it
– Roll back all actions
– Restart the whole transaction

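The timer approach can be sketched with the timeout that Python's `threading.Lock.acquire` already supports. Everything else here is an assumption for illustration: `run_with_retry` is a hypothetical helper, `work` stands for the transaction body, and `rollback` stands for whatever undoes its partial effects.

```python
import threading


def run_with_retry(lock, work, rollback, timeout_s, max_attempts):
    """Try to run `work` under `lock`; abort and restart when the wait times out.

    Returns (attempt number, result of work). Raises TimeoutError if every
    attempt times out.
    """
    for attempt in range(1, max_attempts + 1):
        if lock.acquire(timeout=timeout_s):
            try:
                return attempt, work()
            finally:
                lock.release()
        # Waited longer than the threshold: abort, undo, then restart.
        rollback()
    raise TimeoutError("gave up after %d attempts" % max_attempts)
```

A timer cannot tell a genuine deadlock from a long wait, so the threshold is a tuning choice: too short aborts healthy transactions, too long leaves deadlocked ones stuck.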
Performance
• What does locking do to the performance of a DBMS?
• It must slow the DBMS
• This is better than incorrect results
• It will slow serializable schedules that otherwise had no problems
– This disregards lock bookkeeping
• Consider a previous example

A Serializable Schedule T1 T2
R(X)
R(Y)
R(Y)R(X)
W(X)
W(Y)W(X)
W(Y)Commit
Commit

Commentary
• This schedule was equivalent to the serial schedule of T1 followed by T2
• It also had nice concurrency
• Introducing locking preserves correctness but destroys concurrency

A Serialized Schedule
T1 T2
R(X)
R(Y)
R(Y)
R(X)
W(X)
W(Y)
W(X)
W(Y)Commit
Commit
X(X)
X(X)X(Y)
X(Y)
Lock prevents any action until T1’s commit

SQL
• The transaction statements of SQL are:
– Commit
– Rollback
– Begin
• Not needed for simple queries
• Commit is the default for changes if neither Commit nor Rollback is given

Begin
• Begin is used to show boundaries
• Form is:
BEGIN;
or
BEGIN TRANSACTION
• The transaction is then terminated by the Commit or Rollback

Example
• BEGIN;
INSERT INTO course VALUES ('PHYS', 141, 4, 'College Physics');
INSERT INTO course VALUES ('CSCI', 242, 3, 'Data Structures');
COMMIT;
• The more common usage is through a program which determines whether to commit or roll back

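A program that decides between commit and rollback might look like the following sketch, which uses Python's standard `sqlite3` module. The column names of `course` and the helpers `open_catalog` and `add_courses` are assumptions; the slides do not give the table definition. A primary key is added so that a duplicate course triggers the rollback path.

```python
import sqlite3


def open_catalog():
    # isolation_level=None leaves BEGIN/COMMIT/ROLLBACK entirely to our code.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE course (dept TEXT, number INTEGER, hours INTEGER,"
        " title TEXT, PRIMARY KEY (dept, number))"
    )
    return conn


def add_courses(conn, rows):
    """Insert all rows or none of them; returns True if committed."""
    conn.execute("BEGIN")
    try:
        for row in rows:
            conn.execute("INSERT INTO course VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # also discards rows inserted before the error
        return False
    conn.execute("COMMIT")
    return True
```

If the second of two inserts fails, the first is rolled back with it, so the table never holds half of a batch.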
Save Points
• A save point is a named point at which a rollback can stop
• Example:
SAVEPOINT xyz
…
ROLLBACK TO SAVEPOINT xyz
• This avoids rolling all the way back to BEGIN
• This was introduced in SQL:1999

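The effect of a save point can be seen with SQLite, which accepts the same `SAVEPOINT` and `ROLLBACK TO SAVEPOINT` statements. This is a minimal sketch: the `note` table and the function name `savepoint_demo` are made up for illustration.

```python
import sqlite3


def savepoint_demo():
    """Return the rows that survive a partial rollback."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
    conn.execute("CREATE TABLE note (body TEXT)")
    conn.execute("BEGIN")
    conn.execute("INSERT INTO note VALUES ('before savepoint')")
    conn.execute("SAVEPOINT xyz")
    conn.execute("INSERT INTO note VALUES ('after savepoint')")
    conn.execute("ROLLBACK TO SAVEPOINT xyz")  # undoes only the second insert
    conn.execute("COMMIT")
    return [body for (body,) in conn.execute("SELECT body FROM note")]
```

Only the row inserted before the save point survives the commit; the work done after `SAVEPOINT xyz` is discarded without abandoning the whole transaction.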
Granularity
• The size of the thing to lock
– Tuple
– Page
– Relation
• Tradeoff
– Locking large things impedes concurrency
– Locking small things requires more overhead and allows bad things to happen

Phantom Read Problem
• Two reads with different results
• T1 queries a table and finds all rows matching certain criteria
• T1 takes an exclusive lock on all of these
• T2 inserts a new record which matches the criteria
• T1 now re-queries with the same criteria and gets a different set
• Putting a shared lock on the whole table solves the problem but limits concurrency

Recovery
• The Recovery Manager is responsible for several important issues
• Removing the changes of rolled back transactions from the file system
• Guaranteeing that committed changes are preserved in the file system
• Defending against crashes and rebuilding the database when one occurs
• Recovery management is covered in a subsequent presentation

Closing Thoughts
• ACID is the ideal of database reliability
• This is not always possible in distributed databases
– Precluded by the CAP theorem
– There we settle for BASE