Advanced Transaction Management
        Aug. 2               Aug. 3              Aug. 4                   Aug. 5               Aug. 6
9:00    Intro & terminology  TP mons & ORBs      Logging & res. Mgr.      Files & Buffer Mgr.  Structured files
11:00   Reliability          Locking theory      Res. Mgr. & Trans. Mgr.  COM+                 Access paths
13:30   Fault tolerance      Locking techniques  CICS & TP & Internet     CORBA/EJB + TP       Groupware
15:30   Transaction models   Queueing            Advanced Trans. Mgr.     Replication          Performance & TPC
18:00   Reception            Workflow            Cyberbricks              Party                FREE
Chapter 13
© Jim Gray, Andreas Reuter, Transaction Processing - Concepts and Techniques, WICS, August 2 - 6, 1999
Outline
Mixing heterogeneous TMs
High-Availability Commit & Transfer of Commit
Optimizing Commit
Disaster Protection via Data/Application Replication
Mixing Transaction Managers
Four standards:
  LU 6.2 ~ APPC ~ CPIC ~ CICS: the de facto TP standard
  X/Open + OSI/TP: the de jure TP standard
  OTS: the CORBA standard
  TIP: de facto interoperability standard
Almost everyone interoperates with LU6.2.
LU6.2 has evolved: presumed abort, no reuse of aborted trids, ... other fixes.
LU6.2 is "open" two-phase commit: documented interface; reconnection / resolve is documented.
Internally, everyone uses private protocols with many tricks.
Mixing "OLD" Transaction Managers
Many old TP monitors are not open:
  They do not expose 2PC (prepare() and commit())
  => they insist on being the root commit coordinator.
All will become X/Open-compliant eventually and thus be open TP monitors.
If stuck with a "closed" TM, can still get atomicity if:
  1. Only one closed TM is involved.
  2. The TM is direct, not queued.
Mixing with a Closed Transaction Manager
Once all "open" TMs and RMs are prepared, the closed TM does the "RUMP":
deferred_update(int id, complex_type list_of_updates)       /* rump logic                    */
{   int count;
    Begin_Work();                                            /* start a new transaction       */
    select count(*) into :count from done where id = :id;    /* test if work was done         */
    if (count == 0) {                                        /* if not done                   */
        do list_of_updates;                                  /* then do the list of updates   */
        insert into done values (:id);                       /* and flag the transaction done */
    }
    Commit_Work();                                           /* commit update and flag        */
    acknowledge;                                             /* reply success to caller       */
}                                                            /* in both cases                 */
Status_Transaction(TRID trid)                                /* was this transaction done?    */
{   int ans;
    select count(*) into :ans from done where id = :trid;    /* the id stored in done is the trid */
    return ans; }
[Figure: Transaction Gateway to Closed Transaction Mgr.
 Gateway side: Do Transaction: while not acknowledged, send trid + data, wait.
 Closed TM side: if not duplicate, do transaction, insert trid in Done Table, commit; acknowledge.]
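The rump logic and the gateway's resend loop can be tried out with Python's standard sqlite3 module. This is only a minimal sketch under stated assumptions, not the authors' implementation: the `account` table, the layout of the `done` table, and the shape of `updates` as (sql, params) pairs are hypothetical stand-ins for the slide's list_of_updates.

```python
import sqlite3

def deferred_update(conn, trid, updates):
    """Apply `updates` at most once for `trid`, then acknowledge.

    `updates` is a list of (sql, params) pairs (a hypothetical encoding
    of the slide's list_of_updates).
    """
    with conn:  # one transaction: commit on success, roll back on error
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM done WHERE id = ?", (trid,)).fetchone()
        if count == 0:                       # work was not done yet
            for sql, params in updates:
                conn.execute(sql, params)
            conn.execute("INSERT INTO done VALUES (?)", (trid,))
    return "ack"                             # reply success in both cases

def status_transaction(conn, trid):
    """Return 1 if the transaction `trid` committed its rump, else 0."""
    (ans,) = conn.execute(
        "SELECT COUNT(*) FROM done WHERE id = ?", (trid,)).fetchone()
    return ans

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE done (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES ('a', 100)")
    conn.commit()
    debit = [("UPDATE account SET balance = balance - ? WHERE name = ?", (10, "a"))]
    deferred_update(conn, 7, debit)
    deferred_update(conn, 7, debit)          # gateway resend: detected as duplicate
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE name = 'a'").fetchone()
    return balance, status_transaction(conn, 7), status_transaction(conn, 8)
```

Because the update and the insert into `done` commit together, a resend after a lost acknowledgement finds the trid in `done` and only re-acknowledges; in `demo()` the debit is applied once even though the request arrives twice.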
Mixing Open Transaction Managers
Gateway translates between external and internal TRID.
Gateway translates between external and internal protocols.
Gateway participates in transaction resolution (it is a TM in both worlds).
[Figure: a Transaction Gateway sits between "our" Transaction Manager (local protocol) and the "foreign" Transaction Managers (OSI protocol stack). Its Trid Map Table pairs each foreign trid ("his trid") with a local trid ("our trid").]
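The trid map table can be pictured as a two-way dictionary. The sketch below is illustrative only: the class name `TridGateway`, its method names, and the "local-N" naming of local trids are assumptions, and protocol translation and logging are left out.

```python
import itertools

class TridGateway:
    """Maps foreign ("his") trids to local ("our") trids and back."""

    def __init__(self):
        self._next_local = itertools.count(1)
        self._to_local = {}     # foreign trid -> local trid
        self._to_foreign = {}   # local trid -> foreign trid

    def import_trid(self, foreign_trid):
        """Return the local trid for a foreign one, allocating it on first sight."""
        if foreign_trid not in self._to_local:
            local = f"local-{next(self._next_local)}"
            self._to_local[foreign_trid] = local
            self._to_foreign[local] = foreign_trid
        return self._to_local[foreign_trid]

    def export_trid(self, local_trid):
        """Return the foreign trid that a local trid stands for."""
        return self._to_foreign[local_trid]

    def resolve(self, foreign_trid, outcome):
        """Translate a foreign commit/abort decision to the local trid
        and drop the mapping once the transaction is resolved."""
        local = self._to_local.pop(foreign_trid)
        del self._to_foreign[local]
        return local, outcome
```

A foreign trid that arrives twice at the same gateway gets the same local trid here; a trid that enters through two different gateways would still look like two local transactions.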
Mixing Open Transaction Managers
Multiple entry problem:
  TRID enters system twice via two different paths.
  "Works", but looks like two separate transactions.
  Commit dependency is external to the system.
Fancy option problem:
  External/internal TM has an option the other does not.
  Fake (or turn off) optimizations/options not supported by one side or the other.
Non-Blocking Commit
The problem: what if the coordinator fails?
Solutions:
  1. wait
  2. appoint a new coordinator
Appointment can be thought of as a process pair (n-plex)
Works great in a cluster (no communications failures).
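[Figure: non-blocking commit with the coordinator as a process pair. Columns: Primary, Backup, Participants. Messages shown: Prepare (+ list of participants and sessions) / ack between primary and backup; Prepare / Prepared with the participants; Commit / ack; Commit / Committed; Complete / ack. Log actions shown: write commit log record; write "Complete" log record.]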
Non-Blocking Commit in a WAN: 3-Phase or Heuristic or Operator Command
Wide area net can partition.
Process pairs cannot reliably decide to take over.
Solution(s):
1. Three phase protocol
   Broadcast participant list and decision as part of phase 1.5; let a majority of participants decide if the coordinator fails.
2. Heuristic decisions
   Default to commit/abort.
   Announce Heuristic Mismatch at reconnect if wrong guess.
3. Human decision
   Announce Operator Mismatch at reconnect if wrong guess.
Transfer of Commit
What if a participant
is more secure than the coordinator?
is more reliable than the coordinator?
is faster than the coordinator?
Transfer commit authority to him?
[Figure: the same commit tree shown twice, with nodes Gas Pump, LA Bank, Visa, and SF Bank.]
Transfer of Commit
Is also an optimization:
saves messages if done as part of commit.
called nested commit protocol
or last resource manager optimization
2 messages vs 5 messages (plus one lazy msg)
[Figure: message flows for the two cases, No Transfer of Commit and Transfer of Commit. Labels shown: Begin, work request (or work request + "You are Root!"), Dequeue, doit, Enqueue, Commit_Work(), Prepare, Commit, Phase 2 Commit, complete.]
Transfer of Commit: More Complex Case
More complex if the root has more than one branch:
Need to set up new sessions among "trusted" nodes
root sends new root name to all participants at phase 1
[Figure: three-node example with nodes Libya, US, and Deutschland.]
Optimizing Commit
Can optimize:
  Delay: milliseconds/commit
  Message cost: number, size, urgency of messages
  IO cost: number, size, or urgency of IO
  CPU cost: cycles used
  Throughput: maximum commit rate.
Commit: the General Case
Prepare():
  1 RPC or message pair per RM and one per non-root TM
  1 forced IO per RM (prepare record)
  1 forced IO per TM (commit record)
Commit(): the same.
Summary of 2PC cost:
  IO: 2(RM+TM)
  RPCs: 2(RM+(TM-1))
  Messages: 4(RM+(TM-1)) (equivalent to RPCs)
  Delay: 2 IO ~ 50ms ~ 10K ins.
         4 msg ~ 20ms ~ 50K ins.
         50ms*(RM+TM) + 20ms*(RM+TM-1)
These are the error-free counts (i.e. the minimum values)
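As a worked example, the summary formulas can be evaluated for a given configuration. The function below is a small sketch that simply transcribes them; the function name and the parameter names `io_pair_ms` and `msg_quad_ms` are made up for illustration, with defaults taken from the slide (2 IOs ~ 50ms, 4 messages ~ 20ms).

```python
def two_phase_commit_cost(rm, tm, io_pair_ms=50, msg_quad_ms=20):
    """Error-free 2PC cost for `rm` resource managers and `tm` transaction managers."""
    ios = 2 * (rm + tm)                  # IO: 2(RM+TM)
    rpcs = 2 * (rm + (tm - 1))           # RPCs: 2(RM+(TM-1))
    messages = 4 * (rm + (tm - 1))       # Messages: 4(RM+(TM-1))
    delay_ms = io_pair_ms * (rm + tm) + msg_quad_ms * (rm + tm - 1)
    return {"ios": ios, "rpcs": rpcs, "messages": messages, "delay_ms": delay_ms}
```

For one TM and two RMs this gives 6 forced IOs, 4 RPCs (8 messages), and a delay of 50*3 + 20*2 = 190 ms.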
Commit: Simple Optimizations
Presumed abort saves a TM IO (implicit in protocol above)
Do phase 1, phase2 in parallel (saves delay)
Common log (saves RM log forces)
IO: 2(TM)
Messages: 4(RM+TM-1) (equivalent to RPCs)
Delay: 2*IO*TM + 4*M*(RM+TM-1)
~50ms*TM+40ms*(RM+TM-1)
Use Local RPC (10x faster)
~ 50ms*TM + RM + 40ms*(TM-1)
Use WADS for low IO latency (3ms vs 25ms)
~ 6ms*TM + RM + 40ms*(TM-1)
Simple case of 1 TM 2 RM:
~ 8ms delay for a commit.
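The three delay estimates above can be checked with a few lines of arithmetic. This sketch transcribes the slide's formulas; the function and flag names are hypothetical, and reading the bare "RM" term as about 1 ms per local RPC is an assumption.

```python
def commit_delay_ms(rm, tm, local_rpc=False, wads=False):
    """Approximate commit delay after the simple optimizations."""
    io_ms = 6 if wads else 50       # forced-IO delay per TM (WADS: 2 * 3ms)
    remote_ms = 40                  # message delay per remote partner
    if local_rpc:
        # RMs are reached by local RPC, costed at ~1 ms each (assumption);
        # only the tm - 1 non-root TMs still pay the remote message cost.
        return io_ms * tm + 1 * rm + remote_ms * (tm - 1)
    return io_ms * tm + remote_ms * (rm + tm - 1)
```

For 1 TM and 2 RMs the estimate drops from 130 ms to 52 ms with local RPC, and to 8 ms with local RPC plus WADS, which matches the figure quoted on the slide.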
Group Commit Optimization
Amortizes IO and messages across several transactions
Adds delay
If N transactions in a group:
IO, Message cost per transaction is ~ 1/N
Small extra delay if one slow step in original path.
As system heats up (commit rate rises) to 25tps
start to install group commit with a 30ms threshold
(at 100tps: 3.3 trans/group).
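The amortization can be illustrated with a toy log that forces once per group. This is a sketch, not a real log manager: the class `GroupCommitLog` is hypothetical, it closes a group by size rather than by the slide's time threshold, and `flush()` stands in for the timer expiring.

```python
class GroupCommitLog:
    """Buffers commit records and forces them to disk one group at a time."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.pending = []       # commit records waiting for the next force
        self.durable = []       # commit records already forced
        self.forced_ios = 0     # number of forced log writes so far

    def commit(self, trid):
        """Buffer a commit record; force the log when the group is full.

        Returns the trids made durable by this call (possibly none).
        """
        self.pending.append(trid)
        if len(self.pending) >= self.group_size:
            return self.flush()
        return []

    def flush(self):
        """Force all buffered records with a single IO (timer expiry in a real system)."""
        if not self.pending:
            return []
        group, self.pending = self.pending, []
        self.forced_ios += 1
        self.durable.extend(group)
        return group
```

With a group size of 3, seven commits followed by one timer flush cost 3 forced IOs instead of 7; the price is that early members of a group wait for the group to close.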
Simple Commit Optimizations
Read-only: just get phase1 call to release locks.
Note: may violate ACID, should release read locks
at phase 2 if any locks acquired during phase 1.
Saves messages (Phase 2) and IO (no RM IO).
True read-only transaction must prepare at phase 1
unlock at phase 2.
Unjoin: RM does no work at commit/abort.
Lazy: user-requested group commit. Piggybacks on others.
no extra IO or messages.
Transaction Commit Trees
[Figure: four commit-tree shapes built from TM/RM nodes, each labeled with the optimizations that apply: one node (share log, LRPC); deep (transfer commit); bush (parallel, transfer); general case (parallel, transfer).]
Transfer of COMMIT: Linear COMMIT
Parent and other sub-trees prepare
then transfer commit authority to remaining child.
Last in chain becomes commit coordinator.
More delay, fewer messages
For N=2, Same delay, 3 vs 4 messages.
Always use it.
[Figure: commit trees of TM/RM nodes illustrating linear commit.]
Disaster Recovery at a Remote Site
Replicate data, applications, and network connections at 2 (or more) sites.
Symmetric design:
Either site can process transactions
Asymmetric design:
One site is master of each data item.
Allows: Caching
Batching of updates at backup
So far, asymmetric design is most popular.
To get symmetry, have each node master 1/2 of the db/net.
Sample Physical LOG RECORD
Basic idea of asymmetric design:
send log from primary to backup
backup applies log to its copy
backup is in constant media recovery
backup processes/sessions/data ready to take over
[Figure: system-pair configurations.
 System Pairs, Basic Idea: a client session to a primary, which sends its log to a backup.
 Symmetric: two system pairs; each site is primary for one and backup for the other.
 Hub: a central site backs up several primaries.
 Vault: the backup stores log and archive dumps.]
Sample Physical LOG RECORD
Need some way to decide failure.
Easy in a cluster
Hard in a WAN (partition possible)
Solutions: Extra wires
Wires on demand (dialup)
Human (operator)
Quorum device.
Kind of log?
Logical log is best:
  loose coupling (allows backup to be a different TM/RM)
  failure independence (different from physiological log)
Takeover Logic
/* initialization */
Tell primary I'm here
Setup all RMs and application processes
Open all initial sessions to clients.
/* the main backup loop */
While (not primary) {redo log}
/* Takeover */
redo rest of log
resend most recent message on each session
abort any incomplete transactions
/* Become Primary */
tell application processes to start accepting requests.
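The redo loop and the takeover step can be sketched over a toy key-value database. Everything here is an assumption made for illustration: the class name `Backup`, the record shapes ("update", trid, key, old, new), ("commit", trid), and ("abort", trid), and the premise that the primary's locking keeps uncommitted transactions from touching the same key. Session handling is left out.

```python
class Backup:
    """Applies the primary's log to a local copy and can take over."""

    def __init__(self):
        self.db = {}
        self.active = {}          # trid -> [(key, old_value)] kept for undo
        self.is_primary = False

    def redo(self, record):
        """Apply one log record received from the primary."""
        kind, trid = record[0], record[1]
        if kind == "update":
            _, _, key, old, new = record
            self.active.setdefault(trid, []).append((key, old))
            self.db[key] = new
        elif kind == "commit":
            self.active.pop(trid, None)    # undo information no longer needed
        elif kind == "abort":
            self._undo(trid)

    def _undo(self, trid):
        """Restore old values of a transaction, newest change first."""
        for key, old in reversed(self.active.pop(trid, [])):
            if old is None:                # key did not exist before
                self.db.pop(key, None)
            else:
                self.db[key] = old

    def takeover(self, rest_of_log):
        """Redo the rest of the log, abort incomplete transactions, become primary."""
        for record in rest_of_log:
            self.redo(record)
        for trid in list(self.active):
            self._undo(trid)
        self.is_primary = True
```

After takeover the copy reflects exactly the committed transactions found in the log that reached the backup.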
Session Takeover
Just like process pairs.
Session sequence numbers eliminate duplicates.
So, get at-least-once delivery: resend msg at takeover.
[Figure: two ways to switch client sessions from primary to backup: through network switches, or through front ends that act as the switch. Networks: OSI, SNA, TCP/IP, X.25, etc.]
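Duplicate elimination by session sequence number fits in a few lines. This is a minimal sketch of the receiving end of one session; the class name `Session` and its fields are hypothetical, and it assumes messages on a session arrive in order.

```python
class Session:
    """Receiving end of one session: delivers each sequence number once."""

    def __init__(self):
        self.last_seen = 0        # highest sequence number delivered so far
        self.delivered = []

    def receive(self, seq, msg):
        """Deliver `msg` unless its sequence number was already seen."""
        if seq <= self.last_seen:
            return False          # duplicate, e.g. a resend at takeover: drop it
        self.last_seen = seq
        self.delivered.append(msg)
        return True
```

When the new primary resends the most recent message on each session, a client that already received it drops the copy, and a client that missed it delivers it.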
Catch-up After Failure
Failed node at restart executes normal restart
Then enters backup logic.
If both fail, outside observer must say who is best
backup has to match its log to new primary.
Design issue: are nodes bit-for-bit identical?
If so, backup must “trim” log to match primary.
How Safe?
1-SAFE: no extra delay, risks lost transactions
2-SAFE: extra delay (if backup is up),
single fault tolerant, high availability
VERY-SAFE: extra delay, no lost transactions, low availability
[Figure: commit message flow between client, primary, and backup for 1-Safe, 2-Safe, and Very Safe, in two situations: Both Up, and Primary Up with Backup Down. With the backup down, the Very Safe system is out of service.]
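The three policies differ only in whether the primary waits for the backup before answering the client. The decision table can be written down as a small function; the name `commit_policy` and the mode strings are assumptions for illustration.

```python
def commit_policy(mode, backup_up):
    """Return (outcome, waited_for_backup) for a commit at the primary."""
    if mode == "1-safe":
        # Answer the client at once; the log reaches the backup later.
        return ("committed", False)
    if mode == "2-safe":
        # Wait for the backup only if it is up.
        return ("committed", backup_up)
    if mode == "very-safe":
        # Never commit without the backup.
        if backup_up:
            return ("committed", True)
        return ("out of service", False)
    raise ValueError(f"unknown mode: {mode}")
```

The table makes the trade-off visible: 1-safe never waits, so transactions can be lost in a disaster; very-safe never loses a transaction, but stops serving when either site is down.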
System Pairs vs Replicated Data
System pairs replicate the application: DB, application processes, sessions.
Data replicators only replicate data.
Other aspects left as an exercise for the application designer.
System Pair Benefits
Tolerates faults:
  Hardware
  Environment
  Operations
  Heisenbugs
Can replace software/hardware online.
Can move backup to new building or...
Allows design diversity: backup can be completely different.
Step 1: Both systems are running version V1.
Step 2: Backup is cold-loaded as version V2.
Step 3: SWITCH to Backup.
Step 4: Backup is cold-loaded as version V2.
[Figure: the four steps. (1) Primary V1, Backup V1. (2) Primary V1, Backup V2. (3) Roles switched: Backup V1, Primary V2. (4) Backup V2, Primary V2.]