
Massively Distributed Database Systems

- Transaction Management
Spring 2014
Ki-Joune Li
http://isel.cs.pusan.ac.kr/~lik
Pusan National University

Basic Concepts of Transaction

• Transaction
  • A set of operations
  • Atomic: all or nothing, leaving the database in a consistent state
  • Example: a flight reservation
  • Cf. a partially done transaction leaves the database in an inconsistent state

Transaction States

• Active: the initial state; the transaction stays in this state while it is executing.
• Partially Committed: after the last operation has been executed.
• Committed: after successful completion (ALL).
• Failed: after the discovery that normal execution can no longer proceed.
• Aborted: after the transaction has been rolled back and the database restored to its state prior to the start of the transaction (NOTHING). Two options:
  - restart the transaction, or
  - kill the transaction

Transition between consistent states

• Transaction: a set of operations that takes the database from one consistent state to another consistent state.
• Example: flight reservation

(a) BEGIN_TRANSACTION
      reserve WP -> JFK;
      reserve JFK -> Nairobi;
      reserve Nairobi -> Malindi;
    END_TRANSACTION

(b) BEGIN_TRANSACTION
      reserve WP -> JFK;
      reserve JFK -> Nairobi;
      reserve Nairobi -> Malindi;   (full)
    ABORT_TRANSACTION

ACID Properties

• Atomicity
  • All or nothing, never partially done
  • Example: a failure in the middle of a flight reservation
• Consistency
  • Execution of a transaction preserves the consistency of the database.

(Diagram: "All" moves the database from consistent State 1 to consistent State 2; "Nothing" leaves it in State 1; a partially done transaction would leave it in an inconsistent State 2'.)

ACID Properties

• Isolation
  • Although multiple transactions may execute concurrently, each transaction must be unaware of the other concurrently executing transactions.
  • Intermediate transaction results must be hidden from other concurrently executed transactions.
• Durability
  • After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

(Diagram: Transaction 1 and Transaction 2 both operate on the DB; the intermediate results of one have no effect on the other.)

Example

• Transaction: transfer $50 from account A to account B:
  1. read(A)
  2. A := A - 50
  3. write(A)
  4. read(B)
  5. B := B + 50
  6. write(B)
• Consistency requirement: the sum of A and B is unchanged after the transaction.
• Atomicity requirement
• Durability requirement
• Isolation requirement

(A sketch of these steps as code follows below.)
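To make the six steps concrete, here is a minimal sketch of the transfer in Python; the in-memory accounts dict and the transfer function are hypothetical and only mirror the steps above.

```python
# Minimal sketch of the transfer transaction above, using a hypothetical in-memory "database".
accounts = {"A": 1000, "B": 2000}

def transfer(db, src, dst, amount):
    a = db[src]        # 1. read(A)
    a = a - amount     # 2. A := A - 50
    db[src] = a        # 3. write(A)
    b = db[dst]        # 4. read(B)
    b = b + amount     # 5. B := B + 50
    db[dst] = b        # 6. write(B)

total_before = accounts["A"] + accounts["B"]
transfer(accounts, "A", "B", 50)
# Consistency requirement: the sum of A and B is unchanged by the transaction.
assert accounts["A"] + accounts["B"] == total_before
```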

Example: Concurrent Execution

• Two transactions
  • T1: transfer $50 from A to B
  • T2: transfer 10% of the balance from A to B

(Figures: a serial schedule of T1 and T2, and a concurrent schedule interleaving their operations.)

Serializability

• What happens after these transactions?
• Serial schedule
  • Always correct
  • Both serial orders T1 → T2 and T2 → T1 are acceptable
• Concurrent schedule
  • Serializable if
    Result(T1 || T2) = Result(T1 → T2) or Result(T2 → T1)

Transaction Management

• Transaction management guarantees the ACID properties of transactions by
  • Concurrency control: Isolation and Consistency
  • Recovery: Atomicity and Durability

Transaction management: Concurrency Control

Serializability

• For given transactions T1, T2, ..., Tn
  • A schedule (history) S is serializable if
    Result(S) = Result(Sk), where Sk is some serial execution schedule.
  • Note that Result(Si) may differ from Result(Sj) for i ≠ j.
• How to detect whether S is serializable: the conflict graph

Conflict Graph

(Figures: two example schedules S1 and S2 over transactions T1 and T2, each interleaving r(a), w(a), r(b), and w(b); "affects" arrows mark the operations of one transaction that affect operations of the other, and the captions compare Res(S1) and Res(S2) with the results of the serial orders (T1, T2) and (T2, T1).)

Detect Cycle in Conflict Graph

(Figure: the "affects" edges between the operations of T1 and T2 induce a conflict graph with nodes T1 and T2.)

If there is a cycle in the conflict graph, the schedule is not serializable; otherwise it is serializable.
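A minimal sketch of this serializability test in Python, assuming the conflict graph is already given as an adjacency list keyed by transaction id (building the graph from read/write conflicts is not shown):

```python
# A schedule is conflict-serializable iff its conflict graph has no cycle.
# The graph is a hypothetical adjacency list: an edge Ti -> Tj means "an operation
# of Ti conflicts with and precedes an operation of Tj in the schedule".

def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2               # unvisited / on current DFS path / finished
    color = {node: WHITE for node in graph}

    def dfs(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            if color.get(nxt, WHITE) == GRAY:  # back edge: a cycle exists
                return True
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and dfs(node) for node in graph)

# Example: T1 affects T2 and T2 affects T1, so the schedule is not serializable.
conflict_graph = {"T1": ["T2"], "T2": ["T1"]}
print("not serializable" if has_cycle(conflict_graph) else "serializable")
```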

How to make it serializable

• Control the order of execution of operations in concurrent transactions.
• Two approaches
  • Two-phase locking protocol: locking on each operation
  • Timestamping: ordering by a timestamp on each transaction and each operation

Lock-Based Protocols

• A lock is a mechanism to control concurrent access to a data item.
• Data items can be locked in two modes:
  • Exclusive (X) mode: the data item can be both read and written. An X-lock is requested with the lock-X instruction.
  • Shared (S) mode: the data item can only be read. An S-lock is requested with the lock-S instruction.
• Lock requests are made to the concurrency-control manager.
• A transaction can proceed only after its request is granted.

Lock-Based Protocol

• Lock-compatibility matrix: S is compatible with S; X is compatible with neither S nor X.
• A transaction may be granted a lock on an item if the requested lock is compatible with the locks already held on that item.
  • Any number of transactions can hold shared locks on the same item.
• If a lock cannot be granted, the requesting transaction is made to wait until all incompatible locks held by other transactions have been released; the lock is then granted. (A minimal lock-manager sketch follows below.)
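A minimal sketch of such a lock manager in Python, for a single site and ignoring deadlock handling; the class and method names (LockManager, lock_s, lock_x, unlock) are hypothetical.

```python
import threading
from collections import defaultdict

class LockManager:
    """Hypothetical single-site lock manager implementing the S/X compatibility rules."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared = defaultdict(set)   # item -> set of transaction ids holding S locks
        self._exclusive = {}              # item -> transaction id holding the X lock

    def lock_s(self, txn, item):
        with self._cond:
            # S is compatible with S, but not with an X lock held by another transaction.
            while self._exclusive.get(item) not in (None, txn):
                self._cond.wait()
            self._shared[item].add(txn)

    def lock_x(self, txn, item):
        with self._cond:
            # X is compatible with nothing except locks this transaction already holds.
            while (self._exclusive.get(item) not in (None, txn)
                   or self._shared[item] - {txn}):
                self._cond.wait()
            self._exclusive[item] = txn

    def unlock(self, txn, item):
        with self._cond:
            self._shared[item].discard(txn)
            if self._exclusive.get(item) == txn:
                del self._exclusive[item]
            self._cond.notify_all()       # waiting transactions re-check compatibility
```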

Lock-Based Protocol in Distributed DBS

• Majority protocol
  • A local lock manager at each site administers lock and unlock requests for the data items stored at that site.
  • When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si's lock manager.
    • If Q is locked in an incompatible mode, the request is delayed until it can be granted.
    • When the lock request can be granted, the lock manager sends a message back to the initiator indicating that the lock has been granted.
  • If Q is replicated at n sites, a lock request message must be sent to more than half of the n sites at which Q is stored.
    • The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q (see the sketch below).
    • When writing the data item, the transaction performs the write on all replicas.
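A minimal sketch of the majority rule in Python; request_lock is a hypothetical stand-in for the message exchange with each site's local lock manager and is simulated here as always granting.

```python
# Hypothetical sketch: lock a majority of the replicas of Q before operating on it.

def request_lock(site, txn, item):
    # Stand-in for a message to `site`'s local lock manager; a real lock manager would
    # delay its reply until the lock can be granted. Simulated here as granted.
    return True

def majority_lock(txn, item, replica_sites):
    needed = len(replica_sites) // 2 + 1      # strictly more than half of the n sites
    granted = []
    for site in replica_sites:
        if request_lock(site, txn, item):
            granted.append(site)
        if len(granted) >= needed:
            return granted                    # the transaction may now operate on Q
    raise RuntimeError("majority not obtained")

print(majority_lock("T1", "Q", ["S1", "S2", "S3", "S4", "S5"]))  # ['S1', 'S2', 'S3']
```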

The Two-Phase Locking Protocol

• A protocol that ensures conflict-serializable schedules.
• Phase 1: growing phase
  • The transaction may obtain locks.
  • The transaction may not release locks.
• Phase 2: shrinking phase
  • The transaction may release locks.
  • The transaction may not obtain locks.
• The protocol assures serializability (a sketch follows below).
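A minimal sketch of the two phases in Python; the TwoPhaseTxn class is hypothetical bookkeeping that rejects any lock acquisition once the first lock has been released (the actual lock-manager calls are omitted).

```python
class TwoPhaseTxn:
    """Hypothetical 2PL bookkeeping: all lock acquisitions must precede the first release."""

    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.held = set()
        self.shrinking = False            # flips to True at the first unlock

    def acquire(self, item):
        # Growing phase only: obtaining a lock after any release violates 2PL.
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot obtain locks in the shrinking phase")
        self.held.add(item)               # actual call to a lock manager omitted

    def release(self, item):
        self.shrinking = True             # the transaction is now in the shrinking phase
        self.held.discard(item)

    def commit(self):
        # Strict 2PL simply holds every lock until commit and releases them all here.
        for item in list(self.held):
            self.release(item)

t = TwoPhaseTxn("T1")
t.acquire("A"); t.acquire("B")
t.release("A")
try:
    t.acquire("C")                        # rejected: growing after shrinking has begun
except RuntimeError as e:
    print(e)
```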

Two Phase Locking

(Figures: lock-holding profiles for 2PL and Strict 2PL.)

Problem of the Two-Phase Locking Protocol

• Deadlock
  • Arises because locks are held through the growing and shrinking phases.
  • Prevention and avoidance: impossible; only detection may be possible.
• When a deadlock occurs
  • Detect the deadlock using a wait-for graph.
  • Abort a transaction.
  • How to choose a transaction to kill? (See the sketch below.)
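A minimal sketch of deadlock detection over a wait-for graph in Python; the graph contents are made up, and the victim policy shown (abort the transaction with the highest id) is just one common choice, not one prescribed by the slides.

```python
# Hypothetical wait-for graph: an edge Ti -> Tj means "Ti waits for a lock held by Tj".
wait_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": []}

def find_cycle(graph):
    """Return one cycle as a list of transactions, or None if the graph is acyclic."""
    visited, stack = set(), []

    def dfs(node):
        if node in stack:
            return stack[stack.index(node):]      # we walked back into our own path
        if node in visited:
            return None
        visited.add(node)
        stack.append(node)
        for nxt in graph.get(node, ()):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        stack.pop()
        return None

    for node in graph:
        cycle = dfs(node)
        if cycle:
            return cycle
    return None

deadlocked = find_cycle(wait_for)
if deadlocked:
    victim = max(deadlocked)      # one common policy: kill the transaction with the highest id
    print("deadlock among", deadlocked, "-> abort", victim)
```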

Timestamp-Based Protocols

• Each transaction Ti is issued a timestamp TS(Ti) when it enters the system.
  • TS(Ti) < TS(Tj): Ti is the old transaction and Tj is the new transaction.
• Each data item Q keeps two timestamps:
  • W-timestamp(Q): the largest timestamp of any successful write(Q)
  • R-timestamp(Q): the largest timestamp of any successful read(Q)

Timestamp-Based Protocols: Read

• Transaction Ti issues read(Q)
  • If TS(Ti) < W-timestamp(Q):
    • Ti needs to read a value of Q that was already overwritten.
    • Hence, the read operation is rejected, and Ti is rolled back.
  • If TS(Ti) ≥ W-timestamp(Q):
    • The read operation is executed, and
    • R-timestamp(Q) is set to max(R-timestamp(Q), TS(Ti)).

Timestamp-Based Protocols: Write

• Transaction Ti issues write(Q)
  • If TS(Ti) < R-timestamp(Q):
    • The value of Q that Ti is producing was needed previously, and the system assumed that it would never be produced.
    • Hence, the write operation is rejected, and Ti is rolled back.
  • If TS(Ti) < W-timestamp(Q):
    • Ti is attempting to write an obsolete value of Q.
    • Hence, this write operation is rejected, and Ti is rolled back.
  • Otherwise:
    • The write operation is executed, and W-timestamp(Q) is set to TS(Ti). (A sketch of both rules follows below.)
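A minimal sketch of the read and write rules in Python; the Item record and the Rejected exception are hypothetical, and a rejected transaction is assumed to be rolled back by the caller.

```python
class Rejected(Exception):
    """The operation violates timestamp ordering; the caller must roll back the transaction."""

class Item:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0          # largest timestamp of a successful read(Q)
        self.w_ts = 0          # largest timestamp of a successful write(Q)

def ts_read(ts, item):
    if ts < item.w_ts:                   # the value was already overwritten by a newer txn
        raise Rejected("read rejected: value already overwritten")
    item.r_ts = max(item.r_ts, ts)
    return item.value

def ts_write(ts, item, value):
    if ts < item.r_ts:                   # a newer transaction already read the old value
        raise Rejected("write rejected: the value was needed previously")
    if ts < item.w_ts:                   # this would install an obsolete value
        raise Rejected("write rejected: obsolete value")
    item.value = value
    item.w_ts = ts

# Example: T5 writes Q, then the older T3 tries to read it and must be rolled back.
q = Item(100)
ts_write(5, q, 120)
try:
    ts_read(3, q)
except Rejected as e:
    print("T3 rolled back:", e)
```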

How to manage global TS – Clock

• In distributed systems
  • Multiple physical clocks, no centralized clock
  • Use a logical clock rather than a global physical clock

Global Clock and Logical Clock

• Global clock
  • TAI (Temps Atomique International), maintained in Paris
  • Time server
  • Broadcasting from satellite
  • Granularity
• Logical clock
  • Not absolute time, but sufficient for ordering
  • Lamport's algorithm
  • Correction of the clock: T(A, tx) < T(B, rx), i.e. the send time at A must be smaller than the receive time at B

Logical Clock

• Logical clock
  • Not absolute time, but sufficient for ordering
  • Lamport's algorithm
• Correction of the clock: a clock never runs backward, and C(A, tx) < C(B, rx) must hold for every message sent from A to B.

(Figure: Lamport's algorithm on three processes whose clocks tick at different rates; when a message arrives carrying a timestamp that is not smaller than the local clock, the receiver advances its clock so that the receive time exceeds the send time.)
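A minimal sketch of Lamport's correction rule in Python; the LamportClock class is hypothetical, and in a real system its timestamps would be attached to every message.

```python
class LamportClock:
    """Hypothetical Lamport logical clock: it never runs backward, and every receive
    ends up strictly later than the corresponding send."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp attached to an outgoing message."""
        return self.tick()

    def receive(self, msg_ts):
        """Correction rule: jump ahead of the sender's timestamp if necessary."""
        self.time = max(self.time, msg_ts) + 1
        return self.time

# A sends to B; B's clock lags behind, so it jumps forward past A's send time.
a, b = LamportClock(), LamportClock()
for _ in range(5):
    a.tick()                  # A's clock reaches 5
ts = a.send()                 # A sends at time 6
print(b.receive(ts))          # B receives at time 7 (> 6), although B was still at 0
```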

Implementation of Concurrency Control in Distributed Systems

• Three managers
  • TM (Transaction Manager): ensures atomicity
  • Scheduler: main responsibility for concurrency control
  • DM (Data Manager): simple read/write

(Figures: the TM / Scheduler / DM stack in a single machine, and the same components spread across sites in distributed systems.)

Transaction Management: Recovery

Failure Classification

• Transaction failure
  • Logical errors: an internal error condition
  • System errors: a system error condition (e.g., deadlock)
• System crash
  • A power failure or other hardware or software failure
  • Fail-stop assumption: non-volatile storage contents are assumed not to be corrupted by a system crash
  • Database systems have numerous integrity checks to prevent corruption of disk data
• Disk failure

Recovery Algorithms

• Recovery algorithms should ensure
  • database consistency,
  • transaction atomicity, and
  • durability, despite failures.
• Recovery algorithms have two parts:
  1. Preparing information for recovery during normal transaction processing
  2. Actions taken after a failure to recover the database

Storage Structure

• Volatile storage
  • Does not survive system crashes
  • Examples: main memory, cache memory
• Nonvolatile storage
  • Survives system crashes
  • Examples: disk, tape, flash memory, non-volatile (battery backed up) RAM
• Stable storage
  • A mythical form of storage that survives all failures
  • Approximated by maintaining multiple copies on distinct nonvolatile media

Recovery and Atomicity

• Database modifications should take effect only if the transaction commits; otherwise they may leave the database in an inconsistent state.
• Example
  • Consider a transaction Ti that transfers $50 from account A to account B; the goal is to perform either all database modifications made by Ti or none at all.
  • Several output operations may be required for Ti, for example output(A) and output(B).
  • A failure may occur after one of these modifications has been made but before all of them are made.

Recovery and Atomicity (Cont.)

• To ensure atomicity despite failures, we first output information describing the modifications to stable storage, without modifying the database itself.
• Log-based recovery

Log-Based Recovery

• A log must be kept on stable storage. Its records are <Ti start>, <Ti commit>, and <Ti, X, V1, V2>.
• Logging method
  • When transaction Ti starts, write a <Ti start> log record.
  • When Ti finishes, write a <Ti commit> log record.
  • Before Ti executes write(X), write a <Ti, X, Vold, Vnew> log record.
  • We assume for now that log records are written directly to stable storage.
• Two approaches using logs
  • Deferred database modification
  • Immediate database modification

Deferred Database Modification

• The deferred database modification scheme records all modifications in the log and applies them to the database only after commit.
• Log scheme
  • The transaction starts by writing a <Ti start> record to the log.
  • write(X) appends a <Ti, X, V> record to the log.
    • Note: the old value is not needed for this scheme.
    • The write is not performed on X at this time; it is deferred.
  • When Ti commits, <Ti commit> is written to the log.
  • Finally, the previously deferred writes are executed (see the sketch below).
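A minimal sketch of deferred modification and redo-only recovery in Python; the in-memory db and log and the record layout are hypothetical stand-ins for the structures described above.

```python
# Hypothetical sketch: deferred database modification with redo-only recovery.
db = {"A": 1000, "B": 2000}
log = []                                         # stand-in for the stable-storage log

def run_deferred(txn_id, writes):
    log.append((txn_id, "start"))
    for item, new_value in writes:
        log.append((txn_id, item, new_value))    # <Ti, X, V>: no old value is needed
    log.append((txn_id, "commit"))
    redo(txn_id)                                 # only now are the writes applied

def redo(txn_id):
    for rec in log:
        if rec[0] == txn_id and len(rec) == 3:
            _, item, new_value = rec
            db[item] = new_value

def recover():
    """After a crash: redo exactly those transactions with both <start> and <commit>."""
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    for txn_id in committed:
        redo(txn_id)

run_deferred("T0", [("A", 950), ("B", 2050)])
recover()                                        # harmless to repeat: the writes are idempotent
print(db)                                        # {'A': 950, 'B': 2050}
```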

Deferred Database Modification (Cont.)

• Recovery method
  • During recovery after a crash, a transaction needs to be redone if and only if both <Ti start> and <Ti commit> are in the log.
  • Redoing a transaction Ti (redo(Ti)) sets the values of all data items updated by the transaction to the new values.
  • The log records of any Ti for which <Ti start> exists but <Ti commit> does not are deleted; no undo is needed in this scheme.

Deferred Database Modification: Example

T0: read(A)              T1: read(C)
    A := A - 50              C := C - 100
    write(A)                 write(C)
    read(B)
    B := B + 50
    write(B)

• If the log on stable storage at the time of the crash is as in case:
  (a) No redo actions need to be taken.
  (b) redo(T0) must be performed, since <T0 commit> is present.
  (c) redo(T0) must be performed, followed by redo(T1), since <T0 commit> and <T1 commit> are present.

Immediate Database Modification

• Immediate database modification scheme
  • Database updates of an uncommitted transaction may be applied to the database immediately.
  • For undoing, each log record carries both the old value and the new value.
• The recovery procedure has two operations
  • undo(Ti): restores the values of all data items updated by Ti to their old values
  • redo(Ti): sets the values of all data items updated by Ti to the new values
• When recovering after a failure
  • undo(Ti) if the log contains <Ti start> but not <Ti commit>.
  • redo(Ti) if the log contains both <Ti start> and <Ti commit>. (A sketch follows below.)
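A minimal sketch of immediate modification with undo/redo recovery in Python; the log records now carry both the old and the new value, and all names are hypothetical.

```python
# Hypothetical sketch: immediate database modification with undo/redo recovery.
db = {"A": 1000, "B": 2000, "C": 700}
log = []     # records: (Ti, "start"), (Ti, X, Vold, Vnew), (Ti, "commit")

def write(txn_id, item, new_value):
    log.append((txn_id, item, db[item], new_value))   # keep the old value for undo
    db[item] = new_value                              # applied to the database immediately

def recover():
    started = {rec[0] for rec in log if rec[1:] == ("start",)}
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    # undo (backwards) every transaction that started but did not commit
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in started - committed:
            db[rec[1]] = rec[2]                       # restore the old value
    # redo (forwards) every committed transaction
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            db[rec[1]] = rec[3]                       # reinstall the new value

log.append(("T0", "start")); write("T0", "A", 950); write("T0", "B", 2050)
log.append(("T0", "commit"))
log.append(("T1", "start")); write("T1", "C", 600)    # crash before <T1 commit>
recover()
print(db)   # {'A': 950, 'B': 2050, 'C': 700}: T0 is redone, T1 is undone
```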

Immediate Database Modification: Example

• Recovery actions in each case:
  (a) undo(T0): B is restored to 2000 and A to 1000.
  (b) undo(T1) and redo(T0): C is restored to 700, and then A and B are set to 950 and 2050 respectively.
  (c) redo(T0) and redo(T1): A and B are set to 950 and 2050 respectively; then C is set to 600.

Idempotent Operation

• Result(Op(x)) = Result(Op(Op(x)))
• Example
  • increment(x): not idempotent
  • x = a; write(x): idempotent
• Operations recorded in the log must be idempotent; otherwise multiple executions (e.g., during redo) may produce incorrect results. (See the sketch below.)
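A small illustration of why redo must rely on idempotent records: re-applying a physical "set X to v" record is harmless, while re-applying a logical "increment" record is not. The record shapes are hypothetical.

```python
db = {"X": 10}

# Physical record "set X to 15" (like <Ti, X, Vold, Vnew>): idempotent.
def redo_write(item, new_value):
    db[item] = new_value

# Logical record "add 5 to X": NOT idempotent.
def redo_increment(item, delta):
    db[item] += delta

redo_write("X", 15); redo_write("X", 15)
print(db["X"])     # 15: applying the record twice gives the same result

db["X"] = 10
redo_increment("X", 5); redo_increment("X", 5)
print(db["X"])     # 20 instead of 15: a repeated redo corrupts the value
```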

Mutual Exclusion and Election Algorithms

Mutual Exclusion: Monitor (Coordinator)

• In a single system: easy to implement with a semaphore.
• In distributed systems: there is no shared memory for a semaphore.
• A centralized algorithm (coordinator)
  • Simple
  • But a single point of failure

Mutual Exclusion: Distributed Algorithm

• No central coordinator.
• Rule: when a site receives a request
  • Not in the CS and not wanting to enter: send OK to the requester.
  • In the CS: no reply.
  • Wanting to enter: compare timestamps; the request with the lower timestamp wins. (A sketch follows below.)
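A minimal sketch of this reply rule in Python (it matches the request-handling side of Ricart and Agrawala's algorithm, which the slide does not name explicitly); the Site class and state constants are hypothetical.

```python
# Hypothetical sketch of the rule a site applies when a CS request arrives.
RELEASED, WANTED, HELD = "released", "wanted", "held"

class Site:
    def __init__(self, site_id):
        self.site_id = site_id
        self.state = RELEASED
        self.my_request_ts = None     # (timestamp, site_id) of this site's own pending request
        self.deferred = []            # requesters to answer after leaving the CS

    def on_request(self, req_ts, requester_id):
        """Return True to send OK immediately, False to withhold the reply for now."""
        if self.state == HELD:                               # in the CS: no reply
            self.deferred.append(requester_id)
            return False
        if self.state == WANTED:                             # both want to enter
            if (req_ts, requester_id) < self.my_request_ts:  # lower timestamp wins
                return True
            self.deferred.append(requester_id)
            return False
        return True                                          # not interested: OK

s = Site(2)
s.state, s.my_request_ts = WANTED, (10, 2)
print(s.on_request(7, 5))    # True: the request with timestamp 7 beats our own at 10
```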

Mutual Exclusion: Token Ring

• Only the site holding the token can enter the CS.
• If a site wants to enter the CS, it waits until it gets the token, enters, and then passes the token on.
• Otherwise, it passes the token to the next site.

(Figure: sites arranged in a ring passing the token; a site that wants the CS waits until it gets the token.)

Mutual Exclusion: Comparison

Algorithm               # of messages per request   Delay per request   Problem
Monitor (coordinator)   3                           2                   Crash of the monitor
Distributed algorithm   2(n-1)                      2(n-1)              n points of crash
Token ring              0 to n-1                    0 to n-1            Token lost

Election: Bully Algorithm

• When it is found that a coordinator is required, or
• When it is found that the coordinator has crashed:
  • Circulate an election message carrying each site's priority; the site with the highest priority becomes the new coordinator.

Election: Ring Algorithm

(Figure: sites 1 to 8 arranged in a ring. Site 3 finds that the coordinator has crashed and starts an election. The Elect message accumulates the ids of live sites as it travels around the ring: Elect [3], Elect [3,4], Elect [3,4,5], Elect [3,4,5,7], Elect [3,4,5,7,8], Elect [3,4,5,7,8,2]; a site that gives no response is skipped. The result is then announced by circulating Elected [3] messages around the ring.)
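A minimal sketch of the Elect message accumulation in Python; the ring order and the set of unresponsive sites are inferred from the figure, and the choice of coordinator from the member list is left as a comment since the slide does not spell it out.

```python
# Hypothetical sketch: the Elect message accumulates live site ids as it circles the ring.
ring = [3, 4, 5, 6, 7, 8, 1, 2]       # traversal order, starting at the initiator (site 3)
crashed = {6, 1}                      # sites that give no response are skipped

def ring_election(ring_order, crashed_sites):
    members = []
    for site in ring_order:
        if site in crashed_sites:
            continue                  # no response: hand the message to the next site
        members.append(site)
        print("Elect", members)
    # The result is then announced by circulating an Elected message around the ring;
    # in the standard formulation the highest id in `members` becomes the coordinator.
    print("Elected message circulated; members were", members)
    return members

ring_election(ring, crashed)
```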