Concurrency

Post on 21-Dec-2015


Page 1: Concurrency. Busy, busy, busy... In production environments, it is unlikely that we can limit our system to just one user at a time. – Consequently, it

Concurrency

Busy, busy, busy...

• In production environments, it is unlikely that we can limit our system to just one user at a time.

– Consequently, it is possible for multiple queries or transactions to be submitted at approximately the same time.

• If all of the queries were very small (i.e., in terms of time), we could probably just execute them on a first-come-first-served basis.

• However, many queries are both complex and time consuming.

– Executing these queries would make other queries wait a long time for a chance to execute.

• So, in practice, the DBMS may be running many different transactions at about the same time.

Concurrent Transactions

• Even when there is no “failure,” several transactions can interact to turn a consistent state into an inconsistent state.

Example

• Assume A = B is a constraint required for consistency.

• Note that we omit OUTPUT steps for succinctness; they always come at the end. We deal only with Reads and Writes in the main memory buffers.

• T1 and T2 individually preserve DB consistency.

An Acceptable Schedule S1

• Assume initially A = B = 25. Here is one way to execute (S1= T1; T2) so they do not interfere.

Another Acceptable Schedule S2

• Here, transactions are executed as (S2=T2; T1). The result is different, but consistency is maintained.
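The bodies of T1 and T2 are not reproduced in this transcript. As a minimal sketch, assume T1 adds 100 to both A and B and T2 doubles both. These bodies and the names `t1`, `t2`, and `run_serial` are illustrative assumptions, not the slide's own definitions. Under that assumption, the two serial orders can be compared like this:

```python
# Hypothetical stand-ins for T1 and T2 (their real bodies are not in the text).
def t1(db):
    """Assumed T1: add 100 to A, then add 100 to B."""
    db["A"] += 100
    db["B"] += 100


def t2(db):
    """Assumed T2: double A, then double B."""
    db["A"] *= 2
    db["B"] *= 2


def run_serial(order, initial):
    """Run whole transactions one after another on a copy of the state."""
    db = dict(initial)
    for txn in order:
        txn(db)
    return db


initial = {"A": 25, "B": 25}
s1 = run_serial([t1, t2], initial)  # S1 = T1; T2
s2 = run_serial([t2, t1], initial)  # S2 = T2; T1
print(s1)  # {'A': 250, 'B': 250}
print(s2)  # {'A': 150, 'B': 150}
```

With these assumed transactions the two serial orders end in different states, yet A = B holds in both, which is all that consistency requires.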

Interleaving Doesn't Necessarily Hurt (S3)

But Then Again, It Might!
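The interleaved schedules shown on these two slides did not survive in the transcript. The sketch below is therefore only an illustration under stated assumptions: T1 is assumed to add 100 to A and then to B, T2 is assumed to double A and then B, and each "read, compute, write back" on one element is treated as a single step. The names `T1`, `T2`, and `run` are hypothetical.

```python
# Assumed transactions: each is a list of (element, update) steps, where one
# step stands for "read the element, apply the update, write it back".
T1 = [("A", lambda v: v + 100), ("B", lambda v: v + 100)]
T2 = [("A", lambda v: v * 2), ("B", lambda v: v * 2)]


def run(schedule, initial):
    """Execute steps in an interleaved order.

    `schedule` names which transaction takes its next step at each point.
    """
    db = dict(initial)
    pending = {"T1": iter(T1), "T2": iter(T2)}
    for name in schedule:
        element, update = next(pending[name])
        db[element] = update(db[element])
    return db


initial = {"A": 25, "B": 25}
# T1 goes first on A and on B: same result as the serial order T1; T2.
harmless = run(["T1", "T2", "T1", "T2"], initial)
# T1 goes first on A, but T2 goes first on B: A and B diverge.
harmful = run(["T1", "T2", "T2", "T1"], initial)
print(harmless)  # {'A': 250, 'B': 250}
print(harmful)   # {'A': 250, 'B': 150}
```

In the second run each transaction on its own would have preserved A = B, but the two elements see the transactions in opposite orders, so the constraint is broken.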

Semantics of transactions is also important.

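We Need a Simpler Model

• Assume coincidence never happens: whenever a transaction writes an element, the value it writes is never reproduced by accident.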

• Focus on reads and writes only.

– rT(X) denotes T reads X

– wT(X) denotes T writes X

– Transaction is a sequence of r and w actions on database elements.

– If transactions are T1,…,Tk, then we use ri and wi, instead of rTi and wTi

• Schedule is a sequence of r and w actions performed by a collection of transactions.

– Serial Schedule: All actions for each transaction are consecutive.

r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B);…

– Serializable Schedule: A schedule whose “effect” is equivalent to that of some serial schedule.
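As a small sketch of this notation, a schedule can be held as a list of `(kind, transaction, element)` tuples, and the serial property checked directly. The helper names `parse` and `is_serial` are illustrative choices, not part of the slides.

```python
import re


def parse(schedule):
    """Turn 'r1(A); w2(B)' into [('r', 1, 'A'), ('w', 2, 'B')]."""
    return [(kind, int(txn), elem)
            for kind, txn, elem in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]


def is_serial(actions):
    """True if all actions of each transaction are consecutive."""
    finished = set()
    current = None
    for _, txn, _ in actions:
        if txn != current:
            if txn in finished:
                return False  # txn resumes after another transaction ran
            if current is not None:
                finished.add(current)
            current = txn
    return True


serial = parse("r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B)")
interleaved = parse("r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B)")
print(is_serial(serial))       # True
print(is_serial(interleaved))  # False
```

Deciding whether a non-serial schedule is serializable is the harder question, which the following slides approach through conflicts.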

Conflicts

• Suppose for fixed DB elements X and Y, ri(X); rj(Y) is part of a schedule, and we flip the order of these operations. Then ri(X); rj(Y) ≡ rj(Y); ri(X)

This holds always (even when X=Y)

• We can flip ri(X); wj(Y) as long as X≠Y

However, ri(X); wj(X) ≢ wj(X); ri(X)

In the RHS, Ti reads the value of X written by Tj, whereas it is not so in the LHS.

• We can flip wi(X); wj(Y) provided X≠Y

However, wi(X); wj(X) ≢ wj(X); wi(X)

The final value of X may be different depending on which write occurs last.

Conflicts (Cont’d)

Two actions of different transactions conflict if one of these two conditions holds.

1. A read and a write of the same X, or

2. Two writes of the same X

• Such actions conflict in general and may not be swapped in order.

• All other events (reads/writes) may be swapped without changing the effect of the schedule (on the DB).
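The swap rule can be written as a short predicate. This is a sketch that assumes actions are `(kind, transaction, element)` tuples. The function name `conflicts` is illustrative. It also treats two actions of the same transaction as non-swappable, since their order inside the transaction is fixed.

```python
def conflicts(a, b):
    """Return True if adjacent actions a and b may not be swapped.

    Actions are (kind, txn, element) tuples such as ('r', 1, 'X').
    """
    kind_a, txn_a, elem_a = a
    kind_b, txn_b, elem_b = b
    if txn_a == txn_b:
        return True  # the order inside one transaction is fixed
    # Different transactions: conflict only on the same element with a write.
    return elem_a == elem_b and "w" in (kind_a, kind_b)


print(conflicts(("r", 1, "X"), ("r", 2, "X")))  # False: two reads
print(conflicts(("r", 1, "X"), ("w", 2, "Y")))  # False: different elements
print(conflicts(("r", 1, "X"), ("w", 2, "X")))  # True: read and write of X
print(conflicts(("w", 1, "X"), ("w", 2, "X")))  # True: two writes of X
```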

Definitions

• Two schedules are conflict-equivalent if one can be converted into the other by a series of non-conflicting swaps of adjacent actions.

• A schedule is conflict-serializable if it is conflict-equivalent to a serial schedule.

Example

r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B)

r1(A); w1(A); r2(A); r1(B); w2(A); w1(B); r2(B); w2(B)

r1(A); w1(A); r1(B); r2(A); w2(A); w1(B); r2(B); w2(B)

r1(A); w1(A); r1(B); r2(A); w1(B); w2(A); r2(B); w2(B)

r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B)
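The swap sequence in this example can be reproduced mechanically. The sketch below bubbles actions toward a chosen serial order and refuses any swap of a conflicting pair. It is one possible procedure, not necessarily the order of swaps used on the slide. The names `parse`, `conflicts`, and `swap_to_serial` are hypothetical, and `parse` assumes single-digit transaction numbers and single-letter elements.

```python
def parse(text):
    """'r1(A); w2(B)' -> [('r', 1, 'A'), ('w', 2, 'B')] (single-character ids)."""
    return [(tok[0], int(tok[1]), tok[3]) for tok in text.split("; ")]


def conflicts(a, b):
    (ka, ta, ea), (kb, tb, eb) = a, b
    return ta == tb or (ea == eb and "w" in (ka, kb))


def swap_to_serial(actions, order):
    """Bubble actions into the serial order `order` using only legal swaps.

    Returns every schedule visited, or None if a required swap conflicts.
    """
    rank = {txn: i for i, txn in enumerate(order)}
    current = list(actions)
    steps = [list(current)]
    changed = True
    while changed:
        changed = False
        for i in range(len(current) - 1):
            a, b = current[i], current[i + 1]
            if rank[a[1]] > rank[b[1]]:      # out of order for the target
                if conflicts(a, b):
                    return None              # this swap is not allowed
                current[i], current[i + 1] = b, a
                steps.append(list(current))
                changed = True
    return steps


start = parse("r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B)")
steps = swap_to_serial(start, [1, 2])
print(len(steps) - 1)                 # 4 swaps, as in the example
print(swap_to_serial(start, [2, 1]))  # None: T2 cannot be moved ahead of T1
```

Each swap in a bubble pass exchanges exactly one out-of-order adjacent pair, so the procedure fails only when some pair that must be reordered is a conflicting pair.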

Conflict-serializability

• Sufficient condition for serializability but not necessary.

Example

S1: w1(Y); w1(X); w2(Y); w2(X); w3(X); -- This is serial

S2: w1(Y); w2(Y); w2(X); w1(X); w3(X);

• S2 isn’t conflict serializable, but it is serializable. It has the same effect as S1.

– Intuitively, the values of X written by T1 and T2 have no effect, since T3 overwrites them.
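A short check of this example, as a sketch: because S1 and S2 contain only writes, the final database state is determined by which transaction writes each element last. The helper names `final_writers` and `write_order_arcs` are illustrative.

```python
# Write-only schedules as (transaction, element) pairs.
S1 = [(1, "Y"), (1, "X"), (2, "Y"), (2, "X"), (3, "X")]  # serial
S2 = [(1, "Y"), (2, "Y"), (2, "X"), (1, "X"), (3, "X")]


def final_writers(schedule):
    """Map each element to the transaction whose write comes last."""
    last = {}
    for txn, elem in schedule:
        last[elem] = txn
    return last


def write_order_arcs(schedule):
    """Arcs (Ti, Tj) for writes of the same element where Ti's write is first."""
    arcs = set()
    for i, (ti, ei) in enumerate(schedule):
        for tj, ej in schedule[i + 1:]:
            if ei == ej and ti != tj:
                arcs.add((ti, tj))
    return arcs


print(final_writers(S1) == final_writers(S2))  # True: same effect on the DB
arcs = write_order_arcs(S2)
print((1, 2) in arcs and (2, 1) in arcs)       # True: T1 and T2 form a cycle
```

S2 ends in the same state as S1, yet the conflicts on Y and X order T1 and T2 in opposite directions, so no sequence of legal swaps can make S2 serial.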

Serializability/precedence Graphs

• Non-swappable pairs of actions represent potential conflicts between transactions.

• The existence of non-swappable actions enforces an ordering on the transactions that house these actions.

• Nodes: transactions {T1,…,Tk}

• Arcs: There is an arc from Ti to Tj if they have conflicting accesses to the same database element X and Ti's access comes first; this is written Ti <S Tj.

Precedence graphs

r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B)

Note the following:

w1(B) <S r2(B)

r2(A) <S w3(A)

These are conflicts since they contain a read/write on the same element. They cannot be swapped. Therefore T1 < T2 < T3.

r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B)

Note the following:

r1(B) <S w2(B)

w2(A) <S w3(A)

r2(B) <S w1(B)

Here, we have T1 < T2 < T3, but we also have T2 < T1, so the precedence graph contains a cycle.

• If there is a cycle in the graph

– Then, there is no serial schedule which is conflict equivalent to S.

• Each arc represents a requirement on the order of transactions in a conflict equivalent serial schedule.

• A cycle puts too many requirements on any linear order of transactions.

• If there is no cycle in the graph

– Then any topological order of the graph suggests a conflict equivalent serial schedule.
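The test can be sketched with the standard-library `graphlib` module (Python 3.9 or later), which provides topological sorting and cycle detection. The helper names `parse`, `precedence_graph`, and `serial_order` are illustrative; the two schedules are the ones from the precedence-graph slide.

```python
import re
from graphlib import CycleError, TopologicalSorter


def parse(text):
    """'r2(A); w1(B)' -> [('r', 2, 'A'), ('w', 1, 'B')]"""
    return [(k, int(t), e) for k, t, e in re.findall(r"([rw])(\d+)\((\w+)\)", text)]


def precedence_graph(actions):
    """Map each transaction to the set of transactions that must precede it."""
    preds = {txn: set() for _, txn, _ in actions}
    for i, (ka, ta, ea) in enumerate(actions):
        for kb, tb, eb in actions[i + 1:]:
            if ta != tb and ea == eb and "w" in (ka, kb):
                preds[tb].add(ta)  # arc Ta -> Tb
    return preds


def serial_order(actions):
    """Return a conflict-equivalent serial order, or None if there is a cycle."""
    try:
        return list(TopologicalSorter(precedence_graph(actions)).static_order())
    except CycleError:
        return None


acyclic = parse("r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B)")
cyclic = parse("r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B)")
print(serial_order(acyclic))  # [1, 2, 3]
print(serial_order(cyclic))   # None
```

When the graph is acyclic and has several topological orders, any of them is a valid answer; this sketch simply returns whichever one `static_order()` produces.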

Why the Precedence-Graph Test Works?

Idea: if the precedence graph is acyclic, then we can swap actions to form a serial schedule.

Proof: By induction on n, number of transactions.

Basis: n = 1. That is, S={T1}; then S is already serial.

Induction: S={T1,T2,…,Tn}. Given that the precedence graph is acyclic, there exists Ti in S such that there is no Tj in S that Ti depends on.

– We swap all actions of Ti to the front (of S). No swap is blocked, because an earlier conflicting action of some Tj would put an arc from Tj to Ti.

– (Actions of Ti)(Actions of the other n-1 transactions)

– The tail has a precedence graph that is the same as the original without Ti, i.e. it has n-1 nodes.

By the induction hypothesis, we can reorder the actions of the other transactions to turn the tail into a serial schedule.

Schedulers

• A scheduler takes requests from transactions for reads and writes, and decides if it is “OK” to allow them to operate on the DB or defer them until it is safe to do so.

• Ideal: a scheduler forwards a request iff it cannot result in a violation of serializability.

– Too hard to decide this in real time.

• Real: a scheduler forwards a request if it cannot result in a violation of conflict serializability.

Lock Actions

• Before reading or writing an element X, a transaction Ti requests a lock on X from the scheduler.

• The scheduler can either grant the lock to Ti or make Ti wait for the lock.

• If granted, Ti should eventually unlock (release) the lock on X.

• Shorthands:

– li(X) = “transaction Ti requests a lock on X”

– ui(X) = “Ti unlocks/releases the lock on X”

Validity of Locks

• The use of locks must be proper in 2 senses:

– Consistency of Transactions:

• Read or write X only when holding a lock on X.

– ri(X) or wi(X) must be preceded by some li(X) with no intervening ui(X).

• If Ti locks X, Ti must eventually unlock X.

– Every li(X) must be followed by ui(X).

– Legality of Schedules:

• Two transactions may not have locked the same element X without one having first released the lock.

– A schedule with li(X) cannot have another lj(X) until ui(X) appears in between.
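Both conditions can be checked in one pass over a schedule that includes lock and unlock actions. This is a sketch assuming a single exclusive lock per element, as in the slides. The helper names `parse` and `check_locks` and the wording of the messages are illustrative.

```python
import re


def parse(text):
    """'l1(A); r1(A); u1(A)' -> [('l', 1, 'A'), ('r', 1, 'A'), ('u', 1, 'A')]"""
    return [(k, int(t), e) for k, t, e in re.findall(r"([lurw])(\d+)\((\w+)\)", text)]


def check_locks(actions):
    """List violations of transaction consistency or schedule legality."""
    holder = {}  # element -> transaction currently holding its lock
    problems = []
    for kind, txn, elem in actions:
        if kind == "l":
            if elem in holder:
                problems.append(
                    f"illegal: l{txn}({elem}) while T{holder[elem]} holds the lock")
            else:
                holder[elem] = txn
        elif kind == "u":
            if holder.get(elem) == txn:
                del holder[elem]
            else:
                problems.append(f"inconsistent: u{txn}({elem}) without holding the lock")
        elif holder.get(elem) != txn:  # a read or a write
            problems.append(f"inconsistent: {kind}{txn}({elem}) without a lock")
    for elem, txn in holder.items():
        problems.append(f"inconsistent: T{txn} never unlocks {elem}")
    return problems


ok = parse("l1(A); r1(A); w1(A); u1(A); l2(A); r2(A); u2(A)")
print(check_locks(ok))                            # []
print(check_locks(parse("l1(A); l2(A); u1(A)")))  # one "illegal" entry
```

A real scheduler would make T2 wait instead of reporting an error; the check here only classifies a schedule that has already been written down.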

Legal Schedule Doesn’t Mean Serializable

Consistency constraint required for this example: A=B

Two Phase Locking

There is a simple condition, which guarantees conflict-serializability: In every transaction, all lock requests (phase 1) precede all unlock requests (phase 2).
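The two-phase condition is easy to test per transaction. A minimal sketch, assuming actions are `(kind, transaction, element)` tuples with kinds `'l'`, `'u'`, `'r'`, and `'w'`; the function name `is_two_phase` is illustrative.

```python
def is_two_phase(actions, txn):
    """True if no lock request of `txn` comes after one of its unlocks."""
    unlocked = False
    for kind, t, _ in actions:
        if t != txn:
            continue
        if kind == "u":
            unlocked = True      # phase 2 has started
        elif kind == "l" and unlocked:
            return False         # a lock request in phase 2 breaks 2PL
    return True


two_phase = [("l", 1, "A"), ("r", 1, "A"), ("l", 1, "B"),
             ("u", 1, "A"), ("w", 1, "B"), ("u", 1, "B")]
not_two_phase = [("l", 1, "A"), ("w", 1, "A"), ("u", 1, "A"),
                 ("l", 1, "B"), ("w", 1, "B"), ("u", 1, "B")]
print(is_two_phase(two_phase, 1))      # True
print(is_two_phase(not_two_phase, 1))  # False
```

The second transaction is still consistent in its use of locks; it fails only because it releases A before requesting B.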

Why 2PL Works?

• Precisely: a legal schedule S of 2PL transactions is conflict serializable.

• Proof is an induction on n, the number of transactions.

• Remember, conflicts involve only read/write actions, not locks, but the consistency of the transactions requires that the r/w's be consistent with the l/u's.

Why 2PL Works (Cont’d)

• Basis: if n=1, then S={T1}, and hence S is conflict-serializable.

• Induction: S={T1,…,Tn}. Find the first transaction, say Ti, to perform an unlock action, say ui(X).

• We show that the r/w actions of Ti can be moved to the front of the other transactions without conflict.

• Consider some action such as wi(Y). Can it be preceded by some conflicting action wj(Y) or rj(Y)? If it could, we would not be able to swap them.

– If so, then uj(Y) and li(Y) must intervene, as

wj(Y)...uj(Y)...li(Y)...wi(Y).

– Since Ti is the first to unlock, ui(X) appears before uj(Y).

– But then li(Y) appears after ui(X), contradicting 2PL.

• Conclusion: wi(Y) can slide forward in the schedule without conflict; a similar argument applies to an ri(Y) action.

• Once all actions of Ti are at the front, the remaining actions form a legal schedule of n-1 2PL transactions, which is conflict-serializable by the induction hypothesis.