
Post on 19-Dec-2015


(c) Oded Shmueli 2004

Transactions Lecture 5: Multiversion Concurrency Control (Chapter 5, BHG)

More than one version per data item presents opportunities for better performance.

The basic idea

Each Write(x) produces a new version of x.
For Read(x), the DM has to decide which version to use.
Benefit: avoid rejecting Read operations that arrive “too late”.
Older versions may also be useful for recovery.
Cost: storage and management complexity.
Versions are written by active or committed transactions.
Users cannot “see” versions; the DBS should behave as if there were one version per item.

Extending the theory

DM executions are represented by MV histories.
Users regard serial 1V histories as correct.
To prove a CC algorithm correct, we need to show that its MV histories are equivalent to serial 1V histories.
Denote the versions of x as xi, xj, etc.; the subscript indicates the writing transaction.
A write is of the form wi[xi].
A read is of the form ri[xj] (j = i is possible).

Equivalence – First try

“Definition”: Hmv is equivalent to H1v if every pair of conflicting operations in Hmv is in the same order in H1v.
H1 = w0[x0] c0 w1[x1] c1 r2[x0] w2[y2] c2
H2 = w0[x] c0 w1[x] c1 r2[x] w2[y] c2
However, in H1 r2 reads x from T0, while in H2 it reads x from T1.
We need to use view equivalence instead (same reads-from and same final writes).
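The difference between H1 and H2 can be checked mechanically. The following Python sketch computes reads-from triples for both kinds of history. The tuple encoding and the function names are illustrative assumptions, not notation from the lecture, and the 1V function assumes a history without aborts.

```python
# Assumed encoding (not the lecture's notation):
#   mv history: ("w", i, "x") is wi[xi]; ("r", i, "x", j) is ri[xj]; ("c", i) is ci
#   1V history: ("w", i, "x") is wi[x];  ("r", i, "x") is ri[x];     ("c", i) is ci

def reads_from_mv(history):
    """Reads-from triples (reader, item, writer) of an mv history.

    In an mv history the version subscript already names the writer.
    """
    return {(op[1], op[2], op[3]) for op in history if op[0] == "r"}


def reads_from_1v(history):
    """Reads-from triples of a 1V history without aborts.

    A read of x reads from the transaction that wrote x most recently.
    """
    last_writer = {}
    triples = set()
    for op in history:
        if op[0] == "w":
            last_writer[op[2]] = op[1]
        elif op[0] == "r":
            triples.add((op[1], op[2], last_writer[op[2]]))
    return triples


H1 = [("w", 0, "x"), ("c", 0), ("w", 1, "x"), ("c", 1),
      ("r", 2, "x", 0), ("w", 2, "y"), ("c", 2)]
H2 = [("w", 0, "x"), ("c", 0), ("w", 1, "x"), ("c", 1),
      ("r", 2, "x"), ("w", 2, "y"), ("c", 2)]

print(reads_from_mv(H1))  # T2 reads x from T0
print(reads_from_1v(H2))  # T2 reads x from T1
```

The two sets differ, which is why ordering conflicting operations alike is not enough to make H1 and H2 equivalent.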

Is an acyclic SG(H) sufficient?

H3 = w0[x0] w0[y0] c0 r1[x0] r1[y0] w1[x1] w1[y1] c1 r2[x0] r2[y1] c2
SG(H3) is acyclic, yet H3 is not equivalent to any serial 1V history.
For example, consider H4 and H5:
H4 = w0[x] w0[y] c0 r1[x] r1[y] w1[x] w1[y] c1 r2[x] r2[y] c2
In H4, T2 reads x from T1; in H3, T2 reads x from T0.
H5 = w0[x] w0[y] c0 r2[x] r2[y] c2 r1[x] r1[y] w1[x] w1[y] c1
In H5, T2 reads y from T0; in H3, T2 reads y from T1.

Figure: SG(H3), with nodes T0, T1, T2 and edges T0 → T1, T0 → T2, T1 → T2.

An overview

A complete mv history H is serial if for all Ti, Tj in H, either all operations of Ti precede all operations of Tj or vice versa.
A serial mv history H is 1-serial if for all i, j, x: if Ti reads x from Tj (ri[xj]), then either i = j or Tj is the last transaction preceding Ti that writes any version of x.
H is one-copy serializable (1SR) if C(H) is equivalent to a 1-serial mv history.
Let H be an mv history over T. C(H) is equivalent to a serial 1V history over T iff H is 1SR.
An mv history H is 1SR iff there exists a version order << such that MVSG(H, <<) is acyclic.

Complete mv History

Transactions (diagram; 1V operations listed per transaction):
T0: w0[x] w0[y] w0[z] c0
T1: r1[x] w1[y] c1
T2: r2[x] r2[z] w2[x] c2
T3: r3[z] w3[y] w3[z] c3
T4: r4[x] r4[y] r4[z] c4

H6, a complete mv history over T0, …, T4 (diagram; operations listed per transaction):
T0: w0[x0] w0[y0] w0[z0] c0
T1: r1[x0] w1[y1] c1
T2: r2[x0] r2[z0] w2[x2] c2
T3: r3[z0] w3[y3] w3[z3] c3
T4: r4[x2] r4[y3] r4[z3] c4

Another Complete mv History

H7 (diagram, shown next to H6; operations listed per transaction). H7 differs from H6 only in that T4 reads y1 instead of y3:
T0: w0[x0] w0[y0] w0[z0] c0
T1: r1[x0] w1[y1] c1
T2: r2[x0] r2[z0] w2[x2] c2
T3: r3[z0] w3[y3] w3[z3] c3
T4: r4[x2] r4[y1] r4[z3] c4

mv History - Definitions

T = {T0, …, Tn} is a set of transactions; the operations of Ti are ordered by <i.
A translation function h maps wi[x] to wi[xi], ri[x] to ri[xj] for some j, ci to ci, and ai to ai.
A complete mv history H over T is a partial order with ordering relation < such that:
1. H = h(T0 ∪ … ∪ Tn) for some translation function h;
2. for each Ti and all operations pi, qi in Ti, if pi <i qi, then h(pi) < h(qi);
3. if h(rj[x]) = rj[xi], then wi[xi] < rj[xi];
4. if wi[x] <i ri[x], then h(ri[x]) = ri[xi]; and
5. if h(rj[x]) = rj[xi], i ≠ j, and cj ∈ H, then ci < cj.

More Definitions

If H satisfies (4), it preserves reflexive reads-from relationships.
If H satisfies (5), it is recoverable.
An mv history H is a prefix of a complete mv history.
An mv history preserves reflexive reads-from relationships (or is recoverable) if it is the prefix of a complete mv history that does so. Isn’t this always true?
C(H) is defined as for 1V histories.
If H is an mv history, then C(H) is a complete mv history.

Equivalence of mv histories

Again, we assume that no transaction reads or writes the same item x twice.
Tj reads x from Ti in an mv history H if Tj reads the version of x produced by Ti (iff rj[xi] ∈ H).
Two mv histories are equivalent (≡) if they have the same operations and the same reads-from relationship.
But same operations ⇒ same reads-from.
Proposition 1: Two mv histories over the same set of transactions are equivalent iff they have the same operations.

Equivalence of an mv history to a 1V history

Intuitively, the 1V history is a valid one-version view of the mv history.
Formally:
  Same set T = {T0, …, Tn} of transactions, with the same <i.
  Their operations are in 1-1 correspondence (a 1-1, onto function mapping ai to ai, ci to ci, wi[x] to wi[xi], and ri[x] to some ri[xj]).
  Their reads-from relations are the same if they are preserved under that function.
So an mv history H and a 1V history are equivalent if they have the same reads-from relationship.
Note: we don’t worry about final writes, since all writes are in the state produced by the mv history.

Serialization graphs

Two operations conflict if they operate on the same version and one of them is a write, i.e., they are of the form wi[xi] < rj[xi].
Let H be an mv history. SG(H) has a node for each committed transaction in H and an edge Ti → Tj (i ≠ j) if, for some x, rj[xi] is an operation in C(H).
Proposition 2: Let H and H′ be mv histories. If H ≡ H′, then SG(H) = SG(H′).
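The definition of SG(H) translates directly into code. The sketch below is a minimal illustration under an assumed tuple encoding of operations (the encoding and the name `serialization_graph` are hypothetical, not from the lecture). It uses one linearization of the operations of H6 and H7; by Proposition 2 the choice of linearization does not affect the graph.

```python
# Assumed encoding: ("w", i, "x") is wi[xi]; ("r", i, "x", j) is ri[xj];
# ("c", i) is ci; ("a", i) is ai.

def serialization_graph(history):
    """Return the edges of SG(H) as a set of pairs (i, j), meaning Ti -> Tj."""
    committed = {op[1] for op in history if op[0] == "c"}
    edges = set()
    for op in history:
        if op[0] == "r":
            reader, writer = op[1], op[3]
            # rj[xi] must be in C(H): reader and writer are both committed.
            if reader != writer and reader in committed and writer in committed:
                edges.add((writer, reader))
    return edges


H6 = [("w", 0, "x"), ("w", 0, "y"), ("w", 0, "z"), ("c", 0),
      ("r", 1, "x", 0), ("w", 1, "y"), ("c", 1),
      ("r", 2, "x", 0), ("r", 2, "z", 0), ("w", 2, "x"), ("c", 2),
      ("r", 3, "z", 0), ("w", 3, "y"), ("w", 3, "z"), ("c", 3),
      ("r", 4, "x", 2), ("r", 4, "y", 3), ("r", 4, "z", 3), ("c", 4)]

# H7 differs from H6 only in that T4 reads y1 instead of y3.
H7 = [("r", 4, "y", 1) if op == ("r", 4, "y", 3) else op for op in H6]

print(sorted(serialization_graph(H6)))
print(sorted(serialization_graph(H7)))
```

Both graphs are acyclic, although only H6 turns out to be 1SR, so SG alone cannot separate the two histories.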

Serialization graphs

Figure: H6 and SG(H6). SG(H6) has nodes T0, …, T4 and, by the definition of SG, the edges T0 → T1 (r1[x0]), T0 → T2 (r2[x0], r2[z0]), T0 → T3 (r3[z0]), T2 → T4 (r4[x2]), and T3 → T4 (r4[y3], r4[z3]).

Another Complete mv History

Figure: H7 and SG(H7). SG(H7) has nodes T0, …, T4 and, by the definition of SG, the edges T0 → T1 (r1[x0]), T0 → T2 (r2[x0], r2[z0]), T0 → T3 (r3[z0]), T1 → T4 (r4[y1]), T2 → T4 (r4[x2]), and T3 → T4 (r4[z3]).

One Copy Serializability

A complete mv history H is serial if for all Ti, Tj in H, either all operations of Ti precede all operations of Tj or vice versa.
Recall H3: although it is serial, it behaves differently from any serial 1V history.
H3 = w0[x0] w0[y0] c0 r1[x0] r1[y0] w1[x1] w1[y1] c1 r2[x0] r2[y1] c2
A serial mv history H is 1-serial if for all i, j, x: if Ti reads x from Tj (ri[xj]), then either i = j or Tj is the last transaction preceding Ti that writes any version of x.
H3 is not 1-serial, since T2 does not read x from T1.
H8 is 1-serial:
H8 = w0[x0] w0[y0] w0[z0] c0 r1[x0] w1[y1] c1 r2[x0] r2[z0] w2[x2] c2 r3[z0] w3[y3] w3[z3] c3 r4[x2] r4[y3] r4[z3] c4
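The 1-serial condition can be tested in a single pass over a serial mv history. The sketch below assumes a hypothetical tuple encoding and function name, takes the first read of H3 to be r1[x0], and assumes the input is serial with every transaction committed.

```python
# Assumed encoding: ("w", i, "x") is wi[xi]; ("r", i, "x", j) is ri[xj]; ("c", i) is ci.

def is_one_serial(history):
    """Check the 1-serial condition on a serial mv history.

    Every read ri[xj] must have i == j, or Tj must be the last
    transaction before Ti that wrote any version of x.
    """
    last_writer = {}   # item -> last earlier transaction that wrote it
    pending = {}       # writes of the transaction currently running
    current = None
    for op in history:
        txn = op[1]
        if txn != current:
            # The previous transaction is over: its writes now precede txn.
            last_writer.update(pending)
            pending, current = {}, txn
        if op[0] == "w":
            pending[op[2]] = txn
        elif op[0] == "r":
            _, i, item, j = op
            if i != j and last_writer.get(item) != j:
                return False
    return True


H3 = [("w", 0, "x"), ("w", 0, "y"), ("c", 0),
      ("r", 1, "x", 0), ("r", 1, "y", 0), ("w", 1, "x"), ("w", 1, "y"), ("c", 1),
      ("r", 2, "x", 0), ("r", 2, "y", 1), ("c", 2)]

H8 = [("w", 0, "x"), ("w", 0, "y"), ("w", 0, "z"), ("c", 0),
      ("r", 1, "x", 0), ("w", 1, "y"), ("c", 1),
      ("r", 2, "x", 0), ("r", 2, "z", 0), ("w", 2, "x"), ("c", 2),
      ("r", 3, "z", 0), ("w", 3, "y"), ("w", 3, "z"), ("c", 3),
      ("r", 4, "x", 2), ("r", 4, "y", 3), ("r", 4, "z", 3), ("c", 4)]

print(is_one_serial(H3))  # T2 reads x0 although T1 wrote x in between
print(is_one_serial(H8))
```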

One Copy Serializability

An mv history H is one-copy serializable (1SR) if C(H) is equivalent to a 1-serial mv history.
1SR is a prefix commit-closed property.
Reason: the committed projection of a 1SR history is equivalent to a 1-serial mv history (Exercise 5.4).
So, unlike view serializability, we need not require that the committed projection of every prefix of an mv history be 1SR.

One Copy Serializability

Again: an mv history H is one-copy serializable (1SR) if C(H) is equivalent to a 1-serial mv history.
H6 = C(H6) is equivalent to H8, so it is 1SR.
H7 = C(H7) is equivalent to no 1-serial history, so it is not 1SR.
A serial history can be 1SR and not 1-serial:
H10 = w0[x0] c0 r1[x0] w1[x1] c1 r2[x0] c2 is not 1-serial (T2 reads x from T0).
But H10 is 1SR, since it is equivalent to:
H11 = w0[x0] c0 r2[x0] c2 r1[x0] w1[x1] c1
We take 1SR as the correctness criterion; this needs to be justified.

Justifying the correctness criterion

Theorem 3: Let H be an mv history over T. C(H) is equivalent to a serial 1V history over T iff H is 1SR.

(If)

H is 1SR, so there exists a 1-serial mv history Hs such that C(H) ≡ Hs.
Let H′s be the serial 1V history obtained from Hs by eliminating subscripts (translating versions to items).
H′s ≡ Hs: it suffices to show that they have the same reads-from relationships.
Say Tj reads x from Ti in Hs. Since Hs is 1-serial, no wk[xk] lies between wi[xi] and rj[xi]. So no wk[x] lies between wi[x] and rj[x] in H′s, and Tj reads x from Ti in H′s.
Say Tj reads x from Ti in H′s. If rj[x] was obtained from rj[xi], the same holds in Hs. If rj[x] was obtained from rj[xk], k ≠ i, then:
  j = i: Ti reads x from Ti in H′s, so by (4) in the mv history definition Ti reads x from Ti in Hs, so k = i, a contradiction.
  j ≠ i: since Hs is 1-serial, either wi[xi] < wk[xk] or rj[xk] < wi[xi]. In both cases Tj does not read x from Ti in H′s, a contradiction.
C(H) ≡ Hs and Hs ≡ H′s, so C(H) ≡ H′s. Done.

(Only If)

Given C(H) ≡ H′s, where H′s is a serial 1V history, translate H′s into a serial mv history Hs:
  ci → ci
  wi[x] → wi[xi]
  rj[x] → rj[xi] such that Tj reads x from Ti in H′s
Hs ≡ H′s: same reads-from by construction.
Claim 1: Hs is a complete mv history (next slide).
Claim 2: Hs is 1-serial. Say Tj reads x from Ti, j ≠ i. Since H′s is a serial 1V history, no wk[x] lies between wi[x] and rj[x]. So no wk[xk] lies between wi[xi] and rj[xi] in Hs, and Hs is 1-serial.
Claim 3: Hs ≡ H′s ≡ C(H), so H is 1SR.

Claim 1: Hs is a complete mv history

Conditions (1) and (2) follow from the translation from H′s to Hs.
Condition (3): Since H is an mv history, each rj[xk] in C(H) is preceded by wk[xk]. So in H′s (serial 1V), each rj[x] is preceded by some write on x, and in Hs each rj[xi] is preceded by wi[xi].
Condition (4): Suppose wj[x] < rj[x] in H′s. H′s is serial, so Tj reads x from Tj in H′s, and rj[x] is translated into rj[xj] in Hs.
Condition (5): Say rj[xi] is in Hs; then Tj reads x from Ti in H′s. H′s is serial and both Ti and Tj commit in it, so ci < cj in H′s. The translation retains positions, so ci < cj in Hs.

The 1-serializability theorem

Given a CC mechanism, we need to ensure that all its histories are 1SR.
All known mv CC algorithms totally order versions.
A version order << for x is a total order on the versions of x.
A version order for H is the union of the version orders of all data items.
MVSG(H, <<) is SG(H) with additional version order edges (recall that nodes are committed transactions).
For each rk[xj] and wi[xi] in C(H) (i, j, k distinct):
  if xi << xj, add Ti → Tj;
  otherwise, add Tk → Ti.

MVSG(H6, <<)

Figure: H6, SG(H6), and MVSG(H6, <<) for the version order x0 << x2, y0 << y1 << y3, z0 << z3.
Rule: for each rk[xj] and wi[xi] in C(H) (i, j, k distinct): if xi << xj, add Ti → Tj; otherwise, add Tk → Ti.
Old edges (from SG(H6)): T0 → T1, T0 → T2, T0 → T3, T2 → T4, T3 → T4.
New version order edges:
  r4[y3] and w1[y1] (k = 4, j = 3, i = 1): y1 << y3, so add T1 → T3.
  r1[x0] and w2[x2] (k = 1, j = 0, i = 2): x0 << x2, so add T1 → T2.
  r2[z0] and w3[z3] (k = 2, j = 0, i = 3): z0 << z3, so add T2 → T3.
The remaining pairs only yield edges that are already in SG(H6).

Using MVSG

Suppose SG(H) is acyclic.
A serial mv history Hs obtained by topologically sorting SG(H) may not be equivalent to any serial 1V history.
This is because reads-from relationships may change when version operations are mapped to item operations.
Version order edges detect this change.
If rk[xj] and wi[xi] are in C(H), the version order edges force wi[xi] either to precede wj[xj] or to follow rk[xj].
So in translating operations on xi and xj to operations on x, the reads-from relationship is not changed.
MVSG must be acyclic for all this to work.

Using MVSG(H, <<)

Theorem 4: An mv history H is 1SR iff there exists a version order << such that MVSG(H, <<) is acyclic.
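Theorem 4 suggests a direct, if exponential, test for 1SR: try every version order and look for an acyclic MVSG. The sketch below is only an illustration for small examples; the encoding of reads and writes as tuples and all function names are assumptions, and the input is taken to be the committed projection of the history.

```python
from itertools import permutations, product

# Assumed encoding: reads is a set of (k, item, j) for rk[item_j];
# writes is a set of (i, item) for wi[item_i]; all transactions committed.

def mvsg_edges(reads, writes, version_order):
    """Edges of MVSG(H, <<); version_order maps item -> writers, oldest first."""
    edges = {(j, k) for (k, item, j) in reads if j != k}      # SG(H) edges
    for (k, item, j) in reads:
        for (i, written) in writes:
            if written != item or len({i, j, k}) < 3:
                continue
            order = version_order[item]
            if order.index(i) < order.index(j):               # xi << xj
                edges.add((i, j))
            else:                                             # xj << xi
                edges.add((k, i))
    return edges


def is_acyclic(nodes, edges):
    """Repeatedly remove nodes that have no incoming edge."""
    nodes, edges = set(nodes), set(edges)
    while nodes:
        sources = {n for n in nodes if not any(v == n for (_, v) in edges)}
        if not sources:
            return False
        nodes -= sources
        edges = {(u, v) for (u, v) in edges if u not in sources}
    return True


def is_1sr(reads, writes):
    """Brute-force use of Theorem 4: search for a suitable version order."""
    items = sorted({item for (_, item) in writes})
    writers = {x: [i for (i, item) in writes if item == x] for x in items}
    nodes = {i for (i, _) in writes} | {k for (k, _, _) in reads}
    for combo in product(*(permutations(writers[x]) for x in items)):
        order = dict(zip(items, combo))
        if is_acyclic(nodes, mvsg_edges(reads, writes, order)):
            return True
    return False


writes = {(0, "x"), (0, "y"), (0, "z"), (1, "y"), (2, "x"), (3, "y"), (3, "z")}
reads_H6 = {(1, "x", 0), (2, "x", 0), (2, "z", 0), (3, "z", 0),
            (4, "x", 2), (4, "y", 3), (4, "z", 3)}
reads_H7 = (reads_H6 - {(4, "y", 3)}) | {(4, "y", 1)}

print(is_1sr(reads_H6, writes))
print(is_1sr(reads_H7, writes))
```

A real scheduler never enumerates version orders; each algorithm fixes one (for example, timestamp order) and the proof shows that the resulting MVSG is acyclic.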

(If)

Topologically sort G = MVSG(H, <<) to produce a serial history Hs = Ti1, …, Tin.
C(H) is an mv history, and Hs is an mv history (same operations as C(H)).
Since C(H) and Hs have the same operations, they are equivalent by Proposition 1, i.e., C(H) ≡ Hs.
Hs is 1-serial:
  Say Tk reads x from Tj, k ≠ j. Let wi[xi] be any other write on x (i ≠ j, i ≠ k).
  If xi << xj, then G includes Ti → Tj, so Tj follows Ti in Hs.
  If xj << xi, then G includes Tk → Ti, so Ti follows Tk in Hs.
  So no transaction that writes x lies between Tj and Tk in Hs.
Conclusion: H is 1SR.

(Only if)

Define MV(H, <<) as the graph containing only the version order edges. These edges depend only on the operations in H and on <<.
Let Hs ≡ C(H) be a 1-serial mv history (one exists).
In SG(Hs), Ti → Tj implies that Ti precedes Tj in Hs.
Define <<: xi << xj only if Ti precedes Tj in Hs.
All edges in MV(Hs, <<) are such that Ti → Tj implies that Ti precedes Tj in Hs.
So in MVSG(Hs, <<) = SG(Hs) ∪ MV(Hs, <<), Ti → Tj implies that Ti precedes Tj in Hs. MVSG(Hs, <<) is thus acyclic.
Hs ≡ C(H), so by Proposition 1 they have the same operations. So MV(C(H), <<) = MV(Hs, <<) and SG(C(H)) = SG(Hs), hence MVSG(C(H), <<) = MVSG(Hs, <<), which is acyclic.
MVSG(C(H), <<) = MVSG(H, <<), so MVSG(H, <<) is acyclic.

Mv CC mechanisms

They can be defined based on 2PL, TO, and SGT.

Multiversion timestamp ordering

Each transaction Ti has a unique timestamp ts(Ti).
Operations are tagged with the timestamp of their transaction; versions are tagged with the timestamp of the writing transaction.
Operations are processed first-come-first-served.
ri[x]: translated into ri[xk], where xk is the version with the largest timestamp ≤ ts(Ti), and sent to the DM.
wi[x]:
  If the scheduler has already processed an rj[xk] such that ts(Tk) < ts(Ti) < ts(Tj), it rejects wi[x].
  Otherwise, it translates wi[x] into wi[xi] and sends it to the DM.
ci is delayed until cj has been processed for all Tj that wrote versions Ti has read.

Intuition

“Simulate” a 1V execution in timestamp order.
In such an execution, a read of x gets the latest value of x produced by a transaction with a smaller timestamp.
In such an execution, if x was produced at timestamp t1 and read by T3 with timestamp t3 > t1, then a write of x with timestamp t2 such that t1 < t2 < t3 would invalidate the read.

Implementation

For each version xi maintain interval(xi) = [wts, rts]:
  wts = ts(xi)
  rts = max(largest timestamp of a read operation on xi, wts)
intervals(x) = {interval(xi) | xi is a version of x}.
Operation processing: find the interval i ∈ intervals(x) with the maximal i.wts < ts(Ti):
  ri[x]: set i.rts = max(i.rts, ts(Ti)).
  wi[x]: if i.rts > ts(Ti), reject; otherwise send to the DM and create a new interval(xi) = [wts = ts(Ti), rts = ts(Ti)].
Space: ‘old’ versions need to be deleted.
  Delete from old to new; otherwise wrong versions may be read.
  It is also possible that when a read arrives, a version with a smaller timestamp is no longer available.
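The interval bookkeeping can be sketched for a single data item as follows. This is a minimal illustration, not the lecture's implementation: the class and method names are hypothetical, and commit delays, aborts, and version deletion are left out. It assumes unique timestamps larger than the initial version's timestamp and at most one write per transaction on the item.

```python
from dataclasses import dataclass


@dataclass
class Version:
    wts: int        # timestamp of the writing transaction
    rts: int        # largest timestamp of a reader of this version (>= wts)
    value: object


class MVTOItem:
    """Versions of one data item x under MVTO (no commit handling)."""

    def __init__(self, initial_value, initial_ts=0):
        self.versions = [Version(initial_ts, initial_ts, initial_value)]

    def _latest(self, ts, inclusive):
        """Version with the largest wts below ts (or equal to it, if inclusive)."""
        eligible = [v for v in self.versions
                    if v.wts < ts or (inclusive and v.wts == ts)]
        return max(eligible, key=lambda v: v.wts)

    def read(self, ts):
        """ri[x]: read the version with the largest wts <= ts; never rejected."""
        v = self._latest(ts, inclusive=True)
        v.rts = max(v.rts, ts)
        return v.value

    def write(self, ts, value):
        """wi[x]: reject if a later reader already read the preceding version."""
        v = self._latest(ts, inclusive=False)
        if v.rts > ts:
            return False
        self.versions.append(Version(ts, ts, value))
        return True


x = MVTOItem("a")
print(x.write(5, "b"))   # accepted: creates the version with wts = 5
print(x.read(10))        # reads that version and raises its rts to 10
print(x.write(7, "c"))   # rejected: 5 < 7 < 10 would invalidate the read
print(x.read(3))         # a "late" read is served from the initial version
```

The last call shows the benefit named at the start of the lecture: a read that arrives late is answered from an older version instead of being rejected.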

Correctness: properties of histories

p1: ts(Ti) = ts(Tj) iff i = j.
p2: For all rk[xj]: wj[xj] < rk[xj] and ts(Tj) ≤ ts(Tk).
p3: For all rk[xj] and wi[xi] in H such that i ≠ j:
  ts(Ti) < ts(Tj), or
  ts(Tk) < ts(Ti), or
  i = k and rk[xj] < wi[xi].
  That is, rk[xj] ⇒ there is no other version with a timestamp between ts(Tj) and ts(Tk). If xk exists and k ≠ j, then rk[xj] < wk[xk].
p4: For all rj[xi] in H such that i ≠ j and cj in H, ci < cj. That is, H is recoverable. See Note.
p1-p4 ⇒ H preserves reflexive reads-from relationships. Otherwise, wk[xk] < rk[xj] and j ≠ k. By p1 and p2, ts(Tj) < ts(Tk). By p3, either ts(Tk) < ts(Tj) (impossible), or ts(Tk) < ts(Tk) (impossible), or rk[xj] < wk[xk] (impossible). Contradiction.

Theorem: Every history H produced by MVTO is 1SR

Define the version order: xi << xj iff ts(Ti) < ts(Tj).
Claim: in G = MVSG(H, <<), Ti → Tj ⇒ ts(Ti) < ts(Tj).
Suppose Ti → Tj is in SG(H); it is due to a read. By p2, ts(Ti) ≤ ts(Tj). By p1, ts(Ti) ≠ ts(Tj), so ts(Ti) < ts(Tj).
Let rk[xj] and wi[xi] be operations with i, j, k distinct. They generate a version order edge:
  Case 1: xi << xj. Ti → Tj is in G, and ts(Ti) < ts(Tj).
  Case 2: xj << xi. Tk → Ti is in G. By p3:
    i = k: impossible, since i, j, k are distinct.
    ts(Ti) < ts(Tj): impossible, as xj << xi.
    ts(Tk) < ts(Ti): the only possible option.
So G is acyclic. By Theorem 4, H is 1SR.

Two version 2PL (2V2PL)

Idea: use versions for rw synchronization and 2PL for ww synchronization.
The DM stores one or two versions of each data item.
If there are two versions, only one was written by a committed transaction.
If Ti wrote x and has not yet committed, one version is the before image of x and the other is the one Ti wrote.
Once Ti commits, the older version can be deleted.

2V2PL: the mechanism

Use three lock types: R, W, C (certify).
Upon Ti’s completion, all of Ti’s W locks are converted to C locks.
wi[x]: request wli[x] (note: on x, not xi). Conflicts may block the request; otherwise set the w-lock on x and issue wi[xi].
ri[x]: request rli[x], which conflicts only with C locks.
  If Ti owns wli[x], issue ri[xi];
  otherwise, once rli[x] is granted, send ri[xj] to the DM, where xj is the only committed version of x. Recoverable!
ci: convert Ti’s wl’s to cl’s.
  While Ti holds wl[x], no other wl’s on x are possible, but other rl’s are.
  cl[x] is granted only when there are no more readers of x!
  Abort during commit is possible ⇒ no lock is released before all cl’s are obtained.
Effect: minor reader delays! But readers delay commits!

Lock compatibility (holder versus requester; y = compatible, n = conflict):

     R  W  C
R    y  y  n
W    y  n  n
C    n  n  n
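The lock rules can be written down as a small compatibility table. The sketch below follows the rules stated in the text of this slide (a read lock conflicts only with certify locks, and a certify lock is granted only when no other transaction holds any lock on x). The names `COMPATIBLE`, `can_grant`, and `can_certify` are hypothetical, and blocking, queues, and deadlock handling are not modelled.

```python
# Lock modes of 2V2PL: "R" (read), "W" (write), "C" (certify).
# Key: (requested mode, mode held by another transaction).
COMPATIBLE = {
    ("R", "R"): True,  ("R", "W"): True,  ("R", "C"): False,
    ("W", "R"): True,  ("W", "W"): False, ("W", "C"): False,
    ("C", "R"): False, ("C", "W"): False, ("C", "C"): False,
}


def can_grant(requested, held_by_others):
    """True if `requested` conflicts with no lock held by other transactions."""
    return all(COMPATIBLE[(requested, held)] for held in held_by_others)


def can_certify(item_locks, txn):
    """Can txn convert its write lock on an item into a certify lock?

    item_locks: list of (transaction, mode) pairs currently set on the item.
    """
    others = [mode for (t, mode) in item_locks if t != txn]
    return can_grant("C", others)


print(can_grant("R", ["W"]))                   # readers are not blocked by a writer
print(can_grant("R", ["C"]))                   # but they wait for a certify lock
print(can_certify([(1, "W"), (2, "R")], 1))    # T1 must wait for reader T2
```

The last call illustrates the trade-off noted on the slide: readers are rarely delayed, but a reader can delay a writer's commit.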

Correctness: properties of histories

Let fi denote the certification of Ti.
q1: For every committed Ti, fi follows all of Ti’s operations and precedes ci.
q2: For all rk[xj] (Tk committed):
  j ≠ k ⇒ cj < rk[xj]
  j = k ⇒ wk[xk] < rk[xk]
  That is, read your own version or a committed version.

Correctness: properties of histories

q3: For all wk[xk] and rk[xj], wk[xk] < rk[xj] ⇒ j = k. That is, read your own produced value.
q4: If rk[xj], fi and wi[xi] are in H, then fi < rk[xj] or rk[xj] < fi. That is, rk[xj] is totally ordered with respect to certification operations. See Note 1.
q5: For all rk[xj] and wi[xi] such that i, j, k are distinct and committed, fi < rk[xj] ⇒ fi < fj. See Note 2.

Correctness: properties of histories

q6: For all rk[xj] and wi[xi], i ≠ j and i ≠ k, if rk[xj] < fi then fk < fi. See Note.
q7: For all wi[xi] and wj[xj], i ≠ j, either fi < fj or fj < fi.

Theorem: Every history H produced by 2V2PL is 1SR

Writing the proof carefully is a homework assignment.

Using more than 2 versions

Suppose write locks don’t conflict.
There will be more than two versions, but still only the most recently committed one is read.
Then we can also allow transactions to read these (uncertified) versions:
  A transaction can’t be certified until all versions it read from other transactions are certified.
  A write lock on x is converted to a certify lock only if there are no read locks on certified versions of x. So read locks on uncertified versions are ignored.
Cascading aborts are now possible: if Ti produced a version of x that was read by Tj and Ti aborts, Tj is aborted as well.

Multiversion mixed method

Distinguish pre-identified queries and updaters.
Updaters use strict 2PL.
When the TM receives an updater’s commit, it assigns the updater a timestamp:
  Updaters have timestamps consistent with their order in SG(H) (readers committed).
Each write produces a new version tagged with the writer’s timestamp.
Upon receiving a query’s first operation, the TM assigns the query a timestamp smaller than that of any updater that has not yet committed.
A read r[x] of query q is translated into a read of the version xi with the largest xi.ts < q.ts.
  This means xi was written by a committed transaction.
  Future writes will not be rejected due to this read.
Queries set no locks, never wait, and never cause updaters to wait.
Disadvantages: reads may be out of date, timestamp management, space management.
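How a query's read picks its version can be sketched with a binary search over the version timestamps of x. The function name and the representation of versions as a sorted list of timestamps are assumptions made for illustration only.

```python
from bisect import bisect_left


def version_for_query(version_timestamps, query_ts):
    """Return the largest version timestamp strictly below query_ts.

    version_timestamps: sorted list of the timestamps of the versions of x.
    Returns None if no such version exists (for example, because it was
    already deleted by space management).
    """
    pos = bisect_left(version_timestamps, query_ts)
    return version_timestamps[pos - 1] if pos > 0 else None


versions_of_x = [10, 20, 30]
print(version_for_query(versions_of_x, 25))  # the version written at timestamp 20
print(version_for_query(versions_of_x, 5))   # no old enough version is available
```

The lookup sets no lock and never waits, which matches the behaviour of queries in this method; the cost is that the version returned may be out of date.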

Claim: Every history H produced by the multiversion mixed method is 1SR

Writing the proof carefully is a homework assignment.