![Page 1: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/1.jpg)
CS 347 Notes08 1
CS 347: Distributed Databases and Transaction Processing
Data Replication
Hector Garcia-Molina
![Page 2: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/2.jpg)
CS 347 Notes08 2
Replication Space
• Updates
  – at any copy
  – at fixed (primary) copy
  – at one copy but control can migrate
  – no updates
![Page 3: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/3.jpg)
CS 347 Notes08 3
Replication Space
• Correctness
  – no consistency
  – local consistency
  – order preserving
  – serializable schedule
  – 1-copy serializability
![Page 4: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/4.jpg)
CS 347 Notes08 4
Replication Space
• Expected Failures
  – processors: fail-stop, byzantine?
  – network: reliable, partitions, in-order msgs?
  – storage: stable disk?
![Page 5: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/5.jpg)
CS 347 Notes08 5
Replication Space
• Implementation Details
  – update propagation
    – physical log records
    – logical log records
    – SQL updates
    – transactions
  – reads at backup?
  – architecture
    – cross backups
    – multi-computer copy
  – initialization of backup copy
![Page 6: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/6.jpg)
CS 347 Notes08 6
Cross Backups
[Figure: two sites. Site A holds the primary copy of DB1 and the backup copy of DB2; site B holds the primary copy of DB2 and the backup copy of DB1.]
![Page 7: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/7.jpg)
CS 347 Notes08 7
Multi-Computer Sites
[Figure: a primary site with computers P1, P2, P3 (with L1/X1, L2/X2, L3/X3) and a backup site with computers B1, B2, B3 (with L1’/Y1, L2’/Y2, L3’/Y3).]
![Page 8: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/8.jpg)
CS 347 Notes08 8
1-Safe Backups
– Transactions commit at primary
– Redo log records propagated
– Transactions commit at backup
[Figure: primary P1 (L1, X1) and backup B1 (L1’, Y1).]
![Page 9: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/9.jpg)
CS 347 Notes08 9
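1-Safe Backups
– Transactions can get lost
[Figure: two snapshots of primary P1 (L1, X1) and backup B1 (L1’, Y1). First snapshot: the primary holds T1, T2, T3 and the backup holds T1, T2. Second snapshot: the primary holds T1, T2, T3 and the backup holds T1, T2, T4, T5.]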
![Page 10: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/10.jpg)
CS 347 Notes08 10
2-Safe Backups
– Transactions do 2-phase commit
– Redo log records propagated in prepare
– Transactions not lost, but
  • longer delay, contention
  • cannot process unless both sites are up
– After failure, go to 1-safe (no backup)
[Figure: primary P1 (L1, X1) and backup B1 (L1’, Y1).]
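The difference between the two policies can be sketched in a few lines of Python. This is only an illustrative model, not the protocol from the lecture: `Site`, `commit_1safe`, `commit_2safe`, and `lost_at_takeover` are hypothetical names, the redo log is reduced to a list of transaction ids, and the `shipped` flag is an assumed stand-in for "the redo records reached the backup before the primary failed".

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    up: bool = True
    log: list = field(default_factory=list)  # redo log, reduced to transaction ids


def commit_1safe(primary, backup, txn, shipped=True):
    """1-safe: the primary commits first; redo records reach the backup later, or never."""
    primary.log.append(txn)
    if shipped and backup.up:
        backup.log.append(txn)
    return True


def commit_2safe(primary, backup, txn):
    """2-safe: redo records travel with prepare; commit only if both sites are up."""
    if not (primary.up and backup.up):
        return False  # cannot process unless both sites are up
    backup.log.append(txn)   # redo records propagated in the prepare phase
    primary.log.append(txn)  # then both sites commit
    return True


def lost_at_takeover(primary, backup):
    """Transactions committed at the primary that the backup never received."""
    return [t for t in primary.log if t not in backup.log]


if __name__ == "__main__":
    p, b = Site("P1"), Site("B1")
    for txn, shipped in [("T1", True), ("T2", True), ("T3", False)]:
        commit_1safe(p, b, txn, shipped)
    print(lost_at_takeover(p, b))  # ['T3']
```

With 1-safe the primary never waits for the backup, which is why T3 can be missing at takeover; with 2-safe nothing is lost, at the price of blocking whenever one site is down.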
![Page 11: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/11.jpg)
CS 347 Notes08 11
What is Correctness?
• In 2-safe
• In 1-safe
![Page 12: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/12.jpg)
CS 347 Notes08 12
What is in Paper You Read?
• Specific Scenario
  – updates at fixed primary site
  – each site has multiple computers
  – primary-backup sites are matched
  – clean site failures; stable storage; reliable network
  – log shipping
  – no reads at backup
  – no initialization
![Page 13: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/13.jpg)
CS 347 Notes08 13
Main Problem: Update Dependencies
[Figure: primary site with P1 (L1, X1) and P2 (L2, X2); backup site with B1 (L1’, Y1) and B2 (L2’, Y2). Transaction labels: Ta(1), Ta(2), Tb.]
data dependency: Ta → Tb
![Page 14: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/14.jpg)
CS 347 Notes08 14
Main Problem: Update Dependencies
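[Figure: primary site with P1 (L1, X1) and P2 (L2, X2); backup site with B1 (L1’, Y1) and B2 (L2’, Y2). Transaction labels: Ta(1), Ta(2), Tb. At the backup site: "Ta(1) Tb" at one computer and "?" at the other.]
data dependency: Ta → Tb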
![Page 15: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/15.jpg)
CS 347 Notes08 15
Main Problem: Update Dependencies
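[Figure: primary site with P1 (L1, X1) and P2 (L2, X2); backup site with B1 (L1’, Y1) and B2 (L2’, Y2). Transaction labels: Ta(1), Ta(2), Tb. At the backup site: "Ta(1) Tb" at one computer and "?" at the other.]
data dependency: Ta → Tb
• should not install Ta
• should not install Tb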
![Page 16: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/16.jpg)
CS 347 Notes08 16
Dependency Reconstruction Algorithm
• Locking at backup to detect dependencies
• Ensure locks granted in same order as they were granted at primary
![Page 17: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/17.jpg)
CS 347 Notes08 17
Example: Dependency Reconstruction
[Figure: primary site with P1 (L1, X1) and P2 (L2, X2); backup site with B1 (L1’, Y1) and B2 (L2’, Y2). Transaction labels: Ta(1), Ta(2), Tb, with tickets 5, 6, and 18.]
data dependency: Ta → Tb
Tickets reflect local commit order
![Page 18: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/18.jpg)
CS 347 Notes08 18
Example: Dependency Reconstruction
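[Figure: primary site with P1 (L1, X1) and P2 (L2, X2); backup site with B1 (L1’, Y1) and B2 (L2’, Y2). Transaction labels: Ta(1), Ta(2), Tb, with tickets 5, 6, and 18. At the backup site: Ta(1) with ticket 5 and Tb with ticket 6 at one computer, "?" at the other.]
data dependency: Ta → Tb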
![Page 19: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/19.jpg)
CS 347 Notes08 19
Example: Dependency Reconstruction
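[Figure: primary site with P1 (L1, X1) and P2 (L2, X2); backup site with B1 (L1’, Y1) and B2 (L2’, Y2). Transaction labels: Ta(1), Ta(2), Tb, with tickets 5, 6, and 18. At the backup site: Ta(1) with ticket 5 and Tb with ticket 6 at one computer, "?" at the other.]
data dependency: Ta → Tb
• Say Tb requests lock first at B1;
• Tb request delayed until all locks with tickets < 6 have been granted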
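The ticket rule in this example can be sketched as a small lock manager at one backup computer. This is a minimal sketch under stated assumptions, not the algorithm from the paper: `BackupLockManager` is a hypothetical name, there is a single lock, and tickets are assumed to be consecutive integers so that "all locks with tickets < 6 have been granted" can be tracked with one counter.

```python
import heapq


class BackupLockManager:
    """Grants lock requests at one backup computer in ticket order."""

    def __init__(self, first_ticket):
        self.next_ticket = first_ticket  # smallest ticket not yet granted (assumed consecutive)
        self.waiting = []                # min-heap of (ticket, transaction)
        self.granted = []                # transactions in the order their locks were granted

    def request(self, ticket, txn):
        """Queue a request; it is granted only once all smaller tickets are granted."""
        heapq.heappush(self.waiting, (ticket, txn))
        self._drain()

    def _drain(self):
        # Grant every queued request whose turn has come, smallest ticket first.
        while self.waiting and self.waiting[0][0] == self.next_ticket:
            _, txn = heapq.heappop(self.waiting)
            self.granted.append(txn)
            self.next_ticket += 1


if __name__ == "__main__":
    b1 = BackupLockManager(first_ticket=5)
    b1.request(6, "Tb")      # Tb arrives first and is delayed
    print(b1.granted)        # []
    b1.request(5, "Ta(1)")   # ticket 5 arrives; both can now be granted
    print(b1.granted)        # ['Ta(1)', 'Tb']
```

Because the grant order at B1 matches the ticket order at P1, the backup sees the same Ta → Tb dependency the primary did.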
![Page 20: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/20.jpg)
CS 347 Notes08 20
Epoch Algorithm
• Backup updates are installed in batches
• Epoch delimiters written on log
![Page 21: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/21.jpg)
CS 347 Notes08 21
Writing Delimiters at Primary
[Figure: logs of one master and two slaves along a "log time" axis, each with delimiters marking epochs 15 and 16.]
![Page 22: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/22.jpg)
CS 347 Notes08 22
Problem with Commits
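[Figure: logs of one master and two slaves along a "log time" axis, each with delimiters marking epochs 15 and 16. Transaction T is shown with its prepare and commit.]
T’s commit record is in Epoch 15 in some logs; in Epoch 16 in others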
![Page 23: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/23.jpg)
CS 347 Notes08 23
Solution: Bump Epoch
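[Figure: logs of one master and two slaves along a "log time" axis, each with delimiters marking epochs 15 and 16. Transaction T is shown with its prepare and commit.]
Prepare ack reports epoch number; coordinator bumps epoch if necessary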
![Page 24: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/24.jpg)
CS 347 Notes08 24
Installing an Epoch at Backup
[Figure: logs of one master and two slaves along a "log time" axis, with delimiters for epochs 15 and 16. Labels: "end of 16" (twice) and "install 16".]
![Page 25: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/25.jpg)
CS 347 Notes08 25
To Install Epoch X at Backup J
• Redo transactions:
  – If commit(T) ≤ X, commit T
  – If prepare(T) ≤ X but commit(T) > X:
    • If T’s primary peer was coordinator, do not commit;
    • Else check with the backup of T’s coordinator, B’:
      – If B’ is committing T in epoch X, then we commit T
      – Else do not commit T
  – Otherwise do not commit T (defer to next epoch)
commit(T) ≤ X means that T’s commit record is found in epoch X (or earlier) at node J.
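The decision rule for a single transaction can be written as one function. This is a sketch of the rule as stated on this slide, with hypothetical names throughout: `should_commit`, its parameters, and the `ask_coordinator_backup` callback (which stands for the message exchange with B’) are not from the original. A record that has not appeared in J's log yet is passed as `None`, and a missing commit record is treated like commit(T) > X, which is an assumption.

```python
def should_commit(x, prepare_epoch, commit_epoch,
                  peer_was_coordinator, ask_coordinator_backup):
    """Decide whether backup J installs transaction T while installing epoch x.

    prepare_epoch / commit_epoch: epoch in which T's prepare / commit record
    appears in J's log, or None if the record has not been seen.
    ask_coordinator_backup: callable returning True if B' (the backup of T's
    coordinator) is committing T in epoch x.
    """
    if commit_epoch is not None and commit_epoch <= x:
        return True  # commit(T) <= X
    if prepare_epoch is not None and prepare_epoch <= x:
        # Prepared in epoch x or earlier, but the commit record falls later.
        if peer_was_coordinator:
            return False  # the coordinator's own log would hold the commit by now
        return ask_coordinator_backup()
    return False  # not even prepared by epoch x: defer to the next epoch


if __name__ == "__main__":
    # Commit record already inside epoch 15: install.
    print(should_commit(15, 15, 15, False, lambda: False))  # True
    # Prepared in 15, commit in 16, B' says it commits T in epoch 15: install.
    print(should_commit(15, 15, 16, False, lambda: True))   # True
    # Same local log, but B' does not commit T in epoch 15: do not install.
    print(should_commit(15, 15, 16, False, lambda: False))  # False
```

The last two calls pass identical local information and differ only in the answer from B’, which is the point of the coordinator check.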
![Page 26: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/26.jpg)
CS 347 Notes08 26
Why Do We Need Coordinator Check?
• Assignment: Construct 2 scenarios that look the same to backup J:
  – In Scenario 1, T should be installed
  – In Scenario 2, T should not be installed
![Page 27: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/27.jpg)
CS 347 Notes08 27
Scenario 1
[Figure: logs of B’ and a slave along a "log time" axis, with delimiters for epochs 15 and 16 and records P(T) and C(T) on each log.]
![Page 28: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/28.jpg)
CS 347 Notes08 28
Scenario 2
[Figure: logs of B’ and a slave along a "log time" axis, with delimiters for epochs 15 and 16 and records P(T) and C(T) on each log.]
![Page 29: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/29.jpg)
CS 347 Notes08 29
Scenario 3: Possible?
[Figure: logs of B’ and a slave along a "log time" axis, with delimiters for epochs 15, 16, and 17 and records P(T) and C(T) on each log.]
Note that T commits at slave but not at B’!!
![Page 30: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/30.jpg)
CS 347 Notes08 30
Scenario 4: Possible?
[Figure: logs of B’ and a slave along a "log time" axis, with delimiters for epochs 15, 16, and 17 and records P(T) and C(T) on each log.]
Note that T commits at B’ but not at slave!!
![Page 31: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/31.jpg)
CS 347 Notes08 31
Comparison of Options
• 2-safe
• 1-safe
  – dep reconstruction
  – epoch
• Specific Scenario
  – updates at fixed primary site
  – each site has multiple computers
  – primary-backup sites are matched
  – clean site failures; stable storage; reliable network
  – log shipping
  – no reads at backup
  – no initialization
![Page 32: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/32.jpg)
CS 347 Notes08 32
How to Evaluate
• What system?
  – actual system(s)
  – simulation
  – testbed
• What transactions?
  – real transactions
  – synthetic transactions
![Page 33: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/33.jpg)
CS 347 Notes08 33
Metrics
• IO utilization
• CPU utilization
• Throughput (given max delay?)
• Transaction commit delay
• Backup copy lag
• Network overhead
• Probability of inconsistency
![Page 34: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/34.jpg)
CS 347 Notes08 34
Sample Results
![Page 35: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/35.jpg)
CS 347 Notes08 35
Sample Results
![Page 36: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/36.jpg)
CS 347 Notes08 36
And Now For Something Completely Different:
• Updates
  – at any copy
  – at fixed (primary) copy
  – at one copy but control can migrate
  – no updates
[Slide annotations on the list: "have seen"; "next: available copies"]
![Page 37: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/37.jpg)
CS 347 Notes08 37
PC-lock available copies
• Transactions write lock at all available copies
• Transactions read lock at any available copy
• Primary site (static) manages U, the set of available copies
[Figure: copies X1, X2, X3, X4; one copy is marked * (primary) and one is marked down.]
![Page 38: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/38.jpg)
CS 347 Notes08 38
Update Transaction
(1) Get U from primary
(2) Get write locks from U nodes
(3) Commit at U nodes
[Figure: C0 (primary), C1 (backup), C2 (backup). Transaction T3 gets U={C0, C1} from the primary, then sends its updates to the U nodes and commits with 2PC.]
![Page 39: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/39.jpg)
CS 347 Notes08 39
A potential problem - example
Now: U={C0, C1}
[Figure: C0 (primary), C1 (backup), C2 (backup, recovering). Transaction T3 holds U={C0, C1}. C2 announces: "I am recovering".]
![Page 40: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/40.jpg)
CS 347 Notes08 40
A potential problem - example
Later: U={C0, C1, C2}
[Figure: C0 (primary), C1 (backup), C2 (backup, recovering). C2 is told: "You missed T0, T1, T2". Transaction T3 still holds U={C0, C1}, so T3 updates are sent to two nodes only.]
![Page 41: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/41.jpg)
CS 347 Notes08 41
Solution:
• Initially, transaction T gets a copy U’ of U from the primary (or uses a cached value)
• At commit of T, check U’ against the current U at the primary (if different, abort T)
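The commit-time check can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the full protocol: locking and 2PC are left out, and `Primary`, `run_update`, and the `during` hook (which simulates a membership change while T is running) are hypothetical names.

```python
class Primary:
    """Static primary site that manages U, the set of available copies."""

    def __init__(self, available):
        self.U = frozenset(available)

    def get_U(self):
        return self.U

    def add_copy(self, copy_id):
        self.U = self.U | {copy_id}   # e.g. a node has recovered

    def remove_copy(self, copy_id):
        self.U = self.U - {copy_id}   # e.g. polling detected a failure


def run_update(primary, copies, key, value, during=None):
    """Run one update transaction; returns 'commit' or 'abort'."""
    u_prime = primary.get_U()          # initially, T gets a copy U' of U
    # (write locks at the U' nodes would be acquired here; omitted)
    if during is not None:
        during()                       # simulated events while T is running
    if u_prime != primary.get_U():     # at commit, compare U' with current U
        return "abort"
    for copy_id in u_prime:            # install the update at every copy in U
        copies[copy_id][key] = value
    return "commit"


if __name__ == "__main__":
    primary = Primary({"C0", "C1"})
    copies = {c: {} for c in ("C0", "C1", "C2")}
    # C2 rejoins while T3 runs with the stale set {C0, C1}: T3 must abort.
    print(run_update(primary, copies, "x", 1,
                     during=lambda: primary.add_copy("C2")))  # abort
    # A retry sees U={C0, C1, C2} and writes all three copies.
    print(run_update(primary, copies, "x", 1))                # commit
```

Aborting on any difference is conservative, but it guarantees that a committed transaction wrote every copy that was available at its commit point.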
![Page 42: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/42.jpg)
CS 347 Notes08 42
Solution Continued
• When CX recovers:
  – request missed and pending transactions from primary (primary updates U)
  – set write locks for pending transactions
• Primary polls nodes to detect failures (updates U)
![Page 43: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/43.jpg)
CS 347 Notes08 43
Example Revisited
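[Figure: C0 (primary), C1 (backup), C2 (backup, recovering). C2 announces "I am recovering" and is told "You missed T0, T1, T2"; U changes from {C0, C1} to {C0, C1, C2}. Transaction T3, holding U={C0, C1}, sends prepare to two nodes and receives a reject.]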
![Page 44: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/44.jpg)
CS 347 Notes08 44
Available Copies — No Primary
• Let all nodes have a copy of U (not just primary)
• To modify U, run a special atomic transaction at all available sites (use commit protocol)
  – E.g.: U1={C1, C2} → U2={C1, C2, C3}: only C1, C2 participate in this transaction
  – E.g.: U2={C1, C2, C3} → U3={C1, C2}: only C1, C2 participate in this transaction
![Page 45: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/45.jpg)
CS 347 Notes08 45
• Details are tricky...
• What if commit of U-change blocks?
![Page 46: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/46.jpg)
CS 347 Notes08 46
Node Recovery (no primary)
• Get missed updates from any active node
• No unique sequence of transactions
• If all nodes fail, wait for
  – all to recover
  – majority to recover
![Page 47: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/47.jpg)
CS 347 Notes08 47
Example
[Figure: three nodes. One node: committed A, B, C, D, E, F; pending G. Another node: committed A, C, B, E, D; pending F, G, H. Recovering node: committed A, B.]
How much information (update values) must be remembered? By whom?
![Page 48: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/48.jpg)
CS 347 Notes08 48
Correctness with replicated data
S1: r1[X1] r2[X2] w1[X1] w2[X2]
Is this schedule serializable?
[Figure: two copies, X1 and X2.]
![Page 49: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/49.jpg)
CS 347 Notes08 49
One copy serializable (1SR)
A schedule S on replicated data is 1SR if it is equivalent to a serial history of the same transactions on a one-copy database
![Page 50: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/50.jpg)
CS 347 Notes08 50
To check 1SR
• Take schedule
• Treat ri[Xj] as ri[X] and wi[Xj] as wi[X] (Xj is a copy of X)
• Compute P(S)
• If P(S) is acyclic, S is 1SR
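The check above is mechanical enough to sketch in code. The following is an illustrative sketch, assuming operations are written as on these slides (`r1[X1]`, `w2[X2]`, with the trailing digits naming the copy); the helper names `parse`, `precedence_graph`, `is_acyclic`, and `is_1sr` are hypothetical. It implements exactly the test stated here: an acyclic P(S) is taken to mean S is 1SR.

```python
import re
from itertools import combinations

# kind (r/w), transaction number, item name; the copy index is matched but dropped
OP = re.compile(r"([rw])(\d+)\[([A-Za-z]+)\d*\]")


def parse(schedule):
    """'r1[X1] w2[X2]' -> [('r', 1, 'X'), ('w', 2, 'X')], i.e. ri[Xj] becomes ri[X]."""
    return [(kind, int(txn), item) for kind, txn, item in OP.findall(schedule)]


def precedence_graph(ops):
    """Edge (Ti, Tj) when an op of Ti precedes a conflicting op of Tj on the same item."""
    edges = set()
    for (k1, t1, x1), (k2, t2, x2) in combinations(ops, 2):
        if t1 != t2 and x1 == x2 and "w" in (k1, k2):
            edges.add((t1, t2))
    return edges


def is_acyclic(edges):
    """Repeatedly remove nodes with no incoming edge; a cycle leaves none to remove."""
    nodes = {n for edge in edges for n in edge}
    remaining = set(edges)
    while nodes:
        sources = {n for n in nodes if not any(v == n for _, v in remaining)}
        if not sources:
            return False
        nodes -= sources
        remaining = {(u, v) for u, v in remaining if u not in sources}
    return True


def is_1sr(schedule):
    return is_acyclic(precedence_graph(parse(schedule)))


if __name__ == "__main__":
    print(is_1sr("r1[X1] r2[X2] w1[X1] w2[X2]"))                # False
    print(is_1sr("r1[X1] w1[X1] w1[X2] r2[X1] w2[X1] w2[X2]"))  # True
```

`combinations` yields pairs in schedule order, so each edge points from the earlier operation's transaction to the later one's.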
![Page 51: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/51.jpg)
CS 347 Notes08 51
Example
S1: r1[X1] r2[X2] w1[X1] w2[X2]
S1’: r1[X] r2[X] w1[X] w2[X]
P(S1): T1 → T2 and T2 → T1
S1 is not 1SR!
![Page 52: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/52.jpg)
CS 347 Notes08 52
Second example
S2: r1[X1] w1[X1] w1[X2]
r2[X1] w2[X1] w2[X2]
S2’: r1[X] w1[X] w1[X]
r2[X] w2[X] w2[X]
P(S2): T1 → T2
S2 is 1SR
![Page 53: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/53.jpg)
CS 347 Notes08 53
Second example
S2: r1[X1] w1[X1] w1[X2]
r2[X1] w2[X1] w2[X2]
S2’: r1[X] w1[X] w1[X]
r2[X] w2[X] w2[X]
• Equivalent serial schedule
SS: r1[X] w1[X]
r2[X] w2[X]
![Page 54: CS 347: Distributed Databases and Transaction Processing Data Replication](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813665550346895d9df25b/html5/thumbnails/54.jpg)
CS 347 Notes08 54
Summary
• Updates
  – at any copy
  – at fixed (primary) copy
  – at one copy but control can migrate
  – no updates
[Slide annotations on the list: "have seen"; "available copies"]