distributed systems - brown universityall facebook data node a node b node c shard 1 shard 1 shard 1...

48
Distributed Systems Day 13: Distributed Transaction “To Be or Not to Be… Distributed .. Transactions”

Upload: others

Post on 22-May-2020

45 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

DistributedSystemsDay13:Distributed Transaction

“ToBeorNottoBe… Distributed ..Transactions”

Page 2: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

Summary

• BackgroundonTransactions• ACIDSemantics

• DistributeTransactions• Terminology: Transactionmanager,Coordinator,Participant• TwoPhaseCommit

• AddingIsolationwithLocks:optimisticV.pessimistic• PerformanceIssues

• ConsistencyModels• Serializability VersusLinearizability

Page 3: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

https://cloud.google.com/spanner/docs/transactions

Page 4: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

https://blog.couchbase.com/optimistic-or-pessimistic-locking-which-one-should-you-pick/

Page 5: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

AllFacebookData

NodeA

NodeB

NodeC

Shard1

Shard1

Shard1

NodeD

NodeE

NodeF

Shard2

Shard2

Shard2

Partitiondataintoshards,mapsshardstoserverwithconsistenthashing

k1 v1

k2 v2

k3 v3

k4 v4

k5 v5

k0 v0

Hash table

k4 v4

k5 v5

k4 v4

k5 v5

k4 v4

k5 v5

Maintainm

ultiplecopiesForfaulttoleranceandtoreducelatency

ClientssendrequestsToallreplicas

• Replication• Lazy,Passive,Active• Consistencyforasingleshard

• DistributedTransaction• Consistent/Atomicchangetodatain

multipleshards• Multipleshardsè Canusetraditional

replicationtechniques FE

FE

k2 v2

k3 v3

k2 v2

k3 v3

k2 v2

k3 v3

Page 6: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

WhatisTransaction?

• Asetofoperationsthatneedtobeperformedtogether.• Example 1:transferringmoneybetweenaccounts• Example 2:shoppingcartcheckout

Read(R)Update(R,$50)Read(T)Update(T,$150)

InitiallyTheoandRodrigohave$100.Goal:Transfer$50fromRodrigotoTheo.

NodeA

NodeB

NodeC

Shard1

Shard1

Shard1

NodeD

NodeE

NodeF

Shard2

Shard2

Shard2

Rodrigoisinshard1 Theoisinshard2

Page 7: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

WhatisTransaction?

• Asetofoperationsthatneedtobeperformedtogether.• Example 1:transferringmoneybetweenaccounts• Example 2:shoppingcartcheckout

Read(R)Update(R,$50)Read(T)Update(T,$150)

InitiallyTheoandRodrigohave$100.Goal:Transfer$50fromRodrigotoTheo.Ideal:eitherthewhole4operationshappenornonehappenWorstcase:onlyasubsetoccur

NodeA

NodeB

NodeC

Shard1

Shard1

Shard1

NodeD

NodeE

NodeF

Shard2

Shard2

Shard2

Rodrigoisinshard1 Theoisinshard2

Page 8: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TransactionBackground

• ACIDSemantics• Atomicity• Consistency• Isolation• Durability

• Transactionsareeasyfortraditionaldatabases• Traditional databases areonasingle serverà failure is``all-or-nothing’’

• Allcomponents ofthetransactionsfail• Distributed transactionsà differentcomponents canfail

• NeedtoprovidesTransactionsemanticswhenonlyasubset ofthecomponents fail

Allornothingsemantics:alloperationssucceedsorfails.

Intermediatestatesarenotexposedtotheoutsideworld(nopartialwritesareexposed)

Resultsofa‘committed’transactionspersistsafterthetransaction(andthroughfailures)

Transitionsfromoneconsistentstatetoanotherconsistentstate

Page 9: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

DistributedTransactionsSemantics

• TransactionManager• Serverinchargeoforchestratingthetransaction

• Stepsfortransaction• Client initiates atransaction

• TMgivesclientaTransactionID(TID)• Client submits operationstoTM

• TMrelaysoperationstoreplicas• Client commitstransaction

• TMperformstwophasecommit

NodeA

NodeB

NodeC

Shard1

Shard1

Shard1

NodeD

NodeE

NodeF

Shard2

Shard2

Shard2

FE

TransactionManager

Page 10: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

DistributedTransactionsSemantics

• TransactionManager• Serverinchargeoforchestratingthetransaction

• Stepsfortransaction• Client initiates atransaction

• TMgivesclientaTransactionID(TID)• Client submits operationstoTM

• TMrelaysoperationstoreplicas• Client commitstransaction

• TMperformstwophasecommit

Client-SideCode:tid =openTransaction();RVal =a.get(tid,Rodrigo);a.update(tid,Rodrigo,RVal - 50);Tval =b.get(tid,Theo);b.update(tid,Theo,Tval+50);

closeTransaction(tid)orabortTransaction(tid)

NodeA

NodeB

NodeC

Shard1

Shard1

Shard1

NodeD

NodeE

NodeF

Shard2

Shard2

Shard2

FE

TransactionManager

TransactionManager• TMhandsoutTIDs• TMmanagesandrelays

operationstoReplicaLeaders• TMkeepstrackofallReplicas

involvedinthetransactions

ReplicaManager• Prepareoperations• Storeoperationslocallyinlog• Butdonotcommitoperations

Page 11: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommit

Page 12: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommit

• ProvidesAtomicityandConsistency• NOTIsolationandDurability

• Assumptions:eachservermaintainsatransactionlog• Transactionlogisstoredinpersistentmemory

• Iffailureà itemsinTransactionLogsurvives

• Terminologychanges:• Coordinator<--- ThetransactionManager• Participantß Leaderofareplica

NodeA

NodeB

NodeC

Shard1

Shard1

Shard1

NodeD

NodeE

NodeF

Shard2

Shard2

Shard2

FE

TransactionManagerCoordinator

Participant Participant

Page 13: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommit

• Phase1:• Coordinatorsends requestforvotes• Participants vote

• Phase2:• Coordinatorcountsvotes• Coordinatorinformsoftransactionstatus

• AttheendofPhase2:eitherallparticipantscommitorallabort

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

CountVotes!!!!!!

Coordinator

Page 14: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

CoordinatorStateDiagramforTwo-PhaseCommit

Abort Commit

Wait

Init

allcommit/commitanyabort/abort

appcommit/votereq

Coordinator

Coordinator pariticpantspariticpantspariticpantspariticpants

Response

Ready?

Vote:CommitorAbort

Page 15: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

ParticipantStateDiagramforTwo-PhaseCommit

Abort Commit

Uncertain

Init

votereq/commitvotereq/abort

abort/ack commit/ack

Participant

Coordinator pariticpantspariticpantspariticpantspariticpants

Response

Makeachange

CommitorAbort

Page 16: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

StateDiagramsforTwoPhaseCommit

Abort Commit

Wait

Init

Abort Commit

Uncertain

Init

allcommit/commitanyabort/abort

appcommit/votereq votereq/commitvotereq/abort

abort/ack commit/ack

Coordinator Participant

Page 17: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommit

• Phase1:• Coordinatorsends requestforvotes• Participants vote

• Phase2:• Coordinatorcountsvotes• Coordinatorinformsoftransactionstatus

• AttheendofPhase2:eitherallparticipantscommitorallabort

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

CountVotes!!!!!!

Coordinator

Page 18: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommitWithFailures• Whatistheimpactoffailureson2PC?

• 2PCissynchronous• Failure==nodefailureornetworkfailure• Failure-->theprotocolblocks/stalls

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

CountVotes!!!!!!

CoordinatorCoordinatorfails,participantswillbeuncertain(waiting)

Participantfails,coordinatorwillbewaiting

Page 19: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

CrashPoints

Abort Commit

Wait

Init

Abort Commit

Uncertain

Init

allcommit/commitanyabort/abort

appcommit/votereq votereq/commitvotereq/abort

abort/ack commit/ack

Coordinator Participant

Page 20: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

CrashPoints

Abort Commit

Wait

Init

Abort Commit

Uncertain

Init

allcommit/commitanyabort/abort

appcommit/votereq votereq/commitvotereq/abort

abort/ack commit/ack

Coordinator Participant

Page 21: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommitWithFailures• Whatistheimpactoffailureson2PC?

• 2PCissynchronous• Failure==nodefailureornetworkfailure• Failure-->theprotocolblocks/stalls

• DetectfailureusingTimeouts• IsthismodelSynchronousorAsynchronous?

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

CountVotes!!!!!!

Coordinator

Coordinatorfails,participantswillbeuncertain(waiting)Participantfails,coordinatorwillbewaiting

Page 22: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommitWithFailures• Whatistheimpactoffailureson2PC?

• 2PCissynchronous• Failure==nodefailureornetworkfailure• Failure-->theprotocolblocks/stalls

• DetectfailureusingTimeouts• CoordinatordetectsparticipantfailureandassumesABORTà Transactionterminates

• ParticipantdetectsCoordinatorfailure• WhyCan’tParticipant automatically ABORT?

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

CountVotes!!!!!!

Coordinator

Coordinatorfails,participantswillbeuncertain(waiting)Participantfails,coordinatorwillbewaiting

Page 23: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

ParticipantRecoveryfromCoordinatorFailure

Abort Commit

Uncertain

Init

votereq/commitvotereq/abort

abort/ack commit/ack

Participant

• ParticipantinUncertainstate– waitingforcoordinatortosay“commit”or“abort”

• Itdetectsfailureofcoordinator– Usingtimeout

• ParticipantinUncertain state– can’tassumeeitheroutcome

- BADthingshappenifparticipantmakeswrongassumptions

– waitsforcoordinatortorestart- Onrestartcontactcoordinatorforfinaloutcome

Page 24: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

ParticipantRecoveryfromCoordinatorFailure

Abort Commit

Uncertain

Init

votereq/commitvotereq/abort

abort/ack commit/ack

Participant

• ParticipantinUncertainstate– waitingforcoordinatortosay“commit”or“abort”

• Itdetectsfailureofcoordinator– Usingtimeout

• ParticipantinUncertain state– can’tassumeeitheroutcome

- BADthingshappenifparticipantmakeswrongassumptions

– waitsforcoordinatortorestart- Onrestartcontactcoordinatorforfinaloutcome

Page 25: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

CoordinatorRecoveryfromParticipantFailure

• Coordinatorinwaitstate– waitingforparticipanttosay“commit”or“abort”

• Itdetectsfailureofparticipant– Usingtimeout

• Coordinatorassumestheysaid‘no’– Takesnoresponseasanabort– Aborttransaction!!

• IfparticipantFails,Coordinatorcanmakeprogress

Abort Commit

Wait

Init

allcommit/commitanyabort/abort

appcommit/votereq

Coordinator

Page 26: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommitWithFailures• Whatistheimpactoffailureson2PC?

• 2PCissynchronous• Failure==nodefailureornetworkfailure• Failure-->theprotocolblocks/stalls

• DetectfailureusingTimeouts• CoordinatordetectsparticipantfailureandassumesABORTà transactionterminates

• ParticipantdetectsCoordinatorfailure• Theparticipant mustwait forcoordinator• The transaction isstalled!

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

CountVotes!!!!!!

Coordinator

Coordinatorfails,participantswillbeuncertain(waiting)Participantfails,coordinatorwillbewaiting

Page 27: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommitandCAPTheorem!

• Duringapartition• Does2PCpickAvailabilityorConsistency?

ReadytoCommit?

Coordinator

Participant

Participant

NetworkPartition

Page 28: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

CAPTheorem

• C:Consistency(Linearizable)• A:Availability• P:Partitiontolerance

• Givena“Partition”,youmustpickbetween“Availability”and“Consistency”• PickConsistently:Someclients(notall)canchange“data consistently”• PickAvailability:Allclientscanchangedatabut“inconsistently”

Page 29: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommitandCAPTheorem!

• Duringapartition• Does2PCpickAvailabilityorConsistency?

ReadytoCommit?

Coordinator

Participant

Participant

NetworkPartition

Page 30: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommitWithFailures• Whatistheimpactoffailureson2PC?

• 2PCissynchronous• Failure==nodefailureornetworkfailure• Failure-->theprotocolblocks/stalls

• DetectfailureusingTimeouts• CoordinatordetectsparticipantfailureandassumesABORT

• ParticipantdetectsCoordinatorfailure• WhyCan’tParticipant automatically ABORT?

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

CountVotes!!!!!!

Coordinator

Coordinatorfails,participantswillbewaitingParticipantfails,coordinatorwillbewaiting

CoordinatorassumesABORTTransactionends

ParticipantABORTEventuallytransactionends

ParticipantthatvotedNOcanabortHowever,VotedyescannotABORT

Page 31: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommitWithFailures• Whatistheimpactoffailureson2PC?

• 2PCissynchronous• Failure==nodefailureornetworkfailure• Failure-->theprotocolblocks/stalls

• DetectfailureusingTimeouts• CoordinatordetectsparticipantfailureandassumesABORT

• ParticipantdetectsCoordinatorfailure• WhyCan’tParticipant automatically ABORT?

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

CountVotes!!!!!!

Coordinator

Coordinatorfails,participantswillbewaitingParticipantfails,coordinatorwillbewaiting

CoordinatorassumesABORTTransactionends

ParticipantABORTEventuallytransactionends

ParticipantthatvotedNOcanabortHowever,VotedyescannotABORT

Page 32: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommit:AddingIsolationwithLocks

Page 33: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

https://github.com/facebook/rocksdb/wiki/Transactions

Page 34: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

https://apacheignite.readme.io/docs/concurrency-modes-and-isolation-levels

Page 35: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

PessimisticVersusOptimisticLocking

• Trade-off:concurrencyversusisolationPessimistic:• Getalllocksbeforetransaction• Releasealllocksaftertransaction

• Releaselocksaftercommit/abort• Preventsanyoneelsefromusingthe

dataduringtransaction• Lockspreventread/writeofdata• Locksstopothertransactions

Optimistic• Nolocks• Getacopyofdatabeforetransaction• Aftertransactionchecktomakesuredata

hasnotchanged• IfthedatachangedthenABORT!!!!• Datachangesmeanssomeoneelse

changedthedata

Page 36: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

PessimisticVersusOptimisticLocking

• Trade-off:concurrencyversusisolation

Pessimistic

HighlevelofconcurrencyHighthroughput:especiallyifallreadsManyTransactionswillabortifmanywrites

Optimistic

LowlevelofconcurrencyLowperformanceSequentialorderingoftransactions

Page 37: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

TwoPhaseCommit:PracticalIssues

Page 38: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

PracticalPerformanceIssueswith2PC

• Synchronization:2PCOverheads• Multiple ``rounds’’ofcommunication• Threeroundsofcommunication

• 3(N)messages• Duringthese roundsresourcesarefrozen

ReadytoCommit?

CoordinatorParticipant

Participant

Abort/Commit?

CoordinatorParticipant

Participant

Vote[Yes/No]

CoordinatorParticipant

Participant

Page 39: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

PracticalPerformanceIssueswith2PC

• Blocking:2PC• Duringfailureà 2PCcanblock• When2PCblocksà thenothertransactions areunabletoprogress

Page 40: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

PracticalPerformanceIssueswith2PC

• Synchronization:2PCOverheads• Multiple ``rounds’’ofcommunication• Threeroundsofcommunication

• 3(N)messages• Duringthese roundsresourcesarefrozen

• Blocking:2PC• Duringfailureà 2PCcanblock• When2PCblocksà thenothertransactions areunabletoprogress

https://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

Page 41: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

DistributedTransactionNoLongerConsideredDead!

https://thenewstack.io/microsoft-orleans-brings-distributed-transactions-to-cloud/

2019:MS’sOrleansDist.TransactiontotheCloud!!!

2007:AvoidDistributedtransactions

https://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

2012:Google’sSpannerDist.Transaction!!!

Page 42: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

DistributedTransactionsandACID

Page 43: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

HowdoyougetACIDinDistributedTransactions?• TwoPhaseCommità A+C

• 2PC:atomicandconsistent changefromonestatetoanotherstate

• TwoPhaseCommit+Locksà A+C+I• Locksprovideisolation bypreventingconcurrenttransactionfromaccessingdata

• TwoPhaseCommit+Locks+Logsà A+C+I+D• Logsensurethetransactions persists

Page 44: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

ConsistencyModelsRevisited

Page 45: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

ConsistencySpectrum

StrictSerializability

Linearizable

Sequential

Causal+

Eventual

WEAKCONSISTENCY

STRONGCONSISTENCY

SLOWERBUTEASYTOPROGRAM

FASTBUTHARDERTOPROGRAM

Page 46: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

StrictSerializability

• Totalorder+FIFO+“Time”à foratransaction• Afteratransactioncommits,allfuturereadswillseecommitteddata

• Requires2PC+pessimisticLocks• Lowperformance:reads/writehavehighlatencyandlowthroughput

• StrictSerializability V.Linearizability• StrictSerializability =Linearizability forTransactions• Linearizability =Totalorder+FIFO+Realtimeforindividualoperations• StrictSerializability =Totalorder+FIFO+Realtimefortransactions(groupsofoperations)

Page 47: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

StrictSerializability

• Totalorder+FIFO+“Time”à foratransaction• Afteratransactioncommits,allfuturereadswillseecommitteddata

• Requires2PC+pessimisticLocks• Lowperformance:reads/writehavehighlatencyandlowthroughput

• StrictSerializability V.Linearizability• StrictSerializability =Linearizability forTransactions• Linearizability =Totalorder+FIFO+Realtimeforindividualoperations• StrictSerializability =Totalorder+FIFO+Realtimefortransactions(groupsofoperations)

Page 48: Distributed Systems - Brown UniversityAll Facebook Data Node A Node B Node C Shard 1 Shard 1 Shard 1 Node D Node E Node F Shard 2 Shard 2 Shard 2 Partition data into shards, maps shards

Summary

• BackgroundonTransactions• ACIDSemantics

• DistributeTransactions• Terminology: Transactionmanager,Coordinator,Participant• TwoPhaseCommit

• AddingIsolationwithLocks:optimisticV.pessimistic• PerformanceIssues

• ConsistencyModels• Serializability VersusLinearizability