distributed systems · • cap theorem. raft • proposed by ongaroand ousterhoutin 2014 • five...
TRANSCRIPT
DistributedSystemsDay12:Consistency
“GottoCatchThemAll…”
Today
• RaftRecap
• Consensus
• CAPTheorem
Raft• Proposed byOngaro andOusterhout in2014
• Fivecomponents• Leaderelection• Logreplication• Safety• Clientprotocol• Membershipchanges
• Assumescrashfailures(sonobyzantinefailures)
• Nodependency ontimeforsafety• Butdepends ontimeforavailability
• Tolerates(N-1)/2failures
RaftProperties• Safety:atmostoneleader
• Eachfollowervotesforatmostonecandidate• Acandidateneedsamajoritytobeleader
• Liveness:eventuallytherewillbealeader• Challenge:ifmultipletrytocallforelectionà splitvote• Timeout+randomness:randomnesshelpstoensurethatoneserverdetects
fasterthantheothers
• LogSafety: ifleadercommits,thendataisinallfutureleaders• ElectionModifications:followersonlyvoteforclientwithhigherterm/index• CommitModifications:NewLeaderdoesnotcommituntilentriesincurrent
termhavebeenagreedonbyfollowers
HowdoyouChangeClusterSize?N=5
N=7
Goal:addtwonewserverstocluster
Challenge:consistentlygetallserverstoagreeonnewclustersize
Case:Duringclusterupdate,leaderfailsANDdifferentservershavedifferentnotionsofsizeàmultipleleaders
HowdoyouChangeClusterSize?N=5
N=7
Goal:addtwonewserverstocluster
Challenge:consistentlygetallserverstoagreeonnewclustersize
Case:Duringclusterupdate,leaderfailsANDdifferentservershavedifferentnotionsofsizeàmultipleleaders
Solution:Needaprotocoltoconsistentlyupdatethecluster
ConfigurationChanges• Cannotswitchdirectlyfromoneconfigurationtoanother:conflictingmajoritiescouldarise
• SwitchingfromN=3toN=5Seethepaperfordetails
Cold Cnew
Server1Server2Server3Server4Server5
MajorityofCold
MajorityofCnew
time
Today
• RaftRecap
• Consensus
• CAPTheorem
ApproachestoReplication
PassiveReplication• Totalordering• Protocols:Zookeeper,Paxos,Chubby
ActiveReplication• FIFOordering• Toleratesbyzantinefailures
Lazyreplication• Causalordering• Protocols:Gossip,DynamoDB,
CassandraDB,VoldemortDB,MongoDB
ActiveReplication PassiveReplication LazyReplication
ServerB
ServerC
ServerA
FE
FEServerB(follower)
ServerC(Follower)
ServerA(leader)FE
FE FE ServerB
ServerC
ServerA
FE
ApproachestoReplication
PassiveReplication• Totalordering• Performanceissues:slowandlimitsparallelism
• Allservers processthesame request
Lazyreplication• Causalordering• Performance:faster
– Anyserver canprocessanyrequest– More parallelism˜
PassiveReplication LazyReplication
ServerB(follower)
ServerC(Follower)
ServerA(leader)FE
FE FE ServerB
ServerC
ServerA
FE
ThinkingAboutConsistency
• Allreplicasareoneserver
• Ifdifferentclientswriteandreadtothis’’one’’server,whatshouldweexpect?
ServerB(follower)
ServerC(Follower)
ServerA(leader)FE
FE
Get(c)
Get(c)
set(c=5)
set(c=7)
Get(c)
Get(c)
Get(c)
Get(c)
C1
C2
C3
C4
ConsistencySpectrum
StrictSerializability
Linearizable
Sequential
Causal+
Eventual
WEAKCONSISTENCY
STRONGCONSISTENCY
SLOWERBUTEASYTOPROGRAM
FASTBUTHARDERTOPROGRAM
Linearizable
• Totalorder+FIFO+“Time”• Recall:Totalorder– allserversoperateinsameorder• Linearizable isasubsetofTotalorder• OrderingmustbeFIFOandbasedontime
Get(c)
Get(c)
set(c=5)
set(c=7)
Get(c)
Get(c)
Get(c)
Get(c)
C1
C2
C3
C4
Initialc=3
Sequential
• Totalorder+FIFO
• SequentialisasubsetofTotalorder• OrderingmustbeFIFO• Requests fromdifferentclients canbereshuffled
Get(c)
Get(c)
set(c=5)
set(c=7)
Get(c)
Get(c)
Get(c)
Get(c)
C1
C2
C3
C4
Initialc=3
Causal+
• MustrespectCausality
• Needvectorclockstotrackandmaintaincausality
• Onlycausallyrelatedeventsneedtobeordered
• NOTOTALORDERING!!!!!
Get(c)
Get(c)
set(c=5)
set(c=7)
Get(c)
Get(c)
Get(c)
Get(c)
C1
C2
C3
C4
Initialc=3
Eventual
• AnythingCanhappen
• Ifnowritesà eventuallyallservers returnthesamedata
Get(c)
Get(c)
set(c=5)
set(c=7)
Get(c)
Get(c)
Get(c)
Get(c)
C1
C2
C3
C4
Initialc=3
ConsistencySpectrum• Linearizable:totalorder+realtime• Sequential:totalorder+clientorder• Causal+:causallyordered+eventuallyeveryoneagree• Eventual:eventuallyeveryoneagrees
StrictSerializability
Linearizable
Sequential
Causal+
Eventual
WEAKCONSISTENCY
STRONGCONSISTENCY
SLOWER
BUTEASYTOPROGRAM
FASTandParallel
BUTHARDERTOPROGRAM:needconflictresolution
Today
• RaftRecap
• Consensus
• CAPTheorem
CAPTheorem
• ConsistencyModel:Howdoesyoursystemreactduringpartition
• C:Consistency(linearizable)• Allaccess islinearizable• Requires consensus
• A:Availability• Allclients canmakeprogress
• P:Partitiontolerance• CandAcanhappenduringapartition
CAPTheorem
ServerB(follower)
ServerC(Follower)
ServerA(leader)
FE
FE
NetworkPartition
• ConsistencyModel:Howdoesyoursystemreactduringpartition
• C:Consistency(linearizable)• Allaccess islinearizable• Requires consensus
• A:Availability• Allclients canmakeprogress
• P:Partitiontolerance• CandAcanhappenduringapartition
Can’tcommit:• Nottheleader• Cantreachleader
Noavailability
CAPTheorem
ServerB(follower)
ServerC(Follower)
ServerA(leader)
FE
FE
NetworkPartition
• ConsistencyModel:Howdoesyoursystemreactduringpartition
• C:Consistency(linearizable)• Allaccess islinearizable• Requires consensus
• A:Availability• Allclients canmakeprogress
• P:Partitiontolerance• CandAcanhappenduringapartition I’llcommit:thereWILL
BECONFLICTS
NOCONSISTENCY
availability
ServerB(follower)
ServerC(Follower)
ServerA(leader)
FE
FE
NetworkPartition
I’llcommit: thereWILLBE CONFLICTS
NO CONSISTENCY
availability
ServerB(follower)
ServerC(Follower)
ServerA(leader)
FE
FE
NetworkPartition
Can’t commit:• Not theleader• Cant reach leader
No availability
Raft(linearizable):PassiveReplication:• Strongconsistency• Duringpartition---
• someclientswillmakenoprogress• Becauseleaderisunavailable
EventualConsistency:• Duringpartition---
• Someclientswillmakeprogress• Sinceclientscanchangesamedata
• Noconsistencygaurantees
CAPTheorem
• C:Consistency(Linearizable)• A:Availability• P:Partitiontolerance
• Givena“Partition”,youmustpickbetween“Availability”and“Consistency”• PickConsistently:Someclients(notall)canchange“data consistently”• PickAvailability:Allclientscanchangedatabut“inconsistently”
Today
• RaftRecap• Challenges: howtochangethesizeofthecluster
• Consensus:ConsistencyModels• Definitions ofdifferentconsistencymodels• Differencesbetween themodels
• CAPTheorem:Given‘P’,youcanonlyhave“A”and“P”.• Whendesigning asystemthatmusttoleratepartitions, youmustpickbetween “A”and“P”.