distributed consensus a.k.a. "what do we eat for lunch?"

Post on 28-Nov-2014

DESCRIPTION

Distributed consensus is everywhere! Even if it is not obvious at first, most apps nowadays are distributed systems, and these sometimes have to "agree on a value". This is where consensus algorithms come in. In this session we'll look at the general problem and solve a few example cases using the Raft algorithm, implemented with Akka's Actor and Cluster modules.

TRANSCRIPT

Konrad 'ktoso' Malawski GeeCON 2014 @ Kraków, PL

Konrad `@ktosopl` Malawski

Distributed Consensus A.K.A.

“What do we eat for lunch?”

real world edition

hAkker @

typesafe.com geecon.org

Java.pl / KrakowScala.pl sckrk.com / meetup.com/Paper-Cup @ London

GDGKrakow.pl meetup.com/Lambda-Lounge-Krakow

You?

Distributed systems?

What is this talk about?

The network.

How to think about distributed systems.

Some healthy madness.

Code in slides covers only “simplest possible case”.

Ordering[T]

Slightly chronological.

By no means is it “worst to best”.

Consensus

Consensus - informal

“we all agree on something”

Consensus - formal

Termination: Every correct process decides some value.

Validity: If all correct processes propose the same value v, then all correct processes decide v.

Integrity: If a correct process decides v, then v must have been proposed by some correct process.

Agreement: Every correct process must agree on the same value.
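
The four properties can be written down as plain predicates. This is only a sketch: `Process`, `ConsensusCheck` and the lunch values are made-up names for illustration, and it assumes every listed process is a correct one.

```scala
// Hypothetical model: what one correct process proposed and, once it
// has decided, what it decided.
final case class Process[V](id: String, proposed: V, decided: Option[V])

object ConsensusCheck {
  // Termination: every correct process decides some value.
  def termination[V](ps: Seq[Process[V]]): Boolean =
    ps.forall(_.decided.isDefined)

  // Validity: if all processes propose the same v, nobody decides anything else.
  // Processes that have not decided yet are left to the termination check.
  def validity[V](ps: Seq[Process[V]]): Boolean = {
    val proposals = ps.map(_.proposed).distinct
    proposals.size != 1 || ps.forall(_.decided.forall(_ == proposals.head))
  }

  // Integrity: every decided value was proposed by some process.
  def integrity[V](ps: Seq[Process[V]]): Boolean = {
    val proposed = ps.map(_.proposed).toSet
    ps.flatMap(_.decided).forall(proposed.contains)
  }

  // Agreement: no two processes decide different values.
  def agreement[V](ps: Seq[Process[V]]): Boolean =
    ps.flatMap(_.decided).distinct.size <= 1

  def holds[V](ps: Seq[Process[V]]): Boolean =
    termination(ps) && validity(ps) && integrity(ps) && agreement(ps)
}
```

Three colleagues who propose pizza, sushi and pizza and all decide on pizza pass all four checks. If one of them walks off to the sushi place anyway, agreement is broken.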

Distributed Consensus

What is a distributed system anyway?

Distributed system definition

A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.

— Leslie Lamport

http://research.microsoft.com/en-us/um/people/lamport/pubs/distributed-system.txt

A system in which participants communicate asynchronously using messages.

Distributed Systems - failure detection

Jim had quit CorpSoft a while ago, but no-one ever told Bob…

Failure detection:
• can only rely on external knowledge
• but what if there’s no-one to tell you?
• thus: must be in-some-way time based
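
A minimal sketch of a time-based failure detector, in plain Scala. `TimeoutFailureDetector` and its methods are hypothetical names, not an Akka API. The clock is passed in explicitly so that the example stays deterministic.

```scala
import scala.concurrent.duration._

// Hypothetical sketch: remembers when each node was last heard from.
final class TimeoutFailureDetector(timeout: FiniteDuration) {
  private var lastHeartbeat = Map.empty[String, Long] // node -> time in millis

  def heartbeat(node: String, nowMillis: Long): Unit =
    lastHeartbeat = lastHeartbeat.updated(node, nowMillis)

  // A node is only ever *suspected*: from here a slow network
  // looks exactly like a dead node.
  def isSuspected(node: String, nowMillis: Long): Boolean =
    lastHeartbeat.get(node) match {
      case Some(seen) => nowMillis - seen > timeout.toMillis
      case None       => true // never heard from this node at all
    }
}
```

With a 3 second timeout, Jim is fine 2 seconds after his last heartbeat and suspected after a bit more than 3. Whether he quit or is just slow cannot be told from the outside.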

Two Generals Problem

Yellow and Blue armies must attack Pink City. They must attack together, otherwise they’ll die in vain. Now they must agree on the exact time of the attack.

They can only send messengers, which Pink may intercept and kill.

Two Generals Problem - happy case

I need to inform blue about my attack plan.

I don’t know when yellow will attack…

1) Initial message not lost

I don’t know if Blue will also attack at 13:37… I’ll wait until I hear back from him.

Why?

2) Message might not have reached Blue

Blue must confirm the reception of the command

1) Yellow is now sure, but Blue isn’t!

Why?

2) Blue’s confirmation might have been lost!

This exactly mirrors the initial situation!
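
The regress can be spelled out in a few lines. This is a sketch with made-up names, and it assumes the generals strictly take turns: message 1 is Yellow's plan, message 2 is Blue's confirmation, message 3 confirms the confirmation, and so on.

```scala
object TwoGenerals {
  sealed trait General
  case object Yellow extends General
  case object Blue extends General

  // Odd messages travel Yellow -> Blue, even messages Blue -> Yellow.
  def senderOf(messageNr: Int): General =
    if (messageNr % 2 == 1) Yellow else Blue

  // After `sent` messages, whoever sent the last one cannot tell
  // "it arrived" apart from "the messenger was intercepted".
  def unsureAfter(sent: Int): General = {
    require(sent >= 1, "at least the initial plan must have been sent")
    senderOf(sent)
  }
}
```

No matter how large `sent` gets, `unsureAfter` always returns one of the generals, so adding more confirmations only moves the doubt from one side to the other.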

2 Generals Problem Translated to Akka

2 Generals translated to Akka:

Akka Actors implement the Actor Model:

Actors:
• communicate via messages
• create other actors
• change their behaviour on receiving a msg

Gains? Distribution / separation / modelling abstraction

2 Generals translated to Akka:

case class AttackAt(when: Date)

Presentation–sized–snippet = does not cover all cases

import java.util.Date
import akka.actor.{Actor, ActorRef}

class General(general: Option[ActorRef]) extends Actor {

  // left open on the slide
  val WhenIWantToAttack: Date = ???

  // if we know the other general, send him our plan right away
  general foreach { _ ! AttackAt(WhenIWantToAttack) }

  def receive = {
    case AttackAt(when) =>
      println(s"General ${otherGeneralName} attacks at $when")
      println(s"I must confirm this!")
      sender() ! AttackAt(when) // the confirmation is itself a message that needs confirming
  }

  def otherGeneralName =
    if (self.path.name == "blue") "yellow" else "blue"
}

import akka.actor.{ActorSystem, Props}

val system = ActorSystem("two-generals")

val blue =
  system.actorOf(Props(new General(general = None)), name = "blue")

val yellow =
  system.actorOf(Props(new General(Some(blue))), name = "yellow")

The blue general attacks at 13:37, I must confirm this!
The yellow general attacks at 13:37, I must confirm this!
The blue general attacks at 13:37, I must confirm this!
...

8 Fallacies of Distributed Computing

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn’t change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

Peter Deutsch “The Eight Fallacies of Distributed Computing” https://blogs.oracle.com/jag/resource/Fallacies.html

Failure Models

Failure models:

Fail – Stop

Fail – Recover

Byzantine

2-phase commit

2PC - step 1: Propose value

2PC - step 1: Promise to agree on write

2PC - step 2: Commit the write

2PC - step 1: Propose value, and die

2PC - step 1: Propose value to 1 node, and die

2PC: Prepare needs timeouts

2PC: Timeouts + recovery committer

Still can’t tolerate if the “accepted value” Actor dies
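
The two steps can be modelled without any actors. A minimal sketch in plain Scala: `Acceptor`, `propose` and the vote types are illustrative names, everything runs synchronously in one process, and nobody dies half-way. That last assumption is exactly the case that timeouts and a recovery committer are meant to cover.

```scala
object TwoPhaseCommit {
  sealed trait Vote
  case object Prepared extends Vote
  case object Conflict extends Vote

  // Hypothetical acceptor: at most one prepared and one committed value.
  final class Acceptor(val name: String) {
    var prepared: Option[String] = None
    var committed: Option[String] = None

    // Step 1: promise to agree on the write, unless already promised elsewhere.
    def prepare(value: String): Vote =
      if (prepared.isEmpty) { prepared = Some(value); Prepared } else Conflict

    // Step 2: make the prepared value durable.
    def commit(): Unit = { committed = prepared; prepared = None }

    def abort(): Unit = { prepared = None }
  }

  // Commits only if *every* acceptor promised; otherwise releases our promises.
  def propose(value: String, acceptors: Seq[Acceptor]): Boolean = {
    val votes = acceptors.map(a => a -> a.prepare(value))
    val allPrepared = votes.forall(_._2 == Prepared)
    if (allPrepared) acceptors.foreach(_.commit())
    else votes.collect { case (a, Prepared) => a }.foreach(_.abort())
    allPrepared
  }
}
```

A single acceptor answering `Conflict` is enough to block the whole write, which is why 2PC is a commit protocol rather than a fault-tolerant consensus algorithm.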

2 Phase Commit translated to Akka

2PC translated to Akka

// Prepare carries the transaction id, since that is how the Proposer sends it
case class Prepare(transactionId: Int, value: Any)
case object Commit

sealed trait AcceptorStatus
case object Prepared extends AcceptorStatus
case object Conflict extends AcceptorStatus

import akka.actor.{Actor, ActorRef}

class Proposer(acceptors: List[ActorRef]) extends Actor {
  var transactionId = 0
  var preparedAcceptors = 0

  def receive = {
    case value: String =>
      transactionId += 1
      preparedAcceptors = 0 // start counting promises for the new transaction
      acceptors foreach { _ ! Prepare(transactionId, value) }

    case Prepared =>
      preparedAcceptors += 1

      // step 2: commit only once every acceptor has promised
      if (preparedAcceptors == acceptors.size)
        acceptors foreach { _ ! Commit }

    case Conflict =>
      context stop self
  }
}

2PC with ResumeProposer in Akka

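case class Prepare(transactionId: Int, value: Any)
case object Commit
// used by ResumeProposer; not defined on the slide, assumed to be a plain message object
case object StatusPlz

sealed trait AcceptorStatus
case object Prepared extends AcceptorStatus
case object Conflict extends AcceptorStatus
case class Committed(value: Any) extends AcceptorStatus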

import akka.actor.{Actor, ActorRef, Terminated}

class ResumeProposer(
    proposer: ActorRef,
    acceptors: List[ActorRef]) extends Actor {

  // get a Terminated message when the proposer dies
  context watch proposer

  var anyAcceptorCommitted = false

  def receive = {
    case Terminated(`proposer`) =>
      println("Proposer died! Try to finish the transaction...")
      acceptors foreach { _ ! StatusPlz }

    case _: AcceptorStatus =>
      // impl of recovery here
  }
}

Quorum

Quorum voting

From the perspective of the Omnipotent Observer *

* does not exist in a running system

Quorum voting – split votes
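
A small sketch of majority quorums and how a split vote shows up. `Quorum`, `majority` and `winner` are illustrative names. The sketch assumes a fixed, known cluster size and that each node casts at most one vote.

```scala
object Quorum {
  // Smallest group size for which any two such groups must overlap.
  def majority(clusterSize: Int): Int = clusterSize / 2 + 1

  // The value backed by a majority of the *whole* cluster, if there is one.
  // Nodes that did not answer still count towards the cluster size.
  def winner[V](votes: Seq[V], clusterSize: Int): Option[V] =
    votes.groupBy(identity)
      .collectFirst { case (v, vs) if vs.size >= majority(clusterSize) => v }
}
```

With 5 nodes, 3 votes for pizza win. With 4 nodes voting 2 against 2 there is no winner, and the same goes for 2 pizza votes out of 5 nodes when the other three are unreachable.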

James Mickens “The Saddest Moment” http://research.microsoft.com/en-us/people/mickens/thesaddestmoment.pdf

Paxos

Basic Paxos =

“choose exactly one value”

Paxos: a high-level overview

It’s the distributed systems algorithm

JavaZone had a full session on Paxos already today…

A few Paxos whitepapers

“Reaching Agreement in the Presence of Faults” – Pease, Shostak, Lamport, 1980
…
“FLP Impossibility Result” – Fischer et al., 1985
“The Part-Time Parliament” – Lamport, 1998
…
“Paxos Made Simple” – Lamport, 2001
“Fast Paxos” – Lamport, 2005
…
“Paxos Made Live” – Chandra et al., 2007
…
“Paxos Made Moderately Complex” – van Renesse, 2011 ;-)

Lamport’s “Replicated State Machine”

Paxos: The cast

Consensus time! Choose a value (raise your hand):

v1 = Basic Paxos + Raft
v2 = Just Raft (if enough time, Paxos)

Basic Paxos simple example

Paxos: Proposals

ProposalNr must:
• be greater than any previous proposalNr used by this Proposer
• example: [roundNr|serverId]
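
One way to get such numbers, sketched in plain Scala. `ProposalNr` follows the [roundNr|serverId] layout from the slide. The `ProposalNumbers` generator and its `next` method are hypothetical names.

```scala
// [roundNr|serverId]: compared by round first, server id breaks ties,
// so two servers can never produce the same number.
final case class ProposalNr(roundNr: Long, serverId: Int)

object ProposalNr {
  implicit val ordering: Ordering[ProposalNr] =
    Ordering.by((p: ProposalNr) => (p.roundNr, p.serverId))
}

// Hypothetical per-proposer generator.
final class ProposalNumbers(serverId: Int) {
  private var round = 0L

  // `highestSeen` is the largest number this proposer has observed so far, if any.
  def next(highestSeen: Option[ProposalNr] = None): ProposalNr = {
    round = math.max(round, highestSeen.fold(0L)(_.roundNr)) + 1
    ProposalNr(round, serverId)
  }
}
```

Jumping past `highestSeen` is an optimisation rather than a requirement: it saves a proposer from retrying with numbers that are already known to be too low.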

Paxos: 2 phases

Phase 1: Prepare
Phase 2: Accept

Paxos, Prepare Phase

n = nextSeqNr()

acceptors ! Prepare(n, value)

case Prepare(n, value) =>
  if (n > minProposal) {
    minProposal = n
    accVal = value
  }

  sender() ! Accepted(minProposal, accVal)

// replace my value with the accepted value
value = highestN(responses).accVal

Paxos, Accept Phase

acceptors ! Accept(n, value)

case Accept(n, value) =>
  if (n >= minProposal) {
    // Scala has no chained assignment, so set both explicitly
    minProposal = n
    acceptedProposal = n
    acceptedValue = value
  }

  learners ! Learn(value)
  sender() ! minProposal

if (acceptedN > n) restartPaxos()
else println(n + " was chosen!")
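
Putting both phases together, here is a sketch of single-value Paxos in plain Scala that follows the textbook description ("Paxos Made Simple") rather than the slide snippets: in this version Prepare carries no value. All names are illustrative, and everything runs synchronously with no message loss and no learners.

```scala
object BasicPaxos {
  // Answer to a Prepare: was it promised, and what was accepted earlier (if anything)?
  final case class Promise(ok: Boolean, accepted: Option[(Int, String)])

  final class Acceptor {
    private var promisedN = -1                         // highest proposal number promised
    private var accepted: Option[(Int, String)] = None // last accepted (n, value)

    // Phase 1: promise to ignore anything numbered lower than n.
    def prepare(n: Int): Promise =
      if (n > promisedN) { promisedN = n; Promise(ok = true, accepted) }
      else Promise(ok = false, None)

    // Phase 2: accept, unless a higher number was promised in the meantime.
    def accept(n: Int, value: String): Boolean =
      if (n >= promisedN) { promisedN = n; accepted = Some((n, value)); true }
      else false
  }

  // One full round by one proposer; Some(value) if a majority accepted it.
  def propose(n: Int, myValue: String, acceptors: Seq[Acceptor]): Option[String] = {
    val majority = acceptors.size / 2 + 1
    val promises = acceptors.map(_.prepare(n)).filter(_.ok)
    if (promises.size < majority) None
    else {
      // Replace my value with the highest-numbered accepted value, if there is one.
      val adopted = promises.flatMap(_.accepted).sortBy(_._1).lastOption.map(_._2)
      val value = adopted.getOrElse(myValue)
      val acks = acceptors.count(_.accept(n, value))
      if (acks >= majority) Some(value) else None
    }
  }
}
```

Once "pizza" is chosen in round 1, a later proposer who wanted "sushi" in round 2 ends up re-proposing "pizza". That adoption step is what makes Basic Paxos choose exactly one value.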

Basic Paxos

Basic Paxos needs extensions for the “real world”.

Additions:
• “stable leader”
• performance (basic = 2 * broadcast roundtrip)
• ensure full replication
• configuration changes

Multi Paxos

“Basically everyone does it, but everyone does it differently.”

• Keeps the Leader
• Clients find and talk to the Leader
• Skips Phase 1, in stable state
• 2 delays instead of 4, until learning a value

Raft

Raft – inspired by Paxos

Paxos is great. Multi-Paxos is great, but there is no “common understanding” of it.

Raft wants to be understandable and just as solid.
"In Search of an Understandable Consensus Algorithm" (2013)

• Leader based
• Fewer processes than Paxos
• Its goal is simplicity
• “Basic” includes snapshotting / membership

Raft - summarised on one page

Diego Ongaro & John Ousterhout – In Search of an Understandable Consensus Algorithm

Raft - starting the cluster

Raft - Election timeout

Raft - 1st election

Raft - Election Timeout

Raft - Election Phase
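
What a node does when it is asked for its vote, sketched in plain Scala after the rules in the Raft paper. `Follower` and `handle` are illustrative names and are not taken from akka-raft. The argument order of `RequestVote` mirrors the snippet from the slides (term, candidate, last log term, last log index).

```scala
object RaftVote {
  final case class RequestVote(term: Int, candidateId: String,
                               lastLogTerm: Int, lastLogIndex: Int)

  final case class Follower(currentTerm: Int, votedFor: Option[String],
                            lastLogTerm: Int, lastLogIndex: Int) {

    // Returns the updated follower state and whether the vote was granted.
    def handle(req: RequestVote): (Follower, Boolean) = {
      // Seeing a newer term moves us into it and clears our old vote.
      val me =
        if (req.term > currentTerm) copy(currentTerm = req.term, votedFor = None)
        else this

      // The candidate's log must be at least as up-to-date as ours.
      val logOk = req.lastLogTerm > me.lastLogTerm ||
        (req.lastLogTerm == me.lastLogTerm && req.lastLogIndex >= me.lastLogIndex)

      // At most one vote per term (repeating the same vote is fine).
      val canVote = me.votedFor.forall(_ == req.candidateId)

      if (req.term == me.currentTerm && canVote && logOk)
        (me.copy(votedFor = Some(req.candidateId)), true)
      else
        (me, false)
    }
  }
}
```

A candidate that collects votes from a majority of the cluster for its term becomes the Leader. Since each node votes at most once per term, two candidates can never both win the same term.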

Raft – heartbeat = empty entries
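
How heartbeats and the election timeout interact, as a small sketch. The names are made up for illustration. The 150 to 300 ms range is the example interval from the Raft paper, and the clock is passed in explicitly.

```scala
import scala.util.Random

object ElectionTimer {
  // Randomised timeouts make it unlikely that two followers time out
  // at the same moment and split the vote.
  def randomTimeoutMillis(rnd: Random, min: Int = 150, max: Int = 300): Int =
    min + rnd.nextInt(max - min + 1)

  final case class Follower(deadlineMillis: Long) {
    // Any AppendEntries from the Leader, even one with no entries,
    // pushes the election deadline further out.
    def onHeartbeat(nowMillis: Long, timeoutMillis: Int): Follower =
      copy(deadlineMillis = nowMillis + timeoutMillis)

    // No word from the Leader for too long: time to become a Candidate.
    def shouldStartElection(nowMillis: Long): Boolean =
      nowMillis >= deadlineMillis
  }
}
```

As long as the Leader's heartbeat interval is clearly shorter than the smallest election timeout, followers keep resetting their deadline and no election starts.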

Akka–Raft

(community project) (work in progress)

Raft, reminder:

Raft translated to Akka

abstract class RaftActor
  extends Actor
  with FSM[RaftState, Metadata]

onTransition {

  case Follower -> Candidate =>
    self ! BeginElection
    resetElectionDeadline()

  // ...
}

case Event(BeginElection, m: ElectionMeta) =>
  log.info("Init election (among {} nodes) for {}",
    m.config.members.size, m.currentTerm)

  val request = RequestVote(m.currentTerm, m.clusterSelf,
    replicatedLog.lastTerm, replicatedLog.lastIndex)

  // ask everyone else for their vote
  m.membersExceptSelf foreach { _ ! request }

  // and vote for ourselves
  val includingThisVote = m.incVote
  stay() using includingThisVote.withVoteFor(m.currentTerm, m.clusterSelf)

Raft Heartbeat using Akka

akka-raft is a work in progress community project – it may change a lot

sendHeartbeat(m)
log.info("Starting heartbeat, with interval: {}", heartbeatInterval)
setTimer(HeartbeatName, SendHeartbeat, heartbeatInterval, repeat = true)

val leaderBehaviour = {
  // ...
  case Event(SendHeartbeat, m: LeaderMeta) =>
    sendHeartbeat(m)
    stay()
}

Akka-Raft in User-Land //alpha!!!

class WordConcatRaftActor extends RaftActor {

  type Command = Cmnd

  var words = Vector[String]()

  /** Applied when command committed by Raft consensus */
  def apply = {
    case AppendWord(word) =>
      words = words :+ word
      word

    case GetWords =>
      log.info("Replying with {}", words.toList)
      words.toList
  }
}
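
Why replicating a log is enough: the same state machine without Akka or akka-raft, as a pure function. `Cmnd`, `AppendWord` and `GetWords` are named as in the snippet above, but their definitions here are assumptions, as are `applyCommand` and `replay`.

```scala
object WordConcat {
  sealed trait Cmnd
  final case class AppendWord(word: String) extends Cmnd
  case object GetWords extends Cmnd

  // Pure version of the state machine: state in, (new state, reply) out.
  def applyCommand(words: Vector[String], cmd: Cmnd): (Vector[String], Any) =
    cmd match {
      case AppendWord(word) => (words :+ word, word)
      case GetWords         => (words, words.toList)
    }

  // Replaying the same committed log always yields the same state.
  def replay(log: Seq[Cmnd]): Vector[String] =
    log.foldLeft(Vector.empty[String])((ws, cmd) => applyCommand(ws, cmd)._1)
}
```

Raft only has to make all nodes agree on the order of the committed commands. Because applying them is deterministic, every node that replays the log ends up with the same words.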

FLP Impossibility

FLP Impossibility Proof (19

"Impossibility of Distributed Consensus with One Faulty Process", 1985, by Fischer, Lynch, Paterson

ktoso @ typesafe.com twitter: ktosopl github: ktoso blog: project13.pl team blog: letitcrash.com

JavaZone @ Oslo 2014

Takk! Dzięki! Thanks! ありがとう!

akka.io

Happy Byzantine Lunch-time!

©Typesafe 2014 – All Rights Reserved

Links
1. http://cs-www.cs.yale.edu/homes/arvind/cs425/doc/fischer.pdf
2. http://hydra.infosys.tuwien.ac.at/teaching/courses/AdvancedDistributedSystems/download/1975_Akkoyunlu,%20Ekanadham,%20Huber_Some%20constraints%20and%20tradeoffs%20in%20the%20design%20of%20network%20communications.pdf
3. http://research.microsoft.com/en-us/people/mickens/thesaddestmoment.pdf
4. http://research.microsoft.com/en-us/um/people/lamport/pubs/lamport-paxos.pdf
5. http://research.microsoft.com/en-us/um/people/lamport/pubs/paxos-simple.pdf
6. http://the-paper-trail.org/blog/consensus-protocols-paxos/
7. http://static.googleusercontent.com/media/research.google.com/en//archive/paxos_made_live.pdf
8. http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf
9. https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
10. Recent Leslie Lamport interview: http://www.se-radio.net/2014/04/episode-203-leslie-lamport-on-distributed-systems/
11. http://book.mixu.net/distsys/
12. http://codahale.com/you-cant-sacrifice-partition-tolerance/

Links
1. Excellent Paxos lecture by Diego Ongaro: https://www.youtube.com/watch?v=JEpsBg0AO6o
2. Fallacies, actual paper: http://www.rgoarchitects.com/Files/fallacies.pdf
3. Diego Ongaro & John Ousterhout – In Search of an Understandable Consensus Algorithm
4. http://macs.citadel.edu/rudolphg/csci604/ImpossibilityofConsensus.pdf

Images / drawings
1. Paxos Island Photo – Luigi Piazzi (CC license) https://www.flickr.com/photos/photolupi/3686769346/in/photolist-6BME5J-orKHL2-58qmez-58uz7s-7bRwTj-7bRvHY-6DdRC2-fBqFFU-35KTg7-8vbe23-bsBGL7-58qq6z-58uAjG-8vbeCd-d1Sqqw-d1Smsj-d1Sqi5-d1SoMA-d1SmBE-d1SpVo-d1Sk2U-d1SoBQ-d1SoXu-d1SoqN-d1Spqu-d1Sq4w-d1SpLU-d1SKDG-d1Skcu-d1Sp8f-d1Sqaq-d1SpCw-75YaVN-d1SLs1-d1SK15-d1SJiC-d1Suiu-d1SKtS-d1SjQS-d1StyU-d1SKi1-d1SxGS-d1Sm6j-d1Sxdh-d1SKMN-d1SxAq-d1SwgC-d1Smgj-d1SvhJ-d1SjC7
2. Drawings – myself (use-them-at-will-unless-mocking-my-horrible-drawing-skills-license)
