building on quicksand microservices indicthreads

Building on Quicksand

About Me

• Work at ThoughtWorks Pune

• DDD and Distributed Computing enthusiast

• A fan of Pat Helland

Twitter handle: @shripadagashe

Blog: https://shripad-agashe.github.io


Evolution of systems

Image source: https://commons.wikimedia.org/wiki/File:Front_Z9_2094.jpg

Image Source: https://en.wikipedia.org/wiki/Solaris_Cluster#/media/File:Sun_Microsystems_Solaris_computer_cluster.jpg

Vs

Reliability is a steep curve

[Chart: downtime in seconds vs. % availability (99, 99.9, 99.99, 99.999)]

% Availability   Per year       Per month      Per day
99               3.65 days      7.2 hours      14.4 minutes
99.9             8.76 hours     43.8 minutes   1.44 minutes
99.99            52 minutes     4.38 minutes   9 seconds
99.999           5.26 minutes   25.9 seconds   0.8 seconds
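The downtime figures follow directly from the availability percentage. As a minimal sketch (the function name and the 365-day year are assumptions made for illustration), the arithmetic looks like this:

```python
# Allowed downtime for a given availability target.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumes a 365-day year


def downtime_seconds(availability_percent: float, period_seconds: float) -> float:
    """Seconds of permitted downtime in a period at the given availability."""
    return period_seconds * (1 - availability_percent / 100)


for target in (99, 99.9, 99.99, 99.999):
    per_year = downtime_seconds(target, SECONDS_PER_YEAR)
    per_day = downtime_seconds(target, SECONDS_PER_DAY)
    print(f"{target}%: {per_year / 3600:.2f} hours/year, {per_day:.2f} seconds/day")
```

Each extra "nine" divides the downtime budget by ten, which is why the curve is so steep.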

And it has a price

[Chart: price vs. % availability (99, 99.9, 99.99, 99.999)]

Probability theory to the rescue

Intersection of probabilities (independent events)

P(A ∩ B) = P(A) * P(B)

For 99% availability, i.e. 1% unavailability, the probability that 2 independent servers are unavailable at the same time = (0.01) * (0.01) = 0.0001

More on Probability

• Systems in series

• Availability = P(A) * P(B)

• Systems in parallel

• Availability = 1 - (1 - P(A)) * (1 - P(B))
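The series and parallel rules can be checked numerically. The sketch below assumes component failures are independent; the function names are illustrative and not taken from the talk.

```python
from math import prod


def series_availability(*availabilities: float) -> float:
    """Every component must be up, so the availabilities multiply."""
    return prod(availabilities)


def parallel_availability(*availabilities: float) -> float:
    """The system is down only when every redundant component is down."""
    return 1 - prod(1 - a for a in availabilities)


# Two 99% servers chained in series vs. the same two used as redundant copies.
print(series_availability(0.99, 0.99))
print(parallel_availability(0.99, 0.99))
```

Chaining components lowers availability below that of either part, while redundancy raises it above both.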

So Architectural patterns evolve around it

[Diagram: two App boxes sharing one DB cylinder]

Box Cylinder Architecture Best Practices

• App layer should be stateless

• Architecture should be layered

But DB is still on a single machine

[Diagram: Primary DB → Secondary DB, asynchronous replication]

DB high availability is achieved via replication

• Active-Active Replication

• Active-Passive Replication

[Diagram: Primary DB → Secondary DB, synchronous replication]

Move to transaction model

[Diagram: Client, In Memory, Disk. Steps: Write, Write, Commit, Write to Disk]

Enterprise organization

Image Source: https://www.flickr.com/photos/mwichary/2356663850


Conway’s Law

Organization: Inventory, Sales, Finance, Fulfilment

IT Systems: Inventory System, Sales System, Finance System, Fulfilment System

Integration via DB

[Diagram: App 1 and App 2, each running App instances over its own DB, integrated at the DB level]

What enabled it:

• 2 Phase Commit

• XA transaction

Possible Alternative

[Diagram: App 1 and App 2, each running App instances over its own DB, integrated through a Service]

What enabled it:

• SOAP

• REST

Bouquets and brickbats

+

• Integration is simple

• Familiar for most developers

• Easier to reason about

-

• Any sync call will add to latency

• Sync calls will expose the system to variations in the behavior of external systems

Possible alternative

[Diagram: App 1 and App 2, each running App instances over its own DB, connected by Replication]

Replication Patterns

• Via file

• Batch app for replication

• Event driven replication using message queues
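Event driven replication can be sketched with an in-process queue standing in for a message broker. Everything here (the `events` queue, the two dictionaries acting as databases, the function names) is a hypothetical illustration, not a real broker API.

```python
import queue

# Hypothetical in-process stand-in for a message broker.
events = queue.Queue()

app1_db = {}  # data owned by App 1
app2_db = {}  # App 2's local replica of that data


def app1_write(key, value):
    """App 1 commits locally, then publishes a change event (no sync call to App 2)."""
    app1_db[key] = value
    events.put({"key": key, "value": value})


def app2_drain():
    """App 2 applies pending events to its replica; returns how many were applied."""
    applied = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return applied
        app2_db[event["key"]] = event["value"]
        applied += 1


app1_write("sku-1", 10)
app1_write("sku-1", 9)
stale_view = dict(app2_db)  # App 2 has not consumed anything yet
app2_drain()                # now the replica catches up
```

Between the write and the drain, App 2 answers from stale data. That gap is the price paid for removing the synchronous call.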

Bouquets and brickbats

+

• As there is no sync call, it does not add additional latency to the app

• As systems are isolated, chances of failure propagation are minimal

• With Pub Sub, changes can be propagated to multiple subscribers with minimal additional work

-

• Integration may not be trivial

• Async propagation of data needs careful reasoning

Probabilistic Business Rules

• When we have asynchronous replication we have windows of failure that mean work may be lost or delayed.

• Distribution + Asynchrony → Probabilities of Enforcement

Source:http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf


Asynchrony and Truth

Image source: https://www.flickr.com/photos/stevenpisano/16595925953


Here comes Eventual Consistency

• Eventual consistency guarantees that a read returns some subset of the previous writes; eventually it will return all writes.

• There is no guarantee of order

• There is no time bound on "eventual"

• It is a loosely defined term which guarantees nothing. The application should tolerate any subset of writes without any time guarantee

• Rather than being a single concept, consistency is a spectrum

• On one end of the spectrum is strong consistency

• On the other end is eventual consistency

Eventual consistency thru simple example

Official scorekeeper:
    score = Read("visitors");
    Write("visitors", score + 1);

Umpire:
    if middle of 9th inning then
        vScore = Read("visitors");
        hScore = Read("home");
        if vScore < hScore
            end game;

Radio reporter:
    do {
        vScore = Read("visitors");
        hScore = Read("home");
        report vScore and hScore;
        sleep (30 minutes);
    }

Sportswriter:
    While not end of game {
        drink beer;
        smoke cigar;
    }
    go out to dinner;
    vScore = Read("visitors");
    hScore = Read("home");
    write article;

Statistician:
    Wait for end of game;
    score = Read("home");
    stat = Read("season-runs");
    Write("season-runs", stat + score);

Stat watcher:
    stat = Read("season-runs");
    discuss stats with friends;

Strong Consistency    See all previous writes.
Eventual Consistency  See subset of previous writes.
Consistent Prefix     See initial sequence of writes.
Bounded Staleness     See all "old" writes.
Monotonic Reads       See increasing subset of writes.
Read My Writes        See all writes performed by reader.

Source:http://cacm.acm.org/magazines/2013/12/169945-replicated-data-consistency-explained-through-baseball/fulltext#F8
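To make the difference between guarantees concrete, the sketch below enumerates which (visitors, home) scores a reader could see under three of them. The write sequence is a made-up example and the function names are assumptions; only the guarantee definitions come from the table above.

```python
from itertools import product

# Hypothetical ordered writes by the scorekeeper: (team, new score).
# Both teams start at 0.
writes = [("home", 1), ("visitors", 1), ("home", 2), ("visitors", 2)]


def score_after(n):
    """(visitors, home) once the first n writes have been applied in order."""
    score = {"visitors": 0, "home": 0}
    for team, value in writes[:n]:
        score[team] = value
    return (score["visitors"], score["home"])


def strong_reads():
    """Strong consistency: the reader sees all previous writes."""
    return {score_after(len(writes))}


def consistent_prefix_reads():
    """Consistent prefix: the reader sees some initial sequence of the writes."""
    return {score_after(n) for n in range(len(writes) + 1)}


def eventual_reads():
    """Eventual consistency: each key may independently return any value written so far."""
    visitors = {0} | {v for team, v in writes if team == "visitors"}
    home = {0} | {v for team, v in writes if team == "home"}
    return set(product(visitors, home))
```

With these writes, an eventually consistent read may report a score such as (2, 0) that never existed during the game, while a consistent-prefix read can only return scores that were true at some moment.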


Not everyone needs the same thing

• Often different roles have different tolerances for stale information

• The trade off between correctness and availability can bring in more revenue

• The trade off is often driven by business value

What's the Risk Appetite

• Consistency is often a cost of doing business

• The major point is that availability (and its cousins offline and latency-reduction) may be traded off with classic notions of consistency. This tradeoff may frequently be applied across many different aspects at many levels of granularity within a single application.

• Locally clear a check if the face value is less than $10,000. If it exceeds $10,000, double check with all the replicas to make sure it clears

• Schedule the shipment of a “Harry Potter” book based on a local opinion of the inventory. In contrast, the one and only one Gutenberg bible requires strict coordination!
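The check-clearing rule can be sketched as a risk threshold in code. The $10,000 limit comes from the example above; the function name, the `replica_balances` callable and the balance comparison are assumptions made for illustration.

```python
LOCAL_CLEARING_LIMIT = 10_000  # dollars, the threshold from the example above


def clear_check(amount, local_balance, replica_balances):
    """Clear small checks on local knowledge; coordinate for large ones.

    `replica_balances` is a hypothetical callable that asks every replica for
    its view of the balance (slow, and unavailable if a replica is down).
    """
    if amount < LOCAL_CLEARING_LIMIT:
        # A guess based on local knowledge only; it may need an apology later.
        return amount <= local_balance
    # High-value check: pay the coordination cost and ask every replica.
    return all(amount <= balance for balance in replica_balances())
```

The small check never touches the replicas, so it stays fast and available. The large one trades latency and availability for certainty.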



Memories, Guesses, Apologies

• The idea is that everything is done locally with a subset of the global knowledge.

• You know what you know when an action is performed. Since you have only a subset of the knowledge, your actions are really only guesses.

• When your knowledge as a replica increases, you may have an “Oh, crap!” moment.



More on Apologies

• Every business has to be ready for apologies.

• Consider a case where the only book in inventory is scheduled for delivery.

• In preparing the book for shipment, it is run over by the forklift in the warehouse.

• So, correct software notwithstanding, you will need to apologize



How to apologize

• First of all, recognize whether you need to apologize

• Identify promises that could not be completed

• A unique identifier across systems becomes critical for identifying failures

• Typically the mistakes are identified during reconciliation

• Depending on severity, an apology can be handled by the system based on rules or routed to a human
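A reconciliation pass along these lines might look like the following sketch. The order ids, the `auto_limit` threshold and the two outcomes are illustrative assumptions, not part of the talk.

```python
def find_broken_promises(promised, fulfilled):
    """Return the ids of promises that have no matching fulfilment.

    Both arguments map a shared unique identifier (hypothetical order ids)
    to a record; the shared id is what makes the comparison possible.
    """
    return sorted(set(promised) - set(fulfilled))


def route_apology(order, auto_limit=50):
    """Small orders get a rule-based apology; larger ones go to a human.

    The `auto_limit` threshold and both outcomes are illustrative assumptions.
    """
    return "auto-refund" if order["value"] <= auto_limit else "human-review"


promised = {"o-1": {"value": 20}, "o-2": {"value": 500}, "o-3": {"value": 35}}
fulfilled = {"o-3": {"shipped": True}}

for order_id in find_broken_promises(promised, fulfilled):
    print(order_id, route_apology(promised[order_id]))
```

Without an identifier shared by the promising and fulfilling systems, the set difference in `find_broken_promises` could not be computed.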

What inhibits business trade offs

• The layering of an arbitrary application atop a storage subsystem inhibits reordering (and also apologies)

• Logical delete vs Actual delete of row

• Only when commutative operations are used can we achieve the desired loose coupling.

• Application operations can be commutative



Commutative Business Transactions

• Order insensitive logic

• Valid Account creation - Create dummy account first and then attach customer to it later

• Credit to account

• Visibility of business operations

• Logical deletion of record
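Why a credit to an account tolerates reordering while an overwrite does not can be shown with a small sketch. The function names and amounts are made up for illustration.

```python
from itertools import permutations


def apply_credits(balance, credits):
    """Credits are additions, so their order of arrival does not matter."""
    for amount in credits:
        balance += amount
    return balance


def apply_overwrites(balance, values):
    """Overwriting the balance is order sensitive: the last write wins."""
    for value in values:
        balance = value
    return balance


ops = [10, 25, 5]
# Try every possible arrival order of the same three operations.
credit_results = {apply_credits(100, order) for order in permutations(ops)}
overwrite_results = {apply_overwrites(100, order) for order in permutations(ops)}
```

Every ordering of the credits yields the same balance, so replicas that receive them in different orders still converge. The overwrites end in a different state depending on which one arrives last.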

Back to the Future

Vs

We need to partner with people like him

Image source: https://commons.wikimedia.org/wiki/File:Jackie_Stewart_2011_British_Grand_Prix.jpg


So what's in it for you

• The technique explained requires the business person to be IT sympathetic

• Business has to align with IT and see IT as a competitive advantage

• A "developing with us" rather than "developing for us" mentality


Key takeaway

• Move from a DB centric view of consistency to an application centric view of consistency

• Carefully make trade offs in IT systems to limit losses and increase upside

• Look for inspiration in business practices developed for a world without instant information
