Building on Quicksand: Microservices (IndicThreads)
Building on Quicksand
About Me
• Work at ThoughtWorks Pune
• DDD and Distributed Computing enthusiast
• A fan of Pat Helland
Twitter handle: @shripadagashe
Blog: https://shripad-agashe.github.io
Evolution of systems
Image source: https://commons.wikimedia.org/wiki/File:Front_Z9_2094.jpg
Image Source: https://en.wikipedia.org/wiki/Solaris_Cluster#/media/File:Sun_Microsystems_Solaris_computer_cluster.jpg
Vs
Reliability is a steep curve
[Chart: Downtime vs % availability; x-axis: % availability (99, 99.9, 99.99, 99.999); y-axis: downtime in seconds]
% Availability | Downtime per year | Downtime per month | Downtime per day
99 | 3.65 days | 7.2 hours | 14.4 minutes
99.9 | 8.76 hours | 43.8 minutes | 1.44 minutes
99.99 | 52 minutes | 4.38 minutes | 9 seconds
99.999 | 5.26 minutes | 25.9 seconds | 0.86 seconds
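Every cell in the table comes from one multiplication: downtime = (1 - availability) × length of the period. A minimal sketch, assuming a 365-day year and a 30-day month (the helper and constant names are illustrative, not from the talk). Published tables differ slightly with the month length used, so this gives 4.32 rather than 4.38 minutes per month at 99.99%.

```python
# Illustrative helper: downtime permitted by an availability target.
# Assumption: 365-day year, 30-day month, 86,400-second day.
SECONDS_PER_DAY = 24 * 60 * 60
PERIOD_SECONDS = {
    "year": 365 * SECONDS_PER_DAY,
    "month": 30 * SECONDS_PER_DAY,
    "day": SECONDS_PER_DAY,
}


def allowed_downtime(availability_pct: float, period: str) -> float:
    """Seconds of downtime allowed in `period` at `availability_pct` percent availability."""
    unavailability = 1 - availability_pct / 100
    return unavailability * PERIOD_SECONDS[period]


for pct in (99, 99.9, 99.99, 99.999):
    print(
        f"{pct}%: "
        f"{allowed_downtime(pct, 'year') / 3600:.2f} h/year, "
        f"{allowed_downtime(pct, 'month') / 60:.2f} min/month, "
        f"{allowed_downtime(pct, 'day'):.2f} s/day"
    )
```

Each extra nine divides the allowed downtime by ten, which is why the curve on the previous slide is so steep.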
And it has a price
[Chart: price against % availability (99, 99.9, 99.99, 99.999); y-axis from 0 to 1000, units not given]
Probability theory to the rescue
Intersection of independent events: $P(A \cap B) = P(A) \times P(B)$
For 99% availability, i.e. 1% unavailability, the probability that 2 independent servers are both unavailable = $0.01 \times 0.01 = 0.0001$, i.e. 99.99% availability for the pair
More on Probability
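• Systems in series (all must be up)
• Availability = $P(A) \times P(B)$, where $P(A)$ and $P(B)$ are the probabilities that A and B are up
• Systems in parallel (at least one must be up)
• Availability = $1 - (1 - P(A)) \times (1 - P(B))$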
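The two rules can be written as two small functions and composed. A minimal sketch, assuming independent failures as the formulas do; `series` and `parallel` are illustrative names.

```python
import math


def series(*availabilities: float) -> float:
    """Every component must be up: multiply availabilities (independent failures assumed)."""
    return math.prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one replica must be up: 1 minus the probability that all are down."""
    return 1 - math.prod(1 - a for a in availabilities)


two_servers = parallel(0.99, 0.99)                # about 0.9999, as on the previous slide
chain = series(0.99, 0.99)                        # about 0.9801, worse than either server alone
app_then_db = series(parallel(0.99, 0.99), 0.99)  # redundant apps, single DB: about 0.9899
print(two_servers, chain, app_then_db)
```

The last case previews the next slides: doubling the app tier barely helps while the DB is still a single 99% component.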
So architectural patterns evolve around it
[Diagram: two App boxes in front of a single DB cylinder]
Box Cylinder Architecture Best Practices
• App layer should be stateless
• Architecture should be layered
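Keeping the app layer stateless is what lets the boxes be put in parallel: any box can serve any request because all state lives in the cylinder. A toy sketch, assuming an in-memory dict standing in for the DB; `SharedStore`, `StatelessApp` and `add_to_cart` are hypothetical names.

```python
import itertools


class SharedStore:
    """Hypothetical stand-in for the DB cylinder: the only place where state lives."""

    def __init__(self) -> None:
        self.carts: dict[str, list[str]] = {}


class StatelessApp:
    """Hypothetical app box that keeps no per-user state between requests."""

    def __init__(self, name: str, store: SharedStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, user: str, item: str) -> list[str]:
        cart = self.store.carts.setdefault(user, [])
        cart.append(item)
        return list(cart)


store = SharedStore()
apps = [StatelessApp("app-1", store), StatelessApp("app-2", store)]
balancer = itertools.cycle(apps)  # naive round-robin load balancer

next(balancer).add_to_cart("alice", "book")  # handled by app-1
next(balancer).add_to_cart("alice", "pen")   # handled by app-2, which still sees the book
survivor = apps[1]                           # app-1 crashes; no state is lost with it
print(survivor.add_to_cart("alice", "ink"))  # ['book', 'pen', 'ink']
```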
But DB is still on a single machine
[Diagram: Primary DB copying to Secondary DB via asynchronous replication]
DB high availability is achieved via replication
• Active-Active Replication
• Active-Passive Replication
[Diagram: Primary DB copying to Secondary DB via synchronous replication]
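The difference between the two replication diagrams shows up at failover. A toy in-memory sketch, assuming no network and simulating a primary crash by never draining the replication backlog; `Replica`, `Primary` and `failover_after_writes` are hypothetical names. With asynchronous replication the primary can acknowledge a write that the secondary never receives.

```python
class Replica:
    """Hypothetical DB copy holding key/value data."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}


class Primary:
    """Hypothetical primary that ships writes to a secondary, synchronously or asynchronously."""

    def __init__(self, secondary: Replica, synchronous: bool) -> None:
        self.local = Replica()
        self.secondary = secondary
        self.synchronous = synchronous
        self.pending: list[tuple[str, int]] = []  # async replication backlog

    def write(self, key: str, value: int) -> str:
        self.local.data[key] = value
        if self.synchronous:
            self.secondary.data[key] = value  # acknowledge only after the secondary has it
        else:
            self.pending.append((key, value))  # acknowledge now, ship later
        return "ack"

    def replicate(self) -> None:
        """Drain the async backlog to the secondary."""
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()


def failover_after_writes(synchronous: bool) -> dict[str, int]:
    secondary = Replica()
    primary = Primary(secondary, synchronous)
    primary.write("orders", 1)
    primary.replicate()
    primary.write("orders", 2)  # acknowledged, then the primary crashes before replicating
    return secondary.data       # what the promoted secondary knows


print(failover_after_writes(synchronous=True))   # {'orders': 2}
print(failover_after_writes(synchronous=False))  # {'orders': 1}: the acknowledged write is lost
```

Synchronous replication closes that window at the price of waiting on the secondary for every write.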
Move to transaction model
[Sequence diagram: Client, In Memory, Disk. The client writes twice to memory; on commit, the writes go to disk]
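One common way to realise "write in memory, write to disk on commit" is to buffer changes and append them to a log that is forced to disk at commit time. A minimal sketch, assuming a JSON-lines log file in a temporary directory; the `Transaction` class and `read_log` helper are hypothetical, and real databases add much more (log sequence numbers, checkpoints, crash recovery).

```python
import json
import os
import tempfile


class Transaction:
    """Hypothetical transaction: buffers writes in memory, commit forces them to a log on disk."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.buffer: list[dict] = []

    def write(self, key: str, value: int) -> None:
        self.buffer.append({"key": key, "value": value})  # in memory only, not yet durable

    def commit(self) -> None:
        with open(self.log_path, "a", encoding="utf-8") as log:
            for record in self.buffer:
                log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # the commit is durable once fsync returns
        self.buffer.clear()

    def rollback(self) -> None:
        self.buffer.clear()  # nothing reached the disk, so the writes are simply forgotten


def read_log(log_path: str) -> list[dict]:
    if not os.path.exists(log_path):
        return []
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log]


log_path = os.path.join(tempfile.mkdtemp(), "txn.log")
txn = Transaction(log_path)
txn.write("visitors", 1)
txn.write("home", 2)
print(read_log(log_path))  # [] because nothing is on disk before commit
txn.commit()
print(read_log(log_path))
```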
Enterprise organization
Image Source: https://www.flickr.com/photos/mwichary/2356663850
Conway’s Law
Organization: Inventory, Sales, Finance, Fulfilment
IT Systems: Inventory System, Sales System, Finance System, Fulfilment System
Integration via DB
[Diagram: App 1 and App 2, each with two App nodes and its own DB, integrated through the databases]
What enabled it:
• 2 Phase Commit
• XA transaction
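Two-phase commit is what let one transaction span both databases. A minimal in-memory sketch of the protocol's shape, assuming no failures of the coordinator itself; `Participant` and `two_phase_commit` are hypothetical names, and this is not the XA interface.

```python
class Participant:
    """Hypothetical resource manager (for example one application's DB) in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: a real participant makes its changes durable and holds locks before voting yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    votes = [p.prepare() for p in participants]  # phase 1: collect every vote
    if all(votes):
        for p in participants:                   # phase 2: unanimous yes, commit everywhere
            p.commit()
        return True
    for p in participants:                       # phase 2: any no, roll back everywhere
        p.rollback()
    return False


sales, inventory = Participant("sales"), Participant("inventory")
print(two_phase_commit([sales, inventory]), sales.state, inventory.state)  # True committed committed
```

Because prepared participants hold their locks until the coordinator's decision arrives, each system's latency and availability now depend on the others.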
Possible Alternative
[Diagram: App 1 and App 2, each with two App nodes and its own DB, integrated through a Service]
What enabled it:
• SOAP
• REST
Bouquets and brickbats
+
• Integration is simple
• Familiar for most developers
• Easier to reason about
-
• Any sync call will add to latency
• Sync calls will expose the system to variations in the behavior of external systems
Possible alternative
[Diagram: App 1 and App 2, each with two App nodes and its own DB, with data copied between the DBs by Replication]
Replication Patterns
• Via file
• Batch app for replication
• Event driven replication using message queues
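Event-driven replication with pub/sub can be sketched in-process, assuming Python's `queue.Queue` as a stand-in for a real message broker (which would add durability, redelivery and ordering guarantees). `Topic` and `LocalCopy` are hypothetical names. The publisher makes no sync call to the other systems, and adding a subscriber is a single `subscribe()` call.

```python
import queue


class Topic:
    """Hypothetical pub/sub topic: each subscriber gets its own queue of change events."""

    def __init__(self) -> None:
        self.subscribers: list[queue.Queue] = []

    def subscribe(self) -> queue.Queue:
        inbox: queue.Queue = queue.Queue()
        self.subscribers.append(inbox)
        return inbox

    def publish(self, event: dict) -> None:
        for inbox in self.subscribers:
            inbox.put(event)


class LocalCopy:
    """Hypothetical subscribing system: keeps its own copy, updated from its queue."""

    def __init__(self, inbox: queue.Queue) -> None:
        self.inbox = inbox
        self.data: dict[str, str] = {}

    def drain(self) -> None:
        while True:
            try:
                event = self.inbox.get_nowait()
            except queue.Empty:
                return
            self.data[event["key"]] = event["value"]


topic = Topic()
finance = LocalCopy(topic.subscribe())
fulfilment = LocalCopy(topic.subscribe())

# The sales system commits locally and publishes the change; it makes no sync call.
topic.publish({"key": "order-42", "value": "placed"})
print(finance.data)  # {} until finance processes its queue
finance.drain()
fulfilment.drain()
print(finance.data, fulfilment.data)
```

The gap between `publish` and `drain` is the replication lag that the following slides ask us to reason about.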
Bouquets and brickbats
+
• As there is no sync call, it does not add latency to the app
• As systems are isolated, the chances of failure propagation are minimal
• With pub/sub, changes can be propagated to multiple subscribers with minimal additional work
-
• Integration may not be trivial
• Async propagation of data needs careful reasoning
Probabilistic Business Rules
• When we have asynchronous replication, we have windows of failure that mean work may be lost or delayed.
• Distribution + Asynchrony → Probabilities of Enforcement
Source: http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf
Asynchrony and Truth
Image source: https://www.flickr.com/photos/stevenpisano/16595925953
Here comes Eventual Consistency
• Eventual consistency guarantees only that a read returns some subset of previous writes; eventually, reads will return all writes.
• There is no guarantee of order
• There is no time bound on "eventual"
• It is a loosely defined term that guarantees very little: the application should tolerate any subset of writes, without any time guarantee
• Rather than being a single concept, consistency is a spectrum
• On one end of the spectrum is strong consistency
• On the other end is eventual consistency
Eventual consistency through a simple example
Official scorekeeper:
    score = Read("visitors");
    Write("visitors", score + 1);
Umpire:
    if middle of 9th inning then
        vScore = Read("visitors");
        hScore = Read("home");
        if vScore < hScore end game;
Radio reporter:
    do {
        vScore = Read("visitors");
        hScore = Read("home");
        report vScore and hScore;
        sleep(30 minutes);
    }
Sportswriter:
    While not end of game {
        drink beer;
        smoke cigar;
    }
    go out to dinner;
    vScore = Read("visitors");
    hScore = Read("home");
    write article;
Statistician:
    Wait for end of game;
    score = Read("home");
    stat = Read("season-runs");
    Write("season-runs", stat + score);
Stat watcher:
    stat = Read("season-runs");
    discuss stats with friends;
Strong Consistency | See all previous writes.
Eventual Consistency | See subset of previous writes.
Consistent Prefix | See initial sequence of writes.
Bounded Staleness | See all “old” writes.
Monotonic Reads | See increasing subset of writes.
Read My Writes | See all writes performed by reader.
Source: http://cacm.acm.org/magazines/2013/12/169945-replicated-data-consistency-explained-through-baseball/fulltext#F8
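The guarantees can be compared by enumerating which scoreboards a reader could observe. A sketch under stated assumptions: the write sequence below is an illustrative game, staleness is bounded by a count of missing writes rather than by time, and only the four guarantees that do not depend on who is reading are modelled (monotonic reads and read my writes are per-client and left out). All function names are hypothetical.

```python
from itertools import combinations

# Illustrative game: writes in the order they were accepted, as (key, new score).
WRITES = [("home", 1), ("visitors", 1), ("home", 2), ("home", 3),
          ("visitors", 2), ("home", 4), ("home", 5)]


def score(applied: list[tuple[str, int]]) -> tuple[int, int]:
    """Replay writes on an empty scoreboard and return (visitors, home)."""
    board = {"visitors": 0, "home": 0}
    for key, value in applied:
        board[key] = value
    return board["visitors"], board["home"]


def strong(writes):
    """Reads see all previous writes."""
    return {score(writes)}


def consistent_prefix(writes):
    """Reads see some initial sequence of the writes."""
    return {score(writes[:n]) for n in range(len(writes) + 1)}


def bounded_staleness(writes, max_missing: int):
    """Reads see a prefix missing at most `max_missing` recent writes (count-based assumption)."""
    start = max(0, len(writes) - max_missing)
    return {score(writes[:n]) for n in range(start, len(writes) + 1)}


def eventual(writes):
    """Reads see any subset of the writes, applied in their original order."""
    return {
        score([writes[i] for i in chosen])
        for size in range(len(writes) + 1)
        for chosen in combinations(range(len(writes)), size)
    }


print(sorted(strong(WRITES)))             # [(2, 5)]
print(sorted(consistent_prefix(WRITES)))  # 8 possible scores
print(len(eventual(WRITES)))              # 18
```

Under eventual consistency the reader may see scores such as 2-0 that never appeared on the field, which is why the umpire needs strong consistency while the radio reporter can live with less.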
Not everyone needs the same thing
• Different roles often have different tolerances for stale information
• Trading off correctness for availability can bring in more revenue
• The trade-off is often driven by business value
What's the Risk Appetite
• Consistency is often a cost of doing business
• The major point is that availability (and its cousins offline and latency-reduction) may be traded off with classic notions of consistency. This tradeoff may frequently be applied across many different aspects at many levels of granularity within a single application.
• Locally clear a check if the face value is less than $10,000. If it exceeds $10,000, double check with all the replicas to make sure it clears
• Schedule the shipment of a “Harry Potter” book based on a local opinion of the inventory. In contrast, the one and only one Gutenberg bible requires strict coordination!
Source: http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf
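The check-clearing rule splits one operation by risk: small amounts are decided on local knowledge, large ones are coordinated. A sketch under simplifying assumptions: each replica holds its own possibly stale balance, "double check with all the replicas" is modelled as requiring every replica to see enough funds (a real system would need agreement on one authoritative balance), and a check of exactly $10,000, which the rule leaves open, is sent to the full check. `Replica` and `clear_check` are hypothetical names.

```python
from dataclasses import dataclass

LOCAL_LIMIT = 10_000  # threshold from the slide, in dollars


@dataclass
class Replica:
    name: str
    balance: int  # this replica's possibly stale view of the account balance


def clear_check(amount: int, local: Replica, replicas: list[Replica]) -> bool:
    if amount < LOCAL_LIMIT:
        # Small check: guess from local knowledge and accept that we may have to apologize.
        return local.balance >= amount
    # Large check: consult every replica and clear only if all of them agree.
    return all(r.balance >= amount for r in replicas)


# The replicas disagree because a withdrawal has reached only branch-b so far.
a = Replica("branch-a", balance=50_000)
b = Replica("branch-b", balance=12_000)
print(clear_check(2_000, a, [a, b]))   # True: cleared locally
print(clear_check(20_000, a, [a, b]))  # False: branch-b's view blocks it
```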
Memories, Guesses, Apologies
• The idea is that everything is done locally with a subset of the global knowledge.
• You know what you know when an action is performed. Since you have only a subset of the knowledge, your actions are really only guesses.
• When your knowledge as a replica increases, you may have an “Oh, crap!” moment.
Source: http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf
More on Apologies
• Every business has to be ready for apologies.
• Consider a case where the only book in inventory is scheduled for delivery.
• While the book is being prepared for shipment, it is run over by a forklift in the warehouse.
• So, correct software notwithstanding, you will need to apologize
Source: http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf
How to apologize
• First of all, recognize whether you need to apologize
• Identify promises that could not be completed
• A unique identifier across systems becomes critical to identify failures
• Typically, the mistakes are identified during reconciliation
• Based on severity, an apology can be handled by the system based on rules or directed for human involvement
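The steps above fit together as a reconciliation pass: match promises to outcomes by a shared unique identifier, then route each broken promise by severity. A minimal sketch, assuming severity is measured by order amount and a hypothetical threshold of 1,000; `Promise`, `reconcile` and the sample orders are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Promise:
    order_id: str  # unique identifier shared by every system that touches the order
    customer: str
    amount: int    # used here as the measure of severity (an assumption)


def reconcile(promises, fulfilled_ids, human_review_threshold=1_000):
    """Split broken promises into rule-based apologies and cases for a person to handle."""
    automatic, for_humans = [], []
    for promise in promises:
        if promise.order_id in fulfilled_ids:
            continue  # promise kept: nothing to apologize for
        if promise.amount < human_review_threshold:
            automatic.append(promise)   # handled by a rule, e.g. apology email plus refund
        else:
            for_humans.append(promise)  # severe enough to need a person
    return automatic, for_humans


promises = [
    Promise("ord-1", "asha", 400),
    Promise("ord-2", "ravi", 250),
    Promise("ord-3", "meera", 5_000),
]
fulfilled = {"ord-1"}  # for example, what the warehouse reports as shipped
auto, manual = reconcile(promises, fulfilled)
print([p.order_id for p in auto], [p.order_id for p in manual])  # ['ord-2'] ['ord-3']
```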
What inhibits business trade-offs
• The layering of an arbitrary application atop a storage subsystem inhibits reordering (and also apologies)
• Logical delete vs actual delete of a row
• Only when commutative operations are used can we achieve the desired loose coupling.
• Application operations can be commutative
Source: http://db.cs.berkeley.edu/cs286/papers/quicksand-cidr2009.pdf
Commutative Business Transactions
• Order-insensitive logic
• Valid account creation: create a dummy account first and attach the customer to it later
• Credit to account
• Visibility of business operations
• Logical deletion of record
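Commutativity can be checked directly by applying the same operations in every possible order. A sketch with hypothetical function names: credits are additions, so every order gives the same balance; overwriting the balance is not commutative; and logical deletion, modelled as a set of tombstones, makes create and delete order-insensitive. The sketch assumes a deleted record ID is never reused.

```python
from itertools import permutations


def apply_credits(balance: int, credits) -> int:
    for amount in credits:
        balance += amount  # addition commutes, so arrival order does not matter
    return balance


def apply_overwrites(balance: int, new_balances) -> int:
    for value in new_balances:
        balance = value  # last write wins, so arrival order decides the result
    return balance


def visible_records(events) -> set[str]:
    """Logical delete: keep tombstones instead of removing rows, so a delete wins whenever it arrives."""
    created, deleted = set(), set()
    for kind, record_id in events:
        (created if kind == "create" else deleted).add(record_id)
    return created - deleted


credits = [100, 30, 50]
print({apply_credits(0, order) for order in permutations(credits)})     # {180}
print({apply_overwrites(0, order) for order in permutations(credits)})  # three different results

events = [("create", "r1"), ("delete", "r1"), ("create", "r2")]
print({frozenset(visible_records(order)) for order in permutations(events)})  # a single outcome
```

An actual delete of the row would lose the information that r1 ever existed, so a delete arriving before its create could not be honoured.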
Back to the Future
Vs
We need to partner with people like him
Image source: https://commons.wikimedia.org/wiki/File:Jackie_Stewart_2011_British_Grand_Prix.jpg
So what's in it for you
• The techniques explained require business people to be IT-sympathetic
• Business has to align with IT and see IT as a competitive advantage
• A "developing with us" rather than "developing for us" mentality
Key takeaway
• Move from a DB-centric view of consistency to an application-centric view of consistency
• Carefully make trade-offs in IT systems to limit losses and increase upside
• Look for inspiration in business practices developed for a world without instant information