Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
![Page 1: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/1.jpg)
NoSQL Matters
@adriancolyer
![Page 2: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/2.jpg)
What really matters...
1. when choosing a data store / processing platform
2. when it comes to getting the most out of that platform
3. when we take things to the next level
![Page 3: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/3.jpg)
![Page 4: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/4.jpg)
The 13 horsemen of the apocalypse...
Your application(s)

| Anomaly | Prevented By | Tolerable? | Mitigation (M,G,A…) |
| --- | --- | --- | --- |
| Dirty Writes | Read Uncommitted | | |
| Dirty Reads | Read Committed | | |
| Fuzzy Reads (non-repeatable) | Item-Cut Isolation | | |
| Phantoms | Predicate-Cut Isolation | | |

...
![Page 5: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/5.jpg)
Your application(s)

| Anomaly | Prevented By | Tolerable? | Mitigation |
| --- | --- | --- | --- |
| Read Skew | MAV Isolation + item-cut | | |
| Lost Update | Repeatable Read | | |
| Cursor Lost Update | Cursor Stability | | |
| Write Skew | Repeatable Read | | |
| Stale Reads | Partition-intolerance | | |
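
To make one row of the table concrete, here is a minimal sketch of a Lost Update and one possible application-level mitigation. Everything in it is an assumption for illustration: the `store` dict stands in for a database row, and `compare_and_set` is a hypothetical helper, not an API from any of the systems discussed in the talk.

```python
# Hypothetical in-memory store standing in for a database row.
store = {"balance": 100}

# Interleaving that produces a lost update: both transactions read
# before either writes, so the second write clobbers the first.
t1_seen = store["balance"]          # T1 reads 100
t2_seen = store["balance"]          # T2 reads 100
store["balance"] = t1_seen + 10     # T1 writes 110
store["balance"] = t2_seen + 20     # T2 writes 120; T1's +10 is lost
lost_update_result = store["balance"]


def compare_and_set(key, expected, new):
    """Write only if the value is still what the transaction read."""
    if store[key] != expected:
        return False
    store[key] = new
    return True


# Same interleaving, but each write is conditional and T2 retries.
store["balance"] = 100
t1_seen = store["balance"]
t2_seen = store["balance"]
t1_ok = compare_and_set("balance", t1_seen, t1_seen + 10)   # succeeds
t2_ok = compare_and_set("balance", t2_seen, t2_seen + 20)   # fails: value changed
if not t2_ok:
    t2_seen = store["balance"]                               # T2 re-reads 110
    t2_ok = compare_and_set("balance", t2_seen, t2_seen + 20)
final_balance = store["balance"]

print(lost_update_result, final_balance)  # 120 130
```

Comparing on the value itself is a simplification; a version number or timestamp per item is the more usual thing to compare against.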
![Page 6: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/6.jpg)
Your application(s)

| Anomaly | Prevented By | Tolerable? | Mitigation |
| --- | --- | --- | --- |
| Non-monotonic read | Monotonic reads | | |
| Non-monotonic write | Monotonic writes | | |
| Invisible cause | Writes-follow-reads | | |
| Disappearing writes | Read-your-writes | | |

(for sessions)
![Page 7: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/7.jpg)
Your Developers
“we believe there is considerable work to be done to improve the programmability of highly-available systems” - Bailis et al. 2014 (HAT)
![Page 8: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/8.jpg)
Your Developers
“...an unacceptable burden to place on developers” - Google 2012 (F1)
![Page 9: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/9.jpg)
Consistency and all that...
If you accept a weaker consistency model make sure it’s a genuine trade-off and you’re getting something (you need) in return.
You can have causal consistency with (C)AC
![Page 10: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/10.jpg)
PACELC (pass-elk)
![Page 11: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/11.jpg)
Operations & all the other use cases
“…it is important to consider the data accesses that don’t use the API. These include back-ups, bulk import and deletion of data, bulk migrations from one data format to another, replica creation, asynchronous replication, consistency monitoring tools, and operational debugging. An alternate store would also have to provide atomic write transactions, efficient granular writes, and few latency outliers.” - Facebook 2013 (TAO)
![Page 12: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/12.jpg)
“it tears you apart with suspense!”
![Page 13: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/13.jpg)
![Page 14: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/14.jpg)
Why is it so hard?
“We have found that the standard verification techniques in industry are necessary but not sufficient. We use deep design reviews, code reviews, static code analysis, stress testing, fault-injection testing, and many other techniques, but we still find that subtle bugs can hide in complex concurrent fault-tolerant systems.” - Amazon 2014
![Page 15: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/15.jpg)
In the ALPS...
![Page 16: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/16.jpg)
… or a walk in the park?
![Page 17: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/17.jpg)
(Web)Scale
The USL
Source: McSherry et al. 2015
Credit: Neil Gunther
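
For reference, Gunther's Universal Scalability Law models relative capacity at N workers as C(N) = N / (1 + α(N − 1) + βN(N − 1)), where α captures contention and β captures coherency (crosstalk) cost. A minimal sketch follows; the function names and the coefficient values in the usage lines are made up for illustration and are not taken from the slide.

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity at n workers under the Universal Scalability Law.

    alpha: contention coefficient (serialised fraction of the work)
    beta:  coherency coefficient (cost of keeping workers in agreement)
    """
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def peak_workers(alpha, beta):
    """Worker count at which capacity peaks: sqrt((1 - alpha) / beta)."""
    return math.sqrt((1 - alpha) / beta)


# Illustrative coefficients only (not measurements).
alpha, beta = 0.05, 0.001
for n in (1, 8, 31, 60):
    print(n, round(usl_capacity(n, alpha, beta), 2))
print("peak near", round(peak_workers(alpha, beta), 1), "workers")
```

With any β > 0 the curve eventually bends downwards, so adding machines past the peak reduces throughput rather than merely failing to increase it.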
![Page 18: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/18.jpg)
(Web)Scale
Source: McSherry et al. 2015
![Page 19: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/19.jpg)
Big?!
![Page 20: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/20.jpg)
How Big?
“Working sets are Zipf-distributed. We can therefore store in memory all but the very largest datasets, which we avoid storing in memory altogether. For example, the distribution of input sizes of MapReduce jobs at Facebook is heavy-tailed. Furthermore, 96% of active jobs can have their entire data simultaneously fit in the corresponding clusters’ memory” - Tachyon, Li et al. 2014
![Page 21: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/21.jpg)
Musketeer
![Page 22: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/22.jpg)
Performance
40-80% of all MR jobs would perform better on a single machine!
(and cost less, and be easier to operate, and have many fewer failures…)
![Page 23: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/23.jpg)
COST
The Configuration that Outperforms a Single Thread
“You can have a second computer once you’ve shown you know how to use the first one.” - Paul Barham
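
COST, as McSherry et al. use the term, is the hardware configuration a system needs before it outperforms a competent single-threaded implementation of the same task. A minimal sketch of reading that number off a set of benchmark runs is below; the `cost` helper and all the timings are hypothetical, invented purely to show the calculation.

```python
def cost(single_thread_seconds, scaling_runs):
    """Smallest core count at which the system beats the single-thread baseline.

    scaling_runs maps core count -> measured runtime in seconds.
    Returns None when no measured configuration beats the baseline
    (an unbounded COST over the range tested).
    """
    for cores in sorted(scaling_runs):
        if scaling_runs[cores] < single_thread_seconds:
            return cores
    return None


# Made-up numbers for illustration only.
baseline = 300.0
system_a = {1: 2400.0, 16: 700.0, 64: 350.0, 128: 250.0}
system_b = {16: 900.0, 128: 400.0}

print(cost(baseline, system_a))  # 128
print(cost(baseline, system_b))  # None
```

A system can show good scaling from 16 to 128 cores and still have a high or unbounded COST, which is why the baseline needs to be measured first.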
![Page 24: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/24.jpg)
vs a single thread...
![Page 25: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/25.jpg)
FlashGraph vs Pregel
● Pregel: 1B vertices, 127B edges, 300 machines
● FlashGraph: 3.4B vertices, 129B edges, 1 machine
![Page 26: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/26.jpg)
ApproxHadoop
![Page 27: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/27.jpg)
BlinkDB
![Page 28: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/28.jpg)
Sometimes it pays to wait (a little bit)
![Page 29: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/29.jpg)
What’s the bottleneck?
● Network I/O?
● Disk I/O?
● CPU?
Measure before optimising… and avoid excessive serialization and deserialization!
![Page 30: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/30.jpg)
X (multi-core)
Distributed X
In-memory X
Flash Optimised X
NVMM X
NVMM & RDMA X
X (establish baseline COST)
![Page 31: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/31.jpg)
![Page 32: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/32.jpg)
ALPS, ACID 2.0, CRDTs, CAC, COPS, CRON, CALM, CAP, & CRAP!
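
Of the acronyms above, CRDTs are the easiest to show in a few lines. The sketch below is a grow-only counter (G-Counter), one of the simplest state-based types surveyed in Shapiro et al. 2011. The class and method names are my own choices for illustration, not an API from any particular library.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        # A replica only ever increases its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Max is commutative, associative and idempotent, so replicas
        # converge whatever the order or number of merges.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
print(a.value(), b.value())  # 5 5
```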
![Page 33: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/33.jpg)
Coordination Avoidance
Invariant-Confluence for application level constraints
● NOT NULL
● PRIMARY KEY (read & delete, but not insert)
● UNIQUE (read & delete, insert?)
● FOREIGN KEY (insert, cascade delete, but not delete)
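
A rough way to see the difference between these constraints: an invariant is I-confluent if two replicas that each satisfy it locally still satisfy it after their states are merged. The sketch below assumes a table modelled as a set of `(id, name)` rows and a merge that is plain set union. Both are simplifications of mine, not the formal model from Bailis et al.

```python
def no_nulls(rows):
    """NOT NULL on the name column: checked one row at a time."""
    return all(name is not None for _, name in rows)


def names_unique(rows):
    """UNIQUE on the name column: depends on every row in the table."""
    names = [name for _, name in rows]
    return len(names) == len(set(names))


def merge(a, b):
    """Merge two replica states by set union (an assumed merge function)."""
    return a | b


base = frozenset({(1, "alice")})
replica_a = base | {(2, "bob")}   # insert accepted on replica A
replica_b = base | {(3, "bob")}   # concurrent insert accepted on replica B
merged = merge(replica_a, replica_b)

# NOT NULL survives the merge without any coordination.
print(no_nulls(replica_a), no_nulls(replica_b), no_nulls(merged))
# UNIQUE held on each replica, but not once the inserts meet.
print(names_unique(replica_a), names_unique(replica_b), names_unique(merged))
```

The uniqueness check is the one that needs coordination on insert, unless the application can arrange for replicas to choose from disjoint sets of values.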
![Page 34: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/34.jpg)
![Page 35: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/35.jpg)
Life Beyond...
“In recent years, many ‘NoSQL’ designs have avoided cross-partition transactions entirely, effectively providing Read Uncommitted isolation…” - Bailis et al. 2014
From: “Life Beyond Distributed Transactions”
To: “Read-Atomic Multiple Partition” Transactions (RAMP)
![Page 36: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/36.jpg)
![Page 37: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/37.jpg)
Your application(s)
From anomalies to invariants...

| Invariant | Type | Affected Txns | I-Confluent? |
| --- | --- | --- | --- |
![Page 38: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/38.jpg)
Some closing thoughts
● Do you need eventual?
● Have you planned for anomalies?
● Does it actually work?
● Are you distributing for the right reasons? (AL…)
● Do you need exact?
● Do you need it ASAP?
● Can you keep CALM?
● Do you understand your application’s invariants?
![Page 40: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/40.jpg)
References
● Highly Available Transactions, Virtues & Limitations - Bailis et al. 2014 http://blog.acolyer.org/2014/11/07/highly-available-transactions-virtues-and-limitations/
● Building on Quicksand - Helland 2009 http://blog.acolyer.org/2015/03/23/building-on-quicksand/
● F1: A Distributed SQL Database that Scales - Google 2012 http://blog.acolyer.org/2015/01/06/f1-a-distributed-sql-database-that-scales/
● Scalability! But at what COST? - McSherry et al. 2015 http://blog.acolyer.org/?p=941 (to appear, June 5th 2015)
● Applying the Universal Scalability Law to Organisations - Colyer 2015 http://blog.acolyer.org/2015/04/29/applying-the-universal-scalability-law-to-organisations/
![Page 41: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/41.jpg)
References
● Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS - Lloyd et al. 2011 http://blog.acolyer.org/2015/03/17/consistency-availability-and-convergence-cops/
● Consistency, Availability, and Convergence - Mahajan et al. 2014 http://blog.acolyer.org/2015/03/17/consistency-availability-and-convergence-cops/
● Tachyon: Reliable, Memory-Speed Storage for Cluster Computing - Li et al. 2014 http://blog.acolyer.org/2014/12/04/tachyon-reliable-memory-speed-storage-for-cluster-computing/
![Page 42: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/42.jpg)
References
● Musketeer: all for one, one for all in data processing systems - Gog et al. 2015 http://blog.acolyer.org/2015/04/27/musketeer-part-i-whats-the-best-data-processing-system/ and http://blog.acolyer.org/2015/04/28/musketeer-part-ii-one-for-all-and-all-for-one/
● Pregel: A System for Large-Scale Graph Processing - Google 2010 http://blog.acolyer.org/2015/05/26/pregel-a-system-for-large-scale-graph-processing/
● FlashGraph: Processing Billion Node Graphs on an array of commodity SSDs - Zheng et al. 2015 http://blog.acolyer.org/?p=935
![Page 43: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/43.jpg)
References
● ApproxHadoop: Bringing Approximations to MapReduce Frameworks - Goiri et al. 2015 http://blog.acolyer.org/2015/04/16/approxhadoop-bringing-approximations-to-mapreduce-frameworks/
● BlinkDB: http://blinkdb.org/
● Making Sense of Performance in Data Analytics Frameworks - Ousterhout et al. 2015 http://blog.acolyer.org/2015/04/20/making-sense-of-performance-in-data-analytics-frameworks/
● A Comprehensive Study of Convergent and Commutative Replicated Data Types - Shapiro et al. 2011 http://blog.acolyer.org/2015/03/18/a-comprehensive-study-of-convergent-and-commutative-replicated-data-types/
![Page 44: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/44.jpg)
References
● The Declarative Imperative: Experiences and Conjectures in Distributed Logic - Hellerstein 2010 http://blog.acolyer.org/2014/11/13/the-declarative-imperative-experiences-and-conjectures-in-distributed-logic/
● FaRM: Fast Remote Memory - Dragojevic et al. 2014 http://blog.acolyer.org/2015/05/20/farm-fast-remote-memory/
● Mojim: A Reliable and Highly-Available Non-Volatile Memory System - Zhang et al. 2015 http://blog.acolyer.org/2015/04/14/mojim-a-reliable-and-highly-available-non-volatile-memory-system/
![Page 45: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/45.jpg)
References
● Consistency Analysis in Bloom: A Calm and Collected Approach - Alvaro et al. 2011 http://blog.acolyer.org/2015/03/16/consistency-analysis-in-bloom-a-calm-and-collected-approach/
● Edelweiss: Automatic Storage Reclamation for Distributed Programming - Conway et al. 2014 http://blog.acolyer.org/2015/02/20/edelweiss-automatic-storage-reclamation-for-distributed-programming/
● Scalable Atomic Visibility with RAMP Transactions - Bailis et al. 2014 http://blog.acolyer.org/2015/03/27/scalable-atomic-visibility-with-ramp-transactions/
![Page 46: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/46.jpg)
References
● Coordination Avoidance in Database Systems - Bailis et al. 2014 http://blog.acolyer.org/2015/03/19/coordination-avoidance-in-database-systems/
● Putting Consistency Back into Eventual Consistency - Balegas et al. 2015 http://blog.acolyer.org/2015/05/04/putting-consistency-back-into-eventual-consistency/
● Use of Formal Methods at Amazon Web Services - Newcombe et al. 2014 http://blog.acolyer.org/2014/11/24/use-of-formal-methods-at-amazon-web-services/
● Consistency Trade-offs in Modern Distributed Database Systems Design - Abadi 2012 http://cs-www.cs.yale.edu/homes/dna/papers/abadi-pacelc.pdf
![Page 47: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/47.jpg)
References
● Life Beyond Distributed Transactions - Helland 2007 http://blog.acolyer.org/2014/11/20/life-beyond-distributed-transactions/
![Page 48: Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015](https://reader034.vdocuments.mx/reader034/viewer/2022042716/55b6d031bb61eb196c8b48d1/html5/thumbnails/48.jpg)
Image Credits
● ALPS + Dublin Park: Wikimedia Commons
● Movies: IMDB
● Monotone Commuters: http://www.yenko.net/ubbthreads/ubbthreads.php/topics/312207/re-old-street-scenes
● Elk picture by Jim Richmond: http://commons.wikimedia.org/wiki/File:Rm-elk-locking-antlers.jpg