![Page 1: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/1.jpg)
Akka Distributed Data
Ryan Knight, Principal Architect, @knight_cloud
![Page 2: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/2.jpg)
Agenda
● Challenges of Distributed Systems
● Conflict Free Replicated Data Types - CRDTs
● Akka Clustering
● Akka Distributed Data
![Page 3: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/3.jpg)
Challenges of Distributed Systems
![Page 4: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/4.jpg)
Challenges of Distributed Computing
● Replication is Slow
● Servers Fail
● The network is not reliable
● Latency > 0
● Limited Bandwidth
![Page 5: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/5.jpg)
Fallacies of Traditional Data Models
● Total Global Ordering is not possible
● Data is not a single opaque value
● ACID Transactions are not possible
![Page 6: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/6.jpg)
Global Locks / Distributed Transactions
![Page 7: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/7.jpg)
![Page 8: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/8.jpg)
Eventual Consistency (EC)
● Embracing failure in distributed systems
● Reconciling different operation orders
● EC with Probabilistic Guarantees
● EC with Strong Guarantees
![Page 9: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/9.jpg)
Conflict Free Replicated Data Types - CRDTs
![Page 10: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/10.jpg)
Avoiding Conflicts
![Page 11: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/11.jpg)
What are CRDTs
Data types that guarantee convergence to the
same value in spite of network delays,
partitions and message reordering
http://book.mixu.net/distsys/eventual.html
![Page 12: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/12.jpg)
Rethinking how we view Data
● Not just a place to dump values
● Abstraction of the data type
● Data Structure that tells how to build the
value
![Page 13: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/13.jpg)
Why CRDTs
● Replicate data across the network without
any synchronization mechanism
● Avoid distributed locks, two-phase commit,
etc.
● Consistency without Consensus
![Page 14: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/14.jpg)
Value of CRDTs
● Sacrifice linearizability (guaranteed
ordering) while remaining correct
● Used to build AP Architectures - Highly
Available and Partition Tolerant
![Page 15: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/15.jpg)
Monotonic Sequence
● Monotonic Sequence - a sequence that
never decreases or never increases
● Monotonic Sequences are eventually
consistent without any need for
coordination protocols
![Page 16: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/16.jpg)
Convergent Operations
● Associative (a+(b+c)=(a+b)+c) - grouping doesn't
matter
● Commutative (a+b=b+a) - order of application
doesn't matter
● Idempotent (a+a=a) - duplication does not
matter
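A minimal sketch, not from the talk, that checks the three properties for integer max and shows that plain addition fails idempotence. The class and method names are illustrative.

```java
import java.util.function.BinaryOperator;

// Illustrative helper (hypothetical names): checks the three merge laws
// for a binary operation on concrete sample values.
final class MergeLaws {
    static boolean associative(BinaryOperator<Integer> f, int a, int b, int c) {
        return f.apply(a, f.apply(b, c)).equals(f.apply(f.apply(a, b), c));
    }

    static boolean commutative(BinaryOperator<Integer> f, int a, int b) {
        return f.apply(a, b).equals(f.apply(b, a));
    }

    static boolean idempotent(BinaryOperator<Integer> f, int a) {
        return f.apply(a, a).equals(a);
    }

    public static void main(String[] args) {
        System.out.println(associative(Integer::max, 3, 5, 7)); // true
        System.out.println(commutative(Integer::max, 3, 5));    // true
        System.out.println(idempotent(Integer::max, 5));        // true
        // Addition is associative and commutative but not idempotent:
        // applying the same increment twice changes the result.
        System.out.println(idempotent(Integer::sum, 2));        // false
    }
}
```

Because addition is not idempotent, a replicated counter cannot simply re-apply increments; this is one reason counter CRDTs keep per-node state and merge with max.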
![Page 17: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/17.jpg)
Example Operations
       { a, b, c }                  7
       /    |    \                /   \
 {a, b}  {b, c}  {a, c}          5     7
    |   \  /  |   /             /   |   \
   {a}   {b}   {c}             3    5    7
  Union (Items)               Max Values
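The union lattice above can be exercised with a small sketch: the same three updates are delivered twice each and in a shuffled order, and the result is the same set. This is an illustration with hypothetical names, not code from the talk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class UnionConvergence {
    // Merge by union: moving up the set lattice.
    static Set<String> merge(Set<String> left, Set<String> right) {
        Set<String> out = new HashSet<>(left);
        out.addAll(right);
        return out;
    }

    // Deliver every update twice, in an order shuffled by the given seed.
    static Set<String> deliver(List<Set<String>> updates, long seed) {
        List<Set<String>> messages = new ArrayList<>(updates);
        messages.addAll(updates);                         // duplication
        Collections.shuffle(messages, new Random(seed));  // reordering
        Set<String> state = new HashSet<>();
        for (Set<String> update : messages) {
            state = merge(state, update);
        }
        return state;
    }

    public static void main(String[] args) {
        List<Set<String>> updates = List.of(Set.of("a"), Set.of("b"), Set.of("c"));
        // Different delivery orders end in the same state.
        System.out.println(deliver(updates, 1L).equals(deliver(updates, 2L))); // true
    }
}
```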
![Page 18: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/18.jpg)
CRDT Counters
● Grow-only counter - only supports increments
● Positive-negative counter
○ Two grow counters, one for increments and
another for decrements
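A rough sketch of both counters in plain Java, assuming one slot per node and a merge that takes the per-node maximum. The class names are made up for illustration and this is not the Akka implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Grow-only counter: one slot per node, merge takes the per-node maximum.
final class GCounterSketch {
    final Map<String, Long> counts = new HashMap<>();

    void increment(String node, long n) {
        if (n < 0) throw new IllegalArgumentException("grow-only counter");
        counts.merge(node, n, Long::sum);
    }

    long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    GCounterSketch merge(GCounterSketch that) {
        GCounterSketch out = new GCounterSketch();
        out.counts.putAll(this.counts);
        that.counts.forEach((node, c) -> out.counts.merge(node, c, Long::max));
        return out;
    }
}

// Positive-negative counter: two grow-only counters, value is the difference.
final class PNCounterSketch {
    final GCounterSketch increments;
    final GCounterSketch decrements;

    PNCounterSketch() {
        this(new GCounterSketch(), new GCounterSketch());
    }

    PNCounterSketch(GCounterSketch increments, GCounterSketch decrements) {
        this.increments = increments;
        this.decrements = decrements;
    }

    void increment(String node, long n) { increments.increment(node, n); }

    void decrement(String node, long n) { decrements.increment(node, n); }

    long value() { return increments.value() - decrements.value(); }

    PNCounterSketch merge(PNCounterSketch that) {
        return new PNCounterSketch(increments.merge(that.increments),
                                   decrements.merge(that.decrements));
    }

    public static void main(String[] args) {
        PNCounterSketch a = new PNCounterSketch();
        a.increment("node-a", 3);
        PNCounterSketch b = new PNCounterSketch();
        b.increment("node-b", 2);
        b.decrement("node-b", 1);
        System.out.println(a.merge(b).value()); // 4
    }
}
```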
![Page 19: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/19.jpg)
CRDT Registers
● Last Write Wins Register
○ Cassandra Columns
● Multi-value register
○ Objects (values) in Riak
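A last-write-wins register can be sketched as a value plus a timestamp, where merge keeps the newer write. The tie-break on node name below is an assumption made for this sketch so that merge stays commutative; the names are hypothetical.

```java
// Last-write-wins register sketch: keep the value with the highest timestamp.
final class LwwRegisterSketch<T> {
    final T value;
    final long timestamp;
    final String node;

    LwwRegisterSketch(T value, long timestamp, String node) {
        this.value = value;
        this.timestamp = timestamp;
        this.node = node;
    }

    LwwRegisterSketch<T> merge(LwwRegisterSketch<T> that) {
        if (that.timestamp > this.timestamp) return that;
        if (that.timestamp < this.timestamp) return this;
        // Equal timestamps: assumed tie-break, the lower node name wins.
        return that.node.compareTo(this.node) < 0 ? that : this;
    }
}
```

The register depends on reasonably synchronized clocks: a write stamped by a node whose clock runs behind can lose to an older write.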
![Page 20: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/20.jpg)
CRDT Sets
● Grow-only set -> merge by union(items) with no
removal
● ORSet (Observed-Remove) - uses a version vector and
birth dots
○ A removed element can be added again; a concurrent
add wins over a remove
○ The version vector and the dots are used by the merge
function
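A simplified observed-remove set, sketched with unique tags and tombstones rather than the version vector and dots mentioned above. It is the textbook formulation, not Akka's ORSet, and all names are hypothetical. A remove only cancels the tags the replica has observed, so in this formulation an add made concurrently elsewhere survives the merge.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class OrSetSketch {
    // element -> unique tags created by each add
    final Map<String, Set<String>> added = new HashMap<>();
    // tags whose add was observed and then removed (tombstones)
    final Set<String> removedTags = new HashSet<>();
    private final String node;
    private long counter = 0;

    OrSetSketch(String node) {
        this.node = node;
    }

    void add(String element) {
        String tag = node + ":" + (++counter); // unique per replica
        added.computeIfAbsent(element, k -> new HashSet<>()).add(tag);
    }

    // Remove only the tags this replica has observed so far.
    void remove(String element) {
        removedTags.addAll(added.getOrDefault(element, Set.of()));
    }

    boolean contains(String element) {
        for (String tag : added.getOrDefault(element, Set.of())) {
            if (!removedTags.contains(tag)) return true;
        }
        return false;
    }

    // Merge another replica into this one: union of tags and of tombstones.
    void mergeFrom(OrSetSketch that) {
        that.added.forEach((element, tags) ->
            added.computeIfAbsent(element, k -> new HashSet<>()).addAll(tags));
        removedTags.addAll(that.removedTags);
    }
}
```

The tombstones here grow without bound, which is one motivation for the more compact version-vector representation.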
![Page 21: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/21.jpg)
CRDT Maps
● ORMap
● ORMultiMap
● LWWMap
● PNCounterMap
![Page 22: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/22.jpg)
CRDT Compose
● CRDT Value can contain another CRDT
● ORSet can contain a G-Counter
● ORMap can contain a LWW Register
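Composition can be sketched as a map whose merge delegates to the merge of the nested type. The example below nests grow-only sets inside a grow-only map; key removal is deliberately left out, and the names are illustrative rather than part of any library.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BinaryOperator;

final class CrdtMapSketch {
    // Merge two maps key by key, delegating to the nested type's merge.
    static <V> Map<String, V> merge(Map<String, V> left, Map<String, V> right,
                                    BinaryOperator<V> mergeValue) {
        Map<String, V> out = new HashMap<>(left);
        right.forEach((key, value) -> out.merge(key, value, mergeValue));
        return out;
    }

    // Nested CRDT used in the example: a grow-only set merged by union.
    static Set<String> union(Set<String> left, Set<String> right) {
        Set<String> out = new HashSet<>(left);
        out.addAll(right);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> left = Map.of("cart-1", Set.of("a"));
        Map<String, Set<String>> right =
            Map.of("cart-1", Set.of("b"), "cart-2", Set.of("c"));
        Map<String, Set<String>> merged = merge(left, right, CrdtMapSketch::union);
        System.out.println(merged.get("cart-1").equals(Set.of("a", "b"))); // true
    }
}
```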
![Page 23: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/23.jpg)
CRDT Implementations
● Riak Data Types are convergent replicated
data types
○ https://docs.basho.com/riak/kv/2.2.0/learn/concepts/crdts/
● SoundCloud Roshi
○ https://github.com/soundcloud/roshi
● Akka Distributed Data
![Page 24: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/24.jpg)
Akka Clustering
![Page 25: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/25.jpg)
Akka
● Actor Based Toolkit
● Simple Concurrency & Distribution
● Error Handling and Self-Healing
● Elastic and Decentralized
● Adaptive Load Balancing
![Page 26: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/26.jpg)
What is an Actor
● Isolated lightweight processes
● Message Based / Event Driven
● Run Asynchronously
● Processes one message at a time
● Sane Concurrency
● Isolated Failure Handling
![Page 27: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/27.jpg)
Actor Systems
● Actor system is the hierarchy of collaborating
actors
● Parent actors delegate work to child actors
● Child actors are supervised by Parent Actors
● Failure can be propagated back up Actor
Hierarchy
![Page 28: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/28.jpg)
Akka Clustering
● Peer-to-peer based cluster membership
● Communicates state via gossip protocols
● No single point of failure or single point of
bottleneck.
● Automatic node failure detection
![Page 29: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/29.jpg)
Gossip Protocol
● Sharing state by gossiping with neighbors
● Each node holds state and picks a random
node to share information with
● Reliable communication is not assumed
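A toy simulation of the idea, assuming push-only gossip, a fixed message loss rate, and state that merges by set union. It is not Akka's gossip protocol, and the names and parameters are invented for the sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class GossipSketch {
    // Runs gossip rounds until every node holds the same set.
    // Returns the number of rounds used, or -1 if maxRounds was reached.
    static int roundsToConverge(List<Set<String>> nodes, Random rnd,
                                double lossRate, int maxRounds) {
        for (int round = 1; round <= maxRounds; round++) {
            for (int i = 0; i < nodes.size(); i++) {
                int peer = rnd.nextInt(nodes.size());      // pick a random node
                if (peer == i) continue;                   // skip gossiping with self
                if (rnd.nextDouble() < lossRate) continue; // message lost
                nodes.get(peer).addAll(nodes.get(i));      // peer merges by union
            }
            if (converged(nodes)) return round;
        }
        return -1;
    }

    static boolean converged(List<Set<String>> nodes) {
        for (Set<String> state : nodes) {
            if (!state.equals(nodes.get(0))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Set<String>> nodes = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Set<String> state = new HashSet<>();
            state.add("item-" + i); // each node starts with one private item
            nodes.add(state);
        }
        int rounds = roundsToConverge(nodes, new Random(42), 0.2, 10_000);
        System.out.println("converged after " + rounds + " rounds");
    }
}
```

Lost messages only slow convergence down: since union is idempotent and commutative, a later round can deliver the same information again.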
![Page 30: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/30.jpg)
Akka Clustering
● Cluster Singletons
● Cluster Roles
● Cluster Events
● Cluster-Aware Routers
● Cluster Sharding
![Page 31: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/31.jpg)
Akka Cluster State - Monotonic!
![Page 32: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/32.jpg)
Akka Distributed Data
![Page 33: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/33.jpg)
Akka Distributed Data
● Replicated in-memory data store
● Share Data between Akka Cluster Nodes
● Low latency and high-availability
● Key-Value store like API
![Page 34: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/34.jpg)
State Based CRDTs
● Akka Distributed Data only supports
state-based CRDTs (CvRDTs)
● Require storage of extra data to facilitate
merging
● Entire state of the CRDT must be disseminated
![Page 35: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/35.jpg)
Delta State Based CRDTs
● Akka 2.5 introduced delta-state CRDTs
● Only recently applied mutations to a state
are disseminated instead of the entire state
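The difference can be sketched with a grow-only counter whose increment returns a delta containing only the entry that changed. Full states and deltas are merged the same way. This is an assumption-laden illustration with hypothetical names, not Akka's delta implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class DeltaGCounterSketch {
    final Map<String, Long> counts = new HashMap<>();

    // Apply a local increment and return the delta:
    // only the entry that changed, not the whole map.
    Map<String, Long> increment(String node, long n) {
        long updated = counts.merge(node, n, Long::sum);
        return Map.of(node, updated);
    }

    // Full states and deltas merge identically: per-node maximum.
    void mergeFrom(Map<String, Long> stateOrDelta) {
        stateOrDelta.forEach((node, c) -> counts.merge(node, c, Long::max));
    }

    long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

A replica that receives the deltas `{a=5}` and `{a=6}` ends in the same state as one that receives the sender's full map, but each message carries a single entry regardless of how many nodes have contributed.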
![Page 36: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/36.jpg)
Data Resolution
● Concurrent updates automatically resolved
with monotonic merge function
● Fine Grained Control of Consistency Level of
Reads and Writes
● Update from any node without coordination
![Page 37: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/37.jpg)
Fine Grained Control of Consistency
● WriteLocal, WriteTo(n), WriteMajority, WriteAll
● ReadLocal, ReadFrom(n), ReadMajority, ReadAll
● Majority is N/2 + 1
● Guaranteed Consistency
○ (nodes_written + nodes_read) > N
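The arithmetic behind the last two bullets, as a small sketch with hypothetical helper names: when the write set and the read set together exceed N nodes they must share at least one node, so a read sees the latest acknowledged write.

```java
final class QuorumSketch {
    // Majority as on the slide: N/2 + 1, using integer division.
    static int majority(int n) {
        return n / 2 + 1;
    }

    // True when every read set must overlap every write set.
    static boolean readsOverlapWrites(int nodesWritten, int nodesRead, int n) {
        return nodesWritten + nodesRead > n;
    }

    public static void main(String[] args) {
        int n = 5;
        // WriteMajority + ReadMajority: 3 + 3 > 5
        System.out.println(readsOverlapWrites(majority(n), majority(n), n)); // true
        // WriteLocal + ReadLocal: 1 + 1 > 5 does not hold
        System.out.println(readsOverlapWrites(1, 1, n));                     // false
        // WriteLocal + ReadAll: 1 + 5 > 5
        System.out.println(readsOverlapWrites(1, n, n));                     // true
    }
}
```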
![Page 38: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/38.jpg)
Data Distribution
● Data Spread two ways depending on
Consistency Level
● Direct replication to meet Consistency
Level of Write
● Gossip dissemination to remaining nodes
![Page 39: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/39.jpg)
Data Types
● Implements the ReplicatedData Trait
○ Monotonic merge function
● Counters: GCounter, PNCounter
● Sets: GSet, ORSet
● Maps: ORMap, ORMultiMap, LWWMap, PNCounterMap
● Registers: LWWRegister, Flag
![Page 40: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/40.jpg)
Replicated Data Type Scala Interface
trait ReplicatedData {
type T <: ReplicatedData
/**
* Monotonic merge function.
*/
def merge(that: T): T
}
![Page 41: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/41.jpg)
Replicated Data Type Java Interface
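import akka.cluster.ddata.AbstractReplicatedData;
import akka.cluster.ddata.GSet;
public class TwoPhaseSet extends AbstractReplicatedData<TwoPhaseSet> {
  public final GSet<String> adds;
  public final GSet<String> removals;
  public TwoPhaseSet(GSet<String> adds, GSet<String> removals) {
    this.adds = adds;
    this.removals = removals;
  }
  // Merge each grow-only set separately; merging two GSets is a set union.
  @Override
  public TwoPhaseSet mergeData(TwoPhaseSet that) {
    return new TwoPhaseSet(this.adds.merge(that.adds),
        this.removals.merge(that.removals));
  }
}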
![Page 42: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/42.jpg)
The Replicator Actor
● Performs all Replication
● Started on all cluster nodes participating in Distributed
Data
● The replicator is similar to a key-value store:
○ Keys are strings, values are ReplicatedData
● Data is replicated directly and via gossiping
![Page 43: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/43.jpg)
The Local Replicator
● All Communication is done via the local replicator
● Accessed via the DistributedData extension
● Supported operations are Get, Subscribe, Update and
Delete
val replicator = DistributedData(context.system).replicator
![Page 44: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/44.jpg)
Updating
● Key typed with the distributed data type
● Initial value
● Write consistency -> Once met sends an
UpdateSuccess message back
● Optional request context - used to send response to the
sender on UpdateSuccess message
● Update function
![Page 45: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/45.jpg)
Update Example
Update<LWWMap<LineItem>> update = new Update<>(dataKey,
LWWMap.create(), writeMajority,
cart -> updateCart(cart, add.item));
replicator.tell(update, self());
![Page 46: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/46.jpg)
Change Notifications
● Subscription is done by sending a Subscribe message
to the local replicator
● The subscriber will then receive Changed messages
when the data is updated
![Page 47: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/47.jpg)
Change Notifications
case c @ Changed(DataKey) =>
val data = c.get(DataKey)
println()
println("Current elements:")
data.entries.foreach(println)
![Page 48: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/48.jpg)
Pruning Algorithm
● When a node is removed from the cluster a pruning
algorithm is used to collapse data
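The collapse step can be sketched for a per-node counter: the removed node's contribution is moved into the slot of a surviving node so the total is preserved while the map shrinks. This sketch shows only the data transformation; the coordination needed so that all replicas agree on the pruning is omitted, and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PruningSketch {
    // Move the removed node's count into a surviving node's slot.
    static Map<String, Long> prune(Map<String, Long> counts,
                                   String removed, String collapseInto) {
        Map<String, Long> out = new HashMap<>(counts);
        Long orphaned = out.remove(removed);
        if (orphaned != null) {
            out.merge(collapseInto, orphaned, Long::sum);
        }
        return out;
    }

    static long total(Map<String, Long> counts) {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        Map<String, Long> before = Map.of("a", 3L, "b", 4L, "c", 5L);
        Map<String, Long> after = prune(before, "c", "a");
        System.out.println(total(before) == total(after)); // true
        System.out.println(after.containsKey("c"));        // false
    }
}
```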
![Page 49: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/49.jpg)
Additional Resources
● http://doc.akka.io/docs/akka/snapshot/scala/distributed-data.html
● Strong Eventual Consistency and Conflict-free Replicated Data Types talk by Marc Shapiro
○ http://research.microsoft.com/apps/video/default.aspx?id=153540&r=1
● http://book.mixu.net/distsys/eventual.html
● https://www.infoq.com/presentations/crdt-soundcloud?utm_source=infoq&utm_medium=slideshare&utm_campaign=slidesharesf
![Page 50: @knight cloud Principal Architect Ryan Knight · Agenda Challenges of Distributed Systems Conflict Free Replicated Data Types - CRDTs Akka Clustering Akka Distributed Data](https://reader033.vdocuments.mx/reader033/viewer/2022060213/5f0563c67e708231d412b8a6/html5/thumbnails/50.jpg)
Questions?