Stream Based Data Synchronization Klemen Verdnik

Upload: klemen-verdnik

Post on 20-Feb-2017


Page 1: Stream-based Data Synchronization

Stream Based Data Synchronization

Klemen Verdnik

Page 2: Stream-based Data Synchronization

1. Introduction

Page 3: Stream-based Data Synchronization

1.1 Who Am I, What Do I Do?

• low-level programming enthusiast (audio and video DSP routines, tight loop optimizations)

• embedded systems (graphic EQ with DSP, fleet management, mobile payment)

• familiar with iOS SDK since 2008

• vox.io (web / mobile SIP, XMPP)

• layer.com (messaging)

• obsession with synchronization protocols

Page 4: Stream-based Data Synchronization

2. Data Synchronization

Page 5: Stream-based Data Synchronization

2.1 What is Data Synchronization?

• Having data consistency across two or more networked entities

Page 6: Stream-based Data Synchronization

2.1.1 Example

Toggle Switch App

Page 9: Stream-based Data Synchronization

2.1.2 How to Design the System?

• Simple server

• Simple client

• Simple data structure

{ lightsOn: true }

Toggle Switch App

Page 11: Stream-based Data Synchronization

2.2 Other Use Cases

• E-mail (IMAP, POP)

• Messaging (iMessages, Hangouts)

• Photo sharing (Photo Stream, Google Photos)

• File sharing (Dropbox, iCloud Drive)

• Online text editors / spreadsheet editors (Google Docs)

• Multiplayer games (Minecraft)

Page 14: Stream-based Data Synchronization

2.3 Types of Data Synchronization

• File synchronization

• Text / document synchronization

• Data model synchronization

Page 15: Stream-based Data Synchronization

2.4 Approaches to Data Synchronization

Page 16: Stream-based Data Synchronization

2.4.1 Absolute Synchronization (copying)

• Copying (wholesale transfer) is ok when dealing with small data-sets

(e.g. refreshing weather forecast, RSVP list ...)

Page 19: Stream-based Data Synchronization

2.4.1 Absolute Synchronization (copying)

• Figuring out differences between previously fetched data-sets costs CPU and memory

O(n ⋁ m)

Previously fetched: Dan, Alex, Blake, Emily, George, Caroline, You

Newly fetched: Dan, Alex, Emily, George, Caroline, You
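
The diffing step for the two guest lists can be sketched as follows. This is a minimal illustration in current Swift rather than the Swift 2 syntax of the slides; the names `diff`, `previous` and `current` are hypothetical and not from the talk. Sets are used here for brevity, which costs extra memory and ignores list order.

```swift
// Hypothetical helper: compares two snapshots of the RSVP list and
// reports which names were added and which were removed.
func diff(previous: [String], current: [String]) -> (added: Set<String>, removed: Set<String>) {
    let old = Set(previous)
    let new = Set(current)
    return (added: new.subtracting(old), removed: old.subtracting(new))
}

let previous = ["Dan", "Alex", "Blake", "Emily", "George", "Caroline", "You"]
let current = ["Dan", "Alex", "Emily", "George", "Caroline", "You"]

let changes = diff(previous: previous, current: current)
print("removed:", changes.removed.sorted())
print("added:", changes.added.sorted())
```

Every refresh has to repeat this comparison over both full lists, even when only one guest changed.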

Page 20: Stream-based Data Synchronization

2.4.2 Relative Synchronization (changes)

• Getting data up-to-date with changes instead of full data sets (a.k.a. deltas).

Page 21: Stream-based Data Synchronization

2.5 What are Deltas?

Delta encoding is a way to describe differences between two datasets.

Page 22: Stream-based Data Synchronization

2.5.1 How to Encode Deltas?

• Three primitive operations

insert: adds new values to the dataset

update: updates existing values in the dataset

delete: deletes existing values from the dataset

Page 23: Stream-based Data Synchronization

2.5.1 How to Encode Deltas?

• Example of how to encode binary data changes

0000000: 4749 4654 2B31 0d00 0d00 9100 00b6 6257  GIFT+1........bW
0000010: 0804 0456 2c27 e5aa 7f21 f904 0000 0000  ...V,'...!......
0000020: 002c 0000 0000 0d00 0d00 0002 318c 8f29  .,..........1..)
0000030: 3000 7986 944f 8823 260d 0feb b620 0b03  0.y..O.#&.... ..
0000040: 2e97 e1a4 0f79 920c 60a5 28e5 c452 abc6  .....y..`.(..R..

[
  { type: "update", offset: 0x03, values: [ 0x38, 0x39, 0x61 ] },
  { type: "insert", offset: 0x50, values: [ 0xCE, 0xE1, 0x50, 0x96, 0x89, 0x48, 0x9D, 0x02,
                                            0x43, 0x62, 0x8D, 0x98, 0x28, 0x00, 0x00, 0x3B ] }
]

Page 24: Stream-based Data Synchronization

2.5.1 How to Encode Deltas?

• Example of how to encode binary data changes (data after the update operation at offset 0x03)

0000000: 4749 4638 3961 0d00 0d00 9100 00b6 6257  GIF89a........bW
0000010: 0804 0456 2c27 e5aa 7f21 f904 0000 0000  ...V,'...!......
0000020: 002c 0000 0000 0d00 0d00 0002 318c 8f29  .,..........1..)
0000030: 3000 7986 944f 8823 260d 0feb b620 0b03  0.y..O.#&.... ..
0000040: 2e97 e1a4 0f79 920c 60a5 28e5 c452 abc6  .....y..`.(..R..

Page 25: Stream-based Data Synchronization

2.5.1 How to Encode Deltas?

• Example of how to encode binary data changes (data after the insert operation at offset 0x50)

0000000: 4749 4638 3961 0d00 0d00 9100 00b6 6257  GIF89a........bW
0000010: 0804 0456 2c27 e5aa 7f21 f904 0000 0000  ...V,'...!......
0000020: 002c 0000 0000 0d00 0d00 0002 318c 8f29  .,..........1..)
0000030: 3000 7986 944f 8823 260d 0feb b620 0b03  0.y..O.#&.... ..
0000040: 2e97 e1a4 0f79 920c 60a5 28e5 c452 abc6  .....y..`.(..R..
0000050: cee1 5096 8948 9d02 4362 8d98 2800 003b  ..P..H..Cb..(..;
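
Applying such deltas to a byte buffer can be sketched roughly as below. The `Delta` enum and `apply` function are illustrative names, not part of the talk, and the shape of the delete operation (`offset` plus `count`) is an assumption, since the slides only show update and insert. Offsets are not bounds-checked in this sketch.

```swift
// Illustrative delta type; the slides only show the JSON shape of
// "update" and "insert", so the delete case is an assumed shape.
enum Delta {
    case insert(offset: Int, values: [UInt8])
    case update(offset: Int, values: [UInt8])
    case delete(offset: Int, count: Int)
}

// Applies the deltas in order and returns the patched bytes.
func apply(_ deltas: [Delta], to data: [UInt8]) -> [UInt8] {
    var result = data
    for delta in deltas {
        switch delta {
        case let .insert(offset, values):
            result.insert(contentsOf: values, at: offset)
        case let .update(offset, values):
            result.replaceSubrange(offset..<(offset + values.count), with: values)
        case let .delete(offset, count):
            result.removeSubrange(offset..<(offset + count))
        }
    }
    return result
}

// First six bytes of the dump: "GIFT+1" is patched into "GIF89a",
// then two more bytes are inserted at the end.
let header: [UInt8] = [0x47, 0x49, 0x46, 0x54, 0x2B, 0x31]
let patched = apply([.update(offset: 0x03, values: [0x38, 0x39, 0x61]),
                     .insert(offset: 0x06, values: [0x0D, 0x00])], to: header)
print(String(decoding: patched.prefix(6), as: UTF8.self))
```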

Page 26: Stream-based Data Synchronization

2.5.1 How to Encode Deltas?

• Example on how to encode text changes

  83: //
  84: // Toggles the private ivar `_lightSwitchState` boolean, updates the
  85: // background image, plays a sound and transmits the change over network.
  86: //
  87: func toggleAndSendLightSwitchState() {
  88:     self.lightSwitchState = !self.lightSwitchState
> 89:     self.lightSwitchClient.sendLightSwitchState(self.lightSwitchState)
  90: }

Page 27: Stream-based Data Synchronization

2.5.1 How to Encode Deltas?

• Example on how to encode text changes (diff patch)

  83: //
  84: // Toggles the private ivar `_lightSwitchState` boolean, updates the
  85: // background image, plays a sound and transmits the change over network.
  86: //
  87: func toggleAndSendLightSwitchState() {
  88:     self.lightSwitchState = !self.lightSwitchState
> 89:     self.lightSwitchClient?.sendLightSwitchState(self.lightSwitchState)
  90: }

--- 89:     self.lightSwitchClient.sendLightSwitchState(self.lightSwitchState)
+++ 89:     self.lightSwitchClient?.sendLightSwitchState(self.lightSwitchState)

Page 28: Stream-based Data Synchronization

2.5.1 How to Encode Deltas?

• Example on how to encode text changes (insert operation)

  83: //
  84: // Toggles the private ivar `_lightSwitchState` boolean, updates the
  85: // background image, plays a sound and transmits the change over network.
  86: //
  87: func toggleAndSendLightSwitchState() {
  88:     self.lightSwitchState = !self.lightSwitchState
> 89:     self.lightSwitchClient?.sendLightSwitchState(self.lightSwitchState)
  90: }

{ type: "insert", offset: 2781, values: [ "?" ] }

Page 30: Stream-based Data Synchronization

2.5.1 How to Encode Deltas?

• Example on how to encode custom data model changes

{ guests: [ "Alex", "Blake", "Caroline", "Dan", "Emily", "George" ] }

{ type: "delete", guest: [ "Blake" ] }

{ guests: [ "Alex", "Caroline", "Dan", "Emily", "George" ] }

Page 31: Stream-based Data Synchronization

3. Stream Based Synchronization

Page 32: Stream-based Data Synchronization

3.1 The Motivation

• Minimum data redundancy

• Speed / minimum bandwidth

• Fast writes = good concurrency characteristics

• Distributability and scalability

• Offline support

Page 37: Stream-based Data Synchronization

3.2 Stream of Mutations

• Clients with an open connection receive a live stream of events from the server

Page 39: Stream-based Data Synchronization

3.2 Stream of Mutations

• Example "To Do" app

Page 42: Stream-based Data Synchronization

3.2.1 Example (To-do List App Data-model)

• Live synchronized list of to-do tasks

• Task element consists of: checkbox, label and color

• Tasks can be added, edited and removed

public struct Todo {
    public class List: NSObject {
        // `var` rather than `let`, since tasks are added and removed.
        private var tasks: Array<Task> = []

        // Method bodies are omitted on the slide.
        public func create(title: String, label: Task.ColorLabel)
        public func update(identifier: NSUUID, completed: Bool?, title: String?, label: Task.ColorLabel?)
        public func remove(identifier: NSUUID)
    }
}

extension Todo {
    public class Task: NSObject {
        public private(set) var identifier: NSUUID
        public private(set) var completed: Bool
        public private(set) var title: String
        public private(set) var label: ColorLabel

        public enum ColorLabel: UInt8 {
            case None = 0, Red, Orange, Yellow, Green, Turquoise, Blue, Purple, Pink
        }
    }
}

Page 46: Stream-based Data Synchronization

3.2.2 Example (To-do List Sync Data-model)

• Todo.List user actions turn into events (!)

• Simple concrete objects describing changes

• Serializable

public struct Sync {
    public class Event: NSObject, Serializable {
        public enum Type: UInt8 {
            case Insert = 0, Update, Delete
        }

        public private(set) var type: Type
        public private(set) var identifier: NSUUID
        public private(set) var completed: Bool?
        public private(set) var title: String?
        public private(set) var label: Int?
    }
}

public protocol Serializable: class {
    init(fromDictionary dictionary: Dictionary<String, AnyObject>)
    func toDictionary() -> Dictionary<String, AnyObject>
}

Page 48: Stream-based Data Synchronization

3.2.2 Example (To-do List Sync Data-model)

• Creating new task

{ // serialized event structure
  type: 0, // 0 = Insert
  identifier: "cb55ceec-b9ae-4bd9-8783-7dbf3e9cb2cd", // client generated id
  completed: false, // an incomplete task
  title: "Buy Milk", // task description
  label: 0 // color tag
}

Page 49: Stream-based Data Synchronization

3.2.2 Example (To-do List Sync Data-model)

• Editing an existing task

{ // event structure
  type: 1, // 1 = Update
  identifier: "cb55ceec-b9ae-4bd9-8783-7dbf3e9cb2cd", // reference to task
  completed: true // new state
}
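
A sparse update event like the one above can be serialized roughly as follows. This sketch uses current Swift with `Codable` and `JSONEncoder`, which is a swapped-in technique: the slides use an NSObject subclass with a custom `Serializable` protocol. The `Event` struct and its `Kind` enum are illustrative stand-ins for `Sync.Event`.

```swift
import Foundation

// Illustrative value-type stand-in for Sync.Event (not the talk's class).
struct Event: Codable, Equatable {
    enum Kind: UInt8, Codable { case insert = 0, update, delete }
    var type: Kind
    var identifier: UUID
    var completed: Bool?
    var title: String?
    var label: Int?
}

// An update that carries only the changed field. The synthesized Codable
// conformance leaves nil optionals out of the encoded JSON.
let update = Event(type: .update,
                   identifier: UUID(uuidString: "cb55ceec-b9ae-4bd9-8783-7dbf3e9cb2cd")!,
                   completed: true, title: nil, label: nil)

let encoder = JSONEncoder()
encoder.outputFormatting = .sortedKeys
let payload = try! encoder.encode(update)
let json = String(decoding: payload, as: UTF8.self)
print(json)

// The receiving side turns the bytes back into an equal Event.
let decoded = try! JSONDecoder().decode(Event.self, from: payload)
print(decoded == update)
```

Leaving unchanged fields out keeps update events small, which is the point of sending deltas instead of whole tasks.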

Page 51: Stream-based Data Synchronization

3.2.3 Example (To-do List Sync and Transport)

• Receive live serialized Events from the server.

• Send serialized Events to server.

public protocol TransportDelegate: class {
    func transport(transport: Transport, didReceiveObject object: Serializable)
    func transportDidConnect(transport: Transport)
    func transportDidDisconnect(transport: Transport)
}

public struct Sync {
    public class Client: NSObject, TransportDelegate {
        public private(set) var stream: Stream = Stream()
        public private(set) var transport: Transport
        public private(set) var todoList: Todo.List
        public private(set) var publishedEvents: Array<Event> = []

        private func publish(event: Event) -> Bool
        public func transport(transport: Transport, didReceiveObject object: Serializable)
    }
}

Page 53: Stream-based Data Synchronization

3.3 Let the Streaming Begin

• Data consistent, as long as clients remain connected

Page 55: Stream-based Data Synchronization

3.3 Let the Streaming Begin

• Missing out on events puts the client out-of-sync

{ // event structure
  type: 1,
  identifier: "cb55ceec-b9ae-4bd9-8783-7dbf3e9cb2cd",
  completed: true
}

Page 56: Stream-based Data Synchronization

3.3 Let the Streaming Begin

• Data consistent, as long as clients remain connected

• Missing out on events puts the client out-of-sync

• Clients can recover from out-of-sync state

• Besides broadcasting, the server should also be responsible for preserving the events

Page 57: Stream-based Data Synchronization

3.4 Persistent Stream

Page 58: Stream-based Data Synchronization

3.4 Persistent Stream

• Think of it as a linear magnetic tape, or as storage with WORM (write once, read many) behavior

• Append only

• Immutable events

• Journal of all the events that have happened
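
The append-only behavior listed above can be sketched as a small in-memory journal. `EventLog`, `append` and `read(from:)` are hypothetical names for illustration only; a real server would persist the events rather than hold them in an array.

```swift
// Illustrative in-memory journal: events can be appended and read,
// but never edited or removed.
struct EventLog<Event> {
    private(set) var events: [Event] = []

    // Appending is the only mutation. Returns the index of the new event.
    @discardableResult
    mutating func append(_ event: Event) -> Int {
        events.append(event)
        return events.count - 1
    }

    // Returns every event from the given index on, for clients catching up.
    func read(from index: Int) -> [Event] {
        return Array(events.dropFirst(max(index, 0)))
    }
}

var log = EventLog<String>()
log.append("insert Buy Milk")
log.append("update Buy Milk: completed")
let lastIndex = log.append("delete Buy Milk")

print(lastIndex)
print(log.read(from: 1))
```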

Page 59: Stream-based Data Synchronization

3.4 Persistent Stream

How does a client know if it's got all the events?

• Always copy all the events? (too expensive)

• Integrity check by hashing events? (only detects mismatches)

Page 60: Stream-based Data Synchronization

3.5 Event Discovery

Page 62: Stream-based Data Synchronization

3.5 Event Discovery

• Sequencing Events on server

public struct Sync {
    public class Event: NSObject {
        public private(set) var seq: Int?
        public private(set) var type: Type
        public private(set) var identifier: NSUUID?
        public private(set) var completed: Bool?
        public private(set) var title: String?
        public private(set) var label: Int?
    }
}

Page 63: Stream-based Data Synchronization

3.5 Event Discovery

• Sequencing Events on server

• Sequence is a linear function f(x)=x, reproducible on the client

Page 67: Stream-based Data Synchronization

3.5 Event Discovery

• Client only needs to know the seq value of the last event f(x<12)=x

• Figuring out missing events by subtracting the set of seqs

// Seq values pulled from all events the client has (nil seqs are skipped).
// [ 0, 1, 2, 10, 11, 12 ]
let seqsOfEvents = Set(events.compactMap({ $0.seq }))

// Calculated sequence ranging from 0 to 12.
// [ 0, 1, 2, 3 ... 12 ]
let seqsOfAllEvents = Set(0...12)

// Diffed set of seq values.
// [ 3, 4, 5, 6, 7, 8, 9 ]
let seqsOfMissingEvents = seqsOfAllEvents.subtracting(seqsOfEvents)

Page 68: Stream-based Data Synchronization

4. Event and Model Reconciliation

Outbound and inbound reconciliation

Page 73: Stream-based Data Synchronization

4.1 Outbound Reconciliation

• Turning user actions (model changes) into Events

public class List: NSObject {
    public func create(title: String, label: Task.ColorLabel) -> Sync.Event
    public func update(identifier: NSUUID, completed: Bool?, title: String?, label: Task.ColorLabel?) -> Sync.Event?
    public func remove(identifier: NSUUID) -> Sync.Event?
}

Page 74: Stream-based Data Synchronization

4.1 Outbound Reconciliation

• Turning user actions (model changes) into Events

let todoList = List()
let event = todoList.create(title: "Buy Milk", label: Task.ColorLabel.None)
print("event: \(event)")

// event: {
//   type: 0,                                            // 0 = Insert
//   identifier: "cb55ceec-b9ae-4bd9-8783-7dbf3e9cb2cd", // client generated id
//   completed: false,                                   // an incomplete task
//   title: "Buy Milk",                                  // task description
//   label: 0                                            // task without a label
// }

Page 75: Stream-based Data Synchronization

4.1 Outbound Reconciliation

• Publishing events

let todoList = List()
let event = todoList.create(title: "Buy Milk", label: Task.ColorLabel.None)
print("event: \(event)")

// event: {
//   type: 0,                                            // 0 = Insert
//   identifier: "cb55ceec-b9ae-4bd9-8783-7dbf3e9cb2cd", // client generated id
//   completed: false,                                   // an incomplete task
//   title: "Buy Milk",                                  // task description
//   label: 0                                            // task without a label
// }

// Sends the event to the stream over the network.
self.syncClient.publish(event)

Page 77: Stream-based Data Synchronization

4.2 Inbound Reconciliation

• Apply Events onto the model

public class List: NSObject {
    private func apply(event: Sync.Event) -> Bool {
        switch event.type {
        case .Insert: // Task creation
            let task = Task(identifier: event.identifier,
                            completed: event.completed!,
                            title: event.title!,
                            label: Task.ColorLabel(rawValue: UInt8(event.label!))!)
            self.tasks.append(task)
        case .Update: // Task updates
            guard let task = self.task(event.identifier) else { return false }
            // Update events carry only the changed fields, so the optionals are
            // passed through instead of force-unwrapped. This assumes that
            // Task.update accepts optionals, like List.update does.
            task.update(event.completed, title: event.title,
                        label: event.label.flatMap({ Task.ColorLabel(rawValue: UInt8($0)) }))
        case .Delete: // Task removal
            if !self.removeTask(event.identifier) { return false }
        }
        return true
    }
}

Page 78: Stream-based Data Synchronization

4.3 Offline Support

• Events generated offline have to be published eventually

Page 79: Stream-based Data Synchronization

4.3 Offline Support

• Queue generated events; drain queue for publication
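
Queueing and draining can be sketched as below. The `Outbox` class and its `send` closure are hypothetical; `send` stands in for the real transport call and is assumed to return false when the event could not be delivered.

```swift
// Illustrative outbox that keeps events until the transport accepts them.
final class Outbox<Event> {
    private var queue: [Event] = []
    private let send: (Event) -> Bool

    init(send: @escaping (Event) -> Bool) {
        self.send = send
    }

    var pendingCount: Int { return queue.count }

    // Always queue first, so nothing is lost if the connection drops.
    func publish(_ event: Event) {
        queue.append(event)
        drain()
    }

    // Sends events in the order they were generated. Stops at the first
    // failure and keeps the remaining events for the next attempt.
    func drain() {
        while let next = queue.first {
            guard send(next) else { return }
            queue.removeFirst()
        }
    }
}

var online = false
var delivered: [String] = []
let outbox = Outbox<String>(send: { event in
    guard online else { return false }
    delivered.append(event)
    return true
})

outbox.publish("insert Buy Milk")   // offline: stays queued
outbox.publish("update Buy Milk")   // offline: stays queued
let pendingWhileOffline = outbox.pendingCount

online = true
outbox.drain()                      // e.g. called when the transport reconnects
print(delivered)
```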

Page 80: Stream-based Data Synchronization

4.3 Offline Support

• Generating redundant events while offline

Page 82: Stream-based Data Synchronization

4.4 Reducing the Edit Distance

• Events describing the same mutation

Page 83: Stream-based Data Synchronization

4.4 Reducing the Edit Distance

• Causes stream pollution • Increases the edit distance

Page 84: Stream-based Data Synchronization

4.4 Reducing the Edit Distance

Simple set of rules when queueing:

1. Insert Event merges with Update Events → single Insert Event

2. Update Event merges with the rest of the Update Events → single Update Event

3. Last Update Event defines the final state

4. Delete Event clobbers other Event types

Page 85: Stream-based Data Synchronization

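4.4 Reducing the Edit Distance

public struct Sync {
    public class Event: NSObject, Serializable {
        // The slide shows only the body; the `merge(events:)` signature is assumed.
        public func merge(events: Array<Event>) -> Array<Event> {
            var mergedEvents = Array<Event>()
            // Walk newest-first so the most recent values win (Rule #3).
            for oldEvent in events.reversed() {
                if oldEvent.identifier != self.identifier {
                    // Event not mergeable, due to the identifier mismatch.
                    // Inserted at the front to preserve the queue order.
                    mergedEvents.insert(oldEvent, at: 0)
                    continue
                } else if self.type == Type.Delete {
                    // Rule #4
                    self.reset()
                    self.type = Type.Delete
                } else if self.type == Type.Update &&
                          (oldEvent.type == Type.Insert || oldEvent.type == Type.Update) {
                    // Rule #1, #2, #3
                    self.completed = self.completed ?? oldEvent.completed
                    self.title = self.title ?? oldEvent.title
                    self.label = self.label ?? oldEvent.label
                    // Rule #1: merging into an Insert yields a single Insert.
                    if oldEvent.type == Type.Insert { self.type = Type.Insert }
                }
            }
            mergedEvents.append(self)
            return mergedEvents
        }
    }
}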

Page 87: Stream-based Data Synchronization

4.5 Conflict Resolution

• Concurrent systems experience conflicts when two or more nodes (clients) work on the same resource at the same time.

• Example: a client deletes a Todo Task before another client tries to mutate it.

Page 88: Stream-based Data Synchronization

4.5 Conflict Resolution

Possible conflict resolutions:

• Bring the deleted task back (last writer wins)

• Deleted task stays deleted (first writer wins)

• Ask the User what to do? (requires user interaction)
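
The first two resolutions can be sketched for the delete-then-update example. Everything here is illustrative: `TodoTask`, `ConflictPolicy` and `applyCompleted` are hypothetical names, and `lastKnown` is assumed to be the task state the updating client saw before the delete reached it. The third option needs UI and is left out.

```swift
struct TodoTask: Equatable {
    var title: String
    var completed: Bool
}

enum ConflictPolicy { case lastWriterWins, firstWriterWins }

// Applies a "completed" update for a task that may already be deleted.
func applyCompleted(_ completed: Bool, to id: String, lastKnown: TodoTask,
                    in tasks: inout [String: TodoTask], policy: ConflictPolicy) {
    if tasks[id] != nil {
        tasks[id]?.completed = completed   // no conflict: the task still exists
        return
    }
    switch policy {
    case .lastWriterWins:
        // The later update overrides the delete: bring the task back.
        var restored = lastKnown
        restored.completed = completed
        tasks[id] = restored
    case .firstWriterWins:
        // The earlier delete stands: the update is dropped.
        break
    }
}

var tasks: [String: TodoTask] = [:]   // "milk" was already deleted by another client
let seen = TodoTask(title: "Buy Milk", completed: false)

applyCompleted(true, to: "milk", lastKnown: seen, in: &tasks, policy: .firstWriterWins)
let afterFirstWriterWins = tasks

applyCompleted(true, to: "milk", lastKnown: seen, in: &tasks, policy: .lastWriterWins)
print(afterFirstWriterWins.isEmpty, tasks["milk"]?.completed == true)
```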

Page 89: Stream-based Data Synchronization

5. Order of Events

Page 90: Stream-based Data Synchronization

5. Order of Events

The Event sequence reflects the order in which Events were written to the stream, which puts the Events in total order.

• Task objects will be in the exact same order, defined by the Event.seq

• Task mutations will be applied in the same manner on all clients

Page 91: Stream-based Data Synchronization

5. Order of Events (total order)

• Queued events must be published in batches

Page 96: Stream-based Data Synchronization

5.1 Total Order (sequential writes)

• Synchronized sequential writes block other clients from writing, which violates our fast concurrent writes requirement

[Figure: serial writes vs. concurrent writes]

Page 97: Stream-based Data Synchronization

5.1 Total Order (offline support)

• Both clients online

Page 99: Stream-based Data Synchronization

5.1 Total Order (offline support)

• Left client loses connection

Page 100: Stream-based Data Synchronization

5.1 Total Order (offline support)

• Offline client adds more To-do tasks to the list

Page 101: Stream-based Data Synchronization

5.1 Total Order (offline support)

• Online client also adds a Todo task to the list

Page 102: Stream-based Data Synchronization

5.1 Total Order (offline support)

• Left client comes back online ― events generated offline get published and fall at the end (higher seq values)

Page 104: Stream-based Data Synchronization

5.2 Causal Order

Causes must precede their effects: effects come after causes, and never before.

[Figure: cause → effect]

Page 105: Stream-based Data Synchronization

5.2 Causal Order

• A generated Event is an effect caused by the user taking action / responding to the UI.

• Events should be reconciled in the same order as they were generated by clients.

• Events should be applied onto the app model under the same conditions that held when the author generated them.

• Total order cannot guarantee that Events will be written to the stream in the same order they were generated.

Page 106: Stream-based Data Synchronization

5.2.1 Order Based on Timestamps

Client B's events are written before Client A's, even though Client A generated them first.

Page 107: Stream-based Data Synchronization

5.2.1 Order Based on Timestamps

Encoding local time with events.

Page 108: Stream-based Data Synchronization

5.2.1 Order Based on Timestamps

Sorting events based on the embedded timestamp.

Page 109: Stream-based Data Synchronization

5.2.1 Order Based on Timestamps

• No guarantee time will be the same on all devices

• Clock skew

• Manual override
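
The clock-skew problem can be illustrated with a small sketch. The `StampedEvent` type, the timestamps and the 120-second skew are all made-up values for illustration; they are not from the talk.

```swift
// Illustrative event carrying the author's local clock reading in seconds.
struct StampedEvent {
    let author: String
    let description: String
    let localTimestamp: Double
}

// True generation order: client A inserts the task, five seconds later
// client B updates it. Client A's clock is assumed to run 120 s fast.
let skewOfClientA = 120.0
let insertByA = StampedEvent(author: "A", description: "insert Buy Milk",
                             localTimestamp: 1_000 + skewOfClientA)
let updateByB = StampedEvent(author: "B", description: "update Buy Milk",
                             localTimestamp: 1_005)

// Sorting by the embedded timestamp puts the update before the insert
// it depends on.
let byTimestamp = [insertByA, updateByB].sorted { $0.localTimestamp < $1.localTimestamp }
print(byTimestamp.map { $0.description })
```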

Page 110: Stream-based Data Synchronization

5.2.2 Version Vectors

• Reconstructing the Events' order as it was perceived by the author, based on happened-before information.

• Causality tracking is a basic principle in some optimistic (lazy) replication algorithms.

• Allows the client to operate independently from the server.

• When all clients eventually publish their events, the other online clients are brought into a consistent state (eventual consistency).

Page 114: Stream-based Data Synchronization

5.2.2 Version Vectors

How to encode happened-before information?

1. Information of what's the last seen event (its event.seq) - event.precedingSeq

2. Keep unpublished events in order - event.clientSeq

3. Order events by precedingSeq, then by clientSeq

public struct Sync {
    public class Event: NSObject {
        /// Event sorting closure
        static public let causalOrder = { (e1: Event, e2: Event) -> Bool in
            if e1.precedingSeq == e2.precedingSeq {
                return e1.clientSeq < e2.clientSeq
            }
            return e1.precedingSeq < e2.precedingSeq
        }

        public private(set) var seq: Int?
        public private(set) var precedingSeq: Int
        public private(set) var clientSeq: Int
        public private(set) var type: Type
        public private(set) var identifier: NSUUID?
        public private(set) var completed: Bool?
        public private(set) var title: String?
        public private(set) var label: Int?
    }
}
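
The effect of the sorting closure can be tried out on a reduced event type. `OrderedEvent` is an illustrative stand-in that keeps only the two ordering fields, and the event names and seq values are made up. The comparison itself mirrors `causalOrder` from the slide.

```swift
// Illustrative subset of Sync.Event with only the ordering fields.
struct OrderedEvent: Equatable {
    let name: String
    let precedingSeq: Int   // seq of the last event the author had seen
    let clientSeq: Int      // author's own counter for unpublished events
}

// Same comparison as the causalOrder closure on the slide.
let causalOrder: (OrderedEvent, OrderedEvent) -> Bool = { e1, e2 in
    if e1.precedingSeq == e2.precedingSeq {
        return e1.clientSeq < e2.clientSeq
    }
    return e1.precedingSeq < e2.precedingSeq
}

// Arrival order on the stream: the online client's event landed first,
// followed by two events the offline client generated after seeing seq 2.
let arrived = [
    OrderedEvent(name: "online: add Eggs", precedingSeq: 4, clientSeq: 0),
    OrderedEvent(name: "offline: add Bread", precedingSeq: 2, clientSeq: 0),
    OrderedEvent(name: "offline: add Butter", precedingSeq: 2, clientSeq: 1),
]

let ordered = arrived.sorted(by: causalOrder)
print(ordered.map { $0.name })
```

Two events from different clients with the same precedingSeq and clientSeq compare as equal under this closure, so a real implementation would probably need an extra tie-breaker; that is an observation about the sketch, not something the slides cover.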

Page 121: Stream-based Data Synchronization

5.2.2 Version Vectors

Minor adjustment in outbound / inbound reconciliation:

public struct Sync {
    public class Event: NSObject, Serializable {
        var mergedEvents = Array<Event>()
        for oldEvent in events.sorted(by: Event.causalOrder) {
            if oldEvent.identifier != self.identifier {
                // etc...

public struct Todo {
    public class List: NSObject, ModelReconciler {
        public func apply(events: Array<Sync.Event>) -> Bool {
            for event in events.sorted(by: Sync.Event.causalOrder) {
                let success = self.apply(event)
                // etc...

Page 122: Stream-based Data Synchronization

5.2.2 Version Vectors

• Newly published events generated offline are ordered by their causality.

Page 123: Stream-based Data Synchronization

5.2.2 Version Vectors

• Concurrent writes - no need for batched writes anymore, due to clientSeq.

Page 124: Stream-based Data Synchronization

5.2.2 Version Vectors

• Concurrent writes - events can be written with undetermined order; order can be reconstructed on clients

[Figure: serial writes vs. concurrent writes]

Page 131: Stream-based Data Synchronization

6. Advantages

• Shared source - minimal redundancy

• Lightweight data structure - fast delivery

• Minimal server logic (low CPU)

• Short writes - high concurrency

• Scalable / distributable

• Offline support

Page 136: Stream-based Data Synchronization

7. Disadvantages

• Server simplicity = client complexity

• Rogue clients = stream pollution

• Clients must read full stream

• Partial sync difficult to implement

Page 137: Stream-based Data Synchronization

END_OF_STREAM

questions?

[email protected]/chipxsd

@chipxsd