Page 1: Maxim Zaks: Deep dive into data serialisation

MAXIM ZAKS, Freelance Software Developer

Deep dive into data serialisation

Why do we need data serialisation?

Persisting state / user-generated content

Machine-to-machine communication

Representation of configuration / read-only data

How can we persist data on mobile?

Custom binary representation

Language-provided binary serialisation

Text-based representation (CSV, XML, JSON, YAML)

Embedded SQL or NoSQL DB

Binary cross-platform serialisation library (FlatBuffers, FlexBuffers, Protocol Buffers, Cap’n Proto, SBE, Apache Thrift, etc.)

Important criteria for persisting data

Size “on disk”

Speed of read & write / partial read / memory consumption

Human readable and writable

Support of OO language type system

Data versioning / evolution / migration

JSON vs. Protocol Buffers vs. FlatBuffers

JSON

Text-based

Self-describing

Weakly typed: Object, Array, String, Number, Bool, Null
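To make the six JSON value kinds concrete, here is a small Python sketch that classifies the values returned by the standard `json` module. The sample document and the `json_kind` helper are hypothetical. Note that an integer and a decimal both come out as plain "Number": JSON itself carries no distinction such as int32 vs. float64, and every value travels together with its key.

```python
import json


def json_kind(value):
    """Map a value produced by json.loads back to its JSON type name."""
    # bool must be tested before int: in Python, bool is a subclass of int.
    if value is None:
        return "Null"
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, (int, float)):
        return "Number"
    if isinstance(value, str):
        return "String"
    if isinstance(value, list):
        return "Array"
    if isinstance(value, dict):
        return "Object"
    raise TypeError(f"not a JSON value: {value!r}")


# Hypothetical sample document; keys are stored next to every value (self-describing).
doc = json.loads(
    '{"name": "Ada", "age": 36, "height": 1.65, '
    '"admin": false, "tags": ["dev"], "manager": null}'
)
kinds = {key: json_kind(value) for key, value in doc.items()}
print(json_kind(doc), kinds)
```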

Protocol Buffers

Binary

IDL (interface definition language) based schema

Strongly typed: Message, fixed-length numbers, variable-length numbers, String, Bool, Enum, Bytes, repeated values, Maps, Oneof, Any

Evolution strategy
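The "variable-length numbers" are base-128 varints, and every field on the wire starts with a key that packs the field number and a wire type. Below is a minimal Python sketch of encoding one varint field, following the rules in the Protocol Buffers encoding guide; it is hand-rolled for illustration, not the official library, and it only handles non-negative integers. Field 1 with value 150 yields the bytes 08 96 01, the example used in that guide.

```python
def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, least significant group first.

    The high bit (0x80) of each byte means "more bytes follow".
    """
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # set the continuation bit
        else:
            out.append(group)  # last byte: continuation bit clear
            return bytes(out)


WIRE_TYPE_VARINT = 0


def encode_varint_field(field_number: int, value: int) -> bytes:
    # The key packs field number and wire type: (field_number << 3) | wire_type.
    key = (field_number << 3) | WIRE_TYPE_VARINT
    return encode_varint(key) + encode_varint(value)


print(encode_varint_field(1, 150).hex())
```

Small numbers cost a single byte, which is why varints pay off for the typical small counters, ids and enum values in a message.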

FlatBuffers

Binary

IDL (interface definition language) based schema

Strongly typed: Table, Struct, fixed-length numbers, String, Bool, Enum, Bytes, typed Vector, Union

Value / reference type semantics

Random value access

Evolution strategy
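The value/reference distinction can be sketched in Python with the standard `struct` module. The buffer below is deliberately simplified and hand-built, in the spirit of the FlatBuffers binary format (little-endian scalars, 32-bit offsets relative to the position where the offset is stored, length-prefixed and zero-terminated strings). It omits the root offset and vtables, so it is not something generated FlatBuffers code could read; the `Point` struct and the field positions are hypothetical.

```python
import struct


def follow_uoffset(buf: bytes, pos: int) -> int:
    """Read a little-endian uint32 offset stored at `pos` and return its target.

    The offset is relative to the position where it is stored.
    """
    (offset,) = struct.unpack_from("<I", buf, pos)
    return pos + offset


def read_string_ref(buf: bytes, pos: int) -> str:
    # Reference semantics: the field holds only an offset; the string lives
    # elsewhere as a uint32 length followed by UTF-8 bytes (plus a zero byte).
    start = follow_uoffset(buf, pos)
    (length,) = struct.unpack_from("<I", buf, start)
    return buf[start + 4 : start + 4 + length].decode("utf-8")


# Hypothetical layout:
#   bytes 0..3 : Point(x: uint16, y: uint16) stored inline (value semantics)
#   bytes 4..7 : offset to the string (reference semantics)
#   bytes 8..  : length-prefixed, zero-terminated string
name = "Ada".encode("utf-8")
buf = (
    struct.pack("<HH", 3, 4)
    + struct.pack("<I", 4)  # stored at position 4, so the target is 4 + 4 = 8
    + struct.pack("<I", len(name))
    + name
    + b"\x00"
)

point = struct.unpack_from("<HH", buf, 0)  # read in place, no parsing step
print(point, read_string_ref(buf, 4))
```

Neither read requires touching the rest of the buffer, which is what makes partial reading possible; the price is the extra 4-byte offset and length prefix.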

Important criteria for persisting data: Size

👎 JSON: bad at representing numbers (they are stored as decimal text), lots of repetition (every object repeats its keys); needs to be at least minified

👍 Protocol Buffers: stores only values and a bit of metadata; thanks to VLQ (variable-length quantity) encoding, very efficient for storing numbers

🤔 FlatBuffers: some overhead compared to Protocol Buffers because of reference semantics and random value access
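To put the size argument into numbers, here is a rough Python comparison of pretty-printed JSON, minified JSON, and a varint-based binary encoding of the same records. The two-field `Record` schema and the data set are hypothetical, and the binary side is hand-rolled following the Protocol Buffers wire-format rules rather than produced by the official library. Exact byte counts depend on the data.

```python
import json


def encode_varint(value: int) -> bytes:
    # Base-128 varint, least significant 7-bit group first (non-negative only).
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def encode_record(rec_id: int, score: int) -> bytes:
    # Hypothetical: message Record { uint32 id = 1; uint32 score = 2; }
    # Keys 0x08 = (1 << 3) | 0 and 0x10 = (2 << 3) | 0 (wire type 0, varint).
    return b"\x08" + encode_varint(rec_id) + b"\x10" + encode_varint(score)


def encode_records(records) -> bytes:
    # Hypothetical: message Records { repeated Record records = 1; }
    # Each entry is length-delimited: key 0x0A = (1 << 3) | 2, byte length, body.
    out = bytearray()
    for rec in records:
        body = encode_record(rec["id"], rec["score"])
        out += b"\x0a" + encode_varint(len(body)) + body
    return bytes(out)


# Hypothetical data set: 100 records with small ids and scores (all below 128).
records = [{"id": i, "score": (i * 37) % 100 + 1} for i in range(1, 101)]
pretty = json.dumps({"records": records}, indent=2)
minified = json.dumps({"records": records}, separators=(",", ":"))
binary = encode_records(records)

print("pretty JSON:  ", len(pretty.encode("utf-8")), "bytes")
print("minified JSON:", len(minified.encode("utf-8")), "bytes")
print("varint binary:", len(binary), "bytes")
print('"score" key repeated', minified.count('"score"'), "times")
```

Every record costs 6 bytes in the binary form (key, length, and two key/value pairs of one byte each), while the JSON forms pay for the repeated key names and punctuation on every record.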

Important criteria for persisting data: Speed of read & write / partial read / memory consumption

👎 JSON: text needs to be parsed and translated into an intermediate data representation

🤔 Protocol Buffers: no partial read; VLQ means some decoding work is needed before a value is available

👍 FlatBuffers: supports partial reading thanks to reference semantics and the random value access mechanism
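The speed trade-off between the two binary approaches can be sketched in Python. The varint stream stands in for a Protocol Buffers packed repeated field (its key and length prefix are left out), and the fixed-width buffer mirrors the length-prefixed little-endian uint32 vectors FlatBuffers uses; `nth_varint` and `nth_u32` are hypothetical helpers written for this illustration.

```python
import struct


def encode_varint(value: int) -> bytes:
    # Base-128 varint, least significant 7-bit group first (non-negative only).
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode one varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return result, pos
        shift += 7


def nth_varint(buf: bytes, n: int) -> int:
    # Varints have variable width, so reaching element n means decoding
    # every element before it: O(n) work for a single value.
    pos = 0
    for _ in range(n):
        _, pos = decode_varint(buf, pos)
    value, _ = decode_varint(buf, pos)
    return value


def nth_u32(buf: bytes, n: int) -> int:
    # Length-prefixed vector of little-endian uint32: element n sits at the
    # computable offset 4 + 4 * n, so access is O(1) and nothing else is read.
    (count,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= n < count:
        raise IndexError(n)
    (value,) = struct.unpack_from("<I", buf, 4 + 4 * n)
    return value


values = [1, 300, 70000, 5]
varint_buf = b"".join(encode_varint(v) for v in values)
vector_buf = struct.pack("<I", len(values)) + struct.pack(f"<{len(values)}I", *values)

print(nth_varint(varint_buf, 2), len(varint_buf))  # compact, but sequential
print(nth_u32(vector_buf, 2), len(vector_buf))     # larger, but random access
```

The same four numbers take 7 bytes as varints and 20 bytes as a fixed-width vector, which is the size overhead mentioned for FlatBuffers on the size criterion.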

Important criteria for persisting data: Human readable and writable

👍 JSON: text-based and can be nicely formatted

🤔 Protocol Buffers: provides tools for binary-to-JSON conversion and vice versa

🤔 FlatBuffers: provides tools for binary-to-JSON conversion and vice versa

Important criteria for persisting data: Support of OO language type system

🤔 JSON: generally weakly typed; there are ways to transform/validate it against OO types

👍 Protocol Buffers: the code generator creates accessor classes that can be used comfortably for encoding and decoding

👍 FlatBuffers: the code generator creates accessor classes that can be used comfortably for decoding; performant encoding can be a bit painful
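One of the "ways to transform/validate against OO types" is a hand-written mapping layer between parsed JSON and a typed class. A minimal Python sketch, assuming a hypothetical `Person` type; in practice a schema validator or a decoding library would often take over this job. Unlike the accessor classes generated for Protocol Buffers and FlatBuffers, every check here has to be written and kept in sync with the type by hand.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical OO type that incoming JSON is validated against.
    name: str
    age: int

    @classmethod
    def from_json(cls, text: str) -> "Person":
        data = json.loads(text)
        if not isinstance(data, dict):
            raise TypeError("expected a JSON object")
        name = data.get("name")
        age = data.get("age")
        if not isinstance(name, str):
            raise TypeError("'name' must be a string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(age, int) or isinstance(age, bool):
            raise TypeError("'age' must be an integer")
        return cls(name=name, age=age)

    def to_json(self) -> str:
        return json.dumps({"name": self.name, "age": self.age})


ada = Person.from_json('{"name": "Ada", "age": 36}')
print(ada)
try:
    Person.from_json('{"name": "Ada", "age": "36"}')  # valid JSON, wrong type
except TypeError as err:
    print("rejected:", err)
```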

Important criteria for persisting data: Data versioning / evolution / migration

🤔 JSON: versioning is implicit because of its self-describing nature

👍 Protocol Buffers: provides a set of rules for how a schema can be evolved

👍 FlatBuffers: provides a set of rules for how a schema can be evolved
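Part of what makes Protocol Buffers schemas evolvable is that a parser skips fields it does not know, using the wire type to find where each field ends. A minimal Python sketch of that behaviour; the `User` schemas are hypothetical, and the reader handles wire types 0, 1, 2 and 5 while rejecting the deprecated group types.

```python
def encode_varint(value: int) -> bytes:
    # Base-128 varint, least significant 7-bit group first (non-negative only).
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)
        else:
            out.append(group)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode one varint starting at `pos`; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(buf: bytes, known: set) -> dict:
    """Decode a protobuf-style message, keeping only field numbers in `known`.

    Unknown fields are skipped by wire type, so an old reader can consume
    messages written with a newer schema.
    """
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type == 1:  # 64-bit fixed width
            value, pos = buf[pos : pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = decode_varint(buf, pos)
            value, pos = buf[pos : pos + length], pos + length
        elif wire_type == 5:  # 32-bit fixed width
            value, pos = buf[pos : pos + 4], pos + 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:
            fields[field_number] = value
    return fields


# Hypothetical schema v1: message User { uint32 id = 1; }
# Hypothetical schema v2 adds:      string name = 2;
v2_message = (
    encode_varint((1 << 3) | 0) + encode_varint(150)          # id = 150
    + encode_varint((2 << 3) | 2) + encode_varint(3) + b"Ada"  # name = "Ada"
)

print(read_fields(v2_message, known={1}))     # v1 reader: field 2 is skipped
print(read_fields(v2_message, known={1, 2}))  # v2 reader: sees both fields
```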

Format | Size | Efficiency | Human readable | Types | Evolution
JSON | 👎 | 👎 | 👍 | 🤔 | 🤔
Protocol Buffers | 👍 | 🤔 | 🤔 | 👍 | 👍
FlatBuffers | 🤔 | 👍 | 🤔 | 👍 | 👍

More Links

Protocol Buffers Encoding: https://developers.google.com/protocol-buffers/docs/encoding

FlatBuffers: https://google.github.io/flatbuffers/

FlatBuffersSwift: https://github.com/mzaks/FlatBuffersSwift

FlexBuffersSwift: https://github.com/mzaks/FlexBuffersSwift

Data Serialisation formats: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

JVM Serialisers: https://github.com/eishay/jvm-serializers

WWW.MDEVTALK.CZ

mdevtalk