Nibiru: Building your own NoSQL store
TRANSCRIPT
![Page 1: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/1.jpg)
1
Building a NoSQL from scratch
Let them know what they are missing!
#ddtx16 @edwardcapriolo @HuffPostCode
![Page 2: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/2.jpg)
2
If you are looking for
A battle-tested NoSQL data store
That scales up to 1 million transactions a second
Allows you to query data from your IoT sensors in real time
You are at the wrong talk!
This is a presentation about Nibiru
An open source database I work on in my spare time
But you should stay anyway...
![Page 3: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/3.jpg)
3
Motivations
Why do that? How did this get started? What did it morph into?
Many NoSQL databases came out of an industry-specific use case, and as a result they had baked-in assumptions.
If we have clean interfaces and good abstractions, we can make a better general tool with fewer forced choices.
Potentially support a majority of the use cases in one tool.
![Page 4: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/4.jpg)
4
A friend asked
Won't this make Nibiru have all the bugs of all the systems?
![Page 5: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/5.jpg)
5
My response
Jerk!
![Page 6: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/6.jpg)
6
You might want to follow along with a local copy
There are a lot of slides that have a fair amount of code
https://github.com/edwardcapriolo/nibiru/blob/master/hexagons.ppt
http://bit.ly/1NcAoEO
![Page 7: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/7.jpg)
7
Basics
![Page 8: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/8.jpg)
8
Terminology
Keyspace: A logical grouping of store(s)
Store: A structure that holds data
  Avoided: Column Family, Table, Collection, etc.
Node: a system
Cluster: a group of nodes
![Page 9: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/9.jpg)
9
Assumptions & Design notes
A store is of a specific type: Key Value, Column Family, etc.
The API of the store is dictated by the type
Ample gotchas from a one-man, after-work project
Wire components together, not into a large context
Using String (for now) instead of byte[] for easier debugging
![Page 10: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/10.jpg)
10
Server ID
We need to uniquely identify each node
Hostname/IP is not a good solution
  Systems have multiple
  Can change
Should be able to run N copies on a single node
![Page 11: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/11.jpg)
11
Implementation
On first init() create guid and persist
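A minimal sketch of that idea, assuming the id is kept as text in a file under the node's data directory. The names `ServerId`, `loadOrCreate`, and the file name `server.id` are illustrative assumptions, not Nibiru's actual code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper: a node identity that survives restarts. */
final class ServerId {
    private final UUID id;

    private ServerId(UUID id) { this.id = id; }

    UUID get() { return id; }

    /**
     * Reads the id from dataDir/server.id, or generates and persists a new
     * one if this is the first start. The file name is an assumption.
     */
    static ServerId loadOrCreate(Path dataDir) {
        Path file = dataDir.resolve("server.id");
        try {
            if (Files.exists(file)) {
                String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
                return new ServerId(UUID.fromString(text));
            }
            Files.createDirectories(dataDir);
            UUID fresh = UUID.randomUUID();
            Files.write(file, fresh.toString().getBytes(StandardCharsets.UTF_8));
            return new ServerId(fresh);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the id lives with the data rather than being derived from a hostname, several copies on one machine each get their own identity as long as each has its own data directory.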
![Page 12: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/12.jpg)
12
Cluster Membership
![Page 13: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/13.jpg)
13
Cluster Membership
What is the list of nodes in the cluster?
What is the up/down state of each node?
![Page 14: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/14.jpg)
14
Static Membership
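One way to picture a static backend: a small abstraction that answers "who is in the cluster?" and "who is up?", with the simplest implementation reading a fixed list from configuration. The interface and class names below are assumptions for illustration and may not match Nibiru's own types:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical abstraction answering the two membership questions. */
interface ClusterMembership {
    List<String> liveMembers();   // ids of nodes currently believed up
    List<String> deadMembers();   // ids of nodes currently believed down
}

/** Simplest backend: a fixed list from configuration; every node is assumed up. */
final class StaticClusterMembership implements ClusterMembership {
    private final List<String> nodes;

    StaticClusterMembership(List<String> nodes) {
        // Defensive copy so later changes to the caller's list do not leak in.
        this.nodes = Collections.unmodifiableList(new ArrayList<>(nodes));
    }

    @Override public List<String> liveMembers() { return nodes; }

    @Override public List<String> deadMembers() { return Collections.emptyList(); }
}
```

A gossip or ZooKeeper backed implementation would sit behind the same two methods, which is the point of keeping the abstraction this small.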
![Page 15: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/15.jpg)
15
Different cluster membership models
Consensus/Gossip
  Cassandra
  Elasticsearch
Master Node/Someone else's problem
  HBase (ZooKeeper)
![Page 16: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/16.jpg)
16
Gossip
http://www.joshclemm.com/projects/
![Page 17: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/17.jpg)
17
Teknek Gossip
Licensed Apache V2
Forked from a Google Code project
Available from Maven g: io.teknek a: gossip
Great tool for building a peer-to-peer service
![Page 18: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/18.jpg)
18
Cluster Membership using Gossip
![Page 19: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/19.jpg)
19
Get Live Members
![Page 20: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/20.jpg)
20
Gutcheck
Did clean abstractions hurt the design here?
Does it seem possible we could add ZooKeeper/etcd as a backend implementation?
Any takers? :)
![Page 21: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/21.jpg)
21
Request Routing
![Page 22: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/22.jpg)
22
Some options
So you have a bunch of nodes in a cluster, but where the heck does the data go?
Client dictated - like a sharded memcache|mysql|whatever
HBase - Sharding with a leader election
Dynamo Style - ring topology token ownership
![Page 23: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/23.jpg)
23
Router & Partitioners
![Page 24: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/24.jpg)
24
Pick your poison: no hot spots or key locality :)
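The trade-off can be sketched with two partitioners behind one interface. The names `Partitioner`, `NaturalPartitioner`, and `Md5Partitioner` are hypothetical, and using a String token follows the deck's "String for now" note rather than any confirmed Nibiru signature:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical interface: turns a row key into a token the router can place on the ring. */
interface Partitioner {
    String partition(String rowKey);
}

/** Key locality: the token is the key itself, so adjacent keys stay adjacent (and can pile onto one node). */
final class NaturalPartitioner implements Partitioner {
    @Override public String partition(String rowKey) { return rowKey; }
}

/** No hot spots: the token is a hash of the key, so adjacent keys scatter (and ordered range scans are lost). */
final class Md5Partitioner implements Partitioner {
    @Override public String partition(String rowKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(rowKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```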
![Page 25: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/25.jpg)
25
Quick example LocalPartitioner
![Page 26: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/26.jpg)
26
Scenario: using a Dynamo-ish router
Construct a three-node topology
Give each an id
Give them each a token
Test that requests route properly
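The scenario above can be sketched with a sorted map from token to node id. This is an illustrative stand-in, not the router from the slides: `TokenRouter` here is a hypothetical class, and it assumes one token per node and that a node owns every key token up to and including its own.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical Dynamo-ish router: each node owns the ring segment ending at its token. */
final class TokenRouter {
    private final TreeMap<String, String> tokenToNode = new TreeMap<>();

    void addNode(String nodeId, String token) {
        tokenToNode.put(token, nodeId);
    }

    /** Returns the node with the first token >= keyToken, wrapping to the start of the ring. */
    String route(String keyToken) {
        if (tokenToNode.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<String, String> owner = tokenToNode.ceilingEntry(keyToken);
        return (owner != null ? owner : tokenToNode.firstEntry()).getValue();
    }
}
```

With nodes `id1`, `id2`, `id3` at tokens `c`, `h`, `r`, a key token of `a` lands on `id1`, `d` on `id2`, `i` on `id3`, and `z` wraps around to `id1`.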
![Page 27: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/27.jpg)
27
Cluster and Token information
![Page 28: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/28.jpg)
28
Unit Test
![Page 29: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/29.jpg)
29
Token Router
![Page 30: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/30.jpg)
30
Do the Damn Thing!
![Page 31: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/31.jpg)
31
Do the Damn Thing! With Replication
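A rough sketch of how replication might extend token routing: find the primary owner, then keep walking the ring clockwise until enough replicas are chosen. `ReplicatingTokenRouter` is a hypothetical name, and the sketch assumes one token per node (no virtual nodes) and no rack or datacenter awareness:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical ring router that returns the primary owner plus its clockwise successors. */
final class ReplicatingTokenRouter {
    private final TreeMap<String, String> tokenToNode = new TreeMap<>();

    void addNode(String nodeId, String token) {
        tokenToNode.put(token, nodeId);
    }

    /** Primary owner first, then the next nodes clockwise, capped at the ring size. */
    List<String> route(String keyToken, int replicationFactor) {
        if (tokenToNode.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        // Ring order starting at the owner: tokens >= keyToken, then the wrapped-around rest.
        List<String> ring = new ArrayList<>(tokenToNode.tailMap(keyToken, true).values());
        ring.addAll(tokenToNode.headMap(keyToken, false).values());
        int copies = Math.min(replicationFactor, ring.size());
        return new ArrayList<>(ring.subList(0, copies));
    }
}
```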
![Page 32: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/32.jpg)
32
Storage Layer
![Page 33: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/33.jpg)
33
Basic Data Storage SSTables
SS = Sorted String
{ 'a', $PAYLOAD$ }, { 'b', $PAYLOAD$ }
![Page 34: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/34.jpg)
34
LevelDB SSTable payload
Key Value implementation
SortedMap<byte, byte>
{ 'a', '1' }, { 'b', '2' }
![Page 35: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/35.jpg)
35
Cassandra SSTable Implementation
Key Value in which value is a map with last-update-wins versioning
SortedMap<byte, SortedMap<byte, Val<byte, long>>>
{ 'a', { 'col': { 'val', 1 } } },
{ 'b', { 'col1': { 'val', 1 }, 'col2': { 'val2', 2 } } }
![Page 36: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/36.jpg)
36
HBase SSTable Implementation
Key-Value in which value is a map with multi-versioning
SortedMap<byte, SortedMap<byte, Val<byte, long>>>
{
  { 'a', { 'col': { 'val', 1 } } },
  { 'b', { 'col1': { 'val', 1 }, 'col1': { 'valb', 2 }, 'col2': { 'val2', 2 } } }
}
![Page 37: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/37.jpg)
37
Column Family Store high level
![Page 38: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/38.jpg)
38
Operations to support
![Page 39: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/39.jpg)
39
One possible memtable implementation
Holy generics, Batman! Isn't it just a map of maps?
![Page 40: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/40.jpg)
40
Unfortunately no!
Imagine two requests arrive in this order:
  set people [edward] [age]='34' (Time 2)
  set people [edward] [age]='35' (Time 1)
What should be the final value?
We need to deal with events landing out of order
There is also a delete write, known as a Tombstone
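A small single-threaded sketch of why a plain map of maps is not enough: each column has to carry its write time, and a delete has to be stored as a timestamped marker rather than a removal. `Cell` and `RowColumns` are hypothetical names, and keeping the existing cell when timestamps tie is an arbitrary choice made here for brevity:

```java
import java.util.TreeMap;

/** Hypothetical cell: a value plus its write time; a null value marks a tombstone. */
final class Cell {
    final String value;
    final long time;

    Cell(String value, long time) { this.value = value; this.time = time; }

    boolean isTombstone() { return value == null; }
}

/** The columns of one row, resolved last-update-wins by timestamp rather than by arrival order. */
final class RowColumns {
    private final TreeMap<String, Cell> columns = new TreeMap<>();

    void put(String column, String value, long time) { apply(column, new Cell(value, time)); }

    void delete(String column, long time) { apply(column, new Cell(null, time)); }

    private void apply(String column, Cell incoming) {
        // Keep whichever cell has the higher timestamp; on a tie the existing cell stays.
        columns.merge(column, incoming, (old, in) -> in.time > old.time ? in : old);
    }

    /** Returns the current value, or null if the column is absent or deleted. */
    String get(String column) {
        Cell c = columns.get(column);
        return (c == null || c.isTombstone()) ? null : c.value;
    }
}
```

With the two requests from the slide, `age` ends up as `34` because Time 2 beats Time 1 even though it arrived first. A tombstone written at Time 3 then hides the column, and a late write stamped Time 2 does not bring it back.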
![Page 41: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/41.jpg)
41
And then, there is concurrency
Multiple threads manipulating at the same time
Proposed solution: (Which I think is correct)
  Do not compare and swap the value; instead append to a queue and take a second pass to optimize
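One reading of that proposal, sketched under assumptions: writers only ever append to a per-cell queue, readers scan the queue for the highest timestamp, and a later pass trims writes that can no longer win. The class and method names are hypothetical, and the flat `row:column` String key is a simplification:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical write record: a value and the time it was written. */
final class Write {
    final String value;
    final long time;

    Write(String value, long time) { this.value = value; this.time = time; }
}

/** Append-only memtable: no compare-and-swap on values, only queue appends. */
final class AppendOnlyMemtable {
    private final ConcurrentSkipListMap<String, ConcurrentLinkedQueue<Write>> cells =
            new ConcurrentSkipListMap<>();

    void put(String key, String value, long time) {
        cells.computeIfAbsent(key, k -> new ConcurrentLinkedQueue<>()).add(new Write(value, time));
    }

    /** Scans the queue and returns the value with the highest timestamp, or null if none. */
    String get(String key) {
        ConcurrentLinkedQueue<Write> q = cells.get(key);
        if (q == null) return null;
        Write newest = null;
        for (Write w : q) {
            if (newest == null || w.time > newest.time) newest = w;
        }
        return newest == null ? null : newest.value;
    }

    /** Second pass: drop writes strictly older than the newest one seen during the scan. */
    void optimize(String key) {
        ConcurrentLinkedQueue<Write> q = cells.get(key);
        if (q == null) return;
        long newest = Long.MIN_VALUE;
        for (Write w : q) newest = Math.max(newest, w.time);
        final long cutoff = newest;
        q.removeIf(w -> w.time < cutoff);
    }

    /** Number of writes currently retained for a key. */
    int versions(String key) {
        ConcurrentLinkedQueue<Write> q = cells.get(key);
        return q == null ? 0 : q.size();
    }
}
```

Trimming only entries below the cutoff means a newer write appended while `optimize` runs is never discarded, which is what makes the second pass safe to run alongside writers.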
![Page 42: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/42.jpg)
42
![Page 43: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/43.jpg)
43
Optimization 1: BloomFilters
Use Guava. Smart!
Audience: make a disappointed "aww" sound because Ed did not write it himself
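For readers who have not met bloom filters, here is a toy, standard-library-only illustration of what one does for an SSTable: it can say "definitely not in this file" without touching disk. This is deliberately not Guava's API and not a replacement for it; `TinyBloomFilter` and its crude double hashing are assumptions made for the sketch.

```java
import java.util.BitSet;

/** Toy bloom filter for illustration only; real code should prefer a library such as Guava. */
final class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    TinyBloomFilter(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    void put(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means definitely absent; true means possibly present (false positives happen). */
    boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(index(key, i))) return false;
        }
        return true;
    }

    private int index(String key, int i) {
        int h1 = key.hashCode();
        // Second hash derived from the first; good enough to illustrate, not tuned.
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }
}
```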
![Page 44: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/44.jpg)
44
Optimization 2: IndexWriter
Not ideal to seek a disk like you would seek memory
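A common way to avoid scanning a whole SSTable is a sparse index: remember the file offset of every Nth key while writing, then jump to the nearest preceding entry when reading. The sketch below is an assumption about the general technique, not the `IndexWriter` from the slides; `SparseIndex` and its methods are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sparse index: records the file offset of every Nth row of an SSTable. */
final class SparseIndex {
    private final TreeMap<String, Long> keyToOffset = new TreeMap<>();
    private final int interval;
    private long rowsSeen = 0;

    SparseIndex(int interval) { this.interval = interval; }

    /** Called by the SSTable writer for each row, in sorted key order. */
    void rowWritten(String key, long offset) {
        if (rowsSeen++ % interval == 0) {
            keyToOffset.put(key, offset);
        }
    }

    /**
     * Offset at which a reader should start scanning for key: the closest
     * indexed key at or before it, or 0 if the key sorts before the first row.
     */
    long startOffsetFor(String key) {
        Map.Entry<String, Long> e = keyToOffset.floorEntry(key);
        return e == null ? 0L : e.getValue();
    }
}
```

The reader then performs one seek and scans at most `interval` rows, instead of seeking around the file the way a binary search over memory would.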
![Page 45: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/45.jpg)
45
Consistency
![Page 46: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/46.jpg)
46
Multinode Consistency
Replication: Number of places data lives
Active/Active
Master/Slave (with takeover)
Resolving conflicted data
![Page 47: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/47.jpg)
47
Quorum Consistency Active/Active Implementation
![Page 48: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/48.jpg)
48
Message dispatched
![Page 49: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/49.jpg)
49
Asynchronous Responses T1
![Page 50: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/50.jpg)
50
Asynchronous Responses T2
![Page 51: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/51.jpg)
51
Logic to merge results
![Page 52: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/52.jpg)
52
Breakdown of components
Start & deadline: Max time to wait for requests
Message: The read/write request sent to each destination
Merger: Turn multiple responses into a single result
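The three components can be sketched together as follows. Everything here is illustrative: `Response`, `Destination`, and `QuorumCoordinator` are hypothetical names, a majority quorum is assumed, and the merger is assumed to be last-update-wins by timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical replica answer: a value and the time it was written. */
final class Response {
    final String value;
    final long time;

    Response(String value, long time) { this.value = value; this.time = time; }
}

/** Hypothetical destination: sends the message and completes the future when the replica answers. */
interface Destination {
    CompletableFuture<Response> send(String message);
}

final class QuorumCoordinator {
    /** Merger: turn multiple responses into a single result (newest timestamp wins). */
    static Response merge(List<Response> responses) {
        Response best = responses.get(0);
        for (Response r : responses) {
            if (r.time > best.time) best = r;
        }
        return best;
    }

    /** Dispatches the message to every destination and waits until the deadline for a majority. */
    static Response execute(List<Destination> destinations, String message, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        BlockingQueue<Response> arrived = new LinkedBlockingQueue<>();
        for (Destination d : destinations) {
            d.send(message).thenAccept(arrived::add);   // responses land asynchronously, in any order
        }
        int quorum = destinations.size() / 2 + 1;
        List<Response> collected = new ArrayList<>();
        try {
            while (collected.size() < quorum) {
                Response r = arrived.poll(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
                if (r == null) {
                    throw new IllegalStateException("deadline passed with "
                            + collected.size() + " of " + quorum + " responses");
                }
                collected.add(r);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for quorum", e);
        }
        return merge(collected);
    }
}
```

Collecting through a queue rather than waiting on each future in turn lets the coordinator return as soon as any majority has answered, so one slow replica does not hold up the request.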
![Page 53: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/53.jpg)
53
![Page 54: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/54.jpg)
54
Testing
![Page 55: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/55.jpg)
55
Challenges of timing in testing
Target goal is ~80% unit, 20% integration (e2e) testing
Performance varies in local vs travis-ci
Hard to test something that typically happens in milliseconds but at worst case can take seconds
Lazy half solution: Thread.sleep() statements for worst case
  Definitely a slippery slope
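An alternative to sleeping for the worst case is to poll for the expected value and give up only once a maximum wait has passed, so fast runs finish fast and slow CI runs still pass. The helper below is a hypothetical sketch of that pattern; it is not TUnit's API.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical test helper: retry an assertion until it holds or the time budget runs out. */
final class Eventually {
    /** Polls actual until it equals expected, failing only after maxWaitMillis. */
    static <T> void assertEquals(T expected, Supplier<T> actual, long maxWaitMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        T last = actual.get();
        while (!Objects.equals(expected, last)) {
            if (System.nanoTime() >= deadline) {
                throw new AssertionError("expected " + expected + " but was " + last
                        + " after " + maxWaitMillis + " ms");
            }
            try {
                Thread.sleep(10);   // short pause between polls, not a worst-case sleep
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", e);
            }
            last = actual.get();
        }
    }
}
```

A test would then write something like `Eventually.assertEquals(3, () -> cluster.liveMembers().size(), 5000)`, where `cluster` stands for whatever object is under test. The check returns in milliseconds on a quick machine and waits up to five seconds on a slow one.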
![Page 56: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/56.jpg)
56
Introducing TUnit
https://github.com/edwardcapriolo/tunit
![Page 57: Nibiru: Building your own NoSQL store](https://reader030.vdocuments.mx/reader030/viewer/2022021507/58a65d2c1a28ab1c5b8b59dd/html5/thumbnails/57.jpg)
57
The End