Supercharge your RDBMS with MongoDB superpowers
ToroDB @NoSQLonSQL
About $self and 8Kdata
The world has changed...
http://chasingafterdear.com/wp-content/uploads/2013/05/how-the-world-has-changed.png
Say you were…
● A happy DBA, managing your RDBMS
● BOFHing your users when required
● Just having to fight devs who don't know who Mr. Bobby Tables is
… and then NoSQL came
And you started receiving questions like:
● I want NoSQL!
● Install MongoDB!
● My app is web scale!
Fear no more!
You can now supercharge your RDBMS with MongoDB superpowers
ToroDB in one slide
● Document-oriented, JSON, NoSQL db
● Open source (AGPL)
● MongoDB compatibility (wire protocol level)
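Wire-protocol compatibility means ToroDB parses the same binary messages a MongoDB server does, which is what lets existing MongoDB drivers connect to it. As an illustration only (not ToroDB's code), every such message starts with a 16-byte header of four little-endian int32 fields. The sketch below encodes and decodes that header; the helper names are hypothetical, and only the legacy OP_QUERY (2004) and OP_REPLY (1) opcodes are listed.

```javascript
// Legacy opcodes from the MongoDB wire protocol (others omitted).
const OP_REPLY = 1;    // server -> client
const OP_QUERY = 2004; // client -> server

// Hypothetical helper: pack the 16-byte MsgHeader (4 x little-endian int32).
function encodeHeader({ messageLength, requestID, responseTo, opCode }) {
  const view = new DataView(new ArrayBuffer(16));
  view.setInt32(0, messageLength, true); // total message size, header included
  view.setInt32(4, requestID, true);     // id chosen by the sender
  view.setInt32(8, responseTo, true);    // requestID this message answers (0 for requests)
  view.setInt32(12, opCode, true);       // tells the receiver how to parse the body
  return new Uint8Array(view.buffer);
}

// Hypothetical helper: unpack the header from the first 16 bytes of a message.
function decodeHeader(bytes) {
  if (bytes.byteLength < 16) throw new Error("need at least 16 bytes");
  const view = new DataView(bytes.buffer, bytes.byteOffset, 16);
  return {
    messageLength: view.getInt32(0, true),
    requestID: view.getInt32(4, true),
    responseTo: view.getInt32(8, true),
    opCode: view.getInt32(12, true),
  };
}

// Round trip: a server reads messageLength to know where the message ends
// and opCode to pick the parser for the remaining bytes.
const header = encodeHeader({ messageLength: 58, requestID: 7, responseTo: 0, opCode: OP_QUERY });
const parsed = decodeHeader(header);
```

Everything after the header depends on the opcode, so a compatible server has to implement the body formats of every opcode its clients send, not just this framing.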
Mapping unstructured data to relational
ToroDB storage internals
{
  "name": "ToroDB",
  "data": {
    "a": 42,
    "b": "hello world!"
  },
  "nested": {
    "j": 42,
    "deeper": {
      "a": 21,
      "b": "hello"
    }
  }
}
ToroDB storage internals
The document is split into the following subdocuments:
{ "name": "ToroDB", "data": {}, "nested": {} }
{ "a": 42, "b": "hello world!"}
{ "j": 42, "deeper": {}}
{ "a": 21, "b": "hello"}
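The split above can be sketched as a short recursive function. This is a hypothetical illustration, not ToroDB's code: it walks the document, leaves an empty {} placeholder in the parent for every nested object, and emits each object as its own flat subdocument, parents before children. Arrays are out of scope here and are simply kept as values.

```javascript
// True for plain objects (not null, not arrays).
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Hypothetical helper: flatten a nested document into subdocuments.
// Each nested object is left as an empty {} placeholder in its parent
// and emitted as a separate subdocument (parents before children).
function splitDocument(doc, out = []) {
  const flat = {};
  out.push(flat); // reserve the parent's slot before visiting children
  for (const [key, value] of Object.entries(doc)) {
    if (isPlainObject(value)) {
      flat[key] = {};            // placeholder keeps the key in the parent
      splitDocument(value, out); // the nested object becomes its own subdocument
    } else {
      flat[key] = value;         // scalars (and, in this sketch, arrays) stay in place
    }
  }
  return out;
}

const subdocs = splitDocument({
  name: "ToroDB",
  data: { a: 42, b: "hello world!" },
  nested: { j: 42, deeper: { a: 21, b: "hello" } },
});
```

Run on the example document, this yields the same four subdocuments, in the same order, as listed on the slide.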
ToroDB storage internals
┌─────┬───────┬────────────────────────────┬────────┐
│ did │ index │ _id                        │ name   │
├─────┼───────┼────────────────────────────┼────────┤
│   0 │     ¤ │ \x5451a07de7032d23a908576d │ ToroDB │
└─────┴───────┴────────────────────────────┴────────┘
┌─────┬───────┬────┬──────────────┐
│ did │ index │ a  │ b            │
├─────┼───────┼────┼──────────────┤
│   0 │     ¤ │ 42 │ hello world! │
│   0 │     1 │ 21 │ hello        │
└─────┴───────┴────┴──────────────┘
┌─────┬───────┬────┐
│ did │ index │ j  │
├─────┼───────┼────┤
│   0 │     ¤ │ 42 │
└─────┴───────┴────┘
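In the tables above, the two subdocuments with keys a and b share one table, and the index column tells them apart (¤ is psql's NULL marker). A hypothetical sketch of that routing, assuming a table is chosen by the set of scalar keys of a subdocument, and that the first row of a table within a document gets a NULL index while later ones get 1, 2, and so on. That rule is an assumption consistent with these tables, not a documented ToroDB rule. The input is the four subdocuments listed earlier, without the _id column that MongoDB adds.

```javascript
// Hypothetical: a table is identified by the sorted scalar keys of a subdocument.
function tableKey(subdoc) {
  return Object.keys(subdoc)
    .filter((k) => subdoc[k] === null || typeof subdoc[k] !== "object")
    .sort()
    .join(",");
}

// Write one document's subdocuments as rows (did, index, columns...).
// Assumption: the first row of a table within a document gets index null
// (psql shows it as ¤); further rows of that table get 1, 2, ...
function storeSubdocuments(did, subdocs, tables = new Map()) {
  const rowsSoFar = new Map(); // table key -> rows written for this did
  for (const sub of subdocs) {
    const key = tableKey(sub);
    const n = rowsSoFar.get(key) ?? 0;
    rowsSoFar.set(key, n + 1);
    const row = { did, index: n === 0 ? null : n };
    for (const col of key === "" ? [] : key.split(",")) row[col] = sub[col];
    if (!tables.has(key)) tables.set(key, []);
    tables.get(key).push(row);
  }
  return tables;
}

// The four subdocuments of document 0 (the _id column is left out).
const tables = storeSubdocuments(0, [
  { name: "ToroDB", data: {}, nested: {} },
  { a: 42, b: "hello world!" },
  { j: 42, deeper: {} },
  { a: 21, b: "hello" },
]);
```

Keying tables on the column set rather than on the path is what lets data and nested.deeper land in the same table.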
ToroDB storage internals
select * from demo.structures;
┌─────┬────────────────────────────────────────────────────────────────────────────┐
│ sid │ _structure                                                                 │
├─────┼────────────────────────────────────────────────────────────────────────────┤
│   0 │ {"t": 3, "data": {"t": 1}, "nested": {"t": 2, "deeper": {"i": 1, "t": 1}}} │
└─────┴────────────────────────────────────────────────────────────────────────────┘
select * from demo.root;
┌─────┬─────┐
│ did │ sid │
├─────┼─────┤
│   0 │   0 │
└─────┴─────┘
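The structure in demo.structures records, for each object in the document, the table id t and, when not NULL, the index i; demo.root then maps each document id to a structure. A hypothetical sketch of how such descriptors could be built and shared: table ids are handed out on first sight during a post-order walk (an assumption that happens to reproduce the ids shown above, key order aside), and documents with identical structures reuse one sid. The names tableIds, structures and root are in-memory stand-ins, not ToroDB's API.

```javascript
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Hypothetical: the sorted scalar keys of an object identify its table.
function scalarKeys(obj) {
  return Object.keys(obj).filter((k) => !isPlainObject(obj[k])).sort().join(",");
}

// Build a structure: { t: tableId, i: index (omitted when null), child: ... }.
// Assumption: table ids are assigned on first sight during a post-order
// walk (children before parent), starting at 1.
function buildStructure(doc, tableIds) {
  const rowsSoFar = new Map(); // table id -> rows already used in this document
  function walk(obj) {
    const children = {};
    for (const [k, v] of Object.entries(obj)) {
      if (isPlainObject(v)) children[k] = walk(v);
    }
    const key = scalarKeys(obj);
    if (!tableIds.has(key)) tableIds.set(key, tableIds.size + 1);
    const t = tableIds.get(key);
    const n = rowsSoFar.get(t) ?? 0;
    rowsSoFar.set(t, n + 1);
    const node = { t };
    if (n > 0) node.i = n; // first row of a table in a document: null index, omitted
    return Object.assign(node, children);
  }
  return walk(doc);
}

// Hypothetical in-memory stand-ins for demo.structures and demo.root.
const tableIds = new Map();   // scalar-key set -> table id
const structures = new Map(); // serialized structure -> sid
const root = [];              // rows of { did, sid }

function insertDocument(did, doc) {
  const s = JSON.stringify(buildStructure(doc, tableIds));
  if (!structures.has(s)) structures.set(s, structures.size); // new shape -> new sid
  root.push({ did, sid: structures.get(s) });
}

insertDocument(0, { name: "ToroDB", data: { a: 42, b: "hello world!" },
  nested: { j: 42, deeper: { a: 21, b: "hello" } } });
// Same shape, different values: reuses sid 0.
insertDocument(1, { name: "x", data: { a: 1, b: "y" },
  nested: { j: 2, deeper: { a: 3, b: "z" } } });
```

Since many documents in a collection usually share a handful of shapes, storing the shape once per sid keeps the per-document overhead to a single (did, sid) row.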
How data is stored in schema-less
Data normalization
This is how we store in ToroDB
Advantages over MongoDB
ToroDB: native SQL
ToroDB VIEWs
torodb$ select * from toroviews.person ;
┌─────┬───────────┬────────┬─────┐
│ did │ surname   │ name   │ age │
├─────┼───────────┼────────┼─────┤
│   0 │ Hernandez │ Alvaro │   ¤ │
│   1 │ Surname   │ Name   │  31 │
└─────┴───────────┴────────┴─────┘
(2 rows)
torodb$ select * from toroviews."person.contact";
┌─────┬──────────┬────────────────────────┐
│ did │ verified │ email                  │
├─────┼──────────┼────────────────────────┤
│   0 │ t        │ [email protected]      │
│   1 │ ¤        │ [email protected]      │
└─────┴──────────┴────────────────────────┘
(2 rows)
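Each view exposes one ToroDB table, and did ties rows from different views to the same document, so a join on did is enough to put a document back together. A hypothetical JavaScript sketch of that reassembly over the rows shown above, assuming a NULL (¤) means the key is absent from that document and that contact holds at most one embedded object per document. The email values are placeholders.

```javascript
// Rows as returned by the two views; email values are placeholders.
const person = [
  { did: 0, surname: "Hernandez", name: "Alvaro", age: null },
  { did: 1, surname: "Surname", name: "Name", age: 31 },
];
const personContact = [
  { did: 0, verified: true, email: "placeholder-0" },
  { did: 1, verified: null, email: "placeholder-1" },
];

// Assumption: NULL (¤ in psql) means the key is absent from that document.
function rowToObject(row) {
  const out = {};
  for (const [k, v] of Object.entries(row)) {
    if (k !== "did" && v !== null) out[k] = v;
  }
  return out;
}

// Same effect as a SQL join on did: attach each child row to its parent document.
// Assumes at most one child row per did.
function reassemble(parents, children, field) {
  const childByDid = new Map(children.map((r) => [r.did, rowToObject(r)]));
  return parents.map((p) => {
    const doc = rowToObject(p);
    if (childByDid.has(p.did)) doc[field] = childByDid.get(p.did);
    return doc;
  });
}

const people = reassemble(person, personContact, "contact");
```

In SQL the equivalent is a plain join of toroviews.person and toroviews."person.contact" on did, which any SQL tool can run.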
VIEWs, ToroDB from any SQL tool
Mix-and-match relational & NoSQL
● Use the same database for both your relational data and ToroDB
● Just use separate schemas (if you like)
● Don't write to ToroDB data or metadata tables
● Query with SQL, do joins, whatever!
And much more!
● Atomic batch operations
● Clean reads
● Within node… transactions! (coming soon)
Data discoverability, SQL connectors
● They are two of the major announcements for MongoDB 3.2
● To discover data, MongoDB samples data. ToroDB: just look at table structures! (and join with root if you want a count)
● SQL connectors: native, no emulation
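Because every document shape is already recorded, discovering what the data looks like needs no sampling: the structures list the tables in use, and joining with root gives exact counts. A hypothetical sketch over in-memory copies of the two tables; the first structure is the one shown earlier in demo.structures, the second is made up for the example. It counts how many documents use each table.

```javascript
// Hypothetical inputs: structures by sid and root rows (did -> sid).
// The second structure (sid 1) is made up for the example.
const structures = new Map([
  [0, { t: 3, data: { t: 1 }, nested: { t: 2, deeper: { i: 1, t: 1 } } }],
  [1, { t: 3, data: { t: 1 } }],
]);
const root = [
  { did: 0, sid: 0 },
  { did: 1, sid: 1 },
  { did: 2, sid: 1 },
];

// Collect the table ids a structure uses. Assumes no document key is
// literally named "t" or "i", since those hold the table id and index.
function tablesIn(structure, acc = new Set()) {
  acc.add(structure.t);
  for (const [k, v] of Object.entries(structure)) {
    if (k !== "t" && k !== "i") tablesIn(v, acc);
  }
  return acc;
}

// "Join with root": documents per structure, then documents per table.
function documentsPerTable(structures, root) {
  const perSid = new Map();
  for (const { sid } of root) perSid.set(sid, (perSid.get(sid) ?? 0) + 1);
  const perTable = new Map();
  for (const [sid, count] of perSid) {
    for (const t of tablesIn(structures.get(sid))) {
      perTable.set(t, (perTable.get(t) ?? 0) + count); // Set: one count per document
    }
  }
  return perTable;
}

const counts = documentsPerTable(structures, root);
```

The cost grows with the number of distinct structures plus one pass over root, not with the size of the documents themselves.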
Replication
ToroDB v0.4
● ToroDB works as a secondary (slave) of a MongoDB primary (master), or of another secondary (chained replication)
● Implements the full replication protocol (not as an oplog tailable query)
● Open source github.com/torodb/torodb (devel branch, version 0.4-SNAPSHOT)
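However the entries are fetched, a secondary's core job is to replay the primary's oplog, in order, on its own storage. The sketch below is a hypothetical, much-simplified illustration of that replay over an in-memory store, not ToroDB's implementation (which, per the slide, implements the full replication protocol). It uses the oplog's op, ns, o and o2 fields and handles only inserts, deletes and top-level $set updates.

```javascript
// Hypothetical, heavily simplified oplog applier. Real oplog entries carry
// more fields (ts, h, v, ...) and more op types; this handles only:
//   "i" insert (o = full document), "d" delete (o = { _id }),
//   "u" update (o2 = { _id }, o = { $set: {...} }, top-level fields only).
function applyOplog(store, entries) {
  for (const e of entries) {
    if (!store.has(e.ns)) store.set(e.ns, new Map());
    const coll = store.get(e.ns); // namespace "db.collection" -> documents by _id
    switch (e.op) {
      case "i":
        coll.set(e.o._id, { ...e.o });
        break;
      case "u": {
        const doc = coll.get(e.o2._id);
        if (doc && e.o.$set) Object.assign(doc, e.o.$set);
        break;
      }
      case "d":
        coll.delete(e.o._id);
        break;
      default:
        break; // no-ops ("n"), commands ("c"), etc. are ignored in this sketch
    }
  }
  return store;
}

// Entries must be applied in oplog order for the copy to converge.
const store = applyOplog(new Map(), [
  { op: "i", ns: "test.people", o: { _id: 1, name: "Ann" } },
  { op: "i", ns: "test.people", o: { _id: 2, name: "Bob" } },
  { op: "u", ns: "test.people", o2: { _id: 1 }, o: { $set: { name: "Anne" } } },
  { op: "d", ns: "test.people", o: { _id: 2 } },
]);
```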
Horizontal scalability (aka sharding)
Write scalability (sharding)
● MongoDB's sharding API not implemented yet (roadmap: ToroDB 0.8)
● Will use MongoDB's mongos without modification, as well as config servers
● That might change in the future (pg_shard?)
Horizontal scalability (storage level)
● Another, non-exclusive option is to have ToroDB store its data in a distributed database
● Requires a distributed database such as Greenplum, CitusDB or Redshift
● Paired with replication as a slave: enables data warehousing (DW) on NoSQL data
Enabling Data Warehousing for the NoSQL World
Benchmark
● Amazon reviews dataset: "Image-based recommendations on styles and substitutes", J. McAuley, C. Targett, J. Shi, A. van den Hengel, SIGIR 2015
● AWS c4.xlarge (4 vCPU, 8 GB RAM), 4K IOPS SSD
● MongoDB: 4 shards, 3 config servers; Greenplum: 4 segments
● 83M records, 65 GB of plain JSON
Disk usage
Figure: Storage requirements, MongoDB vs ToroDB on Greenplum, in bytes (table size, index size and total size; Mongo 3.0 with WiredTiger and Snappy vs Greenplum columnar with zlib level 9)
Queries: which one is easier?
SELECT count(DISTINCT "reviewerID") FROM reviews;
db.reviews.aggregate([ { $group: { _id: "$reviewerID" } }, { $group: { _id: 1, count: { $sum: 1 } } } ])
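Both queries count distinct reviewers: the first $group collapses the reviews into one group per reviewerID, and the second counts those groups. The field reference needs its $ prefix ("$reviewerID"); a bare "reviewerID" is a constant, so every document lands in one group and the count is 1. The hypothetical in-memory sketch below, using plain arrays instead of a database, also shows one subtle difference: SQL's count(DISTINCT ...) skips NULLs, while $group puts documents with a missing or null reviewerID into a null group that does get counted.

```javascript
// Hypothetical in-memory versions of the two queries above.

// SQL: SELECT count(DISTINCT "reviewerID") FROM reviews; NULLs are not counted.
function countDistinctSql(reviews) {
  const seen = new Set();
  for (const r of reviews) {
    if (r.reviewerID !== undefined && r.reviewerID !== null) seen.add(r.reviewerID);
  }
  return seen.size;
}

// Pipeline: stage 1 groups by "$reviewerID" (missing or null -> one null group),
// stage 2 ({ _id: 1, count: { $sum: 1 } }) adds 1 per group, i.e. the group count.
function countDistinctPipeline(reviews) {
  const groups = new Set();
  for (const r of reviews) groups.add(r.reviewerID ?? null);
  return groups.size;
}

const sample = [{ reviewerID: "A1" }, { reviewerID: "B2" }, { reviewerID: "A1" }, {}];
const sqlCount = countDistinctSql(sample);
const pipelineCount = countDistinctPipeline(sample);
```

With the sample above, the SQL version counts 2 reviewers and the pipeline version counts 3, the extra one being the null group.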
Queries: which one is easier?
SELECT "reviewerName", count(*) AS reviews FROM reviews GROUP BY "reviewerName" ORDER BY reviews DESC LIMIT 10;
db.reviews.aggregate([ { $group : { _id : '$reviewerName', r : { $sum : 1 } } }, { $sort : { r : -1 } }, { $limit : 10 } ], {allowDiskUse: true})
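Both versions group by reviewer name, count the reviews, sort by that count in descending order and keep ten rows. A hypothetical in-memory equivalent follows. Ties between equal counts come back in an unspecified order in both SQL and MongoDB. This sketch keeps them in first-seen order because Array.prototype.sort is stable.

```javascript
// Hypothetical in-memory version of both "top reviewers" queries:
// group by reviewerName, count, sort by count descending, keep n.
function topReviewers(reviews, n = 10) {
  const counts = new Map(); // reviewerName -> number of reviews
  for (const r of reviews) {
    counts.set(r.reviewerName, (counts.get(r.reviewerName) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([reviewerName, total]) => ({ reviewerName, reviews: total }))
    .sort((a, b) => b.reviews - a.reviews) // stable: ties keep first-seen order
    .slice(0, n);
}

const names = ["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"];
const top2 = topReviewers(names.map((reviewerName) => ({ reviewerName })), 2);
```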
Query times
Figure: Query duration (s), MongoDB vs ToroDB on Greenplum, 3 different queries
● Q1: MongoDB 969 s, ToroDB on GP 35 s (27.95x speedup)
● Q2: MongoDB 1007 s, ToroDB on GP 13 s (74.87x speedup)
● Q3: aggregate fails on MongoDB; ToroDB on GP 31 s
Announcing today…
MyToro! (experimental)