Using MongoDB as a graph database - 2014 redux

Using MongoDB as a Graph Database. Chris Clarke, NoSQL Birmingham, 16th October 2014

Uploaded by chris-clarke on 18 December 2014


DESCRIPTION

** An update to the 2012 MongoUK presentation, given at the NoSQL Birmingham/London meetup ** This presentation charts how Talis implemented Tripod, a library that runs on top of MongoDB, to provide very high-performance query access to large-scale graph datasets. As Talis' own applications became web-scale, the company used Tripod to replace its earlier general-purpose RDF triple store, keeping the graph model in its codebase while swapping in MongoDB underneath. By prioritising what really mattered to those applications, and discarding what did not, the company was able to extract extreme performance from graph-based datasets using MongoDB on commodity hardware. https://github.com/talis/tripod-php https://github.com/talis/tripod-node

TRANSCRIPT

Page 1: Using MongoDB as a graph database - 2014 redux

Using MongoDB as a Graph Database

Chris Clarke
NoSQL Birmingham
16th October 2014

Page 2

Graphs 101: for the uninitiated

Page 3

John --knows--> Jane

Page 4

John --knows--> Jane

John knows Jane
Jane knows John


Page 6

John --knows--> Jane

John knows Jane
Jane ? John

Page 7

John --knows--> Jane
Jane --knows--> John

John knows Jane
Jane knows John

Page 8

RDF

Page 9

John knows Jane
Entity Property Value

Page 10

John knows Jane

Subject Predicate Object

Page 11

John knows Jane

Jane knows John

Subject Predicate Object

Page 12

http://example.com/John foaf:knows http://example.com/Jane

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

Subject Predicate Object

Page 13

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

Subject Predicate Object
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
http://example.com/John rdf:type foaf:Person
http://example.com/Jane foaf:name "Jane"
http://example.com/Jane rdf:type foaf:Person
http://example.com/Jane foaf:knows http://example.com/John

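Page 14

[Diagram: the same triples drawn as a graph]
example:John --foaf:knows--> example:Jane
example:Jane --foaf:knows--> example:John
example:John --rdf:type--> foaf:Person
example:Jane --rdf:type--> foaf:Person
example:John --foaf:name--> "John"
example:Jane --foaf:name--> "Jane"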

Page 15

“WTF! Surely this is easier in JSON!”
– Jack Fullstack

Page 16

> db.people.find()
{ _id: ObjectId("123"), name: "John", knows: [ObjectId("456")] }
{ _id: ObjectId("456"), name: "Jane", knows: [ObjectId("123")] }

Page 17

foaf:Person

Page 18

Dataset A: example:John --foaf:name--> "John"
Dataset B: example:John --foaf:age--> 24

Page 19

Dataset A+B:
example:John --foaf:name--> "John"
example:John --foaf:age--> 24
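Because both datasets name John with the same URI, combining them is essentially a set union of triples. A minimal sketch, assuming each triple is a plain `{ s, p, o }` object whose `o` follows the `u` (node) / `l` (literal) convention Tripod uses later in the talk; blank nodes and typed literals (24 is kept as the string "24") are deliberately ignored:

```javascript
// Assumption: triples are { s, p, o } with o = { u: uri } or { l: text }.
// Blank nodes are not handled; a real merge would have to rename them.
const tripleKey = t => JSON.stringify([t.s, t.p, t.o]);

// Merge = set union: identical triples collapse onto one key, and new
// properties simply attach to the same subject URI.
function mergeGraphs(a, b) {
  const byKey = new Map();
  for (const t of [...a, ...b]) byKey.set(tripleKey(t), t);
  return [...byKey.values()];
}

const datasetA = [{ s: "example:John", p: "foaf:name", o: { l: "John" } }];
const datasetB = [
  { s: "example:John", p: "foaf:age", o: { l: "24" } },
  { s: "example:John", p: "foaf:name", o: { l: "John" } }, // also in A
];

const merged = mergeGraphs(datasetA, datasetB);
console.log(merged.length); // 2
```

No schema change or join table is needed: John just gains a `foaf:age` edge.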

Page 20

SPARQL: an RDF query language

Page 21

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?email
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
  ?person foaf:mbox ?email .
}
ORDER BY ?name
LIMIT 50

Page 22

CONSTRUCT: returns a graph
DESCRIBE: returns a graph
SELECT: returns tabular results
ASK: returns a boolean

Page 23

Graphs and Talis: a bit of history

Page 24

Over time…

• Our apps became popular. Last week they averaged 4M requests per day, with peaks of 600k+ per hour

• Our dataset is growing in size: about 350M triples this week

• Our apps needed more queries, and more expensive queries

• Our in-house triple store was end-of-life and out of date

Page 25

Project Tripod
http://github.com/talis/tripod-php
http://github.com/talis/tripod-node

Page 26

System characteristics

• 99:1 read:write

• Well-shared, tenant-based system. Our largest single customer has 35M triples

• Graph data structures and operations (merges, sub-graphs etc.) well entrenched in the codebase: over 2M lines of code (including libraries)

• Actually not that many distinct query shapes

Page 27

Simple queries, and how they influenced our core data model

Page 28

DESCRIBE <http://example.com/John>
Give me all the triples about John, as a graph

SELECT ?name ?age WHERE { <http://example.com/John> foaf:name ?name . <http://example.com/John> foaf:age ?age . }
Give me John's name and age properties, as tabular data

Page 29

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

Subject Predicate Object
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
http://example.com/John rdf:type foaf:Person
http://example.com/Jane foaf:name "Jane"
http://example.com/Jane rdf:type foaf:Person
http://example.com/Jane foaf:knows http://example.com/John

Page 30

Concise Bounded Description of http://example.com/John:
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
http://example.com/John rdf:type foaf:Person

Concise Bounded Description of http://example.com/Jane:
http://example.com/Jane foaf:name "Jane"
http://example.com/Jane rdf:type foaf:Person
http://example.com/Jane foaf:knows http://example.com/John

Page 31

Concise Bounded Description of http://example.com/John:
http://example.com/John foaf:knows http://example.com/Jane
http://example.com/John foaf:name "John"
http://example.com/John rdf:type foaf:Person

{ _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } }
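Going from triples to one document per subject is a group-by on the subject. A minimal sketch, assuming `{ s, p, o }` triples and ignoring blank nodes (so it only covers the simple case of a CBD). How Tripod stores several values for one predicate is not shown on the slide; here it is assumed that a single value stays an object and several become an array of objects:

```javascript
// Group { s, p, o } triples into one document per subject, in the
// { _id, "<predicate>": { u | l } } shape shown on the slide.
// Assumption: repeated predicates are promoted to an array of values.
function toCbdDocuments(triples) {
  const docs = new Map();
  for (const { s, p, o } of triples) {
    if (!docs.has(s)) docs.set(s, { _id: s });
    const doc = docs.get(s);
    if (!(p in doc)) {
      doc[p] = o;               // first value: plain object
    } else if (Array.isArray(doc[p])) {
      doc[p].push(o);           // third and later values
    } else {
      doc[p] = [doc[p], o];     // second value: promote to an array
    }
  }
  return [...docs.values()];
}

const triples = [
  { s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
  { s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
  { s: "example:John", p: "foaf:name", o: { l: "John" } },
  { s: "example:Jane", p: "foaf:knows", o: { u: "example:John" } },
  { s: "example:Jane", p: "rdf:type", o: { u: "foaf:Person" } },
  { s: "example:Jane", p: "foaf:name", o: { l: "Jane" } },
];

const docs = toCbdDocuments(triples);
console.log(JSON.stringify(docs[0]));
// {"_id":"example:John","foaf:knows":{"u":"example:Jane"},"rdf:type":{"u":"foaf:Person"},"foaf:name":{"l":"John"}}
```

Using the subject URI as `_id` gives the "there can only be one John" guarantee for free.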


Page 35

{ _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } }

_id is the unique primary key. There can only be one John

u means the value is a URI, i.e. another node

l means the value is a literal text value


Page 37

{ _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } }

DESCRIBE <http://example.com/John>
mongo$ col.findOne({_id: "example:John"});

SELECT ?name ?age WHERE { <http://example.com/John> foaf:name ?name . <http://example.com/John> foaf:age ?age . }
mongo$ col.findOne({_id: "example:John"}, {"foaf:name.l": 1, "foaf:age.l": 1});


Page 39

{ s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
{ s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
{ s: "example:John", p: "foaf:name", o: { l: "John" } }

DESCRIBE <http://example.com/John>
mongo$ var s = col.find({s: "example:John"});
mongo$ while (s.hasNext()) { addToGraph(s.next()); }

SELECT ?name ?age WHERE { <http://example.com/John> foaf:name ?name . <http://example.com/John> foaf:age ?age . }
mongo$ col.find({s: "example:John", p: "foaf:name"}, {o: 1});
mongo$ col.find({s: "example:John", p: "foaf:age"}, {o: 1});
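With one document per triple, a DESCRIBE reads one document per triple and a SELECT needs one query per property. The in-memory sketch below makes that cost visible; `find` only mimics simple equality filters, and `addToGraph` (left undefined on the slide) is a hypothetical stand-in:

```javascript
// Hypothetical in-memory stand-in for the triple-per-document collection.
const tripleCol = [
  { s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
  { s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
  { s: "example:John", p: "foaf:name", o: { l: "John" } },
  { s: "example:Jane", p: "foaf:name", o: { l: "Jane" } },
];

// Mimics col.find(filter) for equality filters on top-level fields only.
function find(col, filter) {
  return col.filter(doc =>
    Object.entries(filter).every(([field, value]) => doc[field] === value));
}

// Hypothetical addToGraph: fold each triple into { predicate: [values] }.
function addToGraph(graph, triple) {
  if (!graph[triple.p]) graph[triple.p] = [];
  graph[triple.p].push(triple.o);
  return graph;
}

// DESCRIBE <John>: every triple about John is a separate document read.
const johnDocs = find(tripleCol, { s: "example:John" });
const johnGraph = johnDocs.reduce(addToGraph, {});

// SELECT ?name: one query per property, projecting o.
const names = find(tripleCol, { s: "example:John", p: "foaf:name" }).map(d => d.o.l);

console.log(johnDocs.length, names.length); // 3 1
```

Compare the CBD layout, where the same DESCRIBE is a single `findOne` by `_id`.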

Page 40

Triple-per-document model:
{ s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
{ s: "example:John", p: "rdf:type", o: { u: "foaf:Person" } },
{ s: "example:John", p: "foaf:name", o: { l: "John" } }

DESCRIBE ?person WHERE { ?person foaf:name "John" . }

mongo$ var s = col.find({p: "foaf:name", "o.l": "John"}); // BasicCursor = slow

CBD-per-document model:
{ _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } }

DESCRIBE ?person WHERE { ?person foaf:name "John" . }

mongo$ col.ensureIndex({"foaf:name.l": 1});
mongo$ var s = col.find({"foaf:name.l": "John"}); // BtreeCursor = fast
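The difference between the two cursors is a full scan versus an index lookup on the literal value (`foaf:name.l`). A toy illustration, assuming an in-memory collection; MongoDB's index is a B-tree, and the `Map` here only stands in for the lookup-instead-of-scan idea (multi-valued names are ignored):

```javascript
// Hypothetical in-memory CBD collection in the slide's document shape.
const cbdCol = [
  { _id: "example:John", "foaf:name": { l: "John" }, "rdf:type": { u: "foaf:Person" } },
  { _id: "example:Jane", "foaf:name": { l: "Jane" }, "rdf:type": { u: "foaf:Person" } },
  { _id: "example:Bob", "foaf:name": { l: "Bob" } },
];

// Full collection scan: every document is examined.
function scanByName(col, name) {
  let examined = 0;
  const hits = col.filter(doc => {
    examined += 1;
    return doc["foaf:name"] !== undefined && doc["foaf:name"].l === name;
  });
  return { hits, examined };
}

// Toy secondary index on "foaf:name.l": literal value -> documents.
function buildNameIndex(col) {
  const index = new Map();
  for (const doc of col) {
    const name = doc["foaf:name"] && doc["foaf:name"].l;
    if (name === undefined) continue;
    if (!index.has(name)) index.set(name, []);
    index.get(name).push(doc);
  }
  return index;
}

const scan = scanByName(cbdCol, "John");
const nameIndex = buildNameIndex(cbdCol);
const viaIndex = nameIndex.get("John") || [];
console.log(scan.examined, viaIndex.length); // 3 1
```

Because each CBD keeps its properties as named fields, any property can be indexed directly, which the generic `p`/`o` triple layout cannot offer per property.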

Page 41

Complex Queries

Page 42

DESCRIBE <http://example.com/foo> ?sectionOrItem ?resource ?document ?authorList ?author ?usedBy ?creator ?libraryNote ?publisher
WHERE {
  OPTIONAL {
    <http://example.com/foo> resource:contains ?sectionOrItem .
    OPTIONAL {
      ?sectionOrItem resource:resource ?resource .
      OPTIONAL { ?resource dcterms:isPartOf ?document . }
      OPTIONAL {
        ?resource bibo:authorList ?authorList .
        OPTIONAL { ?authorList ?p ?author . }
      }
      OPTIONAL { ?resource dcterms:publisher ?publisher . }
    }
    OPTIONAL { ?libraryNote bibo:annotates ?sectionOrItem }
  } .
  OPTIONAL { <http://example.com/foo> resource:usedBy ?usedBy } .
  OPTIONAL { <http://example.com/foo> sioc:has_creator ?creator }
}


Page 44

“We don’t need dynamic queries”
– Project Tripod Team, sometime 2012

Page 45

Precomputed views: remember those from the RDBMS?

Page 46

{ _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } }

{ _id: "example:Jane", "foaf:knows": { u: "example:John" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } }

DESCRIBE example:John ?knownPerson WHERE { example:John foaf:knows ?knownPerson . }

mongo$ var john = col.findOne({_id: "example:John"});
mongo$ var known = [].concat(john["foaf:knows"]); // one value or an array of values
mongo$ for (var i = 0; i < known.length; i++) { var knownPerson = col.findOne({_id: known[i].u}); }
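Resolving the join at read time costs one round trip for John plus one per person he knows. A rough in-memory sketch, where the round-trip counter and the sample data (including a hypothetical `example:Bob`) are illustrative assumptions:

```javascript
// Each findOne call counts as one round trip to the database.
let roundTrips = 0;
const cbds = new Map([
  ["example:John", { _id: "example:John", "foaf:knows": [{ u: "example:Jane" }, { u: "example:Bob" }] }],
  ["example:Jane", { _id: "example:Jane", "foaf:name": { l: "Jane" } }],
  ["example:Bob", { _id: "example:Bob", "foaf:name": { l: "Bob" } }],
]);

function findOne(id) {
  roundTrips += 1;
  return cbds.get(id) || null;
}

// DESCRIBE example:John ?knownPerson, resolved at read time:
// one query for John, then one more per foaf:knows value (1 + N).
function describeKnows(id) {
  const root = findOne(id);
  const knows = [].concat(root["foaf:knows"] || []); // one value or an array
  return [root, ...knows.map(k => findOne(k.u)).filter(Boolean)];
}

const graphs = describeKnows("example:John");
console.log(graphs.length, roundTrips); // 3 3
```

At 99:1 read:write, paying this 1 + N cost on every read is what makes precomputing the joined result worthwhile.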


Page 48

{ _id: { r: "example:John", t: "v_knows" },
  graphs: [
    { _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } },
    { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } }
  ] }

DESCRIBE example:John ?knownPerson WHERE { example:John foaf:knows ?knownPerson . }

mongo$ viewsCol.findOne({_id: {r: "example:John", t: "v_knows"}});

Page 49

{ _id: { r: "example:John", t: "v_knows" },
  graphs: [
    { _id: "example:John", "foaf:knows": { u: "example:Jane" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" } },
    { _id: "example:Jane", "foaf:knows": { u: "example:John" }, "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } }
  ],
  _impactIndex: ["example:Jane", "example:John"] }
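One plausible reading of `_impactIndex` is that it lists every resource a view was built from, so a save to any of them marks the view stale. A minimal sketch under that assumption (the sample views are illustrative, and string entries follow this slide; table rows later use `{ r, c }` objects instead):

```javascript
// Hypothetical in-memory views collection in the slide's shape.
const views = [
  { _id: { r: "example:John", t: "v_knows" }, _impactIndex: ["example:Jane", "example:John"] },
  { _id: { r: "example:Jane", t: "v_knows" }, _impactIndex: ["example:John", "example:Jane"] },
  { _id: { r: "example:Bob", t: "v_knows" }, _impactIndex: ["example:Bob"] },
];

// Views to regenerate after the CBD `changedId` is saved. In the mongo
// shell the equivalent is an equality query on the array field, e.g.
// viewsCol.find({_impactIndex: "example:Jane"}), which matches any element.
function impactedViews(viewsCol, changedId) {
  return viewsCol
    .filter(v => v._impactIndex.includes(changedId))
    .map(v => v._id);
}

console.log(impactedViews(views, "example:Jane"));
```

Indexing `_impactIndex` would keep this lookup cheap even with many views.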

Page 50

View specification

{ "_id": "v_knows", "type": ["foaf:Person"], "from": "CBD_people", "joins": { "foaf:knows": {} } }
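A view spec like this could be turned into view documents by starting at each matching CBD and following the joined predicates. The sketch below is a simplified interpretation, not Tripod's actual generator: it follows `u` values of each joined predicate, recurses into nested `joins`, ignores `followSequence`/`maxJoins`, and takes the collection directly rather than resolving `from`:

```javascript
// Hypothetical in-memory CBD_people collection.
const CBD_people = [
  { _id: "example:John", "rdf:type": { u: "foaf:Person" }, "foaf:knows": { u: "example:Jane" }, "foaf:name": { l: "John" } },
  { _id: "example:Jane", "rdf:type": { u: "foaf:Person" }, "foaf:knows": { u: "example:John" }, "foaf:name": { l: "Jane" } },
];
const spec = { _id: "v_knows", type: ["foaf:Person"], from: "CBD_people", joins: { "foaf:knows": {} } };

// A property may hold one { u } / { l } object or an array of them.
const values = v => (v === undefined ? [] : [].concat(v));

// Build one view: the root CBD plus every CBD reached through `joins`.
function buildView(col, spec, rootId) {
  const byId = new Map(col.map(d => [d._id, d]));
  const included = new Map();
  function visit(id, joins) {
    const doc = byId.get(id);
    if (!doc) return;
    included.set(id, doc);
    for (const [pred, sub] of Object.entries(joins || {})) {
      for (const v of values(doc[pred])) {
        if (v.u) visit(v.u, sub.joins); // only node values can be joined
      }
    }
  }
  visit(rootId, spec.joins);
  return {
    _id: { r: rootId, t: spec._id },
    graphs: [...included.values()],
    _impactIndex: [...included.keys()], // every CBD this view depends on
  };
}

// One view per CBD whose rdf:type matches the spec.
function generateViews(col, spec) {
  return col
    .filter(d => values(d["rdf:type"]).some(t => spec.type.includes(t.u)))
    .map(d => buildView(col, spec, d._id));
}

const generated = generateViews(CBD_people, spec);
console.log(JSON.stringify(generated[0]._impactIndex)); // ["example:John","example:Jane"]
```

Recursion always terminates because it is bounded by the nesting depth of the spec, not by the shape of the data.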

Page 51

More complex example

{
  "_id": "v_resources",
  "type": ["resourcelist:Resource"],
  "from": "CBD_resources",
  "joins": {
    "dct:partOf": {
      "joins": {
        "bibo:authorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
        "bibo:editorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
        "dct:publisher": {}
      }
    },
    "dct:isPartOf": {
      "joins": {
        "bibo:authorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
        "bibo:editorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
        "dct:publisher": {}
      }
    },
    "bibo:authorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
    "bibo:editorList": { "joins": { "followSequence": { "maxJoins": 50 } } },
    "dct:publisher": {}
  }
}

Page 52

What about tabular data?

• We also have tables and table specs

• Conceptually the same as views

• Instead of an array of graphs we have computed columns for complex tabular queries

• You can page, limit, offset results just like you’d expect
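Tables work like views but store flat rows, which makes paging straightforward. The talk does not show a table spec, so the `fields` mapping below (column name to predicate) is a hypothetical shape; the row key `{ r, type }` follows the table row on the next slide. In the mongo shell the paging step would look like `cursor.sort({"value.name": 1}).skip(offset).limit(limit)`:

```javascript
// Hypothetical table spec: each column is filled from one predicate.
const tableSpec = {
  _id: "t_people",
  type: ["foaf:Person"],
  fields: { name: "foaf:name", age: "foaf:age" },
};

const cbds = [
  { _id: "example:John", "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "John" }, "foaf:age": { l: "24" } },
  { _id: "example:Jane", "rdf:type": { u: "foaf:Person" }, "foaf:name": { l: "Jane" } },
  { _id: "example:Acme", "rdf:type": { u: "foaf:Organization" }, "foaf:name": { l: "Acme" } },
];

const values = v => (v === undefined ? [] : [].concat(v));

// Precompute one flat row per matching resource.
function buildRows(col, spec) {
  return col
    .filter(d => values(d["rdf:type"]).some(t => spec.type.includes(t.u)))
    .map(d => {
      const value = {};
      for (const [column, pred] of Object.entries(spec.fields)) {
        const firstValue = values(d[pred])[0]; // first value only
        if (firstValue !== undefined) {
          value[column] = firstValue.l !== undefined ? firstValue.l : firstValue.u;
        }
      }
      return { _id: { r: d._id, type: spec._id }, value };
    });
}

// Sort by name, then apply offset and limit to the precomputed rows.
function page(rows, offset, limit) {
  return rows
    .slice()
    .sort((a, b) => String(a.value.name).localeCompare(String(b.value.name)))
    .slice(offset, offset + limit);
}

const rows = buildRows(cbds, tableSpec);
console.log(page(rows, 0, 1).map(r => r.value.name)); // first page holds Jane
```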

Page 53

{
  "_id" : {
    "r" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks/1ABE1B4B-A68C-90E4-41DB-AF132854770F",
    "type" : "t_user_resources"
  },
  "value" : {
    "_impactIndex" : [
      { "r" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks/1ABE1B4B-A68C-90E4-41DB-AF132854770F", "c" : "tenantContexts:DefaultGraph" },
      { "r" : "tenantResources:7AB1D8E3-5D74-D07F-41E7-56206CFEC8EE", "c" : "tenantContexts:DefaultGraph" }
    ],
    "collection" : "http://example.com/users/FC44E153-161C-C199-DBAB-4DDE13F76F9B/bookmarks",
    "createdDate" : "2011-02-08T15:59:45+00:00",
    "resourceUri" : "tenantResources:7AB1D8E3-5D74-D07F-41E7-56206CFEC8EE",
    "note" : "ELECTRONIC",
    "title" : "Feminism & psychology",
    "type" : [ "resourcelist:Resource", "bibo:Journal" ]
  }
}

Page 54

Database layout

talis-rs:PRIMARY> show collections
CBD_config
CBD_draft
CBD_events
CBD_jobs
CBD_lists
CBD_nodes
CBD_resources
CBD_reviews
CBD_service
CBD_user_lists
CBD_user_resources
CBD_users
table_rows
views

CBD_* collections: read/write
table_rows and views: read only

Page 55

Fast and slow saves: you decide.

Page 56

Tripod save()

• Based on change sets: you supply the old and new graphs

• CBDs are updated immediately, with a write-ahead transaction log for multi-CBD writes

• Per save, you choose whether views/tables are updated synchronously or asynchronously (eventually consistent)

• Async updates add jobs to a MongoDB-based queue
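Supplying the old and new graphs lets the library work out exactly what changed. A minimal sketch of that diff, assuming graphs are arrays of `{ s, p, o }` triples (Tripod's real change-set format is not shown in the talk); the subjects touched by the diff are the CBD documents to rewrite, and more than one subject is the multi-CBD case the transaction log covers:

```javascript
const key = t => JSON.stringify([t.s, t.p, t.o]);

// Triples present only in the old graph are removals; triples present
// only in the new graph are additions.
function changeSet(oldGraph, newGraph) {
  const oldKeys = new Set(oldGraph.map(key));
  const newKeys = new Set(newGraph.map(key));
  return {
    removals: oldGraph.filter(t => !newKeys.has(key(t))),
    additions: newGraph.filter(t => !oldKeys.has(key(t))),
  };
}

// Distinct subjects in the change set = CBD documents to update.
function affectedCbds(cs) {
  return [...new Set([...cs.removals, ...cs.additions].map(t => t.s))];
}

const oldGraph = [
  { s: "example:John", p: "foaf:name", o: { l: "John" } },
  { s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
];
const newGraph = [
  { s: "example:John", p: "foaf:name", o: { l: "Johnny" } },
  { s: "example:John", p: "foaf:knows", o: { u: "example:Jane" } },
  { s: "example:Jane", p: "foaf:knows", o: { u: "example:John" } },
];

const cs = changeSet(oldGraph, newGraph);
console.log(cs.removals.length, cs.additions.length, affectedCbds(cs).join(","));
// 1 2 example:John,example:Jane
```

The unchanged `foaf:knows` triple appears in neither list, so only genuinely modified data is written.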

Page 57

Measure everything

Page 58

Query volume: complex vs. simple

Page 59

Query volume: graph vs. tabular

Page 60

Query speed: complex vs. simple graph query

Page 61

Hardware

• Real tin: 2x low-end Dell rack-mount servers

• 96GB RAM, 24 cores

• RAID-10 disks, non-SSD

• Keep ’em on the same LAN as your app servers

• About the same to lease per month as a couple of EC2 c3.4xlarge instances (30GB RAM and 16 vCPUs each)

• We’re about to add a similar second cluster, with 144GB RAM

Page 62

Why Mongo? RTFM, not HN comment feeds.

But seriously, it could have been any of n other document DBs

Page 63

There’s lots more: search, named graphs (quads), data functions

Page 64

Future roadmap

• Multi-cluster <- IN PROGRESS

• NodeJS port <- IN PROGRESS

• Choose a better solution for the tlog (transaction log), probably PostgreSQL

• Background queue -> Redis and Resque

• Chainable API

• Spout of updates for Apache Storm

• Versioned views/tables config

Page 65

Aperture: annotate your models to persist to graph


Page 67

tripod-php code…

…the same in Aperture

Page 68

@talis
facebook.com/talisgroup

+44 (0) 121 374 2740


48 Frederick Street, Birmingham B1 3HN