MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repository Sponsored by Nuxeo

Upload: mongodb

Post on 16-Apr-2017

Category: Technology


TRANSCRIPT

Page 1: MongoDB Days UK: Using MongoDB to Build a Fast and Scalable Content Repository Sponsored by Nuxeo

Using MongoDB to Build a Fast and Scalable Content Repository

Some Context

What We Do and What Problems We Try to Solve

Nuxeo Platform

• We provide a Platform that developers can use to build highly customised Content Applications

• We provide components, and the tools to assemble them

• The Platform is open source (https://github.com/nuxeo)

• Various customers - various use cases

• Me: Product Director at Nuxeo @aescaffre

Document Repository

• Document Oriented Database

➡ store JSON documents

• Document Repository

➡ manage document attributes, hierarchy, blobs, security, lifecycle, versions

Document Repository

Storage abstraction: be able to choose the right storage:

• Depending on the constraints

• Depending on the environment

Document Repository

A Nuxeo Platform document

Document Repository

With custom schemas:

Document Repository

Security on each record:

Document Repository

Thumbnail, Preview URLs

Get A Conversion (image, video, office, sound)

GET http://localhost:8080/nuxeo/api/v1/path/{docPath}/@convert?type=application%2Fpdf
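A minimal Python sketch of how a client might build that conversion URL. The helper name conversion_url and the sample document path are hypothetical; only the URL shape comes from the slide.

```python
from urllib.parse import quote, urlencode


def conversion_url(base_url: str, doc_path: str, mime_type: str) -> str:
    """Build the @convert URL for a document path and a target MIME type."""
    # Keep "/" between path segments, percent-encode everything else.
    encoded_path = quote(doc_path.strip("/"), safe="/")
    # urlencode percent-encodes the "/" inside the MIME type (%2F).
    query = urlencode({"type": mime_type})
    return f"{base_url}/api/v1/path/{encoded_path}/@convert?{query}"


# Hypothetical document path, used only for illustration.
url = conversion_url(
    "http://localhost:8080/nuxeo",
    "/default-domain/Passports/scan 01",
    "application/pdf",
)
print(url)
```

Encoding the path and the query separately keeps the slashes of the document path readable while still escaping the slash inside the MIME type.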

Document Repository

• Start a Workflow

curl -X POST 'http://localhost:8080/nuxeo/site/automation/Context.StartWorkflow' -H 'Accept: */*' -H 'Authorization: Basic QWRtaW5pc3RyYXRvcjpBZG1pbmlzdHJhdG9y' -H 'content-type: application/json+nxrequest' -d '{"params":{"id":"serial-review","start":"true"},"input":"/default-domain/Passports/3719050812174596321","context":{}}'

• Do Some Queries

SELECT * FROM Document WHERE files/*1/file/name LIKE '%.txt' AND files/*1/file/length = 0
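The workflow call can also be prepared from Python. This is a sketch using only the standard library; the function name start_workflow_request is hypothetical, and the credentials are simply what the Basic header in the curl command decodes to. The request is built but not sent.

```python
import base64
import json
from urllib.request import Request


def start_workflow_request(base_url, user, password, workflow_id, doc_path):
    """Prepare (but do not send) the Context.StartWorkflow automation call."""
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    payload = {
        "params": {"id": workflow_id, "start": "true"},
        "input": doc_path,
        "context": {},
    }
    return Request(
        f"{base_url}/site/automation/Context.StartWorkflow",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "*/*",
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json+nxrequest",
        },
        method="POST",
    )


request = start_workflow_request(
    "http://localhost:8080/nuxeo",
    "Administrator",
    "Administrator",
    "serial-review",
    "/default-domain/Passports/3719050812174596321",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```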

History: Nuxeo Repository and Storage

• 2006: Nuxeo Repository is based on ZODB (Python / Zope based):

• This is not JSON in NoSQL, but Python serialization in ObjectDB

• Concurrency and performance issues, bad transaction handling

• 2007: Nuxeo Platform 5.1 - Apache Jackrabbit (JCR based)

• Mix SQL + Java Serialization + Lucene

• Transaction and consistency issues

• 2009: Nuxeo 5.2 - Nuxeo VCS

• SQL based repository : MVCC & ACID

• very reliable, but some use cases cannot fit in a SQL DB!

• 2014: Nuxeo 5.9 - Nuxeo DBS

• Document Based Storage repository

• MongoDB is the reference backend

From SQL to NoSQL

Understanding the motivations for moving to MongoDB

SQL Based Repository

Key Limitations of the SQL Approach

• Impedance Issues

• storing Documents in tables is not easy

• requires Caching and Lazy loading

• Scalability

• Document repository can become very large (versions, workflows …)

• Scaling out SQL DB is very complex (and never transparent)

• Concurrency model

• Heavy write is an issue (Quotas, Inheritance)

Need a Different Storage Model!

From SQL to NoSQL

NoSQL with MongoDB

• No Impedance Issue

‣ One Nuxeo Document = One MongoDB Document

• No Scalability Issue

‣ Native distributed architecture allows scale out

• No Concurrency Issue

‣ Document Level "Transactions"

• No Application Level Cache is Needed

‣ No need to manage invalidations

MongoDB Integration

Inside nuxeo-dbs storage adapter

Document Based Storage & MongoDB

Storing Nuxeo Documents in MongoDB

Hierarchy

• Parent-child relationship: ecm:parentId

• Recursion optimised through ecm:ancestorIds array

• Maintained by the framework (create, delete, move, copy)
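The hierarchy fields above can be illustrated with plain Python dictionaries standing in for MongoDB documents. This is a sketch, not the nuxeo-dbs implementation: the field name ecm:id, the sample tree, and the helper functions are assumptions; only ecm:parentId and ecm:ancestorIds come from the slide.

```python
# In-memory stand-ins for MongoDB documents (sample data is hypothetical).
documents = [
    {"ecm:id": "root", "ecm:parentId": None, "ecm:ancestorIds": []},
    {"ecm:id": "ws1", "ecm:parentId": "root", "ecm:ancestorIds": ["root"]},
    {"ecm:id": "ws2", "ecm:parentId": "root", "ecm:ancestorIds": ["root"]},
    {"ecm:id": "folder", "ecm:parentId": "ws1", "ecm:ancestorIds": ["root", "ws1"]},
    {"ecm:id": "doc", "ecm:parentId": "folder",
     "ecm:ancestorIds": ["root", "ws1", "folder"]},
]


def descendants(docs, ancestor_id):
    """All documents below ancestor_id, found without any recursion.

    The equivalent MongoDB filter would be {"ecm:ancestorIds": ancestor_id},
    since equality on an array field matches any element.
    """
    return [d["ecm:id"] for d in docs if ancestor_id in d["ecm:ancestorIds"]]


def move(docs, doc_id, new_parent_id):
    """Move a subtree and keep ecm:ancestorIds consistent (framework's job)."""
    by_id = {d["ecm:id"]: d for d in docs}
    moved = by_id[doc_id]
    old_prefix = moved["ecm:ancestorIds"] + [doc_id]
    new_ancestors = by_id[new_parent_id]["ecm:ancestorIds"] + [new_parent_id]
    moved["ecm:parentId"] = new_parent_id
    moved["ecm:ancestorIds"] = new_ancestors
    new_prefix = new_ancestors + [doc_id]
    # Rewrite the ancestor path of every document inside the moved subtree.
    for d in docs:
        path = d["ecm:ancestorIds"]
        if d is not moved and path[:len(old_prefix)] == old_prefix:
            d["ecm:ancestorIds"] = new_prefix + path[len(old_prefix):]


before = descendants(documents, "ws1")
move(documents, "folder", "ws2")
after = descendants(documents, "ws2")
print(before, after)
```

The trade-off shown here is that reads become a single array match, while moves and copies must rewrite the ancestor path of every document in the subtree.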

Security

• Generic ACP stored in ecm:acp field

• Precomputed Read ACLs to avoid post-filtering on search

• Simple Set of identities having access

• Semantic restriction on blocking

• Maintained by framework

• Search matches if intersection
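A small Python sketch of the "search matches if intersection" rule. The field name ecm:racl for the precomputed read ACL, the sample documents, and the helper names are assumptions made for illustration.

```python
# Each document carries a precomputed, flat set of identities allowed to read it.
documents = [
    {"ecm:id": "contract", "ecm:racl": ["alice", "legal"]},
    {"ecm:id": "invoice", "ecm:racl": ["finance"]},
    {"ecm:id": "memo", "ecm:racl": ["Everyone"]},
]


def read_filter(principals):
    """MongoDB-style filter: $in on an array field matches on any overlap."""
    return {"ecm:racl": {"$in": sorted(principals)}}


def search(docs, principals):
    """In-memory equivalent: keep documents whose read ACL intersects."""
    return [d["ecm:id"] for d in docs if not principals.isdisjoint(d["ecm:racl"])]


# The user's identities: login plus groups (hypothetical values).
bob = {"bob", "finance", "Everyone"}
visible = search(documents, bob)
print(read_filter(bob))
print(visible)
```

Because the security check is part of the query itself, results never need to be filtered after the search returns.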

Search

Consistency Challenges

• Unitary Document Operations Are Safe

• No impedance issue

• Large batch updates are not much of an issue

• SQL DBs do not like long-running transactions anyway

• Multi-document transactions are an issue

• Workflows are a typical use case

• Isolation issue

• Other transactions can see intermediate states

Mitigating Consistency Issues

• Transient State Manager

• Run All Operations In Memory

• Flush to MongoDB as late as possible

• Populate an Undo log

• Replay backward in case of Rollback

➡ recover partial transaction management

• Complete isolation is not possible

• Need to flush transient state for queries

• "uncommitted" changes are visible to others

➡ Read Uncommitted, at best
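The mechanism described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Nuxeo's Transient State Manager: a plain dict stands in for the MongoDB collection, and the class and method names are hypothetical.

```python
import copy


class TransientSession:
    """Buffers writes in memory, flushes late, and keeps an undo log."""

    def __init__(self, store):
        self.store = store    # stands in for the MongoDB collection
        self.pending = {}     # doc_id -> new state, not yet written
        self.undo_log = []    # (doc_id, previous state or None), in flush order

    def put(self, doc_id, doc):
        self.pending[doc_id] = copy.deepcopy(doc)

    def flush(self):
        for doc_id, doc in self.pending.items():
            # Record the state being overwritten so it can be restored later.
            self.undo_log.append((doc_id, copy.deepcopy(self.store.get(doc_id))))
            self.store[doc_id] = doc
        self.pending.clear()

    def query(self, predicate):
        # Queries run against the store, so pending changes must be flushed.
        self.flush()
        return sorted(i for i, d in self.store.items() if predicate(d))

    def commit(self):
        self.flush()
        self.undo_log.clear()

    def rollback(self):
        self.pending.clear()
        # Replay the undo log backward to restore the previous states.
        for doc_id, previous in reversed(self.undo_log):
            if previous is None:
                self.store.pop(doc_id, None)
            else:
                self.store[doc_id] = previous
        self.undo_log.clear()


store = {"a": {"state": "draft"}}
session = TransientSession(store)
session.put("a", {"state": "review"})
session.put("b", {"state": "review"})
unflushed_view = copy.deepcopy(store)   # what other readers see before a flush
in_review = session.query(lambda d: d["state"] == "review")
leaked_view = copy.deepcopy(store)      # what other readers see after the flush
session.rollback()
print(unflushed_view, in_review, leaked_view, store)
```

The query in the middle is what forces the early flush: between that point and the rollback, other readers of the store can observe the uncommitted changes.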

Typical Use Cases For MongoDB

Huge Repository - Heavy Loading

• Massive Amount of Documents (X00,000,000+ docs)

➡ Retail DAM repository, Banks archiving repository (email), large B2C companies invoicing output

• Automatic and fine-grained versioning: create a version for each single change

➡ Pharmaceutical, financial, etc.

SQL DB collapses (on commodity hardware)

MongoDB handles the volume

Benchmarking Mass Import

• Process 20000 documents

๏ 700 documents/s with SQL backend (cold cache)

๏ 6,000 documents/s with MongoDB / mmapv1: x9

๏ 11,000 documents/s with MongoDB / wiredTiger: x15

• Process 100000 documents

๏ 750 documents/s with SQL backend (cold cache)

๏ 9,500 documents/s with MongoDB / mmapv1: x13

๏ 11,500 documents/s with MongoDB / wiredTiger: x15

• Process 200000 documents

๏ 750 documents/s with SQL backend (cold cache)

๏ 14,000 documents/s with MongoDB / mmapv1: x19

๏ 11,000 documents/s with MongoDB / wiredTiger: x15
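The multipliers can be recomputed from the throughput figures above, as MongoDB throughput divided by SQL throughput for the same batch size. A short Python sketch, where the dictionary layout and the function name speedups are just illustrative choices:

```python
# Documents per second, copied from the three benchmark runs above.
results = {
    20_000: {"sql": 700, "mmapv1": 6_000, "wiredTiger": 11_000},
    100_000: {"sql": 750, "mmapv1": 9_500, "wiredTiger": 11_500},
    200_000: {"sql": 750, "mmapv1": 14_000, "wiredTiger": 11_000},
}


def speedups(row):
    """Throughput of each MongoDB engine relative to the SQL backend."""
    return {engine: rate / row["sql"] for engine, rate in row.items()
            if engine != "sql"}


for batch_size, row in results.items():
    ratios = ", ".join(f"{engine}: x{ratio:.1f}"
                       for engine, ratio in speedups(row).items())
    print(f"{batch_size} documents -> {ratios}")
```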

Benchmarking Scale Out

• 1 Nuxeo node + 1 MongoDB node

• 1900 docs/s

• MongoDB CPU is the bottleneck (800%)

• 2 Nuxeo nodes + 1 MongoDB node

• 1850 docs/s

• MongoDB CPU is the bottleneck (800%)

• 2 Nuxeo nodes + 2 MongoDB nodes

• 3400 docs/s when using read preferences

Adding one MongoDB node adds 80% throughput
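A sketch of the two ideas on this slide: a connection string that lets reads go to a secondary, and the throughput gain computed from the figures above. The slide does not say which read preference was used, so secondaryPreferred, the host names, and the replica set name rs0 are assumptions; replicaSet and readPreference are standard MongoDB connection string options.

```python
from urllib.parse import urlencode


def mongodb_uri(hosts, replica_set, read_preference):
    """Build a MongoDB connection string with a read preference option."""
    options = urlencode({"replicaSet": replica_set,
                         "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


# Hypothetical hosts and replica set name.
uri = mongodb_uri(["mongo1:27017", "mongo2:27017"], "rs0", "secondaryPreferred")

# Gain from adding the second MongoDB node (2 Nuxeo nodes in both runs).
one_node, two_nodes = 1850, 3400
gain_percent = (two_nodes / one_node - 1) * 100

print(uri)
print(f"{gain_percent:.0f}% more throughput")
```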

Geo-distributed Architecture

A Real-Life Example

Context

• Who: US Network Carrier

• Goal: Provide VOD Services

• Requirements:

• store videos

• manage metadata

• manage workflows

• generate thumbs

• generate conversions

• manage availability

Nuxeo Platform as a video repository

Challenges

• Very Large Objects:

• Lots of metadata (dublincore, ADI, ratings)

• Massive Daily Updates

• Updates On Rights and Availability

• Need To Track All Changes

• Prove what the availability was on a given date

Lots of data + lots of updates ➡ db.createCollection("myMovies")

MongoDB Choice

• They chose MongoDB

• because they have a good use case for MongoDB

• because they wanted to use MongoDB

• change work habits (Open source, NoSQL)

• doing a project with MongoDB is cool!!