cluster of unreliable commodity hardware (couchdb) (2)

Post on 11-Apr-2017

Category:

Engineering

Presented by Namitha Acharya

CLUSTER OF UNRELIABLE COMMODITY HARDWARE

(COUCHDB)

1. What is CouchDB?

2. ACID Semantics

3. Futon

4. How can I see my data?

5. REST API

6. Views

7. MapReduce Dialog

8. MapReduce in CouchDB

9. Reduce/Rereduce

10. Restrictions on MapReduce

11. Conflict Management

12. Database Replication

13. Security

14. Enterprises using CouchDB

15. Syntax

16. CouchDB vs MongoDB

WHAT IS COUCHDB

◎ CouchDB was first released in 2005.

◎ It is an open source database developed by Damien Katz, a former Lotus Notes developer at IBM.

◎ Damien Katz defined it as a "storage system for a large scale object database".

◎ He self-funded the project for almost two years and released it as an open source project under the GNU General Public License.

◎ It focuses on ease of use and on completely embracing the web.

◎ It is a NoSQL database.

◎ A document database server, accessible via a RESTful JSON API.

◎ Ad-hoc and schema-free with a flat address space.

◎ Distributed, featuring robust, incremental replication with bi-directional conflict detection and management.

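◎ In 2011, CouchOne (the company founded by Damien Katz) merged with Membase to form Couchbase; Apache CouchDB itself continues as a separate project.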

◎ It uses JavaScript as its query language, through MapReduce.

◎ It uses HTTP as the protocol for its API.

◎ Its distinguishing feature is that it provides multi-master replication.

◎ It became an Apache project in 2008.

◎ Unlike a relational database, it does not store data and relationships in tables.

◎ Instead, each database is a collection of independent documents.

◎ Each document maintains its own data and self-contained schema.

◎ An application may access multiple databases, for example one stored on a user's mobile phone and another on a server.

◎ Document metadata contains revision information, making it possible to merge differences that occurred while the databases were disconnected.
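
As a rough illustration of "independent documents with their own schema and revision metadata", two documents of different shape could sit side by side in one database. The field names below (other than `_id` and `_rev`) and the `generation` helper are made up for this sketch.

```javascript
// Two hypothetical documents that could live in the same database.
// Only _id and _rev are CouchDB metadata; every other field is illustrative.
const album = {
  _id: "album-hello",      // unique document ID
  _rev: "1-2902191555",    // revision in the form "<generation>-<hash>"
  title: "Hello",
  artist: "World"
};

const review = {
  _id: "review-42",
  _rev: "3-9c65296036141e575d32ba9c034dd3ee",
  album: "album-hello",    // a relationship is just a plain reference
  stars: 4,
  tags: ["pop", "debut"]
};

// The numeric prefix of _rev counts how many times the document was revised.
function generation(doc) {
  return parseInt(doc._rev.split("-")[0], 10);
}

console.log(generation(album), generation(review));
```

Neither document had to be declared in advance, and the revision prefix is what lets two replicas notice that they hold different histories of the same document.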

ACID SEMANTICS

◎ CouchDB implements MVCC (Multi-Version Concurrency Control), which removes the need to lock the database during writes.

◎ Reads also rely on MVCC: each client sees a consistent snapshot of the database from the beginning to the end of the read operation.

◎ CouchDB can handle a high volume of concurrent readers and writers; readers are never blocked by writers.
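
The MVCC idea above can be sketched with a toy in-memory store. This is not CouchDB's implementation; `TinyStore` and its `-demo` revision suffix are hypothetical. It only shows the pattern: readers keep a snapshot, and a write based on a stale revision is rejected instead of waiting on a lock.

```javascript
// Toy in-memory store (NOT CouchDB itself) illustrating lock-free, optimistic writes.
class TinyStore {
  constructor() {
    this.docs = new Map(); // id -> array of revisions, newest last
  }

  // Readers get a frozen copy of the latest revision; later writes never alter it.
  read(id) {
    const revs = this.docs.get(id);
    return revs ? Object.freeze({ ...revs[revs.length - 1] }) : null;
  }

  // A write must name the revision it was based on. A stale revision is rejected
  // (CouchDB answers such a request with HTTP 409 Conflict).
  write(id, body, baseRev) {
    const revs = this.docs.get(id) || [];
    const current = revs.length ? revs[revs.length - 1]._rev : undefined;
    if (current !== baseRev) {
      return { ok: false, error: "conflict" };
    }
    const next = { ...body, _id: id, _rev: `${revs.length + 1}-demo` };
    this.docs.set(id, [...revs, next]);
    return { ok: true, rev: next._rev };
  }
}

const store = new TinyStore();
const first = store.write("album-1", { title: "Hello" }, undefined);
const snapshot = store.read("album-1"); // a reader takes its snapshot here
const second = store.write("album-1", { title: "Hello again" }, first.rev);
const stale = store.write("album-1", { title: "Oops" }, first.rev); // based on an old revision

console.log(snapshot.title, second.ok, stale.error);
```

The reader's snapshot still shows the first title after the second write, and the writer holding the old revision is told to retry rather than being made to wait.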

FUTON

◎ Administration is supported by a built-in web application called FUTON.

◎ First version released in 2005.

◎ CouchDB is written in Erlang and is cross-platform software available on various operating systems.

◎ CouchDB belongs to the document-oriented database category and is available under the Apache License (couchdb.apache.org).

HOW CAN I SEE MY DATA?

◎ CouchDB design documents can contain a “views” section

◎ Views contain Map/Reduce functions

◎ Map/Reduce functions are implemented in JavaScript.
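
A design document with a "views" section might look roughly like the object below. The database name `albums`, the view names, and the `viewPath` helper are assumptions for illustration; the map functions are stored as strings of JavaScript source, and `_count` is one of CouchDB's built-in reduce functions.

```javascript
// Hypothetical design document for an "albums" database.
const designDoc = {
  _id: "_design/albums",
  language: "javascript",
  views: {
    by_artist: {
      // Map function source: one row per album, keyed by artist.
      map: "function (doc) { if (doc.artist) { emit(doc.artist, doc.title); } }"
    },
    count_by_artist: {
      map: "function (doc) { if (doc.artist) { emit(doc.artist, 1); } }",
      reduce: "_count" // built-in reduce that counts rows
    }
  }
};

// Path at which a view is queried once the design document has been stored.
function viewPath(db, ddoc, view) {
  return `/${db}/${ddoc._id}/_view/${view}`;
}

console.log(viewPath("albums", designDoc, "by_artist"));
```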

REST API

◎ All items have a unique URI that gets exposed via HTTP.

◎ REST uses the HTTP methods POST, GET, PUT and DELETE for the four basic CRUD (Create, Read, Update, Delete) operations on all resources.
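
The CRUD-to-HTTP mapping can be written down as a small lookup. The `describeRequest` helper and the `albums` database are hypothetical; the sketch only builds a description of each request and sends nothing.

```javascript
// Base URL assumed for a local CouchDB instance.
const BASE = "http://127.0.0.1:5984";

// The four CRUD operations and the HTTP method used for each.
const METHOD_FOR = { create: "POST", read: "GET", update: "PUT", delete: "DELETE" };

// Build a description of the request for a given action (nothing is sent).
function describeRequest(action, db, docId) {
  const method = METHOD_FOR[action];
  if (!method) {
    throw new Error(`unknown action: ${action}`);
  }
  // POST targets the database (the server assigns the ID);
  // the other methods target the document's own URI.
  const url = action === "create" ? `${BASE}/${db}` : `${BASE}/${db}/${docId}`;
  return { method, url };
}

for (const action of Object.keys(METHOD_FOR)) {
  console.log(action, describeRequest(action, "albums", "album-hello"));
}
```

In CouchDB a PUT to a document URI can also create the document, and updates and deletes additionally need the current revision of the document.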

VIEWS

◎ Filtering the documents in your database to find those relevant to a particular process.

◎ Building efficient indexes to find documents by any value or structure that resides in them

◎ Extracting data from your documents and presenting it in a specific order.

◎ Use these indexes to represent relationships among documents.

◎ CouchDB can index views and keep those indexes updated as documents are added, removed, or updated.
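
Because a view is kept as a sorted index, finding documents "by any value" comes down to a key-range lookup. The rows and the `lowerBound`/`queryRange` helpers below are illustrative stand-ins, similar in spirit to passing `startkey` and `endkey` when querying a view; a real view index is a B-tree on disk, not an array.

```javascript
// Hypothetical view rows, already sorted by key (here: release year).
const viewRows = [
  { key: 1999, id: "a3", value: "Older" },
  { key: 2005, id: "a1", value: "Hello" },
  { key: 2008, id: "a2", value: "Goodbye" },
  { key: 2012, id: "a4", value: "Newer" }
];

// Binary search for the first row whose key is >= the given key.
function lowerBound(rows, key) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < key) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return lo;
}

// Inclusive range query: jump to the start key, then scan until past the end key.
function queryRange(rows, startkey, endkey) {
  const out = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    out.push(rows[i]);
  }
  return out;
}

console.log(queryRange(viewRows, 2005, 2010).map((row) => row.value));
```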

MAP/REDUCE DIALOG

◎ Bob: So, how do I query the database?

◎ IT guy: It’s not a database. It’s a key-value store.

◎ Bob: OK, it’s not a database. How do I query it?

◎ IT guy: You write a distributed map-reduce function in Erlang.

◎ Bob: Did you just tell me to go screw myself?

◎ IT guy: I believe I did, Bob.

MAP/REDUCE IN COUCHDB

◎ Map functions have a single parameter, a document, and emit a list of key/value pairs of JSON values.

◉ CouchDB allows arbitrary JSON structures to be used as keys.

◎ Map is called for every document in the database.

◉ Efficiency?

◎ emit() function can be called multiple times in the map function

◎ View results are stored in B-Trees
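
A minimal sketch of the map step, run outside CouchDB. The `emit` function here is a stand-in that just collects rows (in CouchDB it is provided by the view server), the sample documents are made up, and the comparison function is a simplification of CouchDB's key collation that only handles arrays of strings.

```javascript
// Stand-in for CouchDB's built-in emit(): collect rows with the emitting doc's id.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// Map function in the shape CouchDB expects: one document in, zero or more emits out.
function mapByTag(doc) {
  if (Array.isArray(doc.tags)) {
    doc.tags.forEach(function (tag) {
      emit([tag, doc.title], 1); // an array is a valid key
    });
  }
}

// Simplified ordering for array-of-string keys (not CouchDB's full collation).
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

const docs = [
  { _id: "a1", title: "Hello", tags: ["pop", "debut"] },
  { _id: "a2", title: "Goodbye", tags: ["pop"] },
  { _id: "a3", title: "Untitled" } // emits nothing
];

// Map is called once per document; the view keeps the rows sorted by key.
for (const doc of docs) {
  currentId = doc._id;
  mapByTag(doc);
}
rows.sort((x, y) => compareKeys(x.key, y.key));

console.log(rows.map((row) => row.key.join("/")));
```

One document produced two rows and another produced none, which is what "emit() can be called multiple times" amounts to in practice.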

REDUCE/REREDUCE

◎ The reduce function is optional.

◎ It is used to produce aggregate results for the view.

◎ A reduce function must accept, as input, both the results emitted by its corresponding map function and the results returned by the reduce function itself (rereduce).

◎ On rereduce, the keys argument is null.

◎ On a large database, the objects to be reduced are sent to your reduce function in batches. These batches are broken up on B-tree boundaries, which may occur in arbitrary places.
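
The reduce/rereduce contract can be tried out with a counting function. The `(keys, values, rereduce)` signature is CouchDB's; the `reduceInBatches` driver and the sample rows are hypothetical and only imitate the way batches are reduced first and their partial results reduced again.

```javascript
// Reduce function with CouchDB's (keys, values, rereduce) signature.
function countReduce(keys, values, rereduce) {
  if (rereduce) {
    // keys is null here; values are earlier outputs of this same function.
    return values.reduce((sum, partial) => sum + partial, 0);
  }
  // First pass: values are what the map function emitted.
  return values.length;
}

// Hypothetical driver: reduce fixed-size batches, then rereduce the partial results.
function reduceInBatches(rows, batchSize, reduceFn) {
  const partials = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize);
    const keys = batch.map((row) => [row.key, row.id]);
    const values = batch.map((row) => row.value);
    partials.push(reduceFn(keys, values, false));
  }
  return reduceFn(null, partials, true);
}

const emitted = [
  { key: "pop", id: "a1", value: "Hello" },
  { key: "pop", id: "a2", value: "Goodbye" },
  { key: "rock", id: "a3", value: "Loud" },
  { key: "rock", id: "a4", value: "Louder" },
  { key: "jazz", id: "a5", value: "Blue" }
];

console.log(reduceInBatches(emitted, 2, countReduce), reduceInBatches(emitted, 5, countReduce));
```

A count that ignored the `rereduce` flag would return the number of batches rather than the number of rows, which is why the two cases have to be handled separately.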

RESTRICTIONS ON MAP/REDUCE

◎ Map functions must be referentially transparent: given the same document, a map function always emits the same key/value pairs.

◉ This allows for incremental updates.

◎ Reduce functions must be able to reduce their own output.

◉ This requirement allows CouchDB to store intermediate reductions directly in the inner nodes of the B-tree indexes, so view index updates and retrievals have logarithmic cost.
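
Why determinism allows incremental updates can be shown with a toy index that remembers the rows each document produced. `IncrementalIndex` is hypothetical, and the map function returns its pairs as an array instead of calling `emit()` so that the sketch stays self-contained.

```javascript
// Deterministic map: the same document always yields the same pairs.
function mapArtist(doc) {
  return doc.artist ? [[doc.artist, 1]] : [];
}

// Toy index that keeps, per document, the rows that document produced.
class IncrementalIndex {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rowsByDoc = new Map(); // doc id -> pairs emitted for that doc
    this.mapCalls = 0;          // how often the map function actually ran
  }

  // Re-map only the documents that changed; rows of all other documents are reused.
  update(changedDocs) {
    for (const doc of changedDocs) {
      if (doc._deleted) {
        this.rowsByDoc.delete(doc._id);
      } else {
        this.mapCalls += 1;
        this.rowsByDoc.set(doc._id, this.mapFn(doc));
      }
    }
  }

  // Sum the values emitted under one key across all documents.
  countFor(key) {
    let total = 0;
    for (const pairs of this.rowsByDoc.values()) {
      for (const [k, v] of pairs) {
        if (k === key) {
          total += v;
        }
      }
    }
    return total;
  }
}

const index = new IncrementalIndex(mapArtist);
index.update([
  { _id: "a1", artist: "World" },
  { _id: "a2", artist: "World" },
  { _id: "a3", artist: "Other" }
]);
index.update([{ _id: "a2", artist: "Other" }]); // only a2 changed, so only a2 is re-mapped

console.log(index.mapCalls, index.countFor("World"), index.countFor("Other"));
```

If the map function could return different rows for an unchanged document, the cached rows would be unreliable and the whole index would have to be rebuilt on every update.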

CONFLICT MANAGEMENT

◎ Conflicts are left to the application to resolve. Resolution involves:

1. Merging data into one of the documents.

2. Deleting the stale one.

◎ Multi-Version Concurrency Control (MVCC)

◎ CouchDB does not attempt to merge the conflicting revisions; this is left to the application.

◎ If there is a conflict in revisions between nodes

◉ App is ultimately responsible for resolving the conflict

◉ All revisions are saved

◉ One revision is deterministically selected as the winning (current) revision, so every node picks the same one
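
An application-side resolver following the two steps above (merge, then delete the stale revision) might be sketched as follows. `resolveConflict` and `unionTags` are hypothetical helpers and the revisions are invented; the returned object has the `{ docs: [...] }` shape that CouchDB's bulk document endpoint accepts, but nothing is sent here.

```javascript
// Example merge rule (an assumption): keep the union of both tag lists.
function unionTags(a, b) {
  const tags = Array.from(new Set([...(a.tags || []), ...(b.tags || [])])).sort();
  return { ...a, tags };
}

// Merge every losing revision into the winner, then mark the losers as deleted.
function resolveConflict(winner, losers, mergeFn) {
  const merged = losers.reduce((acc, loser) => mergeFn(acc, loser), { ...winner });
  merged._id = winner._id;
  merged._rev = winner._rev; // the update is written on top of the winning revision
  const tombstones = losers.map((loser) => ({
    _id: loser._id,
    _rev: loser._rev,
    _deleted: true
  }));
  return { docs: [merged, ...tombstones] };
}

// Two conflicting revisions of the same document (made-up data).
const winner = { _id: "album-1", _rev: "2-bbb", title: "Hello", tags: ["pop"] };
const loser = { _id: "album-1", _rev: "2-aaa", title: "Hello", tags: ["debut"] };

const resolution = resolveConflict(winner, [loser], unionTags);
console.log(JSON.stringify(resolution));
```

The merge rule is the part only the application can decide; a union of tags is harmless, whereas two different titles would need a rule of their own.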

DATABASE REPLICATION

◎ “CouchDB has built-in conflict detection and management and the replication process is incremental and fast, copying only documents and individual fields changed since the previous replication.”

◎ Replication is a unidirectional process; synchronizing two databases in both directions takes two replications, one each way.

◎ Databases in CouchDB have a sequence number that gets incremented every time the database is changed.
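
How a sequence number makes replication incremental can be imitated with a toy database. `TinyDb` and `replicate` are hypothetical; the real replicator also compares revisions and keeps conflicts, whereas this sketch simply overwrites the target copy.

```javascript
// Toy database: every change bumps a sequence number and is recorded in a change list.
class TinyDb {
  constructor() {
    this.seq = 0;
    this.docs = new Map();
    this.changes = []; // at most one entry per document: its latest change
  }

  put(doc) {
    this.seq += 1;
    this.docs.set(doc._id, doc);
    this.changes = this.changes.filter((change) => change.id !== doc._id);
    this.changes.push({ seq: this.seq, id: doc._id });
  }

  changesSince(seq) {
    return this.changes.filter((change) => change.seq > seq);
  }
}

// One-way replication: copy only what changed on the source since the checkpoint.
function replicate(source, target, checkpoint) {
  const pending = source.changesSince(checkpoint);
  for (const change of pending) {
    target.put(source.docs.get(change.id));
  }
  return { copied: pending.length, checkpoint: source.seq };
}

const source = new TinyDb();
const target = new TinyDb();
source.put({ _id: "a1", title: "Hello" });
source.put({ _id: "a2", title: "Goodbye" });

const firstRun = replicate(source, target, 0);                    // copies both documents
source.put({ _id: "a3", title: "Encore" });
const secondRun = replicate(source, target, firstRun.checkpoint); // copies only a3
const thirdRun = replicate(source, target, secondRun.checkpoint); // nothing left to copy

console.log(firstRun.copied, secondRun.copied, thirdRun.copied);
```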

SECURITY

◎ Authorizations

◉ Reader - read/write document

◉ Database Admin - compact, add/edit views

◉ Server Admin - create and remove databases

◎ Eventual Consistency

◉ CouchDB guarantees eventual consistency to be able to provide both availability and partition tolerance.

◎ Built for Offline

◉ CouchDB can replicate to devices (like smartphones) that can go offline and handle data sync for you when the device is back online.

ENTERPRISES THAT USED OR ARE USING COUCHDB

◎ Ubuntu began using it in 2009 for its synchronization service "Ubuntu One".

◎ The BBC, for its dynamic content platforms.

◎ Credit Suisse, for internal use in its commodities department for their marketplace framework.

◎ Meebo, for their social platform (web and applications) - Meebo was acquired by Google and was shut down on July 12, 2012.

◎ Sophos, for some of their back-end systems.

ACCESSING DATA VIA HTTP

◎ Applications interact with CouchDB via HTTP.

◎ The following demonstrates a few examples using cURL, a command-line utility.

◎ These examples assume that CouchDB is running on localhost (127.0.0.1) on port 5984.

CHECK SERVER

◎ Request: curl http://127.0.0.1:5984/

◎ Response:

{
  "couchdb": "Welcome",
  "uuid": "85fb71bf700c17267fef77535820e371",
  "vendor": {
    "name": "The Apache Software Foundation",
    "version": "1.5.0"
  },
  "version": "1.5.0"
}

CREATING A DATABASE

◎ Request: curl -X PUT http://127.0.0.1:5984/albums

◎ Response: { "ok": true }

INSERTING A DOCUMENT

◎ Request: curl -X PUT http://127.0.0.1:5984/albums/<uuid> -d '{"title":"Hello", "artist":"World"}'

◎ Response:

{
  "ok": true,
  "id": "<uuid>",
  "rev": "1-2902191555"
}
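
The same insert could be issued from application code instead of cURL. This is a sketch under assumptions: a runtime with a global `fetch` (for example a recent Node.js or a browser), CouchDB on the address used above, and hypothetical helper names `documentUrl` and `putDocument`. The request function is defined but not called, since it needs a running server.

```javascript
// Address assumed in the slides: CouchDB on localhost, port 5984.
const COUCH_URL = "http://127.0.0.1:5984";

// Every document has its own URI: /<database>/<document id>.
function documentUrl(db, id) {
  return `${COUCH_URL}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`;
}

// PUT a JSON document; resolves with the parsed response body,
// which for a successful insert contains ok, id and rev.
async function putDocument(db, id, doc) {
  const response = await fetch(documentUrl(db, id), {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc)
  });
  return response.json();
}

// With a server running, usage would look like:
// putDocument("albums", "<uuid>", { title: "Hello", artist: "World" }).then(console.log);
console.log(documentUrl("albums", "album-hello"));
```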

COUCHDB VS. MONGODB

◎ Erlang vs. C++

◎ JSON vs. BSON

◎ HTTP vs. Custom Protocol over TCP/IP

◎ Documents vs. Collections/Documents

◎ Ranged Query vs. Object-based Query

◎ MR -> View vs. MR -> Collection

◎ MVCC vs. Update in Place

◎ Master-Master vs. Master-Slave

Thank You
