mongodb/cassandra - csuohio.edueecs.csuohio.edu/~sschung/cis612/lecture_notes_mongodb_cassan… ·...

MongoDB/Cassandra SUNNIE CHUNG CIS 612

Upload: lyminh

Post on 22-Apr-2018



MongoDB/Cassandra

SUNNIE CHUNG

CIS 612


MongoDB

SUNNIE CHUNG

CIS 612


MongoDB

http://www.mongodb.org/

MongoDB is an open-source database classified as a NoSQL database.

The primary reasons for developing MongoDB were to make scaling data easier and to support semi-structured data.

MongoDB is a document-oriented database in which data is organized as JSON documents and stored in collections.


Architecture

MongoDB is a NoSQL database, which means its mechanism for storing and retrieving data is modeled by means other than the tabular relations used in relational databases.

It has rich data structures with dynamic attributes, mixed structure, text, media, arrays, and other complex types.

MongoDB is flexible: its data model can evolve over time to accommodate new features and requirements.

Object-oriented programming languages interact with data in structures that are dramatically different from the way data is stored in a relational database.


Features

Data is stored in a structure that maps to objects in modern object-oriented programming languages.

Rich index and query support, including secondary, geospatial, and text search indexes, and native MapReduce.

System capacity can be increased dynamically.

Supports data replication and fault tolerance.

Data is read and written in RAM, providing fast performance.


NoSQL Data Models

Document Model: MongoDB and CouchDB

Document databases store data in documents, typically JSON.

Graph Model: Neo4j and Giraph

Graph structures with nodes, edges, and properties represent the data.

Key-Value and Wide Column Models

Redis (Key-Value), HBase and Cassandra (Wide Column)


MongoDB Data Model

MongoDB is a document database: it stores data as documents in a binary representation called BSON (Binary JSON).

BSON extends the JSON (JavaScript Object Notation) representation to include additional types such as int, long, and floating point.

BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data, and sub-documents.

Documents and collections can be seen as equivalent to records and tables in a relational database system.

A document is an ordered set of keys with associated values. The values can be of several different data types (string, integer, and so on), but the keys are strings, and a document in MongoDB cannot contain duplicate keys.

{"greeting" : "Hello, world!", "foo" : 3}

A collection is a group of documents and has a dynamic schema.
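
As a rough illustration of the document model described above, the following plain JavaScript sketch (no MongoDB driver or server involved) builds a document with an array and a nested sub-document, and models a collection as a list of differently shaped documents. All field names other than "greeting" and "foo" are made up for this example.

```javascript
// A document is a set of string keys with typed values; values may be
// arrays or nested sub-documents. Field names other than "greeting"
// and "foo" are hypothetical.
const doc = {
  greeting: "Hello, world!",
  foo: 3,
  tags: ["intro", "example"],        // array value
  author: { name: "Ann", age: 30 }   // sub-document
};

// Keys are strings and are unique within one document.
const keys = Object.keys(doc);

// Documents in the same collection need not share a structure
// (dynamic schema), so a collection is modeled here as an array
// of differently shaped objects.
const collection = [doc, { greeting: "Hi", lang: "en" }];

console.log(keys);
console.log(collection.length);
```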


Storage Model

MongoDB uses memory-mapped files that directly map a data file on disk to a byte array in memory, where data access is implemented using pointer arithmetic.

Each document collection is stored in one namespace file as well as multiple extent data files.

Each collection is organized as a linked list of extents, each of which represents a contiguous region of disk space, and each document contains links to other documents as well as the actual data encoded in BSON format.

MongoDB’s high availability is achieved via a replica set, which provides data redundancy across multiple physical servers, including a single primary DB and multiple secondary DBs.

All modification requests go to the primary DB; each modification is applied there and then replicated asynchronously to the secondary DBs.
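
The write path described above can be pictured with a toy model. The sketch below is plain JavaScript and only an illustration under simplifying assumptions: the class name ReplicaSetModel and its methods are hypothetical, and the real replication protocol is considerably more involved.

```javascript
// Toy model of a replica set: one primary and several secondaries,
// each holding an ordered log of applied modifications.
// Illustrative only; this is not MongoDB's actual protocol.
class ReplicaSetModel {
  constructor(secondaryCount) {
    this.primary = [];
    this.secondaries = Array.from({ length: secondaryCount }, () => []);
    this.pending = [];   // modifications not yet shipped to secondaries
  }

  // All modification requests go to the primary first.
  write(modification) {
    this.primary.push(modification);
    this.pending.push(modification);
  }

  // Replication happens later (asynchronously in a real system).
  replicate() {
    for (const modification of this.pending) {
      for (const secondary of this.secondaries) {
        secondary.push(modification);
      }
    }
    this.pending = [];
  }
}

const rs = new ReplicaSetModel(2);
rs.write({ set: { title: "My Blog Post" } });
// Before replication runs, the secondaries are behind the primary.
const lagBefore = rs.primary.length - rs.secondaries[0].length;
rs.replicate();
// After replication, the secondaries have caught up.
const lagAfter = rs.primary.length - rs.secondaries[0].length;
console.log(lagBefore, lagAfter);
```

The gap between the two measurements is the point of the model: a read served by a secondary between write() and replicate() would not yet see the modification.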


ACID in MongoDB

Data that is read is treated as a snapshot, which means it may have been changed in the database since it was read.

In order to maintain consistency, a condition is attached to the modification request so that the DB server can validate the condition before applying the modification.

One way to achieve this isolation is to use the findAndModify operation. This command returns either the previous or the updated values of the document.

The transaction concept is also missing in MongoDB (as of the 2.4 release used in these notes), so there is no guarantee for updates that span multiple documents. In this case, developers are responsible for implementing multi-document updates themselves.

One approach: a separate document is created that links all the documents that need to be modified. Then the modifications are applied in sequence to each document.
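
The idea of attaching a condition to a modification can be sketched in plain JavaScript. The example below uses an in-memory Map as a stand-in for a collection; the helper name conditionalUpdate and the "version" field are assumptions made for illustration and are not part of MongoDB.

```javascript
// In-memory stand-in for a collection, keyed by _id.
// conditionalUpdate is a hypothetical helper; it only mimics the idea of
// "apply the change only if the attached condition still holds".
const store = new Map([[1, { _id: 1, stock: 5, version: 1 }]]);

function conditionalUpdate(id, condition, change) {
  const current = store.get(id);
  if (!current) return null;
  // Validate the condition against the current state, not against the
  // snapshot the client read earlier.
  for (const [key, expected] of Object.entries(condition)) {
    if (current[key] !== expected) return null;   // changed by someone else
  }
  const previous = { ...current };
  store.set(id, { ...current, ...change });
  return previous;   // the pre-update document, as findAndModify can return
}

// Two clients both read version 1 as their snapshot, then both try to write.
const first = conditionalUpdate(1, { version: 1 }, { stock: 4, version: 2 });
const second = conditionalUpdate(1, { version: 1 }, { stock: 4, version: 2 });
console.log(first !== null, second === null);
```

Only the first request is applied; the second fails its condition because the document no longer matches the snapshot, so the client has to re-read and retry.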


Major Differences from RDBMS

An RDBMS has a fixed number of data types, while MongoDB documents can contain multi-valued fields because they have a nested structure.

Documents of any structure can be stored in the same collection without a defined schema.

MongoDB has no concept of transactions.

Atomicity is guaranteed only at the document level.

There is also no concept of isolation, which means any data read by one client may have its value modified by another concurrent client.


MongoDB Term vs SQL Term

https://docs.mongodb.com/manual/reference/sql-comparison/

SQL Term/Concept                    MongoDB Term/Concept
Database                            Database
Table                               Collection
Row                                 Document or BSON document
Column                              Field
Index                               Index
Table join                          Embedded document / linking
Primary key (any unique column      Primary key (automatically set
or column set specified as PK)      to the _id field)
Aggregation (with GROUP BY)         See SQL to Aggregation Mapping Chart

SQL to Aggregation Mapping Chart

SQL Term, Function, or Concept      MongoDB Aggregation Operator
WHERE                               $match
GROUP BY                            $group
HAVING                              $match
SELECT                              $project
ORDER BY                            $sort
LIMIT                               $limit
SUM                                 $sum
COUNT                               $count
JOIN                                $lookup (with $unwind for arrays)
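
To make the mapping concrete, here is a small sketch of how a SQL query of the form SELECT stock_symbol, SUM(stock_volume) ... WHERE ... GROUP BY ... ORDER BY ... DESC might be expressed as a pipeline. The pipeline is shown only as data; the code then computes the same result in plain JavaScript so it runs without a server. The sample rows follow the shape of the NASDAQ price data used later in these notes, and the threshold 2.9 is an arbitrary choice.

```javascript
// The pipeline as it might be passed to an aggregate() call.
// It is only shown as data here; nothing below talks to a server.
const pipeline = [
  { $match: { stock_price_open: { $gte: 2.9 } } },                        // WHERE
  { $group: { _id: "$stock_symbol", total: { $sum: "$stock_volume" } } }, // GROUP BY + SUM
  { $sort: { total: -1 } }                                                // ORDER BY ... DESC
];

// Sample rows in the shape of the NASDAQ data (fields trimmed).
const rows = [
  { stock_symbol: "BBND", stock_price_open: 2.92, stock_volume: 483800 },
  { stock_symbol: "BBND", stock_price_open: 2.85, stock_volume: 884000 },
  { stock_symbol: "BBND", stock_price_open: 3.05, stock_volume: 513100 },
  { stock_symbol: "BOLT", stock_price_open: 51, stock_volume: 1109600 }
];

// Plain-JavaScript equivalent of the three stages.
const matched = rows.filter(r => r.stock_price_open >= 2.9);
const totals = new Map();
for (const r of matched) {
  totals.set(r.stock_symbol, (totals.get(r.stock_symbol) || 0) + r.stock_volume);
}
const result = [...totals]
  .map(([_id, total]) => ({ _id, total }))
  .sort((a, b) => b.total - a.total);

console.log(result);
```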


MongoDB API

MongoDB Drivers and Client Libraries:

MongoDB supports idiomatic drivers in over ten languages:

Java, .NET, Ruby, Node.js, Perl, Python, PHP, C, C++, C#, JavaScript, and Scala

Interface for Thrift or RESTful APIs

https://docs.mongodb.com/getting-started/shell/


Installation

MongoDB 2.4.9 (mongodb-osx-x86_64-2.4.9)

To start a MongoDB instance:

$ mongod

mongod --help for help and startup options

Tue Apr 1 15:19:17.445 [initandlisten] MongoDB starting : pid=616 port=27017 dbpath=/data/db/ 64-bit host=Thuats-MacBook-Pro.local

Tue Apr 1 15:19:17.445 [initandlisten]

Tue Apr 1 15:19:17.445 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000

Tue Apr 1 15:19:17.445 [initandlisten] db version v2.4.9

Tue Apr 1 15:19:17.445 [initandlisten] git version:


MongoDB Shell

MongoDB comes with a JavaScript shell that allows interaction with a MongoDB instance from the command line.

The shell is a full-featured JavaScript interpreter, capable of running JavaScript programs.

To start the shell:

$ mongo

MongoDB shell version: 2.4.9

connecting to: test

Welcome to the MongoDB shell.

For interactive help, type "help".

For more comprehensive documentation, see

http://docs.mongodb.org/

Questions? Try the support group

http://groups.google.com/group/mongodb-user

Server has startup warnings:

Tue Apr 1 15:19:17.445 [initandlisten]

Tue Apr 1 15:19:17.445 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000

>


MongoDB Command

To show the current databases:

> show dbs

local 0.078125GB

To create a new database (or switch to an existing one):

> use blog

If the database already exists, the shell switches to it; otherwise MongoDB creates it when data is first stored in it.


MongoDB CRUD Query

https://docs.mongodb.com/manual/reference/sql-comparison/

CRUD operations are used to manipulate and view data in the shell.

Create a new document:

> post = {"title": "My Blog Post",

"content" : "This is a blog post.",

"data" : new Date()}

{"title" : "My Blog Post",

"content" : "This is a blog post.",

"data" : ISODate("2014-04-01T19:39:36.521Z")}

‘post’ is a JavaScript object that represents the document; it has three keys: ‘title’, ‘content’, and ‘data’ (which holds the creation date).


MongoDB CRUD Query

Insert into collection:

> db.blog.insert(post)

To see the collection:

> db.blog.find()

{ "_id" : ObjectId("533b16898bce20d2fd851cfc"), "title" : "My Blog Post", "content" : "This is a blog post.", "data" : ISODate("2014-04-01T19:39:36.521Z") }

> db.blog.findOne()

{

"_id" : ObjectId("533b16898bce20d2fd851cfc"),

"title" : "My Blog Post",

"content" : "This is a blog post.",

"data" : ISODate("2014-04-01T19:39:36.521Z")

}


MongoDB CRUD Query

To see how MongoDB executed that query:

> db.blog.find().explain()

{

"cursor" : "BasicCursor",

"isMultiKey" : false,

"n" : 1,

"nscannedObjects" : 1,

"nscanned" : 1,

"nscannedObjectsAllPlans" : 1,

"nscannedAllPlans" : 1,

"scanAndOrder" : false,

"indexOnly" : false,

"nYields" : 0,

"nChunkSkips" : 0,

"millis" : 0,

"indexBounds" : {

},

"server" : "SUNNIEW7.local:27017"

}


MongoDB CRUD Query

To update:

> post.comments = []

[ ]

> db.blog.update({title: "My Blog Post"}, post)

> db.blog.findOne()

{

"_id" : ObjectId("533b16898bce20d2fd851cfc"),

"title" : "My Blog Post",

"content" : "This is a blog post.",

"data" : ISODate("2014-04-01T19:39:36.521Z"),

"comments" : [ ]

}
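
Note that the update above passes a whole document rather than update operators, so the matched document is replaced by "post" while keeping its _id. The following plain JavaScript sketch mimics that behavior on an in-memory array; replaceFirstMatch is a hypothetical helper, not a MongoDB function, and the _id value "a1" is a placeholder.

```javascript
// Sketch of what a replacement-style update does to the stored document.
// replaceFirstMatch is a hypothetical helper, not a MongoDB function.
function replaceFirstMatch(collection, query, replacement) {
  const index = collection.findIndex(doc =>
    Object.entries(query).every(([key, value]) => doc[key] === value));
  if (index === -1) return false;
  // The stored _id is kept; every other field comes from the replacement.
  collection[index] = { _id: collection[index]._id, ...replacement };
  return true;
}

const blog = [{ _id: "a1", title: "My Blog Post", content: "This is a blog post." }];
const post = { title: "My Blog Post", content: "This is a blog post.", comments: [] };

const updated = replaceFirstMatch(blog, { title: "My Blog Post" }, post);
console.log(updated, blog[0]);
```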


MongoDB CRUD Query

To delete:

> db.blog.remove({title : "My Blog Post"})

> db.blog.findOne()

null

To build an index (ensureIndex in the 2.4 shell used here; newer releases use createIndex):

> db.blog.ensureIndex({title:1})


MongoDB CRUD Query

To show all existing indexes:

> db.blog.getIndexes()

[

{

"v" : 1,

"key" : {

"_id" : 1

},

"ns" : "blog.blog",

"name" : "_id_"

},

{

"v" : 1,

"key" : {

"title" : 1

},

"ns" : "blog.blog",

"name" : "title_1"

}

]


MongoDB CRUD Query

To remove an index:

> db.blog.dropIndex({title:1})

{ "nIndexesWas" : 2, "ok" : 1 }


MongoDB Application

MongoDB Drivers and Client Libraries:

MongoDB supports a variety of modern programming languages, including C, C++, C#, Java, Node.js, PHP, Python…


MongoDB Import/Export

MongoDB can import files in JSON, CSV, or TSV format with mongoimport, and can export a database to those formats with mongoexport.

Syntax:

mongoimport --collection collection --file collection.json

mongoexport --collection collection --out collection.json


MongoDB Import/Export

Import a CSV file (NASDAQ_daily_prices_B.csv) into the nasdaq_daily_prices collection of the stocks database:

$ cat NASDAQ_daily_prices_B.csv

exchange,stock_symbol,date,stock_price_open,stock_price_high,stock_price_low,stock_price_close,stock_volume,stock_price_adj_close

NASDAQ,BBND,2010-02-08,2.92,2.98,2.86,2.96,483800,2.96

NASDAQ,BBND,2010-02-05,2.85,2.94,2.79,2.93,884000,2.93

NASDAQ,BBND,2010-02-04,2.83,2.88,2.78,2.83,1333300,2.83

NASDAQ,BBND,2010-02-03,2.98,3.03,2.80,2.83,1015800,2.83

NASDAQ,BBND,2010-02-02,3.05,3.10,2.96,2.97,513100,2.97

NASDAQ,BBND,2010-02-01,3.11,3.13,3.00,3.04,997000,3.04

NASDAQ,BBND,2010-01-29,3.01,3.14,2.96,3.14,1132900,3.14


MongoDB Import/Export

$ mongoimport --db stocks --collection nasdaq_daily_prices --type csv --file /Users/nqt289/Desktop/NASDAQ_daily_prices_B.csv --headerline

connected to: 127.0.0.1

Thu Apr 10 05:24:46.009 Progress: 780677/21998523 3%

Thu Apr 10 05:24:46.009 14000 4666/second

Thu Apr 10 05:24:49.004 Progress: 2011431/21998523 9%

Thu Apr 10 05:24:49.004 36200 6033/second

Thu Apr 10 05:24:52.004 Progress: 3300955/21998523 15%

Thu Apr 10 05:24:52.004 58600 6511/second

Thu Apr 10 05:24:55.005 Progress: 4575925/21998523 20%

Thu Apr 10 05:24:55.006 81300 6775/second

Thu Apr 10 05:24:58.009 Progress: 5845580/21998523 26%

Thu Apr 10 05:24:58.009 104000 6933/second

Thu Apr 10 05:25:34.005 374000 7333/second

Thu Apr 10 05:25:35.956 check 9 388777

Thu Apr 10 05:25:35.956 imported 388776 objects


MongoDB Import/Export

Check the resulting collection in the shell:

> show dbs

blog 0.203125GB

local 0.078125GB

stocks 0.453125GB

> use stocks

switched to db stocks

> show tables

nasdaq_daily_prices

system.indexes


MongoDB Import/Export

> db.nasdaq_daily_prices.find().limit(5)

{ "_id" : ObjectId("5346635c6857e587111a2466"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-08", "stock_price_open" : 2.92, "stock_price_high" : 2.98, "stock_price_low" : 2.86, "stock_price_close" : 2.96, "stock_volume" : 483800, "stock_price_adj_close" : 2.96 }

{ "_id" : ObjectId("5346635c6857e587111a2467"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-05", "stock_price_open" : 2.85, "stock_price_high" : 2.94, "stock_price_low" : 2.79, "stock_price_close" : 2.93, "stock_volume" : 884000, "stock_price_adj_close" : 2.93 }

{ "_id" : ObjectId("5346635c6857e587111a2468"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-04", "stock_price_open" : 2.83, "stock_price_high" : 2.88, "stock_price_low" : 2.78, "stock_price_close" : 2.83, "stock_volume" : 1333300, "stock_price_adj_close" : 2.83 }

{ "_id" : ObjectId("5346635c6857e587111a2469"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-03", "stock_price_open" : 2.98, "stock_price_high" : 3.03, "stock_price_low" : 2.8, "stock_price_close" : 2.83, "stock_volume" : 1015800, "stock_price_adj_close" : 2.83 }

{ "_id" : ObjectId("5346635c6857e587111a246a"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-02", "stock_price_open" : 3.05, "stock_price_high" : 3.1, "stock_price_low" : 2.96, "stock_price_close" : 2.97, "stock_volume" : 513100, "stock_price_adj_close" : 2.97 }
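
In the imported documents above, the CSV header line supplied the field names, numeric columns became numbers, and the date column stayed a string. The sketch below imitates that mapping in plain JavaScript. It is an assumption-laden illustration: parseCsvWithHeader is a hypothetical helper that handles only simple unquoted CSV, and it is not how mongoimport is implemented.

```javascript
// Rough sketch of how a header line can turn CSV rows into documents.
// parseCsvWithHeader is a hypothetical helper; it handles only simple,
// unquoted CSV and is not how mongoimport is implemented.
function parseCsvWithHeader(text) {
  const [headerLine, ...lines] = text.trim().split("\n");
  const fields = headerLine.split(",");
  return lines.map(line => {
    const values = line.split(",");
    const doc = {};
    fields.forEach((field, i) => {
      const raw = values[i];
      // Store numeric-looking values as numbers, everything else as strings.
      doc[field] = raw !== "" && !Number.isNaN(Number(raw)) ? Number(raw) : raw;
    });
    return doc;
  });
}

// Two rows from the sample file, with some columns left out for brevity.
const csv = `exchange,stock_symbol,date,stock_price_open,stock_volume
NASDAQ,BBND,2010-02-08,2.92,483800
NASDAQ,BBND,2010-02-03,2.98,1015800`;

const docs = parseCsvWithHeader(csv);
console.log(docs[0]);
```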


MongoDB Import/Export

Export that collection to JSON format:

$ mongoexport -d stocks -c nasdaq_daily_prices -q "{stock_price_open: { \$gte: 50 }}" --out /Users/nqt289/Desktop/gte50.json

connected to: 127.0.0.1


MongoDB Import/Export

exported 9911 records

$ cat gte50.json

{ "_id" : { "$oid" : "5346635d6857e587111a4cda" }, "exchange" : "NASDAQ", "stock_symbol" : "BOLT", "date" : "2007-07-25", "stock_price_open" : 51, "stock_price_high" : 51.47, "stock_price_low" : 44.1, "stock_price_close" : 47.04, "stock_volume" : 1109600, "stock_price_adj_close" : 31.36 }

{ "_id" : { "$oid" : "5346635d6857e587111a4cdb" }, "exchange" : "NASDAQ", "stock_symbol" : "BOLT", "date" : "2007-07-24", "stock_price_open" : 52.4, "stock_price_high" : 52.4, "stock_price_low" : 48.55, "stock_price_close" : 49.43, "stock_volume" : 650600, "stock_price_adj_close" : 32.95 }
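
The -q option in the export command selects only documents whose stock_price_open is at least 50. As a minimal sketch of what such a query means, the plain JavaScript below applies the same condition to a few in-memory documents. matchesGte is a hypothetical helper that understands only the $gte operator.

```javascript
// Minimal matcher for queries of the form { field: { $gte: number } }.
// matchesGte is a hypothetical helper covering only this one operator.
function matchesGte(doc, query) {
  return Object.entries(query).every(
    ([field, condition]) => doc[field] >= condition.$gte);
}

const query = { stock_price_open: { $gte: 50 } };

// Trimmed sample documents in the shape of the nasdaq_daily_prices data.
const sample = [
  { stock_symbol: "BOLT", date: "2007-07-25", stock_price_open: 51 },
  { stock_symbol: "BOLT", date: "2007-07-24", stock_price_open: 52.4 },
  { stock_symbol: "BBND", date: "2010-02-08", stock_price_open: 2.92 }
];

const exported = sample.filter(doc => matchesGte(doc, query));
console.log(exported.length);
```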


Cassandra

Cassandra is an open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Cassandra achieves the highest throughput for the maximum number of nodes in all experiments.

Decentralized:

Every node in the cluster has the same role. There is no single point of failure.

Data is distributed across the cluster (so each node contains different data), but there is no master as every node can service any request.

Supports replication, including multi-data-center replication.


Replication for multiple data centers

Replication strategies are configurable.

Cassandra is designed as a distributed system, for deployment of large numbers of nodes across multiple data centers.

Key features of Cassandra’s distributed architecture are specifically tailored for multiple-data-center deployment, for redundancy, and for failover and disaster recovery.


Cassandra

Scalability

Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.

Fault-tolerant

Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.

Tunable consistency

Writes and reads offer a tunable level of consistency, all the way from "writes never fail" to "block for all replicas to be readable", with the quorum level in the middle.

MapReduce support

Cassandra has Hadoop integration, with MapReduce support. There is support also for Apache Pig and Apache Hive.
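
The "tunable consistency" item above comes down to simple arithmetic: with N replicas, a read that contacts R replicas and a write that contacts W replicas must overlap in at least one replica whenever R + W > N. The sketch below works through that rule in plain JavaScript; the function names are made up for this example, and a quorum is computed as floor(N / 2) + 1.

```javascript
// With N replicas, a read of R replicas and a write of W replicas are
// guaranteed to overlap in at least one replica when R + W > N.
function quorum(replicationFactor) {
  return Math.floor(replicationFactor / 2) + 1;
}

function readsSeeLatestWrite(n, r, w) {
  return r + w > n;
}

const n = 3;
const q = quorum(n);   // 2 of 3 replicas

// Quorum reads combined with quorum writes always overlap.
console.log(q, readsSeeLatestWrite(n, q, q));

// Reading one replica after writing one replica may miss the latest write.
console.log(readsSeeLatestWrite(n, 1, 1));

// Writing to all replicas lets a single-replica read see the latest write.
console.log(readsSeeLatestWrite(n, 1, n));
```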


Cassandra's data model

Cassandra's data model is a partitioned row store with tunable consistency.

Rows are organized into tables.

The first component of a table's primary key is the partition key.

Within a partition, rows are clustered by the remaining columns of the key.

Other columns may be indexed separately from the primary key.

Cassandra doesn't support joins or subqueries.
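
To picture the partitioned row store described above, the following plain JavaScript sketch groups rows by a partition key and orders them inside each partition by a clustering column. It is a toy model only: the column names (symbol, day, close) and the helper buildPartitions are hypothetical, and a real cluster would also hash the partition key to decide which nodes hold each partition.

```javascript
// Toy model of a partitioned row store: the first primary-key column
// picks the partition, the remaining key column orders rows inside it.
// Table and column names are hypothetical.
function buildPartitions(rows, partitionKey, clusteringKey) {
  const partitions = new Map();
  for (const row of rows) {
    const key = row[partitionKey];
    if (!partitions.has(key)) partitions.set(key, []);
    partitions.get(key).push(row);
  }
  // Within each partition, keep rows sorted by the clustering column.
  for (const partitionRows of partitions.values()) {
    partitionRows.sort((a, b) => {
      if (a[clusteringKey] < b[clusteringKey]) return -1;
      if (a[clusteringKey] > b[clusteringKey]) return 1;
      return 0;
    });
  }
  return partitions;
}

const prices = [
  { symbol: "BBND", day: "2010-02-08", close: 2.96 },
  { symbol: "BOLT", day: "2007-07-25", close: 47.04 },
  { symbol: "BBND", day: "2010-02-03", close: 2.83 }
];

const partitions = buildPartitions(prices, "symbol", "day");
console.log(partitions.get("BBND").map(r => r.day));
```

Because all rows of one partition live together and are already ordered, a query for one symbol over a range of days touches a single partition, which fits a model that has no joins or subqueries.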


Query language

CQL (Cassandra Query Language) was introduced as a SQL-like alternative to the traditional RPC interface.

RDBMS                               Cassandra
Structured data, fixed schema       Unstructured data, flexible schema
Database                            Keyspace
Table                               Table, column family
Row                                 Row; the partition is the unit of replication
Column                              Column [value, name, timestamp], a.k.a. CLUSTER; unit of cluster
Join, foreign key, ACID consistency Referential integrity is not enforced, but relationships are represented using collections


Cassandra Installation

Prerequisite

Cassandra requires a stable version of Java 7 to be installed.

Download a stable version of Cassandra:

http://cassandra.apache.org/download/

Add the DataStax Community repository to /etc/apt/sources.list.d/cassandra.sources.list:

$ echo "deb http://debian.datastax.com/community stable main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list

Add the DataStax repository key to your aptitude trusted keys:

$ curl -L http://debian.datastax.com/debian/repo_key | sudo apt-key add -

Install the package:

$ sudo apt-get update

$ sudo apt-get install cassandra
