MongoDB
SUNNIE CHUNG
CIS 612
MongoDB
http://www.mongodb.org/
MongoDB is an open-source database
classified as a NoSQL database.
The primary reasons for the development of
MongoDB were to make scaling data easier and
to support semi-structured data.
MongoDB is a document-oriented database
in which data is organized as JSON-like
documents and stored in collections.
2
Architecture
MongoDB is a NoSQL database, which means the mechanism for
storage and retrieval of data is modeled by means other
than the tabular relations used in relational databases.
It has rich data structures with dynamic attributes, mixed
structure, text, media, arrays and other complex types.
MongoDB is flexible: its data model can evolve over time to
accommodate new features and requirements.
Object-oriented programming languages interact with
data in structures that are dramatically different from the
way data is stored in a relational database.
3
Features
Data is stored in a structure that maps to objects in modern
object-oriented programming languages.
Rich index and query support, including secondary indexes,
geospatial and text search indexes, native MapReduce…
System capacity can be increased dynamically.
Supports data replication and fault tolerance.
Data is read and written through memory-mapped files, so frequently
used data is served from RAM, which provides fast performance.
4
NoSQL Data Models
Document Model: MongoDB and CouchDB
Document databases store data in documents (JSON).
Graph Model: Neo4j and Giraph
Graph structures with nodes, edges and properties are used to
represent data.
Key-Value and Wide Column Models
Redis (Key-Value), HBase and Cassandra (Wide Column)
5
MongoDB Data Model
MongoDB is a document database: it stores data as documents in a binary representation called BSON (Binary JSON).
BSON extends the JSON (JavaScript Object Notation) representation to include additional types such as int, long, and floating point.
BSON documents contain one or more fields, and each field contains a value of
a specific data type, including arrays, binary data and sub-documents.
A document and a collection can be seen as roughly equivalent to a record and a table in
a relational database system.
A document is an ordered set of keys with associated values. The values can be of several different data types (string, integer, etc.), but the keys are strings,
and a document in MongoDB cannot contain duplicate keys.
{"greeting" : "Hello, world!", "foo" : 3}
A collection is a group of documents and has a dynamic schema.
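As a small illustration of the document model described above, the sketch below builds a document as a plain JavaScript object with a nested sub-document and an array. The fields author and tags are made up for illustration, and the code runs in any JavaScript engine without a MongoDB server.

```javascript
// A document is a set of string keys with typed values.
// "author" and "tags" are hypothetical fields added for illustration.
const doc = {
  greeting: "Hello, world!",          // string
  foo: 3,                             // number
  author: { name: "sam", age: 30 },   // sub-document
  tags: ["intro", "demo"]             // array
};

// Keys are always strings, and they keep their insertion order.
const keys = Object.keys(doc);
console.log(keys);                    // [ 'greeting', 'foo', 'author', 'tags' ]

// Assigning to an existing key overwrites it rather than adding a duplicate.
doc.foo = 4;
console.log(Object.keys(doc).length); // 4
```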
6
Storage Model
MongoDB uses memory-mapped files that directly map a data file on
disk to a byte array in memory, where data access is implemented using pointer arithmetic.
Each document collection is stored in one namespace file as well as multiple extent data files.
Each collection is organized as a linked list of extents, each of which represents a contiguous disk space, and each document contains links to other documents as well as the actual data encoded in BSON format.
MongoDB’s high availability is achieved via a replica set, which provides data redundancy across multiple physical servers, with a single primary DB and multiple secondary DBs.
All modification requests go to the primary DB; each modification is applied there and then replicated asynchronously to the secondary DBs.
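The asynchronous replication described above can be sketched with a toy in-memory model. This is an illustration only: the class and method names (ToyReplicaSet, write, replicate) are invented here and are not part of MongoDB.

```javascript
// Toy model: one primary, several secondaries, and a queue of pending operations.
class ToyReplicaSet {
  constructor(secondaryCount) {
    this.primary = [];
    this.secondaries = Array.from({ length: secondaryCount }, () => []);
    this.pending = [];                 // operations not yet replicated
  }
  // All writes go to the primary first and are queued for replication.
  write(doc) {
    this.primary.push(doc);
    this.pending.push(doc);
  }
  // Later, the queued operations are applied to every secondary.
  replicate() {
    for (const doc of this.pending) {
      for (const secondary of this.secondaries) secondary.push(doc);
    }
    this.pending = [];
  }
}

const rs = new ToyReplicaSet(2);
rs.write({ title: "My Blog Post" });
const lagBefore = rs.primary.length - rs.secondaries[0].length; // secondary is behind
rs.replicate();
const lagAfter = rs.primary.length - rs.secondaries[0].length;  // secondary caught up
console.log(lagBefore, lagAfter); // 1 0
```

Because replication is asynchronous, a read served by a secondary may briefly return older data than the primary holds.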
7
ACID in MongoDB
Data that has been read is treated as a snapshot, which means it may since have
been changed in the database.
In order to maintain consistency, a condition is attached to the
modification request so that the DB server can validate the
condition before applying the modification.
One way to achieve this isolation is to use the findAndModify operation.
This command returns either the previous or the updated values of the
document.
The transaction concept is also missing in the MongoDB version covered here (2.4), so there is no
atomicity guarantee for updates that span multiple documents. In this case, developers are
responsible for implementing multi-document updates themselves.
A separate document is created that links all the documents that need
to be modified. Then the modifications are applied in sequence to
each document.
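The idea of attaching a condition to a modification request can be sketched with an in-memory stand-in for a collection. The version field and the conditionalUpdate helper are hypothetical and are not MongoDB API; in the shell, the same idea is expressed by putting the condition into the query part of update or findAndModify.

```javascript
// In-memory stand-in for a collection; not the MongoDB API.
const accounts = [{ _id: 1, balance: 100, version: 1 }];

// Apply `changes` only if a document still matches every field in `condition`.
// Returns the number of documents modified (0 or 1).
function conditionalUpdate(collection, condition, changes) {
  const doc = collection.find(d =>
    Object.keys(condition).every(k => d[k] === condition[k]));
  if (!doc) return 0;                 // condition no longer holds: reject the stale write
  Object.assign(doc, changes);
  return 1;
}

// Two clients read the same snapshot.
const snapshotA = { ...accounts[0] };
const snapshotB = { ...accounts[0] };

// Client A writes first; its condition (version 1) still holds.
const resultA = conditionalUpdate(accounts,
  { _id: 1, version: snapshotA.version },
  { balance: snapshotA.balance - 30, version: snapshotA.version + 1 });

// Client B's snapshot is now stale, so its condition fails and nothing changes.
const resultB = conditionalUpdate(accounts,
  { _id: 1, version: snapshotB.version },
  { balance: snapshotB.balance - 50, version: snapshotB.version + 1 });

console.log(resultA, resultB, accounts[0]); // 1 0 { _id: 1, balance: 70, version: 2 }
```

Client B would have to re-read the document and retry, which is the usual pattern for this kind of optimistic check.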
8
Major Differences from RDBMS
An RDBMS has a fixed set of data types with single-valued columns, while
MongoDB documents can contain multi-valued fields because they have a nested structure.
Documents of any structure can be stored in the
same collection without a defined schema.
In the version covered here (2.4), MongoDB has no concept of transactions; multi-document transactions were only added later, in MongoDB 4.0.
Atomicity is guaranteed only at the document level.
There is also no concept of isolation, which
means any data read by one client may have its
value modified by another concurrent client.
9
MongoDB Term vs SQL Term
https://docs.mongodb.com/manual/reference/sql-comparison/
SQL Term/Concept                                 MongoDB Term/Concept
Database                                         Database
Table                                            Collection
Row                                              Document or BSON document
Column                                           Field
Index                                            Index
Table join                                       Embedded document / linking
Primary key (any unique column or column set)    Primary key (automatically set to the _id field)
Aggregation (with GROUP BY)                      See SQL to Aggregation Mapping Chart
10
SQL to Aggregation Mapping Chart
SQL Term, Function, Concept    MongoDB Aggregation Operator
WHERE                          $match
GROUP BY                       $group
HAVING                         $match
SELECT                         $project
ORDER BY                       $sort
LIMIT                          $limit
SUM                            $sum
COUNT                          $count
JOIN                           $lookup (with $unwind for arrays)
11
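To make the mapping chart concrete, here is a sketch of how one SQL query could be written as an aggregation pipeline. The collection and field names are borrowed from the stock-price import example later in these notes; the thresholds (3 and 1000000) are arbitrary values chosen for illustration. The pipeline is an ordinary JavaScript array, so the snippet runs without a server.

```javascript
// SQL equivalent (illustrative):
//   SELECT stock_symbol, SUM(stock_volume) AS total_volume
//   FROM nasdaq_daily_prices
//   WHERE stock_price_close >= 3
//   GROUP BY stock_symbol
//   HAVING SUM(stock_volume) > 1000000
//   ORDER BY total_volume DESC
//   LIMIT 5
const pipeline = [
  { $match: { stock_price_close: { $gte: 3 } } },          // WHERE
  { $group: { _id: "$stock_symbol",                         // GROUP BY
              total_volume: { $sum: "$stock_volume" } } },  // SUM
  { $match: { total_volume: { $gt: 1000000 } } },           // HAVING
  { $sort: { total_volume: -1 } },                          // ORDER BY ... DESC
  { $limit: 5 }                                             // LIMIT
];

// In the mongo shell the array would be passed to the collection:
//   db.nasdaq_daily_prices.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Note that $match appears twice: before $group it plays the role of WHERE, and after $group it plays the role of HAVING.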
MongoDB API
MongoDB Drivers and Client Libraries:
MongoDB supports idiomatic drivers in over ten
languages:
Java, .NET, Ruby, Node.js, Perl, Python, PHP, C,
C++, C#, JavaScript, and Scala
Interfaces for Thrift or RESTful APIs
https://docs.mongodb.com/getting-started/shell/
12
Installation
MongoDB 2.4.9 (mongodb-osx-x86_64-2.4.9)
To start a MongoDB instance:
$ mongod
mongod --help for help and startup options
Tue Apr 1 15:19:17.445 [initandlisten] MongoDB starting : pid=616 port=27017 dbpath=/data/db/ 64-bit host=Thuats-MacBook-Pro.local
Tue Apr 1 15:19:17.445 [initandlisten]
Tue Apr 1 15:19:17.445 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
Tue Apr 1 15:19:17.445 [initandlisten] db version v2.4.9
Tue Apr 1 15:19:17.445 [initandlisten] git version:
…
13
MongoDB Shell
MongoDB comes with a JavaScript shell that allows interaction with a MongoDB
instance from the command line.
The shell is a full-featured JavaScript interpreter, capable of running JavaScript programs.
To start the shell:
$ mongo
MongoDB shell version: 2.4.9
connecting to: test
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
http://docs.mongodb.org/
Questions? Try the support group
http://groups.google.com/group/mongodb-user
Server has startup warnings:
Tue Apr 1 15:19:17.445 [initandlisten]
Tue Apr 1 15:19:17.445 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
>
14
MongoDB Command
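To show the current databases:
> show dbs
local 0.078125GB
To create a new database (or switch to an existing one):
> use blog
If the database already exists, the shell switches to it; otherwise the database is created when data is first stored in it.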
15
MongoDB CRUD Query
https://docs.mongodb.com/manual/reference/sql-comparison/
CRUD operations are used to manipulate and view data in the shell.
Create a new document:
> post = {"title": "My Blog Post",
"content" : "This is a blog post.",
"data" : new Date()}
{"title" : "My Blog Post",
"content" : "This is a blog post.",
"data" : ISODate("2014-04-01T19:39:36.521Z")}
‘post’ is a JavaScript object that represents the document; it has three keys: ‘title’, ‘content’, and ‘data’ (which holds the creation date).
16
MongoDB CRUD Query
Insert into collection:
> db.blog.insert(post)
To see the collection:
> db.blog.find()
{ "_id" : ObjectId("533b16898bce20d2fd851cfc"), "title" : "My Blog Post", "content" : "This is a blog post.", "data" : ISODate("2014-04-01T19:39:36.521Z") }
> db.blog.findOne()
{
"_id" : ObjectId("533b16898bce20d2fd851cfc"),
"title" : "My Blog Post",
"content" : "This is a blog post.",
"data" : ISODate("2014-04-01T19:39:36.521Z")
}
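The _id value shown above is an ObjectId that MongoDB generated automatically. Its first four bytes encode the creation time in seconds since the Unix epoch, which the following sketch decodes with plain JavaScript. No MongoDB connection is assumed; in the shell, ObjectId values also offer getTimestamp() for the same purpose.

```javascript
// The ObjectId from the find() output, as a 24-character hex string.
const idHex = "533b16898bce20d2fd851cfc";

// The first 8 hex characters (4 bytes) are a Unix timestamp in seconds.
const seconds = parseInt(idHex.slice(0, 8), 16);
const createdAt = new Date(seconds * 1000);

console.log(seconds);                 // 1396381321
console.log(createdAt.toISOString()); // 2014-04-01T19:42:01.000Z
```

The decoded time falls a few minutes after the date stored in the post itself, which is consistent with the object having been created in the shell first and inserted afterwards.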
17
MongoDB CRUD Query
To see how MongoDB executed that query:
> db.blog.find().explain()
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 1,
"nscannedObjects" : 1,
"nscanned" : 1,
"nscannedObjectsAllPlans" : 1,
"nscannedAllPlans" : 1,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
},
"server" : "SUNNIEW7.local:27017"
}
18
MongoDB CRUD Query
To update:
> post.comments = []
[ ]
> db.blog.update({title: "My Blog Post"}, post)
> db.blog.findOne()
{
"_id" : ObjectId("533b16898bce20d2fd851cfc"),
"title" : "My Blog Post",
"content" : "This is a blog post.",
"data" : ISODate("2014-04-01T19:39:36.521Z"),
"comments" : [ ]
}
19
MongoDB CRUD Query
To delete:
> db.blog.remove({title : "My Blog Post"})
> db.blog.findOne()
null
To build an index:
> db.blog.ensureIndex({title:1})
20
MongoDB CRUD Query
To show all existing indexes:
> db.blog.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "blog.blog",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"title" : 1
},
"ns" : "blog.blog",
"name" : "title_1"
}
]
21
MongoDB CRUD Query
To remove an index:
> db.blog.dropIndex({title:1})
{ "nIndexesWas" : 2, "ok" : 1 }
22
MongoDB Application
MongoDB Drivers and Client Libraries:
MongoDB supports a variety of modern programming
languages including C, C++, C#, Java, Node.js, PHP,
Python…
23
MongoDB Import/Export
MongoDB can import files in JSON, CSV or TSV format with mongoimport,
and can export a collection to JSON or CSV with
mongoexport.
Syntax:
mongoimport --collection collection --file collection.json
mongoexport --collection collection --out collection.json
24
MongoDB Import/Export
Import a CSV file (NASDAQ_daily_prices_B.csv) into the collection nasdaq_daily_prices of the MongoDB database stocks:
$ cat NASDAQ_daily_prices_B.csv
exchange,stock_symbol,date,stock_price_open,stock_price_high,stock_price_low,stock_price_close,stock_volume,stock_price_adj_close
NASDAQ,BBND,2010-02-08,2.92,2.98,2.86,2.96,483800,2.96
NASDAQ,BBND,2010-02-05,2.85,2.94,2.79,2.93,884000,2.93
NASDAQ,BBND,2010-02-04,2.83,2.88,2.78,2.83,1333300,2.83
NASDAQ,BBND,2010-02-03,2.98,3.03,2.80,2.83,1015800,2.83
NASDAQ,BBND,2010-02-02,3.05,3.10,2.96,2.97,513100,2.97
NASDAQ,BBND,2010-02-01,3.11,3.13,3.00,3.04,997000,3.04
NASDAQ,BBND,2010-01-29,3.01,3.14,2.96,3.14,1132900,3.14
…
25
MongoDB Import/Export
$ mongoimport --db stocks --collection nasdaq_daily_prices --type csv --file /Users/nqt289/Desktop/NASDAQ_daily_prices_B.csv --headerline
connected to: 127.0.0.1
Thu Apr 10 05:24:46.009 Progress: 780677/21998523 3%
Thu Apr 10 05:24:46.009 14000 4666/second
Thu Apr 10 05:24:49.004 Progress: 2011431/21998523 9%
Thu Apr 10 05:24:49.004 36200 6033/second
Thu Apr 10 05:24:52.004 Progress: 3300955/21998523 15%
Thu Apr 10 05:24:52.004 58600 6511/second
Thu Apr 10 05:24:55.005 Progress: 4575925/21998523 20%
Thu Apr 10 05:24:55.006 81300 6775/second
Thu Apr 10 05:24:58.009 Progress: 5845580/21998523 26%
Thu Apr 10 05:24:58.009 104000 6933/second
Thu Apr 10 05:25:34.005 374000 7333/second
…
Thu Apr 10 05:25:35.956 check 9 388777
Thu Apr 10 05:25:35.956 imported 388776 objects
26
MongoDB Import/Export
Check result collection in the shell:
> show dbs
blog 0.203125GB
local 0.078125GB
stocks 0.453125GB
> use stocks
switched to db stocks
> show tables
nasdaq_daily_prices
system.indexes
27
MongoDB Import/Export
> db.nasdaq_daily_prices.find().limit(5)
{ "_id" : ObjectId("5346635c6857e587111a2466"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-08", "stock_price_open" : 2.92, "stock_price_high" : 2.98, "stock_price_low" : 2.86, "stock_price_close" : 2.96, "stock_volume" : 483800, "stock_price_adj_close" : 2.96 }
{ "_id" : ObjectId("5346635c6857e587111a2467"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-05", "stock_price_open" : 2.85, "stock_price_high" : 2.94, "stock_price_low" : 2.79, "stock_price_close" : 2.93, "stock_volume" : 884000, "stock_price_adj_close" : 2.93 }
{ "_id" : ObjectId("5346635c6857e587111a2468"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-04", "stock_price_open" : 2.83, "stock_price_high" : 2.88, "stock_price_low" : 2.78, "stock_price_close" : 2.83, "stock_volume" : 1333300, "stock_price_adj_close" : 2.83 }
{ "_id" : ObjectId("5346635c6857e587111a2469"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-03", "stock_price_open" : 2.98, "stock_price_high" : 3.03, "stock_price_low" : 2.8, "stock_price_close" : 2.83, "stock_volume" : 1015800, "stock_price_adj_close" : 2.83 }
{ "_id" : ObjectId("5346635c6857e587111a246a"), "exchange" : "NASDAQ", "stock_symbol" : "BBND", "date" : "2010-02-02", "stock_price_open" : 3.05, "stock_price_high" : 3.1, "stock_price_low" : 2.96, "stock_price_close" : 2.97, "stock_volume" : 513100, "stock_price_adj_close" : 2.97 }
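The documents above show the effect of --headerline: the CSV header supplies the field names, and numeric-looking values are stored as numbers (for example 2.80 becomes 2.8) while dates stay as strings. The sketch below imitates that conversion in plain JavaScript. It is only an approximation written for illustration, not how mongoimport is implemented; the helpers parseValue and csvToDocuments are hypothetical, and the naive comma split does not handle quoted fields.

```javascript
// The header line plus two data lines taken from the CSV shown earlier.
const csv = [
  "exchange,stock_symbol,date,stock_price_open,stock_price_high,stock_price_low,stock_price_close,stock_volume,stock_price_adj_close",
  "NASDAQ,BBND,2010-02-08,2.92,2.98,2.86,2.96,483800,2.96",
  "NASDAQ,BBND,2010-02-03,2.98,3.03,2.80,2.83,1015800,2.83"
].join("\n");

// Convert numeric-looking strings to numbers; leave everything else as text.
function parseValue(text) {
  return text !== "" && !Number.isNaN(Number(text)) ? Number(text) : text;
}

// Use the first line as field names, then build one document per data line.
function csvToDocuments(input) {
  const [headerLine, ...rows] = input.trim().split("\n");
  const fields = headerLine.split(",");
  return rows.map(row => {
    const values = row.split(",");
    const doc = {};
    fields.forEach((field, i) => { doc[field] = parseValue(values[i]); });
    return doc;
  });
}

const docs = csvToDocuments(csv);
console.log(docs[1].stock_price_low); // 2.8
console.log(docs[0].date);            // 2010-02-08
```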
28
MongoDB Import/Export
Export that collection to JSON format:
$ mongoexport -d stocks -c nasdaq_daily_prices -q "{stock_price_open: {\$gte: 50 }}" --out /Users/nqt289/Desktop/gte50.json
connected to: 127.0.0.1
29
MongoDB Import/Export
exported 9911 records
$ cat gte50.json
{ "_id" : { "$oid" : "5346635d6857e587111a4cda" }, "exchange" : "NASDAQ",
"stock_symbol" : "BOLT", "date" : "2007-07-25", "stock_price_open" : 51,
"stock_price_high" : 51.47, "stock_price_low" : 44.1, "stock_price_close" :
47.04, "stock_volume" : 1109600, "stock_price_adj_close" : 31.36 }
{ "_id" : { "$oid" : "5346635d6857e587111a4cdb" }, "exchange" : "NASDAQ",
"stock_symbol" : "BOLT", "date" : "2007-07-24", "stock_price_open" : 52.4,
"stock_price_high" : 52.4, "stock_price_low" : 48.55, "stock_price_close" :
49.43, "stock_volume" : 650600, "stock_price_adj_close" : 32.95 }
…
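The -q option used in the export above selects only documents whose stock_price_open is at least 50. As a rough illustration of what that filter means, the sketch below applies the same query document to a few in-memory sample documents. The matchesGte helper is hypothetical and handles only the $gte operator; the sample values are taken from the listings in these notes.

```javascript
// Sample documents, trimmed to the fields needed here.
const sample = [
  { stock_symbol: "BBND", stock_price_open: 2.92 },
  { stock_symbol: "BOLT", stock_price_open: 51 },
  { stock_symbol: "BOLT", stock_price_open: 52.4 }
];

// Hypothetical helper: true when doc[field] >= bound
// for every { field: { $gte: bound } } entry in the query.
function matchesGte(doc, query) {
  return Object.entries(query).every(
    ([field, cond]) => doc[field] >= cond.$gte);
}

const query = { stock_price_open: { $gte: 50 } };
const exported = sample.filter(doc => matchesGte(doc, query));
console.log(exported.length); // 2
```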
30