Python Ireland Conference 2016 - Python and MongoDB Workshop

MongoDB and Python Workshop Joe Drumgoole Director of Developer Advocacy, EMEA MongoDB @jdrumgoole

Upload: joe-drumgoole

Post on 06-Apr-2017


Page 1: Python Ireland Conference 2016 - Python and MongoDB Workshop

MongoDB and Python Workshop

Joe Drumgoole
Director of Developer Advocacy, EMEA, MongoDB
@jdrumgoole

Page 2: Python Ireland Conference 2016 - Python and MongoDB Workshop

2

Agenda for Today

• Introduction to NoSQL
• My First MongoDB Application
• Thinking in Documents
• Understanding Replica Sets and Drivers

Page 3: Python Ireland Conference 2016 - Python and MongoDB Workshop

3

Relational

Expressive Query Language & Secondary Indexes

Strong Consistency

Enterprise Management & Integrations

Page 4: Python Ireland Conference 2016 - Python and MongoDB Workshop

4

The World Has Changed

Data Risk Time Cost

Page 5: Python Ireland Conference 2016 - Python and MongoDB Workshop

5

NoSQL

Scalability & Performance

Always On, Global Deployments

Flexibility

Expressive Query Language & Secondary Indexes

Strong Consistency

Enterprise Management & Integrations

Page 6: Python Ireland Conference 2016 - Python and MongoDB Workshop

6

Nexus Architecture

Scalability & Performance

Always On, Global Deployments

Flexibility

Expressive Query Language & Secondary Indexes

Strong Consistency

Enterprise Management & Integrations

Page 7: Python Ireland Conference 2016 - Python and MongoDB Workshop

7

Types of NoSQL Database

• Key/Value Stores
• Column Stores
• Graph Stores
• Multi-model Databases
• Document Stores

Page 8: Python Ireland Conference 2016 - Python and MongoDB Workshop

8

Key Value Stores

• An associative array
• Single key lookup
• Very fast single key lookup
• Not so hot for "reverse lookups"

Key      Value
12345    4567.3456787
12346    { addr1 : "The Grange", addr2 : "Dublin" }
12347    "top secret password"
12358    "Shopping basket value : 24560"
12787    12345
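The contrast between forward and reverse lookups can be sketched with a plain Python dict standing in for the table above. This is only an illustration: `lookup` and `reverse_lookup` are hypothetical helper names, and a real key/value store adds persistence and networking on top.

```python
# Toy key/value store: a dict holding the rows from the slide.
store = {
    12345: 4567.3456787,
    12346: {"addr1": "The Grange", "addr2": "Dublin"},
    12347: "top secret password",
    12358: "Shopping basket value : 24560",
    12787: 12345,
}


def lookup(key):
    """Forward lookup: one hash probe, independent of the store size."""
    return store[key]


def reverse_lookup(value):
    """Reverse lookup: every entry has to be inspected to find the keys."""
    return [key for key, stored in store.items() if stored == value]


print(lookup(12787))
print(reverse_lookup("top secret password"))
```

The forward lookup costs the same however large the store grows, while the reverse lookup grows linearly with the number of entries, which is why key/value stores are "not so hot" for it.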

Page 9: Python Ireland Conference 2016 - Python and MongoDB Workshop

9

Revision : Row Stores (RDBMS)

• Store data aligned by rows (traditional RDBMS, e.g. MySQL)
• Reads retrieve a complete row every time
• Reads requiring only one or two columns are wasteful

ID   Name      Salary   Start Date
1    Joe D     $24000   1/Jun/1970
2    Peter J   $28000   1/Feb/1972
3    Phil G    $23000   1/Jan/1973

On-disk layout (row by row):
1 Joe D $24000 1/Jun/1970 | 2 Peter J $28000 1/Feb/1972 | 3 Phil G $23000 1/Jan/1973

Page 10: Python Ireland Conference 2016 - Python and MongoDB Workshop

10

How a Column Store Does it

ID   Name      Salary   Start Date
1    Joe D     $24000   1/Jun/1970
2    Peter J   $28000   1/Feb/1972
3    Phil G    $23000   1/Jan/1973

On-disk layout (column by column):
1 2 3 | Joe D Peter J Phil G | $24000 $28000 $23000 | 1/Jun/1970 1/Feb/1972 1/Jan/1973

Page 11: Python Ireland Conference 2016 - Python and MongoDB Workshop

11

Why is this Attractive?

• A series of consecutive seeks can retrieve a column efficiently
• Compressing similar data is super efficient
• So reads can grab more data off disk in a single seek
• How do I align my rows? By order or by inserting a row ID
• If you just need a small number of columns you don't need to read all the rows
• But:
  – Updating and deleting by row is expensive
• Append only is preferred
• Better for OLAP than OLTP
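The saving from a columnar layout can be sketched in a few lines of Python. This is a toy model using the three employees from the earlier slides; the "values touched" counters are an assumed stand-in for disk reads, not a measurement of any real database.

```python
rows = [
    (1, "Joe D", 24000, "1/Jun/1970"),
    (2, "Peter J", 28000, "1/Feb/1972"),
    (3, "Phil G", 23000, "1/Jan/1973"),
]

# Column layout: all values of one column are stored together.
column_names = ("id", "name", "salary", "start_date")
columns = {
    name: [row[position] for row in rows]
    for position, name in enumerate(column_names)
}


def total_salary_row_store():
    """Sum salaries from the row layout; every field of every row is read."""
    total = 0
    touched = 0
    for row in rows:
        touched += len(row)
        total += row[2]
    return total, touched


def total_salary_column_store():
    """Sum salaries from the column layout; only one column is read."""
    salaries = columns["salary"]
    return sum(salaries), len(salaries)


print(total_salary_row_store())
print(total_salary_column_store())
```

Both functions return the same total, but the row version reads four values per employee and the column version reads one, which is the OLAP advantage the bullets describe.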

Page 12: Python Ireland Conference 2016 - Python and MongoDB Workshop

12

Graph Stores

• Store graphs (edges and vertices)
• E.g. social networks
• Designed to allow efficient traversal
• Optimised for representing connections
• Can be implemented as a key/value store with the ability to store links
• If your use case is not a graph you don't need a graph database

Page 13: Python Ireland Conference 2016 - Python and MongoDB Workshop

13

Multi-Model Databases

• Combine multiple storage/access models
• Often Graph plus "something else"
• Fixes the "polyglot persistence" issue of keeping multiple independent databases consistent
• The "new new thing" in NoSQL Land
• Expect to hear more noise about these kinds of databases

Page 14: Python Ireland Conference 2016 - Python and MongoDB Workshop

14

Document Store

• Not PDFs, Microsoft Word or HTML
• Documents are nested structures created using JavaScript Object Notation (JSON)

{
    name : "Joe Drumgoole",
    title : "Director of Developer Advocacy",
    Address : {
        address1 : "Latin Hall",
        address2 : "Golden Lane",
        eircode : "D09 N623"
    },
    expertise : [ "MongoDB", "Python", "Javascript" ],
    employee_number : 320,
    location : [ 53.34, -6.26 ]
}

Page 15: Python Ireland Conference 2016 - Python and MongoDB Workshop

15

MongoDB Documents are Typed

{
    name : "Joe Drumgoole",                              <- String
    title : "Director of Developer Advocacy",            <- String
    Address : {                                          <- Nested Document
        address1 : "Latin Hall",
        address2 : "Golden Lane",
        eircode : "D09 N623"
    },
    expertise : [ "MongoDB", "Python", "Javascript" ],   <- Array
    employee_number : 320,                               <- Integer
    location : [ 53.34, -6.26 ]                          <- Geo-spatial Coordinates
}

Page 16: Python Ireland Conference 2016 - Python and MongoDB Workshop

16

MongoDB Understands JSON Documents

• From the very first version it was a native JSON database
• Understands and can index the sub-structures
• Stores JSON as a binary format called BSON
• Efficient for encoding and decoding for network transmission
• MongoDB can create indexes on any document field

Page 17: Python Ireland Conference 2016 - Python and MongoDB Workshop

17

Why Documents?

• Dynamic Schema
• Elimination of Object/Relational Mapping Layer
• Implicit denormalisation of the data for performance

Page 19: Python Ireland Conference 2016 - Python and MongoDB Workshop

19

MongoDB is Full Featured

Rich Queries
• Find Paul's cars
• Find everybody in London with a car between 1970 and 1980

Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.

Text Search
• Find all the cars described as having leather seats

Aggregation
• Calculate the average value of Paul's car collection

Map Reduce
• What is the ownership pattern of colors by geography over time (is purple trending in China?)

Page 20: Python Ireland Conference 2016 - Python and MongoDB Workshop

20

High Availability and Data Durability – Replica Sets

Secondary Secondary

Primary

Page 21: Python Ireland Conference 2016 - Python and MongoDB Workshop

21

Replica Set Creation

Secondary Secondary

Primary

Heartbeat

Page 22: Python Ireland Conference 2016 - Python and MongoDB Workshop

22

Replica Set Node Failure

Secondary Secondary

Primary

No Heartbeat

Page 23: Python Ireland Conference 2016 - Python and MongoDB Workshop

23

Replica Set Recovery

Secondary Secondary

Heartbeat And Election

Page 24: Python Ireland Conference 2016 - Python and MongoDB Workshop

24

New Replica Set – 2 Nodes

Secondary Primary

Heartbeat And New Primary

Page 25: Python Ireland Conference 2016 - Python and MongoDB Workshop

25

Replica Set Repair

Secondary Primary

Secondary

Rejoin and resync

Page 26: Python Ireland Conference 2016 - Python and MongoDB Workshop

26

Replica Set Stable

Secondary Primary

Secondary

Heartbeat

Page 27: Python Ireland Conference 2016 - Python and MongoDB Workshop

27

Scalability with Sharding

Shard 1 Shard 2 Shard N

Page 28: Python Ireland Conference 2016 - Python and MongoDB Workshop

28

Scalability with Sharding

• Shard key partitions the content
• MongoDB automatically balances the cluster
• Shards can be added dynamically to a live system
• Rebalancing happens in the background
• Shard key is immutable
• Shard key can vector queries to a specific shard
• Queries without a shard key are sent to all members
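The last two bullets (targeted versus broadcast queries) can be sketched as follows. This is a deliberately simplified model, not MongoDB's routing: `NUM_SHARDS`, `shard_for` and `target_shards` are hypothetical names, the shard key is assumed to be `username`, and a hash modulo the shard count stands in for the real chunk ranges that a mongos consults. Only equality matches on the shard key are handled.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(key_value):
    """Map a shard key value to a shard number with a stable hash."""
    digest = hashlib.md5(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def target_shards(query, shard_key="username"):
    """Return the shards a query must visit.

    A query containing the shard key goes to exactly one shard;
    a query without it has to be sent to every shard.
    """
    if shard_key in query:
        return [shard_for(query[shard_key])]
    return list(range(NUM_SHARDS))


print(target_shards({"username": "USER_1"}))  # a single shard
print(target_shards({"karma": 450}))          # every shard
```

The same shard key value always maps to the same shard, which is also why the key has to be treated as immutable in this model: changing it would silently move the document's home.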

Page 29: Python Ireland Conference 2016 - Python and MongoDB Workshop

29

Scalability with Sharding

MongoS MongoS

Shard 1 Shard 2 Shard N

Shard Key

Page 30: Python Ireland Conference 2016 - Python and MongoDB Workshop

Your First MongoDB Application

Page 31: Python Ireland Conference 2016 - Python and MongoDB Workshop

31

Installing MongoDB

$ curl -O https://fastdl.mongodb.org/osx/mongodb-osx-x86_64-3.2.6.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 60.9M  100 60.9M    0     0  2730k      0  0:00:22  0:00:22 --:--:-- 1589k
$ tar xzvf mongodb-osx-x86_64-3.2.6.tgz
x mongodb-osx-x86_64-3.2.6/README
x mongodb-osx-x86_64-3.2.6/THIRD-PARTY-NOTICES
x mongodb-osx-x86_64-3.2.6/MPL-2
x mongodb-osx-x86_64-3.2.6/GNU-AGPL-3.0
x mongodb-osx-x86_64-3.2.6/bin/mongodump
x mongodb-osx-x86_64-3.2.6/bin/mongorestore
x mongodb-osx-x86_64-3.2.6/bin/mongoexport
x mongodb-osx-x86_64-3.2.6/bin/mongoimport
x mongodb-osx-x86_64-3.2.6/bin/mongostat
x mongodb-osx-x86_64-3.2.6/bin/mongotop
x mongodb-osx-x86_64-3.2.6/bin/bsondump
x mongodb-osx-x86_64-3.2.6/bin/mongofiles
x mongodb-osx-x86_64-3.2.6/bin/mongooplog
x mongodb-osx-x86_64-3.2.6/bin/mongoperf
x mongodb-osx-x86_64-3.2.6/bin/mongosniff
x mongodb-osx-x86_64-3.2.6/bin/mongod
x mongodb-osx-x86_64-3.2.6/bin/mongos
x mongodb-osx-x86_64-3.2.6/bin/mongo
$ ln -s mongodb-osx-x86_64-3.2.6 mongodb

Page 32: Python Ireland Conference 2016 - Python and MongoDB Workshop

32

Running Mongod

JD10Gen:mongodb jdrumgoole$ ./bin/mongod --dbpath /data/b2b
2016-05-23T19:21:07.767+0100 I CONTROL [initandlisten] MongoDB starting : pid=49209 port=27017 dbpath=/data/b2b 64-bit host=JD10Gen.local
2016-05-23T19:21:07.768+0100 I CONTROL [initandlisten] db version v3.2.6
2016-05-23T19:21:07.768+0100 I CONTROL [initandlisten] git version: 05552b562c7a0b3143a729aaa0838e558dc49b25
2016-05-23T19:21:07.768+0100 I CONTROL [initandlisten] allocator: system
2016-05-23T19:21:07.768+0100 I CONTROL [initandlisten] modules: none
2016-05-23T19:21:07.768+0100 I CONTROL [initandlisten] build environment:
2016-05-23T19:21:07.768+0100 I CONTROL [initandlisten] distarch: x86_64
2016-05-23T19:21:07.768+0100 I CONTROL [initandlisten] target_arch: x86_64
2016-05-23T19:21:07.768+0100 I CONTROL [initandlisten] options: { storage: { dbPath: "/data/b2b" } }
2016-05-23T19:21:07.769+0100 I - [initandlisten] Detected data files in /data/b2b created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2016-05-23T19:21:07.769+0100 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=4G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2016-05-23T19:21:08.837+0100 I CONTROL [initandlisten]
2016-05-23T19:21:08.838+0100 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
2016-05-23T19:21:08.840+0100 I NETWORK [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2016-05-23T19:21:08.840+0100 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data/b2b/diagnostic.data'
2016-05-23T19:21:08.841+0100 I NETWORK [initandlisten] waiting for connections on port 27017
2016-05-23T19:21:09.148+0100 I NETWORK [initandlisten] connection accepted from 127.0.0.1:59213 #1 (1 connection now open)

Page 33: Python Ireland Conference 2016 - Python and MongoDB Workshop

33

Connecting Via The Shell

$ ./bin/mongo
MongoDB shell version: 3.2.6
connecting to: test
Server has startup warnings:
2016-05-17T11:46:03.516+0100 I CONTROL [initandlisten]
2016-05-17T11:46:03.516+0100 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
>

Page 34: Python Ireland Conference 2016 - Python and MongoDB Workshop

34

Inserting your first record

> show databases
local 0.000GB
> use test
switched to db test
> show databases
local 0.000GB
> db.demo.insert( { "key" : "value" } )
WriteResult({ "nInserted" : 1 })
> show databases
local 0.000GB
test 0.000GB
> show collections
demo
> db.demo.findOne()
{ "_id" : ObjectId("573af7085ee4be80385332a6"), "key" : "value" }
>

Page 35: Python Ireland Conference 2016 - Python and MongoDB Workshop

35

Object ID

573af708 5ee4be 8038 5332a6
TS------ ID---- PID- Count-
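The four fields can be pulled apart with nothing more than string slicing. The sketch below assumes the layout shown on the slide (4-byte timestamp, 3-byte machine ID, 2-byte process ID, 3-byte counter); `split_object_id` is a hypothetical helper, and later versions of the ObjectId format replace the machine and process fields with a random value, so only the timestamp and counter positions should be relied on in general.

```python
from datetime import datetime, timezone


def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string into the slide's four fields."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(hex_id[0:8], 16)  # seconds since the Unix epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine_id": hex_id[8:14],
        "pid": int(hex_id[14:18], 16),
        "counter": int(hex_id[18:24], 16),
    }


parts = split_object_id("573af7085ee4be80385332a6")
print(parts["timestamp"].isoformat())
print(parts["machine_id"], parts["pid"], parts["counter"])
```

Because the timestamp comes first, ObjectIds created later sort after earlier ones, and the creation time of a document can be recovered from its `_id` without a separate field.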

Page 36: Python Ireland Conference 2016 - Python and MongoDB Workshop

36

A Simple Blog Application

• Let's create a blogging application with:
  – Articles
  – Users
  – Comments

Page 37: Python Ireland Conference 2016 - Python and MongoDB Workshop

37

Typical Entity Relation Diagram

Page 38: Python Ireland Conference 2016 - Python and MongoDB Workshop

38

In MongoDB we can build organically

> use blog
switched to db blog
> db.users.insert( { "username" : "jdrumgoole", "password" : "top secret", "lang" : "EN" } )
WriteResult({ "nInserted" : 1 })
> db.users.findOne()
{
    "_id" : ObjectId("573afff65ee4be80385332a7"),
    "username" : "jdrumgoole",
    "password" : "top secret",
    "lang" : "EN"
}

Page 39: Python Ireland Conference 2016 - Python and MongoDB Workshop

39

How do we do this in a program?

'''
Created on 17 May 2016

@author: jdrumgoole
'''
import pymongo

#
# client defaults to localhost and port 27017. eg MongoClient('localhost', 27017)
#
client = pymongo.MongoClient()
blogDatabase = client[ "blog" ]
usersCollection = blogDatabase[ "users" ]

usersCollection.insert_one( { "username" : "jdrumgoole", "password" : "top secret", "lang" : "EN" })

user = usersCollection.find_one()

print( user )

Page 40: Python Ireland Conference 2016 - Python and MongoDB Workshop

40

Next up Articles

…
articlesCollection = blogDatabase[ "articles" ]

author = "jdrumgoole"

article = { "title"  : "This is my first post",
            "body"   : "This is the longer body text for my blog post. We can add lots of text here.",
            "author" : author,
            "tags"   : [ "joe", "general", "Ireland", "admin" ] }

#
# Let's check if our author exists
#
if usersCollection.find_one( { "username" : author }) :
    articlesCollection.insert_one( article )
else:
    raise ValueError( "Author %s does not exist" % author )

Page 41: Python Ireland Conference 2016 - Python and MongoDB Workshop

41

Create a new type of article

import datetime

#
# Let's add a new type of article with a posting date and a section
#
author = "jdrumgoole"
title = "This is a post on MongoDB"

newPost = { "title"    : title,
            "body"     : "MongoDB is the world's most popular NoSQL database. It is a document database",
            "author"   : author,
            "tags"     : [ "joe", "mongodb", "Ireland" ],
            "section"  : "technology",
            "postDate" : datetime.datetime.now() }

#
# Let's check if our author exists
#
if usersCollection.find_one( { "username" : author }) :
    articlesCollection.insert_one( newPost )

Page 42: Python Ireland Conference 2016 - Python and MongoDB Workshop

42

Make a lot of articles 1

import pymongo
import string
import datetime
import random

# string.ascii_letters and range() are used so the code runs on Python 3
def randomString( size, letters = string.ascii_letters ):
    return "".join( [random.choice( letters ) for _ in range( size )] )

client = pymongo.MongoClient()

def makeArticle( count, author, timestamp ):
    return { "_id"      : count,
             "title"    : randomString( 20 ),
             "body"     : randomString( 80 ),
             "author"   : author,
             "postdate" : timestamp }

def makeUser( username ):
    return { "username" : username,
             "password" : randomString( 10 ),
             "karma"    : random.randint( 0, 500 ),
             "lang"     : "EN" }

Page 43: Python Ireland Conference 2016 - Python and MongoDB Workshop

43

Make a lot of articles 2

blogDatabase = client[ "blog" ]
usersCollection = blogDatabase[ "users" ]
articlesCollection = blogDatabase[ "articles" ]

bulkUsers = usersCollection.initialize_ordered_bulk_op()
bulkArticles = articlesCollection.initialize_ordered_bulk_op()

ts = datetime.datetime.now()

for i in range( 1000000 ) :
    #username = randomString( 10, string.ascii_uppercase ) + "_" + str( i )
    username = "USER_" + str( i )
    bulkUsers.insert( makeUser( username ) )
    ts = ts + datetime.timedelta( seconds = 1 )
    bulkArticles.insert( makeArticle( i, username, ts ))
    if ( i % 500 == 0 ) :
        bulkUsers.execute()
        bulkArticles.execute()
        bulkUsers = usersCollection.initialize_ordered_bulk_op()
        bulkArticles = articlesCollection.initialize_ordered_bulk_op()

bulkUsers.execute()
bulkArticles.execute()
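The flush-every-500 pattern in the loop can be isolated from the database calls. The sketch below is an assumption-labelled alternative, not the code from the slides: `batches` is a hypothetical helper that groups any iterable into lists of at most `size` items and never yields an empty batch, which sidesteps the case where a final flush has nothing to send.

```python
def batches(items, size):
    """Yield consecutive lists of at most `size` items; never an empty list."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left over after the last full batch.
        yield batch


# Example: 1201 user names split into batches of 500.
usernames = ("USER_" + str(i) for i in range(1201))
sizes = [len(batch) for batch in batches(usernames, 500)]
print(sizes)
```

With a live server, each yielded batch of documents could then be handed to a bulk call such as PyMongo's `insert_many`, keeping the batching logic testable without a database.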

Page 44: Python Ireland Conference 2016 - Python and MongoDB Workshop

44

Find a User

> db.users.findOne()
{
    "_id" : ObjectId("5742da5bb26a88bc00e941ac"),
    "username" : "FLFZQLSRWZ_0",
    "lang" : "EN",
    "password" : "vTlILbGWLt",
    "karma" : 448
}
> db.users.find( { "username" : "VHXDAUUFJW_45" } ).pretty()
{
    "_id" : ObjectId("5742da5bb26a88bc00e94206"),
    "username" : "VHXDAUUFJW_45",
    "lang" : "EN",
    "password" : "GmRLnCeKVp",
    "karma" : 284
}

Page 45: Python Ireland Conference 2016 - Python and MongoDB Workshop

45

Find Users with high Karma

> db.users.find( { "karma" : { $gte : 450 }} ).pretty()
{
    "_id" : ObjectId("5742da5bb26a88bc00e941ae"),
    "username" : "JALLFRKBWD_1",
    "lang" : "EN",
    "password" : "bCSKSKvUeb",
    "karma" : 487
}
{
    "_id" : ObjectId("5742da5bb26a88bc00e941e4"),
    "username" : "OTKWJJBNBU_28",
    "lang" : "EN",
    "password" : "HAWpiATCBN",
    "karma" : 473
}
{

Page 46: Python Ireland Conference 2016 - Python and MongoDB Workshop

46

Using projection

> db.users.find( { "karma" : { $gte : 450 }}, { "_id" : 0, username : 1, karma : 1 } )
{ "username" : "JALLFRKBWD_1", "karma" : 487 }
{ "username" : "OTKWJJBNBU_28", "karma" : 473 }
{ "username" : "RVVHLKTWHU_31", "karma" : 493 }
{ "username" : "JBNESEOOEP_48", "karma" : 464 }
{ "username" : "VSTBDZLKQQ_51", "karma" : 487 }
{ "username" : "UKYDTQJCLO_61", "karma" : 493 }
{ "username" : "HZFZZMZHYB_106", "karma" : 493 }
{ "username" : "AAYLPJJNHO_113", "karma" : 455 }
{ "username" : "CXZZMHLBXE_128", "karma" : 460 }
{ "username" : "KKJXBACBVN_134", "karma" : 460 }
{ "username" : "PTNTIBGAJV_165", "karma" : 461 }
{ "username" : "PVLCQJIGDY_169", "karma" : 463 }

Page 47: Python Ireland Conference 2016 - Python and MongoDB Workshop

47

Update an Article to Add Comments 1

> db.articles.find( { "_id" : 19 } ).pretty()
{
    "_id" : 19,
    "body" : "nTzOofOcnHKkJxpjKAyqTTnKZMFzzkWFeXtBRuEKsctuGBgWIrEBrYdvFIVHJWaXLUTVUXblOZZgUqWu",
    "postdate" : ISODate("2016-05-23T12:02:46.830Z"),
    "author" : "ASWTOMMABN_19",
    "title" : "CPMaqHtAdRwLXhlUvsej"
}
> db.articles.update( { _id : 18 }, { $set : { comments : [] }} )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

Page 48: Python Ireland Conference 2016 - Python and MongoDB Workshop

48

Update an Article to Add Comments 2

> db.articles.find( { _id : 18 } ).pretty()
{
    "_id" : 18,
    "body" : "KmwFSIMQGcIsRNTDBFPuclwcVJkoMcrIPwTiSZDYyatoKzeQiKvJkiVSrndXqrALVIYZxGpaMjucgXUV",
    "postdate" : ISODate("2016-05-23T16:04:39.497Z"),
    "author" : "USER_18",
    "title" : "wTLreIEyPfovEkBhJZZe",
    "comments" : [ ]
}
>

Page 49: Python Ireland Conference 2016 - Python and MongoDB Workshop

49

Update an Article to Add Comments 3

> db.articles.update( { _id : 18 }, { $push : { comments : { username : "joe", comment : "hey first post" }}} )
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })

> db.articles.find( { _id : 18 } ).pretty()
{
    "_id" : 18,
    "body" : "KmwFSIMQGcIsRNTDBFPuclwcVJkoMcrIPwTiSZDYyatoKzeQiKvJkiVSrndXqrALVIYZxGpaMjucgXUV",
    "postdate" : ISODate("2016-05-23T16:04:39.497Z"),
    "author" : "USER_18",
    "title" : "wTLreIEyPfovEkBhJZZe",
    "comments" : [
        {
            "username" : "joe",
            "comment" : "hey first post"
        }
    ]
}
>

Page 50: Python Ireland Conference 2016 - Python and MongoDB Workshop

50

Delete an Article

> db.articles.remove( { "_id" : 25 } )
WriteResult({ "nRemoved" : 1 })
> db.articles.remove( { "_id" : 25 } )
WriteResult({ "nRemoved" : 0 })
> db.articles.remove( { "_id" : { $lte : 5 }} )
WriteResult({ "nRemoved" : 6 })

• Deletion leaves holes
• Dropping a collection is cheaper than deleting a large collection element by element

Page 51: Python Ireland Conference 2016 - Python and MongoDB Workshop

51

A Quick Look at Users and Articles Again

> db.users.findOne()
{
    "_id" : ObjectId("57431c07b26a88bf060e10cb"),
    "username" : "USER_0",
    "lang" : "EN",
    "password" : "kGIxPxqKGJ",
    "karma" : 266
}
> db.articles.findOne()
{
    "_id" : 0,
    "body" : "hvJLnrrfZQurmtjPfUWbMhaQWbNjXLzjpuGLZjsxHXbUycmJVZTeOZesTnZtojThrebRcUoiYwivjpwG",
    "postdate" : ISODate("2016-05-23T16:04:39.246Z"),
    "author" : "USER_0",
    "title" : "gpNIoPxpfTAxWjzAVoTJ"
}
>

Page 52: Python Ireland Conference 2016 - Python and MongoDB Workshop

52

Find a User

> db.users.find( { "username" : "ABOXHWKBYS_199" } ).explain()
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "blog.users",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "username" : {
                "$eq" : "ABOXHWKBYS_199"
            }
        },
        "winningPlan" : {
            "stage" : "COLLSCAN",
            "filter" : {
                "username" : {
                    "$eq" : "ABOXHWKBYS_199"
                }
            },
            "direction" : "forward"
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "JD10Gen.local",
        "port" : 27017,
        "version" : "3.2.6",
        "gitVersion" : "05552b562c7a0b3143a729aaa0838e558dc49b25"
    },
    "ok" : 1
}

Page 53: Python Ireland Conference 2016 - Python and MongoDB Workshop

53

Find a User – Execution Stats

> db.users.find( { "username" : "USER_999999" } ).explain( "executionStats" ).executionStats
{
    "executionSuccess" : true,
    "nReturned" : 1,
    "executionTimeMillis" : 433,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 1000000,
    "executionStages" : {
        "stage" : "COLLSCAN",
        "filter" : {
            "username" : {
                "$eq" : "USER_999999"
            }
        },
        "nReturned" : 1,
        "executionTimeMillisEstimate" : 330,
        "works" : 1000002,
        "advanced" : 1,
        "needTime" : 1000000,
        "needYield" : 0,
        "saveState" : 7812,
        "restoreState" : 7812,
        "isEOF" : 1,
        "invalidates" : 0,
        "direction" : "forward",
        "docsExamined" : 1000000

Page 54: Python Ireland Conference 2016 - Python and MongoDB Workshop

54

We need an index

> db.users.createIndex( { username : 1 } )
{
    "createdCollectionAutomatically" : false,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
>

Page 55: Python Ireland Conference 2016 - Python and MongoDB Workshop

55

Indexes Overview

• Parameters
  – Background : Create an index in the background as opposed to locking the database
  – Unique : All keys in the collection must be unique. Duplicate key insertions will be rejected with an error.
  – Name : explicitly name an index. Otherwise the index name is autogenerated from the index field.
• Deleting an Index
  – db.users.dropIndex({ "username" : 1 })
• Get All the Indexes on a collection
  – db.users.getIndexes()

Page 56: Python Ireland Conference 2016 - Python and MongoDB Workshop

56

Query Plan Execution Stages

• COLLSCAN : for a collection scan
• IXSCAN : for scanning index keys
• FETCH : for retrieving documents
• SHARD_MERGE : for merging results from shards
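The difference between a COLLSCAN plan and an IXSCAN followed by a FETCH can be imitated in plain Python. This is an analogy only: the list stands in for a collection, a dict stands in for the index (MongoDB's real indexes are B-trees), and `collscan` and `ixscan_fetch` are hypothetical names. The counters play the role of `totalDocsExamined` and `totalKeysExamined` in the explain output.

```python
# A small in-memory "collection" of user documents.
users = [{"username": "USER_%d" % i, "karma": i % 500} for i in range(10000)]


def collscan(docs, username):
    """Collection scan: every document is examined, matching or not."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc["username"] == username:
            matches.append(doc)
    return matches, examined


# The "index": username -> position of the document in the collection.
username_index = {doc["username"]: pos for pos, doc in enumerate(users)}


def ixscan_fetch(docs, index, username):
    """Index scan plus fetch: one key examined, one document fetched."""
    position = index.get(username)
    if position is None:
        return [], 0
    return [docs[position]], 1


print(collscan(users, "USER_9999")[1])                        # documents examined
print(ixscan_fetch(users, username_index, "USER_9999")[1])    # documents examined
```

Both calls return the same document; only the amount of work differs, which mirrors the drop from 1,000,000 documents examined to 1 once the index exists.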

Page 57: Python Ireland Conference 2016 - Python and MongoDB Workshop

57

Add an Index

> db.users.find( { "username" : "USER_999999" } ).explain( "executionStats" ).executionStats
{
    "executionSuccess" : true,
    "nReturned" : 1,
    "executionTimeMillis" : 0,
    "totalKeysExamined" : 1,
    "totalDocsExamined" : 1,

Page 58: Python Ireland Conference 2016 - Python and MongoDB Workshop

58

Execution Stage

    "executionStages" : {
        "stage" : "FETCH",
        "nReturned" : 1,
        "executionTimeMillisEstimate" : 0,
        "docsExamined" : 1,
        "inputStage" : {
            "stage" : "IXSCAN",
            "nReturned" : 1,
            "executionTimeMillisEstimate" : 0,
            "keyPattern" : {
                "username" : 1
            },
            "indexName" : "username_1",
            "isMultiKey" : false,
            "isUnique" : false,
            "isSparse" : false,
            "isPartial" : false,
            "indexVersion" : 1,
            "direction" : "forward",
            "indexBounds" : {
                "username" : [
                    "[\"USER_999999\", \"USER_999999\"]"
                ]
            },
            "keysExamined" : 1,
            "seenInvalidated" : 0
        }
    }
}

Page 59: Python Ireland Conference 2016 - Python and MongoDB Workshop

Thinking in Documents

Page 60: Python Ireland Conference 2016 - Python and MongoDB Workshop

60

Example Document

{
    first_name: 'Paul',                              <- String
    surname: 'Miller',
    cell: 447557505611,                              <- Number
    city: 'London',
    location: [45.123, 47.232],                      <- Geo-Location
    Profession: ['banking', 'finance', 'trader'],    <- Fields can contain arrays
    cars: [                                          <- Fields can contain an array of sub-documents
        { model: 'Bentley', year: 1973, value: 100000, … },
        { model: 'Rolls Royce', year: 1965, value: 330000, … }
    ]
}

Callouts: Fields (the names on the left), Typed field values (the values on the right)

Page 61: Python Ireland Conference 2016 - Python and MongoDB Workshop

61

Data Stores – Key Value

Key 1 Value

Key 1 Value

Key 1 Value

Page 62: Python Ireland Conference 2016 - Python and MongoDB Workshop

62

Data Stores - Relational

Key 1 | Value 1 | Value 1 | Value 1 | Value 1
Key 2 | Value 1 | Value 1 | Value 1 | Value 1
Key 3 | Value 1 | Value 1 | Value 1 | Value 1
Key 4 | Value 1 | Value 1 | Value 1 | Value 1

Page 63: Python Ireland Conference 2016 - Python and MongoDB Workshop

63

Data Stores - Document

[Diagram: a document store drawn as a tree of nested keys (Key1 to Key7) with values (Value 1 to Value 5) at the leaves]

Page 64: Python Ireland Conference 2016 - Python and MongoDB Workshop

64

In Document Form

{ "key1" : "value 1" }

{ "key1" : { "key2" : "value 1", "key3" : { "key4" : "value 3", "key5" : "value 4" }}}

{ "key1" : { "key6" : "value 5", "key7" : "value 6" }}

Page 65: Python Ireland Conference 2016 - Python and MongoDB Workshop

65

Some Example Queries

# will find the first document
db.demo.find( { "key1" : "value 1" } )

# find the second document by nested value
db.demo.find( { "key1.key3.key4" : "value 3" } )

# will find the third document
db.demo.find( { "key1.key6" : "value 5" } )
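How a dotted path reaches into nested documents can be sketched in plain Python. This is a minimal model of equality matching only, assuming the three documents from the "In Document Form" slide; `resolve` and `find` are hypothetical helpers, and the real query engine also handles arrays and operators such as `$gte`, which are ignored here.

```python
docs = [
    {"key1": "value 1"},
    {"key1": {"key2": "value 1", "key3": {"key4": "value 3", "key5": "value 4"}}},
    {"key1": {"key6": "value 5", "key7": "value 6"}},
]

_MISSING = object()  # sentinel for "this path does not exist in the document"


def resolve(doc, dotted_path):
    """Follow a path such as 'key1.key3.key4' through nested dicts."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(collection, query):
    """Return the documents where every dotted path equals the given value."""
    return [
        doc for doc in collection
        if all(resolve(doc, path) == value for path, value in query.items())
    ]


print(find(docs, {"key1": "value 1"}))
print(find(docs, {"key1.key3.key4": "value 3"}))
print(find(docs, {"key1.key6": "value 5"}))
```

Each query matches exactly one of the three documents, because the path either does not exist or holds a different value in the other two.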

Page 66: Python Ireland Conference 2016 - Python and MongoDB Workshop

66

Modelling and Cardinality

• One to One
  – Title to blog post
• One to Many
  – Blog post to comments
• One to Millions
  – Blog post to site views (e.g. Huffington Post)

Page 67: Python Ireland Conference 2016 - Python and MongoDB Workshop

67

One To One

{
    "Title" : "This is a blog post",
    "Body" : "This is the body text of a very short blog post",
    …
}

We can index on “Title” and “Body”.

Page 68: Python Ireland Conference 2016 - Python and MongoDB Workshop

68

One to Many

{
    "Title" : "This is a blog post",
    "Body" : "This is the body text",
    "Comments" : [
        { "name" : "Joe Drumgoole",
          "email" : "[email protected]",
          "comment" : "I love your writing style" },
        { "name" : "John Smith",
          "email" : "[email protected]",
          "comment" : "I hate your writing style" }
    ]
}

Where we expect a small number of comments we can embed them in the main document

Page 69: Python Ireland Conference 2016 - Python and MongoDB Workshop

69

Key Concerns

• What are the write patterns?
  – Comments are added more frequently than posts
  – Comments may have images, tags, large bodies of text
• What are the read patterns?
  – Comments may not be displayed
  – May be shown in their own window
  – People rarely look at all the comments

Page 70: Python Ireland Conference 2016 - Python and MongoDB Workshop

70

Approach 2 – Separate Collection

• Keep all comments in a separate comments collection
• Add references to comments as an array of comment IDs
• Requires two queries to display blog post and associated comments
• Requires two writes to create a comment

Comments collection:

{ _id : ObjectID( "AAAA" ),
  name : "Joe Drumgoole",
  email : "[email protected]",
  comment : "I love your writing style" }
{ _id : ObjectID( "AAAB" ),
  name : "John Smith",
  email : "[email protected]",
  comment : "I hate your writing style" }

Posts collection:

{ "_id" : ObjectID( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [ ObjectID( "AAAA" ), ObjectID( "AAAB" ) ] }
{ "_id" : ObjectID( "AZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [] }
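The "two queries, two writes" cost of the separate-collection approach can be made concrete with two dicts standing in for the two collections. This is an in-memory sketch under that assumption; `add_comment` and `load_post_with_comments` are hypothetical helpers, the short string IDs replace real ObjectIDs, and the ID "AAAC" is made up for the example.

```python
# Two in-memory "collections", keyed by _id.
comments = {
    "AAAA": {"name": "Joe Drumgoole", "comment": "I love your writing style"},
    "AAAB": {"name": "John Smith", "comment": "I hate your writing style"},
}
posts = {
    "ZZZZ": {"Title": "A Blog Title", "Body": "A blog post", "comments": ["AAAA", "AAAB"]},
    "AZZZ": {"Title": "A Blog Title", "Body": "A blog post", "comments": []},
}


def add_comment(post_id, comment_id, comment):
    """Creating a comment takes two writes."""
    comments[comment_id] = comment                  # write 1: the comment itself
    posts[post_id]["comments"].append(comment_id)   # write 2: the reference


def load_post_with_comments(post_id):
    """Displaying a post with its comments takes two queries."""
    post = posts[post_id]                                    # query 1: the post
    found = [comments[cid] for cid in post["comments"]]      # query 2: its comments
    return post, found


add_comment("AZZZ", "AAAC", {"name": "Jane Doe", "comment": "Nice post"})
post, found = load_post_with_comments("AZZZ")
print(post["Title"], [c["name"] for c in found])
```

Against a real database the two writes are not applied atomically as a pair, so the application has to tolerate a comment that exists without a reference, or the reverse, if one of them fails.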

Page 71: Python Ireland Conference 2016 - Python and MongoDB Workshop

71

Approach 3 – A Hybrid Approach

Post document with the first few comments embedded:

{ "_id" : ObjectID( "ZZZZ" ),
  "Title" : "A Blog Title",
  "Body" : "A blog post",
  "comments" : [
      { "_id" : ObjectID( "AAAA" ),
        "name" : "Joe Drumgoole",
        "email" : "[email protected]",
        "comment" : "I love your writing style" },
      { "_id" : ObjectID( "AAAB" ),
        "name" : "John Smith",
        "email" : "[email protected]",
        "comment" : "I hate your writing style" }
  ]
}

Separate document holding the full list of comments for the post:

{ "_post_id" : ObjectID( "ZZZZ" ),
  "comments" : [
      { "_id" : ObjectID( "AAAA" ),
        "name" : "Joe Drumgoole",
        "email" : "[email protected]",
        "comment" : "I love your writing style" },
      {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}, {...}
  ]
}

Page 72: Python Ireland Conference 2016 - Python and MongoDB Workshop

72

What About One to A Million

• What if we were tracking mouse position for heat tracking?
  – Each user will generate hundreds of data points per visit
  – Thousands of data points per post
  – Millions of data points per blog site
• Reverse the model
  – Store a blog ID per event

{ "post_id" : ObjectID( "ZZZZ" ),
  "timestamp" : ISODate( "2005-01-02T00:00:00Z" ),
  "location" : [24, 34],
  "click" : false }

Page 73: Python Ireland Conference 2016 - Python and MongoDB Workshop

73

But – Finite number of events per second

{ post_id : ObjectID( "ZZZZ" ),
  timeStamp : ISODate( "2005-01-02T00:00:00Z" ),
  events : {
      0  : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
      1  : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
      2  : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
      3  : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } },
      ...
      59 : { 0 : { <Info> }, 1 : { <Info> }, … 99 : { <Info> } }
  }
}
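One way to read the bucketed document is as one document per post per minute, with up to 100 event slots for each of the 60 seconds. Under that interpretation (an assumption, since the slide does not state the bucket size), the sketch below builds the filter and update documents for recording a single event; `bucket_update` is a hypothetical helper and no database is contacted.

```python
from datetime import datetime


def bucket_update(post_id, when, slot, info):
    """Build (filter, update) documents for one event in a per-minute bucket.

    post_id : identifier of the blog post
    when    : datetime of the event
    slot    : position of the event within its second (0..99 on the slide)
    info    : the event payload, e.g. mouse location and click flag
    """
    # All events in the same minute share one bucket document.
    minute = when.replace(second=0, microsecond=0)
    filter_doc = {"post_id": post_id, "timeStamp": minute}
    # Dotted path events.<second>.<slot> addresses one cell in the bucket.
    update_doc = {"$set": {"events.%d.%d" % (when.second, slot): info}}
    return filter_doc, update_doc


event_time = datetime(2005, 1, 2, 0, 0, 7, 500000)
filter_doc, update_doc = bucket_update("ZZZZ", event_time, 3, {"location": [24, 34], "click": False})
print(filter_doc)
print(update_doc)
```

With PyMongo, the pair could be passed to `update_one(filter_doc, update_doc, upsert=True)` so that the first event of a minute creates the bucket and later events fill it in, turning thousands of small documents into a bounded number of larger ones.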

Page 74: Python Ireland Conference 2016 - Python and MongoDB Workshop

74

Guidelines

• Embed objects for one-to-one relationships
• Look at read and write patterns to determine when to break out data
• Don't get stuck in "one record per item" thinking
• Embrace the hierarchy
• Think about cardinality
• Grow your data by adding documents, not by increasing document size
• Think about your indexes
• Document updates are transactions (an update to a single document is atomic)

Page 75: Python Ireland Conference 2016 - Python and MongoDB Workshop

Building Real World Applications

Page 76: Python Ireland Conference 2016 - Python and MongoDB Workshop

76

Drivers and Frameworks

Morphia

MEAN Stack

Page 77: Python Ireland Conference 2016 - Python and MongoDB Workshop

77

Single Server

Driver

Mongod

Page 78: Python Ireland Conference 2016 - Python and MongoDB Workshop

78

Replica Set

Driver

Secondary Secondary

Primary

Page 79: Python Ireland Conference 2016 - Python and MongoDB Workshop

79

Replica Set Primary Failure

Driver

Secondary Secondary

Page 80: Python Ireland Conference 2016 - Python and MongoDB Workshop

80

Replica Set Election

Driver

Secondary Secondary

Page 81: Python Ireland Conference 2016 - Python and MongoDB Workshop

81

Replica Set New Primary

Driver

Primary Secondary

Page 82: Python Ireland Conference 2016 - Python and MongoDB Workshop

82

Replica Set Recovery

Driver

Primary Secondary

Secondary

Page 83: Python Ireland Conference 2016 - Python and MongoDB Workshop

83

Sharded Cluster

Driver

Mongod Mongod

Mongod

Mongod Mongod

Mongod

Mongod Mongod

Mongod

mongos mongos

Page 84: Python Ireland Conference 2016 - Python and MongoDB Workshop

84

Driver Responsibilities

https://github.com/mongodb/mongo-python-driver

Driver

Authentication & Security
Python<->BSON
Error handling & Recovery
Wire Protocol
Topology Management
Connection Pool

Page 86: Python Ireland Conference 2016 - Python and MongoDB Workshop

86

Example API Calls

import pymongo

client = pymongo.MongoClient( host="localhost", port=27017 )
database = client[ 'test_database' ]
collection = database[ 'test_collection' ]

collection.insert_one({ "hello" : "world" , "goodbye" : "world" } )

collection.find_one( { "hello" : "world" } )

# update_one replaces the older collection.update call
collection.update_one({ "hello" : "world" }, { "$set" : { "buenos dias" : "world" }} )

collection.delete_one({ "hello" : "world" } )

Page 87: Python Ireland Conference 2016 - Python and MongoDB Workshop

87

Start MongoClient

c = MongoClient( "host1,host2", replicaSet="replset" )

Page 88: Python Ireland Conference 2016 - Python and MongoDB Workshop

88

Client Side View

Secondary host2

Secondary host3

Primary host1

MongoClient

MongoClient( "host2,host3", replicaSet="replset" )

Page 89: Python Ireland Conference 2016 - Python and MongoDB Workshop

89

Client Side View

Secondary host2

Secondary host3

Primary host1

MongoClient

Monitor Thread 1

Monitor Thread 2

{ ismaster : False, secondary: True, hosts : [ host1, host2, host3 ] }

Page 90: Python Ireland Conference 2016 - Python and MongoDB Workshop

90

What Does ismaster show?

>>> pprint.pprint( db.command( "ismaster" ) )
{u'hosts': [u'JD10Gen-old.local:27017',
            u'JD10Gen-old.local:27018',
            u'JD10Gen-old.local:27019'],
 u'ismaster': False,
 u'secondary': True,
 u'setName': u'replset',
 …}
>>>

Page 91: Python Ireland Conference 2016 - Python and MongoDB Workshop

91

Topology

Current Topology + ismaster response → New Topology
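The step on this slide can be sketched in plain Python: take the current view of the replica set, fold in one ismaster reply, and get a new view. This is a simplified illustration only. The function name `apply_ismaster` and the dict-based topology are assumptions made for the sketch, not PyMongo's internal data structures, and the real server discovery and monitoring spec handles many more cases.

```python
def apply_ismaster(topology, address, reply):
    """Return a new topology dict after folding in one ismaster reply.

    `topology` maps a host name to "Unknown", "Primary" or "Secondary".
    Hypothetical simplification, not the driver's real representation.
    """
    new_topology = dict(topology)

    # Every member listed by the replying server becomes known to the client.
    for host in reply.get("hosts", []):
        new_topology.setdefault(host, "Unknown")

    if reply.get("ismaster"):
        # Only one primary at a time: forget any previously recorded primary.
        for host, kind in list(new_topology.items()):
            if kind == "Primary":
                new_topology[host] = "Unknown"
        new_topology[address] = "Primary"
    elif reply.get("secondary"):
        new_topology[address] = "Secondary"
    else:
        new_topology[address] = "Unknown"
    return new_topology


# The seed list contains only host2 and host3, as in the slides.
topology = {"host2": "Unknown", "host3": "Unknown"}
reply_from_host2 = {
    "ismaster": False,
    "secondary": True,
    "hosts": ["host1", "host2", "host3"],
}
topology = apply_ismaster(topology, "host2", reply_from_host2)
print(topology)
```

After the first reply the client knows about host1 even though it was not in the seed list, which is why a monitor thread for it can be started next.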

Page 92: Python Ireland Conference 2016 - Python and MongoDB Workshop

92

Client Side View

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

Page 93: Python Ireland Conference 2016 - Python and MongoDB Workshop

93

Client Side View

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

Page 94: Python Ireland Conference 2016 - Python and MongoDB Workshop

94

Client Side View

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Page 95: Python Ireland Conference 2016 - Python and MongoDB Workshop

95

Next Is Insert

client = MongoClient( "host1,host2", replicaSet="replset" )
client.db.col.insert_one( { "a" : "b" } )

Page 96: Python Ireland Conference 2016 - Python and MongoDB Workshop

96

Insert Will Block

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Insert

Page 97: Python Ireland Conference 2016 - Python and MongoDB Workshop

97

ismaster response from Host 1

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Insert

ismaster

Page 98: Python Ireland Conference 2016 - Python and MongoDB Workshop

98

Now Write Can Proceed

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Insert

Insert

Page 99: Python Ireland Conference 2016 - Python and MongoDB Workshop

99

Later Host 3 Responds

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Page 100: Python Ireland Conference 2016 - Python and MongoDB Workshop

100

Steady State

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Page 101: Python Ireland Conference 2016 - Python and MongoDB Workshop

101

Life Intervenes

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Page 102: Python Ireland Conference 2016 - Python and MongoDB Workshop

102

Monitor may not detect

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Insert

ConnectionFailure

Page 103: Python Ireland Conference 2016 - Python and MongoDB Workshop

103

So Retry

Secondaryhost2

Secondaryhost3

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Insert

Page 104: Python Ireland Conference 2016 - Python and MongoDB Workshop

104

Check for Primary

Secondaryhost2

Secondaryhost3

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Insert

Page 105: Python Ireland Conference 2016 - Python and MongoDB Workshop

105

Host 2 Is Primary

Primaryhost2

Secondaryhost3

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Insert

Page 106: Python Ireland Conference 2016 - Python and MongoDB Workshop

106

Steady State

Secondaryhost2

Secondaryhost3

Primaryhost1

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Page 107: Python Ireland Conference 2016 - Python and MongoDB Workshop

107

What Does This Mean? - Connect

import pymongo

client = pymongo.MongoClient()

try:
    client.admin.command( "ismaster" )
except pymongo.errors.ConnectionFailure as e:
    print( "Cannot connect: %s" % e )

Page 108: Python Ireland Conference 2016 - Python and MongoDB Workshop

108

What Does This Mean? - Queries

import logging
import pymongo

def find_with_recovery( collection, query ):
    try:
        return collection.find_one( query )
    except pymongo.errors.ConnectionFailure as e:
        # Retry exactly once; a second failure propagates to the caller.
        logging.info( "Connection failure : %s" % e )
        return collection.find_one( query )
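The retry-once pattern used for queries can be factored into a small helper. The sketch below is a hedged illustration that runs without a MongoDB server: `ConnectionFailure` here is a local stand-in for `pymongo.errors.ConnectionFailure`, and `retry_once` and `FlakyCollection` are hypothetical names invented for the example.

```python
import logging


class ConnectionFailure(Exception):
    """Stand-in for pymongo.errors.ConnectionFailure so the sketch runs offline."""


def retry_once(operation, *args, **kwargs):
    """Run `operation`; on a connection failure, log it and try exactly once more."""
    try:
        return operation(*args, **kwargs)
    except ConnectionFailure as e:
        logging.info("Connection failure: %s", e)
        # A second failure is not caught and propagates to the caller.
        return operation(*args, **kwargs)


class FlakyCollection:
    """Hypothetical test double: the first call fails, later calls succeed."""

    def __init__(self):
        self.calls = 0

    def find_one(self, query):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionFailure("primary stepped down")
        return {"_id": 1, **query}


collection = FlakyCollection()
result = retry_once(collection.find_one, {"hello": "world"})
print(result, collection.calls)
```

Retrying only once is a deliberate choice: one retry covers a single failover, while repeated retries would hide a longer outage from the application.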

Page 109: Python Ireland Conference 2016 - Python and MongoDB Workshop

109

What Does This Mean? - Inserts

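import logging
import pymongo
from bson.objectid import ObjectId

def insert_with_recovery( collection, doc ):
    # Assign the _id on the client so that a retry reuses the same key.
    doc[ "_id" ] = ObjectId()
    try:
        collection.insert_one( doc )
    except pymongo.errors.ConnectionFailure as e:
        logging.info( "Connection error: %s" % e )
        try:
            collection.insert_one( doc )
        except pymongo.errors.DuplicateKeyError:
            # The first attempt reached the server before the connection failed.
            pass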

Page 110: Python Ireland Conference 2016 - Python and MongoDB Workshop

110

What Does This Mean? - Updates

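# Not idempotent: if this is retried after an ambiguous failure, the counter may be incremented twice.
collection.update_one( { "_id" : 1 }, { "$inc" : { "counter" : 1 } } )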
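The slide shows only the update itself. The likely point, stated here as an assumption, is that an increment is not safe to retry blindly, unlike the query and insert cases. The sketch below simulates this in memory: `apply_update` is a hypothetical helper that mimics a tiny subset of the `$inc` and `$set` operators on a plain dict and is not a PyMongo API.

```python
def apply_update(doc, update):
    """Apply a tiny subset of MongoDB update operators to a dict, in place."""
    for field, amount in update.get("$inc", {}).items():
        doc[field] = doc.get(field, 0) + amount
    for field, value in update.get("$set", {}).items():
        doc[field] = value
    return doc


def apply_twice(doc, update):
    """Simulate a retry after an ambiguous failure: the update lands twice."""
    apply_update(doc, update)
    return apply_update(doc, update)


inc_result = apply_twice({"_id": 1, "counter": 0}, {"$inc": {"counter": 1}})
set_result = apply_twice({"_id": 1, "counter": 0}, {"$set": {"counter": 1}})
print(inc_result["counter"])  # the increment was applied twice
print(set_result["counter"])  # the second $set changed nothing
```

If the first attempt succeeded on the server but the acknowledgement was lost, a retry of `$inc` double-counts, whereas a retry of `$set` leaves the document in the same state.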

Page 111: Python Ireland Conference 2016 - Python and MongoDB Workshop

111

Configuration

connectTimeoutMS : 20s
socketTimeoutMS : None
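These settings can be passed as keyword arguments to MongoClient or carried in the connection URI. As a minimal sketch, the helper below assembles such a URI with the standard library only. `build_uri` is a hypothetical name, the host and replica set names come from the earlier slides, and the timeout values are purely illustrative rather than recommendations.

```python
from urllib.parse import urlencode


def build_uri(hosts, **options):
    """Build a mongodb:// URI from a seed list and URI options."""
    query = urlencode(options)
    return "mongodb://%s/?%s" % (",".join(hosts), query)


uri = build_uri(
    ["host1", "host2"],
    replicaSet="replset",
    connectTimeoutMS=5000,   # illustrative value, in milliseconds
    socketTimeoutMS=10000,   # illustrative value, in milliseconds
)
print(uri)
```

The resulting string can be handed to `pymongo.MongoClient( uri )`. Leaving `socketTimeoutMS` out of the URI keeps the no-timeout behaviour listed on the slide.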

Page 112: Python Ireland Conference 2016 - Python and MongoDB Workshop

112

connectTimeoutMS

Secondaryhost2

Secondaryhost3

MongoClient

MonitorThread 1

MonitorThread 2 ✔

MonitorThread 3

YourCode

Insert

connectTimeoutMS

serverSelectionTimeoutMS

Page 113: Python Ireland Conference 2016 - Python and MongoDB Workshop

113

More Reading

• The spec author, Jesse Jiryu Davis, has a collection of links and his better version of this talk:
  https://emptysqua.re/blog/server-discovery-and-monitoring-in-mongodb-drivers/

• The full server discovery and monitoring spec is on GitHub:
  https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/server-discovery-and-monitoring.rst

Page 114: Python Ireland Conference 2016 - Python and MongoDB Workshop

Q&A

Page 116: Python Ireland Conference 2016 - Python and MongoDB Workshop

116

insert_one

• Stages
  – Parse the parameters
  – Get a socket to write data on
  – Add the object Id
  – Convert the whole insert command and parameters to a SON object
  – Apply the writeConcern to the command
  – Encode the message into a BSON object
  – Send the message to the server via the socket (TCP/IP)
  – Check for writeErrors (e.g. DuplicateKeyError)
  – Check for writeConcernErrors (e.g. writeTimeout)
  – Return Result object

Page 117: Python Ireland Conference 2016 - Python and MongoDB Workshop

117

Bulk Insert

bulker = collection.initialize_ordered_bulk_op()
bulker.insert( { "a" : "b" } )
bulker.insert( { "c" : "d" } )
bulker.insert( { "e" : "f" } )
try:
    bulker.execute()
except pymongo.errors.BulkWriteError as e:
    # The error document is exposed as e.details.
    print( "Bulk write error : %s" % e.details )

Page 118: Python Ireland Conference 2016 - Python and MongoDB Workshop

118

Bulk Write

• Create Bulker object
• Accumulate operations
• Each operation is created as a SON object
• The operations are accumulated in a list
• Once execute is called
  – For ordered, execute in the order added
  – For unordered, execute INSERTs, UPDATEs, then DELETEs
• Errors will abort the whole batch unless no write concern is specified
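The ordered and unordered cases described above can be illustrated with a small pure-Python sketch. It follows the ordering stated on the slide and is not taken from PyMongo's source. `execution_order` and the `(kind, document)` tuples are hypothetical names chosen for the example.

```python
def execution_order(operations, ordered=True):
    """Return the operations in the order a bulk execute would run them.

    `operations` is a list of (kind, document) tuples where kind is
    "insert", "update" or "delete".
    """
    if ordered:
        return list(operations)
    # Unordered: group by type. sorted() is stable, so operations of the
    # same kind keep their relative order.
    rank = {"insert": 0, "update": 1, "delete": 2}
    return sorted(operations, key=lambda op: rank[op[0]])


ops = [
    ("delete", {"a": "b"}),
    ("insert", {"c": "d"}),
    ("update", {"e": "f"}),
    ("insert", {"g": "h"}),
]
ordered_kinds = [kind for kind, _ in execution_order(ops, ordered=True)]
unordered_kinds = [kind for kind, _ in execution_order(ops, ordered=False)]
print(ordered_kinds)
print(unordered_kinds)
```

Grouping by type lets each group be sent as a single batch, which is the usual reason to prefer an unordered bulk when the operations do not depend on each other.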