mongodb and indexes - mug denver - 20160329

Posted on 16-Apr-2017


MongoDB and Indexes

Doug Duncan

dugdun@hotmail.com / @dugdun

What we’ll cover
• What are indexes?
  • Types
  • Properties
• Why use indexes?
• How to create indexes
• Commands to check indexes and plans

What are indexes?

Indexes are special data structures that store a subset of your data in an easily traversable format.

MongoDB stores indexes as B-trees, which allow efficient access to the index contents.

Well-chosen indexes let a system run efficiently. Poorly chosen or missing indexes can bring a system to a grinding halt.

If there were an index on Origin, its entries would be stored in a format similar to the following, with each key pointing to the location of a document:

[ABE] -> 0xa193b48c
[ABE] -> 0x8e8b242a
[ABE] -> 0x0928cdc1
…
[DEN] -> 0x24aa4ecd
[DEN] -> 0x87396a3c
[DEN] -> 0x9392ab2f
…
[LAX] -> 0x89ccede0
…
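As a rough illustration of why this layout is easy to traverse, the sketch below models the Origin entries as a key-sorted JavaScript array and finds every entry for one key with a binary search. It is plain Node.js, not MongoDB's B-tree code, and the helper names lowerBound and lookup are hypothetical.

```javascript
// Key-sorted entries for a hypothetical index on Origin: [key, documentLocation].
// Illustration only; MongoDB's real storage is a B-tree, not a flat array.
const originIndex = [
  ["ABE", "0xa193b48c"], ["ABE", "0x8e8b242a"], ["ABE", "0x0928cdc1"],
  ["DEN", "0x24aa4ecd"], ["DEN", "0x87396a3c"], ["DEN", "0x9392ab2f"],
  ["LAX", "0x89ccede0"],
];

// Binary search for the position of the first entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Collect every location stored under one key: an O(log n) search to find the
// start, then a walk over only the matching entries.
function lookup(index, key) {
  const locations = [];
  for (let i = lowerBound(index, key); i < index.length && index[i][0] === key; i++) {
    locations.push(index[i][1]);
  }
  return locations;
}

console.log(lookup(originIndex, "DEN"));
```

The point of the sketch is that a lookup never touches the ABE or LAX entries; a B-tree gives the same property while also staying cheap to update.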

Types of indexes
• _id
• Simple
• Compound
• Multikey
• Full-Text
• Geo-spatial
• Hashed

The _id index
• The _id index is created automatically and cannot be removed.
• It plays the same role as a primary key in a traditional RDBMS.
• The default _id value is a 12-byte ObjectId:
  • 4-byte timestamp
  • 3-byte machine id
  • 2-byte process id
  • 3-byte counter
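To make the byte layout concrete, here is a small sketch that splits a 24-character ObjectId hex string into the four fields listed above. It is plain JavaScript; the helper name decodeObjectId and the sample value are illustrative only. Later MongoDB releases describe the middle five bytes as a single random value rather than a machine id and process id, so treat those two fields as reflecting the layout described in this talk.

```javascript
// Decode an ObjectId hex string using the layout from the slide:
// 4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter.
function decodeObjectId(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("ObjectId must be 24 hex characters");
  }
  // Each byte is two hex characters, so byte offsets are doubled.
  const field = (startByte, byteCount) =>
    parseInt(hex.slice(startByte * 2, (startByte + byteCount) * 2), 16);

  return {
    timestamp: new Date(field(0, 4) * 1000), // seconds since the Unix epoch
    machineId: field(4, 3),
    processId: field(7, 2),
    counter: field(9, 3),
  };
}

const parts = decodeObjectId("507f1f77bcf86cd799439011");
console.log(parts.timestamp.toISOString()); // 2012-10-17T21:13:27.000Z
```

Because the timestamp comes first, ObjectIds generated later generally sort after earlier ones, which is why the _id index also roughly reflects insertion time.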

Simple index

• A simple index is an index on a single key.
• This is similar to a book’s index, where you look up a word to find the pages it’s referenced on.

Compound index

• A compound index is created over two or more fields in a document.

• This is similar to a phone book, where you can find a person’s phone number given their last and first names.

Multikey index
• A multikey index is an index created on a field that contains an array.
• In a compound index, at most one of the indexed fields in a given document can be an array.
• The index gets one entry for every item in the array for a given document. This means that if a document has an array with 100 items, that document will have 100 index entries.
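The fan-out described above can be sketched in a few lines: each array element becomes its own index key pointing back at the same document. This is plain JavaScript, the helper name multikeyEntries is hypothetical, and two details are assumptions of the sketch: repeated values inside one array collapse to a single key, and empty arrays are not modeled.

```javascript
// Generate [key, location] index entries for one document and one field.
function multikeyEntries(doc, field, location) {
  // A regular (non-sparse) index stores a missing field as null.
  const value = doc[field] === undefined ? null : doc[field];
  // One key per array element; duplicates collapse via the Set (assumption).
  // Empty arrays are not handled in this sketch.
  const keys = Array.isArray(value) ? new Set(value) : new Set([value]);
  return [...keys].map((key) => [key, location]);
}

// A document whose "tags" array holds 100 distinct values.
const doc = { tags: Array.from({ length: 100 }, (_, i) => `tag${i}`) };
console.log(multikeyEntries(doc, "tags", "0x01").length); // 100
```

This is also why large arrays make writes more expensive: every insert or change to the array touches one index entry per element.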

Full-text index

• This is an index over text-based fields that supports searching for words within them, similar to how Google indexes web pages.

Geo-spatial index

• A geo-spatial index supports location-based queries, such as finding documents near a given point.

• It works on both planar (2d) and spherical (2dsphere) geometries.

Hashed indexes
• A hashed index is used in hash-based sharding and gives a more randomized distribution of documents.
• Hashed indexes cannot be compound or unique.
• A field can have both a hashed index and a regular (non-hashed) index. Hashed indexes cannot serve range queries, so the non-hashed index is the one that supports those.

Index properties

• Unique
• Sparse
• TTL
• Partial (new in 3.2)

Unique
• The unique property ensures that no two documents share the same value for the indexed field, or the same combination of values for a compound index:
db.collection.createIndex({"email": 1}, {"unique": true})
• A unique index allows only one document in the collection to have a null or missing value for the indexed field.
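The null rule is easier to see with a toy model: if a missing field is stored in the index as null, the second document without an email collides with the first. The class below is a plain JavaScript sketch (UniqueIndex is a hypothetical name), not how MongoDB enforces uniqueness internally.

```javascript
// Toy unique index on a single field.
class UniqueIndex {
  constructor(field) {
    this.field = field;
    this.keys = new Set();
  }

  insert(doc) {
    // Missing and explicit null values both become the key null.
    const key = doc[this.field] ?? null;
    if (this.keys.has(key)) {
      throw new Error(`duplicate key: ${JSON.stringify(key)}`);
    }
    this.keys.add(key);
  }
}

const emails = new UniqueIndex("email");
emails.insert({ email: "a@example.com" });
emails.insert({ name: "no email" }); // first missing value: accepted
try {
  emails.insert({ name: "also no email" }); // second missing value: rejected
} catch (err) {
  console.log(err.message); // duplicate key: null
}
```

Combining unique with the sparse or partial property is the usual way to allow many documents that lack the field.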

Sparse
• The sparse property indexes only the documents that contain a value for the given field:
db.collection.createIndex({"kids": 1}, {"sparse": true})
• A sparse index will not be used if it would produce an incomplete result set, unless it is explicitly hinted. A query like the following can use it, because a document without a kids field could never match:
db.collection.find({"kids": {"$gte": 5}})
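A small model shows why the planner is careful with sparse indexes: documents without the field simply have no entry. The sketch below is plain JavaScript with made-up sample documents; it answers the $gte query from the sparse entries and shows which document a whole-collection sort would lose.

```javascript
// Sample documents (made up for illustration).
const people = [
  { _id: 1, kids: 2 },
  { _id: 2 },          // no "kids" field, so no entry in the sparse index
  { _id: 3, kids: 6 },
];

// Sparse index on "kids": [value, _id] pairs, sorted by value.
const sparseKids = people
  .filter((doc) => "kids" in doc)
  .map((doc) => [doc.kids, doc._id])
  .sort((a, b) => a[0] - b[0]);

// {"kids": {"$gte": 5}} can only match documents that have the field,
// so answering it from the sparse index loses nothing.
const viaIndex = sparseKids.filter(([kids]) => kids >= 5).map(([, id]) => id);
console.log(viaIndex); // [ 3 ]

// A sort over the whole collection would silently drop _id 2 if it used this
// index, which is why it is skipped unless hinted.
console.log(sparseKids.map(([, id]) => id)); // [ 1, 3 ]
```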

TTL
• The TTL property automatically removes documents after a given period of time:
db.collection.createIndex({"accessTime": 1}, {"expireAfterSeconds": 1200})

• The indexed field should contain an ISODate() value. If it holds any other type, the document will not be removed.

• The background task that removes expired documents runs every 60 seconds, so you might still see a document after its time has expired.
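The expiry rule can be written as a single comparison: a document becomes eligible once its date plus expireAfterSeconds is in the past. The function below is a plain JavaScript sketch (isExpired is a hypothetical name). It handles only a single Date value and treats anything else as non-expiring, which matches the slide's point about non-date types.

```javascript
// Is this document eligible for TTL removal at time `now`?
// Only a single Date value is handled; any other type never expires here.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  if (!(value instanceof Date)) return false;
  return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const now = new Date("2016-03-29T19:00:00Z");
// 30 minutes old with a 20-minute TTL: eligible.
console.log(isExpired({ accessTime: new Date("2016-03-29T18:30:00Z") }, "accessTime", 1200, now));
// 10 minutes old: not yet eligible.
console.log(isExpired({ accessTime: new Date("2016-03-29T18:50:00Z") }, "accessTime", 1200, now));
// A string is not a date, so it never expires.
console.log(isExpired({ accessTime: "2016-03-29T18:00:00Z" }, "accessTime", 1200, now));
```

Eligibility is not the same as deletion: the actual removal happens on the next pass of the 60-second background task.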

Partial
• The partial property lets you index only a subset of your data, selected with a partialFilterExpression:
db.collection.createIndex({"movie": 1, "reviews": 1}, {"partialFilterExpression": {"rating": {"$gte": 4}}})

• As with a sparse index, the index will not be used if it would provide an incomplete result set.
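Whether a partial index is safe comes down to one question: is every document the query could match inside the index? The check below is a deliberately simplified JavaScript sketch of that idea for the rating filter above. It only recognizes a $gte condition on rating, the function name is hypothetical, and the query values are made up; MongoDB's planner handles many more predicate shapes.

```javascript
// The partial index above only contains documents with rating >= 4.
const PARTIAL_MIN_RATING = 4;

// Simplified check: usable only if the query's own lower bound on rating is
// at least 4, so no matching document can be missing from the index.
// Only {"rating": {"$gte": n}} is recognized in this sketch.
function canUsePartialIndex(query) {
  const bound = query.rating && query.rating.$gte;
  return typeof bound === "number" && bound >= PARTIAL_MIN_RATING;
}

console.log(canUsePartialIndex({ movie: "Heat", rating: { $gte: 5 } })); // true
console.log(canUsePartialIndex({ movie: "Heat", rating: { $gte: 3 } })); // false: ratings of 3 are not indexed
console.log(canUsePartialIndex({ movie: "Heat" }));                       // false: no rating condition
```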

Why use indexes?

• Efficiently retrieving matching documents:
  • Equality matching
  • Inequality or range matching
• Sorting
• Without a usable index, MongoDB has to scan the entire collection.

How to create indexes.

Before creating indexes
• Think about the queries you will be running, and try to create as few indexes as possible to support them. Similar query patterns can often share the same (or a very similar) index.

• Think about the data that you will query, and put highly selective fields first in the index where possible.

• Check your current indexes before creating new ones. MongoDB will let you create indexes on the same fields in different orders, and each of those is a separate index to maintain.

Simple indexes
• When creating a simple index, the sort order of the values, ascending (1) or descending (-1), matters little, because MongoDB can walk the index forwards or backwards.

• Simple index creation:
db.flights.createIndex({"Origin": 1})

Compound indexes
• When creating a compound index, the sort order, ascending (1) or descending (-1), of each key starts to matter, especially if the index is used to sort on multiple keys.

• When creating compound indexes, add keys to the index in the following order:
  • Equality matches
  • Sort fields
  • Inequality (range) matches

• A compound index can also support queries on any leftmost prefix of its keys.
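The leftmost-prefix rule can be sketched as counting how many leading index keys a query mentions, stopping at the first key it skips. This is a simplified JavaScript model (usablePrefixLength is a hypothetical name); real query planning also weighs sorts, range bounds, and plan costs.

```javascript
// Count how many leading keys of a compound index the query constrains,
// stopping at the first index key the query does not mention.
// 0 means the first key is missing, so the prefix rule does not apply.
function usablePrefixLength(indexKeys, queryFields) {
  const fields = new Set(queryFields);
  let length = 0;
  while (length < indexKeys.length && fields.has(indexKeys[length])) {
    length++;
  }
  return length;
}

// Keys of the index {"Origin": 1, "Dest": 1, "FlightDate": -1}.
const flightIndex = ["Origin", "Dest", "FlightDate"];
console.log(usablePrefixLength(flightIndex, ["Origin"]));         // 1
console.log(usablePrefixLength(flightIndex, ["Origin", "Dest"])); // 2
console.log(usablePrefixLength(flightIndex, ["Dest"]));           // 0
```

A query on Dest alone gets 0 because the index is ordered by Origin first, just as a phone book sorted by last name does not help you find everyone with a given first name.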

• Compound index creation:
db.flights.createIndex({"Origin": 1, "Dest": 1, "FlightDate": -1})

• Queries supported:
db.flights.find({"Origin": "DEN"})
db.flights.find({"Origin": "DEN", "Dest": "JFK"})
db.flights.find({"Origin": "DEN", "Dest": "JFK"}).sort({"FlightDate": -1})
db.flights.find({"Origin": "DEN", "Dest": "JFK"}).sort({"FlightDate": 1})

• An index created as follows:
db.flights.createIndex({"Origin": 1, "Dest": -1})

can support either of the following sorts, because MongoDB can walk the index in either direction:
db.flights.find().sort({"Origin": 1, "Dest": -1})
db.flights.find().sort({"Origin": -1, "Dest": 1})
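The rule behind these two examples is that a sort matches the index when every direction is the same as the index, or every direction is flipped. The sketch below is plain JavaScript (canServeSort is a hypothetical name) and assumes the sort lists exactly the index's keys in the same order; prefix sorts are not modeled.

```javascript
// Can an index serve this sort by walking forwards (same directions)
// or backwards (every direction reversed)?
// Simplified: the sort must use the same keys in the same order.
function canServeSort(indexSpec, sortSpec) {
  const indexKeys = Object.keys(indexSpec);
  const sortKeys = Object.keys(sortSpec);
  if (indexKeys.length !== sortKeys.length) return false;
  if (!indexKeys.every((key, i) => key === sortKeys[i])) return false;

  const sameDirection = indexKeys.every((k) => indexSpec[k] === sortSpec[k]);
  const reversed = indexKeys.every((k) => indexSpec[k] === -sortSpec[k]);
  return sameDirection || reversed;
}

const idx = { Origin: 1, Dest: -1 };
console.log(canServeSort(idx, { Origin: 1, Dest: -1 })); // true: forward walk
console.log(canServeSort(idx, { Origin: -1, Dest: 1 })); // true: backward walk
console.log(canServeSort(idx, { Origin: 1, Dest: 1 }));  // false: mixed directions
```

The mixed case fails because neither walk of the index produces Origin ascending together with Dest ascending; that sort would need an in-memory sort or a different index.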

Full-text indexes
• Full-text index creation:
db.messages.createIndex({"body": "text"})
• To search using the index, matching any of the words:
db.messages.find({"$text": {"$search": "some text"}})

• To search using the index for an exact phrase:
db.messages.find({"$text": {"$search": "\"some text\""}})

Covering indexes
• A covering index answers a query entirely from the index, without reading the documents themselves. For example:
db.flights.createIndex({"Origin": 1, "Dest": 1, "ArrDelay": 1, "UniqueCarrier": 1})

• The following query is covered, because every field it filters on, returns, or sorts by is in the index, and _id is excluded:
db.flights.find({"Origin": "DEN", "Dest": "JFK"}, {"UniqueCarrier": 1, "ArrDelay": 1, "_id": 0}).sort({"ArrDelay": -1})
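The covering condition can be expressed as a set check over the filter, projection, and sort. The function below is a simplified JavaScript sketch (isCovered is a hypothetical name) that ignores multikey and dotted-path caveats; it reproduces the slide's example and shows that leaving _id in the projection breaks coverage.

```javascript
// Simplified coverage check: every field the query filters on, returns, or
// sorts by must be an index key, and _id must be excluded from the projection
// unless _id is itself in the index. Multikey/dotted-path rules are ignored.
function isCovered(indexSpec, filter, projection, sort = {}) {
  const indexed = new Set(Object.keys(indexSpec));
  if (projection._id !== 0 && !indexed.has("_id")) return false;

  const returned = Object.keys(projection).filter((k) => k !== "_id" && projection[k]);
  const needed = [...Object.keys(filter), ...returned, ...Object.keys(sort)];
  return needed.every((field) => indexed.has(field));
}

const flightIdx = { Origin: 1, Dest: 1, ArrDelay: 1, UniqueCarrier: 1 };
const filter = { Origin: "DEN", Dest: "JFK" };

console.log(isCovered(flightIdx, filter, { UniqueCarrier: 1, ArrDelay: 1, _id: 0 }, { ArrDelay: -1 })); // true
console.log(isCovered(flightIdx, filter, { UniqueCarrier: 1, ArrDelay: 1 }, { ArrDelay: -1 }));         // false: _id returned
```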

Indexing nested fields/documents

• Let’s say you have documents with nested documents in them like the following:

db.locations.findOne()
{
  "_id": ObjectId(…),
  …,
  "location": {
    "state": "Colorado",
    "city": "Lyons"
  }
}

• You can index embedded fields by using dot notation:
db.locations.createIndex({"location.state": 1})

• You can also index whole embedded documents:
db.locations.createIndex({"location": 1})

• Queries on the whole embedded document must match it exactly, including the order of its fields. That means this query will return the document:
db.locations.find({"location": {"state": "Colorado", "city": "Lyons"}})

• But this one won’t:
db.locations.find({"location": {"city": "Lyons", "state": "Colorado"}})
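Field order matters because equality on an embedded document compares it as a whole, field by field in order. As a rough analogy in plain JavaScript, comparing JSON encodings behaves the same way, since JSON.stringify keeps string keys in insertion order. This is only an illustration (sameEmbeddedDoc is a hypothetical name), not how MongoDB compares BSON.

```javascript
// Order-sensitive equality, approximated by comparing JSON encodings.
// JSON.stringify emits string keys in insertion order.
function sameEmbeddedDoc(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

const stored = { state: "Colorado", city: "Lyons" };
console.log(sameEmbeddedDoc(stored, { state: "Colorado", city: "Lyons" })); // true
console.log(sameEmbeddedDoc(stored, { city: "Lyons", state: "Colorado" })); // false
```

If field order in queries cannot be controlled, indexing and querying the individual fields with dot notation avoids the problem.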

Index Intersection
• Index intersection is when MongoDB uses two or more indexes to satisfy a query.
• Given the following two indexes:
db.orders.createIndex({"qty": 1})
db.orders.createIndex({"item": 1})

• Index intersection means that a query such as the following could, in theory, use both indexes, with the results of the two index scans intersected to satisfy the query:
db.orders.find({"item": "ABC123", "qty": {"$gte": 15}})
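Conceptually, intersection means scanning each index on its own and keeping only the record ids that both scans return. The sketch below models that in plain JavaScript with made-up record ids (r1 to r4); it shows the idea, not MongoDB's actual intersection stages.

```javascript
// Made-up index contents: [key, recordId], each sorted by key.
const qtyIndex = [[5, "r1"], [15, "r2"], [20, "r3"], [40, "r4"]];
const itemIndex = [["ABC123", "r2"], ["ABC123", "r4"], ["XYZ999", "r3"]];

// Scan 1: {"qty": {"$gte": 15}} on the qty index.
const fromQty = new Set(qtyIndex.filter(([qty]) => qty >= 15).map(([, id]) => id));

// Scan 2: {"item": "ABC123"} on the item index.
const fromItem = itemIndex.filter(([item]) => item === "ABC123").map(([, id]) => id);

// Keep only the records returned by both scans.
const matches = fromItem.filter((id) => fromQty.has(id));
console.log(matches); // [ 'r2', 'r4' ]
```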

Indexing arrays
• You can index fields that contain arrays as well.
• A compound index, however, can only have a single field that is an array in a given document. If a document has two indexed fields that are arrays, you will get an error:

db.arrtest.createIndex({"a": 1, "b": 1})
db.arrtest.insert({"b": [1,2,3], "a": [1,2,3]})
WriteResult({
  "nInserted": 0,
  "writeError": {
    "code": 10088,
    "errmsg": "cannot index parallel arrays [b] [a]"
  }
})
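The restriction exists because two arrays in one compound key would require an index entry for every combination of their elements. A minimal JavaScript sketch of the check is below; checkParallelArrays is a hypothetical name, and its error message only loosely mimics MongoDB's (field order in the message may differ).

```javascript
// Reject a document if more than one of the compound index's fields is an array.
function checkParallelArrays(doc, indexFields) {
  const arrayFields = indexFields.filter((f) => Array.isArray(doc[f]));
  if (arrayFields.length > 1) {
    const names = arrayFields.map((f) => `[${f}]`).join(" ");
    throw new Error(`cannot index parallel arrays ${names}`);
  }
}

checkParallelArrays({ a: [1, 2, 3], b: 7 }, ["a", "b"]); // ok: only "a" is an array
try {
  checkParallelArrays({ b: [1, 2, 3], a: [1, 2, 3] }, ["a", "b"]);
} catch (err) {
  console.log(err.message); // cannot index parallel arrays [a] [b]
}
```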


Removing indexes

• The command to remove an index is similar to the one that created it:
db.flights.dropIndex({"Origin": 1, "Dest": -1})

Commands to check indexes and index usage

View all indexes in a database

• To view all indexes in a database, use the following command (deprecated since MongoDB 3.0 in favor of listing indexes per collection):
db.system.indexes.find()

• For each index you’ll see the fields the index was created with, the name of the index, and the namespace (db.collection) that the index was built on.

View indexes for a given collection

• To view all indexes for a given collection, use the following command:
db.collection.getIndexes()

• This returns the same information as the previous command, limited to the given collection.

View index sizes
• To view the size of all indexes in a collection:
db.collection.stats()

• The results include the total size of all indexes and the size of each individual index, in bytes.

How to see if an index is used
• To see whether a query uses an index, append the .explain() method to the query:
db.flights.find({"Origin": "DEN"}).explain()

• explain() has three levels of verbosity, passed as an argument such as .explain("executionStats"):
  • queryPlanner - the default; returns the winning query plan
  • executionStats - adds execution statistics for the winning plan
  • allPlansExecution - also adds statistics for the other candidate plans

Notes on indexes
• When creating an index, you need to know your data and the queries that will run against it.

• Don’t build indexes in isolation!
• While indexes can improve performance, be careful not to over-index: every insert and delete has to update every index on the collection, and every update has to modify each index whose fields it changes.

Q & A

End Notes
• User group discounts
• Manning publications: www.manning.com
  • Code ‘ug367’ to save 36% off your order
• Apress publications: www.apress.com
  • Code ‘UserGroup’ to save 10% off your order
• O’Reilly publications: www.oreilly.com
  • Still waiting to get information


• Communication
  • Twitter: @MUGDenver and #MUGDenver
  • Email: mugdenver@gmail.com
  • Slack: ???


• MongoDB World
  • When: June 28th and 29th
  • Where: NYC
  • Save 25% by using code ‘DDuncan’
