Best Practices - Couchbase Indexing in Production: Couchbase Connect 2014
DESCRIPTION
Abstract: Couchbase Views are a powerful feature for building real-time applications. However, indexing can be a heavyweight operation on your Couchbase cluster. This session briefly introduces Couchbase Views, discusses best practices for document, database and view design, and presents tips and tunables for running Views in production for a successful Couchbase deployment.
TRANSCRIPT
Best Practices
Couchbase Indexing in Production
David Maier | Senior Solutions Engineer, Couchbase
Agenda
• Introduction
• Document Modeling Basics
• Ways to query with Couchbase Server
• How Indexing works in Couchbase 3.x compared to 2.x
• Database Design Considerations for Views
• Configuration Settings and their Effects
• Resource Requirements
©2014 Couchbase, Inc. 2
Introduction
• Views are a powerful feature for real-time applications
• Indexing can be a heavyweight operation
[Figure: Key Access vs. Views/Queries, labeled 90% and 10%]
Document Modeling Basics
JSON Document Structure
• JavaScript Object Notation (JSON)
• Metadata
• Document value
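As a rough illustration of the two parts named on this slide, the sketch below models a document as a metadata part plus a JSON value. The key `person::1`, the field names and the values are made up for the example, and plain Python dictionaries stand in for what the server stores.

```python
import json

# Hypothetical example: key, fields and values are illustrative only.
meta = {"id": "person::1", "type": "json"}   # metadata kept alongside the document
value = {                                     # the document value itself
    "type": "person",
    "firstname": "Jane",
    "lastname": "Doe",
    "email": "jane@example.com",
}

encoded = json.dumps(value)    # JSON text, as a client would store it
decoded = json.loads(encoded)  # parsed back into a dictionary on read
```

The `type` attribute inside the value is an application-level convention, not something JSON itself requires.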
Normalized vs. Denormalized Data
• Normalized
  • Uses references for 1-many relationships
  • Reduces data duplication
  • Smaller document size
• Denormalized
  • Uses nested data
  • Aggregate view of data
  • Allows atomic operations
  • No client side joins
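The trade-off above can be sketched with two tiny in-memory stores. The document keys, the `customer`/`order` types and the helper names are assumptions made for this example; a Python dictionary stands in for a bucket.

```python
# Normalized: each order is its own document and references the customer by key.
normalized = {
    "customer::1": {"type": "customer", "name": "Jane Doe"},
    "order::1": {"type": "order", "customer": "customer::1", "total": 25},
    "order::2": {"type": "order", "customer": "customer::1", "total": 40},
}

# Denormalized: the orders are nested inside the customer document.
denormalized = {
    "customer::1": {
        "type": "customer",
        "name": "Jane Doe",
        "orders": [{"id": 1, "total": 25}, {"id": 2, "total": 40}],
    },
}


def orders_normalized(store, customer_key):
    """Client side join: scan for order documents that reference the customer."""
    return [
        doc for doc in store.values()
        if doc.get("type") == "order" and doc.get("customer") == customer_key
    ]


def orders_denormalized(store, customer_key):
    """Single read: the aggregate already contains its orders."""
    return store[customer_key]["orders"]
```

In the denormalized form, one read returns the whole aggregate and one write updates it, at the cost of a larger document.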
Atomic Counters
• Similar to sequences / auto-incrementing columns in the relational world
• Initialize and then increment a counter value
• Use the counter value as part of a key
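The three steps above might look like the following sketch. `CounterStore` is a hypothetical in-memory stand-in, not a Couchbase SDK class; the lock only imitates the atomicity that the server would provide for an increment. The key names `person::count` and `person::<n>` are assumptions.

```python
import threading


class CounterStore:
    """Hypothetical in-memory stand-in for a bucket (not an SDK class)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def get(self, key):
        with self._lock:
            return self._data[key]

    def incr(self, key, delta=1):
        # The lock makes read-modify-write a single step, imitating an atomic counter.
        with self._lock:
            self._data[key] += delta
            return self._data[key]


def add_person(store, person):
    """Increment the counter and use the new value as part of the document key."""
    number = store.incr("person::count")
    key = f"person::{number}"
    store.set(key, person)
    return key


store = CounterStore()
store.set("person::count", 0)  # initialize the counter once
first = add_person(store, {"type": "person", "firstname": "Jane"})
second = add_person(store, {"type": "person", "firstname": "John"})
```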
Reference Documents for Lookups
• A second document that references the primary one
Ways to query with Couchbase Server
Retrieval via Key Patterns and Lookup Documents
• Via key pattern
  • ‘person::$firstname.$lastname’
• With lookup document
  • Just 2 steps to retrieve a user by email address
• Most efficient way
  • B-Tree traversal vs. direct access
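A minimal sketch of both retrieval styles, using a dictionary as a stand-in for a bucket. The `person::$firstname.$lastname` pattern is from the slide; the `email::<address>` pattern for the lookup document and the helper names are assumptions.

```python
bucket = {}  # in-memory stand-in for a bucket


def person_key(firstname, lastname):
    # Key pattern from the slide: person::$firstname.$lastname
    return f"person::{firstname}.{lastname}"


def email_key(email):
    # Hypothetical key pattern for the lookup document.
    return f"email::{email}"


def save_person(bucket, person):
    """Store the primary document plus a lookup document that points to it."""
    key = person_key(person["firstname"], person["lastname"])
    bucket[key] = person
    bucket[email_key(person["email"])] = key  # lookup document holds the primary key
    return key


def get_person_by_email(bucket, email):
    primary_key = bucket[email_key(email)]  # step 1: read the lookup document
    return bucket[primary_key]              # step 2: read the primary document


save_person(bucket, {"firstname": "Jane", "lastname": "Doe", "email": "jane@example.com"})
jane = get_person_by_email(bucket, "jane@example.com")
```

Both steps are plain key reads, so no index has to be built or traversed for this access path.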
• Access multiple documents by using a counter value
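One way to read this bullet: if keys were built from a counter, the current counter value is enough to reconstruct every key and fetch the documents by key. The sketch assumes keys of the form `person::<n>` and a counter document `person::count`; both names and the helpers are hypothetical, and a dictionary stands in for a bucket.

```python
def keys_up_to(counter_value, prefix="person::"):
    """Rebuild all keys that were created from counter values 1..counter_value."""
    return [f"{prefix}{number}" for number in range(1, counter_value + 1)]


def get_multi(bucket, keys):
    """Fetch several documents by key, skipping keys that no longer exist."""
    return {key: bucket[key] for key in keys if key in bucket}


bucket = {
    "person::count": 3,
    "person::1": {"firstname": "Jane"},
    "person::3": {"firstname": "John"},  # person::2 was deleted in this example
}
people = get_multi(bucket, keys_up_to(bucket["person::count"]))
```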
Indexing and Querying via Views
• Organized in Design Documents
• Incremental Map-Reduce
• Spread load across nodes
  • Each node indexes its data
• Map: process, filter, map and emit a row
• Reduce: aggregate mapped data
  • Built in: _count, _sum, _stats
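The map and reduce steps can be simulated in a few lines. Real view functions are written in JavaScript and run on the server; the Python below only mimics the data flow. The `city` attribute, the helper names and the sample documents are assumptions, and the two reduce helpers imitate what `_count` and `_sum` compute.

```python
def map_by_city(doc, meta, emit):
    """Map step: filter person documents and emit one row per document."""
    if doc.get("type") == "person" and "city" in doc:
        emit(doc["city"], 1)


def build_index(bucket, map_fn):
    """Run the map function over every document and sort the emitted rows by key."""
    rows = []
    for doc_id, doc in bucket.items():
        def emit(key, value, _id=doc_id):
            rows.append((key, value, _id))
        map_fn(doc, {"id": doc_id, "type": "json"}, emit)
    rows.sort(key=lambda row: (row[0], row[2]))
    return rows


def reduce_count(rows):
    """Imitates the built in _count: number of rows."""
    return len(rows)


def reduce_sum(rows):
    """Imitates the built in _sum: sum of the emitted values."""
    return sum(value for _key, value, _id in rows)


bucket = {
    "person::1": {"type": "person", "city": "Berlin"},
    "person::2": {"type": "person", "city": "Munich"},
    "person::3": {"type": "person", "city": "Berlin"},
    "order::1": {"type": "order", "total": 10},
}
rows = build_index(bucket, map_by_city)
```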
• Multiple roles
  • A primary index provides access to all document IDs of a bucket
  • A secondary index is an alternative access path via a (compound) key attribute
  • A View provides an alternative view on your data
• Simple View Access
• Exact Match
• Range
• With Reduction
• With Grouping
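The access types listed above can be illustrated against a small, already sorted list of index rows. The parameter names `key`, `startkey` and `endkey` mirror the usual view query parameters; the row data, the helper names and the inclusive end of the range are assumptions of this sketch.

```python
from itertools import groupby

# Hypothetical index rows (key, value), kept sorted by key like a view index.
ROWS = sorted([("2014-02", 7), ("2014-01", 10), ("2014-03", 3), ("2014-01", 5)])


def select(rows, key=None, startkey=None, endkey=None):
    """Exact match via key, range via startkey/endkey (both ends inclusive here)."""
    result = []
    for row_key, value in rows:
        if key is not None and row_key != key:
            continue
        if startkey is not None and row_key < startkey:
            continue
        if endkey is not None and row_key > endkey:
            continue
        result.append((row_key, value))
    return result


def reduce_sum(rows):
    """With reduction: one aggregated value over all selected rows."""
    return sum(value for _key, value in rows)


def group_sum(rows):
    """With grouping: one aggregated value per distinct key."""
    return {
        row_key: sum(value for _key, value in group)
        for row_key, group in groupby(rows, key=lambda row: row[0])
    }
```

For example, `select(ROWS, key="2014-01")` is an exact match, `select(ROWS, startkey="2014-02", endkey="2014-03")` a range, and `group_sum(ROWS)` a grouped reduction.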
Best Practices for Selection, Projection and Aggregation
• Try to avoid computing too many things in a View
• Check for attribute existence
• Select (filter) data to avoid unnecessary entries in the View
  • Use document types to make Views more selective
• Project (map) only the necessary data and emit it as the value
  • Do not emit the full document
  • If possible, emit a null value and do an additional Get to retrieve the whole document
• Use the built-in reduce functions if possible
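The practices above can be sketched as a small simulation. Actual view functions are JavaScript; this Python version only mirrors the logic: filter by document type, check that the attribute exists, emit a null (`None`) value, and fetch the full document with a separate get. The `email` attribute, the helper names and the sample data are assumptions.

```python
def map_by_email(doc, meta, emit):
    if doc.get("type") != "person":  # use the document type to stay selective
        return
    if "email" not in doc:           # check for attribute existence
        return
    emit(doc["email"], None)         # emit a null value, not the full document


def run_map(bucket, map_fn):
    """Apply the map function to every document and return sorted index rows."""
    rows = []
    for doc_id, doc in bucket.items():
        map_fn(doc, {"id": doc_id},
               lambda key, value, _id=doc_id: rows.append((key, value, _id)))
    return sorted(rows, key=lambda row: (row[0], row[2]))


def find_by_email(bucket, rows, email):
    """Query the index for the document id, then do an additional get by key."""
    for key, _value, doc_id in rows:
        if key == email:
            return bucket[doc_id]
    return None


bucket = {
    "person::1": {"type": "person", "email": "jane@example.com", "name": "Jane"},
    "person::2": {"type": "person", "name": "John"},            # no email attribute
    "order::1": {"type": "order", "email": "jane@example.com"},  # wrong type
}
rows = run_map(bucket, map_by_email)
```

Because each row carries only the key and the document id, the index stays small, and the full document is read by key only when it is needed.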
How Indexing works in Couchbase 3.x compared to 2.x
2.x Architecture
3.x Architecture
The Semantics of 'stale = false'
• 'stale = false'
  • The default for 'stale' is 'update_after'
  • Used to enforce an index update at query time
  • Adds latency if used with every query
• 2.x
  • Data was eventually indexed and the result was eventually consistent
  • Only data that had previously hit the disk was indexed
• 3.x
  • Data is indexed from memory, so 'stale = false' works as semantically expected
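The effect of the three `stale` values can be shown with a toy index. `ViewIndex` is a hypothetical class for illustration only; it models the 3.x behaviour described above, where every pending write is visible to the indexer, and it ignores persistence entirely.

```python
class ViewIndex:
    """Toy model of a view index with not-yet-indexed writes (illustrative only)."""

    def __init__(self):
        self.indexed = {}  # doc id -> emitted key, already in the index
        self.pending = {}  # doc id -> emitted key, written but not indexed yet

    def write(self, doc_id, key):
        self.pending[doc_id] = key

    def update(self):
        self.indexed.update(self.pending)
        self.pending.clear()

    def query(self, stale="update_after"):
        if stale == "false":
            self.update()                # update the index first, then answer
        result = sorted(self.indexed.values())
        if stale == "update_after":
            self.update()                # answer from the stale index, then update
        return result                    # stale == "ok": no update is triggered


index = ViewIndex()
index.write("person::1", "jane@example.com")
first = index.query()                # default update_after: write not visible yet
second = index.query(stale="ok")     # visible now, thanks to the earlier update
index.write("person::2", "john@example.com")
third = index.query(stale="false")   # forces the update, so both rows are visible
```

The extra `update()` before answering is where the added query latency of `stale = false` comes from.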
Database Design Considerations for Views
Number of Design Documents by Bucket
• Indexers are allocated per Design Document
  • Affects the number of CPUs used in parallel
• Bad cases
  • One Design Document contains all Views
    • All Views are updated at the same time
    • A lot to do for the Indexer
  • One View per Design Document
    • Resource intensive because there is one Indexer per View
• Balance the number of Views per Design Document!
Separate Buckets for Indexing / Querying
• Creating a View for the entire bucket is heavyweight
  • The View function is executed for every Set operation
• Separate the data that should be queried by Views by storing it in a separate bucket
• But don't create too many buckets!
  • Overhead for cluster management
XDCR: A Separate Cluster for Indexing / Querying
• Use a separate cluster for indexing and querying to keep that load off the main one
  • Reporting cluster vs. operational one
  • Active-Passive XDCR
Configuration Settings and their Effects
Indexing Settings
• Index path
  • Use separate disks for the data and the indexes in order to improve I/O performance
• Indexing interval
  • Controls how up to date the index is by default
• 'stale = false' as explained before
• Maximum number of Indexers working in parallel
  • Increasing the number of threads per node means a higher level of concurrency, but also higher disk and CPU load
Rebalance Settings
• Index aware rebalance
  • By default indexing happens as part of the rebalance operation
  • Ensures that query results from a new node during rebalance are consistent with the results you would have received from the node before the rebalance started
  • Performance impact if enabled: rebalance takes significantly more time
• Rebalance before compaction
  • Default is 16, which means that 16 vBuckets are moved before rebalance is paused for compaction
  • A higher value may increase rebalance performance because it implicitly increases the rebalance priority
• Rebalance moves per node
  • The number of vBuckets moved at a time during the rebalance operation
  • The default is 1
Compaction Settings
• (Auto) Compaction
  • Necessary because append-only structures are used
    • In-place updates are expensive
  • Removes tombstone objects and fragmentation
• Process Database and View compaction in parallel
  • Implies a heavier processing and disk I/O load during the compaction process
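Why append-only storage needs compaction can be seen in a toy model. `AppendOnlyFile` is a hypothetical class, not the server's storage engine: every update and delete is appended, stale entries and tombstones pile up, and compaction rewrites only the live data. The fragmentation formula is a simplification chosen for this sketch.

```python
class AppendOnlyFile:
    """Toy append-only log; a value of None marks a tombstone (deleted key)."""

    def __init__(self):
        self.log = []  # list of (key, value) entries in write order

    def set(self, key, value):
        self.log.append((key, value))  # updates are appended, never done in place

    def delete(self, key):
        self.log.append((key, None))   # a delete appends a tombstone

    def live(self):
        """Latest value per key, without deleted keys."""
        latest = {}
        for key, value in self.log:
            latest[key] = value
        return {key: value for key, value in latest.items() if value is not None}

    def fragmentation(self):
        """Share of log entries that no longer hold live data (simplified)."""
        if not self.log:
            return 0.0
        return 1 - len(self.live()) / len(self.log)

    def compact(self):
        """Rewrite the log so that it contains only the live entries."""
        self.log = list(self.live().items())


data_file = AppendOnlyFile()
data_file.set("a", 1)
data_file.set("a", 2)     # makes the first entry stale
data_file.set("b", 1)
data_file.delete("b")     # makes the previous entry stale and adds a tombstone
before = data_file.fragmentation()
data_file.compact()
after = data_file.fragmentation()
```

The rewrite in `compact()` touches all live data, which is the extra processing and disk I/O load the slide warns about.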
Resource Requirements
• CPU
  • More CPU cores are recommended
• Disk (size, I/O)
  • Configure your OS file system buffer!
  • Use SSDs for Views!
• Factors
  • Number of Views per Design Document
  • Number of emitted items
  • Compaction
  • Complexity of Map/Reduce functions
  • Size of the emitted value
[Figure: gauges with scales 0 to 200 ms and 0 to 5000 q/s]