Best Practices - Couchbase Indexing in Production: Couchbase Connect 2014

David Maier | Senior Solutions Engineer, Couchbase

Uploaded by couchbase on 02-Jul-2015


DESCRIPTION

Abstract: Couchbase Views are a very powerful feature for building real-time applications. However, indexing can be a heavyweight operation on your Couchbase cluster. This session briefly introduces Couchbase Views, discusses document, database and View design best practices, and presents tips and tunables for running Views in production for a successful Couchbase deployment.

TRANSCRIPT

Page 1

Best Practices

Couchbase Indexing in Production

David Maier | Senior Solutions Engineer, Couchbase

Page 2

Agenda

• Introduction

• Document Modeling Basics

• Ways to query with Couchbase Server

• How Indexing works in Couchbase 3.x compared to 2.x

• Database Design Considerations for Views

• Configuration Settings and their Effects

• Resource Requirements

©2014 Couchbase, Inc. 2

Page 3

Introduction

• Views are a powerful feature for real-time applications

• Indexing can be a heavyweight operation

[Chart: Views/Queries vs. Key Access, labeled 90% and 10%]

Page 4

Document Modeling Basics

Page 5

JSON Document Structure

• JavaScript Object Notation

• Metadata

• Document value
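To make the split between metadata and document value concrete, here is a minimal Python sketch. The `Document` class and its field names (`cas`, `expiration`, `flags`) are illustrative assumptions, not the exact set of metadata the server exposes; the point is only that the key and metadata live next to the JSON body rather than inside it.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Document:
    """Illustrative model: a JSON value stored together with per-item metadata."""

    id: str                 # document key
    value: Dict[str, Any]   # JSON document body
    cas: int = 0            # version number used for optimistic locking (illustrative)
    expiration: int = 0     # 0 = never expires (illustrative)
    flags: int = 0          # client-defined flags (illustrative)

    def metadata(self) -> Dict[str, Any]:
        # Metadata is kept next to the document, not inside its JSON body.
        return {"id": self.id, "cas": self.cas,
                "expiration": self.expiration, "flags": self.flags}

    def body(self) -> str:
        # Only the value is serialized as JSON.
        return json.dumps(self.value, sort_keys=True)


doc = Document(id="user::1001",
               value={"type": "user", "name": "Jane", "email": "jane@example.com"})
print(doc.metadata())
print(doc.body())
```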

Page 6

Normalized vs. Denormalized Data

• Normalized

  • Uses references for 1-to-many relationships

  • Reduces data duplication

  • Smaller document size

• Denormalized

  • Uses nested data

  • Aggregate view of data

  • Allows atomic operations

  • No client-side joins
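The trade-off above can be sketched with a plain dict standing in for a bucket (a hypothetical stand-in, not a Couchbase client). The normalized user needs a client-side join over referenced keys. The denormalized user is read with one lookup and updated by replacing a single document, which is what makes the change atomic on the server.

```python
from typing import Any, Dict, List

# Hypothetical in-memory dict standing in for a bucket.
store: Dict[str, Dict[str, Any]] = {}

# Normalized: the user references its orders by key (1-to-many via references).
store["user::1"] = {"type": "user", "name": "Jane", "orders": ["order::1", "order::2"]}
store["order::1"] = {"type": "order", "total": 20}
store["order::2"] = {"type": "order", "total": 35}

# Denormalized: the orders are nested inside one aggregate document.
store["user::2"] = {"type": "user", "name": "John",
                    "orders": [{"total": 20}, {"total": 35}]}


def order_totals_normalized(user_key: str) -> List[int]:
    # Client-side join: one extra lookup per referenced order.
    return [store[ref]["total"] for ref in store[user_key]["orders"]]


def order_totals_denormalized(user_key: str) -> List[int]:
    # One lookup: all data lives in a single document.
    return [order["total"] for order in store[user_key]["orders"]]


def add_order_denormalized(user_key: str, total: int) -> None:
    # Build the new version, then replace the whole document in one write;
    # a single-document write is what makes the change atomic.
    doc = dict(store[user_key])
    doc["orders"] = doc["orders"] + [{"total": total}]
    store[user_key] = doc
```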

Page 7

Normalized vs. Denormalized Data

Page 8

Atomic Counters

• Similar to sequences / auto-incrementing columns in the relational world

• Initialize and then increment a counter value

• Use the counter value as part of a key
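A minimal sketch of the counter pattern, assuming a hypothetical `CounterStore` class in place of a real bucket. Its `incr` follows the "create with an initial value, otherwise add a delta" behaviour described on the slide. A lock stands in for the server-side atomicity, and the returned value becomes part of the next document key.

```python
import threading
from typing import Dict


class CounterStore:
    """Hypothetical stand-in for a bucket with an atomic increment operation."""

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}
        self._lock = threading.Lock()

    def incr(self, key: str, delta: int = 1, initial: int = 0) -> int:
        # Create the counter with `initial` on first use, otherwise add `delta`.
        # The lock plays the role of the server: no two callers see the same value.
        with self._lock:
            if key not in self._values:
                self._values[key] = initial
            else:
                self._values[key] += delta
            return self._values[key]


counters = CounterStore()
# Use the counter value as part of a document key, like an auto-increment column.
new_keys = [f"user::{counters.incr('user::count', initial=1)}" for _ in range(3)]
print(new_keys)  # ['user::1', 'user::2', 'user::3']
```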

Page 9

Reference Documents for Lookups

• A second document that references the primary one

Page 10

Ways to query with Couchbase Server

Page 11

Retrieval via Key Patterns and Lookup Documents

• Via key pattern

  • ‘person::$firstname.$lastname’

• With lookup document

  • Just 2 steps to retrieve a user by email address

• Most efficient way

  • B-Tree traversal vs. direct access
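Both retrieval styles can be sketched with a dict standing in for a bucket. The `person::$firstname.$lastname` pattern comes from the slide; the `email::` prefix for the lookup document and all function names are hypothetical. Both steps of the email lookup are direct key reads, so no index has to be traversed.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory dict standing in for a bucket.
bucket: Dict[str, Dict[str, Any]] = {}


def person_key(firstname: str, lastname: str) -> str:
    # Key pattern 'person::$firstname.$lastname'.
    return f"person::{firstname}.{lastname}"


def email_lookup_key(email: str) -> str:
    # Hypothetical key pattern for the lookup (reference) document.
    return f"email::{email.lower()}"


def save_person(firstname: str, lastname: str, email: str) -> str:
    key = person_key(firstname, lastname)
    bucket[key] = {"type": "person", "firstname": firstname,
                   "lastname": lastname, "email": email}
    # The lookup document only stores the key of the primary document.
    bucket[email_lookup_key(email)] = {"ref": key}
    return key


def get_person_by_email(email: str) -> Optional[Dict[str, Any]]:
    # Step 1: read the lookup document by its key.
    lookup = bucket.get(email_lookup_key(email))
    if lookup is None:
        return None
    # Step 2: read the primary document it points to.
    return bucket.get(lookup["ref"])
```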

Page 12

Retrieval via Key Patterns and Lookup Documents

• Access multiple documents by using a counter value
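Here is a sketch of accessing several documents through a counter, again with a dict in place of a bucket and hypothetical key names (`post::7::comment::N`). The reader gets the current counter value and derives every key from 1 to N. For brevity the increment here is a plain read-modify-write, whereas a real bucket would use its atomic counter.

```python
from typing import Any, Dict, List

# Hypothetical in-memory dict standing in for a bucket.
bucket: Dict[str, Any] = {}


def add_comment(post_id: str, text: str) -> str:
    counter_key = f"{post_id}::comment_count"
    # Plain read-modify-write for brevity; a real bucket would use its
    # atomic increment here so concurrent writers never reuse a number.
    n = bucket.get(counter_key, 0) + 1
    bucket[counter_key] = n
    key = f"{post_id}::comment::{n}"
    bucket[key] = {"text": text}
    return key


def get_all_comments(post_id: str) -> List[Dict[str, Any]]:
    # Read the counter, then derive every key 1..n directly: no index needed.
    n = bucket.get(f"{post_id}::comment_count", 0)
    keys = [f"{post_id}::comment::{i}" for i in range(1, n + 1)]
    # A client would fetch these keys in one bulk (multi-get) request;
    # missing keys (e.g. deleted comments) are skipped.
    return [bucket[k] for k in keys if k in bucket]
```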

Page 13

Indexing and Querying via Views

• Organized in Design Documents

• Incremental Map-Reduce

• Spread load across nodes

• Each node indexes its data

Map: process, filter, map and emit a row

Reduce: aggregate mapped data (built in: _count, _sum, _stats)
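The map and reduce roles can be simulated in a few lines of Python. Real Couchbase map functions are JavaScript functions that call a global `emit`. Here `emit` is passed in explicitly, and the whole index is recomputed rather than updated incrementally. The reducers mirror `_count`, `_sum` and `_stats`; the `_stats` field names (sum, count, min, max, sumsqr) are what we expect it to return. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Row = Tuple[Any, Any, str]  # (emitted key, emitted value, document id)


def run_map(docs: Dict[str, Dict[str, Any]], map_fn: Callable) -> List[Row]:
    """Apply a map function to every document and collect the emitted rows."""
    rows: List[Row] = []
    for doc_id, doc in docs.items():
        map_fn(doc, doc_id, lambda k, v, _id=doc_id: rows.append((k, v, _id)))
    rows.sort(key=lambda r: (r[0], r[2]))  # a view index is ordered by key
    return rows


# Counterparts of the built-in reduce functions.
def reduce_count(values: List[Any]) -> int:
    return len(values)


def reduce_sum(values: List[float]) -> float:
    return sum(values)


def reduce_stats(values: List[float]) -> Dict[str, float]:
    return {"sum": sum(values), "count": len(values), "min": min(values),
            "max": max(values), "sumsqr": sum(v * v for v in values)}


def reduce_grouped(rows: List[Row], reducer: Callable) -> Dict[Any, Any]:
    """Reduce the values of rows that share the same key."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key, value, _ in rows:
        groups[key].append(value)
    return {key: reducer(values) for key, values in sorted(groups.items())}


def orders_by_country(doc: Dict[str, Any], doc_id: str, emit: Callable) -> None:
    # Map: filter by type, then emit one (country, total) row per order.
    if doc.get("type") == "order" and "country" in doc:
        emit(doc["country"], doc["total"])


docs = {
    "order::1": {"type": "order", "country": "DE", "total": 10},
    "order::2": {"type": "order", "country": "DE", "total": 30},
    "order::3": {"type": "order", "country": "US", "total": 5},
    "user::1": {"type": "user", "name": "Jane"},
}
rows = run_map(docs, orders_by_country)
print(reduce_count([v for _, v, _ in rows]))   # 3
print(reduce_grouped(rows, reduce_sum))        # {'DE': 40, 'US': 5}
```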

Page 14

Indexing and Querying via Views

• Multiple roles

  • A primary index provides access to all document IDs of a bucket

  • A secondary index is an alternative access path via a (compound) key attribute

  • A View provides an alternative view on your data

Page 15

Indexing and Querying via Views

Page 16

Indexing and Querying via Views

Page 17

Indexing and Querying via Views

• Simple View Access

• Exact Match

• Range

• With Reduction

• With Grouping
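The four access styles can be illustrated on a sorted list of emitted rows. The parameter names mirror the view query parameters `key`, `startkey`, `endkey`, `reduce` and `group_level`, but `query` is a local simulation with hypothetical data and a `_sum`-style reduction, not a client API. Range bounds are inclusive on both ends. Unlike here, a real view with a reduce function reduces by default unless `reduce=false` is passed.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[int, ...]

# Hypothetical emitted rows: compound key (year, month, day) -> sales value,
# kept sorted by key like a view index.
ROWS: List[Tuple[Key, int]] = sorted([
    ((2014, 9, 1), 10),
    ((2014, 9, 2), 20),
    ((2014, 10, 1), 5),
    ((2014, 10, 7), 15),
    ((2015, 1, 3), 50),
])
KEYS: List[Key] = [k for k, _ in ROWS]


def query(key: Optional[Key] = None, startkey: Optional[Key] = None,
          endkey: Optional[Key] = None, reduce: bool = False,
          group_level: Optional[int] = None) -> Any:
    # Exact match on `key`, or an inclusive range from `startkey` to `endkey`.
    if key is not None:
        lo, hi = bisect_left(KEYS, key), bisect_right(KEYS, key)
    else:
        lo = 0 if startkey is None else bisect_left(KEYS, startkey)
        hi = len(KEYS) if endkey is None else bisect_right(KEYS, endkey)
    rows = ROWS[lo:hi]
    if not reduce:
        return rows
    if group_level is None:
        # Reduction without grouping: a single _sum-style value for all rows.
        return sum(v for _, v in rows)
    # Grouping: one reduced value per key prefix of length `group_level`.
    groups: Dict[Key, int] = defaultdict(int)
    for k, v in rows:
        groups[k[:group_level]] += v
    return sorted(groups.items())
```

For example, `query(reduce=True, group_level=2)` yields one total per (year, month), which is how compound keys support drill-down reporting from a single View.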

Page 18

Best Practices for Selection, Projection and Aggregation

• Try to avoid computing too many things in a View

• Check for attribute existence

• Select (filter) data to avoid unnecessary entries in the View

  • Use document types to make Views more selective

• Project (map) only necessary data and emit it as the value

  • Do not emit the full document

  • If possible, emit a null value and do an additional get to retrieve the whole document

• Use the built-in reduce functions if possible
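Here is a map function that follows these rules, written in Python for illustration. In Couchbase the same logic would be a JavaScript `function (doc, meta)` that calls `emit(doc.email, null)`. Here `emit` is passed in explicitly, and the documents and names are hypothetical. The View stays small because only the key and a null value are stored, and the full document is fetched afterwards by its ID.

```python
from typing import Any, Callable, Dict, List, Tuple

Emit = Callable[[Any, Any], None]


def map_users_by_email(doc: Dict[str, Any], meta: Dict[str, Any], emit: Emit) -> None:
    # Select: only user documents belong in this View.
    if doc.get("type") != "user":
        return
    # Check that the attribute exists before emitting it.
    if "email" not in doc:
        return
    # Project: emit just the key and a null (None) value, never the whole doc.
    emit(doc["email"], None)


docs: Dict[str, Dict[str, Any]] = {
    "user::1": {"type": "user", "email": "a@example.com", "name": "A", "bio": "long text"},
    "user::2": {"type": "user", "name": "B"},          # no email: skipped
    "order::1": {"type": "order", "total": 5},         # other type: skipped
}

rows: List[Tuple[Any, Any, str]] = []
for doc_id, doc in docs.items():
    map_users_by_email(doc, {"id": doc_id},
                       lambda k, v, _id=doc_id: rows.append((k, v, _id)))

# The index stays small; the full document is fetched with an additional get by ID.
full_doc = docs[rows[0][2]]
```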

Page 19

Best Practices for Selection, Projection and Aggregation

Page 20

How Indexing works in Couchbase 3.x compared to 2.x

Page 21

2.x Architecture

Page 22

3.x Architecture

Page 23

The Semantics of ‘stale = false’

• ‘stale = false’

  • The default is ‘update_after’

  • Used to enforce an index update at query time

  • Adds latency if used with every query

• 2.x

  • Data was eventually indexed and the result was eventually consistent

  • Only data that had previously been persisted to disk was indexed

• 3.x

  • Data is indexed from memory, so ‘stale = false’ works as semantically expected
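The three `stale` options can be modeled with a toy index that lags behind the writes. `ToyView` and its methods are hypothetical. The model follows the 3.x behaviour, where every acknowledged write is visible to a `stale = false` query. The post-query update for `update_after` runs synchronously here, though a real server performs it in the background.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Stale(Enum):
    OK = "ok"                      # answer from the index as it is
    UPDATE_AFTER = "update_after"  # answer as it is, then update (the default)
    FALSE = "false"                # update first, then answer


class ToyView:
    """Hypothetical index that lags behind the documents it covers."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}
        self.pending: List[Tuple[str, int]] = []  # mutations not yet indexed

    def upsert(self, key: str, value: int) -> None:
        self.pending.append((key, value))

    def _update_index(self) -> None:
        for key, value in self.pending:
            self.index[key] = value
        self.pending.clear()

    def query(self, stale: Stale = Stale.UPDATE_AFTER) -> Dict[str, int]:
        if stale is Stale.FALSE:
            self._update_index()   # consistent result, but this query pays the latency
        result = dict(self.index)
        if stale is Stale.UPDATE_AFTER:
            self._update_index()   # runs in the background on a real server
        return result
```

With the default, the first query after a write returns the old state and a later query sees the change. `Stale.FALSE` returns the new state immediately at the cost of indexing inside the query.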

Page 24

Database Design Considerations for Views

Page 25

Number of Design Documents per Bucket

• Indexers are allocated per Design Document

  • Affects the number of CPUs used in parallel

• Bad cases

  • One Design Document contains all Views

    All Views are updated at the same time

    A lot of work for the Indexer

  • One View per Design Document

    Resource intensive, because there is one Indexer per View

• Use a good balance regarding the number of Views per Design Document!

Page 26

Separate Buckets for Indexing / Querying

• Creating a View over the entire bucket is heavyweight

  • The View function is executed for every set operation

• Separate the data that should be queried by Views by storing it in a separate bucket

• But don't create too many buckets!

  • Each bucket adds cluster management overhead

Page 27

XDCR: A Separate Cluster for Indexing / Querying

• Use a separate cluster for indexing and querying to take load off the main one

  • Reporting cluster vs. operational one

  • Active-passive XDCR

Page 28

Configuration Settings and their Effects

Page 29

Indexing Settings

• Index path

Use separate disks for the data and the indexes to improve I/O performance

Page 30

Indexing Settings

• Indexing interval

Controls how up to date the index is by default

• ‘stale = false’ as explained before
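How an indexing interval bounds index freshness can be sketched as a trigger rule. We assume the automatic updater wakes up once per interval and rebuilds only if enough mutations have accumulated. `AutoUpdaterSettings`, `should_trigger_update` and the 5000 defaults are assumptions made for illustration, not the server's actual setting names or verified defaults.

```python
from dataclasses import dataclass


@dataclass
class AutoUpdaterSettings:
    # Assumed counterparts of the server's update interval and minimum-changes
    # settings; the 5000 defaults are assumptions for this sketch.
    update_interval_ms: int = 5000
    update_min_changes: int = 5000


def should_trigger_update(now_ms: int, last_check_ms: int,
                          changes_since_update: int,
                          settings: AutoUpdaterSettings) -> bool:
    # The updater only checks once per interval.
    if now_ms - last_check_ms < settings.update_interval_ms:
        return False
    # It rebuilds the index only if enough mutations have piled up.
    return changes_since_update >= settings.update_min_changes
```

Lower values keep the default (non-`stale = false`) query results fresher, at the price of more frequent indexing work.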

Page 31

Indexing Settings

• Maximum number of Indexers working in parallel

Increasing the number of threads per node means a higher level of concurrency, but also higher disk and CPU load

Page 32

Rebalance Settings

• Index-aware rebalance

  • By default, indexing happens as part of the rebalance operation

  • Ensures that query results from a new node during rebalance are consistent with the results you would have received from the original node before the rebalance started

Performance impact if enabled: rebalance takes significantly more time

Page 33

Rebalance Settings

• Rebalance before compaction

  • The default is 16, which means that 16 vBuckets are moved before the rebalance is paused for compaction

A higher value may improve rebalance performance because it implicitly increases the rebalance priority

Page 34

Rebalance Settings

• Rebalance moves per node

• The default is 1

The number of vBuckets moved at a time during the rebalance operation

Page 35

Compaction Settings

• (Auto-)compaction

  • Necessary because append-only structures are used

  • In-place updates are expensive

  • Removes tombstone objects and fragmentation

• Process database and View compaction in parallel

Implies heavier processing and disk I/O load during the compaction process
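Compaction exists because of append-only storage, which a toy file makes visible. Every update and delete appends a record, stale versions and tombstones accumulate as fragmentation, and compaction rewrites only the live records. The class and function names are hypothetical. Fragmentation is counted in records rather than bytes, and the 30% trigger is an assumed auto-compaction threshold for this sketch.

```python
from typing import Dict, List, Optional, Tuple


class AppendOnlyFile:
    """Toy append-only store: updates and deletes append records, never rewrite."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, Optional[str]]] = []  # (key, value or None = tombstone)

    def set(self, key: str, value: str) -> None:
        self.log.append((key, value))

    def delete(self, key: str) -> None:
        self.log.append((key, None))  # tombstone record

    def live(self) -> Dict[str, str]:
        # Replay the log; the last record per key wins.
        state: Dict[str, Optional[str]] = {}
        for key, value in self.log:
            state[key] = value
        return {k: v for k, v in state.items() if v is not None}

    def fragmentation(self) -> float:
        # Share of records that are stale versions or tombstones.
        if not self.log:
            return 0.0
        return 1 - len(self.live()) / len(self.log)

    def compact(self) -> None:
        # Rewrite only the live records; stale versions and tombstones disappear.
        self.log = list(self.live().items())


def maybe_compact(f: AppendOnlyFile, threshold: float = 0.30) -> bool:
    # Auto-compaction: rewrite once fragmentation crosses the threshold
    # (30% is an assumed default for this sketch).
    if f.fragmentation() >= threshold:
        f.compact()
        return True
    return False
```

The rewrite in `compact` is the extra processing and disk I/O that the slide warns about, which is why running database and View compaction in parallel increases the load.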

Page 36

Compaction Settings

Page 37

Resource Requirements

Page 38

Resource Requirements

• More CPU cores are recommended

• Configure your OS file system buffer!

• Use SSDs for Views!

Factors driving CPU and disk (size, I/O) requirements:

• Number of Views per Design Document

• Number of emitted items

• Compaction

• Complexity of Map/Reduce functions

• Size of the emitted value

[Chart: axes in ms (0 to 200) and q/s (0 to 5000)]