How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Couchbase 104 Justin Michaels [email protected] | @justindmichaels

Upload: couchbase

Post on 02-Jul-2015


Category:

Software



DESCRIPTION

In Couchbase 104 for 3.0, explore the power of creating views and indexes in Couchbase. Learn the underlying view architecture for how views and indexes are built in Couchbase.

TRANSCRIPT

Page 1: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Couchbase 104

Justin Michaels

[email protected] | @justindmichaels

Page 2: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Views and Indexes Overview

Page 3: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Indexes are “views” into Data

• A shortcut derived from, and pointing into, a greater volume of values, data, information, or knowledge

Traditional Index Examples

• Table of Contents

• Card Catalog

Indexes and Views

©2014 Couchbase, Inc. 3

Page 4: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

In Couchbase, Map-Reduce is used to maintain indexes

Map functions are applied to JSON documents and they output or "emit" data that is organized in an Index form

Each emit() call produces a row in the index
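To make the emit() behaviour concrete, here is a minimal sketch in plain JavaScript that mimics how a map function fills an index. The runMap harness, the byTag function, and the sample documents are hypothetical and exist only for illustration. In Couchbase Server the view engine supplies emit() as a global and calls the map function itself, so passing emit as a third argument is an assumption of this sketch.

```javascript
// Hypothetical harness: applies a map function to each document and
// collects one index row per emit() call. Not Couchbase's implementation.
function runMap(mapFn, docs) {
  const rows = [];
  for (const { id, doc } of docs) {
    // Each call to emit() appends exactly one row to the index.
    const emit = (key, value) => rows.push({ id, key, value });
    mapFn(doc, { id }, emit);
  }
  return rows;
}

// Sample documents (made up for this example).
const docs = [
  { id: "post::1", doc: { title: "Views", tags: ["index", "mapreduce"] } },
  { id: "post::2", doc: { title: "N1QL", tags: ["query"] } },
];

// A map function may call emit() zero, one, or many times per document.
function byTag(doc, meta, emit) {
  for (const tag of doc.tags || []) {
    emit(tag, null);
  }
}

const rows = runMap(byTag, docs);
console.log(rows.length); // 3 rows: two from post::1, one from post::2
```

One document with two tags yields two rows, which is why row counts in a view can differ from document counts in the bucket.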

Couchbase Views - Map-Reduce Indexes


Page 5: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Map-Reduce is a technique designed for dealing with semi-structured data by parallel processing across a distributed system

Different from Hadoop Map/Reduce

• Map functions identify data within collections, process it, and output transformed values

• Reduce functions take the output of Map functions and perform numeric aggregate calculations on them

What is Map Reduce?


Page 6: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Map inputs:

• Document – Application data

• Metadata – Couchbase data

Map outputs:

• Document ID

• View Key: User configurable based on JSON fields

• View Value: Only needed when reducing, use ‘null’ otherwise

Produces Index:

• B-tree Structure

• Sorted Alphabetically
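The inputs and outputs listed above can be sketched as follows. This is an illustrative in-memory model only: buildIndex and the sample documents are hypothetical, a sorted array stands in for the B-tree, and plain string comparison stands in for the server's collation.

```javascript
// Hypothetical in-memory stand-in for a view index; the real index is a
// B-tree maintained by the view engine.
function buildIndex(mapFn, docs) {
  const rows = [];
  for (const { doc, meta } of docs) {
    // Every row carries the document ID, the view key, and the view value.
    mapFn(doc, meta, (key, value) => rows.push({ id: meta.id, key, value }));
  }
  // Keep rows ordered by view key, then by document ID.
  // Simplification: plain string comparison, not Unicode collation.
  rows.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id < b.id ? -1 : a.id > b.id ? 1 : 0
  );
  return rows;
}

// Each input has two parts: the document (application data) and its metadata.
const docs = [
  { doc: { type: "user", username: "mike" }, meta: { id: "u::35" } },
  { doc: { type: "user", username: "alice" }, meta: { id: "u::1" } },
  { doc: { type: "user", username: "justin" }, meta: { id: "u::20" } },
];

// No reduce is planned, so the value is null.
const index = buildIndex((doc, meta, emit) => emit(doc.username, null), docs);
console.log(index.map((row) => row.key)); // [ 'alice', 'justin', 'mike' ]
```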

Map Functions


Page 7: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Built-in reduce functions (Optional)

• _count – provides a count of rows (per unique key when grouping is used)

• _sum – provides a sum total of values

• _stats – provides statistics (max, min, avg, etc.) of values

Operate on results emitted by map function

Results stored pre-computed for fast access

Custom reductions are possible
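As a rough illustration of what the built-in reducers compute, the sketch below re-implements them in plain JavaScript over an array of emitted values. These are stand-ins, not the server's code, and the shape of the stats result (sum, count, min, max, sumsqr) is an assumption based on how _stats output is usually described. An average can be derived as sum / count.

```javascript
// Plain-JavaScript stand-ins for the built-in reducers, for illustration only.
const count = (values) => values.length;
const sum = (values) => values.reduce((total, v) => total + v, 0);
const stats = (values) => ({
  sum: sum(values),
  count: values.length,
  min: Math.min(...values),
  max: Math.max(...values),
  sumsqr: values.reduce((total, v) => total + v * v, 0),
});

// Values as they might be emitted by a map function (made-up sample data).
const points = [1000, 1200, 900];

console.log(count(points)); // 3
console.log(sum(points));   // 3100
console.log(stats(points));
// { sum: 3100, count: 3, min: 900, max: 1200, sumsqr: 3250000 }
```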

Reduce Functions


Page 8: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture

Page 9: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture - Couchbase View Engine

[Diagram: a Couchbase Server node showing the App Server, Managed Cache, Disk Queue, Disk, Replication Queue (to other node), and View engine, with Doc 1 passing through them]

Page 10: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture - Couchbase View Engine

[Diagram: Couchbase Server cluster with user-configured replica count = 1. Servers 1, 2, and 3 each hold ACTIVE and REPLICA documents; App Servers 1 and 2 each use the Couchbase client library with a cluster map; a query is sent to the cluster.]

• Indexing is distributed across nodes

• Parallelize the effort

• Each node has index for data stored on it

• Queries combine the results from required nodes
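The idea that a query combines per-node results can be sketched as a merge of already-sorted lists. This is a simplified illustration under stated assumptions: the nodeResults data and the mergeSorted function are hypothetical, and the actual merge logic inside Couchbase Server is not described in these slides.

```javascript
// Hypothetical per-node partial results, each already sorted by key.
const nodeResults = [
  [{ key: "alice", id: "u::1" }, { key: "mike", id: "u::35" }],
  [{ key: "bob", id: "u::7" }],
  [{ key: "justin", id: "u::20" }, { key: "zoe", id: "u::9" }],
];

// Merge sorted lists by repeatedly taking the smallest head element.
function mergeSorted(lists) {
  const positions = lists.map(() => 0);
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (positions[i] >= lists[i].length) continue; // this list is exhausted
      if (best === -1 || lists[i][positions[i]].key < lists[best][positions[best]].key) {
        best = i;
      }
    }
    if (best === -1) return merged; // every list is exhausted
    merged.push(lists[best][positions[best]++]);
  }
}

const combined = mergeSorted(nodeResults);
console.log(combined.map((row) => row.key));
// [ 'alice', 'bob', 'justin', 'mike', 'zoe' ]
```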

Page 11: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Buckets have one or more DESIGN DOCUMENTS

• Distributed across cluster when created

DESIGN DOCUMENTS contain one or more VIEW definitions

• Design Documents are processed in parallel

• All the views in a single design document are processed sequentially
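A design document is a JSON document that holds view definitions. The sketch below shows one plausible shape, with map functions stored as source text and an optional reduce per view. The view names and fields are made up for illustration, and the exact schema should be checked against the Couchbase documentation.

```javascript
// Hypothetical design document holding two views. Both views live in the
// same design document, so they are updated together.
const designDoc = {
  views: {
    by_email: {
      map: "function (doc, meta) { emit(doc.email, null); }",
    },
    points_by_user: {
      map: "function (doc, meta) { emit(doc.username, doc.points); }",
      reduce: "_sum", // built-in reducer referenced by name
    },
  },
};

// Serialized form, as it would be sent to or stored by the server.
const body = JSON.stringify(designDoc, null, 2);
console.log(Object.keys(designDoc.views)); // [ 'by_email', 'points_by_user' ]
```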

Architecture – Design Document

[Diagram: BUCKET A and BUCKET B containing design documents: design document 1 (View 1, View 2, View 3), design document 2 (View 4, View 5), design document 3 (View 6, View 7)]

Page 12: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture – Couchbase Map Reduce


Individual document operations are atomic

Views are eventually consistent in relation to documents

Incremental Map-Reduce

• Spread load across nodes

• Each node indexes its data

Map: process, filter, map, and emit a row

Reduce: aggregate mapped data

• Default: _count, _sum, _stats

Page 13: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture - Index Building Details


Views are maintained directly from managed cache

• The entire view is recreated if the view definition has changed

• All the views within a design document are incrementally updated

Views are updated automatically according to:

• Update Interval (time period); default 5000 milliseconds

OR (as of 3.x)

• Update Documents (number of changes); default 5000 changes

Update Controlled by:

• Configured globally via REST, or for an individual design document

• Manual updates provide application control
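A sketch of how these update settings might be expressed. The option names (updateInterval, updateMinChanges, replicaUpdateMinChanges) and the /settings/viewUpdateDaemon endpoint are assumptions recalled from the Couchbase Server documentation rather than taken from these slides, so verify them for your server version before relying on them.

```javascript
// Per-design-document override. The option names are assumptions;
// check them against the documentation for your server version.
const designDoc = {
  views: {
    by_email: { map: "function (doc, meta) { emit(doc.email, null); }" },
  },
  options: {
    updateMinChanges: 1000,        // update after this many changes
    replicaUpdateMinChanges: 5000, // same threshold for replica indexes
  },
};

// Assumed form-encoded body for the global settings endpoint
// (POST /settings/viewUpdateDaemon on the admin port).
const globalSettings = new URLSearchParams({
  updateInterval: "5000",   // milliseconds
  updateMinChanges: "5000", // number of changes
}).toString();

console.log(globalSettings); // updateInterval=5000&updateMinChanges=5000
```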

Page 14: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

stale = UPDATE_AFTER (default if nothing is specified)

• fast response

• can take two operations to read your own writes

stale = OK (most likely to be used)

• auto update only

• might not see your own writes

• least frequent updates -> least resource impact -> highest performance

stale = FALSE (only when TRULY required)

• use with persistTo during set if the written data must be included in the forced view update

• BUT be aware of the delay it adds to set and query operations
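The three stale settings can be compared with a toy model. ToyView is a hypothetical class written for this illustration. It runs synchronously and ignores the automatic updater, whereas in the server an update_after query triggers the index update in the background.

```javascript
// Toy model of one view index; illustrative only.
class ToyView {
  constructor() {
    this.pending = []; // written but not yet indexed
    this.indexed = []; // visible to queries
  }
  write(key) {
    this.pending.push(key);
  }
  update() {
    this.indexed.push(...this.pending);
    this.indexed.sort();
    this.pending = [];
  }
  query(stale = "update_after") {
    if (stale === "false") this.update(); // index first, then answer
    const result = [...this.indexed];
    if (stale === "update_after") this.update(); // answer first, then index
    return result; // stale === "ok": answer from the existing index only
  }
}

const view = new ToyView();
view.write("alice");
const first = view.query("update_after");  // [] : own write not visible yet
const second = view.query("update_after"); // [ 'alice' ] : visible on the second query
view.write("bob");
const okResult = view.query("ok");         // [ 'alice' ] : waits for an automatic update
const fresh = view.query("false");         // [ 'alice', 'bob' ] : index updated before answering
```

The first two queries show why update_after can take two operations to read your own writes, and the last shows the extra work stale=false does before it can respond.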

Architecture - Index Building Details


Page 15: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

In addition to data replicas, optionally create replica for indexes

• Build an index using the data in replica vBuckets

Enabled per bucket (Bucket Config) or per design document (REST API)

• Each node must maintain index for active and replica data

• Implies additional CPU and I/O overhead

Failover and Failures

• Without replica indexes, the complete view is rebuilt

• Replica indexes are enabled if present, and queries remain consistent

Architecture - Index Building Details (Replicas)


Page 16: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Architecture - Disk Structure

Each design document creates its own set of index files

Index data is always read from disk

• File format allows for successful I/O caching by operating system

Separate disk devices for view versus data files

• Both are append-only

• Both are compacted in parallel

• Better use of IO and caching

• Possible to use SSDs for improved performance on one or the other (or both)


Page 17: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Development vs Production Views

Development Views

• Can be edited

• Can be tested on a full or partial dataset

• Not automatically maintained

Production Views

• Always operate on full document set

• Cannot be modified

• Automatically updated

Development Views are ‘published’ to Production

• Simple creation of the view definition, NOT a move to a new cluster

[Diagram: Development View (Create, Edit/Refine; Sample Index over a Subset of the Bucket Content, or Execute Development View on Entire Cluster over Full Data), then Promote to Production, then Production View (Full Index over Full Data)]

Page 18: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Writing Views

Page 19: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Map() Function => Index

    function (doc, meta) {     // doc: JSON document; meta: document metadata
      emit(doc.username,       // indexed key
           doc.email);         // output value(s)
    }                          // each emit() call creates a row

Every Document passes through View Map() functions

Map

View Anatomy


Page 20: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Single Element Keys (Text Key)

    function (doc, meta) {
      emit(doc.email, doc.points);   // doc.email is the text key
    }

Map

    meta.id   doc.email         doc.points
    u::1      [email protected]   1000
    u::35     [email protected]   1200
    u::20     [email protected]   900

View Anatomy


Page 21: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Compound Keys (Array)

    function (doc, meta) {
      emit(dateToArray(doc.timestamp), 1);   // array key
    }

Array-based index keys get sorted as strings, but can be grouped by array elements

Map

    meta.id   dateToArray(doc.timestamp)   value
    u::20     [2012,10,9,18,45]            1
    u::1      [2012,9,26,11,15]            1
    u::35     [2012,8,13,2,12]             1
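Grouping by array elements can be sketched as counting rows per key prefix. In the code below, toDateArray is a hypothetical stand-in for the dateToArray() helper and is assumed to return [year, month, day, hour, minute] to match the table. countByPrefix only approximates what a _count reduce returns when a query groups at a given level.

```javascript
// Hypothetical stand-in for the dateToArray() helper used in the slide;
// it returns [year, month, day, hour, minute] in UTC.
function toDateArray(isoString) {
  const d = new Date(isoString);
  return [
    d.getUTCFullYear(),
    d.getUTCMonth() + 1, // JavaScript months are zero-based
    d.getUTCDate(),
    d.getUTCHours(),
    d.getUTCMinutes(),
  ];
}

// Count rows per key prefix, similar in spirit to a _count reduce
// queried with a group level.
function countByPrefix(rows, level) {
  const counts = new Map();
  for (const { key } of rows) {
    const prefix = JSON.stringify(key.slice(0, level));
    counts.set(prefix, (counts.get(prefix) || 0) + 1);
  }
  return Object.fromEntries(counts);
}

const rows = [
  "2012-10-09T18:45:00Z",
  "2012-09-26T11:15:00Z",
  "2012-08-13T02:12:00Z",
].map((ts) => ({ key: toDateArray(ts), value: 1 }));

const perYear = countByPrefix(rows, 1);  // { '[2012]': 3 }
const perMonth = countByPrefix(rows, 2); // { '[2012,10]': 1, '[2012,9]': 1, '[2012,8]': 1 }
```

Grouping on the first element aggregates per year, and grouping on the first two elements aggregates per month, without defining a second view.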

View Anatomy

Page 22: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

key = “” (exact match)

keys = [ ] (set of keys match)

startkey/endkey = “” (range queries on view key)

startkey_docid/endkey_docid = “” (range queries on meta.id)

stale (false, update_after, ok)

group/group_level (aggregate with grouping)
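One way to assemble these parameters into a query string is sketched below. buildViewQuery is a hypothetical helper, and the JSON encoding of key-like parameters follows the usual convention for view queries, so check the behaviour of your SDK or REST client before relying on it.

```javascript
// Build the query string for a view request. Key-like parameters are
// JSON-encoded, so strings keep their quotes and arrays keep brackets.
function buildViewQuery(params) {
  const jsonParams = new Set(["key", "keys", "startkey", "endkey"]);
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    query.set(name, jsonParams.has(name) ? JSON.stringify(value) : String(value));
  }
  return query.toString();
}

// Range query over a compound key, grouped at the first two elements.
const rangeQuery = buildViewQuery({
  startkey: [2012, 8],
  endkey: [2012, 10],
  group_level: 2,
  stale: "ok",
});
console.log(decodeURIComponent(rangeQuery));
// startkey=[2012,8]&endkey=[2012,10]&group_level=2&stale=ok

// Exact match on a text key: the JSON encoding adds the quotes.
const exactQuery = buildViewQuery({ key: "alice" });
console.log(decodeURIComponent(exactQuery)); // key="alice"
```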

View Anatomy - Parameters


Page 23: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Collation

Unicode Collation

• 1234567890 < aAbBcCdDeEfFgGhHiIjJkKlLmM...

• a < á < A < Á < b

Byte Order

• 1234567890 < A-Z < a-z
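The difference between the two orderings can be seen with a handful of keys. The comparator below is a simplified emulation for ASCII letters only, written for this illustration. It is not the collation algorithm the server uses and it does not handle accented characters.

```javascript
const keys = ["b", "A", "1", "a", "B"];

// Byte (code unit) order: JavaScript's default string sort.
const byteOrder = [...keys].sort();
console.log(byteOrder); // [ '1', 'A', 'B', 'a', 'b' ]

// Simplified emulation of the Unicode-collation ordering for ASCII
// letters: compare case-insensitively, then put lowercase first.
function collationLike(a, b) {
  const la = a.toLowerCase();
  const lb = b.toLowerCase();
  if (la !== lb) return la < lb ? -1 : 1;
  if (a === b) return 0;
  return a > b ? -1 : 1; // lowercase has the higher code unit, so it goes first
}

const collated = [...keys].sort(collationLike);
console.log(collated); // [ '1', 'a', 'A', 'b', 'B' ]
```

This matters for range queries: a startkey/endkey pair that looks right under byte order can include or exclude different rows under collation order.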

Page 24: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Sample Document

Document ID


Page 25: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Sample Index

Key, Value

Page 26: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Examples


Page 27: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Anatomy - Querying


• Simple View Access

• Exact Match

• Range

• With Reduction

• With Grouping

Page 28: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Best Practices

Page 29: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View size is determined by key and value contents

• Emit as little as possible … not full document

• Only use values when required by a reduce function

• Only emit either null or the secondary key (doc ID included with each row)

View distribution:

• More views per design document require more time to update all views in the group

• A single view per design document may require more CPU

• Group views in design documents by update frequency, rather than by subject/topic

View Best Practices


Page 30: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Queries should have consistent response times

• Indexes are pre-materialized

• Expect to use stale=ok

File system cache availability for the index has a big impact on performance

• Indexes are disk based

• Reduce cluster quota to give more system cache

In-house performance results show that doubling system cache availability:

• reduces query latency by half

• increases throughput by 50%

View Best Practices


Page 31: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

View Best Practices


Avoid computing too many things in a single View

Select (filter) data to avoid unnecessary entries in the View

• Use document types to make Views more selective

Project (map) only necessary data and emit it as value

• When possible emit a null value and perform additional Get to retrieve the whole document

Use the built-in reduce functions if possible
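The select and project advice above can be illustrated with a map function that filters on a document type and emits a null value. The runMap harness, the sample documents, and the type field convention are assumptions made for this sketch. In a real view, emit() is a global and only the body of usersByEmail would be deployed.

```javascript
// Hypothetical harness standing in for the view engine.
function runMap(mapFn, docs) {
  const rows = [];
  for (const { id, doc } of docs) {
    mapFn(doc, { id }, (key, value) => rows.push({ id, key, value }));
  }
  return rows;
}

// Select: skip documents that are not users or that lack an email.
// Project: emit only the key and a null value, not the whole document.
function usersByEmail(doc, meta, emit) {
  if (doc.type !== "user" || !doc.email) return;
  emit(doc.email, null);
}

// Made-up sample documents.
const docs = [
  { id: "u::1", doc: { type: "user", email: "a@example.com", bio: "long text" } },
  { id: "o::9", doc: { type: "order", total: 42 } },
  { id: "u::2", doc: { type: "user" } },
];

const rows = runMap(usersByEmail, docs);
console.log(rows); // [ { id: 'u::1', key: 'a@example.com', value: null } ]
```

Because each row already includes the document ID, the application can follow up with a Get on that ID when it needs the full document.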


Page 32: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Couchbase Query Language


Page 33: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Querying with N1QL (“Nickel”)

JSON can model our complex world

N1QL can query that world

N1QL Developer Preview and Tutorial

http://docs.couchbase.com/developer/n1ql-dp3/n1ql-intro.html

http://query.pub.couchbase.com/tutorial/#1

Page 34: How-To NoSQL 3.0 Webinar Series: Couchbase 104 - Views and Indexing

Thank You!

Next Session:

Couchbase 105 | December 3, 2014 | 10am Pacific

Cross Data Center Replication (aka XDCR)


Justin Michaels

[email protected] | @justindmichaels