couchbase live europe 2015: n1ql: performance tuning and scaling

N1QL: Query Performance & Scale in Couchbase Server 4.0

Cihan Biyikoglu, Dir. Product Management


Post on 16-Jul-2015



©2015 Couchbase Inc. 2

Agenda

Part I - Architectural Overview
  New Cluster Architecture with Couchbase Server 4.0
  Query Processing & Indexing

Part II - Optimizing Queries
  Execution Plans and Operators
  Optimizing Queries - Filtering, Index Selection and Joins
  Optimizing Apps - Consistency Dials

Q&A

Demos & More Demos…


Disclaimer

Couchbase Server 4.0 is still in development. Detail presented in this presentation may change based on customer feedback and other factors by the time the final version of the product is released.

Architecture Overview

Part I

Full Cluster Architecture

[Diagram: six nodes, Couchbase Server 1 through Couchbase Server 6. Each node runs a Cluster Manager together with the Data Service, Index Service and Query Service, and holds its own shards, managed cache and storage.]


MDS – Multi-Dimensional Scaling

For more information on multi-dimensional scaling, see the session "MDS - A new Architecture for Independent Workload Scaling in Couchbase Server 4.0".

Query Processing Overview

Query Service - Capacity Management

With MDS, the query service can be moved to an isolated set of nodes, so you can independently control CPU, RAM, Network IO…

Added CPU: higher intra-query parallelization

Added RAM: improved caching with larger result sets

Added Node: better availability and load balancing

[Diagram: Couchbase Cluster, node1 to node8, running the Data Service, Index Service and Query Service.]

Query Execution

Submitting Queries in N1QL

Stateless Connectivity through REST

Load-Balance across Query Service nodes

Prepared vs Ad-hoc Query Execution

Consistency Dials – more on this later…
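As a rough illustration of stateless REST submission with client-side load balancing, here is a minimal Python sketch. It only builds the HTTP requests and does not send them. The host names are hypothetical; port 8093, the /query/service path and the "statement" form parameter are assumed to match the Query Service REST endpoint of your deployment, so check them against your build.

```python
import itertools
import urllib.parse
import urllib.request

# Hypothetical Query Service nodes (assumption: adjust to your cluster).
QUERY_NODES = ["node1.example.com", "node2.example.com"]
_next_node = itertools.cycle(QUERY_NODES)  # simple round-robin selection


def build_query_request(statement):
    """Build (but do not send) a POST request for the next Query Service node.

    Each request carries the full statement, so any node can serve it:
    that is what makes the connectivity stateless.
    """
    host = next(_next_node)
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:8093/query/service",  # assumed default port and path
        data=body,
        method="POST",
    )


first = build_query_request("SELECT 1")
second = build_query_request("SELECT 2")
print(first.full_url)
print(second.full_url)
```

Passing a request to urllib.request.urlopen would actually submit it; a real client would also add authentication and error handling.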

Query Execution

The parallelization factor is the number of cores on the Query Service node.

[Diagram: Execution Flow]

Indexing Overview

Index Service Capacity Management

With MDS, the index service can be moved to an isolated set of nodes, so you can independently control CPU, RAM, Disk IO, Network IO…

Added RAM: better caching of indexes

Added CPU: faster index maintenance and index scan throughput under a large number of indexes

Added Node: better availability, plus index maintenance and index scan isolation between indexes

[Diagram: Couchbase Cluster, node1 to node8, running the Data Service, Index Service and Query Service.]

Views vs Indexes

Global Secondary Indexes vs Views

GSI – Index Service (new)

Global Secondary Indexes are a new indexing technology that allows independently partitioned and independently scalable indexes.

Views – Data Service

Incremental Map/Reduce Views that provide full partition alignment and paired scalability with the Data Service.

Secondary Indexing – new in Couchbase Server 4.0

[Diagram: in the Data Service, the Projector & Router feeds changes from Bucket#1 and Bucket#2 over a DCP stream to the Index Service, where a Supervisor handles index maintenance and scan coordination for Index#1 through Index#4. In the Query Service, the Query Processor (cbq-engine) runs index scans against the Index Service.]

Views vs new Indexes

                               Map/Reduce Views                 New Indexes in v4.0
Partitioning                   Aligned to Data – Data Service   Independent – Indexing Service
Scale                          Scale with Data Service          Independently Scale Index Service
Fetch with Doc Key             Single Node*                     Single Node*
Fetch with Index Key           Scatter-Gather                   Single Node
Range Scan                     Scatter-Gather                   Single Node
Grouping, Aggregates & Reduce  Built-in with Views API          With N1QL
Caching                        Filesystem                       ForestDB Caching
Storage                        Couchstore                       ForestDB
Availability                   Replica Based                    Multiple Identical Indexes load balanced

*If defined as a Primary Index
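The "Scatter-Gather" versus "Single Node" rows can be pictured with a toy model. The Python sketch below is not Couchbase code: the four partitions, the document shapes and both lookup functions are made up for illustration. It only counts how many nodes a lookup by index key has to contact when the index is aligned to the data partitions, compared with one independent global index.

```python
# Toy model (assumption: 4 data nodes, 20 made-up documents).
NUM_NODES = 4
docs = {f"beer_{i}": {"abv": i % 10} for i in range(20)}

# View-style layout: index entries live next to the data, one slice per node.
partitions = [dict() for _ in range(NUM_NODES)]
for i, (key, doc) in enumerate(sorted(docs.items())):
    partitions[i % NUM_NODES][key] = doc


def partition_aligned_lookup(abv):
    """Ask every partition and gather the results (scatter-gather)."""
    contacted, hits = 0, []
    for part in partitions:
        contacted += 1
        hits.extend(k for k, d in part.items() if d["abv"] == abv)
    return sorted(hits), contacted


# GSI-style layout: one index, independent of how the data is partitioned.
global_index = {}
for key, doc in docs.items():
    global_index.setdefault(doc["abv"], []).append(key)


def global_index_lookup(abv):
    """A single index node answers the whole lookup."""
    return sorted(global_index.get(abv, [])), 1


view_hits, view_nodes = partition_aligned_lookup(3)
gsi_hits, gsi_nodes = global_index_lookup(3)
print(view_nodes, gsi_nodes)
```

Both lookups return the same keys; only the number of nodes involved differs.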

Primary vs Secondary Indexes

A Primary Index is a full list of document keys within a given bucket.

CREATE PRIMARY INDEX index_name
ON bucket_name
USING GSI|VIEW
WITH {"nodes": ["node_name"], "defer_build": true}; /* WITH options are GSI-only */

A Secondary Index is an index on a field/expression, over a subset of documents, used for lookups.

CREATE INDEX index_name
ON bucket_name (field/expression, …)
WHERE filter_expressions
USING GSI|VIEW
WITH {"nodes": ["node_name"], "defer_build": true}; /* WITH options are GSI-only */

Deferred Index Building

Index building can be deferred so that multiple indexes are built all at once, with greater scan efficiency.

CREATE INDEX … WITH {…"defer_build": true};
BUILD INDEX ON bucket_name(index_name, …) USING GSI;
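To make the two-step pattern concrete, here is a small Python sketch that generates the statements for a set of indexes: one deferred CREATE INDEX per index, then a single BUILD INDEX naming all of them. The helper name deferred_index_statements and the index i_abv are hypothetical; the statement shapes follow the syntax shown on this slide.

```python
def deferred_index_statements(bucket, indexes):
    """Return deferred CREATE INDEX statements plus one BUILD INDEX.

    `indexes` maps an index name to the indexed expression.
    """
    creates = [
        f'CREATE INDEX {name} ON `{bucket}`({expr}) '
        f'USING GSI WITH {{"defer_build": true}};'
        for name, expr in indexes.items()
    ]
    # One BUILD INDEX for all deferred indexes, so they can share a scan.
    build = f"BUILD INDEX ON `{bucket}`({', '.join(indexes)}) USING GSI;"
    return creates + [build]


statements = deferred_index_statements(
    "beer-sample", {"i_type": "type", "i_abv": "abv"}
)
for stmt in statements:
    print(stmt)
```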

Optimizing Queries

Part II

Execution Plans & Explain

EXPLAIN query

The plan is assembled into an execution flow expressed through operators.

Operators stream results up and down the stream.

[Diagram: a Sequence that starts with a PrimaryScan, fans out into Parallel pipelines of Fetch and InitialProject, and ends with Limit.]
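An operator tree like the one on this slide can be walked with a few lines of Python. The plan below is hand-written and simplified, not real EXPLAIN output; the key names "#operator", "~children" and "~child" are assumed to match what EXPLAIN emits in your build.

```python
# Hand-written, simplified plan (assumption: key names match EXPLAIN output).
plan = {
    "#operator": "Sequence",
    "~children": [
        {"#operator": "PrimaryScan", "index": "p1"},
        {
            "#operator": "Parallel",
            "~child": {
                "#operator": "Sequence",
                "~children": [
                    {"#operator": "Fetch"},
                    {"#operator": "InitialProject"},
                ],
            },
        },
        {"#operator": "Limit"},
    ],
}


def operators(node, depth=0):
    """Yield (depth, operator name) for every operator, parents first."""
    yield depth, node["#operator"]
    children = list(node.get("~children", []))
    if "~child" in node:  # Parallel wraps a single child pipeline
        children.append(node["~child"])
    for child in children:
        yield from operators(child, depth + 1)


outline = [("  " * depth) + name for depth, name in operators(plan)]
print("\n".join(outline))
```

Scanning the outline for PrimaryScan versus IndexScan is a quick way to see which index a query ended up using.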

Operators

Main Operations

Scans
  PrimaryScan: scan of the Primary Index based on document keys
  IndexScan: scan of the Secondary Index based on a predicate

Fetch
  Fetch: reach into the Data Service with a document key

Projection Operations
  InitialProject: reduces the stream size to the fields involved in the query
  FinalProject: final shaping of the result to the requested JSON shape

Operators cont.

Operator Assembly
  Parallel: execute all child operations in parallel
  Sequence: execute child items in a sequence

Filtering Operators
  Filter: apply a filter expression (ex. WHERE field = "value")
  Limit: limit the number of items returned to N
  Offset: start returning items from a specified item count

Operators cont.

Join Operators
  Join: join left and right keyspaces on attributes and document key
  Unnest: join operation between a parent and a child with a nested array, where the parent is repeated for each child array item
  Nest: grouping operation between a parent and a child array, where the child array is embedded into the parent

DEMO: Execution Plans

Demo #1

Common Techniques for Tuning Queries

Minimize Items Scanned

Primary Index Scan vs. Index Scan

A Primary Index can only filter on document keys, so a primary scan typically means a "full scan" of the bucket.

A Secondary Index is typically defined with predicates and is smaller in size, so it is cheaper to scan.

Index Selection: based on matching the expressions in the index against the expressions in the WHERE clause.

DEMO #2

SELECT name,updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;

Vs.

CREATE INDEX i_type on `beer-sample`(type) USING GSI;

SELECT name,updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;

Minimize Items Scanned

Limit & Filters help eliminate rows early in the execution plan.

With Limit, upstream operators are signaled to stop once enough rows have accumulated.

Ex: Remember to filter on document type in buckets that contain multiple types.

DEMO #3

SELECT b1.name as beer_name, b2.name as brewery_name, b2.country
FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 ON KEYS b1.brewery_id
WHERE b1.abv>0;

vs

SELECT b1.name as beer_name, b2.name as brewery_name, b2.country
FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 ON KEYS b1.brewery_id
WHERE b1.type="beer" AND b1.abv>0;
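The effect of Limit on upstream operators can be imitated with Python generators. This is only an analogy for the streaming model described on this slide, not how the query engine is implemented; the key names and the helpers scan and filter_beers are made up. Once ten rows have passed the filter, nothing pulls from the scan any more, so most keys are never touched.

```python
import itertools

stats = {"scanned": 0}  # counts how many items the scan actually produced


def scan(keys):
    """Stand-in for a scan operator: yields keys one at a time."""
    for key in keys:
        stats["scanned"] += 1
        yield key


def filter_beers(items):
    """Stand-in for a Filter operator on document type."""
    return (key for key in items if key.startswith("beer"))


# Made-up bucket: even positions are breweries, odd positions are beers.
keys = [f"beer_{i}" if i % 2 else f"brewery_{i}" for i in range(1000)]

# islice plays the role of LIMIT 10: it stops pulling after ten rows.
first_ten = list(itertools.islice(filter_beers(scan(keys)), 10))
print(len(first_ten), stats["scanned"])
```

Note that an ORDER BY, as in demo #2, forces all qualifying rows to be produced before the limit can apply, so the early stop shown here only helps when no sort sits between the scan and the limit.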

Joins

Joins are efficient by nature: the left-hand value is joined to the right-hand document key with a nested loop.

Query: Get brewery location for each beer:

SELECT …
FROM `beer-sample` AS b1
JOIN `beer-sample` AS b2 ON KEYS b1.brewery_id
WHERE b1.type="beer";

For each document with type="beer", take b1.brewery_id and look for an equal document key in b2.
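The key-based join described above can be sketched in Python with a dictionary standing in for the bucket. The documents are made up and join_on_keys is a hypothetical helper; the point is that the right-hand side is reached by a direct key lookup for each left-hand row, never by scanning.

```python
# Made-up documents keyed by document key (assumption: simplified shapes).
bucket = {
    "brewery_a": {"type": "brewery", "name": "A Brewing", "country": "Belgium"},
    "beer_1": {"type": "beer", "name": "Pale", "brewery_id": "brewery_a"},
    "beer_2": {"type": "beer", "name": "Stout", "brewery_id": "brewery_missing"},
}


def join_on_keys(bucket):
    """Nested-loop join: for each beer, fetch the brewery by document key."""
    for b1 in bucket.values():
        if b1.get("type") != "beer":  # filter on document type first
            continue
        b2 = bucket.get(b1["brewery_id"])  # direct key lookup, no scan
        if b2 is not None:  # inner join: drop beers without a brewery doc
            yield {
                "beer_name": b1["name"],
                "brewery_name": b2["name"],
                "country": b2["country"],
            }


rows = list(join_on_keys(bucket))
print(rows)
```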

Optimizing Applications

New Consistency Settings!

View Staleness

Ok: unbounded – query what's available in the index/view now

False: query after all changes up to the request timestamp (and maybe more) have been indexed for a given index or view

New Indexes with Couchbase Server 4.0

Improved granularity of the consistency logical timestamp

New: Scan Consistency can be set to any logical timestamp

Choose stale=false, stale=ok, or anything in between

Flexible Consistency Settings

Time

t1  insert (k1, v1)

t2  do other business logic computation

t3  issue query/read on (k1, v1), with t3 vs t1

Query with t3: catch up all the indexes to t3 and then issue the query. Identical to "stale=false".

Query with t1: catch up all the indexes to t1 and then issue the query. Improved efficiency over "stale=false".
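Why waiting for t1 is cheaper than waiting for t3 can be shown with a toy index that applies mutations in sequence order. ToyIndex, its catch_up method and the mutation list are all hypothetical; the sequence numbers stand in for the logical timestamps on this slide, with mutations 2 and 3 coming from other writers between our insert and our query.

```python
class ToyIndex:
    """Toy index that applies mutations in logical-timestamp order."""

    def __init__(self):
        self.indexed_up_to = 0
        self.entries = {}

    def catch_up(self, mutations, target_seq):
        """Apply pending mutations up to target_seq; return how many ran."""
        applied = 0
        for seq, key, value in mutations:
            if self.indexed_up_to < seq <= target_seq:
                self.entries[key] = value
                self.indexed_up_to = seq
                applied += 1
        return applied


# seq 1 is our insert of (k1, v1); seq 2 and 3 arrive before we query at t3.
mutations = [(1, "k1", "v1"), (2, "k2", "v2"), (3, "k3", "v3")]

index_t1 = ToyIndex()
work_t1 = index_t1.catch_up(mutations, target_seq=1)  # wait only for our write

index_t3 = ToyIndex()
work_t3 = index_t3.catch_up(mutations, target_seq=3)  # stale=false behaviour

print(work_t1, work_t3)
```

In both cases the query sees (k1, v1); the t1 variant simply stops waiting as soon as its own write is indexed.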

Recap

Recap

New Unique Query and Indexing Architecture
  Workload isolation with MDS gives you a great performance and scale advancement.
  Queries load balance across servers and parallelize for optimum performance.
  New Secondary Indexing and Views give your queries a boost.

Familiar Concepts from your past life will help tune queries
  Understand Execution Plans
  Understand Indexes and Index Selection
  Filter & Limit aggressively
  Understand JOINs

Use powerful new Consistency Dials for best efficiency

Couchbase Server 4.0

Download the Developer Preview in a few weeks…

Couchbase.com

Q&A

Cihan Biyikoglu

@cihangirb

DEMO

-- Initial setup & reset
CREATE PRIMARY INDEX p1 ON `beer-sample` USING GSI;
DROP INDEX `beer-sample`.i_type USING GSI;

-- DEMO #1
EXPLAIN SELECT name,updated FROM `beer-sample` WHERE type="beer" ORDER BY name LIMIT 10;

-- DEMO #2
SELECT name,updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;
-- Vs. --
CREATE INDEX i_type on `beer-sample`(type) USING GSI;
SELECT name,updated FROM `beer-sample` WHERE type="beer" AND abv>0 ORDER BY name LIMIT 10;

-- DEMO #3
SELECT b1.name as beer_name, b2.name as brewery_name, b2.country FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 ON KEYS b1.brewery_id WHERE b1.abv>0 LIMIT 10;
-- VS --
SELECT b1.name as beer_name, b2.name as brewery_name, b2.country FROM `beer-sample` AS b1 JOIN `beer-sample` AS b2 ON KEYS b1.brewery_id WHERE b1.type="beer" AND b1.abv>0 LIMIT 10;

-- DEMO #4
SELECT name, updated FROM `beer-sample` WHERE type="beer" LIMIT 1;