Couchbase Analytics: An Overview – Connect Silicon Valley 2017

Confidential and Proprietary. Do not distribute without Couchbase consent. © Couchbase 2017. All rights reserved. COUCHBASE ANALYTICS An Overview

Upload: couchbase

Post on 21-Jan-2018

COUCHBASE ANALYTICS: An Overview

AGENDA

01/ What is Couchbase Analytics?

02/ How to use it?

03/ From the inside out

04/ Developer Preview 4

Why Couchbase Analytics?

• Support OLTP and OLAP processing in a single platform

• Eliminate the need for a separate OLAP system

• Eliminate ETL

• Reduce latency

• Reduce complexity

• Enable more intelligent applications

• Enable data exploration and ad hoc analytics

What is Couchbase Analytics?

• Common programming model & data model

• Unified management

• Fast data synchronization

• Extend Couchbase Platform to power real-time analytics

• Ad-hoc queries (“Ask me anything!”)

• Workload isolation

• Independent scaling

[Figure: Couchbase platform diagram with the labels Core Database Engine, Query, Search, Mobile & IoT, Analytics (Preview), scale-out architecture, memory-first architecture, and unified programming.]

HOW TO USE IT?

Data: Beer Sample

{
  "name": "Commonwealth Brewing #1",
  "city": "Boston",
  "state": "Massachusetts",
  "code": "",
  "country": "United States",
  "phone": "",
  "website": "",
  "type": "brewery",
  "updated": "2010-07-22 20:00:20",
  "description": "",
  "address": [ ],
  "geo": {
    "accuracy": "APPROXIMATE",
    "lat": 42.3584,
    "lng": -71.0598
  }
}

{
  "name": "Piranha Pale Ale",
  "abv": 5.7,
  "ibu": 0,
  "srm": 0,
  "upc": 0,
  "type": "beer",
  "brewery_id": "110f04166d",
  "updated": "2010-07-22 20:00:20",
  "description": "",
  "style": "American-Style Pale Ale",
  "category": "North American Ale"
}

Simple Join

"Get 3 beers with their breweries"

SELECT bw.name AS brewer, br.name AS beer
FROM breweries bw, beers br
WHERE br.brewery_id = meta(bw).id
ORDER BY bw.name, br.name
LIMIT 3;

[
  { "brewer": "(512) Brewing Company", "beer": "(512) ALT" },
  { "brewer": "(512) Brewing Company", "beer": "(512) Bruin" },
  { "brewer": "(512) Brewing Company", "beer": "(512) IPA" }
]
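As a rough aid for reading the join, here is a minimal Python sketch of what the query computes. It is not how Couchbase Analytics executes the query. The sample documents, the document keys (standing in for meta(bw).id), and the function name `simple_join` are assumptions made up for this illustration.

```python
# Hypothetical mini-dataset; the dictionary keys stand in for meta(bw).id.
breweries = {
    "512_brewing_company": {"name": "(512) Brewing Company"},
    "21st_amendment_brewery_cafe": {"name": "21st Amendment Brewery Cafe"},
}
beers = [
    {"name": "21A IPA", "brewery_id": "21st_amendment_brewery_cafe"},
    {"name": "(512) IPA", "brewery_id": "512_brewing_company"},
    {"name": "(512) ALT", "brewery_id": "512_brewing_company"},
    {"name": "(512) Bruin", "brewery_id": "512_brewing_company"},
]


def simple_join(breweries, beers, limit=3):
    """Pair each beer with its brewery, then sort and truncate."""
    rows = [
        {"brewer": breweries[beer["brewery_id"]]["name"], "beer": beer["name"]}
        for beer in beers
        # WHERE br.brewery_id = meta(bw).id
        if beer.get("brewery_id") in breweries
    ]
    # ORDER BY bw.name, br.name
    rows.sort(key=lambda row: (row["brewer"], row["beer"]))
    # LIMIT 3
    return rows[:limit]


result = simple_join(breweries, beers)
```

Beers whose brewery_id matches no brewery key drop out, as they would in the inner join.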

Non-key Self Join

"Get 3 beer names used by different breweries"

SELECT b1.name AS beer,
       b1.brewery_id AS brewer1,
       b2.brewery_id AS brewer2
FROM beers b1, beers b2
WHERE b1.name = b2.name
  AND b1.brewery_id != b2.brewery_id
ORDER BY b1.brewery_id
LIMIT 3;

[
  { "brewer1": "aberdeen_brewing", "brewer2": "hoffbrau_steaks_brewery_2", "beer": "Scottish Ale" },
  { "brewer1": "aberdeen_brewing", "brewer2": "carlyle_brewing", "beer": "Scottish Ale" },
  { "brewer1": "aberdeen_brewing", "brewer2": "belhaven_brewery", "beer": "Scottish Ale" }
]
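One way to picture this self join is to bucket beers by name and then emit every ordered pair from different breweries. The Python below is a sketch under that assumption and says nothing about the plan the engine actually picks. The function name `same_name_pairs` is hypothetical, and the sample list reuses brewery ids from the slides.

```python
from collections import defaultdict
from itertools import permutations


def same_name_pairs(beers):
    """Return (beer, brewer1, brewer2) rows for beers sharing a name."""
    by_name = defaultdict(list)
    for beer in beers:
        by_name[beer["name"]].append(beer)  # b1.name = b2.name

    rows = []
    for name, group in by_name.items():
        # Ordered pairs: the predicate is symmetric, so each match
        # appears twice, once in each direction.
        for b1, b2 in permutations(group, 2):
            if b1["brewery_id"] != b2["brewery_id"]:
                rows.append({
                    "beer": name,
                    "brewer1": b1["brewery_id"],
                    "brewer2": b2["brewery_id"],
                })
    # ORDER BY b1.brewery_id; ties keep an arbitrary but stable order here.
    rows.sort(key=lambda row: row["brewer1"])
    return rows


beers = [
    {"name": "Scottish Ale", "brewery_id": "carlyle_brewing"},
    {"name": "Scottish Ale", "brewery_id": "aberdeen_brewing"},
    {"name": "Scottish Ale", "brewery_id": "belhaven_brewery"},
    {"name": "Piranha Pale Ale", "brewery_id": "110f04166d"},
]
pairs = same_name_pairs(beers)
```

Because the query orders only by b1.brewery_id, the order of rows that share the same brewer1 is not determined by the query itself.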

Nested Outer Join

"Get 2 breweries and the list of their beers"

SELECT bw.name AS brewer, (
    SELECT br.name, br.abv
    FROM beers br
    WHERE br.brewery_id = meta(bw).id
) AS beers
FROM breweries bw
ORDER BY bw.name
LIMIT 2;

[
  {
    "beers": [
      { "abv": 8.2, "name": "(512) Pecan Porter" },
      { "abv": 5.8, "name": "(512) Pale" }, ...
    ],
    "brewer": "(512) Brewing Company"
  },
  {
    "beers": [
      { "abv": 7.2, "name": "21A IPA" },
      { "abv": 5.8, "name": "North Star Red" }, ...
    ],
    "brewer": "21st Amendment Brewery Cafe"
  }
]
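The "outer" part of this query is that a brewery with no matching beers still appears, with an empty list. A small Python sketch of that behavior follows. The document keys, the third brewery, and the function name `breweries_with_beers` are invented for the example, while the beer names and abv values come from the result shown on the slide.

```python
# Hypothetical keys stand in for meta(bw).id; the last brewery is made up
# to show what happens when the subquery finds nothing.
breweries = {
    "512_brewing_company": {"name": "(512) Brewing Company"},
    "21st_amendment_brewery_cafe": {"name": "21st Amendment Brewery Cafe"},
    "no_beers_yet": {"name": "Hypothetical Brewery Without Beers"},
}
beers = [
    {"name": "(512) Pecan Porter", "abv": 8.2, "brewery_id": "512_brewing_company"},
    {"name": "(512) Pale", "abv": 5.8, "brewery_id": "512_brewing_company"},
    {"name": "21A IPA", "abv": 7.2, "brewery_id": "21st_amendment_brewery_cafe"},
    {"name": "North Star Red", "abv": 5.8, "brewery_id": "21st_amendment_brewery_cafe"},
]


def breweries_with_beers(breweries, beers, limit=2):
    """Attach to every brewery the (possibly empty) list of its beers."""
    rows = []
    for key, brewery in breweries.items():
        # The correlated subquery: evaluated once per brewery.
        nested = [
            {"name": beer["name"], "abv": beer["abv"]}
            for beer in beers
            if beer["brewery_id"] == key
        ]
        rows.append({"brewer": brewery["name"], "beers": nested})
    rows.sort(key=lambda row: row["brewer"])  # ORDER BY bw.name
    return rows[:limit]  # LIMIT


rows = breweries_with_beers(breweries, beers, limit=3)
```

With limit=3 the made-up third brewery is included, and its "beers" value is an empty list rather than the row being dropped.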

Grouping and Aggregation

"Get all breweries that produce more than 37 beers"

SELECT br.brewery_id,
       COUNT(*) AS num_beers
FROM beers br
GROUP BY br.brewery_id
HAVING num_beers > 37
ORDER BY num_beers DESC;

[
  { "num_beers": 57, "brewery_id": "midnight_sun_brewing_co" },
  { "num_beers": 49, "brewery_id": "rogue_ales" },
  { "num_beers": 38, "brewery_id": "anheuser_busch" }
]
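The GROUP BY / HAVING / ORDER BY steps can be mirrored in a few lines of Python. This is only a sketch of the semantics. The function name `prolific_breweries` and the tiny counts in the sample data are assumptions, so the threshold is lowered from 37 to 1.

```python
from collections import Counter


def prolific_breweries(beers, min_beers):
    """Count beers per brewery and keep breweries above a threshold."""
    counts = Counter(beer["brewery_id"] for beer in beers)  # GROUP BY + COUNT(*)
    rows = [
        {"brewery_id": brewery_id, "num_beers": n}
        for brewery_id, n in counts.items()
        if n > min_beers  # HAVING num_beers > ...
    ]
    rows.sort(key=lambda row: row["num_beers"], reverse=True)  # ORDER BY ... DESC
    return rows


# Made-up counts: 3, 2 and 1 beers for three brewery ids from the slides.
beers = (
    [{"brewery_id": "midnight_sun_brewing_co"}] * 3
    + [{"brewery_id": "rogue_ales"}] * 2
    + [{"brewery_id": "anheuser_busch"}]
)
top = prolific_breweries(beers, min_beers=1)
```

The filter runs on the aggregated counts, after grouping, which is the difference between HAVING and WHERE.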

Putting it all together

"Explore beer characteristics by city"

SELECT bw.city, COUNT(*) AS num_beers,
       AVG(br.abv) AS beer_strength
FROM beers br, breweries bw
WHERE br.brewery_id = meta(bw).id
GROUP BY bw.city
HAVING num_beers > 1
ORDER BY beer_strength DESC
LIMIT 3;

[
  { "num_beers": 5, "beer_strength": 12.02, "city": "Vorchdorf" },
  { "num_beers": 8, "beer_strength": 10.3125, "city": "Buggenhout" },
  { "num_beers": 11, "beer_strength": 10.045454545454545, "city": "Fraserburgh" }
]
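This query chains a join, a grouping, two aggregates, a filter on the aggregate, a sort, and a limit. The Python sketch below walks through the same pipeline on made-up data. The brewery keys, the abv values, and the function name `beer_stats_by_city` are all assumptions, and the sketch assumes every beer carries a numeric abv.

```python
from collections import defaultdict


def beer_stats_by_city(breweries, beers, limit=3):
    """Join beers to breweries, then aggregate abv per brewery city."""
    abvs_by_city = defaultdict(list)
    for beer in beers:
        brewery = breweries.get(beer["brewery_id"])  # join on the brewery key
        if brewery is not None:
            abvs_by_city[brewery["city"]].append(beer["abv"])  # GROUP BY bw.city

    rows = [
        {
            "city": city,
            "num_beers": len(abvs),                # COUNT(*)
            "beer_strength": sum(abvs) / len(abvs),  # AVG(br.abv)
        }
        for city, abvs in abvs_by_city.items()
        if len(abvs) > 1  # HAVING num_beers > 1
    ]
    rows.sort(key=lambda row: row["beer_strength"], reverse=True)
    return rows[:limit]


# Hypothetical breweries and strengths, chosen so the averages are exact.
breweries = {
    "brewery_a": {"city": "Vorchdorf"},
    "brewery_b": {"city": "Buggenhout"},
    "brewery_c": {"city": "Boston"},
}
beers = [
    {"brewery_id": "brewery_a", "abv": 12.0},
    {"brewery_id": "brewery_a", "abv": 10.0},
    {"brewery_id": "brewery_b", "abv": 9.0},
    {"brewery_id": "brewery_b", "abv": 10.0},
    {"brewery_id": "brewery_c", "abv": 5.0},
]
stats = beer_stats_by_city(breweries, beers)
```

Boston has a single beer in this sample, so the HAVING step removes it before sorting.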

Couchbase Analytics DDL: Lifecycle

• DDL for shadow datasets

CREATE BUCKET `beer-sample`;
CREATE SHADOW DATASET beers ON `beer-sample` WHERE `type` = "beer";
CREATE SHADOW DATASET breweries ON `beer-sample` WHERE `type` = "brewery";

CONNECT BUCKET `beer-sample`;

SELECT * FROM beers ORDER BY abv DESC LIMIT 12;

DISCONNECT BUCKET `beer-sample`;

DROP DATASET breweries;
DROP DATASET beers;
DROP BUCKET `beer-sample`;
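Conceptually, each shadow dataset is a filtered copy of the bucket that is kept up to date while the bucket is connected. The Python below is a toy model of that idea only. The names `DATASET_FILTERS` and `apply_mutation`, the document keys, and the representation of a change as a (key, document-or-None) pair are assumptions for illustration and do not reflect the actual synchronization protocol.

```python
# Hypothetical filters mirroring the two CREATE SHADOW DATASET statements.
DATASET_FILTERS = {
    "beers": lambda doc: doc.get("type") == "beer",
    "breweries": lambda doc: doc.get("type") == "brewery",
}


def apply_mutation(datasets, key, doc):
    """Apply one bucket change to every shadow dataset.

    `doc` is the new document body, or None when the key was deleted.
    """
    for name, keep in DATASET_FILTERS.items():
        if doc is not None and keep(doc):
            datasets[name][key] = doc      # insert or update the shadow copy
        else:
            datasets[name].pop(key, None)  # deleted, or no longer matches the filter


datasets = {"beers": {}, "breweries": {}}
apply_mutation(datasets, "b1", {"type": "beer", "name": "Piranha Pale Ale", "abv": 5.7})
apply_mutation(datasets, "w1", {"type": "brewery", "name": "Commonwealth Brewing #1"})
apply_mutation(datasets, "b2", {"type": "beer", "name": "21A IPA", "abv": 7.2})
apply_mutation(datasets, "b1", None)  # the first beer is deleted from the bucket

# Rough equivalent of: SELECT * FROM beers ORDER BY abv DESC LIMIT 12;
strongest = sorted(datasets["beers"].values(), key=lambda d: d["abv"], reverse=True)[:12]
```

In this model, DISCONNECT BUCKET would correspond to no longer calling `apply_mutation`, and DROP DATASET to discarding the dictionary.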

Couchbase Analytics DDL: Lifecycle

• DDL for shadow datasets for external data

CREATE BUCKET `beer-sample` WITH { "nodes": "node1.mydomain.com,node2.mydomain.com" };
CREATE SHADOW DATASET beers ON `beer-sample` WHERE `type` = "beer";
CREATE SHADOW DATASET breweries ON `beer-sample` WHERE `type` = "brewery";

CONNECT BUCKET `beer-sample` WITH { "password": "!@#", "timeout": 2000 };

SELECT * FROM beers ORDER BY abv DESC LIMIT 12;

DISCONNECT BUCKET `beer-sample`;

DROP DATASET breweries;
DROP DATASET beers;
DROP BUCKET `beer-sample`;

FROM THE INSIDE OUT

Why another service?

• Common programming model & data model

• Unified management

• Fast data synchronization

• Extend Couchbase Platform to power real-time analytics

• Ad-hoc queries (“Ask me anything!”)

• Workload isolation

• Independent scaling

Couchbase Query and Analytics

• Couchbase Query: optimized for operations (OLTP). Many queries, each touches a little data.

• Couchbase Analytics: optimized for analytics (OLAP). Fewer queries, each touches a lot of data.

Example: Join, Grouping, and Aggregation

"Get the 10 chattiest users in a timeframe"

SELECT user.id, COUNT(message) AS count
FROM gbook_messages AS message, gbook_users AS user
WHERE message.author_id = user.id
  AND message.send_time BETWEEN "2001-11-28T09:57:13" AND "2001-11-29T09:57:13"
GROUP BY user.id
ORDER BY count DESC
LIMIT 10;
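The time filter in this example compares timestamps stored as strings. That works because ISO 8601 timestamps in one fixed format sort lexicographically in chronological order, and BETWEEN includes both bounds. A minimal Python sketch of the query's logic follows. The function name `chattiest_users` and the sample records are hypothetical, and user ids are assumed to be unique.

```python
from collections import Counter


def chattiest_users(messages, users, start, end, limit=10):
    """Count messages per known user inside an inclusive time window."""
    user_ids = {user["id"] for user in users}  # assumes ids are unique
    counts = Counter(
        message["author_id"]
        for message in messages
        if message["author_id"] in user_ids           # join on author_id = user.id
        and start <= message["send_time"] <= end      # BETWEEN is inclusive
    )
    # ORDER BY count DESC LIMIT n; returns (user_id, count) pairs.
    return counts.most_common(limit)


users = [{"id": 1}, {"id": 2}, {"id": 3}]
messages = [
    {"author_id": 1, "send_time": "2001-11-28T10:00:00"},
    {"author_id": 1, "send_time": "2001-11-28T23:59:59"},
    {"author_id": 2, "send_time": "2001-11-29T09:57:13"},  # exactly on the upper bound
    {"author_id": 2, "send_time": "2001-11-30T00:00:00"},  # outside the window
    {"author_id": 9, "send_time": "2001-11-28T12:00:00"},  # no matching user
]
top = chattiest_users(messages, users, "2001-11-28T09:57:13", "2001-11-29T09:57:13")
```

String comparison stops being reliable if timestamps mix formats or time zones, so the sketch assumes a single consistent format.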

Couchbase Query and Analytics – Performance Tradeoff

[Figure: two charts comparing the series "Join GBy CBA" and "Join GBy N1QL GSI" by interval (# records). First chart: 1m (<10), 1h (<500), 1d (<5000). Second chart: 1w (<25K), 1mo (<100K), 3mo (<300K), 6mo (<600K).]

"Secret" Sauce: Query Parallelism

• Massively Parallel Query Processor (MPP) executes complex queries on large datasets

• Comprehensive query language

[Figure: "Query takes 1 minute" vs. "Query takes 15 seconds"]
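The general idea behind parallel query processing can be sketched as: split the data into partitions, aggregate each partition independently, then merge the partial results. The Python below illustrates only that shape for a group-by count. It is not a description of the Couchbase Analytics engine, the function names are hypothetical, and Python threads are used for structure rather than for real speedup.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_count(partition):
    """Local aggregation: count documents per brewery within one partition."""
    return Counter(doc["brewery_id"] for doc in partition)


def parallel_group_count(docs, num_partitions=4):
    """Partition the data, aggregate each part, then merge the partial counts."""
    partitions = [docs[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_count, partitions))

    total = Counter()
    for partial in partials:
        total.update(partial)  # global merge of the per-partition results
    return total


# Made-up data: 1000 documents spread evenly over five brewery ids.
docs = [{"brewery_id": f"brewery_{i % 5}"} for i in range(1000)]
counts = parallel_group_count(docs)
```

The merge step is cheap because each partial result is already aggregated, which is why this pattern scales with the number of partitions.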

Couchbase Analytics Coupling

• Separate services, separate nodes

• Multi-Dimensional Scaling

• Workload isolation

• Parallel shadowing of data(sets) via DCP

• Low impact on data nodes

• Low latency

[Figure: four boxes labeled ANALYTICS and three boxes labeled DATA]

DEVELOPER PREVIEW 4

What is in the Developer Preview?

• Common programming model & data model

• Unified management

• Fast data synchronization

• Extend Couchbase Platform to power real-time analytics

• Ad-hoc queries (“Ask me anything!”)

• Workload isolation

• Independent scaling

Workbench

Get it

https://www.couchbase.com/downloads

THANK YOU

APPENDIX

Couchbase Analytics and friends

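[Figure: spectrum of data systems, from operations to analytics and from online to batch.]

Systems (in order): Key Value, CB Query, CB Analytics, Spark, Hadoop

Latency scale: µs, ms, 30s, minutes+

Data touched: 1 record to trillions of records

Annotations: start-up overhead, job-based, parallel query, ETL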