ccsf12-app-development-with-indexes-queries-and-geo


Upload: couchbase

Post on 26-Jun-2015


Page 1: CCSF12-App-Development-with-Indexes-Queries-and-Geo


Developing with Views: See Inside the Data

J Chris Anderson, Architect

Page 2

What we’ll talk about

• Lifecycle of a view
  – Index definition, build, and query phase
  – Consistency options (async by default)
• Emergent Schema - Views and Documents
• Patterns:
  – Secondary index
  – Basic aggregations (avg ratings by brewery)
  – Time-based analytics with group_level
  – Leaderboard
• Schema Evolution

Page 3

View Lifecycle: Define - Build - Query

Page 4

View Definition (in JavaScript)

Like: CREATE INDEX city ON brewery (city);
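The JavaScript definition on this slide did not survive extraction. A minimal sketch of what such a map function might look like, assuming brewery documents carry `type: "brewery"` and a `city` string field. The `emit` stub, `buildIndex` helper, and sample documents are hypothetical stand-ins for the view engine, and plain string comparison only approximates the real key collation.

```javascript
// Hypothetical stand-in for the view engine's emit(): collects emitted rows.
let emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map function, roughly "CREATE INDEX city ON brewery (city)".
// Assumes docs shaped like { type: "brewery", city: "Juneau", ... }.
function mapBreweryByCity(doc, meta) {
  if (doc.type === "brewery" && typeof doc.city === "string") {
    emit(doc.city, null);
  }
}

// Run a map function over every document and keep the rows sorted by key,
// the way the index stores them (simple string order, not full collation).
function buildIndex(mapFn, docs) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    emitted = [];
    mapFn(doc, { id });
    for (const row of emitted) rows.push({ id, key: row.key, value: row.value });
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Hypothetical sample documents; the beer is skipped by the type check.
const docs = {
  brewery_1: { type: "brewery", city: "Juneau" },
  brewery_2: { type: "brewery", city: "Fort Collins" },
  beer_1: { type: "beer", name: "1554 Enlightened Black Ale" },
};
console.log(buildIndex(mapBreweryByCity, docs).map((r) => r.key));
// [ 'Fort Collins', 'Juneau' ]
```

Emitting `null` as the value keeps the index small; the document id in each row is enough to fetch the full brewery afterwards.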

Page 5

Distributed Index Build Phase

• Optimized for lookups, in-order access, and aggregations
• All view reads come from disk (a different performance profile)
• Each view is built against every document on every node
  – This is why you should group views in a design document
• Automatically kept up to date

[Diagram: active and replica documents distributed across SERVER 1, SERVER 2, and SERVER 3]

Page 6

• Efficiently fetch a row or group of related rows.
• Queries use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries

Dynamic Range Queries with Optional Aggregation

[Diagram: active and replica documents distributed across SERVER 1, SERVER 2, and SERVER 3]

?startkey="J"&endkey="K" returns { "rows": [ { "key": "Juneau", "value": null } ] }
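Because the index keeps rows sorted by key, a startkey/endkey query is a seek followed by an in-order scan. A minimal sketch over an in-memory array, treating both ends as inclusive; the `cityIndex` rows and helper names are hypothetical, and JavaScript string comparison stands in for the engine's real collation.

```javascript
// Hypothetical sorted index rows for a view keyed by brewery city.
const cityIndex = [
  { key: "Boston", value: null },
  { key: "Fort Collins", value: null },
  { key: "Juneau", value: null },
  { key: "Kalamazoo", value: null },
];

// Binary search: first position whose key is >= target.
function lowerBound(rows, target) {
  let lo = 0;
  let hi = rows.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (rows[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Seek to startkey, then walk forward in order until a key passes endkey.
function rangeQuery(rows, startkey, endkey) {
  const result = [];
  for (let i = lowerBound(rows, startkey); i < rows.length && rows[i].key <= endkey; i++) {
    result.push(rows[i]);
  }
  return { rows: result };
}

console.log(JSON.stringify(rangeQuery(cityIndex, "J", "K")));
// {"rows":[{"key":"Juneau","value":null}]}
```

"Kalamazoo" is excluded because it sorts after "K", which is why the slide's query returns only Juneau.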

Page 7

Queries run against stale indexes by default

• stale=update_after (default if nothing is specified)
  – always get the fastest response
  – can take two queries to read your own writes
• stale=ok
  – auto-update will trigger eventually
  – might not see your own writes for a few minutes
  – least frequent updates -> least resource impact
• stale=false
  – use with persistence observe if data needs to be included in view results
  – BUT be aware of the delay it adds; only use it when really required
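A query picks its consistency level per request through the `stale` parameter. A minimal sketch of building such a request URL, assuming the Couchbase 2.x view REST endpoint on port 8092 (`/{bucket}/_design/{ddoc}/_view/{view}`); the host, bucket, design document, and view names are hypothetical. It also shows that view keys are JSON values, so string keys must be JSON-encoded.

```javascript
// The three stale settings described on the slide.
const STALE_OPTIONS = new Set(["update_after", "ok", "false"]);

// Build a view query URL with an explicit stale setting.
// Assumption: Couchbase 2.x view endpoint on port 8092.
function viewQueryUrl({ host, bucket, ddoc, view, stale = "update_after", startkey, endkey }) {
  if (!STALE_OPTIONS.has(stale)) {
    throw new Error(`unknown stale option: ${stale}`);
  }
  const params = new URLSearchParams({ stale });
  // Keys are JSON, so the string J is sent as "J" (percent-encoded).
  if (startkey !== undefined) params.set("startkey", JSON.stringify(startkey));
  if (endkey !== undefined) params.set("endkey", JSON.stringify(endkey));
  return `http://${host}:8092/${bucket}/_design/${ddoc}/_view/${view}?${params}`;
}

// Read-your-own-writes query: pay the index update cost now.
console.log(
  viewQueryUrl({
    host: "localhost",
    bucket: "beer-sample",
    ddoc: "brewery",
    view: "by_city",
    stale: "false",
    startkey: "J",
    endkey: "K",
  })
);
```

Defaulting to `update_after` mirrors the slide: the fast answer comes back immediately and the index catches up afterwards, so `stale=false` is something a caller has to ask for deliberately.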

Page 8

Development vs. Production Views

• Development views index a subset of the data.
• Publishing a view builds the index across the entire cluster.
• Queries on production views are scattered to all cluster members, and the results are gathered and returned to the client.
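Since every node indexes only its own documents, each node returns an already-sorted slice of the answer, and the gather step only has to merge them. A minimal sketch of that merge as a simple k-way merge with an optional limit; the talk does not describe the actual merge code, and the per-node rows below are hypothetical.

```javascript
// Hypothetical per-node query results; each list is sorted by key.
const nodeResults = [
  [{ key: "Austin", id: "b1" }, { key: "Juneau", id: "b4" }], // server 1
  [{ key: "Boston", id: "b2" }], // server 2
  [{ key: "Denver", id: "b3" }, { key: "Portland", id: "b5" }], // server 3
];

// Gather step: repeatedly take the smallest head among the node lists,
// stopping once `limit` rows have been produced.
function gather(lists, limit = Infinity) {
  const cursors = lists.map(() => 0);
  const merged = [];
  while (merged.length < limit) {
    let best = -1;
    for (let i = 0; i < lists.length; i++) {
      if (cursors[i] >= lists[i].length) continue; // this node is exhausted
      if (best === -1 || lists[i][cursors[i]].key < lists[best][cursors[best]].key) best = i;
    }
    if (best === -1) break; // every node is exhausted
    merged.push(lists[best][cursors[best]++]);
  }
  return merged;
}

console.log(gather(nodeResults, 3).map((r) => r.key)); // [ 'Austin', 'Boston', 'Denver' ]
```

Because each slice is sorted, a query with a limit can stop early without the client ever seeing the rest of every node's rows.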

Page 9

Emergent Schema

Page 10

Emergent Schema

JSON.org

GitHub API

Twitter API

"Capture the user's intent"

• Falls out of your key-value usage
• Helps to know what's efficient
• Mostly you can relax

Page 11

Query Pattern: Find by Attribute

Page 12

Find documents by a specific attribute

• Let's find beers by brewery_id!

Page 13

The index definition
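The index definition itself was an image. A minimal sketch of what it could look like, assuming beer documents carry `type: "beer"`, a `brewery_id`, and a `name`; the `emit` stub, `queryByKey` helper, and sample documents are hypothetical. Emitting just the name keeps the value small, in line with the later warning about putting too much data into view values.

```javascript
// Hypothetical stand-in for the view engine's emit().
let emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map function: key each beer by its brewery_id, value = beer name only.
function mapBeerByBrewery(doc, meta) {
  if (doc.type === "beer" && doc.brewery_id) {
    emit(doc.brewery_id, doc.name);
  }
}

// Like querying with ?key="...": every matching row, with rows that share
// a key ordered by document id.
function queryByKey(mapFn, docs, key) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    emitted = [];
    mapFn(doc, { id });
    for (const row of emitted) {
      if (row.key === key) rows.push({ id, key: row.key, value: row.value });
    }
  }
  return rows.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
}

// Hypothetical sample documents.
const docs = {
  beer_b: { type: "beer", brewery_id: "new_belgium_brewing", name: "Beer B" },
  beer_a: { type: "beer", brewery_id: "new_belgium_brewing", name: "1554 Enlightened Black Ale" },
  beer_c: { type: "beer", brewery_id: "other_brewery", name: "Beer C" },
};
console.log(queryByKey(mapBeerByBrewery, docs, "new_belgium_brewing").map((r) => r.value));
// [ '1554 Enlightened Black Ale', 'Beer B' ]
```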

Page 14

The result set: beers keyed by brewery_id

Page 15

Query Pattern: Basic Aggregations

Page 16

Use a built-in reduce function with a group query

• Let's find the average abv for each brewery!

Page 17

We are reducing doc.abv with _stats

Page 18

Group reduce (reduce by unique key)
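A group query runs the reduce once per unique key. A minimal sketch of what `_stats` plus `group=true` computes, written as plain JavaScript; the rows stand in for what a map like `emit(doc.brewery_id, doc.abv)` might produce, and the brewery ids are hypothetical. The object fields mirror the `sum`, `count`, `min`, `max`, and `sumsqr` that `_stats` reports. This sketch covers only the first-pass reduce, not the rereduce of partial stats that the built-in also handles.

```javascript
// Hypothetical rows from a map such as emit(doc.brewery_id, doc.abv).
const rows = [
  { key: "brewery_a", value: 5.5 },
  { key: "brewery_a", value: 6.5 },
  { key: "brewery_b", value: 4.0 },
];

// Fold numbers into a _stats-style summary object.
function stats(values) {
  return values.reduce(
    (acc, v) => ({
      sum: acc.sum + v,
      count: acc.count + 1,
      min: Math.min(acc.min, v),
      max: Math.max(acc.max, v),
      sumsqr: acc.sumsqr + v * v,
    }),
    { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 }
  );
}

// group=true: one reduction per unique key.
function groupReduce(rows, reduceFn) {
  const groups = new Map();
  for (const { key, value } of rows) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return [...groups].map(([key, values]) => ({ key, value: reduceFn(values) }));
}

// The average is computed by the client from sum and count.
for (const { key, value } of groupReduce(rows, stats)) {
  console.log(key, "average abv:", value.sum / value.count);
}
// brewery_a average abv: 6
// brewery_b average abv: 4
```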

Page 19

Query Pattern: Time-based Rollups

Page 20

Find patterns in beer comments by time

{
  "type": "comment",
  "about_id": "beer_Enlightened_Black_Ale",
  "user_id": 525,
  "text": "tastes like college!",
  "updated": "2010-07-22 20:00:20"
}
{
  "id": "f1e62"
}

timestamp

Page 21

Query with group_level=2 to get monthly rollups

Page 22

dateToArray() is your friend

• String or integer based timestamps
• Output optimized for group_level queries
• Array of JSON numbers:

[2012,9,21,11,30,44]

dateToArray()
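`dateToArray()` is built into the view engine. To show the shape of its output, here is a hypothetical stand-in, `dateToArrayLike`, assuming timestamps are either `"YYYY-MM-DD HH:MM:SS"` strings (read as UTC, which is an assumption of this sketch) or integer milliseconds since the epoch. In a real view it would be used along the lines of `emit(dateToArray(doc.updated), 1)` with a reduce such as `_count`.

```javascript
// Hypothetical stand-in for the built-in dateToArray().
// Accepts "YYYY-MM-DD HH:MM:SS" (treated as UTC) or epoch milliseconds.
function dateToArrayLike(input) {
  const date =
    typeof input === "number"
      ? new Date(input)
      : new Date(input.replace(" ", "T") + "Z");
  if (Number.isNaN(date.getTime())) {
    throw new Error(`unparseable date: ${input}`);
  }
  return [
    date.getUTCFullYear(),
    date.getUTCMonth() + 1, // 1-based months, as in [2012,9,21,...]
    date.getUTCDate(),
    date.getUTCHours(),
    date.getUTCMinutes(),
    date.getUTCSeconds(),
  ];
}

console.log(JSON.stringify(dateToArrayLike("2012-09-21 11:30:44")));
// [2012,9,21,11,30,44]
```

Putting the most significant unit first is what makes the array useful as a key: sorting the arrays sorts by time, and every prefix (year, year+month, and so on) is a meaningful bucket for `group_level`.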

Page 23

group_level=2 results

• Monthly rollup
• Sorted by time
  – sort the query results in your application if you want to rank by value
  – no chained map-reduce

Page 24

group_level=3: daily results, great for graphing

• Daily, hourly, minute or second rollup all possible with the same index.

• http://crate.im/posts/couchbase-views-reddit-data/
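One index can serve monthly, daily, or finer rollups because `group_level=N` groups rows by the first N elements of their array keys. A minimal sketch of that grouping over sorted rows, summing values the way `_sum` or `_count` would; the rows are hypothetical and the real engine answers this from the B-tree's cached reductions rather than by scanning every row.

```javascript
// Hypothetical sorted rows: key = dateToArray-style timestamp, value = 1.
const rows = [
  { key: [2010, 7, 22, 20, 0, 20], value: 1 },
  { key: [2010, 7, 22, 21, 5, 0], value: 1 },
  { key: [2010, 7, 30, 9, 15, 2], value: 1 },
  { key: [2010, 8, 1, 12, 0, 0], value: 1 },
];

// group_level=N: truncate each key to N elements and sum the values of rows
// sharing that prefix. Rows are sorted, so equal prefixes are adjacent.
function groupLevel(rows, level) {
  const out = [];
  for (const { key, value } of rows) {
    const prefix = key.slice(0, level);
    const last = out[out.length - 1];
    if (last && JSON.stringify(last.key) === JSON.stringify(prefix)) {
      last.value += value;
    } else {
      out.push({ key: prefix, value });
    }
  }
  return out;
}

console.log(JSON.stringify(groupLevel(rows, 2))); // monthly
// [{"key":[2010,7],"value":3},{"key":[2010,8],"value":1}]
console.log(JSON.stringify(groupLevel(rows, 3))); // daily
// [{"key":[2010,7,22],"value":2},{"key":[2010,7,30],"value":1},{"key":[2010,8,1],"value":1}]
```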

Page 25

Query Pattern: Leaderboard

Page 26

Aggregate value stored in a document

• Let's find the top-rated beers!

{
  "brewery": "New Belgium Brewing",
  "name": "1554 Enlightened Black Ale",
  "abv": 5.5,
  "description": "Born of a flood...",
  "category": "Belgian and French Ale",
  "style": "Other Belgian-Style Ales",
  "updated": "2010-07-22 20:00:20",
  "ratings": { "jchris": 5, "scalabl3": 4, "damienkatz": 1 },
  "comments": [ "f1e62", "6ad8c" ]
}

ratings

Page 27

Sort each beer by its average rating

• Let's find the top-rated beers!

average
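The leaderboard works by making the computed average the index key, so the view itself is sorted by rating. A minimal sketch, assuming the `ratings` object shape from the document above; the `emit` stub, `topRated` helper, and the extra sample beers are hypothetical. Sorting descending and slicing stands in for querying with `descending=true` and `limit`.

```javascript
// Hypothetical stand-in for the view engine's emit().
let emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map function: key = average rating, value = beer name.
function mapByAverageRating(doc, meta) {
  if (!doc.ratings || typeof doc.ratings !== "object") return; // unrated beer
  const scores = Object.values(doc.ratings).filter((r) => typeof r === "number");
  if (scores.length === 0) return;
  const average = scores.reduce((a, b) => a + b, 0) / scores.length;
  emit(average, doc.name);
}

// Roughly a query with descending=true&limit=N.
function topRated(docs, limit) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    emitted = [];
    mapByAverageRating(doc, { id });
    for (const row of emitted) rows.push({ id, key: row.key, value: row.value });
  }
  return rows.sort((a, b) => b.key - a.key).slice(0, limit);
}

const docs = {
  enlightened: { name: "1554 Enlightened Black Ale", ratings: { jchris: 5, scalabl3: 4, damienkatz: 1 } },
  beer_b: { name: "Hypothetical Stout", ratings: { jchris: 4, scalabl3: 5 } },
  beer_c: { name: "Unrated Lager" },
};
console.log(topRated(docs, 2).map((r) => r.value));
// [ 'Hypothetical Stout', '1554 Enlightened Black Ale' ]
```

The average is recomputed by the map whenever the document's ratings change, so the leaderboard stays current without a separate aggregation step.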

Page 28

What Not to Write

Page 29

Most common mistakes

• Reduces that don't reduce
• Trying to do too many things with one view
• Emitting too much data into a view value
• Expecting view query performance to be as fast as get/set
• Recursive queries require application code

Page 30

Geographic index

Page 31

Experimental Status

• Not yet using Superstar trees
• (only fast on large clusters)
• Optimized for bulk loading

Page 32

Full Text index

Page 33

Elastic Search Adapter

ElasticSearch

• Elastic Search is good for ad-hoc queries and faceted browsing
• Our adapter is aware of changing Couchbase topology
• Documents are indexed by Elastic Search after they are stored to disk in Couchbase

Page 34

Questions?

Page 35

Views Under The Hood

J Chris Anderson, Architect

THIS TALK IS NOT WRITTEN YET. Maybe combine with Dustin's internals talk about vbucket handoff.

Page 36

What we’ll talk about

• Key areas/topics discussed

Page 37

Dynamic Time Range Queries

Page 38

The B-tree Index

• Helps to know what's efficient
• Superstar

http://damienkatz.net/2012/05/stabilizing_couchbase_server_2.html

Page 39

• Incremental reduce values are stored in the tree

Logical View B-tree

[Diagram: logical view B-tree; inner nodes labeled REDUCES]

Page 40

• Incremental reduce values are stored in the tree

Logical View B-tree

[Diagram: leaf reduce values 7, 5, 5, 3, 2, 3 roll up to a root reduce value of 25]

Page 41

• Incremental reduce values are stored in the tree

Reduce!

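[Diagram: leaf reduce values 7, 5, 5, 3, 2, 3 roll up to a root reduce value of 25]

_count is equivalent to:

function(keys, values) {
  // First pass: keys are present, so count the rows.
  // Rereduce: keys is null, so add up the partial counts.
  return keys ? values.length : sum(values);
}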
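The same function serves both passes: on leaves it counts rows, and on inner nodes it sums the cached partial counts. A minimal sketch of those two passes over leaves holding 7, 5, 5, 3, 2, and 3 rows, as in the diagram. `sum()` is provided by the view engine, so it is defined here only to make the sketch run; the row keys and ids are hypothetical.

```javascript
// sum() is a view-engine built-in; defined here so the sketch is runnable.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// The reduce from the slide: count on the first pass, sum on rereduce.
function countReduce(keys, values) {
  return keys ? values.length : sum(values);
}

// Hypothetical leaves with 7, 5, 5, 3, 2, and 3 rows.
const leafSizes = [7, 5, 5, 3, 2, 3];
const leaves = leafSizes.map((n, leaf) =>
  Array.from({ length: n }, (_, i) => ({ id: `doc${leaf}-${i}`, key: `k${leaf}-${i}`, value: null }))
);

// First pass: each leaf is reduced over its [key, id] pairs and values.
const leafReductions = leaves.map((rows) =>
  countReduce(rows.map((r) => [r.key, r.id]), rows.map((r) => r.value))
);
// Rereduce: combine cached partial results without touching any row.
const rootReduction = countReduce(null, leafReductions);

console.log(leafReductions, rootReduction); // [ 7, 5, 5, 3, 2, 3 ] 25
```

Because the rereduce only sees small numbers, updating one leaf means recomputing one leaf reduction and the handful of inner nodes above it, which is what makes the stored reductions incremental.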

Page 42

• You can query that tree dynamically
• Lots of the patterns are about pulling value from this data structure

Dynamic Queries

[Diagram: leaf reduce values 7, 5, 5, 3, 2, 3 roll up to a root reduce value of 25]

?startkey="abba"&endkey="robot" returns { "value": 19 }

Reduce: _count, i.e. function(keys, values) { return keys ? values.length : sum(values);}

Page 43

• Queries use cached values from B-tree inner nodes when possible
• Take advantage of in-order tree traversal with group_level queries

Dynamic Queries

[Diagram: the range query adds up values from the tree, 7 + 5 + 5 + 2 = 19, instead of visiting every row; root reduce value 25]

?startkey="abba"&endkey="robot" returns { "value": 19 }

Reduce: _count
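A range reduce only needs to open the nodes at the edges of the range; nodes entirely inside it contribute their cached value. A minimal sketch with a single level of leaves (a real B-tree has several levels of inner nodes), using hypothetical keys `k00` to `k24` split into leaves of 7, 5, 5, 3, 2, and 3 rows. The range below covers the first three leaves whole and two rows of the fourth, giving 7 + 5 + 5 + 2 = 19.

```javascript
// Hypothetical leaves with sorted keys and a cached _count reduction each.
function makeLeaves(sizes) {
  let next = 0;
  return sizes.map((n) => {
    const keys = Array.from({ length: n }, () => `k${String(next++).padStart(2, "0")}`);
    return { keys, count: n, first: keys[0], last: keys[n - 1] };
  });
}

// Count rows with startkey <= key <= endkey. Leaves fully inside the range
// use their cached count; only leaves straddling a boundary are scanned.
function rangeCount(leaves, startkey, endkey) {
  let total = 0;
  let scanned = 0;
  for (const leaf of leaves) {
    if (leaf.last < startkey || leaf.first > endkey) continue; // no overlap
    if (leaf.first >= startkey && leaf.last <= endkey) {
      total += leaf.count; // cached value, no row access
    } else {
      scanned += leaf.keys.length;
      total += leaf.keys.filter((k) => k >= startkey && k <= endkey).length;
    }
  }
  return { value: total, rowsScanned: scanned };
}

const leaves = makeLeaves([7, 5, 5, 3, 2, 3]);
console.log(rangeCount(leaves, "k00", "k18")); // { value: 19, rowsScanned: 3 }
```

Only 3 of the 25 rows are read to answer the query; the cost grows with the number of boundary nodes, not with the size of the range.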

Page 44

• Incremental reduce values are stored in the tree

Respect Reduce! (anti-pattern)

function(keys, values) { return values;}

DO NOT DO THIS! IT DOESN'T reduce.

[Diagram: leaf values ["ace", "argh!", "asphalt"], ["front", "garage", "hibernate"], ["pluto", "nectar", "mirage"]; a parent node's value grows to ["ace", "argh!", "asphalt", "front", "garage", "hibernate"]]
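The problem with the identity "reduce" is that whatever it returns is stored in the inner nodes, so its output grows with every row instead of staying small. A minimal sketch comparing the size of the stored parent-node reduction for the identity function and for the count reduce from the earlier slides, using the words from the diagram as leaf values; `parentReductionSize` is a hypothetical helper and `sum()` is defined only because it is a view-engine built-in.

```javascript
// sum() is a view-engine built-in; defined here so the sketch runs.
function sum(values) {
  return values.reduce((a, b) => a + b, 0);
}

// Anti-pattern from the slide: hands its input straight back.
function identityReduce(keys, values) {
  return values;
}

// A real reduction: the output is always a single number.
function countReduce(keys, values) {
  return keys ? values.length : sum(values);
}

// Reduce each leaf, rereduce the results at the parent, and return the
// JSON size of what the parent node would have to store.
function parentReductionSize(reduceFn, leaves) {
  const partials = leaves.map((values) =>
    reduceFn(values.map((v, i) => [v, `doc${i}`]), values)
  );
  return JSON.stringify(reduceFn(null, partials)).length;
}

const leaves = [
  ["ace", "argh!", "asphalt"],
  ["front", "garage", "hibernate"],
  ["pluto", "nectar", "mirage"],
];
console.log(parentReductionSize(identityReduce, leaves)); // grows with every row
console.log(parentReductionSize(countReduce, leaves)); // stays a single number
```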

Page 45

Just use the Map

• If you think you need "the identity reduce", just use the map.

["ace", "argh!", "asphalt", "front", "garage", "hibernate"]

USE THE MAP

Page 46

Lookup via key-range

• Find tables during yesterday's lunch shift
• Find which shifts a manager owns

[Diagram: leaf reduce values 7, 5, 5, 3, 2, 3 roll up to a root reduce value of 25]

?startkey="abba"&endkey="robot" returns { "value": 19 }
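Lookups like "tables during yesterday's lunch shift" fit a compound array key such as `[date, shift, table]`, queried with a key range that fixes the leading elements. A minimal sketch, assuming the common view-collation idiom where objects sort after every other type, so `{}` works as a high end-of-range sentinel. The `collate` function is a simplified approximation (real collation compares strings with Unicode rules), and the shift rows and date are hypothetical.

```javascript
// Simplified collation order: null < booleans < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (typeof v === "boolean") return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (Array.isArray(v)) return 4;
  return 5; // plain object
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 4) {
    // Arrays compare element by element; a shorter prefix sorts first.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  if (ra === 0 || ra === 5) return 0; // objects only appear here as the {} sentinel
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical rows keyed by [date, shift, table], kept in collation order.
const rows = [
  { key: ["2012-10-14", "dinner", 4] },
  { key: ["2012-10-15", "dinner", 2] },
  { key: ["2012-10-15", "lunch", 7] },
  { key: ["2012-10-15", "lunch", 3] },
  { key: ["2012-10-16", "lunch", 1] },
].sort((x, y) => collate(x.key, y.key));

// Inclusive startkey/endkey range over array keys.
function range(rows, startkey, endkey) {
  return rows.filter((r) => collate(r.key, startkey) >= 0 && collate(r.key, endkey) <= 0);
}

// Every table from the 2012-10-15 lunch shift.
const lunch = range(rows, ["2012-10-15", "lunch"], ["2012-10-15", "lunch", {}]);
console.log(lunch.map((r) => r.key[2])); // [ 3, 7 ]
```

The same index also answers coarser questions, such as every shift on one date, by dropping trailing elements from the range.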

Page 47

Schema evolution

Page 48

Application and Views

• Interactive schema, fully controlled by the application
• If your code can handle it, the database can
• Learn to write views defensively
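Writing views defensively means the map checks every field it touches, because documents written by older or newer application versions may be missing fields or store them differently. A minimal sketch, assuming (hypothetically) that an older app version stored `abv` as a string; the `emit` stub, `runMap` helper, and documents are also hypothetical.

```javascript
// Hypothetical stand-in for the view engine's emit().
let emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Defensive map: skip anything that is not a usable beer instead of throwing.
function mapBeerByAbv(doc, meta) {
  if (!doc || doc.type !== "beer") return;
  // Tolerate the (hypothetical) old format where abv was a string.
  const abv = typeof doc.abv === "string" ? parseFloat(doc.abv) : doc.abv;
  if (typeof abv !== "number" || !Number.isFinite(abv)) return;
  emit(abv, doc.name || meta.id);
}

function runMap(mapFn, docs) {
  const rows = [];
  for (const [id, doc] of Object.entries(docs)) {
    emitted = [];
    mapFn(doc, { id });
    rows.push(...emitted.map((r) => ({ id, key: r.key, value: r.value })));
  }
  return rows;
}

const docs = {
  a: { type: "beer", abv: 5.5, name: "Ale" },
  b: { type: "beer", abv: "6.0" }, // old format, no name
  c: { type: "beer" }, // missing abv
  d: { type: "brewery", name: "Some Brewery" },
};
console.log(runMap(mapBeerByAbv, docs));
```

Documents that cannot be indexed simply produce no rows, so one malformed document does not affect the rest of the index.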

Page 49

Incremental schema evolution

• Use a view to decide which documents need work
• Make your workers idempotent
• Once all your data is cleaned up and old clients are no longer writing the old format, the cleanup view is obsolete, and so is any app code for dealing with the old case
• You've evolved!