Couchbase 103 - Views and Map-Reduce

Monday, October 14, 13

Upload: couchbase

Post on 20-Aug-2015


Couchbase 103: Views

Jasdeep Jaitla
Technical Evangelist
twitter: @scalabl3
email: [email protected]


WHAT IS A VIEW?

Views are Indexes

• Indexes are structures that speed up access to information
• Examples:
  - Dewey Decimal System
  - Card Catalogs
  - Hierarchical File Folders
• In databases, Indexes are specialized structures for searching data, typically on one or two key fields

Indexing Subsystem

• In most databases, storing data and indexing data are handled by separate subsystems
• In explicit-schema scenarios (RDBMS), Indexes are optimized based on the column data type(s)
• In flexible-schema scenarios, Map-Reduce is used to create indexes

What is Map-Reduce?

• Map-Reduce is a technique designed for processing Big Data in parallel across distributed systems
• Map-Reduce is also well suited to unstructured or semi-structured data
• Map functions identify data within collections, process it, and output transformed values
• Reduce functions take the output of Map functions and perform aggregate calculations (usually numeric) on it
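A minimal in-memory sketch of the two phases, written in JavaScript (the language Couchbase map functions use). The `mapReduce` helper and the order records are hypothetical. The sketch illustrates the technique only and does not show how Couchbase executes Map-Reduce internally.

```javascript
// Hypothetical in-memory Map-Reduce: map each record to [key, value] pairs,
// group the pairs by key, then reduce each group's values.
function mapReduce(records, mapFn, reduceFn) {
  const groups = new Map();
  for (const record of records) {
    for (const [key, value] of mapFn(record)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceFn(values);
  }
  return result;
}

// Hypothetical records: total order amount per customer.
const orders = [
  { customer: "alice", amount: 30 },
  { customer: "bob", amount: 5 },
  { customer: "alice", amount: 12 },
];

const totals = mapReduce(
  orders,
  (o) => [[o.customer, o.amount]],                       // map: emit customer -> amount
  (amounts) => amounts.reduce((sum, a) => sum + a, 0)    // reduce: numeric sum
);

console.log(totals); // { alice: 42, bob: 5 }
```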


Views: Map-Reduce Indexes

• In Couchbase, Map-Reduce is specifically used to create Indexes.

• Map functions are applied to JSON documents and they output or "emit" data that is organized in an Index

[Diagram: CRUD operations pass documents to MAP(), which calls emit() to add (processed) rows to the Index]

Sample View

• Creates an Index of Beer Names (doc.name) and their Alcohol By Volume values (doc.abv)
  - Filters Documents
    • Only JSON documents where doc.type == "beer"
    • and doc.brewery_id is present
    • and doc.name is present
  - Outputs
    • Beer Name (doc.name) [searchable index key]
    • Beer Alcohol By Volume (doc.abv) [row value]

function (doc, meta) {
  // if the JSON doc is a beer with a brewery_id and a name, emit its name and abv
  if (doc.type == "beer" && doc.brewery_id && doc.name) {
    emit(doc.name, doc.abv);
  }
}
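The view engine supplies `emit()` to map functions at index time. To try the Sample View outside the server, you can stub `emit()` with a collector and feed it a few documents. The stub is a local harness, not a Couchbase API. Only the first document's values come from the beer sample data; the other two documents are hypothetical.

```javascript
// Local harness (not part of Couchbase): record every emit() call as a row.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// The Sample View map function from the slide.
const beerMap = function (doc, meta) {
  if (doc.type == "beer" && doc.brewery_id && doc.name) {
    emit(doc.name, doc.abv);
  }
};

// Only the first document passes all three filters.
const docs = [
  { type: "beer", brewery_id: "110f1f2012", name: "Aventinus Weizenstarkbier / Doppel Weizen Bock", abv: 8.2 },
  { type: "brewery", name: "Hypothetical Brewery" },       // wrong type
  { type: "beer", name: "Hypothetical Orphan Beer", abv: 5 }, // no brewery_id
];

docs.forEach((doc, i) => beerMap(doc, { id: "doc::" + i }));
console.log(rows);
```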


ARCHITECTURE

Storage to Index

[Diagram: the Application Server sends storage ops to Couchbase Server (EP Engine RAM Cache, Disk Write Queue, Replication Queue, View Engine with its Indexers), alongside a Replica Couchbase Cluster Machine]

Views: Eventual Consistency

[Diagram, built over several slides: the same components, showing storage ops at Time 1, a get, and then Time 2]

Why Use Map-Reduce Indexes?

• Index (find) documents by different JSON values
• Query documents by JSON values
• Create statistics and aggregates

When are Indexes Necessary?

• Documents are keyed by random values (UUID, GUID, etc.)
• Iterating through lists of documents with random keys
• Iterating through lists of documents on different JSON properties (e.g. all User docs, all Product docs, by timestamp, etc.)

ANATOMY OF A VIEW

Buckets >> Design Documents >> Views

[Diagram: a Couchbase Bucket containing Design Document 1 and Design Document 2, each holding several Views]

• Indexers are allocated per Design Document
• All Views in a Design Document are updated at the same time
• Views can only access data in their Bucket's namespace
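A design document is stored as JSON. Its `views` object maps each view name to the view's `map` source, held as a string, plus an optional `reduce` (for example the built-in `"_stats"`). The sketch below writes one out as a JavaScript object. The design document contents and view names (`by_name`, `abv_by_brewery`) are hypothetical, and uploading it to a server is not shown.

```javascript
// Hypothetical design document with two views. Map/reduce functions are
// stored as source strings; "_stats" names a built-in reduce.
const beersDesignDoc = {
  views: {
    by_name: {
      map: 'function (doc, meta) { if (doc.type == "beer" && doc.name) { emit(doc.name, doc.abv); } }',
    },
    abv_by_brewery: {
      map: 'function (doc, meta) { if (doc.type == "beer" && doc.brewery_id) { emit(doc.brewery_id, doc.abv); } }',
      reduce: "_stats",
    },
  },
};

// Both views live in one design document, so they share its indexer
// and are updated together.
console.log(Object.keys(beersDesignDoc.views)); // [ 'by_name', 'abv_by_brewery' ]

// Local sanity check only: each map string should parse as a JS function.
for (const [name, view] of Object.entries(beersDesignDoc.views)) {
  const fn = new Function("return (" + view.map + ")")();
  console.log(name, typeof fn); // prints the view name and "function"
}
```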

Map() Function => Index

function(doc, meta) {
  emit(doc.username, doc.email);
}

• doc: the JSON document
• meta: the document metadata
• emit(): creates a row in the Index
  - doc.username: the indexed key
  - doc.email: the output value(s)

Every Document passes through View Map() functions.

Single Element Keys (Text Key)

function(doc, meta) {
  emit(doc.email, doc.points);
}

doc.email is a text key. Resulting rows:

meta.id   doc.email          doc.points
u::1      [email protected]   1000
u::35     [email protected]   1200
u::20     [email protected]   900
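Here is a local sketch of the index this view would build. Each `emit()` call becomes a row that carries the source document's `meta.id`, and the rows are kept in key order. The emails are hypothetical stand-ins for the redacted ones above. The plain string comparison only approximates the server's Unicode collation.

```javascript
// Local harness: the view engine provides emit(); here it records rows
// tagged with the id of the document currently being mapped.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// The text-key map function from the slide.
function byEmail(doc, meta) {
  emit(doc.email, doc.points);
}

// Hypothetical users; meta.id and points follow the slide's table.
const users = [
  { meta: { id: "u::1" }, doc: { email: "carol@example.com", points: 1000 } },
  { meta: { id: "u::35" }, doc: { email: "alice@example.com", points: 1200 } },
  { meta: { id: "u::20" }, doc: { email: "bob@example.com", points: 900 } },
];

for (const { meta, doc } of users) {
  currentId = meta.id;
  byEmail(doc, meta);
}

// Order rows by key, as an index does (simple string order here).
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
console.log(rows.map((r) => r.id)); // [ 'u::35', 'u::20', 'u::1' ]
```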

Compound Keys (Array)

function(doc, meta) {
  emit(dateToArray(doc.timestamp), 1);
}

dateToArray(doc.timestamp) produces an array key. Array-based index keys are collated element by element, so rows can be grouped by their leading array elements.

meta.id   dateToArray(doc.timestamp)   value
u::20     [2012,10,9,18,45]            1
u::1      [2012,9,26,11,15]            1
u::35     [2012,8,13,2,12]             1
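The sketch below shows how array keys sort and group. `dateToArraySketch` is a hypothetical stand-in for the server's `dateToArray()`, and it returns five elements to match the table. `groupAtLevel` is a hypothetical local emulation of querying with `group_level`. The timestamps are invented to reproduce the table's keys, and `u::7` is an extra hypothetical document.

```javascript
// Hypothetical stand-in for dateToArray(): [year, month (1-12), day, hour, minute] in UTC.
function dateToArraySketch(isoString) {
  const d = new Date(isoString);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          d.getUTCHours(), d.getUTCMinutes()];
}

// Compare array keys element by element (numbers numerically);
// when one key is a prefix of the other, the shorter sorts first.
function compareArrayKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return a.length - b.length;
}

// Emulate group_level: group on the first `level` key elements, summing values.
function groupAtLevel(rows, level) {
  const groups = new Map();
  for (const { key, value } of rows) {
    const prefix = JSON.stringify(key.slice(0, level));
    groups.set(prefix, (groups.get(prefix) || 0) + value);
  }
  return [...groups].map(([prefix, total]) => ({ key: JSON.parse(prefix), value: total }));
}

const timestamps = {
  "u::20": "2012-10-09T18:45:00Z",
  "u::1": "2012-09-26T11:15:00Z",
  "u::35": "2012-08-13T02:12:00Z",
  "u::7": "2012-10-21T09:30:00Z", // hypothetical extra document
};

// Map step: emit(dateToArray(doc.timestamp), 1) per document, then sort by key.
const rows = Object.entries(timestamps)
  .map(([id, ts]) => ({ id, key: dateToArraySketch(ts), value: 1 }))
  .sort((a, b) => compareArrayKeys(a.key, b.key));

console.log(rows.map((r) => r.id)); // [ 'u::35', 'u::1', 'u::20', 'u::7' ]
// group_level=2: one row per [year, month] with its count ([2012,10] -> 2)
console.log(groupAtLevel(rows, 2));
```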

QUERYING VIEWS

View Query Parameters

• key = "" - used for an exact match of an index key
• keys = [] - used for matching a set of index keys
• startkey/endkey = "" - used for range queries on index keys
• startkey_docid/endkey_docid = "" - used with startkey/endkey to narrow a range by meta.id
• stale = [false, update_after, ok] - used by the client to decide indexer behavior
• group/group_level - used with reduces to aggregate with grouping
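The sketch below builds a view query string from these parameters. It JSON-encodes key-type values, so strings keep their quotes and arrays stay arrays, and it passes `stale`, `group`, and similar parameters through as plain values. The base URL assumes the Couchbase 2.x view REST layout (port 8092, `/<bucket>/_design/<ddoc>/_view/<view>`); check that layout against your server version. The host, bucket, design document, and view names are hypothetical.

```javascript
// Parameters whose values must be JSON-encoded in the query string.
const JSON_PARAMS = new Set(["key", "keys", "startkey", "endkey"]);

// Build "<base>?name=value&..." with proper URL encoding.
function viewQueryUrl(base, params) {
  const search = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    search.set(name, JSON_PARAMS.has(name) ? JSON.stringify(value) : String(value));
  }
  return base + "?" + search.toString();
}

// Hypothetical host, bucket, design document and view.
const base = "http://localhost:8092/default/_design/users/_view/by_email";

const rangeUrl = viewQueryUrl(base, { startkey: "b", endkey: "c", stale: "update_after" });
const keysUrl = viewQueryUrl(base, { keys: ["a@example.com", "b@example.com"], stale: false });

console.log(rangeUrl);
console.log(new URL(keysUrl).searchParams.get("keys")); // ["a@example.com","b@example.com"]
```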

Most Common Queries Are Ranges

doc.email          meta.id
[email protected]   u::1
[email protected]   u::7
[email protected]   u::2
[email protected]   u::5
[email protected]   u::6
[email protected]   u::4
[email protected]   u::3

?startkey="b1"&endkey="zZ"
?startkey="bz"&endkey="zn"

Pulls the index keys that fall between the startkey and endkey, in collation order.
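A local sketch of the two range queries, run against hypothetical stand-ins for the redacted emails (listed in the same meta.id order as the table). The sketch uses `Intl.Collator` as an approximation of the server's Unicode collation, so it is not guaranteed to match the server exactly. Both ends of the range are treated as inclusive.

```javascript
// Hypothetical index rows, already in key order.
const index = [
  { key: "alice@example.com", id: "u::1" },
  { key: "bella@example.com", id: "u::7" },
  { key: "carl@example.com", id: "u::2" },
  { key: "dana@example.com", id: "u::5" },
  { key: "mike@example.com", id: "u::6" },
  { key: "yuri@example.com", id: "u::4" },
  { key: "zoe@example.com", id: "u::3" },
];

// Approximation of the view engine's Unicode collation.
const collator = new Intl.Collator("en");

// Rows with startkey <= key <= endkey.
function rangeQuery(rows, startkey, endkey) {
  return rows.filter(
    (r) => collator.compare(r.key, startkey) >= 0 && collator.compare(r.key, endkey) <= 0
  );
}

console.log(rangeQuery(index, "b1", "zZ").map((r) => r.id));
console.log(rangeQuery(index, "bz", "zn").map((r) => r.id));
```

Under byte order, "zoe..." would fall *after* "zZ" (because "o" > "Z"), so the first query would lose u::3. The collation order keeps it.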

Index-Key Matching

?key="[email protected]"

Match a single index key

Index-Key Set Matches

?keys=["[email protected]","[email protected]"]

Query multiple keys in the set (array notation)
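A local sketch of `key` and `keys` matching over hypothetical index rows. The server compares keys using its collation rules, while this sketch uses strict equality. Returning the `keys` results in request order is a choice made here for clarity and is not a claim about server ordering.

```javascript
// Hypothetical index rows (stand-ins for the redacted emails).
const index = [
  { key: "alice@example.com", id: "u::1" },
  { key: "bella@example.com", id: "u::7" },
  { key: "carl@example.com", id: "u::2" },
];

// ?key=... : rows whose key equals the given key.
function keyQuery(rows, key) {
  return rows.filter((r) => r.key === key);
}

// ?keys=[...] : rows for each requested key, in the order the keys were given.
function keysQuery(rows, keys) {
  return keys.flatMap((k) => keyQuery(rows, k));
}

console.log(keyQuery(index, "bella@example.com").map((r) => r.id)); // [ 'u::7' ]
console.log(keysQuery(index, ["carl@example.com", "alice@example.com"]).map((r) => r.id)); // [ 'u::2', 'u::1' ]
```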

Understanding Collation Order

Unicode Collation:
0123456789 < aAbBcCdDeEfFgGhHiIjJkKlLmM...
a < á < A < Á < b

Byte Order:
0123456789 < A-Z < a-z

With Unicode Collation, one query gets both y and Y:
startkey="y"&endkey="z"

If it were Byte Order, 2 queries would have to be merged:
startkey="y"&endkey="z" merged with startkey="Y"&endkey="Z"

Understanding Stale

stale = update_after (default if nothing is specified)
  - always get the fastest response
  - can take two queries to read your own writes

stale = ok
  - the automatic index update will trigger eventually
  - might not see your own writes for a few minutes
  - least frequent updates -> least resource impact

stale = false
  - use with persistence observe if data needs to be included in view results
  - BUT be aware of the delay it adds; only use it when really required

Built-In Reduces

• Are faster than writing your own reduces for the same information
  - _count
    • gives the number of items in the Index
  - _sum
    • sums the emitted values (numeric values only)
  - _stats
    • gives sum, count, min, max and sum of squares for statistics
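The server computes the built-in reduces itself. The local stand-ins below only show what each one returns for a set of emitted values. The `_stats` result carries `sum`, `count`, `min`, `max`, and `sumsqr` fields. The function names and abv values are hypothetical.

```javascript
// Local stand-ins for _count, _sum and _stats over emitted values.
function countReduce(values) {
  return values.length;
}
function sumReduce(values) {
  return values.reduce((acc, v) => acc + v, 0);
}
function statsReduce(values) {
  return {
    sum: sumReduce(values),
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    sumsqr: values.reduce((acc, v) => acc + v * v, 0), // sum of squares
  };
}

const abvValues = [4.5, 5, 8.5]; // hypothetical emitted doc.abv values
console.log(countReduce(abvValues)); // 3
console.log(sumReduce(abvValues)); // 18
console.log(statsReduce(abvValues)); // sum 18, count 3, min 4.5, max 8.5, sumsqr 117.5
```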

Custom Reduces

• Are a bit tricky at first; it's a skill!
• Learn about them through our docs and practice first; the most common problem in custom reduces is that they don't actually "reduce" the data
• Can be used creatively!
• Always develop them in a separate Design Document to sandbox them from your existing Views; if you have a logic problem or error, it won't interrupt existing Views
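Couchbase calls a custom reduce as `function(key, values, rereduce)`. When `rereduce` is true, `values` holds the outputs of earlier reduce calls rather than emitted values. That is where many custom reduces go wrong. The sketch below counts rows (something `_count` already does) only to show the pattern. The name `countRows` and the way values are split into batches are hypothetical, and this sketch does not model how the server actually batches values.

```javascript
// Custom reduce that counts rows. First pass: count emitted values.
// Rereduce pass: values are partial counts, so they must be summed.
function countRows(key, values, rereduce) {
  if (rereduce) {
    return values.reduce((acc, v) => acc + v, 0);
  }
  return values.length;
}

// Local simulation: reduce two batches, then rereduce the partial results.
// key is not used by this reduce, so null is passed.
const partA = countRows(null, [4.5, 5, 8.5], false); // 3
const partB = countRows(null, [6, 7], false); // 2
const total = countRows(null, [partA, partB], true); // 5

// Treating the partial results as fresh values counts them instead: 2, not 5.
const wrong = countRows(null, [partA, partB], false); // 2
console.log(total, wrong);
```

The output stays a single number no matter how many rows go in. That is what "reducing" the data means here.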

BEER SAMPLE VIEW

Beer Sample Database Example

doc:
{ "name": "Aventinus Weizenstarkbier / Doppel Weizen Bock", "abv": 8.2, "ibu": 0, "srm": 0, "upc": 0, "type": "beer", "brewery_id": "110f1f2012", "updated": "2010-07-22 20:00:20", "description": "Dark-ruby, almost black-colored and streaked with fine top-fermenting yeast, this beer has a compact and persistent head. This is a very intense wheat doppelbock with a complex spicy chocolate-like arome with a hint of banana and raisins. On the palate, you experience a soft touch and on the tongue it is very rich and complex, though fresh with a hint of caramel. It finishes in a rich soft and lightly bitter impression.", "style": "South German-Style Weizenbock", "category": "German Ale"}

meta:
{ "id": "110f37fa30", "rev": "1-000000000", "expiration": 0, "flags": 0, "type": "json"}

• doc.abv: alcohol by volume (abv)
• doc.brewery_id: the field used as the index key
• meta.id: the document key

Map Function - Index Definition

[Slide image: the view's map function; annotations mark emit() as creating a row (+row), its first argument as the indexed key and its second as the value(s)]
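This transcript does not include the map function itself. Judging from the result set (brewery_id as the indexed key, abv as the value, one row per beer document), a plausible version is sketched below. It is a reconstruction, not the author's exact code. The `emit()` stub is a local harness, and the document is the beer sample above with its description omitted.

```javascript
// Local harness: record rows tagged with the current document's meta.id.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key, value });
}

// Reconstructed (assumed) map function: index each beer by brewery_id,
// with the beer's abv as the row value.
function map(doc, meta) {
  if (doc.type == "beer" && doc.brewery_id) {
    emit(doc.brewery_id, doc.abv);
  }
}

// The sample document and metadata from the Beer Sample slide (abridged).
const doc = { name: "Aventinus Weizenstarkbier / Doppel Weizen Bock", abv: 8.2, type: "beer", brewery_id: "110f1f2012" };
const meta = { id: "110f37fa30", type: "json" };

currentId = meta.id;
map(doc, meta);
console.log(rows); // [ { id: '110f37fa30', key: '110f1f2012', value: 8.2 } ]
```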

Result Set - Brewery IDs by Beer

[Slide image: the view's result rows, each showing the brewery_id (index key), the document key (of the beer) and the alcohol by volume (abv) value]

Reduce Values (doc.abv) with _stats

[Slide image: the view definition with the _stats built-in reduction added]

Query with Group and Reduce

Find the average alcohol by volume per brewery: add the _stats built-in reduction, then query with group=true & reduce=true.

Groups Brewery IDs, Reduces for Stats

[Slide image: results of the query with group=true & reduce=true; annotations mark each row's count (number of beers by this brewery), max abv and min abv]

Brewery IDs are grouped, and _stats are collected (reduced).
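The `_stats` result has no average field, so the average abv per brewery has to be computed on the client as `sum / count`. The sketch below emulates `group=true & reduce=true` with a `_stats`-style reduce for each distinct key, then derives the average. The brewery IDs and abv values are hypothetical.

```javascript
// Hypothetical rows from the brewery_id view: key = brewery_id, value = abv.
const rows = [
  { key: "brewery_a", value: 8.5 },
  { key: "brewery_a", value: 4.5 },
  { key: "brewery_b", value: 5 },
  { key: "brewery_a", value: 5 },
];

// Emulate group=true & reduce=true: one _stats-style result per distinct key.
function groupStats(rows) {
  const groups = new Map();
  for (const { key, value } of rows) {
    const s = groups.get(key) || { sum: 0, count: 0, min: Infinity, max: -Infinity, sumsqr: 0 };
    s.sum += value;
    s.count += 1;
    s.min = Math.min(s.min, value);
    s.max = Math.max(s.max, value);
    s.sumsqr += value * value;
    groups.set(key, s);
  }
  return [...groups].map(([key, value]) => ({ key, value }));
}

// Average abv per brewery, computed client-side from sum and count.
for (const { key, value } of groupStats(rows)) {
  console.log(key, value.sum / value.count); // brewery_a 6, then brewery_b 5
}
```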

INTERFACE DEMO

Q & A