Webinar: Couchbase 104 - Views and Indexing
DESCRIPTION
Learn the architecture and use of Views, the structure of Map-Reduce functions, design documents, querying views and view query parameters, primary aggregate reduces and grouping, eventual consistency of indexes and strategies of use.
What will be covered during this training:
• What are Indexes
• What is a Map-Reduce
• Understanding Design Documents
• Admin Console Overview
• Anatomy of Map Functions
• Batch Processing
• Range Querying, Index-Key Querying, Set Querying
• RDBMS Queries vs. Map-Reduce Queries
• Grouping and Group Level
• Eventual Consistency and Stale Parameter
• Tips for Creating Views and Sandboxing Tests
TRANSCRIPT
Technical Evangelist
twitter: @scalabl3
email: [email protected]
Jasdeep Jaitla
Couchbase 104: Views and Indexing
WHAT IS A VIEW?
Views are Indexes
• Indexes are methodologies to speed up access to information
• Examples:
- Dewey Decimal System
- Card Catalogs
- Hierarchical File Folders
• In databases, Indexes are specialized structures for searching for data, typically one or two key fields
Indexing Subsystem
• Storing data and Indexing data are separate systems in all databases
• In explicit schema scenarios (RDBMS), Indexes are optimized based on the data type(s)
• In flexible schema scenarios Map-Reduce is used to create indexes
What is Map-Reduce?
• Map-Reduce is a technique designed for dealing with Big Data and processing in parallel in distributed systems
• Map-Reduce is also specifically designed for dealing with unstructured or semi-structured data
• Map functions identify data with collections, process them, and output transformed values
• Reduce functions take the output of Map functions and perform numeric aggregate calculations on them
Views: Map-Reduce Indexes
• In Couchbase, Map-Reduce is specifically used to create Indexes
• Map functions are applied to JSON documents and they output or "emit" data that is organized in an Index form
[Diagram labels: CRUD Operations, MAP(), emit(), (processed)]
Sample View
function (doc, meta) {
  if (doc.type == "beer" && doc.brewery_id && doc.name) {
    emit(doc.name, doc.abv);
  }
}
• Creates an Index of Beer Names (doc.name) and the Alcohol By Volume values (doc.abv)
- Filters Documents
  • Only JSON Documents with json key doc.type == "beer"
  • and doc.brewery_id is non-null
  • and doc.name is non-null
- Outputs
  • Beer Name (doc.name) [searchable]
  • Beer Alcohol By Volume (doc.abv) [row value]
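To make the filter and output behavior of the sample view concrete, here is a minimal sketch that runs the same map logic over a few made-up documents outside of Couchbase. The `buildIndex` harness, the explicit `emit` argument, and the sample documents are assumptions for illustration; inside the view engine `emit` is provided for you, and real index ordering follows the server's collation rather than this simple sort.

```javascript
// Hypothetical harness: collects emitted rows roughly the way a view index would.
function buildIndex(docs, mapFn) {
  const rows = [];
  for (const { meta, doc } of docs) {
    // emit is passed in explicitly here; in a real view it is supplied by the engine.
    const emit = (key, value) => rows.push({ id: meta.id, key, value });
    mapFn(doc, meta, emit);
  }
  // Simple string comparison; the server's collation order is different.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Same condition and emit call as the sample view.
const beerMap = (doc, meta, emit) => {
  if (doc.type == "beer" && doc.brewery_id && doc.name) {
    emit(doc.name, doc.abv);
  }
};

// Made-up documents: only two of the four pass all three checks.
const sampleDocs = [
  { meta: { id: "beer_1" }, doc: { type: "beer", brewery_id: "b1", name: "Stout", abv: 6.5 } },
  { meta: { id: "brewery_1" }, doc: { type: "brewery", name: "Some Brewery" } },
  { meta: { id: "beer_2" }, doc: { type: "beer", name: "Orphan Ale", abv: 5.0 } },
  { meta: { id: "beer_3" }, doc: { type: "beer", brewery_id: "b1", name: "Amber", abv: 4.8 } },
];

const beerRows = buildIndex(sampleDocs, beerMap);
console.log(beerRows);
```

The brewery document and the beer without a brewery_id produce no rows, which is the filtering step described above.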
ARCHITECTURE
Storage to Index
[Diagram: the Application Server sends storage ops to Couchbase Server. Components shown: EP Engine, RAM Cache, Disk Write Queue, Replication Queue (to a Replica Couchbase Cluster Machine), View Engine, Indexers]
Views: Eventual Consistency
[Diagram: Application Server and Couchbase Server (EP Engine, RAM Cache, Disk Write Queue, Replication Queue to a Replica Couchbase Cluster Machine, View Engine, Indexers), with the labels "storage ops" and "get" at Time 1, followed by Time 2]
Why Use Map-Reduce Indexes?
• Index (Find) Documents by different JSON Values
• Query Documents by JSON Values
• Create Statistics and Aggregates
When are Indexes Necessary?
• Documents are Keyed by Random Properties (UUID, GUID, etc.)
• Iterating through Lists of Documents with Random Keys
• Iterating through Lists of Documents on different JSON Properties (i.e. all User docs, all Product docs, by Timestamp, etc.)
ANATOMY OF A VIEW
Buckets >> Design Documents >> Views
[Diagram: a Couchbase Bucket contains Design Document 1 and Design Document 2, each holding one or more Views]
• Indexers Are Allocated Per Design Doc
• Views in a Design Doc Are All Updated at Same Time
• Can Only Access Data in the Bucket Namespace
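As a rough illustration of the Bucket >> Design Document >> Views hierarchy, the sketch below builds the JSON body of a design document holding two views. The view names `by_name` and `by_brewery` are hypothetical, and the map functions are stored as strings, which is how design documents carry them.

```javascript
// Hypothetical design document with two views; names are placeholders.
const designDoc = {
  views: {
    by_name: {
      map: 'function (doc, meta) { if (doc.type == "beer" && doc.name) { emit(doc.name, doc.abv); } }',
    },
    by_brewery: {
      map: 'function (doc, meta) { if (doc.type == "beer" && doc.brewery_id) { emit(doc.brewery_id, doc.abv); } }',
      // Built-in reduce, referenced by name.
      reduce: "_stats",
    },
  },
};

// Serialized form, as it would be stored or uploaded.
const designDocJson = JSON.stringify(designDoc, null, 2);
console.log(designDocJson);
```

Because both views live in one design document, they would be indexed together; a view that is still experimental is better kept in its own design document.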
Map() Function => Index
function(doc, meta) {
  emit(doc.username, doc.email)
}
• doc: json doc
• meta: doc metadata
• emit(): create row
• doc.username: indexed key
• doc.email: output value(s)
Every Document passes through View Map() functions
Single Element Keys (Text Key)
function(doc, meta) {
  emit(doc.email, doc.points)
}
• doc.email: text key
meta.id   doc.email          doc.points
u::1      [email protected]   1000
u::35     [email protected]   1200
u::20     [email protected]   900
Compound Keys (Array)
function(doc, meta) {
  emit(dateToArray(doc.timestamp), 1)
}
• dateToArray(doc.timestamp): array key
Array Based Index Keys get sorted as Strings, but can be grouped by array elements
meta.id   dateToArray(doc.timestamp)   value
u::20     [2012,10,9,18,45]            1
u::1      [2012,9,26,11,15]            1
u::35     [2012,8,13,2,12]             1
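Grouping by array elements can be sketched as follows. `dateToArraySketch` is a stand-in written for this example (an assumption: UTC fields, five elements to match the table above), not the server's built-in dateToArray(), and `groupByLevel` is a hypothetical helper that imitates summing the emitted 1s per key prefix.

```javascript
// Stand-in for the built-in helper; assumes UTC and a 1-based month.
function dateToArraySketch(timestamp) {
  const d = new Date(timestamp);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), d.getUTCHours(), d.getUTCMinutes()];
}

// Group rows by the first `level` elements of the array key and sum the values.
function groupByLevel(rows, level) {
  const totals = new Map();
  for (const { key, value } of rows) {
    const prefix = JSON.stringify(key.slice(0, level));
    totals.set(prefix, (totals.get(prefix) || 0) + value);
  }
  return [...totals].map(([prefix, value]) => ({ key: JSON.parse(prefix), value }));
}

// Made-up timestamps; the first three match the rows in the table above.
const timestamps = [
  "2012-10-09T18:45:00Z",
  "2012-09-26T11:15:00Z",
  "2012-08-13T02:12:00Z",
  "2012-09-02T08:00:00Z",
];
const dateRows = timestamps.map((t) => ({ key: dateToArraySketch(t), value: 1 }));

const byYear = groupByLevel(dateRows, 1);  // one group: [2012]
const byMonth = groupByLevel(dateRows, 2); // groups: [2012,10], [2012,9], [2012,8]
console.log(byYear, byMonth);
```

Emitting 1 as the value is what lets a sum per group double as a count of documents per year, month, or day.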
QUERYING VIEWS
View Query Parameters
• key = "" used for exact match of index-key
• keys = [] used for matching set of index-keys
• startkey/endkey = "" used for range queries on index-keys
• startkey_docid/endkey_docid = "" used for range queries on meta.id
• stale = [false, update_after, ok] used to decide indexer behavior from client
• group/group_level used with reduces to aggregate with grouping
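As an illustration of how these parameters end up in a request, the sketch below assembles a view query URL. The host, port, bucket, design document, and view names are placeholders, and `buildViewUrl` is a hypothetical helper; the one detail it tries to show is that key-style parameters are JSON values, so string keys keep their quotes before URL encoding.

```javascript
// Parameters whose values are JSON (strings keep their quotes, arrays stay arrays).
const JSON_PARAMS = ["key", "keys", "startkey", "endkey"];

// Hypothetical helper: builds a view query URL from a parameter object.
function buildViewUrl(base, bucket, ddoc, view, params) {
  const query = Object.entries(params)
    .map(([name, value]) => {
      const raw = JSON_PARAMS.includes(name) ? JSON.stringify(value) : String(value);
      return `${name}=${encodeURIComponent(raw)}`;
    })
    .join("&");
  return `${base}/${bucket}/_design/${ddoc}/_view/${view}?${query}`;
}

// Placeholder names throughout; adjust to your own cluster and views.
const rangeUrl = buildViewUrl("http://localhost:8092", "default", "users", "by_email", {
  startkey: "b",
  endkey: "c",
  stale: "false",
});
console.log(rangeUrl);
```

In practice a client SDK usually builds this for you; the sketch only makes the encoding visible.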
Most Common Queries Are Ranges
doc.email          meta.id
[email protected]   u::1
[email protected]   u::7
[email protected]   u::2
[email protected]   u::5
[email protected]   u::6
[email protected]   u::4
[email protected]   u::3
?startkey="b1"&endkey="zZ"
?startkey="bz"&endkey="zn"
Pulls the Index-Keys between the UTF-8 Range specified by the startkey and endkey.
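A range query can be pictured as a slice of the sorted index. The sketch below uses made-up lowercase email keys (the addresses in the slides are not recoverable) and a plain string comparison, which is an assumption that only holds here because every key is lowercase ASCII.

```javascript
// Made-up index rows, already sorted by key.
const emailIndex = [
  { key: "amy@example.com", id: "u::1" },
  { key: "bob@example.com", id: "u::7" },
  { key: "carol@example.com", id: "u::2" },
  { key: "dave@example.com", id: "u::5" },
  { key: "zed@example.com", id: "u::3" },
];

// Keep every row whose key falls between startkey and endkey (both ends inclusive).
function rangeQuery(sortedRows, startkey, endkey) {
  return sortedRows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const bToD = rangeQuery(emailIndex, "b", "d");
console.log(bToD.map((row) => row.id));
```

Note that "dave@example.com" sorts after the endkey "d", so a prefix-style range needs an endkey that sorts after every key you want included.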
Index-Key Matching
doc.email          meta.id
[email protected]   u::1
[email protected]   u::7
[email protected]   u::2
[email protected]   u::5
[email protected]   u::6
[email protected]   u::4
[email protected]   u::3
?key="[email protected]"
Match a Single Index-Key
Index-Key Set Matches
doc.email          meta.id
[email protected]   u::1
[email protected]   u::7
[email protected]   u::2
[email protected]   u::5
[email protected]   u::6
[email protected]   u::4
[email protected]   u::3
?keys=["[email protected]", "[email protected]"]
Query Multiple in the Set (Array Notation)
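Exact and set matching can be sketched the same way as lookups against index rows. The rows and the helper names `matchKey` and `matchKeys` are made up for this example.

```javascript
// Made-up index rows keyed by email.
const userIndex = [
  { key: "amy@example.com", id: "u::1" },
  { key: "bob@example.com", id: "u::7" },
  { key: "carol@example.com", id: "u::2" },
];

// key="...": rows whose index-key equals the given key.
function matchKey(rows, key) {
  return rows.filter((row) => row.key === key);
}

// keys=[...]: rows for each requested key, in the order the keys were given.
function matchKeys(rows, keys) {
  return keys.flatMap((key) => matchKey(rows, key));
}

const single = matchKey(userIndex, "bob@example.com");
const several = matchKeys(userIndex, ["carol@example.com", "amy@example.com", "nobody@example.com"]);
console.log(single, several);
```

A key with no matching row simply contributes nothing to the result.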
Understanding Collation Order
Unicode Collation:
1234567890 < aAbBcCdDeEfFgGhHiIjJkKlLmM...
a < á < A < Á < b
Byte Order:
1234567890 < A-Z < a-z
If it were Byte Order, 2 Queries Merged:
startkey="y"&endkey="z" merged with startkey="Y"&endkey="Z"
With Unicode Collation, one query gets both y and Y:
startkey="y"&endkey="z"
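The difference between the two orderings can be tried out locally. This sketch uses JavaScript's `Intl.Collator` as a stand-in for the server's Unicode collation (an assumption: the two are similar in spirit, not guaranteed identical) and a plain string comparison as the byte-order case, with made-up keys.

```javascript
// Made-up index keys with mixed case.
const names = ["yeti", "Yankee", "zebra", "Zulu", "yak"];

// Plain comparison: uppercase letters sort before all lowercase letters.
const byteOrder = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Locale-aware comparison: y and Y sort next to each other.
const collator = new Intl.Collator("en");

// A key matches when start <= key <= end under the given comparison.
const inRange = (compare, key, start, end) => compare(start, key) <= 0 && compare(key, end) <= 0;

const byteMatches = names.filter((n) => inRange(byteOrder, n, "y", "z"));
const collatedMatches = names.filter((n) => inRange(collator.compare, n, "y", "z"));

console.log(byteMatches);     // lowercase y-names only
console.log(collatedMatches); // y-names in either case
```

Under the plain comparison a second query for "Y" to "Z" would be needed to pick up "Yankee", which is the merge the slide describes.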
Understanding Stale
• stale = UPDATE_AFTER (default if nothing is specified)
- always get fastest response
- can take two queries to read your own writes
• stale = OK
- auto update will trigger eventually
- might not see your own writes for a few minutes
- least frequent updates -> least resource impact
• stale = FALSE
- use with persistence observe if data needs to be included in view results
- BUT be aware of the delay it adds; only use when really required
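The three stale settings can be modeled with a toy in-memory view. `TinyView` is entirely hypothetical and synchronous; in a real cluster the update triggered by update_after happens in the background after the response, and writes must reach disk before the indexer sees them.

```javascript
// Toy model: documents are stored immediately, the index only changes when updated.
class TinyView {
  constructor() {
    this.docs = new Map(); // stored documents
    this.index = [];       // rows visible to queries
  }
  // Storage op: does not touch the index.
  set(id, doc) {
    this.docs.set(id, doc);
  }
  // Rebuild the index from the stored documents.
  updateIndex() {
    this.index = [...this.docs].map(([id, doc]) => ({ id, key: doc.email }));
  }
  query(stale = "update_after") {
    if (stale === "false") this.updateIndex();        // index first, then answer
    const rows = this.index.slice();                  // answer from the current index
    if (stale === "update_after") this.updateIndex(); // answer first, then index
    return rows;
  }
}

const view = new TinyView();
view.set("u::1", { email: "amy@example.com" });
const first = view.query("update_after"); // index not built yet: own write is missing
const second = view.query("ok");          // previous query triggered the update
view.set("u::2", { email: "bob@example.com" });
const third = view.query("ok");           // no update triggered: new write is missing
const fourth = view.query("false");       // index updated before answering
console.log(first.length, second.length, third.length, fourth.length);
```

This is why update_after can take two queries to read your own writes, while false pays the indexing cost up front.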
Built-In Reduces
• Are faster than creating your own reduces for the same information
- _count
  • gives count for number of items in Index
- _sum
  • sums value parameters (for numeric values only)
- _stats
  • gives sum, count, min, max and sum of squares for statistics
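What the three built-in reduces compute can be written out in a few lines. These are plain JavaScript equivalents for illustration, not the server's implementation, and the field name `sumsqr` for the sum of squares is stated here as an assumption about the output shape.

```javascript
// Equivalent of _count: number of rows.
const countOf = (values) => values.length;

// Equivalent of _sum: total of numeric values.
const sumOf = (values) => values.reduce((total, v) => total + v, 0);

// Equivalent of _stats: sum, count, min, max and sum of squares.
const statsOf = (values) => ({
  sum: sumOf(values),
  count: countOf(values),
  min: Math.min(...values),
  max: Math.max(...values),
  sumsqr: values.reduce((total, v) => total + v * v, 0),
});

const sampleValues = [1, 2, 3, 4];
const sampleStats = statsOf(sampleValues);
console.log(countOf(sampleValues), sumOf(sampleValues), sampleStats);
```

An average is not returned directly, but it follows from sum divided by count.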
Custom Reduces
• Are a bit tricky at first, it's a skill!
• Learn about it through our docs, practice first, most common problem in custom reduces is that they don't "reduce" the data
• Can be creatively used!
• Always do it in a separate Design Document to sandbox it from your existing Views, if you have a logic problem or error it won't interrupt existing Views
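A minimal sketch of a custom reduce that actually reduces: whatever the number of input values, it returns one small object. The function name `sumAndCount` and the sample values are made up; the (key, values, rereduce) signature is the usual shape of a reduce function, and the rereduce branch handles the case where the inputs are earlier outputs of the same function.

```javascript
// Custom reduce sketch: returns a fixed-size { sum, count } object.
function sumAndCount(key, values, rereduce) {
  if (rereduce) {
    // values are partial results from earlier reduce calls: combine them.
    return values.reduce(
      (acc, part) => ({ sum: acc.sum + part.sum, count: acc.count + part.count }),
      { sum: 0, count: 0 }
    );
  }
  // values are the raw values emitted by the map function.
  return { sum: values.reduce((a, b) => a + b, 0), count: values.length };
}

// Simulate two partial reductions followed by a rereduce.
const partA = sumAndCount(null, [4, 6], false);
const partB = sumAndCount(null, [5], false);
const combined = sumAndCount(null, [partA, partB], true);
console.log(combined);
```

A reduce that returns, say, the full list of its input values grows with the data instead of shrinking it, which is the common mistake the slide warns about.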
BEER SAMPLE VIEW
Beer Sample Database Example
doc:
{
  "name": "Aventinus Weizenstarkbier / Doppel Weizen Bock",
  "abv": 8.2,
  "ibu": 0,
  "srm": 0,
  "upc": 0,
  "type": "beer",
  "brewery_id": "110f1f2012",
  "updated": "2010-07-22 20:00:20",
  "description": "Dark-ruby, almost black-colored and streaked with fine top-fermenting yeast, this beer has a compact and persistent head. This is a very intense wheat doppelbock with a complex spicy chocolate-like arome with a hint of banana and raisins. On the palate, you experience a soft touch and on the tongue it is very rich and complex, though fresh with a hint of caramel. It finishes in a rich soft and lightly bitter impression.",
  "style": "South German-Style Weizenbock",
  "category": "German Ale"
}
meta:
{
  "id": "110f37fa30",
  "rev": "1-000000000",
  "expiration": 0,
  "flags": 0,
  "type": "json"
}
• doc.abv: alcohol by volume (abv)
• doc.brewery_id: brewery_id (key)
• meta.id: document key
Map Function - Index Definition
[Screenshot labels: + row, indexed key, value(s)]
Result Set - Brewery IDs by Beer
[Screenshot labels: brewery_id, document key (of the beer), alcohol by volume (abv)]
Reduce Values (doc.abv) with _stats
• add _stats built-in reduction
Query with Group and Reduce
Find average alcohol by volume per brewery.
• add _stats built-in reduction
• set group=true & reduce=true
Groups Brewery_IDs, Reduces for Stats
Brewery IDs are Grouped, and _stats collected (Reduced)
group=true & reduce=true
[Screenshot labels: number of beers by this brewery, max abv, min abv]
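The grouped result can be imitated locally to show how the average falls out of the stats. The rows below are made up, `groupStats` is a hypothetical helper that mimics group=true with a stats-style reduce, and the map function is assumed to emit brewery_id as the key and abv as the value, as the labels above suggest.

```javascript
// Made-up rows as a map function emitting (brewery_id, abv) might produce them.
const abvRows = [
  { key: "brewery_a", value: 4 },
  { key: "brewery_b", value: 8 },
  { key: "brewery_a", value: 6 },
];

// Group rows by key and collect sum, count, min and max per group.
function groupStats(rows) {
  const groups = new Map();
  for (const { key, value } of rows) {
    const s = groups.get(key) || { sum: 0, count: 0, min: Infinity, max: -Infinity };
    s.sum += value;
    s.count += 1;
    s.min = Math.min(s.min, value);
    s.max = Math.max(s.max, value);
    groups.set(key, s);
  }
  return groups;
}

// The average is not part of the stats; derive it from sum and count.
const perBrewery = [...groupStats(abvRows)].map(([breweryId, s]) => ({
  breweryId,
  beers: s.count,
  averageAbv: s.sum / s.count,
}));
console.log(perBrewery);
```

Here count is the number of beers by each brewery, and the client computes the average after the query returns.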
INTERFACE DEMO
Q & A