How-to NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling
DESCRIPTION
In Couchbase 103 for 3.0, you'll learn the fundamentals of creating data models with Couchbase 3.0, including modeling, JSON strategies, and common key patterns. We'll also explore modeling differences between NoSQL and RDBMS systems.
TRANSCRIPT
Couchbase 103
Todd Greenstein | Engineering, Couchbase
Modeling in NoSQL vs RDBMS
Key-value store with:
special support for JSON documents
counter and string data types
store binaries up to 20MB
Built-in and transparent memcached-compatible caching layer
Distributed around a cluster of servers
Generate secondary indexes using map/reduce queries
The basics of Couchbase Server
©2014 Couchbase, Inc. 3
RDBMS Modeling
• RDBMS organizes data as tables
- Tables represent data in rows; n columns of m rows
- Table rows have a specific schema; each column has a static type
- Simple Datatypes: strings, numbers, datetimes, booleans can be represented by columns in a single table
- Complex Datatypes: dictionaries/hashes, arrays/lists are difficult to represent in a single table [Impedance Mismatch]
• All rows have an identical schema; schema changes are painful and resource intensive
• Reading/Writing/Transactions require locking
Couchbase – NoSQL Modeling
• Couchbase operates like a Key-Value Document Store
- Simple Datatypes: strings, numbers, datetime, boolean, and binary data
can be stored; they are stored as Base64 encoded strings
- Complex Datatypes: dictionaries/hashes, arrays/lists, can be stored in
JSON format (simple lists can be string based with delimiter)
- JSON is a special class of string with a specific format for encoding simple
and complex data structures
• Schema is unenforced and implicit, schema changes are programmatic, done
online, and can vary from Document to Document
• Document-defined schema – “Schema-less” is misleading and inaccurate
Applying the Technology to the Problem
Relational databases are optimised for questions
Simple ecommerce example
RDBMS Complex Datatypes
public class User {
    private String name;
    private String email;
    private Integer age;
    private Boolean gender_male;
    private DateTime created_at;
    private ArrayList items_viewed;
    private Hashtable preferences;
    private ArrayList<Books> authored;

    public User(...) {
        ...
    }
    ...
}
• Simple Types are easy; make them columns
• Complex Types are more challenging; they require separate tables and joins, and are slower to store and retrieve
• ORMs reduce complexity but trade off speed/scale and are hard to optimize
Document databases are optimised for answers
That order in a heavily denormalised document database
Answer oriented databases
order::1001
{
  "uid": "ji22jd",
  "customer": "Ann",
  "line_items": [
    { "sku": "0321293533", "quan": 3, "unit_price": 48.0 },
    { "sku": "0321601912", "quan": 1, "unit_price": 39.0 },
    { "sku": "0131495054", "quan": 1, "unit_price": 51.0 }
  ],
  "payment": { "type": "Amex", "expiry": "04/2001", "last5": 12345 }
}
Storing together the data that we access together is efficient
SQL queries can be slow because aggregating data across tables at read time is slow
Aggregated Documents are easy to distribute
Why optimise for a certain set of questions?
Serialization
public class User {
    private String name;
    private String email;
    private Integer age;
    private Boolean gender_male;
    private DateTime created_at;
    private ArrayList items_viewed;
    private Hashtable preferences;
    private ArrayList<Books> authored;

    public User(...) {
        ...
    }
    ...
}
"User": {
  "name": "jack benny",
  "email": "[email protected]",
  "age": "39",
  "gender": "male",
  "created_at": "October 13, 2014 11:13:00",
  "items_viewed": {
    …},
  "preferences": {
    …},
  "books": {
    …}
}
Denormalization
You could think that denormalisation is a credo of NoSQL.
In the real world, we denormalise all the time in Couchbase.
We have to decide when to embed data (i.e. denormalise) and when to refer to data.
Denormalisation
You should embed data when:
You need speed of access (less of a concern with Couchbase)
Reads outnumber writes
You are comfortable with the slim risk of two denormalised occurrences of the same data losing sync, or you understand the programming models that handle these conditions.
When to embed
You should refer to data when:
Query flexibility is important
Consistency is a priority
The data has large growth potential
When to refer
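The embed/refer trade-off can be sketched in a few lines of Java. This is only an illustration: a plain in-memory Map stands in for a Couchbase bucket (it is not the Couchbase SDK), and the class and method names are hypothetical. The "order::" key prefix follows the order example earlier in the deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EmbedVsRefer {
    // Embedded: line items live inside the order document, so one read returns everything.
    static Map<String, Object> embeddedOrder(String customer, List<Map<String, Object>> lineItems) {
        Map<String, Object> order = new HashMap<>();
        order.put("customer", customer);
        order.put("line_items", lineItems);
        return order;
    }

    // Referred: the user document holds only order ids; each order is a separate
    // document that is fetched by key when needed.
    // ASSUMPTION: `bucket` is an in-memory stand-in for a key-value store.
    static List<Object> ordersFor(Map<String, Object> bucket, String userKey) {
        List<Object> orders = new ArrayList<>();
        Object user = bucket.get(userKey);
        if (!(user instanceof Map)) {
            return orders; // unknown user: nothing to resolve
        }
        Object ids = ((Map<?, ?>) user).get("orders");
        if (ids instanceof List) {
            for (Object id : (List<?>) ids) {
                Object order = bucket.get("order::" + id);
                if (order != null) {
                    orders.add(order);
                }
            }
        }
        return orders;
    }
}
```

The embedded form costs one read but duplicates data; the referred form costs one extra read per order but keeps each order in a single place.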
Usually, there’s still a schema when we use Couchbase.
The difference is:
Couchbase doesn’t enforce the schema
If schema matters, you can enforce it at the application side
Schema can vary completely from document to document
Migrations are cheap and asynchronous
Impedance mismatch is yesterday’s problem
It’s still okay to store unstructured data
Schema unenforced
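Enforcing a schema on the application side might look like the following minimal sketch. The field names are taken from the user documents shown in this deck; the class, the method names, and the choice of which fields are required are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class UserSchema {
    // Returns a list of problems; an empty list means the document passes.
    // ASSUMPTION: name, email and orders are the required fields.
    static List<String> validate(Map<String, Object> doc) {
        List<String> errors = new ArrayList<>();
        requireType(doc, "name", String.class, errors);
        requireType(doc, "email", String.class, errors);
        requireType(doc, "orders", List.class, errors);
        return errors;
    }

    private static void requireType(Map<String, Object> doc, String field,
                                    Class<?> type, List<String> errors) {
        Object value = doc.get(field);
        if (value == null) {
            errors.add("missing field: " + field);
        } else if (!type.isInstance(value)) {
            errors.add("wrong type for field: " + field);
        }
    }
}
```

Because the check lives in the application, documents written before a schema change can still be read and migrated lazily.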
The key is the key
Key design is as important as document design.
There are three broad types of key:
Human readable/deterministic: e.g. an email address
Computer generated/random: e.g. UUID
Compound: e.g. UUID with a deterministic portion
Three ways to build a key
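As a rough sketch, the three key types could be built like this. The helper names are hypothetical, and normalising the email address (trim and lower-case) is an assumption, not something the deck prescribes.

```java
import java.util.Locale;
import java.util.UUID;

final class KeyPatterns {
    // Human readable/deterministic: derived from data the application already knows.
    // ASSUMPTION: emails are trimmed and lower-cased so lookups are repeatable.
    static String deterministicKey(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    // Computer generated/random: a UUID that carries no meaning of its own.
    static String randomKey() {
        return UUID.randomUUID().toString();
    }

    // Compound: a base key plus a deterministic suffix, joined with "::".
    static String compoundKey(String baseKey, String suffix) {
        return baseKey + "::" + suffix;
    }
}
```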
Human readable/deterministic
public class user {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private Array orders;
    private Array productsViewed;
}
{
  "name": "Matthew Revell",
  "address": "11-21 Paul Street",
  "city": "London",
  "postCode": "EC2A 4JU",
  "telephone": "44-20-3837-9130",
  "orders": [ 1, 9, 698, 32 ],
  "productsViewed": [8, 33, 99, 100]
}
Key: [email protected]
Random/computer generated
{
  "name": "Matthew Revell",
  "email": "[email protected]",
  "address": "11-21 Paul Street",
  "city": "London",
  "postCode": "EC2A 4JU",
  "telephone": "44-20-3837-9130",
  "orders": [ 1, 9, 698, 32 ],
  "productsViewed": [8, 33, 99, 100]
}
Key: 1001
Multiple look-up documents
u::count
1001
u::1001
{
  "name": "Matthew Revell",
  "facebook_id": 16172910,
  "email": "[email protected]",
  "password": "ab02d#Jf02K",
  "created_at": "5/1/2012 2:30am",
  "facebook_access_token": "xox0v2dje20",
  "twitter_access_token": "20jffieieaaixixj"
}
fb::16172910
1001
nflx::2939202
1001
twtr::2920283830
1001
uname::mrevell
1001
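The look-up pattern above can be sketched as follows. An in-memory Map stands in for the bucket (this is not the Couchbase SDK), the class and method names are hypothetical, and starting the counter at 1000 so that the first user gets id 1001 is an assumption chosen to match the slide. In a real cluster the counter would be incremented atomically by the server; this sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

final class LookupDocuments {
    // ASSUMPTION: in-memory stand-in for a bucket (key -> document or value).
    private final Map<String, Object> bucket = new HashMap<>();

    // Increment the u::count document and return the new user id.
    long nextUserId() {
        long next = ((Long) bucket.getOrDefault("u::count", 1000L)) + 1;
        bucket.put("u::count", next);
        return next;
    }

    // Store the user under u::<id>, plus small look-up documents that map
    // other identifiers (Facebook id, username) back to that id.
    long createUser(Map<String, Object> userDoc, long facebookId, String username) {
        long id = nextUserId();
        bucket.put("u::" + id, userDoc);
        bucket.put("fb::" + facebookId, id);
        bucket.put("uname::" + username, id);
        return id;
    }

    // Two key reads: look-up document first, then the user document itself.
    Map<String, Object> findByFacebookId(long facebookId) {
        Object id = bucket.get("fb::" + facebookId);
        if (id == null) {
            return null;
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> user = (Map<String, Object>) bucket.get("u::" + id);
        return user;
    }
}
```

Each look-up is a pair of key reads, so no index or query is needed to find a user by an external identifier.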
Compound keys
Compound keys are look-up documents with a predictable name.
It’s a continuation of the embedded versus referred data discussion.
Compound keys: example
u::1001
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ],
"productsViewed": [8, 33, 99, 100]
}
Compound keys: example
u::1001
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ]
}
u::1001::productsviewed
{"productsList": [
8, 33, 99, 100]
}
Compound keys: example
u::1001
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ]
}
u::1001::productsviewed
{"productsList": [
8, 33, 99, 100]
}
p::8
{
"id": 1,
"name": "T-shirt",
"description": "Red Couchbase shirt",
"quantityInStock": 99,
"image": "tshirt.jpg"
}
Compound keys: example
u::1001
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ]
}
u::1001::productsviewed
{"productsList": [
8, 33, 99, 100]
}
p::8
{
"id": 1,
"name": "T-shirt",
"description": "Red Couchbase shirt",
"quantityInStock": 99
}
p::8::img
"http://someurl.com/tshirt.jpg"
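Because compound keys are predictable, related documents can be reached by building the key rather than by querying. The sketch below assumes the key shapes from the examples above (u::<id>::productsviewed, p::<id>, p::<id>::img); the class and method names are hypothetical, and a plain Map again stands in for the bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CompoundKeys {
    static String userKey(long userId) {
        return "u::" + userId;
    }

    static String productsViewedKey(long userId) {
        return userKey(userId) + "::productsviewed";
    }

    static String productKey(long productId) {
        return "p::" + productId;
    }

    static String productImageKey(long productId) {
        return productKey(productId) + "::img";
    }

    // Read the user's productsviewed document and turn each id into a product key.
    // ASSUMPTION: `bucket` is an in-memory stand-in for a key-value store.
    static List<String> viewedProductKeys(Map<String, Object> bucket, long userId) {
        List<String> keys = new ArrayList<>();
        Object viewed = bucket.get(productsViewedKey(userId));
        if (!(viewed instanceof Map)) {
            return keys; // no productsviewed document for this user
        }
        Object ids = ((Map<?, ?>) viewed).get("productsList");
        if (ids instanceof List) {
            for (Object id : (List<?>) ids) {
                keys.add(productKey(((Number) id).longValue()));
            }
        }
        return keys;
    }
}
```

Splitting fast-growing lists such as viewed products into their own document keeps the main user document small while the key remains derivable from the user id alone.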
Couchbase views and N1QL are amazing.
You should use them where:
You discover new query patterns.
You have short-lived query types.
Ad-hoc querying.
However: user defined indexes should be your first port of call.
What about automatic indexes?
Demo
Couchbase + Node.JS + Express + Bootstrap
Demo Presentation
{
"name": "Aliza Kshlerin",
"username": "Felicita_Reichert61",
"email": "[email protected]",
"address": {
"street": "Ericka Route",
"suite": "Apt. 077",
"city": "Effertzfurt",
"zipcode": "83625",
"geo": {
"lat": "15.5566",
"lng": "-109.3184"
}
},
"phone": "082-502-1159",
"website": "trace.com",
"company": {
"name": "Altenwerth, Sawayn and Kiehn",
"catchPhrase": "Face to face upward-trending matrices",
"bs": "vertical aggregate infrastructures"
}
}
Mock User, generated using faker.js
• Wonderful Library for Testing
• Easily used with node
• More info: https://github.com/marak/Faker.js/
Further Information
Couchbase Node.js Client API Reference: http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.0/
N1QL Documentation:
• http://docs.couchbase.com/developer/n1ql-dp3/n1ql-intro.html
Next Session:
• Couchbase 104 Views and Indexes on 11/19/2014 - In this installment, explore the power of creating views and indexes in Couchbase. Learn the underlying view architecture for how views and indexes are built in Couchbase. Explore strategies for creating performant and efficient lookups of data stored within the database, including custom reduce operations.