How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Couchbase 103 Todd Greenstein | Engineering, Couchbase

Upload: couchbase

Post on 08-Jul-2015


DESCRIPTION

In Couchbase 103 for 3.0, you'll learn the fundamentals of creating data models with Couchbase 3.0, including modeling, JSON strategies, and common key patterns. We'll also explore modeling differences between NoSQL and RDBMS systems.

TRANSCRIPT

Page 1: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Couchbase 103 Todd Greenstein | Engineering, Couchbase

Page 2: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Modeling in NoSQL vs RDBMS

Page 3: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

The basics of Couchbase Server

• Key-value store with:
- special support for JSON documents
- counter and string data types
- storage of binaries up to 20MB
• Built-in and transparent memcached-compatible caching layer
• Distributed around a cluster of servers
• Generate secondary indexes using map/reduce queries

©2014 Couchbase, Inc. 3

Page 4: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

RDBMS Modeling

• RDBMS organizes data as tables
- Tables represent data in rows; n columns of m rows
- Table rows have a specific schema; each column has a static type
- Simple Datatypes: strings, numbers, datetimes, booleans can be represented by columns in a single table
- Complex Datatypes: dictionaries/hashes, arrays/lists are difficult to represent in a single table [Impedance Mismatch]
• All rows have identical schema; schema changes are painful and resource intensive
• Reading/Writing/Transactions require locking

Page 5: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Couchbase – NoSQL Modeling

• Couchbase operates like a Key-Value Document Store
- Simple Datatypes: strings, numbers, datetime, boolean, and binary data can be stored; they are stored as Base64 encoded strings
- Complex Datatypes: dictionaries/hashes, arrays/lists can be stored in JSON format (simple lists can be string based with a delimiter)
- JSON is a special class of string with a specific format for encoding simple and complex data structures
• Schema is unenforced and implicit; schema changes are programmatic, done online, and can vary from Document to Document
• Document defined schema – "Schema-less" is misleading and inaccurate

Page 6: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Applying the Technology to the Problem

Page 7: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Relational databases are optimised for questions


Page 8: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Simple ecommerce example


Page 9: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

RDBMS Complex Datatypes

public class User {
    private String name;
    private String email;
    private Integer age;
    private Boolean gender_male;
    private DateTime created_at;
    private ArrayList items_viewed;
    private Hashtable preferences;
    private ArrayList<Books> authored;

    public User(...) {
        ...
    }
    ...
}

• Simple Types are easy; make them columns
• Complex Types are more challenging: they require separate tables and joins, and are slower to store and retrieve
• ORMs reduce complexity but trade off speed/scale, and are hard to optimize

Page 10: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Document databases are optimised for answers


Page 11: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

That order in a heavily denormalised document database


Page 12: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Answer oriented databases

order::1001
{
    "uid": "ji22jd",
    "customer": "Ann",
    "line_items": [
        { "sku": "0321293533", "quan": 3, "unit_price": 48.0 },
        { "sku": "0321601912", "quan": 1, "unit_price": 39.0 },
        { "sku": "0131495054", "quan": 1, "unit_price": 51.0 }
    ],
    "payment": { "type": "Amex", "expiry": "04/2001", "last5": "12345" }
}
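A minimal sketch of why the aggregated order document is "answer oriented": once the document is fetched by its key, everything needed to compute the order total is already in hand. Plain Java maps and lists stand in for the parsed JSON; the class and method names (`OrderDocumentSketch`, `sampleOrder`, `orderTotal`) are illustrative assumptions and not part of any Couchbase SDK.

```java
import java.util.List;
import java.util.Map;

// Sketch only: plain Java collections stand in for a parsed JSON document.
final class OrderDocumentSketch {

    // Builds the order::1001 document from the slide as nested maps and lists.
    static Map<String, Object> sampleOrder() {
        return Map.of(
            "uid", "ji22jd",
            "customer", "Ann",
            "line_items", List.of(
                Map.of("sku", "0321293533", "quan", 3, "unit_price", 48.0),
                Map.of("sku", "0321601912", "quan", 1, "unit_price", 39.0),
                Map.of("sku", "0131495054", "quan", 1, "unit_price", 51.0)),
            "payment", Map.of("type", "Amex", "expiry", "04/2001", "last5", "12345"));
    }

    // Sums quan * unit_price over the embedded line items; no second lookup is needed.
    @SuppressWarnings("unchecked")
    static double orderTotal(Map<String, Object> order) {
        List<Map<String, Object>> items = (List<Map<String, Object>>) order.get("line_items");
        double total = 0.0;
        for (Map<String, Object> item : items) {
            int quantity = (Integer) item.get("quan");
            double unitPrice = (Double) item.get("unit_price");
            total += quantity * unitPrice;
        }
        return total;
    }
}
```

Calling `OrderDocumentSketch.orderTotal(OrderDocumentSketch.sampleOrder())` yields 234.0 for the three line items above. In a normalised relational model the same answer would typically need a join between an orders table and a line-items table.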

Page 13: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Why optimise for a certain set of questions?

• Storing together the data that we access together is efficient
• SQL queries can be slow because aggregating data at query time is slow
• Aggregated Documents are easy to distribute

Page 14: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Serialization

public class User {
    private String name;
    private String email;
    private Integer age;
    private Boolean gender_male;
    private DateTime created_at;
    private ArrayList items_viewed;
    private Hashtable preferences;
    private ArrayList<Books> authored;

    public User(...) {
        ...
    }
    ...
}

"User": {
    "name": "jack benny",
    "email": "[email protected]",
    "age": "39",
    "gender": "male",
    "created_at": "October 13, 2014 11:13:00",
    "items_viewed": {
        …
    },
    "preferences": {
        …
    },
    "books": {
        …
    }
}
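To make the object-to-JSON step concrete, here is a small hand-rolled serializer, written under the assumption that the object has already been flattened into maps, lists, strings, numbers and booleans. It is a sketch to show the idea, not the serializer the webinar used; a real application would more likely use a JSON library. `JsonSketch`, `toJson`, `quote` and `sampleUser` are hypothetical names, and the sample field values are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a tiny JSON writer for maps, lists, strings, numbers, booleans and null.
final class JsonSketch {

    static String toJson(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return quote((String) value);
        }
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        if (value instanceof Map) {
            StringBuilder out = new StringBuilder("{");
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                if (out.length() > 1) {
                    out.append(',');
                }
                out.append(quote(String.valueOf(entry.getKey())))
                   .append(':')
                   .append(toJson(entry.getValue()));
            }
            return out.append('}').toString();
        }
        if (value instanceof List) {
            StringBuilder out = new StringBuilder("[");
            for (Object element : (List<?>) value) {
                if (out.length() > 1) {
                    out.append(',');
                }
                out.append(toJson(element));
            }
            return out.append(']').toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    // Escapes backslashes and double quotes only; control characters are not handled here.
    static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Illustrative user with a simple field, a number and a list (insertion order is kept).
    static Map<String, Object> sampleUser() {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "jack benny");
        user.put("age", 39);
        user.put("items_viewed", List.of(8, 33));
        return user;
    }
}
```

The complex types (the list here, or a nested map for preferences) serialize into the same document as the simple fields, which is the point the slide makes against splitting them into separate tables.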

Page 15: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Denormalization

Page 16: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Denormalisation

You could think that denormalisation is a credo of NoSQL.
In the real world, we denormalise all the time in Couchbase.
We have to decide when to embed data (i.e. denormalise) and when to refer to data.

Page 17: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

When to embed

You should embed data when:
• You need speed of access (less of a concern with Couchbase)
• Reads outnumber writes
• You are comfortable with the slim risk of two denormalised occurrences of the same data losing sync, or understand programming models around these conditions

Page 18: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

When to refer

You should refer to data when:
• Query flexibility is important
• Consistency is a priority
• The data has large growth potential
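The embed-versus-refer trade-off can be sketched with an in-memory map standing in for a bucket. One order embeds a copy of the customer's name; another refers to the customer document by key. All names here (`EmbedVersusReferSketch`, the `customer_name` and `customer_key` fields, and the use of `u::1001` for this customer) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a HashMap stands in for a bucket of key -> document.
final class EmbedVersusReferSketch {

    static final Map<String, Map<String, Object>> bucket = new HashMap<>();

    static void seed() {
        bucket.put("u::1001", Map.of("name", "Ann"));
        // Embedded: the order carries its own copy of the customer name.
        bucket.put("order::1001", Map.of("customer_name", "Ann", "total", 234.0));
        // Referred: the order only stores the key of the customer document.
        bucket.put("order::1002", Map.of("customer_key", "u::1001", "total", 39.0));
    }

    // One read: the name comes straight out of the order document.
    static String customerNameEmbedded(String orderKey) {
        return (String) bucket.get(orderKey).get("customer_name");
    }

    // Two reads: the order first, then the customer document it points to.
    static String customerNameReferred(String orderKey) {
        String customerKey = (String) bucket.get(orderKey).get("customer_key");
        return (String) bucket.get(customerKey).get("name");
    }

    // Updates only the customer document; embedded copies are left untouched.
    static void renameCustomer(String customerKey, String newName) {
        bucket.put(customerKey, Map.of("name", newName));
    }
}
```

After `seed()` and `renameCustomer("u::1001", "Ann Smith")`, the embedded read still returns "Ann" while the referred read returns "Ann Smith". That is the loss of sync the embed slide warns about, and the consistency benefit this slide lists for referring.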

Page 19: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Schema unenforced

Usually, there’s still a schema when we use Couchbase.
The difference is:
• Couchbase doesn’t enforce the schema
• If schema matters, you can enforce it at the application side
• Schema can vary completely from document to document
• Migrations are cheap and asynchronous
• Impedance mismatch is yesterday’s problem
• It’s still okay to store unstructured data
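One way to "enforce it at the application side" is to validate a document before writing it. The sketch below assumes a hypothetical rule set (name, email and age are required, with the Java types shown); neither the rules nor the `UserSchemaSketch` class come from the webinar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: application-side schema check run before a document is stored.
final class UserSchemaSketch {

    // Hypothetical rule set: field name -> required Java type.
    static final Map<String, Class<?>> REQUIRED = requiredFields();

    static Map<String, Class<?>> requiredFields() {
        Map<String, Class<?>> fields = new LinkedHashMap<>();
        fields.put("name", String.class);
        fields.put("email", String.class);
        fields.put("age", Integer.class);
        return fields;
    }

    // Returns a list of problems; an empty list means the document passes.
    static List<String> validate(Map<String, Object> document) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Class<?>> rule : REQUIRED.entrySet()) {
            Object value = document.get(rule.getKey());
            if (value == null) {
                problems.add("missing field: " + rule.getKey());
            } else if (!rule.getValue().isInstance(value)) {
                problems.add("wrong type for field: " + rule.getKey());
            }
        }
        return problems;
    }
}
```

Extra fields are deliberately ignored, so documents can still vary from one to the next while the fields the application depends on are checked.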

Page 20: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

The key is the key

Page 21: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Three ways to build a key

Key design is as important as document design.
There are three broad types of key:
• Human readable/deterministic: e.g. an email address
• Computer generated/random: e.g. UUID
• Compound: e.g. UUID with a deterministic portion
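The three key types can be sketched as three small helpers. Lower-casing and trimming the email is an assumption added here so that the deterministic key is stable; the `::` separator follows the key examples later in the deck. `KeySketch` and its method names are hypothetical.

```java
import java.util.Locale;
import java.util.UUID;

// Sketch only: one helper per key type.
final class KeySketch {

    // Human readable/deterministic: derived from data the application already knows.
    static String deterministicKey(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    // Computer generated/random: cannot be guessed, so it must be stored or looked up.
    static String randomKey() {
        return UUID.randomUUID().toString();
    }

    // Compound: an id plus predictable parts, e.g. u::1001::productsviewed.
    static String compoundKey(String prefix, String id, String... parts) {
        StringBuilder key = new StringBuilder(prefix).append("::").append(id);
        for (String part : parts) {
            key.append("::").append(part);
        }
        return key.toString();
    }
}
```

For example, `KeySketch.compoundKey("u", "1001", "productsviewed")` produces `u::1001::productsviewed`, and `KeySketch.compoundKey("u", KeySketch.randomKey())` combines a random id with a deterministic prefix.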

Page 22: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Human readable/deterministic

public class user {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private Array orders;
    private Array productsViewed;
}

{
    "name": "Matthew Revell",
    "address": "11-21 Paul Street",
    "city": "London",
    "postCode": "EC2A 4JU",
    "telephone": "44-20-3837-9130",
    "orders": [ 1, 9, 698, 32 ],
    "productsViewed": [8, 33, 99, 100]
}

Key: [email protected]

Page 23: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Random/computer generated

{
    "name": "Matthew Revell",
    "email": "[email protected]",
    "address": "11-21 Paul Street",
    "city": "London",
    "postCode": "EC2A 4JU",
    "telephone": "44-20-3837-9130",
    "orders": [ 1, 9, 698, 32 ],
    "productsViewed": [8, 33, 99, 100]
}

Key: 1001

Page 24: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Multiple look-up documents

u::count
1001

u::1001
{
    "name": "Matthew Revell",
    "facebook_id": 16172910,
    "email": "[email protected]",
    "password": "ab02d#Jf02K",
    "created_at": "5/1/2012 2:30am",
    "facebook_access_token": "xox0v2dje20",
    "twitter_access_token": "20jffieieaaixixj"
}

fb::16172910
1001

nflx::2939202
1001

twtr::2920283830
1001

em::[email protected]
1001

em::[email protected]
1001

uname::mrevell
1001
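The look-up pattern above can be sketched as follows: a counter document hands out the user id, the user document is stored under `u::<id>`, and each alternative identifier gets a tiny document whose value is that id. Finding a user by Facebook id or email is then two key reads. The in-memory map, the starting counter value of 1000, and the class and method names are assumptions for illustration; a real bucket would use an atomic counter operation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a HashMap stands in for a bucket; values are documents or plain ids.
final class LookupSketch {

    static final Map<String, Object> bucket = new HashMap<>();

    // Increments u::count and returns the new id (stand-in for an atomic counter).
    static long nextUserId() {
        long current = (Long) bucket.getOrDefault("u::count", 1000L);
        long next = current + 1;
        bucket.put("u::count", next);
        return next;
    }

    // Stores the user document plus one look-up document per alternative identifier.
    static long register(String name, String email, long facebookId) {
        long id = nextUserId();
        bucket.put("u::" + id, Map.of("name", name, "email", email, "facebook_id", facebookId));
        bucket.put("em::" + email, id);
        bucket.put("fb::" + facebookId, id);
        return id;
    }

    // First read resolves the look-up document to an id, second read fetches the user.
    @SuppressWarnings("unchecked")
    static Map<String, Object> findBy(String prefix, String value) {
        Object id = bucket.get(prefix + "::" + value);
        if (id == null) {
            return null;
        }
        return (Map<String, Object>) bucket.get("u::" + id);
    }
}
```

Both `findBy("fb", "16172910")` and `findBy("em", ...)` resolve to the same `u::1001` document without any secondary index.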

Page 25: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Compound keys

Compound keys are look-up documents with a predictable name.

It’s a continuation of the embedded versus referred data discussion.

Page 26: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Compound keys: example

u::1001

{

"name": "Matthew Revell",

"email": "[email protected]",

"address": "11-21 Paul Street",

"city": "London",

"postCode": "EC2A 4JU",

"telephone": "44-20-3837-9130",

"orders": [ 1, 9, 698, 32 ],

"productsViewed": [8, 33, 99, 100]

}

Page 27: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Compound keys: example

u::1001

{

"name": "Matthew Revell",

"email": "[email protected]",

"address": "11-21 Paul Street",

"city": "London",

"postCode": "EC2A 4JU",

"telephone": "44-20-3837-9130",

"orders": [ 1, 9, 698, 32 ]

}

u::1001::productsviewed

{"productsList": [

8, 33, 99, 100]

}

Page 28: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Compound keys: example

u::1001

{

"name": "Matthew Revell",

"email": "[email protected]",

"address": "11-21 Paul Street",

"city": "London",

"postCode": "EC2A 4JU",

"telephone": "44-20-3837-9130",

"orders": [ 1, 9, 698, 32 ]

}

u::1001::productsviewed

{"productsList": [

8, 33, 99, 100]

}

p::8

{

"id": 1,
"name": "T-shirt",
"description": "Red Couchbase shirt",
"quantityInStock": 99,
"image": "tshirt.jpg"

}

Page 29: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Compound keys: example

u::1001

{

"name": "Matthew Revell",

"email": "[email protected]",

"address": "11-21 Paul Street",

"city": "London",

"postCode": "EC2A 4JU",

"telephone": "44-20-3837-9130",

"orders": [ 1, 9, 698, 32 ]

}

u::1001::productsviewed

{"productsList": [

8, 33, 99, 100]

}

p::8

{

"id": 1,
"name": "T-shirt",
"description": "Red Couchbase shirt",
"quantityInStock": 99

}

p::8::img

"http://someurl.com/tshirt.jpg"
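Because compound keys have predictable names, related documents can be reached by building the key rather than querying. The sketch below walks from a user id to the image URLs of the products that user viewed, using only key reads. As before, the map is a stand-in for a bucket and the class and method names are hypothetical; only the keys and values shown on the slides are reused.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: follows predictable compound keys through an in-memory stand-in bucket.
final class CompoundKeySketch {

    static final Map<String, Object> bucket = new HashMap<>();

    static void seed() {
        bucket.put("u::1001", Map.of("name", "Matthew Revell", "city", "London"));
        bucket.put("u::1001::productsviewed", Map.of("productsList", List.of(8, 33, 99, 100)));
        bucket.put("p::8", Map.of("name", "T-shirt", "quantityInStock", 99));
        bucket.put("p::8::img", "http://someurl.com/tshirt.jpg");
    }

    // Image URLs for the products a user viewed; products without an img document are skipped.
    @SuppressWarnings("unchecked")
    static List<String> viewedProductImages(long userId) {
        List<String> urls = new ArrayList<>();
        Map<String, Object> viewed =
            (Map<String, Object>) bucket.get("u::" + userId + "::productsviewed");
        if (viewed == null) {
            return urls;
        }
        for (Integer productId : (List<Integer>) viewed.get("productsList")) {
            Object url = bucket.get("p::" + productId + "::img");
            if (url != null) {
                urls.add((String) url);
            }
        }
        return urls;
    }
}
```

Splitting `productsviewed` and `img` into their own documents keeps the main user and product documents small, and each piece can grow or change without rewriting the other.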

Page 30: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

What about automatic indexes?

Couchbase views and N1QL are amazing.
You should use them where:
• You discover new query patterns.
• You have short-lived query types.
• You need ad-hoc querying.
However: user defined indexes should be your first port of call.

Page 31: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Demo

Couchbase + Node.js + Express + Bootstrap

Page 32: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Demo Presentation

{
    "name": "Aliza Kshlerin",
    "username": "Felicita_Reichert61",
    "email": "[email protected]",
    "address": {
        "street": "Ericka Route",
        "suite": "Apt. 077",
        "city": "Effertzfurt",
        "zipcode": "83625",
        "geo": {
            "lat": "15.5566",
            "lng": "-109.3184"
        }
    },
    "phone": "082-502-1159",
    "website": "trace.com",
    "company": {
        "name": "Altenwerth, Sawayn and Kiehn",
        "catchPhrase": "Face to face upward-trending matrices",
        "bs": "vertical aggregate infrastructures"
    }
}

Mock User, generated using faker.js
• Wonderful Library for Testing
• Easily used with node
• More info: https://github.com/marak/Faker.js/

Page 33: How-To NoSQL 3.0 Webinar Series: Couchbase 103 - Data Modeling

Further Information

Couchbase Node.js Client API Reference: http://docs.couchbase.com/sdk-api/couchbase-node-client-2.0.0/
N1QL Documentation:
• http://docs.couchbase.com/developer/n1ql-dp3/n1ql-intro.html
Next Session:
• Couchbase 104 Views and Indexes on 11/19/2014 - In this installment, explore the power of creating views and indexes in Couchbase. Learn the underlying view architecture for how views and indexes are built in Couchbase. Explore strategies for creating performant and efficient lookups of data stored within the database, including custom reduce operations.