relational databases to riak


Upload: basho-technologies

Post on 21-Jul-2015


This presentation includes information that is confidential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved.


Matt Brender Developer Advocate

From Relational to Riak

Relational databases give us a lot

•  Relationships
•  Transactions
•  Schemas
•  Ad-hoc queries (SQL)

RDBMS BE LIKE


=>

RDBMS BE LIKE


OR

It’s not about NoSQL

What we really need is a database that makes my app faster

Enter Riak

•  Scalable
•  Key => Value
•  Schema-less
•  Eventually Consistent
•  Highly Available

•  Key => Value
•  Masterless
•  Schema-less
•  Fault Tolerant
•  High Availability
•  Queries & Search

Scalable

Riak has a masterless architecture in which every node in a cluster is capable of serving read and write requests.

Requests are routed to nodes using standard load balancing appliances or software like Nginx or HAProxy.

Scalable

Data is distributed evenly across the cluster. Instead of requiring manual sharding (partitioning), Riak hashes each key (the bucket/key combination) with the SHA-1 algorithm, which converts it into a 160-bit number in the range:

0 - 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,976

or

0 - 2^160
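The hashing step can be sketched in Python. This is a minimal illustration, not Riak's implementation: joining the bucket and key with "/" before hashing and the ring size of 64 partitions are assumptions made for the example, and the helper names `ring_position` and `partition_for` are hypothetical.

```python
import hashlib

RING_TOP = 2 ** 160   # SHA-1 digests are 160 bits wide
RING_SIZE = 64        # assumed number of equal-sized partitions (illustrative only)


def ring_position(bucket: str, key: str) -> int:
    """Map a bucket/key pair to an integer in the range [0, 2**160)."""
    # Assumption: bucket and key are joined with "/" before hashing;
    # Riak's internal byte encoding of the pair is different.
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_for(bucket: str, key: str, ring_size: int = RING_SIZE) -> int:
    """Return the index of the ring partition that covers the key's position."""
    partition_width = RING_TOP // ring_size
    return ring_position(bucket, key) // partition_width


if __name__ == "__main__":
    for key in ("US", "CA", "MX"):
        print(key, partition_for("country", key))
```

Because the position depends only on the bucket and key, any node can compute where an object belongs without consulting a central directory.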

Scalable

Consistent Hashing – The Ring

0 - 2^160

•  Linear Scaling: Riak scales in a near-linear fashion, so increasing the number of nodes in a cluster increases the number of reads and writes the cluster can handle in a predictable fashion.

If 10 nodes can serve: 40,000 Writes/Second

Then 20 nodes should serve: 72,000+ Writes/Second
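To make "near-linear" concrete, the figures above can be compared with perfectly linear scaling. The short Python sketch below uses only the two numbers from the slide; the function name `scaling_efficiency` is hypothetical.

```python
def scaling_efficiency(base_nodes: int, base_rate: float,
                       new_nodes: int, new_rate: float) -> float:
    """Return observed throughput as a fraction of perfectly linear scaling."""
    ideal_rate = base_rate * new_nodes / base_nodes
    return new_rate / ideal_rate


# 10 nodes at 40,000 writes/second versus 20 nodes at 72,000 writes/second.
efficiency = scaling_efficiency(10, 40_000, 20, 72_000)
print(f"{efficiency:.0%} of ideal linear throughput")
```

Doubling the cluster would give 80,000 writes/second if scaling were perfectly linear, so 72,000 corresponds to 90% of the ideal.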

“To enable rapid iteration at scale, Riot moved to Riak to support millions of concurrent players at any moment.”

Scalable

RELATIONAL SCALABILITY


•  Designed for vertical scale

•  Cost considerations are a key element of vertical scaling

•  Sharding or re-distribution is I/O intensive

A - K L - P Q - Z

Key => Value

Riak stores data as a combination of keys and values in buckets

•  Keys – simply binary values used to identify Objects.*

•  Values – can be numbers, strings, objects, binaries, etc.

•  Buckets – used to define a virtual namespace for storing Riak objects.

Key => Value

Riak offers both HTTP and Protocol Buffers APIs. The following HTTP API example uses curl to retrieve a value by key:

curl http://127.0.0.1:8098/types/places/buckets/country/keys/US

{
  "Alpha2_s": "US",
  "Alpha3_s": "USA",
  "EnglishName_s": "United States",
  "NumericCode_i": 840
}

Note: Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.
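The same HTTP lookup can be sketched in Python using only the standard library. The helper names `riak_key_url` and `fetch_json` are hypothetical; the URL layout is copied from the curl example, and the code assumes the stored value is JSON and that a Riak node is reachable at the given address.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def riak_key_url(host: str, port: int, bucket_type: str, bucket: str, key: str) -> str:
    """Build the HTTP URL for a key, following the path layout of the curl example."""
    parts = [quote(part, safe="") for part in (bucket_type, bucket, key)]
    return f"http://{host}:{port}/types/{parts[0]}/buckets/{parts[1]}/keys/{parts[2]}"


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET the URL and decode the body as JSON (assumes the value was stored as JSON)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    url = riak_key_url("127.0.0.1", 8098, "places", "country", "US")
    print(url)
    # Requires a reachable Riak node; uncomment to run against one.
    # print(fetch_json(url))
```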

A diverse group of client libraries for Riak supports both the HTTP and Protocol Buffers APIs:

Key => Value

Basho Supported Libraries:
•  Java
•  Ruby
•  Python
•  PHP
•  Erlang
•  .NET
•  Node.js

Community Libraries:
•  C
•  Clojure
•  Go
•  Perl
•  Scala
•  R

Schemas are not enforced by Riak, but by your application.

Schema-less

You still:
•  Design a schema
•  Denormalize dependent data types

But get:
•  Single reads for common access patterns
•  Richer, simpler data structures

curl http://127.0.0.1:8098/types/places/buckets/country/keys/US

{
  "Alpha2_s": "US",
  "Alpha3_s": "USA",
  "EnglishName_s": "United States",
  "NumericCode_i": 840
}


Schema-less

Application Type | Key | Value
Session | User/Session ID | Session Data
Advertising | Campaign ID | Ad Content
Logs | Date | Log File
Sensor | Date, Date/Time | Sensor Updates
User Data | Login, email, UUID | User Attributes
Content | Title, Integer | Text, JSON/XML/HTML document, images, etc.
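Since the application, not Riak, enforces the schema, validation typically lives next to the code that serializes values. The sketch below shows one way that might look in Python for the country record used earlier; the `Country` class, its validation rules, and the `encode`/`decode` helpers are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Country:
    """Application-side schema for the country record (field names from the example)."""
    Alpha2_s: str
    Alpha3_s: str
    EnglishName_s: str
    NumericCode_i: int

    def __post_init__(self) -> None:
        # Illustrative rules only; a real application defines its own.
        if len(self.Alpha2_s) != 2 or len(self.Alpha3_s) != 3:
            raise ValueError("country codes must be 2 and 3 characters long")
        if not isinstance(self.NumericCode_i, int):
            raise ValueError("NumericCode_i must be an integer")


def encode(country: Country) -> str:
    """Serialize a validated record to the JSON value that would be stored."""
    return json.dumps(asdict(country))


def decode(raw: str) -> Country:
    """Parse a stored JSON value, re-validating it on the way in."""
    return Country(**json.loads(raw))
```

A value that fails validation never reaches the database, and a stored value with missing or unexpected fields raises an error in `decode` instead of silently propagating.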

Eventually Consistent

C = Consistency
A = Availability
P = Partition Tolerance

[Diagram: two clients and three DB nodes separated by a network partition]

The CAP theorem states that a distributed system can support at most two of these three properties.

Eventually Consistent

Read repair takes place on every successful read and updates any replica copies that are out of sync.

Active anti-entropy (AAE) is a background process that compares Merkle trees between replicas and repairs the objects that differ.

Nodes periodically send their current view of the ring state to a randomly selected peer using a gossip protocol.

get("bucket/key")
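The read-repair idea can be sketched as follows. This is a deliberately simplified model: a single integer version stands in for Riak's causal metadata, replicas are in-memory objects, and the names `Replica`, `Versioned`, and `get_with_read_repair` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Versioned:
    value: str
    version: int  # simplification: one counter instead of real causal metadata


class Replica:
    """In-memory stand-in for one node's copy of the data."""

    def __init__(self) -> None:
        self.store: Dict[str, Versioned] = {}

    def get(self, key: str) -> Optional[Versioned]:
        return self.store.get(key)

    def put(self, key: str, obj: Versioned) -> None:
        self.store[key] = obj


def get_with_read_repair(replicas: List[Replica], key: str) -> Optional[Versioned]:
    """Read from every replica, return the newest copy, and overwrite stale copies."""
    found = [replica.get(key) for replica in replicas]
    present = [obj for obj in found if obj is not None]
    if not present:
        return None
    newest = max(present, key=lambda obj: obj.version)
    for replica, obj in zip(replicas, found):
        if obj is None or obj.version < newest.version:
            replica.put(key, newest)  # repair the missing or out-of-date copy
    return newest
```

The repair happens as a side effect of the read, so frequently read keys converge quickly, while rarely read keys rely on a background process such as AAE.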

Eventually Consistent

Dotted Version Vectors are a tool used by Riak to track the logical sequence of updates to a key/value pair (versus the chronological order of events) and manage the process of merging siblings created as one of the side effects of eventual consistency.

[Diagram: example version vectors across successive updates, with entries A:1, B:1, C:1, C:2 and values C1 and C2]
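The logical ordering described above can be illustrated with plain version vectors, a simpler relative of the dotted version vectors Riak uses. In this sketch each vector maps an actor name to an update count; the function names `descends`, `compare`, and `merge` are chosen for the example and are not a Riak API.

```python
from typing import Dict

VersionVector = Dict[str, int]


def descends(a: VersionVector, b: VersionVector) -> bool:
    """True if a has seen every update recorded in b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify the causal relationship between two versions."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent"  # neither history contains the other: keep both as siblings


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Combine two vectors by taking the per-actor maximum."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}
```

Two writes are treated as siblings only when `compare` reports them as concurrent; wall-clock timestamps play no part in the decision.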

> curl http://127.0.0.1:8098/types/places/buckets/country/keys/US
Siblings:
47fGOQwxRzq6wsbM7idvFB
2mJD0DEGoxdxdHUqS3bYt3
7Y68tqVG99xHBDu7AKtmb4

> curl -H "Accept: multipart/mixed" http://127.0.0.1:8098/types/places/buckets/country/keys/US

--RigRoRk6lkPXYIqBOv1jKEacnlr
Content-Type: application/json
Link: </buckets/country>; rel="up"
Etag: 47fGOQwxRzq6wsbM7idvFB
Last-Modified: Wed, 05 Nov 2014 22:44:00 GMT

{"Alpha2_s":"US","Alpha3_s":"USA","EnglishName_s":"United States","NumericCode_i":840}
--RigRoRk6lkPXYIqBOv1jKEacnlr
Content-Type: application/json
Link: </buckets/country>; rel="up"

...

Eventually Consistent

Riak Data Types (Convergent Replicated Data Types or CRDTs) are a developer-friendly way to keep track of updates in an eventually consistent environment:

•  Map: Supports the nesting of any of the Riak Data Types.

•  Register: A named binary field that can only be used as part of a Map.

•  Counter: Keeps track of increments and decrements on an integer.

•  Flag: Values limited to enable or disable.

•  Set: A collection of unique binary values that supports add and remove operations on one or more values.
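To show why such data types converge without coordination, here is a minimal Python sketch of a PN-counter, one common construction for counter CRDTs. It is an illustration of the general technique, not a description of Riak's internals; the class and actor names are hypothetical.

```python
from typing import Dict


class PNCounter:
    """Counter CRDT: per-actor increment and decrement tallies that merge by maximum."""

    def __init__(self) -> None:
        self.increments: Dict[str, int] = {}
        self.decrements: Dict[str, int] = {}

    def increment(self, actor: str, amount: int = 1) -> None:
        self.increments[actor] = self.increments.get(actor, 0) + amount

    def decrement(self, actor: str, amount: int = 1) -> None:
        self.decrements[actor] = self.decrements.get(actor, 0) + amount

    def value(self) -> int:
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other: "PNCounter") -> "PNCounter":
        """Return a new counter holding the per-actor maximum of both tallies."""
        merged = PNCounter()
        for name in ("increments", "decrements"):
            mine, theirs = getattr(self, name), getattr(other, name)
            combined = {a: max(mine.get(a, 0), theirs.get(a, 0)) for a in mine.keys() | theirs.keys()}
            setattr(merged, name, combined)
        return merged
```

Because the merge takes a per-actor maximum, replicas can exchange state in any order, and any number of times, and still arrive at the same value.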

Eventually Consistent

Hinted handoff allows Riak nodes to temporarily take over storage operations for a failed node and update that node with changes when it comes back online.

put("bucket/key")
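A toy model of hinted handoff is sketched below. It assumes a single owner and a single fallback node per write and keeps everything in memory; the `Node` class and the `put` and `hand_off` functions are hypothetical names for the example, not Riak APIs.

```python
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.data: Dict[str, str] = {}
        # Writes held on behalf of other nodes: (intended owner, key, value).
        self.hints: List[Tuple[str, str, str]] = []


def put(key: str, value: str, owner: Node, fallback: Node) -> None:
    """Write to the owner, or park the write on the fallback if the owner is down."""
    if owner.online:
        owner.data[key] = value
    else:
        fallback.hints.append((owner.name, key, value))


def hand_off(fallback: Node, owner: Node) -> None:
    """Replay the hints held for the owner once it is back online."""
    if not owner.online:
        return
    remaining = []
    for target, key, value in fallback.hints:
        if target == owner.name:
            owner.data[key] = value
        else:
            remaining.append((target, key, value))
    fallback.hints = remaining
```

The write succeeds from the client's point of view even while the owner is down, which is the availability trade-off this slide describes.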

High Availability

RELATIONAL AVAILABILITY


•  Master/Replica Architecture

•  Assumption of Transactional Consistency

•  What happens under failure conditions?

[Diagram: a master with three replicas and a coordination layer; when nodes fail (X), write/read requests WAIT while a master is coordinated]

High Availability

Riak Multi-Datacenter (MDC) Replication

Riak automatically replicates between clusters
•  Configurable number of remote replicas
•  Options for real-time sync and full sync
•  Spanning tree support for cascading replication

Geo-Data Locality allows localized data processing
•  Reduced latency to end-users
•  Allows sub-5 ms responses
•  Active-Active ensures continuous user experience

IN REVIEW

RDBMS BE LIKE


OR


RELATIONAL & NOSQL: What’s the difference?

Relational Database
•  Structured Data
•  Defined Schema
•  Tables with Rows/Columns
•  Indexed w/ Table Joins
•  SQL

NoSQL Database
•  Unstructured Data
•  No pre-defined Schema
•  Small and Large Data Sets on Commodity HW
•  Many Models: K/V, document store, graph
•  Variety of Query Methods

Biggest change for dev: reads

Biggest change for ops: administration

THE COST OF DOWNTIME


WHAT YOU WILL GAIN


More flexible, fluid designs

More natural data representations

Scaling without pain

Reduced operational complexity

RIAK DEPLOYED WORLDWIDE


Questions