yes, sql!

38
Yes, SQL! Uri Cohen

Upload: uri-cohen

Post on 15-Jan-2015

990 views

Category:

Technology


0 download

DESCRIPTION

From qconsf 2010 - this presentation focuses on how the classic querying models like plain SQL and JPA map to distributed data stores. It first reviews the current distributed data stores landscape and its querying models, and then discuss the wide range of APIs for data extraction from these data stores. It then discusses the main challenges of mapping various APIs to a distributed data model and the trade offs to be aware off.

TRANSCRIPT

Page 1: Yes, Sql!

Yes, SQL!

Uri Cohen

Page 2: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 2

> SELECT * FROM qcon2010.speakers WHERE name=‘Uri Cohen’

+-----------------------------------------------------+| Name | Company | Role | Twitter |+-----------------------------------------------------+| Uri Cohen | GigaSpaces | Product Manager | @uri1803 |+-----------------------------------------------------+

> db.speakers.find({name:”Uri Cohen”}){ “name”:”Uri Cohen”, “company”: { name:”GigaSpaces”, products:[“XAP”, “IMDG”] domain: “In memory data grids” } “role”:”product manager”, “twitter”:”@uri1803”}

Page 3: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 3

Agenda

• SQL– What it is and isn’t good for

• NoSQL– Motivation & Main Concepts of Modern Distributed Data Stores

– Common interaction models

• Key/Value, Column, Document

• NOT consistency and distribution algorithms

• One Data Store, Multiple APIs– Brief intro to GigaSpaces

– Key/Value challenges

– SQL challenges: Add-hoc querying, Relationships (JPA)

Page 4: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 4

A FEW (MORE) WORDS ABOUT SQL

Page 5: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 5

SQL

(Usually) Centralized Transactional, consistent Hard to Scale

Page 6: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 6

SQL

Static, normalized data schema• Don’t duplicate, use FKs

Page 7: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 7

SQL

Add hoc query support Model first, query later

Page 8: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 8

SQL

Standard Well known Rich ecosystem

Page 9: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 9

(BRIEF) NOSQL RECAP

Page 10: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 10

NoSql (or a Naive Attempt to Define It)

A loosely coupled collection of non-relational data stores

Page 11: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 11

NoSql (or a Naive Attempt to Define It)

(Mostly) d i s t r i b u t e d

Page 12: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 12

NoSql (or a Naive Attempt to Define It)

scalable (Up & Out)

Page 13: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 13

NoSql (or a Naive Attempt to Define It)

Not (always) ACID • BASE anyone?

Page 14: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 14

Why Now?

Timing is everything…• Exponential Increase in data & throughput • Non or semi structured data that changes

frequently

Page 15: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 15

A Universe of Data Models

Key / Value Column

{ “name”:”uri”, “ssn”:”213445”, “hobbies”:[”…”,“…”], “…”: { “…”:”…” “…”:”…” } }

{ { ... }}

{ { ... }}

Document

Page 16: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 16

Key/Value

• Have the key? Get the value– That’s about it when it comes to querying

– Map/Reduce (sometimes)

– Good for

• cache aside (e.g. Hibernate 2nd level cache)

• Simple, id based interactions (e.g. user profiles)

• In most cases, values are Opaque

K1 V1

K2 V2

K3 V3

K4 V1

Page 17: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 17

Key/Value

Scaling out is relatively easy (just hash the keys)• Some will do that automatically for you • Fixed vs. consistent hashing

Page 18: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 18

Key/Value

• Implementations: – Memcached, Redis, Riak

– In memory data grids (mostly Java-based) started this way

• GigaSpaces, Oracle Coherence, WebSphere XS,

JBoss Infinispan, etc.

Page 19: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 19

Column Based

Page 20: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 20

Column Based

• Mostly derived from Google’s BigTable / Amazon Dynamo papers

• One giant table of rows and columns– Column == pair (name and a value, sometimes timestamp)

– Each row can have a different number of

columns

– Table is sparse:

(#rows) × (#columns) ≥ (#values)

Page 21: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 21

Column Based

• Query on row key – Or column value (aka secondary index)

• Good for a constantly changing, (albeit flat) domain model

Page 22: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved

Document

Think JSON (or BSON, or XML)

22

_id:1

_id:2

_id:3

{ “name”:”Lady Gaga”, “ssn”:”213445”, “hobbies”:[”Dressing up”,“Singing”], “albums”: [{“name”:”The fame” “release_year”:”2008”}, {“name”:”Born this way” “release_year”:”2011”}] }

{ { ... }}

{ { ... }}

Page 23: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved

Document

• Model is not flat, data store is aware of it – Arrays, nested documents

• Better support for ad hoc queries– MongoDB excels at this

• Very intuitive model • Flexible schema

Page 24: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 24

What if you didn’t have to choose?

JPA

JDBC

Page 25: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 25

A Brief Intro to GigaSpaces

In Memory Data Grid • With optional write behind to

a secondary storage

Page 26: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 26

A Brief Intro to GigaSpaces

Tuple based• Aware of nested tuples (and soon collections)

– Document like

• Rich querying and map/reduce semantics

Page 27: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 27

A Brief Intro to GigaSpaces

Transparent partitioning & HA• Fixed hashing based on a chosen

property

JAVA Virtual MachineJAVA Virtual Machine JAVA Virtual MachineJAVA Virtual Machine

Replication

Primary 1Backup 1

Replication

Backup 2Primary 2

Page 28: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 28

A Brief Intro to GigaSpaces

Transactional (Like, ACID)• Local (single partition)• Distributed (multiple partitions)

Page 29: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 29

Use the Right API for the Job

• Even for the same data…– POJO & JPA for Java apps with complex domain model

– Document for a more dynamic view

– Memcached for simple, language neutral

data access

– JDBC for:

• Interaction with legacy apps

• Flexible ad-hoc querying (e.g. projections)

Page 30: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved

Memcached (the Daemon is in the Details)

MemcachedClient

Page 31: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved

MemcachedClient

Memcached (the Daemon is in the Details)

Page 32: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 32

SQL/JDBC – Query Them All

Query may involve Map/Reduce• Reduce phase includes merging and sorting

Page 33: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 33

SQL/JDBC – Things to Consider

• Unique and FK constraints are not practically enforceable

• Sorting and aggregation may be expensive • Distributed transactions are evil

– Stay local…

Page 34: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 34

JPA

It’s all about relationships…

Page 35: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved

JPA Relationships

To embed or not to embed, that is the question….

Easy to partition and scale Easy to query: user.accounts[*].type = ‘checking’

× Owned relationships only

Page 36: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved

JPA Relationships

To embed or not to embed, that is the question….

Any type of relationship × Partitioning is hard× Querying involves joining

Page 37: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 37

Summary

• One API doesn’t fit all– Use the right API for the job

• Know the tradeoffs– Always ask what you’re giving up, not just what you’re

gaining

Page 38: Yes, Sql!

®Copyright 2010 Gigaspaces Ltd. All Rights Reserved 38

THANK YOU!

@uri1803http://blog.gigaspaces.com