polyglot persistence for java developers: time to move out of the relational comfort zone? (gids...

Polyglot persistence for Java developers: time to move out of the relational comfort zone? Chris Richardson Author of POJOs in Action Founder of the original CloudFoundry.com @crichardson [email protected] http://plainoldobjects.com

Upload: chris-richardson

Post on 28-Jul-2015



TRANSCRIPT

Page 1: Polyglot persistence for Java developers: time to move out of the relational comfort zone? (gids 2015)

Polyglot persistence for Java developers:

time to move out of the relational comfort zone?

Chris Richardson

Author of POJOs in Action

Founder of the original CloudFoundry.com

@crichardson

[email protected]

http://plainoldobjects.com

Page 2

Presentation Goal

The benefits and drawbacks of polyglot persistence and how to design applications that use this approach

Page 3

About Chris

Page 4

About Chris

Founder of a startup that’s creating a platform for developing event-driven microservices

Page 5

Agenda

• Why polyglot persistence?

• Persisting entities with MongoDB and Cassandra

• Querying data with MongoDB and Cassandra

• Scaling MongoDB and Cassandra

Page 6

Relational Databases

Page 7

Example: Food to Go

• Take-out food delivery service

• “Launched” in 2006

Page 8

Food To Go Architecture

Order taking

Restaurant Management

MySQL Database

CONSUMER RESTAURANT OWNER

Page 9

Example: Device management server ~ 2003

• Everything was stored in an Oracle database

• Device metadata

• Firmware patches!

• ….

Page 10

RDBMS are great

• SQL = Rich, declarative query language

• Database enforces referential integrity

• ACID semantics

• Well understood by developers

• Well supported by frameworks and tools, e.g. Spring JDBC, Hibernate, JPA

• Well understood by operations

Page 11

Impact of SSD/Flash storage

• HDD = 200 IOPS vs. SSD = 100K IOPS

• Massive performance improvement

• Expands the range of use cases that a single RDBMS server can cost-effectively support

Page 12

AWS Aurora

• Hosted relational database

• Compatible with MySQL 5.6 but with 5x performance

• Vertically scales to 32 vCPUs and 244 GiB of RAM

• SSD-backed virtualized storage layer, replicated 6 ways across 3 AZs

• Up to 15 replicas that share storage with master - minimal replication lag

• Fast restart after crash

• No redo log replay

• SSD-backed virtualized storage layer purpose-built for database workloads

• Fast fail-over to replica after master instance failure without data loss

http://aws.amazon.com/rds/aurora/details/

Page 13

NEW SQL

• Next generation SQL databases, e.g. VoltDB, MemSQL, ...

• Leverage modern, multi-core, commodity hardware

• In-memory

• Horizontally scalable

• Transparently shardable

• ACID

“Current databases are designed for 1970s hardware and for both OLTP and data warehouses”

http://nms.csail.mit.edu/~stavros/pubs/OLTP_sigmod08.pdf

Page 14

An RDBMS is great for many applications but ….

Page 15

Limitations of relational databases

• Scalability

• Multi data center, distributed database

• Schema updates

• O/R impedance mismatch

• Handling semi-structured data

Page 16

Solution: Spend $$$ on Oracle’s high-end databases and servers

Page 17

Not so bad…

http://www.powerandmotoryacht.com/megayachts/megayacht-musashi

Page 18

… or is it?

http://www.iwtg.net/

Page 19

Solution: Spend $$$ - open-source stack + DevOps people

http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/#

Page 20

Apply the scale cube

[Figure: the scale cube]

• X axis - horizontal duplication

• Y axis - functional decomposition: scale by splitting different things

• Z axis - data partitioning: scale by splitting similar things

Page 21

Applying the scale cube

• Y-axis splits/functional decomposition

• Application = Set[Microservice] - each with its own database

• Monolithic database is functionally decomposed

• Different types of entities in different databases

• Z-axis splits/sharding

• Entities of the same type partitioned across multiple databases
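A Z-axis split can be sketched as a small key-based router. This is only an illustration, assuming a fixed list of shards and a simple modulo on the entity id; `ShardRouter` and the shard names are hypothetical and do not come from the talk.

```java
import java.util.List;

// Hypothetical key-based router for a Z-axis split (sharding).
// The class name and shard names are illustrative assumptions.
class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        this.shards = List.copyOf(shards);
    }

    // Entities of the same type are partitioned by id:
    // the same id always maps to the same shard.
    String shardFor(long entityId) {
        int index = Math.floorMod(Long.hashCode(entityId), shards.size());
        return shards.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("shard-0", "shard-1"));
        System.out.println("restaurant 1 -> " + router.shardFor(1L));
        System.out.println("restaurant 2 -> " + router.shardFor(2L));
    }
}
```

A plain modulo keeps the sketch short, but changing the number of shards would remap most ids, which is one reason real systems tend to use range or token-based schemes.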

Page 22

How does each service access data?

• Velocity and Volume

• Variety of Data

• Fixed or ad hoc queries

• Access patterns

• Distribution

• Latency

Page 23

Velocity and Volume?

• Velocity - speed at which data moves

• Volume - the amount of data

• Does it fit on a single machine?

Page 24

Variety of Data?

• Relational

• Aggregate oriented

• Graph

• Complex nested structures

• Semi structured

• Text

• Binary blobs, e.g. images

Page 25

Fixed or ad hoc queries?

• Fixed set of queries

• Known in advance

• Slowly changing

• Ad hoc queries

• Users can submit ad hoc queries

Page 26

Access patterns

• PK-oriented access, e.g. load-modify-update a business entity

• Bulk queries and/or updates

• Non-relational queries:

• text search

• graph-oriented

• geo search

• …

Page 27

Reads vs. Writes

• Mix of reads and writes

• Write intensive, e.g. logging application

• Read intensive

• Data analytics/warehouse

• Slowly changing data

• …

Page 28

Distribution

• Single database

• Multiple active databases

• on a LAN (low latency)

• on a WAN (high latency)

Page 29

Transactions

• Mandatory ACID

• Eventual consistency OK?

Page 30

Latency

• When should new data show up in results?

• Low latency - seconds, milliseconds?

• High latency - next day?

Page 31

And then pick your database…

Page 32

Use a NoSQL database

Benefits

• Higher performance

• Higher scalability

• Richer data-model

• Schema-less

Drawbacks

• Limited transactions

• Limited querying

• Relaxed consistency

• Unconstrained data

Page 33

Example NoSQL Databases

Database Key features

Cassandra Extensible column store, very scalable, distributed

MongoDB Document-oriented, fast, scalable

Redis Key-value store, very fast

DynamoDB AWS hosted key-value and document store

Neo4j Graph Database

http://nosql-database.org/ lists 150 NoSQL databases

Page 34

Relative popularity

http://www.indeed.com/jobtrends/mongodb%2Ccassandra%2Credis%2Cneo4j%2Cdynamodb.html

Page 35

But there are many other options

• Blob store, e.g. AWS S3

• Text search engine, e.g. ElasticSearch, AWS CloudSearch, …

• Big data technology: Apache Hadoop, Apache Spark, …

• Real time streaming: Storm, Spark Streaming, …

Page 36

Polyglot persistence

IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg

Event sourcing and CQRS are a great approach

Page 37

Agenda

• Why polyglot persistence?

• Persisting entities with MongoDB and Cassandra

• Querying data with MongoDB and Cassandra

• Scaling MongoDB and Cassandra

Page 38

Food to Go – Domain model (partial)

class Restaurant {
  long id;
  String name;
  Set<String> serviceArea;
  Set<TimeRange> openingHours;
  List<MenuItem> menuItems;
}

class MenuItem {
  String name;
  double price;
}

class TimeRange {
  long id;
  int dayOfWeek;
  int openTime;
  int closeTime;
}

Page 39

Database schema

RESTAURANT table

ID  Name               …
1   Ajanta
2   Montclair Eggshop

RESTAURANT_ZIPCODE table

Restaurant_id  zipcode
1              94707
1              94619
2              94611
2              94619

RESTAURANT_TIME_RANGE table

Restaurant_id  dayOfWeek  openTime  closeTime
1              Monday     1130      1430
1              Monday     1730      2130
2              Tuesday    1130      …

Page 40

RestaurantRepository

public interface RestaurantRepository {
  void addRestaurant(Restaurant restaurant);
  Restaurant findById(long id);
  ...
}

Food To Go will eventually have scaling issues

Page 41

MongoDB

• Document-oriented database

• JSON-style documents: Lists, Maps, primitives

• Schema-less

• Transaction = update of a single document

• Rich query language for dynamic/ad hoc queries + geo queries

• Tunable writes: speed vs. reliability

• Highly scalable and available

Page 42

MongoDB use cases

• High volume writes

• Complex data

• Semi-structured data

Page 43

MongoDB data model

Server
  Database: Food To Go
    Collection: Restaurants

{
  "_id" : ObjectId("4bddc2f49d1505567c6220a0"),
  "name": "Ajanta",
  "serviceArea": ["94619", "99999"],
  "openingHours": [
    { "dayOfWeek": 1, "open": 1130, "close": 1430 },
    { "dayOfWeek": 2, "open": 1130, "close": 1430 },
    …
  ]
}

• "_id" is the PK

• BSON = binary JSON

• Sequence of bytes on disk ⇒ fast I/O

• 16 MB limit per document

Page 44

Many NoSQL Databases =

Aggregate-oriented

Page 45

Basic MongoDB collection operations...

• insert(document(s), options)

• Application assigned ids

• Mongo generated ObjectId

• update(query, update, options)

• query - selects document(s)

• update - replace or modify document (e.g. increment a field)

• options - upsert, multi, … (optional)

• remove(query, options)

Page 46

....Basic MongoDB collection operations

• find/findOne(criteria, projection)

• criteria - query

• projection - fields to return (optional)

Page 47

Using Spring Data for Mongo

@Repository
class RestaurantRepositoryMongoDbImpl implements RestaurantRepository {

  @Override
  public void add(Restaurant restaurant) {
    mongoTemplate.insert(restaurant, "restaurants");
  }

  @Override
  public Restaurant findDetailsById(int id) {
    return mongoTemplate.findById(id, Restaurant.class, "restaurants");
  }
}

Spring Data’s Generic Repositories = even less code

Page 48

Apache Cassandra

• Distributed/Extensible row store: row ~= java.util.SortedMap

• Transaction = update of a row

• Fast writes = append to a log

• Tunable reads/writes: consistency ⇔ latency/availability

• Extremely scalable

• Transparent and dynamic clustering

• Rack and datacenter aware data replication

Page 49

Apache Cassandra use cases

• Big data

• Multiple Data Center distributed database

• (Write intensive) Logging

• High-availability (writes)

Page 50

Cassandra data model

Keyspace
  Table
    Row key K1: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3)
    Row key K2: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3)

Each column = (column name, column value, timestamp)

Column name/value: number, string, Boolean, timestamp, counter, and composite

Page 51

Inserting/updating data

table.insert(key=K1, (N4, V4, TS4), …)

• Idempotent = transaction

• Optional column TTL

• Application assigned keys - natural or UUID

Table before:

K1: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3)

Table after:

K1: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4)

Page 52

Reading data

table.slice(key=K1, startColumn=N2, endColumn=N4)

Table:

K1: (N1, V1, TS1) (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4)

Result:

K1: (N2, V2, TS2) (N3, V3, TS3) (N4, V4, TS4)

Cassandra has secondary indexes but they aren’t always helpful
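The "row ~= java.util.SortedMap" analogy, together with the insert and slice operations, can be sketched in plain Java. This is a toy in-memory model, not the Cassandra API: `ToyTable`, `Cell`, `insert`, and `slice` are hypothetical names, and the rule that an older timestamp never overwrites a newer one is a simplifying assumption.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy in-memory model of a Cassandra-style table; this is not the Cassandra API.
class ToyTable {
    // One cell: a column value plus the timestamp of the write that produced it.
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // row key -> (column name -> cell); TreeMap keeps the columns sorted by name.
    private final TreeMap<String, TreeMap<String, Cell>> rows = new TreeMap<>();

    // Insert or update one column. Replaying the same write leaves the row
    // unchanged (idempotent); a write older than the stored cell is ignored.
    void insert(String rowKey, String column, String value, long timestamp) {
        TreeMap<String, Cell> row = rows.computeIfAbsent(rowKey, k -> new TreeMap<>());
        Cell existing = row.get(column);
        if (existing == null || existing.timestamp <= timestamp) {
            row.put(column, new Cell(value, timestamp));
        }
    }

    // Return the columns of one row from startColumn to endColumn, inclusive.
    SortedMap<String, Cell> slice(String rowKey, String startColumn, String endColumn) {
        TreeMap<String, Cell> row = rows.getOrDefault(rowKey, new TreeMap<>());
        return row.subMap(startColumn, true, endColumn, true);
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.insert("K1", "N1", "V1", 1L);
        table.insert("K1", "N2", "V2", 2L);
        table.insert("K1", "N3", "V3", 3L);
        table.insert("K1", "N4", "V4", 4L);
        System.out.println(table.slice("K1", "N2", "N4").keySet());
    }
}
```

Because the columns of a row are kept in sorted order, a slice is a contiguous range scan, which is why this kind of read is cheap compared with a lookup by column value.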

Page 53

Cassandra Query Language

• SQL-like

• DDL: Create table, ...

• DML: Insert, Update, Select, ...

• Restricted WHERE clauses, e.g. PK equality only (if you want efficiency)

• Primary key:

• Simple - 1 storage table row ⇔ 1 CQL row

• Compound - 1 storage table row ⇔ multiple CQL rows! (clustered rows)

Page 54

Representing restaurants

create table restaurant (
  restaurant_id int PRIMARY KEY,
  name text,
  service_area set<text>,
  day_of_weeks list<int>,
  opening_times list<int>,
  closing_times list<int>
);

Page 55

Inserting and retrieving restaurants

insert into restaurants.restaurant(
  restaurant_id, name, service_area, day_of_weeks, opening_times, closing_times)
values (?, ?, ?, ?, ?, ?)

select * from restaurants.restaurant where restaurant_id = ?

Page 56

Storing restaurants in Cassandra

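Row key: 1

Column name        Column value
name               Ajanta
serviceArea:94619  -
serviceArea:94618  -
daysOfWeeks:0      Monday
daysOfWeeks:1      Monday

• Set: the set member is part of the column name

• List: the element index is part of the column name and the element value is the column value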

Page 57

Cassandra Java APIs

• Java Driver

• https://github.com/datastax/java-driver

• Netflix Astyanax

• http://techblog.netflix.com/2013/12/astyanax-update.html

• Spring Data for Cassandra

• http://projects.spring.io/spring-data-cassandra/

Page 58

Java Driver: Inserting a restaurant

public class AvailableRestaurantRepositoryCassandraImpl ...

  public AvailableRestaurantRepositoryCassandraImpl(Session session) {
    insertStatement = session.prepare(
      "insert into restaurants.restaurant(restaurant_id, name, service_area, "
      + "day_of_weeks, opening_times, closing_times) Values(?, ?, ?, ?, ?, ?);");
    ...
  }

  @Override
  public void add(Restaurant restaurant) {
    // Flatten the opening hours into three parallel lists
    List<Integer> dayOfWeeks = new ArrayList<Integer>();
    List<Integer> openingTimes = new ArrayList<Integer>();
    List<Integer> closingTimes = new ArrayList<Integer>();
    for (TimeRange tr : restaurant.getOpeningHours()) {
      dayOfWeeks.add(tr.getDayOfWeek());
      openingTimes.add(tr.getOpenHour());
      closingTimes.add(tr.getClosingTime());
    }
    session.execute(insertStatement.bind(
      restaurant.getId(),
      restaurant.getName(),
      restaurant.getServiceArea(),
      dayOfWeeks,
      openingTimes,
      closingTimes));
  }

Page 59

Java Driver: Finding a restaurant

public class AvailableRestaurantRepositoryCassandraImpl implements AvailableRestaurantRepository {

  public AvailableRestaurantRepositoryCassandraImpl(Session session) {
    this.findByIdStatement = session.prepare(
      "select * from restaurants.restaurant where restaurant_id = ?;");
    ...
  }

  @Override
  public Restaurant findDetailsById(int id) {
    Row row = session.execute(findByIdStatement.bind(id)).all().get(0);
    List<Integer> dayOfWeeks = row.getList("day_of_weeks", Integer.class);
    List<Integer> openingTimes = row.getList("opening_times", Integer.class);
    List<Integer> closingTimes = row.getList("closing_times", Integer.class);
    // Rebuild the opening hours from the three parallel lists
    Set<TimeRange> openingHours = new HashSet<TimeRange>();
    for (int i = 0; i < dayOfWeeks.size(); i++) {
      openingHours.add(
        new TimeRange(dayOfWeeks.get(i), openingTimes.get(i), closingTimes.get(i)));
    }
    Restaurant r = new Restaurant(row.getString("name"), ...,
      row.getSet("service_area", String.class), openingHours, null);
    r.setId(id);
    return r;
  }

Page 60

Agenda

• Why polyglot persistence?

• Persisting entities with MongoDB and Cassandra

• Querying data with MongoDB and Cassandra

• Scaling MongoDB and Cassandra

Page 61

Finding available restaurants

Available restaurants =
  Serve the zip code of the delivery address
  AND
  Are open at the delivery time

public interface AvailableRestaurantRepository {
  List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime);
  ...
}

Page 62

Finding available restaurants on Monday, 6.15pm for 94619 zipcode

Straightforward three-way join

select r.*
from restaurant r
  inner join restaurant_time_range tr on r.id = tr.restaurant_id
  inner join restaurant_zipcode sa on r.id = sa.restaurant_id
where '94619' = sa.zip_code
  and tr.day_of_week = 'monday'
  and tr.openingtime <= 1815
  and 1815 <= tr.closingtime

Page 63

MongoDB = easy to query

{
  serviceArea: "94619",
  openingHours: {
    $elemMatch: {
      "dayOfWeek": "Monday",
      "open": {$lte: 1815},
      "close": {$gte: 1815}
    }
  }
}

DBCursor cursor = collection.find(qbeObject);
while (cursor.hasNext()) {
  DBObject o = cursor.next();
  …
}

db.availableRestaurants.ensureIndex({serviceArea: 1})

Page 64

Using Spring Data for Mongo

@Repository
class RestaurantRepositoryMongoDbImpl implements RestaurantRepository {

  @Override
  public List<AvailableRestaurant> findAvailableRestaurants(
      Address deliveryAddress, Date deliveryTime) {
    int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);
    int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime);

    Query query = new Query(
      where("serviceArea").is(deliveryAddress.getZip())
        .and("openingHours")
        .elemMatch(
          where("dayOfWeek").is(dayOfWeek)
            .and("openingTime").lte(timeOfDay)
            .and("closingTime").gte(timeOfDay)));
    return mongoTemplate.find(
      query, AvailableRestaurant.class, AVAILABLE_RESTAURANTS_COLLECTION);
  }
}

Page 65

BUT how to do this with Cassandra??!

• How can Cassandra support a query that has

  • A 3-way join

  • Multiple =

  • > and < ?

⇒ We need to denormalize the data!!

Page 66

Simplification #1: Denormalization

Restaurant_id  Day_of_week  Open_time  Close_time  Zip_code
1              Monday       1130       1430        94707
1              Monday       1130       1430        94619
1              Monday       1730       2130        94707
1              Monday       1730       2130        94619
2              Monday       0700       1430        94619
…

SELECT restaurant_id
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = 94619
  AND 1815 < close_time
  AND open_time < 1815

Simpler query:

• No joins

• Two = and two <

Page 67

Simplification #2: Application filtering

SELECT restaurant_id, open_time
FROM time_range_zip_code
WHERE day_of_week = 'Monday'
  AND zip_code = 94619
  AND 1815 < close_time

The application applies the remaining condition, open_time < 1815, to the returned rows.

Even simpler query

• No joins

• Two = and one <

This is a CQL query!

Page 68

Available restaurants table

create table available_restaurants (
  id int,
  name text,
  zip_code text,
  day_of_week int,
  open_time int,
  close_time int,
  primary key ((zip_code, day_of_week), close_time, id)
);

• Compound primary key

• Composite partition key (zip_code, day_of_week) = row key

• Clustering columns (close_time, id) prefix column names

Page 69

Cassandra available_restaurants table

Row key (zipcode:day of week): 94619:Monday

Column name (close_time:id:≪column name≫)  Column value
1430:1:name                                 Ajanta
1430:1:open_time                            1130
1430:2:name                                 Egg shop
1430:2:open_time                            0800
2130:1:name                                 Ajanta
2130:1:open_time                            1730

primary key ((zip_code, day_of_week), close_time, id)

Page 70

Finding available restaurants

select * from available_restaurants where zip_code = '94619' and day_of_week = 1 and close_time > 1815;
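The way this query works with the compound primary key can be sketched in plain Java: one partition per (zip_code, day_of_week), rows inside it sorted by close_time and then id, a range scan for close_time, and the open_time check done in the application. This is a toy model, not the Cassandra driver; `AvailableRestaurantsSketch` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the available_restaurants layout; not the Cassandra driver API.
class AvailableRestaurantsSketch {
    static final class RestaurantRow {
        final String name;
        final int openTime;

        RestaurantRow(String name, int openTime) {
            this.name = name;
            this.openTime = openTime;
        }
    }

    // partition key "zip:day" -> close_time -> restaurant id -> row,
    // mirroring primary key ((zip_code, day_of_week), close_time, id)
    private final Map<String, TreeMap<Integer, TreeMap<Integer, RestaurantRow>>> partitions =
            new HashMap<>();

    void add(String zip, int dayOfWeek, int closeTime, int id, String name, int openTime) {
        partitions.computeIfAbsent(zip + ":" + dayOfWeek, k -> new TreeMap<>())
                .computeIfAbsent(closeTime, k -> new TreeMap<>())
                .put(id, new RestaurantRow(name, openTime));
    }

    List<String> findAvailable(String zip, int dayOfWeek, int timeOfDay) {
        List<String> result = new ArrayList<>();
        TreeMap<Integer, TreeMap<Integer, RestaurantRow>> partition =
                partitions.get(zip + ":" + dayOfWeek);
        if (partition == null) {
            return result;
        }
        // "close_time > ?" becomes a range scan over the clustering order.
        for (TreeMap<Integer, RestaurantRow> byId : partition.tailMap(timeOfDay, false).values()) {
            for (RestaurantRow row : byId.values()) {
                // Application-side filter for the condition the query cannot express.
                if (row.openTime <= timeOfDay) {
                    result.add(row.name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AvailableRestaurantsSketch table = new AvailableRestaurantsSketch();
        table.add("94619", 1, 1430, 1, "Ajanta", 1130);
        table.add("94619", 1, 2130, 1, "Ajanta", 1730);
        table.add("94619", 1, 1430, 2, "Egg shop", 800);
        System.out.println(table.findAvailable("94619", 1, 1815));
    }
}
```

The equality conditions pick exactly one partition and the range condition reads a contiguous run of rows, so only the open_time check is left for the application.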

Page 71

Cassandra query

@Repository
class AvailableRestaurantRepositoryCassandraImpl implements RestaurantRepository {

  public AvailableRestaurantRepositoryCassandraImpl(Session session) {
    this.findAvailable = session.prepare(
      "Select open_time, restaurant_name "
      + "From restaurants.available_restaurants "
      + "Where zip_code = ? "
      + "And day_of_week = ? "
      + " And close_time >= ?;");
    …
  }

Page 72

Cassandra query

@Repository
class AvailableRestaurantRepositoryCassandraImpl implements RestaurantRepository {

  @Override
  public List<AvailableRestaurant> findAvailableRestaurants(
      Address deliveryAddress, Date deliveryTime) {
    List<AvailableRestaurant> result = new ArrayList<AvailableRestaurant>();
    int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime);

    BoundStatement bound = findAvailable.bind(
      deliveryAddress.getZip(),
      DateTimeUtil.dayOfWeek(deliveryTime),
      timeOfDay);
    for (Row row : session.execute(bound).all()) {
      // Application filtering: the query only constrains close_time
      if (row.getInt("open_time") <= timeOfDay) {
        result.add(new AvailableRestaurant(row.getString("restaurant_name")));
      }
    }
    return result;
  }
}

Page 73

NoSQL ⇒ Denormalized representation for each query

Page 74

Sorry Ted!

http://en.wikipedia.org/wiki/Edgar_F._Codd

Page 75

About Cassandra and MongoDB

• Cassandra:

• Efficient storage of complex aggregates

• Limited queries requiring denormalized representation

• MongoDB

• Efficient storage of complex aggregates

• Rich ad hoc queries

But where they get really interesting is when it comes to scaling

Page 76

Agenda

• Why polyglot persistence?

• Persisting entities with MongoDB and Cassandra

• Querying data with MongoDB and Cassandra

• Scaling MongoDB and Cassandra

Page 77

Scaling MongoDB: Replica Sets

Replica Set = Mongod (primary) + Mongod (secondary) + Mongod (secondary)

• Client connects to seed servers

• Writes and consistent reads go to the primary

• Inconsistent reads can go to a secondary

• Replication from the primary to the secondaries

• Automatic master election

http://docs.mongodb.org/manual/replication/

Page 78

Scaling MongoDB: Sharding

• Client connects to a Mongos router (the diagram shows two)

• Replica Set 1 (aka. Shard 1): Mongod (primary) + two Mongod (secondary)

• Replica Set 2 (aka. Shard 2): Mongod (primary) + two Mongod (secondary)

• Config Server: three mongod instances

• Mongos uses key-based routing or scatter/gather

http://docs.mongodb.org/manual/core/sharding-introduction/

Page 79

MongoDB Sharding

• Collection is partitioned into chunks

• Each shard is responsible for one or more chunks

• Range-based sharding

• Each chunk is responsible for a range of keys

• Efficient execution of range queries BUT risk of uneven distribution

• Hash-based sharding

• Key is hashed and mapped into chunk

• Good distribution BUT range queries processed by all shards
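The trade-off between the two schemes can be sketched with two small lookup functions. This is a simplified illustration, not how MongoDB is implemented: the chunk boundaries, shard names, and the use of `String.hashCode` with a modulo are all assumptions made for the example.

```java
import java.util.TreeMap;

// Toy chunk lookup; chunk boundaries and shard names are made-up values.
class ChunkLookupSketch {
    // Range-based: each chunk is identified by its lower-bound key.
    // The map must contain a lowest bound (here the empty string) so every key matches.
    static String rangeShard(TreeMap<String, String> chunkLowerBounds, String key) {
        return chunkLowerBounds.floorEntry(key).getValue();
    }

    // Hash-based: the hash of the key picks the shard, so neighbouring keys scatter.
    static String hashShard(String[] shards, String key) {
        return shards[Math.floorMod(key.hashCode(), shards.length)];
    }

    public static void main(String[] args) {
        TreeMap<String, String> chunks = new TreeMap<>();
        chunks.put("", "shardA");      // keys below "94700"
        chunks.put("94700", "shardB"); // keys from "94700" upwards
        System.out.println(rangeShard(chunks, "94619"));
        System.out.println(rangeShard(chunks, "94707"));

        String[] shards = {"shardA", "shardB"};
        System.out.println(hashShard(shards, "94619"));
        System.out.println(hashShard(shards, "94620"));
    }
}
```

With the range lookup, a query over adjacent keys touches only the chunks that cover the range, but a popular key range lands on one shard. With the hash lookup the load spreads out, and a range query has to ask every shard.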

Page 80

MongoDB reads and writes

• Writes

• Trade-off: request latency vs. safety

• No acknowledgement!

• Acknowledgement by primary or by primary & N - 1 replicas

• Acknowledgement after committing to journal

• Tag-based, e.g. write to servers in different data centers

• Reads

• Read uncommitted isolation - reads can return data that has not been committed yet

• Master - the default

• Secondary - if stale data is ok

• Use tags

{ w: N, j: true/false, wtimeout: timeout }

Page 81

Cassandra cluster

http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

• Key → Partitioner (MurmurHash or MD5) → 64/128-bit hash (a.k.a. token)

• VNode owns a range of hash values

• Node owns a collection of vnodes

• Replicas: the nodes that store copies of the key's data
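The key → token → vnode → node lookup can be sketched as a hash ring. This is an illustration only: it uses MD5 tokens, derives each vnode's token by hashing a made-up "node#i" label (Cassandra assigns vnode tokens differently), and ignores replication. `TokenRingSketch` and its methods are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Toy token ring with virtual nodes; not Cassandra's implementation.
class TokenRingSketch {
    // token that ends a vnode's range -> physical node owning that vnode
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    // Partitioner: hash a key to a 128-bit token using MD5.
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Assumption for the sketch: vnode tokens are derived from "node#i" labels.
    void addNode(String node, int vnodes) {
        for (int i = 0; i < vnodes; i++) {
            ring.put(token(node + "#" + i), node);
        }
    }

    // The owner is the first vnode at or after the key's token, wrapping around the ring.
    // The ring must contain at least one node.
    String ownerOf(String key) {
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        TokenRingSketch cluster = new TokenRingSketch();
        cluster.addNode("node-1", 8);
        cluster.addNode("node-2", 8);
        cluster.addNode("node-3", 8);
        System.out.println("K1 -> " + cluster.ownerOf("K1"));
        System.out.println("K2 -> " + cluster.ownerOf("K2"));
    }
}
```

Giving each physical node many small ranges means that adding or removing a node moves only a fraction of the keys, and that fraction is spread across the remaining nodes.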

Page 82

Multiple data centers

DC 1 DC 2

Page 83

Cassandra reads and writes

• Any node can handle any request

• Plays the role of coordinator

• Communicates with replica nodes

• Write request

• Update is written to commit log of one or more replicas

• Other replicas are updated asynchronously

• Read request

• Read data from one or more replicas

• Choose the most recent data based on timestamp

• Read repair : sends updates to stale replicas

No Master!
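The read path described above (read several replicas, keep the value with the newest timestamp, repair stale replicas) can be sketched as follows. Each replica is modelled as a plain map, which is an assumption made for the example; `ReadRepairSketch` and `Versioned` are hypothetical names, not Cassandra classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy coordinator read with timestamp-based resolution and read repair.
class ReadRepairSketch {
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Each replica is modelled as a map from key to versioned value.
    static String read(List<Map<String, Versioned>> replicas, String key) {
        // Choose the most recent data based on timestamp.
        Versioned newest = null;
        for (Map<String, Versioned> replica : replicas) {
            Versioned candidate = replica.get(key);
            if (candidate != null && (newest == null || candidate.timestamp > newest.timestamp)) {
                newest = candidate;
            }
        }
        if (newest == null) {
            return null;
        }
        // Read repair: send the newest version to replicas that are stale or missing it.
        for (Map<String, Versioned> replica : replicas) {
            Versioned current = replica.get(key);
            if (current == null || current.timestamp < newest.timestamp) {
                replica.put(key, newest);
            }
        }
        return newest.value;
    }

    public static void main(String[] args) {
        Map<String, Versioned> replica1 = new HashMap<>();
        Map<String, Versioned> replica2 = new HashMap<>();
        replica1.put("K1", new Versioned("old", 1L));
        replica2.put("K1", new Versioned("new", 2L));
        System.out.println(read(List.of(replica1, replica2), "K1"));
        System.out.println(replica1.get("K1").value);
    }
}
```

Because the newest timestamp always wins, there is no master that has to order the writes; the cost is that clocks and timestamps decide which update survives.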

Page 84

Cassandra read and write consistency

• For each read and write request you specify:

• How many nodes to read/write before responding

• Local (single DC) vs. Multi-DCs

• All replicas in all DCs will eventually be updated

• Trade-off:

• More nodes: greater consistency but less availability and higher latency

• Fewer nodes: less consistency but higher availability and lower latency

http://www.datastax.com/documentation/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html

Page 85

Consistency examples

• High-performance, high-availability writes, e.g. logging

• Write consistency of ANY - the write succeeds even if all replicas are down

• Read consistency of ONE - any replica

• Consistent reads

• (nodes_written + nodes_read) > replication_factor

• Read/Write consistency of LOCAL_QUORUM

• Globally consistent reads

• Read/write consistency of QUORUM
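The consistent-read rule above is simple arithmetic, sketched here with hypothetical helper names. A quorum is a majority of the replicas, so a quorum write and a quorum read always overlap in at least one replica.

```java
// Arithmetic behind the consistency rule; the class and method names are illustrative.
class ConsistencySketch {
    // A quorum is a majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A read is guaranteed to see the latest acknowledged write when the set of
    // replicas written and the set of replicas read must overlap.
    static boolean readsSeeLatestWrite(int nodesWritten, int nodesRead, int replicationFactor) {
        return nodesWritten + nodesRead > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum for RF=3: " + q);
        System.out.println("QUORUM/QUORUM consistent: " + readsSeeLatestWrite(q, q, rf));
        System.out.println("ONE/ONE consistent: " + readsSeeLatestWrite(1, 1, rf));
    }
}
```

With a replication factor of 3, writing to 2 and reading from 2 satisfies the rule, while writing to 1 and reading from 1 does not, which matches the logging example where stale reads are acceptable.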

Page 86

Comparing Cassandra and MongoDB

• Cassandra

• Replica model

• Write to any replica (or Node)

• Sync locally/async globally

• MongoDB

• Master/slave model

• Write to master

• Sync to possibly remote master

Page 87

Summary

• Each SQL/NoSQL database = set of tradeoffs

• NoSQL databases:

• Diverse

• Aggregate-oriented (typically)

• Use query-oriented data modeling (typically)

• Polyglot persistence: leverage the strengths of SQL and NoSQL databases

Page 88

Questions?

@crichardson [email protected]

http://plainoldobjects.com