SQL? NoSQL? NewSQL?!? What’s a Java developer to do? - JDC2012 Cairo, Egypt

Post on 07-May-2015





The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offer significantly better scalability and performance. But these databases have a very different and unfamiliar data model and APIs, as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which, by using a radically different architecture, offer high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great, but unlike a traditional relational database you can't use JDBC and you must partition your data. In this presentation you will learn about the popular NoSQL databases MongoDB and Cassandra, as well as VoltDB. We will compare and contrast each database's data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.
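The requirement that you "must partition your data" can be pictured with a small sketch. The class below is a hypothetical, in-memory illustration of hash partitioning on a primary key; the class and method names are invented for this example, and it is not VoltDB's API or its actual partitioning algorithm. It only shows why a lookup on the partitioning column can be answered by one partition, while a lookup on any other column has to visit all of them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: NOT VoltDB's API or its real partitioning algorithm.
final class PartitionedRestaurantTable {
    // Each partition holds a disjoint subset of the rows (id -> name).
    private final List<Map<Integer, String>> partitions = new ArrayList<>();
    // Number of partitions touched by the most recent query.
    private int partitionsVisited = 0;

    PartitionedRestaurantTable(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Route a row to exactly one partition based on the partitioning column (id).
    private int partitionFor(int id) {
        return Math.floorMod(Integer.hashCode(id), partitions.size());
    }

    void insert(int id, String name) {
        partitions.get(partitionFor(id)).put(id, name);
    }

    // Single-partition query: the id tells us which partition to ask.
    String findById(int id) {
        partitionsVisited = 1;
        return partitions.get(partitionFor(id)).get(id);
    }

    // Multi-partition query: name is not the partitioning column,
    // so every partition has to be scanned and the results merged.
    List<Integer> findIdsByName(String name) {
        partitionsVisited = 0;
        List<Integer> ids = new ArrayList<>();
        for (Map<Integer, String> partition : partitions) {
            partitionsVisited++;
            for (Map.Entry<Integer, String> row : partition.entrySet()) {
                if (row.getValue().equals(name)) {
                    ids.add(row.getKey());
                }
            }
        }
        return ids;
    }

    int partitionsVisited() {
        return partitionsVisited;
    }
}
```

With three partitions, `findById(1)` touches one partition and `findIdsByName("Ajanta")` touches all three, which is the cost difference the talk later describes as single-partition versus multi-partition procedures.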


<ul><li>1. SQL, NoSQL, NewSQL? What's a developer to do? Chris Richardson, author of POJOs in Action, founder of the original CloudFoundry.com. @crichardson, crichardson@vmware.com</li></ul><p>2. Overall presentation goal: the joy and pain of building Java applications that use NoSQL and NewSQL.
3. About Chris
4. About Chris
5. About Chris
6. About Chris
7. About Chris
8. About Chris: http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/
9. About Chris: Developer Advocate for CloudFoundry.com. Sign up at CloudFoundry.com using promo code EgyptJUG.
10. Agenda: Why NoSQL? NewSQL?; Persisting entities; Implementing queries. 3/18/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 10
11. Food to Go: take-out food delivery service; launched in 2006; used a relational database (naturally).
12. Success leads to growth challenges: increasing domain model complexity; increasing traffic; increasing data volume; distribute across data centers. But relational databases make this difficult.
13. Problem: Complex object graphs. Object graph ?! table (ID, COL1, COL2, COL). Poor performance: many inserts, many joins.
14. Problem: Semi-structured data. Customer attribute table (Customer_Id, Name, Value): (1, Region, CA), (1, Type, Bank). Lack of constraints; poor query performance, e.g. multiple outer joins.
15. Problem: Semi-structured data. Customer table (Id, Name, Street, Other_Attributes): (1, Acme Inc, 180 Main, XML/JSON/Blob), (2, Failed Bank, 1 Wall Street). Can't be queried.
16. Problem: Schema evolution. Before (Id, First_Name, Last_Name): (1, Maria, Doe), (2, John, Smith), (9948429292, Ben, Grayson). Locks? Application downtime? After (Id, First_Name, Last_Name, DOB): (1, Maria, Doe, 10/14/38).
17. Problem: Scaling. Moore's law is your friend, BUT: scaling reads: master/slave, but beware of consistency issues; scaling writes: extremely difficult/impossible/expensive; vertical scaling is limited and requires $$; horizontal scaling is limited/requires $$.
18. 
Problem: distribution. [Diagram: App + DB in Datacenter 1 and App + DB in Datacenter 2, with synchronization between the DBs over a WAN.] Many databases don't support this out of the box.
19. Solution: Buy high end technology. http://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPG
20. Solution: Hire more developers. Application-level sharding; build your own replication middleware. http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_4_series/madone_4_5
21. Solution: Use NoSQL. Benefits: higher performance; higher scalability; richer data model; schema-less. Drawbacks: limited transactions; relaxed consistency; unconstrained data.
22. MongoDB: document-oriented database; JSON-style documents: lists, maps, primitives; schema-less; transaction = update of a single document; rich query language for dynamic queries; tunable writes: speed vs. reliability; highly scalable and available.
23. MongoDB use cases. Use cases: high volume writes; complex data; semi-structured data. Who is using it? Shutterfly, Foursquare, Bit.ly, Intuit, SourceForge, NY Times, GILT Groupe, Evite, SugarCRM.
24. Apache Cassandra: column-oriented database/extensible row store; think Row ~= java.util.SortedMap; transaction = update of a row; fast writes = append to a log; tunable reads/writes: consistency vs. latency/availability; extremely scalable; transparent and dynamic clustering; rack and datacenter aware data replication; CQL = SQL-like DDL and DML.
25. Cassandra use cases. Use cases: big data; multiple data center distributed database; persistent cache; (write intensive) logging; high-availability (writes). Who is using it: Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX. "The largest production cluster has over 100 TB of data in over 150 machines." (Cassandra web site)
26. 
Other NoSQL databases (Type: Examples). Extensible columns/Column-oriented: HBase, SimpleDB, DynamoDB. Graph: Neo4j. Key-value: Redis, Membase. Document: CouchDB. http://nosql-database.org/ lists 122+ NoSQL databases.
27. Solution: Use NewSQL. Relational databases with SQL and ACID transactions AND a new and improved architecture; radically better scalability and performance. NewSQL vendors: ScaleDB, NimbusDB, ..., VoltDB.
28. Stonebraker's motivations: "Current databases are designed for 1970s hardware" (Stonebraker: http://www.slideshare.net/VoltDB/sql-myths-webinar). "Significant overhead in logging, latching, locking, B-tree, and buffer management operations" (SIGMOD 08: Through the looking glass: http://dl.acm.org/citation.cfm?id=1376713).
29. About VoltDB: open-source; in-memory relational database; durability through replication, snapshots and logging; transparent partitioning; fast and scalable. "VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 CPU cores" http://www.mysqlperformanceblog.com/2011/02/28/is-voltdb-really-as-scalable-as-they-claim/
30. The future is polyglot persistence, e.g. Netflix: RDBMS, SimpleDB, Cassandra, Hadoop/HBase. IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg
31. Spring Data is here to help, for NoSQL databases: http://www.springsource.org/spring-data
32. Spring Data sub-projects. Various sub-projects: SQL: Spring Data JPA, JDBC extensions; Commons: polyglot persistence; Key-Value: Redis, Riak; Document: MongoDB; Graph: Neo4j; GORM for NoSQL. What you get: wrapper classes analogous to JDBC template; generic Repository; cross-store persistence.
33. Proceed with caution: don't commit to a NoSQL DB until you have done a significant POC; encapsulate your data access code so you can switch; hope that one day you won't need ACID (or complex queries).
34. 
Agenda: Why NoSQL? NewSQL?; Persisting entities; Implementing queries.
35. Food to Go: Place Order use case: (1) Customer enters delivery address and delivery time; (2) System displays available restaurants; (3) Customer picks restaurant; (4) System displays menu; (5) Customer selects menu items; (6) Customer places order.
36. Food to Go: Domain model (partial)
class Restaurant {
    long id;
    String name;
    Set&lt;String&gt; serviceArea;
    Set&lt;TimeRange&gt; openingHours;
    List&lt;MenuItem&gt; menuItems;
}
class TimeRange {
    long id;
    int dayOfWeek;
    int openingTime;
    int closingTime;
}
class MenuItem {
    String name;
    double price;
}
37. Database schema. RESTAURANT table (ID, Name): (1, Ajanta), (2, Montclair Eggshop). RESTAURANT_ZIPCODE table (Restaurant_id, zipcode): (1, 94707), (1, 94619), (2, 94611), (2, 94619). RESTAURANT_TIME_RANGE table (Restaurant_id, dayOfWeek, openTime, closeTime): (1, Monday, 1130, 1430), (1, Monday, 1730, 2130), (2, Tuesday, 1130, ...).
38. How to implement the repository?
public interface AvailableRestaurantRepository {
    void add(Restaurant restaurant);
    Restaurant findDetailsById(int id);
}
Restaurant aggregate: Restaurant, TimeRange, MenuItem.
39. MongoDB: persisting restaurants is easy. Server, Database: Food To Go, Collection: Restaurants. BSON = binary JSON: a sequence of bytes on disk, so fast i/o.
{
    "_id": ObjectId("4bddc2f49d1505567c6220a0"),
    "name": "Ajanta",
    "serviceArea": ["94619", "99999"],
    "openingHours": [
        { "dayOfWeek": 1, "open": 1130, "close": 1430 },
        { "dayOfWeek": 2, "open": 1130, "close": 1430 }
    ]
}
40. 
Using the MongoDB CLI
&gt; r = {name: "Ajanta"}
&gt; db.restaurants.save(r)
&gt; r
{ "_id" : ObjectId("4e555dd9646e338dca11710c"), "name" : "Ajanta" }
&gt; r = db.restaurants.findOne({name: "Ajanta"})
{ "_id" : ObjectId("4e555dd9646e338dca11710c"), "name" : "Ajanta" }
&gt; r.type = "Indian"
&gt; db.restaurants.save(r)
&gt; db.restaurants.update({name: "Ajanta"}, {$set: {name: "Ajanta Restaurant"}, $push: {menuItems: {name: "Chicken Vindaloo"}}})
&gt; db.restaurants.find()
{ "_id" : ObjectId("4e555dd9646e338dca11710c"), "menuItems" : [ { "name" : "Chicken Vindaloo" } ], "name" : "Ajanta Restaurant", "type" : "Indian" }
&gt; db.restaurants.remove({_id: r._id})
41. Spring Data for Mongo code
@Repository
public class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {
    public static String AVAILABLE_RESTAURANTS_COLLECTION = "availableRestaurants";
    @Autowired
    private MongoTemplate mongoTemplate;
    @Override
    public void add(Restaurant restaurant) {
        mongoTemplate.insert(restaurant, AVAILABLE_RESTAURANTS_COLLECTION);
    }
    @Override
    public Restaurant findDetailsById(int id) {
        return mongoTemplate.findOne(new Query(where("_id").is(id)), Restaurant.class, AVAILABLE_RESTAURANTS_COLLECTION);
    }
}
42. Spring Configuration
@Configuration
public class MongoConfig extends AbstractDatabaseConfig {
    @Value("#{mongoDbProperties.databaseName}")
    private String mongoDbDatabase;
    @Bean
    public Mongo mongo() throws UnknownHostException, MongoException {
        return new Mongo(databaseHostName);
    }
    @Bean
    public MongoTemplate mongoTemplate(Mongo mongo) throws Exception {
        MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase);
        mongoTemplate.setWriteConcern(WriteConcern.SAFE);
        mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION);
        return mongoTemplate;
    }
}
43. Cassandra data model. A Keyspace contains Column Families; a Column Family contains rows identified by a Row Key; each row contains columns of (Column Name, Column Value, Timestamp). Example: K1: (N1, V1, TS1), (N2, V2, TS2), (N3, V3, TS3); K2: (N1, V1, TS1), (N2, V2, TS2), (N3, V3, TS3). Column name/value: number, string, Boolean, timestamp, and composite.
44. 
Cassandra inserting/updating data. Column Family before: K1: (N1, V1, TS1), (N2, V2, TS2), (N3, V3, TS3). CF.insert(key=K1, (N4, V4, TS4), ...) is idempotent and is the unit of transaction. Column Family after: K1: (N1, V1, TS1), (N2, V2, TS2), (N3, V3, TS3), (N4, V4, TS4).
45. Cassandra retrieving data. Column Family: K1: (N1, V1, TS1), (N2, V2, TS2), (N3, V3, TS3), (N4, V4, TS4). CF.slice(key=K1, startColumn=N2, endColumn=N4) returns K1: (N2, V2, TS2), (N3, V3, TS3), (N4, V4, TS4). Cassandra has secondary indexes but they aren't helpful for these use cases.
46. Option #1: Use a column per attribute. Column name = path/expression to access property value. Column Family: RestaurantDetails. Row 1: name = Ajanta; type = indian; serviceArea[0] = 94619; serviceArea[1] = 94707; openingHours[0].dayOfWeek = Monday; openingHours[0].open = 1130; openingHours[0].close = 1430. Row 2: name = Egg shop; type = Break Fast; serviceArea[0] = 94611; serviceArea[1] = 94619; openingHours[0].dayOfWeek = Monday; openingHours[0].open = 0830; openingHours[0].close = 1430.
47. Option #2: Use a single column. Column value = serialized object graph, e.g. JSON. Column Family: RestaurantDetails. Row 1: attributes: { name: Ajanta, ... }. Row 2: attributes: { name: Eggshop, ... } (also shown as { name: Montclair Eggshop, ... }).
48. Cassandra code (CassandraTemplate is a home grown wrapper class)
public class AvailableRestaurantRepositoryCassandraKeyImpl implements AvailableRestaurantRepository {
    @Autowired
    private CassandraTemplate cassandraTemplate;
    public void add(Restaurant restaurant) {
        cassandraTemplate.insertEntity(keyspace, RESTAURANT_DETAILS_CF, restaurant);
    }
    public Restaurant findDetailsById(int id) {
        String key = Integer.toString(id);
        return cassandraTemplate.findEntity(Restaurant.class, keyspace, key, RESTAURANT_DETAILS_CF);
    }
}
http://en.wikipedia.org/wiki/Hector
49. Using VoltDB: use the original schema; standard SQL statements; BUT YOU MUST: write stored procedures and invoke them using a proprietary interface; partition your data.
50. 
About VoltDB stored procedures: key part of VoltDB; replication = executing stored procedure on replica; logging = log stored procedure invocation; stored procedure invocation = transaction.
51. About partitioning. RESTAURANT table with partition column ID (ID, Name): (1, Ajanta), (2, Eggshop).
52. Example cluster. VoltDB Server 1: Partition 1a (1, Ajanta) and Partition 3b (..). VoltDB Server 2: Partition 2a (2, Eggshop) and Partition 1b (1, Ajanta). VoltDB Server 3: Partition 3a (..) and Partition 2b (2, Eggshop).
53. Single partition procedure: FAST. SELECT * FROM RESTAURANT WHERE ID = 1. High-performance lock free code. [Diagram: RESTAURANT partitions on VoltDB Server 1, VoltDB Server 2, VoltDB Server 3.]
54. Multi-partition procedure: SLOWER. SELECT * FROM RESTAURANT WHERE NAME = 'Ajanta'. Communication/coordination overhead. [Diagram: RESTAURANT partitions on VoltDB Server 1, VoltDB Server 2, VoltDB Server 3.]
55. Chosen partitioning scheme. Performance is excellent: much faster than MySQL.
56. 
Stored procedure AddRestaurant
@ProcInfo(singlePartition = true, partitionInfo = "Restaurant.id: 0")
public class AddRestaurant extends VoltProcedure {
    public final SQLStmt insertRestaurant = new SQLStmt("INSERT INTO Restaurant VALUES (?,?);");
    public final SQLStmt insertServiceArea = new SQLStmt("INSERT INTO service_area VALUES (?,?);");
    public final SQLStmt insertOpeningTimes = new SQLStmt("INSERT INTO time_range VALUES (?,?,?,?);");
    public final SQLStmt insertMenuItem = new SQLStmt("INSERT INTO menu_item VALUES (?,?,?);");
    public long run(int id, String name, String[] serviceArea, long[] daysOfWeek, long[] openingTimes, long[] closingTimes, String[] names, double[] prices) {
        voltQueueSQL(insertRestaurant, id, name);
        for (String zipCode : serviceArea)
            voltQueueSQL(insertServiceArea, id, zipCode);
        for (int i = 0; i &lt; daysOfWeek.length; i++)
            voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]);
        for (int i = 0; i &lt; names.length; i++)
            voltQueueSQL(insertMenuItem, id, names[i], prices[i]);
        voltExecuteSQL(true);
        return 0;
    }
}
57. VoltDb repository add() (the helper flattens the Restaurant into arrays)
@Repository
public class AvailableRestaurantRepositoryVoltdbImpl implements AvailableRestaurantRepository {
    @Autowired
    private VoltDbTemplate voltDbTemplate;
    @Override
    public void add(Restaurant restaurant) {
        invokeRestaurantProcedure("AddRestaurant", restaurant);
    }
    private void invokeRestaurantProcedure(String procedureName, Restaurant restaurant) {
        Object[] serviceArea = restaurant.getServiceArea().toArray();
        long[][] openingHours = toArray(restaurant.getOpeningHours());
        Object[][] menuItems = toArray(restaurant.getMenuItems());
        voltDbTemplate.update(procedureName, restaurant.getId(), restaurant.getName(), serviceArea, openingHours[0], openingHours[1], openingHours[2], menuItems[0], menuItems[1]);
    }
}
58. 
VoltDbTemplate wrapper class (Client is the VoltDB client API)
public class VoltDbTemplate {
    private Client client;
    public VoltDbTemplate(Client client) {
        this.client = client;
    }
    public void update(String procedureName, Object... params) {
        try {
            ClientResponse x = client.callProcedure(procedureName, params);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
59. VoltDb server configuration. Project: Food To Go. Deployment: ... sitesperhost="5" kfactor="0" /&gt; ... Commands: voltcompiler target/classes src/main/resources/sql/voltdb-project.xml foodtogo.jar; bin/voltdb leader localhost catalog foodtogo.jar deployment deployment.xml
60. Performance. Benchmarking is still work in progress but so far: http://www.youtube.com/watch?v=b2F-DItXtZs. Results (Mongo / Cassandra / VoltDB): Insert for PK: Awesome / Fast* / Awesome; Find by PK: Awesome / Fa...</p>
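The "Proceed with caution" slide advises encapsulating data access code so the database can be switched. A minimal sketch of that idea follows, assuming a simplified Restaurant class and a hypothetical in-memory implementation standing in for the MongoDB, Cassandra, and VoltDB implementations shown in the slides. Only the `AvailableRestaurantRepository` interface and its two methods come from the talk; `InMemoryAvailableRestaurantRepository` and `RestaurantService` are names invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the talk's Restaurant aggregate (assumption: fewer fields).
final class Restaurant {
    final int id;
    final String name;
    final List<String> serviceArea;

    Restaurant(int id, String name, List<String> serviceArea) {
        this.id = id;
        this.name = name;
        this.serviceArea = new ArrayList<>(serviceArea); // defensive copy
    }
}

// The repository interface from the slides: the only type application code sees.
interface AvailableRestaurantRepository {
    void add(Restaurant restaurant);
    Restaurant findDetailsById(int id);
}

// Hypothetical implementation backed by a map; a MongoDB, Cassandra, or VoltDB
// implementation could be substituted without touching the callers.
final class InMemoryAvailableRestaurantRepository implements AvailableRestaurantRepository {
    private final Map<Integer, Restaurant> restaurantsById = new HashMap<>();

    @Override
    public void add(Restaurant restaurant) {
        restaurantsById.put(restaurant.id, restaurant);
    }

    @Override
    public Restaurant findDetailsById(int id) {
        return restaurantsById.get(id); // null when the id is unknown
    }
}

// Hypothetical application service: depends on the interface, not on a database.
final class RestaurantService {
    private final AvailableRestaurantRepository repository;

    RestaurantService(AvailableRestaurantRepository repository) {
        this.repository = repository;
    }

    String nameOf(int id) {
        Restaurant restaurant = repository.findDetailsById(id);
        return restaurant == null ? null : restaurant.name;
    }
}
```

Because `RestaurantService` receives the repository through its constructor, swapping the storage technology comes down to constructing it with a different implementation, which is also convenient for tests that should not need a running database.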