what comes after nosql? newsql: a new era of challenges in

103
What Comes After NoSQL? NewSQL: A New Era of Challenges in DBMS Scalable Data Processing Júlio Alcântara Tavares, Angelo Brayner, José Maria Monteiro Departamento de Computação Universidade Federal do Ceará - UFC Short Course October 2016

Upload: dinhduong

Post on 10-Dec-2016

238 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: What Comes After NoSQL? NewSQL: A New Era of Challenges in

What Comes After NoSQL? NewSQL: A New Era of Challenges in DBMS Scalable

Data Processing

Júlio Alcântara Tavares, Angelo Brayner, José Maria MonteiroDepartamento de Computação

Universidade Federal do Ceará - UFC

Short CourseOctober 2016

Page 2: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Authors

Page 3: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Júlio Alcântara Tavares• Email: [email protected]

• Short Bio:• Senior Database Administrator (DBA) and Computer Science Professor.

Bachelor of Science (B.Sc.) Degree in Computer Science. Postgraduate in Software Engineering and in Computer Networks. Master of Science (M.Sc.). PhD Candidate - Federal University of Ceará (UFC). Research Fields: DBMSs Core Technologies and algorithms, performance techniques for novel storage media. Advanced skills in Database Management Systems Core Concepts and Implementation. Technical certifications: ORACLE DATABASE, IBM (SOA, DB2, Cloud Computing) and EXIN (ITIL).

Authors

Page 4: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Angelo Brayner• Email: [email protected]

• Short Bio:• Received the MSc degree in Computer Science from the State

University of Campinas (UNICAMP), Brazil, in 1994. In 1999 he received the PhD degree from the University of Kaiserslautern, Germany, working in the field of Transaction Management in Multidatabase systems. He works at Federal University of Ceará (UFC), Brazil, as full professor and researcher. His current research interests include high performance transaction systems, data management in wireless sensor networks, database self-tuning and solid-state-memory-aware database systems;

Authors

Page 5: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• José Maria Monteiro • Email: [email protected]

• Short Bio:• Received the MSc degree in Computer Science from the Federal

University of Ceará (UFC), Brazil, in 2001. In 2008 he received the PhD degree from the Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Brazil, working in the field of Self-managed and Autonomic Databases. He has been with the Federal University of Ceará (UFC), Brazil, since 2010 as full professor and researcher at ARIDA (Advanced Research In DAtabase) research group. He has published more than 30 papers on inter- national journals and conference proceedings and has coordinated several research and development projects. His current research interests include autonomic databases, cloud databases and big data. José Maria Monteiro has served as the Computer Science Department’s Chair (2012-2016);

Authors

Page 6: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Agenda

Page 7: What Comes After NoSQL? NewSQL: A New Era of Challenges in

1. Introduction and Motivation

2. Background1. The Big Data Scenario

2. Database Transaction Models

3. Transaction Isolation Levels

4. The CAP Theorem

5. ACID x BASE

6. Data Consistency in Replicated Databases

Agenda

3. NoSQL Databases1. Key Features

2. Categories

3. NoSQL Examples

4. Comparison

4. NewSQL Databases1. Definition and

Characteristics

2. Categories

3. Case Study

4. Overall Comparison

Page 8: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Introduction and Motivation

Page 9: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Introduction - Thinking Extreme Data

Page 10: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Recently…

• New massive data-centric applications (social networks, IoT, etc);

• Relacional databases are not suitable for this kind of applications;

• New data store systems - NoSQL:

• Provide quite eficiente horizontal sacalability for simple read/writedatabase operations on distributed servers;

Introduction

Page 11: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• 6 key features of NoSQL systems:

1. The ability to horizontally scale;

2. The ability to replicate and to distribute data (data partitivo);

3. Support of a simple call level interface (instead of SQL);

4. The use of more permissive concurrency control mechanisms;

5. Efficient approaches of distributed indexes and the use of main memory to store data;

6. The ability to dynamically add new attributes to data records (schemeless);

Introduction

Page 12: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Is NoSQL the silver bullet?

• NoSQL systems generally do not provide ACID transactional properties;

• Thus, updates are eventually propagated and there are limited guarantees on data consistency;

• Some applications require a stronger consistency level;

• NewSQL systems:

• Try to offer the best of both worlds (Relational and NoSQL):

• To preserve ACID properties, one of the main weakness of NoSQL systems, and

• To reach scalable performance;

• The key idea is to present scalable NoSQL performance, but avoiding ACID idiosyncrasiespresented by relacional database systems;

Introduction

Page 13: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Objectives

Page 14: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• This (short) course aims to:

• Present the evolution of the main databases and data processing paradigms, over the past decades;

• Show which databases techniques have been used to improve data management in NewSQL paradigm;

• Discuss how NewSQL technologies can provide efficient data access and query processing in scalable and distributed DBMSs environments;

• Discuss and compare NewSQL, NoSQL and Relational DBMSs behavior, regarding the most critical characteristics presented by ACID properties;

• Show practical examples and benchmarks, using the most important free and commercial (Relational, NoSQL, NewSQL) DBMSs;

Objectives

Page 15: What Comes After NoSQL? NewSQL: A New Era of Challenges in

1. Introduction and Motivation

2. Background1. The Big Data Scenario

2. Database Transaction Models

3. Transaction Isolation Levels

4. The CAP Theorem

5. ACID x BASE

6. Data Consistency in Replicated Databases

Agenda

3. NoSQL Databases1. Key Features

2. Categories

3. NoSQL Examples

4. Comparison

4. NewSQL Databases1. Definition and

Characteristics

2. Categories

3. Case Study

4. Overall Comparison

Page 16: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Background

Page 17: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• In the last decade, the industry, in nearly every sector of the economy, has moved to a data-driven world.

• There has been an explosion in data volume, which, in turn, induced the need for richer and more flexible data storing and processing systems.

The Big Data Scenario

Page 18: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• The database technology is currently at an unprecedented and exciting inflection point.

• On one hand, the need for data storing and processing systems has never been higher than the current level, on the other hand the number of new data management products that are available has exploded over the recent years [Floratou et al. 2012].

The Big Data Scenario

Page 19: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Figure 1: The Five V´s of Big Data.

Page 20: What Comes After NoSQL? NewSQL: A New Era of Challenges in

The Five V's of Big Data

Page 21: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Inherently a transaction is characterized by four properties (commonly referred as ACID):• Atomicity: all of the operations in the

transaction will complete, or none will.

• Consistency: transactions never observe or result in inconsistent data.

• Isolation: the transaction will behave as if it is the only operation being performed.

• Durability: upon completion of the transaction, the operation will not be reversed, but will be persistent.

Database Transaction Models

Figure 2: ACID properties details.

Page 22: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• The Serializability Theory:• A schedule S is correct if it is serial or conflict serializable [Bernstein:1986];

• A correct schedule preserves the ACID properties;

• Serializability is ensured by the two-phase locking (2PL) protocol [Gray:1976];

• Decreases the concurrency degree (that is, the interleaving between transaction operations);

• The multiversion concurrency control (MVCC) protocol;

• Try to increase the concurrency degree;

Database Transaction Models

Page 23: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• An isolation level is the degree in which the execution of a given transaction is isolated from all other concurrent transactions.

• Real-world applications usually require a compromise between data consistency and concurrency degree.

• A more restrictive isolation level (e. g., serializable) enables a higher data consistency degree (avoiding undesirable phenomena like dirty reads or lost updates).

• On the other hand, by relaxing the isolation level, one could increase transaction concurrency degree (and transaction throughput) for the price of reducing data consistency degree.

• It is important to note that isolation levels are implemented by a locking mechanism which delays the execution of database operations to ensure a given isolation level.

Transactions Isolation Levels

Page 24: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Transactions Isolation Levels

Table 1: Consistency Levels and Locking ANSI-92 Isolation Levels.

Page 25: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Theoretically, there are three options for a shared-data distributed system:• Forfeit Partition Tolerance: Eric Brewer names 2-Phase-Commit (2PC) as a trait of this

option, since 2PC supports temporarily partitions (node crashes, lost messages) by waiting until all messages are received.

• Forfeit Consistency: In case of partition data can still be used, but since the nodes cannot communicate with each other there is no guarantee that the data is consistent. It implies optimistic locking and inconsistency resolving protocols.

• Forfeit Availability: Data can only be used if its consistency is guaranteed. This implies pessimistic locking, since we need to lock any updated object until the update has been propagated to all nodes. In case of a network partition it might take quite long until the database is in a consistent state again, thus we cannot guarantee high availability anymore.

The CAP Theorem

Page 26: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• The CAP theorem states that it is impossible for a distributed computer system (or any shared-data system) to simultaneously provide all three of the following guarantees:• Consistency: includes all modifications on data

that must be visible to all clients once they are committed. At any given point in time, all clients can read the same data.

• Availability: means that all operations on data, whether read or write, must end with a response within a specified time;

• Partition Tolerance: means that even in the case of components’ failures, operations on the database must continue;

The CAP Theorem

Figure 3: CAP Theorem characteristics.

Page 27: What Comes After NoSQL? NewSQL: A New Era of Challenges in

ACID x BASE

• Atomicity• Consistency• Isolation• Durability

• BasicallyAvailable (CP)

• Soft-state• Eventually

consistent (AP)

Figure 4: ACID x BASE Comparison [Brewer:2000].

Page 28: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• The term BASE [Brewer:2000]:• Basically Available: There will be a response to any request. But, this response may

“fail” (no result is returned), or the data may be in an inconsistent or changing state.

• Soft state: The state of the system may change over time, thus the state of the system is always "soft'".

• Eventual consistency: The system will eventually become consistent (at some time in the future) once it stops receiving input.

ACID x BASE

Page 29: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• A replicated database is a special case of distributed database, in which multiple copies of the data items are stored in different servers interconnected by a communication network.

• The notion of replicated database has been widely used as a strategy to improve data availability and to maximize transaction throughput (number of committed transactions per second).

• Main Replication Architectures• Master/Slave

• Master/Master (Multi-Master)

Data Consistency in Replicated DBMSs

Page 30: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Data Consistency in Replicated DBMSs

Figure 5: Master-Slave Architecture: All writes are performed in the master and are propagated to the slaves node.

Figure 6: Master-Master Architecture: Write operations can be performed in more than one master node.

Page 31: What Comes After NoSQL? NewSQL: A New Era of Challenges in

1. Introduction, Motivation and Objectives

2. Background1. The Big Data Scenario

2. Database Transaction Models

3. Transaction Isolation Levels

4. The CAP Theorem

5. ACID x BASE

6. Data Consistency in Replicated Databases

Agenda

3. NoSQL Databases1. Key Features

2. Categories

3. NoSQL Examples

4. Comparison

4. NewSQL Databases1. Definition and

Characteristics

2. Categories

3. Case Study

4. Overall Comparison

Page 32: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NoSQL Databases

Page 33: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NoSQL Key Features

• The ability to horizontally scale I/O operation (read/write) throughput distributed on several servers;

• The ability to replicate and to distribute data over a high number of servers (data partition);

• Support of a simple call level interface or protocol (instead of the SQL language);

• CAP Theorem

• Rejection of fixed table schema and joins operations

• Breed of non-relational database products

Page 34: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NoSQL Key Features

• The use of more permissive concurrency control mechanisms than the ACID transaction model, implemented by most RDBMS;

• Efficient approaches of distributed indexes and the use of main memory to store data;

• The ability to dynamically add new attributes to data records (schemaless).

• Developer Driven

Page 35: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Key-Value Stores

• Document Stores

• Column Family Stores

• Graph Databases

NoSQL Categories

Figure 9: Document Storage Example.

Figure 7: Row Store x Column Store Design.

Figure 8: Key/Value Storage Example.

Page 36: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Key-Value Stores

• Document Stores

• Column Family Stores

• Graph Databases

NoSQL Categories

Figure 11: Graph Relationships Example.

Figure 10: Column Family Layout Example.

Page 37: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NoSQL ExampleMongoDB

Page 38: What Comes After NoSQL? NewSQL: A New Era of Challenges in

MongoDB

A P

C

• Characteristics• General purpose database, almost as fast as

the key:value NoSQL type.

• High availability.

• Scalability (from a standalone server to distributed architectures of huge clusters).

• Aggregation: batch data processing and aggregate calculations using native MongoDB operations.

Page 39: What Comes After NoSQL? NewSQL: A New Era of Challenges in

MongoDB• Characteristics

• Load Balancing

• The balancer decides when to migrate the data and the destination Shard, so they are evenly distributed among all servers in the cluster.

• Native Replication: syncing data across all the servers at the replica set.

• Automatic failover: automatic election of a new primary when it has gone down.

• Zero downtime upgrades.

• NoSql Characteristics?

- Strong Consistency by Default

- $lookup in the newest version: perform joins?

A P

C

Page 40: What Comes After NoSQL? NewSQL: A New Era of Challenges in

MongoDB – Document Modeling

Relational DBMS MongoDB

Database Database

Table, Vision Collection

Row/Tuple Document (JSON, BSON)

Column(Rigid Schema)

Field(Flexible Schema)

Index Index

Join Embedded Document

Foreign Key Reference

Partition Sharding

> db.user.findOne({idade:25}){

"_id" : ObjectId(“1654e0bd42…"),“first" : “Bia",“last" : “Souza",“age" : 25,

“hobby" : [“Planes",“Programming” ] ,

“recommendations": { “color": “blue", “sport": “Soccer"}

}

Page 41: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NoSQL Categories Comparison

Page 42: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NoSQL Categories - Comparison

Datamodel Performance Scalability Flexibility Complexity Functionality

Key-value store High High High None Variable (None)

Column Store High High Moderate Low Minimal

Document Store High Variable (High) High Low Variable (Low)

Graph Database Variable Variable High High Graph Theory

Table 2: NoSQL categories and respective features comparison.

Page 43: What Comes After NoSQL? NewSQL: A New Era of Challenges in

1. Introduction, Motivation and Objectives

2. Background1. The Big Data Scenario

2. Database Transaction Models

3. Transaction Isolation Levels

4. The CAP Theorem

5. ACID x BASE

6. Data Consistency in Replicated Databases

Agenda

3. NoSQL Databases1. Key Features

2. Categories

3. NoSQL Example

4. Comparison

4. NewSQL Databases1. DBMSs Evolution

2. Definition and Characteristics

Page 44: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NewSQL Databases

Page 45: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Early 2000s• Sharding Middleware

• Combine multiple nodes together into a logical database• Route (or rewrite) queries to correct node

• Characteristics

• Combine multiple nodes together into a logical database• Route (or rewrite) queries to correct node

• Problems• Too much effort writing middleware rather than working on core

applications

• As a result, there are many features implemented in application code

DBMSs Evolution

Page 46: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Late 2000s• NoSQL

• Characteristics:• Priorize high availability and high scalability, but forget transactional

guarantees

• Non-relational data models• Problems

• It is necessary to write code to handle eventually consistent data

• Lack of transactions• Lack of joins

• Not all apps can give up transactional properties

DBMSs Evolution

Page 47: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• SQL• Set-based Data Access• Standard, Powerful, Versatile language• Petabytes of existing data• $$ Multi-Billion application investments• Tools, Skills, Business Processes

• ACID Transactions • Guarantees of data consistency & durability• Coherent concurrency model• Ease of programming

• Data/Application Independence• Manage data independently of the application• Minimize application responsibility for data management

But SQL is Cool !

Page 48: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• There are some possible solutions:• Buy High End Technology

• Application level sharding

• Control transactions consistency in application

• Use Traditional SQL• Use NoSQL

• Use NewSQL

In this Context, when Thinking in Fast Data ...

Page 49: What Comes After NoSQL? NewSQL: A New Era of Challenges in

The term NewSQL is

confusing and not very well-defined.

NewSQL Definition

Page 50: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Systems that deliver the scalability and flexibility promised by NoSQL while retaining the support for SQL queries and/or ACID, or to improve performance for appropriate workloads.

NewSQL Definition

Matt Asslet – 451 Group (April 4th, 2011)https://www.451research.com/report-short?entityId=66963

Page 51: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Matt Aslett, research director at 451 Research, coined the term NewSQL about five years ago, although in an interview he said defining the new category was “kind of accidental.” At the time, the idea was to recognize a set of vendors that were taking the best aspects of a SQL database and designing new products for modern architecture, specifically cloud architecture.

NewSQL Definition

Page 52: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• A more narrow definition for NewSQL system’s implementations: • Use a lock-free concurrency control

scheme

• Use shared-nothing distributed architecture

NewSQL Definition

Michael Stonebraker – Blog@CACM (June 16th, 2011)http://cacm.acm.org/blogs/blog-cacm/109710

Page 53: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Stonebraker Paper• SQL as the primary interface

• ACID support for transactions

• Non-Locking concurrency control

• High per-node performance

• Scalable, shared nothing architecture

NewSQL Characteristics

Michael Stonebraker – Blog@CACM (June 16th, 2011)http://cacm.acm.org/blogs/blog-cacm/109710

Page 54: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Wikipedia Article• NewSQL is a class of modern

relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.

NewSQL Characteristics

Wikipedia (October 19th, 2013)https://en.wikipedia.org/wiki/NewSQL

Page 55: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Applications targeted by NewSQL DBMSs are:• Short-lived (i.e., no user stalls)

• Touch a small subset of data using index lookups (i.e., no full table scans or large distributed joins)

• Are repetitive (i.e., executing the same queries with different inputs)

NewSQL Characteristics

Page 56: What Comes After NoSQL? NewSQL: A New Era of Challenges in
Page 57: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Case Study

Why NewSQL Databases

can be much faster than conventional databases?

Let’s hear from

Dr. Michael Stonebraker! (~3 Min Video)

Page 58: What Comes After NoSQL? NewSQL: A New Era of Challenges in
Page 59: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Shared Nothing Architecture

Page 60: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Storage Perspective

Figure 12: Storage perspective regarding Sql, NoSQL and NewSQL.

Page 61: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• New Approaches (New Design)• VoltDB

• NuoDB

• Google Spanner

• Memsql

• Clustrix

• SAP Hana

NewSQL Categories

• New Storage Engines• TokuDB

• ScaleDB

• Transparent Clustering (Middleware)• ScaleBase

• ScaleArc

• dbShards

Page 62: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Distributed Concurrency Control/Locking Protocols

• 2PL

• MVCC

• 2PL + MVCC

• TO

• (Among others…)

• Main Memory Systems

• Hybrid Architectures

• Query Code Compilation

NewSQL Core Technologies

Page 63: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Comparison

Table 3: Properties Comparison against Traditional Sql, NoSQL and NewSQL

Page 64: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NewSQL DBMSs Examples

Page 65: What Comes After NoSQL? NewSQL: A New Era of Challenges in

TimesTenOracle Main Memory Relational Database

Page 66: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Characteristics• Oracle Main Memory DBMS

• Relational database management system with persistence and recoverability

• Originally designed and implemented at Hewlett-Packard labs in Palo Alto/California, TimesTen was spun out into a separate startup in 1996 and acquired by Oracle Corporation in 2005

• Can be used Stand Alone as a cache layer in Oracle DBMS

• TimesTEN is “Near” NewSQL

Oracle TimesTEN

Page 67: What Comes After NoSQL? NewSQL: A New Era of Challenges in

TimesTEN – Architecture

Figure 15: TimesTEN overall main memory architecture.

Page 68: What Comes After NoSQL? NewSQL: A New Era of Challenges in

TimesTEN – Architecture

Figure 16: TimesTEN replication architecture.

Page 69: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Cache Groups – Configure as a Oracle Cache Layer• Write-through cache directs write I/O onto cache and through to

underlying permanent storage before confirming I/O completion to the host. • This ensures data updates are safely stored on, for example, a shared

storage array, but has the disadvantage that I/O still experiences latencybased on writing to that storage.

• Write-through cache is good for applications that write and then re-read data frequently as data is stored in cache and results in low read latency.

Oracle TimesTEN

Page 70: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Cache Groups – Configure as a Oracle Cache Layer• Write-around cache is a similar technique to write-through cache,

but write I/O is written directly to permanent storage, bypassing the cache.

• This can reduce the cache being flooded with write I/O that will not subsequently be re-read, but has the disadvantage is that a read request for recently written data will create a “cache miss” and have to be read from slower bulk storage and experience higher latency.

Oracle TimesTEN

Page 71: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Cache Groups – Configure as a Oracle Cache Layer• Write-back cache is where write I/O is directed to cache and completion is

immediately confirmed to the host.

• This results in low latency and high throughput for write-intensive applications, but there is data availability exposure risk because the only copy of the written data is in cache.

• Suppliers have added resiliency with products that duplicate writes. Users need to consider whether write-back cache solutions offer enough protection as data is exposed until it is staged to external storage.

• Write-back cache is the best performing solution for mixed workloads as both read and write I/O have similar response time levels.

Oracle TimesTEN

Page 72: What Comes After NoSQL? NewSQL: A New Era of Challenges in

VoltDB

Page 73: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Characteristics• Performance and Scale without Sacrificing Consistency • In-memory storage • Stored procedure interface, async/sync procedure execution • Serializing all data access• Horizontal partitioning• Multi-master replication (“k-safety”)• Snapshots + Command Logging• K-Safety

• Without K-Safety, any node fail break the whole DB• Snapshot and shutdown minor segments during network partitions• Single-partition transactions are very fast• Multi-partition transactions are slower (manager)• Main Memory Table Lock

VoltDB

Page 74: What Comes After NoSQL? NewSQL: A New Era of Challenges in

VoltDB – Data Architecture

Figure 13: VoltDB data architecture.

Page 75: What Comes After NoSQL? NewSQL: A New Era of Challenges in

VoltDB – Internal Architecture

Figure 14: VoltDB internal architecture.

Page 76: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NuoDB

Page 77: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Characteristics• Designed to deliver traditional enterprise database services on modern

public/private cloud infrastructures• Commodity Hardware

• Rethink some traditional RDBMS concepts to achieve scalability without sacrifice ACID properties

• Architecture Design Motivation• Every SQL RDBMS to date has adopted the 1970s idea of central

control• On those systems going faster requires a faster machine • Tweaking and optimizing the old architectures provides marginal

benefit• Elastic scalability requires a comprehensive rethink• Distinct Components: Storage Manager x Transaction Manager

NuoDB

Page 78: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Sql Layer (Parser, Optimizer, Query Executor)• Traditional RDBMS

• Monolithic • Synchronous• Scape Up

• NuoDB• Peer to Peer (P2P)• Asynchronous• Distributed• Scalable

NuoDB

Page 79: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Elasticity

• Scale-out and scale-in according to demand

• Virtualization

• Efficiently & dynamically run many databases on many heterogeneous commodity machines

• Extreme Availability

• Resilience to infrastructure failure, rolling upgrades, dynamics changes to database structure

• Geo-distribution

• Run “hot/hot” in multiple datacenters simultaneously, with locality of access and datacenter failover

• “Zero” DBA

• No backups, minimal performance tuning, automated everything

SQL ACID ≠ Elastic Cloud

Page 80: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NuoDB – Overall Architecture

Management

Transaction

Handling

Storage

•Three tiers • Each is independent• Platform Agnostic• Extensible at various point

Page 81: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NuoDB – Physical Architecture• Transaction Server

• Machine/VM running transaction managers

• Archive Server• Machine/VM running archive managers

• Agent• Admin daemons runs on all machines/VMs

• Broker• Runs on any NuoDB Machine/VM

Page 82: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NuoDB – Overall Architecture

Console Broker Transaction Engine

Storage Manager

Agent Agent

Page 83: What Comes After NoSQL? NewSQL: A New Era of Challenges in

NuoDB Architecture – Scale Out

ManagementTransaction

HandlingStorage

Console Broker

Transaction Engine

Storage Manager

Agent

Agent

SAN

Transaction Engine

Agent

Storage Manager HDFS

Transaction Engine

Transaction Engine

Agent

Agent

Broker

Agent

Page 84: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Case Study

Page 85: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Features/Performance Comparison• Traditional SQL

• Oracle

• NoSQL

• MongoDB

• NewSQL

• Oracle TimesTEN

• VoltDB

• NuoDB

NewSQL DBMSs – Case Study

Page 86: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Let´s See in Action!

NewSQL DBMSs – Case Study

Page 87: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Overall Comparison

Page 88: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Year Released

Main Memory Storage

Partitioning Concurrency Control

Replication Summary

Clustrix [6] 2006 No Yes MVCC+2PL Strong+Passive MySQL-compatible DBMS that supports shared-nothing, distributed execution.

CockroachDB[7] 2014 No Yes MVCC Strong+Passive Built on top of distributed key/value store. Uses softwares hybrid clocks for WAN replication.

Google Spanner[24] 2012 No Yes MVCC+2PL Strong+Passive WAN-replicated, shared-nothing DBMS that uses special hardware for timestamp generation.

H-Store[8] 2007 Yes Yes TO Strong+Passive Single-threaded execution engines per partition. Optimized for stored procedures.

Hy-Per[9] 2010 Yes Yes MVCC Strong+Passive HTAP DBMS that uses query compilation aand memory eficiente indexes.

MemSQL[11] 2012 Yes Yes MVCC Strong+Passive Distributed, shared-nothing DBMS using compiled queries. Supports MySQL wire protocol.

NuoDB[14] 2013 Yes Yes MVCC Strong+Passive Split architecture with multiple in-memory executor nodes and a single shared storage node.

SAPHANA[55] 2010 Yes Yes MVCC Strong+Passive Hybrid storage(rows + cols).Amalgamation of previus TREX, P*TIME, and MaxDB systems.

VoltDB[17] 2008 Yes Yes TO Strong+Active Single-threaded execution engines per partition.Supports streaming operators.

AgilData[1] 2007 No Yes MVCC+2PL Strong+Passive Shared-nothing database sharding over single-node MySQL instances.

MariaDB MaxScale[10] 2015 No Yes MVCC+2PL Strong+Passive Query router that supports custom SQL rewriting. Relies on MySQL cluster for coordination

ScaleArc[10] 2009 No Yes Mixed Strong+Passive Rule-based query router for MySQL, SAQL server and Oracle.

Amazon Aurora[3] 2014 No No MVCC Strong+Passive Custom log-structured MySQL engine for RDS.

ClearDB[5] 2010 No No MVCC+2PL Strong+Active Centralize router that mirrors a single-node MySQL, instance in multiple data centers.

NE

W A

RC

HIT

EC

TU

RE

SM

IDD

LE

WA

RE

DB

AA

S

Page 89: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Research Opportunities

Page 90: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Fine-grain Elasticity

• Hybrid Architectures

• Larger than Memory DBs• Ex.: Anti-cache

• Use Machine learning to: • Understand and improve concurrency related topics (e.g. intra&inter transaction

dependencies) in NewSQL technologies• Understand database data access patterns

• Many-Core Concurrency

• Ability to execute analytical queries and machine learning algorithms on freshly obtained data

Research Opportunities

Page 91: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Conclusion and Final Remarks

Page 92: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• Growing data volume requires ever more efficient ways to store and process it

• Hard to pick one because they're not on a common scale

• NewSQL is an established trend with a number of options

• No silver bullet

• Most NewSQL systems are using known ideas to achieve high-performance for OLTP workloads

Conclusion and Final Remarks

Page 93: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• The longer-term question is whether NewSQL will survive as a database category.

• What are your true requirements, and what are the nice-to-haves?

• NewSQL database systems are not a radical departure from existing system architectures but rather represent the next chapter in the continuous development of database technologies?

• Nothing significantly new about the core concurrency control schemes in NewSQL systems other than laudable engineering to make these algorithms work well in the context of modern hardware and distributed operating environments.

Conclusion and Final Remarks

Page 94: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Download Link

Page 95: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Download Link

Fell Free to Download Our Slides, Scripts and

Running Examples:https://goo.gl/IHQxJA

Page 96: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Acknowledgements

Page 97: What Comes After NoSQL? NewSQL: A New Era of Challenges in

References

Page 98: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• [Adya et al. 2000] Adya, A., Liskov, B., ONeil, B., and ONeil, P. (2000). Generalized Isolation Level Definitions. In proceedingsof the IEEE International Conference on Data Engineering, San Diego, CA, March. [ANSI 1992] ANSI (1992). ANSI X3.135-1992.ANSI/ISO.

• [Ãetintemel et al. 2014] Ãetintemel, U., Du, J., Kraska, T., Madden, S., Maier, D. Meehan, J., Pavlo, A., Stonebraker, M.,Sutherland, E., Tatbul, N., Tufte, K., Wang, H., and Zdonik, S. B. (2014). S-Store: A Streaming NewSQL System for Big VelocityApplications. PVLDB, 7(13):1633–1636.

• [Apache 2016a] Apache (2016a). Apache cassandra. http://cassandra.apache.org. Accessed at September 2016.

• [Apache 2016b] Apache (2016b). Apache hbase. http://wiki.apache.org/hadoop/Hbase. Accessed at September 2016.

• [Arulraj et al. 2015] Arulraj, J., Pavlo, A., and Dulloor, S. R. (2015). Let’s talk about storage recovery methods for non-volatilememory database systems. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data,SIGMOD ’15, pages 707–722, New York, NY, USA. ACM.

• [Berenson et al. 1995] Berenson, H., Bernstein, P., Gray, J., Melton, J., O’Neil, E., and O’Neil, P. (1995). A critique of ansi sqlisolation levels. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, SIGMOD ’95,pages 1–10, New York, NY, USA. ACM.

• [Bernstein and Goodman 1983] Bernstein, P. A. and Goodman, N. (1983). Multiversion concurrency control;theory andalgorithms. ACM Trans. Database Syst., 8(4):465–483.

References

Page 99: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• [Bernstein et al. 1986] Bernstein, P. A., Hadzilacos, V., and Goodman, N. (1986). Concurrency Control and Recovery in DatabaseSystems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.

• [Brewer 2000] Brewer, E. A. (2000). Towards robust distributed systems (abstract). In Proceedings of the Nineteenth Annual ACMSymposium on Principles of Distributed Computing, PODC ’00, pages 7–, New York, NY, USA. ACM.

• [Cattell 2011] Cattell, R. (2011). Scalable sql and nosql data stores. SIGMOD Rec., 39:12–27.

• [Chang et al. 2006] Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., Chandra, T., Fikes, A., andGruber, R. E. (2006). Bigtable: A distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium onOperating Systems Design and Implementation - Volume 7, OSDI ’06, pages 15–15, Berkeley, CA, USA. USENIX Association.

• [Corbett et al. 2013] Corbett, J. C., Dean, J., Epstein, M., Fikes, A., Frost, C., Furman, J. J., Ghemawat, S., Gubarev, A., Heiser, C.,Hochschild, P., Hsieh,W. C., Kanthak, S., Kogan, E., Li, H., Lloyd, A., Melnik, S., Mwaura, D., Nagle, D., Quinlan, S., Rao, R., Rolig, L.,Saito, Y., Szymaniak, M., Taylor, C., Wang, R., and Woodford, D. (2013). Spanner: Google’s globally distributed database. ACMTrans. Comput. Syst., 31(3):8.

• [Dean and Ghemawat 2008] Dean, J. and Ghemawat, S. (2008). Mapreduce: Simplified data processing on large clusters.Commun. ACM, 51(1):107–113.

• [DeBrabant et al. 2013] DeBrabant, J., Pavlo, A., Tu, S., Stonebraker, M., and Zdonik, S. (2013). Anti-caching: A new approach todatabase management system architecture. Proc. VLDB Endow., 6(14):1942–1953.

References

Page 100: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• [Doshi et al. 2013] Doshi, K. A., Zhong, T., Lu, Z., Tang, X., Lou, T., and Deng, G. (2013). Blending sql and newsql approaches: Referencearchitectures for enterprise big data challenges. In Proceedings of the 2013 International Conference on Cyber-Enabled DistributedComputing and Knowledge Discovery, CYBERC ’13, pages 163–170, Washington, DC, USA. IEEE Computer Society.

• [Eswaran et al. 1976] Eswaran, K. P., Gray, J., Lorie, R. A., and Traiger, I. L. (1976). The notions of consistency and predicate locks in adatabase system. Commun. ACM, 19(11):624–633.

• [Fang et al. 2011] Fang, R., Hsiao, H.-I., He, B., Mohan, C., and Wang, Y. (2011). High Performance Database Logging using Storage ClassMemory. In Proceeding of the IEEE International Conference on Data Engineering, pages 1221–1231, USA.

• [Floratou et al. 2012] Floratou, A., Teletia, N., DeWitt, D. J., Patel, J. M., and Zhang, D. (2012). Can the elephants handle the nosqlonslaught? Proc. VLDB Endow., 5(12):1712–1723.

• [Gilbert and Lynch 2012] Gilbert, S. and Lynch, N. A. (2012). Perspectives on the cap theorem. IEEE Computer, 45(2):30–36.

• [Glushkov, Ivan 2016] GLUSHKOV, Ivan. NewSQL Overview. Disponível em: <http://pt.slideshare.net/IvanGlushkov/newsql-overview>.Acesso em: 31 mar. 2016.

• [Grolinger et al. 2013] Grolinger, K., Higashino, W. A., Tiwari, A., and Capretz, M. A. (2013). Data management in cloud environments:Nosql and newsql data stores. J. Cloud Comput., 2(1):49:1–49:24.

• [Gurevich 2015] Gurevich, Y. (2015). Comparative survey of nosql/newsql db systems. Master’s thesis, Open University of Israel,Computer Science Division.

References

Page 101: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• [Harizopoulos et al. 2008] Harizopoulos, S., Abadi, D. J., Madden, S., and Stonebraker, M. (2008). OLTP through the LookingGlass, and What We Found There. In Proceedings of the ACM SIGMOD International Conference on Management of Data,pages 981–992, Vancouver, Canada.

• [Katarina et al. 2013] Katarina Grolinger, Wilson A Higashino, Abhinav Tiwari and Miriam AM Capretz. Data management incloud environments: NoSQL and NewSQL data stores”; Journal of Cloud Computing: Advances, Systems and Applications2013, 2:22, http://www.journalofcloudcomputing.com/content/2/1/22.

• [Katsov 2012] Katsov, I. (2012). Nosql data modeling techniques. https://highlyscalable.wordpress.com/2012/03/01/nosql-datamodeling-techniques/. Accessed at September 2016.

• [Moniruzzaman 2014] Moniruzzaman, A. B. M. (2014). Newsql: Towards nextgeneration scalable RDBMS for onlinetransaction processing (OLTP) for big data management. CoRR, abs/1411.7343.

• [Neo4J 2016] Neo4J (2016). Neo4j graph nosql database. http://neo4j.com/. Accessed at September 2016.

• [NuoDB 2016a] NuoDB (2016a). Nuodb technical whitepaper. http://go.nuodb.com/white-paper.html. Accessed atSeptember 2016.

• [NuoDB 2016b] NuoDB (2016b). Nuodb: The sql database for the cloud. http://www.nuodb.com. Accessed at September2016.

• [NuoDB 2016c] NuoDB, I. (2016c). Nuodb at a glance. http://doc.nuodb.com/Latest/Default.htmNuoDB-At-a-Glance.htm.Accessed at September 2016.

References

Page 102: What Comes After NoSQL? NewSQL: A New Era of Challenges in

• [Pavlo and Aslett 2016] Pavlo, A. and Aslett, M. (2016). What’s really new with newsql? SIGMOD Rec., 45(2):45–55.

• [Shvachko et al. 2010] Shvachko, K., Kuang, H., Radia, S., and Chansler, R. (2010). The hadoop distributed file system. InProceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), MSST ’10, pages 1–10,Washington, DC, USA. IEEE Computer Society.

• [Stonebraker, Michael 2011] Stonebraker, Michael. NewSQL: An Alternative to NoSQL and Old SQL for New OLTP Apps.Communications of the ACM Blog. Retrieved 2012-07-06.

• [VoltDB 2016] VoltDB (2016). Voltdb: The world’s fastest, in-memory operational database. http://www.voltdb.com.Accessed at September 2016.

• [Tailor et al. 2016] TAILOR, Hemang; CHOUDHARY, Sushant; JAIN, Vinay. NewSQL: The Best of Both "OldSQL" and "NoSQL".Disponível em: <http://pt.slideshare.net/SUSHANTBCHOUDHARY/newsql-the-best-o>. Acesso em: 31 mar. 2016.

References

Page 103: What Comes After NoSQL? NewSQL: A New Era of Challenges in

Thank You!Questions?

Júlio Alcântara Tavares, Angelo Brayner,

José Maria Monteiro

105