Relational Cloud: A Database-as-a-Service for the Cloud


Uploaded by: hossein-riasati

Posted on 16-Jul-2015


Page 1: Relational cloud, A Database-as-a-Service for the Cloud

In The Name Of God

Relational Cloud

Page 2: Relational Cloud: A Database-as-a-Service for the Cloud

By: Hossein Riasati [[email protected]]

University of Tehran, College of Farabi

Page 3

“Simplicity is the ultimate sophistication.” ~ Leonardo da Vinci

Page 4: Cloud Database

A cloud database is a database that typically runs on a cloud computing platform. There are two common deployment models:

1. Virtual machine image

2. Database as a service (DBaaS)

Page 5: What is DBaaS?

Moving tasks from the database user to the service operator:

• Configuration

• Scalability

• Performance tuning

• Backup

• Privacy

• Access control

• Licensing

• Pay-per-use

Page 6: Some Cloud Databases

• Amazon RDS

• Microsoft SQL Azure (MSSQL)

• Google Cloud SQL (MySQL)

• EnterpriseDB (PostgreSQL)

• Garantia Data (NoSQL)

• MongoLab (MongoDB)

• StormDB

• Xeround

10 Most Useful Cloud Databases

Page 7: Amazon Relational Database Service (RDS)

Amazon RDS is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient, resizable capacity while managing time-consuming database administration tasks, freeing you to focus on your applications and business.

Supported engines (year added):

• MySQL (2009)

• Oracle (2011)

• SQL Server (2012)

• PostgreSQL (2013)

Page 8: Microsoft SQL Azure

• A cloud-based service offering data-storage capabilities

• Based on Microsoft SQL Server

• High availability

• Elastic scale

• Rapid provisioning

• Pay-per-use

Page 9

Relational Database + Cloud Computing = Relational Cloud

Page 10: Challenges in Relational Cloud

• Efficient Multi-tenancy

• Elastic Scalability

• Database Privacy

Page 11: Efficient Multi-tenancy

Goal: minimize the number of machines required while meeting application-level query performance goals

1st approach: DB-in-VM

• Each database runs in its own VM

• Multiple VMs share a single physical machine

• Requires 2x to 3x more machines to consolidate the same number of workloads

• For a fixed level of consolidation, delivers 6x to 12x lower performance

Page 12: Efficient Multi-tenancy

2nd approach:

• A single database server on each machine

• Multiple logical databases on each server

• Relational Cloud periodically determines which databases should be placed on which machines

• It uses a non-linear optimization formulation

• It estimates the combined resource utilization of multiple databases placed on the same machine

Page 13: Elastic Scalability

• When a database workload exceeds the capacity of a single machine, query processing (and the corresponding data) is partitioned among multiple nodes

• Workload-aware partitioner:

• Automatically analyzes complex query workloads

• Maps data items to nodes

• Minimizes the number of multi-node transactions/statements

Page 14: Privacy

• CryptDB

• Prevents administrators from seeing a user's data

• Adjustable security: different encryption levels for different types of data

• Only a 22.5% reduction in throughput

Page 15: System Design

• Uses existing, unmodified DBMS engines on the back-end nodes

• Each tenant of the system can load one or more databases

• Applications communicate with Relational Cloud through a standard connectivity layer such as JDBC

Page 16: System Design

Page 17: Role of front-end nodes

• Partition each database into one or more pieces when its load exceeds the capacity of a single machine

• Place the database partitions on the back-end machines so as to:

• Minimize the number of machines

• Balance load

• Migrate partitions as needed without causing downtime

• Replicate the data for availability

• Secure the data and process queries so that they can run on untrusted back-ends over encrypted data

Page 18: Database Partitioning

• Goals:

• Scale a single database to multiple nodes

• Enable more granular placement and load balancing

• The current strategy is well-suited to OLTP and Web workloads (OLTP vs. OLAP)

• Minimizes the number of multi-node transactions

• Workload-aware partitioning strategy: the front-end node periodically analyzes query execution traces to identify sets of tuples that are accessed together

Page 19: Database Partitioning

• Build a weighted execution graph:

• Each node is a tuple or a collection of tuples

• An edge is drawn between any two nodes whose tuples are touched within a single transaction

• The graph is split with the multilevel partitioning scheme behind METIS (G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM J. Sci. Comput., 20(1), 1998)

• The output of the partitioner is an assignment of individual tuples to logical partitions
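A minimal sketch of the execution-graph step, assuming the trace is available as a list of transactions, each listing the tuple ids it touched (the trace format, the ids, and the helpers `build_coaccess_graph` and `cut_weight` are hypothetical). Edge weights count co-accessing transactions, and the cut weight of a tuple-to-partition assignment measures how much co-access would turn into multi-node work. The sketch only scores candidate assignments; it is not a replacement for a real multilevel partitioner.

```python
from collections import defaultdict
from itertools import combinations


def build_coaccess_graph(transactions):
    """Weighted co-access graph: nodes are tuple ids; the weight of edge (a, b)
    is the number of transactions that touched both a and b."""
    weights = defaultdict(int)
    for txn in transactions:
        # Deduplicate and sort so each unordered pair maps to one canonical key.
        for a, b in combinations(sorted(set(txn)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def cut_weight(graph, assignment):
    """Sum of weights of edges whose endpoints land in different partitions,
    i.e. co-accesses that would become multi-node work."""
    return sum(w for (a, b), w in graph.items() if assignment[a] != assignment[b])


# Hypothetical trace: customer tuples c*, order tuples o*.
trace = [["c1", "o1"], ["c1", "o1", "o2"], ["c2", "o3"], ["c2", "o3"], ["c1", "o3"]]
graph = build_coaccess_graph(trace)

by_customer = {"c1": 0, "o1": 0, "o2": 0, "c2": 1, "o3": 1}
scattered = {"c1": 0, "o1": 1, "o2": 0, "c2": 1, "o3": 0}
print(cut_weight(graph, by_customer))  # 1
print(cut_weight(graph, scattered))    # 5
```

A partitioner searches for a balanced assignment with a low cut weight; here, keeping each customer with its orders cuts only the single edge between c1 and o3.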

Page 20: Database Partitioning

• Where should each query be dispatched?

• Treated as a classification problem (decision tree):

• Features: the tuple attributes

• Target field: the partition label of each tuple

• Independent of schema layout and foreign-key information

• Discovers correlations hidden in the data
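The routing step can be illustrated with a small CART-style decision tree (Gini impurity, threshold splits on numeric attributes). This is a stand-in chosen for illustration, not necessarily the classifier Relational Cloud uses; the table, the attribute names `w_id` and `ytd`, and the partition labels are hypothetical. The tree learns a compact rule such as `w_id <= 5`, which a front end could evaluate instead of keeping a per-tuple lookup table.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (attribute, threshold) pair with the lowest weighted Gini
    impurity, or None if no split improves on the current node."""
    best, best_score = None, gini(labels)
    for attr in rows[0]:
        for t in sorted({r[attr] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[attr] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[attr] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (attr, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority partition label
    attr, t = split
    left = [(r, lab) for r, lab in zip(rows, labels) if r[attr] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[attr] > t]
    lrows, llabels = map(list, zip(*left))
    rrows, rlabels = map(list, zip(*right))
    return (attr, t,
            build_tree(lrows, llabels, depth + 1, max_depth),
            build_tree(rrows, rlabels, depth + 1, max_depth))


def route(tree, row):
    """Walk the tree to find the partition that should receive this tuple."""
    while isinstance(tree, tuple):
        attr, t, left, right = tree
        tree = left if row[attr] <= t else right
    return tree


# Hypothetical training data: warehouse tuples labelled by the graph partitioner.
rows = [{"w_id": w, "ytd": (w * 37) % 11} for w in range(1, 11)]
labels = [0 if w <= 5 else 1 for w in range(1, 11)]
tree = build_tree(rows, labels)
print(tree)                                # ('w_id', 5, 0, 1)
print(route(tree, {"w_id": 8, "ytd": 0}))  # 1
```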

Page 21: Database Partitioning

• A big graph problem!

• A database with N tuples yields N nodes and up to O(N²) edges

• Existing graph partitioning implementations scale only to a few tens of millions of nodes

• Heuristic methods:

• Blanket statement removal

• Sampling tuples and transactions
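Both heuristics can be sketched as a pre-processing pass over the trace. Assumptions: each transaction is a list of statements, each statement is the set of tuple ids it touched, and `max_tuples_per_stmt` and `sample_rate` are hypothetical tuning knobs. Only transactions are sampled here; sampling tuples would filter tuple ids in the same way. Dropping a blanket statement that touches k tuples avoids adding up to k(k-1)/2 edges, e.g. 499,500 edges for a 1,000-tuple scan.

```python
import random


def prune_trace(transactions, max_tuples_per_stmt, sample_rate, seed=0):
    """Shrink a workload trace before building the co-access graph.

    transactions: list of transactions; each transaction is a list of
    statements, and each statement is the set of tuple ids it touched.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    pruned = []
    for txn in transactions:
        # Transaction sampling: keep each transaction with probability sample_rate.
        if rng.random() >= sample_rate:
            continue
        # Blanket statement removal: drop statements touching too many tuples
        # (e.g. scans); they add near-cliques of edges but say little about placement.
        kept = [stmt for stmt in txn if len(stmt) <= max_tuples_per_stmt]
        if kept:
            pruned.append(kept)
    return pruned


trace = [[{1, 2}, set(range(1000))], [{3}]]
print(prune_trace(trace, max_tuples_per_stmt=50, sample_rate=1.0))  # [[{1, 2}], [{3}]]
```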

Page 22

Ready for part 2?!

Page 23: Placement & Migration

Monitoring and consolidation engine: Kairos

• Monitors the resource requirements of each workload

• Predicts the load that multiple workloads will generate when run together on a server

• Assigns workloads to physical servers

• Migrates them between physical nodes

Kairos input: an existing (non-consolidated) collection of workloads and a set of target physical machines

Page 24: Kairos components

1. Resource Monitor

• Through an automated statistics collection process, the resource monitor captures a number of DBMS and OS statistics

2. Combined Load Predictor

• A non-linear model of CPU, RAM, and disk usage

• Predicts the combined resource requirements when multiple workloads are consolidated onto a single physical server

• At predicting the combined disk requirements of multiple workloads, it is up to 30x more accurate than simply assuming that disk I/O combines additively

3. Consolidation Engine

• Kairos uses non-linear optimization techniques to place database partitions on back-end nodes
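For intuition about the consolidation problem, here is a first-fit-decreasing bin-packing heuristic. This is a deliberately simpler substitute: Kairos uses non-linear optimization with a non-linear combined-load model, whereas this sketch assumes CPU and RAM add up linearly. The `Workload` fields, the capacities, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cpu_pct: int  # hypothetical: % of one server's CPU used on its own
    ram_gb: int   # hypothetical: working-set RAM in GB


def first_fit_decreasing(workloads, cpu_cap_pct=90, ram_cap_gb=32):
    """Greedy consolidation: place the largest workloads first, each on the first
    server with spare capacity, opening a new server only when none fits.
    Assumes load adds linearly, unlike Kairos's combined-load model."""
    servers = []  # each entry: [cpu_used, ram_used, [workload names]]
    for w in sorted(workloads, key=lambda w: (w.cpu_pct, w.ram_gb), reverse=True):
        for s in servers:
            if s[0] + w.cpu_pct <= cpu_cap_pct and s[1] + w.ram_gb <= ram_cap_gb:
                s[0] += w.cpu_pct
                s[1] += w.ram_gb
                s[2].append(w.name)
                break
        else:  # no existing server had room
            servers.append([w.cpu_pct, w.ram_gb, [w.name]])
    return [s[2] for s in servers]


workloads = [Workload("a", 50, 8), Workload("b", 30, 4), Workload("c", 40, 16),
             Workload("d", 20, 2), Workload("e", 10, 20)]
print(first_fit_decreasing(workloads))  # [['a', 'c'], ['b', 'd', 'e']]
```

Five workloads fit on two servers here. The capacity headroom (90% CPU) stands in for the performance goals a real consolidation engine would enforce.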

Page 25: Live Migration

• Relocates database partitions across physical nodes

• Why migrate?

1. For scheduled maintenance and administration tasks

2. To respond to load changes

• Live migration: relocation without downtime and without reducing performance

• A cache-like approach is currently being developed and tested

Page 26

Last Part: Privacy

Page 27: CryptDB

• Encrypts each value of each row independently into an onion of encryption layers

• With every value fully encrypted, the back-end DBMS cannot answer queries over it

• A design that will allow DBAs to perform tuning tasks without any visibility into the actual stored data

• Adjustable security

Page 28: Cryptographic Techniques

1. RND: Randomized encryption

2. DET: Deterministic encryption

3. OPE: Order-preserving encryption

4. HOM: Homomorphic encryption

Security gets weaker from RND to DET to OPE, as each scheme reveals more about the plaintext (DET reveals equality, OPE reveals order).
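The difference between the first three schemes is what an observer learns from ciphertexts. The toy constructions below, built from HMAC-SHA256 in Python's standard library, are assumptions for illustration only and are not secure replacements for CryptDB's ciphers: RND uses a fresh random nonce, DET derives the nonce from the plaintext (an SIV-style trick), and the OPE table maps a small integer domain to strictly increasing, key-dependent values.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy stream cipher: concatenated HMAC-SHA256(key, nonce || counter) blocks.
    blocks = (hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
              for i in range((length + 31) // 32))
    return b"".join(blocks)[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def rnd_encrypt(key: bytes, pt: bytes) -> bytes:
    """RND: a fresh random nonce per call, so equal plaintexts look unrelated."""
    nonce = os.urandom(16)
    return nonce + _xor(pt, _keystream(key, nonce, len(pt)))


def det_encrypt(key: bytes, pt: bytes) -> bytes:
    """DET: the nonce is derived from the plaintext (SIV-style), so equal
    plaintexts give equal ciphertexts and the server can test equality."""
    nonce = hmac.new(key, b"siv" + pt, hashlib.sha256).digest()[:16]
    return nonce + _xor(pt, _keystream(key, nonce, len(pt)))


def decrypt(key: bytes, ct: bytes) -> bytes:
    """Works for both schemes: the nonce travels with the ciphertext."""
    nonce, body = ct[:16], ct[16:]
    return _xor(body, _keystream(key, nonce, len(body)))


def ope_table(key: bytes, domain_size: int) -> list:
    """Toy OPE over 0..domain_size-1: running sums of key-derived gaps >= 1,
    so the ciphertexts preserve the plaintext order."""
    table, total = [], 0
    for v in range(domain_size):
        total += 1 + hmac.new(key, v.to_bytes(4, "big"), hashlib.sha256).digest()[0]
        table.append(total)
    return table


key = os.urandom(32)
print(rnd_encrypt(key, b"42") == rnd_encrypt(key, b"42"))  # False except with negligible probability
print(det_encrypt(key, b"42") == det_encrypt(key, b"42"))  # True
print(decrypt(key, rnd_encrypt(key, b"42")))               # b'42'
ope = ope_table(key, 100)
print(ope[10] < ope[20])                                   # True
```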

Page 29: Onion Layers of Encryption

Page 30: CryptDB

• The database starts with all data encrypted under the most private scheme, RND.

• The JDBC client has access to the keys for all onion layers of every ciphertext stored on the server (it computes them from a single master key).

• When the JDBC client driver receives SQL queries from the application, it computes the onion keys the server needs to decrypt certain columns down to the most private level that still allows the query to execute on the server.

Page 31: CryptDB

• The security level adapts dynamically based on the queries that applications make.

• For simplicity, CryptDB encrypts all data items in a column using the same set of keys.

• Each layer of the onion has a different key, and no key is shared with any other column.
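The client computes every onion key from a single master key, yet each (column, layer) pair needs its own key. One way to do that, assuming HMAC-SHA256 as the pseudorandom function and a hypothetical NUL-separated `table, column, layer` label format (neither is taken from the slides), is:

```python
import hashlib
import hmac


def onion_key(master_key: bytes, table: str, column: str, layer: str) -> bytes:
    """Derive the key for one onion layer of one column from the master key.

    HMAC-SHA256 acts as a pseudorandom function: the client stores only
    master_key, yet every (table, column, layer) triple gets an unrelated key.
    """
    # NUL separators keep e.g. ("a.b", "c") and ("a", "b.c") from colliding.
    label = b"\x00".join(part.encode() for part in (table, column, layer))
    return hmac.new(master_key, label, hashlib.sha256).digest()


master = bytes(32)  # demo only: a real master key must be random and secret
keys = {
    (col, layer): onion_key(master, "item", col, layer)
    for col in ("i_id", "i_price")
    for layer in ("RND", "DET", "OPE")
}
print(len(set(keys.values())))  # 6
```

Because derivation is deterministic, the client can recompute any layer key on demand instead of storing one key per column and layer.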

Page 32: CryptDB

• The encryption algorithms are symmetric, so for the server to remove a layer, it must receive the symmetric onion key for that layer from the JDBC client.

• Once the entire column has been decrypted to an inner layer, the original (outer) onion ciphertext is discarded, since inner onion layers support a superset of the queries that outer layers support.

• A key factor in performance is ciphertext expansion.

Page 33: CryptDB Example

• SELECT i_price, ... FROM item WHERE i_id=N

• Initially, each column in the database is separately encrypted in several layers, with RND as the outermost layer.

• The JDBC client has the server decrypt the i_id column to DET level 4 by sending it the appropriate decryption key.

• The query will return RND-encrypted ciphertexts to the JDBC client, which will decrypt them for the application.
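The i_id example can be simulated end to end with a two-layer onion (RND wrapped around DET). Everything here is hypothetical: the toy HMAC-based cipher, the keys, and the table contents. CryptDB's real onion has more layers and its own level numbering, which this sketch does not model.

```python
import hashlib
import hmac
import os


def _stream(key: bytes, nonce: bytes, n: int) -> bytes:
    # Toy HMAC-SHA256 counter-mode keystream (illustration only, not CryptDB's cipher).
    blocks = (hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
              for i in range((n + 31) // 32))
    return b"".join(blocks)[:n]


def enc(key: bytes, pt: bytes, deterministic: bool) -> bytes:
    nonce = (hmac.new(key, pt, hashlib.sha256).digest()[:16] if deterministic
             else os.urandom(16))
    return nonce + bytes(a ^ b for a, b in zip(pt, _stream(key, nonce, len(pt))))


def dec(key: bytes, ct: bytes) -> bytes:
    nonce, body = ct[:16], ct[16:]
    return bytes(a ^ b for a, b in zip(body, _stream(key, nonce, len(body))))


# Client-side keys (hypothetical): two onion layers for i_id, one key for i_price.
k_det, k_rnd, k_price = os.urandom(32), os.urandom(32), os.urandom(32)


def onion(v: bytes) -> bytes:
    """Inner DET layer wrapped in an outer RND layer."""
    return enc(k_rnd, enc(k_det, v, deterministic=True), deterministic=False)


# Server-side table: the server never sees plaintext ids or prices.
item = [{"i_id": onion(str(i).encode()),
         "i_price": enc(k_price, str(i * 10).encode(), deterministic=False)}
        for i in range(1, 6)]

# Step 1: the client sends k_rnd; the server strips the RND layer from the whole
# column and discards the outer ciphertexts. The column is now DET-encrypted.
for row in item:
    row["i_id"] = dec(k_rnd, row["i_id"])

# Step 2: the client rewrites "WHERE i_id = 3" using the DET ciphertext of 3,
# so the server can match by plain byte equality.
needle = enc(k_det, b"3", deterministic=True)
matches = [row["i_price"] for row in item if row["i_id"] == needle]

# Step 3: the server returns RND ciphertexts; the client decrypts them.
print([dec(k_price, c).decode() for c in matches])  # ['30']
```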

Page 34: CryptDB Example

• SELECT c_discount, w_tax, ... FROM customer, warehouse WHERE w_id=c_w_id AND c_id=N

• The JDBC client needs to have the w_id and c_w_id columns decrypted to DET level 2, so the join can compare them.

• The JDBC client needs to have the c_id column decrypted to DET level 4 and send the DET-encrypted value of N to the server.

Page 35: CryptDB Example

• SELECT SUM(ol_amount) FROM order_line WHERE ol_o_id=N

• Server needs the keys to adjust the encryption of the ol_amount field to HOM.
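The SUM query works because an additively homomorphic scheme lets the server combine ciphertexts without any key. The sketch assumes textbook Paillier with g = n + 1 (a standard additively homomorphic scheme, assumed here rather than taken from the slides), tiny hard-coded primes that make it insecure, and hypothetical `ol_amount` values stored as integer cents, since the scheme works on integers modulo n.

```python
import math
import secrets

# Toy key: both primes are far too small for real use.
P, Q = 10007, 10009
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)  # with g = N + 1, L(g^LAM mod N^2) = LAM mod N


def encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(N - 1) + 1  # random r in [1, N-1] coprime to N
        if math.gcd(r, N) == 1:
            break
    # (N + 1)^m = 1 + m*N (mod N^2), so g^m needs no exponentiation.
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N


def he_add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the plaintexts (mod N)."""
    return c1 * c2 % N2


# Hypothetical ol_amount values (integer cents) for the rows matching the WHERE clause.
column = [encrypt(a) for a in (1250, 399, 5000)]  # stored at the HOM layer
enc_sum = encrypt(0)
for c in column:  # the server aggregates without any key
    enc_sum = he_add(enc_sum, c)
print(decrypt(enc_sum))  # 6649, decrypted by the JDBC client
```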

Page 36: Experiments

Page 37: Experiments: Consolidation/Multi-tenancy

• Obtained the load statistics for about 200 servers from three data centers hosting the production database servers

Page 38: Experiments: Consolidation/Multi-tenancy

Multiplexing efficiency for TPC-C workloads

Page 39: Experiments: CryptDB Performance

• Measured the time to process 100,000 statements (selects/updates)

• Client-side overhead: an average of 25.6 ms per statement

• Server-side overhead:

Page 40: Experiments: Scalability

Page 41: Experiments: Network Latency

Page 42

So, what do you think about Relational Cloud?

Page 43

Thank You