relational cloud, a database-as-a-service for the cloud
In The Name Of God
Relational Cloud
A Database-as-a-Service for the Cloud
By: Hossein Riasati
University of Tehran, College of Farabi
“Simplicity is the ultimate sophistication.” ~ Leonardo da Vinci
Cloud Database
A cloud database is a database that typically runs on a cloud computing platform. There are two deployment models:
1. Virtual machine image
2. Database as a service (DBaaS)
What is DBaaS?
Moving tasks from the database user to the service operator:
• Configuration
• Scalability
• Performance tuning
• Backup
• Privacy
• Access control
• Licensing
• Pay-per-use
Some Cloud Databases
• Amazon RDS
• Microsoft SQL Azure (MSSQL)
• Google Cloud SQL (MySQL)
• EnterpriseDB (PostgreSQL)
• Garantia Data (NoSQL)
• MongoLab (MongoDB)
• StormDB
• Xeround
10 Most Useful Cloud Databases
Amazon Relational Database Service (RDS)
Amazon RDS is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database management tasks, freeing you up to focus on your applications and business.
Supported database engines (year introduced):
• MySQL (2009)
• Oracle (2011)
• SQL Server (2012)
• PostgreSQL (2013)
Microsoft SQL Azure
• A cloud-based service offering data-storage capabilities
• Based on Microsoft SQL Server
• High availability
• Elastic scale
• Rapid provisioning
• Pay-per-use
Relational Database + Cloud Computing = Relational Cloud
Challenges in Relational Cloud
• Efficient Multi-tenancy
• Elastic Scalability
• Database Privacy
Efficient Multi-tenancy
Goal: minimize the number of machines required while meeting application-level query performance goals
1st approach: DB-in-VM
• Each database runs in its own VM
• Multiple VMs share a single physical machine
• Requires 2x to 3x more machines to consolidate the same workloads
• Delivers 6x to 12x lower performance at a fixed level of consolidation
Efficient Multi-tenancy
2nd approach:
• A single database server on each machine
• Multiple logical databases on each server
• Relational Cloud periodically determines which databases should be placed on which machines
• Uses a non-linear optimization formulation
• Estimates the combined resource utilization of multiple databases
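The placement step above packs many logical databases onto as few machines as possible. Relational Cloud solves this with a non-linear optimization; the sketch below swaps in a much simpler greedy first-fit-decreasing bin-packing heuristic, purely to illustrate the "minimize machines" goal. It assumes each database's load is a single number (for example, a fraction of one machine's CPU) and that loads on one machine add up, which the slides note is not true for disk I/O. The function name `place_databases` is hypothetical.

```python
from typing import Dict, List


def place_databases(loads: Dict[str, float], capacity: float = 1.0) -> List[List[str]]:
    """Greedy first-fit decreasing placement (a stand-in, not Kairos's optimizer).

    Assumption: each database's load is a single number (e.g. fraction of one
    machine's CPU) and loads on the same machine simply add up.
    """
    machines: List[List[str]] = []  # database names hosted on each machine
    used: List[float] = []          # summed load on each machine
    # Consider the heaviest databases first; this usually packs tighter.
    for name, load in sorted(loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > capacity:
            raise ValueError(f"{name} exceeds one machine and would need partitioning")
        for i, current in enumerate(used):
            if current + load <= capacity:  # first machine with room wins
                machines[i].append(name)
                used[i] += load
                break
        else:  # no existing machine fits: open a new one
            machines.append([name])
            used.append(load)
    return machines


print(place_databases({"db1": 0.5, "db2": 0.4, "db3": 0.3, "db4": 0.3, "db5": 0.2}))
# [['db1', 'db2'], ['db3', 'db4', 'db5']]
```

Five databases fit on two machines here. A database larger than one machine is rejected, which is exactly the case where the partitioning described in the following slides takes over.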
Elastic Scalability
• When a database workload exceeds the capacity of a single machine, query processing (and the corresponding data) is partitioned among multiple nodes
• Workload-aware partitioner:
• Automatically analyzes complex query workloads
• Maps data items to nodes
• Minimizes the number of multi-node transactions/statements
Privacy
• CryptDB
• Prevents administrators from seeing a user's data
• Adjustable security
• Different encryption levels for different types of data
• Throughput drops by only 22.5%
System Design
• Runs existing, unmodified DBMS engines on the back-end nodes
• Each tenant of the system can load one or more databases
• Applications communicate with Relational Cloud through a standard connectivity layer such as JDBC
System Design
Role of front-end nodes
• Partition each database into one or more pieces when the load on a database exceeds the capacity of a single machine
• Place the database partitions on the back-end machines
• Minimize the number of machines
• Balance load
• Migrate the partitions as needed without causing downtime
• Replicate the data for availability
• Secure the data and process the queries so that they can run on untrusted back-ends over encrypted data
Database Partitioning
• Goals:
• To scale a single database to multiple nodes
• To enable more granular placement and load balance
• The current strategy is well-suited to OLTP and Web workloads
• OLTP vs. OLAP
• Minimizes the number of multi-node transactions
• Workload-aware partitioning strategy
• The front-end node periodically analyzes query execution traces to identify sets of tuples that are accessed together
Database Partitioning
• Execution graph (weighted)
• Each node is a tuple or a collection of tuples
• An edge is drawn between any two nodes whose tuples are touched within a single transaction
• Graph partitioning scheme: G. Karypis and V. Kumar, "A fast and high quality multilevel scheme for partitioning irregular graphs," SIAM J. Sci. Comput., 20(1), 1998
• The output of the partitioner is an assignment of individual tuples to logical partitions
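The execution graph described above can be built from a query trace in a few lines. This sketch assumes the trace is already reduced to the set of tuple ids each transaction touched, and that an edge's weight is the number of transactions touching both endpoints. It does not run a real partitioner such as the multilevel scheme cited above; it only computes the cut weight, the quantity such a partitioner tries to keep small while balancing partition sizes. The names `build_coaccess_graph` and `cut_weight` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set


def build_coaccess_graph(transactions: Iterable[Set[str]]) -> Counter:
    """One node per tuple id; one edge per pair of tuples touched by the same
    transaction, weighted by the number of transactions touching both."""
    edges: Counter = Counter()
    for touched in transactions:
        # sorted() gives each unordered pair a single canonical key (a, b)
        for a, b in combinations(sorted(touched), 2):
            edges[(a, b)] += 1
    return edges


def cut_weight(edges: Counter, partition_of: Dict[str, int]) -> int:
    """Total weight of edges whose endpoints land in different partitions.
    A balanced graph partitioner tries to keep this small."""
    return sum(w for (a, b), w in edges.items() if partition_of[a] != partition_of[b])


trace = [{"c1", "c2"}, {"c1", "c2"}, {"c3", "c4"}, {"c2", "c3"}]
graph = build_coaccess_graph(trace)
print(cut_weight(graph, {"c1": 0, "c2": 0, "c3": 1, "c4": 1}))  # 1
print(cut_weight(graph, {"c1": 0, "c2": 1, "c3": 0, "c4": 1}))  # 4
```

The first assignment splits only one co-accessed pair, so only one transaction in the trace becomes multi-node. The cut weight is a proxy: a transaction touching three tuples contributes three edges, so the cut approximates, rather than exactly counts, distributed transactions.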
Database Partitioning
• Where to dispatch each query?
• Classification problem (Decision Tree)
• Features: the tuple attributes
• Target field: Partition label for each tuple
• Independence from schema layout & foreign key information
• Discover correlations hidden in the data
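Turning per-tuple partition labels into a routing rule is a classification problem, as stated above. The slides use a decision tree; the sketch below fits only a depth-1 tree (a decision stump) over numeric attributes, which is enough to show the idea: learn a predicate on tuple attributes that predicts the partition label, then route a query whose WHERE clause fixes that attribute to a single partition. The helper names, the attribute names `w_id`/`c_id`, and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Dict[str, float]
Stump = Tuple[str, float, int, int]  # (attribute, threshold, label if <=, label if >)


def fit_stump(rows: List[Row], labels: List[int]) -> Stump:
    """Depth-1 decision tree: pick the attribute and threshold whose split
    `row[attr] <= threshold` misclassifies the fewest tuples' partition labels."""
    best = None  # (errors, attr, threshold, left_label, right_label)
    for attr in rows[0]:
        for threshold in sorted({r[attr] for r in rows}):
            left = Counter(lab for r, lab in zip(rows, labels) if r[attr] <= threshold)
            right = Counter(lab for r, lab in zip(rows, labels) if r[attr] > threshold)
            left_label = left.most_common(1)[0][0]  # never empty: threshold is a real value
            right_label = right.most_common(1)[0][0] if right else left_label
            errors = (sum(left.values()) - left[left_label]) + (sum(right.values()) - right[right_label])
            if best is None or errors < best[0]:
                best = (errors, attr, threshold, left_label, right_label)
    _, attr, threshold, left_label, right_label = best
    return attr, threshold, left_label, right_label


def route(row: Row, stump: Stump) -> int:
    """Partition that should receive a query whose predicate fixes row[attr]."""
    attr, threshold, left_label, right_label = stump
    return left_label if row[attr] <= threshold else right_label


rows = [{"w_id": 1, "c_id": 5}, {"w_id": 1, "c_id": 9}, {"w_id": 2, "c_id": 3}, {"w_id": 2, "c_id": 7}]
labels = [0, 0, 1, 1]  # partition labels, e.g. produced by the graph partitioner
stump = fit_stump(rows, labels)
print(stump)                                  # ('w_id', 1, 0, 1)
print(route({"w_id": 2, "c_id": 42}, stump))  # 1
```

The learned rule ("w_id <= 1 goes to partition 0") never mentions the schema's foreign keys; it was discovered purely from the data, which is the point of the last two bullets above.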
Database Partitioning
• Big graph problem!
• Database with N tuples
• N nodes
• Up to N² edges
• Existing graph partitioning implementations scale only to a few tens of millions of nodes
• Heuristic methods:
• Blanket statement removal
• Sampling tuples and transactions
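The two heuristics above shrink the graph before partitioning. A statement that touches k tuples adds up to k(k-1)/2 edges, so a single scan can dominate the graph while saying little about which tuples belong together. This sketch assumes each transaction in the trace is a list of statements, each reduced to the set of tuple ids it touched; the threshold `max_tuples` and the sampling rate are hypothetical tuning knobs, and only transaction sampling (not tuple sampling) is shown.

```python
import random
from typing import List, Set

Statement = Set[str]  # ids of the tuples one statement touched


def edges_added(k: int) -> int:
    """Co-access edges contributed by one access set of k tuples."""
    return k * (k - 1) // 2


def preprocess_trace(trace: List[List[Statement]], max_tuples: int,
                     sample_rate: float, seed: int = 0) -> List[Set[str]]:
    """Shrink a trace before building the co-access graph.

    - Blanket statement removal: drop statements touching more than max_tuples
      tuples (e.g. scans); they add many edges but say little about co-access.
    - Transaction sampling: keep each transaction with probability sample_rate.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    kept: List[Set[str]] = []
    for txn in trace:
        touched: Set[str] = set().union(*(s for s in txn if len(s) <= max_tuples))
        if touched and rng.random() < sample_rate:
            kept.append(touched)
    return kept


scan = {f"t{i}" for i in range(1000)}
trace = [[{"a", "b"}, scan], [{"c"}]]
print(edges_added(len(scan)))  # 499500
print([sorted(t) for t in preprocess_trace(trace, max_tuples=10, sample_rate=1.0)])
# [['a', 'b'], ['c']]
```

Dropping the one 1,000-tuple scan removes 499,500 potential edges, while the small statements that carry the co-access signal survive.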
Ready for part 2?!
Placement & Migration
Monitoring and consolidation engine: Kairos
• Monitoring the resource requirements of each workload
• Predicting the load multiple workloads will generate when run together on a server
• Assigning workloads to physical servers
• Migrating them between physical nodes
Kairos input: an existing (non-consolidated) collection of workloads and a set of target physical machines
Kairos components
1. Resource Monitor
• Through an automated statistics collection process, the resource monitor captures a number of DBMS and OS statistics
2. Combined Load Predictor
• A non-linear model of CPU, RAM, and disk
• Predicts the combined resource requirements when multiple workloads are consolidated onto a single physical server
• Up to 30x more accurate at predicting the combined disk requirements of multiple workloads than simply assuming that disk I/O combines additively
3. Consolidation Engine
• Kairos uses non-linear optimization techniques to place database partitions on back-end nodes
Live Migration
• Relocates database partitions across physical nodes
• Why migration?
1. For scheduled maintenance and administration tasks
2. To respond to load changes
• Live migration: no downtime and no loss of performance
• A cache-like approach is currently being developed and tested
Last Part: Privacy
CryptDB
• Encrypts each value of each row independently into an onion of encryption layers
• Challenge: the back-end DBMS must still be able to answer queries over the encrypted data
• A design that allows DBAs to perform tuning tasks without any visibility into the actual stored data
• Adjustable security
Cryptographic Techniques
1. RND: Randomized encryption (no computation on ciphertexts)
2. DET: Deterministic encryption (equality checks)
3. OPE: Order-preserving encryption (order and range comparisons)
4. HOM: Homomorphic encryption (addition, e.g. SUM)
Security gets weaker from RND to DET to OPE.
Onion Layers of Encryption
CryptDB
• Start with all data encrypted under the most private scheme, RND.
• The JDBC client has access to the keys for all onion layers of every ciphertext stored on the server (it computes them from a single master key).
• When the JDBC client driver receives SQL queries from the application, it computes the onion keys the server needs to decrypt certain columns down to the most private level that still allows the query to execute on the server.
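The slide above says the JDBC client computes every onion key from a single master key. One standard way to do that is to apply a pseudorandom function to the master key and a label naming the column and layer. The sketch below uses HMAC-SHA256 for this; that choice, the label format, and the function name `onion_key` are assumptions for illustration, and CryptDB's real derivation may differ.

```python
import hashlib
import hmac


def onion_key(master_key: bytes, table: str, column: str, onion: str, layer: str) -> bytes:
    """Derive the key for one layer of one column's onion from the master key.

    Assumption: HMAC-SHA256 used as a pseudorandom function over a label naming
    the table, column, onion, and layer (names must not contain '|').
    CryptDB's real derivation may differ.
    """
    label = "|".join((table, column, onion, layer)).encode("utf-8")
    return hmac.new(master_key, label, hashlib.sha256).digest()


master = b"hypothetical master key held only by the JDBC client"
k_id = onion_key(master, "item", "i_id", "eq", "RND")
k_price = onion_key(master, "item", "i_price", "eq", "RND")
print(len(k_id), k_id != k_price)  # 32 True
```

Because the derivation is deterministic, the client stores only the master key, yet every (column, layer) pair gets its own 256-bit key, matching the rule that each layer's key differs from every other column's.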
CryptDB
• The security level adapts dynamically based on the queries that applications issue.
• For simplicity, CryptDB encrypts all data items in a column using the same set of keys.
• Each layer of the onion has a different key (also different from the keys of every other column).
CryptDB
• The encryption algorithms are symmetric: to remove a layer, the server must receive the symmetric onion key for that layer from the JDBC client.
• Once an entire column has been decrypted to an inner layer, the original onion ciphertext is discarded, since inner onion layers support a superset of the queries that outer layers support.
• Key factor in performance: ciphertext expansion
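The layer-peeling protocol above can be simulated without any real cryptography: each column remembers which layer it currently exposes, a query asks for the layer its operation needs, and the client releases one key per layer that must be stripped. The sketch assumes separate equality, order, and addition onions, each starting under RND as the slides describe; the slides' level numbers and any JOIN layers are omitted, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# Simplified onions, outermost layer first. Assumption: following the slides,
# every onion starts under RND; level numbers and JOIN layers are omitted.
ONIONS: Dict[str, List[str]] = {
    "eq": ["RND", "DET"],   # equality predicates
    "ord": ["RND", "OPE"],  # ORDER BY, range predicates
    "add": ["RND", "HOM"],  # SUM
}
# Onion and layer the server must reach for each kind of operation.
REQUIRED: Dict[str, Tuple[str, str]] = {
    "equality": ("eq", "DET"),
    "order": ("ord", "OPE"),
    "sum": ("add", "HOM"),
}


class EncryptedColumn:
    def __init__(self) -> None:
        # Every onion starts at its outermost, most private layer.
        self.exposed: Dict[str, str] = {name: layers[0] for name, layers in ONIONS.items()}

    def prepare(self, operation: str) -> List[str]:
        """Return the layers the server must strip (one onion key each) so the
        column supports `operation`. Stripping is permanent: the outer
        ciphertext is discarded, so later queries need no new keys."""
        onion, target = REQUIRED[operation]
        layers = ONIONS[onion]
        current, wanted = layers.index(self.exposed[onion]), layers.index(target)
        to_strip = layers[current:wanted]  # empty if already at or below target
        if to_strip:
            self.exposed[onion] = target
        return to_strip


i_id = EncryptedColumn()
print(i_id.prepare("equality"))  # ['RND']
print(i_id.prepare("equality"))  # []
print(i_id.exposed)              # {'eq': 'DET', 'ord': 'RND', 'add': 'RND'}
```

Only the onion a query actually needs is peeled, so a column used solely in equality lookups never reveals order information: this is the "security adapts to the queries" behavior described above.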
CryptDB Example
• SELECT i_price, ... FROM item WHERE i_id=N
• Initially, each column in the database is separately encrypted in several layers of encryption, with RND as the outermost layer.
• The JDBC client has the server decrypt the i_id column to DET level 4 by sending it the appropriate decryption key.
• The query returns RND-encrypted ciphertexts to the JDBC client, which decrypts them for the application.
CryptDB Example
• SELECT c_discount, w_tax, ... FROM customer, warehouse WHERE w_id=c_w_id AND c_id=N
• The JDBC client needs to decrypt the w_id and c_w_id columns to DET level 2.
• The JDBC client needs to decrypt the c_id column to DET level 4 and send the DET-encrypted value N to the server.
CryptDB Example
• SELECT SUM(ol_amount) FROM order_line WHERE ol_o_id=N
• The server needs the keys to adjust the encryption of the ol_amount field to HOM.
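The SUM example above works because HOM lets the server add values it cannot read. CryptDB uses the Paillier cryptosystem for this layer; the sketch below is a minimal textbook Paillier with the common choice g = n + 1, using toy primes that are completely insecure and chosen only so the arithmetic is easy to follow. Function names are hypothetical, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import random
from typing import List, Tuple

PublicKey = Tuple[int, int]        # (n, g)
PrivateKey = Tuple[int, int, int]  # (lam, mu, n)


def keygen(p: int, q: int) -> Tuple[PublicKey, PrivateKey]:
    """Paillier key generation from two primes, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu = L(g^lam mod n^2)^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu, n)


def encrypt(pub: PublicKey, m: int, rng: random.Random) -> int:
    n, g = pub
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:  # r must be invertible mod n
        r = rng.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(priv: PrivateKey, c: int) -> int:
    lam, mu, n = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def hom_sum(pub: PublicKey, ciphertexts: List[int]) -> int:
    """What the server runs for SUM: multiplying ciphertexts adds plaintexts."""
    n, _ = pub
    total = 1  # 1 is an encryption of 0 (m = 0, r = 1)
    for c in ciphertexts:
        total = (total * c) % (n * n)
    return total


rng = random.Random(7)
pub, priv = keygen(61, 53)  # toy primes: insecure, for illustration only
encrypted = [encrypt(pub, m, rng) for m in [100, 250, 75]]  # e.g. ol_amount values
print(decrypt(priv, hom_sum(pub, encrypted)))  # 425
```

The server only multiplies ciphertexts modulo n² and returns one ciphertext; the JDBC client decrypts it to the sum. Results are correct only while the true sum stays below n.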
Experiments
Experiments: Consolidation/Multi-tenancy
• Load statistics were obtained for about 200 production database servers in three data centers
Experiments: Consolidation/Multi-tenancy
Multiplexing efficiency for TPC-C workloads
Experiments: CryptDB Performance
• Measured the time to process 100,000 statements (selects/updates)
• Client-side overhead: an average of 25.6 ms per statement
• Server-side overhead:
Experiments: Scalability
Experiments: Network Latency
So, what do you think about Relational Cloud?
Thank You