![Page 1: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/1.jpg)
Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems
SIGMOD 2012, Pavlo et al.
Hefu Chai
![Page 2: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/2.jpg)
Credit
• Parts of the slides are from Andy Pavlo
![Page 3: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/3.jpg)
There is a saying…
• Girls are really only interested in two things. They want a guy that is good looking, or they want a guy that really knows a lot about databases.
Andy Pavlo
![Page 4: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/4.jpg)
[Diagram: the client application sends a procedure name and input parameters to the database cluster; the cluster executes the transaction and returns the transaction result.]
![Page 5: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/5.jpg)
![Page 6: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/6.jpg)
Existing Database Partitioning Techniques
• Notion of data declustering
• Overhead of maintaining transaction consistency
• Lock contention
Not applicable to OLTP systems!
![Page 7: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/7.jpg)
Fast Repetitive Small
OLTP Transactions
![Page 8: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/8.jpg)
We need an approach that supports…
• Stored procedures
• Load balancing in the presence of time-varying skew
• Complex schemas
• Deployments with a larger number of partitions
![Page 9: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/9.jpg)
Automatic Database Design Tool for Parallel Systems
Skew-Aware Automatic Database Partitioning in Shared-Nothing, Parallel OLTP Systems, SIGMOD 2012
![Page 10: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/10.jpg)
What are the key issues
• Distributed transactions
• Temporal workload skew
![Page 11: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/11.jpg)
[Figure: TPC-C NewOrder throughput (txn/s, 0 to 150,000) versus number of partitions (4 to 64), comparing "No Distributed Txns" with "20% Distributed Txns".]
Distributed transactions
![Page 12: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/12.jpg)
What are the key issues
• Distributed transactions
• Temporal workload skew
![Page 13: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/13.jpg)
Temporal workload skew
• Think about the example of Wikipedia
• Even though the average load of the cluster over the entire day is uniform, the load across the cluster at any given point in time is unbalanced
• Static skew vs. temporal skew
![Page 14: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/14.jpg)
[Diagram: the tool takes a Schema (DDL) and a Workload trace as input, uses a SkewEstimator and a DTxnEstimator, and produces a design that places the CUSTOMER, ORDERS, and ITEM tables on each partition.]
Sample workload statements:
SELECT * FROM WAREHOUSE WHERE W_ID = 10;
SELECT * FROM DISTRICT WHERE D_W_ID = 10 AND D_ID = 9;
INSERT INTO ORDERS (O_W_ID, O_D_ID, O_C_ID, …) VALUES (10, 9, 12345, …);
⋮
![Page 15: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/15.jpg)
• Manage the tradeoff between distributed transactions and temporal skew
• Extend the design space to include replicated secondary indexes
• Organically handle stored procedure routing
Large Neighborhood Search
Skew-Aware Cost Model
![Page 16: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/16.jpg)
For each table:
• Horizontally partition
• Replicate on all partitions
• Replicate a secondary index for a subset of its columns
• Effectively route incoming transaction requests
What are the design options
![Page 17: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/17.jpg)
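CUSTOMER

| c_id | c_w_id | c_last | … |
|------|--------|---------|---|
| 1001 | 5 | RZA | - |
| 1002 | 3 | GZA | - |
| 1003 | 12 | Raekwon | - |
| 1004 | 5 | Deck | - |
| 1005 | 6 | Killah | - |
| 1006 | 7 | ODB | - |

ORDERS

| o_id | o_c_id | o_w_id | … |
|-------|--------|--------|---|
| 78703 | 1004 | 5 | - |
| 78704 | 1002 | 3 | - |
| 78705 | 1006 | 7 | - |
| 78706 | 1005 | 6 | - |
| 78707 | 1005 | 6 | - |
| 78708 | 1003 | 12 | - |

[Diagram: three partitions, each holding a piece of CUSTOMER and a piece of ORDERS.]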
Horizontal Partitioning
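As a rough illustration of this slide, the sketch below splits the sample CUSTOMER and ORDERS rows across partitions by warehouse id. The choice of `c_w_id` and `o_w_id` as partitioning columns, the modulo placement function, and the three-partition cluster are assumptions made for the example; the slide does not state them.

```python
from collections import defaultdict

NUM_PARTITIONS = 3  # assumed cluster size

# (c_id, c_w_id, c_last) rows from the slide
CUSTOMER = [(1001, 5, "RZA"), (1002, 3, "GZA"), (1003, 12, "Raekwon"),
            (1004, 5, "Deck"), (1005, 6, "Killah"), (1006, 7, "ODB")]
# (o_id, o_c_id, o_w_id) rows from the slide
ORDERS = [(78703, 1004, 5), (78704, 1002, 3), (78705, 1006, 7),
          (78706, 1005, 6), (78707, 1005, 6), (78708, 1003, 12)]

def partition_of(warehouse_id, num_partitions=NUM_PARTITIONS):
    """Hypothetical placement function: warehouse id modulo partition count."""
    return warehouse_id % num_partitions

def horizontally_partition(rows, key_index, num_partitions=NUM_PARTITIONS):
    """Group rows by the partition that their partitioning column maps to."""
    shards = defaultdict(list)
    for row in rows:
        shards[partition_of(row[key_index], num_partitions)].append(row)
    return dict(shards)

customer_shards = horizontally_partition(CUSTOMER, key_index=1)
orders_shards = horizontally_partition(ORDERS, key_index=2)
```

Because both tables are split on the warehouse id, every order in this sample lands on the same partition as its customer, so a transaction touching one customer and that customer's orders stays on a single partition.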
![Page 18: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/18.jpg)
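ITEM

| i_id | i_name | i_price | … |
|--------|--------|---------|---|
| 603514 | XXX | 23.99 | - |
| 267923 | XXX | 19.99 | - |
| 475386 | XXX | 14.99 | - |
| 578945 | XXX | 9.98 | - |
| 476348 | XXX | 103.49 | - |
| 784285 | XXX | 69.99 | - |

[Diagram: three partitions, each holding a piece of CUSTOMER and ORDERS plus a copy of ITEM.]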
Table Replication
![Page 19: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/19.jpg)
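[Diagram: three partitions, each with CUSTOMER, ORDERS, and ITEM, shown alongside the CUSTOMER table.]

CUSTOMER

| c_id | c_w_id | c_last | … |
|------|--------|---------|---|
| 1001 | 5 | RZA | - |
| 1002 | 3 | GZA | - |
| 1003 | 12 | Raekwon | - |
| 1004 | 5 | Deck | - |
| 1005 | 6 | Killah | - |
| 1006 | 7 | ODB | - |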
Secondary Index
![Page 20: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/20.jpg)
[Diagram: the client application issues NewOrder(5, “Method Man”, 1234) to a cluster of three partitions, each with CUSTOMER, ORDERS, and ITEM.]
Stored Procedure Routing
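A minimal sketch of routing by a stored procedure parameter, assuming the first argument of `NewOrder` is the warehouse id and that partitions are chosen by a simple hash. The `ROUTING_PARAM` table, `partition_for`, and `route` are hypothetical names for illustration, not part of any real system's API.

```python
import zlib

def partition_for(value, num_partitions):
    """Map a routing value to a partition id (hypothetical hash scheme)."""
    if isinstance(value, int):
        return value % num_partitions
    # Stable hash for non-integer values (Python's built-in hash() is salted).
    return zlib.crc32(str(value).encode("utf-8")) % num_partitions

def route(procedure, params, routing_param, num_partitions):
    """Pick the partition that should execute this procedure invocation."""
    index = routing_param[procedure]
    return partition_for(params[index], num_partitions)

# Assumed design choice: NewOrder is routed on its first parameter.
ROUTING_PARAM = {"NewOrder": 0}

target = route("NewOrder", (5, "Method Man", 1234), ROUTING_PARAM, num_partitions=3)
```

If the routing parameter matches the partitioning column of the tables the procedure touches, the request arrives at the partition that already holds its data.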
![Page 21: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/21.jpg)
What are the key technical contributions
• Large-Neighborhood Search
• Skew-Aware Cost Model
![Page 22: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/22.jpg)
[Diagram: pipeline with Input (Workload, Schema DDL) and Initial Design]
Large-Neighborhood Search
• Select the most frequently accessed column for horizontal partitioning
• Greedily replicate read-only tables until no space is left
• Select the next most frequently accessed, read-only column as the secondary index attribute
• Select the routing parameter for stored procedures
Initial Design
![Page 23: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/23.jpg)
[Diagram: pipeline stages Initial Design and Relaxation]
Large-Neighborhood Search
• Allow LNS to escape a local minimum and jump to a new neighborhood of potential solutions
• Horticulture must decide:
  • How many tables to relax
  • Which tables to relax
  • What design options will be examined for each relaxed table
Relaxation
![Page 24: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/24.jpg)
[Diagram: search loop with stages Relaxation, Local Search, Restart, and Best Design]
Large-Neighborhood Search
Local Search
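The loop of relaxation, local search, and restart can be sketched roughly as follows. This is a toy, not Horticulture's algorithm: the random choice of tables to relax, the exhaustive local search, the `OPTIONS` table, and the `toy_cost` function are all assumptions made so the example is self-contained.

```python
import itertools
import random

def local_search(design, relaxed, options, cost):
    """Try every option combination for the relaxed tables; keep the cheapest."""
    best, best_cost = dict(design), cost(design)
    for combo in itertools.product(*(options[t] for t in relaxed)):
        candidate = dict(design)
        candidate.update(zip(relaxed, combo))
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost

def large_neighborhood_search(initial, options, cost, rounds=20, relax_size=2, seed=0):
    """Repeatedly relax a few tables and search that neighborhood."""
    rng = random.Random(seed)
    tables = sorted(options)
    best, best_cost = dict(initial), cost(initial)
    for _ in range(rounds):
        # Relaxation: pick which tables may change in this round.
        relaxed = rng.sample(tables, min(relax_size, len(tables)))
        # Local search, restarting from the best design found so far.
        candidate, candidate_cost = local_search(best, relaxed, options, cost)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost

# Hypothetical design options per table.
OPTIONS = {"CUSTOMER": ["c_id", "c_w_id"],
           "ORDERS": ["o_id", "o_w_id"],
           "ITEM": ["i_id", "REPLICATE"]}
PREFERRED = {"CUSTOMER": "c_w_id", "ORDERS": "o_w_id", "ITEM": "REPLICATE"}

def toy_cost(design):
    """Stand-in cost: number of tables not using the preferred option."""
    return sum(1 for table, choice in design.items() if PREFERRED[table] != choice)

initial = {"CUSTOMER": "c_id", "ORDERS": "o_id", "ITEM": "i_id"}
best, best_cost = large_neighborhood_search(initial, OPTIONS, toy_cost)
```

In a real tool the cost function would be the skew-aware cost model, and the number and choice of relaxed tables would follow the decisions listed on the Relaxation slide.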
![Page 25: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/25.jpg)
What are the key technical contributions
• Large-Neighborhood Search
• Skew-Aware Cost Model
![Page 26: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/26.jpg)
Distributed Transactions + Workload Skew Factor
Cost Model
![Page 27: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/27.jpg)
Skew-Aware Cost Model
• Accentuates the properties that are important in a database
• Can be computed quickly
• Can estimate the cost of an incomplete design
• Produces cost estimates that increase monotonically as more variables are set
![Page 28: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/28.jpg)
• Measure:
  • How much of the workload executes as single-partition transactions
  • How uniformly load is distributed across the cluster
Skew-Aware Cost Model
Tradeoff!
![Page 29: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/29.jpg)
Skew-Aware Cost Model
The total number of partitions accessed divided by the total number of partitions that could have been accessed, then scaled up.
Coordinator Cost
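A small sketch of this ratio. The scale-up term used here, one plus the fraction of transactions that touch more than one partition, is an assumption about what "scaled up" means and should be checked against the paper; `coordinator_cost` and its input format are hypothetical.

```python
def coordinator_cost(txn_partitions, num_partitions):
    """txn_partitions: one set of accessed partition ids per transaction."""
    txn_count = len(txn_partitions)
    if txn_count == 0:
        return 0.0
    # Transactions that touch more than one partition are distributed.
    dtxn_count = sum(1 for parts in txn_partitions if len(parts) > 1)
    accessed = sum(len(parts) for parts in txn_partitions)
    # Partitions accessed / partitions that could have been accessed.
    ratio = accessed / (txn_count * num_partitions)
    # Assumed scale-up: penalize by the share of distributed transactions.
    return ratio * (1 + dtxn_count / txn_count)

all_single = coordinator_cost([{0}, {1}, {2}, {0}], num_partitions=4)
one_distributed = coordinator_cost([{0}, {1}, {2, 3}, {0}], num_partitions=4)
```

With four partitions, four single-partition transactions give 4/16 = 0.25; making one of them span two partitions raises the ratio to 5/16 and the scaled cost to 0.390625.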
![Page 30: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/30.jpg)
Skew-Aware Cost Model
To account for time-varying skew, divide the workload W into a finite number of time intervals
Skew Factor
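The effect of splitting the workload into intervals can be sketched as below. The per-interval skew measure (normalized deviation from a uniform load) and the weighting by interval size are stand-ins chosen for simplicity; they are assumptions, not the measure used by the authors.

```python
def interval_skew(counts):
    """Skew of one interval: 0.0 for uniform load, 1.0 if one partition gets it all."""
    n = len(counts)
    total = sum(counts)
    if total == 0 or n < 2:
        return 0.0
    uniform = 1.0 / n
    deviation = sum(abs(c / total - uniform) for c in counts)
    return deviation / (2 * (1 - uniform))

def skew_factor(intervals):
    """Average of per-interval skew, weighted by transactions in each interval."""
    total = sum(sum(counts) for counts in intervals)
    if total == 0:
        return 0.0
    return sum(interval_skew(counts) * sum(counts) for counts in intervals) / total

# Two partitions, two intervals: each partition is hot in a different interval.
per_interval = [[100, 0], [0, 100]]
whole_day = [100, 100]

skew_whole_day = interval_skew(whole_day)
skew_by_interval = skew_factor(per_interval)
```

Measured over the whole day the load looks perfectly uniform (skew 0.0), while the interval-based factor reports the maximum skew (1.0), which is the temporal skew the Wikipedia example describes.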
![Page 31: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/31.jpg)
Incomplete Designs
• A query that references a table with an unset attribute in a design is marked as unknown
• For each unknown query:
  • Coordinator Cost: assume that any unknown query is single-partitioned
  • Skew Factor: assume that unknown queries execute on all partitions in the cluster
• ‘Unknown’ can change to ‘known’
• ‘Known’ cannot change to ‘unknown’
Cost estimates increase monotonically!
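The two assumptions for unknown queries can be sketched as follows. The design representation (a dict with `None` for unset tables) and the `resolve` callback that returns the partitions of a known query are hypothetical.

```python
def is_unknown(query_tables, design):
    """A query is unknown if any table it references has no attribute set yet."""
    return any(design.get(table) is None for table in query_tables)

def partitions_for_coordinator_cost(query_tables, design, resolve):
    """Number of partitions to charge the query with in the coordinator cost."""
    if is_unknown(query_tables, design):
        return 1  # unknown queries are assumed to be single-partitioned
    return len(resolve(query_tables, design))

def partitions_for_skew(query_tables, design, resolve, num_partitions):
    """Partitions the query is assumed to load when computing the skew factor."""
    if is_unknown(query_tables, design):
        return set(range(num_partitions))  # assumed to run on every partition
    return resolve(query_tables, design)

# Partial design: CUSTOMER is set, ORDERS is not set yet.
design = {"CUSTOMER": "c_w_id", "ORDERS": None}

def resolve(query_tables, design):
    """Stand-in resolver: pretend every known query hits partition 2."""
    return {2}
```

Charging unknown queries a single partition gives the lowest possible contribution to the coordinator cost, so setting more tables can only keep that estimate the same or raise it.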
![Page 32: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/32.jpg)
Optimizations
• Access Graphs
• Workload Compression
![Page 33: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/33.jpg)
Vertex: a table
Edge: the two tables are co-accessed
Edge weight: the number of times the queries forming the relationship are executed
Access Graph
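A minimal sketch of building such a graph from a workload summary. The input format, a list of (tables referenced together, execution count) pairs, is an assumption for illustration.

```python
from collections import Counter
from itertools import combinations

def build_access_graph(queries):
    """Return edge weights keyed by an alphabetically ordered table pair."""
    edges = Counter()
    for tables, executions in queries:
        # Every pair of tables referenced together forms (or reinforces) an edge.
        for a, b in combinations(sorted(set(tables)), 2):
            edges[(a, b)] += executions
    return edges

workload = [
    (("CUSTOMER", "ORDERS"), 10),
    (("ORDERS", "ITEM"), 4),
    (("CUSTOMER", "ORDERS"), 5),
    (("WAREHOUSE",), 7),  # single-table access adds no edge
]
graph = build_access_graph(workload)
```

Heavily weighted edges point at tables that are frequently used together, which is useful when deciding which tables to relax or co-partition.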
![Page 34: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/34.jpg)
Optimizations
• Access Graphs
• Workload Compression
![Page 35: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/35.jpg)
• Combine sets of similar queries in individual transactions into fewer weighted records
• Combine similar transactions into a smaller number of weighted records in the same manner
Workload Compression
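Both compression levels can be sketched as below, under the simplifying assumption that "similar" means "identical": identical queries inside a transaction collapse into one weighted record, and transactions with identical compressed query sets collapse in the same way. The trace format and function names are hypothetical.

```python
from collections import Counter

def compress_queries(queries):
    """Collapse repeated queries within one transaction into (query, weight) pairs."""
    return tuple(sorted(Counter(queries).items()))

def compress_workload(trace):
    """Collapse transactions with the same procedure and compressed queries."""
    weights = Counter((proc, compress_queries(queries)) for proc, queries in trace)
    return [(proc, queries, weight) for (proc, queries), weight in weights.items()]

trace = [
    ("NewOrder", ["SELECT WAREHOUSE", "INSERT ORDERS", "INSERT ORDERS"]),
    ("NewOrder", ["SELECT WAREHOUSE", "INSERT ORDERS", "INSERT ORDERS"]),
    ("Payment", ["UPDATE CUSTOMER"]),
]
compressed = compress_workload(trace)
```

The three traced transactions shrink to two weighted records, so a cost model only has to evaluate each distinct record once and multiply by its weight.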
![Page 36: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/36.jpg)
Throughput
[Figure: throughput (txn/s) versus number of partitions (4, 8, 16, 32, 64) in three panels: TATP, TPC-C, and TPC-C Skewed, comparing Horticulture with the state of the art. Annotations, in panel order: +88%, +16%, +183%.]
![Page 37: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/37.jpg)
Search Times
[Figure: % single-partitioned transactions in six panels: TATP, SEATS, TPC-C, TPC-C Skewed, AuctionMark, and TPC-E.]
![Page 38: Skew-Aware Automatic Database Partitioning in Shared](https://reader030.vdocuments.mx/reader030/viewer/2022040307/624753f96bfd7d3be9434eee/html5/thumbnails/38.jpg)
Andy: it works!