Tractor Pulling on Data Warehouse

Tractor Pulling on Datawarehouses. Martin Kersten, Volker Markl, Meikel Poess, Kai-Uwe Sattler, Alfons Kemper, Ani Nica. DBTest 2011

Uploaded by planetdata-network-of-excellence on 11 May 2015


DESCRIPTION

This talk was presented by Martin Kersten (CWI) at the 4th International Workshop on Testing Database Systems (DBTest 2011) on June 13, 2011, in Athens, Greece.

Publication: http://bit.ly/yK5JZk

Abstract: Robustness of database systems under stress is hard to quantify because many factors are involved, most notably the user's expectation that a job be performed within certain bounds of the user requirements. Nevertheless, robustness of database systems is very important to end users. In this paper we develop a database benchmark suite, inspired by tractor pulling, where robustness is measured as a system's ability to process data despite a continuous increase in system load, defined in terms of data volume, query volume, and query complexity. A functional evaluation is performed against several systems to highlight the benchmark's capabilities.

TRANSCRIPT

Page 1: Tractor Pulling on Data Warehouse

Tractor Pulling on Datawarehouses

Martin Kersten, Volker Markl, Meikel Poess, Kai-Uwe Sattler,

Alfons Kemper, Ani Nica

DBTest 2011

Page 2: Tractor Pulling on Data Warehouse

The good old days

• The early eighties, when
  – Oracle appeared on the scene
  – Ingres was a respected innovator in RDBMS
  – System R fought the CODASYL battle
  – IMS was still dominating the market

• There was a need for a metric to evaluate the solutions

Page 4: Tractor Pulling on Data Warehouse

The good old days

• Turned into an organised battle
  – TPC-C, TPC-H, TPC-D, TPC-W…
  – hundreds of benchmarks to prove one's muscles

Page 5: Tractor Pulling on Data Warehouse

• We need tools to assess a solution space

• We don’t need weapons to win a ‘war’

Page 6: Tractor Pulling on Data Warehouse

Dagstuhl 2010 Robust Query Processing

Page 8: Tractor Pulling on Data Warehouse

• With each step of the pull, the tension on the tractor increases (exponentially)

• The tractor driver throttles and changes gears to keep it going

Page 9: Tractor Pulling on Data Warehouse

Ingredients of the DBMS Tractor Pull

• A tractor pull is a series of workload steps for which we measure the performance

• Each step is defined by
  – Catalog changes
  – Database load: delete + load + create index
  – Query processing: BI grouped statistics
  – Concurrency
  – Act of God operations
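Since a pull is a series of steps whose performance is measured, a harness can be as simple as timing each step in order. The sketch below is hypothetical: it models a step as a (name, action) pair and stops at the first step that raises, in the spirit of a tractor that breaks down before the end of the field. It is not the benchmark's actual driver.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a workload step is a (name, action) pair, where
# the action performs one ingredient (catalog change, load, query batch, ...).
Step = Tuple[str, Callable[[], None]]


def run_pull(steps: List[Step]) -> List[Dict[str, object]]:
    """Execute the steps in order and record the wall-clock time of each.

    The pull stops at the first step that raises an exception, i.e. the
    "tractor" broke down before reaching the end of the field.
    """
    log: List[Dict[str, object]] = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            ok = True
        except Exception:  # any failure ends the pull
            ok = False
        log.append({"step": name, "seconds": time.perf_counter() - start, "ok": ok})
        if not ok:
            break
    return log
```

The log keeps failed steps too, so a comparison between systems can see both how far each one got and how long every step took.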

Page 10: Tractor Pulling on Data Warehouse

A database soil

Generate a small database (smaller than RAM)

Use a single data type
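A minimal sketch of generating such a soil, assuming SQLite as a stand-in DBMS and INTEGER as the single data type. The relation name R0 and the column names B0, B1, ... are borrowed from the query template; the arity, row count, and value domain are hypothetical parameters.

```python
import random
import sqlite3


def make_soil(conn: sqlite3.Connection, name: str, arity: int,
              rows: int, domain: int, seed: int = 0) -> None:
    """Create relation `name`(B0, ..., B{arity-1}) of INTEGER columns and fill
    it with `rows` random values drawn uniformly from [0, domain)."""
    rng = random.Random(seed)  # fixed seed keeps the soil reproducible
    cols = [f"B{j}" for j in range(arity)]
    conn.execute(f"CREATE TABLE {name} ({', '.join(c + ' INTEGER' for c in cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {name} VALUES ({placeholders})",
        ([rng.randrange(domain) for _ in cols] for _ in range(rows)),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database; size is not enforced
make_soil(conn, "R0", arity=2, rows=1000, domain=100)
```

The "smaller than RAM" condition is not checked here; choosing the row count so the relation fits in memory is left to the caller.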

Page 11: Tractor Pulling on Data Warehouse

A database soil


COPY the smaller relation into the larger one
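One way to express this growth step in SQL is an INSERT ... SELECT that appends every tuple of the smaller relation to the larger one. The sketch uses SQLite and assumes both relations share the same schema; it is an illustration, not the command the authors used.

```python
import sqlite3


def copy_into(conn: sqlite3.Connection, smaller: str, larger: str) -> int:
    """Append every tuple of `smaller` to `larger` and return the new size of
    `larger`. Both relations are assumed to have identical schemas."""
    conn.execute(f"INSERT INTO {larger} SELECT * FROM {smaller}")
    conn.commit()
    return conn.execute(f"SELECT count(*) FROM {larger}").fetchone()[0]


conn = sqlite3.connect(":memory:")
for name in ("R0", "R1"):
    conn.execute(f"CREATE TABLE {name} (B0 INTEGER, B1 INTEGER)")
conn.executemany("INSERT INTO R0 VALUES (?, ?)", [(1, 2), (3, 4)])
conn.executemany("INSERT INTO R1 VALUES (?, ?)", [(5, 6), (7, 8), (9, 10)])
size = copy_into(conn, "R0", "R1")  # R1 grows from 3 to 5 tuples
```

Repeating the call keeps growing the larger relation by the size of the smaller one, which gives a simple, repeatable way to add data between tracks.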

Page 12: Tractor Pulling on Data Warehouse

A database soil

Page 13: Tractor Pulling on Data Warehouse

Query template

SELECT R0.B0, ..., Ri.Bi, count(*), avg(R0.B0), avg(R1.B0), avg(R1.B1), ..., avg(Ri.B0), ...
FROM R0, ..., Ri
WHERE selectpattern(R0, ..., Ri) AND joinpattern(R0, ..., Ri)
GROUP BY R0.B0, ..., Ri.Bi
ORDER BY R0.B0, ..., Ri.Bi

Linear, Cyclic, Star-based, Clique query patterns

The n-th query load includes the (n-1)-th query load
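The four join patterns can be read as graphs over the relations R0..Ri. The sketch below instantiates the template under several assumptions that are ours, not the slide's: relation Rk is grouped on Rk.Bk and averaged over Rk.B0..Rk.Bk (one reading of the template's elided lists), every join predicate has the hypothetical form Ra.B0 = Rb.B0, and selectpattern is supplied by the caller as plain SQL predicates.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def join_edges(pattern: str, i: int) -> List[Tuple[int, int]]:
    """Join graph over relations R0..Ri for the four patterns on the slide."""
    chain = [(k, k + 1) for k in range(i)]
    if pattern == "linear":
        return chain
    if pattern == "cyclic":  # close the chain into a ring (needs >= 3 relations)
        return chain + [(i, 0)] if i >= 2 else chain
    if pattern == "star":  # R0 is the hub
        return [(0, k) for k in range(1, i + 1)]
    if pattern == "clique":  # every pair of relations is joined
        return list(combinations(range(i + 1), 2))
    raise ValueError(f"unknown pattern: {pattern}")


def instantiate(pattern: str, i: int, select_preds: Sequence[str] = ()) -> str:
    """Fill in the query template for relations R0..Ri (hypothetical reading)."""
    group_cols = [f"R{k}.B{k}" for k in range(i + 1)]  # R0.B0, ..., Ri.Bi
    avgs = [f"avg(R{k}.B{j})" for k in range(i + 1) for j in range(k + 1)]
    joins = [f"R{a}.B0 = R{b}.B0" for a, b in join_edges(pattern, i)]  # assumed predicate
    preds = list(select_preds) + joins
    lines = [
        "SELECT " + ", ".join(group_cols + ["count(*)"] + avgs),
        "FROM " + ", ".join(f"R{k}" for k in range(i + 1)),
    ]
    if preds:
        lines.append("WHERE " + " AND ".join(preds))
    lines.append("GROUP BY " + ", ".join(group_cols))
    lines.append("ORDER BY " + ", ".join(group_cols))
    return "\n".join(lines)
```

For example, instantiate("star", 3) joins R1, R2, and R3 to R0, while "clique" over the same four relations produces six join predicates, so complexity can be raised by changing only the pattern.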

Page 14: Tractor Pulling on Data Warehouse

Scenarios

• Tractor pull workload

• W(N) = <S, L, Pre, Qry, Post, qry, db>
  – S: schema adjustments
  – L: loading the database
  – Pre: pre-optimization
  – Qry: query execution
  – Post: post-optimization
  – qry: query characteristics
  – db: database growth function
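The workload tuple maps naturally onto a record of callables. This is a hypothetical encoding: the slide does not say in which order the phases run or how tracks are numbered, so the sketch assumes the tuple order S, L, Pre, Qry, Post within each track and tracks numbered 0..N-1.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Workload:
    """Hypothetical encoding of W(N) = <S, L, Pre, Qry, Post, qry, db>."""
    S: Callable[[int], None]            # schema adjustments at track i
    L: Callable[[int, float], None]     # load db(i) worth of data at track i
    Pre: Callable[[int], None]          # pre-optimization (e.g. build indexes)
    Qry: Callable[[int, int], None]     # run the query load, given qry(i)
    Post: Callable[[int], None]         # post-optimization
    qry: Callable[[int], int]           # query characteristics per track
    db: Callable[[int], float]          # database growth function

    def run(self, n_tracks: int) -> None:
        # Assumption: phases run in tuple order, tracks numbered 0..N-1.
        for i in range(n_tracks):
            self.S(i)
            self.L(i, self.db(i))
            self.Pre(i)
            self.Qry(i, self.qry(i))
            self.Post(i)
```

Keeping qry and db as plain functions means a scenario is defined by swapping those two functions (plus the connection count), while the phase actions stay system-specific.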

Page 15: Tractor Pulling on Data Warehouse

Hill scenario

• The Hill scenario models a data warehouse that grows at a modest rate g ∈ (0, 1) (e.g., g = 0.2).

• It starts out with a main-memory focus until the data overflows onto a few disks.

• It highlights a system's robustness in dealing with the memory-disk performance chasm.

Page 16: Tractor Pulling on Data Warehouse

Hill scenario

A modestly growing warehouse with a single user. The database initially fits in memory and then spills over to disk.

$d \in (0\%, 100\%)$, $g \in (0, 1)$
Number of connections at track $i$: 1
$db(0) = (d \times RAM) \times \frac{1}{2 \times dom}$
$db(i) = g \times i \times db(0)$
$qry(0) = 1$, $qry(i) = 4$
$|Q(i)| = 1 + 4 \times i$
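The Hill parameters can be tabulated per track directly from these formulas. In this sketch d is passed as a fraction in (0, 1) rather than a percentage, dom is kept as the opaque divisor from the db(0) formula, and the example values (RAM, dom, g) are hypothetical.

```python
from typing import Dict, List


def hill_schedule(d: float, ram: float, dom: float, g: float,
                  tracks: int) -> List[Dict[str, float]]:
    """Per-track parameters of the Hill scenario, taken from the slide formulas.

    db(0) is the initial load; for i > 0, db(i) = g * i * db(0) as on the slide.
    """
    if not (0 < d < 1 and 0 < g < 1):
        raise ValueError("Hill expects d in (0, 1) and g in (0, 1)")
    db0 = (d * ram) * (1 / (2 * dom))
    rows = []
    for i in range(tracks):
        rows.append({
            "track": i,
            "connections": 1,                       # single user
            "db": db0 if i == 0 else g * i * db0,   # db(i) = g * i * db(0)
            "new_queries": 1 if i == 0 else 4,      # qry(0) = 1, qry(i) = 4
            "queries": 1 + 4 * i,                   # |Q(i)| = 1 + 4i
        })
    return rows


rows = hill_schedule(d=0.5, ram=1024, dom=4, g=0.2, tracks=4)
```

Here |Q(i)| equals the running sum of qry(0), ..., qry(i), which matches the rule that each query load includes the previous one.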

Page 17: Tractor Pulling on Data Warehouse

Meadow scenario

A stable warehouse with multiple users. Query templates stress complexity.

$d \in (0\%, 100\%)$, $g = 0$, $C > 1$
Number of connections at track $i$: $C$
$db(0) = (d \times RAM) \times \frac{1}{2 \times dom}$
$db(i) = 0$ (no growth)
$qry(0) = 0$, $qry(i) = C$
$|Q(i)| = 1 + C \times i$
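In the Meadow the data stays fixed while C connections share a growing query set. A minimal sketch, assuming threads stand in for client connections, that the cumulative set Q(i) is what runs at track i, and that run_query is a hypothetical callback executing query number q:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def meadow_query_count(i: int, C: int) -> int:
    """|Q(i)| = 1 + C * i from the slide (the cumulative query set at track i)."""
    return 1 + C * i


def run_meadow_track(i: int, C: int, run_query: Callable[[int], T]) -> List[T]:
    """Execute the query set of track i over C concurrent workers.

    Threads stand in for client connections; results come back in query order.
    """
    if C <= 1:
        raise ValueError("Meadow expects C > 1")
    with ThreadPoolExecutor(max_workers=C) as pool:
        return list(pool.map(run_query, range(meadow_query_count(i, C))))
```

Against a real DBMS, each worker would typically open its own connection, since many client libraries (including Python's sqlite3 by default) do not share one connection across threads.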

Page 18: Tractor Pulling on Data Warehouse

Rockies scenario

A growing warehouse with multiple users. Query templates stress complexity.

$d \in (0\%, 100\%)$, $g \in (0, 10)$
Number of connections at track $i$: $i$
$db(0) = (d \times RAM) \times \frac{1}{2 \times dom}$
$db(i) = g \times i \times db(0)$
$qry(0) = 0$, $qry(i) = 4 \times i$
$|Q(i)| = 1 + 4 \times \frac{i(i+1)}{2}$
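The Rockies raise every dimension at once: connections, data, and queries all grow with the track number. The sketch tabulates these from the formulas above. The running total of data assumes db(i) is the volume added at track i, which is our reading (the Meadow's "db(i) = 0 (no growth)" suggests it); the example values are hypothetical.

```python
from typing import Dict, List


def rockies_schedule(d: float, ram: float, dom: float, g: float,
                     tracks: int) -> List[Dict[str, float]]:
    """Per-track parameters of the Rockies scenario from the slide formulas."""
    if not (0 < d < 1 and 0 < g < 10):
        raise ValueError("Rockies expects d in (0, 1) and g in (0, 10)")
    db0 = (d * ram) * (1 / (2 * dom))
    total = 0.0
    rows = []
    for i in range(tracks):
        db_i = db0 if i == 0 else g * i * db0   # db(i) = g * i * db(0)
        total += db_i  # assumption: db(i) is the volume added at track i
        rows.append({
            "track": i,
            "connections": i,                      # one more user per track
            "db": db_i,
            "total_db": total,
            "new_queries": 4 * i,                  # qry(i) = 4i
            "queries": 1 + 4 * i * (i + 1) // 2,   # |Q(i)| = 1 + 4 i(i+1)/2
        })
    return rows


rows = rockies_schedule(d=0.5, ram=1024, dom=4, g=2, tracks=4)
```

Because qry(i) grows linearly, |Q(i)| grows quadratically (1, 5, 13, 25, ...), and it is always one more than the running sum of qry(i).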

Page 19: Tractor Pulling on Data Warehouse

Robustness metrics

• It is a multi-dimensional metric aimed at measuring the deviation from the expected norm

• Robust(N) = <L, S, QO, QOk, QE, QEk, H>
  – L: standard deviation of the loading time
  – S: standard deviation of the storage requirements
  – QO, QOk: standard deviation of query optimization (per track)
  – QE, QEk: standard deviation of query execution (per track)
  – H: holistic
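Most components of Robust(N) are standard deviations over measurements collected during the pull. The sketch below computes them with population standard deviation; reading QO and QE as "over all queries" and QOk and QEk as "per track k" is our assumption, and the holistic component H is omitted because the slide does not define it.

```python
from statistics import pstdev
from typing import Dict, List


def robustness(load_s: List[float], storage: List[float],
               opt_s: List[List[float]], exec_s: List[List[float]]) -> Dict[str, object]:
    """Partial sketch of Robust(N) (H omitted).

    load_s[k], storage[k]: loading time and storage size at track k.
    opt_s[k], exec_s[k]: per-query optimization / execution times at track k.
    """
    all_opt = [t for track in opt_s for t in track]
    all_exec = [t for track in exec_s for t in track]
    return {
        "L": pstdev(load_s),
        "S": pstdev(storage),
        "QO": pstdev(all_opt),
        "QOk": [pstdev(track) for track in opt_s],
        "QE": pstdev(all_exec),
        "QEk": [pstdev(track) for track in exec_s],
    }
```

pstdev treats the tracks as the whole population of interest; statistics.stdev (sample standard deviation) would be the alternative if the tracks are seen as a sample. Either raises on an empty list, so every track needs at least one measurement.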

Page 20: Tractor Pulling on Data Warehouse

A hill scenario

Page 21: Tractor Pulling on Data Warehouse

A meadow scenario

Page 22: Tractor Pulling on Data Warehouse

A Rockies scenario

Page 23: Tractor Pulling on Data Warehouse

Takeaways

• Robustness is all about comparisons. We need methods to quickly determine differences in behavior.

• If the system reaches the end of the field, we are happy. If it blows up, or if its queries behave worse along the way, it is not robust.

Page 24: Tractor Pulling on Data Warehouse

Conclusions

• Tractor pulling is an effective new toolkit for testing the robustness of a DBMS along various dimensions

• Refinements for ease of analysis are needed (GUIs)

• http://sourceforge.net/projects/tractorpulling
