
Page 1: Wisconsin benchmark June 2001 Prof. Sang Ho Lee


Wisconsin benchmark

June 2001

Prof. Sang Ho Lee

Soongsil University

[email protected]

Page 2

Overview (1)

References
  D. Bitton, D. J. DeWitt and C. Turbyfill, "Benchmarking Database Systems: A Systematic Approach," Proc. of the Ninth Int. Conference on Very Large Data Bases: 8-19, 1983.
  D. Bitton and C. Turbyfill, "A Retrospective on the Wisconsin Benchmark," In: Readings in Database Systems, M. Stonebraker ed., Morgan Kaufmann, 1988.
  D. DeWitt, "The Wisconsin Benchmark: Past, Present, and Future," In: The Benchmark Handbook: 269-316, J. Gray ed., Morgan Kaufmann, 1993.

Developed in 1983, initially to measure the DIRECT database machine

The first "real" benchmark for relational databases; its timeliness, simplicity, and portability made it widely used!

Page 3

Overview (2)

Synthetic database and controlled workload
32 queries in total
Metric: elapsed time
Focuses on access methods and query optimization in relational databases
Limitations
  A single-user benchmark
  No test of concurrency control and recovery
  Tests features of the query optimizer only
No longer widely used to evaluate single-processor relational systems, but commonly used to evaluate database systems on parallel processors (Gamma, Tandem, Volcano, etc.)

Page 4

Original test databases

Synthetic databases: approx. 5 MB
Three relations with identical attributes but different cardinalities
  Onektup (1,000 tuples)
  Tenktup1 (10,000 tuples)
  Tenktup2 (10,000 tuples)
13 integer attributes + 3 52-byte string attributes; one tuple = 182 bytes
Strings: 3 distinguishing characters at positions 1, 27, and 52; the same character is padded into the other positions
String4 has only 4 unique values

Page 5

Original tenktup relation

Name       Type  Range      Order     Comment
unique1    int   0 - 9999   random    candidate key
unique2    int   0 - 9999   random    declared key
two        int   0 - 1      rotating  0,1,0,1,...
four       int   0 - 3      rotating  0,1,2,3,0,1,...
ten        int   0 - 9      rotating  0,1,...,9,0,...
twenty     int   0 - 19     rotating  0,1,...,19,0,...
hundred    int   0 - 99     rotating  0,1,...,99,0,...
thousand   int   0 - 999    random
twothous   int   0 - 1999   random
fivethous  int   0 - 4999   random
tenthous   int   0 - 9999   random    candidate key
odd100     int   50 values  rotating  1,3,5,...,99,1,...
even100    int   50 values  rotating  2,4,6,...,100,2,...
stringu1   char  -          random    candidate key
stringu2   char  -          rotating  candidate key
string4    char  -          rotating

Page 6

Indexes

Three indexes
  Clustered unique index (unique2)
  Non-clustered unique index (unique1)
  Non-clustered non-unique index (hundred)

Page 7

Retrospective on test database

Why 2-byte integers only? Why 52-byte fixed-length strings?
  An ad-hoc survey shows that fixed- or variable-length strings of 20-30 characters are more common.
  Most strings are differentiated by the first few characters in the string.
All values are uniformly distributed -- unrealistic
Is a 5 MB database too small?
Hard to scale the database: a 2-byte integer restricts the maximum database size to 32,768 tuples

Page 8

Scaling the benchmark relations

Name            Range                 Order       Comment
unique1         0 - (maxtuples - 1)   random      unique, random order
unique2         0 - (maxtuples - 1)   sequential  unique, sequential
two             0 - 1                 random      (unique1 mod 2)
four            0 - 3                 random      (unique1 mod 4)
ten             0 - 9                 random      (unique1 mod 10)
twenty          0 - 19                random      (unique1 mod 20)
onePercent      0 - 99                random      (unique1 mod 100)
tenPercent      0 - 9                 random      (unique1 mod 10)
twentyPercent   0 - 4                 random      (unique1 mod 5)
fiftyPercent    0 - 1                 random      (unique1 mod 2)
unique3         0 - (maxtuples - 1)   random      unique1
evenOnePercent  0,2,4,...,198         random      (onePercent * 2)
oddOnePercent   1,3,5,...,199         random      (onePercent * 2) + 1
stringu1        -                     random      candidate key
stringu2        -                     random      candidate key
string4         -                     cyclic
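The derived-attribute rules in the table above can be expressed as a small tuple generator. This is a minimal sketch, not the benchmark's original generator; the string attributes are omitted and the function names are my own.

```python
import random

def gen_tuple(unique1, unique2):
    """Derive one scaled Wisconsin tuple from its two unique values,
    following the mod-based rules in the table (strings omitted)."""
    one_percent = unique1 % 100
    return {
        "unique1": unique1,            # unique, random order
        "unique2": unique2,            # unique, sequential
        "two": unique1 % 2,
        "four": unique1 % 4,
        "ten": unique1 % 10,
        "twenty": unique1 % 20,
        "onePercent": one_percent,
        "tenPercent": unique1 % 10,
        "twentyPercent": unique1 % 5,
        "fiftyPercent": unique1 % 2,
        "unique3": unique1,
        "evenOnePercent": one_percent * 2,
        "oddOnePercent": one_percent * 2 + 1,
    }

def gen_relation(maxtuples):
    """Yield maxtuples tuples: unique2 runs sequentially while unique1
    takes the same values in a random permutation."""
    u1 = list(range(maxtuples))
    random.shuffle(u1)
    for unique2, unique1 in enumerate(u1):
        yield gen_tuple(unique1, unique2)
```

Because every non-unique attribute is a function of unique1, relations of any cardinality stay mutually consistent, which is what makes this schema scalable where the original 2-byte-integer schema was not.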

Page 9

Test queries: Strategies

To avoid compounding factors, default execution parameters are set
  1,000 tuples in the result
  All 16 attributes in the result
  Result output mode: into a relation
  Integer attributes in selection predicates
  One-relation queries use tenktup
Three basic performance factors are varied
  Storage structure of the relation
  Indexing: no index, primary index (unique2), secondary index (unique1)
  Selectivity
In retrospect
  1,000 tuples in the result are too many
  Not all queries should return all attributes
  A composite index should have been included

Page 10

Test queries: An overview

32 queries in total
Relational instruction set
  Selections with different selectivity factors
  Projections with different percentages of duplicate attributes
  2-way and 3-way joins
  Simple aggregates and aggregate functions
  Updates: insert, delete, update

Page 11

Experimental environments (1)

Hardware
  CPU: one 233 MHz UltraSPARC processor
  Main memory: 128 MB
  HDD: one 4 GB internal HDD, two 36 GB external HDDs
OS: SunOS 5.7
DBMS A
Experimental repetition frequency
  Run each query 5 times
  Read garbage data after each query execution, to flush buffers
Measurement time: the arithmetic mean of the 5 query elapsed times
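The measurement procedure above can be sketched as a small timing harness. This is a hypothetical reconstruction: the slide does not say how the timing was implemented, and the `run_query`/`flush_buffers` callables here are placeholders for the real query execution and the "read garbage data" buffer flush.

```python
import statistics
import time

def time_query(run_query, flush_buffers, repetitions=5):
    """Run a query `repetitions` times, flushing buffers between runs,
    and return the arithmetic mean of the elapsed times (the slide's metric)."""
    elapsed = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        elapsed.append(time.perf_counter() - start)
        # Scan unrelated "garbage" data so the next run does not benefit
        # from pages cached by this one.
        flush_buffers()
    return statistics.mean(elapsed)
```

Flushing after every run keeps each of the 5 measurements a cold-buffer measurement, so the mean reflects I/O cost rather than cache hits.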

Page 12

Experimental environments (2)

Test database scaling: 20 times bigger than the original database
Data tablespace: 2 GB
Index tablespace: 1 GB
Rollback space: 500 MB
Temporary tablespace: 300 MB
Query optimization method
  CHOOSE: cost-based optimization is the base method; if there are no statistics, rule-based optimization is used
  ANALYZE TABLE collects the statistics

Page 13

DBMS parameters

data_block_size: 2048 bytes
db_block_buffers: 20000 blocks (40 MB)
shared_pool_size: 10240000 bytes (10 MB)
log_buffer: 20480000 bytes (20 MB)
log_checkpoint_interval: 40000 OS blocks (20 MB)
  SunOS block size: 512 bytes/block
log_checkpoint_timeout: 0
Other parameters: default values used

Page 14

Selections (1)

A selection operation depends on a number of different factors
  Hardware speed, architecture, and quality of software
  Storage organization of the relation and index
  Selectivity factor
  Query output mode
8 queries in total
  6 queries: into a temporary table, (1%, 10%) selectivity vs. (no index, primary index, secondary index)
  2 queries: output to screen, 1% selectivity and one tuple returned

Page 15

Selections (2)

Query 1 (no index) - 1% selection
  INSERT INTO TEMP
  SELECT * FROM BASERELATION1
  WHERE unique2D BETWEEN :lower AND :upper
  lower: random value, upper: lower + (# of tuples * selectivity)

Query 3 (clustered index) - 1% selection
  INSERT INTO TMP
  SELECT * FROM BASERELATION1
  WHERE unique2D BETWEEN :lower AND :upper
  lower: random value, upper: lower + (# of tuples * selectivity)

Query 5 (non-clustered index) - 1% selection
  INSERT INTO TMP
  SELECT * FROM BASERELATION1
  WHERE unique1D BETWEEN :lower AND :upper
  lower: random value, upper: lower + (# of tuples * selectivity)
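A miniature, runnable version of the 1% selection can be put together with SQLite standing in for DBMS A. This is only a sketch: the table is reduced to the two unique columns, and `lower`/`upper` are computed as the slide describes (with `upper` adjusted by one so the inclusive BETWEEN returns exactly 1% of the tuples).

```python
import random
import sqlite3

ntuples, selectivity = 10000, 0.01
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baserelation1 (unique1 INT, unique2 INT)")
u1 = list(range(ntuples))
random.shuffle(u1)  # unique1 in random order, unique2 sequential
conn.executemany("INSERT INTO baserelation1 VALUES (?, ?)",
                 zip(u1, range(ntuples)))

# lower: random value; upper: lower + (# of tuples * selectivity),
# minus 1 because BETWEEN is inclusive on both ends.
width = int(ntuples * selectivity)
lower = random.randint(0, ntuples - width)
upper = lower + width - 1

conn.execute("CREATE TABLE tmp (unique1 INT, unique2 INT)")
conn.execute("INSERT INTO tmp SELECT * FROM baserelation1 "
             "WHERE unique2 BETWEEN ? AND ?", (lower, upper))
count = conn.execute("SELECT COUNT(*) FROM tmp").fetchone()[0]
```

After the run, `count` is exactly 1% of the relation (100 rows here), matching the benchmark's controlled-selectivity design.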

Page 16

Selections (3)

Query 1 - 1% selection with no index
  Response time: 4.634 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   TABLE ACCESS (FULL) OF 'BASERELATION1'

Query 3 - 1% selection with clustered index
  Response time: 0.640 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    2 1     INDEX (RANGE SCAN) OF 'INDEX_UNIQUE2A' (UNIQUE)

Query 5 - 1% selection with non-clustered index
  Response time: 3.763 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    2 1     INDEX (RANGE SCAN) OF 'INDEX_UNIQUE1A' (UNIQUE)

Page 17

Selections (4)

Index usefulness
Clustered index vs. non-clustered index: the clustered index (0.640 sec) is far faster than the full scan (4.634 sec), while the non-clustered index (3.763 sec) helps only marginally for a 1% selection

Page 18

Joins (1)

To show the effect of three different factors
  Complexity of a query
  Performance of join algorithms
  Effectiveness of query optimizers
Three basic join queries
  JoinABprime: join A with 10% of A (Bprime)
  JoinASelB: join A with 10% of B
  JoinCselAselB: join of C, 10% of A, and 10% of B
Three versions of each query, resulting in 9 queries in total
  No index
  A clustered index
  A non-clustered index

Page 19

Joins (2)

Select Select

Join Scan

Join

1000 tuples 1000 tuples

1000 tuples1000 tuples

1000 tuples

10000 tuples 10000 tuples

A B

C1000 tuples

JoinCselAselB

Page 20

Joins (3)

Query 11 (no index) - JoinCselAselB
  INSERT INTO TMP
  SELECT * FROM BASERELATION1, BASERELATION2, APRIME
  WHERE (Aprime.unique2A = BASERELATION1.unique2D)
    AND (BASERELATION1.unique2D = BASERELATION2.unique2E)
    AND (BASERELATION1.unique2D BETWEEN :lower AND :upper)
  lower: random value, upper: lower + (# of tuples * selectivity)

Query 14 (clustered index) - JoinCselAselB
  INSERT INTO TMP
  SELECT * FROM BASERELATION1, BASERELATION2, APRIME
  WHERE (Aprime.unique2A = BASERELATION1.unique2D)
    AND (BASERELATION1.unique2D = BASERELATION2.unique2E)
    AND (BASERELATION1.unique2D BETWEEN :lower AND :upper)
  lower: random value, upper: lower + (# of tuples * selectivity)
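A scaled-down, runnable version of JoinCselAselB can be built in SQLite. This is a sketch under stated assumptions: the relations are shrunk to one join column each, `aprime` simply holds the first 10% of the key values, and the selection range is fixed at the low end rather than chosen randomly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
n = 1000  # scaled-down cardinality of A and B
conn.execute("CREATE TABLE baserelation1 (unique2D INT)")  # A
conn.execute("CREATE TABLE baserelation2 (unique2E INT)")  # B
conn.execute("CREATE TABLE aprime (unique2A INT)")         # C = 10% of A
conn.executemany("INSERT INTO baserelation1 VALUES (?)",
                 ((i,) for i in range(n)))
conn.executemany("INSERT INTO baserelation2 VALUES (?)",
                 ((i,) for i in range(n)))
conn.executemany("INSERT INTO aprime VALUES (?)",
                 ((i,) for i in range(n // 10)))

lower, upper = 0, n // 10 - 1  # 10% selectivity range (fixed for the demo)
rows = conn.execute(
    "SELECT * FROM baserelation1, baserelation2, aprime "
    "WHERE aprime.unique2A = baserelation1.unique2D "
    "AND baserelation1.unique2D = baserelation2.unique2E "
    "AND baserelation1.unique2D BETWEEN ? AND ?",
    (lower, upper)).fetchall()
```

The result cardinality equals the size of `aprime`'s overlap with the selected range, mirroring the 1,000-tuple result of the full-scale query.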

Page 21

Joins (4)

Query 17 (non-clustered index) - JoinCselAselB
  INSERT INTO TMP
  SELECT * FROM BASERELATION1, BASERELATION2, APRIME
  WHERE (Aprime.unique1A = BASERELATION1.unique1D)
    AND (BASERELATION1.unique1D = BASERELATION2.unique1E)
    AND (BASERELATION1.unique1D BETWEEN :lower AND :upper)
  lower: random value, upper: lower + (# of tuples * selectivity)

Page 22

Joins (5)

Query 11 - JoinCselAselB with no index
  Response time: 163.232 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   MERGE JOIN
    2 1     MERGE JOIN
    3 2       SORT (JOIN)
    4 3         TABLE ACCESS (FULL) OF 'APRIME'
    5 2       SORT (JOIN)
    6 5         TABLE ACCESS (FULL) OF 'BASERELATION1'
    7 1     SORT (JOIN)
    8 7       TABLE ACCESS (FULL) OF 'BASERELATION2'

Query 14 - JoinCselAselB with clustered index
  Response time: 31.078 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   NESTED LOOPS
    2 1     NESTED LOOPS
    3 2       TABLE ACCESS (FULL) OF 'APRIME'
    4 2       TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    5 4         INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE2A' (UNIQUE)
    6 1     INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE2B' (UNIQUE)

Page 23

Joins (6)

Query 17 - JoinCselAselB with non-clustered index
  Response time: 260.762 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   NESTED LOOPS
    2 1     NESTED LOOPS
    3 2       TABLE ACCESS (FULL) OF 'APRIME'
    4 2       TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    5 4         INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE1A' (UNIQUE)
    6 1     INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE1B' (UNIQUE)

Page 24

Joins (7)

Join method depends on the available indexes
  With no index: sort-merge join
  With indexes: nested-loops join
Table access sequence: the table with the fewest rows (APRIME) is accessed first

Page 25

Projections (1)

Implementation of projection
  A first pass discards unwanted attributes (a complete scan of the relation)
  A second phase eliminates duplicates (by sorting or hashing)
Query 18: projection with 1% projection
  insert into tmp
  select distinct two, four, ten, twenty, onePercent, string4
  from tenktup1;
Query 19: projection with 100% projection
  insert into tmp
  select distinct two, four, ten, twenty, onePercent, tenPercent, twentyPercent, fiftyPercent, unique3, evenOnePercent, oddOnePercent, stringu1, stringu2, string4
  from tenktup1;
In retrospect, projections should have been tested with larger relations!

Page 26

Projections (2)

Query 18 - Projection with 1% projection
  INSERT INTO TMP
  SELECT DISTINCT two, four, ten, twenty, onePercent, string4
  FROM BASERELATION1

Query 19 - Projection with 100% projection
  INSERT INTO TMP
  SELECT DISTINCT two, four, ten, twenty, onePercent, tenPercent, twentyPercent, fiftyPercent, unique3, evenOnePercent, oddOnePercent, stringu1, stringu2, string4
  FROM BASERELATION1
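Why the 1% projection collapses so dramatically can be seen in a tiny SQLite demo. This is a sketch with an abbreviated attribute list (only `two`, `four`, `ten`, all derived from `unique1` as in the scaled schema): DISTINCT over small-domain attributes leaves very few result rows, while projecting a candidate key would leave them all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenktup1 (unique1 INT, two INT, four INT, ten INT)")
conn.executemany("INSERT INTO tenktup1 VALUES (?, ?, ?, ?)",
                 ((i, i % 2, i % 4, i % 10) for i in range(10000)))

# 1%-style projection: small-domain attributes only, duplicates collapse.
small = conn.execute(
    "SELECT DISTINCT two, four, ten FROM tenktup1").fetchall()

# 100%-style projection: including the key keeps every row distinct.
full = conn.execute(
    "SELECT DISTINCT unique1, two, four, ten FROM tenktup1").fetchall()
```

Both queries scan and sort all 10,000 rows, but `small` holds only 20 distinct combinations (the attribute values repeat with period lcm(4, 10) = 20), while `full` keeps all 10,000 — the same asymmetry behind the 7.808 sec vs. 442.342 sec results.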

Page 27

Projections (3)

Query 18 - Projection with 1% projection
  Response time: 7.808 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   SORT (UNIQUE)
    2 1     TABLE ACCESS (FULL) OF 'BASERELATION1'

Query 19 - Projection with 100% projection
  Response time: 442.342 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   SORT (UNIQUE)
    2 1     TABLE ACCESS (FULL) OF 'BASERELATION1'

Page 28

Projections (4)

DISTINCT keyword: a full table scan and sort in both cases
The large difference in response time comes from the number of rows retained in the result

Page 29

Aggregate queries (1)

Three aggregate queries with two version (no index or with secondary index)

Min scalar aggregate queries insert into temp select min(tenkup1.unique2) from tenktup1; Q20 (no index), Q23 (cluster index)

Min aggregate function queries with 100 partitions insert into temp

select min(tenkup1.unique3) from tenktup1group by tenktup1.onePercent

Q21 (no index), Q24 (cluster index) Sum aggregate function queries with 100 partitions:

similarly

Page 30

Aggregate queries (2)

Query 20 (no index) - Minimum aggregate function
  INSERT INTO TMP
  SELECT MIN(BASERELATION1.unique2D) FROM BASERELATION1

Query 21 (no index) - Minimum aggregate function with 100 partitions
  INSERT INTO TMP
  SELECT MIN(BASERELATION1.unique3D) FROM BASERELATION1
  GROUP BY BASERELATION1.onePercentD

Query 23 (clustered index) - Minimum aggregate function
  INSERT INTO TMP
  SELECT MIN(BASERELATION1.unique2D) FROM BASERELATION1

Query 24 (clustered index) - Minimum aggregate function with 100 partitions
  INSERT INTO TMP
  SELECT MIN(BASERELATION1.unique3D) FROM BASERELATION1
  GROUP BY BASERELATION1.onePercentD
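Miniature versions of the scalar MIN (Query 20/23) and the partitioned MIN (Query 21/24) can be run against SQLite. This is a sketch: the table is reduced to the three columns the queries touch, with `onePercentD` derived as `unique1 mod 100` per the scaled schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baserelation1 "
             "(unique2D INT, unique3D INT, onePercentD INT)")
conn.executemany("INSERT INTO baserelation1 VALUES (?, ?, ?)",
                 ((i, i, i % 100) for i in range(10000)))

# Scalar aggregate: one MIN over the whole relation (Query 20/23).
scalar_min = conn.execute(
    "SELECT MIN(unique2D) FROM baserelation1").fetchone()[0]

# Aggregate function with 100 partitions (Query 21/24): one MIN per
# onePercentD group, i.e. 100 result rows.
partition_mins = conn.execute(
    "SELECT onePercentD, MIN(unique3D) FROM baserelation1 "
    "GROUP BY onePercentD ORDER BY onePercentD").fetchall()
```

The scalar MIN can be answered from the first entry of an ordered index (hence Query 23's 0.128 sec), but the GROUP BY form must still touch every row to form the 100 partitions, which is why an index did not help Query 24.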

Page 31

Aggregate queries (3)

Query 20 - MIN function with no index
  Response time: 4.454 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   SORT (AGGREGATE)
    2 1     TABLE ACCESS (FULL) OF 'BASERELATION1'

Query 23 - MIN function with clustered index
  Response time: 0.128 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   SORT (AGGREGATE)
    2 1     INDEX (FULL SCAN) OF 'INDEX_UNIQUE2A' (UNIQUE)

Query 21 - MIN function with no index and GROUP BY clause
  Response time: 5.606 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   SORT (GROUP BY)
    2 1     TABLE ACCESS (FULL) OF 'BASERELATION1'

Query 24 - MIN function with clustered index and GROUP BY clause
  Response time: 5.478 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE
    1 0   SORT (GROUP BY)
    2 1     TABLE ACCESS (FULL) OF 'BASERELATION1'

Page 32

Aggregate queries (4)

Indexes are useful for MIN/MAX aggregate functions
GROUP BY clause: regardless of the available indexes, a full table scan occurs

Page 33

Updates (1)

To measure the cost of updating the relation and its indexes
Four simple update queries
  Insert 1 tuple (Q26 and Q29)
  Update the key attribute of 1 tuple (Q28 and Q31)
  Update a non-key attribute of 1 tuple (Q32)
  Delete 1 tuple (Q27 and Q30)
Problems
  Not enough updates to cause a significant reorganization of index pages
  No concurrency control and recovery
  No bulk updates
  The Halloween problem

Page 34

Updates (2)

Query 26 (no index) - Insert 1 tuple
  INSERT INTO TENKTUP1 VALUES (:upper, :upper, 0, 2, 0, 10, 50, 688, 1950, 4950, 9950, 1, 100,
    'MxxxxxxxxxxxxxxxxxxxxxxxxxxGxxxxxxxxxxxxxxxxxxxxxxxxxC',
    'GxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxA',
    'OxxxxxxxxxxxxxxxxxxxxxxxxxxOxxxxxxxxxxxxxxxxxxxxxxxxxO')
  upper: random number larger than the total number of tuples

Query 27 (no index) - Delete 1 tuple
  DELETE FROM TENKTUP1 WHERE unique1 = :upper
  upper: random number

Query 29 (with index) - Insert 1 tuple
  Same INSERT statement as Query 26, run with the indexes in place
  upper: random number larger than the total number of tuples

Query 30 (with index) - Delete 1 tuple
  DELETE FROM TENKTUP1 WHERE unique1 = :upper
  upper: random number
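The insert/delete pair can be exercised in a miniature SQLite setup. This is a sketch, not the benchmark driver: the relation keeps only two columns, and the index name mirrors the plans shown on the next slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tenktup1 (unique1 INT, unique2 INT)")
conn.executemany("INSERT INTO tenktup1 VALUES (?, ?)",
                 ((i, i) for i in range(10000)))
conn.execute("CREATE UNIQUE INDEX index_unique1a ON tenktup1 (unique1)")

# Insert 1 tuple (Q29 pattern): the key is larger than the current
# cardinality so it cannot collide with an existing tuple.
upper = 10001
conn.execute("INSERT INTO tenktup1 VALUES (?, ?)", (upper, upper))

# Delete 1 tuple (Q30 pattern): with the unique index, the row is
# located directly instead of by a full table scan.
conn.execute("DELETE FROM tenktup1 WHERE unique1 = ?", (upper,))
count = conn.execute("SELECT COUNT(*) FROM tenktup1").fetchone()[0]
```

The relation ends up back at its original cardinality; the asymmetry the slides measure is that the index must be maintained on insert (pure overhead) but pays off on delete by avoiding the scan.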

Page 35

Updates (3)

Query 26 - Insert 1 tuple with no index
  Response time: 0.181 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE

Query 29 - Insert 1 tuple with index
  Response time: 0.237 sec
  Plan:
    0   INSERT STATEMENT Optimizer=CHOOSE

Query 27 - Delete 1 tuple with no index
  Response time: 4.224 sec
  Plan:
    0   DELETE STATEMENT Optimizer=CHOOSE
    1 0   DELETE OF 'BASERELATION1'
    2 1     TABLE ACCESS (FULL) OF 'BASERELATION1'

Query 30 - Delete 1 tuple with index
  Response time: 0.134 sec
  Plan:
    0   DELETE STATEMENT Optimizer=CHOOSE
    1 0   DELETE OF 'BASERELATION1'
    2 1     TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    3 2       INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE1A' (UNIQUE)

Page 36

Updates (4)

Indexes do not help insertion (maintaining them adds a small cost: 0.237 sec vs. 0.181 sec)
Indexes are very useful for deletion (0.134 sec vs. 4.224 sec for a full scan)

Page 37

Revisiting Wisconsin Benchmark

Criticized for a number of deficiencies
  Single-user testing only
  Absence of bulk update, database load, and unload tests
  No outer-join tests
  Its use of uniformly distributed attribute values
  Lack of tests involving host-language variables
  No ORDER BY clause
  Overly simple aggregation tests
  Simple join queries

A weak collection of data types is not bad!