Performance evaluation and benchmarking of DBMSs INF5100 Autumn 2008 Jarle Søberg


Page 1: Performance evaluation and benchmarking of DBMSs

Performance evaluation and benchmarking of DBMSs

INF5100 Autumn 2008

Jarle Søberg

Page 2: Performance evaluation and benchmarking of DBMSs


Overview

• What is performance evaluation and benchmarking?

  • Theory
  • Examples

• Domain-specific benchmarks and benchmarking DBMSs

  • We focus on the most popular one: TPC

Page 3: Performance evaluation and benchmarking of DBMSs


What is benchmarking?

1. Evaluation techniques and metrics

2. Workload

3. Workload characterization

4. Monitors

5. Representation

Page 4: Performance evaluation and benchmarking of DBMSs


Evaluation techniques and metrics

• Examining systems with respect to one or more metrics, i.e. the criteria used to compare performance (see the sketch below)

  • Speed in km/h
  • Accuracy
  • Availability
  • Response time
  • Throughput
  • Etc.

• An example: early processors were compared on the speed of the addition instruction, since it was the most used instruction

• Metric selection depends on the evaluation technique (next slide)
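
Response time and throughput, for instance, can both be derived from one timed run. A minimal Python sketch, assuming a hypothetical run_query() callable that issues a single request to the system under test:

  import time
  import statistics

  def run_benchmark(run_query, queries):
      # Time every request; mean response time and throughput both
      # fall out of the same set of measurements.
      latencies = []
      start = time.perf_counter()
      for q in queries:
          t0 = time.perf_counter()
          run_query(q)  # hypothetical call to the system under test
          latencies.append(time.perf_counter() - t0)
      elapsed = time.perf_counter() - start
      return {
          "mean_response_time_s": statistics.mean(latencies),
          "throughput_req_per_s": len(queries) / elapsed,
      }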

Page 5: Performance evaluation and benchmarking of DBMSs

Three main evaluation techniques

• Analytical modeling

  • On paper
  • Formal proofs
  • Simplifications
  • Assumptions

• Simulation

  • Closer to reality
  • Still omits some details

• Measurements

  • Investigates the real system


Page 6: Performance evaluation and benchmarking of DBMSs


Evaluation techniques and metrics

• Three main evaluation techniques

Criterion              Analytical modeling   Simulation           Measurement (concrete syst.)
Stage                  Any                   Any                  Post-prototype
Time required          Small                 Medium               Varies
Tools                  Analysts              Computer languages   Instrumentation
Accuracy               Low                   Moderate             Varies
Trade-off evaluation   Easy                  Moderate             Difficult
Cost                   Small                 Medium               High
Saleability            Low                   Medium               High

(© 1991, Raj Jain)


Page 7: Performance evaluation and benchmarking of DBMSs


What is benchmarking?

• “benchmark v. trans. To subject (a system) to a series of tests in order to obtain prearranged results not available on competitive systems”

  • S. Kelly-Bootle, The Devil’s DP Dictionary

• In other words: benchmarks are measurements used to differentiate between two or more systems

Page 8: Performance evaluation and benchmarking of DBMSs


Workload

• Must fit the systems that are benchmarked

  • Instruction frequencies for CPUs
  • Transaction frequencies

• Select a level of detail and use it as the workload (see the sketch after this list)

  1. Most frequent request
  2. Most frequent request types
  3. Time-stamped sequence of requests (a trace)
     • From a real system, e.g. to perform measurements
  4. Average resource demand
     • For analytical modeling
     • Rather than real resource demands
  5. Distribution of resource demands
     • When there is a large variance
     • Good for simulations
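
As an illustration of level 5, here is a minimal sketch in Python; the parameters (mean inter-arrival time, mean rows touched per request) are hypothetical, and both quantities are drawn from exponential distributions instead of being fixed at their averages:

  import random

  def synthetic_workload(n_requests, mean_interarrival_s=0.05, mean_rows=100, seed=42):
      # Draw per-request arrival times and resource demands from
      # distributions rather than using a single average demand.
      rng = random.Random(seed)
      t = 0.0
      workload = []
      for _ in range(n_requests):
          t += rng.expovariate(1.0 / mean_interarrival_s)       # arrival process
          rows = max(1, int(rng.expovariate(1.0 / mean_rows)))  # resource demand
          workload.append({"arrival_time_s": t, "rows_touched": rows})
      return workload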

Page 9: Performance evaluation and benchmarking of DBMSs


Workload

• Representativeness

  • Arrival rate
  • Resource demands
  • Resource usage profile

• Timeliness

  • The workload should represent current usage patterns

Page 10: Performance evaluation and benchmarking of DBMSs


Workload characterization

• Repeatability is important

• Observe real-user behavior and create a repeatable workload based on that?

• One should only need to change workload parameters (see the sketch below)

  • Transaction types
  • Instructions
  • Packet sizes
  • Sources/destinations of packets
  • Page reference patterns

• Generate new traces for each parameter?
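
A minimal sketch of such a parameterized generator, with a hypothetical transaction mix; the fixed seed makes the trace repeatable, and only the parameter set should need to change between experiments:

  import random

  # Hypothetical parameter set; only these values change between runs,
  # not the generator itself.
  PARAMS = {
      "seed": 1,                    # fixed seed -> repeatable trace
      "n_transactions": 10_000,
      "mix": {"new_order": 0.45, "payment": 0.43, "stock_level": 0.12},
  }

  def generate_trace(params=PARAMS):
      # Produce a repeatable sequence of transaction types drawn from the mix.
      rng = random.Random(params["seed"])
      types, weights = zip(*params["mix"].items())
      return [rng.choices(types, weights)[0] for _ in range(params["n_transactions"])]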

Page 11: Performance evaluation and benchmarking of DBMSs


Monitors

• How do we obtain the results from sending the workload into the system?

• Observe the activities

  • Performance
  • Collect statistics
  • Analyze data
  • Display results

• Either monitor all activities or sample

  • E.g. the periodic updates of the top monitor in Linux

• On-line

  • Continuously display the system state

• Batch

  • Collect data and analyze it later (see the sketch below)
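
A minimal sketch of such a monitor, assuming a hypothetical get_state() callable that returns a dict of counters (e.g. active transactions, buffer hits); the online flag switches between continuously displaying each sample and collecting samples for batch analysis:

  import threading
  import time

  def sampling_monitor(get_state, interval_s=1.0, online=False, stop_event=None):
      # Periodically sample the system state instead of tracing every activity.
      samples = []
      stop_event = stop_event or threading.Event()
      while not stop_event.is_set():
          sample = {"t": time.time(), **get_state()}   # get_state() is hypothetical
          if online:
              print(sample)          # on-line: continuously display the system state
          samples.append(sample)     # batch: keep the data and analyze later
          stop_event.wait(interval_s)
      return samples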

Page 12: Performance evaluation and benchmarking of DBMSs

Monitors

• In-system

  • Put monitors inside the system
  • We need the source code
  • Gives great detail?
  • May add overhead?

• As a black box

  • Measure only input and output; is that sufficient?


Page 13: Performance evaluation and benchmarking of DBMSs


Benchmarking, illustrated by common mistakes

• Only average behavior represented in the test workload

  • Variance is ignored

• Skewness of device demands ignored

  • I/O or network requests spread evenly during the test, which might not be the case in real environments

• Loading level controlled inappropriately (see the sketch below)

  • Think time, i.e. the time between workload items, and the number of users increased/decreased inappropriately

• Caching effects ignored

  • Order of arrival of requests
  • Elements thrown out of the queues?
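
A minimal sketch of keeping think time explicit in the load generator (run_request and requests are hypothetical); the offered load is then controlled by the number of concurrently emulated users and the mean think time, not by silently dropping the think time:

  import random
  import time

  def emulated_user(run_request, requests, mean_think_time_s=1.0, seed=None):
      # One emulated user: issue requests separated by a randomized think time.
      rng = random.Random(seed)
      for req in requests:
          run_request(req)                                      # hypothetical request call
          time.sleep(rng.expovariate(1.0 / mean_think_time_s))  # think time between items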

Page 14: Performance evaluation and benchmarking of DBMSs


Common mistakes in benchmarking

• Buffer sizes not appropriate

  • Should represent the values used in production systems

• Inaccuracies due to sampling ignored

  • Make sure the sampled data are accurate

• Ignoring monitoring overhead

• Not validating measurements

  • Is the measured data correct?

• Not ensuring the same initial conditions

  • Disk space, starting times of monitors, things run by hand …

Page 15: Performance evaluation and benchmarking of DBMSs


Common mistakes in benchmarking

• Not measuring transient performance (see the sketch below)

  • Depends on the system, but if the system spends more time in transitions than in steady state, this has to be considered: know your system!

• Collecting too much data but doing very little analysis

  • In measurement studies, most of the time often goes into obtaining the data, leaving little time to analyze it
  • It is more fun to experiment than to analyze the data
  • "It is hard to use statistical techniques to get significant results; let's just show the average"
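
A minimal sketch addressing both points, given a list of measured latencies from one run: the transient warm-up portion is discarded before computing statistics, and a rough 95% confidence interval (normal approximation) is reported alongside the mean instead of the average alone:

  import statistics

  def steady_state_summary(latencies, warmup_fraction=0.2):
      # Drop the warm-up (transient) part of the run, then report a mean
      # together with an approximate 95% confidence half-width.
      steady = latencies[int(len(latencies) * warmup_fraction):]
      mean = statistics.mean(steady)
      half_width = 1.96 * statistics.stdev(steady) / (len(steady) ** 0.5)
      return {"mean_s": mean, "ci95_s": (mean - half_width, mean + half_width)}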

Page 16: Performance evaluation and benchmarking of DBMSs

The art of data presentation

It is not what you say, but how you say it.

- A. Putt

• Results from performance evaluations aim to help in decision making

• Decision makers do not have time to dig into complex result sets

• This requires prudent use of words, pictures, and graphs to explain the results and the analysis


Page 17: Performance evaluation and benchmarking of DBMSs

Some glorious examples


[Charts omitted: availability and unavailability plotted against day of the week]

Page 18: Performance evaluation and benchmarking of DBMSs

Some glorious examples (cont.)


[Charts omitted: response time, utilization, and throughput plotted with differently scaled axes]

Page 19: Performance evaluation and benchmarking of DBMSs


Overview

• What is performance evaluation and benchmarking?

  • Theory
  • Examples

• Domain-specific benchmarks and benchmarking DBMSs

  • We focus on the most popular one: TPC


Page 20: Performance evaluation and benchmarking of DBMSs


Domain-specific benchmarks

• No single metric can measure the performance of computer systems on all applications

  • Simple update-intensive transactions for online databases

    vs.

  • Speed in decision-support queries

Page 21: Performance evaluation and benchmarking of DBMSs


The key criteria for a domain-specific benchmark

• Relevant

  • Perform typical operations within the problem domain

• Portable

  • The benchmark should be easy to implement and run on many different systems and architectures

• Scaleable

  • To larger systems or parallel systems as they evolve

• Simple

  • It should be understandable in order to maintain credibility

Page 22: Performance evaluation and benchmarking of DBMSs

TPC: Transaction Processing Performance Council

• Background

  • IBM released an early benchmark, TP1, in the early 80's

    • ATM transactions in batch mode
    • No user interaction
    • No network interaction
    • Originally used internally at IBM, and thus poorly defined
    • Exploited by many other commercial vendors

  • Anon (i.e. Gray) et al. released a more carefully thought-out benchmark, DebitCredit, in 1985

    • Total system cost published together with the performance rating
    • Test specified in terms of high-level functional requirements
    • A bank with several branches and ATMs connected to the branches
    • The benchmark workload had scale-up rules
    • The overall transaction rate was constrained by a response time requirement

  • Vendors often deleted key requirements in DebitCredit to improve their performance results


Page 23: Performance evaluation and benchmarking of DBMSs

TPC: Transaction Processing Performance Council

• A need for a more standardized benchmark

• In 1988, eight companies came together and formed the TPC

• They started making benchmarks based on the domains used in DebitCredit


Page 24: Performance evaluation and benchmarking of DBMSs

Early (and obsolete) TPCs

• TPC-A

  • 90 percent of transactions must complete in less than 2 seconds
  • 10 ATM terminals per system, and the cost of the terminals was included in the system price
  • Could be run in a local or wide-area network configuration (DebitCredit specified only WANs)
  • The ACID requirements were bolstered, and specific tests were added to ensure ACID viability
  • TPC-A specified that all benchmark testing data should be publicly disclosed in a Full Disclosure Report

• TPC-B

  • Vendors complained about all the extra requirements in TPC-A
  • Vendors of servers were not interested in adding terminals and networks
  • TPC-B was a standardization of TP1 (reduced to the core)


Page 25: Performance evaluation and benchmarking of DBMSs

TPC-C

• On-line transaction processing (OLTP)

• More complex than TPC-A

• Handles orders in warehouses (see the sketch below)

  • 10 sales districts per warehouse
  • 3,000 customers per district

• Each warehouse must cooperate with the other warehouses to complete orders

• TPC-C measures how many complete business operations can be processed per minute
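
As a rough illustration only (not the official TPC-C transaction profile), a heavily simplified new-order-style transaction in Python with sqlite3, against a hypothetical cut-down schema:

  import sqlite3

  def new_order(conn, warehouse_id, district_id, customer_id, items):
      # Record the order and decrement stock atomically; table and column
      # names are hypothetical, not those of the TPC-C specification.
      with conn:  # commits on success, rolls back on exception
          cur = conn.execute(
              "INSERT INTO orders (w_id, d_id, c_id) VALUES (?, ?, ?)",
              (warehouse_id, district_id, customer_id))
          order_id = cur.lastrowid
          for item_id, qty in items:
              conn.execute(
                  "UPDATE stock SET quantity = quantity - ? WHERE w_id = ? AND item_id = ?",
                  (qty, warehouse_id, item_id))
              conn.execute(
                  "INSERT INTO order_lines (order_id, item_id, qty) VALUES (?, ?, ?)",
                  (order_id, item_id, qty))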


Page 26: Performance evaluation and benchmarking of DBMSs

TPC-C (results)

[TPC-C results chart omitted; © 2007 TPC]


Page 27: Performance evaluation and benchmarking of DBMSs

TPC-E

• Is considered a successor of TPC-C

• Brokerage house

  • Customers
  • Accounts
  • Securities

• Pseudo-real data

• More complex than TPC-C (see the table below)

Characteristic           TPC-E                             TPC-C
Tables                   33                                9
Columns                  188                               92
Min Cols / Table         2                                 3
Max Cols / Table         24                                21
Data Type Count          Many                              4
Data Types               UID, CHAR, NUM, DATE, BOOL, LOB   UID, CHAR, NUM, DATE
Primary Keys             33                                8
Foreign Keys             50                                9
Tables w/ Foreign Keys   27                                7
Check Constraints        22                                0
Referential Integrity    Yes                               No

(© 2007 TPC)


Page 28: Performance evaluation and benchmarking of DBMSs

TPC-E (results)

[TPC-E results chart omitted; © 2007 TPC]


Page 29: Performance evaluation and benchmarking of DBMSs

TPC-H

• Decision support

• Simulates an environment in which users connected to the database system send individual queries that are not known in advance

• Metric (see the formula below)

  • Composite Query-per-Hour Performance Metric (QphH@Size), which reflects:

    • the selected database size against which the queries are executed
    • the query processing power when queries are submitted by a single stream
    • the query throughput when queries are submitted by multiple concurrent users
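
Schematically, the composite metric combines the single-stream (power) and multi-stream (throughput) components as a geometric mean, both evaluated at the chosen database size:

  \[
    \mathrm{QphH@Size} = \sqrt{\mathrm{Power@Size} \times \mathrm{Throughput@Size}}
  \]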


Page 30: Performance evaluation and benchmarking of DBMSs


References

• The Art of Computer Systems Performance Analysis

  • Raj Jain, 1991

• The Benchmark Handbook for Database and Transaction Processing Systems

  • Jim Gray, 1991

• The TPC homepage: www.tpc.org

• Poess, M. and Floyd, C. 2000. New TPC benchmarks for decision support and web commerce. SIGMOD Record 29(4), Dec. 2000, 64-71