TRANSCRIPT
DBMS Metrology: Measuring Query Time
SABAH CURRIM, RICHARD T. SNODGRASS, YOUNG-KYOON SUH, and RUI ZHANG, University of Arizona
TODS, Machida B4, 2017/06/16
Outline
0. ABSTRACT
1. INTRODUCTION
2. BACKGROUND
3. MEASURING QUERY TIME
4. THE PRINCIPAL CHALLENGE
5. AN ELABORATED CAUSAL MODEL
6. TESTING THE CAUSAL MODEL
7. TIMING CONSIDERATIONS
8. TIMING PROTOCOL
9. REPORTING THE RESULTS
10. EVALUATION
11. SUMMARY
12. FUTURE WORK
ABSTRACT
• How do we measure query execution time?
• It is hard, because there is much variance: other processes running, I/O time, RDBMS heuristics...
• The authors propose a query time measurement procedure, the Tucson Timing Protocol Version 2 (TTPv2), for more precise and robust query timing.
INTRODUCTION: the usual method
“We used the UNIX time command to measure the elapsed time and CPU time. All queries were run 10 times. The resultant CPU usage was averaged.” — H.-Y. Hwang and Y.-T. Yu [1987]
Table I shows results from this previous method.
Variability arises from necessary daemon processes and their I/O and other interactions with the DBMS.
INTRODUCTION: contribution
• Tucson Timing Protocol Version 1 (TTPv1)
  • dropped about 20% of the timings because of phantom processes
• Tucson Timing Protocol Version 2 (TTPv2)
  • drops many fewer query executions
  • adds many more sanity checks
  • is based on an elaborated structural causal model
  • estimates process I/O in a more sophisticated manner
BACKGROUND: problem definition
• Independent variable? Dependent variable?
  → the dependent variable
• What is to be measured?
  → a single statement
• Wall-clock time, spent in:
  • DBMS processes
  • Daemons
  • User processes
BACKGROUND
• Two types of overall time:
  • Process Time (PT): the sum of JDBC, CPU, I/O, and network time
  • Elapsed Time (ET): the time between the start and the finish
• Table I shows ET
• ET is highly variable!
  → deduce the I/O and CPU time
BACKGROUND: experiments
MEASURING QUERY TIME
• Execute queries with different cardinalities: Q@Cs
• Execute each Q@C multiple times: Query Executions (QEs)
• The protocol drops Q@Cs with fewer than 6 QEs.
• 10 QEs for each Q@C is sufficient for timing purposes.
• But the number of QEs per Q@C is a knob available to the experimenter; eight QEs per Q@C would work fine for many situations.
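The drop rule above can be sketched as a small loop; `runOnce` and the validity convention (null means a discarded execution) are illustrative assumptions, not the paper's actual interface:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class QeCollector {
    // Collect timings for one Q@C; return null (i.e., drop the Q@C)
    // if fewer than minValid of the attempted executions survive.
    static List<Long> collect(Supplier<Long> runOnce, int attempts, int minValid) {
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            Long ms = runOnce.get();          // null = execution was discarded
            if (ms != null) kept.add(ms);
        }
        return kept.size() >= minValid ? kept : null;
    }

    public static void main(String[] args) {
        // 10 attempts, all valid: the Q@C is kept with 10 samples.
        List<Long> ok = collect(() -> 42L, 10, 6);
        System.out.println(ok == null ? "dropped" : "kept " + ok.size());
    }
}
```

With 10 attempts and a minimum of 6 valid QEs, a Q@C survives only if at least 6 executions pass the sanity checks.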
MEASURING QUERY TIME: cache effect
• Pure warm cache: the DBMS main memory is large enough to hold all the base tables and intermediate results, so no I/O is needed.
• Partial warm cache: the DBMS main memory is not large enough to hold all the base tables and intermediate results, so some I/O is required.
MEASURING QUERY TIME: cache effect
• Multiple caches are involved:
  a. Disk drives
  b. Disk controllers
  c. Network file system
  d. Operating system
  e. DBMS buffer
  f. JDBC
  g. L1, L2
• L1, L2, and JDBC are not considered, because they are utilized in both warm- and cold-cache runs.
MEASURING QUERY TIME: eliminate cache
• Four steps to eliminate the caching and buffering:
  1. Installed the disk directly on the machine that also runs the DBMS.
     → network file system caches
  2. Filled both the disk drive and disk controller caches by reading a 64MB data file.
     → disk drive and disk controller caches
  3. Used the Linux “drop_caches” facility to discard cached clean pages from the O/S file cache.
     → operating system caches
  4. Provided an API to discard cached content residing inside the DBMS.
     → DBMS buffer
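Steps 2 and 3 can be sketched as follows; the file size is shrunk from the paper's 64 MB to keep the sketch fast, and the `drop_caches` write is shown but not invoked because it requires root:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CacheFlush {
    // Step 2: fill the disk-drive and disk-controller caches by streaming
    // a large file end to end. Returns the number of bytes read.
    static long streamThrough(Path p) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(p))) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) total += n;
        }
        return total;
    }

    // Step 3: ask Linux to discard clean pages from the O/S file cache.
    // Needs root, so it is shown here but not invoked below.
    static void dropOsCaches() throws IOException {
        Files.write(Paths.get("/proc/sys/vm/drop_caches"), "3\n".getBytes());
    }

    // Self-contained demo: create a 1 MB file, stream it, clean up.
    static long demo() {
        try {
            Path p = Files.createTempFile("flush", ".bin");
            Files.write(p, new byte[1 << 20]);
            long total = streamThrough(p);
            Files.delete(p);
            return total;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());  // 1048576
    }
}
```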
MEASURING QUERY TIME: still caching
• Query plan caching
  • A previously determined plan is used when the query reappears.
  • Query plan caching is a good thing, in contrast to data caching.
• Query result caching
  • The result of a query is stored in case that same query is requested again shortly after.
  • QEs that show query result caching are discarded by the protocol.
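Discarding such executions might look like the sketch below; the 10%-of-cold-time threshold and the method names are illustrative assumptions, not the paper's actual detection rule:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ResultCacheFilter {
    // Discard executions whose elapsed time is implausibly small relative
    // to the first (cold) execution; a result served straight from the
    // result cache skips essentially all query processing.
    static List<Long> filter(List<Long> elapsedMs) {
        long cold = elapsedMs.get(0);
        return elapsedMs.stream()
                        .filter(ms -> ms >= cold / 10)   // illustrative threshold
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The 20 ms execution looks like a result-cache hit and is dropped.
        System.out.println(filter(List.of(1000L, 950L, 20L, 990L)));
    }
}
```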
MEASURING QUERY TIME: Data Repeatability
• Across-run means different “runs” of executions of the same query.
• The variable tables should be identical in all aspects when a cardinality is specified:
  • identical no matter the DBMS, and across runs
  • the random number seeds are saved as scenario parameters
• Sometimes deleted rows are simply flagged without being physically removed, so different cardinalities have the same number of occupied pages.
  • They use the “SELECT … INTO …” command to populate the newly created table to the target cardinality.
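The role of the saved seed can be made concrete with a sketch like this; the single-column layout and method name are hypothetical:

```java
import java.util.Random;

public class SeededTable {
    // Regenerating a column from a saved seed yields identical rows
    // across runs (and across DBMSes), as the protocol requires.
    static int[] column(long seed, int cardinality) {
        Random r = new Random(seed);   // seed saved as a scenario parameter
        int[] rows = new int[cardinality];
        for (int i = 0; i < cardinality; i++) rows[i] = r.nextInt(1_000_000);
        return rows;
    }

    public static void main(String[] args) {
        // Same seed + same cardinality → identical data, run after run.
        System.out.println(java.util.Arrays.equals(column(7L, 1000),
                                                   column(7L, 1000))); // true
    }
}
```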
MEASURING QUERY TIME: Plan Repeatability Across Runs
• Sometimes the same query at the same cardinality resulted in different plans across runs.
• Instead of executing multiple runs for each experiment, each query is executed multiple times in close succession, so that the samples are guaranteed to be collected on the same plan.
MEASURING QUERY TIME: Within-Run Plan Repeatability
• Within-run means different “executions” of the same query within the same run.
• They use the PreparedStatement class in JDBC to ensure within-run plan repeatability.
• Plan reuse is checked, and the run is automatically restarted if there is a violation.
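The plan-reuse check can be sketched as a comparison of plan fingerprints gathered per execution; representing plans as opaque strings is an illustrative simplification:

```java
import java.util.List;

public class PlanCheck {
    // True iff every execution in the run reused the first execution's
    // plan; the protocol restarts the run when this check fails.
    static boolean samePlan(List<String> planPerExecution) {
        String first = planPerExecution.get(0);
        return planPerExecution.stream().allMatch(first::equals);
    }

    public static void main(String[] args) {
        System.out.println(samePlan(List.of("hashjoin#3", "hashjoin#3"))); // true
        System.out.println(samePlan(List.of("hashjoin#3", "nestloop#1"))); // false → restart run
    }
}
```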
MEASURING QUERY TIME: Operating System Daemons and User Processes
• Three other sources of I/O activity during QE:
  • User processes
  • Utility processes
  • Operating system daemons
• User processes:
  • Isolated the DBMS server machines and used VNC to connect to the server remotely.
  • Only the DBMS is running on this machine.
  • No keyboard, mouse, monitor, etc.
MEASURING QUERY TIME: Wall-Clock Query Time
• Table III shows several Linux system calls that return the current time.
• They use the Java method currentTimeMillis().
  → millisecond resolution is enough
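A minimal wrapper over that call, timing a query at millisecond resolution:

```java
public class WallClock {
    // Millisecond wall-clock timing of a single query execution,
    // using the System.currentTimeMillis() call named on the slide.
    static long timeMs(Runnable query) {
        long start = System.currentTimeMillis();
        query.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Stand-in for a query: sleep at least 50 ms of real time.
        long ms = timeMs(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
        });
        System.out.println(ms);
    }
}
```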
MEASURING QUERY TIME: limiting interference
a. Stopped as many operating system daemons as possible
b. Eliminated network delays
c. Eliminated user interactions
d. Ran the same query on the same database content in the same environment
e. Ensured repeatability of I/O
MEASURING QUERY TIME: Per-Process Measures
• Nine per-process measures:
  a. minflt: number of minor page faults
  b. majflt: number of major page faults
  c. utime: number of ticks in user mode
  d. stime: number of ticks in system mode
  e. number of voluntary context switches
  f. number of involuntary context switches
  g. block I/O delay time
  h. number of read system calls
  i. number of write system calls
MEASURING QUERY TIME: Overall Measures
• Eight overall measures:
  a. utime
  b. stime
  c. IOWait time
  d. guest
  e. idle time
  f. IRQ
  g. SoftIRQ
  h. processes
MEASURING QUERY TIME: Time Measurement Resolution
• They use different resolutions for CT and ET, depending on whether the process is in mixed or pure computation mode.
• CT measurement of a process in mixed mode:
  → ticks (~10ms), from the proc file system's tick-based per-process counters
• ET measurement of the mixed process:
  → milliseconds, via System.currentTimeMillis()
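The tick-to-millisecond relationship can be made concrete; the constant assumes the common Linux USER_HZ of 100 (10 ms per tick), which the slide's "~10ms" only approximates:

```java
public class Ticks {
    // Linux reports per-process CPU time in clock ticks; with the
    // usual USER_HZ of 100, one tick is 10 ms. This is a typical
    // value, not something the paper fixes.
    static final long MS_PER_TICK = 10;

    static long ticksToMs(long ticks) {
        return ticks * MS_PER_TICK;
    }

    public static void main(String[] args) {
        System.out.println(ticksToMs(7));  // 70 ms of CPU time
    }
}
```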
THE PRINCIPAL CHALLENGE
• The goal is to infer/calculate the total time used by the query processes to actually execute the query.
• The challenge is to estimate the portion of IOWait time directly or indirectly caused by activities of the DBMS processes.
• They elaborate a simple structural causal model to provide extended coverage of factors.
AN ELABORATED CAUSAL MODEL
• Considered variables: twice as many factors as TTPv1
  • SoftIRQ ticks, IOWait ticks → overall only
  • user ticks, system ticks → both
  • the other five measures → per-process only
AN ELABORATED CAUSAL MODEL
• Relationships
AN ELABORATED CAUSAL MODEL
• Predicted Correlations
TESTING THE CAUSAL MODEL
• Exploratory Model Analysis
TESTING THE CAUSAL MODEL
• Confirmatory Model Analysis
  • Green: as predicted
  • Yellow: close
  • Red: wrong
TIMING CONSIDERATIONS: Process Interactions
• During the QE, other processes may run, start, or stop.
  → It is hard to know which process does what, and this creates noise.
TIMING CONSIDERATIONS: Calculating the CPU Time
• The CPU time is easy to calculate, because we have per-type system and user time, once we determine which DBMS process was actually running the query.
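A toy version of that calculation, assuming the common 10 ms tick; the method name and tick constant are illustrative:

```java
public class CpuTime {
    // Once the DBMS process running the query is identified, its CPU
    // time is just its user-mode ticks plus its system-mode ticks,
    // converted to milliseconds (assuming a 10 ms tick).
    static long cpuMs(long utimeTicks, long stimeTicks) {
        return (utimeTicks + stimeTicks) * 10;
    }

    public static void main(String[] args) {
        System.out.println(cpuMs(12, 3));  // 150 ms
    }
}
```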
TIMING CONSIDERATIONS: Calculating the I/O Time
TIMING CONSIDERATIONS
TIMING CONSIDERATIONS: Calculating the I/O Time
TIMING PROTOCOL
• General protocol:
  i. Perform Sanity Checks
  ii. Drop Query Executions
  iii. Drop Selected Q@Cs
  iv. Calculate Query Time
  v. Post Sanity Checks
TIMING PROTOCOL
• Step 0: Set Up the System and Run the Queries
  • Prepare three machines
  • Enable only one CPU core on the QE machine
  • Write the experiment specification in XML
  • Run the experiment
TIMING PROTOCOL
• Step 1: Perform Sanity Checks
TIMING PROTOCOL
• Step 2: Drop Query Executions
TIMING PROTOCOL
• Step 3: Drop Selected Q@Cs
TIMING PROTOCOL
• Step 4: Calculate Query Time
  • The TTPv1 protocol used the regression coefficient associated with the query process to determine the contribution of the IOWait time to the total query time.
  • Additional measures
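The regression-coefficient idea can be illustrated with a one-variable least-squares slope through the origin; the variables (read calls vs. overall IOWait) and the numbers are made up for illustration, not the paper's model:

```java
public class IoAttribution {
    // Slope relating the query process's I/O activity (x, e.g. read
    // calls) to the overall IOWait time (y). Multiplying this slope by
    // the query's activity attributes a share of IOWait to the query.
    static double slope(double[] x, double[] y) {
        double num = 0, den = 0;
        for (int i = 0; i < x.length; i++) {
            num += x[i] * y[i];   // least squares through the origin:
            den += x[i] * x[i];   // beta = sum(x*y) / sum(x*x)
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] reads  = {10, 20, 30};
        double[] iowait = {52, 98, 151};   // roughly 5 ms of IOWait per read
        System.out.println(slope(reads, iowait));
    }
}
```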
TIMING PROTOCOL
• Should we take the maximum? The minimum? The average? The median?
• Maximum, minimum → ❌
• Average? Fig. 8 shows some non-uniformity in the distribution → ❌
• We should take the median.
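Why the median: on a skewed sample it barely moves under an outlier, where the mean shifts noticeably. A minimal sketch:

```java
import java.util.Arrays;

public class Median {
    // Median of the per-execution query times; robust to the skewed,
    // non-uniform distributions that rule out the mean.
    static double median(double[] times) {
        double[] s = times.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // One slow outlier: the median is 52, while the mean would be 58.
        System.out.println(median(new double[]{50, 51, 52, 53, 84}));  // 52.0
    }
}
```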
TIMING PROTOCOL
• Step 5: Post Sanity Checks
REPORTING THE RESULTS
EVALUATION
EVALUATION
FUTURE WORK
• Better causal model
• Better understanding of strange cases
• Use additional measures
• Fewer restrictions
• Extend to support the Windows operating system
• Extend to measure aspects outside of the taxonomy of query time in Fig. 1