DBMS Metrology: Measuring Query Time

SABAH CURRIM, RICHARD T. SNODGRASS, YOUNG-KYOON SUH, and RUI ZHANG, University of Arizona

TODS, Machida B4, 2017/06/16

Outline
0. ABSTRACT
1. INTRODUCTION
2. BACKGROUND
3. MEASURING QUERY TIME
4. THE PRINCIPAL CHALLENGE
5. AN ELABORATED CAUSAL MODEL
6. TESTING THE CAUSAL MODEL
7. TIMING CONSIDERATIONS
8. TIMING PROTOCOL
9. REPORTING THE RESULTS
10. EVALUATION
11. SUMMARY
12. FUTURE WORK

ABSTRACT
• How should query execution time be measured?
• This is hard because the measurements show much variance:
  • Other processes running, I/O time, RDBMS heuristics, ...
• The authors proposed a query time measurement procedure, the Tucson Timing Protocol Version 2 (TTPv2):
  • More precise and robust query timing

INTRODUCTION: usual method

“We used the UNIX time command to measure the elapsed time and CPU time. All queries were run 10 times. The resultant CPU usage was averaged.” (H.-Y. Hwang and Y.-T. Yu [1987])

Table I shows results from this previous method.

Variability arises from necessary daemon processes and their I/O and other interactions with the DBMS.

INTRODUCTION: contribution
• Tucson Timing Protocol Version 1 (TTPv1)
  • dropped about 20% of the timings because of phantom processes
• Tucson Timing Protocol Version 2 (TTPv2)
  • drops many fewer query executions
  • adds many more sanity checks
  • is based on an elaborated structural causal model
  • estimates process I/O in a more sophisticated manner

BACKGROUND: problem definition
• Independent variable or dependent variable?
  → Dependent variable
• What is to be measured?
  → A single statement
• Wall-clock time
  • DBMS processes
  • Daemons
  • User processes

BACKGROUND
• Two types of overall time
  • Process Time (PT): the sum of JDBC, CPU, I/O, and network time
  • Elapsed Time (ET): the time between the start and finish of the query; Table I shows ET
• ET is highly variable!
  → Deduce the I/O and CPU time

BACKGROUND: experiments

MEASURING QUERY TIME
• Execute queries at different cardinalities: Q@Cs (query at cardinality)
• Execute each Q@C multiple times: query executions (QEs)
• This protocol drops Q@Cs with fewer than 6 QEs.
• 10 QEs per Q@C is sufficient for timing purposes.
  • But the number of QEs per Q@C is a knob available to the experimenter.
  • Eight QEs per Q@C would work fine in many situations.
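The Q@C/QE bookkeeping above can be sketched in a few lines. This is a minimal illustration under assumed conventions: each QE is a hypothetical `(query_id, cardinality, time_ms)` tuple, and `group_by_qac`/`drop_sparse_qacs` are illustrative names, not part of TTPv2. Only the 6-QE minimum comes from the slide; it is a parameter so it can follow whatever QE count the experimenter chooses.

```python
from collections import defaultdict

MIN_QES_PER_QAC = 6  # from the slide: Q@Cs with fewer QEs are dropped


def group_by_qac(executions):
    """Group QE times by Q@C, i.e. by (query_id, cardinality)."""
    groups = defaultdict(list)
    for query_id, cardinality, time_ms in executions:
        groups[(query_id, cardinality)].append(time_ms)
    return dict(groups)


def drop_sparse_qacs(groups, min_qes=MIN_QES_PER_QAC):
    """Split Q@Cs into those with at least min_qes QEs and those to drop."""
    kept = {qac: times for qac, times in groups.items() if len(times) >= min_qes}
    dropped = sorted(qac for qac in groups if qac not in kept)
    return kept, dropped


# Usage: 10 QEs of q1 at cardinality 1000 survive; 4 QEs at 2000 do not.
qes = [("q1", 1000, 12.0)] * 10 + [("q1", 2000, 25.0)] * 4
kept, dropped = drop_sparse_qacs(group_by_qac(qes))
print(sorted(kept), dropped)  # [('q1', 1000)] [('q1', 2000)]
```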

MEASURING QUERY TIME: cache effect
• Pure warm cache
  • DBMS main memory is large enough to hold all the base tables and intermediate results, so no I/O is needed.
• Partial warm cache
  • DBMS main memory is not large enough to hold all the base tables and intermediate results, so some I/O is required.

MEASURING QUERY TIME: cache effect
• Multiple caches are involved:
  a. Disk drives
  b. Disk controllers
  c. Network file system
  d. Operating system
  e. DBMS buffer
  f. JDBC
  g. L1, L2
• The L1, L2, and JDBC caches are not considered because they are generally utilized with both a warm and a cold cache.

MEASURING QUERY TIME: eliminate cache
• Four steps to eliminate caching and buffering:
  1. Installed the disk directly on the machine that also runs the DBMS.
     → Network file system caches
  2. Filled both the disk drive and disk controller caches by reading a 64 MB data file.
     → Disk drive and disk controller caches
  3. Used the Linux “drop_caches” facility to discard cached clean pages from the O/S file cache.
     → Operating system caches
  4. Provided an API to discard cached content residing inside the DBMS.
     → DBMS buffer
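Steps 2 and 3 can be approximated on Linux roughly as below. This is a sketch, assuming root access for the real `/proc/sys/vm/drop_caches` file and a pre-created 64 MB scratch file whose location is up to you; the function names are hypothetical. Step 4 depends on each DBMS's own interface, so it is left out.

```python
import os

SCRATCH_BYTES = 64 * 1024 * 1024  # 64 MB, the size used on the slide
DROP_CACHES = "/proc/sys/vm/drop_caches"  # Linux; writing requires root


def fill_disk_caches(scratch_path, chunk_size=1 << 20):
    """Read a large unrelated file so the drive/controller caches end up holding
    its blocks rather than database blocks. Returns the number of bytes read."""
    total = 0
    with open(scratch_path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def drop_os_page_cache(control_path=DROP_CACHES):
    """Write back dirty pages, then ask the kernel to discard clean page-cache
    pages. "1" frees the page cache only; "3" would also free dentries/inodes."""
    os.sync()  # drop_caches only discards clean pages
    with open(control_path, "w") as f:
        f.write("1\n")
```

The `control_path` parameter exists so the write can be pointed at an ordinary file when trying the code without root.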

MEASURING QUERY TIME: still caching
• Query plan caching
  • A previously determined plan is reused when the query reappears.
  • Unlike data caching, query plan caching is a good thing here.
• Query result caching
  • The result of a query is stored in case the same query is requested again shortly after.
  • QEs that show query result caching are discarded by the protocol.

MEASURING QUERY TIME: Data Repeatability
• Across-run means different “runs” of executions of the same query.
• The variable tables should be identical in all aspects when a cardinality is specified.
  • Identical regardless of the DBMS and across runs
  • Save the random number seeds as a scenario parameter
• Sometimes deleted rows are simply flagged without being physically removed, so different cardinalities end up with the same number of occupied pages.
  • They use the “SELECT … INTO …” command to populate a newly created table at the target cardinality.
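Storing the seed as a scenario parameter is what makes the table content reproducible. A minimal sketch, assuming a hypothetical variable table of `(id, value)` rows drawn from Python's `random.Random`; the same seed regenerates identical rows on every run (within one Python version), independent of which DBMS later loads them. `generate_variable_table` and `value_range` are illustrative names.

```python
import random


def generate_variable_table(seed, cardinality, value_range=1_000_000):
    """Return `cardinality` rows of (id, value), fully determined by `seed`."""
    rng = random.Random(seed)  # private generator; leaves the global RNG untouched
    return [(row_id, rng.randrange(value_range)) for row_id in range(cardinality)]
```

Because rows are drawn sequentially, the table for a smaller cardinality is a prefix of a larger one with the same seed, so each target cardinality can be created as a fresh table rather than by deleting rows (which, as the slide notes, may leave flagged rows occupying pages).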

MEASURING QUERY TIME: Plan Repeatability Across Runs
• Sometimes the same query at the same cardinality resulted in different plans across runs.
• Instead of executing multiple runs for each experiment, each query is executed multiple times in close succession so that all samples of a Q@C are collected on the same plan.

MEASURING QUERY TIME: Within-Run Plan Repeatability
• Within-run means different “executions” of the same query within the same run.
• They use the PreparedStatement interface in JDBC to ensure within-run plan repeatability.
• Plan reuse is checked, and the run is automatically restarted if there is a violation.
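The plan check with automatic restart can be illustrated with Python's built-in `sqlite3` module as a stand-in for JDBC and the DBMSs in the paper. This is a rough sketch under that substitution: the SQL text and parameters stay fixed across QEs, the plan reported by SQLite's `EXPLAIN QUERY PLAN` is compared against the run's first plan, and any change discards the run. `run_with_plan_check` and `max_restarts` are hypothetical names, and `EXPLAIN QUERY PLAN` re-plans the statement, so it approximates rather than observes the executed plan.

```python
import sqlite3
import time


def query_plan(conn, sql, params):
    """Plan steps as reported by EXPLAIN QUERY PLAN (last column is the description)."""
    return tuple(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


def run_with_plan_check(conn, sql, params, n_qes, max_restarts=3):
    """Time n_qes executions; restart the whole run if the plan changes."""
    for _ in range(max_restarts + 1):
        reference = query_plan(conn, sql, params)
        times_ms = []
        for _ in range(n_qes):
            if query_plan(conn, sql, params) != reference:
                break  # plan violation: discard this run and restart
            start = time.perf_counter()
            conn.execute(sql, params).fetchall()
            times_ms.append((time.perf_counter() - start) * 1000.0)
        else:  # no violation in this run
            return reference, times_ms
    raise RuntimeError("query plan was not stable in any attempt")
```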

MEASURING QUERY TIME: Operating System Daemons and User Processes
• Three other sources of I/O activity during a QE:
  • User processes
  • Utility processes
  • Operating system daemons
• User processes
  • Isolated the DBMS server machines and used VNC to connect to the server remotely.
  • Only the DBMS is running on this machine.
  • No keyboard, mouse, monitor, etc.

MEASURING QUERY TIME: Wall-Clock Query Time
• Table III shows several Linux system calls that return the current time.
• They use the Java method currentTimeMillis().
  → Millisecond resolution is enough.
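For reference, a Python counterpart of millisecond wall-clock timing with Java's `System.currentTimeMillis()`. A minimal sketch; `current_time_millis` and `elapsed_ms` are hypothetical helper names.

```python
import time


def current_time_millis():
    """Wall-clock milliseconds since the Unix epoch, like currentTimeMillis()."""
    return time.time_ns() // 1_000_000


def elapsed_ms(work, *args):
    """Run work(*args) and return (result, elapsed wall-clock milliseconds)."""
    start = current_time_millis()
    result = work(*args)
    return result, current_time_millis() - start
```

A wall clock can jump when the system time is adjusted; if only intervals matter, `time.monotonic_ns()` avoids that, while the version above mirrors the Java call the slide names.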

MEASURING QUERY TIME: limiting interference
a. Stopped as many operating system daemons as possible
b. Eliminated network delays
c. Eliminated user interactions
d. Ran the same query on the same database content in the same environment
e. Ensured repeatability of I/O

MEASURING QUERY TIME: Per-Process Measures
• Nine per-process measures:
  a. min flt: number of minor page faults
  b. maj flt: number of major page faults
  c. Utime: number of ticks in user mode
  d. Stime: number of ticks in system mode
  e. Number of voluntary context switches
  f. Number of involuntary context switches
  g. Block I/O delay time
  h. Number of read system calls
  i. Number of write system calls
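On Linux, all nine per-process measures can be read from procfs. The sketch below parses the text of `/proc/<pid>/stat`, `/proc/<pid>/status`, and `/proc/<pid>/io` using the field positions documented in proc(5); the function and dictionary key names are hypothetical, and the block I/O delay field may read as zero unless the kernel's delay accounting is enabled.

```python
def per_process_measures(stat_text, status_text, io_text):
    """Return the nine per-process measures from procfs text for one process."""
    # Field 2 (the command name) may contain spaces, so split after the last ')'.
    after_comm = stat_text.rsplit(")", 1)[1].split()

    def stat_field(number):
        # proc(5) numbers fields from 1; after_comm[0] is field 3 (the state).
        return int(after_comm[number - 3])

    def key_values(text):
        # Lines look like "syscr: 1234" or "voluntary_ctxt_switches:\t56".
        return dict(line.split(":", 1) for line in text.splitlines() if ":" in line)

    status, io = key_values(status_text), key_values(io_text)
    return {
        "minflt": stat_field(10),  # a. minor page faults
        "majflt": stat_field(12),  # b. major page faults
        "utime_ticks": stat_field(14),  # c. ticks in user mode
        "stime_ticks": stat_field(15),  # d. ticks in system mode
        "voluntary_ctxt_switches": int(status["voluntary_ctxt_switches"]),  # e.
        "nonvoluntary_ctxt_switches": int(status["nonvoluntary_ctxt_switches"]),  # f.
        "blkio_delay_ticks": stat_field(42),  # g. delayacct_blkio_ticks
        "read_syscalls": int(io["syscr"]),  # h. read system calls
        "write_syscalls": int(io["syscw"]),  # i. write system calls
    }


def read_per_process_measures(pid="self"):
    """Read the live values for one process (Linux only)."""
    def read(name):
        with open(f"/proc/{pid}/{name}") as f:
            return f.read()
    return per_process_measures(read("stat"), read("status"), read("io"))
```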

MEASURING QUERY TIME: Overall Measures
• Eight overall measures:
  a. Utime
  b. Stime
  c. IOWait time
  d. Guest
  e. Idle time
  f. IRQ
  g. SoftIRQ
  h. processes
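The overall measures come from the system-wide `/proc/stat`. A minimal sketch, assuming a Linux kernel whose aggregate `cpu` line includes the guest column: take a snapshot before and after a QE and subtract. `overall_measures` and `measure_delta` are hypothetical names.

```python
# Column order of the aggregate "cpu" line in /proc/stat (ticks), per proc(5).
CPU_COLUMNS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal", "guest")
WANTED = ("user", "system", "iowait", "guest", "idle", "irq", "softirq")


def overall_measures(proc_stat_text):
    """Extract the eight overall measures from the text of /proc/stat."""
    measures = {}
    for line in proc_stat_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "cpu":  # aggregate line; "cpu0", "cpu1", ... are per-core
            values = dict(zip(CPU_COLUMNS, map(int, parts[1:])))
            measures.update({name: values[name] for name in WANTED})
        elif parts[0] == "processes":  # forks since boot
            measures["processes"] = int(parts[1])
    return measures


def measure_delta(before, after):
    """Per-measure difference between two snapshots taken around a QE."""
    return {name: after[name] - before[name] for name in before}
```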

MEASURING QUERY TIME: Time Measurement Resolution
• They use different resolutions for CT and ET, depending on whether the process is in mixed or pure computation mode.
• CT measurement of a process in mixed mode
  → Tick (~10 ms)
  → The proc file system's tick-based per-process measures
• ET measurement of the mixed process
  → Millisecond
  → System.currentTimeMillis()
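The ~10 ms figure is the tick length the kernel exposes to user space. A minimal sketch of converting the tick-based CT values into milliseconds; `tick_ms` and `ticks_to_ms` are hypothetical names, and the tick rate is read from `os.sysconf("SC_CLK_TCK")` unless supplied (100 ticks per second gives 10 ms per tick).

```python
import os


def tick_ms(clk_tck=None):
    """Length of one clock tick in milliseconds."""
    hz = clk_tck if clk_tck is not None else os.sysconf("SC_CLK_TCK")
    return 1000.0 / hz


def ticks_to_ms(ticks, clk_tck=None):
    """Convert a tick count (e.g. utime or stime) into milliseconds."""
    return ticks * tick_ms(clk_tck)
```

Since CT is counted in whole ticks, any CT value derived this way is only known to within about one tick, which is why ET is taken from a millisecond clock instead.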

THE PRINCIPAL CHALLENGE
• The goal is to infer/calculate the total time used by the query processes to actually execute the query.
• The challenge is to estimate the portion of IOWait time directly or indirectly caused by activities of the DBMS processes.
• They elaborate the earlier, simpler structural causal model to cover more factors.

AN ELABORATED CAUSAL MODEL
• Considered variables: twice as many factors as in TTPv1
  • SoftIRQ ticks, IOWait ticks
    → Overall only
  • User ticks, system ticks
    → Both overall and per-process
  • The other five measures
    → Per-process only

AN ELABORATED CAUSAL MODEL
• Relationship

AN ELABORATED CAUSAL MODEL
• Predicted Correlations

TESTING THE CAUSAL MODEL
• Exploratory Model Analysis

TESTING THE CAUSAL MODEL
• Confirmatory Model Analysis
  • Green: as predicted
  • Yellow: close
  • Red: wrong

TIMING CONSIDERATIONS
• Process interactions
  • During a QE, other processes may run, start, or stop.
  → It's hard to know which process does what, and this creates noise.

TIMING CONSIDERATIONS: Calculating the CPU Time
• The CPU time is easy to calculate because we have per-process system and user time, once we determine which DBMS process was actually running the query.
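Once per-process `utime`/`stime` snapshots are available, the CPU time is a subtraction. A minimal sketch, assuming snapshots keyed by DBMS process ID and a 10 ms tick; treating the process with the largest CPU delta as the query process is an assumption for illustration only, since the slide does not say how that process is identified. Function names are hypothetical.

```python
def cpu_deltas(before, after):
    """before/after map pid -> (utime_ticks, stime_ticks); return per-pid deltas."""
    return {
        pid: (after[pid][0] - before[pid][0], after[pid][1] - before[pid][1])
        for pid in before
        if pid in after  # ignore processes that exited during the QE
    }


def query_cpu_time_ms(before, after, tick_ms=10.0):
    """Return (pid, CPU ms) for the DBMS process with the largest CPU delta."""
    deltas = cpu_deltas(before, after)
    pid, (user, system) = max(deltas.items(), key=lambda item: sum(item[1]))
    return pid, (user + system) * tick_ms
```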

TIMING CONSIDERATIONS: Calculating the I/O Time

TIMING PROTOCOL
• General protocol
  i. Perform Sanity Checks
  ii. Drop Query Executions
  iii. Drop Selected Q@Cs
  iv. Calculate Query Time
  v. Post Sanity Checks
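The five steps above can be wired together as a skeleton. Everything here is an illustrative assumption: the sanity checks and the QE-drop rule are hypothetical callables supplied by the caller, the input is assumed to already hold one computed query time per QE, and only the 6-QE minimum and the use of the median are taken from other slides in this deck.

```python
from statistics import median


def timing_protocol(qac_times, sanity_check, keep_qe, post_check, min_qes=6):
    """qac_times maps each Q@C to the per-QE query times already computed for it.

    sanity_check(qac, times) and post_check(result) are expected to raise on
    failure; keep_qe(qac, time) returns False for a QE that should be dropped.
    All three are hypothetical placeholders for the protocol's real checks.
    """
    # i. Perform sanity checks
    for qac, times in qac_times.items():
        sanity_check(qac, times)
    # ii. Drop query executions
    filtered = {qac: [t for t in times if keep_qe(qac, t)] for qac, times in qac_times.items()}
    # iii. Drop selected Q@Cs (too few QEs left)
    kept = {qac: times for qac, times in filtered.items() if len(times) >= min_qes}
    # iv. Calculate query time (median of the remaining QEs)
    result = {qac: median(times) for qac, times in kept.items()}
    # v. Post sanity checks
    post_check(result)
    return result
```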

TIMING PROTOCOL
• Step 0: Set Up the System and Run the Queries
  • Prepare three machines
  • Enable only one CPU core on the QE machine
  • Write the experiment specification in XML
  • Run the experiment
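Before the runs in Step 0 start, it is easy to verify that only one core is online. A minimal Linux sketch: it parses the kernel's cpulist format (such as `0-3,6`) from `/sys/devices/system/cpu/online` and refuses to proceed otherwise. The function names are hypothetical; taking a core offline is typically done as root by writing `0` to `/sys/devices/system/cpu/cpuN/online`.

```python
ONLINE_PATH = "/sys/devices/system/cpu/online"


def parse_cpulist(text):
    """Parse a kernel cpulist such as "0-3,6" into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def require_single_core(path=ONLINE_PATH):
    """Raise unless exactly one CPU is online; return the online set."""
    with open(path) as f:
        online = parse_cpulist(f.read())
    if len(online) != 1:
        raise RuntimeError(f"expected exactly one online CPU, found {sorted(online)}")
    return online
```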

TIMING PROTOCOL
• Step 1: Perform Sanity Checks

TIMING PROTOCOL
• Step 2: Drop Query Executions

TIMING PROTOCOL
• Step 3: Drop Selected Q@Cs

TIMING PROTOCOL
• Step 4: Calculate Query Time
  • The TTPv1 protocol used the regression coefficient associated with the query process to determine the contribution of the IOWait time to the total query time.
  • Additional measures

TIMING PROTOCOL
• Should we take the maximum, minimum, average, or median?
  • Maximum, minimum → ❌
  • Average? Fig. 8 shows some non-uniformity in the distribution → average ❌
  • We should take the median.
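The argument for the median is easy to see numerically. The values below are made-up QE times for one Q@C, not measurements from the paper; `query_time_ms` is a hypothetical helper.

```python
from statistics import mean, median


def query_time_ms(qe_times_ms):
    """Summarize one Q@C's QE times by their median."""
    if not qe_times_ms:
        raise ValueError("Q@C has no QEs left")
    return median(qe_times_ms)


# Made-up QE times (ms): one slow outlier QE pulls the mean to 115.2 ms
# while the median stays at 100 ms.
example = [102, 99, 101, 100, 98, 100, 103, 100, 99, 250]
print(query_time_ms(example), mean(example))
```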

TIMING PROTOCOL
• Step 5: Post Sanity Checks

REPORTING THE RESULTS

EVALUATION

FUTURE WORK
• Better causal model
• Better understanding of strange cases
• Use additional measures
• Fewer restrictions
• Extend to support the Windows operating system
• Extend to measure aspects outside the taxonomy of query time in Fig. 1
