TRANSCRIPT
DBMS Metrology: Measuring Query Time
SABAH CURRIM, RICHARD T. SNODGRASS, YOUNG-KYOON SUH, and RUI ZHANG, University of Arizona
TODS, Machida B4, 2017/06/16
Outline
0. ABSTRACT
1. INTRODUCTION
2. BACKGROUND
3. MEASURING QUERY TIME
4. THE PRINCIPAL CHALLENGE
5. AN ELABORATED CAUSAL MODEL
6. TESTING THE CAUSAL MODEL
7. TIMING CONSIDERATIONS
8. TIMING PROTOCOL
9. REPORTING THE RESULTS
10. EVALUATION
11. SUMMARY
12. FUTURE WORK
ABSTRACT
• How do we measure query execution time?
• It is hard, because there is much variance: other processes running, I/O time, RDBMS heuristics...
• The authors propose a query time measurement procedure, the Tucson Timing Protocol Version 2 (TTPv2), for more precise and robust query timing.
INTRODUCTION: the usual method
“We used the UNIX time command to measure the elapsed time and CPU time. All queries were run 10 times. The resultant CPU usage was averaged.” — H.-Y. Hwang and Y.-T. Yu [1987]
Table I shows results from this previous method.
Variability arises from necessary daemon processes and their I/O and other interactions with the DBMS.
INTRODUCTION: contribution
• Tucson Timing Protocol Version 1 (TTPv1)
  • dropped about 20% of the timings because of phantom processes
• Tucson Timing Protocol Version 2 (TTPv2)
  • drops many fewer query executions
  • adds many more sanity checks
  • is based on an elaborated structural causal model
  • estimates process I/O in a more sophisticated manner
BACKGROUND: problem definition
• Independent variable? Dependent variable?
  → the dependent variable
• What is to be measured?
  → a single statement
• Wall-clock time, spent in:
  • DBMS processes
  • Daemons
  • User processes
BACKGROUND
• Two types of overall time:
  • Process Time (PT): the sum of JDBC, CPU, I/O, and network time
  • Elapsed Time (ET): the time between the start and the finish
• Table I shows ET
• ET is highly variable!
  → deduce the I/O and CPU time
BACKGROUND: experiments
MEASURING QUERY TIME
• Execute queries with different cardinalities: Q@Cs
• Execute each Q@C multiple times: Query Executions (QEs)
• The protocol drops Q@Cs with fewer than 6 QEs.
• 10 QEs for each Q@C is sufficient for timing purposes.
• But the number of QEs per Q@C is a knob available to the experimenter; eight QEs per Q@C would work fine for many situations.
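The drop rule above can be sketched as a small loop; `runOnce` and the validity convention (null means a discarded execution) are illustrative assumptions, not the paper's actual interface:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class QeCollector {
    // Collect timings for one Q@C; return null (i.e., drop the Q@C)
    // if fewer than minValid of the attempted executions survive.
    static List<Long> collect(Supplier<Long> runOnce, int attempts, int minValid) {
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            Long ms = runOnce.get();          // null = execution was discarded
            if (ms != null) kept.add(ms);
        }
        return kept.size() >= minValid ? kept : null;
    }

    public static void main(String[] args) {
        // 10 attempts, all valid: the Q@C is kept with 10 samples.
        List<Long> ok = collect(() -> 42L, 10, 6);
        System.out.println(ok == null ? "dropped" : "kept " + ok.size());
    }
}
```

With 10 attempts and a minimum of 6 valid QEs, a Q@C survives only if at least 6 executions pass the sanity checks.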
MEASURING QUERY TIME: cache effect
• Pure warm cache: the DBMS main memory is large enough to hold all the base tables and intermediate results, so no I/O is needed.
• Partial warm cache: the DBMS main memory is not large enough to hold all the base tables and intermediate results, so some I/O is required.
MEASURING QUERY TIME: cache effect
• Multiple caches are involved:
  a. Disk drives
  b. Disk controllers
  c. Network file system
  d. Operating system
  e. DBMS buffer
  f. JDBC
  g. L1, L2
• L1, L2, and JDBC are not considered, because they are utilized in both warm- and cold-cache runs.
MEASURING QUERY TIME: eliminate cache
• Four steps to eliminate the caching and buffering:
  1. Installed the disk directly on the machine that also runs the DBMS.
     → network file system caches
  2. Filled both the disk drive and disk controller caches by reading a 64MB data file.
     → disk drive and disk controller caches
  3. Used the Linux “drop_caches” facility to discard cached clean pages from the O/S file cache.
     → operating system caches
  4. Provided an API to discard cached content residing inside the DBMS.
     → DBMS buffer
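Steps 2 and 3 can be sketched as follows; the file size is shrunk from the paper's 64 MB to keep the sketch fast, and the `drop_caches` write is shown but not invoked because it requires root:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CacheFlush {
    // Step 2: fill the disk-drive and disk-controller caches by streaming
    // a large file end to end. Returns the number of bytes read.
    static long streamThrough(Path p) throws IOException {
        long total = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(p))) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) total += n;
        }
        return total;
    }

    // Step 3: ask Linux to discard clean pages from the O/S file cache.
    // Needs root, so it is shown here but not invoked below.
    static void dropOsCaches() throws IOException {
        Files.write(Paths.get("/proc/sys/vm/drop_caches"), "3\n".getBytes());
    }

    // Self-contained demo: create a 1 MB file, stream it, clean up.
    static long demo() {
        try {
            Path p = Files.createTempFile("flush", ".bin");
            Files.write(p, new byte[1 << 20]);
            long total = streamThrough(p);
            Files.delete(p);
            return total;
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());  // 1048576
    }
}
```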
MEASURING QUERY TIME: still caching
• Query plan caching
  • A previously determined plan is used when the query reappears.
  • Query plan caching is a good thing, in contrast to data caching.
• Query result caching
  • The result of a query is stored in case that same query is requested again shortly after.
  • QEs that show query result caching are discarded by the protocol.
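Discarding such executions might look like the sketch below; the 10%-of-cold-time threshold and the method names are illustrative assumptions, not the paper's actual detection rule:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ResultCacheFilter {
    // Discard executions whose elapsed time is implausibly small relative
    // to the first (cold) execution; a result served straight from the
    // result cache skips essentially all query processing.
    static List<Long> filter(List<Long> elapsedMs) {
        long cold = elapsedMs.get(0);
        return elapsedMs.stream()
                        .filter(ms -> ms >= cold / 10)   // illustrative threshold
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The 20 ms execution looks like a result-cache hit and is dropped.
        System.out.println(filter(List.of(1000L, 950L, 20L, 990L)));
    }
}
```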
MEASURING QUERY TIME: Data Repeatability
• Across-run means different “runs” of executions of the same query.
• The variable tables should be identical in all aspects when a cardinality is specified:
  • identical no matter the DBMS, and across runs
  • the random number seeds are saved as scenario parameters
• Sometimes deleted rows are simply flagged without being physically removed, so different cardinalities have the same number of occupied pages.
  • They use the “SELECT … INTO …” command to populate the newly created table to the target cardinality.
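The role of the saved seed can be made concrete with a sketch like this; the single-column layout and method name are hypothetical:

```java
import java.util.Random;

public class SeededTable {
    // Regenerating a column from a saved seed yields identical rows
    // across runs (and across DBMSes), as the protocol requires.
    static int[] column(long seed, int cardinality) {
        Random r = new Random(seed);   // seed saved as a scenario parameter
        int[] rows = new int[cardinality];
        for (int i = 0; i < cardinality; i++) rows[i] = r.nextInt(1_000_000);
        return rows;
    }

    public static void main(String[] args) {
        // Same seed + same cardinality → identical data, run after run.
        System.out.println(java.util.Arrays.equals(column(7L, 1000),
                                                   column(7L, 1000))); // true
    }
}
```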
MEASURING QUERY TIME: Plan Repeatability Across Runs
• Sometimes the same query at the same cardinality resulted in different plans across runs.
• Instead of executing multiple runs for each experiment, each query is executed multiple times in close succession, so that the samples are guaranteed to be collected on the same plan.
MEASURING QUERY TIME: Within-Run Plan Repeatability
• Within-run means different “executions” of the same query within the same run.
• They use the PreparedStatement class in JDBC to ensure within-run plan repeatability.
• Plan reuse is checked, and the run is automatically restarted if there is a violation.
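The plan-reuse check can be sketched as a comparison of plan fingerprints gathered per execution; representing plans as opaque strings is an illustrative simplification:

```java
import java.util.List;

public class PlanCheck {
    // True iff every execution in the run reused the first execution's
    // plan; the protocol restarts the run when this check fails.
    static boolean samePlan(List<String> planPerExecution) {
        String first = planPerExecution.get(0);
        return planPerExecution.stream().allMatch(first::equals);
    }

    public static void main(String[] args) {
        System.out.println(samePlan(List.of("hashjoin#3", "hashjoin#3"))); // true
        System.out.println(samePlan(List.of("hashjoin#3", "nestloop#1"))); // false → restart run
    }
}
```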
MEASURING QUERY TIME: Operating System Daemons and User Processes
• Three other sources of I/O activity during QE:
  • User processes
  • Utility processes
  • Operating system daemons
• User processes:
  • Isolated the DBMS server machines and used VNC to connect to the server remotely.
  • Only the DBMS is running on this machine.
  • No keyboard, mouse, monitor, etc.
MEASURING QUERY TIME: Wall-Clock Query Time
• Table III shows several Linux system calls that return the current time.
• They use the Java method currentTimeMillis().
  → millisecond resolution is enough
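A minimal wrapper over that call, timing a query at millisecond resolution:

```java
public class WallClock {
    // Millisecond wall-clock timing of a single query execution,
    // using the System.currentTimeMillis() call named on the slide.
    static long timeMs(Runnable query) {
        long start = System.currentTimeMillis();
        query.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Stand-in for a query: sleep at least 50 ms of real time.
        long ms = timeMs(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) {}
        });
        System.out.println(ms);
    }
}
```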
MEASURING QUERY TIME: limiting interference
a. Stopped as many operating system daemons as possible
b. Eliminated network delays
c. Eliminated user interactions
d. Ran the same query on the same database content in the same environment
e. Ensured repeatability of I/O
MEASURING QUERY TIME: Per-Process Measures
• Nine per-process measures:
  a. minflt: number of minor page faults
  b. majflt: number of major page faults
  c. utime: number of ticks in user mode
  d. stime: number of ticks in system mode
  e. number of voluntary context switches
  f. number of involuntary context switches
  g. block I/O delay time
  h. number of read system calls
  i. number of write system calls
MEASURING QUERY TIME: Overall Measures
• Eight overall measures:
  a. utime
  b. stime
  c. IOWait time
  d. guest
  e. idle time
  f. IRQ
  g. SoftIRQ
  h. processes
MEASURING QUERY TIME: Time Measurement Resolution
• They use different resolutions for CT and ET, depending on whether the process is in mixed or pure computation mode.
• CT measurement of a process in mixed mode:
  → ticks (~10ms), from the proc file system's tick-based per-process counters
• ET measurement of the mixed process:
  → milliseconds, via System.currentTimeMillis()
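The tick-to-millisecond relationship can be made concrete; the constant assumes the common Linux USER_HZ of 100 (10 ms per tick), which the slide's "~10ms" only approximates:

```java
public class Ticks {
    // Linux reports per-process CPU time in clock ticks; with the
    // usual USER_HZ of 100, one tick is 10 ms. This is a typical
    // value, not something the paper fixes.
    static final long MS_PER_TICK = 10;

    static long ticksToMs(long ticks) {
        return ticks * MS_PER_TICK;
    }

    public static void main(String[] args) {
        System.out.println(ticksToMs(7));  // 70 ms of CPU time
    }
}
```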
THE PRINCIPAL CHALLENGE
• The goal is to infer/calculate the total time used by the query processes to actually execute the query.
• The challenge is to estimate the portion of IOWait time directly or indirectly caused by activities of the DBMS processes.
• They elaborate a simple structural causal model to provide extended coverage of factors.
AN ELABORATED CAUSAL MODEL
• Considered variables: twice as many factors as TTPv1
  • SoftIRQ ticks, IOWait ticks → overall only
  • user ticks, system ticks → both
  • the other five measures → per-process only
AN ELABORATED CAUSAL MODEL
• Relationships
AN ELABORATED CAUSAL MODEL
• Predicted Correlations
TESTING THE CAUSAL MODEL
• Exploratory Model Analysis
TESTING THE CAUSAL MODEL
• Confirmatory Model Analysis
  • Green: as predicted
  • Yellow: close
  • Red: wrong
TIMING CONSIDERATIONS: Process Interactions
• During the QE, other processes may run, start, or stop.
  → It is hard to know which process does what, and this creates noise.
TIMING CONSIDERATIONS: Calculating the CPU Time
• The CPU time is easy to calculate, because we have per-type system and user time, once we determine which DBMS process was actually running the query.
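A toy version of that calculation, assuming the common 10 ms tick; the method name and tick constant are illustrative:

```java
public class CpuTime {
    // Once the DBMS process running the query is identified, its CPU
    // time is just its user-mode ticks plus its system-mode ticks,
    // converted to milliseconds (assuming a 10 ms tick).
    static long cpuMs(long utimeTicks, long stimeTicks) {
        return (utimeTicks + stimeTicks) * 10;
    }

    public static void main(String[] args) {
        System.out.println(cpuMs(12, 3));  // 150 ms
    }
}
```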
TIMING CONSIDERATIONS: Calculating the I/O Time
TIMING CONSIDERATIONS
TIMING CONSIDERATIONS: Calculating the I/O Time
TIMING PROTOCOL
• General protocol:
  i. Perform Sanity Checks
  ii. Drop Query Executions
  iii. Drop Selected Q@Cs
  iv. Calculate Query Time
  v. Post Sanity Checks
TIMING PROTOCOL
• Step 0: Set Up the System and Run the Queries
  • Prepare three machines
  • Enable only one CPU core on the QE machine
  • Write the experiment specification in XML
  • Run the experiment
TIMING PROTOCOL
• Step 1: Perform Sanity Checks
TIMING PROTOCOL
• Step 2: Drop Query Executions
TIMING PROTOCOL
• Step 3: Drop Selected Q@Cs
TIMING PROTOCOL
• Step 4: Calculate Query Time
  • The TTPv1 protocol used the regression coefficient associated with the query process to determine the contribution of the IOWait time to the total query time.
  • Additional measures
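The regression-coefficient idea can be illustrated with a one-variable least-squares slope through the origin; the variables (read calls vs. overall IOWait) and the numbers are made up for illustration, not the paper's model:

```java
public class IoAttribution {
    // Slope relating the query process's I/O activity (x, e.g. read
    // calls) to the overall IOWait time (y). Multiplying this slope by
    // the query's activity attributes a share of IOWait to the query.
    static double slope(double[] x, double[] y) {
        double num = 0, den = 0;
        for (int i = 0; i < x.length; i++) {
            num += x[i] * y[i];   // least squares through the origin:
            den += x[i] * x[i];   // beta = sum(x*y) / sum(x*x)
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] reads  = {10, 20, 30};
        double[] iowait = {52, 98, 151};   // roughly 5 ms of IOWait per read
        System.out.println(slope(reads, iowait));
    }
}
```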
TIMING PROTOCOL
• Should we take the maximum? The minimum? The average? The median?
• Maximum, minimum → ❌
• Average? Fig. 8 shows some non-uniformity in the distribution → ❌
• We should take the median.
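Why the median: on a skewed sample it barely moves under an outlier, where the mean shifts noticeably. A minimal sketch:

```java
import java.util.Arrays;

public class Median {
    // Median of the per-execution query times; robust to the skewed,
    // non-uniform distributions that rule out the mean.
    static double median(double[] times) {
        double[] s = times.clone();
        Arrays.sort(s);
        int n = s.length;
        return n % 2 == 1 ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // One slow outlier: the median is 52, while the mean would be 58.
        System.out.println(median(new double[]{50, 51, 52, 53, 84}));  // 52.0
    }
}
```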
TIMING PROTOCOL
• Step 5: Post Sanity Checks
REPORTING THE RESULTS
EVALUATION
EVALUATION
FUTURE WORK
• Better causal model
• Better understanding of strange cases
• Use additional measures
• Fewer restrictions
• Extend to support the Windows operating system
• Extend to measure aspects outside of the taxonomy of query time in Fig. 1