lecture 1 database tuning
TRANSCRIPT
-
8/2/2019 Lecture 1 Database Tuning
1/26
Database Tuning
-
8/2/2019 Lecture 1 Database Tuning
2/26
Performance The measure of efficiency for an application or
multiple applications running in the sameenvironment
Is usually measured in: response time: the time that a single task takes to
completeCan be shortened by: Reducing contention and wait times, particularly disk I/O wait
times
Using faster components Reducing the amount of time the resources are needed
throughput: the volume of work completed in a fixedtime period
is commonly measured in transactions per second (tps), but itcan also be measured per minute, per hour, per day, and so
on
-
8/2/2019 Lecture 1 Database Tuning
3/26
Database Performance
Can be thought by using the concepts of supplyand demand Users demand information from the database
The DBMS supplies information to those requesting it
The rate at which the DBMS supplies the demand forinformation can be termed database performance
Five factors influence database performance:
Workload
ThroughputResource
Optimization
Contention
-
8/2/2019 Lecture 1 Database Tuning
4/26
Factors Influencing Database Performance
Workload that is requested of the DBMS definesthe demand A combination of online transactions, batch jobs, ad
hoc queries, data warehousing analysis, and systemcommands directed through the system
Can fluctuate drastically, sometimes can be predicted
Throughput defines the overall capability of thecomputer to process data Composite of I/O speed, CPU speed, parallel
capabilities of the machine, and the efficiency of theoperating system and system software
-
8/2/2019 Lecture 1 Database Tuning
5/26
Factors Influencing Database Performance (cont.)
Resources of the system include database kernel,disk space, memory, cache controllers, andmicrocode
Optimization of queries is primarily accomplishedinternal to the DBMSMany factors that need to be optimized:
SQL formulation
Database parameters
Contention is the condition in which two or more
components of the workload are attempting touse a single resource in a conflicting way As contention increases, throughput decreases
-
8/2/2019 Lecture 1 Database Tuning
6/26
Database Performance Definition
Database performance can be defined asthe optimization ofresource use to increasethroughput and minimize contention,enabling the largest possible workload to
be processed
Performance management tasks not
covered by the definition should be handled
by someone other that the DBA or at aminimum shared between the DBA andother technicians
-
8/2/2019 Lecture 1 Database Tuning
7/26
Performance Tuning
Adjusting various parameters and design choices toimprove system performance for a specific application.
Tuning is best done by
1. identifying bottlenecks, and
2. eliminating them.
Can tune a database system at 3 levels: Hardware -- e.g., add disks to speed up I/O, add memory to
increase buffer hits, move to a faster processor.
Database system parameters -- e.g., set buffer size to avoid
paging of buffer, set checkpointing intervals to limit log size.System may have automatic tuning.
Higher level database design, such as the schema, indices
and transactions (more later)
-
8/2/2019 Lecture 1 Database Tuning
8/26
Bottlenecks
Performance of most systems (at least before they aretuned) usually limited by performance of one or a fewcomponents: these are called bottlenecks E.g. 80% of the code may take up 20% of time and 20% of code
takes up 80% of time
Worth spending most time on 20% of code that take 80% of time
Bottlenecks may be in hardware (e.g. disks are very busy,CPU is idle), or in software
Removing one bottleneck often exposes another
De-bottlenecking consists of repeatedly finding
bottlenecks, and removing them This is a heuristic
-
8/2/2019 Lecture 1 Database Tuning
9/26
Identifying Bottlenecks
Transactions request a sequence of services e.g. CPU, Disk I/O, locks With concurrent transactions, transactions may have to wait
for a requested service while other transactions are beingserved
Can model database as a queueing system with a queuefor each service transactions repeatedly do the following
request a service, wait in queue for the service, and get serviced
Bottlenecks in a database system typically show up as very
high utilizations (and correspondingly, very long queues) of aparticular service E.g. disk vs CPU utilization
-
8/2/2019 Lecture 1 Database Tuning
10/26
-
8/2/2019 Lecture 1 Database Tuning
11/26
Tunable Parameters
TuningTuning
Tuning
TuningTuning
ofof
of
ofof
hardwareschema
indices
materialized viewstransactions
-
8/2/2019 Lecture 1 Database Tuning
12/26
Tuning of Hardware
Even well-tuned transactions typically require a few I/Ooperations Typical disk supports about 100 random I/O operations per
second
Suppose each transaction requires just 2 random I/O operations.
Then to support n transactions per second, we need to stripe dataacross n/50 disks (ignoring skew)
Number of I/O operations per transaction can be reducedby keeping more data in memory If all data is in memory, I/O needed only for writes
Keeping frequently used data in memory reduces disk accesses,reducing number of disks required, but has a memory cost
-
8/2/2019 Lecture 1 Database Tuning
13/26
Hardware Tuning: Five-Minute Rule
Question: which data to keep in memory: If a page is accessed n times per second, keeping it in memory saves n* price-per-disk-drive
accesses-per-second-per-disk
Cost of keeping page in memory
price-per-MB-of-memory
pages-per-MB-of-memory
Break-even point: value ofn for which above costs are equal Buying memory: If accesses are more then saving is greater than cost
Solving above equation with current disk and memory prices leads to:
5-minute rule: if a page that is randomly accessed isused more frequently than once in 5 minutes it shouldbe kept in memory
(by buying sufficient memory!)
-
8/2/2019 Lecture 1 Database Tuning
14/26
Hardware Tuning: One-Minute Rule
For sequentially accessed data, more pages canbe read per second. Assuming sequential readsof 1MB of data at a time:1-minute rule: sequentially accessed data
that is accessed once or more in a minuteshould be kept in memory
Prices of disk and memory have changed greatlyover the years, but the ratios have not changed
much so rules remain as 5 minute and 1 minute rules, not 1
hour or 1 second rules!
-
8/2/2019 Lecture 1 Database Tuning
15/26
Hardware Tuning: Choice of RAID Level To use RAID 1 or RAID 5?
Depends on ratio of reads and writes
RAID 5 requires 2 block reads and 2 block writes to write out one datablock
If an application requires r reads and w writes per second RAID 1 requires r + 2w I/O operations per second RAID 5 requires: r + 4w I/O operations per second
For reasonably large r and w, this requires lots of disks tohandle workload RAID 5 may require more disks than RAID 1 to handle load!
Apparent saving of number of disks by RAID 5 (by using parity, asopposed to the mirroring done by RAID 1) may be illusory!
Thumb rule: RAID 5 is fine when writes are rare and datais very large, but RAID 1 is preferable otherwise If you need more disks to handle I/O load, just mirror them since
disk capacities these days are enormous!
-
8/2/2019 Lecture 1 Database Tuning
16/26
Tuning the Database DesignSchema tuning
Can be done in several ways: Splitting tables
Sometimes splitting normalized tables can improve performance Can split tables in two ways:
Horizontally
Vertically Adds complexity to the applications
Denormalization: Adding redundant columns Adding derived attributes Collapsing tables
Duplicating tables Cluster together on the same disk page records that would
match in a frequently required join, compute join very efficiently when required.
-
8/2/2019 Lecture 1 Database Tuning
17/26
Tuning the Database Design
Schema tuning (cont.)
Vertical Splitting Example
account relation with the following schema:
account(account-number, brach-name, balance)can be split into two relations:account-branch(account-number, branch-name)account-balance(account-number, balance)
-
8/2/2019 Lecture 1 Database Tuning
18/26
Tuning the Database DesignIndex tuning (cont.)
When should indexes be considered Unique indexes are implicitly used in conjunction with a
primary key for the primary key to work Foreign keys are also excellent candidates for an index
because they are often used to join the parent table
Most, if not all, columns used for table joins should be indexed Columns that are frequently referenced in the ORDER BY and
GROUP BY clauses should be considered for indexes Indexes should be created on columns with a high number of
unique values, or columns when used as filter conditions inthe WHERE clause return a low percentage of rows of datafrom a table
The effective use of indexes requires a thorough knowledgeof table relationships, query and transaction requirements,and the data itself
-
8/2/2019 Lecture 1 Database Tuning
19/26
Tuning the Database DesignIndex tuning (cont.)
When should indexes be avoided Indexes should not be used on small tables Indexes should not be used on columns that return a high
percentage of data rows when used as a filter condition inaquery's WHERE clause
Tables that have frequent, large batch update jobs run canbe indexed However, the batch job's performance is slowed considerably by the
index Might consider dropping the index before the batch job, and then re-
creating the index after the job has completed
Indexes should not be used on columns that contain a highnumber of NULL values
Columns that are frequently manipulated should not beIndexed
-
8/2/2019 Lecture 1 Database Tuning
20/26
Tuning the Database DesignMaterialized Views
Materialized views can help speed up certain queries Particularly aggregate queries
Overheads Space Time for view maintenance
Immediate view maintenance: done as part of update time overhead paid by update transaction
Deferred view maintenance: done only when required update transaction is not affected, but system time is spent on view
maintenance until updated, the view may be out-of-date
-
8/2/2019 Lecture 1 Database Tuning
21/26
Tuning of Transactions Basic approaches to tuning of transactions
Improve set orientation
Reduce lock contention
Rewriting of queries to improve performance was importantin the past, but smart optimizers have made this lessimportant
Communication overhead and query handling overheadssignificant part of cost of each call Combine multiple embedded SQL/ODBC/JDBC queries into
a single set-oriented query
Set orientation -> fewer calls to database E.g. tune program that computes total salary for each department using
a separate SQL query by instead using a single query that computestotal salaries for all department at once (using group by)
Use stored procedures: avoids re-parsing and re-optimizationof query
-
8/2/2019 Lecture 1 Database Tuning
22/26
Tuning of Transactions (Cont.)
Reducing lock contention
Long transactions (typically read-only) thatexamine large parts of a relation result in lock
contention with update transactions E.g. large query to compute bank statistics and
regularbank transactions
To reduce contention Use multi-version concurrency control
E.g. Oracle snapshots which support multi-version 2PL
Use degree-two consistency (cursor-stability) for longTransactions
-
8/2/2019 Lecture 1 Database Tuning
23/26
Tuning of Transactions (Cont.) Long update transactions cause several problems
Exhaust lock space
Exhaust log space
and also greatly increase recovery time after a crash, and may evenexhaust log space during recovery if recovery algorithm is badlydesigned!
Use mini-batch transactions to limit number of updates
that a single transaction can carry out. E.g., if a single largetransaction updates every record of a very large relation,log may grow too big.* Split large transaction into batch of ``mini-transactions,'' each
performing part of the updates
-
8/2/2019 Lecture 1 Database Tuning
24/26
-
8/2/2019 Lecture 1 Database Tuning
25/26
-
8/2/2019 Lecture 1 Database Tuning
26/26