Oracle Database Performance Tuning: A Practitioner’s Perspective (Part 1)
Disclaimer : The content here is completely based on my knowledge and experience and may not be completely accurate.
Abishek V S
What is Performance
• Accepted Throughput for a given Workload
– Throughput : Rate of doing work/ Processing
– Workload : Total Work submitted to the system
What is Performance Tuning
• Database performance tuning can be defined as the optimization of resource use to increase throughput and minimize contention, enabling the largest possible workload to be processed.
What is a Performance Problem
• When one or more database tasks do not complete in a timely manner we say there is a performance problem.
– SQL running longer than usual
– Users facing slowness of application UI
– Users not being able to connect to database
What causes a Performance Problem
• Contention for a resource
– Too many sessions waiting for a lock, CPU or any other resource.
• Over utilization of the system
– The System/Database is made to do more work than it actually can
• A badly written SQL doing excessive disk access and using CPU cycles, causing slowness to others
Categorization
• Slow – Overall DB performance is slower than usual
• Spin – High CPU usage by few oracle processes/sessions
• Hang – Large number of sessions/processes waiting for one or more sessions to release resources, generally a locking or a latching problem.
Initial Diagnostics
• Ask relevant questions of all stakeholders.
• Look for repeated memory, space (Undo/Temp), contention and network related Oracle errors (it does not hurt to check the alert log)
• Measure CPU and Memory usage at the OS level
• Isolate database as the cause of the problem
Database Performance Definitions
• Response Time : Time taken for a task (query) to complete.
• Throughput : Total work done in an interval of time.
– Throughput may also be viewed as the total response time of all sessions for a given interval of time.
[Diagram: Performance, viewed in terms of Response Time and Throughput]
Database Performance Definitions
• Service Time : Time spent on CPU
• Wait Time : Time spent waiting for a resource to be freed, in effect waiting for another session or for the system.
– When a session waits, it posts the reason for the wait as a wait event, which can be viewed in v$session, v$session_wait and v$session_event
• Wait Event : A named label describing the reason for the wait by a session / process
• Response Time : Service Time + Wait Time
• DB Time : Total Response time – Idle wait time
• Avg Active Session (AAS) = DB Time/Elapsed Time
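The AAS definition above can be approximated directly from ASH. This is a minimal sketch, assuming the Diagnostics Pack is licensed and that V$ACTIVE_SESSION_HISTORY still holds samples for the window; the 10-minute window is an arbitrary choice for illustration.

```sql
-- ASH keeps roughly one sample per active session per second, so
-- (number of samples) / (elapsed seconds) approximates DB Time / Elapsed Time.
select round(count(*) / (10 * 60), 2) as avg_active_sessions
from   v$active_session_history
where  sample_time > systimestamp - interval '10' minute;
```

An AAS value well above the number of CPU cores usually means sessions are queuing for CPU or waiting on some other resource.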
Performance Diagnostics Tools
• OS Commands for checking memory and CPU and IO performance
• V$ views
– V$SESSION, V$PROCESS
– V$SESSION_WAIT
– V$ACTIVE_SESSION_HISTORY
– V$SYSSTAT, V$SESSTAT
– V$WAIT_CHAINS
– V$SGASTAT, V$PGASTAT
– V$SQL, V$SQLTEXT, V$SQL_PLAN
– DBA_HIST_* Views
• STATSPACK, AWR, ASH
• OEM
Not so common symptoms
• Memory related errors like ORA-4030 and ORA-4031
• Transaction related errors like ORA-1555 and ORA-30036
• Contention related errors like ORA-4020, ORA-4021, ORA-54 etc.
• Network related errors like TNS-3136, TNS-12518 etc
Troublemakers
• DB Components prone to contention
– Buffer cache
– Library cache
– Shared Pool
– Locks/Enqueue
– Latches
• DB operations that are CPU and I/O dependent.
– Logical reads
– Sorts
– Parses
– Joins and Aggregates
High CPU
• High CPU utilization may not necessarily indicate that there is a problem.
– It could just mean that the system is being well utilized.
– However, high CPU usage means that any new operations may start to interfere with the current usage. Since there is no room for growth, they can start to exhibit signs of performance degradation.
• You should investigate the reason for the high CPU usage if:
– CPU usage is consistently high when the overall DB load is less (work done/load profile)
– System performance is poor together with High CPU usage
– One or more processes are consistently hogging CPU at the expense of other processes
High CPU
• What to look for when multiple processes are using high CPU
– Sometimes the CPU is almost equally distributed across many processes. The only thing to do here is to see if they do the same task, like executing a particular package or query.
• What to look for when only one process is using high CPU
– The approach to take here depends on the type of process involved. Determine the type of process and then act accordingly as outlined below:
– Which process is hogging the CPU?
– Background process
– Oracle (user) process
– OS Process that is not related to Oracle
– Defunct process
• Common Causes
– Excess Parsing
– High Logical Reads
– Sorts/Hash/Aggregates
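To find which Oracle sessions sit behind the OS processes using the most CPU, a query along these lines can help. It is a sketch only: it ranks sessions by cumulative CPU since logon, not by current usage, so long-lived sessions tend to rank high.

```sql
-- Sessions ranked by cumulative CPU, with the OS process id (spid)
-- so the result can be matched against top/ps output.
select s.sid, s.serial#, s.username, s.program, s.sql_id,
       p.spid     as os_pid,
       st.value   as cpu_centiseconds   -- statistic is in 10s of milliseconds
from   v$session  s
join   v$process  p  on p.addr = s.paddr
join   v$sesstat  st on st.sid = s.sid
join   v$statname sn on sn.statistic# = st.statistic#
where  sn.name = 'CPU used by this session'
order  by st.value desc;
```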
Wait Events
• A named label describing the reason for the wait by a session / process
– Basically a section of code
– There are 41 classes of wait events in 10.2.0.3
– There are 878 wait events in 10.2.0.3
• 209 enqueue events
• 29 latch events
• 41 I/O events
• Important wait events (Troublemakers)
– db file sequential read
– db file scattered read
– direct path read/write
– log file sync
– log file parallel write
– buffer busy waits & read by other session
– SQL*Net message from client
– enq: TX - row lock contention
– latch free
– latch: cache buffers chains
Wait Event Parameters
P1 P2 P3
P1TEXT P2TEXT P3TEXT
P1RAW P2RAW P3RAW
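The meaning of P1, P2 and P3 differs for every wait event. As a sketch, the first query below shows what the parameters mean for one event, and the second shows their current values for sessions in a non-idle wait (the event name is only an example):

```sql
-- What do P1/P2/P3 mean for a given wait event?
select name, parameter1, parameter2, parameter3
from   v$event_name
where  name = 'db file sequential read';

-- Current non-idle waits with their parameter labels and values.
select sid, event, p1text, p1, p2text, p2, p3text, p3, state, seconds_in_wait
from   v$session
where  wait_class <> 'Idle'
order  by seconds_in_wait desc;
```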
Wait Events
Copyright 2006 Kyle Hailey
[Diagram: Waits grouped by area: Disk I/O, Library Cache, Enqueue, Undo, Redo, Buffer Cache, SQL*Net]
[Enqueue types shown: TX - row lock contention, TX - allocate ITL entry, TX - index contention, HW, TM, ST, TS, US, CI, SQ]
Legend: TM – table modification, TX – transaction locks, UL – user lock
Wait Events
• Classes of Wait Events
– Every wait event belongs to a class of wait event. The following list describes each of the wait classes.
• Administrative: Waits resulting from DBA commands that cause users to wait (for example, an index rebuild)
• Application: Waits resulting from user application code (for example, lock waits caused by row level locking or explicit lock commands)
• Cluster: Waits related to Real Application Clusters resources (for example, global cache resources such as 'gc cr block busy')
• Commit: This wait class comprises only one wait event, the wait for redo log write confirmation after a commit (that is, 'log file sync')
• Concurrency: Waits for internal database resources (for example, latches)
• Configuration: Waits caused by inadequate configuration of database or instance resources (for example, undersized log files or shared pool)
Wait Events
• Idle: Waits that signify the session is inactive, waiting for work (for example, 'SQL*Net message from client')
• Network: Waits related to network messaging (for example, 'SQL*Net more data to dblink')
• Other: Waits which should not typically occur on a system (for example, 'wait for EMON to spawn')
• Queue: Contains events that signify delays in obtaining additional data in a pipelined environment. The time spent on these wait events indicates inefficiency or other problems in the pipeline. It affects features such as Oracle Streams, parallel queries, or DBMS_PIPE PL/SQL packages.
• Scheduler: Resource Manager related waits (for example, 'resmgr: become active')
• System I/O: Waits for background process I/O (for example, DBWR wait for 'db file parallel write')
Wait Events
• User I/O: Waits for user I/O (for example, 'db file sequential read')
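The wait classes and the number of events in each can be listed from V$EVENT_NAME. The counts vary by release, so treat this as a quick way to check your own version:

```sql
-- Number of wait events per wait class in this release.
select wait_class, count(*) as events
from   v$event_name
group  by wait_class
order  by events desc;
```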
Wait Events
• Common Wait Events
– buffer busy waits
– control file parallel write
– db file parallel read
– db file parallel write
– db file scattered read
– db file sequential read
– db file single write
– direct path read
– direct path write
– enqueue
– free buffer waits
– latch free
– library cache pin
– library cache lock
– log buffer space
– log file parallel write
– log file sequential read
– log file sync
I/O Performance
• I/O operations are an essential part of processing and often represent a major portion of processing time.
• There are also two basic types of I/O operations: synchronous and asynchronous, which differ in performance.
• Although hardware engineers have been working for the past decade or so to beef up I/O throughput with various offerings, I/O operations remain the slowest activity in a computer system.
• I/O operations are a necessity of every process that reads from or writes to the database.
• Database reads and writes are simple on the surface, but the path between the database and the physical disks can be a convoluted mess of software and hardware from various manufacturers, each with its own limitations.
I/O Performance
• I/O Wait Events
– db file sequential read: initiated by SQL statements (both user and recursive) that perform single-block read operations against indexes, rollback (or undo) segments, tables (when accessed via rowid), control files and data file headers. This wait event normally appears among the top five systemwide wait events.
– db file scattered read: much like db file sequential read, but for multiblock rather than single-block reads. It is initiated by SQL statements (both user and recursive) that perform full scan operations against tables and indexes. Contrary to some teaching, full scans are not always bad; they are good when the SQL statement needs most of the rows in the object.
– direct path read: driven by SQL statements that perform direct read operations from the temporary or regular tablespaces.
– log file parallel write: belongs only to the LGWR process. When it is time to write, the LGWR process writes the redo buffer to the online redo logs by issuing a series of system write calls to the operating system, and waits on this event for the writes to complete.
– log file sync: when a user session completes a transaction, either by a commit or a rollback, it waits on this event until LGWR has written the session's redo to the online redo logs.
– buffer busy waits: occurs when a session wants to access a data block in the buffer cache that is currently in use by some other session.
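To see how the I/O wait events above behave on a given instance, their cumulative counts and average latency can be read from V$SYSTEM_EVENT. This is a sketch: the figures are totals since instance startup, so they show long-run averages, not current behaviour.

```sql
-- Cumulative waits and average wait time (ms) for the main I/O events.
select event,
       total_waits,
       round(time_waited_micro / 1e6)                             as time_waited_s,
       round(time_waited_micro / nullif(total_waits, 0) / 1000, 2) as avg_wait_ms
from   v$system_event
where  event in ('db file sequential read', 'db file scattered read',
                 'direct path read', 'log file parallel write',
                 'log file sync', 'buffer busy waits')
order  by time_waited_micro desc;
```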
Enqueues
• Enqueues are locks that apply to database objects.
• Enqueues are transactional, initiated by the application.
• The Oracle session is waiting to acquire a specific enqueue. The enqueue name and mode is recorded in the P1 parameter. The appropriate action to take depends on the type of enqueue being competed for.
• Up to Oracle9i Database, the enqueue wait event represents all enqueue waits; starting in Oracle Database 10g, all enqueues are broken out and have independent wait events.
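The enqueue name and mode packed into P1 can be decoded with the usual bit arithmetic. A sketch for 10g and above, where enqueue events are named 'enq: ...':

```sql
-- Decode the two-character enqueue name and the requested mode from P1.
select sid,
       event,
       chr(bitand(p1, -16777216) / 16777215) ||
       chr(bitand(p1,  16711680) / 65535)     as enqueue_name,
       bitand(p1, 65535)                      as requested_mode
from   v$session_wait
where  event like 'enq:%';
```

For example, a session waiting on 'enq: TX - row lock contention' should decode to enqueue name TX with the mode it is requesting.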
Enqueues
• What Is an Enqueue Resource?
– An enqueue resource is a database resource that is affected by an enqueue lock. Oracle manages the enqueue resources using an internal array structure that can be seen through the view V$RESOURCE
• What Is an Enqueue Lock?
– An enqueue lock is the lock itself. Oracle manages the enqueue locks in an array separate from the enqueue resources array. This structure can be seen through the view V$ENQUEUE_LOCK
Enqueues
Lock Modes and Descriptions
Mode Description
0 None
1 Null (N)
2 Row-Share (RS), also known as Subshare lock (SS)
3 Row-Exclusive (RX), also known as Subexclusive lock (SX)
4 Share (S)
5 Share Row Exclusive (SRX), also known as Share-Subexclusive lock (SSX)
6 Exclusive (X)
Oracle uses the lock mode to determine if a resource can be shared by multiple concurrent processes.
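The numeric modes above appear in the LMODE and REQUEST columns of V$LOCK. As an illustration, they can be translated into the names from the table like this:

```sql
-- TM/TX locks with held and requested modes shown by name.
select sid, type, id1, id2,
       decode(lmode,   0, 'None', 1, 'Null (N)', 2, 'Row-Share (RS)',
                       3, 'Row-Exclusive (RX)', 4, 'Share (S)',
                       5, 'Share Row Exclusive (SRX)', 6, 'Exclusive (X)') as mode_held,
       decode(request, 0, 'None', 1, 'Null (N)', 2, 'Row-Share (RS)',
                       3, 'Row-Exclusive (RX)', 4, 'Share (S)',
                       5, 'Share Row Exclusive (SRX)', 6, 'Exclusive (X)') as mode_requested,
       ctime, block
from   v$lock
where  type in ('TM', 'TX')
order  by sid, type;
```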
Enqueues
Lock Mode Compatibility Chart
Statement                              | Mode | N   | RS   | RX   | S    | SRX  | X
SELECT                                 | N    | Yes | Yes  | Yes  | Yes  | Yes  | Yes
SELECT … FOR UPDATE                    | RS   | Yes | Yes* | Yes* | Yes* | Yes* | No
lock table in row share mode           | RS   | Yes | Yes  | Yes  | Yes  | Yes  | No
INSERT                                 | RX   | Yes | Yes  | Yes  | No   | No   | No
UPDATE                                 | RX   | Yes | Yes* | Yes* | No   | No   | No
DELETE                                 | RX   | Yes | Yes* | Yes* | No   | No   | No
lock table in row exclusive mode       | RX   | Yes | Yes  | Yes  | No   | No   | No
lock table in share mode               | S    | Yes | Yes  | No   | Yes  | No   | No
lock table in share row exclusive mode | SRX  | Yes | Yes  | No   | No   | No   | No
lock table in exclusive mode           | X    | Yes | No   | No   | No   | No   | No
*Yes means sharing is possible if no conflicting row lock is held by another session.
Enqueues
• Common Causes, Diagnosis, and Actions
– Cumulative requests and waits per enqueue type can be seen through the view V$ENQUEUE_STAT (GV$ENQUEUE_STAT when the instance id is needed, as below)
select * from gv$enqueue_stat where cum_wait_time > 0 order by inst_id, cum_wait_time;
INST_ID EQ TOTAL_REQ# TOTAL_WAIT# SUCC_REQ# FAILED_REQ# CUM_WAIT_TIME
---------- -- ---------- ----------- ---------- ----------- -------------
1 SQ 66551 437 66551 0 498
1 CU 64353 133 64353 0 1616
1 HW 453067 18683 453067 0 11811
1 CF 119748 76 119605 143 37842
1 TX 22687836 9480 22687758 71 672435
1 TC 3620 724 3620 0 679237
1 TM 89822967 91 89817200 5 4056333
Enqueues
• enq: TX - row lock contention (MODE=6)
– This indicates contention for a row-level lock. This wait occurs when a transaction tries to update or delete rows that are currently locked by another transaction.
• enq: TX - allocate ITL entry (MODE=4)
– Execute the following query to see the magnitude of ITL waits in your database:
select owner, object_name, subobject_name, object_type, tablespace_name, value, statistic_name from v$segment_statistics where statistic_name = 'ITL waits' and value > 0 order by value;
• enq: TX - row lock contention (MODE=4)
– Unique Key Enforcement
– Bitmap Index Entry
Enqueues
Solving Enqueue/Locking Problems
Most often an application issue
Hunt down the offending user. However, the offending user is not always the user who holds the lock.
If the user has a legitimate reason to hold the lock, the waiters should back out of their transactions.
-- Blocking TX lock holders (b.block = 1) and the sessions waiting on them
select /*+ ordered */
       a.sid blocker_sid, a.username blocker_username,
       a.serial#, a.logon_time, b.type, b.lmode mode_held,
       b.ctime time_held, c.sid waiter_sid, c.request request_mode, c.ctime time_waited
from   v$lock b, v$enqueue_lock c, v$session a
where  a.sid = b.sid
and    b.id1 = c.id1(+)
and    b.id2 = c.id2(+)
and    c.type(+) = 'TX'
and    b.type = 'TX'
and    b.block = 1
order  by time_held, time_waited;

-- Sessions currently blocked, with their immediate and final blockers
select sid, username, blocking_session, final_blocking_session
from   v$session
where  blocking_session_status = 'VALID';
Enqueues
Solving Enqueue/Locking Problems
• Need SQL and Object/row
– Statspack fails
– V$active_session_history succeeds
• In “real time” can also use
– v$lock
– V$locked_object
– v$session
– dba_blockers
– dba_waiters
– V$wait_chains
– ?/rdbms/admin/utllockt.sql
Enqueues
• 10g and above breaks out all the enqueues into separate wait events
– 208 enqueue waits
– Each is specific to a type of enqueue and has its own wait class:
Wait Event                    | Wait Class
enq: ST - contention          | Configuration
enq: TM - contention          | Application
enq: TW - contention          | Administrative
enq: TX - allocate ITL entry  | Configuration
enq: TX - index contention    | Concurrency
enq: TX - row lock contention | Application
enq: TX - contention          | Application
Top 5 Timed Events                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                   wait   Call
Event                          Waits  Time (s)   (ms)   Time
----------------------------- ------ -------- ------ ------
enq: TX - row lock contention     42      126   3000   96.5
CPU time                                    4           2.8
db file sequential read          165        1      4     .4
control file sequential read     214        0      1     .1
log file switch completion         2        0     40     .1
V$active_session_history
• Who is waiting
– SESSION_ID
– SESSION_SERIAL#
– USER_ID
• On what object
– CURRENT_OBJ#
– CURRENT_FILE#
– CURRENT_BLOCK#
• With what SQL
– SQL_ID
• Who is the blocker
– BLOCKING_SESSION
– BLOCKING_SESSION_STATUS
– BLOCKING_SESSION_SERIAL#
• What is the blocker SQL
– not reliably possible
OEM externalizes all of this
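The ASH columns listed above can be pulled together in one query. This is a sketch, assuming the Diagnostics Pack is licensed; the 30-minute window and the event filter are only examples.

```sql
-- Who waited on row locks, on which object, with what SQL, and who blocked them.
select session_id, session_serial#, user_id,
       sql_id,
       current_obj#, current_file#, current_block#,
       blocking_session, blocking_session_serial#,
       count(*) as samples        -- roughly seconds spent waiting
from   v$active_session_history
where  event = 'enq: TX - row lock contention'
and    sample_time > systimestamp - interval '30' minute
group  by session_id, session_serial#, user_id, sql_id,
          current_obj#, current_file#, current_block#,
          blocking_session, blocking_session_serial#
order  by samples desc;
```

CURRENT_OBJ# can be joined to DBA_OBJECTS.OBJECT_ID to get the object name.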
Latch
• A latch is an internal Oracle mechanism used to protect data structures in the SGA from simultaneous access.
• Atomic hardware instructions like TEST-AND-SET are used to implement latches.
• Latches are more restrictive than locks in that they are always exclusive.
• Latch requests are never queued; a process spins or sleeps until it obtains the latch or times out.
• Latches are important for performance tuning. (Don’t tune latches)
Latch
• What is latch contention?
• I want to get a latch, but someone is already holding it!
– If get was in no-wait mode, return to caller with failure
– If get was in willing-to-wait mode, continue trying:
• So, I will try to get it again immediately! And again! And again!
– This is spinning (busywaiting)
• Still can't get it so I go to sleep for a very long time... ... ...
– 10ms is very long time in latching world
– _max_exponential_sleep
– _max_sleep_holding_latch
Latch
SPIN and SLEEP
• Active wait or spin
– When an attempt to get a latch in a willing-to-wait mode fails, the process will spin and try again
• Sleep
– If the number of attempts reaches the value of SPIN_COUNT parameter, the process sleeps
– Sleeping is more expensive than spinning
• Wakeup Mechanisms
– Timeout
– Latch wait posting
Latch
• Views
– V$LATCHNAME
– V$LATCH
– V$LATCHHOLDER
– V$LATCH_PARENT
– V$LATCH_CHILDREN
– V$LATCH_MISSES
• Key latches impacting performance
– redo allocation
– redo copy
– cache buffers chains
– enqueues
– row cache objects
– library cache
– shared pool
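A quick way to see which latches are actually contended is to rank V$LATCH by sleeps. A sketch; the figures are cumulative since instance startup:

```sql
-- Latches with misses, ranked by sleeps (the expensive outcome of a miss).
select name, gets, misses, spin_gets, sleeps,
       round(misses * 100 / nullif(gets, 0), 2) as miss_pct
from   v$latch
where  misses > 0
order  by sleeps desc;
```

A high miss percentage with few sleeps means spinning is usually enough to get the latch. A high number of sleeps is the case worth investigating.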
Latch
• Resolving latch contention
– Latches cannot be tuned starting with Oracle 9i
– Identify the underlying problem
– Most often it turns out to be an application issue
• Excessive Parsing
• Bad Cursor management (JDBC/ODBC issues)
• Inefficient SQL
• Small Caches ( Too Big is also a problem)
• Configuration / Administration problems.
– Components prone to latch contention
• Library cache
• Buffer cache
• Shared pool
• Enqueue
• Redo log buffer
Latch
• Enqueue
– latch: enqueue hash chains
• Enqueues are memory structures in the SGA, and latches protect them. They are called "Enqueue Hash Chain" latches. There is one parent latch and many child latches. (285270.1)
• To be put in a queue, the session grabs an "enqueue resource", which is the structure that has the details of the lock, and attempts to acquire the latch that protects the hash chain associated with the mode and the rest of the special values. After the latch is acquired, the session places the "enqueue resource" in the chain either as holder or waiter, depending on the circumstances, and releases the latch.
• Shared pool
– Oracle shared pool consists of many structures.
– The prominent ones are the dictionary cache, SQL area, and library cache.
– The shared pool latch protects the shared pool structures. It is taken when allocating/freeing memory heaps. (Hard Parsing)
– Prior to Oracle9i Database, the shared pool memory structures were protected by a solitary shared pool latch.
– Oracle 9i (9.2.0.6) introduces subpools controlled by _KGHDSIDX_COUNT
– Contention for the shared pool and library cache latches is mainly due to
• Intensive hard parsing.
• A child library cache latch having to be held for the duration of the parse.
Latch
• Buffer Cache
– Latches
• latch: cache buffers chains
• latch: cache buffers lru chain
• latch: checkpoint queue latch
– When data blocks are read into the SGA, their buffer headers are placed on linked lists (hash chains) that hang off hash buckets. This memory structure is protected by a number of child cache buffers chains latches (also known as hash latches or CBC latches).
– The default number of hash buckets is 2 * DB_BLOCK_BUFFERS, and the value is adjustable by the _DB_BLOCK_HASH_BUCKETS parameter. (prior to 9i)
– Oracle Database 10g uses a different algorithm to determine the default number of hash buckets.
– Common causes of contention
• Inefficient SQL statements
• Hot blocks
– Almost always an application problem
Latch
• Library Cache
Latch
• The library cache consists of SQL statements, execution plans, parse trees and object handles (tables, procedures, etc.), among other things.
– The structures are protected by the library cache latch.
– Oracle processes acquire the library cache latch when modifying, inspecting, pinning, locking, loading, or executing objects in the library cache structures.
• Library Cache
– Latches
• latch: library cache
• latch: library cache lock
• latch: library cache pin
– Wait Events
• library cache lock
• library cache pin
• library cache load lock
Latch
• Library Cache Locks and Pins (444560.1)
– Locks and Pins are usually in share mode unless modifications are being made
• Object dependency (lock in Null)
• Cursor execution (lock in Null, Pin in share)
• Cursor compilation (Lock exclusive, Pin exclusive)
– Contention when sessions try to
• Load/compile the same SQL
• Compile a package others are running
• Contention for Library Cache Latches
– Statements with High Version Counts
– Excessive Hard Parsing
• Not Sharing SQL (use of Literal Values)
• Shared Pool too small
– Too many invalidations
– Excessive Soft Parsing
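Two quick checks for the causes listed above: the instance-wide parse counters, and the statements with the most child cursors. A sketch; the version-count threshold of 20 is an arbitrary value chosen for illustration.

```sql
-- Hard parses vs total parses vs executions since instance startup.
select name, value
from   v$sysstat
where  name in ('parse count (total)', 'parse count (hard)', 'execute count');

-- Statements with many child cursors (high version count).
select sql_id, version_count, parse_calls, executions,
       substr(sql_text, 1, 60) as sql_text_start
from   v$sqlarea
where  version_count > 20      -- illustrative threshold
order  by version_count desc;
```

A hard parse count that is a large share of the total parse count usually points to SQL that is not being shared, often because of literal values.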
Mutex
• New in 10gR2: a library cache serialization mechanism that replaces latches and takes less memory
• On a 32-bit Linux installation a mutex was 28 bytes in size, while a regular latch structure was 110 bytes.
• Takes fewer instructions: a mutex get is about 30-35 instructions, a latch get is 150-200 instructions
• Less contention than latches, because there can be more mutexes
• Mutexes are stored in each child cursor
• Can be turned off with _kks_use_mutex_pin=false (unsupported, and not available in 11gR2)
• In 11g+ mutexes are used instead of most library cache latches
– Instead of up to 67 library cache latches there are 131072 mutexes!
– Each library cache mutex protects one library cache hash bucket
Mutex
• Mutex Event Waits
– cursor: mutex X
– cursor: mutex S
– cursor: pin S
– cursor: pin X
– cursor: pin S wait on X
– library cache: mutex X
– library cache: mutex S
• Cursor mutexes are used to protect the parent cursor and also with cursor statistic operations.
• Cursor pins are used to pin a cursor in preparation for a related operation on the cursor.
• Library cache mutexes are similar to library cache latch operations in earlier releases
– In all these cases, waits for these resources occur when two (or more) sessions are working with the same cursors simultaneously. When one session takes and holds a resource required by another, the second session waits on one of these events.
Mutex
• Mutex Views
– V$MUTEX_SLEEP
• Shows the wait time, and the number of sleeps for each combination of mutex type and location.
– V$MUTEX_SLEEP_HISTORY
• Shows last individual occurrences of mutex sleeps
• Based on a circular buffer, has most detail
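As an illustration, the mutex type and code location responsible for most sleeps can be read from V$MUTEX_SLEEP:

```sql
-- Mutex sleeps by type and code location, worst first.
select mutex_type, location, sleeps, wait_time
from   v$mutex_sleep
order  by sleeps desc;
```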
Oracle Tools
• AWR
– awrrpt.sql
– awrsqrpt.sql
• ASH Repository (DBA_HIST_% views)
– DBA_HIST_ACTIVE_SESS_HISTORY
Oracle Tools
• V$Views
– V$SESSION
– V$PROCESS
– V$SQL
– V$SESSION_LONGOPS
• Tracing
– 10046 trace (session trace) level 2,4,8,12
– 10053 SQL Optimizer trace
– Systemstate dump
– Shortstack
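A minimal sketch of a 10046 trace on the current session at level 12 (binds and waits). The tracefile identifier is a made-up label that only makes the trace file easier to find.

```sql
-- Tag the trace file name (hypothetical label), then switch tracing on.
alter session set tracefile_identifier = 'perf_demo';
alter session set events '10046 trace name context forever, level 12';

-- ... run the statements to be traced here ...

-- Switch tracing off again.
alter session set events '10046 trace name context off';
```

The resulting trace file can then be summarized with tkprof.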
End of Part 1