presentation - latch internals by andrey s. nikolaev · andrey nikolaev oracle database performance...
TRANSCRIPT
Latch internalsRUOUG Seminar, December 6, 2012
Andrey NikolaevOracle Database Performance ExpertRDTEX, Russia
WHO AM I
• RDTEX, First Line Support Center
• http://andreynikolaev.wordpress.com
• "Latch, mutex and beyond"
• Specialize in Oracle performance tuning
• Over 20 years of Oracle related experience as a research scientist, developer, DBA, performance consultant, lecturer…
• Occasionally presented at conferences
• RUOUG member
RDTEX | Latch internals | RUOUG Seminar
2
EPISODE OF LATCH CONTENTION
• Oracle instance hangs due to heavy "cache buffers chains" latch contention
3
RDTEX | Latch internals | RUOUG Seminar
WHAT WE ALL KNEW ABOUT ORACLE LATCHES?
• Latches spin up to the instance parameter _SPIN_COUNT that defaults to 2000.
• This number can be tuned dynamically.
• Latch waits use increasing exponential backoff.
• Latch waits are very short.
• We should tune application, not the _SPIN_COUNT
RDTEX | Latch internals | RUOUG Seminar
4
CONTEMPORARY LATCHES SINCE 9.2
RDTEX | Latch internals | RUOUG Seminar
5
I was wondering to find that:
• Exclusive latch spins 10 times more.
• The exponential backoff was disappeared more then decade ago in Oracle 9.2.
• All except one latch use wait posting now.
• "Latch free" wait may be infinite.
• Sometimes it is worth to tune the _SPIN_COUNT.
• This parameter is static for most latches.
GREAT SOURCES OF LATCH INFO
“Oracle8i Internal Services for Waits, Latches, Locks, andMemory” by Steve Adams [1], 1999.
– Founded new era of Oracle performance tuning
– However, the Oracle latch technologies were evolved greatly during the decade.
“Systematic Latch Contention Troubleshooting” approach by Tanel Poder [2].
– Latchprofx.sql script revolutionized the latch tuning
“Oracle Core: Essential Internals for DBAs and Developers" by Jonathan Lewis [3], 2011.
RDTEX | Latch internals | RUOUG Seminar
6
REVIEW OF SERIALIZATION MECHANISMS IN ORACLE
• "Latches are simple, low-level serialization mechanisms that coordinatemultiuser access to shared data structures, objects, and files…"
• "A mutual exclusion object (mutex) is a low-level mechanism that prevents an object in memory from aging out or from being corrupted …"
• "Internal locks are higher-level, more complex mechanisms …"
Oracle® Database Concepts 11.2
• KGX Mutexes appeared in latest Oracle versions inside Library Cache
RDTEX | Latch internals | RUOUG Seminar
Access
Acquisition
Spinlock based
Timescale
Lifecycle
Locks
Several Modes
FIFO
No
> Milliseconds
Dynamic
Latches
Types and Modes
SIRO (spin) + FIFO
Yes
Microseconds
Static
Mutexes
Operations
SIRO
Yes
SubMicroseconds
Dynamic7
HOW SPINLOCK WORKS
RDTEX | Latch internals | RUOUG Seminar
8
CPU3:
Free FreeHolding by CPU 1
Spin polls the spinlock location
Get
Miss SleepSpin count ( )∆
Holding time (S)
CPU2: Spin Holding the spinlock
Holding by CPU 2
CPU1: Holding the spinlock
The spinlock location:
Sleep releases the CPU
• Oracle latches:
– Use atomic hardware instruction for Immediate get.
– If missed, process spins by polling latch location during Spin get.
– Number of spin cycles is bounded by spin count.
– In spin get not succeed, the process acquiring the latch sleeps.
…
CLASSIC SPINLOCKS
• Wiki: "… spinlock … waits in a loop repeatedly checking until the lock becomes available …".
• Introduced by Edsger Dijkstra [4] in 1965. Have been thoroughlyinvestigated since that time [5].
• Many sophisticated spinlock realizations were proposed and evaluated (TS, TTS, Delay, MCS, Anderson,...).
• Two general types:
– System spinlocks. Kernel OS threads cannot wait. Major metrics: atomic operations frequency. Shared bus utilization.
– User spinlocks. Oracle latch and mutex. Average lock holding time ~ 10 us. It is more efficient to poll a lock rather than pre-empt the thread doing 1 ms context switch. Metrics: CPU and elapsed times.
RDTEX | Latch internals | RUOUG Seminar
9
THE NEED FOR SPIN
RDTEX | Latch internals | RUOUG Seminar
10
Time to complete CBC latch contention testcase vs. number of threads.
No spinning_spin_count=0
_spin_count=2000
SPINLOCK REALIZATIONS
RDTEX | Latch internals | RUOUG Seminar
Spinlock: Pseudocode: Problems:
TS
pre-11.2 mutex
while(Test_and_Set(lock)); Bus saturation
TTS
Oracle latch
while(lock||Test_and_Set(lock)); Invalidation storms on release (“open door”, “thundering herds”).
Delay
Latch and 11.2 Mutex
Adjustable delay Slower under contention
Anderson, MCS, etc.
Queues. Widely used in Java, Linux kernel …
CPU and memory overhead, preemption issues
11
THE TOOLS
RDTEX | Latch internals | RUOUG Seminar12
DTRACE. SOLARIS DYNAMIC TRACING FRAMEWORK
DTrace allows us to investigate how Oracle latches perform in real time:– Create triggers on any event inside Solaris and function call inside Oracle
provider:module:function:namepid1910:oracle:kslgetl:entry
– Write trigger bodies – actions. May read and change any address location.– Can count the latch spins, trace the latch waits, perform experiments.– Measure times and distributions up to microseconds precision.
RDTEX | Latch internals | RUOUG Seminar
sqlplus /nolog @demo_00
ORADEBUG. INTERNAL ORACLE DEBUGGER
RDTEX | Latch internals | RUOUG Seminar
• oradebug call. Allows us invoke any internal Oracle function manually.
SQL> oradebug peek 200222A0 24
[200222A0, 200222B8) = 00000016 00000001 000001D0 00000007
SQL> oradebug call kslgetl 0x5001A020 1 2 3
Function returned 1
• oradebug peek. Examines contents of any memory location.
• oradebug poke. Modifies memory.
• oradebug watch. Watch a region of memory.
LATCH CONTENTION TESTCASES
RDTEX | Latch internals | RUOUG Seminar
15
create table latch_contention as select * from dual;…for i in 1..1000000loopselect dummy into a from latch_contention;
end loop;
• Exclusive and shared latches behave differently.• Contention for shared "cache buffer chains" latch arises when many sessions scan the same block concurrently and _fastpin_enable=0.
• To induce the contention for exclusive “row cache objects” latch protecting dc_users row cache just add several dynamic RLS policies:
exec dbms_rls.add_policy('system','latch_contention','rls_1', …
exec dbms_rls.add_policy('system','latch_contention','rls_2', …
…
• These dummy policies generate predicates '1=1'.• Many other contention scenarios possible.
sqlplus /nolog @demo_0_1
ROW CACHE LATCH CONTENTION TESTCASE
RDTEX | Latch internals | RUOUG Seminar
16
Elapsed and CPU times vs. number of threads. Oracle Database Appliance (24 SMT).
Latch wait
HOW ORACLE REQUESTS THE LATCH
RDTEX | Latch internals | RUOUG Seminar17
DTRACE REVEALS LATCH INTERFACE ROUTINES
Oracle calls the following functions to acquire the latch:
– kslgetl(laddr, wait, why, where) - get exclusive latch
– kslg2c(l1,l2,trc,why, where) - get two excl. child latches
– kslgetsl (laddr,wait,why,where,mode) - get shared latch
• 11g - ksl_get_shared_latch (…)
– kslg2cs(l1,l2,mode,trc,why, where)) - get two shr. child latches
– kslgpl(laddr,comment,why,where) - get parent and all childs
– kslfre(laddr) - free the latch
Oracle give us possibility to do the same by oradebug call
RDTEX | Latch internals | RUOUG Seminar
sqlplus /nolog @demo_01
Use 32-bit ports
64bit Oracle corrupts the arguments of
oradebug call in 10.2.0.4 and above!
18
“WHERE” AND “WHY” ORACLE GETS THE LATCH
To request the latch, Oracle kernel routines need:
• laddres – address of latch in SGA
• wait – flag for no-wait (0) or wait (1) latch acquisition
• where – integer code for location from where the latch is acquired. Oracle lists possible latch “where” values in x$ksllw.indx.
• why - context of why the latch is acquiring at this “where”. It contains DBA for CBC latches (Tanel Poder), SGA chunk address for shared pool latch, session address, etc… “Why” meaning for some “where” can be found in x$ksllw.ksllwlbl
• mode – requesting state for shared lathes.
– 8 SHARED mode, 16 EXCLUSIVE mode
RDTEX | Latch internals | RUOUG Seminar
19
LIST OF LATCH “WHERE” LOCATIONS
X$KSLLW – List of latch [W]here. Part of v$latch_misses:
– ksllwnam – Kernel function name(“kghupr1”, “ksqdel”, etc...)
– ksllwlbl – Description of “why” context
Join this code on indx to:
– x$ksuprlat.ksulawhr (v$latchholder)– “where” value that was used by latch holder
– x$ksllt.kslltwhr (v$latch) – “where” that was used by last successful latch acquisition
– x$ksupr.ksllawer (v$process) – “where” that was used by process is being waiting for latch
– x$kslwsc.indx (v$latch_misses) – latch sleep statistics by “where”
RDTEX | Latch internals | RUOUG Seminar
20
V$LATCH_MISSES
• Based on X$KSLWSC - No[W]ait and [S]leep [C]ount stats byContext. Documentation is rather obscure.
• The following counters are incremented on each latch sleep:
sleep_count (“where” the latch was held)
wtr_slp_count (“where” the latch get was requested)
• nwfail_count is incremented only for session, which SID is equal to _LATCH_MISS_STAT_SID parameter.
• longhold_count is incremented for shared latches only, when someone held the latch at this “where” for the entire duration of sleep.
RDTEX | Latch internals | RUOUG Seminar
21
LATCH IS HOLDING BY PROCESS, NOT SESSION
•
RDTEX | Latch internals | RUOUG Seminar
Struct ksupr {
…
Struct kslla{
ksllt *ksllalat[14];
}
…}
Process fixed array:
v$process -> x$ksupr
Struct ksllt{
…
}
List of all latches:
v$latch ->x$ksllt
Each process has an array of references to the latches it is holding.
22
KSLLA IS HIDDEN IN V$PROCESS
• The process latching info is the kslla structure embedded in theprocess state object.
• Some kslla fields were externalized to SQL.
• kslla contains array of latches holding by process at each level and information about the state of any current attempt to acquire a latch. Not externalized directly.
• When you query v$latchholder/x$ksuprlat, Oracle scans v$process/x$ksupr in order to find all the processes that are holding the latches.
• Tanel Poder uses high frequency sampling of v$latchholder to systematically troubleshoot latch contention.
RDTEX | Latch internals | RUOUG Seminar
23
X$KSUPR.KSLLA FIELDS INSTRUMENT THE LATCH GET
• ksllalaq – address of latch acquiring. Populated during immediate get (and spin before 11g)
• ksllawat - latch being waited for. This is v$process.latchwait
• ksllawhy – “why” for the latch being waited for
• ksllawere – “where” for the latch being waited for
• ksllalow – bit array of levels of currently holding latches
• ksllaspn - latch this process is spinning on. v$process.latchspin. Not populated since 8.1
• ksllaps% - inter-process post statistics
RDTEX | Latch internals | RUOUG Seminar
24
Generic “latch free” and 27 latch specific “latch free: %” wait events areassociated with sleep only. Spin is invisible to Oracle Wait Interface.
THE LATCH STRUCTURE – KSLLT
RDTEX | Latch internals | RUOUG Seminar
struct ksllt {
<Latch>
“where” and “why”
Level, latch#, class, other attributes
Statistics
Latch wait list header
…
25
LATCH SIZE BY VERSION
*nix 32bit *nix 64bit Windows 32bit
7.3.4 92 - 120
8.0.6 104 - 104
8.1.7 104 144 104
9.0.1 ? 200 160
9.2.0 196 240 200
10.1.0 ? 256 208
10.2.0 - 11.2.0.3 100 160 104
RDTEX | Latch internals | RUOUG Seminar
x$ksmfsv – list of all fixed SGA variables:SELECT DISTINCT ksmfssiz
FROM x$ksmfsv
WHERE ksmfstyp = 'ksllt';
Latch structure was bigger in 10.1 due to additional latch statistics
26
ORACLE LATCH IS NOT JUST A SINGLE MEMORY LOCATION
• Before 11g. Value of first latch byte (word for shared latches) was used to determine latch state:
0x00 – latch is free
0xFF – exclusive latch is busy. Was 0x01 in Oracle 7
0x01,0x02,… - shared latch holding by 1,2, … processes simultaneously
0x20000000 | pid - shared latch holding exclusively
• In 11g first latch word shows the pid of the latch holder
0x00 – latch is free
0x12 – Oracle process with pid 18 holds the exclusive latch
RDTEX | Latch internals | RUOUG Seminar
27
BUSY ORACLE 11G LATCH IN MEMORY
• 11.2 Exclusive latch (32-bit platform):
• 11.2 Shared latch. EXCLUSIVE mode:
RDTEX | Latch internals | RUOUG Seminar
SQL> oradebug peek 200222A0 24
[200222A0, 200222B8) = 00000016 00000001 000001D0 00000007
pid^ gets latch# level#
SQL> oradebug peek 0x&laddr 24
[2000C814, 2000C82C) = 2000001F 0000007C 0000007E 00000006
pid^ gets latch# level#
cont. sqlplus /nolog @demo_0128
LATCH FIXED TABLES
• Before Oracle 11g all the latches were externalized to SQL in v$latch (x$ksllt).
• In 11g parent (fixed SGA) latches are externalized in v$latch_parent (x$kslltr_parent).
• In 11g child (allocated on startup) latches are externalized in v$latch_children (x$kslltr_children) .
• Oracle 11g v$latch (x$kslltr) merges these data. Beware: select from x$kslltr acquires all parent latches and their children.
• Only part of latch structure is visible in x$ksllt… fixed tables:
– kslltwhr, kslltwhy – “where” and “why” latch was acquired last time.
– kslltwgt, … - statistics.
– kslltlvl, … - attributes.
– kslltwtt – latch wait time.
RDTEX | Latch internals | RUOUG Seminar
29
CUMULATIVE V$LATCH STATISTIC COUNTERS
RDTEX | Latch internals | RUOUG Seminar
Statistics: x$ksllt When and how it increments:
GETS kslltwgt “++” after wait mode latch get
MISSES kslltwff “++” after wait get if it was missed
SLEEPS kslltwsl “+number_of_sleeps” during get
SPIN_GETS ksllthst0 “++” if get was missed but not slept
WAIT_TIME kslltwtt “+wait_time” after latch get
IMMEDIATE_GETS kslltngt “++” after nowait mode latch get. Is not incremented on miss.
IMMEDIATE_MISSES kslltnfa “++” if nowait mode get was missed.
30
MANY STATISTICS WERE LOST IN 10.2
In 10.2 Oracle made obsolete many useful latch statistics.
Latch size was decreased and the following fields always contain 0 now in v$latch:
– v$latch.waiters_woken
– v$latch.waits_holding_latch
– v$latch.sleep1…sleep10
– and many undocumented statistics.
This was mentioned in 10.2 Reference.
RDTEX | Latch internals | RUOUG Seminar
31
X$KSLLD - LATCH DESCRIPTORS
• Flags and attributes describing each latch are stored in kslldt array of kslld structures.
• [K]ernel [S]ervice [L]atch [L]ock [D]escriptor.
• Some, but not all, of kslld fields are externalized via x$kslld andv$latchname.
• Located in the PGA.
RDTEX | Latch internals | RUOUG Seminar
32
LATCH ATTRIBUTES
• Each latch have at least the following attributes in kslldt :
– Name Latch name as appeared in V$ views
– SHR. Is the latch Exclusive or Shared?
– PAR. Is the latch Solitary or Parent for the family of child latches?
– G2C. Can two child latches be simultaneously requested in wait modeusing kslg2c()
– LNG. Is wait posting used for this latch? Obsolete since Oracle 9.2.
– UFS. The latch is Ultrafast. It will not increment miss statistics when STATISTICS_LEVEL=BASIC. 10.2 and above
– Level. 0-14. Used to prevent latch deadlocks
– Class. 0-7. Spin and wait class assigned to the latch. 9.2 and above.
RDTEX | Latch internals | RUOUG Seminar
33
LATCH ATTRIBUTES BY ORACLE VERSION
RDTEX | Latch internals | RUOUG Seminar
11.2.0.3 553 154 72 - 6 93
Oracle Number of latches PAR G2C LNG UFS SHARED
7.3.4.0 53 14 2 3 - -
8.0.6.3 80 21 7 3 - 3
8.1.7.4 152 48 19 4 - 9
9.2.0.8 242 79 37 - - 19
10.2.0.2 385 114 55 - 4 47
10.2.0.3 388 117 58 - 4 48
10.2.0.4 394 117 59 - 4 50
11.1.0.6 496 145 67 - 6 81
11.1.0.7 502 145 67 - 6 83
11.2.0.1 535 149 70 - 6 86
11.2.0.2 551 153 72 - 6 91
34
LATCH LEVELS
• Every latch has a level in the range 0-14 to avoid deadlocks.
• Process can request a latch X in wait mode after obtaining latch Y, if andonly if level X > level Y
• At the same level. Process can request the second G2C child X in wait mode after obtaining child Y, if and only if the child number of X < child number of Y
• If these rules are broken, the Oracle process raises:
ORA-600[504] - trying to obtain another latch at incompatible level. Note 28104.1.
ORA-600[525] - trying to obtain another latch at the same level. Note 138888.1.
ORA-600[526] - latch child order violation. Note 138889.1.
RDTEX | Latch internals | RUOUG Seminar
35
PARENT AND CHILD LATCHES
• Latch can be a parent, child, or solitary latch
• Both parent and child latches share the same latch name.
• Parent latches are static objects in fixed SGA. See x$ksmfsv and ksms.s
• Child latches are allocated during instance startup
• The parent latch can be gotten independently
• But may act as a master latch when acquired in special mode in kslgpl() during deadlock detection or select from v$latch/x$kslltr and some x$'s:
RDTEX | Latch internals | RUOUG Seminar
(latch info) wait_event=0 bits=10
holding 5000a5b8 Parent+children enqueue hash chains level=4
Location from where latch is held: ksqcmi: kslgpl:
Context saved from call: 0
state=busy
36
LATCH TREES
“Rising level” rule leads to “trees” of processes waiting for and holding the latches:
Direct SGA access program output for 9.2.0.6 instance with too smallshared pool.
RDTEX | Latch internals | RUOUG Seminar
ospid: 28067 sid: 1677 pid: 61
holding: 3800729f0 'shared pool' (156) level=7 child=1 whr=1602 kghupr1
waiter: ospid: 129 sid: 72 pid: 45
holding: a154b7120 'library cache' (157) level=5 child=17 kglupc: child
waiter: ospid: 18255 sid: 65 pid: 930
waiter: ospid: 6690 sid: 554 pid: 1654
waiter: ospid: 4685 sid: 879 pid: 1034
…
waiter: ospid: 29749 sid: 180 pid: 155
holding: a154b7db8 'library cache' (157) level=5 child=4 kglupc: child
waiter: ospid: 13104 sid: 281 pid: 220
waiter: ospid: 24089 sid: 565 pid: 636
waiter: ospid: 25002 sid: 621 pid: 1481
waiter: ospid: 16930 sid: 1046 pid: 783
…
37
LATCH CLASSES
• Classes with different spin and wait policies.
• Up to 8 classes. By default all the latches belong to class 0
• Exception – ‘process allocation latch’ belongs to class 2
• Latch can be assigned to class by its number. For example :
– _LATCH_CLASS_6='100 12 1000 2000 3000 4000'
– _LATCH_CLASSES='4:6'
• Will be discussed later.
RDTEX | Latch internals | RUOUG Seminar
SQL> select * from x$ksllclass;
INDX SPIN YIELD WAITTIME SLEEP0 SLEEP1 SLEEP2 SLEEP3 SLEEP4 ...
---- ------ ----- -------- ------ ------ ------ ------ ------
0 20000 0 1 8000 8000 8000 8000 8000
…
6 100 12 1000 2000 3000 4000 4000 4000
38
LATCH ACQUISITION IN NO-WAIT MODE
• kslgetl(laddr,0,why,where) – Exclusive latch get
• kslgetsl(laddr,0,why,where,mode) – Shared latch get
• One attempt to get latch, no spin and sleep
– sskgslgf() -> uses atomic XCHG/LDSTUB instructions for exclusive latches
– skgslocas() -> uses atomic CMPXCHG (CAS) for shared latches
• IMMEDIATE_... statistics are updated
• Used when many candidate latches are available, to bypass thelatch level rule or as part of CBC chain examination
RDTEX | Latch internals | RUOUG Seminar
sqlplus /nolog @demo_03.sql39
LATCH WAITS
RDTEX | Latch internals | RUOUG Seminar40
WAITING FOR THE LATCH
RDTEX | Latch internals | RUOUG Seminar
S G A
CPU 1 CPU 2
Process A
Process B waits(spins and sleeps)
Process A holds a latch
Latch
Process B
41
LATCH ACQUISITION IN WAIT MODE. OFFICIAL VERSION
• "… If a required latch is busy, the process requesting it spins (counts up to a large number (_LATCH_SPIN_COUNT)), tries again and if still not available, spins again.
• This loop is repeated up to _SPIN_COUNT maximum number of times.
• If after that entire loop, the latch is still not available, the process must yield the CPU and go to sleep.
• A sleep corresponds to a wait for the latch free wait event.
• Initially, a process sleeps for 0.01 s. This time is doubled in every subsequent sleep up to a maximum determined by the parameter _MAX_EXPONENTIAL_SLEEP (0.2s or 2s)…"
RDTEX | Latch internals | RUOUG Seminar
Still did not found any version, which used this.
May be it was <= Oracle 6
42
LATCH ACQUISITION IN WAIT MODEANOTHER OFFICIAL VERSION. WAS USED IN 7.3-8.1
Latch gets with wait (kslgetl(laddress,1,…))
– One fast get, no spin (sskgslgf())
– Spin get: check the latch upto _SPIN_COUNT times
– Sleep on latch free event with exponential backoff
– Repeat
RDTEX | Latch internals | RUOUG Seminar
43
8I LATCH SPIN CODE FLOW USING DTRACE
RDTEX | Latch internals | RUOUG Seminar
• … Event 10046 trace:
• WAIT #0: nam='latch free' ela= 0 p1=536893688 p2=29 p3=0
• WAIT #0: nam='latch free' ela= 0 p1=536893688 p2=29 p3=1
• WAIT #0: nam='latch free' ela= 0 p1=536893688 p2=29 p3=2
kslgetl(0x200058F8,1,2,3) - KSL GET exclusive Latch# 29
kslges(0x200058F8, ...) - wait get of exclusive latch
skgsltst(0x200058F8) ... call repeated 2000 times = SPIN_COUNT
pollsys(...,timeout=10 ms,...) - Sleep 1
skgsltst(0x200058F8) ... call repeated 2000 times
pollsys(...,timeout=10 ms,...) - Sleep 2
skgsltst(0x200058F8) ... call repeated 2000 times
pollsys(...,timeout=10 ms,...) - Sleep 3
skgsltst(0x200058F8) ... call repeated 2000 times
pollsys(...,timeout=30 ms,...) - Sleep 4 …
sqlplus /nolog @demo_04_8.sql 29
44
POTENTIAL PROBLEMS WITH EXPONENTIAL BACKOFF
• 0.01-0.01-0.01-0.03-0.03-0.07-0.07-0.15-0.23-0.39-0.39-0.71-0.71-1.35-1.35-2.0-2.0-2.0-2.0...sec
RDTEX | Latch internals | RUOUG Seminar
12]2/)1[( −= +waitN
timeout
45
• Most waits were for nothing – latch already was free.
• Latch utilization can not be high.
• Unnecessary spins provoke CPU thrashing.
9.2-11G EXCLUSIVE LATCH SPIN FLOW USING DTRACE
sqlplus /nolog @demo_05.sql 45
RDTEX | Latch internals | RUOUG Seminar
kslgetl(0x50006318, 1)
-> sskgslgf(0x50006318)= 0 -immediate latch get
-> kslges(0x50006318, ...) -wait latch get
-> skgslsgts(...,0x50006318, ...) -spin latch get
->sskgslspin(0x50006318)
... - repeated 20000 cycles = 10*_SPIN_COUNT!
-> kskthbwt(0x0)
-> kslwlmod() - set up Wait List
-> sskgslgf(0x50006318)= 0 -immediate latch get
-> skgpwwait -sleep latch get
semop(11, {17,-1,0}, 1)
Semop – infinite wait until posted!
46
RELIABLE LATCH WAITS
• Hidden latch wait revolution that we missed. Latch relies on wait posting.
• In Oracle 9.2-11.2, all the latches in default class 0 use latch wait posting. Process waits for the latch without any timeout.
• Exclusive latch spins 20000 cycles by default.
• Only first read is atomic, further reads poll the CPU cache. Latch is TTS spinlock.
• Latches assigned to non-default class wait until timeout.
• Version 9.0 was transient and used mix of backoff and wait posting. This is why _LATCH_WAIT_POSTING parameter disappeared in 9i.
• _LATCH_CLASSES and _LATCH_CLASS_X parametersdetermine the latch wait and spin.
RDTEX | Latch internals | RUOUG Seminar
47
IMAGINE THE TIME MICROSCOPE
RDTEX | Latch internals | RUOUG Seminar
Reality: One million zoomed time:
1 us 1 sec
1 ms 17 min
1 sec 11.5 days
Light speed (300000 km/sec) Sonic speed (300 m/sec)
CPU tick (2 GHz) 0.0005 sec
Avg “Cache buffers chains” latch holding time ~ 1-2 sec
Max spin time for shared latch (~2*2000*5 ticks) ~10 sec
Avg “Library Cache” latch holding time (10-20us) ~ 10-20 sec
Max spin time for exclusive latch (~20000*5 ticks) ~50 sec
Interval between CBC latch gets ~ 10 sec
OS context switch time (10us-1ms) 10sec-17min
48
LATCH WAITS UNDER TIME MICROSCOPE
Old 8i latch algorithm:Spin for latch during 5 seconds.
Go sleep for 2 hours 46 min in hope that congestion will dissolve.Spin again during 5 seconds.
Sleep again for 2 hours 46 min.…Spin again for 5 seconds.
Sleep again for 23 days (2 sec) …
New 9.2 latch algorithm:Spin for shared latch during 10 seconds (50 for exclusive latch)
If spin not succeeded, go to sleep until post.
According to v$event_histogram the majority of latch waits now take less then 17 min (1ms)
RDTEX | Latch internals | RUOUG Seminar
49
_SPIN_COUNT IS EFFECTIVELY STATIC FOR EXCLUSIVE LATCHES
• Oracle process spins for exclusive latch up to x$ksllclass[0].spincycles. This number depends on _SPIN_COUNT and _LATCH_CLASS_0 parameters.
• If _SPIN_COUNT was not set during instance startup – process will spin up to 20000 cycles. The default value for _SPIN_COUNT is only 2000.
• If instance started with _SPIN_COUNT parameter, Oracle set x$ksllclass.spin to _SPIN_COUNT if not otherwise specified by _LATCH_CLASS_0 parameter.
• The _SPIN_COUNT parameter can be changed dynamically, but this change will not have any effect for exclusive latch.
RDTEX | Latch internals | RUOUG Seminar
50
TIMED LATCH WAIT POSTING
• If wakeup post is lost in OS, waiters will sleep infinitely. Common problem in earlier 2.6.9 Linux kernel.
• This can lead to instance hang – process will never be woken up.
• _ENABLE_RELIABLE_LATCH_WAITS = FALSE changes semop()system call to semtimedop() with 0.3 sec timeout.
• Introduced in 9.2.0.6 for Bug 3632400 LMS may hang in kslges() waiting for a post. Increased robustness to OS posting problem.
• Generic. AIX ports always use timed latch wait posting
• My testcases did not revealed any measurable performance overhead of timed latch wait posting in Solaris and Linux.
RDTEX | Latch internals | RUOUG Seminar
51
FINE GRAIN LATCH TUNING
• Latch can be assigned to one of eight classes having different spin and wait policies. Standard class 0 latch use wait posting.
• _latch_class_X = “Spin Yield Waittime Sleep0 Sleep1 … Sleep7"
• Nonstandard class latch loops upto “Spin” cycles, then yields CPU. This is repeated “Yield” times. Then the process sleeps for “SleepX”microseconds using pollsys() (not semtimedop()) system call.
– If “Yield” !=0 repeat “Yield” times:
Loop up to “Spins” cycles
Yield CPU using yield() (or sched_yield())
– Sleep for “SleepX” usecs
– Then spin again …
RDTEX | Latch internals | RUOUG Seminar
52
TRACES OF NONDEFAULT CLASS LATCH
RDTEX | Latch internals | RUOUG Seminar
kslges(0x50006434, ...) - wait get of exclusive latch
sskgslspin(0x50006434) ... repeated 100 times.
yield() - Yield 1
sskgslspin(0x50006434) ... repeated 100 times.
yield() - Yield 2
sskgslspin(0x50006434) ... repeated 100 times.
pollsys(...,timeout=10 ms,...) - Sleep 1. End of spin and yield loop
…
_latch_class_6="100 2 3 10000 20000 30000 40000 50000 60000 70000 80000"
… Event 10046 trace:
WAIT #0: nam='latch free' ela= 10886 address=… number=46 tries=1 ...
WAIT #0: nam='latch free' ela= 19922 address=… number=46 tries=2 ...
WAIT #0: nam='latch free' ela= 30482 address=… number=46 tries=3 ...
WAIT #0: nam='latch free' ela= 39937 address=… number=46 tries=4 ...
...
sqlplus /nolog @demo_04.sql 46 53
PERFORMANCE OF LATCH CLASSES
"Row cache objects" latch contention testcase on ODA (24 SMT)
RDTEX | Latch internals | RUOUG Seminar
54
IS THE _SPIN_COUNT REALLY STATIC?
• Well-known Guy Harrison experiments [5] with Oracle 11g demonstrated that dynamic _SPIN_COUNT tuning influenced latch waits, CPU and throughput.
• The experiments with dynamic change of _SPIN_COUNT used “cache buffers chain” latch contention
• This latch became shared in Oracle 9.2
• Exclusive latch uses static SPIN value from x$ksllclass(_LATCH_CLASS_X). By default – 20000.
• Shared latch spin can be tuned by _SPIN_COUNT value. By default – 2000. However, this is not trivial.
RDTEX | Latch internals | RUOUG Seminar
55
SHARED LATCH BEHAVE LIKE ENQUEUE
• Oracle realization of “Read-Write” spinlocks. Appeared in 8.0.
• S and X shared latch modes are incompatible.
• If latch is held in S mode, X mode waiter blocks all further requests
• This blocking achieved by special 0x40000000 bit in the latch value. This bit is a flag that some incompatible latch acquisition is in progress.
• Queued processes will be woken up one by one on latch release.
• X mode latch gets effectively serialize the shared latch.
• See my blog for description of experiments.
RDTEX | Latch internals | RUOUG Seminar
56
TRACE OF SHARED LATCH WAIT
Oracle process spins for shared latch twice. If the first spin upto _SPIN_COUNT was unsuccessful, the process puts itself into the latch wait list and then spins again:
RDTEX | Latch internals | RUOUG Seminar
kslgetsl(0x50009BAC,1,2,3,16) - KSL GET Shared Latch in X mode
kslgess(0x50009BAC, ...) - shared latch wait get
kslskgs(0x50009BAC, ...)
sskgslspin(0x50009BAC) ... repeated 2000 times.
kslwlmod(...)
kslskgs(0x50009BAC, ...)
sskgslspin(0x50009BAC) ... repeated 2000 times.
skgpwwait(...) - sleep until posted
semop(27,...)
sqlplus /nolog @demo_05.sql 10257
SHARED LATCH ACQUISITION
• Shared latch spin in Oracle 9.2-11g is governed by _SPIN_COUNT value and can be dynamically tuned.
• X mode shared latch get spins by default up to 4000 cycles.
• S mode does not spin at all (or spins in unknown way).
RDTEX | Latch internals | RUOUG Seminar
S mode get X mode get
Held in S mode Compatible 2*_SPIN_COUNT
Held in X mode 0 2*_SPIN_COUNT
Blocking mode 0 2*_SPIN_COUNT
58
LATCH RELEASE
• Free the latch – kslfre(laddr).
• Oracle process releases the latch nonatomically.
• Then it sets up memory_barrier – perform atomic operation on address individual to each process.
• This requires less bus invalidation and ensures propagation of latch release to other local caches.
• Not fair policy. Spinners on the local CPU board have the preference.
• Then process posts the first process in the list of waiters.
RDTEX | Latch internals | RUOUG Seminar
59
LATCH CONTENTION
RDTEX | Latch internals | RUOUG Seminar60
LATCH CONTENTION
• Occurs when a latch is required by several processes at thesame time
• Mostly due to bugs or abnormal application behavior.
• As utilization increases, latch waits and spins also increase. Performance degradation can be extremely sharp
• To diagnose the latch contention we need to investigate the latch statistics [8]
RDTEX | Latch internals | RUOUG Seminar
61
DIFFERENTIAL (POINT IN TIME) LATCH STATISTICS
Latch requests arrival rate
Immediate gets efficiency
Latch sleeps ratio
Latch wait time per second
Latch spin efficiencymisses
getsspin
∆
∆=
_σ
RDTEX | Latch internals | RUOUG Seminar
misses
sleepsK
∆
∆=
time
gets
∆
∆=λ
gets
misses
∆
∆=ρ
time
timewaitW
∆
∆=
_
Should be calculated for each child latch separately.V$LATCH averaging distorts statistics
62
SAMPLED AVERAGES
By sampling of x$ksupr we can estimate:
– Length of latch wait queue :
SELECT COUNT(1) FROM X$KSUPR WHERE KSLLAWAT=LADDRESS;
– Number of spinning processes in 10g :
SELECT COUNT(1) FROM X$KSUPR WHERE KSLLALAQ=LADDRESS;
Sampling of v$lathholder estimates latch utilization
Using DTrace we can measure
– Latch acquisition and holding times
– Actual spin count
RDTEX | Latch internals | RUOUG Seminar
wL
sN
DERIVED LATCH STATISTICS
RDTEX | Latch internals | RUOUG Seminar
Latch utilization: (PASTA)
Average holding time:
Length of latch wait list:
Recurrent sleeps ratio:
Latch acquisition time:
time
timeholdinglatchU
∆
∆=≈
__ρ
"Re_*"100
"_""__"
questsGet
TimeSnapMissGetPctS
∗==
λ
ρ
WL =
κ
κσ 1−+
)(1
WNT saq += −λ
64
LATCH STATISTICS EXAMPLE
RDTEX | Latch internals | RUOUG Seminar
Latch statistics for:
0x380007358 "session allocation"
Requests rate: lambda= 1350 Hz
Miss /get: rho= .022
Sampled Utilization: U= .013
Slps /Miss: kappa= .28
Wait_time/sec: W= .021
Sampled queue length Lw= .017
Spin_gets/miss: sigma= .72
Sampled spinning procs:Ns= .013
Secondary sleeps ratio = .002
Avg holding time= 16.3 usec
sleeping time = 15.9 usec
acquisition time = 25.8 usec
Latch acquisition time distribution
measured by DTrace: ------------------------------------ Distribution Distribution Distribution Distribution --------------------------------
2048 | 2048 | 2048 | 2048 |
4096 |@@@@@@4096 |@@@@@@4096 |@@@@@@4096 |@@@@@@
8192 |@@@@@@@@8192 |@@@@@@@@8192 |@@@@@@@@8192 |@@@@@@@@
16384 |@@@@@@@@@@@@@@@@@@@@@@@16384 |@@@@@@@@@@@@@@@@@@@@@@@16384 |@@@@@@@@@@@@@@@@@@@@@@@16384 |@@@@@@@@@@@@@@@@@@@@@@@
32768 |@@@32768 |@@@32768 |@@@32768 |@@@
65536 |65536 |65536 |65536 |
nsnsnsns
Average acquisition time=21 usec
sqlplus /nolog @demo_06.sql 65
LATCH CONTENTION DIAGNOSTICS IN 9.2-11G
• Should be suspected if latch wait events are observed in “Top 5 Timed Events” AWR section
• Look for the latch with highest W
• Symptoms of contention for the latch:
– W > 1 sec/sec
– Utilization > 10%
– Acquisition (or sleeping) time sufficiently greater then holding time
• Latchprofx.sql script invented by Tanel Poder [4] greatly simplifies diagnostics.
• The script and v$latch_misses reveal “where” the contention arise
RDTEX | Latch internals | RUOUG Seminar
ρ
66
LATCH LEVELS AND LATCH CONTENTION
• Process holding latch can only acquire latch with higher level.
• Contention for a high-level latch frequently exacerbates contention for lower-level latches
• “Library cache” latch contention without many cursor versions likely is a consequence of moderate shared pool latch contention.
• Shared pool resizes, clearances and ORA-4031 dumps can induce “library cache” latch storms.
• Use latch_tree.sql script to confirm this scenario
RDTEX | Latch internals | RUOUG Seminar
67
BEWARE OF CERTAIN V$/X$
• Frequent scans of some x$ tables induce latch contention:
– x$ktcxb (v$transaction) - “Transaction allocation” latch
– x$ktadm (v$lock, dba_jobs_running) – "DML lock allocation"
– x$kqlfxpl (v$sql_plan) – "library cache" latch in 10g
– x$ksmsp – "shared pool" latch
– x$kslltr (v$latch) – all parent latches in 11g.
– …
RDTEX | Latch internals | RUOUG Seminar
68
TREATING THE LATCH CONTENTION
• The right method: tune the application and reduce the latch demand. Tune the SQL, bind variables, schema, etc…
• Many brilliant books exist on this topic. Out of scope for this presentation.
• Install latest Oracle PSU and OS recommended patches. Several latch related examples:
– Oracle bug 7627743 “Higher latch waits in 11g”. Processes in 11.1 always spun upto the maximum spin count value.
– Linux 2.6.9 kernel bug 149933. "Missing wakeup in ipc/sem". Fixed in Red Hat EL 4 Update 4
– HP-UX 11.31. Should install scheduler cumulative patchPHKL_38397 or later
RDTEX | Latch internals | RUOUG Seminar
69
WHEN WE NEED TO TUNE SPIN COUNT:
• The “right” method to tune the application may be too expensive and require complete application rewrite.
• Nowadays the CPU power is cheaper. We may already have enough free CPU resources.
• Oracle does not explicitly forbid _SPIN_COUNT tuning. However, change of undocumented parameter should be discussed with Support. For example, MOS recommends _SPIN_COUNT=5000 for 10g Streams.
• Processes spin for exclusive latch spin upto 20000 cycles, for shared latch upto 4000 cycles and infinitely for pre 11.2.0.2 mutex. Tuning may find more optimal values for your application.
• In CPU thrashing, decrease of the number of spins can reduce CPU consumption.
RDTEX | Latch internals | RUOUG Seminar
70
TUNING THE SPIN COUNT EFFICIENTLY
• Beware of side effects. You should have enough free CPU.
• First, diagnose the root cause of latch contention. Measure the latch statistics, install relevant patches.
• Spin count tuning will only be effective if latch holding time Sis in its normal microseconds range
• The number of spinning processes should remain less then number of CPUs. Beware CPU thrashing. Analyze AWR and latch statistics before and after each change.
• Spin count adjustment is different for exclusive and shared latches.
• Nonstandard latch classes may decrease CPU consumption.
• It is a common myth that CPU time will raise infinitely while weincrease spin count.
RDTEX | Latch internals | RUOUG Seminar
71
SPIN COUNT ADJUSTMENT
Shared latches:
• Dynamically by _SPIN_COUNT parameter.
• Good starting point is the multiple of default 2000 value.
• Setting _SPIN_COUNT parameter in initialization file, should be accompanied by _LATCH_CLASS_0="20000". Otherwise spin for exclusive latches will be greatly affected by next instance restart.
Exclusive latches:
• Statically by _LATCH_CLASS_0 parameter. Needs the instance restart.
• Good starting point is the multiple of default 20000 value.
• It may be preferable to increase the number of "yields" for class 0 latches.
Spin count tuning probes the tail of latch holding time distribution. See my blog for related mathematics.
RDTEX | Latch internals | RUOUG Seminar
72
SPIN SCALING
• My previous work [8] proposed approximate scaling rules to estimate effect of SPIN_COUNT tuning depending on:
K = "sleeps ratio"=“Avg Slps/Miss”
• If spin is inefficient then:
Doubling the spin count will reduce "sleeps ratio" by 2 and doubles the spin CPU consumption
• If latch holding time distribution has exponential tail, you have enough CPU resources and spin is efficient then:
Doubling the spin count will square the “sleeps ratio” coefficient. This will multiply the spin CPU consumption by
If the spin is already efficient, it is worth to increase the spin count.
RDTEX | Latch internals | RUOUG Seminar
)1( K+
1.0≥K
1.0<<K
If "sleeps ratio" for exclusive latch is 10% than increase of
spin count to 40000 may results in
10 times decrease of "latch free" wait events, and only 10%
increase of CPU consumption.
Your mileage may vary
73
CACHE BUFFER CHAINS LATCH CONTENTION TESTCASE
RDTEX | Latch internals | RUOUG Seminar
Latch wait and CPU times vs. _SPIN_COUNT
74
LONG LATCH HOLDING TIME: CPU THRASHING
• Latch contention can cause CPU starvation. Processes contending for a latch, also contend for CPU.
• Once CPU starves, OS runqueue length raise and loadaverageexceeds the number of CPUs. Some OS may shrink the time quantum. Latch holders will not receive enough time to release the latch.
• Due to priority decay, latch acquirers may preempt latch holders. This leads to priority inversion. The throughput falls.
• Transition to this metastable state is more likely if workload of your system approaches ~100% CPU
• Due to preemption, latch holding time S will raise to the CPU scheduling scale [7].
RDTEX | Latch internals | RUOUG Seminar
75
CPU THRASHING
• If CPUs are running at 100% utilization for other reasons, the latchcontention can be just a symptom of this CPU starvation itself. All wait times will be inflated by wait for CPU. However, latch holding time should be in normal range.
• CPU thrashing is unlikely on Solaris due to latch preemption control.
• To prevent the priority inversion use:
– FX priority class on Solaris
– HPUX_SCHED_NOAGE on HPUX
– SCHED_BATCH policy on Linux
RDTEX | Latch internals | RUOUG Seminar
76
LATCH SMP SCALABILITY:
• If latch utilization is in single CPU environment.
• Then in N CPU server latch utilization will be . This can be problematic:
– If single CPU system held latches only for 1% of time.
– 48 CPU server with the same per-CPU load will hold latches for 50%.
– 128 CPU Cores server will suffer huge latch (and mutex) contention.
• This is also known as "Software lockout". It may substantially affect contemporary multi-core servers.
• NUMA should overcome this intrinsic spinlock scalability restrictions.
RDTEX | Latch internals | RUOUG Seminar
1ρ
1ρρ NN ≈
77
BIBLIOGRAPHY
1. Steve Adams. Oracle8i Internal Services for Waits, Latches, Locks, and Memory. 1999.
2. Tanel Poder blog, http://blog.tanelpoder.com
3. Jonathan Lewis, Oracle Core: Essential Internals for DBAs and Developers. 2011
4. Edsger Dijkstra, Solution of a Problem in Concurrent Programming Control CACM. 1965.
5. M. Herlihy, N. Shavit, The Art of Multiprocessor Programming. 2008.
6. B. Sinharoy, et al. , Improving Software MP Efficiency for SharedMemory Systems. Proc. of 29th Hawaii Conference on System Sciences.1996.
7. Guy Harrison blog, Using _spin_count to reduce latch contention in 11g. http://guyharrison.squarespace.com/ 2008.
8. R. Johnson, et al. A new look at the roles of spinning and blocking. Proc. of 5th Workshop on Data Management on New Hardware. 2009
9. A. Nikolaev, Exploring Oracle RDBMS latches using Solaris DTrace. Proc. of MEDIAS 2011 Conf., http://arxiv.org/abs/1111.0594v1 2011
RDTEX | Latch internals | RUOUG Seminar
78
Q/A?
• Questions?
• Comments?
RDTEX | Latch internals | RUOUG Seminar
79
ACKNOWLEDGEMENTS
• Thanks to RDTEX Technical Support Centre Director S.P. Misiura for years of encouragement and support of my investigations.
• Thanks to my colleagues for discussions.
• Thanks to all our customers for participating in latch troubleshooting.
RDTEX | Latch internals | RUOUG Seminar
80
Andrey Nikolaev
http://andreynikolaev.wordpress.com
RDTEX, Moscow, Russia
www.rdtex.ru
THANK YOU!
RDTEX | Latch internals | RUOUG Seminar
IN MEMORY OF MY FRIEND FOR 30 YEARS:
RDTEX | Latch internals | RUOUG Seminar
Andrey KriushinChairman of Russian Oracle User Group (RuOUG)
Died of heart attack 02.08.2011
82