
Latch internals

RUOUG Seminar, December 6, 2012

Andrey Nikolaev

Oracle Database Performance Expert

RDTEX, Russia

WHO AM I

• [email protected]

• RDTEX, First Line Support Center

• http://andreynikolaev.wordpress.com

• "Latch, mutex and beyond"

• Specialize in Oracle performance tuning

• Over 20 years of Oracle related experience as a research scientist, developer, DBA, performance consultant, lecturer…

• Occasionally presented at conferences

• RUOUG member

RDTEX | Latch internals | RUOUG Seminar


EPISODE OF LATCH CONTENTION

• Oracle instance hangs due to heavy "cache buffers chains" latch contention



WHAT DID WE ALL KNOW ABOUT ORACLE LATCHES?

• Latches spin up to the instance parameter _SPIN_COUNT that defaults to 2000.

• This number can be tuned dynamically.

• Latch waits use increasing exponential backoff.

• Latch waits are very short.

• We should tune application, not the _SPIN_COUNT



CONTEMPORARY LATCHES SINCE 9.2



I was surprised to find that:

• An exclusive latch spins 10 times more.

• Exponential backoff disappeared more than a decade ago, in Oracle 9.2.

• All latches except one use wait posting now.

• A "latch free" wait may be infinite.

• Sometimes it is worth tuning _SPIN_COUNT.

• This parameter is static for most latches.

GREAT SOURCES OF LATCH INFO

“Oracle8i Internal Services for Waits, Latches, Locks, and Memory” by Steve Adams [1], 1999.

– Founded a new era of Oracle performance tuning

– However, Oracle latch technologies have evolved greatly during the decade.

“Systematic Latch Contention Troubleshooting” approach by Tanel Poder [2].

– Latchprofx.sql script revolutionized the latch tuning

“Oracle Core: Essential Internals for DBAs and Developers" by Jonathan Lewis [3], 2011.



REVIEW OF SERIALIZATION MECHANISMS IN ORACLE

• "Latches are simple, low-level serialization mechanisms that coordinate multiuser access to shared data structures, objects, and files…"

• "A mutual exclusion object (mutex) is a low-level mechanism that prevents an object in memory from aging out or from being corrupted …"

• "Internal locks are higher-level, more complex mechanisms …"

Oracle® Database Concepts 11.2

• KGX Mutexes appeared in latest Oracle versions inside Library Cache


           Access            Acquisition          Spinlock based   Timescale         Lifecycle

Locks      Several Modes     FIFO                 No               > Milliseconds    Dynamic

Latches    Types and Modes   SIRO (spin) + FIFO   Yes              Microseconds      Static

Mutexes    Operations        SIRO                 Yes              SubMicroseconds   Dynamic

HOW SPINLOCK WORKS



[Diagram: timeline of the spinlock location (free, held by CPU 1, free, held by CPU 2). CPU1 holds the spinlock for the holding time (S). CPU2 spins, polling the spinlock location, then gets and holds the spinlock. CPU3 misses the get, spins up to the spin count (∆), then sleeps. Sleep releases the CPU.]

• Oracle latches:

– Use atomic hardware instruction for Immediate get.

– If missed, process spins by polling latch location during Spin get.

– Number of spin cycles is bounded by spin count.

– If the spin get does not succeed, the process acquiring the latch sleeps.

…
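The three stages listed above (immediate get, bounded spin get, sleep) can be sketched as a toy model. This is only an illustration under stated assumptions: `ToyLatch`, `polls_until_free` and `get_latch` are hypothetical names, a plain attribute stands in for the atomic hardware instruction, and the holder is simulated by a countdown rather than by a real second process.

```python
class ToyLatch:
    """Toy latch: `holder` is None when free (hypothetical model, not Oracle code)."""

    def __init__(self, holder=None, polls_until_free=None):
        self.holder = holder
        # Number of polls after which the simulated holder releases the latch.
        # None means the holder never releases it.
        self.polls_until_free = polls_until_free

    def poll(self):
        """Read the latch location; return True if it looks free."""
        if self.holder is not None and self.polls_until_free is not None:
            self.polls_until_free -= 1
            if self.polls_until_free <= 0:
                self.holder = None
        return self.holder is None

    def try_get(self, pid):
        """Stand-in for the atomic instruction used by the immediate get."""
        if self.holder is None:
            self.holder = pid
            return True
        return False


def get_latch(latch, pid, spin_count=2000):
    """Return which stage of the acquisition ended the attempt."""
    if latch.try_get(pid):                # immediate get
        return "immediate get"
    for _ in range(spin_count):           # spin get, bounded by the spin count
        if latch.poll() and latch.try_get(pid):
            return "spin get"
    return "sleep"                        # spin failed: the process would sleep


print(get_latch(ToyLatch(), pid=1))
print(get_latch(ToyLatch(holder=2, polls_until_free=500), pid=1))
print(get_latch(ToyLatch(holder=2, polls_until_free=5000), pid=1))
```

With the default spin count of 2000, a holder that releases after 500 polls is caught by the spin, while one that needs 5000 polls forces the sleep.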

CLASSIC SPINLOCKS

• Wiki: "… spinlock … waits in a loop repeatedly checking until the lock becomes available …".

• Introduced by Edsger Dijkstra [4] in 1965. Have been thoroughly investigated since that time [5].

• Many sophisticated spinlock realizations were proposed and evaluated (TS, TTS, Delay, MCS, Anderson,...).

• Two general types:

– System spinlocks. Kernel OS threads cannot wait. Major metrics: atomic operations frequency. Shared bus utilization.

– User spinlocks. Oracle latch and mutex. Average lock holding time ~ 10 us. It is more efficient to poll a lock rather than pre-empt the thread doing 1 ms context switch. Metrics: CPU and elapsed times.



THE NEED FOR SPIN



[Figure: time to complete the CBC latch contention testcase vs. number of threads, comparing no spinning (_spin_count=0) with _spin_count=2000.]

SPINLOCK REALIZATIONS


Spinlock (used by)              Pseudocode                          Problems

TS (pre-11.2 mutex)             while(Test_and_Set(lock));          Bus saturation

TTS (Oracle latch)              while(lock||Test_and_Set(lock));    Invalidation storms on release (“open door”, “thundering herds”)

Delay (latch and 11.2 mutex)    Adjustable delay                    Slower under contention

Anderson, MCS, etc.             Queues. Widely used in Java,        CPU and memory overhead, preemption issues
                                Linux kernel …
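The difference between the TS and TTS rows can be illustrated by counting how many atomic operations one spinner issues while waiting. The sketch below is a single-spinner simulation, not real concurrent code: `spin_ts`, `spin_tts` and `lock_free_at` (the poll number at which the lock becomes free) are hypothetical names chosen for the example.

```python
def spin_ts(lock_free_at):
    """Test-and-Set: every poll is an atomic read-modify-write on the bus."""
    atomic_ops = 0
    poll = 0
    while True:
        poll += 1
        atomic_ops += 1                 # Test_and_Set(lock)
        if poll >= lock_free_at:        # the lock was free: we got it
            return atomic_ops


def spin_tts(lock_free_at):
    """Test-and-Test-and-Set: plain reads until the lock looks free."""
    atomic_ops = 0
    plain_reads = 0
    poll = 0
    while True:
        poll += 1
        plain_reads += 1                # read `lock` (served from the local cache)
        if poll >= lock_free_at:
            atomic_ops += 1             # one Test_and_Set once the lock looks free
            return atomic_ops, plain_reads


print(spin_ts(1000))     # atomic operations issued by the TS spinner
print(spin_tts(1000))    # (atomic operations, plain reads) for the TTS spinner
```

In this model the TTS spinner needs one atomic operation where the TS spinner needs one per poll, which is the bus saturation problem named in the table. The model ignores other spinners, so it does not show the invalidation storm on release.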


THE TOOLS


DTRACE. SOLARIS DYNAMIC TRACING FRAMEWORK

DTrace allows us to investigate how Oracle latches perform in real time:

– Create triggers on any event inside Solaris and any function call inside Oracle. Probes are named provider:module:function:name, for example pid1910:oracle:kslgetl:entry

– Write trigger bodies (actions). They may read and change any address location.

– Can count the latch spins, trace the latch waits, perform experiments.

– Measure times and distributions with up to microsecond precision.


sqlplus /nolog @demo_00

ORADEBUG. INTERNAL ORACLE DEBUGGER


• oradebug call. Allows us to invoke any internal Oracle function manually.

SQL> oradebug call kslgetl 0x5001A020 1 2 3

Function returned 1

• oradebug peek. Examines contents of any memory location.

SQL> oradebug peek 200222A0 24

[200222A0, 200222B8) = 00000016 00000001 000001D0 00000007

• oradebug poke. Modifies memory.

• oradebug watch. Watches a region of memory.

LATCH CONTENTION TESTCASES



create table latch_contention as select * from dual;
…
for i in 1..1000000
loop
  select dummy into a from latch_contention;
end loop;

• Exclusive and shared latches behave differently.

• Contention for the shared "cache buffers chains" latch arises when many sessions scan the same block concurrently and _fastpin_enable=0.

• To induce contention for the exclusive “row cache objects” latch protecting the dc_users row cache, just add several dynamic RLS policies:

exec dbms_rls.add_policy('system','latch_contention','rls_1', …

exec dbms_rls.add_policy('system','latch_contention','rls_2', …

…

• These dummy policies generate predicates '1=1'.

• Many other contention scenarios are possible.

sqlplus /nolog @demo_0_1

ROW CACHE LATCH CONTENTION TESTCASE



[Figure: elapsed and CPU times vs. number of threads, with the latch wait component marked. Oracle Database Appliance (24 SMT).]

HOW ORACLE REQUESTS THE LATCH


DTRACE REVEALS LATCH INTERFACE ROUTINES

Oracle calls the following functions to acquire the latch:

– kslgetl(laddr, wait, why, where) - get exclusive latch

– kslg2c(l1,l2,trc,why, where) - get two excl. child latches

– kslgetsl (laddr,wait,why,where,mode) - get shared latch

• 11g - ksl_get_shared_latch (…)

– kslg2cs(l1,l2,mode,trc,why,where) - get two shr. child latches

– kslgpl(laddr,comment,why,where) - get parent and all children

– kslfre(laddr) - free the latch

Oracle gives us the possibility to do the same with oradebug call


sqlplus /nolog @demo_01

Use 32-bit ports: 64-bit Oracle corrupts the arguments of oradebug call in 10.2.0.4 and above!


“WHERE” AND “WHY” ORACLE GETS THE LATCH

To request the latch, Oracle kernel routines need:

• laddr – address of the latch in the SGA

• wait – flag for no-wait (0) or wait (1) latch acquisition

• where – integer code for the location from where the latch is acquired. Oracle lists possible latch “where” values in x$ksllw.indx.

• why - context of why the latch is being acquired at this “where”. It contains DBA for CBC latches (Tanel Poder), SGA chunk address for shared pool latch, session address, etc… The meaning of “why” for some “where” values can be found in x$ksllw.ksllwlbl

• mode – requested state for shared latches.

– 8 SHARED mode, 16 EXCLUSIVE mode



LIST OF LATCH “WHERE” LOCATIONS

X$KSLLW – List of latch [W]here. Part of v$latch_misses:

– ksllwnam – Kernel function name(“kghupr1”, “ksqdel”, etc...)

– ksllwlbl – Description of “why” context

Join this code on indx to:

– x$ksuprlat.ksulawhr (v$latchholder) – “where” value that was used by the latch holder

– x$ksllt.kslltwhr (v$latch) – “where” that was used by the last successful latch acquisition

– x$ksupr.ksllawer (v$process) – “where” that was used by a process that is waiting for the latch

– x$kslwsc.indx (v$latch_misses) – latch sleep statistics by “where”



V$LATCH_MISSES

• Based on X$KSLWSC - No[W]ait and [S]leep [C]ount stats by Context. Documentation is rather obscure.

• The following counters are incremented on each latch sleep:

sleep_count (“where” the latch was held)

wtr_slp_count (“where” the latch get was requested)

• nwfail_count is incremented only for the session whose SID is equal to the _LATCH_MISS_STAT_SID parameter.

• longhold_count is incremented for shared latches only, when someone held the latch at this “where” for the entire duration of sleep.



LATCH IS HELD BY PROCESS, NOT SESSION

• Each process has an array of references to the latches it is holding.

Process fixed array: v$process -> x$ksupr

struct ksupr {
  …
  struct kslla {
    ksllt *ksllalat[14];
  }
  …
}

List of all latches: v$latch -> x$ksllt

struct ksllt {
  …
}


KSLLA IS HIDDEN IN V$PROCESS

• The process latching info is the kslla structure embedded in the process state object.

• Some kslla fields were externalized to SQL.

• kslla contains an array of the latches held by the process at each level and information about the state of any current attempt to acquire a latch. Not externalized directly.

• When you query v$latchholder/x$ksuprlat, Oracle scans v$process/x$ksupr in order to find all the processes that are holding the latches.

• Tanel Poder uses high frequency sampling of v$latchholder to systematically troubleshoot latch contention.



X$KSUPR.KSLLA FIELDS INSTRUMENT THE LATCH GET

• ksllalaq – address of the latch being acquired. Populated during immediate get (and spin before 11g)

• ksllawat - latch being waited for. This is v$process.latchwait

• ksllawhy – “why” for the latch being waited for

• ksllawer – “where” for the latch being waited for

• ksllalow – bit array of levels of currently held latches

• ksllaspn - latch this process is spinning on. v$process.latchspin. Not populated since 8.1

• ksllaps% - inter-process post statistics



Generic “latch free” and 27 latch specific “latch free: %” wait events are associated with sleep only. Spin is invisible to Oracle Wait Interface.

THE LATCH STRUCTURE – KSLLT


struct ksllt {

<Latch>

“where” and “why”

Level, latch#, class, other attributes

Statistics

Latch wait list header

…


LATCH SIZE BY VERSION

Oracle version       *nix 32bit   *nix 64bit   Windows 32bit

7.3.4                92           -            120

8.0.6                104          -            104

8.1.7                104          144          104

9.0.1                ?            200          160

9.2.0                196          240          200

10.1.0               ?            256          208

10.2.0 - 11.2.0.3    100          160          104

x$ksmfsv – list of all fixed SGA variables:

SELECT DISTINCT ksmfssiz
FROM x$ksmfsv
WHERE ksmfstyp = 'ksllt';

Latch structure was bigger in 10.1 due to additional latch statistics


ORACLE LATCH IS NOT JUST A SINGLE MEMORY LOCATION

• Before 11g, the value of the first latch byte (word for shared latches) was used to determine the latch state:

0x00 – latch is free

0xFF – exclusive latch is busy. Was 0x01 in Oracle 7

0x01,0x02,… - shared latch held by 1,2, … processes simultaneously

0x20000000 | pid - shared latch held exclusively

• In 11g first latch word shows the pid of the latch holder

0x00 – latch is free

0x12 – Oracle process with pid 18 holds the exclusive latch



BUSY ORACLE 11G LATCH IN MEMORY

• 11.2 Exclusive latch (32-bit platform):

SQL> oradebug peek 200222A0 24

[200222A0, 200222B8) = 00000016 00000001 000001D0 00000007

                       pid^     gets     latch#   level#

• 11.2 Shared latch. EXCLUSIVE mode:

SQL> oradebug peek 0x&laddr 24

[2000C814, 2000C82C) = 2000001F 0000007C 0000007E 00000006

                       pid^     gets     latch#   level#

cont. sqlplus /nolog @demo_01
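The first word of a shared latch, as dumped by oradebug peek, can be decoded with a few bit operations. The sketch below uses only the values quoted in this presentation: 0x00 for a free latch, a small count for S mode holders, 0x20000000 | pid for X mode, and the 0x40000000 blocking bit described in the shared latch slides. The function name and the returned tuple are hypothetical, and treating the S mode count as valid for 11g is an assumption, since the slides document it for earlier versions.

```python
EXCLUSIVE_FLAG = 0x20000000   # shared latch held in X mode (value from the slides)
BLOCKING_FLAG = 0x40000000    # incompatible acquisition in progress (from the slides)


def decode_shared_latch_word(word):
    """Return (state, detail, blocking) for the first word of a shared latch.

    detail is the holder pid in X mode, the number of holders in S mode,
    and None when the latch is free.
    """
    blocking = bool(word & BLOCKING_FLAG)
    value = word & ~BLOCKING_FLAG
    if value == 0:
        return ("free", None, blocking)
    if value & EXCLUSIVE_FLAG:
        return ("held in X mode", value & ~EXCLUSIVE_FLAG, blocking)
    return ("held in S mode", value, blocking)


# The word peeked from the 11.2 shared latch above:
print(decode_shared_latch_word(0x2000001F))
```

For the dump shown above, the word 2000001F decodes to X mode held by Oracle pid 31 (0x1F).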

LATCH FIXED TABLES

• Before Oracle 11g all the latches were externalized to SQL in v$latch (x$ksllt).

• In 11g parent (fixed SGA) latches are externalized in v$latch_parent (x$kslltr_parent).

• In 11g child (allocated on startup) latches are externalized in v$latch_children (x$kslltr_children) .

• Oracle 11g v$latch (x$kslltr) merges these data. Beware: select from x$kslltr acquires all parent latches and their children.

• Only part of latch structure is visible in x$ksllt… fixed tables:

– kslltwhr, kslltwhy – “where” and “why” latch was acquired last time.

– kslltwgt, … - statistics.

– kslltlvl, … - attributes.

– kslltwtt – latch wait time.



CUMULATIVE V$LATCH STATISTIC COUNTERS


Statistics          x$ksllt      When and how it increments

GETS                kslltwgt     “++” after wait mode latch get

MISSES              kslltwff     “++” after wait get if it was missed

SLEEPS              kslltwsl     “+number_of_sleeps” during get

SPIN_GETS           ksllthst0    “++” if get was missed but not slept

WAIT_TIME           kslltwtt     “+wait_time” after latch get

IMMEDIATE_GETS      kslltngt     “++” after nowait mode latch get. Is not incremented on miss.

IMMEDIATE_MISSES    kslltnfa     “++” if nowait mode get was missed.


MANY STATISTICS WERE LOST IN 10.2

In 10.2 Oracle made obsolete many useful latch statistics.

Latch size was decreased and the following fields always contain 0 now in v$latch:

– v$latch.waiters_woken

– v$latch.waits_holding_latch

– v$latch.sleep1…sleep10

– and many undocumented statistics.

This was mentioned in 10.2 Reference.



X$KSLLD - LATCH DESCRIPTORS

• Flags and attributes describing each latch are stored in kslldt array of kslld structures.

• [K]ernel [S]ervice [L]atch [L]ock [D]escriptor.

• Some, but not all, of kslld fields are externalized via x$kslld and v$latchname.

• Located in the PGA.



LATCH ATTRIBUTES

• Each latch has at least the following attributes in kslldt:

– Name. Latch name as it appears in V$ views

– SHR. Is the latch Exclusive or Shared?

– PAR. Is the latch Solitary or Parent for the family of child latches?

– G2C. Can two child latches be simultaneously requested in wait mode using kslg2c()

– LNG. Is wait posting used for this latch? Obsolete since Oracle 9.2.

– UFS. The latch is Ultrafast. It will not increment miss statistics when STATISTICS_LEVEL=BASIC. 10.2 and above

– Level. 0-14. Used to prevent latch deadlocks

– Class. 0-7. Spin and wait class assigned to the latch. 9.2 and above.



LATCH ATTRIBUTES BY ORACLE VERSION


Oracle      Number of latches   PAR   G2C   LNG   UFS   SHARED

7.3.4.0     53                  14    2     3     -     -

8.0.6.3     80                  21    7     3     -     3

8.1.7.4     152                 48    19    4     -     9

9.2.0.8     242                 79    37    -     -     19

10.2.0.2    385                 114   55    -     4     47

10.2.0.3    388                 117   58    -     4     48

10.2.0.4    394                 117   59    -     4     50

11.1.0.6    496                 145   67    -     6     81

11.1.0.7    502                 145   67    -     6     83

11.2.0.1    535                 149   70    -     6     86

11.2.0.2    551                 153   72    -     6     91

11.2.0.3    553                 154   72    -     6     93


LATCH LEVELS

• Every latch has a level in the range 0-14 to avoid deadlocks.

• A process can request latch X in wait mode after obtaining latch Y if and only if level X > level Y

• At the same level, a process can request the second G2C child X in wait mode after obtaining child Y if and only if the child number of X < the child number of Y

• If these rules are broken, the Oracle process raises:

ORA-600[504] - trying to obtain another latch at incompatible level. Note 28104.1.

ORA-600[525] - trying to obtain another latch at the same level. Note 138888.1.

ORA-600[526] - latch child order violation. Note 138889.1.
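The level rules above amount to a small ordering check. The following is a minimal sketch, assuming a hypothetical `Latch` record and a hypothetical `check_wait_get` helper. The mapping of each broken rule to an ORA-600 code follows the short descriptions on this slide and is an interpretation, not Oracle's actual logic.

```python
from collections import namedtuple

# Hypothetical record: name, level (0-14), child number, G2C attribute.
Latch = namedtuple("Latch", "name level child g2c")


def check_wait_get(held, wanted):
    """Return None if `wanted` may be requested in wait mode, else an error tag."""
    for y in held:
        if wanted.level > y.level:
            continue                                  # rising level: allowed
        if wanted.level < y.level:
            return "ORA-600[504]"                     # incompatible level
        # Same level: only a second G2C child of the same latch family is allowed.
        if not (wanted.g2c and y.g2c and wanted.name == y.name):
            return "ORA-600[525]"                     # another latch at the same level
        if not wanted.child < y.child:
            return "ORA-600[526]"                     # child order violation
    return None


lc = Latch("library cache", 5, 17, False)
sp = Latch("shared pool", 7, 1, False)
print(check_wait_get([lc], sp))   # level 7 requested while holding level 5
print(check_wait_get([sp], lc))   # level 5 requested while holding level 7
```

The two latches in the usage lines take their names and levels from the latch tree example later in the deck. The G2C flags are illustrative only.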



PARENT AND CHILD LATCHES

• Latch can be a parent, child, or solitary latch

• Both parent and child latches share the same latch name.

• Parent latches are static objects in fixed SGA. See x$ksmfsv and ksms.s

• Child latches are allocated during instance startup

• The parent latch can be gotten independently

• But may act as a master latch when acquired in special mode in kslgpl() during deadlock detection or select from v$latch/x$kslltr and some x$'s:


(latch info) wait_event=0 bits=10

holding 5000a5b8 Parent+children enqueue hash chains level=4

Location from where latch is held: ksqcmi: kslgpl:

Context saved from call: 0

state=busy


LATCH TREES

“Rising level” rule leads to “trees” of processes waiting for and holding the latches:

Direct SGA access program output for 9.2.0.6 instance with too small shared pool.


ospid: 28067 sid: 1677 pid: 61

holding: 3800729f0 'shared pool' (156) level=7 child=1 whr=1602 kghupr1

waiter: ospid: 129 sid: 72 pid: 45

holding: a154b7120 'library cache' (157) level=5 child=17 kglupc: child




…


holding: a154b7db8 'library cache' (157) level=5 child=4 kglupc: child





…


LATCH CLASSES

• Classes with different spin and wait policies.

• Up to 8 classes. By default all the latches belong to class 0

• Exception – ‘process allocation latch’ belongs to class 2

• Latch can be assigned to class by its number. For example :

– _LATCH_CLASS_6='100 12 1000 2000 3000 4000'

– _LATCH_CLASSES='4:6'

• Will be discussed later.


SQL> select * from x$ksllclass;

INDX SPIN YIELD WAITTIME SLEEP0 SLEEP1 SLEEP2 SLEEP3 SLEEP4 ...

---- ------ ----- -------- ------ ------ ------ ------ ------

0 20000 0 1 8000 8000 8000 8000 8000

…

6 100 12 1000 2000 3000 4000 4000 4000
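The relation between the _LATCH_CLASS_6 string and the x$ksllclass row above can be sketched as a small parser. `parse_latch_class` is a hypothetical helper. The padding rule (repeat the last sleep value) is inferred from the row shown for class 6 and is an assumption, as is the total of eight sleep slots (Sleep0 … Sleep7) beyond the five visible columns.

```python
def parse_latch_class(value, n_sleeps=8):
    """Split a "_latch_class_X" string into spin, yield, waittime and sleeps."""
    fields = [int(x) for x in value.split()]
    if len(fields) < 4:
        # Defaults for omitted fields are not modelled in this sketch.
        raise ValueError("expected at least: Spin Yield Waittime Sleep0")
    spin, yield_count, waittime = fields[:3]
    sleeps = fields[3:3 + n_sleeps]
    # Assumption: missing SleepN values repeat the last one given,
    # as the x$ksllclass output for class 6 suggests.
    sleeps += [sleeps[-1]] * (n_sleeps - len(sleeps))
    return {"spin": spin, "yield": yield_count,
            "waittime": waittime, "sleeps": sleeps}


row = parse_latch_class("100 12 1000 2000 3000 4000")
print(row["spin"], row["yield"], row["waittime"], row["sleeps"][:5])
```

The printed values correspond to the SPIN, YIELD, WAITTIME and SLEEP0 to SLEEP4 columns of row 6 in the listing above.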


LATCH ACQUISITION IN NO-WAIT MODE

• kslgetl(laddr,0,why,where) – Exclusive latch get

• kslgetsl(laddr,0,why,where,mode) – Shared latch get

• One attempt to get latch, no spin and sleep

– sskgslgf() -> uses atomic XCHG/LDSTUB instructions for exclusive latches

– skgslocas() -> uses atomic CMPXCHG (CAS) for shared latches

• IMMEDIATE_... statistics are updated

• Used when many candidate latches are available, to bypass the latch level rule or as part of CBC chain examination


sqlplus /nolog @demo_03.sql

LATCH WAITS


WAITING FOR THE LATCH


[Diagram: Process A on CPU 1 holds a latch in the SGA. Process B on CPU 2 waits for the latch (spins and sleeps).]


LATCH ACQUISITION IN WAIT MODE. OFFICIAL VERSION

• "… If a required latch is busy, the process requesting it spins (counts up to a large number (_LATCH_SPIN_COUNT)), tries again and if still not available, spins again.

• This loop is repeated up to _SPIN_COUNT maximum number of times.

• If after that entire loop, the latch is still not available, the process must yield the CPU and go to sleep.

• A sleep corresponds to a wait for the latch free wait event.

• Initially, a process sleeps for 0.01 s. This time is doubled in every subsequent sleep up to a maximum determined by the parameter _MAX_EXPONENTIAL_SLEEP (0.2s or 2s)…"


I have not yet found any version that used this.

Maybe it was Oracle 6 or earlier.


LATCH ACQUISITION IN WAIT MODE. ANOTHER OFFICIAL VERSION. WAS USED IN 7.3-8.1

Latch gets with wait (kslgetl(laddress,1,…))

– One fast get, no spin (sskgslgf())

– Spin get: check the latch up to _SPIN_COUNT times

– Sleep on latch free event with exponential backoff

– Repeat



8I LATCH SPIN CODE FLOW USING DTRACE


• … Event 10046 trace:

• WAIT #0: nam='latch free' ela= 0 p1=536893688 p2=29 p3=0



kslgetl(0x200058F8,1,2,3) - KSL GET exclusive Latch# 29

kslges(0x200058F8, ...) - wait get of exclusive latch

skgsltst(0x200058F8) ... call repeated 2000 times = SPIN_COUNT

pollsys(...,timeout=10 ms,...) - Sleep 1

skgsltst(0x200058F8) ... call repeated 2000 times





pollsys(...,timeout=30 ms,...) - Sleep 4 …

sqlplus /nolog @demo_04_8.sql 29


POTENTIAL PROBLEMS WITH EXPONENTIAL BACKOFF

• 0.01-0.01-0.01-0.03-0.03-0.07-0.07-0.15-0.23-0.39-0.39-0.71-0.71-1.35-1.35-2.0-2.0-2.0-2.0...sec


$timeout = 2^{[(N_{wait}+1)/2]} - 1$


• Most waits were for nothing – latch already was free.

• Latch utilization cannot be high.

• Unnecessary spins provoke CPU thrashing.

9.2-11G EXCLUSIVE LATCH SPIN FLOW USING DTRACE

sqlplus /nolog @demo_05.sql 45


kslgetl(0x50006318, 1)

-> sskgslgf(0x50006318)= 0 -immediate latch get

-> kslges(0x50006318, ...) -wait latch get

-> skgslsgts(...,0x50006318, ...) -spin latch get

->sskgslspin(0x50006318)

... - repeated 20000 cycles = 10*_SPIN_COUNT!

-> kskthbwt(0x0)

-> kslwlmod() - set up Wait List

-> sskgslgf(0x50006318)= 0 -immediate latch get

-> skgpwwait -sleep latch get

semop(11, {17,-1,0}, 1)

Semop – infinite wait until posted!


RELIABLE LATCH WAITS

• Hidden latch wait revolution that we missed. Latch relies on wait posting.

• In Oracle 9.2-11.2, all the latches in default class 0 use latch wait posting. Process waits for the latch without any timeout.

• Exclusive latch spins 20000 cycles by default.

• Only first read is atomic, further reads poll the CPU cache. Latch is TTS spinlock.

• Latches assigned to non-default class wait until timeout.

• Version 9.0 was transitional and used a mix of backoff and wait posting. This is why the _LATCH_WAIT_POSTING parameter disappeared in 9i.

• _LATCH_CLASSES and _LATCH_CLASS_X parameters determine the latch wait and spin.



IMAGINE THE TIME MICROSCOPE


Reality:                                               Zoomed one million times:

1 us                                                   1 sec

1 ms                                                   17 min

1 sec                                                  11.5 days

Light speed (300000 km/sec)                            Sonic speed (300 m/sec)

CPU tick (2 GHz)                                       0.0005 sec

Avg “Cache buffers chains” latch holding time          ~ 1-2 sec

Max spin time for shared latch (~2*2000*5 ticks)       ~ 10 sec

Avg “Library Cache” latch holding time (10-20us)       ~ 10-20 sec

Max spin time for exclusive latch (~20000*5 ticks)     ~ 50 sec

Interval between CBC latch gets                        ~ 10 sec

OS context switch time (10us-1ms)                      10 sec - 17 min
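The conversions in this table are plain arithmetic: at a zoom factor of one million, N microseconds of real time read as N seconds. A minimal sketch, where `zoomed` is a hypothetical helper that takes real time in whole microseconds:

```python
def zoomed(real_us):
    """Map real time in microseconds to (days, hours, minutes, seconds)
    as seen through a one-million-times time microscope."""
    seconds = real_us                 # 1 us of real time looks like 1 sec
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return days, hours, minutes, secs


for label, us in [("1 us", 1), ("1 ms", 1_000), ("10 ms", 10_000),
                  ("1 sec", 1_000_000), ("2 sec", 2_000_000)]:
    print(label, "->", zoomed(us))
```

The 1 ms row comes out as 16 min 40 sec (rounded to 17 min in the table) and 1 sec as 11 days 13 hours 46 min 40 sec (about 11.5 days).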


LATCH WAITS UNDER TIME MICROSCOPE

Old 8i latch algorithm:

Spin for the latch for 5 seconds.

Go to sleep for 2 hours 46 min in the hope that congestion will dissolve.

Spin again for 5 seconds.

Sleep again for 2 hours 46 min.

…

Spin again for 5 seconds.

Sleep again for 23 days (2 sec) …

New 9.2 latch algorithm:

Spin for a shared latch for 10 seconds (50 for an exclusive latch).

If the spin does not succeed, go to sleep until posted.

According to v$event_histogram, the majority of latch waits now take less than 17 min (1 ms).



_SPIN_COUNT IS EFFECTIVELY STATIC FOR EXCLUSIVE LATCHES

• Oracle process spins for exclusive latch up to x$ksllclass[0].spincycles. This number depends on _SPIN_COUNT and _LATCH_CLASS_0 parameters.

• If _SPIN_COUNT was not set during instance startup – process will spin up to 20000 cycles. The default value for _SPIN_COUNT is only 2000.

• If the instance is started with the _SPIN_COUNT parameter set, Oracle sets x$ksllclass.spin to _SPIN_COUNT unless otherwise specified by the _LATCH_CLASS_0 parameter.

• The _SPIN_COUNT parameter can be changed dynamically, but this change will not have any effect for exclusive latch.



TIMED LATCH WAIT POSTING

• If the wakeup post is lost in the OS, waiters will sleep infinitely. This was a common problem in early 2.6.9 Linux kernels.

• This can lead to an instance hang: the process will never be woken up.

• _ENABLE_RELIABLE_LATCH_WAITS = FALSE changes the semop() system call to semtimedop() with a 0.3 sec timeout.

• Introduced in 9.2.0.6 for Bug 3632400 "LMS may hang in kslges() waiting for a post". Increased robustness to OS posting problems.

• Generic. AIX ports always use timed latch wait posting

• My testcases did not reveal any measurable performance overhead of timed latch wait posting in Solaris and Linux.



FINE GRAIN LATCH TUNING

• A latch can be assigned to one of eight classes with different spin and wait policies. Standard class 0 latches use wait posting.

• _latch_class_X = "Spin Yield Waittime Sleep0 Sleep1 … Sleep7"

• A nonstandard class latch loops up to “Spin” cycles, then yields the CPU. This is repeated “Yield” times. Then the process sleeps for “SleepX” microseconds using the pollsys() (not semtimedop()) system call.

– If “Yield” != 0, repeat “Yield” times:

Loop up to “Spin” cycles

Yield CPU using yield() (or sched_yield())

– Sleep for “SleepX” usecs

– Then spin again …
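The loop described above can be written as a generator of waiter actions. This is a toy schedule, not Oracle code: `class_latch_schedule` is a hypothetical name, the generator assumes the latch never becomes free, and it assumes the sleeps advance through Sleep0, Sleep1, … and then stay at the last value. The case Yield = 0 is not described in the text and is rejected here.

```python
from itertools import islice


def class_latch_schedule(spin, yields, sleeps_us):
    """Yield (action, value) tuples for a waiter on a non-default class latch."""
    if yields < 1:
        raise ValueError("Yield = 0 is not modelled in this sketch")
    sleep_no = 0
    while True:
        for _ in range(yields):
            yield ("spin", spin)          # loop up to "Spin" cycles
            yield ("yield", None)         # yield() or sched_yield()
        # Assumption: SleepX advances with each sleep and saturates at the last value.
        sleep_us = sleeps_us[min(sleep_no, len(sleeps_us) - 1)]
        yield ("sleep_us", sleep_us)      # timed sleep, pollsys() in the traces
        sleep_no += 1


# Values in the style of _latch_class_6="100 2 3 10000 20000 ..."
for action in islice(class_latch_schedule(100, 2, [10000, 20000, 30000]), 10):
    print(action)
```

With Spin=100, Yield=2 and Sleep0=10000 the first sleep lasts 10000 microseconds, in line with a 10 ms pollsys() timeout.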



TRACES OF NONDEFAULT CLASS LATCH


kslges(0x50006434, ...) - wait get of exclusive latch

sskgslspin(0x50006434) ... repeated 100 times.

yield() - Yield 1


yield() - Yield 2


pollsys(...,timeout=10 ms,...) - Sleep 1. End of spin and yield loop

…

_latch_class_6="100 2 3 10000 20000 30000 40000 50000 60000 70000 80000"

… Event 10046 trace:

WAIT #0: nam='latch free' ela= 10886 address=… number=46 tries=1 ...




...

sqlplus /nolog @demo_04.sql 46

PERFORMANCE OF LATCH CLASSES

"Row cache objects" latch contention testcase on ODA (24 SMT)



IS THE _SPIN_COUNT REALLY STATIC?

• Well-known Guy Harrison experiments [5] with Oracle 11g demonstrated that dynamic _SPIN_COUNT tuning influenced latch waits, CPU and throughput.

• The experiments with dynamic change of _SPIN_COUNT used “cache buffers chains” latch contention

• This latch became shared in Oracle 9.2

• Exclusive latch uses static SPIN value from x$ksllclass(_LATCH_CLASS_X). By default – 20000.

• Shared latch spin can be tuned by _SPIN_COUNT value. By default – 2000. However, this is not trivial.



SHARED LATCH BEHAVES LIKE ENQUEUE

• Oracle realization of “Read-Write” spinlocks. Appeared in 8.0.

• S and X shared latch modes are incompatible.

• If the latch is held in S mode, an X mode waiter blocks all further requests.

• This blocking is achieved by a special 0x40000000 bit in the latch value. This bit is a flag that some incompatible latch acquisition is in progress.

• Queued processes will be woken up one by one on latch release.

• X mode latch gets effectively serialize the shared latch.

• See my blog for description of experiments.



TRACE OF SHARED LATCH WAIT

An Oracle process spins for a shared latch twice. If the first spin of up to _SPIN_COUNT cycles is unsuccessful, the process puts itself into the latch wait list and then spins again:


kslgetsl(0x50009BAC,1,2,3,16) - KSL GET Shared Latch in X mode

kslgess(0x50009BAC, ...) - shared latch wait get

kslskgs(0x50009BAC, ...)

sskgslspin(0x50009BAC) ... repeated 2000 times.

kslwlmod(...)

kslskgs(0x50009BAC, ...)

sskgslspin(0x50009BAC) ... repeated 2000 times.

skgpwwait(...) - sleep until posted

semop(27,...)


SHARED LATCH ACQUISITION

• Shared latch spin in Oracle 9.2-11g is governed by _SPIN_COUNT value and can be dynamically tuned.

• X mode shared latch get spins by default up to 4000 cycles.

• S mode does not spin at all (or spins in unknown way).


                  S mode get    X mode get

Held in S mode    Compatible    2*_SPIN_COUNT

Held in X mode    0             2*_SPIN_COUNT

Blocking mode     0             2*_SPIN_COUNT


LATCH RELEASE

• Free the latch – kslfre(laddr).

• Oracle process releases the latch nonatomically.

• Then it sets up a memory barrier: it performs an atomic operation on an address individual to each process.

• This requires less bus invalidation and ensures propagation of the latch release to other local caches.

• The policy is not fair. Spinners on the local CPU board have the preference.

• Then process posts the first process in the list of waiters.



LATCH CONTENTION


LATCH CONTENTION

• Occurs when a latch is required by several processes at the same time

• Mostly due to bugs or abnormal application behavior.

• As utilization increases, latch waits and spins also increase. Performance degradation can be extremely sharp

• To diagnose the latch contention we need to investigate the latch statistics [8]



DIFFERENTIAL (POINT IN TIME) LATCH STATISTICS

Latch requests arrival rate: $\lambda = \frac{\Delta gets}{\Delta time}$

Immediate gets efficiency: $\rho = \frac{\Delta misses}{\Delta gets}$

Latch sleeps ratio: $\kappa = \frac{\Delta sleeps}{\Delta misses}$

Latch wait time per second: $W = \frac{\Delta wait\_time}{\Delta time}$

Latch spin efficiency: $\sigma = \frac{\Delta spin\_gets}{\Delta misses}$

Should be calculated for each child latch separately. V$LATCH averaging distorts statistics.
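These ratios can be computed from two snapshots of the cumulative counters of a single child latch. The sketch below is illustrative: `latch_deltas` and the dictionary keys are hypothetical names that mirror the GETS, MISSES, SLEEPS, SPIN_GETS and WAIT_TIME columns, the sample numbers are made up, and WAIT_TIME is assumed to be reported in microseconds.

```python
def latch_deltas(before, after, interval_sec):
    """Differential latch statistics between two counter snapshots."""
    d = {k: after[k] - before[k] for k in before}
    misses = d["misses"]
    return {
        "lambda": d["gets"] / interval_sec,                    # arrival rate, Hz
        "rho": misses / d["gets"] if d["gets"] else None,      # misses per get
        "kappa": d["sleeps"] / misses if misses else None,     # sleeps per miss
        "sigma": d["spin_gets"] / misses if misses else None,  # spin gets per miss
        # Assumption: wait_time is cumulative microseconds.
        "W": d["wait_time"] / 1e6 / interval_sec,              # wait sec per sec
    }


# Hypothetical snapshots taken 100 seconds apart.
snap1 = {"gets": 500_000, "misses": 9_000, "sleeps": 2_000,
         "spin_gets": 7_000, "wait_time": 40_000_000}
snap2 = {"gets": 600_000, "misses": 11_000, "sleeps": 2_500,
         "spin_gets": 8_500, "wait_time": 42_000_000}
print(latch_deltas(snap1, snap2, interval_sec=100))
```

Counter deltas of zero misses leave the per-miss ratios undefined, so the sketch returns None for them instead of dividing by zero.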


SAMPLED AVERAGES

By sampling of x$ksupr we can estimate:

– Length of latch wait queue $L_w$:

SELECT COUNT(1) FROM X$KSUPR WHERE KSLLAWAT=LADDRESS;

– Number of spinning processes $N_s$ in 10g:

SELECT COUNT(1) FROM X$KSUPR WHERE KSLLALAQ=LADDRESS;

Sampling of v$latchholder estimates latch utilization

Using DTrace we can measure

– Latch acquisition and holding times

– Actual spin count

DERIVED LATCH STATISTICS


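Latch utilization (PASTA): $\rho \approx U = \frac{\Delta latch\_holding\_time}{\Delta time}$

Average holding time: $S = \frac{\rho}{\lambda} = \frac{\text{"Pct\_Get\_Miss"} \cdot \text{"Snap\_Time"}}{100 \cdot \text{"Get\_Requests"}}$

Length of latch wait list: $L_w = W$

Recurrent sleeps ratio: $\frac{\kappa + \sigma - 1}{\kappa}$

Latch acquisition time: $T_{aq} = \lambda^{-1} (N_s + W)$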


LATCH STATISTICS EXAMPLE


Latch statistics for:

0x380007358 "session allocation"

Requests rate: lambda= 1350 Hz

Miss /get: rho= .022

Sampled Utilization: U= .013

Slps /Miss: kappa= .28

Wait_time/sec: W= .021

Sampled queue length Lw= .017

Spin_gets/miss: sigma= .72

Sampled spinning procs:Ns= .013

Secondary sleeps ratio = .002

Avg holding time= 16.3 usec

sleeping time = 15.9 usec

acquisition time = 25.8 usec

Latch acquisition time distribution measured by DTrace:

     ns  ------------- Distribution -------------
   2048 |
   4096 |@@@@@@
   8192 |@@@@@@@@
  16384 |@@@@@@@@@@@@@@@@@@@@@@@
  32768 |@@@
  65536 |

Average acquisition time=21 usec
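The derived figures in this example can be rechecked from the primary ones using the relations from the derived statistics slide: holding time S = rho/lambda, acquisition time = (Ns + W)/lambda and recurrent sleeps ratio = (kappa + sigma - 1)/kappa. The helper name `derived_latch_stats` is hypothetical. Because the inputs below are the rounded values printed on the slide, the results can differ slightly from the slide's own derived numbers.

```python
def derived_latch_stats(lam, rho, kappa, sigma, W, Ns):
    """Derived latch statistics from the primary ones (times in microseconds)."""
    return {
        "holding_time_us": rho / lam * 1e6,
        "acquisition_time_us": (Ns + W) / lam * 1e6,
        "recurrent_sleeps_ratio": (kappa + sigma - 1) / kappa,
    }


# Rounded values for the "session allocation" latch shown above.
stats = derived_latch_stats(lam=1350, rho=0.022, kappa=0.28,
                            sigma=0.72, W=0.021, Ns=0.013)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

The holding time reproduces the 16.3 usec printed above. The acquisition time comes out near 25.2 usec against the printed 25.8 usec, a difference that is consistent with the rounding of the inputs.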


LATCH CONTENTION DIAGNOSTICS IN 9.2-11G

• Should be suspected if latch wait events are observed in “Top 5 Timed Events” AWR section

• Look for the latch with highest W

• Symptoms of contention for the latch:

– W > 1 sec/sec

– Utilization > 10%

– Acquisition (or sleeping) time significantly greater than holding time

• The Latchprofx.sql script invented by Tanel Poder [2] greatly simplifies diagnostics.

• The script and v$latch_misses reveal “where” the contention arises




LATCH LEVELS AND LATCH CONTENTION

• Process holding latch can only acquire latch with higher level.

• Contention for a high-level latch frequently exacerbates contention for lower-level latches

• “Library cache” latch contention without many cursor versions likely is a consequence of moderate shared pool latch contention.

• Shared pool resizes, clearances and ORA-4031 dumps can induce “library cache” latch storms.

• Use latch_tree.sql script to confirm this scenario



BEWARE OF CERTAIN V$/X$

• Frequent scans of some x$ tables induce latch contention:

– x$ktcxb (v$transaction) - “Transaction allocation” latch

– x$ktadm (v$lock, dba_jobs_running) – "DML lock allocation"

– x$kqlfxpl (v$sql_plan) – "library cache" latch in 10g

– x$ksmsp – "shared pool" latch

– x$kslltr (v$latch) – all parent latches in 11g.

– …



TREATING THE LATCH CONTENTION

• The right method: tune the application and reduce the latch demand. Tune the SQL, bind variables, schema, etc…

• Many brilliant books exist on this topic. Out of scope for this presentation.

• Install latest Oracle PSU and OS recommended patches. Several latch related examples:

– Oracle bug 7627743 “Higher latch waits in 11g”. Processes in 11.1 always spun up to the maximum spin count value.

– Linux 2.6.9 kernel bug 149933. "Missing wakeup in ipc/sem". Fixed in Red Hat EL 4 Update 4

– HP-UX 11.31. Should install scheduler cumulative patch PHKL_38397 or later



WHEN WE NEED TO TUNE SPIN COUNT:

• The “right” method to tune the application may be too expensive and require complete application rewrite.

• Nowadays the CPU power is cheaper. We may already have enough free CPU resources.

• Oracle does not explicitly forbid _SPIN_COUNT tuning. However, change of undocumented parameter should be discussed with Support. For example, MOS recommends _SPIN_COUNT=5000 for 10g Streams.

• Processes spin for an exclusive latch up to 20000 cycles, for a shared latch up to 4000 cycles, and infinitely for a pre-11.2.0.2 mutex. Tuning may find more optimal values for your application.

• Under CPU thrashing, decreasing the number of spins can reduce CPU consumption.



TUNING THE SPIN COUNT EFFICIENTLY

• Beware of side effects. You should have enough free CPU.

• First, diagnose the root cause of latch contention. Measure the latch statistics, install relevant patches.

• Spin count tuning will only be effective if the latch holding time S is in its normal microseconds range

• The number of spinning processes should remain less than the number of CPUs. Beware of CPU thrashing. Analyze AWR and latch statistics before and after each change.

• Spin count adjustment is different for exclusive and shared latches.

• Nonstandard latch classes may decrease CPU consumption.

• It is a common myth that CPU time will rise infinitely as we increase the spin count.


71

SPIN COUNT ADJUSTMENT

Shared latches:

• Dynamically, by the _SPIN_COUNT parameter.

• A good starting point is a multiple of the default value of 2000.

• Setting the _SPIN_COUNT parameter in the initialization file should be accompanied by _LATCH_CLASS_0="20000". Otherwise the spin for exclusive latches will be greatly affected at the next instance restart.

Exclusive latches:

• Statically, by the _LATCH_CLASS_0 parameter. Needs an instance restart.

• A good starting point is a multiple of the default value of 20000.

• It may be preferable to increase the number of "yields" for class 0 latches.

Spin count tuning probes the tail of the latch holding time distribution. See my blog for the related mathematics.
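A short sketch of why the tail matters, under a simplifying assumption that is not stated on the slides: the probability that a latch is still held after a waiter has spun for time $t$ decays exponentially. The symbols $Q$, $\tau$, $\Delta$ and $T_{\text{spin}}$ are introduced here only for illustration. $K$ denotes the sleeps per miss.

```latex
\begin{align*}
Q(t) &= e^{-t/\tau} && \text{(assumed tail of the latch holding time)} \\
K(\Delta) &\approx Q(\Delta) = e^{-\Delta/\tau} && \text{(a spin of length } \Delta \text{ fails and the process sleeps)} \\
K(2\Delta) &\approx e^{-2\Delta/\tau} = K(\Delta)^2 \\
T_{\text{spin}}(\Delta) &= \int_0^{\Delta} Q(t)\,dt = \tau\,\bigl(1 - K(\Delta)\bigr) \\
\frac{T_{\text{spin}}(2\Delta)}{T_{\text{spin}}(\Delta)} &= \frac{1 - K(\Delta)^2}{1 - K(\Delta)} = 1 + K(\Delta)
\end{align*}
```

Under this assumption, doubling the spin time squares the sleeps per miss, while the average time spent spinning grows only by the factor $1 + K$.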


72

SPIN SCALING

• My previous work [8] proposed approximate scaling rules to estimate the effect of _SPIN_COUNT tuning depending on:

$K$ = "sleeps ratio" = “Avg Slps/Miss”

• If the spin is inefficient ($K \ge 0.1$), then:

Doubling the spin count will halve the "sleeps ratio" and double the spin CPU consumption.

• If the latch holding time distribution has an exponential tail, you have enough CPU resources, and the spin is efficient ($K \ll 0.1$), then:

Doubling the spin count will square the “sleeps ratio” coefficient. This will multiply the spin CPU consumption by $(1 + K)$.

If the spin is already efficient, it is worth increasing the spin count.

If the "sleeps ratio" for an exclusive latch is 10%, then increasing the spin count to 40000 may result in a 10 times decrease of "latch free" wait events and only a 10% increase of CPU consumption.

Your mileage may vary.
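The two scaling rules can be written down as a small calculator. This is only a sketch of the approximate rules quoted above. The function name and the `efficient` flag are hypothetical, and the caller has to decide which regime applies.

```python
from typing import Tuple


def double_spin_count(k: float, efficient: bool) -> Tuple[float, float]:
    """Estimate (new sleeps ratio, spin CPU multiplier) after doubling the spin count.

    k is the current sleeps ratio ("Avg Slps/Miss").
    efficient=True applies the exponential-tail rule, False the inefficient-spin rule.
    """
    if k < 0:
        raise ValueError("sleeps ratio must be non-negative")
    if efficient:
        # Sleeps ratio is squared, spin CPU grows by the factor (1 + K).
        return k * k, 1.0 + k
    # Sleeps ratio is halved, spin CPU doubles.
    return k / 2.0, 2.0


# The exclusive latch example: K = 10%, spin count raised from 20000 to 40000.
new_k, cpu_factor = double_spin_count(0.10, efficient=True)
print(f"sleeps ratio {new_k:.3f}, spin CPU x{cpu_factor:.2f}")
```

With K = 0.1 the efficient rule gives a sleeps ratio of about 0.01 and a CPU factor of 1.1, which matches the "10 times fewer waits for 10% more CPU" estimate.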

73

CACHE BUFFERS CHAINS LATCH CONTENTION TESTCASE


Latch wait and CPU times vs. _SPIN_COUNT

74

LONG LATCH HOLDING TIME: CPU THRASHING

• Latch contention can cause CPU starvation. Processes contending for a latch also contend for CPU.

• Once the CPUs starve, the OS run queue length rises and the load average exceeds the number of CPUs. Some operating systems may shrink the time quantum. Latch holders will not receive enough CPU time to release the latch.

• Due to priority decay, latch acquirers may preempt latch holders. This leads to priority inversion. The throughput falls.

• Transition to this metastable state is more likely if the workload of your system approaches ~100% CPU.

• Due to preemption, the latch holding time S will rise to the CPU scheduling scale [7].
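The "load average exceeds the number of CPUs" symptom can be checked with a few lines of code. This is a rough sketch, not a diagnostic tool. The function name is hypothetical, and on some platforms the load average also counts processes that are not waiting for CPU, so treat the result only as a hint.

```python
import os


def cpu_starved(load_average: float, cpu_count: int) -> bool:
    """Return True when the load average exceeds the number of CPUs."""
    return load_average > cpu_count


try:
    one_minute, _, _ = os.getloadavg()  # not available on every platform
    cpus = os.cpu_count() or 1
    print(f"load={one_minute:.2f} cpus={cpus} starved={cpu_starved(one_minute, cpus)}")
except (AttributeError, OSError):
    print("load average is not available on this platform")
```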


75

CPU THRASHING

• If the CPUs are running at 100% utilization for other reasons, the latch contention can be just a symptom of this CPU starvation itself. All wait times will be inflated by waiting for CPU. However, the latch holding time should be in its normal range.

• CPU thrashing is unlikely on Solaris due to latch preemption control.

• To prevent priority inversion, use:

– FX priority class on Solaris

– HPUX_SCHED_NOAGE on HP-UX

– SCHED_BATCH policy on Linux


76

LATCH SMP SCALABILITY:

• Let the latch utilization be $\rho_1$ in a single-CPU environment.

• Then in an $N$-CPU server the latch utilization will be $\rho_N \approx N \rho_1$. This can be problematic:

– If a single-CPU system held latches for only 1% of the time,

– a 48-CPU server with the same per-CPU load will hold latches for about 50% of the time.

– A 128-core server will suffer huge latch (and mutex) contention.

• This is also known as "software lockout". It may substantially affect contemporary multi-core servers.

• NUMA should overcome these intrinsic spinlock scalability restrictions.
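The linear estimate can be tried on the slide's numbers. This is a minimal sketch; the function name is hypothetical, and the cap at 1.0 is an assumption added here, because a latch cannot be held for more than 100% of the time.

```python
def latch_utilization(rho_1: float, n_cpus: int) -> float:
    """Linear estimate rho_N ~ N * rho_1, capped at 1.0 (the cap is an assumption)."""
    return min(1.0, n_cpus * rho_1)


# Single-CPU utilization of 1%, as in the example above.
for n in (1, 48, 128):
    print(f"{n:4d} CPUs: latch held {latch_utilization(0.01, n):.0%} of the time")
```

At 128 cores the uncapped estimate would exceed 100%, which is one way to read the "huge contention" warning: the latch becomes the bottleneck before all the CPUs can be used.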

77

BIBLIOGRAPHY

1. Steve Adams. Oracle8i Internal Services for Waits, Latches, Locks, and Memory. 1999.

2. Tanel Poder. Blog, http://blog.tanelpoder.com

3. Jonathan Lewis. Oracle Core: Essential Internals for DBAs and Developers. 2011.

4. Edsger Dijkstra. Solution of a Problem in Concurrent Programming Control. CACM, 1965.

5. M. Herlihy, N. Shavit. The Art of Multiprocessor Programming. 2008.

6. B. Sinharoy, et al. Improving Software MP Efficiency for Shared Memory Systems. Proc. of 29th Hawaii Conference on System Sciences, 1996.

7. Guy Harrison. Blog, Using _spin_count to reduce latch contention in 11g. http://guyharrison.squarespace.com/, 2008.

8. R. Johnson, et al. A new look at the roles of spinning and blocking. Proc. of 5th Workshop on Data Management on New Hardware, 2009.

9. A. Nikolaev. Exploring Oracle RDBMS latches using Solaris DTrace. Proc. of MEDIAS 2011 Conf., http://arxiv.org/abs/1111.0594v1, 2011.


78

Q/A?

• Questions?

• Comments?


79

ACKNOWLEDGEMENTS

• Thanks to RDTEX Technical Support Centre Director S.P. Misiura for years of encouragement and support of my investigations.

• Thanks to my colleagues for discussions.

• Thanks to all our customers for participating in latch troubleshooting.


80

Andrey Nikolaev

http://andreynikolaev.wordpress.com

[email protected]

RDTEX, Moscow, Russia

www.rdtex.ru

THANK YOU!


IN MEMORY OF MY FRIEND FOR 30 YEARS:


Andrey Kriushin, Chairman of Russian Oracle User Group (RuOUG)

Died of heart attack 02.08.2011

82
