©OraInternals Riyaj Shamsudeen
RAC Hack: Deep review of LMS/LGWR process
By Riyaj Shamsudeen
LMS Processing (oversimplified)
[Flow diagram: user session on node 1; LMSx and LGWR on node 2]
1. The user session sends a GC message through the OS and network stack.
2. LMS receives the message and builds the CR or current (CUR) block.
3. If needed, LMS sends a message to LGWR; LGWR wakes up, processes the log buffer, writes the log file, and signals LMS.
4. LMS wakes up and sends the block back through the OS and network stack.
5. The block is copied into the requesting SGA and user session processing resumes.
GC CR latency
GC CR latency ~=
    Time spent sending the message to LMS +
    LMS processing (building blocks, etc.) + LGWR latency (if any) +
    LMS send time +
    Wire latency
The LMS processing, LGWR latency, and LMS send components are incurred on the remote node.
Averages can be misleading. Always review both the total time and the average to understand the issue.
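The point about averages can be made concrete with a small sketch (the numbers below are hypothetical): a low average can hide a very large total, and vice versa.

```python
# Hypothetical numbers: why total time and average must be read together.

# Case A: many fast waits -- low average, but a large total.
case_a_waits, case_a_total_ms = 1_000_000, 2_000_000   # avg 2 ms
# Case B: few slow waits -- high average, but a small total.
case_b_waits, case_b_total_ms = 500, 150_000           # avg 300 ms

avg_a = case_a_total_ms / case_a_waits
avg_b = case_b_total_ms / case_b_waits

print(f"Case A: avg {avg_a:.0f} ms, total {case_a_total_ms:,} ms")  # avg 2 ms
print(f"Case B: avg {avg_b:.0f} ms, total {case_b_total_ms:,} ms")  # avg 300 ms
# Judged by average alone, case A looks healthy, yet it contributes
# more than 13x the total wait time of case B.
```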
LMS process – A deep dive
The LMS process uses the pollsys system call to listen for incoming packets, with a 10ms timeout.
Sockets are file descriptors in UNIX.
truss -d -E -v all -p 1485 |more
1.8531 0.0000 pollsys(0xFFFFFD7FFFDFBA70, 7, 0xFFFFFD7FFFDFBA20, 0x00000000) = 0
fd=36 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0
fd=29 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0
fd=33 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0
fd=41 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0
fd=42 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0
fd=39 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0
fd=40 ev=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND rev=0
timeout: 0.010000000 sec
1.8635 0.0000 pollsys(0xFFFFFD7FFFDFBA70, 7, 0xFFFFFD7FFFDFBA20, 0x00000000) = 0
Timeout 10ms
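The wait loop that the truss output shows can be sketched in Python, whose select.poll wraps the same poll(2)/pollsys interface. The sockets and ports here are illustrative stand-ins, not Oracle's code.

```python
import select
import socket
import time

# Create a few non-blocking UDP sockets, analogous to the SOCK_DGRAM
# descriptors listed by pfiles (ports here are OS-assigned).
socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(3)]
for s in socks:
    s.bind(("127.0.0.1", 0))
    s.setblocking(False)

poller = select.poll()
for s in socks:
    poller.register(s.fileno(), select.POLLIN | select.POLLPRI)

start = time.monotonic()
events = poller.poll(10)            # 10 ms timeout, as in the truss output
elapsed_ms = (time.monotonic() - start) * 1000

print(events)                       # [] -- no packets, the call timed out
print(elapsed_ms > 5)               # True -- the timeout was (roughly) consumed
```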
LMS sockets
pfiles shows that these file descriptors are sockets; essentially, the LMS process is sending and receiving messages on these ports. pfiles 1485
36: S_IFSOCK mode:0666 dev:298,0 ino:10517 uid:0 gid:0 size:0
O_RDWR|O_NONBLOCK FD_CLOEXEC
SOCK_DGRAM
SO_SNDBUF(57344),SO_RCVBUF(57344),IP_NEXTHOP(0.224.0.0)
sockname: AF_INET 127.0.0.1 port: 33320
29: S_IFSOCK mode:0666 dev:298,0 ino:41400 uid:0 gid:0 size:0
O_RDWR|O_NONBLOCK FD_CLOEXEC
SOCK_DGRAM
SO_SNDBUF(262144),SO_RCVBUF(131072),IP_NEXTHOP(0.0.2.0)
sockname: AF_INET 169.254.106.96 port: 33318
33: S_IFSOCK mode:0666 dev:298,0 ino:10518 uid:0 gid:0 size:0
O_RDWR|O_NONBLOCK FD_CLOEXEC
SOCK_DGRAM
SO_SNDBUF(262144),SO_RCVBUF(131072),IP_NEXTHOP(0.0.2.0)
sockname: AF_INET 169.254.201.54 port: 33319
Demo: demo_lms_truss.ksh demo_lms_pfiles.ksh
LMS CPU usage
Just because the LMS process runs in RT mode does not mean that it is consuming CPU all the time.
#./trace_syscall_preempt_size.sh 1485
0 => pollsys timestamp : 45075622139230
0 | swtch:pswitch oracle sysinfo: timestamp : 45075622155047
0 | swtch:pswitch Vol context switch : 45075622155697 pswitch genunix`cv_timedwait_sig_hires+0x2ab
0 | resume:off-cpu On cpu 0 for: 92460
0 | resume:on-cpu Off cpu for: 10242512
0 <= pollsys timestamp : 45075632406018 elapsed : 10266788
Demo: as root trace_syscall_preempt_size.sh
The pollsys call voluntarily releases the CPU until a new packet arrives on a port or the timeout expires.
LMS uses very little CPU if there is no work to do: with no work, just 92 microseconds of CPU were used in a 10,242-microsecond window.
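Working that out from the trace (the counters appear to be nanoseconds, consistent with the 92-microsecond figure above):

```python
# Values taken from the trace output above (nanoseconds).
on_cpu_ns = 92_460       # "On cpu 0 for"
off_cpu_ns = 10_242_512  # "Off cpu for"

busy_pct = 100 * on_cpu_ns / (on_cpu_ns + off_cpu_ns)
print(f"{busy_pct:.2f}% busy")   # 0.89% busy -- well under 1% CPU
```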
LMS – early wakeup
The kernel will schedule the LMS process as soon as a network packet arrives on one of its ports.
#./trace_syscall_preempt_size.sh 1485
0 => pollsys timestamp : 45075592390763
0 | swtch:pswitch oracle sysinfo: timestamp : 45075592402281
0 | swtch:pswitch Vol context switch : 45075592402933 pswitch genunix`cv_timedwait_sig_hires+0x2ab
0 | resume:off-cpu On cpu 0 for: 55099
0 | resume:on-cpu Off cpu for: 29660449
0 <= pollsys timestamp : 45075622072662 elapsed : 29681899
Demo: as root trace_syscall_preempt_size.sh
The LMS process was woken up in 29 microseconds.
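The early wakeup can be demonstrated with the same poll sketch: when a datagram is already queued, poll returns immediately instead of sleeping out the 10ms timeout. The local sockets below are hypothetical stand-ins for the interconnect ports.

```python
import select
import socket
import time

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

poller = select.poll()
poller.register(rx.fileno(), select.POLLIN)

tx.sendto(b"gc message", rx.getsockname())   # packet arrives before the poll
start = time.monotonic()
events = poller.poll(10)                     # same 10 ms timeout
elapsed_ms = (time.monotonic() - start) * 1000

print(len(events))          # 1 -- the receive socket is readable
print(elapsed_ms < 10)      # True -- woke up well before the timeout
```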
LMS count
Even in busy environments, I have seen LMS busy only 50% of the time.
To schedule a process, the CPU scheduler must load CPU registers, refill the instruction pipeline, etc., which is a costly operation.
If you have many LMS processes, the workload will be distributed among them. Due to their RT priority, they will be moving on and off the CPU.
In a multi-processor environment, this becomes more complicated.
Version 11.2 uses much more meaningful values for the LMS count.
LMS – prstat
In Solaris, another way to check the efficiency of the LMS process is through prstat microstate accounting.
# prstat -mL -p 18243
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
18243 prod 6.4 5.9 0.0 0.0 0.0 0.0 88 0.0 2K 0 30K 187 oracle/1
Demo: prstat command
prstat breaks the process time down into microstate accounting percentages. In this case, the breakdown of LMS CPU usage is:
6.4% in USR mode
5.9% in SYS mode
0% CPU latency
88% sleeping
2K voluntary context switches
0 involuntary context switches
LMS – session
LMS session-level statistics can be used to measure the workload distribution. Of course, these statistics are cumulative from instance startup.
@lms_workload_perc
INST_ID PGM NAME VAL PROC_TO_INST PROC_TO_TOT INST_TO_TOT
---------- ------- ------------------------------ ---------- ------------ ----------- -----------
1 (LMS0) gc cr blocks served 62960382 15 3 25
1 (LMS1) gc cr blocks served 58701920 13 3 25
1 (LMS2) gc cr blocks served 57757849 13 3 25
...
2 (LMS5) gc cr blocks served 44476702 14 2 18
2 (LMS6) gc cr blocks served 42312824 13 2 18
...
3 (LMS0) gc cr blocks served 38465965 14 2 15
3 (LMS1) gc cr blocks served 37541589 14 2 15
...
4 (LMS6) gc cr blocks served 95517242 14 5 40
4 (LMS5) gc cr blocks served 94879180 13 5 40
...
Demo: lms_workload_distr.sql – to measure workload from the instance start
LMS processes in node 4 are busy serving CR blocks. Why?
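The percentage columns in this report are straightforward ratios. A sketch of the arithmetic, using the instance 1 values shown above; since the "..." rows are elided on the slide, the totals below are hypothetical:

```python
# "gc cr blocks served" per LMS process in instance 1 (from the report).
lms_vals = {"LMS0": 62_960_382, "LMS1": 58_701_920, "LMS2": 57_757_849}

# Hypothetical totals (the remaining LMS rows are elided on the slide).
inst_total = 430_000_000     # all LMS processes in instance 1
grand_total = 1_700_000_000  # all LMS processes in all instances

for pgm, val in lms_vals.items():
    proc_to_inst = round(100 * val / inst_total)   # PROC_TO_INST column
    proc_to_tot = round(100 * val / grand_total)   # PROC_TO_TOT column
    print(f"{pgm}: {proc_to_inst}% of instance, {proc_to_tot}% of total")

# INST_TO_TOT column: this instance's share of all blocks served.
print(f"inst_to_tot: {round(100 * inst_total / grand_total)}%")
```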
LMS - workload
In a few cases, it is prudent to measure the current rate instead of relying on the cumulative rate.
@gc_lms_workload_distr_diff.sql
Enter value for search_string: gc cr blocks served
Enter value for sleep: 60
---------|--------------|----------------|----------|---------------|---------------|-------------|
Inst | Pgm | value | totvalue | instvalue | proc2inst |inst2total |
---------|--------------|----------------|----------|---------------|---------------|-------------|
1 | (LMS0)| 348| 16931| 2993| 11| 2|
1 | (LMS1)| 335| 16931| 2993| 11| 1|
...
2 | (LMS0)| 359| 16931| 2660| 13| 2|
2 | (LMS1)| 375| 16931| 2660| 14| 2|
...
3 | (LMS0)| 132| 16931| 1231| 10| 0|
3 | (LMS1)| 194| 16931| 1231| 15| 1|
...
4 | (LMS0)| 1164| 16931| 10047| 11| 6|
4 | (LMS1)| 1784| 16931| 10047| 17| 10|
---------|--------------|----------------|----------|---------------|---------------|-------------|
Demo: gc_lms_workload_distr_diff.sql
LMS processes in node 4 are busy here too.
LMS – applying undo
The LMS process applies undo records to construct the CR buffer to send.
The following session-level statistics can be reviewed to see the undo-application counters; in this example, 474K undo records were applied to create CR blocks.
Demo: get_sesstat_sid.sql
@get_sesstat_sid
Enter the wildcard character (Null=All):undo
Enter value threshold :1
Enter sid :11000
NAME VALUE
---------------------------------------------------------------- ----------
transaction tables consistent reads - undo records applied 13180
data blocks consistent reads - undo records applied 474036
LMS – snapper.sql
Tanel’s ultra-cool snapper is useful for finding the rate of a few statistics for an LMS session.
Demo: snapper.sql
@session_snapper out,gather=stw 15 4 11000
-- Session Snapper v2.01 by Tanel Poder ( http://www.tanelpoder.com )
----------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH
----------------------------------------------------------------------------------------------------------------------
11000, (LMS0) , STAT, cleanouts and rollbacks - consistent rea, 633, 42.2,
11000, (LMS0) , STAT, immediate (CR) block cleanout applicatio, 689, 45.93,
11000, (LMS0) , STAT, commit txn count during cleanout , 56, 3.73,
11000, (LMS0) , STAT, active txn count during cleanout , 633, 42.2,
11000, (LMS0) , STAT, cleanout - number of ktugct calls , 690, 46,
11000, (LMS0) , TIME, background cpu time , 696907, 46.46ms, 4.6%, |@ |
11000, (LMS0) , TIME, background elapsed time , 696907, 46.46ms, 4.6%, |@ |
11000, (LMS0) , WAIT, gcs remote message , 13987778, 932.52ms, 93.3%, |@@@@@@@@@@|
11000, (LMS0) , WAIT, events in waitclass Other , 150638, 10.04ms, 1.0%, |@ |
-- End of snap 2, end=2011-06-28 22:28:01, seconds=15
GC TX/RX %
You can find the TX and RX percentages using the gc_traffic_print.sql script too. These percentages are at the instance level.
Demo: gc_traffic_print.sql
@gc_traffic_print.sql
---------|--------------|---------|----------------|---------|---------------|---------|-------------|---------|
Inst | CR blocks Rx | CR Rx% | CUR blocks Rx | CUR RX %| CR blocks Tx | CR TX % | CUR blks TX | CUR TX% |
---------|--------------|---------|----------------|---------|---------------|---------|-------------|---------|
1 | 283| 3.6| 950| 27.12| 214| 3.33| 665| 16.8|
2 | 7185| 91.47| 1327| 37.89| 256| 3.98| 1117| 28.22|
3 | 119| 1.51| 886| 25.29| 5798| 90.22| 1617| 40.86|
4 | 268| 3.41| 339| 9.68| 158| 2.45| 558| 14.1|
In that sampling interval, node 3 was transmitting 90% of the CR blocks and node 2 was receiving 91% of them. This insight is useful for measuring workload distribution; use a larger sampling interval for a stable picture.
gcs log flush sync
Before sending a reconstructed CR or current (CUR) block, LMS verifies that the corresponding redo vectors have been flushed to disk.
If the redo vectors have not been flushed, LMS must wait on the ‘gcs log flush sync’ event after requesting a log flush from LGWR, analogous to the ‘log file sync’ event for foreground processes.
This is not an idle event, even though some older documentation suggests that it is.
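The rule above can be sketched as pseudologic in Python (an illustration of the described behavior, not Oracle’s actual implementation): a block may be shipped only once redo up to its highest change SCN is on disk.

```python
def can_send_block(block_high_scn: int, flushed_scn: int) -> bool:
    """A block is safe to ship only if all redo for its changes
    (up to block_high_scn) has already been written to disk."""
    return block_high_scn <= flushed_scn

flushed_scn = 1_000   # hypothetical on-disk redo position

print(can_send_block(990, flushed_scn))    # True  -> send immediately
print(can_send_block(1_005, flushed_scn))  # False -> request a log flush
                                           # and wait on 'gcs log flush sync'
```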
Gcs log flush sync - ASH
ASH shows that LMS waits on the ‘gcs log flush sync’ event.
In this database there is no issue, so the waits for ‘gcs log flush sync’ are not high. (The row with a blank event name represents samples where the process was on the CPU, not waiting.)
select event, count(*) from v$active_session_history
where session_id in (select sid from v$session where program like '%LMS0%')
and sample_time > sysdate -(4/24)
group by event
order by 2 desc;
EVENT COUNT(*)
---------------------------------------- ----------
767
gcs log flush sync 265
latch: KCL gc element parent latch 4
Gcs log flush sync – v$session_event
But v$session_event for the LMS process shows no waits for gcs log flush sync!
The ‘gcs log flush sync’ event is rolled up with other events into ‘events in waitclass Other’.
select event, trunc(time_waited_micro/1000) wait_milli, total_waits
from v$session_event where sid in (select sid from v$session where program like '%LMS0%')
order by 2 desc;
EVENT WAIT_MILLI TOTAL_WAITS
---------------------------------------- ---------- -----------
gcs remote message 218407919 83934373
events in waitclass Other 3970180 5237156
buffer busy waits 356 2897
latch: cache buffers chains 316 5197
latch: row cache objects 0 2
latch: shared pool 0 3
Gcs log flush sync - histogram
To review the impact of ‘gcs log flush sync’ waits, review v$event_histogram.
73% of the waits complete in under 1ms. This is probably not an issue.
@event_histogram.sql
Enter value for event_name: gcs log flush sync
INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT PER
---------- ---------------------------------------------------------------- --------------- ---------- ----------
1 gcs log flush sync 1 24490064 73.42
1 gcs log flush sync 2 6250630 18.74
1 gcs log flush sync 4 1848333 5.54
1 gcs log flush sync 8 597646 1.79
1 gcs log flush sync 16 142603 .42
1 gcs log flush sync 32 25006 .07
1 gcs log flush sync 64 66 0
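The PER column is simply each bucket’s share of the total wait count; recomputing it from the WAIT_COUNT values above:

```python
# WAIT_COUNT per WAIT_TIME_MILLI bucket, from the v$event_histogram output.
buckets = {1: 24_490_064, 2: 6_250_630, 4: 1_848_333, 8: 597_646,
           16: 142_603, 32: 25_006, 64: 66}
total = sum(buckets.values())

for wait_ms, cnt in buckets.items():
    print(f"{wait_ms:>3} ms bucket: {100 * cnt / total:.2f}%")
# The 1 ms bucket works out to 73.42%, matching the PER column.
```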
Gcs log flush sync – Not so good
The following histogram shows an example where there is a performance issue with log file sync (LFS).
If you have LFS waits and GC waits, then you should consider tuning log file sync before tuning GC events.
@event_histogram.sql
Enter value for event_name: gcs log flush sync
INST_ID EVENT WAIT_TIME_MILLI WAIT_COUNT PER
---------- ---------------------------------------------------------------- --------------- ---------- ----------
1 gcs log flush sync 1 28 .07
1 gcs log flush sync 2 24 .06
1 gcs log flush sync 4 31 .08
1 gcs log flush sync 8 33 .08
1 gcs log flush sync 16 35757 95.96
1 gcs log flush sync 32 1378 3.69
1 gcs log flush sync 64 6 .01
1 gcs log flush sync 128 2 0
Gcs log flush sync – LGWR interaction
If LGWR is suffering from performance issues, the LMS process can be seen waiting on the ‘gcs log flush sync’ wait event in a tight loop, with each wait timing out at about 10ms.
If you have LFS waits and GC waits, then you should consider tuning log file sync before tuning GC events.
LMS trace file:
...
WAIT #0: nam='gcs log flush sync' ela= 10281 waittime=3 poll=0 event=136 obj#=-1 tim=1381909996
WAIT #0: nam='gcs log flush sync' ela= 10274 waittime=3 poll=0 event=136 obj#=-1 tim=1381920366
WAIT #0: nam='gcs log flush sync' ela= 10291 waittime=3 poll=0 event=136 obj#=-1 tim=1381930735
WAIT #0: nam='gcs log flush sync' ela= 10321 waittime=3 poll=0 event=136 obj#=-1 tim=1381941178
...
GCS log flush sync - Example
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
log file sync 2,054 23,720 11548 45.8 Commit
gc buffer busy acquire 19,505 10,382 532 20.0 Cluster
gc cr block busy 5,407 4,655 861 9.0 Cluster
enq: SQ - contention 140 3,432 24514 6.6 Configurat
db file sequential read 38,062 1,305 34 2.5 User I/O
Host CPU (CPUs: 24 Cores: 24 Sockets: 24)
~~~~~~~~ Load Average
Begin End %User %System %WIO %Idle
--------- --------- --------- --------- --------- ---------
1.18 1.16 2.7 2.6 0.0 94.7
Excessive waits for log file sync for the foreground processes, even though the host CPU is nearly 95% idle.
GCS log flush sync – GC waits
Global Cache and Enqueue Services - Workload Characteristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg global enqueue get time (ms): 7.4
Avg global cache cr block receive time (ms): 222.0
Avg global cache current block receive time (ms): 27.5
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.1
Global cache log flushes for cr blocks served %: 2.7
Avg global cache cr block flush time (ms): 15879.9
Avg global cache current block pin time (ms): 0.0
Avg global cache current block send time (ms): 0.1
Global cache log flushes for current blocks served %: 0.3
Avg global cache current block flush time (ms): 1701.3
The average global cache CR block receive time was 222ms.
The very high flush times indicate waits on the LGWR process.
GCS log flush sync – GC waits
Avg
%Time Total Wait wait Waits % bg
Event Waits -outs Time (s) (ms) /txn time
-------------------------- ------------ ----- ---------- ------- -------- ------
gcs log flush sync 80,695 51 1,862 23 34.7 32.9
log file parallel write 44,129 0 880 20 19.0 15.6
Log archive I/O 1,607 0 876 545 0.7 15.5
gc cr block busy 729 71 752 1031 0.3 13.3
db file parallel write 25,752 0 434 17 11.1 7.7
enq: CF - contention 166 64 307 1850 0.1 5.4
Background processes are waiting excessively on the gcs log flush sync event.
There are also high waits for log file parallel write.
LGWR is important
So, if you think LGWR performance is important in single-instance databases, it is ultra-important in RAC.
If you have LGWR-related performance issues, you can treat most other waits as mere symptoms.
It’s a pity that LGWR does not run in RT mode (or even in the FX class).
LMS processes run at elevated priority, but LGWR does not: a classic priority inversion!
Contact info: Email: [email protected] Blog : orainternals.wordpress.com URL : www.orainternals.com
Thank you for attending!