Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
![Page 1: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/1.jpg)
Performance Scenario:
Diagnosing and resolving sudden slow down on two node RAC
![Page 2: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/2.jpg)
Introduction…
• Karl Arao, OCP‐DBA, RHCT
• Senior Consultant at SQL*Wizard
• RAC user for 3 years
• 1st environment on VMware
• I “heart” performance
• Don’t like to guess when troubleshooting
![Page 3: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/3.jpg)
Scenario
One Thursday…a client called…
There was a SUDDEN slowdown on ALL of the applications
…a big impact on the business
![Page 4: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/4.jpg)
And it’s running on RAC
No changes on the RAC nodes or on the applications
![Page 5: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/5.jpg)
Some of 10g Performance Features
• OEM Performance Page
• ADDM
• SQL Tuning Advisor
• AWR (DBA_HIST_)
• ASH
• Time Model (total time for all DB calls)
• Wait Class (12 wait classes)
• Metrics (v$ performance metric deltas)
• Services
![Page 6: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/6.jpg)
Setup
• Server and Storage: SunFire X4200 (2 CPUs, 12GB memory) with LUNs on EMC CX300
• OS: RHEL 4.3 ES
• Database and clusterware: Oracle 10.2.0.3
• Database files, Flash Recovery Area, OCR, and voting disk are located on OCFS2 filesystems
• Application: Forms and Reports (6i and also lower)
![Page 7: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/7.jpg)
Troubleshooting Principle
Systematic/Layered approach..
Understand..
Then Fix..
Let’s get it on!
![Page 8: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/8.jpg)
1. Measured the OS stack
• Monitored the following
– CPU (vmstat, top, mpstat)
– I/O (iostat)
– memory (vmstat, meminfo)
– network (netstat)
– process info (top, ps)
![Page 9: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/9.jpg)
• CPU on server1
• CPU on server2
![Page 10: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/10.jpg)
• Datafiles on server1
• Datafiles on server2
![Page 11: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/11.jpg)
• OCR & voting disk on server1
• OCR & voting disk on server2
![Page 12: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/12.jpg)
• Archivelogs on server1
• Archivelogs on server2
![Page 13: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/13.jpg)
• Flash Recovery Area on server1
• Flash Recovery Area on server2
![Page 14: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/14.jpg)
• Memory on server1
• Memory on server2
![Page 15: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/15.jpg)
2. Checked the DB environment
• Compared my past and current RDA output for the database
• Queried some v$ views.. a query on v$session showed that server1 holds most of the connections (89% of the total users)
![Page 16: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/16.jpg)
2. Checked the DB environment
This could be because of:
1) The clients having lower versions (< SQL*Plus 8.1 or OCI8, see Note 97926.1) that may not support TAF (FAILOVER_MODE) and Load Balancing (LOAD_BALANCE)
OR
2) They are using TNS entries explicitly connecting to server1
![Page 17: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/17.jpg)
2. Checked the DB environment
• Users don’t have FAILOVER capabilities
![Page 18: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/18.jpg)
2. Checked the DB environment
• Checked the application module usage on server1
![Page 19: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/19.jpg)
2. Checked the DB environment
• How about I graph it in Excel? Will the data be more meaningful?
.. YES, most of the users use the xxxlogin.fmx module
![Page 20: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/20.jpg)
3. Checked instance‐wide DB performance
• Graphed the ASH data..
.. suffering from “gc cr block lost” and “gc cr multi block request” from 7am to 4pm
![Page 21: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/21.jpg)
3. Checked instance‐wide DB performance
• Researched on Metalink for known issues.. Found Doc ID: 563566.1 gc lost blocks diagnostics
• Was able to pinpoint the peak period from the graph. Then generated ADDM and AWR reports on that peak period..
![Page 22: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/22.jpg)
3. Checked instance‐wide DB performance
• ADDM
Elapsed Time: 60min
DB Time: 61.83min
AAS: 1.03
Max CPU: 2
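The AAS (Average Active Sessions) figure on this slide is simply DB time divided by elapsed time. A minimal sketch of that arithmetic, assuming both values are in minutes as shown above (the `aas` function name is only illustrative):

```shell
#!/bin/sh
# Average Active Sessions = DB time / elapsed time (same unit for both).
aas() {
    # $1 = DB time in minutes, $2 = elapsed time in minutes
    awk -v db="$1" -v el="$2" 'BEGIN { printf "%.2f\n", db / el }'
}

aas 61.83 60   # figures from the ADDM slide; prints 1.03
```

An AAS of about 1 against 2 CPUs means that, on average, roughly one session was active (on CPU or waiting) at any moment during the hour.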
![Page 23: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/23.jpg)
3. Checked instance‐wide DB performance
• Should I follow these recommendations right away?
Nope, collect more facts, numbers, figures first
![Page 24: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/24.jpg)
3. Checked instance‐wide DB performance
• AWR
![Page 25: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/25.jpg)
3. Checked instance‐wide DB performance
• Do we have a workload distribution problem? Nope, even with distributed users..
We still have a performance problem..
![Page 26: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/26.jpg)
4. Checked session‐level DB performance
• The database has too much activity, where do I start? Where to drill down?
• gv$session_longops & gv$session_wait output too many users, and require repetitive monitoring
• In the spirit of Method‐R…
“WORK FIRST TO REDUCE THE BIGGEST RESPONSE TIME COMPONENT OF A BUSINESS' MOST IMPORTANT USER ACTION”
• Went to the Accounting Department, checked on the desktop terminals
![Page 27: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/27.jpg)
4. Checked session‐level DB performance
• Users PC1069 (with SID 601) and PC918 (with SID 483) are completely hung
![Page 28: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/28.jpg)
4. Checked session‐level DB performance
• Checked on
– the performance/wait counters
– the current SQLs
![Page 29: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/29.jpg)
4. Checked session‐level DB performance
• v$session_wait (SID 601)
![Page 30: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/30.jpg)
4. Checked session‐level DB performance
• v$sesstat (SID 601)
![Page 31: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/31.jpg)
4. Checked session‐level DB performance
• v$sql, v$sql_plan, v$sql_plan_statistics (SID 601)
• Running for 98 minutes
• Just 12.14 seconds on CPU
![Page 32: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/32.jpg)
4. Checked session‐level DB performance
• v$sesstat (SID 483)
![Page 33: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/33.jpg)
4. Checked session‐level DB performance
• v$sql, v$sql_plan, v$sql_plan_statistics (SID 483)
• Running for 3 hours
• Just 2.68 seconds on CPU
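The numbers for the two hung sessions show that almost none of the elapsed time was CPU work, which suggests the sessions were mostly waiting. A quick sketch of that ratio, using the figures from the slides for SID 601 and SID 483 (the `cpu_share` helper is hypothetical, not part of the original toolset):

```shell
#!/bin/sh
# Percentage of elapsed time that a session actually spent on CPU.
cpu_share() {
    # $1 = CPU seconds, $2 = elapsed seconds
    awk -v cpu="$1" -v el="$2" 'BEGIN { printf "%.2f%%\n", 100 * cpu / el }'
}

cpu_share 12.14 $((98 * 60))       # SID 601: 98 minutes elapsed; prints 0.21%
cpu_share 2.68  $((3 * 60 * 60))   # SID 483: 3 hours elapsed; prints 0.02%
```

With well under 1% of the time on CPU in both cases, the remaining time has to be accounted for by wait events, which is what the wait counters and ASH graphs are for.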
![Page 34: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/34.jpg)
4. Checked session‐level DB performance
• Another graph of ASH
![Page 35: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/35.jpg)
5. Drilled down on the network interconnect
• Generated a “cat & egrep” command to look for problems in the interconnect from the OS Watcher “netstat” output (from Metalink Doc ID: 563566.1 gc lost blocks diagnostics)
![Page 36: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/36.jpg)
5. Drilled down on the network interconnect
$ cat server1_netstat.dat | egrep -i "udpInOverflows|packet receive errors|fragments dropped|reassembles failed|fragments dropped after timeout"
34096 fragments dropped after timeout
306030 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306268 packet reassembles failed
15 packet receive errors
34096 fragments dropped after timeout
306574 packet reassembles failed
… output snipped …
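The `netstat -s` counters are cumulative, so the useful signal is whether they keep climbing from one snapshot to the next. A minimal sketch that prints the growth of “packet reassembles failed” between consecutive snapshots, fed with the three samples from the output above (the `reassembly_deltas` function is an illustrative helper, and the snapshot interval is not stated in the slides):

```shell
#!/bin/sh
# Print how much the "packet reassembles failed" counter grew between
# consecutive netstat snapshots. Reads egrep-style lines on stdin.
reassembly_deltas() {
    awk '/packet reassembles failed/ {
        if (seen) print $1 - prev   # growth since the previous snapshot
        prev = $1; seen = 1
    }'
}

# The three samples shown on the slide; prints 238 and then 306.
printf '%s\n' \
    '306030 packet reassembles failed' \
    '306268 packet reassembles failed' \
    '306574 packet reassembles failed' | reassembly_deltas
```

In practice the same function could be placed at the end of the `cat … | egrep` pipeline. A counter that grows in every snapshot points at an ongoing problem rather than an old, one-off event.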
![Page 37: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/37.jpg)
5. Drilled down on the network interconnect
• Restarted the switch
STILL THERE IS A PERFORMANCE PROBLEM
![Page 38: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/38.jpg)
5. Drilled down on the network interconnect
• Replaced the switch
THEY GOT FAST
![Page 39: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/39.jpg)
5. Drilled down on the network interconnect
karao@karl:~/Desktop$ cat karlarao.dat | egrep -i "udpInOverflows|packet receive errors|fragments dropped|reassembles failed|fragments dropped after timeout"
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
0 packet receive errors
![Page 40: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/40.jpg)
5. Drilled down on the network interconnect
• Another graph of ASH (Stacked graph)
![Page 41: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/41.jpg)
5. Drilled down on the network interconnect
• Another graph of ASH (3d view)
![Page 42: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/42.jpg)
Conclusion
You don’t have to guess..
Even if it’s a RAC environment..
It just takes facts, numbers, figures to solve a performance problem
![Page 43: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/43.jpg)
References and Tools
• http://karlarao.wordpress.com
• http://blog.tanelpoder.com
– http://www.tanelpoder.com/files/TPT_public.zip
– http://www.tanelpoder.com/files/PerfSheet.zip
– Neil Gunther & Tanel Poder, Multidimensional Visualization of Oracle Performance using Barry007 http://arxiv.org/pdf/0809.2532
• http://ashmasters.com
• http://www.perfvision.com
• http://www.method-r.com
• Metalink Doc ID 97926.1 Failover Issues and Limitations [Connect-time failover and TAF]
• Metalink Doc ID 563566.1 gc lost blocks diagnostics
• Metalink Doc ID 301137.1 OS Watcher User Guide
![Page 44: Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC](https://reader033.vdocuments.mx/reader033/viewer/2022052904/5580aa6dd8b42aa0448b51a3/html5/thumbnails/44.jpg)
Join Oracle Users – Philippines
• Facebook: http://www.facebook.com/home.php#/pages/Oracle-Users-Philippines/86773013086?ref=ts
• LinkedIn: http://www.linkedin.com/groups?home=&gid=2028295&trk=anet_ug_hm