
Anomalies: Prevention, Detection and

Diagnosis with OraChk and TFA

Jared Still

2016-12-28


© 2016 Pythian. Confidential 2

About Me

•Jared Still

[email protected]

•At Pythian since 2011

•Oracle DBA since 1994

•Oaktable

•Oracle ACE

•Known to dabble in Perl

•Claim to fame: started Oracle-L


ORAchk

Get Proactive


ORAchk – where to find it and Documentation

• ORAchk - Health Checks for the Oracle Stack (Doc ID 1268927.2)

• Extensive documentation

• Documentation: http://docs.oracle.com/cd/E68491_01/index.htm

• Prereqs for ORAchk: http://bit.ly/2hgvjGH

• Bash 3.2 or later

• /usr/bin/expect

▪ [root@ora12c102rac01 ~]# yum install expect

Loaded plugins: refresh-packagekit, security

Setting up Install Process

Package expect-5.44.1.15-5.el6_4.x86_64 already installed and latest version

• SSH user equivalence (passwordless SSH between nodes)

• Runs as root (preferred)

▪ Especially for RAC
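The prerequisites above can be checked before a run. The following is a minimal bash sketch, not part of ORAchk; the function name check_orachk_prereqs and its output format are hypothetical. SSH user equivalence is tested by running ssh with BatchMode=yes, which fails instead of prompting for a password.

```bash
#!/usr/bin/env bash
# Sketch only: check the ORAchk prerequisites listed on this slide.
# check_orachk_prereqs is a hypothetical helper, not part of ORAchk.
# Usage: check_orachk_prereqs [remote-node ...]
check_orachk_prereqs() {
  local rc=0 node
  # Bash 3.2 or later
  if (( BASH_VERSINFO[0] > 3 || (BASH_VERSINFO[0] == 3 && BASH_VERSINFO[1] >= 2) )); then
    echo "OK   bash ${BASH_VERSION}"
  else
    echo "FAIL bash ${BASH_VERSION} (3.2 or later required)"
    rc=1
  fi
  # expect must be present at /usr/bin/expect
  if [[ -x /usr/bin/expect ]]; then
    echo "OK   /usr/bin/expect"
  else
    echo "FAIL /usr/bin/expect not found (yum install expect)"
    rc=1
  fi
  # Running as root is preferred, especially for RAC
  if (( EUID == 0 )); then
    echo "OK   running as root"
  else
    echo "WARN not running as root"
  fi
  # SSH user equivalence: BatchMode=yes fails rather than prompting for a password
  for node in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
      echo "OK   ssh equivalence to ${node}"
    else
      echo "FAIL ssh equivalence to ${node}"
      rc=1
    fi
  done
  return "$rc"
}
```

On the first node of a two-node cluster, run it as root and pass the other node names, for example: check_orachk_prereqs ora12c102rac02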


What does ORAchk check?

• Engineered Systems and Features

• Oracle Database Installations

• Standalone

• RAC

• Elastic Stack Integration

• E-Business Suite

• Pre-install checks

• Much more – check the docs


• What does it look for?

• Database

▪ Patches up to date?

▪ Bug fixes

▪ Vulnerabilities

▪ Log Switch time

▪ Redo write time

▪ …

• OS – Linux

▪ ShellShock Bash bug

▪ Memory config

• Configuration

▪ Parameters

▪ Sizing

▪ …


Focus on ORAchk

• Covers a single ad hoc run and analysis of its report

• This is an introduction

• Not Covered in this presentation

• Oracle Health Check Collections Manager

• APEX-based catalog of ORAchk results

• EXAchk – Engineered Systems specific

• Various other features: Application Continuity, OID, …


Installation

• Installed by default as of 11.2.0.4

• Much more useful if upgraded from the version shipped with 11.2.0.4

• Check for new versions – released quarterly

• Easy to install

• Get the file

• Unzip it

• Done


10gR2 example - Installation

• Please upgrade if you are on 10g…

• This is a test database only, running on OEL 5.5

[root@ora10gR2 js01]# cd $ORACLE_HOME

[root@ora10gR2 js01]# ls -l orachk

orachk: No such file or directory

[root@ora10gR2 js01]# mkdir orachk

[root@ora10gR2 js01]# cd orachk

[root@ora10gR2 orachk]# pwd

/u01/app/oracle/product/10.2/js01/orachk

[root@ora10gR2 orachk]# unzip /tmp/orachk.zip

• Installation is complete


10gR2 example – Execution

• Run for all databases found

[root@ora10gR2 orachk]# ./orachk -output /tmp/orachk-01 -dball -a

Checking Status of Oracle Software Stack - Clusterware, ASM, RDBMS

. . . . . . .

Checking for prompts for root user on all nodes...

. . . . . . . .

-------------------------------------------------------------------------------------------------------

Oracle Stack Status

-------------------------------------------------------------------------------------------------------

Host Name  CRS Installed  ASM HOME  RDBMS Installed  CRS UP  ASM UP  RDBMS UP  DB Instance Name

-------------------------------------------------------------------------------------------------------

ora10gr2   No             No        Yes              No      No      Yes       js01

-------------------------------------------------------------------------------------------------------

Copying plug-ins

. . . . . . . . .

*** Checking Best Practice Recommendations

Collections and audit checks log file is

/tmp/orachk-01/orachk_ora10gr2_js01_122716_155227/log/orachk.log

...
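Each run writes a new timestamped report, so after several runs it helps to locate the latest one. A minimal sketch, assuming GNU find (for -printf) and report file names beginning with orachk_ as in this example; latest_orachk_report is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only: print the most recently modified ORAchk HTML report
# found anywhere under an ORAchk output directory.
# Assumes GNU find (-printf) and report names matching orachk_*.html.
latest_orachk_report() {
  local outdir=${1:?usage: latest_orachk_report <output-dir>}
  # "%T@ %p" = modification time in epoch seconds, then the path
  find "$outdir" -type f -name 'orachk_*.html' -printf '%T@ %p\n' \
    | sort -n \
    | tail -n 1 \
    | cut -d' ' -f2-
}
```

For the run above: latest_orachk_report /tmp/orachk-01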


ORAchk Report

• Examine Report (orachk_ora10gr2_js01_122716_155227.html)

• Fix FAIL items and recheck

• MAA Scorecard – log_buffer size failed

• Fix and recheck

SYSDBA> alter system set log_buffer=8388608 scope=spfile;

System altered.

-- restart the database

SYSDBA> show parameter log_buffer

NAME TYPE VALUE

------------------------------------ ----------- ------------------------------

log_buffer integer 10175488

* Oracle has rounded the size up to consume a granule of memory

See Oracle Calculation of Log_Buffer Size in 10g (Doc ID 604351.1)


10gR2 – Rerun ORAchk for just one check

• Re-run just one check:

• Click Show Check Ids in the report to get the check ID

• Check ID: CB02802D637C344DE0431EC0E50AE8DE

• orachk -output /tmp/orachk-01 -dball -check CB02802D637C344DE0431EC0E50AE8DE
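Re-running selected checks can be wrapped in a small helper that validates the IDs first. This is a minimal sketch; it assumes that -check also accepts a comma-separated list of IDs, as -excludecheck does (confirm against the documentation for your ORAchk version), and that a valid ID is 32 hexadecimal characters like every ID shown in this deck. rerun_checks, is_check_id, join_check_ids and the ORACHK override variable are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: re-run selected ORAchk checks by ID.
# Assumption: -check accepts "id1,id2,..." (as -excludecheck does).
# ORACHK overrides the orachk path; default is ./orachk.

# The check IDs shown in this deck are 32 hexadecimal characters
is_check_id() {
  [[ $1 =~ ^[0-9A-Fa-f]{32}$ ]]
}

# Join arguments with commas
join_check_ids() {
  local IFS=,
  echo "$*"
}

# Usage: rerun_checks <output-dir> <check-id> [check-id ...]
rerun_checks() {
  local outdir=$1 id
  shift
  for id in "$@"; do
    is_check_id "$id" || { echo "not a check ID: $id" >&2; return 1; }
  done
  "${ORACHK:-./orachk}" -output "$outdir" -dball -check "$(join_check_ids "$@")"
}
```

For the log_buffer check: rerun_checks /tmp/orachk-01 CB02802D637C344DE0431EC0E50AE8DE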


Rerun just the log_buffer check

[root@ora10gR2 orachk]# ./orachk -output /tmp/orachk-01 -dball -check CB02802D637C344DE0431EC0E50AE8DE

...

-------------------------------------------------------------------------------------------------------

Host Name  CRS Installed  ASM HOME  RDBMS Installed  CRS UP  ASM UP  RDBMS UP  DB Instance Name

-------------------------------------------------------------------------------------------------------

ora10gr2   No             No        Yes              No      No      Yes       js01

-------------------------------------------------------------------------------------------------------

*** Checking Best Practice Recommendations (PASS/WARNING/FAIL) ***

...

=============================================================

Node name - ora10gr2

=============================================================

. . . . .

Collecting - Database Parameters for js01 database


10gR2 – Examine the new report

• Issue corrected

• See orachk_ora10gr2_js01_122716_164736.html


12.1.0.2 – 2 Node RAC

• Something a bit more up to date.

• TFA has been upgraded on this cluster

• (more on TFA to come)

• Server may need some cleanup on Aisle Oracle.

• Old version is still installed in different location

[root@ora12c102rac01 db_1]# find /u01/cdbrac/app/grid/product/12.1.0.2/ -name orachk -type f | xargs ls -ld

-rwxr-x---. 1 oracle oinstall 1604239 Jun 9 2014 /u01/cdbrac/app/grid/product/12.1.0.2/grid_1/suptools/orachk/orachk

-rwxr-xr-x. 1 root root 2895859 Dec 17 12:24 /u01/cdbrac/app/grid/product/12.1.0.2/grid_1/tfa/ora12c102rac01/tfa_home/ext/orachk/orachk

[root@ora12c102rac01 db_1]# $ORACLE_HOME/suptools/orachk/orachk -v

ORACHK VERSION: 2.2.5_20140530

[root@ora12c102rac01 db_1]# $ORACLE_HOME/tfa/ora12c102rac01/tfa_home/ext/orachk/orachk -v

ORACHK VERSION: 12.2.0.1.2_20161215
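With more than one copy installed, it is easy to run the stale one. A minimal sketch, assuming orachk -v prints a line in the ORACHK VERSION: format shown above and that GNU sort -V is available for the version comparison; newest_orachk and its helpers are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: pick the newest of several installed orachk copies.
# Assumes "orachk -v" prints "ORACHK VERSION: <version>" and GNU sort -V.

# Print just the version string reported by one orachk executable
orachk_version() {
  "$1" -v 2>/dev/null | sed -n 's/^ORACHK VERSION: *//p'
}

# stdin: "<version> <path>" lines; stdout: path with the highest version
pick_newest() {
  sort -V -k1,1 | tail -n 1 | cut -d' ' -f2-
}

newest_orachk() {
  local f v
  for f in "$@"; do
    v=$(orachk_version "$f")
    if [[ -n $v ]]; then
      printf '%s %s\n' "$v" "$f"
    fi
  done | pick_newest
}
```

For this cluster: newest_orachk $(find /u01/cdbrac/app/grid/product/12.1.0.2/ -name orachk -type f)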


Troubleshooting hang issue – p1

• Hung due to NFS mount hanging

• See orahk-troubleshooting-nfs-stuck.txt

orachk -output /tmp/orachk-01 -dball -a

===>>> Hung here on localcmd.sh, which is running df -k

# ps -fluroot| grep [o]rachk

4 S root 617 25366 0 80 0 - 34277 pipe_w Dec27 pts/6 00:00:10 … orachk -dball -a

0 S root 1144 617 0 80 0 - 2325 wait Dec27 pts/6 00:00:01 bash /tmp/orachk-01/.input_122716_170752/watchdog.sh

1 S root 11443 617 0 80 0 - 34277 wait Dec27 pts/6 00:00:00 … orachk -dball -a

0 S root 11444 11443 0 80 0 - 2351 wait Dec27 pts/6 00:00:00 bash /root/.orachk/localcmd.sh

[root@ora12c102rac01 ~]# pstree -p 617

bash(617)-+-bash(1144)---sleep(5964)
          `-bash(11443)---bash(11444)-+-df(11457)
                                      `-grep(11458)

# ps -flp 11457

F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD

0 D root 11457 11444 0 80 0 - 1049 rpc_wa Dec27 pts/6 00:00:00 df
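The key clue above is the df process in state D (uninterruptible sleep) with a WCHAN beginning rpc_. A minimal sketch that lists every such process, assuming procps ps on Linux (for the wchan:WIDTH column syntax); list_dstate and filter_dstate are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: list processes in uninterruptible sleep (STAT starting with D)
# together with their kernel wait channel, e.g. rpc_* for stuck NFS calls.
# Assumes procps ps (wchan:WIDTH widens the truncated WCHAN column).

# stdin: "PID PPID STAT WCHAN COMMAND..." rows; keep header and D-state rows
filter_dstate() {
  awk 'NR == 1 || $3 ~ /^D/'
}

list_dstate() {
  ps -eo pid,ppid,stat,wchan:32,args | filter_dstate
}
```

Running list_dstate while orachk appears hung should show the blocked df and its full wait-channel name.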


Troubleshooting hang issue – p2

• RPC waits – nearly always NFS

• RPC (ONC/Sun RPC) is a remote procedure call protocol that runs over TCP or UDP; NFS is its most common user

• Operator error – I decommissioned the NFS server while clients still had it mounted

• Oops

• Remove the mount from /etc/fstab and reboot

• For in-depth NFS troubleshooting articles, see Resources

[root@ora12c102rac01 fd]# strace df -k

statfs("/mnt/oraback", ^C <unfinished ...>   ===>>> stuck here

[root@ora12c102rac01 fd]# mount| grep oraback

lestrade:/mnt/zpool1/oraback on /mnt/oraback type nfs (rw,bg,intr,hard,rsize=32768,wsize=32768,noac,nolock,nfsvers=3,addr=192.168.1.116)
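A dead NFS server can be found without hanging the shell by probing each NFS mount point under a time limit. A minimal sketch, assuming Linux /proc/mounts and GNU coreutils timeout and stat; the function names are hypothetical. The time limit only helps if the blocked process can actually be killed; a task stuck on a hard NFS mount may respond only to SIGKILL, which is why -s KILL is used, and on some kernels even that may not return promptly.

```bash
#!/usr/bin/env bash
# Sketch only: list NFS mount points and probe each with a time limit,
# so a dead server is reported instead of hanging the shell (as df -k did).
# Assumes Linux /proc/mounts format and GNU coreutils timeout/stat.

# Field 2 is the mount point, field 3 the filesystem type (nfs, nfs4)
nfs_mounts() {
  awk '$3 ~ /^nfs4?$/ { print $2 }' "${1:-/proc/mounts}"
}

probe_nfs_mounts() {
  local mnt
  while IFS= read -r mnt; do
    # stat -f issues the same statfs() call df makes
    if timeout -s KILL 5 stat -f "$mnt" >/dev/null 2>&1; then
      echo "OK            $mnt"
    else
      echo "TIMEOUT/ERROR $mnt"
    fi
  done < <(nfs_mounts "$@")
}
```

On the host above, probe_nfs_mounts would be expected to flag /mnt/oraback.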


Report Analysis

• Runtime installation issues cleared up…

• Examine orachk_ora12c102rac01_js122a_122816_083224.html

• Test environment, many failures and warnings…

• First five checks have FAIL status

• “For our environment, these do not need to be corrected”

• What if you want to always exclude certain checks?


Excluding Checks

• The -excludecheck option may be used

• orachk -excludecheck check_id1,check_id2,…

• See orachk_ora12c102rac01_js122a_010217_141755.html#excluded_checks

• Or use excluded_check_ids.txt

• Place in same directory as orachk

# cat excluded_check_ids.txt

F9519FD7525B4F01E04313C0E50AC339

CB95A1BF5B1160ACE0431EC0E50A12EE

3B36CD93FCE82634E0530C98EB0A543F

429C554ED92E4F54E0530D98EB0AE367

AA8C83A023362C5EE040E50A1EC0146A

• In orachk 12.2.0.1.2, the exclude file does not work.

• Have I opened a bug SR for this? Not yet.
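Until the exclude file is honored, the same IDs can be passed on the command line. A minimal sketch that converts excluded_check_ids.txt into the comma-separated list used by -excludecheck; exclude_list is a hypothetical helper, and it assumes one check ID per line.

```bash
#!/usr/bin/env bash
# Sketch only: turn excluded_check_ids.txt (one ID per line) into the
# comma-separated list that -excludecheck expects.
# Blank lines and DOS (CRLF) line endings are tolerated.
exclude_list() {
  tr -d '\r' < "${1:-excluded_check_ids.txt}" \
    | grep -v '^[[:space:]]*$' \
    | paste -sd, -
}
```

For example: ./orachk -excludecheck "$(exclude_list)"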


excluded_check_ids.txt – Works in 12.2.0.1.4

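• Verify in orachk.log that the excluded checks were skipped

#!/bin/bash
# Build an alternation (id1|id2|...) from the exclude file
grepStr=''
echo "Checks to Skip:"
while IFS= read -r chkid || [[ -n $chkid ]]
do
    chkid=${chkid//$'\r'/}        # tolerate DOS line endings
    [[ -z $chkid ]] && continue   # ignore blank lines
    echo "$chkid"
    grepStr="${grepStr}|${chkid}"
done < excluded_check_ids.txt

# remove leading pipe
grepStr=${grepStr:1}
[[ -z $grepStr ]] && { echo "no check IDs in excluded_check_ids.txt" >&2; exit 1; }

echo "Checking orachk.log"
grep -iE "skipping.*(${grepStr})" orachk.log | cut -d' ' -f 8- | sort -u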


excluded_check_ids.txt – Works in 12.2.0.1.4

• orachk may already be skipping some of the excluded checks

Checks to Skip:

F9519FD7525B4F01E04313C0E50AC339

CB95A1BF5B1160ACE0431EC0E50A12EE

3B36CD93FCE82634E0530C98EB0A543F

429C554ED92E4F54E0530D98EB0AE367

AA8C83A023362C5EE040E50A1EC0146A

Checking orachk.log

Skipping check(429C554ED92E4F54E0530D98EB0AE367) on version 1 db_version= versions_to_run=

Skipping check(AA8C83A023362C5EE040E50A1EC0146A) on version 1 db_version= versions_to_run=

Skipping check(CB95A1BF5B1160ACE0431EC0E50A12EE) on version 4 db_version= versions_to_run=

Skipping check(CB95A1BF5B1160ACE0431EC0E50A12EE) on version 4 db_version = versions_to_run =
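The grep above lists what was skipped, but not which exclusions were missed. A minimal sketch that reports each excluded ID individually, assuming skipped checks are logged as Skipping check(<ID>) as in the output above; report_exclusions is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: for each ID in the exclude file, report whether orachk.log
# shows it being skipped. Assumes log lines contain "Skipping check(<ID>)".
# Usage: report_exclusions [id-file] [log-file]
report_exclusions() {
  local ids=${1:-excluded_check_ids.txt} log=${2:-orachk.log} id
  while IFS= read -r id || [[ -n $id ]]; do
    id=${id//$'\r'/}            # tolerate DOS line endings
    [[ -z $id ]] && continue    # ignore blank lines
    # -F: match the text literally (the parentheses are not regex groups)
    if grep -qiF "skipping check(${id})" "$log"; then
      echo "SKIPPED     $id"
    else
      echo "NOT SKIPPED $id"
    fi
  done < "$ids"
}
```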


ORAchk Patch Recommendations

• ORAchk updated quarterly

• Knows about current patches

• See Patch Recommendation orachk_ora12c102rac01_js122a_122816_083224.html


ORAchk – some runtime options

• Use -dbserial

• Reduce load on production

• RAT_* environment variables

• RAT_DBNAMES="db1 db2 …"

• RAT_TMPDIR='/some-tmp-dir'

• RAT_OUTPUT='/tmp/outputfiles-here'

• RAT_CRS_HOME='/crs-home-location'

• Many, many more in the documentation

▪ Some are platform and feature specific

▪ Check the documentation

• CLI options

• -p patch check only

• -output directory for output

• -localonly only local node

• -diff old-report new-report

• Many more…
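The environment variables and -dbserial can be combined in a small wrapper. A minimal sketch that uses only RAT_DBNAMES, RAT_OUTPUT and -dbserial from this slide; run_orachk_serial and the ORACHK override variable are hypothetical, and any further options are passed through unchanged.

```bash
#!/usr/bin/env bash
# Sketch only: run ORAchk for selected databases, with database checks
# run serially to reduce load. ORACHK overrides the orachk path.
# Usage: run_orachk_serial "<db1 db2 ...>" <output-dir> [extra orachk options]
run_orachk_serial() {
  local dbnames=$1 outdir=$2
  shift 2
  mkdir -p "$outdir" || return 1
  # Prefix assignments apply only to this one invocation
  RAT_DBNAMES="$dbnames" RAT_OUTPUT="$outdir" "${ORACHK:-./orachk}" -dbserial "$@"
}
```

For example: run_orachk_serial "db1 db2" /tmp/orachk-02 -a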


TFA

Trace File Analyzer


TFA – Where to find it

• TFA Collector - TFA with Database Support Tools Bundle (Doc ID 1513912.1)

• Diagnostics tool

• Gathers information from many log files

• Especially useful with RAC

▪ Checks more files than you probably know about

▪ Gathers from all nodes to one location

▪ Query tool to drill down on problem times

• Bonus – other tools installed with TFA

• ORAchk

• OS Watcher


TFA Install

Trace File Analyzer


Install or Upgrade TFA

• Installed by default as of 11.2.0.4

• Much more useful if upgraded from default

• Simple install and upgrade


[root@ora12c102rac01 TFA]# unzip p21757377_121020_Generic.zip

Archive: p21757377_121020_Generic.zip

inflating: TFA_User_Guide_12.1.2.8.4.pdf

inflating: installTFALite

inflating: README.txt

[root@ora12c102rac01 p21757377_121020_Generic]# ./installTFALite

TFA Installation Log will be written to File : /tmp/tfa_install_30974_2016_12_17-12_22_18.log

Starting TFA installation

TFA HOME : /u01/cdbrac/app/grid/product/12.1.0.2/grid_1/tfa/ora12c102rac01/tfa_home

TFA Build Version: 121284 Build Date: 201612160926

Installed Build Version: 121284 Build Date: 201611221014

TFA is already installed. Patching /u01/cdbrac/app/grid/product/12.1.0.2/grid_1/tfa/ora12c102rac01/tfa_home...


Start TFA

[root@oravm01 ~]# /etc/rc.d/init.d/init.tfa start

Starting TFA..

Waiting up to 100 seconds for TFA to be started..

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

Successfully started TFA Process..

. . . . .

TFA Started and listening for commands


TFA – Enable Automatic Diagnostic Collection

• tfactl set autodiagcollect=ON reposizeMB=5120

• Now enabled for all nodes

[root@oravm01 ~]# tfactl set autodiagcollect=ON reposizeMB=5120

Successfully set autodiagcollect=ON

.----------------------------------------------------------.

| oravm01 |

+---------------------------------------------+------------+

| Configuration Parameter | Value |

+---------------------------------------------+------------+

| TFA version | 12.1.2.4.1 |

| Automatic diagnostic collection | ON |

| Trimming of files during diagcollection | ON |

| Repository current size (MB) | 347 |

| Repository maximum size (MB) | 10240 |

| Inventory Trace level | 1 |

| Collection Trace level | 1 |

| Scan Trace level | 1 |

| Other Trace level | 1 |

| Max Size of TFA Log (MB) | 50 |

| Max Number of TFA Logs | 10 | ...


TFA – Enable Automatic Diagnostic Collection

• ‘print config’ will show ‘auto’ enabled on only one node

tfactl print config | grep 'Automatic diag'

| Automatic diagnostic collection | ON |

| Automatic diagnostic collection | OFF |

• Diagnostics collected from all nodes.

• What happens if that node crashes?

• Next slide…


Ad hoc Collection

• If automatic collection is not enabled

• tfactl diagcollect -all -from "Jan/27/2017 12:00:00" -to "Jan/27/2017 14:00:00"

• tfactl analyze -from "Jan/27/2017 12:00:00" -to "Jan/27/2017 14:00:00" > rpt.txt
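Typing the -from and -to timestamps by hand is error prone. A minimal sketch that generates them in the "MMM/DD/YYYY HH24:MI:SS" format listed under the analyze options, assuming GNU date (for -d); tfa_ts is a hypothetical helper, and LC_ALL=C keeps the month abbreviation in English.

```bash
#!/usr/bin/env bash
# Sketch only: format a date expression as "MMM/DD/YYYY HH24:MI:SS"
# for tfactl -from / -to. Assumes GNU date (-d accepts "2 hours ago" etc.).
tfa_ts() {
  LC_ALL=C date -d "${1:-now}" '+%b/%d/%Y %H:%M:%S'
}
```

For the last two hours: tfactl diagcollect -all -from "$(tfa_ts '2 hours ago')" -to "$(tfa_ts now)"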


TFA print

tfactl print -help

Usage: /u01/cdbrac/app/grid/product/12.1.0.2/grid_1/bin/tfactl print

[status|components [[component_name1] [component_name2] ...

[component_nameN]]|config|directories|hosts|actions|repository]

Prints requested details.

Options:

status Print status of TFA across all nodes in cluster

components Print the desired components in the Configuration

config Print current TFA config settings

directories Print all the directories to Inventory

hosts Print all the Hosts in the Configuration

actions Print all the Actions requested and their status

repository Print the zip file repository information

protocols Print available and restricted protocols in TFA


TFA Config

tfactl print config

.---------------------------------------------------------------.

| ora12c102rac01 |

+--------------------------------------------------+------------+

| Configuration Parameter | Value |

+--------------------------------------------------+------------+

| TFA Version | 12.1.2.8.4 |

| Java Version | 1.6 |

| Public IP Network | ON |

| Automatic diagnostic collection | OFF |

| Alert Log Scan | ON |

| Trimming of files during diagcollection | ON |

| Repository current size (MB) | 276 |

| Repository maximum size (MB) | 10240 |

| Inventory Trace level | 1 |

| Collection Trace level | 1 |

| Scan Trace level | 1 |

| Other Trace level | 1 |

| Max Size of TFA Log (MB) | 50 |

| Max Number of TFA Logs | 10 |

| Max Size of Core File (MB) | 20 |

| Max Collection Size of Core Files (MB) | 200 |

| Automatic Purging | ON |

| Minimum Age of Collections to Purge (Hours) | 12 |

| Minimum Space Free to enable Alert Log Scan (MB) | 500 |

'--------------------------------------------------+------------'


TFA Repo

# find /u01/cdbrac/app/oracle/tfa/repository -type f -name \*.dat -mmin -10 | awk -F \/ '{ print $NF }'

ora12c102rac02.jks.com_top_17.01.04.2100.dat

ora12c102rac02.jks.com_vmstat_17.01.04.2100.dat

ora12c102rac02.jks.com_iostat_17.01.04.2100.dat

ora12c102rac02.jks.com_ifconfig_17.01.04.2100.dat

ora12c102rac02.jks.com_mpstat_17.01.04.2100.dat

ora12c102rac02.jks.com_netstat_17.01.04.2100.dat

ora12c102rac02.jks.com_meminfo_17.01.04.2100.dat

ora12c102rac02.jks.com_ps_17.01.04.2100.dat


• Repo files on remote node

• Find all files modified in last 10 minutes

• So we know it is working
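The same find can serve as a simple check that data is still arriving in the repository. A minimal sketch; repo_is_fresh is hypothetical, and the 10-minute default simply mirrors the find above.

```bash
#!/usr/bin/env bash
# Sketch only: succeed if any .dat file under the repository was modified
# in the last N minutes (default 10), fail otherwise.
# Usage: repo_is_fresh <repository-dir> [minutes]
repo_is_fresh() {
  local repo=${1:?usage: repo_is_fresh <repository-dir> [minutes]} mins=${2:-10} n
  n=$(find "$repo" -type f -name '*.dat' -mmin "-${mins}" | wc -l)
  if (( n > 0 )); then
    echo "OK: ${n} .dat file(s) updated in the last ${mins} minutes"
  else
    echo "STALE: no .dat files updated in the last ${mins} minutes" >&2
    return 1
  fi
}
```

For the remote node above: repo_is_fresh /u01/cdbrac/app/oracle/tfa/repository 10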


TFA Repo Check Repo

tfactl print repository

.--------------------------------------------------------------.

| ora12c102rac01 |

+----------------------+---------------------------------------+

| Repository Parameter | Value |

+----------------------+---------------------------------------+

| Location | /u01/cdbrac/app/oracle/tfa/repository |

| Maximum Size (MB) | 10240 |

| Current Size (MB) | 278 |

| Free Size (MB) | 9962 |

| Status | OPEN |

'----------------------+---------------------------------------'

.--------------------------------------------------------------.

| Remote Node ===>>> ora12c102rac02 |

+----------------------+---------------------------------------+

| Repository Parameter | Value |

+----------------------+---------------------------------------+

| Location | /u01/cdbrac/app/oracle/tfa/repository |

| Maximum Size (MB) | 10240 |

| Current Size (MB) | 309 |

| Free Size (MB) | 9931 |

| Status | OPEN |

'----------------------+---------------------------------------'


TFA Status

tfactl print status

.-----------------------------------------------------------------------------------------------------.

| Host | Status of TFA | PID | Port | Version | Build ID | Inventory Status |

+----------------+---------------+------+------+------------+----------------------+------------------+

| ora12c102rac01 | RUNNING | 2892 | 5000 | 12.1.2.8.4 | 12128420161216092619 | COMPLETE |

| ora12c102rac02 | RUNNING | 2932 | 5000 | 12.1.2.8.4 | 12128420161216092619 | COMPLETE |

'----------------+---------------+------+------+------------+----------------------+------------------'
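The status table can be checked in a script rather than by eye. A minimal sketch that prints any host whose status is not RUNNING and returns non-zero in that case, assuming the pipe-delimited layout shown above (other TFA versions may format it differently); tfa_not_running is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: read "tfactl print status" output on stdin and report
# hosts whose "Status of TFA" column is not RUNNING.
# Assumes pipe-delimited rows: | host | status | pid | ...
tfa_not_running() {
  awk -F'|' '
    NF > 3 {
      host = $2; status = $3
      gsub(/^ +| +$/, "", host); gsub(/^ +| +$/, "", status)
      # skip the header row, report anything not RUNNING
      if (host != "Host" && status != "RUNNING") { print host ": " status; bad = 1 }
    }
    END { exit bad }'
}
```

For example: tfactl print status | tfa_not_running || echo "TFA needs attention"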


TFA Directories Scanned

• Small partial listing

tfactl print directories

see tfa-directories-scanned.txt

Last Rediscovery Run on ora12c102rac01 : Wed Jan 04 17:29:06 EST 2017

.---------------------------------------------------------------------------------------------------------------------.

| ora12c102rac01 |

+------------------------------------+--------------------------------------------------------+------------+----------+

| Trace Directory | Component | Permission | Added By |

+------------------------------------+--------------------------------------------------------+------------+----------+

| /etc/oracle | [CRS] | public | root |

| Collection policy : Exclusions | | | |

+------------------------------------+--------------------------------------------------------+------------+----------+

| /u01/cdbrac/app/grid/product/12.1. | [CFGTOOLS] | public | root |

| 0.2/grid_1/cfgtoollogs | | | |

| Collection policy : Exclusions | | | |

+------------------------------------+--------------------------------------------------------+------------+----------+

| /u01/cdbrac/app/grid/product/12.1. | [CFGTOOLS] | public | root |

| 0.2/grid_1/cfgtoollogs/cfgfw | | | |

| Collection policy : Exclusions | | | |

+------------------------------------+--------------------------------------------------------+------------+----------+

| /u01/cdbrac/app/grid/product/12.1. | [CFGTOOLS] | public | root |

| 0.2/grid_1/cfgtoollogs/crsconfig | | | |

| Collection policy : Exclusions | | | |

+------------------------------------+--------------------------------------------------------+------------+----------+


TFA – What has been deployed?

[root@ora12c102rac01 db_1]# tfactl toolstatus

.---------------------------------------------.

| External Support Tools |

+----------------+--------------+-------------+

| Host | Tool | Status |

+----------------+--------------+-------------+

| ora12c102rac01 | alertsummary | DEPLOYED |

| ora12c102rac01 | exachk | DEPLOYED |

| ora12c102rac01 | pstack | DEPLOYED |

| ora12c102rac01 | orachk | DEPLOYED |

| ora12c102rac01 | oratop | DEPLOYED |

...

| ora12c102rac01 | oswbb | RUNNING |

| ora12c102rac01 | dbperf | DEPLOYED |

| ora12c102rac01 | changes | DEPLOYED |

| ora12c102rac01 | events | DEPLOYED |

| ora12c102rac01 | ps | DEPLOYED |

| ora12c102rac01 | srdc | DEPLOYED |

'----------------+--------------+-------------'


TFA Usage

Trace File Analyzer


TFA Help

tfactl -help

Usage : /u01/cdbrac/app/grid/product/12.1.0.2/grid_1/bin/tfactl <command> [options]

<command> =

start Starts TFA

stop Stops TFA

enable Enable TFA Auto restart

disable Disable TFA Auto restart

print Print requested details

access Add or Remove or List TFA Users

purge Delete collections from TFA repository

directory Add or Remove or Modify directory in TFA

host Add or Remove host in TFA

diagcollect Collect logs from across nodes in cluster

collection Manage TFA Collections

analyze List events summary and search strings in alert logs.

set Turn ON/OFF or Modify various TFA features

toolstatus Prints the status of TFA Support Tools

run <tool> Run the desired support tool

start <tool> Starts the desired support tool

stop <tool> Stops the desired support tool

syncnodes Generate/Copy TFA Certificates

diagnosetfa Collect TFA Diagnostics

uninstall Uninstall TFA from this node


Options

[root@ora12c102rac01 ~]# tfactl analyze -help

Usage : /u01/cdbrac/app/grid/product/12.1.0.2/grid_1/bin/tfactl analyze [-search "pattern"]


Components

• db

• asm

• crs

• acfs

• os

• osw

• oswslabinfo

• oratop

• all

-type <error|warning|generic>

Dates, nodes and output

• [-since <n>[h|d]]

• [-from "MMM/DD/YYYY HH24:MI:SS"]

• [-to "MMM/DD/YYYY HH24:MI:SS"]

• [-for "MMM/DD/YYYY HH24:MI:SS"]

• [-node <all | local | n1,n2,..>]

• [-verbose]

• [-o <file>]


oratop

• Like 'top' – continuous updates

• Show video excerpts

tfactl analyze -comp oratop -database js122a -d

• Output once to stdout

tfactl analyze -comp oratop -database js122a


oratop character output

[root@ora12c102rac01 ~]# tfactl analyze -comp oratop -database js122a

Cycle 1 - oratop: Release 14.1.2 Production on Fri Jan 27 14:20:48 2017

Oracle 12c - js1 17:20:46 up: 1.8d, 2 ins, 0 sn, 0 us, 3.4G mt, 1.4% db

ID %CPU LOAD %DCU AAS ASC ASI ASW AST IOPS %FR PGA UTPS UCPS SSRT %DBT

-------------------------------------------------------------------------------

2 32 2 0 1 0 0 0 0 5 7 287M 0 7 340u 84.8

1 5 1 0 1 0 0 0 0 3 11 242M 0 2 326u 15.2

EVENT (C) TOT WAITS TIME(s) AVG_MS PCT WAIT_CLASS

-------------------------------------------------------------------------------

DB CPU 18750 54

control file parallel write 209974 6038 28.9 18 System I/O

db file parallel write 194182 3972 20.7 12 System I/O

db file sequential read 435013 3122 7.3 9 User I/O

enq: PS - contention 443333 2553 9.4 7 Other

ID SID SPID USR PROG S PGA SQLID/BLOCKER OPN E/T STA STE EVENT/*LA W/T

-------------------------------------------------------------------------------

1 60 13112 B/G QM03 D 1.8M a3vfsb1vvtr3s SEL 2.0d ACT WAI reliable 106m


oratop wide example

• tfactl analyze -comp oratop -f -i 2 -s -database js122a

▪ -f 132 column detail

▪ -i interval seconds

▪ -s SQL mode

Oracle 12c - Primary js122a 16:04:41 up: 2.8d, 2 ins, 0 sn, 0 us, 3.4G mt, 4% fra, 8 er, 4 pdb, 35.1% db

ID %CPU LOAD %DCU AAS ASC ASI ASW ASP AST UST MBPS IOPS IORL LOGR PHYR PHYW %FR  PGA TEMP UTPS UCPS SSRT DCTR DWTR %DBT
 1   43    2   25 0.6   0   0   0   1   0   0    0   28 478u    1    0    0   7 260M    0    0   50 596u   30   69 81.3
 2   58    2   38 0.1   0   0   0   0   0   0    0    7 512u    0    0    0   7 291M    0    0  142 249u   66   33 18.7

EVENT (C)                      TOTAL WAITS   TIME(s)  AVG_MS  PCT WAIT_CLASS
DB CPU                                         24549           54
control file parallel write         268246      7739    28.9   17 System I/O
db file parallel write              245596      5016    20.6   11 System I/O
db file sequential read             595676      4109     6.8    9 User I/O
enq: PS - contention                553361      3726     9.7    8 Other

ID CID USERNAME MODULE ACTION SQL_ID SQL_TEXT X ELAP CPUT IOWT WAIT EXEC ROWS BUFG DISK BH% LOAD



Explore TFA Components - db

• db component with the -type generic option

• -type generic can turn up interesting issues

[root@ora12c102rac01 ~]# tfactl analyze -comp db -since 6d

Unique generic messages for last ~6 day(s)

Occurrences percent server name generic

----------- ------- -------------------- -----

...

(Frequent Resize)

1 0.1% ora12c102rac02 Resize operation completed for file# 201, old size 5513216K, new size 5514240K

Resize operation completed for file# 201, old size 5514240K, new size 5515264K

Resize operation completed for file# 201, old size 5515264K, new size 5516288K

Resize operation completed for file# 201, old size 5516288K, new size 5517312K

...
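Each message in the (Frequent Resize) group grows file# 201 by exactly 1024K. A quick way to confirm the step size across many such messages is sketched below; a long run of identical small steps suggests the file is being extended in small increments, which may be worth checking against its autoextend settings. The regular expression assumes the message wording shown above, and the helper names are hypothetical.

```python
import re
from typing import Dict, Iterable, List

RESIZE = re.compile(
    r"Resize operation completed for file# (?P<file>\d+), "
    r"old size (?P<old>\d+)K, new size (?P<new>\d+)K"
)


def resize_steps(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Map file# -> growth of each resize, in KB, from alert log messages."""
    steps: Dict[int, List[int]] = {}
    for line in lines:
        m = RESIZE.search(line)
        if m:
            growth = int(m["new"]) - int(m["old"])
            steps.setdefault(int(m["file"]), []).append(growth)
    return steps


def summarize(steps: Dict[int, List[int]]) -> Dict[int, str]:
    """One line per file: number of resizes and the distinct step sizes seen."""
    return {f: "{} resizes, step(s) {} KB".format(len(s), sorted(set(s)))
            for f, s in steps.items()}
```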




(under-allocated memory on one node)

1 0.1% ora12c102rac02

WARNING: Heavy swapping observed on system in last 5 mins.
pct of memory swapped in [1.67%] pct of memory swapped out [1.39%].
Please make sure there is no memory pressure and the SGA and PGA are configured correctly. Look at DBRM trace file for more details.
Errors in file /u01/cdbrac/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/-MGMTDB_dbrm_8189.trc (incident=12098) (PDBNAME=CDB$ROOT):
ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /u01/cdbrac/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/incident/incdir_12098/-MGMTDB_dbrm_8189_i12098.trc



Explore TFA Components - osw

INFO: analyzing host: ora12c102rac02 (content edited to fit the screen)

Report title: OSW top logs

Report date range: last ~6 day(s)

Report (default) time zone: EST - Eastern Standard Time

Analysis started at: 28-Jan-2017 06:12:59 PM EST

Elapsed analysis time: 3 second(s).

Configuration file: /u01/cdbrac/app/grid/product/12.1.0.2/grid_1/tfa/ora12c102rac02/tfa_home/ext/tnt…

Configuration group: osw

Parameter:

Total osw rec count: 5,767, from 26-Jan-2017 06:00:22 PM EST to 28-Jan-2017 06:12:38 PM EST

OSW recs matching last ~6 day(s): 5,767, from 26-Jan-2017 06:00:22 PM EST to 28-Jan-2017 06:12:38 PM EST

statistic:             t   first highest  lowest average non zero 3rd last trend
top.cpu.util.id:       %    39.8   100.0     0.0    81.6    5,572     91.4 -100%
top.cpu.util.si:       %     2.4    47.4     0.0     0.9    2,928      0.0  287%
top.cpu.util.sy:       %    13.3    77.8     0.0     4.5    5,722      1.2    5%
top.cpu.util.us:       %     9.6    94.3     0.0     8.1    5,740      6.2  698%
top.cpu.util.wa:       %    34.9    95.4     0.0     4.9    4,391      1.2 -100%
top.loadavg.last01min:      3.07   23.89    1.02    2.06    5,473     1.40  -12%
top.mem.buffers:       k  156488  230100   46824  141571    5,762   130836  -16%
top.mem.free:          k  336160  718324   33444  224517    5,762   235172  -85%
top.mem.used:          k 4745048 5047764 4362884 4856691    5,762  4846036    6%
top.swap.cached:       k 1085320 1437792  816320 1189525    5,763  1159368    8%
top.swap.free:         k 2296228 2366048 2277208 2319930    5,763  2333156    1%
top.swap.used:         k 1832536 1851556 1762716 1808834    5,763  1795608   -2%

...
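The OSW summary is tabular text, so it can be loaded for further checks. The sketch below assumes the row layout shown here (statistic name, an optional unit of % or k, six numeric columns, then the trend). As a sanity check it adds top.swap.used and top.swap.free: in this report both the first and the 3rd-last samples sum to 4,128,764 K, which should equal the total swap on the host. The helper names are hypothetical.

```python
import re
from typing import Dict, NamedTuple


class OswStat(NamedTuple):
    unit: str
    first: float
    highest: float
    lowest: float
    average: float
    non_zero: int
    third_last: float
    trend: str


# name, optional unit (% or k), six numeric columns, then the trend percentage
ROW = re.compile(
    r"^(?P<name>top\.[\w.]+):\s+(?:(?P<unit>[%k])\s+)?"
    r"(?P<nums>(?:[\d.,]+\s+){6})(?P<trend>-?\d+%)$"
)


def parse_osw_top(text: str) -> Dict[str, OswStat]:
    """Parse the statistic rows of a TFA 'OSW top logs' report."""
    stats: Dict[str, OswStat] = {}
    for raw in text.splitlines():
        m = ROW.match(raw.strip())
        if not m:
            continue  # header, metadata or elided lines
        first, high, low, avg, non_zero, third = (
            n.replace(",", "") for n in m["nums"].split())
        stats[m["name"]] = OswStat(m["unit"] or "", float(first), float(high),
                                   float(low), float(avg), int(non_zero),
                                   float(third), m["trend"])
    return stats


def swap_total_k(stats: Dict[str, OswStat], column: str = "first") -> float:
    """Swap size implied by used + free for one column of the report."""
    return (getattr(stats["top.swap.used"], column)
            + getattr(stats["top.swap.free"], column))
```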



TFA

Incident Simulation


Simulation

• “Reports of issues with the system”

• Deliberately caused by this command

[root@ora12c102rac02 ~]# ip link set eth1 down

• crsctl stat res -t reports the node as offline

[root@ora12c102rac02 ~]# tfactl analyze -comp crs -since 1h

INFO: analyzing host: ora12c102rac02

Report title: CRS Alert Logs

Report date range: last ~1 hour(s)

Report (default) time zone: EST - Eastern Standard Time

Analysis started at: 25-Jan-2017 06:18:03 PM EST

Elapsed analysis time: 3 second(s).



Simulation

Unique error messages for last ~1 hour(s)

Occurrences percent server name error

----------- ------- -------------------- -----

1 33.3% ora12c102rac02 [OCSSD(19266)]CRS-1607: Node ora12c102rac01 is being evicted in cluster incarnation 378741272; details at (:CSSNM00007:) in /u01/cdbrac/app/oracle/diag/crs/ora12c102rac02/crs/trace/ocssd.trc.

1 33.3% ora12c102rac02 [OCSSD(19266)]CRS-1601: CSSD Reconfiguration complete. Active nodes are ora12c102rac02 .

1 33.3% ora12c102rac02 [OCSSD(19266)]CRS-1610: Network communication with node ora12c102rac01 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.370 seconds

----------- -------

3 100.0%
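When many reports or alert logs need checking, the CRS messages can be pulled out programmatically. The sketch below assumes the message wording shown in this report; the optional whitespace after the colon also lets it read the CRS-1610 line in the Bad NIC example later on, which has no space there. Function names are hypothetical.

```python
import re
from typing import List, Tuple

CRS_MSG = re.compile(r"(?P<code>CRS-\d{4}):\s*(?P<text>.*)")
MISSING = re.compile(
    r"with node (?P<node>\S+) \(\d+\) missing for (?P<pct>\d+)% of timeout "
    r"interval\. Removal of this node from cluster in (?P<secs>[\d.]+) seconds"
)


def crs_messages(lines: List[str]) -> List[Tuple[str, str]]:
    """Extract (CRS code, message text) pairs from report or alert log lines."""
    found = []
    for line in lines:
        m = CRS_MSG.search(line)
        if m:
            found.append((m["code"], m["text"].strip()))
    return found


def heartbeat_warnings(lines: List[str]) -> List[Tuple[str, int, float]]:
    """(node, percent of timeout elapsed, seconds to removal) for each CRS-1610."""
    warnings = []
    for code, text in crs_messages(lines):
        if code != "CRS-1610":
            continue
        m = MISSING.search(text)
        if m:
            warnings.append((m["node"], int(m["pct"]), float(m["secs"])))
    return warnings
```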



Investigation

• eth1 is down

hmm, eth1 is down

[root@ora12c102rac02 ~]# ip a

3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN qlen 1000

link/ether 08:00:27:03:80:e6 brd ff:ff:ff:ff:ff:ff

inet 169.254.24.84/16 brd 169.254.255.255 scope global eth1:1

• Bring interface back up and all goes back to normal

• Yes, you can do this with more manual processes, too

• TFA just reduces the time required to diagnose
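On Linux the kernel exposes each interface's operational state in /sys/class/net/<interface>/operstate, so the "is eth1 down?" check can also be scripted. A minimal sketch; the interface names are assumptions for this example (on a cluster, `oifcfg getif` shows which interface carries the private interconnect), and the sysfs root is a parameter only so the code can be exercised outside a real host.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def interface_states(names: Iterable[str],
                     sysfs_root: str = "/sys/class/net") -> Dict[str, str]:
    """Read the kernel's operstate ('up', 'down', ...) for each interface."""
    states: Dict[str, str] = {}
    for name in names:
        path = Path(sysfs_root) / name / "operstate"
        try:
            states[name] = path.read_text().strip()
        except FileNotFoundError:
            states[name] = "missing"  # no such interface on this host
    return states


def not_up(names: Iterable[str], sysfs_root: str = "/sys/class/net") -> List[str]:
    """Interfaces whose operstate is anything other than 'up'."""
    return [n for n, s in interface_states(names, sysfs_root).items() if s != "up"]
```

On the node from the simulation, `not_up(["eth0", "eth1"])` would be expected to report eth1 while the link is down.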



TFA

Real Life Examples


Bad NIC

• Found a node evicted due to a network issue

• Only this node was affected; the cause turned out to be a failed NIC

• Much faster diagnosis with TFA

tfactl analyze -from "Oct/17/2014 11:00:00" -to "Oct/17/2014 12:30:00" > tfa-rpt-20141017-1100-1230.txt

The report showed that all nodes logged the following:

CRS-1610:Network communication with node uornoldbp02 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.190 seconds

This coincides with the finding that a NIC had failed on the server.

There don't appear to be other contributing errors.
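Both real-life examples bracket the incident with -from/-to and redirect the report to a file named after the window. A small sketch that builds those arguments: the time format is copied from the commands on these slides (and `%b` assumes the default C/English locale), while `around()` with its 15-minute default is a hypothetical convenience, not a TFA feature.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Time format used with -from/-to on the slides, e.g. "Oct/17/2014 11:00:00".
# %b assumes the default C/English locale.
TFA_TIME = "%b/%d/%Y %H:%M:%S"


def analyze_window(start: datetime, end: datetime) -> Tuple[List[str], str]:
    """tfactl analyze arguments plus a report file name for one time window."""
    if end <= start:
        raise ValueError("end must be after start")
    args = ["tfactl", "analyze",
            "-from", start.strftime(TFA_TIME),
            "-to", end.strftime(TFA_TIME)]
    report = "tfa-rpt-{}-{}-{}.txt".format(
        start.strftime("%Y%m%d"), start.strftime("%H%M"), end.strftime("%H%M"))
    return args, report


def around(incident: datetime, minutes: int = 15) -> Tuple[List[str], str]:
    """Symmetric window around a known incident time (hypothetical helper)."""
    delta = timedelta(minutes=minutes)
    return analyze_window(incident - delta, incident + delta)
```

The returned list can be passed to `subprocess.run` with stdout opened on the report file, which mirrors the `>` redirection on the slide.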



TFA diagnosis – disk error

Pythian client: we used TFA to find the cause of a node eviction

3-node RAC

tfactl analyze -from "Oct/17/2014 08:45:00" -to "Oct/17/2014 09:15:00" > tfa-rpt-20141017-0845-0915.txt

NODE 3

This looks like the smoking gun:

1 2.9% rac3rd-node

WARNING: Write Failed. group:1 disk:1 AU:176116 offset:344064 size:8192

ASMB (ospid: 3398): terminating the instance due to error 15064

ERROR: unrecoverable error ORA-15188 raised in ASM I/O path; terminating process 3368

Termination issued to instance processes. Waiting for the processes to exit
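The Write Failed warning names the ASM disk group number and disk number. A sketch that extracts those fields and builds a query against V$ASM_DISK (GROUP_NUMBER, DISK_NUMBER, NAME, PATH and the status columns) to identify the device; the query is meant to be run in the ASM instance. The field names come from the message above and the helpers are hypothetical.

```python
import re
from typing import Dict, Optional

WRITE_FAILED = re.compile(
    r"Write Failed\. group:(?P<group>\d+) disk:(?P<disk>\d+) "
    r"AU:(?P<au>\d+) offset:(?P<offset>\d+) size:(?P<size>\d+)"
)


def parse_write_failure(line: str) -> Optional[Dict[str, int]]:
    """Fields of an ASM 'Write Failed' warning, or None if the line is not one."""
    m = WRITE_FAILED.search(line)
    if not m:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


def disk_lookup_sql(failure: Dict[str, int]) -> str:
    """Query to identify the failing disk; intended for the ASM instance."""
    return (
        "select group_number, disk_number, name, path, mount_status, mode_status\n"
        "from v$asm_disk\n"
        "where group_number = {group} and disk_number = {disk}"
    ).format(**failure)
```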



TFA

Other troubleshooting help


Stack Trace

• Why use pstack?

• A stack trace is often an easy way to locate matching bugs in Oracle Support

• Stack trace appears in:

• Trace files generated by errors

• System/State dumps

• Stack trace for running process

• Use pstack PID

pstack 30965

#0 0x0000003f5b40e810 in __read_nocancel () from /lib64/libpthread.so.0

#1 0x000000000cef52f0 in snttread ()

#2 0x000000000cef4785 in nttfprd ()

#3 0x000000000ced4e95 in nsbasic_brc ()

#4 0x000000000ced4c96 in nsbrecv ()

#5 0x000000000cee3a4e in nioqrc ()

#6 0x000000000cb53aad in opikndf2 ()

#7 0x0000000001baf0d2 in opitsk ()

#8 0x0000000001bb3e31 in opiino ()

#9 0x000000000cb56f5d in opiodr ()

...
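pstack output has a regular layout (frame number, address, function, optional library), so it can be reduced to a compact call path that is convenient to paste into a bug search. A minimal sketch assuming the gdb-style frames shown above; frames printed without an address are not handled, and the helper names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    level: int
    address: str
    function: str
    library: Optional[str]


FRAME = re.compile(
    r"^#(?P<level>\d+)\s+(?P<addr>0x[0-9a-f]+) in (?P<func>\S+) \([^)]*\)"
    r"(?: from (?P<lib>\S+))?"
)


def parse_pstack(text: str) -> List[Frame]:
    """Frames in the order printed (#0 is the innermost frame)."""
    frames = []
    for raw in text.splitlines():
        m = FRAME.match(raw.strip())
        if m:
            frames.append(Frame(int(m["level"]), m["addr"], m["func"], m["lib"]))
    return frames


def call_path(frames: List[Frame]) -> str:
    """Function names joined innermost first; each arrow points to the caller."""
    return " <- ".join(f.function for f in frames)
```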



Stack Trace

• Some process/program acting strangely

• Bug suspected?

• Use gv$session to get PIDs of interest.

select s.username, s.inst_id, s.sid
   , s.sql_id, s.program, p.spid spid
from gv$session s
left outer join gv$process p on s.inst_id = p.inst_id
   and p.addr = s.paddr
where s.username = 'JKSTILL'
   and s.program like 'sqlplus@poirot%'
order by username, sid

11:15:28 ora12c102rac01.jks.com - jkstill@js122a1 SQL> /

                                                                                         SRVR
USERNAME    INST    SID SQL ID         PROGRAM                                            PID
---------- ----- ------ -------------- ------------------------------------------------ -----
JKSTILL        1     48                sqlplus@poirot… (TNS V1-V3)                      30965
               1     71 gyd5fpd63tfs0  sqlplus@poirot… (TNS V1-V3)                       4777
               2     88                sqlplus@poirot… (TNS V1-V3)                      27669



Stack Trace

• Not necessary to log on to each server in a cluster

• TFA makes this simple – especially for RAC

[root@ora12c102rac01 ~]# tfactl pstack 30965 4777 27669

Output from host : ora12c102rac01

------------------------------

# pstack output for pid : 30965

#0 0x0000003f5b40e810 in __read_nocancel () from /lib64/libpthread.so.0

#1 0x000000000cef52f0 in snttread ()

#2 0x000000000cef4785 in nttfprd ()

• But it searches all nodes …



Stack Trace

[root@ora12c102rac01 ~]# tfactl pstack 30965 4777 27669

Output from host : ora12c102rac01

------------------------------

# pstack output for pid : 30965

#0 0x0000003f5b40e810 in __read_nocancel () from /lib64/libpthread.so.0

#1 0x000000000cef52f0 in snttread ()

...

# pstack output for pid : 4777

#0 0x0000003f5b40e810 in __read_nocancel () from /lib64/libpthread.so.0

#1 0x000000000cef52f0 in snttread ()

# pstack output for pid : 27669

Process 27669 not found.

Output from host : ora12c102rac02

------------------------------

# pstack output for pid : 30965

Process 30965 not found.

# pstack output for pid : 4777

Process 4777 not found.

# pstack output for pid : 27669

#0 0x0000003a4b60e740 in __read_nocancel () from /lib64/libpthread.so.0

#1 0x000000000ceb67d0 in snttread ()
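Because tfactl pstack asks every node for every PID, much of the output is "Process N not found." A sketch that reduces the combined output to the host(s) where each PID actually returned a stack; it relies only on the "Output from host", "# pstack output for pid" and "#N" frame lines shown above, and the function name is hypothetical.

```python
import re
from typing import Dict, List, Optional

HOST_LINE = re.compile(r"^Output from host : (\S+)")
PID_LINE = re.compile(r"^# pstack output for pid : (\d+)")
FRAME_LINE = re.compile(r"^#\d+\s")


def locate_pids(text: str) -> Dict[int, List[str]]:
    """Map each requested PID to the host(s) where pstack returned frames."""
    found: Dict[int, List[str]] = {}
    host: Optional[str] = None
    pid: Optional[int] = None
    for raw in text.splitlines():
        line = raw.strip()
        m = HOST_LINE.match(line)
        if m:
            host, pid = m.group(1), None
            continue
        m = PID_LINE.match(line)
        if m:
            pid = int(m.group(1))
            found.setdefault(pid, [])
            continue
        # A stack frame means the PID exists on the current host;
        # "Process N not found." lines simply add nothing.
        if FRAME_LINE.match(line) and host and pid is not None:
            if host not in found[pid]:
                found[pid].append(host)
    return found
```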



grep

• Search all alert logs

• Find all occurrences of ORA-00600 in all alert logs


[root@ora12c102rac01 ~]# tfactl grep -E 'ORA-[0]{0,2}600' alert_

Output from host : ora12c102rac01

------------------------------

Searching 'ORA-[0]{0,2}600' in alert_

Searching /u01/cdbrac/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Searching /u01/cdbrac/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/alert_-MGMTDB.log

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Searching /u01/cdbrac/app/oracle/diag/rdbms/js122a/js122a1/trace/alert_js122a1.log

+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

ORA-00600: internal error code, arguments: [25027], [3], [3], [3], [1], [], [], [], [], [], [], []

Output from host : ora12c102rac02

------------------------------

Similar output for ora12c102rac02
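The same search can be run over local copies of alert logs. Python's `re` module accepts the slide's pattern as written; the sketch spells `[0]{0,2}` as the equivalent `0{0,2}` and adds a trailing `\b` (an addition, not part of the slide) so that a longer number that merely starts with 600 is not reported. The function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Same idea as the slide's ORA-[0]{0,2}600: matches ORA-600, ORA-0600 and
# ORA-00600. The trailing \b is an addition so e.g. ORA-6000 is not reported.
ORA600 = re.compile(r"ORA-0{0,2}600\b")


def grep_ora600(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """(line number, line) for every line that mentions an ORA-600 error."""
    return [(n, line.rstrip("\n"))
            for n, line in enumerate(lines, start=1)
            if ORA600.search(line)]
```

An open file object can be passed directly, for example one of the alert logs listed by `tfactl ls alert`.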


summary

[root@ora12c102rac01 ~]# tfactl summary

Output from host : ora12c102rac01

------------------------------

=====

Nodes

=====

ora12c102rac01

ora12c102rac02

=====

Homes

=====

.----------------------------------------------------------------------------------------------------------.

| Home | Type| Version | Database| Instance | Patches |

+----------------------------------------------+-----+------------+---------+----------+-------------------+

| /u01/cdbrac/app/grid/product/12.1.0.2/grid_1 | GI | 12.1.0.2.0 | | | |

| /u01/cdbrac/app/oracle/product/12.1.0.2/db_1 | DB | 12.1.0.2.0 | js122a | js122a1 | 23054327,23054246 |

'----------------------------------------------+-----+------------+---------+----------+-------------------'

Other nodes similar
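Since tfactl summary prints this homes table for every node, comparing patch lists between nodes is a natural follow-up. A sketch that parses the pipe-delimited table shown above, reading column names from its header row; the helper names are hypothetical.

```python
from typing import Dict, List, Set


def parse_homes(text: str) -> List[Dict[str, str]]:
    """Rows of the pipe-delimited Homes table printed by tfactl summary."""
    rows = [ln.strip() for ln in text.splitlines() if ln.strip().startswith("|")]
    if not rows:
        return []
    header = [cell.strip() for cell in rows[0].strip("|").split("|")]
    return [dict(zip(header, (cell.strip() for cell in row.strip("|").split("|"))))
            for row in rows[1:]]


def patches_by_home(homes: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Home path -> set of patch numbers listed for it (empty set if none)."""
    return {h["Home"]: {p.strip() for p in h.get("Patches", "").split(",") if p.strip()}
            for h in homes}
```

Running `patches_by_home` on each node's section and comparing the sets would show a home that is missing a patch on one node.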




ls

• Find all alert logs

[root@ora12c102rac01 ~]# tfactl ls alert

Output from host : ora12c102rac01

------------------------------

/u01/cdbrac/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log

/u01/cdbrac/app/oracle/diag/crs/ora12c102rac01/crs/trace/alert.log

/u01/cdbrac/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/alert_-MGMTDB.log

/u01/cdbrac/app/oracle/diag/rdbms/js122a/js122a1/trace/alert_js122a1.log

Output from host : ora12c102rac02

------------------------------

/u01/cdbrac/app/oracle/diag/asm/+asm/+ASM2/trace/alert_+ASM2.log

/u01/cdbrac/app/oracle/diag/crs/ora12c102rac02/crs/trace/alert.log

/u01/cdbrac/app/oracle/diag/rdbms/_mgmtdb/-MGMTDB/trace/alert_-MGMTDB.log

/u01/cdbrac/app/oracle/diag/rdbms/js122a/js122a2/trace/alert_js122a2.log
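The ls output can be grouped per host and per ADR component (the directory after diag/), for example to feed a script that should only look at the rdbms alert logs. A sketch assuming the "Output from host" layout shown above; the function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional


def alert_logs_by_host(text: str) -> Dict[str, Dict[str, List[str]]]:
    """host -> ADR component (asm, crs, rdbms, ...) -> alert log paths."""
    result: Dict[str, Dict[str, List[str]]] = {}
    host: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Output from host :"):
            host = line.split(":", 1)[1].strip()
            result[host] = {}
        elif host is not None and line.startswith("/"):
            parts = PurePosixPath(line).parts
            if "diag" in parts and parts.index("diag") + 1 < len(parts):
                component = parts[parts.index("diag") + 1]
            else:
                component = "other"
            result[host].setdefault(component, []).append(line)
    return result
```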



param

• Listener parameters for all nodes

[root@ora12c102rac01 ~]# tfactl param .*_listener

Output from host : ora12c102rac01

------------------------------

JS122A.ora12c102rac01.jks.com.JS122A.js122a1.local_listener = (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.233)(PORT=1521))

JS122A.ora12c102rac02.jks.com.JS122A.js122a2.local_listener = (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.234)(PORT=1521))

JS122A.ora12c102rac01.jks.com.JS122A.js122a1.remote_listener = ora12c102rac-scan:1521

JS122A.ora12c102rac02.jks.com.JS122A.js122a2.remote_listener = ora12c102rac-scan:1521

Output from host : ora12c102rac02

------------------------------

JS122A.ora12c102rac01.jks.com.JS122A.js122a1.local_listener = (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.233)(PORT=1521))

JS122A.ora12c102rac02.jks.com.JS122A.js122a2.local_listener = (ADDRESS=(PROTOCOL=TCP)(HOST=192.168.1.234)(PORT=1521))

JS122A.ora12c102rac01.jks.com.JS122A.js122a1.remote_listener = ora12c102rac-scan:1521

JS122A.ora12c102rac02.jks.com.JS122A.js122a2.remote_listener = ora12c102rac-scan:1521
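Because the same parameters are reported from every node, a mismatch between hosts is easy to flag. A sketch that parses the "key = value" lines per host and lists keys whose values differ, or that are missing on some host; it assumes the output layout shown above and the helper names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_params(text: str) -> Dict[str, Dict[str, str]]:
    """host -> {parameter key: value} from tfactl param output."""
    result: Dict[str, Dict[str, str]] = {}
    host: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Output from host :"):
            host = line.split(":", 1)[1].strip()
            result[host] = {}
        elif host is not None and " = " in line:
            key, value = line.split(" = ", 1)  # values may contain '=' themselves
            result[host][key.strip()] = value.strip()
    return result


def inconsistent_keys(by_host: Dict[str, Dict[str, str]]) -> List[str]:
    """Keys whose value differs between hosts or is missing on some host."""
    all_keys = set()
    for params in by_host.values():
        all_keys.update(params)
    return sorted(key for key in all_keys
                  if len({params.get(key) for params in by_host.values()}) > 1)
```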



Even more

• procwatcher

• sqlt

• alertsummary

• vi

• tail

• dbglevel

• history

• RDA

• changes



TFA

Miscellaneous


TFA Data Masking

• Data is masked during collection

• Create tfa_home/resources/mask_strings.xml

<mask_strings>

<mask_string>

<original>WidgetNode1</original>

<replacement>Node1</replacement>

</mask_string>

<mask_string>

<original>192.168.5.1</original>

<replacement>Node1-IP</replacement>

</mask_string>

...

</mask_strings>
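When many host names and addresses need masking, the file can be generated from a list of pairs. This sketch uses `xml.etree.ElementTree` to produce the element layout shown above (special characters are escaped automatically); writing the result to tfa_home/resources/mask_strings.xml is left to the caller, and the output is not pretty-printed. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def build_mask_strings(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (original, replacement) pairs in the mask_strings layout above."""
    root = ET.Element("mask_strings")
    for original, replacement in pairs:
        entry = ET.SubElement(root, "mask_string")
        ET.SubElement(entry, "original").text = original
        ET.SubElement(entry, "replacement").text = replacement
    return ET.tostring(root, encoding="unicode")
```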



SR Data Collection

• Error Types

• ora600

• ora7445

• ora700

• ora4031

• ora4030

• ora27300

• ora27301

• ora27302

• Performance

• dbperf

• Internal Errors

• internalerror

• Does not work as root

[root@ora12c102rac01 ~]# tfactl diagcollect -srdc -help

SRDC diagostic collections must be run as an oracle privileged user - not root



SR Data Collection Examples

[oracle@ora12c102rac01 ~]$ tfactl diagcollect -srdc ora700

Enter the time of the ORA-00700 [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] : 2017-01-29

Enter the Database Name [<RETURN>=ALL] : js122a

No events matching the timestamp Jan/28/2017 18:00:00-Jan/29/2017 06:00:00.

The timestamp must be between Dec/28/2016 11:14:08 and Jan/25/2017 16:58:35.

...

2017/01/30 18:31:10 EST : Completed collection of zip files.

...

Logs are being collected to:

/u01/cdbrac/app/oracle/tfa/repository/srdc_ora700_collection_Mon_Jan_30_15_30_16_PST_2017_node_local

/u01/cdbrac/app/oracle/tfa/repository/srdc_ora700_collection_Mon_Jan_30_15_30_16_PST_2017_node_local/ora12c102rac01.tfa_srdc_ora700_Mon_Jan_30_15_30_16_PST_2017.zip
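In the first attempt above no events matched because the date entered (2017-01-29) lies outside the range TFA reported as valid. A sketch that checks an answer against that range before rerunning: it parses the range from the message text and accepts either a full timestamp or a date only. Treating a date-only answer as midnight is an assumption of this sketch, not TFA's own window logic (TFA searched Jan/28 18:00 to Jan/29 06:00 for that answer).

```python
import re
from datetime import datetime
from typing import Tuple

SRDC_INPUT = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d")  # full answer or date only
TFA_RANGE = "%b/%d/%Y %H:%M:%S"                  # e.g. Dec/28/2016 11:14:08
STAMP = r"[A-Z][a-z]{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}"
RANGE_MSG = re.compile(r"between ({0}) and ({0})".format(STAMP))


def valid_range(message: str) -> Tuple[datetime, datetime]:
    """Parse the window from 'The timestamp must be between X and Y.'"""
    m = RANGE_MSG.search(message)
    if not m:
        raise ValueError("no timestamp range in message")
    start, end = (datetime.strptime(s, TFA_RANGE) for s in m.groups())
    return start, end


def parse_answer(answer: str) -> datetime:
    """Accept the prompt's YYYY-MM-DD HH24:MI:SS format, or a bare date."""
    for fmt in SRDC_INPUT:
        try:
            return datetime.strptime(answer.strip(), fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised time: {!r}".format(answer))


def answer_in_range(answer: str, message: str) -> bool:
    start, end = valid_range(message)
    return start <= parse_answer(answer) <= end
```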



What’s left?

Start Using Them!




Resources

• Oracle Blog – OraCHK Overview

• https://community.oracle.com/community/support/support-blogs/database-support-blog/blog/2015/11/09/how-to-use-orachk-to-reduce-your-risk

• Oracle Blog – TFA Overview

• https://community.oracle.com/community/support/support-blogs/database-support-blog/blog/2016/12/12/oracle-trace-file-analyzer-tfa-an-overview-guide

• NFS rpc_wait issues

• http://blog.tanelpoder.com/2013/02/21/peeking-into-linux-kernel-land-using-proc-filesystem-for-quickndirty-troubleshooting/

• NFS Troubleshooting

• https://wiki.archlinux.org/index.php/NFS/Troubleshooting

• How to Use OraCHK to Reduce Your Risk

• Oracle Trace File Analyzer (TFA) - an Overview Guide



THANK YOU



ABOUT PYTHIAN

Pythian’s 400+ IT professionals help companies adopt and manage disruptive technologies to better compete



EXPERIENCED / GLOBAL / EXPERTS

• 11,800 systems currently managed by Pythian

• 400 Pythian experts in 35 countries

• 2 millennia of experience gathered and shared over 19 years



TECHNICAL EXPERTISE


• Infrastructure: Transforming and managing the IT infrastructure that supports the business

• DevOps: Providing critical velocity in software deployment by adopting DevOps practices

• Cloud: Using the disruptive nature of cloud for accelerated, cost-effective growth

• Databases: Ensuring databases are reliable, secure, available and continuously optimized

• Big Data: Harnessing the transformative power of data on a massive scale

• Advanced Analytics: Mining data for insights & business transformation using data science