real life rac performance tuning - nyoug

52
Real Life RAC Performance Real Life RAC Performance Tuning Tuning Arup Nanda Arup Nanda Lead DBA Lead DBA Starwood Hotels Starwood Hotels

Upload: others

Post on 09-Feb-2022

13 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Real Life RAC Performance Tuning - Nyoug

Real Life RAC Performance Real Life RAC Performance TuningTuning

Arup NandaArup NandaLead DBALead DBA

Starwood HotelsStarwood Hotels

Page 2: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Who am IWho am IOracle DBA for 13 years and countingOracle DBA for 13 years and countingWorking on OPS from 1999Working on OPS from 1999Implemented and supported 10g RAC in Implemented and supported 10g RAC in 83 sites since 200483 sites since 2004Speak at conferences, write papers, booksSpeak at conferences, write papers, books

Page 3: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Why This SessionWhy This SessionI get emails like this:I get emails like this:

We are facing performance issues in RAC. We are facing performance issues in RAC. What should I do next?What should I do next?

Real Life AdviceReal Life AdviceCommon Issues (with Wait Events)Common Issues (with Wait Events)Dispelling MythsDispelling MythsFormulate a Plan of AttackFormulate a Plan of AttackReal Life Case StudyReal Life Case Study

proligence.com/downloads.htmlproligence.com/downloads.html

Page 4: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Our RAC ImplementationOur RAC ImplementationOracle 10g RAC in March 2004Oracle 10g RAC in March 2004

Itanium Platform running HP/UXItanium Platform running HP/UXOracle 10.1.0.2Oracle 10.1.0.2

Result: Failed Result: Failed Second Attempt: Dec 2004 Second Attempt: Dec 2004 –– 10.1.0.310.1.0.3Result: Failed Again Result: Failed Again Third Attempt: March 2005 Third Attempt: March 2005 –– 10.1.0.410.1.0.4Result: Success! Result: Success! ☺☺

Page 5: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

ChallengesChallengesTechnologyTechnology

Lone rangerLone rangerA lot of A lot of ““mysterymystery”” and disconnected and disconnected ““factsfacts””!!

PeoplePeopleBuilding a team that could not only deliver; but Building a team that could not only deliver; but also sustain the delivered partsalso sustain the delivered partsEach day we learned something newEach day we learned something new

In todayIn today’’s session: real performance s session: real performance issues we faced and how we resolved issues we faced and how we resolved them, along with wait events.them, along with wait events.

Page 6: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Why Why ““RACRAC”” Performance?Performance?All tuning concepts in single instance All tuning concepts in single instance applied to RAC as wellapplied to RAC as wellRAC has other complexitiesRAC has other complexities

More than 1 buffer cacheMore than 1 buffer cacheMultiple caches Multiple caches –– library cache, row cachelibrary cache, row cacheInterconnectInterconnectPingingPingingGlobal LockingGlobal Locking

Page 7: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Why RAC Why RAC PerfPerf TuningTuningWe want to make sure we identify the right We want to make sure we identify the right problem and go after it problem and go after it ……. not just . not just aa problemproblem

Page 8: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Switch

VIPServiceListenerInstance

ASMClusterware

Op Sys

VIPServiceListenerInstance

ASMClusterware

Op Sys

Public Interface

Cache

Cache Fusion

OCR Voting

SwitchInterconnect

Storage

Node1 Node2

Lock Manager

Page 9: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Areas of Concern in RACAreas of Concern in RACMore than 1 buffer cacheMore than 1 buffer cacheMultiple caches Multiple caches –– library cache, row cachelibrary cache, row cacheInterconnectInterconnectGlobal LockingGlobal Locking

Page 10: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Cache IssuesCache IssuesTwo Caches, requires synchronizationTwo Caches, requires synchronizationWhat that means:What that means:

A changed block in one instance, when A changed block in one instance, when requested by another, should be sent across requested by another, should be sent across via a via a ““bridgebridge””This bridge is the InterconnectThis bridge is the Interconnect

Page 11: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Interconnect PerformanceInterconnect PerformanceInterconnect must be on a private LANInterconnect must be on a private LANPort aggregation to increase throughputPort aggregation to increase throughput

APA on HPUXAPA on HPUXIf using Gigabit over Ethernet, use Jumbo If using Gigabit over Ethernet, use Jumbo FramesFrames

Page 12: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Checking Interconnect UsedChecking Interconnect UsedIdentify the interconnect usedIdentify the interconnect used

$ $ oifcfgoifcfg getifgetiflan902 172.17.1.0 global lan902 172.17.1.0 global cluster_interconnectcluster_interconnect

lan901 10.28.188.0 global publiclan901 10.28.188.0 global public

Is lan902 the bonded interface? If not, Is lan902 the bonded interface? If not, then set itthen set it

$ $ oifcfgoifcfg setifsetif ……

Page 13: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Pop QuizPop QuizIf I have a very fast interconnect, I can If I have a very fast interconnect, I can perform the same work in multiple node perform the same work in multiple node RAC as a single server with faster CPUs. RAC as a single server with faster CPUs. True/False?True/False?Since cache fusion is now writeSince cache fusion is now write--write, a write, a fast interconnect will compensate for a fast interconnect will compensate for a slower IO subsystem. True/False?slower IO subsystem. True/False?

Page 14: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Cache Coherence TimesCache Coherence TimesThe time is a sum of time for:The time is a sum of time for:

Finding the block in the cacheFinding the block in the cacheIdentifying the masterIdentifying the masterGet the block in the interconnectGet the block in the interconnectTransfer speed of the interconnectTransfer speed of the interconnectLatency of the interconnectLatency of the interconnectReceive the block by the remote instanceReceive the block by the remote instanceCreate the consistent image for the userCreate the consistent image for the user

CPU

CPU

Interconnect

Page 15: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

So it all boils down to:So it all boils down to:Block Access CostBlock Access Cost

more blocks more blocks --> more the time> more the timeParallel QueryParallel Query

Lock Management CostLock Management CostMore coordination More coordination --> more time> more timeImplicit Cache Checks Implicit Cache Checks –– Sequence NumbersSequence Numbers

Interconnect CostInterconnect CostLatencyLatencySpeedSpeedmore data to transfer more data to transfer --> more the time> more the time

Page 16: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Hard LessonsHard LessonsIn RAC, problem symptoms may not In RAC, problem symptoms may not indicate the correct problem!indicate the correct problem!Example:Example:

When the CPU is too busy to receive or send When the CPU is too busy to receive or send packets via UDP, the packets fails and the packets via UDP, the packets fails and the ClusterwareClusterware thinks the node is down and thinks the node is down and evicts it.evicts it.

Page 17: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

OS TroubleshootingOS TroubleshootingOS utilities to troubleshoot CPU issuesOS utilities to troubleshoot CPU issues

toptopglanceglance

OS Utilities to troubleshoot process OS Utilities to troubleshoot process issues:issues:

trusstrussstracestracedbxdbxpstackpstack

Page 18: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Reducing LatencyReducing LatencyA factor of technologyA factor of technologyTCP is the most latentTCP is the most latentUDP is better (over Ethernet)UDP is better (over Ethernet)Proprietary protocols are usually betterProprietary protocols are usually better

HyperFabricHyperFabric by HPby HPReliable Datagram (RDP) Reliable Datagram (RDP) Direct Memory ChannelDirect Memory Channel

InfinibandInfinibandUDP over UDP over InfinibandInfinibandRDP over RDP over InfinibandInfiniband

Page 19: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Start with AWRStart with AWR

Page 20: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

gcgc current|crcurrent|cr grant 2grant 2--wayway

Instance 1

Instance 2

Session

Database

LMS LGWR

Log BufferLog Buffer

LMS

gc current block requestgc current grant 2-way

Page 21: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

gcgc current|crcurrent|cr block 2block 2--wayway

Instance 1

Instance 2

Session

Database

LMS LGWR

Log BufferLog Buffer

LMS

gc current block request log file syncgc current block 2-way

Page 22: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

gcgc current|crcurrent|cr block 3block 3--wayway

Instance 1

Instance 2

Instance 3

Session

Requestor

Master

Holder

gc current block 3-way

Page 23: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

RAC related StatsRAC related Stats

Page 24: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

RAC Stats contd.RAC Stats contd.

Page 25: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Page 26: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Other GC Block WaitsOther GC Block Waitsgcgc current/current/crcr block lostblock lost

Lost blocks due to Interconnect or CPULost blocks due to Interconnect or CPUgcgc curent/crcurent/cr block busyblock busy

The consistent read request was delayed, The consistent read request was delayed, most likely an I/O bottleneckmost likely an I/O bottleneck

gcgc current/current/crcr block congestedblock congestedLong run queues and/or paging due to Long run queues and/or paging due to memory deficiency.memory deficiency.

Page 27: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Hung or Slow?Hung or Slow?Check Check V$SESSIONV$SESSION for for WAIT_TIMEWAIT_TIME

If 0, then itIf 0, then it’’s not waiting; its not waiting; it’’s hungs hungWhen hung:When hung:

Take a Take a systemstatesystemstate dump from all nodesdump from all nodesWait some timeWait some timeTake another Take another systemstatesystemstate dumpdumpCheck change in values. If unchanged, then Check change in values. If unchanged, then system is hungsystem is hung

Page 28: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Chart a PlanChart a PlanRule out the obviousRule out the obviousStart with AWR ReportStart with AWR ReportStart with TopStart with Top--5 Waits5 WaitsSee if they have any significant waitsSee if they have any significant waits…… especially RAC relatedespecially RAC relatedGo on to RAC StatisticsGo on to RAC StatisticsBase your solution based on the wait Base your solution based on the wait eventevent

Page 29: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Rule out the obviousRule out the obviousIs interconnect private?Is interconnect private?Is interconnect on UDP?Is interconnect on UDP?Do you see high CPU?Do you see high CPU?Do you see a lot of IO bottleneck?Do you see a lot of IO bottleneck?How about memory?How about memory?Are the apps spread over evenly?Are the apps spread over evenly?Do you see lost blocks?Do you see lost blocks?

Page 30: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Make Simple FixesMake Simple FixesStrongly consider RAID 0+1Strongly consider RAID 0+1Highest possible number of I/O pathsHighest possible number of I/O pathsUse fastest interconnect possibleUse fastest interconnect possibleUse private collision free domain for I/CUse private collision free domain for I/CCache and NOORDER sequencesCache and NOORDER sequences

Page 31: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Enterprise ManagerEnterprise Manager

Page 32: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Buffer BusyBuffer BusyCauseCause

Instance wants to bring something from disk Instance wants to bring something from disk to the buffer cacheto the buffer cacheDelay, due to space not availableDelay, due to space not availableDelay, Delay, bb’’cozcoz the source buffer is not readythe source buffer is not readyDelay, I/O is slowDelay, I/O is slowDelay, Delay, bb’’cozcoz redo log is being flushedredo log is being flushed

In summaryIn summaryLog buffer flush Log buffer flush --> > gcgc buffer busybuffer busy

Page 33: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Parallel QueryParallel QueryOne major issue in RAC is parallel query One major issue in RAC is parallel query that goes across many nodesthat goes across many nodes

Instance 1 Instance 2QC

Slave Slave Slave Slave

ViaInterconnect

Page 34: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Restricting PQRestricting PQDefine Instance GroupsDefine Instance GroupsSpecify in Specify in init.orainit.oraprodb1.instance_groups='pqgroup1'prodb1.instance_groups='pqgroup1'

prodb2.instance_groups='pqgroup2' prodb2.instance_groups='pqgroup2'

Specify Instance Groups in SessionSpecify Instance Groups in SessionSQL> alter session set SQL> alter session set parallel_instance_groupparallel_instance_group = = 'pqgroup1'; 'pqgroup1';

Page 35: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Forcing PQ on both NodesForcing PQ on both NodesDefine a common Instance GroupDefine a common Instance Groupprodb1.instance_groups='pqgroup1prodb1.instance_groups='pqgroup1‘‘

prodb1.instance_groups=prodb1.instance_groups=‘‘pq2nodes'pq2nodes'

prodb2.instance_groups='pqgroup2'prodb2.instance_groups='pqgroup2'

prodb2.instance_groups='pq2nodes' prodb2.instance_groups='pq2nodes'

Specify Instance Groups in SessionSpecify Instance Groups in SessionSQL> alter session set SQL> alter session set parallel_instance_groupparallel_instance_group = = 'pq2nodes'; 'pq2nodes';

Page 36: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Vital Cache Fusion ViewsVital Cache Fusion Viewsgv$cache_transfergv$cache_transfer:: Monitor blocks Monitor blocks transferred by objecttransferred by objectgv$class_cache_transfergv$class_cache_transfer:: Monitor Monitor block transfer by classblock transfer by classgv$file_cache_transfergv$file_cache_transfer:: Monitor Monitor the blocks transferred per file the blocks transferred per file gv$temp_cache_transfergv$temp_cache_transfer:: Monitor Monitor the transfer of temporary tablespace the transfer of temporary tablespace blocksblocks

Page 37: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

““HotHot”” TablesTablesTables, e.g. Rate PlansTables, e.g. Rate Plans

SmallSmallCompact blocksCompact blocksHigh updatesHigh updatesHigh readsHigh reads

SymptomsSymptomsgcgc buffer busy waitsbuffer busy waits

SolutionSolutionLess rows per blockLess rows per blockHigh PCTFREE, INITRANS, High PCTFREE, INITRANS, ALTER TABLE ALTER TABLE …… MINIMIZE MINIMIZE RECORDS_PER_BLOCKRECORDS_PER_BLOCK

Page 38: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Hot SequencesHot SequencesSymptoms:Symptoms:

High waits on Sequence Number latchHigh waits on Sequence Number latchHigh waits on SEQ$ tableHigh waits on SEQ$ table

Solution:Solution:Increase the cacheIncrease the cacheMake it NOORDERMake it NOORDER

Especially AUDSESS$ sequence in SYS, Especially AUDSESS$ sequence in SYS, used in Auditingused in Auditing

Page 39: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Read Only? Say So.Read Only? Say So.Reading table data from other instances Reading table data from other instances create create ““gcgc **”” contentionscontentionsSuggestion:Suggestion:

Move Read Only tables to a single tablespaceMove Read Only tables to a single tablespaceMake this tablespace Read OnlyMake this tablespace Read OnlySQL> alter tablespace ROD read SQL> alter tablespace ROD read only;only;

Page 40: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

PartitioningPartitioningPartitioning creates several segments for Partitioning creates several segments for the same table (or index) the same table (or index) => more resources => more resources => less contention=> less contention

Page 41: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Monotonically Increasing IndexMonotonically Increasing IndexProblem:Problem:

““Reservation IDReservation ID””, a sequence generated key, a sequence generated keyIndex is heavy on one sideIndex is heavy on one side

SymptomsSymptomsBuffer busy waitsBuffer busy waitsIndex block Index block spiltsspilts

Solutions:Solutions:Reverse key indexesReverse key indexesHash partitioned index (even if the table is not Hash partitioned index (even if the table is not partitioned) 10gR2partitioned) 10gR2

Page 42: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Library CacheLibrary CacheIn RAC, Library Cache is globalIn RAC, Library Cache is globalSo, parsing cost is worse than nonSo, parsing cost is worse than non--RACRACSolutions:Solutions:

Minimize table alters, drops, creates, Minimize table alters, drops, creates, truncatestruncatesUse PL/SQL stored programs, not unnamed Use PL/SQL stored programs, not unnamed blocksblocks

Page 43: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Log FilesLog FilesIn 10g R2, the log files are in a single location:In 10g R2, the log files are in a single location:$CRS_HOME/log/<Host>/$CRS_HOME/log/<Host>/……

racgracg

crsdcrsd

cssdcssd

evmdevmd

clientclient

cssd/oclsmoncssd/oclsmon

$ORACLE_HOME/$ORACLE_HOME/racgracg/dump/dump

Page 44: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Case StudyCase Study

Page 45: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

DiagnosisDiagnosisifconfigifconfig --aa shows no congestion or shows no congestion or dropped packetsdropped packetsTop shows 1% idle time on node 2Top shows 1% idle time on node 2Top processesTop processes

LMS and LMDLMS and LMDAnd, several And, several NetbackupNetbackup processesprocesses

Page 46: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Further DiagnosisFurther DiagnosisSQL:SQL:select * from select * from v$instance_cache_transferv$instance_cache_transferwhere class = 'data block'where class = 'data block'and instance = 1;and instance = 1;

Output:Output:INSTANCE CLASS CR_BLOCK CR_BUSYINSTANCE CLASS CR_BLOCK CR_BUSY

-------------------- ------------------------------------ -------------------- --------------------CR_CONGESTED CURRENT_BLOCK CURRENT_BUSY CURRENT_CONGESTEDCR_CONGESTED CURRENT_BLOCK CURRENT_BUSY CURRENT_CONGESTED------------------------ -------------------------- ------------------------ ----------------------------------

1 data block 162478682 50971491 data block 162478682 5097149477721 347917908 2950144 16320267477721 347917908 2950144 16320267

After sometime:After sometime:INSTANCE CLASS CR_BLOCK CR_BUSYINSTANCE CLASS CR_BLOCK CR_BUSY-------------------- ------------------------------------ -------------------- --------------------CR_CONGESTED CURRENT_BLOCK CURRENT_BUSY CURRENT_CONGESTEDCR_CONGESTED CURRENT_BLOCK CURRENT_BUSY CURRENT_CONGESTED------------------------ -------------------------- ------------------------ ----------------------------------

1 data block 162480580 50971851 data block 162480580 5097185477722 347923719 2950376 16320269477722 347923719 2950376 16320269

See increases

Page 47: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Diagnosis:Diagnosis:CPU starvation by LMS/D processes caused CPU starvation by LMS/D processes caused GC waits.GC waits.

Solution:Solution:Killed the Killed the NetbackupNetbackup processesprocessesLMD and LMS got the CPULMD and LMS got the CPU

Page 48: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Increasing Interconnect SpeedIncreasing Interconnect SpeedFaster HardwareFaster Hardware

Gigabit Ethernet; not FastGigabit Ethernet; not FastInfinibandInfiniband, even if IP over IB, even if IP over IB

NIC settingsNIC settingsDuplex ModeDuplex ModeHighest Top Bit RateHighest Top Bit Rate

TCP SettingsTCP SettingsFlow Control SettingsFlow Control SettingsNetwork Interrupts for CPUNetwork Interrupts for CPUSocket Receive BufferSocket Receive Buffer

LAN PlanningLAN PlanningPrivate LANsPrivate LANsCollision DomainsCollision Domains

Page 49: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

High Speed InterconnectsHigh Speed InterconnectsOracle will support RDS over Oracle will support RDS over InfinibandInfinibandhttp://oss.oracle.com/projects/rds/http://oss.oracle.com/projects/rds/On 10 Gig Ethernet as wellOn 10 Gig Ethernet as well

Page 50: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

In summary: PlanningIn summary: PlanningAdequate CPU, Network, MemoryAdequate CPU, Network, MemorySequences Sequences –– cache, cache, noordernoorderTablespaces read onlyTablespaces read onlyUnUn--compact small hot tablescompact small hot tablesKeep undo and redo on fastest disksKeep undo and redo on fastest disksAvoid full table scans of large tablesAvoid full table scans of large tablesAvoid Avoid DDLsDDLs and unnamed PL/SQL blocksand unnamed PL/SQL blocks

Page 51: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

In summary: DiagnosisIn summary: DiagnosisStart with AWRStart with AWRIdentify symptoms and assign causesIdentify symptoms and assign causesDonDon’’t get fooled by t get fooled by ““gcgc”” waits as waits as interconnect issuesinterconnect issuesFind the correlation between Find the correlation between ““droppeddropped””packets in network, CPU issues from packets in network, CPU issues from sarsarand and ““gcgc buffer lostbuffer lost”” in in sysstatsysstat reports.reports.

Page 52: Real Life RAC Performance Tuning - Nyoug

©© Arup NandaArup Nanda

Thank You!Thank You!Download from:Download from:

proligence.com/downloads.htmlproligence.com/downloads.html