exadata - experiences - nyougnyoug.org/wp-content/uploads/2016/04/exadata_experiences.pdf · –...

Exadata - Experiences

If we had to do it again...

Compute Nodes crashing - NetBackup Backups on OS

• Compute node servers started crashing left & right – in the middle of a business day

• Crash dump analysis showed NBU software interacting with the MegaRAID driver for OS backup, causing a kernel panic

Compute Nodes crashing - NetBackup Backups on OS (Contd...)

• No vendor finger-pointing! – Oracle developed a new MegaRAID driver rpm after they reproduced the issue in-house

• Oracle also advised to go for native LVM snapshot backups over to the NFS mounts

WBFC & NORMAL redundancy - Water and Oil

• WBFC = Write Back Flash Cache
• It works like this:

Source: Enkitec / Expert Oracle Exadata

WBFC & NORMAL redundancy - Water and Oil (Contd...)

• There are 12 disks and 16 FMODs in each cell server

• The DB/ASM layer doesn’t know about caching in the I/O subsystem

• The I/O subsystem is responsible for acknowledging write completion

• Depending on the ASM redundancy level, ASM writes to either 2 or 3 disks

• If we are absorbing those writes in WBFC, then the writes go to 2 FMODs in different failgroups

WBFC & NORMAL redundancy - Water and Oil (Contd...)

• Say, an FMOD faulted => 4 flash disks absorbing writes are gone
• The dirty buffers (that the DB thinks are on disk) on these are gone as well
• But, since ASM did 2 such writes (NORMAL redundancy), there’s no loss of data and “resilvering” kicks in to save the day
• Say, another FMOD also faulted (Murphy?) that held the 2nd copy

• Instead of dismounting the whole ASM DG, the blocks that have lost both the primary and the copy are marked as BADFDATA in the DB (1557421.1)

• All DBs in the cluster will have the BADFDATA if they had active writes and are caught in the cross-fire

• You are in la-la land => all DBs of the cluster are corrupted and need a restore from RMAN

• DB processing can still go on if important tables of your DB are not hit with corruption


• select FILE_NAME, file_id, TABLESPACE_NAME from dba_data_files where FILE_ID in (select distinct FILE# from gv$database_block_corruption);

• alter database datafile 6 offline;
• ...
• alter database datafile 15 offline;
• run {
• ALLOCATE CHANNEL 'T6' DEVICE TYPE 'SBT_TAPE' PARMS 'ENV=(NB_ORA_CLIENT=jc08bkp.mycompany.com,NB_ORA_POLICY=nbu_db_policy,NB_ORA_SCHED=full),BLKSIZE=1048576';
• restore datafile 6,14,2,5,3,10,15;
• recover datafile 6,14,2,5,3,10,15;
• }

• alter database datafile 6 online;
• ...
• alter database datafile 15 online;

When the home seems small for the whole family (Contd...)


• The LSI card is the controller for 3 active disks and is battery backed up for write-back cache

• The 4th disk in compute nodes was a “hot spare” – if any of the 3 disks in the RAID-5 config broke, the RAID config would take in the 4th disk and a RAID rebuild would be initiated
• Since 11.2.3.2.1 (seems like medieval age now!), you can use up the 4th disk also in the active RAID config
– To increase the available storage on the DB nodes

• If a compute node’s disk faulted, the RAID-5 offers the protection, but we still need to replace the disk asap
– The RAID-5 is now said to be in a degraded mode

[root@jcdb02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware"
Slot Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 1
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 2
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 3
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
[root@xdhfd5cn02 ~]#


• If a compute node’s disk faulted, the RAID-5 offers the protection, but we still need to replace the disk asap
– The RAID-5 is now said to be in a degraded mode

[root@jcdb03 oracle]# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State: Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Offline
Foreign State: None
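The check above can be scripted so that a degraded array is noticed without reading the output by eye. This is a minimal sketch, not an Oracle-supplied tool: the function name `faulted_slots` is made up for illustration, and it assumes the MegaCli output has one "Slot Number: N" line followed by a "Firmware state: ..." line per disk, as in the listings above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read MegaCli-style text on stdin and print the slot
# number of every disk whose "Firmware state" does not start with "Online".
# Assumes "Slot Number: N" precedes the matching "Firmware state: ..." line.
faulted_slots() {
  awk -F': *' '
    /^Slot Number/    { slot = $2 }                        # remember current slot
    /^Firmware state/ { if ($2 !~ /^Online/) print slot }  # report non-Online disks
  '
}
```

Possible usage on a DB node: `/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | faulted_slots`. Empty output would mean every listed disk reports Online.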

When Cell System disk crashes while you do the Cell imaging?
• It’s PROD cell imaging weekend
• You type and wait for the prompt back:
– ./patchmgr -cells /home/oracle/cell_group -patch

(Source: Oracle Documentation)

• Of the roughly 8,760 hours in a year, your cell disk chooses to crash during this time

• Got the prompt back with a failure message:
– [failed] Patch or rollback failed as reported by /root/_patch_hctap_/_p_/install.sh -query state on the cell.
– [ERROR] Patch or rollback failed. Please run cleanup before retrying.

• Of the 12 disks, the system disk, where you are going to image, crashes (Murphy?)
– Any other disk, just a drop-rebalance, re-run failed imaging and replacement

• You hope and pray the disk crash alert is a false alert, but it’s real

When Cell System disk crashes while you do the imaging? (Contd..)


• Bad thing: every single critical OLTP application is waiting for the window to be over (tough negotiated 11h window, when you are into heavy consolidation for ROI)

• Good thing: You haven’t panicked and can actually formulate a POA during crisis :)

• Here’s a simple sequence, **AFTER** the disk is replaced:
– cellcli -e alter griddisk all inactive
– cellcli -e alter cell shutdown services all
– # ./patchmgr -cells /root/cells.txt -cleanup
  (the file cells.txt should have just one word in it: the failed cell node, for example cellnode310)
– # ./patchmgr -cells /root/cells.txt -patch_check_prereq
– # ./patchmgr -cells /root/cells.txt -patch
– # cellcli -e alter cell startup services cellsrv
– # cellcli -e alter griddisk all active

OEL5 to OEL6 baggage (DB nodes)
• A lot, repeat, a lot of things may need fixing when you go from an image (say 12.1.1.1.1) running OEL5 to an image (say 12.1.2.1.2) running OEL6

• Here’s a sampling of what you may need to be prepared for:
– ASR not working & Mutt issues
– DBFS mount was unable to mount

• export ORACLE_HOME=/u01/app/home/version/11.2.0.3/sw
• export LD_LIBRARY_PATH=$ORACLE_HOME/lib
• export TNS_ADMIN=$ORACLE_HOME/network/admin
• nohup $ORACLE_HOME/bin/dbfs_client /@dbfsdb -o failover -o allow_other /mnt/dbfs > /tmp/dbfs_mount.out &

OEL5 to OEL6 baggage (DB nodes) (Contd...)

– Missing RPMs
• Both Oracle
– authconfig (x86_64), libXmu (x86_64), libXpm (x86_64), libXt (x86_64)
• And 3rd party
– Example: lftp-3.7.11-7.el5, mutt-1.4.2.2-6.el5, tcpdump-3.9.4-15.el5, ttmkfdir-3.0.9-23.el5

– OS services not started with auto-enabled option
• CA-WAAE | acpid | altiris | netbackup | rscd | vasd | vxpbx_exchanged

• Vintela issues
• NFS issue (uid & groupid showing ‘nobody’ for nfs filesystem)
• /etc/cron.allow will not have anything and needs a restore
• etc

Compute Node Imaging – Gone Haywire

• Sometimes, you sail smoothly into the perfect storm inside the SAME outage window
– Dbnodeupdate.sh on one of the nodes complains about RPMs
• Higher version rpms or
• rpms from 2 versions present or
• missing rpms
– “[ERROR] Unable to uninstall sas_snmp”
– “ERROR: Given location (-l) does not exist or does not have the proper rpms available”
– “ERROR: It is also possible the system already is on the same or higher release”
– “[ERROR] Ramfs image file not found /boot/initrd-2.6.39-400.126.1.el5uek.img”
– Cell physical disk “not present”, thankfully, after the reboot of cell node imaging (Oracle replaced it)
– Reboot hung after dbnodeupdate.sh for one of the db nodes (recycle via ILOM)

Compute Node Imaging – Gone Haywire (Contd...)

– Missing IB drivers load in kernel
– Way out of the mess, short of going back to the node reimaging
• Manually load IB modules (% insmod mlx4_ib.ko, % insmod mlx4_core.ko)
• Copy initrd from another node (“[ERROR] Ramfs image file not found /boot/initrd-2.6.39-400.126.1.el5uek.img”)
• Reboot into later kernel, run the dbnodeupdate.sh

• Lesson learnt:
– Don’t troubleshoot, just roll back, the very first time dbnodeupdate.sh fails

Compute Node Imaging – Gone Haywire (Contd...)

The following sections briefly touch on a few costly bugs that fell in our laps

• May not be entirely applicable to your environment

• Nevertheless, these have caused major heartaches, either inside a planned outage or by causing an unplanned outage

Compute Node won’t come up after cell imaging

After cell imaging, LVM was found to be corrupt in the initrd

Copy a proper lvm.conf from the original initrd image and rebuild it as:
1. extract the initrds
# cd /boot
# mkdir x y
# cd x
# zcat ../initrd-2.6.18-238*.img | cpio -idmv
# cd ../y
# zcat ../initrd-2.6.18-194*.img | cpio -idmv
# cd ..

2. replace the lvm.conf
# mv x/etc/lvm/lvm.conf{,.orig}
# cp y/etc/lvm/lvm.conf x/etc/lvm/

3. rebuild a new initrd
# cd x
# find ./ | cpio -H newc -o > /boot/initrd-new.img

4. modify the grub setting

When the home seems small for the whole family...

• Typically, multiple ORACLE_HOMEs exist with multiple patchsets/patches/versions

• More common in a highly consolidated environment, as patching windows and bugs exposed vary from DB to DB

• Unearth the space you knew you had all along

Other tidbits

• VPN goes out during cell imaging
– Roll back and image again, necessarily

• Monitor ASR trap files

• Choose your battles, not every alert needs to be actionable

• When there’s a sudden spike in DB node hard disk devices, check the BBU charge – it may be doing its regular discharge/charge (learn cycle)

Configuration check discovered the following problems:
[WARNING] Ethernet interface eth0 is not at 1000Mbps. It is at 100 Mbps. Check cables and switches.

[root@dbn07 compmon]# cat /opt/oracle.cellos/common/traps.state
1 ; Fri Mar 21 07:06:13 2016 ; 766e1bb1-d007-49cd-b374-9395066c68ac ; Physicaldisk 252:1 Make Model: HITACHI H103030SCSUN300G is at status predictive failure. Raised fault id: HALRT-02008 ; Physical disk should be replaced. Exadata Compute Server: Disk Serial Number: 3345GBV14E
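Monitoring the ASR trap file can be scripted. This is a minimal sketch, assuming each record in traps.state is a single line with fields separated by " ; " as in the sample above. The function name `predictive_failures` and the output layout are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read traps.state-style records on stdin and print
# "timestamp | message" for every record that mentions a predictive failure.
# Assumes " ; " separates the fields: id ; timestamp ; uuid ; message ; action
predictive_failures() {
  awk -F' ; ' 'tolower($0) ~ /predictive failure/ { print $2 " | " $4 }'
}
```

Possible usage: `predictive_failures < /opt/oracle.cellos/common/traps.state`, for example from a cron job that mails any non-empty output.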


Non-Exadata Specific

• What follows are experiences that we faced in our Exadata environment
– But they are not Exadata-specific issues; you can face them in the non-Exadata world as well

Active Clone running slow after DB upgrade?

• Business-critical nightly clone job took a longer time after the source DB upgrade

• Took a month’s grueling investigation with Support/DEV
• Only manual workaround available (DEV working on fix):
– Increase “_backup_ksfq_bufcnt” = 64
– Increase “_backup_ksfq_bufsz” = 4

Active Clone running slow after DB upgrade? (Contd...)

• This parameter has no impact or meaning outside the context of RMAN and will only influence the IO done via RMAN.

• This parameter only impacts the IO buffers used by RMAN and does not impact any other IO.

• The way RMAN works is it uses IO buffers in each RMAN channel to read from the datafile and write to the destination, called I/P buffers and O/P buffers, respectively.

• The write destination of the O/P buffers could be backup media like a dedup appliance in the case of backup, or the auxiliary instance in the case of RMAN clone.

• By setting this buffer count parameter to 64 and the default buffer size of 4 MB, the channel uses 128 buffers for both I/P and O/P, thereby speeding up the clone process
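The buffer numbers quoted above imply a memory cost per channel that is worth checking before raising the count. This is a back-of-the-envelope sketch only, assuming each channel holds one pool of input buffers and one pool of output buffers, each with the configured count of buffers at the stated size in MB. The function name `rman_buffer_mb` is made up for illustration, and RMAN's real memory accounting may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate RMAN I/O buffer memory in MB.
#   $1 = buffer count per direction (e.g. 64)
#   $2 = buffer size in MB          (e.g. 4)
#   $3 = number of channels         (optional, defaults to 1)
# Assumes input and output pools each hold $1 buffers (2 * count in total).
rman_buffer_mb() {
  echo $(( $1 * 2 * $2 * ${3:-1} ))
}

rman_buffer_mb 64 4      # one channel:   128 buffers * 4 MB = 512
rman_buffer_mb 64 4 4    # four channels: 4 * 512            = 2048
```

Under these assumptions a single channel needs about 512 MB for its buffers, so the total grows quickly with the number of channels.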

Mystery Shutdown (was not at all fun)
• A database, let’s call it AAAA, was going down cleanly & mysteriously
• No human, no script, no job that we were aware of did that!
• We were losing hair fast, as this one DB kept going for a shutdown at random intervals
• Finally, we nabbed the culprit in the act, red-handed: it was our DBUA!

• AAAA was a clustered DB of 4 instances, but 3 instances were shut down and disabled

• Two conditions
– If DBUA was started from one of the 3 nodes above and
– if the engineer did not select the correct DB to upgrade in the 1st screen (clicked Next twice, as DBUA loads slowly), with AAAA being alphabetically the 1st DB and so selected by default in dbua

• then this AAAA db was shut down with the following message in the DBUA log!

[Thread-48] [2016-01-14 22:52:25.335 EST] [Database.checkClusterDB:3408] Stop RAC db (inplace upgrade need db to be open in mount mode first)
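When a database keeps going down with no obvious culprit, the DBUA logs can be searched for this message. This is a minimal sketch: the function name `dbua_stop_events` is made up for illustration, and the log file paths are left to the caller because the log location is not given here. The pattern tolerates missing spaces in the message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line in the given DBUA log files where
# Database.checkClusterDB reports stopping the RAC database.
# Usage: dbua_stop_events FILE...
dbua_stop_events() {
  # -i: ignore case, -H: prefix matches with the file name, -E: extended regex
  grep -iHE 'checkClusterDB.*stop *rac *db' "$@"
}
```

The timestamp on each matching line can then be compared with the shutdown times in the database alert log.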
