Exadata - Experiences
If we had to do it again...
Compute Nodes crashing - NetBackup Backups on OS
• Compute node servers started crashing left and right, in the middle of a business day
• Crash dump analysis showed the NBU software interacting with the MegaRAID driver during the OS backup, causing a kernel panic
• No vendor finger-pointing! Oracle developed a new MegaRAID driver RPM after they reproduced the issue in-house
• Oracle also advised going for native LVM snapshot backups over to the NFS mounts
WBFC & NORMAL redundancy - Water and Oil
• WBFC = Write Back Flash Cache
• It works like this:
[Figure. Source: Enkitec, Expert Oracle Exadata]
• There are 12 disks and 16 FMODs in each cell server
• The DB/ASM layer doesn’t know about caching in the I/O subsystem
• The I/O subsystem is responsible for acknowledging write completion
• Depending on the ASM redundancy level, ASM writes to either 2 or 3 disks
• If we are absorbing those writes in WBFC, then the writes go to 2 FMODs in different failgroups
• Say an FMOD faulted => 4 flash disks absorbing writes are gone
• The dirty buffers (that the DB thinks are on disk) on these are gone as well
• But, since ASM did 2 such writes (NORMAL redundancy), there’s no loss of data and “resilvering” kicks in to save the day
• Say another FMOD, one that held the 2nd copy, also faulted (Murphy?)
• Instead of dismounting the whole ASM DG, the blocks that have lost both the primary and the copy are marked as BADFDATA in the DB (MOS Doc ID 1557421.1)
• All DBs in the cluster will have BADFDATA if they had active writes and were caught in the crossfire
• You are in la-la land => all DBs of the cluster are corrupted and need a restore from RMAN
• DB processing can still go on if the important tables of your DB are not hit by the corruption
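The failure scenario above can be illustrated with a toy model. This is only a sketch: the FMOD names and the block placement are made up for illustration, and the real cell software tracks far more state. It assumes each dirty block has exactly two copies on FMODs in different failgroups, as described for NORMAL redundancy, and shows that a block is lost only when every FMOD holding a copy has failed.

```python
def lost_blocks(placement, failed_fmods):
    """Return the blocks whose every mirrored copy sat on a failed FMOD."""
    failed = set(failed_fmods)
    return sorted(
        block for block, copies in placement.items()
        if set(copies) <= failed
    )


# Hypothetical placement: each dirty block is mirrored on 2 FMODs that sit
# in different failgroups (NORMAL redundancy). All names are invented.
placement = {
    "blk1": ("cell1_fmod0", "cell2_fmod3"),
    "blk2": ("cell1_fmod0", "cell3_fmod1"),
    "blk3": ("cell2_fmod3", "cell3_fmod1"),
}

# One FMOD down: every block still has a surviving copy to resilver from.
one_failure = lost_blocks(placement, ["cell1_fmod0"])

# The FMOD holding the second copy of blk1 goes down too: blk1 is lost.
two_failures = lost_blocks(placement, ["cell1_fmod0", "cell2_fmod3"])

print(one_failure)
print(two_failures)
```

In this model only the blocks shared by the two failed FMODs are lost, which matches the observation that processing can continue when the affected blocks are not in important tables.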
• select FILE_NAME, file_id, TABLESPACE_NAME from dba_data_files where FILE_ID in (select distinct FILE# from gv$database_block_corruption);
• alter database datafile 6 offline;
• ...
• alter database datafile 15 offline;
• run {
• ALLOCATE CHANNEL 'T6' DEVICE TYPE 'SBT_TAPE' PARMS 'ENV=(NB_ORA_CLIENT=jc08bkp.mycompany.com,NB_ORA_POLICY=nbu_db_policy,NB_ORA_SCHED=full),BLKSIZE=1048576';
• restore datafile 6,14,2,5,3,10,15;
• recover datafile 6,14,2,5,3,10,15;
• }
• alter database datafile 6 online;
• ...
• alter database datafile 15 online;
When the home seems small for the whole family
• The LSI card is the controller for the 3 active disks and is battery backed for its write-back cache
• The 4th disk in compute nodes was a “hot spare”: if any of the 3 disks in the RAID-5 config broke, the RAID config would take in the 4th disk and a RAID rebuild would be initiated
• Since 11.2.3.2.1 (seems like the medieval age now!), you can also use the 4th disk in the active RAID config
– To increase the available storage on the DB nodes
• If a compute node’s disk faulted, the RAID-5 offers the protection, but we still need to replace the disk ASAP
– The RAID-5 is now said to be in degraded mode
[root@jcdb02 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware"
Slot Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 1
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 2
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 3
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
[root@jcdb03 oracle]# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State: Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Offline
Foreign State: None
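A check like the one above is easy to script for monitoring. The following is a minimal sketch that parses the grep-filtered MegaCli text shown in this deck and reports the virtual drive state and any slot that is not online. It assumes the output has the same three kinds of lines as the sample; the function names are hypothetical, and the sample text is taken from the degraded case above rather than from a live controller.

```python
import re


def parse_ldpdinfo(text):
    """Parse grep-filtered `MegaCli64 -LdPdInfo` text into a summary dict."""
    summary = {"virtual_drive_state": None, "slots": {}}
    slot = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Slot Number:\s*(\d+)", line)
        if m:
            slot = int(m.group(1))
            continue
        m = re.match(r"Firmware state:\s*(.+)", line)
        if m and slot is not None:
            summary["slots"][slot] = m.group(1).strip()
            continue
        # A bare "State" line seen before any slot is the virtual drive state.
        m = re.match(r"State\s*:\s*(.+)", line)
        if m and slot is None:
            summary["virtual_drive_state"] = m.group(1).strip()
    return summary


def faulted_slots(summary):
    """Return slot numbers whose firmware state does not start with Online."""
    return sorted(
        s for s, state in summary["slots"].items()
        if not state.startswith("Online")
    )


sample = """\
Virtual Drive: 0 (Target Id: 0)
State: Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Offline
Foreign State: None
"""

summary = parse_ldpdinfo(sample)
print(summary["virtual_drive_state"], faulted_slots(summary))
```

Feeding it the real command output (for example via `subprocess`) is left out on purpose, since the exact output layout can differ between MegaCli versions.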
When Cell System disk crashes while you do the Cell imaging?
• It’s PROD cell imaging weekend
• You type and wait for the prompt back:
– ./patchmgr -cells /home/oracle/cell_group -patch
(Source: Oracle Documentation)
• Of the roughly 9,000 hours in a year, your cell disk chooses to crash during this time
• Got the prompt back with a failure message:
– [failed] Patch or rollback failed as reported by /root/_patch_hctap_/_p_/install.sh -query state on the cell.
– [ERROR] Patch or rollback failed. Please run cleanup before retrying.
• Of the 12 disks, the system disk, where you are going to image, crashes (Murphy?)
– Any other disk: just a drop-rebalance, a re-run of the failed imaging, and a replacement
• You hope and pray the disk crash alert is a false alert, but it’s real
• Bad thing: every single critical OLTP application is waiting for the window to be over (a tough-negotiated 11h window, when you are into heavy consolidation for ROI)
• Good thing: you haven’t panicked and can actually formulate a POA during the crisis :)
• Here’s a simple sequence, **AFTER** the disk is replaced:
– cellcli -e alter griddisk all inactive
– cellcli -e alter cell shutdown services all
– # ./patchmgr -cells /root/cells.txt -cleanup <=============== where the file cells.txt should have just one word in it: the failed cell node, for example cellnode310
– # ./patchmgr -cells /root/cells.txt -patch_check_prereq
– # ./patchmgr -cells /root/cells.txt -patch
– # cellcli -e alter cell startup services cellsrv
– # cellcli -e alter griddisk all active
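During a crisis it helps to have the sequence written down as a checklist that can be printed and ticked off. The sketch below only builds and prints the ordered command list from the steps above; it does not execute anything. The function names are hypothetical, and the commands and flags are exactly the ones quoted in this deck, not a general reference for patchmgr or cellcli.

```python
def cells_file_content(failed_cell):
    """Return the text for cells.txt: exactly one cell name on one line."""
    name = failed_cell.strip()
    if not name or len(name.split()) != 1:
        raise ValueError("cells file must hold exactly one cell name")
    return name + "\n"


def recovery_plan(cells_file="/root/cells.txt"):
    """Return the recovery commands in the order they are to be run."""
    patchmgr = "./patchmgr -cells " + cells_file
    return [
        "cellcli -e alter griddisk all inactive",
        "cellcli -e alter cell shutdown services all",
        patchmgr + " -cleanup",
        patchmgr + " -patch_check_prereq",
        patchmgr + " -patch",
        "cellcli -e alter cell startup services cellsrv",
        "cellcli -e alter griddisk all active",
    ]


plan = recovery_plan()
for step, command in enumerate(plan, start=1):
    print(step, command)
```

Keeping the plan as data rather than as a script that runs the commands is deliberate: each step should be confirmed by a human before the next one is typed.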
OEL5 to OEL6 baggage (DB nodes)
• A lot, repeat, a lot of things may need fixing when you go from an image (say 12.1.1.1.1) running OEL5 to an image (say 12.1.2.1.2) running OEL6
• Here’s a sampling of what you may need to be prepared for:
– ASR not working & Mutt issues
– DBFS mount was unable to mount
• export ORACLE_HOME=/u01/app/home/version/11.2.0.3/sw
• export LD_LIBRARY_PATH=$ORACLE_HOME/lib
• export TNS_ADMIN=$ORACLE_HOME/network/admin
• nohup $ORACLE_HOME/bin/dbfs_client /@dbfsdb -o failover -o allow_other /mnt/dbfs > /tmp/dbfs_mount.out &
– Missing RPMs
• Both Oracle
– authconfig (x86_64), libXmu (x86_64), libXpm (x86_64), libXt (x86_64)
• And 3rd party
– Example: lftp-3.7.11-7.el5, mutt-1.4.2.2-6.el5, tcpdump-3.9.4-15.el5, ttmkfdir-3.0.9-23.el5
– OS services not started with the auto-enabled option
• CA-WAAE | acpid | altiris | netbackup | rscd | vasd | vxpbx_exchanged
• Vintela issues
• NFS issue (uid & group id showing ‘nobody’ for NFS filesystems)
• /etc/cron.allow will not have anything and needs a restore
• etc.
Compute Node Imaging - Gone Haywire
• Sometimes, you sail smoothly into the perfect storm inside the SAME outage window
– dbnodeupdate.sh on one of the nodes complains about RPMs
• Higher version rpms, or
• rpms from 2 versions present, or
• missing rpms
– “[ERROR] Unable to uninstall sas_snmp”
– “ERROR: Given location (-l) does not exist or does not have the proper rpms available”
– “ERROR: It is also possible the system already is on the same or higher release”
– “[ERROR] Ramfs image file not found /boot/initrd-2.6.39-400.126.1.el5uek.img”
– Cell physical disk “not present”, thankfully only after the reboot of the cell node imaging (Oracle replaced it)
– Reboot hung after dbnodeupdate.sh for one of the DB nodes (recycle via ILOM)
– IB drivers not loaded in the kernel
– Way out of the mess, short of going back to the node reimaging
• Manually load the IB modules (% insmod mlx4_core.ko, then % insmod mlx4_ib.ko, since mlx4_ib depends on mlx4_core)
• Copy the initrd from another node (“[ERROR] Ramfs image file not found /boot/initrd-2.6.39-400.126.1.el5uek.img”)
• Reboot into the later kernel, run dbnodeupdate.sh
• Lesson learnt:
– Don’t troubleshoot, just roll back, the very first time dbnodeupdate.sh fails
The following sections briefly touch on a few costly bugs that fell into our laps
• May not be entirely applicable to your environment
• Nevertheless, these have caused major heartaches, either inside a planned outage or by causing an unplanned outage
Compute Node won’t come up after cell imaging
After cell imaging, LVM was found to be corrupt in the initrd
Copy a proper lvm.conf from the original initrd image and rebuild it as:
1. extract the initrds
# cd /boot
# mkdir x y
# cd x
# zcat ../initrd-2.6.18-238*.img | cpio -idmv
# cd ../y
# zcat ../initrd-2.6.18-194*.img | cpio -idmv
# cd ..
2. replace the lvm.conf
# mv x/etc/lvm/lvm.conf{,.orig}
# cp y/etc/lvm/lvm.conf x/etc/lvm/
3. rebuild a new initrd
# cd x
# find ./ | cpio -H newc -o > /boot/initrd-new.img
4. modify the grub setting
When the home seems small for the whole family...
• Typically, multiple ORACLE_HOMEs exist with multiple patchsets/patches/versions
• More common in a highly consolidated environment, as patching windows and the bugs exposed vary from DB to DB
• Unearth the space you knew you had all along
Other tidbits
• VPN goes out during Cell Imaging
– Roll back and image again, necessarily
• Monitor ASR trap files
• Choose your battles, not every alert needs to be actionable
• When there’s a sudden spike in DB node hard disk devices, check the BBU charge: it may be doing its regular discharge/charge (learn cycle)
Configuration check discovered the following problems: [WARNING] Ethernet interface eth0 is not at 1000Mbps. It is at 100 Mbps. Check cables and switches.
[root@dbn07 compmon]# cat /opt/oracle.cellos/common/traps.state
1 ; Fri Mar 21 07:06:13 2016 ; 766e1bb1-d007-49cd-b374-9395066c68ac ; Physicaldisk 252:1 Make Model: HITACHI H103030SCSUN300G is at status predictive failure. Raised fault id: HALRT-02008 ; Physical disk should be replaced. Exadata Compute Server: Disk Serial Number: 3345GBV14E
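Monitoring the trap file can be as simple as splitting each record and pulling out the fault id. The sketch below is based only on the single sample record shown in this deck: it assumes fields are separated by " ; " and that the first three are a sequence number, a timestamp, and an id, with everything after that treated as the message. The real file format may have more variations, and the function name is hypothetical.

```python
import re


def parse_trap_line(line):
    """Split one traps.state record into its leading fields and message."""
    parts = [p.strip() for p in line.strip().split(" ; ", 3)]
    if len(parts) != 4:
        raise ValueError("unexpected traps.state record layout")
    seq, timestamp, trap_id, message = parts
    # The fault id, when present, follows the words "fault id:".
    m = re.search(r"fault id:\s*([A-Z]+-\d+)", message)
    return {
        "seq": int(seq),
        "timestamp": timestamp,
        "id": trap_id,
        "message": message,
        "fault_id": m.group(1) if m else None,
    }


sample = (
    "1 ; Fri Mar 21 07:06:13 2016 ; 766e1bb1-d007-49cd-b374-9395066c68ac ; "
    "Physicaldisk 252:1 Make Model: HITACHI H103030SCSUN300G is at status "
    "predictive failure. Raised fault id: HALRT-02008 ; Physical disk should "
    "be replaced. Exadata Compute Server: Disk Serial Number: 3345GBV14E"
)

record = parse_trap_line(sample)
print(record["seq"], record["fault_id"])
```

A small filter on `fault_id` or on words such as "predictive failure" is one way to decide which alerts are worth paging someone for.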
Non-Exadata Specific
• What follows are experiences that we faced in our Exadata environment
– But they are not Exadata-specific issues; you can face them in the non-Exadata world as well
Active Clone running slow after DB upgrade?
• Business-critical nightly clone job took longer after the source DB upgrade
• Took a month’s grueling investigation with Support/DEV
• Only a manual workaround was available (DEV working on a fix):
– Increase “_backup_ksfq_bufcnt” = 64
– Increase “_backup_ksfq_bufsz” = 4
• This parameter has no impact or meaning outside the context of RMAN and will only influence the I/O done via RMAN.
• This parameter only impacts the I/O buffers used by RMAN and does not impact any other I/O.
• The way RMAN works is that it uses I/O buffers in each RMAN channel to read from the datafile and write to the destination, called I/P buffers and O/P buffers, respectively.
• The write destination of the O/P buffers could be backup media like a dedup appliance in the case of a backup, or the auxiliary instance in the case of an RMAN clone.
• By setting this buffer count parameter to 64 with the default buffer size of 4 MB, the channel uses 128 buffers for both I/P and O/P, thereby speeding up the clone process
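The buffer arithmetic is worth spelling out when sizing memory for several channels. This sketch reads the bullet above as 64 input buffers plus 64 output buffers per channel, each 4 MB; that reading, the function name, and the idea of multiplying by the channel count are assumptions for illustration, not documented behaviour of these underscore parameters.

```python
def rman_channel_buffers(bufcnt, bufsz_mb, channels=1):
    """Return (buffers per channel, MB per channel, MB across all channels).

    Assumes the buffer count applies once to the input side and once to
    the output side of each channel.
    """
    buffers_per_channel = 2 * bufcnt
    mb_per_channel = buffers_per_channel * bufsz_mb
    return buffers_per_channel, mb_per_channel, mb_per_channel * channels


# Values from the workaround: bufcnt = 64, buffer size = 4 MB.
buffers, per_channel_mb, total_mb = rman_channel_buffers(64, 4, channels=8)
print(buffers, per_channel_mb, total_mb)
```

Under these assumptions a single channel holds 512 MB of buffers, so the channel count of the clone job matters when checking memory headroom on the source and auxiliary hosts.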
Mystery Shutdown (was not at all fun)
• A database, let’s call it AAAA, was going down cleanly & mysteriously
• No human, no script, no job that we were aware of did that!
• We were losing hair fast, as this one DB kept going for a shutdown at random intervals
• Finally, we nabbed the culprit in the act, red-handed: it was our DBUA!
• AAAA was a clustered DB of 4 instances, but 3 instances were shut down and disabled
• Two conditions
– If DBUA was started from one of the 3 nodes above, and
– if the engineer did not select the correct DB to upgrade in the 1st screen (clicked Next twice, as DBUA loads slowly), with AAAA, alphabetically the 1st DB, selected by default in DBUA
• then this AAAA DB was shut down, with the following message in the DBUA log!
[Thread-48] [2016-01-14 22:52:25.335 EST] [Database.checkClusterDB:3408] Stop RAC db (inplace upgrade need db to be open in mount mode first)