indexer clustering basic, internals - splunkconf · [email protected] sr. software engineer splunk...

Copyright © 2016 Splunk Inc.

[email protected].

Indexer clustering basics, internals & general debugging

Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

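Indexer cluster topology

[Figure: topology diagram showing the master, a search head, forwarders, and three indexers. Connection labels: "For replication", "For generation info", "For search". Legend entries: forwarding data to indexers, master-slave communication, replication, search head to indexer.]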

Why indexer clustering

• Data availability: Your system can tolerate downed indexers without losing data or access to the data
• Disaster recovery: With multisite clustering, your system can tolerate the failure of an entire data center
• Search affinity: With multisite clustering, search heads can access the data through their local sites, thereby improving search performance by lowering network latency
• Other advantages: uniform configuration across indexers, ease of management & monitoring of the indexers

Parts of the cluster

• Cluster master
  • Manages the cluster activities
  • Maintains an in-memory state of all the peers & their corresponding buckets, configs
  • Orchestrates remedial activities during peer failures
  • Tells search heads where to search

• Cluster peer (indexer)
  • Receive and index incoming data (typically from forwarders)
  • Replicate data to other peers for data availability
  • Respond to the incoming searches by providing search results
  • Update cluster master on any state change (peer, buckets, configs etc.)

• Search head
  • Runs & coordinates searches & aggregates the search results coming from indexers
  • Periodically interacts with cluster master for generation updates

Communication amongst members

Cluster master & peers communicate over REST endpoints. A few examples:

• Peers -> Master:
  • /services/cluster/master/peers
    • Add peer to cluster
    • Heartbeat to master
  • /services/cluster/master/buckets
    • Notify master on bucket creation & removal
    • Notify master on bucket state changes

• Master -> Peers:
  • /services/cluster/slave/buckets
    • Change primaries
    • Become searchable/unsearchable

• Search head -> Master:
  • cluster/master/generation - To get the latest generation information
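
As a rough illustration of how these endpoints are addressed, the sketch below builds full URLs for the paths listed on this slide. The host name `cm.example.com`, the helper `endpoint_url`, and the port value 8089 (splunkd's usual default management port) are assumptions made for the example; only the paths come from the slide.

```python
# Hypothetical helper: builds REST URLs for the clustering endpoints named above.
# The host and the port value are illustrative assumptions, not taken from the talk.
CLUSTER_ENDPOINTS = {
    "peers": "/services/cluster/master/peers",            # addPeer + heartbeats (peer -> master)
    "buckets": "/services/cluster/master/buckets",        # bucket notifications (peer -> master)
    "slave_buckets": "/services/cluster/slave/buckets",   # primaries/searchability (master -> peer)
}


def endpoint_url(host: str, name: str, mgmt_port: int = 8089) -> str:
    """Return the https URL for one of the endpoints in CLUSTER_ENDPOINTS."""
    return f"https://{host}:{mgmt_port}{CLUSTER_ENDPOINTS[name]}"


print(endpoint_url("cm.example.com", "peers"))
```

Keeping the paths in one table makes it easy to see which direction each call flows when reading splunkd_access.log.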

event=addPeer

• A peer joins the cluster by executing an event called 'addPeer', which is a REST call to the CM (services/cluster/master/peers)
• This happens on peer startup.
• On an addPeer request, the peer reports its entire state to the cluster master:
  • all its buckets and their corresponding states
  • active_bundle_id, latest_bundle_id, mgmt_port, GUID, replication_port
  • add_type=Initial-Add | ReAdd
• The master stores the peer's entire state in its memory
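
To make the shape of that exchange concrete, here is a minimal sketch of a peer report and an in-memory registry on the master side. The class names `AddPeerRequest` and `MasterState`, the bucket-state strings, and the port values are hypothetical; only the field names are taken from the slide, and this is not how splunkd itself is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AddPeerRequest:
    """Illustrative shape of what a peer reports; field names follow the slide."""
    guid: str
    mgmt_port: int
    replication_port: int
    active_bundle_id: str
    latest_bundle_id: str
    add_type: str  # "Initial-Add" or "ReAdd"
    buckets: Dict[str, str] = field(default_factory=dict)  # bucket id -> state


class MasterState:
    """Toy in-memory registry keyed by peer GUID (hypothetical, not Splunk code)."""

    def __init__(self) -> None:
        self.peers: Dict[str, AddPeerRequest] = {}

    def add_peer(self, request: AddPeerRequest) -> int:
        # The latest report replaces whatever the master knew about this peer.
        self.peers[request.guid] = request
        return len(request.buckets)


master = MasterState()
report = AddPeerRequest(
    guid="F1B6E8F0-002A-4947-83CA-0A5BC56E0A53",
    mgmt_port=8089,          # assumed value
    replication_port=9887,   # assumed value
    active_bundle_id="bundle-a",
    latest_bundle_id="bundle-a",
    add_type="Initial-Add",
    buckets={"b1": "hot", "b2": "warm"},
)
print(master.add_peer(report))
```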

event=addPeer

• Slave logs:
  08-02-2016 15:54:06.098 -0700 INFO CMSlave - event=addPeer status=success request:AddPeerRequest:{}

• Upon a successful addPeer, the master also logs to its splunkd.log:
  08-02-2016 15:54:06.094 -0700 INFO CMMaster - event=addPeer guid=F1B6E8F0-002A-4947-83CA-0A5BC56E0A53 peer_name=slave1 AddPeerRequest:{} bucket_count=4

• On addPeer success, the master commits a new generation:
  CMMaster - committing gen=1 numpeers=1 requesterReason=addPeerSuccess guid=F1B6E8F0-002A-4947-83CA-0A5BC56E0A53 lastCompleteGenId=0

• When enough peers (replication_factor of them) have joined the cluster, the cluster transitions into the indexing-ready state.
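
Because these messages are mostly key=value pairs, they are easy to pick apart when reviewing an exported splunkd.log. The sketch below is one simple way to do that; the helper name `parse_fields` is hypothetical, and the sample line is the master log from the slide with the `AddPeerRequest:{}` portion left out.

```python
import re

# Matches key=value tokens where the value runs up to the next whitespace.
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def parse_fields(line: str) -> dict:
    """Extract key=value pairs from a splunkd-style log line."""
    return dict(KEY_VALUE.findall(line))


sample = (
    "08-02-2016 15:54:06.094 -0700 INFO CMMaster - event=addPeer "
    "guid=F1B6E8F0-002A-4947-83CA-0A5BC56E0A53 peer_name=slave1 bucket_count=4"
)
fields = parse_fields(sample)
print(fields["event"], fields["peer_name"], fields["bucket_count"])
```

Values come back as strings, so numeric fields such as bucket_count need an explicit int() before comparison.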

Heartbeats

• Heartbeating is a way for the cluster master & peers to tell each other that they are up and running
• Heartbeats happen over a REST endpoint (cluster/master/peers)
• Once a peer registers with the master, it sends a heartbeat request to the master once every heartbeat_period seconds (defaults to 1)
• The master responds to the heartbeat request, indicating that it is up
• Master and peer exchange some basic information (like bundle IDs, peer states etc.) over the heartbeats

Heartbeats

• The more peers there are, the more heartbeat requests the master receives and responds to
• For relatively large clusters (with >50 peers or 200k+ buckets), it is recommended to adjust the heartbeat_period value to 5-30
• The master marks a peer as "Down" if it hasn't received a heartbeat for the heartbeat_timeout period (defaults to 60 seconds)
• For relatively large clusters, it is recommended to adjust this value to 20x-60x of heartbeat_period

FYI: It is also recommended to adjust restart_timeout as the peer load (like bucket/summary/job count) goes up
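
The sizing guidance can be worked through with a little arithmetic. The sketch below computes the 20x-60x range for a chosen heartbeat_period and models the "Down" decision; the function names are hypothetical, and treating a peer as down once the elapsed time reaches the timeout is an assumption about the exact boundary.

```python
from typing import Tuple


def recommended_timeout_range(heartbeat_period: int) -> Tuple[int, int]:
    """Return (low, high) heartbeat_timeout values: 20x to 60x of heartbeat_period."""
    return (20 * heartbeat_period, 60 * heartbeat_period)


def is_peer_down(last_heartbeat: float, now: float, heartbeat_timeout: float = 60.0) -> bool:
    """True if no heartbeat has arrived within heartbeat_timeout seconds (assumed inclusive)."""
    return (now - last_heartbeat) >= heartbeat_timeout


# With heartbeat_period raised to 5 seconds, the suggested timeout is 100-300 seconds.
print(recommended_timeout_range(5))
print(is_peer_down(last_heartbeat=0.0, now=75.0))
```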

Config management in the cluster

Cluster bundles

• A bundle is basically a set of updated configuration files (mostly indexes.conf, props.conf, transforms.conf etc.), spread over different apps, that is distributed to the cluster peers from the cluster master
• It is just the content under $SPLUNK_HOME/etc/master_apps
• In order to push a new bundle, update your master_apps content & run 'splunk apply cluster-bundle [--skip-validation]'

Cluster bundles

Bundle push is a multi-step process:

• Creation
  • Happens at the cluster master
  • Involves creating the bundle tarball & calculating the checksum
  • The master does minimal config validation while creating the bundle
  • The master updates its latest_bundle_id to the new bundle checksum

• Validation
  • Happens at the cluster peers
  • Peers detect the new latest_bundle_id from the master & perform validation
  • Validation involves downloading the bundle & actually validating the configs
  • Each peer reports the outcome of the validation to the cluster master
  • The master reverts its latest_bundle_id to the old bundle if any peer reports an error

Cluster bundles

• Reload (or) Restart
  • Depending on the contents of the bundle, cluster peers determine if they can accept the new bundle without a restart (by just reloading)
  • If a peer reports that the bundle needs a restart, the CM then issues a rolling restart of the cluster peers for the new bundle to take effect.

FYI: It is not recommended to change cluster peer configurations (like indexes, props, transforms etc.) locally at the peers. All the configs should come from the cluster master. This guarantees uniformity of the configuration among cluster members.
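
The decision the master makes after peers have validated a bundle can be summarised in a few lines. This is a simplified model under stated assumptions, not Splunk's implementation: the names `PeerVerdict` and `push_outcome` are hypothetical, and it assumes that a single peer asking for a restart is enough to trigger the rolling restart.

```python
from enum import Enum
from typing import Iterable, Tuple


class PeerVerdict(Enum):
    RELOAD_OK = "reload"        # peer can apply the bundle by reloading
    NEEDS_RESTART = "restart"   # peer needs a restart to apply the bundle
    ERROR = "error"             # peer failed to validate the bundle


def push_outcome(old_bundle_id: str, new_bundle_id: str,
                 verdicts: Iterable[PeerVerdict]) -> Tuple[str, str]:
    """Return (latest_bundle_id, action) once all peers have reported."""
    reported = list(verdicts)
    if PeerVerdict.ERROR in reported:
        # Any validation error makes the master fall back to the old bundle.
        return old_bundle_id, "reverted"
    if PeerVerdict.NEEDS_RESTART in reported:
        return new_bundle_id, "rolling-restart"
    return new_bundle_id, "reload"


print(push_outcome("old", "new", [PeerVerdict.RELOAD_OK, PeerVerdict.NEEDS_RESTART]))
```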


BUCKETS

Buckets are created on the indexer (cluster peer). Flow of bucket creation:

• The indexer receives raw data and transforms it into events
• Groups the events into a bucket & generates an index for each keyword
• Groups buckets into a logical/physical partition called an index
• Typical data flow hierarchy:
  Raw data (broken into) -> Events (are grouped into) -> Slice (are written to) -> Bucket (are grouped as) -> Index

[Figure: the buckets B1, B2, ... Bn of an index, stored on disk.]

Buckets

• A bucket is usually the unit of data the cluster is aware of
• For data availability, each indexer replicates its buckets
• Replication is of two types:
  – Streaming replication (for hot buckets)
  – Non-streaming replication (for warm | cold buckets)
• Buckets can be searchable or unsearchable
• Among multiple searchable copies, the master picks one copy as "primary"
• Peers only serve data from primary buckets to the search
• A cluster peer notifies the cluster master upon every state change of its bucket(s) so that the master stays up to date

[Figure: a bucket, made up of raw data and search files.]

Buckets

• More buckets means more work
• Since the bucket is the unit of data that the cluster handles, most of the work/communication in the cluster is related to buckets
• Some examples of bucket-related work:
  • Bucket creation
  • Bucket state changes
    • Hot -> warm, Warm -> cold, Cold -> frozen
    • Searchable -> unsearchable, Unsearchable -> searchable
    • Changing primary mask (needs generation commit)
  • Bucket truncation
  • Bucket deletion
  • Handling replications
  • Handling success | failures | errors of various bucket transitions & transactions

Reduced disk space for aged buckets

• Searchable buckets occupy more disk space due to the substantial storage requirements of tsidx/index files
• The size of infrequently searched old/aged searchable buckets can be greatly reduced with tsidx reduction, at the cost of significantly reduced search performance
• Reduced tsidx files are one-third to two-thirds smaller than the original ones
• Each indexer reduces its searchable copies on its own
• By default tsidx reduction is disabled (enableTsidxReduction=false)

NOTE: tstats & typeahead commands won't work on reduced buckets


Master service & fixups

CM service

• The cluster master executes its service() call once every few seconds.
• The master schedules all its pending work in this service call.
• Work involves:
  • Responding to node failures (or) state transitions
  • Running fixup jobs (to move primaries & meet factors)
• The more peers & buckets there are, the more work there is to do in the service call
• The service() duration spikes during a node failure if the peer has a lot of buckets
• The interval between two successive service calls can be configured using the "service_interval" setting
  • The new default value is service_interval = 0, which means auto mode

CM service

• In auto mode, the next service call is scheduled based on the duration of the current service call (the interval is capped by max_auto_service_interval)
• Alternatively, you can manually tune service_interval as the cluster grows in size (along with heartbeat & restart timeouts)
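
The slides do not give the exact scheduling formula for auto mode, so the sketch below only illustrates the stated behaviour: the next interval follows the duration of the last service call and never exceeds max_auto_service_interval. The function name, the `min_interval` floor, and the choice to use the duration directly as the interval are all assumptions.

```python
def next_service_interval(last_duration: float,
                          max_auto_service_interval: float,
                          min_interval: float = 1.0) -> float:
    """Pick the next interval from the last service() duration, clamped to [min, max].

    Assumption: the interval simply tracks the duration; the real rule may differ.
    """
    return min(max(last_duration, min_interval), max_auto_service_interval)


# A slow service call (for example, during a node failure) pushes the next call
# further out, but only up to the cap.
for duration in (0.2, 12.5, 90.0):
    print(duration, next_service_interval(duration, max_auto_service_interval=30.0))
```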

Fixups

• The CM iterates through the list of buckets in its fixup list, attempting to fix them
• It involves re-assigning primaries, creating replication copies, making buckets searchable, rolling buckets, freezing buckets etc.
• Assuming sf > 1, primary fixups are expected to finish quickly, without delay
• The cluster/master/fixup endpoint displays buckets in the fixup list by 'level' (level=replication_factor, search_factor etc.)
• It is expected for the master to take some time to fix rf/sf if there are a lot of buckets in fixup & this can be carefully controlled by tuning max_peer_rep_load (5) & max_peer_build_load (2)
• Fixup supports a 'filter' option which allows filtering buckets based on some condition
  • For example, /services/cluster/master/fixup?level=replication_factor&filter=minutes_in_fixup>100 lists buckets stuck in fixup for more than 100 minutes
    – Something wrong with this bucket?

FYI: The CM does not perform rep & search fixups in maintenance mode; this can be helpful to avoid unnecessary replications during planned downtime of peer(s)
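
When scripting against the fixup endpoint, the `>` in the filter expression should be URL-encoded. A minimal sketch follows, assuming a hypothetical helper `fixup_url`, a placeholder host, and management port 8089; the path and the `level` and `filter` parameters are the ones shown on the slide.

```python
from typing import Optional
from urllib.parse import urlencode


def fixup_url(host: str, level: str, min_minutes: Optional[int] = None,
              mgmt_port: int = 8089) -> str:
    """Build a fixup listing URL, optionally limited to long-stuck buckets."""
    params = {"level": level}
    if min_minutes is not None:
        # Same condition as the slide's example: minutes_in_fixup>100
        params["filter"] = f"minutes_in_fixup>{min_minutes}"
    query = urlencode(params)  # encodes '>' as %3E
    return f"https://{host}:{mgmt_port}/services/cluster/master/fixup?{query}"


print(fixup_url("cm.example.com", "replication_factor", min_minutes=100))
```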

UI actions on buckets stuck in fixup

Note: Be careful with 'Delete copy', especially if there is only one copy


Cluster config/info

• services/cluster/config on master & peers lists the clustering configuration
• services/cluster/{master|slave}/info displays the node configuration


Debugging & logs

index=_internal

• The _internal index is the source for all the activity of splunkd
• A few log files to look at (or) correlate:
  • source=*splunkd.log*: to get an overview of what splunkd is doing
  • source=*splunkd_access.log*: to see all incoming REST calls & response codes
  • source=*metrics.log*: to see metrics about how Splunk is performing (different throughputs, queue sizes, response times, job counts etc.)

Clustering related logs

• Look for WARNs/ERRORs in the following clustering components to get an overview of what went wrong when things behave unexpectedly
• A few components at the cluster master:
  • CMMaster – handles general cluster master functionality
  • CMPeer – handles work specific to a particular slave/peer
  • CMBundleMgr – handles cluster bundle related functionality
  • CMRepJob – handles any replication related jobs/functionality
  • CMBucket – represents a bucket
• A few components at the cluster peer:
  • CMSlave – handles all the general slave/peer functionality
  • CMBundleMgr – handles slave bundle related functionality
  • BucketReplicator (send side), S2SFileReceiver (receive side) – replicating buckets
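
One way to act on the component list is to generate a search string that narrows _internal to warnings and errors from those components. The sketch below only assembles the search text; the helper name is hypothetical, and the field names `component` and `log_level` are assumed to be extracted for splunkd.log events in your environment.

```python
from typing import Iterable

MASTER_COMPONENTS = ["CMMaster", "CMPeer", "CMBundleMgr", "CMRepJob", "CMBucket"]
PEER_COMPONENTS = ["CMSlave", "CMBundleMgr", "BucketReplicator", "S2SFileReceiver"]


def clustering_error_search(components: Iterable[str]) -> str:
    """Return a search string for WARN/ERROR events from the given components."""
    component_clause = " OR ".join(f"component={name}" for name in components)
    return (
        "index=_internal source=*splunkd.log* "
        "(log_level=WARN OR log_level=ERROR) "
        f"({component_clause})"
    )


print(clustering_error_search(MASTER_COMPONENTS))
```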

Logs related to buckets

• Searching by bid (index~0~1108~10BBFD2B-BDF8-411B-B574-FEAF37D6F486) helps to understand/trace what went wrong with a particular bucket
• Most of the internal logs usually get rotated fast in production clusters, so 'splunk diag' might not have any/all of the information related to a particular bad bucket
• Exporting search results on a bucket id (like index=_internal {source=*splunkd.log*} BUCKET_ID) helps us understand more about what went wrong with a particular bucket
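
Once results have been exported, the same bucket id can be used to trace a bucket offline. The sketch below is a hypothetical helper that keeps only the lines mentioning a given bid; the two log lines in the example are made up for illustration and are not real splunkd output.

```python
from typing import Iterable, List


def lines_for_bucket(lines: Iterable[str], bid: str) -> List[str]:
    """Return the log lines that mention the given bucket id, in original order."""
    return [line.rstrip("\n") for line in lines if bid in line]


bid = "index~0~1108~10BBFD2B-BDF8-411B-B574-FEAF37D6F486"
exported = [
    f"INFO CMMaster - made-up message about bid={bid}\n",      # illustrative line
    "INFO CMMaster - made-up message about another bucket\n",  # illustrative line
]
for line in lines_for_bucket(exported, bid):
    print(line)
```

The same function works on an open file handle, since it only iterates over lines.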

Recent enhancements

• Scaling master & peers to be able to handle larger bucket volumes
  • Batching jobs, reducing restarts, optimizing/eliminating expensive operations, reducing disk scans
• Better failure recovery when things go wrong
  • Auto-recover from state inconsistencies between master & peers, provide options to take actions on any anomalous bucket states
• Data rebalancing for balanced data & search load distribution
• Summary replication to reduce I/O & CPU spikes due to summary regeneration on node failures
• Tsidx reduction to reduce storage costs

THANK YOU
